com.sun.labs.minion.indexer.dictionary
Class DiskBiGramDictionary
java.lang.Object
com.sun.labs.minion.indexer.dictionary.DiskDictionary
com.sun.labs.minion.indexer.dictionary.DiskBiGramDictionary
- All Implemented Interfaces:
- Dictionary, java.lang.Iterable<QueryEntry>
public class DiskBiGramDictionary
- extends DiskDictionary
| Fields inherited from class com.sun.labs.minion.indexer.dictionary.DiskDictionary |
CHANNEL_FULL_POST, CHANNEL_PART_POST, decoder, dh, dictFile, entryClass, entryInfo, entryInfoOffsets, FILE_FULL_POST, FILE_PART_POST, idToPosn, logTag, nameCache, nameOffsets, names, nLoads, part, posnCache, postFiles, postIn, totalSize |
|
Constructor Summary |
DiskBiGramDictionary(java.io.RandomAccessFile dictFile,
java.io.RandomAccessFile postFile,
int postInType,
int cacheSize,
int nameBufferSize,
int offsetsBufferSize,
int infoBufferSize,
int infoOffsetsBufferSize,
Partition part,
DiskDictionary mainDict)
|
|
Method Summary |
int[] |
getAllVariants(java.lang.String wc,
boolean allowPartial)
Get all the terms that could be made from any of the bigrams in
the given term. |
int[] |
getMatching(java.lang.String wc)
Gets the IDs for terms that potentially match the given
wildcard expression. |
int[] |
getMatching(java.lang.String wc,
boolean starts,
boolean ends)
Gets the IDs for terms that potentially match the given
wildcard expression. |
protected ArrayGroup |
getUnigrams(java.lang.String lower,
java.lang.String upper)
Given a lower bound character and upper bound character, find all
the bigrams that have that character as a first character. |
protected ArrayGroup |
intersect(java.util.List entries)
Given a list of entries, find the intersection of their ID sets. |
void |
merge(DiskBiGramDictionary[] dicts,
int[] starts,
int[][] postIDMaps,
java.io.RandomAccessFile mDictFile,
PostingsOutput postOut)
|
protected ArrayGroup |
union(java.util.List entries)
Unions together (a quick OR of) all the IDs pointed to by the entries
that are passed in. |
| Methods inherited from class com.sun.labs.minion.indexer.dictionary.DiskDictionary |
customSetup, find, findPos, findPos, get, get, get, get, getBufferedInputs, getBufferedInputs, getLookupState, getMatching, getMaxID, getPartition, getSpellingVariants, getStemMatches, getSubstring, iterator, iterator, iterator, iterator, literator, merge, merge, newEntry, newEntry, put, remapPostings, setCacheSize, setPartition, setUpBuffers, size |
| Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
DiskBiGramDictionary
public DiskBiGramDictionary(java.io.RandomAccessFile dictFile,
java.io.RandomAccessFile postFile,
int postInType,
int cacheSize,
int nameBufferSize,
int offsetsBufferSize,
int infoBufferSize,
int infoOffsetsBufferSize,
Partition part,
DiskDictionary mainDict)
throws java.io.IOException
- Throws:
java.io.IOException
getMatching
public int[] getMatching(java.lang.String wc)
- Gets the IDs for terms that potentially match the given
wildcard expression. The terms will need to be checked to see
whether or not they actually match.
- Parameters:
wc - A wildcard expression.
- Returns:
- An array of the IDs that might match the given
wildcard pattern. The names of the entries that map to these IDs
will need to be tested to make sure that they actually match. If
this value is null, then there are no possible matching
IDs. If this value is an array of length 0, then all IDs must be
tested.
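The three possible return values (null, an empty array, a non-empty array) each call for different handling by the caller. The following is a minimal sketch of that handling, not Minion code: the class WildcardCandidateFilter, the id-to-name map, and the wildcard syntax (* for any run of characters, ? for a single character) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of Minion: confirms candidate IDs against a wildcard. */
final class WildcardCandidateFilter {

    /** Translates a wildcard expression into a regex (assumed syntax: * and ?). */
    static Pattern toRegex(String wc) {
        StringBuilder sb = new StringBuilder();
        for (char c : wc.toCharArray()) {
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString());
    }

    /**
     * Applies the documented contract to a candidate array:
     * null means no ID can match, a zero-length array means every ID must be
     * tested, and anything else is a candidate list that still needs checking.
     * namesById stands in for the dictionary's id-to-name lookup.
     */
    static int[] confirm(int[] candidates, Map<Integer, String> namesById, String wc) {
        if (candidates == null) {
            return new int[0];
        }
        Pattern p = toRegex(wc);
        List<Integer> hits = new ArrayList<>();
        if (candidates.length == 0) {
            // No narrowing was possible, so test every known name in ID order.
            for (Map.Entry<Integer, String> e : new TreeMap<>(namesById).entrySet()) {
                if (p.matcher(e.getValue()).matches()) {
                    hits.add(e.getKey());
                }
            }
        } else {
            for (int id : candidates) {
                String name = namesById.get(id);
                if (name != null && p.matcher(name).matches()) {
                    hits.add(id);
                }
            }
        }
        return hits.stream().mapToInt(Integer::intValue).toArray();
    }
}
```

In real use, the candidates array would come from getMatching(wc), and the names would be looked up in the main dictionary rather than in a map.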
getMatching
public int[] getMatching(java.lang.String wc,
boolean starts,
boolean ends)
- Gets the IDs for terms that potentially match the given
wildcard expression. The terms will need to be checked to see
whether or not they actually match.
- Parameters:
wc - A wildcard expression.
starts - If true, then the given expression must start the term.
ends - If true, then the given expression must end the term.
- Returns:
- An array of the IDs that might match the given
wildcard pattern. The names of the entries that map to these IDs
will need to be tested to make sure that they actually match. If
this value is null, then there are no possible matching
IDs. If this value is an array of length 0, then all IDs must be
tested.
getAllVariants
public int[] getAllVariants(java.lang.String wc,
boolean allowPartial)
- Get all the terms that could be made from any of the bigrams in
the given term. This is generally used for spelling suggestions.
- Parameters:
wc - the term to vary
allowPartial - if true, allow partial matches (meaning that
strings that have bigrams that aren't present in the
dictionary will still be returned); otherwise, return
an empty array as soon as a bigram is found to be
missing
- Returns:
- an array of term IDs
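The effect of allowPartial can be illustrated with a small in-memory sketch. It assumes a hypothetical map from each bigram to a sorted array of term IDs and splits the term into plain overlapping character pairs; whether the real dictionary also adds word-boundary markers to its bigrams is not stated here, so none are used. The class and method names are illustrative only.

```java
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical in-memory sketch, not the Minion implementation. */
final class BigramVariantsSketch {

    /**
     * Collects the IDs of every term that shares at least one bigram with the
     * given term. bigramToIds is an assumed stand-in for the on-disk bigram
     * dictionary.
     */
    static int[] allVariants(Map<String, int[]> bigramToIds, String term, boolean allowPartial) {
        TreeSet<Integer> ids = new TreeSet<>();
        for (int i = 0; i + 2 <= term.length(); i++) {
            int[] posting = bigramToIds.get(term.substring(i, i + 2));
            if (posting == null) {
                if (!allowPartial) {
                    // A missing bigram ends the lookup with an empty result.
                    return new int[0];
                }
                continue; // Partial matches allowed: skip the unknown bigram.
            }
            for (int id : posting) {
                ids.add(id);
            }
        }
        return ids.stream().mapToInt(Integer::intValue).toArray();
    }
}
```

With bigrams "ca" and "at" indexed, the term "cab" yields an empty array when allowPartial is false (because "ab" is missing) and the IDs for "ca" when it is true.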
intersect
protected ArrayGroup intersect(java.util.List entries)
- Given a list of entries, find the intersection of their ID sets.
- Parameters:
entries - The entries to intersect.
- Returns:
- A group containing the intersection.
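As an illustration of intersecting ID sets, the sketch below works on plain sorted int arrays instead of dictionary entries and ArrayGroup, whose internals are not described here. It assumes each array is strictly ascending; the class name IdSetIntersection is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of ID-set intersection over sorted int arrays. */
final class IdSetIntersection {

    /** Intersects every array in the list; an empty list yields an empty result. */
    static int[] intersect(List<int[]> idSets) {
        if (idSets.isEmpty()) {
            return new int[0];
        }
        int[] acc = Arrays.copyOf(idSets.get(0), idSets.get(0).length);
        // Stop early once the running intersection is empty.
        for (int k = 1; k < idSets.size() && acc.length > 0; k++) {
            acc = intersectTwo(acc, idSets.get(k));
        }
        return acc;
    }

    /** Two-pointer intersection of two strictly ascending arrays. */
    private static int[] intersectTwo(int[] a, int[] b) {
        int[] out = new int[Math.min(a.length, b.length)];
        int i = 0, j = 0, n = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                i++;
            } else if (a[i] > b[j]) {
                j++;
            } else {
                out[n++] = a[i];
                i++;
                j++;
            }
        }
        return Arrays.copyOf(out, n);
    }
}
```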
union
protected ArrayGroup union(java.util.List entries)
- Unions together (a quick OR of) all the IDs pointed to by the entries
that are passed in.
- Parameters:
entries - the entries to union
- Returns:
- a scored group containing the union
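A union of ID sets can be sketched in the same way. This version merges plain, strictly ascending int arrays and drops duplicates; it does not model the scores that the returned group carries, and the class name IdSetUnion is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of ID-set union over sorted int arrays (scores not modeled). */
final class IdSetUnion {

    /** Unions every array in the list into one ascending, duplicate-free array. */
    static int[] union(List<int[]> idSets) {
        int[] acc = new int[0];
        for (int[] ids : idSets) {
            acc = unionTwo(acc, ids);
        }
        return acc;
    }

    /** Two-pointer merge of two strictly ascending arrays, keeping shared IDs once. */
    private static int[] unionTwo(int[] a, int[] b) {
        int[] out = new int[a.length + b.length];
        int i = 0, j = 0, n = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                out[n++] = a[i++];
            } else if (a[i] > b[j]) {
                out[n++] = b[j++];
            } else {
                out[n++] = a[i];
                i++;
                j++;
            }
        }
        while (i < a.length) {
            out[n++] = a[i++];
        }
        while (j < b.length) {
            out[n++] = b[j++];
        }
        return Arrays.copyOf(out, n);
    }
}
```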
getUnigrams
protected ArrayGroup getUnigrams(java.lang.String lower,
java.lang.String upper)
- Given a lower bound character and upper bound character, find all
the bigrams that have that character as a first character.
merge
public void merge(DiskBiGramDictionary[] dicts,
int[] starts,
int[][] postIDMaps,
java.io.RandomAccessFile mDictFile,
PostingsOutput postOut)
throws java.io.IOException
- Throws:
java.io.IOException