com.sun.labs.minion.indexer.dictionary
Class DiskBiGramDictionary

java.lang.Object
  extended by com.sun.labs.minion.indexer.dictionary.DiskDictionary
      extended by com.sun.labs.minion.indexer.dictionary.DiskBiGramDictionary
All Implemented Interfaces:
Dictionary, java.lang.Iterable<QueryEntry>

public class DiskBiGramDictionary
extends DiskDictionary


Nested Class Summary
 
Nested classes/interfaces inherited from class com.sun.labs.minion.indexer.dictionary.DiskDictionary
DiskDictionary.DiskDictionaryIterator, DiskDictionary.HE, DiskDictionary.LightDiskDictionaryIterator, DiskDictionary.LookupState
 
Field Summary
 
Fields inherited from class com.sun.labs.minion.indexer.dictionary.DiskDictionary
CHANNEL_FULL_POST, CHANNEL_PART_POST, decoder, dh, dictFile, entryClass, entryInfo, entryInfoOffsets, FILE_FULL_POST, FILE_PART_POST, idToPosn, logTag, nameCache, nameOffsets, names, nLoads, part, posnCache, postFiles, postIn, totalSize
 
Constructor Summary
DiskBiGramDictionary(java.io.RandomAccessFile dictFile, java.io.RandomAccessFile postFile, int postInType, int cacheSize, int nameBufferSize, int offsetsBufferSize, int infoBufferSize, int infoOffsetsBufferSize, Partition part, DiskDictionary mainDict)
           
 
Method Summary
 int[] getAllVariants(java.lang.String wc, boolean allowPartial)
          Get all the terms that could be made from any of the bigrams in the given term.
 int[] getMatching(java.lang.String wc)
          Gets the IDs for terms that potentially match the given wildcard expression.
 int[] getMatching(java.lang.String wc, boolean starts, boolean ends)
          Gets the IDs for terms that potentially match the given wildcard expression.
protected  ArrayGroup getUnigrams(java.lang.String lower, java.lang.String upper)
          Given a lower bound character and upper bound character, find all the bigrams that have that character as a first character.
protected  ArrayGroup intersect(java.util.List entries)
          Given a list of entries, find the intersection of their ID sets.
 void merge(DiskBiGramDictionary[] dicts, int[] starts, int[][] postIDMaps, java.io.RandomAccessFile mDictFile, PostingsOutput postOut)
           
protected  ArrayGroup union(java.util.List entries)
          Unions together (quick OR) all the IDs pointed to by the entries that are passed in.
 
Methods inherited from class com.sun.labs.minion.indexer.dictionary.DiskDictionary
customSetup, find, findPos, findPos, get, get, get, get, getBufferedInputs, getBufferedInputs, getLookupState, getMatching, getMaxID, getPartition, getSpellingVariants, getStemMatches, getSubstring, iterator, iterator, iterator, iterator, literator, merge, merge, newEntry, newEntry, put, remapPostings, setCacheSize, setPartition, setUpBuffers, size
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DiskBiGramDictionary

public DiskBiGramDictionary(java.io.RandomAccessFile dictFile,
                            java.io.RandomAccessFile postFile,
                            int postInType,
                            int cacheSize,
                            int nameBufferSize,
                            int offsetsBufferSize,
                            int infoBufferSize,
                            int infoOffsetsBufferSize,
                            Partition part,
                            DiskDictionary mainDict)
                     throws java.io.IOException
Throws:
java.io.IOException
Method Detail

getMatching

public int[] getMatching(java.lang.String wc)
Gets the IDs for terms that potentially match the given wildcard expression. The terms will need to be checked to see whether or not they actually match.

Parameters:
wc - A wildcard expression.
Returns:
An array of the IDs that might match the given wildcard pattern. The names of the entries that map to these IDs will need to be tested to make sure that they actually match. If this value is null, then there are no possible matching IDs. If this value is an array of length 0, then all IDs must be tested.
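The three return cases (null, empty array, candidate array) are easy to confuse, so here is a minimal sketch of how a caller might interpret them. It does not use the Minion API: CandidateFilter, resolve, idToName, and maxId are hypothetical names, and the assumption that IDs run from 1 to maxId is made only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;
import java.util.function.Predicate;

final class CandidateFilter {

    /**
     * Hypothetical helper: turns a candidate array that follows the contract
     * described above into the list of names that really match.
     * Assumes (for illustration only) that valid IDs run from 1 to maxId.
     */
    static List<String> resolve(int[] candidates, int maxId,
                                IntFunction<String> idToName,
                                Predicate<String> matches) {
        List<String> result = new ArrayList<>();
        if (candidates == null) {
            // null: no ID can possibly match.
            return result;
        }
        if (candidates.length == 0) {
            // Empty array: nothing could be ruled out, so test every ID.
            for (int id = 1; id <= maxId; id++) {
                addIfMatch(result, idToName.apply(id), matches);
            }
        } else {
            // Candidates are only potential matches; verify each name.
            for (int id : candidates) {
                addIfMatch(result, idToName.apply(id), matches);
            }
        }
        return result;
    }

    private static void addIfMatch(List<String> out, String name,
                                   Predicate<String> matches) {
        if (name != null && matches.test(name)) {
            out.add(name);
        }
    }

    public static void main(String[] args) {
        String[] names = {null, "cat", "cart", "dog"};
        // Stand-in for the wildcard expression c*t.
        Predicate<String> cStarT = s -> s.matches("c.*t");
        // Candidate 3 ("dog") is a false positive and is filtered out.
        System.out.println(resolve(new int[] {1, 3}, 3, id -> names[id], cStarT)); // prints [cat]
    }
}
```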

getMatching

public int[] getMatching(java.lang.String wc,
                         boolean starts,
                         boolean ends)
Gets the IDs for terms that potentially match the given wildcard expression. The terms will need to be checked to see whether or not they actually match.

Parameters:
wc - A wildcard expression.
starts - If true, then the given expression must start the term.
ends - If true, then the given expression must end the term.
Returns:
An array of the IDs that might match the given wildcard pattern. The names of the entries that map to these IDs will need to be tested to make sure that they actually match. If this value is null, then there are no possible matching IDs. If this value is an array of length 0, then all IDs must be tested.
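This page does not describe how candidate IDs are derived. As a rough illustration of why a bigram lookup can only produce potential matches, the sketch below collects the bigrams found in the literal runs of a wildcard expression. It assumes that * and ? are the only wildcard characters; WildcardBigrams and bigrams are hypothetical names, not part of this class.

```java
import java.util.LinkedHashSet;
import java.util.Set;

final class WildcardBigrams {

    /**
     * Collects the two-character sequences in each literal run of a wildcard
     * expression. Assumes '*' and '?' are the only wildcard characters
     * (an illustrative assumption, not taken from the Minion source).
     */
    static Set<String> bigrams(String wc) {
        Set<String> result = new LinkedHashSet<>();
        // Split on runs of wildcard characters to isolate the literal parts.
        for (String run : wc.split("[*?]+")) {
            for (int i = 0; i + 2 <= run.length(); i++) {
                result.add(run.substring(i, i + 2));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(bigrams("comp*ter")); // prints [co, om, mp, te, er]
        System.out.println(bigrams("a*b"));      // prints []
    }
}
```

A term containing all of these bigrams need not match the expression (the bigrams may occur in a different order or position), which is consistent with the requirement to test the returned names. An expression such as a*b yields no bigrams at all, which is plausibly a case where no IDs can be ruled out.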

getAllVariants

public int[] getAllVariants(java.lang.String wc,
                            boolean allowPartial)
Gets all the terms that could be made from any of the bigrams in the given term. (This is generally used for spelling suggestions.)

Parameters:
wc - the term to vary
allowPartial - if true, allow partial matches, so that a term containing bigrams that are not present in the dictionary still yields results; otherwise, return an empty array as soon as a bigram is found to be missing
Returns:
An array of term IDs.
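The effect of allowPartial can be sketched on a toy in-memory index. This is an illustration under stated assumptions, not the class's implementation: VariantSketch, allVariants, and the Map from bigram to ID array are hypothetical stand-ins for the on-disk dictionary.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

final class VariantSketch {

    /**
     * Unions the ID arrays of every bigram in the term.
     * The index map is a hypothetical stand-in for the bigram dictionary.
     */
    static int[] allVariants(Map<String, int[]> index, String term,
                             boolean allowPartial) {
        TreeSet<Integer> ids = new TreeSet<>();
        for (int i = 0; i + 2 <= term.length(); i++) {
            int[] posting = index.get(term.substring(i, i + 2));
            if (posting == null) {
                if (!allowPartial) {
                    // A missing bigram ends the lookup with an empty array.
                    return new int[0];
                }
                continue; // Partial matches allowed: skip the missing bigram.
            }
            for (int id : posting) {
                ids.add(id);
            }
        }
        return ids.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        Map<String, int[]> index = new HashMap<>();
        index.put("ca", new int[] {1, 2});
        index.put("at", new int[] {1});
        index.put("ar", new int[] {2});
        // "cab" contains the bigram "ab", which is not in the index.
        System.out.println(allVariants(index, "cab", true).length);  // prints 2
        System.out.println(allVariants(index, "cab", false).length); // prints 0
    }
}
```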

intersect

protected ArrayGroup intersect(java.util.List entries)
Given a list of entries, find the intersection of their ID sets.

Parameters:
entries - The entries to intersect.
Returns:
A group containing the intersection.
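As an illustration of intersecting ID sets, the sketch below works on plain sorted int arrays rather than on entries and ArrayGroup. IdSetIntersection and its methods are hypothetical names, and the assumption that each ID set is sorted in ascending order without duplicates is made for the example only.

```java
import java.util.Arrays;
import java.util.List;

final class IdSetIntersection {

    /** Intersects all ID sets; each is assumed sorted ascending, no duplicates. */
    static int[] intersect(List<int[]> idSets) {
        if (idSets.isEmpty()) {
            return new int[0];
        }
        int[] current = idSets.get(0).clone();
        for (int k = 1; k < idSets.size() && current.length > 0; k++) {
            current = intersectTwo(current, idSets.get(k));
        }
        return current;
    }

    /** Two-pointer walk over two sorted arrays, keeping the common IDs. */
    static int[] intersectTwo(int[] a, int[] b) {
        int[] out = new int[Math.min(a.length, b.length)];
        int i = 0, j = 0, n = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                i++;
            } else if (a[i] > b[j]) {
                j++;
            } else {
                out[n++] = a[i];
                i++;
                j++;
            }
        }
        return Arrays.copyOf(out, n);
    }

    public static void main(String[] args) {
        int[] common = intersect(List.of(
                new int[] {1, 3, 5, 7},
                new int[] {3, 4, 5},
                new int[] {3, 5, 9}));
        System.out.println(Arrays.toString(common)); // prints [3, 5]
    }
}
```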

union

protected ArrayGroup union(java.util.List entries)
Unions together (quick OR) all the IDs pointed to by the entries that are passed in.

Parameters:
entries - the entries to union
Returns:
a scored group containing the union
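A union of ID sets can be sketched the same way, again on plain sorted int arrays instead of entries and ArrayGroup. IdSetUnion and its methods are hypothetical names; the sketch assumes each input is sorted in ascending order without duplicates, and it ignores any scoring.

```java
import java.util.Arrays;
import java.util.List;

final class IdSetUnion {

    /** Unions all ID sets; each is assumed sorted ascending, no duplicates. */
    static int[] union(List<int[]> idSets) {
        int[] current = new int[0];
        for (int[] ids : idSets) {
            current = unionTwo(current, ids);
        }
        return current;
    }

    /** Merges two sorted arrays, emitting an ID shared by both only once. */
    static int[] unionTwo(int[] a, int[] b) {
        int[] out = new int[a.length + b.length];
        int i = 0, j = 0, n = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                out[n++] = a[i++];
            } else if (a[i] > b[j]) {
                out[n++] = b[j++];
            } else {
                out[n++] = a[i];
                i++;
                j++;
            }
        }
        while (i < a.length) {
            out[n++] = a[i++];
        }
        while (j < b.length) {
            out[n++] = b[j++];
        }
        return Arrays.copyOf(out, n);
    }

    public static void main(String[] args) {
        int[] all = union(List.of(
                new int[] {1, 3},
                new int[] {2, 3, 4},
                new int[0]));
        System.out.println(Arrays.toString(all)); // prints [1, 2, 3, 4]
    }
}
```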

getUnigrams

protected ArrayGroup getUnigrams(java.lang.String lower,
                                 java.lang.String upper)
Given a lower bound character and upper bound character, find all the bigrams that have that character as a first character.


merge

public void merge(DiskBiGramDictionary[] dicts,
                  int[] starts,
                  int[][] postIDMaps,
                  java.io.RandomAccessFile mDictFile,
                  PostingsOutput postOut)
           throws java.io.IOException
Throws:
java.io.IOException