com.sun.labs.minion.indexer.dictionary
Class DiskDictionary

java.lang.Object
  extended by com.sun.labs.minion.indexer.dictionary.DiskDictionary
All Implemented Interfaces:
Dictionary, java.lang.Iterable<QueryEntry>
Direct Known Subclasses:
CachedDiskDictionary, DiskBiGramDictionary, UncachedTermStatsDictionary

public class DiskDictionary
extends java.lang.Object
implements Dictionary

A base class for all classes that implement dictionaries for use during querying.


Nested Class Summary
 class DiskDictionary.DiskDictionaryIterator
          A class that can be used as an iterator for this dictionary.
protected  class DiskDictionary.HE
          A class that will act as a heap entry for merging.
 class DiskDictionary.LightDiskDictionaryIterator
          A lightweight iterator for this dictionary.
 class DiskDictionary.LookupState
          A class that can be used to encapsulate the dictionary state when doing multiple lookups during querying.
 
Field Summary
static int CHANNEL_FULL_POST
          An integer indicating that we should use channel postings inputs.
static int CHANNEL_PART_POST
          Use a file channel and partially load postings.
protected  NameDecoder decoder
          A decoder for the names in this dictionary.
protected  DictionaryHeader dh
          The header for the dictionary.
protected  java.io.RandomAccessFile dictFile
          The dictionary file.
protected  java.lang.Class entryClass
          The type of entry that we contain.
protected  ReadableBuffer entryInfo
          The information for the entries.
protected  ReadableBuffer entryInfoOffsets
          The offsets for the entry information.
static int FILE_FULL_POST
          Use a random access file and fully load postings.
static int FILE_PART_POST
          Use a random access file and partially load postings.
protected  ReadableBuffer idToPosn
          The map from entry IDs to positions in the dictionary.
protected static java.lang.String logTag
          The tag for this module.
protected  LRACache<java.lang.Object,QueryEntry> nameCache
          A cache from entry name to a query entry.
protected  ReadableBuffer nameOffsets
          The offsets of the names of the uncompressed entries.
protected  ReadableBuffer names
          The entry names.
 int nLoads
           
protected  Partition part
          The partition that we are associated with.
protected  LRACache<java.lang.Integer,java.lang.Object> posnCache
          A cache from position to entry names used during binary searches for terms.
protected  java.io.RandomAccessFile[] postFiles
          The postings files.
protected  PostingsInput[] postIn
          Our postings inputs.
 long totalSize
           
 
Constructor Summary
protected DiskDictionary()
          Creates a dictionary.
  DiskDictionary(java.lang.Class entryClass, NameDecoder decoder, java.io.RandomAccessFile dictFile, java.io.RandomAccessFile[] postFiles)
          Creates a disk dictionary that we can use for querying.
  DiskDictionary(java.lang.Class entryClass, NameDecoder decoder, java.io.RandomAccessFile dictFile, java.io.RandomAccessFile[] postFiles, int postInType, int cacheSize, int nameBufferSize, int offsetsBufferSize, int infoBufferSize, int infoOffsetsBufferSize, Partition part)
          Creates a disk dictionary that we can use for querying.
  DiskDictionary(java.lang.Class entryClass, NameDecoder decoder, java.io.RandomAccessFile dictFile, java.io.RandomAccessFile[] postFiles, int postInType, Partition part)
          Creates a disk dictionary that we can use for querying.
  DiskDictionary(java.lang.Class entryClass, NameDecoder decoder, java.io.RandomAccessFile dictFile, java.io.RandomAccessFile[] postFiles, Partition part)
          Creates a disk dictionary that we can use for querying.
 
Method Summary
protected  void customSetup(IndexEntry me, QueryEntry e, int start, int[] postIDMap)
          Do any custom setup required for merging one entry onto another.
protected  QueryEntry find(int posn, DiskDictionary.LookupState lus)
          Finds the entry at the given position in this dictionary.
protected  int findPos(java.lang.Object key, DiskDictionary.LookupState lus)
          Determines the position at which an entry falls.
protected  int findPos(java.lang.Object key, DiskDictionary.LookupState lus, boolean partial)
          Determines the position within this block at which an entry falls.
 QueryEntry get(int id)
          Gets an entry from the dictionary, given the ID for the entry.
protected  QueryEntry get(int id, DiskDictionary.LookupState lus)
          Gets an entry from the dictionary, given the ID for the entry.
 QueryEntry get(java.lang.Object name)
          Gets an entry from the dictionary, given the name for the entry.
 QueryEntry get(java.lang.Object name, DiskDictionary.LookupState lus)
          Gets an entry from the dictionary, given the name for the entry.
protected  PostingsInput[] getBufferedInputs()
          Gets a buffered version of the postings inputs for this dictionary so that we can stream a bit better through the postings when doing, for example, dictionary merges.
protected  PostingsInput[] getBufferedInputs(int buffSize)
          Gets a buffered version of the postings inputs for this dictionary so that we can stream a bit better through the postings when doing, for example, dictionary merges.
 DiskDictionary.LookupState getLookupState()
           
 QueryEntry[] getMatching(DiskBiGramDictionary biDict, java.lang.String pat, boolean caseSensitive, int maxEntries, long timeLimit)
          Gets the entries matching the given pattern from the given dictionary.
 int getMaxID()
          Gets the maximum ID in the dictionary.
 Partition getPartition()
          Gets the partition to which this dictionary belongs.
 QueryEntry[] getSpellingVariants(DiskBiGramDictionary biDict, java.lang.String word, boolean caseSensitive, int maxEntries, long timeLimit)
          Gets the list of possible spelling corrections, based on terms in the index, for the string that is passed in.
 QueryEntry[] getStemMatches(DiskBiGramDictionary biDict, java.lang.String term, boolean caseSensitive, int minLen, float matchCutOff, int maxEntries, long timeLimit)
          Gets a set of all the entries with the given stem.
 QueryEntry[] getSubstring(DiskBiGramDictionary biDict, java.lang.String substring, boolean caseSensitive, boolean starts, boolean ends, int maxEntries, long timeLimit)
          Gets the entries matching the given pattern from the given dictionary.
 DictionaryIterator iterator()
          Gets an iterator for this dictionary.
 DictionaryIterator iterator(int begin, int end)
          Gets an iterator for the dictionary that starts and stops at the given indices in the dictionary.
 DictionaryIterator iterator(java.lang.Object startEntry, boolean includeStart)
          Creates an iterator that starts iterating at the specified entry, or, if the entry does not exist in the block, starts iterating at the first entry greater than the provided entry.
 DictionaryIterator iterator(java.lang.Object startEntry, boolean includeStart, java.lang.Object stopEntry, boolean includeStop)
          Creates an iterator that starts iterating at the specified startEntry and stops iterating at the specified stopEntry.
 LightIterator literator()
           
 int[][] merge(IndexEntry entryFactory, NameEncoder encoder, DiskDictionary[] dicts, EntryMapper[] mappers, int[] starts, int[][] postIDMaps, java.io.RandomAccessFile mDictFile, PostingsOutput[] postOut, boolean appendPostings)
          Merges a number of dictionaries into a single dictionary.
 int[][] merge(IndexEntry entryFactory, NameEncoder encoder, PartitionStats partStats, DiskDictionary[] dicts, EntryMapper[] mappers, int[] starts, int[][] postIDMaps, java.io.RandomAccessFile mDictFile, PostingsOutput[] postOut, boolean appendPostings)
          Merges a number of dictionaries into a single dictionary.
 QueryEntry newEntry(java.lang.Object name)
          Gets an instance of the kind of entries stored in this dictionary.
protected  QueryEntry newEntry(java.lang.Object name, int posn, DiskDictionary.LookupState lus, PostingsInput[] postIn)
          Creates a new entry and fills in its information.
 IndexEntry put(java.lang.Object name, IndexEntry t)
          Puts an entry into the dictionary.
 void remapPostings(IndexEntry entryFactory, NameEncoder encoder, PartitionStats partStats, int[] postMap, java.io.RandomAccessFile dictFile, PostingsOutput[] postOut)
          Rewrites this dictionary to the files passed in while remapping IDs in the postings to the new IDs passed in.
 void setCacheSize(int s)
          Sets the sizes of the name and position cache.
 void setPartition(Partition p)
          Sets the partition with which this dictionary is associated.
protected  void setUpBuffers(int nameBufferSize, int offsetsBufferSize, int infoBufferSize, int infoOffsetsBufferSize)
           
 int size()
          Gets the size of the dictionary.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

totalSize

public long totalSize

nLoads

public int nLoads

dh

protected DictionaryHeader dh
The header for the dictionary.


entryClass

protected java.lang.Class entryClass
The type of entry that we contain.


idToPosn

protected ReadableBuffer idToPosn
The map from entry IDs to positions in the dictionary.


names

protected ReadableBuffer names
The entry names.


nameOffsets

protected ReadableBuffer nameOffsets
The offsets of the names of the uncompressed entries.


entryInfo

protected ReadableBuffer entryInfo
The information for the entries.


entryInfoOffsets

protected ReadableBuffer entryInfoOffsets
The offsets for the entry information.


posnCache

protected LRACache<java.lang.Integer,java.lang.Object> posnCache
A cache from position to entry names used during binary searches for terms.


nameCache

protected LRACache<java.lang.Object,QueryEntry> nameCache
A cache from entry name to a query entry.


decoder

protected NameDecoder decoder
A decoder for the names in this dictionary.


dictFile

protected java.io.RandomAccessFile dictFile
The dictionary file.


postFiles

protected java.io.RandomAccessFile[] postFiles
The postings files.


postIn

protected PostingsInput[] postIn
Our postings inputs.


part

protected Partition part
The partition that we are associated with.


logTag

protected static java.lang.String logTag
The tag for this module.


CHANNEL_FULL_POST

public static final int CHANNEL_FULL_POST
An integer indicating that we should use channel postings inputs.

See Also:
Constant Field Values

CHANNEL_PART_POST

public static final int CHANNEL_PART_POST
Use a file channel and partially load postings.

See Also:
Constant Field Values

FILE_FULL_POST

public static final int FILE_FULL_POST
Use a random access file and fully load postings.

See Also:
Constant Field Values

FILE_PART_POST

public static final int FILE_PART_POST
Use a random access file and partially load postings.

See Also:
Constant Field Values
Constructor Detail

DiskDictionary

protected DiskDictionary()
Creates a dictionary.


DiskDictionary

public DiskDictionary(java.lang.Class entryClass,
                      NameDecoder decoder,
                      java.io.RandomAccessFile dictFile,
                      java.io.RandomAccessFile[] postFiles)
               throws java.io.IOException
Creates a disk dictionary that we can use for querying.

Parameters:
entryClass - The class of the entries that the dictionary contains.
decoder - A decoder for the names in this dictionary.
dictFile - The file containing the dictionary.
postFiles - The files containing the postings associated with the entries in this dictionary.
Throws:
java.io.IOException - if there is any error opening the dictionary

DiskDictionary

public DiskDictionary(java.lang.Class entryClass,
                      NameDecoder decoder,
                      java.io.RandomAccessFile dictFile,
                      java.io.RandomAccessFile[] postFiles,
                      Partition part)
               throws java.io.IOException
Creates a disk dictionary that we can use for querying.

Parameters:
entryClass - The class of the entries that the dictionary contains.
decoder - A decoder for the names in this dictionary.
dictFile - The file containing the dictionary.
postFiles - The files containing the postings associated with the entries in this dictionary.
part - The partition with which this dictionary is associated.
Throws:
java.io.IOException - if there is any error opening the dictionary

DiskDictionary

public DiskDictionary(java.lang.Class entryClass,
                      NameDecoder decoder,
                      java.io.RandomAccessFile dictFile,
                      java.io.RandomAccessFile[] postFiles,
                      int postInType,
                      Partition part)
               throws java.io.IOException
Creates a disk dictionary that we can use for querying.

Parameters:
entryClass - The class of the entries that the dictionary contains.
decoder - A decoder for the names in this dictionary.
dictFile - The file containing the dictionary.
postFiles - The files containing the postings associated with the entries in this dictionary.
postInType - The type of postings input to use.
part - The partition with which this dictionary is associated.
Throws:
java.io.IOException - if there is any error opening the dictionary

DiskDictionary

public DiskDictionary(java.lang.Class entryClass,
                      NameDecoder decoder,
                      java.io.RandomAccessFile dictFile,
                      java.io.RandomAccessFile[] postFiles,
                      int postInType,
                      int cacheSize,
                      int nameBufferSize,
                      int offsetsBufferSize,
                      int infoBufferSize,
                      int infoOffsetsBufferSize,
                      Partition part)
               throws java.io.IOException
Creates a disk dictionary that we can use for querying.

Parameters:
nameBufferSize - the size of the buffer (in bytes) to use for the entry names
offsetsBufferSize - the size of the buffer (in bytes) to use for the entry name offsets
infoBufferSize - the size of the buffer (in bytes) to use for the entry information
infoOffsetsBufferSize - the size of the buffer (in bytes) to use for the entry information offsets
entryClass - The class of the entries that the dictionary contains.
decoder - A decoder for the names in this dictionary.
dictFile - The file containing the dictionary.
postFiles - The files containing the postings associated with the entries in this dictionary.
postInType - The type of postings input to use.
cacheSize - The number of entries to use in the name and position caches.
part - The partition with which this dictionary is associated.
Throws:
java.io.IOException - if there is any error opening the dictionary
Method Detail

setUpBuffers

protected void setUpBuffers(int nameBufferSize,
                            int offsetsBufferSize,
                            int infoBufferSize,
                            int infoOffsetsBufferSize)
                     throws java.io.IOException
Throws:
java.io.IOException

setCacheSize

public void setCacheSize(int s)
Sets the sizes of the name and position cache.

Parameters:
s - The size of the caches to use.

getMaxID

public int getMaxID()
Gets the maximum ID in the dictionary.

Returns:
the maximum ID in the dictionary

put

public IndexEntry put(java.lang.Object name,
                      IndexEntry t)
Puts an entry into the dictionary. For disk-based dictionaries, this always returns null, since those dictionaries are static.

Specified by:
put in interface Dictionary
Parameters:
name - the name of the entry to put in the dictionary
t - The entry to put in the dictionary.
Returns:
null

getLookupState

public DiskDictionary.LookupState getLookupState()

get

public QueryEntry get(java.lang.Object name)
Gets an entry from the dictionary, given the name for the entry.

Specified by:
get in interface Dictionary
Parameters:
name - The name of the entry to get.
Returns:
The entry associated with the name, or null if the name doesn't appear in the dictionary.

get

public QueryEntry get(java.lang.Object name,
                      DiskDictionary.LookupState lus)
Gets an entry from the dictionary, given the name for the entry.

Parameters:
name - The name of the entry to get.
lus - a lookup state for this dictionary. A lookup state can be re-used when doing multiple lookups to save time. If this parameter is null, a lookup state will be generated for each lookup.
Returns:
The entry associated with the name, or null if the name doesn't appear in the dictionary.

get

public QueryEntry get(int id)
Gets an entry from the dictionary, given the ID for the entry.

Parameters:
id - the ID to find.
Returns:
The entry, or null if the ID doesn't occur in our dictionary.

get

protected QueryEntry get(int id,
                         DiskDictionary.LookupState lus)
Gets an entry from the dictionary, given the ID for the entry.

Parameters:
id - the ID to find.
lus - the current lookup state
Returns:
The entry, or null if the ID doesn't occur in our dictionary.

findPos

protected int findPos(java.lang.Object key,
                      DiskDictionary.LookupState lus)
Determines the position at which an entry falls. If the entry is not found, a number representing the position at which the entry would have been found is returned, as described below.

Parameters:
key - the name of the entry to find
lus - a lookup state that carries around copies of the buffers holding the dictionary data
Returns:
the position of the entry, if the entry is found; otherwise, (-(location) -1). location is defined as the location of the first entry in the block "greater than" the given entry. If all entries are "less than" the given entry, the size of this block will be returned. Note that this guarantees that the return value will be >= 0 if and only if the given entry is found in the block.
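
The return convention described above is the same one used by java.util.Arrays.binarySearch. The following is a minimal sketch of it, not the actual implementation: a sorted String array stands in for the dictionary's name buffers, and the class and method names (FindPosSketch, insertionPoint) are hypothetical.

```java
// Hypothetical stand-in: a sorted array of names plays the role of the dictionary.
class FindPosSketch {

    /**
     * Returns the index of key if it is present; otherwise -(location) - 1,
     * where location is the index of the first name greater than key, or
     * names.length if every name is less than key.
     */
    static int findPos(String[] names, String key) {
        int lo = 0;
        int hi = names.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = names[mid].compareTo(key);
            if (cmp < 0) {
                lo = mid + 1;
            } else if (cmp > 0) {
                hi = mid - 1;
            } else {
                return mid; // exact match
            }
        }
        // lo is now the position of the first name greater than key.
        return -lo - 1;
    }

    /** Recovers the insertion point from a (possibly negative) result. */
    static int insertionPoint(int pos) {
        return pos >= 0 ? pos : -(pos + 1);
    }

    public static void main(String[] args) {
        String[] names = {"apple", "banana", "cherry"};
        System.out.println(findPos(names, "banana"));    // 1
        System.out.println(findPos(names, "blueberry")); // -3
        System.out.println(findPos(names, "zebra"));     // -4
    }
}
```

Because a miss is always encoded as a negative number, a caller can test for "found" with a single comparison and still recover where the key would have been inserted.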

findPos

protected int findPos(java.lang.Object key,
                      DiskDictionary.LookupState lus,
                      boolean partial)
Determines the position within this block at which an entry falls. If the entry is not found, a number representing the position at which the entry would have been found is returned, as described below.

Parameters:
key - the name of the entry to find
lus - a lookup state variable that contains local copies of the dictionary's buffers
partial - if true, treat key as a stem and return as soon as a partial match (one that begins with the stem) is found
Returns:
the position of the entry, if the entry is found; otherwise, (-(location) -1). location is defined as the location of the first entry in the block "greater than" the given entry. If all entries are "less than" the given entry, the size of this block will be returned. Note that this guarantees that the return value will be >= 0 if and only if the given entry is found in the block.

find

protected QueryEntry find(int posn,
                          DiskDictionary.LookupState lus)
Finds the entry at the given position in this dictionary.

Parameters:
posn - the position of the entry to find
lus - the current state of the lookup
Returns:
The entry at the given position, or null if there is no entry at that position in this block.

newEntry

protected QueryEntry newEntry(java.lang.Object name,
                              int posn,
                              DiskDictionary.LookupState lus,
                              PostingsInput[] postIn)
Creates a new entry and fills in its information.

Parameters:
name - The name of the entry to be filled.
posn - The position of this entry in the dictionary.
lus - A lookup state containing copies of the dictionary's data
postIn - The postings channels to use for reading postings.
Returns:
The filled entry.

getStemMatches

public QueryEntry[] getStemMatches(DiskBiGramDictionary biDict,
                                   java.lang.String term,
                                   boolean caseSensitive,
                                   int minLen,
                                   float matchCutOff,
                                   int maxEntries,
                                   long timeLimit)
Gets a set of all the entries with the given stem.

Parameters:
biDict - The bigram dictionary to use for the lookup.
term - the stem to look for
caseSensitive - If true, then we should only return entries that match the case of the pattern.
minLen - The minimum length that we'll consider for a stem.
matchCutOff - the cutoff score for matching the variants to the original entry
maxEntries - the maximum number of entries to provide; returns all entries if maxEntries is non-positive
timeLimit - The maximum amount of time (in milliseconds) to spend trying to find matches. If zero or negative, no time limit is imposed.
Returns:
an in-order array of the entries beginning with the stem

getMatching

public QueryEntry[] getMatching(DiskBiGramDictionary biDict,
                                java.lang.String pat,
                                boolean caseSensitive,
                                int maxEntries,
                                long timeLimit)
Gets the entries matching the given pattern from the given dictionary. This can be used by anyone with a dictionary and some matching bigrams.

Parameters:
biDict - The bigrams to use to do the candidate entry selection.
pat - The pattern to match entries against.
caseSensitive - If true, then we should only return entries that match the case of the pattern.
maxEntries - The maximum number of entries to return. If zero or negative, return all possible entries.
timeLimit - The maximum amount of time (in milliseconds) to spend trying to find matches. If zero or negative, no time limit is imposed.
Returns:
An array of Entry objects containing the matching entries, or null if there are no such entries, or an array of length zero if the operation timed out before any entries could be matched

getSpellingVariants

public QueryEntry[] getSpellingVariants(DiskBiGramDictionary biDict,
                                        java.lang.String word,
                                        boolean caseSensitive,
                                        int maxEntries,
                                        long timeLimit)
Gets the list of possible spelling corrections, based on terms in the index, for the string that is passed in.

Parameters:
biDict - The bigrams to use to do the candidate entry selection.
word - the word to find alternates for
caseSensitive - If true, then we should only return entries that match the case of the pattern.
maxEntries - The maximum number of entries to return. If zero or negative, return all possible entries.
timeLimit - The maximum amount of time (in milliseconds) to spend trying to find matches. If zero or negative, no time limit is imposed.
Returns:
An array of Entry objects containing the matching entries, or null if there are no such entries, or an array of length zero if the operation timed out before any entries could be matched

getSubstring

public QueryEntry[] getSubstring(DiskBiGramDictionary biDict,
                                 java.lang.String substring,
                                 boolean caseSensitive,
                                 boolean starts,
                                 boolean ends,
                                 int maxEntries,
                                 long timeLimit)
Gets the entries matching the given pattern from the given dictionary. This can be used by anyone with a dictionary and some matching bigrams.

Parameters:
biDict - A dictionary of bigrams built from the entries that are in this dictionary.
substring - The substring to find in the entries.
caseSensitive - If true, then we should look for matches that match the case of the letters in the substring.
starts - If true, the value must start with the given substring.
ends - If true, the value must end with the given substring.
maxEntries - The maximum number of entries to return. If zero or negative, return all possible entries.
timeLimit - The maximum amount of time (in milliseconds) to spend trying to find matches. If zero or negative, no time limit is imposed.
Returns:
An array of Entry objects containing the matching entries, or null if there are no such entries, or an array of length zero if the operation timed out before any entries could be matched
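
To illustrate how bigrams can drive candidate selection for a substring lookup, here is a rough sketch under stated assumptions. It does not use DiskBiGramDictionary: a plain in-memory map from bigram to entry positions stands in for it, and all names (BigramSubstringSketch, buildBigramIndex) are hypothetical. Case handling and the time limit are left out, and the sketch returns an empty list rather than null when nothing matches.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

class BigramSubstringSketch {

    /** Maps each bigram to the sorted positions of the names that contain it. */
    static Map<String, TreeSet<Integer>> buildBigramIndex(String[] names) {
        Map<String, TreeSet<Integer>> index = new HashMap<>();
        for (int i = 0; i < names.length; i++) {
            String n = names[i];
            for (int j = 0; j + 2 <= n.length(); j++) {
                index.computeIfAbsent(n.substring(j, j + 2), k -> new TreeSet<>()).add(i);
            }
        }
        return index;
    }

    static List<String> getSubstring(String[] names,
                                     Map<String, TreeSet<Integer>> index,
                                     String sub, boolean starts, boolean ends,
                                     int maxEntries) {
        // Candidate selection: a name can only contain sub if it contains
        // every bigram of sub, so intersect the position sets.
        TreeSet<Integer> candidates = null;
        for (int j = 0; j + 2 <= sub.length(); j++) {
            TreeSet<Integer> posns = index.get(sub.substring(j, j + 2));
            if (posns == null) {
                return new ArrayList<>(); // some bigram never occurs
            }
            if (candidates == null) {
                candidates = new TreeSet<>(posns);
            } else {
                candidates.retainAll(posns);
            }
        }
        // A substring shorter than two characters has no bigrams, so every
        // name is a candidate.
        if (candidates == null) {
            candidates = new TreeSet<>();
            for (int i = 0; i < names.length; i++) {
                candidates.add(i);
            }
        }
        // Verification: bigram overlap is necessary but not sufficient.
        List<String> result = new ArrayList<>();
        for (int posn : candidates) {
            String n = names[posn];
            boolean ok = n.contains(sub)
                    && (!starts || n.startsWith(sub))
                    && (!ends || n.endsWith(sub));
            if (ok) {
                result.add(n);
                // A non-positive maxEntries means "return everything".
                if (maxEntries > 0 && result.size() >= maxEntries) {
                    break;
                }
            }
        }
        return result;
    }
}
```

The verification pass matters because sharing all bigrams does not guarantee the substring occurs contiguously; the bigram step only narrows the set of names that must be checked.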

size

public int size()
Gets the size of the dictionary.

Specified by:
size in interface Dictionary
Returns:
the number of entries in the dictionary.

newEntry

public QueryEntry newEntry(java.lang.Object name)
Gets an instance of the kind of entries stored in this dictionary.

Parameters:
name - The name of the entry.
Returns:
A newly instantiated entry, or null if there is some exception thrown while instantiating the entry. Any such exceptions will be logged as errors.

getPartition

public Partition getPartition()
Gets the partition to which this dictionary belongs.

Specified by:
getPartition in interface Dictionary
Returns:
the partition

setPartition

public void setPartition(Partition p)
Sets the partition with which this dictionary is associated.

Parameters:
p - the partition with which the dictionary is associated

getBufferedInputs

protected PostingsInput[] getBufferedInputs()
Gets a buffered version of the postings inputs for this dictionary so that we can stream a bit better through the postings when doing, for example, dictionary merges.

Returns:
a buffered set of postings inputs for this dictionary

getBufferedInputs

protected PostingsInput[] getBufferedInputs(int buffSize)
Gets a buffered version of the postings inputs for this dictionary so that we can stream a bit better through the postings when doing, for example, dictionary merges.

Parameters:
buffSize - the size of the buffer to use
Returns:
a buffered set of postings inputs for this dictionary

iterator

public DictionaryIterator iterator()
Gets an iterator for this dictionary. Just gets an iterator for the block.

Specified by:
iterator in interface Dictionary
Specified by:
iterator in interface java.lang.Iterable<QueryEntry>
Returns:
an iterator for the dictionary. The elements of the iterator implement the Map.Entry interface

literator

public LightIterator literator()

iterator

public DictionaryIterator iterator(java.lang.Object startEntry,
                                   boolean includeStart)
Creates an iterator that starts iterating at the specified entry, or, if the entry does not exist in the block, starts iterating at the first entry greater than the provided entry.

Parameters:
startEntry - the name of the entry to start iterating at
includeStart - If true, then the iterator will return startEntry, if it is in the dictionary.
Returns:
the iterator

iterator

public DictionaryIterator iterator(java.lang.Object startEntry,
                                   boolean includeStart,
                                   java.lang.Object stopEntry,
                                   boolean includeStop)
Creates an iterator that starts iterating at the specified startEntry and stops iterating at the specified stopEntry. If either entry does not exist in the block, the first entry greater than the entry provided will be used.

Parameters:
startEntry - the name of the entry to start iterating at, or null to start at the first entry
includeStart - If true, then the iterator will return startEntry, if it is in the dictionary.
stopEntry - the name of the entry to stop iterating at, or null to stop after the last entry
includeStop - if true and stopEntry is non-null, then the iterator will return stopEntry, if it is in the dictionary.
Returns:
the iterator
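
The start/stop and include flags can be read as a range filter over the sorted entry names. The sketch below shows one plausible reading, not the actual iterator: a sorted list of names stands in for the dictionary, RangeIteratorSketch is a hypothetical name, and it assumes that iteration ends before the first name greater than stopEntry. A real implementation would locate the bounds by binary search rather than a linear scan.

```java
import java.util.ArrayList;
import java.util.List;

class RangeIteratorSketch {

    /**
     * Collects the names in sortedNames that fall between start and stop.
     * A null start means "from the first name"; a null stop means "through
     * the last name".
     */
    static List<String> range(List<String> sortedNames,
                              String start, boolean includeStart,
                              String stop, boolean includeStop) {
        List<String> out = new ArrayList<>();
        for (String n : sortedNames) {
            if (start != null) {
                int c = n.compareTo(start);
                // Skip names before start, and start itself when excluded.
                if (c < 0 || (c == 0 && !includeStart)) {
                    continue;
                }
            }
            if (stop != null) {
                int c = n.compareTo(stop);
                // Stop at names after stop, and at stop itself when excluded.
                if (c > 0 || (c == 0 && !includeStop)) {
                    break;
                }
            }
            out.add(n);
        }
        return out;
    }
}
```

With names apple, banana, cherry, date, a call with start "b" and stop "d" (neither present) yields banana and cherry under this reading.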

iterator

public DictionaryIterator iterator(int begin,
                                   int end)
Gets an iterator for the dictionary that starts and stops at the given indices in the dictionary. This can be used when we want to process pieces of a dictionary in different threads.

Parameters:
begin - the beginning index in the dictionary, counting from 0. If this value is less than zero it will be clamped to zero.
end - the ending index in the dictionary, counting from 0. If this value is greater than the number of entries in the dictionary, it will be limited to that number.
Returns:
an iterator for the dictionary. The elements of the iterator implement the Map.Entry interface
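
For the multi-threaded use mentioned above, the index range has to be divided among workers first. The following sketch computes such a division; it is only an illustration. ChunkSketch and its methods are hypothetical, and it assumes half-open [begin, end) ranges, which the documentation above does not state.

```java
class ChunkSketch {

    /** Clamps index into [0, size], mirroring the clamping described for begin and end. */
    static int clamp(int index, int size) {
        return Math.max(0, Math.min(index, size));
    }

    /**
     * Splits the positions 0..size-1 into nChunks contiguous ranges.
     * ranges[i][0] is the begin index and ranges[i][1] the end index
     * (assumed exclusive) for worker i. Trailing ranges may be empty.
     */
    static int[][] chunks(int size, int nChunks) {
        int[][] ranges = new int[nChunks][2];
        int per = (size + nChunks - 1) / nChunks; // ceiling division
        for (int i = 0; i < nChunks; i++) {
            ranges[i][0] = clamp(i * per, size);
            ranges[i][1] = clamp((i + 1) * per, size);
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Each worker would then call something like
        // dict.iterator(r[0], r[1]) on its own range.
        for (int[] r : chunks(10, 3)) {
            System.out.println(r[0] + " to " + r[1]);
        }
    }
}
```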

merge

public int[][] merge(IndexEntry entryFactory,
                     NameEncoder encoder,
                     DiskDictionary[] dicts,
                     EntryMapper[] mappers,
                     int[] starts,
                     int[][] postIDMaps,
                     java.io.RandomAccessFile mDictFile,
                     PostingsOutput[] postOut,
                     boolean appendPostings)
              throws java.io.IOException
Merges a number of dictionaries into a single dictionary. This method will be responsible for merging the postings lists associated with the entries and dumping the new dictionary and postings to the given channels.

Parameters:
entryFactory - An index entry that we can use to generate entries for the merged dictionary.
encoder - An encoder for the names in this dictionary.
dicts - The dictionaries to merge.
mappers - A set of entry mappers that will be applied to the dictionaries as entries are considered for the merge. If this parameter is null, then the entries in the merged dictionary will be renumbered in order of increasing name.
starts - The starting IDs for the new partition.
postIDMaps - Maps from old to new IDs for the IDs in our postings.
mDictFile - The file where the merged dictionary will be written.
postOut - The output where the postings for the merged dictionary will be written
appendPostings - true if postings should be appended rather than merged
Returns:
A set of maps from the old entry IDs to the entry IDs in the merged dictionary. The element [0][0] of this matrix contains the number of entries in the merged dictionary.
Throws:
java.io.IOException - when there is an error during the merge.
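
As a rough illustration of the name-ordered renumbering and the returned ID maps, here is a heap-based k-way merge sketch. It is not the actual merge: sorted String arrays stand in for the input dictionaries, postings are ignored entirely, and MergeSketch and HeapEntry are hypothetical names (HeapEntry loosely mirrors the role described for DiskDictionary.HE). It also assumes that entry IDs are 1-based and dense, so that slot 0 of each map is free and [0][0] can carry the merged entry count.

```java
import java.util.List;
import java.util.PriorityQueue;

class MergeSketch {

    /** A heap entry: the current name drawn from one input dictionary. */
    static final class HeapEntry implements Comparable<HeapEntry> {
        final String name;
        final int dict; // which input dictionary the name came from
        final int posn; // 0-based position within that dictionary

        HeapEntry(String name, int dict, int posn) {
            this.name = name;
            this.dict = dict;
            this.posn = posn;
        }

        @Override
        public int compareTo(HeapEntry o) {
            return name.compareTo(o.name);
        }
    }

    /**
     * Merges sorted name arrays, appending the merged names to merged.
     * Returns idMaps where idMaps[d][oldId] is the new ID of the entry
     * with (assumed 1-based) ID oldId in dictionary d, and idMaps[0][0]
     * is the number of entries in the merged dictionary.
     */
    static int[][] merge(String[][] dicts, List<String> merged) {
        int[][] idMaps = new int[dicts.length][];
        PriorityQueue<HeapEntry> heap = new PriorityQueue<>();
        for (int d = 0; d < dicts.length; d++) {
            idMaps[d] = new int[dicts[d].length + 1]; // slot 0 is not an ID
            if (dicts[d].length > 0) {
                heap.add(new HeapEntry(dicts[d][0], d, 0));
            }
        }
        int newId = 0;
        String last = null;
        while (!heap.isEmpty()) {
            HeapEntry top = heap.poll();
            // Equal names from different dictionaries share one merged entry.
            if (!top.name.equals(last)) {
                newId++;
                last = top.name;
                merged.add(last);
            }
            idMaps[top.dict][top.posn + 1] = newId;
            int next = top.posn + 1;
            if (next < dicts[top.dict].length) {
                heap.add(new HeapEntry(dicts[top.dict][next], top.dict, next));
            }
        }
        if (dicts.length > 0) {
            idMaps[0][0] = newId;
        }
        return idMaps;
    }
}
```

Under the 1-based, dense ID assumption, the number of merged entries and the maximum merged entry ID are the same value.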

merge

public int[][] merge(IndexEntry entryFactory,
                     NameEncoder encoder,
                     PartitionStats partStats,
                     DiskDictionary[] dicts,
                     EntryMapper[] mappers,
                     int[] starts,
                     int[][] postIDMaps,
                     java.io.RandomAccessFile mDictFile,
                     PostingsOutput[] postOut,
                     boolean appendPostings)
              throws java.io.IOException
Merges a number of dictionaries into a single dictionary. This method will be responsible for merging the postings lists associated with the entries and dumping the new dictionary and postings to the given channels.

Parameters:
entryFactory - An index entry that we can use to generate entries for the merged dictionary.
encoder - An encoder for the names in this dictionary.
partStats - a set of partition statistics to which we'll add during the merge.
dicts - The dictionaries to merge.
mappers - A set of entry mappers that will be applied to the dictionaries as entries are considered for the merge. If this parameter is null, then the entries in the merged dictionary will be renumbered in order of increasing name.
starts - The starting IDs for the new partition.
postIDMaps - Maps from old to new IDs for the IDs in our postings.
mDictFile - The file where the merged dictionary will be written.
postOut - The output where the postings for the merged dictionary will be written
appendPostings - true if postings should be appended rather than merged
Returns:
A set of maps from the old entry IDs to the entry IDs in the merged dictionary. The element [0][0] of this matrix contains the max entry id in the new dict.
Throws:
java.io.IOException - when there is an error during the merge.

remapPostings

public void remapPostings(IndexEntry entryFactory,
                          NameEncoder encoder,
                          PartitionStats partStats,
                          int[] postMap,
                          java.io.RandomAccessFile dictFile,
                          PostingsOutput[] postOut)
                   throws java.io.IOException
Rewrites this dictionary to the files passed in while remapping IDs in the postings to the new IDs passed in.

Parameters:
entryFactory - factory to create new entries
encoder - name encoder to write new entries
partStats - the partition stats for this partition
postMap - a mapping from old postings ids to new ids
dictFile - a file to write the new dictionary in
postOut - a set of files to write the new postings in
Throws:
java.io.IOException - if there is any error reading or writing dictionaries

customSetup

protected void customSetup(IndexEntry me,
                           QueryEntry e,
                           int start,
                           int[] postIDMap)
Do any custom setup required for merging one entry onto another. This is intended for use by subclasses so that they don't need to override the entire merge method, which is a stone PITA to get right. At this level, this method does nothing.

Parameters:
me - the entry onto which we're merging postings.
e - the entry which we are about to merge into the merged entry.
start - the new starting ID for documents for the partition from which the entry we're going to merge are drawn.
postIDMap - a map from old to new IDs for the postings that we're about to merge.