com.sun.labs.minion.indexer.partition
Class MemoryPartition

java.lang.Object
  extended by com.sun.labs.minion.indexer.partition.Partition
      extended by com.sun.labs.minion.indexer.partition.MemoryPartition
All Implemented Interfaces:
com.sun.labs.util.props.Component, com.sun.labs.util.props.Configurable, java.lang.Comparable<Partition>
Direct Known Subclasses:
ClassifierMemoryPartition, ClusterMemoryPartition, InvFileMemoryPartition

public abstract class MemoryPartition
extends Partition

A class for holding a partition in memory while it is under construction. A partition consists of four files:

  1. The dictionary, which contains a mapping from terms to various offset information.
  2. The postings data for the files in the partition.
  3. The "document dictionary", which contains information about the documents indexed in this partition (e.g., the title).
  4. The taxonomy for the terms indexed in the partition.

Such a partition cannot be used for searching.

See Also:
DiskPartition, MemoryDictionary, MemoryFieldStore

Field Summary
protected  DocOccurrence ddo
          An occurrence that we can use to add data to postings for the document dictionary entries.
protected  DelMap del
          A deletion map, used when the same document is seen more than once in the same partition.
protected  java.util.List<java.lang.Integer> deleted
           
protected  MemoryDictionary docDict
          The document dictionary.
protected  DocKeyEntry dockey
          The key for the document that we're currently processing, from the document dictionary.
protected static java.lang.String logTag
          The tag for this module.
protected  MemoryDictionary mainDict
          The main dictionary.
protected  java.lang.String name
           
protected  int nWords
          The number of words in the current document.
protected  long postBytes
          The number of bytes of postings data we've encoded so far.
 
Fields inherited from class com.sun.labs.minion.indexer.partition.Partition
DICT_OFFSETS_SIZE, docDictFactory, entryClass, entryName, indexConfig, mainDictFactory, mainDictFile, mainPostFiles, manager, maxID, nEntries, partNumber, PROP_DOC_DICT_FACTORY, PROP_INDEX_CONFIG, PROP_MAIN_DICT_FACTORY, PROP_PARTITION_MANAGER, stats
 
Constructor Summary
MemoryPartition()
           
 
Method Summary
protected  int dump()
          Dumps the current partition.
 void dump(IndexConfig iC)
          Tells a stage that its data must be dumped to the index.
protected  void dumpCustom(Entry[] sorted)
          Performs any custom data dump required in a subclass.
 DocKeyEntry getDocumentTerm(java.lang.String key)
          Gets an entry from the in-memory document dictionary.
 void newProperties(com.sun.labs.util.props.PropertySheet ps)
           
 void shutdown(IndexConfig iC)
          Shuts down the indexing stage, dumping any collected data and reporting final progress.
 
Methods inherited from class com.sun.labs.minion.indexer.partition.Partition
compareTo, getAllFiles, getAllFiles, getDocFiles, getDocFiles, getIndexConfig, getMainFiles, getMainFiles, getManager, getName, getNDocs, getNumPostingsChannels, getPartitionNumber, getQueryConfig, getStats
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

mainDict

protected MemoryDictionary mainDict
The main dictionary.


docDict

protected MemoryDictionary docDict
The document dictionary.


dockey

protected DocKeyEntry dockey
The key for the document that we're currently processing, from the document dictionary.


ddo

protected DocOccurrence ddo
An occurrence that we can use to add data to postings for the document dictionary entries.


del

protected DelMap del
A deletion map, used when the same document is seen more than once in the same partition.
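
To illustrate the kind of bookkeeping such a map supports, here is a minimal, self-contained sketch. It does not use the Minion classes: `DuplicateDocSketch`, `startDocument`, and `isDeleted` are hypothetical names, and the policy shown (the earlier copy of a re-indexed document is marked deleted, the newer copy stays live) is an assumption, not a description of what `DelMap` actually does.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a partition that tracks duplicate documents. */
public class DuplicateDocSketch {
    // Maps a document key to the most recent ID assigned to it.
    private final Map<String, Integer> keyToId = new HashMap<>();
    // One bit per document ID; a set bit means "deleted" (assumed semantics).
    private final BitSet deletedIds = new BitSet();
    private int maxId = 0;

    /** Assigns a fresh ID to the key and marks any earlier ID for it as deleted. */
    public int startDocument(String key) {
        int id = ++maxId;
        Integer previous = keyToId.put(key, id);
        if (previous != null) {
            deletedIds.set(previous);
        }
        return id;
    }

    public boolean isDeleted(int id) {
        return deletedIds.get(id);
    }

    public static void main(String[] args) {
        DuplicateDocSketch partition = new DuplicateDocSketch();
        int first = partition.startDocument("doc-a");
        int second = partition.startDocument("doc-a"); // same key, same partition
        System.out.println(partition.isDeleted(first));  // prints "true"
        System.out.println(partition.isDeleted(second)); // prints "false"
    }
}
```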


postBytes

protected long postBytes
The number of bytes of postings data we've encoded so far.


nWords

protected int nWords
The number of words in the current document.


logTag

protected static java.lang.String logTag
The tag for this module.


name

protected java.lang.String name


deleted

protected java.util.List<java.lang.Integer> deleted


Constructor Detail

MemoryPartition

public MemoryPartition()


Method Detail

dump

protected int dump()
            throws java.io.IOException
Dumps the current partition.

Returns:
The partition number for the dumped partition.
Throws:
java.io.IOException - if there is any error writing the partition data to disk

getDocumentTerm

public DocKeyEntry getDocumentTerm(java.lang.String key)
Gets an entry from the in-memory document dictionary. This can be used to get a document vector for a document that has not been committed to disk.

Parameters:
key - the key of the document that we want the entry for
Returns:
the entry for the given key, or null if this key doesn't occur in this partition.
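
Because the method returns null for an unknown key, callers need to guard the result. The sketch below shows one way to do that by wrapping the lookup in an `Optional`. It is self-contained and uses stand-ins rather than the Minion classes: `StubEntry`, `StubPartition`, and `find` are hypothetical, and only the "null when the key is absent" contract is taken from the description above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class DocumentTermLookupSketch {

    /** Hypothetical stand-in for DocKeyEntry. */
    static final class StubEntry {
        final String key;
        StubEntry(String key) { this.key = key; }
    }

    /** Hypothetical stand-in for a partition with an in-memory document dictionary. */
    static final class StubPartition {
        private final Map<String, StubEntry> docDict = new HashMap<>();

        void add(String key) { docDict.put(key, new StubEntry(key)); }

        /** Mirrors the documented contract: returns null if the key is absent. */
        StubEntry getDocumentTerm(String key) { return docDict.get(key); }
    }

    /** Wraps the nullable lookup so callers cannot forget the missing-key case. */
    static Optional<StubEntry> find(StubPartition partition, String key) {
        return Optional.ofNullable(partition.getDocumentTerm(key));
    }

    public static void main(String[] args) {
        StubPartition partition = new StubPartition();
        partition.add("doc-1");
        System.out.println(find(partition, "doc-1").isPresent());   // prints "true"
        System.out.println(find(partition, "missing").isPresent()); // prints "false"
    }
}
```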

dumpCustom

protected void dumpCustom(Entry[] sorted)
                   throws java.io.IOException
Performs any custom data dump required in a subclass. This method exists to be overridden in a subclass and provides no functionality at this level.

Parameters:
sorted - the sorted array of main dictionary entries, which might be useful in a subclass
Throws:
java.io.IOException - if there is any error writing the data to disk
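
The override pattern described here can be sketched as follows. The example is self-contained and does not use the Minion classes: `StubMemoryPartition`, `RecordingPartition`, and `run` are hypothetical, `String[]` stands in for `Entry[]`, and the assumption that `dump()` sorts the entries and then calls the hook is an illustration only, not a statement about the real implementation.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DumpCustomHookSketch {

    /** Hypothetical stand-in for MemoryPartition. */
    abstract static class StubMemoryPartition {
        protected final List<String> terms = new ArrayList<>();
        private int nextPartNumber = 1;

        void add(String term) { terms.add(term); }

        /** Assumed flow: sort the entries, call the hook, return a partition number. */
        protected int dump() throws IOException {
            String[] sorted = terms.toArray(new String[0]);
            Arrays.sort(sorted);
            dumpCustom(sorted);
            terms.clear();
            return nextPartNumber++;
        }

        /** No-op at this level; subclasses override it to write extra data. */
        protected void dumpCustom(String[] sorted) throws IOException {
        }
    }

    /** Subclass that records the sorted entries it is handed. */
    static final class RecordingPartition extends StubMemoryPartition {
        final List<String> written = new ArrayList<>();

        @Override
        protected void dumpCustom(String[] sorted) throws IOException {
            written.addAll(Arrays.asList(sorted));
        }
    }

    /** Convenience wrapper: indexes the terms, dumps, and returns what the hook saw. */
    static List<String> run(String... terms) {
        RecordingPartition partition = new RecordingPartition();
        for (String term : terms) {
            partition.add(term);
        }
        try {
            partition.dump();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return partition.written;
    }

    public static void main(String[] args) {
        System.out.println(run("pear", "apple", "fig")); // prints "[apple, fig, pear]"
    }
}
```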

dump

public void dump(IndexConfig iC)
Tells a stage that its data must be dumped to the index.

Parameters:
iC - The configuration for the index, which can be used to retrieve things like the index directory.

shutdown

public void shutdown(IndexConfig iC)
Shuts down the indexing stage, dumping any collected data and reporting final progress.

Parameters:
iC - The configuration for the index, which can be used to retrieve things like the index directory.

newProperties

public void newProperties(com.sun.labs.util.props.PropertySheet ps)
                   throws com.sun.labs.util.props.PropertyException
Specified by:
newProperties in interface com.sun.labs.util.props.Configurable
Overrides:
newProperties in class Partition
Throws:
com.sun.labs.util.props.PropertyException