com.sun.labs.minion.indexer.dictionary
Class MemoryFieldStore

java.lang.Object
  extended by com.sun.labs.minion.indexer.dictionary.FieldStore
      extended by com.sun.labs.minion.indexer.dictionary.MemoryFieldStore

public class MemoryFieldStore
extends FieldStore

A field store to be used during indexing.
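During indexing, the store receives a fixed sequence of calls for each document. The sketch below shows that sequence. It is self-contained and does not use the real Minion types. `FieldEvents` is a hypothetical interface that mirrors the lifecycle method names of this class, with a plain field name standing in for `FieldInfo`. `RecordingFieldEvents` is also hypothetical; it only logs the calls it receives.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the lifecycle methods of MemoryFieldStore;
// a field name replaces the real FieldInfo argument.
interface FieldEvents {
    void startDocument(int docID);
    int startField(String fieldName);
    void saveData(int docID, Object data);
    void endField();
    void endDocument();
}

// Records each call so that the order can be inspected afterwards.
class RecordingFieldEvents implements FieldEvents {
    final List<String> log = new ArrayList<>();
    private int nextID = 1;

    public void startDocument(int docID) { log.add("startDocument " + docID); }

    public int startField(String fieldName) {
        log.add("startField " + fieldName);
        return nextID++; // hypothetical sequential field ID
    }

    public void saveData(int docID, Object data) { log.add("saveData " + data); }

    public void endField() { log.add("endField"); }

    public void endDocument() { log.add("endDocument"); }
}

class LifecycleDemo {
    // Feeds one document containing a single "title" field through the store.
    static List<String> run() {
        RecordingFieldEvents store = new RecordingFieldEvents();
        int docID = 7;
        store.startDocument(docID);
        store.startField("title");
        store.saveData(docID, "Field stores");
        store.endField();
        store.endDocument();
        return store.log;
    }
}
```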


Field Summary
protected  int[] activeFields
          A set of active fields.
protected  int currDoc
          The ID of the current document that we're processing.
protected  Stack fieldStack
          A stack of the fields that we're processing.
protected  boolean inDocument
          Whether we're in a document.
protected static java.lang.String logTag
          The tag for this module.
protected  int nActive
          The count of currently active fields.
protected  boolean shouldIndex
          A boolean indicating whether words should be indexed or not.
protected  boolean shouldVector
          A boolean indicating whether words should be added to the document vector or not.
 
Fields inherited from class com.sun.labs.minion.indexer.dictionary.FieldStore
header, metaFile, savedFields
 
Constructor Summary
MemoryFieldStore(MetaFile f)
          Constructs the field store for use.
 
Method Summary
 void clear()
          Clears the saved fields for the next indexing run.
 FieldInfo defineField(FieldInfo fi)
          Defines a field, given a field information object.
 void dump(java.lang.String path, java.io.RandomAccessFile dictFile, PostingsOutput[] postOut)
          Dumps the field store to disk.
 void endDocument()
          Ends the document.
 void endField()
          Tells the field store that a field has ended.
 int[] getActiveFields()
          Gets the active fields list.
 SavedField getSavedField(FieldInfo fi)
           
protected  SavedField makeSavedField(FieldInfo fi)
          Creates a saved field entry based on the type of the field.
 void saveData(FieldInfo cfi, FieldInfo sfi, ClassificationResult r)
           
 void saveData(FieldInfo fi, int docID, java.lang.Object data)
           
 void saveData(int docID, java.lang.Object data)
          Saves the given data in the current field.
 boolean shouldIndex()
          Indicates whether words should be indexed.
 boolean shouldVector()
          Indicates whether a field should contribute tokens to the document vector.
 void startDocument(int docID)
          Tells the field store that a new document has been started.
 int startField(FieldInfo f)
          Tells the field store that a particular field has started.
 
Methods inherited from class com.sun.labs.minion.indexer.dictionary.FieldStore
getFieldArray, getFieldID, getFieldInfo, getFieldInfo, getFieldName, getFieldType, getMultArray, getMultArray, getNFields, getVectoredFields, isSavedField
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

nActive

protected int nActive
The count of currently active fields.


activeFields

protected int[] activeFields
A set of active fields.


fieldStack

protected Stack fieldStack
A stack of the fields that we're processing.


currDoc

protected int currDoc
The ID of the current document that we're processing.


inDocument

protected boolean inDocument
Whether we're in a document. Used to catch the case where a new document is started before the previous one has ended.
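The sketch below shows how an `inDocument` flag can detect a `startDocument` call that arrives before the previous document has ended. The documentation does not say how the store handles this case. The sketch assumes one possible policy: record a warning, then end the old document before starting the new one. `DocumentTracker` and its `warnings` list are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the inDocument check; not Minion's code.
class DocumentTracker {
    protected int currDoc = -1;
    protected boolean inDocument = false;
    final List<String> warnings = new ArrayList<>();

    void startDocument(int docID) {
        if (inDocument) {
            // The previous document was never ended: note it and end it here.
            // This is an assumed policy; a real store might throw instead.
            warnings.add("document " + currDoc + " not ended before " + docID);
            endDocument();
        }
        currDoc = docID;
        inDocument = true;
    }

    void endDocument() {
        inDocument = false;
    }
}
```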


shouldIndex

protected boolean shouldIndex
A boolean indicating whether words should be indexed or not.


shouldVector

protected boolean shouldVector
A boolean indicating whether words should be added to the document vector or not.


logTag

protected static java.lang.String logTag
The tag for this module.

Constructor Detail

MemoryFieldStore

public MemoryFieldStore(MetaFile f)
Constructs the field store for use.

Method Detail

defineField

public FieldInfo defineField(FieldInfo fi)
Defines a field, given a field information object.

Parameters:
fi - the information for the field that we want to define
Returns:
the field information for the field that we want to define
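Defining a field amounts to registering it and returning the information the store will use for it. The sketch below rests on three assumptions that the documentation does not state. Fields are identified by name. IDs are assigned sequentially starting at 1. Defining a name a second time returns the original definition. `SimpleFieldInfo` and `FieldRegistry` are hypothetical stand-ins, not Minion's `FieldInfo`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for FieldInfo: just a name and an ID.
final class SimpleFieldInfo {
    final String name;
    final int id;

    SimpleFieldInfo(String name, int id) {
        this.name = name;
        this.id = id;
    }
}

// Hypothetical registry illustrating defineField; not Minion's code.
class FieldRegistry {
    private final Map<String, SimpleFieldInfo> byName = new HashMap<>();

    // Returns the existing definition for the name, or creates one with the
    // next sequential ID (starting at 1) if the name has not been seen.
    SimpleFieldInfo defineField(String name) {
        SimpleFieldInfo existing = byName.get(name);
        if (existing != null) {
            return existing;
        }
        SimpleFieldInfo fi = new SimpleFieldInfo(name, byName.size() + 1);
        byName.put(name, fi);
        return fi;
    }
}
```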

getActiveFields

public int[] getActiveFields()
Gets the active fields list.


startDocument

public void startDocument(int docID)
Tells the field store that a new document has been started. This will flush any unsaved data.


startField

public int startField(FieldInfo f)
Tells the field store that a particular field has started.

Parameters:
f - The FieldInfo object for the field that is starting.
Returns:
the fieldID for this field.

saveData

public void saveData(int docID,
                     java.lang.Object data)
Saves the given data in the current field.
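`saveData(int, Object)` takes no field argument, so the data must be attached to whichever field is currently open. The sketch below makes two assumptions. First, the current field is the innermost open one, which is the top of a field stack. Second, saved values are grouped by field and then by document ID. Data saved while no field is open is silently ignored; this policy is also an assumption. `SavingStore` is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of saving data into the current field; not Minion's code.
class SavingStore {
    private final Deque<String> fieldStack = new ArrayDeque<>();
    // field name -> document ID -> values saved for that document
    final Map<String, Map<Integer, List<Object>>> saved = new HashMap<>();

    void startField(String name) { fieldStack.push(name); }

    void endField() { fieldStack.pop(); }

    // Saves data in the innermost open field; ignored when no field is open.
    void saveData(int docID, Object data) {
        String current = fieldStack.peek();
        if (current == null) {
            return;
        }
        saved.computeIfAbsent(current, k -> new HashMap<>())
             .computeIfAbsent(docID, k -> new ArrayList<>())
             .add(data);
    }
}
```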


saveData

public void saveData(FieldInfo cfi,
                     FieldInfo sfi,
                     ClassificationResult r)

saveData

public void saveData(FieldInfo fi,
                     int docID,
                     java.lang.Object data)

getSavedField

public SavedField getSavedField(FieldInfo fi)

makeSavedField

protected SavedField makeSavedField(FieldInfo fi)
Creates a saved field entry based on the type of the field.
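Choosing a saved-field implementation from the field's type is a simple factory. The sketch below illustrates that pattern. The `FieldType` enum, its three cases, and the `Saved*` classes are all hypothetical. Minion's actual field types and saved-field classes may differ.

```java
// Hypothetical field types and saved-field classes; not Minion's code.
enum FieldType { STRING, INTEGER, DATE }

interface Saved {
    String kind();
}

class SavedString implements Saved { public String kind() { return "string"; } }
class SavedLong implements Saved { public String kind() { return "long"; } }
class SavedDate implements Saved { public String kind() { return "date"; } }

class SavedFieldFactory {
    // Picks the saved-field implementation that matches the field's type.
    static Saved makeSavedField(FieldType type) {
        switch (type) {
            case STRING:
                return new SavedString();
            case INTEGER:
                return new SavedLong();
            case DATE:
                return new SavedDate();
            default:
                throw new IllegalArgumentException("unknown type " + type);
        }
    }
}
```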


endField

public void endField()
Tells the field store that a field has ended.
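`startField` and `endField` bracket a field. The presence of `fieldStack` suggests that fields may nest, and the store also keeps `activeFields` and `nActive`. The sketch below shows one way to keep these three members in step. It assumes that `activeFields` holds the IDs of the open fields in the order they were opened, with `nActive` as the count of valid entries. It also assumes that `getActiveFields` returns a trimmed copy. It uses `java.util.ArrayDeque` instead of the `Stack` type named above, and the caller passes a field ID directly rather than a `FieldInfo`. `FieldTracker` is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Hypothetical sketch of field-stack bookkeeping; not Minion's code.
class FieldTracker {
    protected final Deque<Integer> fieldStack = new ArrayDeque<>();
    protected int[] activeFields = new int[4];
    protected int nActive = 0;

    // Opens a field and returns its ID (supplied by the caller in this sketch).
    int startField(int fieldID) {
        fieldStack.push(fieldID);
        if (nActive == activeFields.length) {
            // Grow the array when more fields are open than it can hold.
            activeFields = Arrays.copyOf(activeFields, nActive * 2);
        }
        activeFields[nActive++] = fieldID;
        return fieldID;
    }

    // Closes the most recently opened field; its ID is the last active entry.
    void endField() {
        if (fieldStack.isEmpty()) {
            throw new IllegalStateException("endField with no open field");
        }
        fieldStack.pop();
        nActive--;
    }

    // Returns a copy holding only the IDs of the open fields.
    int[] getActiveFields() {
        return Arrays.copyOf(activeFields, nActive);
    }
}
```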


endDocument

public void endDocument()
Ends the document. Flushes the field stack completely, so any fields that are still open are treated as having ended at the end of the document. A warning is logged for each field that was still open.
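The flush performed at the end of a document can be sketched as popping every field that is still open and logging one warning per field. The sketch below makes two substitutions. It tracks fields by name on a stack, and it uses `java.util.logging` in place of whatever logger Minion pairs with `logTag`. `DocumentEnder` and its `closedAtEnd` list are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical sketch of the end-of-document stack flush; not Minion's code.
class DocumentEnder {
    private static final Logger LOG = Logger.getLogger("DocumentEnder");
    final Deque<String> fieldStack = new ArrayDeque<>();
    final List<String> closedAtEnd = new ArrayList<>();
    boolean inDocument = true;

    void endDocument() {
        // Every field still open is treated as ending here, innermost first.
        while (!fieldStack.isEmpty()) {
            String open = fieldStack.pop();
            LOG.warning("field " + open + " still open at end of document");
            closedAtEnd.add(open);
        }
        inDocument = false;
    }
}
```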


dump

public void dump(java.lang.String path,
                 java.io.RandomAccessFile dictFile,
                 PostingsOutput[] postOut)
          throws java.io.IOException
Dumps the field store to disk. Most of the work consists of dumping the saved fields.

Parameters:
path - The path to the directory where the field store will be written.
dictFile - The file to which the field dictionaries will be written.
postOut - The outputs to which the field postings will be written.
Throws:
java.io.IOException - if there is an error during writing.

clear

public void clear()
Clears the saved fields for the next indexing run.


shouldIndex

public boolean shouldIndex()
Indicates whether words should be indexed.


shouldVector

public boolean shouldVector()
Indicates whether a field should contribute tokens to the document vector.
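The `shouldIndex` and `shouldVector` flags determine how tokens arriving in the currently open fields are treated. The sketch below shows one way to derive them, based on assumptions that this documentation does not state. Each field carries an "indexed" attribute and a "vectored" attribute. A token is indexed or vectored if any open field asks for it. The flags are recomputed whenever a field opens or closes. `FieldAttrs` and `TokenPolicy` are hypothetical. An alternative rule, equally plausible, would consult only the innermost field.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical per-field attributes; not Minion's FieldInfo.
final class FieldAttrs {
    final boolean indexed;
    final boolean vectored;

    FieldAttrs(boolean indexed, boolean vectored) {
        this.indexed = indexed;
        this.vectored = vectored;
    }
}

// Hypothetical sketch: the flags are recomputed whenever a field opens or closes.
class TokenPolicy {
    private final Deque<FieldAttrs> open = new ArrayDeque<>();
    protected boolean shouldIndex;
    protected boolean shouldVector;

    void startField(FieldAttrs f) { open.push(f); recompute(); }

    void endField() { open.pop(); recompute(); }

    // Assumed rule: a token is indexed (or vectored) if any open field asks for it.
    private void recompute() {
        shouldIndex = false;
        shouldVector = false;
        for (FieldAttrs f : open) {
            shouldIndex |= f.indexed;
            shouldVector |= f.vectored;
        }
    }

    boolean shouldIndex() { return shouldIndex; }

    boolean shouldVector() { return shouldVector; }
}
```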