com.sun.labs.minion.indexer.dictionary
Class BasicField

java.lang.Object
  extended by com.sun.labs.minion.indexer.dictionary.BasicField
All Implemented Interfaces:
SavedField, java.lang.Comparable

public class BasicField
extends java.lang.Object
implements SavedField

A class to hold the data for a saved field during indexing.

See Also:
FieldInfo, MemoryFieldStore

Nested Class Summary
 class BasicField.Fetcher
          A class that can be used when you want to get a lot of field values for a particular field, for example, when sorting or clustering results by a particular field.
 
Field Summary
protected  DiskBiGramDictionary bigrams
          A bigram dictionary that we can use for character fields.
protected  CDateParser dp
          A date parser for date fields.
protected  ReadableBuffer dtvData
          A buffer containing the actual dtv data at query time.
protected  ReadableBuffer dtvOffsets
          A buffer containing the dtv offsets at query time.
protected  java.util.List[] dv
          An array of the sets of entries stored per document at indexing time.
protected  int dvPos
          The current position in the dtv array, that is, where the next document ID will be added.
protected  FieldInfo field
          The field info object for this field.
protected  com.sun.labs.minion.indexer.dictionary.SavedFieldHeader header
          The header for this field.
protected static java.lang.String logTag
          The log tag.
protected  int nBytes
          The number of bytes we're using to store data.
protected  Dictionary values
          A dictionary to use for the saved field data.
 
Constructor Summary
protected BasicField()
          Default constructor for subclasses.
  BasicField(FieldInfo field)
          Constructs a saved field that will be used to store data during indexing.
  BasicField(FieldInfo field, java.io.RandomAccessFile dictFile, java.io.RandomAccessFile[] postFiles, DictionaryFactory fieldStoreDictFactory, DictionaryFactory bigramDictFactory, DiskPartition part)
          Constructs a saved field that will be used to retrieve data during querying.
 
Method Summary
 void add(int docID, java.lang.Object data)
          Adds data to a saved field.
 void clear()
          Clears a saved field, if it's open for indexing.
 int compareTo(java.lang.Object o)
          Compares saved fields according to the field ID.
 void dump(java.lang.String path, java.io.RandomAccessFile dictFile, PostingsOutput[] postOut, int maxID)
          Writes the field data to the provided dictionary file and postings outputs.
 QueryEntry get(java.lang.Object v, boolean caseSensitive)
          Gets a particular value from the field.
static java.lang.Class getEntryClass(FieldInfo field)
          Gets an entry class appropriate to the type of the given field.
protected  java.lang.Object getEntryName(java.lang.Object val)
          Gets a name for a given saved value, parsing as necessary.
 BasicField.Fetcher getFetcher()
           
 FieldInfo getField()
          Gets the field info object for this field.
 java.util.SortedSet<FieldValue> getMatching(java.lang.String pattern)
           
protected static NameDecoder getNameDecoder(FieldInfo field)
          Gets a name decoder of the appropriate type for the given field.
protected static NameEncoder getNameEncoder(FieldInfo field)
          Gets a name encoder of the appropriate type for the given field.
 java.lang.Object getSavedData(int docID, boolean all)
          Retrieve data from a saved field.
 ArrayGroup getSimilar(ArrayGroup ag, java.lang.String value, boolean matchCase)
           
 ArrayGroup getUndefined(ArrayGroup ag)
          Gets a group of all the documents that do not have any values saved for this field.
 boolean hasSavedValues(int docID)
          Indicates whether a given document has saved data for this field.
 DictionaryIterator iterator(java.lang.Object lowerBound, boolean includeLower, java.lang.Object upperBound, boolean includeUpper)
          Gets an iterator for the values in this field.
 void merge(java.lang.String path, SavedField[] fields, int maxID, int[] starts, int[] nUndel, int[][] docIDMaps, java.io.RandomAccessFile dictFile, PostingsOutput postOut)
          Merges a number of saved fields.
 int size()
          Gets the number of saved terms that we're storing.
protected  java.util.Iterator valueIterator()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

field

protected FieldInfo field
The field info object for this field. Used during indexing.


values

protected Dictionary values
A dictionary to use for the saved field data.


dv

protected java.util.List[] dv
An array of the sets of entries stored per document at indexing time.


dvPos

protected int dvPos
The current position in the dtv array, that is, where the next document ID will be added.


dtvOffsets

protected ReadableBuffer dtvOffsets
A buffer containing the dtv offsets at query time.


dtvData

protected ReadableBuffer dtvData
A buffer containing the actual dtv data at query time.


bigrams

protected DiskBiGramDictionary bigrams
A bigram dictionary that we can use for character fields.


nBytes

protected int nBytes
The number of bytes we're using to store data.


header

protected com.sun.labs.minion.indexer.dictionary.SavedFieldHeader header
The header for this field.


dp

protected CDateParser dp
A date parser for date fields.


logTag

protected static java.lang.String logTag
The log tag.

Constructor Detail

BasicField

protected BasicField()
Default constructor for subclasses.


BasicField

public BasicField(FieldInfo field)
Constructs a saved field that will be used to store data during indexing.

Parameters:
field - The FieldInfo for this saved field.

BasicField

public BasicField(FieldInfo field,
                  java.io.RandomAccessFile dictFile,
                  java.io.RandomAccessFile[] postFiles,
                  DictionaryFactory fieldStoreDictFactory,
                  DictionaryFactory bigramDictFactory,
                  DiskPartition part)
           throws java.io.IOException
Constructs a saved field that will be used to retrieve data during querying.

Parameters:
field - The FieldInfo for this saved field.
dictFile - The file containing the dictionary for this field.
postFiles - The files containing the postings for this field.
part - The disk partition that this field is associated with.
Throws:
java.io.IOException - if there is any error loading the field data.
Method Detail

getEntryClass

public static java.lang.Class getEntryClass(FieldInfo field)
Gets an entry class appropriate to the type of the given field.


add

public void add(int docID,
                java.lang.Object data)
Adds data to a saved field.

Specified by:
add in interface SavedField
Parameters:
docID - the document ID for the document containing the saved data
data - The actual field data.

getNameDecoder

protected static NameDecoder getNameDecoder(FieldInfo field)
Gets a name decoder of the appropriate type for the given field.

Parameters:
field - The field for which we want a name decoder.

getNameEncoder

protected static NameEncoder getNameEncoder(FieldInfo field)
Gets a name encoder of the appropriate type for the given field.

Parameters:
field - The field for which we want a name encoder.

getEntryName

protected java.lang.Object getEntryName(java.lang.Object val)
Gets a name for a given saved value, parsing as necessary.

Parameters:
val - The value that we were passed.
Returns:
A name appropriate for the given value and field type.

dump

public void dump(java.lang.String path,
                 java.io.RandomAccessFile dictFile,
                 PostingsOutput[] postOut,
                 int maxID)
          throws java.io.IOException
Writes the field data to the provided dictionary file and postings outputs.

Specified by:
dump in interface SavedField
Parameters:
path - The path of the index directory.
dictFile - The file where the dictionary will be written.
postOut - A place to write the postings associated with the values.
maxID - The maximum document ID for this partition.
Throws:
java.io.IOException - if there is an error during the writing.

hasSavedValues

public boolean hasSavedValues(int docID)
Indicates whether a given document has saved data for this field.

Parameters:
docID - the document ID for the document that we wish to check.
Returns:
true if this document ID has saved values, false otherwise.

getSavedData

public java.lang.Object getSavedData(int docID,
                                     boolean all)
Retrieve data from a saved field.

Specified by:
getSavedData in interface SavedField
Parameters:
docID - the document ID that we want data for.
all - If true, return all known values for the field in the given document. If false, return only one value.
Returns:
If all is true, a List of the values stored in the given field in the given document is returned. If all is false, a single value of the appropriate type is returned.

If the given name is not the name of a saved field, or the document ID is invalid, null will be returned.
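
Because the declared return type is Object, a caller has to distinguish the three cases described above: a List, a single value, or null. The following is a minimal sketch of one way to normalize such a result on the caller's side. SavedDataResultSketch and normalize are hypothetical names introduced for illustration and are not part of this package.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical caller-side helper; not part of the Minion API.
final class SavedDataResultSketch {

    /**
     * Normalizes a value shaped like the result of getSavedData into a list:
     * null becomes an empty list, a List is copied, and any other object is
     * treated as a single value.
     */
    static List<Object> normalize(Object saved) {
        if (saved == null) {
            return Collections.emptyList();
        }
        if (saved instanceof List) {
            return new ArrayList<>((List<?>) saved);
        }
        return Collections.singletonList(saved);
    }
}
```

With a helper like this, code that sorts or displays field values can iterate over the result without checking the value of the all flag again.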


get

public QueryEntry get(java.lang.Object v,
                      boolean caseSensitive)
Gets a particular value from the field.

Specified by:
get in interface SavedField
Parameters:
v - The value to get.
caseSensitive - If true, case is taken into account when looking up the value. This flag is only observed for character fields.
Returns:
The term associated with that name, or null if that term doesn't occur in the indexed material.

getUndefined

public ArrayGroup getUndefined(ArrayGroup ag)
Gets a group of all the documents that do not have any values saved for this field.

Specified by:
getUndefined in interface SavedField
Parameters:
ag - a set of documents to which we should restrict the search for documents with undefined field values. If this is null then there is no such restriction.
Returns:
a set of documents that have no defined values for this field. This set may be restricted to documents occurring in the group that was passed in.
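
The restriction semantics described above can be sketched with plain sets of document IDs. This is only an illustration: UndefinedDocsSketch and its parameters are hypothetical, ArrayGroup is replaced by java.util.Set, and the sketch assumes that document IDs run from 1 to a known maximum.

```java
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical model using plain sets of document IDs; not the Minion API.
final class UndefinedDocsSketch {

    /**
     * Returns the IDs in 1..maxID that have no saved values.
     * Assumption of this sketch: document IDs run from 1 to maxID.
     *
     * @param docsWithValues IDs of documents that have at least one saved value
     * @param maxID          the largest document ID to consider
     * @param restrictTo     if non-null, only IDs in this set are considered
     */
    static SortedSet<Integer> undefined(Set<Integer> docsWithValues, int maxID,
                                        Set<Integer> restrictTo) {
        SortedSet<Integer> result = new TreeSet<>();
        for (int id = 1; id <= maxID; id++) {
            // A null restriction means every document is a candidate.
            boolean allowed = restrictTo == null || restrictTo.contains(id);
            if (allowed && !docsWithValues.contains(id)) {
                result.add(id);
            }
        }
        return result;
    }
}
```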

getSimilar

public ArrayGroup getSimilar(ArrayGroup ag,
                             java.lang.String value,
                             boolean matchCase)

getMatching

public java.util.SortedSet<FieldValue> getMatching(java.lang.String pattern)

iterator

public DictionaryIterator iterator(java.lang.Object lowerBound,
                                   boolean includeLower,
                                   java.lang.Object upperBound,
                                   boolean includeUpper)
Gets an iterator for the values in this field. If this is a string field only the values that actually occurred will be returned.

Specified by:
iterator in interface SavedField
Parameters:
lowerBound - the name of the entry that will be the lower bound of the iterator, or null if there is no such bound
includeLower - whether the lower bound should be included in the results of the iterator
upperBound - the name of the entry that will be the upper bound of the iterator, or null if there is no such bound
includeUpper - whether the upper bound should be included in the results of the iterator
Returns:
an iterator for the entries in the dictionary
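
The bound parameters behave like a range query over sorted values. As a rough analogy only, the sketch below reproduces the same four-parameter contract on top of java.util.NavigableSet, treating a null bound as "no bound on that side". BoundedValuesSketch and range are hypothetical names, and nothing here reflects how DictionaryIterator is actually implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

// Hypothetical stand-in for a dictionary of saved values; not the Minion API.
final class BoundedValuesSketch {

    /**
     * Returns the values between the bounds in sorted order.
     * A null bound means that side of the range is unbounded.
     */
    static <T extends Comparable<T>> List<T> range(NavigableSet<T> values,
            T lowerBound, boolean includeLower,
            T upperBound, boolean includeUpper) {
        NavigableSet<T> view = values;
        if (lowerBound != null && upperBound != null) {
            // subSet rejects an inverted range, so return nothing instead.
            if (lowerBound.compareTo(upperBound) > 0) {
                return new ArrayList<>();
            }
            view = values.subSet(lowerBound, includeLower, upperBound, includeUpper);
        } else if (lowerBound != null) {
            view = values.tailSet(lowerBound, includeLower);
        } else if (upperBound != null) {
            view = values.headSet(upperBound, includeUpper);
        }
        return new ArrayList<>(view);
    }
}
```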

merge

public void merge(java.lang.String path,
                  SavedField[] fields,
                  int maxID,
                  int[] starts,
                  int[] nUndel,
                  int[][] docIDMaps,
                  java.io.RandomAccessFile dictFile,
                  PostingsOutput postOut)
           throws java.io.IOException
Merges a number of saved fields.

Specified by:
merge in interface SavedField
Parameters:
path - The path to the index directory.
fields - An array of fields to merge.
maxID - The maximum document ID in the new partition.
starts - The new starting document IDs for the partitions.
nUndel - The number of undeleted documents in each partition.
docIDMaps - A map for each partition from old document IDs to new document IDs. IDs that map to a value less than 0 have been deleted. A null array means that the old IDs are the new IDs.
dictFile - The file to which the merged dictionaries will be written.
postOut - The output to which the merged postings will be written.
Throws:
java.io.IOException - if there is an error during the merge.
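
The docIDMaps convention (negative value means deleted, null array means identity) can be illustrated for a single partition as follows. This is a sketch under assumptions: DocIdRemapSketch is a hypothetical helper, the map is assumed to be indexed directly by the old document ID, and the role of the starts array is deliberately left out.

```java
import java.util.Arrays;

// Hypothetical helper illustrating the docIDMaps convention; not the Minion API.
final class DocIdRemapSketch {

    /**
     * Maps an old document ID to its ID in the merged partition.
     * A negative result means the document was deleted.
     * Assumption of this sketch: the map is indexed by the old document ID.
     */
    static int remap(int[] docIDMap, int oldID) {
        // A null map means the old IDs are the new IDs.
        return docIDMap == null ? oldID : docIDMap[oldID];
    }

    /** Returns the new IDs of the documents that survive the merge. */
    static int[] survivingNewIDs(int[] docIDMap, int[] oldIDs) {
        return Arrays.stream(oldIDs)
                .map(id -> remap(docIDMap, id))
                .filter(newID -> newID >= 0)
                .toArray();
    }
}
```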

valueIterator

protected java.util.Iterator valueIterator()

size

public int size()
Gets the number of saved terms that we're storing.

Specified by:
size in interface SavedField

clear

public void clear()
Clears a saved field, if it's open for indexing.

Specified by:
clear in interface SavedField

compareTo

public int compareTo(java.lang.Object o)
Compares saved fields according to the field ID.

Specified by:
compareTo in interface java.lang.Comparable
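
Ordering by field ID means that a collection of saved fields sorts by ID rather than by name. The sketch below shows that effect with a hypothetical stand-in class; FieldOrderSketch, its integer fieldID, and sortedNames are assumptions made for illustration. It also uses the generic Comparable<T> form, whereas the method documented here takes a java.lang.Object.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a saved field; not the Minion API.
final class FieldOrderSketch implements Comparable<FieldOrderSketch> {
    final int fieldID;   // assumed numeric field ID
    final String name;

    FieldOrderSketch(int fieldID, String name) {
        this.fieldID = fieldID;
        this.name = name;
    }

    /** Orders fields by their ID only; the name plays no part. */
    @Override
    public int compareTo(FieldOrderSketch other) {
        return Integer.compare(fieldID, other.fieldID);
    }

    /** Returns the field names in field-ID order, leaving the input untouched. */
    static List<String> sortedNames(List<FieldOrderSketch> fields) {
        List<FieldOrderSketch> copy = new ArrayList<>(fields);
        Collections.sort(copy);
        List<String> names = new ArrayList<>();
        for (FieldOrderSketch f : copy) {
            names.add(f.name);
        }
        return names;
    }
}
```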

getField

public FieldInfo getField()
Gets the field info object for this field. Used during indexing.

Specified by:
getField in interface SavedField
Returns:
the FieldInfo

getFetcher

public BasicField.Fetcher getFetcher()