com.sun.labs.minion.indexer.dictionary
Class DiskFieldStore

java.lang.Object
  extended by com.sun.labs.minion.indexer.dictionary.FieldStore
      extended by com.sun.labs.minion.indexer.dictionary.DiskFieldStore

public class DiskFieldStore
extends FieldStore

A field store that can be used for querying operations.


Field Summary
protected static java.lang.String logTag
          The tag for this module.
protected  int nDocs
          The number of documents.
protected  DiskPartition part
          The partition this field store is associated with.
 
Fields inherited from class com.sun.labs.minion.indexer.dictionary.FieldStore
header, metaFile, savedFields
 
Constructor Summary
DiskFieldStore(DiskPartition part, java.io.RandomAccessFile dictFile, java.io.RandomAccessFile[] postFiles, DictionaryFactory fieldStoreDictFactory, DictionaryFactory bigramDictFactory, MetaFile metaFile)
          Reads the field store from the provided file.
 
Method Summary
 void close()
          Closes the field store.
 double[] euclideanDistance(double[] vec, java.lang.String field)
          Computes the Euclidean distance between the given vector and all documents.
 java.lang.Object getDefaultSavedFieldData(FieldInfo fi)
          Get the default value for a saved field.
 java.lang.Object getDefaultSavedFieldData(java.lang.String name)
          Get the default value for a saved field.
 BasicField.Fetcher getFetcher(FieldInfo fi)
           
 BasicField.Fetcher getFetcher(java.lang.String field)
           
 DictionaryIterator getFieldIterator(java.lang.String name, boolean caseSensitive, java.lang.Object lowerBound, boolean includeLower, java.lang.Object upperBound, boolean includeUpper)
          Gets an iterator for the values in a given range in a field.
 PostingsIterator getFieldPostings(java.lang.String name, java.lang.Object value, boolean caseSensitive)
          Gets the postings associated with a particular field value.
 java.util.Iterator getFields(int docID)
          Gets an iterator for the field values for a given document.
 FieldInfo.Type getFieldType(java.lang.String name)
          Gets the type of the named field, if it is a saved field.
 java.util.SortedSet<FieldValue> getMatching(java.lang.String field, java.lang.String pattern)
          Gets the values for the given field that match the given pattern.
 DictionaryIterator getMatchingIterator(java.lang.String name, java.lang.String val, boolean caseSensitive, int maxEntries, long timeLimit)
          Gets an iterator for the character saved field values that match a given wildcard pattern.
 SavedField getSavedField(FieldInfo fi)
          Gets a saved field from its field information.
 SavedField getSavedField(java.lang.String name)
          Gets a saved field from a field name.
 java.lang.Object getSavedFieldData(FieldInfo fi, int docID, boolean all)
           
 java.lang.Object getSavedFieldData(java.lang.String name, int docID, boolean all)
          Gets saved data for a particular field.
 java.util.Map<java.lang.String,java.util.List> getSavedFields(int docID)
          Gets a map from saved field names to the saved field values for those fields.
 DictionaryIterator getSubstringIterator(java.lang.String name, java.lang.String val, boolean caseSensitive, boolean starts, boolean ends, int maxEntries, long timeLimit)
          Gets an iterator for the character saved field values that contain a given substring.
protected  SavedField makeSavedField(FieldInfo fi, java.io.RandomAccessFile dictFile, java.io.RandomAccessFile[] postFiles, DictionaryFactory fieldStoreDictFactory, DictionaryFactory bigramDictFactory, DiskPartition part)
          Makes a saved field instance of the appropriate type.
 void merge(DiskFieldStore[] stores, int maxID, int[] starts, int[] nUndel, int[][] docIDMaps, java.io.RandomAccessFile dictFile, PostingsOutput postOut)
          Merges a number of field stores into a single store.
 
Methods inherited from class com.sun.labs.minion.indexer.dictionary.FieldStore
getFieldArray, getFieldID, getFieldInfo, getFieldInfo, getFieldName, getMultArray, getMultArray, getNFields, getVectoredFields, isSavedField
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

part

protected DiskPartition part
The partition this field store is associated with.


nDocs

protected int nDocs
The number of documents.


logTag

protected static java.lang.String logTag
The tag for this module.

Constructor Detail

DiskFieldStore

public DiskFieldStore(DiskPartition part,
                      java.io.RandomAccessFile dictFile,
                      java.io.RandomAccessFile[] postFiles,
                      DictionaryFactory fieldStoreDictFactory,
                      DictionaryFactory bigramDictFactory,
                      MetaFile metaFile)
               throws java.io.IOException
Reads the field store from the provided file.

Parameters:
part - The partition that this field store is associated with.
dictFile - The file containing the dictionaries for the saved fields.
postFiles - The files containing the postings for the saved fields.
metaFile - The meta file to use to get field information.
Throws:
java.io.IOException - if there is an error during reading
Method Detail

makeSavedField

protected SavedField makeSavedField(FieldInfo fi,
                                    java.io.RandomAccessFile dictFile,
                                    java.io.RandomAccessFile[] postFiles,
                                    DictionaryFactory fieldStoreDictFactory,
                                    DictionaryFactory bigramDictFactory,
                                    DiskPartition part)
                             throws java.io.IOException
Makes a saved field instance of the appropriate type.

Throws:
java.io.IOException

close

public void close()
           throws java.io.IOException
Closes the field store.

Throws:
java.io.IOException

getSavedField

public SavedField getSavedField(java.lang.String name)
Gets a saved field from a field name.


getSavedField

public SavedField getSavedField(FieldInfo fi)
Gets a saved field from its field information.


getFieldType

public FieldInfo.Type getFieldType(java.lang.String name)
Gets the type of the named field, if it is a saved field.

Overrides:
getFieldType in class FieldStore
Parameters:
name - The name of the field.
Returns:
The type of the field, as defined in com.sun.labs.minion.FieldInfo.Type. If the name is not the name of a saved field, then FieldInfo.Type.NONE is returned.

getSavedFieldData

public java.lang.Object getSavedFieldData(java.lang.String name,
                                          int docID,
                                          boolean all)
Gets saved data for a particular field.

Parameters:
name - The name of the field.
docID - The document whose field value we want.
all - If true, return all known values for the field in the given document. If false, return only one value.
Returns:
If all is true, then return a List of the values stored in the given field in the given document. The elements of the list will have a type that is appropriate to the type of the saved field. If all is false, a single value of the appropriate type will be returned.

If the given name is not the name of a saved field, or the document ID is invalid, null will be returned.
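
The return contract above can be illustrated with a small, self-contained sketch. It is not the Minion implementation: SavedFieldDataSketch and its save method are hypothetical in-memory stand-ins, and the choice of the first stored value when all is false is an assumption, since the documentation does not say which value is returned.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in; not the Minion implementation.
class SavedFieldDataSketch {
    // field name -> (document ID -> values saved for that document)
    private final Map<String, Map<Integer, List<Object>>> saved = new HashMap<>();

    // Hypothetical helper used only to populate the stand-in.
    void save(String name, int docID, Object value) {
        saved.computeIfAbsent(name, k -> new HashMap<>())
             .computeIfAbsent(docID, k -> new ArrayList<>())
             .add(value);
    }

    // Mirrors the documented contract: a List when all is true,
    // a single value when all is false, null for unknown names or IDs.
    Object getSavedFieldData(String name, int docID, boolean all) {
        Map<Integer, List<Object>> byDoc = saved.get(name);
        if (byDoc == null) {
            return null; // not the name of a saved field
        }
        List<Object> values = byDoc.get(docID);
        if (values == null) {
            return null; // document ID unknown to this stand-in
        }
        // Assumption: "only one value" is taken to mean the first stored value.
        return all ? new ArrayList<>(values) : values.get(0);
    }
}
```

Because the declared return type is java.lang.Object, a caller would typically check the result for null and then cast it either to a List or to the element type, depending on the value passed for all.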


getSavedFieldData

public java.lang.Object getSavedFieldData(FieldInfo fi,
                                          int docID,
                                          boolean all)

getDefaultSavedFieldData

public java.lang.Object getDefaultSavedFieldData(java.lang.String name)
Get the default value for a saved field.


getDefaultSavedFieldData

public java.lang.Object getDefaultSavedFieldData(FieldInfo fi)
Get the default value for a saved field.


euclideanDistance

public double[] euclideanDistance(double[] vec,
                                  java.lang.String field)
Computes the Euclidean distance between the given vector and all documents. The distance is based on the features stored in the saved field with the given name.
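
As a rough illustration of the computation, the sketch below measures the distance from one query vector to every row of a feature matrix. It is an assumption-laden stand-in rather than the Minion code: EuclideanDistanceSketch and docFeatures are hypothetical names, the per-document features are held in a plain array instead of a saved field, and every row is assumed to have the same length as the query vector.

```java
// Hypothetical stand-in; the real method reads features from a saved field.
class EuclideanDistanceSketch {
    // Returns one distance per row of docFeatures, in row order.
    // Assumes every row has at least vec.length elements.
    static double[] distances(double[] vec, double[][] docFeatures) {
        double[] out = new double[docFeatures.length];
        for (int i = 0; i < docFeatures.length; i++) {
            double sum = 0.0;
            for (int j = 0; j < vec.length; j++) {
                double diff = vec[j] - docFeatures[i][j];
                sum += diff * diff; // accumulate squared differences
            }
            out[i] = Math.sqrt(sum);
        }
        return out;
    }
}
```

How positions in the returned array relate to document IDs in the real method is not stated in this page, so the row-order convention here should be read as an assumption of the sketch.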


getFields

public java.util.Iterator getFields(int docID)
Gets an iterator for the field values for a given document.


getFetcher

public BasicField.Fetcher getFetcher(FieldInfo fi)

getFetcher

public BasicField.Fetcher getFetcher(java.lang.String field)

getMatching

public java.util.SortedSet<FieldValue> getMatching(java.lang.String field,
                                                   java.lang.String pattern)
Gets the values for the given field that match the given pattern.

Parameters:
field - the saved, string field against whose values we will match. If the named field is not saved or is not a string field, then null will be returned.
pattern - the pattern for which we'll find matching field values.
Returns:
a sorted set of field values. This set will be ordered by the proportion of the field value that is covered by the given pattern. If the named field is not saved or is not a string, then null will be returned.

getFieldIterator

public DictionaryIterator getFieldIterator(java.lang.String name,
                                           boolean caseSensitive,
                                           java.lang.Object lowerBound,
                                           boolean includeLower,
                                           java.lang.Object upperBound,
                                           boolean includeUpper)
Gets an iterator for the values in a given range in a field.

Parameters:
name - The name of the field we need an iterator for.
caseSensitive - If true, case should be taken into account when iterating through the values. This value will only be observed for character fields!
lowerBound - The lower bound on the iterator. If null, only the upper bound is considered and the iteration will commence with the first term in the dictionary.
includeLower - If true, then the lower bound will be included in the terms returned by the iterator, if it occurs in the dictionary.
upperBound - The upper bound on the iterator. If null, only the lower bound is considered and the iteration will end at the last term in the dictionary.
includeUpper - If true, then the upper bound will be included in the terms returned by the iterator, if it occurs in the dictionary.
Returns:
An iterator for the dictionary entries contained in the range, or null if there is no such range or the named field is not a saved field.
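
The bound handling described above (null bounds and the include flags) can be sketched with a sorted set from the standard library. This is only an illustration under stated assumptions: RangeIteratorSketch is a hypothetical name, a NavigableSet of strings stands in for the field's dictionary, case sensitivity is ignored, and returning null when the lower bound sorts after the upper bound is one reading of "no such range".

```java
import java.util.Iterator;
import java.util.NavigableSet;

// Hypothetical stand-in that uses a NavigableSet in place of a dictionary.
class RangeIteratorSketch {
    static Iterator<String> range(NavigableSet<String> dict,
                                  String lower, boolean includeLower,
                                  String upper, boolean includeUpper) {
        // Assumption: an inverted range counts as "no such range".
        if (lower != null && upper != null && lower.compareTo(upper) > 0) {
            return null;
        }
        NavigableSet<String> view;
        if (lower == null && upper == null) {
            view = dict;                               // whole dictionary
        } else if (lower == null) {
            view = dict.headSet(upper, includeUpper);  // start at first term
        } else if (upper == null) {
            view = dict.tailSet(lower, includeLower);  // run to last term
        } else {
            view = dict.subSet(lower, includeLower, upper, includeUpper);
        }
        return view.iterator();
    }
}
```

For a set holding apple, banana, cherry and date, a lower bound of banana (included) and an upper bound of cherry (excluded) yields only banana.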

getMatchingIterator

public DictionaryIterator getMatchingIterator(java.lang.String name,
                                              java.lang.String val,
                                              boolean caseSensitive,
                                              int maxEntries,
                                              long timeLimit)
Gets an iterator for the character saved field values that match a given wildcard pattern.

Parameters:
name - The name of the field whose values we wish to match against.
val - The wildcard value against which we will match.
caseSensitive - If true, then case will be taken into account during the match.
maxEntries - The maximum number of entries to return. If zero or negative, return all possible entries.
timeLimit - The maximum amount of time (in milliseconds) to spend trying to find matches. If zero or negative, no time limit is imposed.
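
The interaction of the wildcard pattern with maxEntries and timeLimit can be sketched as follows. This is not the Minion matcher: WildcardMatchSketch is a hypothetical name, the values come from a plain list rather than a dictionary, and the wildcard syntax (* for any run of characters, ? for exactly one character) is an assumption, since this page does not define it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in; the wildcard syntax is assumed, not documented here.
class WildcardMatchSketch {
    // Translates the assumed wildcard syntax into a regular expression.
    static Pattern toRegex(String wildcard, boolean caseSensitive) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c))); // literal character
            }
        }
        int flags = Pattern.DOTALL | (caseSensitive ? 0 : Pattern.CASE_INSENSITIVE);
        return Pattern.compile(sb.toString(), flags);
    }

    static List<String> matching(List<String> values, String wildcard,
                                 boolean caseSensitive, int maxEntries, long timeLimit) {
        Pattern p = toRegex(wildcard, caseSensitive);
        long start = System.currentTimeMillis();
        List<String> out = new ArrayList<>();
        for (String v : values) {
            // Zero or negative maxEntries means no cap on the result size.
            if (maxEntries > 0 && out.size() >= maxEntries) {
                break;
            }
            // Zero or negative timeLimit means no time limit (milliseconds).
            if (timeLimit > 0 && System.currentTimeMillis() - start > timeLimit) {
                break;
            }
            if (p.matcher(v).matches()) {
                out.add(v);
            }
        }
        return out;
    }
}
```

Both limits are checked before each candidate, so a hit on either one simply truncates the result rather than signalling an error.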

getSubstringIterator

public DictionaryIterator getSubstringIterator(java.lang.String name,
                                               java.lang.String val,
                                               boolean caseSensitive,
                                               boolean starts,
                                               boolean ends,
                                               int maxEntries,
                                               long timeLimit)
Gets an iterator for the character saved field values that contain a given substring.

Parameters:
name - The name of the field whose values we wish to match against.
val - The substring that we are looking for.
caseSensitive - If true, then case will be taken into account during the match.
starts - If true, the value must start with the given substring.
ends - If true, the value must end with the given substring.
maxEntries - The maximum number of entries to return. If zero or negative, return all possible entries.
timeLimit - The maximum amount of time (in milliseconds) to spend trying to find matches. If zero or negative, no time limit is imposed.
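
How the starts and ends flags narrow a plain substring test can be shown with a small predicate. SubstringMatchSketch is a hypothetical name, and folding case with toLowerCase is an assumption about what case-insensitive matching means here; the real iterator works against the field's dictionary rather than individual strings.

```java
import java.util.Locale;

// Hypothetical predicate illustrating the starts/ends flags.
class SubstringMatchSketch {
    static boolean matches(String value, String sub, boolean caseSensitive,
                           boolean starts, boolean ends) {
        if (!caseSensitive) {
            // Assumption: case-insensitive matching is modelled by lower-casing.
            value = value.toLowerCase(Locale.ROOT);
            sub = sub.toLowerCase(Locale.ROOT);
        }
        if (starts && !value.startsWith(sub)) {
            return false; // must begin with the substring
        }
        if (ends && !value.endsWith(sub)) {
            return false; // must finish with the substring
        }
        return value.contains(sub); // with neither flag set, anywhere is fine
    }
}
```

With both flags set, a value has to begin and end with the substring, which includes the case where the value equals the substring.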

getFieldPostings

public PostingsIterator getFieldPostings(java.lang.String name,
                                         java.lang.Object value,
                                         boolean caseSensitive)
Gets the postings associated with a particular field value.

Parameters:
name - The name of the field for which we want postings.
value - The value from the field for which we want postings.
caseSensitive - If true, case should be taken into account when iterating through the values. This value will only be observed for character fields!
Returns:
The postings associated with that value, or null if there is no such value in the field.

merge

public void merge(DiskFieldStore[] stores,
                  int maxID,
                  int[] starts,
                  int[] nUndel,
                  int[][] docIDMaps,
                  java.io.RandomAccessFile dictFile,
                  PostingsOutput postOut)
           throws java.io.IOException
Merges a number of field stores into a single store.

Parameters:
stores - the field stores to merge.
maxID - The maximum document ID in the merged partition.
starts - The new starting document IDs for the partitions.
nUndel - The number of documents that are not deleted in each of the partitions.
dictFile - The file where the merged dictionaries will be written.
postOut - The output where the merged postings will be written.
docIDMaps - The maps from old to new document IDs.
Throws:
java.io.IOException - when there is an error writing the file.
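
To make the relationship between starts, nUndel, docIDMaps and maxID more concrete, the sketch below derives them from per-partition deletion flags. Everything about the layout is an assumption made for illustration: MergeIdSketch and its methods are hypothetical, document IDs are taken to be 1-based, each map entry is taken to be a partition-local new ID (or -1 for a deleted document), and starts[p] is taken to be the first ID of partition p in the merged numbering. The actual conventions used by merge are not described in this page.

```java
// Hypothetical illustration of document ID renumbering during a merge.
class MergeIdSketch {
    // deleted[p][d - 1] is true when document d (1-based) of partition p is deleted.
    static int[][] buildDocIDMaps(boolean[][] deleted) {
        int[][] maps = new int[deleted.length][];
        for (int p = 0; p < deleted.length; p++) {
            maps[p] = new int[deleted[p].length + 1]; // index 0 is unused
            int next = 1;
            for (int d = 1; d <= deleted[p].length; d++) {
                // Assumption: deleted documents map to -1, others are packed densely.
                maps[p][d] = deleted[p][d - 1] ? -1 : next++;
            }
        }
        return maps;
    }

    // Counts the surviving documents in each partition.
    static int[] countUndeleted(boolean[][] deleted) {
        int[] nUndel = new int[deleted.length];
        for (int p = 0; p < deleted.length; p++) {
            for (boolean isDeleted : deleted[p]) {
                if (!isDeleted) {
                    nUndel[p]++;
                }
            }
        }
        return nUndel;
    }

    // Assumption: partition p starts right after the survivors of earlier partitions.
    static int[] buildStarts(int[] nUndel) {
        int[] starts = new int[nUndel.length];
        int next = 1;
        for (int p = 0; p < nUndel.length; p++) {
            starts[p] = next;
            next += nUndel[p];
        }
        return starts;
    }

    // Maps an old ID in partition p to its ID in the merged numbering, or -1.
    static int newID(int[] starts, int[][] maps, int p, int oldID) {
        int local = maps[p][oldID];
        return local < 0 ? -1 : starts[p] + local - 1;
    }
}
```

Under these assumptions the maximum document ID in the merged partition is the sum of the nUndel entries.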

getSavedFields

public java.util.Map<java.lang.String,java.util.List> getSavedFields(int docID)
Gets a map from saved field names to the saved field values for those fields.