java.lang.Object
com.sun.labs.minion.indexer.partition.Partition
com.sun.labs.minion.indexer.partition.DiskPartition
com.sun.labs.minion.indexer.partition.InvFileDiskPartition
public class InvFileDiskPartition extends DiskPartition
A disk partition that holds data that is specific to the implementation of an inverted file. It extends the disk partition to add bigrams and a field store to the main and document dictionaries already present in the superclass.
| Field Summary | |
|---|---|
| protected DiskBiGramDictionary | bigramDict - Bigrams from the main dictionary. |
| protected DictionaryFactory | bigramDictFactory - A factory for bigram dictionaries that will be used by the main dictionary and by the field store. |
| protected java.io.RandomAccessFile | bigramDictFile - The stream for the bigram dictionaries. |
| protected long | bigramDictOffset - The offset of the bigrams in the main dictionary. |
| protected java.io.RandomAccessFile | bigramPostFile - The stream for the bigram postings. |
| protected java.io.RandomAccessFile | fieldDictFile - The stream for the field store dictionaries. |
| protected java.io.RandomAccessFile | fieldPostFile - The stream for the field store postings. |
| protected DiskFieldStore | fields - The field store. |
| protected DictionaryFactory | fieldStoreDictFactory - A factory for the dictionaries that the field store will use for saved field values. |
| protected static java.lang.String | logTag |
| protected DiskDictionary | ngrams - The ngram dictionary. |
| protected DiskTaxonomy | taxonomy - A disk taxonomy, if one exists. |
| Fields inherited from class com.sun.labs.minion.indexer.partition.DiskPartition |
|---|
BUFF_SIZE, deletions, delFile, delFileLock, docDict, docDictFile, docPostFile, documentDictFactory, dvl, ignored, mainDict, mainFiles, MATCH_CUT_OFF, MIN_LEN, removedFile, termCache |
| Fields inherited from class com.sun.labs.minion.indexer.partition.Partition |
|---|
DICT_OFFSETS_SIZE, docDictFactory, entryClass, entryName, indexConfig, mainDictFactory, mainDictFile, mainPostFiles, manager, maxID, nEntries, partNumber, PROP_DOC_DICT_FACTORY, PROP_INDEX_CONFIG, PROP_MAIN_DICT_FACTORY, PROP_PARTITION_MANAGER, stats |
| Constructor Summary | |
|---|---|
| InvFileDiskPartition(int partNumber, PartitionManager manager, DictionaryFactory mainDictFactory, DictionaryFactory documentDictFactory, DictionaryFactory fieldStoreDictFactory, DictionaryFactory bigramDictFactory, boolean cacheVectorLengths, int termCacheSize) | Opens a partition with a given number. |
| Method Summary | |
|---|---|
| boolean | close(long currTime) - Closes the files associated with this partition. |
| double[] | euclideanDistance(double[] vec, java.lang.String field) - Computes the Euclidean distance between the given document and all documents. |
| void | export(java.io.PrintWriter o) - Exports the data in this partition to an XML file format. |
| protected java.io.File[] | getAllFiles() - Gets all the files associated with a partition, including those specific to the inverted file. |
| protected static java.io.File[] | getAllFiles(PartitionManager manager, int partNumber) - Gets all the files associated with a partition, including those specific to the inverted file. |
| protected java.io.File[] | getBigramFiles() - Gets the files associated with the bigram postings for a partition. |
| int | getFieldCount() - Gets the number of defined fields. |
| protected java.io.File[] | getFieldFiles() - Gets the files associated with the field store for a partition. |
| DictionaryIterator | getFieldIterator(java.lang.String name) - Gets an iterator for all of the values in a field. |
| DictionaryIterator | getFieldIterator(java.lang.String name, boolean caseSensitive, java.lang.Object lowerBound, boolean includeLower, java.lang.Object upperBound, boolean includeUpper) - Gets an iterator for the values in a given range in a field. |
| PostingsIterator | getFieldPostings(java.lang.String name, java.lang.Object value, boolean caseSensitive) - Gets the postings associated with a particular field value. |
| int | getFieldSize(java.lang.String name) |
| DiskFieldStore | getFieldStore() - Gets the field store associated with this partition. |
| QueryEntry[] | getMatching(java.lang.String pat, boolean caseSensitive, int maxEntries, long timeLimit) - Gets the entries matching the given pattern. |
| DictionaryIterator | getMatchingIterator(java.lang.String name, java.lang.String val, boolean caseSensitive) - Gets an iterator for the character saved field values that match a given wildcard pattern. |
| java.lang.Object | getSavedFieldData(FieldInfo fi, int docID, boolean all) |
| java.util.List | getSavedFieldData(java.lang.String name, int docID) - Gets all of the data saved in a given field. |
| java.lang.Object | getSavedFieldData(java.lang.String name, int docID, boolean all) - Gets some or all of the data saved in a given field. |
| java.util.List | getSavedFieldData(java.lang.String name, java.lang.String key) - Gets all of the data saved in a given field, in a given document. |
| java.lang.Object | getSavedFieldData(java.lang.String name, java.lang.String key, boolean all) - Gets some or all of the data saved in a given field, in a given document. |
| java.util.Map<java.lang.String,java.util.List> | getSavedFields(int docID) - Gets a map of all the saved fields in a document. |
| QueryEntry[] | getSpellingVariants(java.lang.String pat, boolean caseSensitive, int maxEntries, long timeLimit) - Gets the spelling variants of a term. |
| QueryEntry[] | getStemMatches(java.lang.String term, boolean caseSensitive, int minLen, float matchCutOff, int maxEntries, long timeLimit) - Gets the entries that match the stem of the given term. |
| QueryEntry[] | getStemMatches(java.lang.String term, boolean caseSensitive, int maxEntries, long timeLimit) - Gets the entries that match the stem of the given term. |
| QueryEntry[] | getSubstring(java.lang.String pat, boolean caseSensitive, int maxEntries, long timeLimit) - Gets the entries containing the given substring. |
| DictionaryIterator | getSubstringIterator(java.lang.String name, java.lang.String val, boolean caseSensitive, boolean starts, boolean ends) - Gets an iterator for the character saved field values that contain a given substring. |
| java.util.Set | getSubsumed(java.lang.String name) - Gets the entries subsumed by a given name. |
| DiskTaxonomy | getTaxonomy() |
| protected void | initAll() - Initializes everything all at once. |
| protected void | initBigramDict() - Initializes the bigram dictionary, if necessary. |
| protected void | initFields() - Initializes the field store, if necessary. |
| protected void | initTaxonomy() - Initializes the taxonomy, if necessary. |
| protected void | mergeCustom(int newPartNumber, DiskPartition[] sortedParts, int[][] idMaps, int newMaxDocID, int[] docIDStart, int[] nUndel, int[][] docIDMaps) - Provides a place to merge data that is specific to a subclass of disk partition. |
| protected static void | reap(PartitionManager m, int n) - Reaps the given partition. |
| Methods inherited from class com.sun.labs.minion.indexer.partition.DiskPartition |
|---|
close, createRemoveFile, delete, deleteDocument, deleteDocument, docsAreMerged, getAverageDocumentLength, getCloseTime, getDeletedDocumentsMap, getDelMap, getDocIDMap, getDocumentIterator, getDocumentIterator, getDocumentLength, getDocumentTerm, getDocumentTerm, getDocumentVectorLength, getDocumentVectorLength, getDocumentVectorLength, getDVL, getInputBuffers, getMainDictionary, getMainDictionaryIterator, getMainDictionaryIterator, getMainIterator, getMaxDocumentID, getMaxTermID, getNDocs, getNEntries, getNTokens, getTerm, getTerm, getTerm, getTerm, getTermCache, initDocDict, initDVL, initMainDict, initMainFiles, isDeleted, isIndexed, merge, merge, normalize, setCloseTime, syncDeletedMap, toString, updatePartition |
| Methods inherited from class com.sun.labs.minion.indexer.partition.Partition |
|---|
compareTo, getDocFiles, getDocFiles, getIndexConfig, getMainFiles, getMainFiles, getManager, getName, getNumPostingsChannels, getPartitionNumber, getQueryConfig, getStats, newProperties |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
| Field Detail |
|---|
protected DictionaryFactory fieldStoreDictFactory
protected DictionaryFactory bigramDictFactory
protected DiskBiGramDictionary bigramDict
protected DiskTaxonomy taxonomy
protected long bigramDictOffset
protected DiskFieldStore fields
protected DiskDictionary ngrams
protected java.io.RandomAccessFile bigramDictFile
protected java.io.RandomAccessFile bigramPostFile
protected java.io.RandomAccessFile fieldDictFile
protected java.io.RandomAccessFile fieldPostFile
protected static java.lang.String logTag
| Constructor Detail |
|---|
public InvFileDiskPartition(int partNumber,
PartitionManager manager,
DictionaryFactory mainDictFactory,
DictionaryFactory documentDictFactory,
DictionaryFactory fieldStoreDictFactory,
DictionaryFactory bigramDictFactory,
boolean cacheVectorLengths,
int termCacheSize)
throws java.io.IOException
Parameters:
partNumber - the number of this partition.
manager - the manager for this partition.
mainDictFactory - a factory that will be used to generate the main dictionary for this partition.
documentDictFactory - a factory that will be used to generate the document dictionary for this partition.
fieldStoreDictFactory - a factory that will be used to generate the dictionaries in the field store.
bigramDictFactory - a factory that will be used to generate the bigram dictionaries needed for this partition.
Throws:
java.io.IOException - If there is an error opening or reading any of the files making up a partition.
See Also:
Partition, Dictionary
| Method Detail |
|---|
protected void initAll()
throws java.io.IOException
Overrides:
initAll in class DiskPartition
Throws:
java.io.IOException - if there was an error reading the files
protected void initBigramDict()
protected void initFields()
protected void initTaxonomy()
public java.lang.Object getSavedFieldData(java.lang.String name,
java.lang.String key,
boolean all)
Parameters:
name - The name of the field.
key - The document key of the document for which we want data.
all - If true, all field values will be returned as a list. If false, only the first value will be returned.
Returns:
null if the given key is not in this partition.
public java.lang.Object getSavedFieldData(java.lang.String name,
int docID,
boolean all)
Parameters:
name - The name of the field.
docID - The document ID for which we want the saved data.
all - If true, return all known values for the field in the given document. If false, return only one value.
Returns:
If all is true, a List of field values; if all is false, a single field value of the appropriate type. If the given name is not the name of a saved field, or the document ID is invalid, an empty list is returned when all is true and null is returned when all is false.
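Because this overload is declared to return `java.lang.Object`, a caller has to cope with a `List`, a single value, or `null`, depending on `all` and on whether the field and document exist. The following is a minimal sketch of one way to normalize such a result. `SavedFieldValues` and `asList` are hypothetical names introduced for illustration and are not part of Minion.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper for illustration only; not part of the Minion API. */
final class SavedFieldValues {

    private SavedFieldValues() {
    }

    /**
     * Normalizes a result with the documented shape into a list:
     * null becomes an empty list, a List is copied element by element,
     * and any other object becomes a one-element list.
     */
    static List<Object> asList(Object result) {
        if (result == null) {
            return Collections.emptyList();
        }
        if (result instanceof List<?>) {
            return new ArrayList<Object>((List<?>) result);
        }
        return Collections.singletonList(result);
    }
}
```

With a helper like this, the same downstream code can consume the result whether `all` was true or false.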
public java.util.List getSavedFieldData(java.lang.String name,
int docID)
Parameters:
name - The name of the field.
docID - The document ID for which we want the saved data.
Returns:
A List of field values of the appropriate type. If the given name is not the name of a saved field, or the document ID is invalid, then an empty list is returned.
public java.util.List getSavedFieldData(java.lang.String name,
java.lang.String key)
Parameters:
name - The name of the field.
key - The document key of the document for which we want data.
Returns:
A List. If the given name is not the name of a saved field, or the document ID is invalid, then an empty list will be returned.
public java.lang.Object getSavedFieldData(FieldInfo fi,
int docID,
boolean all)
public java.util.Map<java.lang.String,java.util.List> getSavedFields(int docID)
public DictionaryIterator getFieldIterator(java.lang.String name)
Parameters:
name - The name of the field we need an iterator for.
public DictionaryIterator getFieldIterator(java.lang.String name,
boolean caseSensitive,
java.lang.Object lowerBound,
boolean includeLower,
java.lang.Object upperBound,
boolean includeUpper)
Parameters:
name - The name of the field we need an iterator for.
caseSensitive - If true, case should be taken into account when iterating through the values. This value will only be observed for character fields!
lowerBound - The lower bound on the iterator. If null, only the upper bound is considered and the iteration will commence with the first term in the dictionary.
includeLower - If true, then the lower bound will be included in the entries returned by the iterator, if it occurs in the dictionary.
upperBound - The upper bound on the iterator. If null, only the lower bound is considered and the iteration will end at the last term in the dictionary.
includeUpper - If true, then the upper bound will be included in the entries returned by the iterator, if it occurs in the dictionary.
Returns:
null if there is no such range or the named field is not a saved field.
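The bound semantics described above (a null bound leaves that side open, and each include flag controls whether the bound itself is returned) can be illustrated with the standard `java.util.NavigableSet` views. This is an analogy under stated assumptions, not Minion's implementation: `RangeBoundsSketch` and `range` are hypothetical names, and where the documented method returns null for a missing range, this sketch returns an empty set instead.

```java
import java.util.Comparator;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Hypothetical illustration of the bound handling; not part of the Minion API. */
final class RangeBoundsSketch {

    private RangeBoundsSketch() {
    }

    /** Returns the values between the bounds, treating a null bound as open. */
    static NavigableSet<String> range(NavigableSet<String> values,
                                      String lowerBound, boolean includeLower,
                                      String upperBound, boolean includeUpper) {
        if (lowerBound == null && upperBound == null) {
            return values;
        }
        if (lowerBound == null) {
            // Only the upper bound applies: start from the first value.
            return values.headSet(upperBound, includeUpper);
        }
        if (upperBound == null) {
            // Only the lower bound applies: run to the last value.
            return values.tailSet(lowerBound, includeLower);
        }
        // Compare with the set's own ordering so that a case-insensitive
        // set is handled consistently.
        Comparator<? super String> cmp = values.comparator();
        int order = (cmp == null)
                ? lowerBound.compareTo(upperBound)
                : cmp.compare(lowerBound, upperBound);
        if (order > 0) {
            // Inverted bounds describe no range at all.
            return new TreeSet<String>();
        }
        return values.subSet(lowerBound, includeLower, upperBound, includeUpper);
    }
}
```

Passing a set built with `String.CASE_INSENSITIVE_ORDER` gives a rough analogue of calling the documented method with `caseSensitive` set to false.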
public DictionaryIterator getMatchingIterator(java.lang.String name,
java.lang.String val,
boolean caseSensitive)
Parameters:
name - The name of the field whose values we wish to match against.
val - The wildcard value against which we will match.
caseSensitive - If true, then case will be taken into account during the match.
public DictionaryIterator getSubstringIterator(java.lang.String name,
java.lang.String val,
boolean caseSensitive,
boolean starts,
boolean ends)
Parameters:
name - The name of the field whose values we wish to match against.
val - The substring value against which we will match.
caseSensitive - If true, then case will be taken into account during the match.
public PostingsIterator getFieldPostings(java.lang.String name,
java.lang.Object value,
boolean caseSensitive)
Parameters:
name - The name of the field for which we want postings.
value - The value from the field for which we want postings.
caseSensitive - If true, case should be taken into account when iterating through the values. This value will only be observed for character fields!
Returns:
null if there is no such value in the field.
public int getFieldCount()
public DiskFieldStore getFieldStore()
public int getFieldSize(java.lang.String name)
public java.util.Set getSubsumed(java.lang.String name)
Parameters:
name - the name for which we want subsumed entries.
Returns:
null if this name is not in the main dictionary.
public double[] euclideanDistance(double[] vec,
java.lang.String field)
protected void mergeCustom(int newPartNumber,
DiskPartition[] sortedParts,
int[][] idMaps,
int newMaxDocID,
int[] docIDStart,
int[] nUndel,
int[][] docIDMaps)
throws java.lang.Exception
Overrides:
mergeCustom in class DiskPartition
Parameters:
newPartNumber - the number of the new partition
sortedParts - the sorted list of partitions
idMaps - a set of maps from old entry ids in the main dictionary to new entry ids in the merged dictionary
newMaxDocID - the new maximum document id
docIDStart - the starting doc ids
nUndel - the number of undeleted documents in each partition
docIDMaps - doc id maps (see merge)
Throws:
java.lang.Exception
public boolean close(long currTime)
Specified by:
close in interface Closeable
Overrides:
close in class DiskPartition
Parameters:
currTime - the current time
Returns:
true if the partition was closed, false otherwise.
public QueryEntry[] getMatching(java.lang.String pat,
boolean caseSensitive,
int maxEntries,
long timeLimit)
Parameters:
pat - The pattern to match entries against.
caseSensitive - If true, then do the lookup in a case sensitive fashion.
maxEntries - The maximum number of entries to return. If zero or negative, return all possible entries.
timeLimit - The maximum amount of time (in milliseconds) to spend trying to find matches. If zero or negative, no time limit is imposed.
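The `maxEntries` and `timeLimit` conventions described above (a zero or negative value disables the corresponding limit) recur across the lookup methods on this class. The sketch below shows one way such limits could be applied while collecting candidates. It is an illustration only and not Minion's code: `BoundedCollect` and `collect` are hypothetical names, and the candidates are assumed to arrive through a plain `java.util.Iterator`.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical illustration of the limit conventions; not part of the Minion API. */
final class BoundedCollect {

    private BoundedCollect() {
    }

    /**
     * Collects candidates until the source is exhausted, maxEntries is
     * reached, or timeLimit milliseconds have elapsed. A zero or negative
     * maxEntries or timeLimit disables that limit.
     */
    static <T> List<T> collect(Iterator<T> candidates, int maxEntries, long timeLimit) {
        long start = System.currentTimeMillis();
        List<T> out = new ArrayList<T>();
        while (candidates.hasNext()) {
            if (maxEntries > 0 && out.size() >= maxEntries) {
                break; // entry limit reached
            }
            if (timeLimit > 0 && System.currentTimeMillis() - start >= timeLimit) {
                break; // time budget used up
            }
            out.add(candidates.next());
        }
        return out;
    }
}
```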
public QueryEntry[] getSpellingVariants(java.lang.String pat,
boolean caseSensitive,
int maxEntries,
long timeLimit)
Parameters:
pat - The pattern to match entries against.
caseSensitive - If true, then do the lookup in a case sensitive fashion.
maxEntries - The maximum number of entries to return. If zero or negative, return all possible entries.
timeLimit - The maximum amount of time (in milliseconds) to spend trying to find matches. If zero or negative, no time limit is imposed.
public QueryEntry[] getSubstring(java.lang.String pat,
boolean caseSensitive,
int maxEntries,
long timeLimit)
Parameters:
pat - The pattern to match entries against.
caseSensitive - If true, then do the lookup in a case sensitive fashion.
maxEntries - The maximum number of entries to return. If zero or negative, return all possible entries.
timeLimit - The maximum amount of time (in milliseconds) to spend trying to find matches. If zero or negative, no time limit is imposed.
Returns:
An array of Term objects containing the matching entries, or null if there are no such entries, or an array of length zero if the operation timed out before any entries could be matched.
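The return contract above distinguishes three outcomes: null when nothing matches, a zero-length array when the lookup timed out before finding anything, and a non-empty array otherwise. A caller that treats null and an empty array alike loses the timeout signal. The following minimal sketch makes the distinction explicit. `MatchOutcome`, `Kind`, and `classify` are hypothetical names for illustration and are not part of Minion.

```java
/** Hypothetical illustration of the documented return contract; not part of the Minion API. */
final class MatchOutcome {

    /** The three outcomes that the documented return value can encode. */
    enum Kind { NO_MATCHES, TIMED_OUT, MATCHED }

    private MatchOutcome() {
    }

    /** Maps a returned array onto the outcome it encodes. */
    static Kind classify(Object[] entries) {
        if (entries == null) {
            return Kind.NO_MATCHES; // null: no such entries
        }
        if (entries.length == 0) {
            return Kind.TIMED_OUT;  // empty array: timed out before any match
        }
        return Kind.MATCHED;
    }
}
```

On a `TIMED_OUT` result a caller might retry with a larger `timeLimit`, whereas `NO_MATCHES` is a definitive answer.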
public QueryEntry[] getStemMatches(java.lang.String term,
boolean caseSensitive,
int maxEntries,
long timeLimit)
Parameters:
term - The term we want to get variants of.
caseSensitive - If true, then do the lookup in a case sensitive fashion.
maxEntries - The maximum number of entries to return. If zero or negative, return all possible entries.
timeLimit - The maximum amount of time (in milliseconds) to spend trying to find matches. If zero or negative, no time limit is imposed.
Returns:
An array of Term objects containing the matching entries, or null if there are no such entries.
public QueryEntry[] getStemMatches(java.lang.String term,
boolean caseSensitive,
int minLen,
float matchCutOff,
int maxEntries,
long timeLimit)
Parameters:
term - The term we want to get variants of.
caseSensitive - If true, then do the lookup in a case sensitive fashion.
minLen - The minimum term length for stemming.
matchCutOff - The cutoff score for matching variants and the original term.
maxEntries - The maximum number of entries to return. If zero or negative, return all possible entries.
timeLimit - The maximum amount of time (in milliseconds) to spend trying to find matches. If zero or negative, no time limit is imposed.
Returns:
An array of Term objects containing the matching entries, or null if there are no such entries.
protected java.io.File[] getFieldFiles()
protected java.io.File[] getAllFiles()
Overrides:
getAllFiles in class Partition
protected static java.io.File[] getAllFiles(PartitionManager manager,
int partNumber)
protected static void reap(PartitionManager m,
int n)
Parameters:
m - The manager associated with the partition.
n - The partition number to reap.
protected java.io.File[] getBigramFiles()
public DiskTaxonomy getTaxonomy()
public void export(java.io.PrintWriter o)
Parameters:
o - the writer to which the data will be output.