java.lang.Object
  com.sun.labs.minion.indexer.partition.Partition
    com.sun.labs.minion.indexer.partition.DiskPartition
public class DiskPartition
A partition of the index which is resident on the disk and suitable for querying.
A disk partition consists of four things:
DiskDictionary, DocumentVectorLengths
Field Summary | |
---|---|
protected static int | BUFF_SIZE: Buffer size for merging. |
protected DelMap | deletions: The deletion map for this partition. |
protected java.io.File | delFile: The deleted documents file. |
protected FileLock | delFileLock: A lock for the deleted documents file. |
protected DiskDictionary | docDict: The document dictionary. |
protected java.io.RandomAccessFile | docDictFile: The stream for the document dictionary. |
protected java.io.RandomAccessFile | docPostFile: The postings stream for the document dictionary. |
protected DictionaryFactory | documentDictFactory: A factory for the document dictionary. |
protected DocumentVectorLengths | dvl: The lengths of the document vectors for this partition. |
protected boolean | ignored: Whether this partition was ignored during a merge, due to it being empty. |
protected static java.lang.String | logTag: The tag for this module. |
protected DiskDictionary | mainDict: The main dictionary. |
protected java.io.File[] | mainFiles: The files containing the main data. |
protected static float | MATCH_CUT_OFF: The limit for variant entries relationship to a stemmed entry. |
protected static int | MIN_LEN: Minimum length of a stem. |
protected java.io.File | removedFile: A File indicating that this partition is no longer active. |
protected TermCache | termCache: A cache of uncompressed postings data. |
Fields inherited from class com.sun.labs.minion.indexer.partition.Partition |
---|
DICT_OFFSETS_SIZE, docDictFactory, entryClass, entryName, indexConfig, mainDictFactory, mainDictFile, mainPostFiles, manager, maxID, nEntries, partNumber, PROP_DOC_DICT_FACTORY, PROP_INDEX_CONFIG, PROP_MAIN_DICT_FACTORY, PROP_PARTITION_MANAGER, stats |
Constructor Summary | |
---|---|
DiskPartition(int partNumber, PartitionManager manager, DictionaryFactory mainDictFactory, DictionaryFactory documentDictFactory): Opens a partition with a given number. |
DiskPartition(int partNumber, PartitionManager manager, DictionaryFactory mainDictFactory, DictionaryFactory documentDictFactory, boolean cacheVectorLengths, int termCacheSize): Opens a partition with a given number. |
Method Summary | |
---|---|
boolean | close(): Close the files associated with this partition. |
boolean | close(long currTime): Close the files associated with this partition, if enough time has passed. |
void | createRemoveFile() |
void | delete(): Deletes the files associated with this partition. |
boolean | deleteDocument(int docID): Deletes a document specified by the given ID. |
boolean | deleteDocument(java.lang.String key): Deletes a document specified by the given key, if it occurs in this partition. |
boolean | docsAreMerged(): Returns true if documents in this partition type can be merged - that is, that the postings of two same-named docs in different partitions will be combined. |
float | getAverageDocumentLength(): Get the average document length in this partition. |
long | getCloseTime() |
ReadableBuffer | getDeletedDocumentsMap(): Gets the map of deleted documents for this partition. |
DelMap | getDelMap() |
protected int[] | getDocIDMap(ReadableBuffer del): Returns a map from the document IDs in this partition to IDs in a partition that has no deleted documents. |
java.util.Iterator | getDocumentIterator(): Gets an iterator for the document keys in this partition. |
protected java.util.Iterator | getDocumentIterator(int begin, int end): Gets an iterator for some of the document keys in this partition. |
int | getDocumentLength(int docID): Gets the length of a document (in words) that's in this partition. |
DocKeyEntry | getDocumentTerm(int docID): Gets the entry from the document dictionary corresponding to a given document ID. |
DocKeyEntry | getDocumentTerm(java.lang.String key): Gets the entry from the document dictionary corresponding to a given document key. |
float | getDocumentVectorLength(int docID): Gets the length of a document vector for a given document. |
float | getDocumentVectorLength(int docID, int fieldID): Gets the length of a document vector for a given document. |
float | getDocumentVectorLength(int docID, java.lang.String field): Gets the length of a document vector for a given document. |
DocumentVectorLengths | getDVL(): Gets the document vector lengths associated with this partition. |
protected java.nio.ByteBuffer[] | getInputBuffers(int size): Gets an array of buffers to use for buffering postings during merges. |
DiskDictionary | getMainDictionary(): Returns the main dictionary, to be used by subclasses. |
DictionaryIterator | getMainDictionaryIterator(): Gets an iterator for the entries in the main dictionary. |
java.util.Iterator | getMainDictionaryIterator(java.lang.String start, java.lang.String end): Gets an iterator for the entries in the main dictionary. |
java.util.Iterator | getMainIterator(): Gets an iterator for the entries in the main dictionary. |
int | getMaxDocumentID(): Get the maximum document ID. |
int | getMaxTermID(): Gets the maximum term ID from the main dictionary. |
int | getNDocs(): Gets the number of documents in this partition. |
int | getNEntries(): Gets the total number of distinct terms in this partition. |
long | getNTokens(): Gets the total number of tokens indexed in this partition. |
QueryEntry | getTerm(int id): Gets the entry from the main dictionary that has a given ID. |
QueryEntry | getTerm(java.lang.String name): Gets the entry in the main dictionary associated with a given name. |
QueryEntry | getTerm(java.lang.String name, boolean caseSensitive): Gets the term associated with a given name. |
QueryEntry | getTerm(java.lang.String name, boolean caseSensitive, DiskDictionary.LookupState lus): Gets the term associated with a given name. |
TermCache | getTermCache(): Gets the term cache for this partition, if there is one. |
protected void | initAll(): Initializes the main dictionary and the document dictionary. |
protected void | initDocDict(): Initializes the document dictionary, if necessary. |
protected void | initDVL(boolean adjustStats): Initializes the document vector lengths. |
protected void | initMainDict(): Initializes the main dictionary, if necessary. |
protected void | initMainFiles(): Initializes the files used for the main dictionary and the associated postings. |
boolean | isDeleted(int docID): Tells us whether a given document ID has been deleted. |
boolean | isIndexed(java.lang.String key): Checks to see whether a given document is indexed. |
DiskPartition | merge(java.util.List<DiskPartition> partitions, java.util.List<DelMap> delMaps, boolean calculateDVL): Merges a number of DiskPartitions into a single partition. |
DiskPartition | merge(java.util.List<DiskPartition> partitions, java.util.List<DelMap> delMaps, boolean calculateDVL, int depth): Merges a number of DiskPartitions into a single partition. |
protected void | mergeCustom(int newPartNumber, DiskPartition[] sortedParts, int[][] idMaps, int newMaxDocID, int[] docIDStart, int[] nUndel, int[][] docIDMaps): Provides a place to merge data that is specific to a subclass of disk partition. |
void | normalize(int[] docs, float[] scores, int p, float qw, int field) |
protected static void | reap(PartitionManager m, int n): Reaps the given partition. |
void | setCloseTime(long closeTime) |
protected void | syncDeletedMap(): Synchronizes the deletion map in memory with the one on disk. |
java.lang.String | toString() |
protected boolean | updatePartition(java.util.Set<java.lang.Object> keys): Updates the partition by deleting any documents whose keys are in the given dictionary. |
Methods inherited from class com.sun.labs.minion.indexer.partition.Partition |
---|
compareTo, getAllFiles, getAllFiles, getDocFiles, getDocFiles, getIndexConfig, getMainFiles, getMainFiles, getManager, getName, getNumPostingsChannels, getPartitionNumber, getQueryConfig, getStats, newProperties |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Field Detail |
---|
protected java.io.File[] mainFiles
protected DictionaryFactory documentDictFactory
protected DiskDictionary mainDict
protected DiskDictionary docDict
protected java.io.RandomAccessFile docDictFile
protected java.io.RandomAccessFile docPostFile
protected java.io.File delFile
protected FileLock delFileLock
protected java.io.File removedFile
A File indicating that this partition is no longer active.
protected DelMap deletions
protected DocumentVectorLengths dvl
protected TermCache termCache
protected boolean ignored
protected static java.lang.String logTag
protected static int BUFF_SIZE
protected static int MIN_LEN
protected static float MATCH_CUT_OFF
Constructor Detail |
---|
public DiskPartition(int partNumber, PartitionManager manager, DictionaryFactory mainDictFactory, DictionaryFactory documentDictFactory) throws java.io.IOException
Parameters:
partNumber - the number of this partition.
manager - the manager for this partition.
mainDictFactory - the dictionary factory that we will use to create the main dictionary
documentDictFactory - the dictionary factory that we will use to create the document dictionary
Throws:
java.io.IOException - If there is an error opening or reading any of the files making up a partition.
See Also:
Partition, Dictionary
public DiskPartition(int partNumber, PartitionManager manager, DictionaryFactory mainDictFactory, DictionaryFactory documentDictFactory, boolean cacheVectorLengths, int termCacheSize) throws java.io.IOException
Parameters:
partNumber - the number of this partition.
manager - the manager for this partition.
mainDictFactory - the dictionary factory that we will use to create the main dictionary
documentDictFactory - the dictionary factory that we will use to create the document dictionary
cacheVectorLengths - if true, document vector and field vector lengths will be cached in memory for faster access during normalization.
Throws:
java.io.IOException - If there is an error opening or reading any of the files making up a partition.
See Also:
Partition, Dictionary
Method Detail |
---|
protected void initAll() throws java.io.IOException
Throws:
java.io.IOException - if there is any error initializing the dictionaries.
protected void initMainFiles() throws java.io.IOException
Throws:
java.io.IOException - if there is any error opening the files
protected void initMainDict()
protected void initDocDict()
protected void initDVL(boolean adjustStats)
Parameters:
adjustStats - if it's necessary to compute the document vector lengths, should we adjust the term statistics while we're at it?
public DocumentVectorLengths getDVL()
public java.util.Iterator getDocumentIterator()
protected java.util.Iterator getDocumentIterator(int begin, int end)
Parameters:
begin - the ID (inclusive) of the document at which we wish to begin iteration
end - the ID (exclusive) of the document at which we wish to end iteration
public DictionaryIterator getMainDictionaryIterator()
public java.util.Iterator getMainDictionaryIterator(java.lang.String start, java.lang.String end)
Parameters:
start - the name of the entry (inclusive) at which to start the iteration
end - the name of the entry (exclusive) at which to stop the iteration
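The inclusive start and exclusive end describe a half-open range over entry names. As a rough illustration only, assuming entries are ordered by name, the same convention can be mimicked with a standard java.util.TreeSet. NameRangeSketch and namesInRange are hypothetical names, and this says nothing about how DiskDictionary is actually implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical stand-in for a name-ordered dictionary; not part of Minion.
final class NameRangeSketch {

    // Returns the names n with start <= n < end, in sorted order.
    static List<String> namesInRange(TreeSet<String> names, String start, String end) {
        // subSet(from, fromInclusive, to, toInclusive) yields the half-open view.
        return new ArrayList<>(names.subSet(start, true, end, false));
    }

    public static void main(String[] args) {
        TreeSet<String> names =
                new TreeSet<>(Arrays.asList("apple", "banana", "cherry", "date"));
        // "banana" is included, "date" is excluded.
        System.out.println(namesInRange(names, "banana", "date")); // prints [banana, cherry]
    }
}
```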
public DocKeyEntry getDocumentTerm(java.lang.String key)
Parameters:
key - the document key
Returns:
null if this key does not occur in the document dictionary or if the document existed in this partition, but it was deleted.
public DocKeyEntry getDocumentTerm(int docID)
Parameters:
docID - the document ID
Returns:
null if this id does not occur in the document dictionary. Note that this may return the entry for a document that has been deleted!
public float getDocumentVectorLength(int docID)
Parameters:
docID - the ID of the document for whose vector we want the length
public float getDocumentVectorLength(int docID, java.lang.String field)
Parameters:
docID - the ID of the document for whose vector we want the length
field - the vectored field for which we want the document vector length. If this value is null, the length for all vectored fields is returned. If this value is the empty string, the length for the default body field is returned. If this value does not name a vectored field, a default value of 1 will be returned.
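The field-selection rules described for the field parameter can be sketched as a small lookup. This is an illustration under stated assumptions, not the library's code: FieldLengthSketch, its constructor, and put are hypothetical, and the lengths are supplied by hand for a single document rather than read from an index.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder of vector lengths for ONE document; not part of Minion.
final class FieldLengthSketch {
    private final float allFieldsLength;  // assumed precomputed over all vectored fields
    private final float bodyLength;       // assumed precomputed for the default body field
    private final Map<String, Float> byName = new HashMap<>();

    FieldLengthSketch(float allFieldsLength, float bodyLength) {
        this.allFieldsLength = allFieldsLength;
        this.bodyLength = bodyLength;
    }

    // Registers the length for a named vectored field.
    void put(String field, float length) {
        byName.put(field, length);
    }

    // Applies the documented rules: null selects all vectored fields, the empty
    // string selects the default body field, and an unknown name yields 1.
    float length(String field) {
        if (field == null) {
            return allFieldsLength;
        }
        if (field.isEmpty()) {
            return bodyLength;
        }
        Float len = byName.get(field);
        return len == null ? 1f : len;
    }

    public static void main(String[] args) {
        FieldLengthSketch doc = new FieldLengthSketch(5.0f, 3.0f);
        doc.put("title", 2.0f);
        System.out.println(doc.length(null));
        System.out.println(doc.length(""));
        System.out.println(doc.length("title"));
        System.out.println(doc.length("not-a-vectored-field"));
    }
}
```

Returning 1 for an unknown field means that dividing a score by the length leaves the score unchanged, which is one plausible reason for that default.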
public float getDocumentVectorLength(int docID, int fieldID)
Parameters:
docID - the ID of the document for whose vector we want the length
fieldID - the ID of the field for which we want the length. If this value is less than 0, the length for all vectored fields is returned. If this value is 0, the length for the default body field is returned. Otherwise, the length for the corresponding field is returned.
public void normalize(int[] docs, float[] scores, int p, float qw, int field)
protected void syncDeletedMap()
public boolean close()
Returns:
true if the files were successfully closed.
public boolean close(long currTime)
Specified by:
close in interface Closeable
Parameters:
currTime - the current time
Returns:
true if the partition was closed, false otherwise.
public void delete()
protected static void reap(PartitionManager m, int n)
Parameters:
m - The manager associated with the partition.
n - The partition number to reap.
public QueryEntry getTerm(java.lang.String name)
Parameters:
name - The name of the term, as a string.
See Also:
getTerm(String,boolean)
public QueryEntry getTerm(int id)
Parameters:
id - The ID of the term that we want to get.
Returns:
null if the ID is not in the main dictionary.
public TermCache getTermCache()
Returns:
null if there is none.
public QueryEntry getTerm(java.lang.String name, boolean caseSensitive)
Parameters:
name - The name of the term.
caseSensitive - If true then the term should be looked up in the case that it is given.
public QueryEntry getTerm(java.lang.String name, boolean caseSensitive, DiskDictionary.LookupState lus)
Parameters:
name - The name of the term.
caseSensitive - If true then the term should be looked up in the case that it is given.
lus - a lookup state to use for the dictionary lookup
public boolean isIndexed(java.lang.String key)
Parameters:
key - the key for the document that we want to check
Returns:
true if this key occurs in this partition and the document has not been deleted.
public boolean deleteDocument(int docID)
Parameters:
docID - The ID of the document to delete.
public boolean deleteDocument(java.lang.String key)
Parameters:
key - The document key to be deleted.
public boolean isDeleted(int docID)
Parameters:
docID - the ID of the document that we want to check
Returns:
true if the document has been deleted, false otherwise.
protected boolean updatePartition(java.util.Set<java.lang.Object> keys)
Parameters:
keys - a set of keys to delete. The string representation of the elements of the set will be the keys to delete.
Returns:
true if any documents were deleted, false otherwise.
public int getNDocs()
Overrides:
getNDocs in class Partition
public int getMaxDocumentID()
public int getMaxTermID()
public int getDocumentLength(int docID)
Parameters:
docID - the ID of the document for which we want the length
for a way to get the length of the
vector associated with this document
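The distinction between a document's length in words and the length of its vector can be made concrete with a small sketch. It rests on a loud assumption: the vector length is taken here as the Euclidean norm of raw term counts, whereas the weighting the index really uses is not described on this page. LengthSketch and both method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; not part of Minion.
final class LengthSketch {

    // Document length in words: simply the number of tokens.
    static int documentLength(String[] tokens) {
        return tokens.length;
    }

    // Vector length: Euclidean norm of the per-term weights.
    // ASSUMPTION: the weight of a term is its raw count in the document.
    static double vectorLength(String[] tokens) {
        Map<String, Integer> counts = new HashMap<>();
        for (String t : tokens) {
            counts.merge(t, 1, Integer::sum);
        }
        double sumOfSquares = 0.0;
        for (int c : counts.values()) {
            sumOfSquares += (double) c * c;
        }
        return Math.sqrt(sumOfSquares);
    }

    public static void main(String[] args) {
        String[] tokens = {"to", "be", "or", "not", "to", "be"};
        System.out.println(documentLength(tokens)); // prints 6
        System.out.println(vectorLength(tokens));   // sqrt(2*2 + 2*2 + 1 + 1) = sqrt(10)
    }
}
```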
public float getAverageDocumentLength()
public long getNTokens()
public int getNEntries()
public ReadableBuffer getDeletedDocumentsMap()
public DelMap getDelMap()
protected int[] getDocIDMap(ReadableBuffer del)
Parameters:
del - a buffer of deleted documents
Returns:
An int array containing the mapping, where deleted documents map to < 0, or null if there are no deleted documents. The 0th element of the returned array contains the number of undeleted documents.
See Also:
DelMap.getDelMap()
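The shape of the returned array can be illustrated with a minimal sketch. It assumes document IDs start at 1 (which leaves element 0 free for the count) and uses a java.util.BitSet in place of the ReadableBuffer of deleted documents. DocIdMapSketch and docIdMap are hypothetical names, and the real method may differ in detail.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical illustration of the documented array layout; not part of Minion.
final class DocIdMapSketch {

    // deleted: bit i set means document i is deleted (ASSUMPTION: IDs are 1-based).
    // Returns null when nothing is deleted, as the documentation describes.
    static int[] docIdMap(BitSet deleted, int maxDocID) {
        if (deleted == null || deleted.isEmpty()) {
            return null;
        }
        int[] map = new int[maxDocID + 1];
        int next = 0;
        for (int id = 1; id <= maxDocID; id++) {
            // Deleted documents map to a negative value; survivors are renumbered densely.
            map[id] = deleted.get(id) ? -1 : ++next;
        }
        map[0] = next; // element 0 holds the number of undeleted documents
        return map;
    }

    public static void main(String[] args) {
        BitSet deleted = new BitSet();
        deleted.set(2);
        deleted.set(4);
        System.out.println(Arrays.toString(docIdMap(deleted, 5))); // prints [3, 1, -1, 2, -1, 3]
    }
}
```

A map of this shape lets a merge renumber surviving documents contiguously while skipping deleted ones.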
public java.util.Iterator getMainIterator()
protected java.nio.ByteBuffer[] getInputBuffers(int size)
Parameters:
size - The size of the input buffers to use.
public DiskDictionary getMainDictionary()
public DiskPartition merge(java.util.List<DiskPartition> partitions, java.util.List<DelMap> delMaps, boolean calculateDVL) throws java.lang.Exception
Merges a number of DiskPartitions into a single partition.
Parameters:
partitions - the partitions to merge
delMaps - the state of the deletion maps for the partitions to merge before the merge started. We need these to be the same as the ones at the place where the merge was called for (see PartitionManager.Merger), otherwise we might get some skew in the maps between when they are recorded there and recorded here!
calculateDVL - if true, then calculate the document vector lengths for the documents in the merged partition after the merge is finished.
Throws:
java.lang.Exception - If there is any error during the merge.
public DiskPartition merge(java.util.List<DiskPartition> partitions, java.util.List<DelMap> delMaps, boolean calculateDVL, int depth) throws java.lang.Exception
Merges a number of DiskPartitions into a single partition.
Parameters:
partitions - the partitions to merge
delMaps - the state of the deletion maps for the partitions to merge before the merge started. We need these to be the same as the ones at the place where the merge was called for (see PartitionManager.Merger), otherwise we might get some skew in the maps between when they are recorded there and recorded here!
calculateDVL - if true, then calculate the document vector lengths for the documents in the merged partition after the merge is finished.
Throws:
java.lang.Exception - If there is any error during the merge.
protected void mergeCustom(int newPartNumber, DiskPartition[] sortedParts, int[][] idMaps, int newMaxDocID, int[] docIDStart, int[] nUndel, int[][] docIDMaps) throws java.lang.Exception
Parameters:
newPartNumber - the number of the new partition
sortedParts - the sorted list of partitions
idMaps - a set of maps from old entry ids in the main dictionary to new entry ids in the merged dictionary
newMaxDocID - the new maximum document id
docIDStart - the starting doc ids
nUndel - the number of undeleted documents in each partition
docIDMaps - doc id maps (see merge)
Throws:
java.lang.Exception
public boolean docsAreMerged()
public java.lang.String toString()
Overrides:
toString in class java.lang.Object
public void setCloseTime(long closeTime)
Specified by:
setCloseTime in interface Closeable
public long getCloseTime()
Specified by:
getCloseTime in interface Closeable
public void createRemoveFile()
Specified by:
createRemoveFile in interface Closeable