com.sun.labs.minion.indexer.entry
Class DocKeyEntry

java.lang.Object
  extended by com.sun.labs.minion.indexer.entry.BaseEntry
      extended by com.sun.labs.minion.indexer.entry.SinglePostingsEntry
          extended by com.sun.labs.minion.indexer.entry.DocKeyEntry
All Implemented Interfaces:
Entry, IndexEntry, MergeableEntry, QueryEntry, java.lang.Comparable
Direct Known Subclasses:
ClusterEntry, FieldedDocKeyEntry

public class DocKeyEntry
extends SinglePostingsEntry
implements MergeableEntry

A class for holding entries in the document dictionary. Such entries have ID and frequency postings, and they encode their IDs into the dictionary because that information cannot be recovered any other way.


Field Summary
protected  int docLen
          The length of the document, in words.
protected static java.lang.String logTag
           
protected  int origID
          The original document ID before remapping.
protected  DocKeyEntry prevEntry
          The entry previously appended onto this one, used to detect a duplicate key during a merge.
 
Fields inherited from class com.sun.labs.minion.indexer.entry.SinglePostingsEntry
n, offset, p, size, tsize
 
Fields inherited from class com.sun.labs.minion.indexer.entry.BaseEntry
dict, id, name, postIn
 
Constructor Summary
DocKeyEntry()
           
DocKeyEntry(java.lang.Object name)
           
 
Method Summary
 void append(QueryEntry qe, int start, int[] idMap)
          Appends the given entry onto this one, checking for a duplicate key, which should not occur.
 void decodePostingsInfo(ReadableBuffer b, int pos)
          Decodes the postings information associated with this entry.
 void encodePostingsInfo(WriteableBuffer b)
          Encodes any information associated with the postings onto the given buffer.
 int getDocumentLength()
          Gets the document length in words.
 float getDocumentVectorLength()
          Gets the length of the vector associated with this document.
 float getDocumentVectorLength(int fieldID)
           
 float getDocumentVectorLength(java.lang.String field)
           
 Entry getEntry()
          Gets a new entry that contains a copy of the data in this entry.
 Entry getEntry(java.lang.Object name)
          Gets a new entry with the given name.
 int getOrigID()
           
 Postings getPostings()
          Gets the appropriate postings type for the class.
protected  Postings getPostings(ReadableBuffer input)
          Gets a set of postings useful at query time.
 long getTotalOccurrences()
          Returns the total number of occurrences, which is the same as the document length.
 WeightedFeature[] getWeightedFeatures(WeightingFunction wf, WeightingComponents wc)
          Gets an array of weighted features associated with this document key.
 void merge(QueryEntry qe, int[] map)
          Merges the entries in the postings underlying the other document key with the entries in the postings for this key.
 boolean writePostings(PostingsOutput[] out, int[] idMap)
          Writes the postings associated with this entry to some or all of the given channels.
 
Methods inherited from class com.sun.labs.minion.indexer.entry.SinglePostingsEntry
add, copyData, getMaxFDT, getN, getNumChannels, hasFieldInformation, hasPositionInformation, iterator, readPostings
 
Methods inherited from class com.sun.labs.minion.indexer.entry.BaseEntry
compareTo, getID, getName, getPartition, setDictionary, setID, setName, setPostingsInput, toString
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface com.sun.labs.minion.indexer.entry.IndexEntry
add, setID, setName
 
Methods inherited from interface com.sun.labs.minion.indexer.entry.Entry
getID, getMaxFDT, getN, getName, getNumChannels, getPartition, setDictionary
 
Methods inherited from interface java.lang.Comparable
compareTo
 

Field Detail

docLen

protected int docLen
The length of the document, in words.


prevEntry

protected DocKeyEntry prevEntry
The entry previously appended onto this one, used to detect a duplicate key during a merge.


origID

protected int origID
The original document ID before remapping.


logTag

protected static java.lang.String logTag
Constructor Detail

DocKeyEntry

public DocKeyEntry()

DocKeyEntry

public DocKeyEntry(java.lang.Object name)
Method Detail

getEntry

public Entry getEntry(java.lang.Object name)
Description copied from interface: Entry
Gets a new entry with the given name.

Specified by:
getEntry in interface Entry
Parameters:
name - the name that we want to give the entry.
Returns:
a new entry.

getEntry

public Entry getEntry()
Gets a new entry that contains a copy of the data in this entry.

Specified by:
getEntry in interface Entry
Overrides:
getEntry in class SinglePostingsEntry
Returns:
a new entry containing a copy of the data in this entry.
Throws:
java.lang.ClassCastException - if the provided entry is not of type SinglePostingsEntry

getOrigID

public int getOrigID()

append

public void append(QueryEntry qe,
                   int start,
                   int[] idMap)
Appends the given entry onto this one, checking for a duplicate key, which should not occur.

Specified by:
append in interface IndexEntry
Overrides:
append in class SinglePostingsEntry
Parameters:
qe - The entry that we want to append onto this one.
start - The new starting ID for the partition that the entry was drawn from.
idMap - A map from old IDs in the given postings to new IDs with gaps removed for deleted data. If this is null, then there are no deleted documents.
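
The roles of start and idMap can be illustrated with a minimal sketch. It does not use Minion's classes, and it rests on assumptions that are not confirmed by this page: IDs are 1-based, idMap is indexed by old ID, and a negative idMap value marks a deleted document. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class AppendRemapSketch {
    /**
     * Remaps IDs drawn from a source partition into the ID space of the
     * partition being built.
     * Assumptions (not taken from Minion): IDs are 1-based, idMap is indexed
     * by old ID, and a negative idMap value marks a deleted document.
     */
    static int[] remap(int[] oldIds, int start, int[] idMap) {
        List<Integer> out = new ArrayList<>();
        for (int old : oldIds) {
            // A null map means there are no deleted documents, so IDs are kept as they are.
            int local = (idMap == null) ? old : idMap[old];
            if (local < 0) {
                continue; // deleted document: leaves no gap in the new ID space
            }
            // Shift into the new partition: local ID 1 becomes start.
            out.add(start + local - 1);
        }
        return out.stream().mapToInt(Integer::intValue).toArray();
    }
}
```

Under these assumptions, remapping the IDs 1, 2, 3 with start 10 and a null map gives 10, 11, 12. With the map {0, 1, -1, 2}, ID 2 is dropped and the result is 10, 11.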

merge

public void merge(QueryEntry qe,
                  int[] map)
Merges the entries in the postings underlying the other document key with the entries in the postings for this key. During indexing, we may want to merge the contents of two document key entries (for example, when dumping feature clusters during classification), so we need to be able to get the list of entries in the underlying postings.

Specified by:
merge in interface MergeableEntry
Parameters:
qe - The entry that we want to merge into this one.
map - A map from old IDs in the given postings to new IDs with gaps removed for deleted data. If this is null, then there are no deleted documents.
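
One way to picture such a merge is as a union of two ID-to-frequency tables. The sketch below is only an illustration. It assumes that frequencies for an ID present in both postings are summed and that a negative map value marks deleted data. Neither assumption is stated on this page, and the names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

class MergePostingsSketch {
    /**
     * Merges two ID -> frequency tables into one table ordered by ID.
     * Assumptions (not taken from Minion): frequencies of shared IDs are
     * summed, map is indexed by old ID, and a negative value means deleted.
     */
    static TreeMap<Integer, Integer> merge(Map<Integer, Integer> mine,
                                           Map<Integer, Integer> other,
                                           int[] map) {
        TreeMap<Integer, Integer> out = new TreeMap<>(mine);
        for (Map.Entry<Integer, Integer> e : other.entrySet()) {
            // A null map means the other entry's IDs are used unchanged.
            int id = (map == null) ? e.getKey() : map[e.getKey()];
            if (id < 0) {
                continue; // skip data that has been deleted
            }
            out.merge(id, e.getValue(), Integer::sum);
        }
        return out;
    }
}
```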

getWeightedFeatures

public WeightedFeature[] getWeightedFeatures(WeightingFunction wf,
                                             WeightingComponents wc)
Gets an array of weighted features associated with this document key. This can be used to generate a document vector for a document that was indexed but not dumped to disk.

Parameters:
wf - a weighting function to use to get the weight for the entries in the document vector
wc - a set of weighting components to use with the weighting function.
See Also:
SearchEngineImpl.getDocumentVector(Document,String)
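
As a rough illustration of how a weighting function turns term frequencies into weighted features, the following sketch uses a plain TF-IDF formula. The Weighting interface is a hypothetical stand-in. The real signatures of WeightingFunction and WeightingComponents are not shown on this page, and the function Minion actually applies may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class WeightedFeaturesSketch {
    /** Hypothetical stand-in for a weighting function (not Minion's interface). */
    interface Weighting {
        double weight(int termFreq, int docFreq, int nDocs);
    }

    /** Plain TF-IDF, chosen only for illustration: tf * ln(N / df). */
    static final Weighting TF_IDF =
            (tf, df, nDocs) -> tf * Math.log((double) nDocs / df);

    /**
     * Builds a term -> weight table for one document from its term
     * frequencies and collection-wide document frequencies.
     */
    static Map<String, Double> weigh(Map<String, Integer> termFreqs,
                                     Map<String, Integer> docFreqs,
                                     int nDocs,
                                     Weighting wf) {
        Map<String, Double> out = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : termFreqs.entrySet()) {
            // Terms with no recorded document frequency are treated as df = 1.
            int df = docFreqs.getOrDefault(e.getKey(), 1);
            out.put(e.getKey(), wf.weight(e.getValue(), df, nDocs));
        }
        return out;
    }
}
```

With this formula, a term that appears in every document of the collection receives a weight of zero, since ln(N / N) = 0.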

getPostings

public Postings getPostings()
Gets the appropriate postings type for the class. These postings should be usable for indexing.

Specified by:
getPostings in class SinglePostingsEntry
Returns:
A set of ID and frequency postings.

getPostings

protected Postings getPostings(ReadableBuffer input)
Gets a set of postings useful at query time.

Specified by:
getPostings in class SinglePostingsEntry
Parameters:
input - The buffer containing the postings read from the postings file.
Returns:
A set of ID and frequency postings.

writePostings

public boolean writePostings(PostingsOutput[] out,
                             int[] idMap)
                      throws java.io.IOException
Writes the postings associated with this entry to some or all of the given channels.

Specified by:
writePostings in interface IndexEntry
Overrides:
writePostings in class SinglePostingsEntry
Parameters:
out - The outputs to which we will write the postings.
idMap - A map from the IDs currently used in the postings to the IDs that should be used when the postings are written to disk. This may be null, in which case no remapping will occur.
Returns:
true if postings were written, false otherwise
Throws:
java.io.IOException - if there is any error writing the postings.

encodePostingsInfo

public void encodePostingsInfo(WriteableBuffer b)
Encodes any information associated with the postings onto the given buffer. We override the parent's method because we need to encode the ID for our term.

Specified by:
encodePostingsInfo in interface IndexEntry
Overrides:
encodePostingsInfo in class SinglePostingsEntry
Parameters:
b - The buffer onto which the postings information should be encoded. The buffer will be positioned to the correct spot for the encoding.
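
The idea of storing the ID alongside the postings information, and reading it back from a known position, can be sketched with java.nio.ByteBuffer standing in for WriteableBuffer and ReadableBuffer. The field layout here (id, n, size, offset as fixed-width values) is an assumption made for illustration. Minion's actual encoding is not described on this page and may well differ.

```java
import java.nio.ByteBuffer;

class PostingsInfoSketch {
    int id;       // entry ID, stored because it cannot be recovered otherwise
    int n;        // number of postings
    int size;     // size of the postings in bytes
    long offset;  // offset of the postings in the postings file

    /** Writes the fields at the buffer's current position (hypothetical layout). */
    void encode(ByteBuffer b) {
        b.putInt(id);
        b.putInt(n);
        b.putInt(size);
        b.putLong(offset);
    }

    /** Reads the fields back from an absolute position, leaving the buffer's position untouched. */
    static PostingsInfoSketch decode(ByteBuffer b, int pos) {
        PostingsInfoSketch e = new PostingsInfoSketch();
        e.id = b.getInt(pos);
        e.n = b.getInt(pos + 4);
        e.size = b.getInt(pos + 8);
        e.offset = b.getLong(pos + 12);
        return e;
    }
}
```

Decoding by absolute position mirrors the pos parameter of decodePostingsInfo: the caller says where the information lives rather than relying on the buffer's current position.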

decodePostingsInfo

public void decodePostingsInfo(ReadableBuffer b,
                               int pos)
Decodes the postings information associated with this entry.

Specified by:
decodePostingsInfo in interface QueryEntry
Overrides:
decodePostingsInfo in class SinglePostingsEntry
Parameters:
b - The buffer containing the encoded postings information.
pos - The position in b where the postings information can be found.

getTotalOccurrences

public long getTotalOccurrences()
Returns the total number of occurrences, which is the same as the document length.

Specified by:
getTotalOccurrences in interface Entry
Overrides:
getTotalOccurrences in class SinglePostingsEntry
Returns:
The total number of occurrences associated with this entry.

getDocumentLength

public int getDocumentLength()
Gets the document length in words.


getDocumentVectorLength

public float getDocumentVectorLength()
Gets the length of the vector associated with this document.
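
If "length" here means the Euclidean norm of the document's weighted term vector, which is the usual meaning in vector-space retrieval but is an assumption about this method, the computation can be sketched as follows. The class name is hypothetical.

```java
class VectorLengthSketch {
    /**
     * Euclidean length of a vector of term weights: sqrt(sum of w_i^2).
     * Accumulates in double to limit rounding error before narrowing to float.
     */
    static float length(float[] weights) {
        double sumOfSquares = 0.0;
        for (float w : weights) {
            sumOfSquares += (double) w * w;
        }
        return (float) Math.sqrt(sumOfSquares);
    }
}
```

For example, a document whose only two term weights are 3 and 4 has a vector length of 5.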


getDocumentVectorLength

public float getDocumentVectorLength(java.lang.String field)

getDocumentVectorLength

public float getDocumentVectorLength(int fieldID)