java.lang.Object
com.sun.labs.minion.indexer.postings.IDPostings
com.sun.labs.minion.indexer.postings.IDFreqPostings
com.sun.labs.minion.indexer.postings.DocumentVectorPostings
public class DocumentVectorPostings
A class to hold postings for the document vectors. For these postings, the IDs that we store are the IDs of the terms that occurred in the document. Along with the IDs, we store the frequency of occurrence of each term.
During indexing, we will encounter term IDs in a (seemingly) random
order, so in this case we store the IDs and frequencies in an array of
integers. At dump time, we use the remap method to remap
the IDs in the postings to the renumbered IDs from the main dictionary
and we actually encode the data onto the buffer.
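The remapping step described above can be sketched as follows. This is a minimal illustration, not the Minion implementation: the class `RemapSketch`, its `remap` signature, and the parallel `ids`/`freqs` arrays are hypothetical stand-ins, and we assume the ID map is indexed by old ID and that the postings should come out ordered by new ID before encoding.

```java
import java.util.Arrays;

/** Hypothetical stand-in for unsorted (ID, frequency) postings; not Minion code. */
final class RemapSketch {

    /**
     * Translates each old term ID through idMap (index = old ID, value = new ID)
     * and returns {newID, freq} rows sorted by new ID.
     */
    static int[][] remap(int[] ids, int[] freqs, int[] idMap) {
        int[][] pairs = new int[ids.length][];
        for (int i = 0; i < ids.length; i++) {
            pairs[i] = new int[] { idMap[ids[i]], freqs[i] };
        }
        // IDs arrive in arbitrary order during indexing, so sort after remapping.
        Arrays.sort(pairs, (a, b) -> Integer.compare(a[0], b[0]));
        return pairs;
    }

    public static void main(String[] args) {
        int[] ids = { 3, 1, 2 };          // term IDs in the order they were seen
        int[] freqs = { 5, 2, 7 };        // frequency of each term
        int[] idMap = { 0, 30, 10, 20 };  // old ID -> renumbered ID
        for (int[] p : remap(ids, freqs, idMap)) {
            System.out.println(p[0] + " " + p[1]);
        }
    }
}
```

Keeping each frequency paired with its ID during the sort is the point of the sketch: once the IDs are renumbered, the original arrival order no longer matters.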
In addition to the usual postings functionality, these postings calculate document vector lengths at postings dump and merge time, so that the lengths are readily available from document dictionary entries.
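As a rough illustration of what a document vector length is, the sketch below computes the Euclidean norm of a vector of term weights. It rests on two assumptions that this page does not confirm: that the length is a Euclidean norm, and that the raw term frequency can stand in for the weight (the real weights would presumably come from a weighting function). The class `VectorLengthSketch` is hypothetical.

```java
/** Hypothetical sketch of a document vector length; not Minion code. */
final class VectorLengthSketch {

    /**
     * Euclidean length of a document vector whose per-term weights are
     * taken to be the raw term frequencies (an assumption for illustration).
     */
    static double length(int[] freqs) {
        double sumOfSquares = 0;
        for (int f : freqs) {
            sumOfSquares += (double) f * f;
        }
        return Math.sqrt(sumOfSquares);
    }

    public static void main(String[] args) {
        // Two terms occurring 3 and 4 times.
        System.out.println(length(new int[] { 3, 4 }));
    }
}
```

Computing this once, when the postings are dumped or merged, means the value can be stored and read back later instead of being recomputed from the postings.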
| Nested Class Summary |
|---|
| Nested classes/interfaces inherited from class com.sun.labs.minion.indexer.postings.IDFreqPostings |
|---|
| IDFreqPostings.IDFreqIterator |
| Nested classes/interfaces inherited from class com.sun.labs.minion.indexer.postings.IDPostings |
|---|
| IDPostings.IDIterator |
| Field Summary | |
|---|---|
| protected java.util.Map<java.lang.Object,com.sun.labs.minion.indexer.postings.DocumentVectorPostings.EntryFreq> | entries Storage for the entries making up this set of postings. |
| protected static java.lang.String | logTag |
| Fields inherited from class com.sun.labs.minion.indexer.postings.IDFreqPostings |
|---|
| freq, freqs, maxfdt, to |
| Fields inherited from class com.sun.labs.minion.indexer.postings.IDPostings |
|---|
| curr, dataStart, ids, lastID, nIDs, nSkips, post, prevID, skipID, skipPos, skipSize |
| Constructor Summary | |
|---|---|
| DocumentVectorPostings() | Creates a set of postings suitable for use during indexing. |
| DocumentVectorPostings(ReadableBuffer b) | Creates a set of postings suitable for use during querying. |
| Method Summary | |
|---|---|
| void | add(Occurrence o) Adds an occurrence to the postings. |
| void | finish() Finishes off the encoding, which does nothing in this case. |
| WeightedFeature[] | getWeightedFeatures(int docID, int fieldID, Dictionary dict, WeightingFunction wf, WeightingComponents wc) Gets the entries in this set of postings as an array of weighted features. |
| void | merge(MergeablePostings mp, int[] map) Merges another set of postings with this set of postings. |
| void | remap(int[] idMap) Remaps the IDs in the postings, using the provided ID map. |
| int | size() Estimates the size of the postings associated with this document. |
| Methods inherited from class com.sun.labs.minion.indexer.postings.IDFreqPostings |
|---|
| encode, getMaxFDT, getTotalOccurrences, iterator, recodeID |
| Methods inherited from class com.sun.labs.minion.indexer.postings.IDPostings |
|---|
| addSkip, append, append, getBuffers, getLastID, getN, setSkipSize, skip |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
protected java.util.Map<java.lang.Object,com.sun.labs.minion.indexer.postings.DocumentVectorPostings.EntryFreq> entries
protected static java.lang.String logTag
| Constructor Detail |
|---|
public DocumentVectorPostings()
public DocumentVectorPostings(ReadableBuffer b)
b - a buffer containing the encoded postings.
| Method Detail |
|---|
public void add(Occurrence o)
add in interface Postings
add in class IDFreqPostings
o - the occurrence to add.
public void merge(MergeablePostings mp,
int[] map)
MergeablePostings
merge in interface MergeablePostings
merge in class IDFreqPostings
mp - the postings to merge into these postings.
map - a map from IDs in the postings to IDs in the merged space.
public int size()
size in interface Postings
size in class IDPostings
public void finish()
finish in interface Postings
finish in class IDFreqPostings
public void remap(int[] idMap)
remap in interface Postings
remap in class IDPostings
idMap - a map from old IDs to new IDs.
public WeightedFeature[] getWeightedFeatures(int docID,
int fieldID,
Dictionary dict,
WeightingFunction wf,
WeightingComponents wc)
docID - the id of this document, if it is in an already dumped partition.
fieldID - the id of the field from which the postings were drawn.
dict - a dictionary that we can use to fetch term names when all we have is IDs.
wf - a weighting function to use to weight the entries in the document vector.
wc - a set of weighting components to use in the weighting function.
The getEntry method for these features will return the dictionary entry for the term from the partition holding the document. This is a convenience to avoid multiple dictionary lookups in this partition.
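To make the idea of turning (ID, frequency) postings into named, weighted features concrete, here is a minimal self-contained sketch. It does not use the Minion API: `WeightedFeaturesSketch`, its nested `Feature` class, the `String[]` used in place of a dictionary, and the log-scaled weighting are all hypothetical stand-ins chosen for illustration.

```java
import java.util.function.IntToDoubleFunction;

/** Hypothetical sketch of building weighted features from postings; not Minion code. */
final class WeightedFeaturesSketch {

    /** Stand-in for a weighted feature: a term name and its weight. */
    static final class Feature {
        final String name;
        final double weight;

        Feature(String name, double weight) {
            this.name = name;
            this.weight = weight;
        }
    }

    /**
     * Looks up each term ID in names (a stand-in for a dictionary) and
     * weights its frequency with the supplied function.
     */
    static Feature[] toFeatures(int[] ids, int[] freqs, String[] names,
                                IntToDoubleFunction weigh) {
        Feature[] out = new Feature[ids.length];
        for (int i = 0; i < ids.length; i++) {
            out[i] = new Feature(names[ids[i]], weigh.applyAsDouble(freqs[i]));
        }
        return out;
    }

    public static void main(String[] args) {
        String[] names = { null, "cat", "dog" };  // index = term ID
        // Log-scaled term frequency, chosen only as an example weighting.
        Feature[] fs = toFeatures(new int[] { 1, 2 }, new int[] { 1, 10 },
                                  names, f -> 1 + Math.log(f));
        for (Feature f : fs) {
            System.out.println(f.name + " " + f.weight);
        }
    }
}
```

Passing the weighting in as a function keeps the traversal of the postings separate from the choice of weighting, which is the same separation the wf and wc parameters suggest.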