com.sun.labs.minion.indexer.postings
Class FieldedDocumentVectorPostings

java.lang.Object
  extended by com.sun.labs.minion.indexer.postings.FieldedDocumentVectorPostings
All Implemented Interfaces:
MergeablePostings, Postings

public class FieldedDocumentVectorPostings
extends java.lang.Object
implements Postings, MergeablePostings

A set of fielded document vector postings. One set of document vector postings is stored for each field that is being vectorized.


Field Summary
static java.lang.String logTag
           
 
Constructor Summary
FieldedDocumentVectorPostings()
          Creates a new instance of FieldedDocumentVectorPostings.
FieldedDocumentVectorPostings(ReadableBuffer b)
          Creates a set of fielded document vector postings from a buffer.
 
Method Summary
 void add(Occurrence o)
          Adds an occurrence to the postings list.
 void append(Postings p, int start)
          Appends another set of postings to this one.
 void append(Postings p, int start, int[] idMap)
          Appends another set of postings to this one, removing any data associated with deleted documents.
 void finish()
          Finishes any ongoing encoding and prepares for the data to be dumped.
 WriteableBuffer[] getBuffers()
          Gets the buffers for these postings, which includes all of the buffers for the fields as well as the buffer for the complete document and a set of offsets into the buffers.
 int getLastID()
          Gets the last ID in the postings list.
 int getMaxFDT()
          Gets the maximum frequency in the postings associated with this entry.
 int getN()
          Gets the number of IDs in the postings list.
 long getTotalOccurrences()
          Gets the total number of occurrences associated with this set of postings.
 WeightedFeature[] getWeightedFeatures(int field, int docID, Dictionary dict, WeightingFunction wf, WeightingComponents wc)
          Gets the entries for a particular field in this set of postings as an array of weighted features.
 PostingsIterator iterator(PostingsIteratorFeatures features)
          Gets an iterator for a set of fielded postings.
 void merge(MergeablePostings mp, int[] map)
          Merges another set of postings with this set of postings.
 void remap(int[] idMap)
          Remaps the IDs in this postings list according to the given old-to-new ID map.
 void setSkipSize(int size)
          Sets the skip size used for building the skip table.
 int size()
          Gets the size of the postings, in bytes.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

logTag

public static final java.lang.String logTag
See Also:
Constant Field Values
Constructor Detail

FieldedDocumentVectorPostings

public FieldedDocumentVectorPostings()
Creates a new instance of FieldedDocumentVectorPostings.


FieldedDocumentVectorPostings

public FieldedDocumentVectorPostings(ReadableBuffer b)
Creates a set of fielded document vector postings from a buffer.

Method Detail

setSkipSize

public void setSkipSize(int size)
Description copied from interface: Postings
Sets the skip size used for building the skip table. A larger number will result in more IDs being encoded per skip.

Specified by:
setSkipSize in interface Postings
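
The effect of the skip size can be illustrated with a minimal sketch. It does not reproduce Minion's actual skip-table encoding; the class SkipTableSketch and the method skipPositions are hypothetical names introduced only for illustration. The sketch assumes one skip entry is recorded for every skipSize IDs, so a larger skip size yields fewer, coarser entries.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not Minion's actual skip-table encoding.
final class SkipTableSketch {

    // Records one skip entry (a position in the sorted ID array) for every
    // skipSize IDs. A larger skipSize covers more IDs per skip entry.
    static List<Integer> skipPositions(int[] sortedIds, int skipSize) {
        if (skipSize <= 0) {
            throw new IllegalArgumentException("skipSize must be positive");
        }
        List<Integer> positions = new ArrayList<>();
        for (int i = skipSize; i < sortedIds.length; i += skipSize) {
            positions.add(i);
        }
        return positions;
    }
}
```

With ten IDs, a skip size of 4 produces skip entries at positions 4 and 8, while a skip size of 8 produces a single entry at position 8.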

add

public void add(Occurrence o)
Description copied from interface: Postings
Adds an occurrence to the postings list.

Specified by:
add in interface Postings
Parameters:
o - The occurrence.

getN

public int getN()
Description copied from interface: Postings
Gets the number of IDs in the postings list.

Specified by:
getN in interface Postings

getLastID

public int getLastID()
Description copied from interface: Postings
Gets the last ID in the postings list.

Specified by:
getLastID in interface Postings

getTotalOccurrences

public long getTotalOccurrences()
Description copied from interface: Postings
Gets the total number of occurrences associated with this set of postings. This is useful when a single postings entry may comprise multiple occurrences.

Specified by:
getTotalOccurrences in interface Postings
Returns:
The total number of occurrences associated with these postings.

getMaxFDT

public int getMaxFDT()
Description copied from interface: Postings
Gets the maximum frequency in the postings associated with this entry.

Specified by:
getMaxFDT in interface Postings
Returns:
the maximum frequency across all of the postings stored in this postings list.

finish

public void finish()
Description copied from interface: Postings
Finishes any ongoing encoding and prepares for the data to be dumped.

Specified by:
finish in interface Postings

size

public int size()
Description copied from interface: Postings
Gets the size of the postings, in bytes.

Specified by:
size in interface Postings

getBuffers

public WriteableBuffer[] getBuffers()
Gets the buffers for these postings, which includes all of the buffers for the fields as well as the buffer for the complete document and a set of offsets into the buffers.

Specified by:
getBuffers in interface Postings
Returns:
An array of Buffers containing the postings data. All of the data in these buffers must be written to the postings file!

remap

public void remap(int[] idMap)
Description copied from interface: Postings
Remaps the IDs in this postings list according to the given old-to-new ID map.

Specified by:
remap in interface Postings
Parameters:
idMap - A map from the IDs currently in use in the postings to new IDs.

merge

public void merge(MergeablePostings mp,
                  int[] map)
Description copied from interface: MergeablePostings
Merges another set of postings with this set of postings.

Specified by:
merge in interface MergeablePostings
Parameters:
mp - the postings to merge into these postings.
map - a map from IDs in the postings to IDs in the merged space.

append

public void append(Postings p,
                   int start)
Description copied from interface: Postings
Appends another set of postings to this one.

Specified by:
append in interface Postings
Parameters:
p - The postings to append. Implementers can safely assume that the postings being passed in are of the same class as the implementing class.
start - The new starting document ID for the partition that the entry was drawn from.

append

public void append(Postings p,
                   int start,
                   int[] idMap)
Description copied from interface: Postings
Appends another set of postings to this one, removing any data associated with deleted documents.

Specified by:
append in interface Postings
Parameters:
p - The postings to append. Implementers can safely assume that the postings being passed in are of the same class as the implementing class.
start - The new starting document ID for the partition that the entry was drawn from.
idMap - A map from old IDs in the given postings to new IDs with gaps removed for deleted data. If this is null, then there are no deleted documents.
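
The roles of start and idMap can be sketched as follows. This is an illustration under stated assumptions, not Minion's implementation: it assumes document IDs are 1-based within a partition, that idMap is indexed by old ID, and that a deleted document is marked with a negative entry. The class AppendSketch and its plain list of IDs are hypothetical stand-ins for the real postings data.

```java
import java.util.List;

// Hypothetical illustration only; the real class works on encoded buffers.
final class AppendSketch {

    // Assumptions (not taken from Minion's source):
    //  - IDs are 1-based within the source partition;
    //  - idMap[oldId] is the gap-free new ID, or negative if deleted;
    //  - start is the ID that local ID 1 receives in the target postings.
    static void append(List<Integer> target, int[] ids, int start, int[] idMap) {
        for (int oldId : ids) {
            int local = oldId;
            if (idMap != null) {
                local = idMap[oldId];
                if (local < 0) {
                    continue; // drop data for a deleted document
                }
            }
            target.add(local + start - 1);
        }
    }
}
```

Appending IDs 1, 2, 3 with start 4 and a null idMap adds 4, 5, 6. With idMap {0, 1, -1, 2} (index 0 unused, document 2 deleted), the same call adds only 4 and 5.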

getWeightedFeatures

public WeightedFeature[] getWeightedFeatures(int field,
                                             int docID,
                                             Dictionary dict,
                                             WeightingFunction wf,
                                             WeightingComponents wc)
Gets the entries for a particular field in this set of postings as an array of weighted features.

Parameters:
field - the ID of the field for which we want the entries. If this is -1, then we want the vector for the full document.
docID - the ID of this document, if it is in an already dumped partition.
dict - a dictionary that we can use to fetch term names when all we have is IDs.
wf - a weighting function to use to weight the entries in the document vector.
wc - a set of weighting components to use in the weighting function.
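
The field selection described above, where -1 selects the vector for the full document, can be sketched with plain Java collections. Everything in this sketch is an assumption made for illustration: WeightedFeaturesSketch, the map-based storage, and the use of IntToDoubleFunction in place of WeightingFunction and WeightingComponents are hypothetical and do not reflect the Minion types.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.IntToDoubleFunction;

// Hypothetical illustration only; not the Minion data structures.
final class WeightedFeaturesSketch {

    // Picks the term counts for one field, or for the full document when
    // field is -1, and applies a weighting function to each raw count.
    static Map<String, Double> weigh(Map<Integer, Map<String, Integer>> perField,
                                     Map<String, Integer> fullDocument,
                                     int field,
                                     IntToDoubleFunction weight) {
        Map<String, Integer> counts =
                (field == -1) ? fullDocument : perField.get(field);
        Map<String, Double> weighted = new TreeMap<>();
        if (counts == null) {
            return weighted; // no vector stored for this field
        }
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            weighted.put(e.getKey(), weight.applyAsDouble(e.getValue()));
        }
        return weighted;
    }
}
```

Calling weigh with field 0 returns weights for that field's terms only, while -1 returns weights for every term in the document.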

iterator

public PostingsIterator iterator(PostingsIteratorFeatures features)
Gets an iterator for a set of fielded postings.

Specified by:
iterator in interface Postings
Parameters:
features - the features for the iterator that we will return. The field for which we want postings will be specified in the fields element of the features. If multiple fields are specified, we will return postings for the first field (by field ID) that we have postings for. If the features are null or there are no fields specified, then postings for all fields will be returned.
Returns:
a postings iterator for the postings for the field specified in the features.
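
The field-selection rule described for the features parameter can be sketched as below. This is a hedged illustration, not the class's code: FieldSelectionSketch, the ALL_FIELDS and NO_MATCH sentinels, and the modelling of the requested fields as an array of field IDs are all assumptions. In particular, the documentation does not say what happens when none of the requested fields has postings, so NO_MATCH is purely a placeholder.

```java
import java.util.Arrays;

// Hypothetical illustration only; not the Minion implementation.
final class FieldSelectionSketch {

    static final int ALL_FIELDS = -1; // assumed sentinel: iterate all fields
    static final int NO_MATCH = -2;   // assumed sentinel: behaviour unspecified

    // requestedFieldIds: field IDs named in the features, or null if none.
    // hasPostings[id]: whether postings are stored for field id.
    static int selectField(int[] requestedFieldIds, boolean[] hasPostings) {
        if (requestedFieldIds == null || requestedFieldIds.length == 0) {
            return ALL_FIELDS;
        }
        int[] sorted = requestedFieldIds.clone();
        Arrays.sort(sorted); // "first field (by field ID)"
        for (int id : sorted) {
            if (id >= 0 && id < hasPostings.length && hasPostings[id]) {
                return id;
            }
        }
        return NO_MATCH;
    }
}
```

If fields 3 and 1 are requested and only fields 2 and 3 have postings, the sketch selects field 3, because field 1 is checked first but has nothing stored.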