com.sun.labs.minion.classification
Class FeaturePostings

java.lang.Object
  extended by com.sun.labs.minion.classification.FeaturePostings
All Implemented Interfaces:
Postings

public class FeaturePostings
extends java.lang.Object
implements Postings

An implementation of Postings used to store classifier features. The postings store the feature IDs and encode the remaining information about each feature.


Nested Class Summary
 class FeaturePostings.Featurator
           
 
Field Summary
protected  int dataStart
          The offset of the start of the actual data in our ino buffer.
protected  Feature[] feats
          An array of the features added.
protected  Buffer info
          A buffer holding feature info.
protected  Buffer ino
          A buffer holding feature IDs and offsets into the info buffer.
protected  int lastID
          The last ID that we hold.
protected  int lastOff
          The last info offset that we hold.
protected static java.lang.String logTag
           
protected  int nIDs
          The number of features that we hold.
 
Constructor Summary
FeaturePostings()
          Creates a set of postings suitable for indexing time.
FeaturePostings(ReadableBuffer ino, ReadableBuffer info)
          Creates a set of postings suitable for querying time.
 
Method Summary
 void add(Occurrence o)
          Adds an occurrence to the postings list.
 void append(Postings p, int start)
          Appends another set of postings to this one.
 void append(Postings p, int start, int[] idMap)
          Appends another set of postings to this one, removing any data associated with deleted documents.
 void finish()
          Finishes any ongoing encoding and prepares for the data to be dumped.
 WriteableBuffer[] getBuffers()
          Gets a number of Buffers whose contents represent the postings.
 int getLastID()
          Gets the last ID in the postings list.
 int getMaxFDT()
          Gets the maximum fdt (frequency) value for these postings. This is always 1, since the features are real-valued.
 int getN()
          Gets the number of IDs in the postings list.
 long getTotalOccurrences()
          Gets the total number of occurrences for these postings, which is just the number of features encoded.
 boolean hasFieldInformation()
           
 boolean hasPositionInformation()
           
 PostingsIterator iterator(PostingsIteratorFeatures features)
          Gets an iterator for the postings.
 void remap(int[] idMap)
          Remaps the IDs in the features in these postings, resulting in the encoding of the IDs and feature information to the buffers.
 void setSkipSize(int size)
          Sets the skip size used for building the skip table.
 int size()
          Gets the size of the postings, in bytes.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

feats

protected Feature[] feats
An array of the features added. We'll need to remap this at dump time.


nIDs

protected int nIDs
The number of features that we hold.


lastID

protected int lastID
The last ID that we hold.


lastOff

protected int lastOff
The last info offset that we hold.


dataStart

protected int dataStart
The offset of the start of the actual data in our ino buffer.


ino

protected Buffer ino
A buffer holding feature IDs and offsets into the info buffer.


info

protected Buffer info
A buffer holding feature info.
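
The relationship between the ino and info buffers can be pictured with the following minimal sketch. It is not the Minion encoding: the class name TwoBufferSketch, the fixed-width (ID, offset) pairs, and the single float weight per feature are all assumptions made for illustration, and java.nio.ByteBuffer stands in for the library's Buffer type.

```java
import java.nio.ByteBuffer;

/** Hypothetical two-buffer layout; an illustration only, not Minion's format. */
final class TwoBufferSketch {
    // Index buffer: assumed fixed-width (id, offset) pairs, 8 bytes each.
    private final ByteBuffer ino = ByteBuffer.allocate(1024);
    // Info buffer: assumed payload of one 4-byte float weight per feature.
    private final ByteBuffer info = ByteBuffer.allocate(1024);
    private int nIDs;     // number of features held
    private int lastID;   // last ID written
    private int lastOff;  // last offset written into the info buffer

    /** Records a feature ID in ino and its weight in info. */
    void add(int id, float weight) {
        lastOff = info.position();
        ino.putInt(id).putInt(lastOff);
        info.putFloat(weight);
        lastID = id;
        nIDs++;
    }

    /** Reads the ID stored in the index-th pair. */
    int idAt(int index) {
        return ino.getInt(index * 8);
    }

    /** Follows the stored offset from ino into info to fetch the weight. */
    float weightAt(int index) {
        int off = ino.getInt(index * 8 + 4);
        return info.getFloat(off);
    }

    int getN() { return nIDs; }
    int getLastID() { return lastID; }
    int getLastOff() { return lastOff; }
}
```

Keeping the offsets next to the IDs means a reader can scan ino for an ID and then jump straight to that feature's data in info, which is the role the field descriptions above suggest.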


logTag

protected static java.lang.String logTag

Constructor Detail

FeaturePostings

public FeaturePostings()
Creates a set of postings suitable for indexing time.


FeaturePostings

public FeaturePostings(ReadableBuffer ino,
                       ReadableBuffer info)
Creates a set of postings suitable for querying time.

Method Detail

setSkipSize

public void setSkipSize(int size)
Sets the skip size used for building the skip table. This implementation uses no skips.

Specified by:
setSkipSize in interface Postings

getTotalOccurrences

public long getTotalOccurrences()
Gets the total number of occurrences for these postings, which is just the number of features encoded.

Specified by:
getTotalOccurrences in interface Postings
Returns:
The total number of occurrences associated with these postings.

getMaxFDT

public int getMaxFDT()
Gets the maximum fdt (frequency) value for these postings. This is always 1, since the features are real-valued.

Specified by:
getMaxFDT in interface Postings
Returns:
The maximum frequency across all of the postings stored in this postings list.

add

public void add(Occurrence o)
Adds an occurrence to the postings list. We assume that the occurrence is actually an implementation of Feature.

Specified by:
add in interface Postings
Parameters:
o - The occurrence.
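
The assumption that every occurrence is a Feature could be made explicit with a checked cast, as in the sketch below. The types OccurrenceLike and FeatureLike are hypothetical stand-ins, not the Minion interfaces, and the real method may simply cast without checking.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of an add method that expects features. */
final class AddSketch {
    /** Stand-in for an occurrence; not the Minion Occurrence interface. */
    interface OccurrenceLike { int getID(); }

    /** Stand-in for a feature; the weight accessor is an assumption. */
    interface FeatureLike extends OccurrenceLike { float getWeight(); }

    private final List<FeatureLike> feats = new ArrayList<>();

    /** Stores the occurrence, rejecting anything that is not a feature. */
    void add(OccurrenceLike o) {
        if (!(o instanceof FeatureLike)) {
            throw new IllegalArgumentException("expected a feature, got " + o);
        }
        feats.add((FeatureLike) o);
    }

    int getN() { return feats.size(); }
}
```

Failing fast with a descriptive exception is a design choice of this sketch; an unchecked cast would instead surface the same mistake as a ClassCastException.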

getN

public int getN()
Gets the number of IDs in the postings list.

Specified by:
getN in interface Postings

getLastID

public int getLastID()
Description copied from interface: Postings
Gets the last ID in the postings list.

Specified by:
getLastID in interface Postings

finish

public void finish()
Finishes any ongoing encoding and prepares for the data to be dumped. This implementation doesn't require any finishing.

Specified by:
finish in interface Postings

size

public int size()
Gets the size of the postings, in bytes.

Specified by:
size in interface Postings

getBuffers

public WriteableBuffer[] getBuffers()
Gets a number of Buffers whose contents represent the postings. These buffers can be written to disk.

This method must ensure that all of the data used by the entry is properly handled by the time that the method returns. This method will be called by a dictionary when it is ready to dump the postings data to a stream.

Specified by:
getBuffers in interface Postings
Returns:
An array of Buffers containing the postings data. All of the data in these buffers must be written to the postings file!

remap

public void remap(int[] idMap)
Remaps the IDs in the features in these postings, resulting in the encoding of the IDs and feature information to the buffers.

Specified by:
remap in interface Postings
Parameters:
idMap - a map from the IDs currently in use in the postings to new IDs.
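
As a rough illustration of what remapping involves, the sketch below rewrites each feature's ID through idMap and then re-sorts by the new ID so the features could be encoded in ascending order. The class RemapSketch, its Feat type, and the re-sorting step are assumptions for illustration; the page does not describe how this class orders or encodes the remapped features.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical remapping of feature IDs through an idMap array. */
final class RemapSketch {
    /** Minimal stand-in for a feature: an ID plus a real-valued weight. */
    static final class Feat {
        int id;
        final float weight;

        Feat(int id, float weight) {
            this.id = id;
            this.weight = weight;
        }
    }

    /**
     * Replaces each ID with idMap[id], then sorts by the new ID.
     * Assumes idMap has an entry for every ID currently in use.
     */
    static void remap(Feat[] feats, int[] idMap) {
        for (Feat f : feats) {
            f.id = idMap[f.id];
        }
        Arrays.sort(feats, Comparator.comparingInt((Feat f) -> f.id));
    }
}
```

For example, with idMap = {0, 3, 1, 2}, features with IDs 1, 2 and 3 receive IDs 3, 1 and 2 respectively, and each weight stays attached to its feature through the sort.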

append

public void append(Postings p,
                   int start)
Appends another set of postings to this one.

Specified by:
append in interface Postings
Parameters:
p - The postings to append. Implementers can safely assume that the postings being passed in are of the same class as the implementing class.
start - The new starting document ID for the partition that the entry was drawn from.

append

public void append(Postings p,
                   int start,
                   int[] idMap)
Appends another set of postings to this one, removing any data associated with deleted documents.

Specified by:
append in interface Postings
Parameters:
p - The postings to append. Implementers can safely assume that the postings being passed in are of the same class as the implementing class.
start - The new starting document ID for the partition that the entry was drawn from.
idMap - A map from old IDs in the given postings to new IDs with gaps removed for deleted data. If this is null, then there are no deleted documents.
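
The interplay of start and idMap can be sketched as follows. Several details are assumptions that this page does not confirm: IDs are treated as 1-based, a negative idMap entry is taken to mark a deleted document, and start is taken to be the new ID given to the first surviving ID of the appended partition. The class AppendSketch is hypothetical and works on plain ID lists rather than encoded buffers.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical merge of ID lists, dropping deleted entries. */
final class AppendSketch {
    private final List<Integer> ids = new ArrayList<>();

    /**
     * Appends otherIds to this list.
     * Assumptions: IDs are 1-based, idMap[old] < 0 marks a deleted
     * document, and a null idMap means nothing was deleted.
     */
    void append(int[] otherIds, int start, int[] idMap) {
        for (int old : otherIds) {
            int mapped = (idMap == null) ? old : idMap[old];
            if (mapped < 0) {
                continue; // skip data for a deleted document
            }
            // Shift into the merged ID space; ID 1 lands on start.
            ids.add(mapped + start - 1);
        }
    }

    List<Integer> ids() { return ids; }
}
```

Appending IDs {1, 2, 3} with start 1 and a null map, then the same IDs with start 4 and idMap = {0, 1, -1, 2}, yields 1, 2, 3, 4, 5: the second partition's ID 2 is dropped and its ID 3 closes the gap.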

iterator

public PostingsIterator iterator(PostingsIteratorFeatures features)
Gets an iterator for the postings.

Specified by:
iterator in interface Postings
Parameters:
features - A set of features that the iterator must support.
Returns:
A postings iterator that supports the given features. If the underlying postings do not support a specified feature, then a warning should be logged and null will be returned.

hasPositionInformation

public boolean hasPositionInformation()

hasFieldInformation

public boolean hasFieldInformation()