com.sun.labs.minion.indexer.postings
Class IDPostings

java.lang.Object
  extended by com.sun.labs.minion.indexer.postings.IDPostings
All Implemented Interfaces:
MergeablePostings, Postings
Direct Known Subclasses:
IDFreqPostings, IDWPostings

public class IDPostings
extends java.lang.Object
implements Postings, MergeablePostings

A postings class for ID-only postings. These are used for things like the bigram tables that support wildcard expansion, and for saved field values.

The structure of the encoded data is as follows:

  1. The number of IDs in the postings is byte encoded.
  2. The last ID in the postings list is byte encoded.
  3. The number of skips in the skip table is byte encoded.
  4. The skip table. The number of entries per skip is dependent on the application. The skip table has the following structure.
    1. The number of skips is byte encoded.
    2. For each skip we encode:
      1. The ID of the skipped entry. This is byte encoded as a delta from the previous ID in the skip table.
      2. The position in the encoded data to skip to. This is byte encoded as a delta from the previous position in the skip table. Note that this position is relative to the end of the skip table!
  5. Each ID is encoded as a delta from the previous ID in the postings list.
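
The layout above can be sketched as a small round-trip encoder. This is only an illustration under stated assumptions, not the Minion implementation: the class IdPostingsLayoutSketch and its methods are hypothetical, "byte encoded" is assumed to mean a 7-bit variable-length integer, the skip count is written once (as in the getBuffers format string), and a skip is assumed to record the ID just encoded together with the data offset at which decoding can resume.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the documented layout; not the Minion implementation.
final class IdPostingsLayoutSketch {

    // Assumed "byte encoding": 7 payload bits per byte, high bit set on all but the last byte.
    static void byteEncode(ByteArrayOutputStream out, int n) {
        while ((n & ~0x7F) != 0) {
            out.write((n & 0x7F) | 0x80);
            n >>>= 7;
        }
        out.write(n);
    }

    // Decodes one value starting at pos[0] and advances pos[0] past it.
    static int byteDecode(byte[] buf, int[] pos) {
        int value = 0;
        int shift = 0;
        int b;
        do {
            b = buf[pos[0]++] & 0xFF;
            value |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return value;
    }

    // ids must be positive and strictly increasing.
    static byte[] encode(int[] ids, int skipSize) {
        ByteArrayOutputStream data = new ByteArrayOutputStream();
        List<Integer> skipID = new ArrayList<>();
        List<Integer> skipPos = new ArrayList<>();
        int prev = 0;
        for (int i = 0; i < ids.length; i++) {
            byteEncode(data, ids[i] - prev);          // item 5: delta from the previous ID
            prev = ids[i];
            // Assumed policy: after every skipSize IDs, remember where decoding can resume.
            if ((i + 1) % skipSize == 0 && i + 1 < ids.length) {
                skipID.add(ids[i]);
                skipPos.add(data.size());             // relative to the end of the skip table
            }
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byteEncode(out, ids.length);                                  // item 1: number of IDs
        byteEncode(out, ids.length == 0 ? 0 : ids[ids.length - 1]);   // item 2: last ID
        byteEncode(out, skipID.size());                               // item 3: number of skips
        int prevSkipID = 0;
        int prevSkipPos = 0;
        for (int s = 0; s < skipID.size(); s++) {                     // item 4: delta-coded skip table
            byteEncode(out, skipID.get(s) - prevSkipID);
            byteEncode(out, skipPos.get(s) - prevSkipPos);
            prevSkipID = skipID.get(s);
            prevSkipPos = skipPos.get(s);
        }
        out.write(data.toByteArray(), 0, data.size());
        return out.toByteArray();
    }

    // Full sequential decode; the skip table is stepped over rather than used.
    static int[] decode(byte[] buf) {
        int[] pos = {0};
        int nIDs = byteDecode(buf, pos);
        byteDecode(buf, pos);                         // last ID, not needed for a full scan
        int nSkips = byteDecode(buf, pos);
        for (int s = 0; s < 2 * nSkips; s++) {
            byteDecode(buf, pos);
        }
        int[] ids = new int[nIDs];
        int prev = 0;
        for (int i = 0; i < nIDs; i++) {
            prev += byteDecode(buf, pos);
            ids[i] = prev;
        }
        return ids;
    }
}
```

Storing skip positions relative to the end of the skip table means the data section can be encoded before the size of the table is known, which is why the sketch builds the two parts in separate buffers.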


Nested Class Summary
 class IDPostings.IDIterator
           
 
Field Summary
protected  int curr
          The ID we're collecting the frequency for.
protected  int dataStart
          The position in the compressed representation where the data starts.
protected  int[] ids
          The uncompressed postings.
protected  int lastID
          The last ID in this postings list.
protected static java.lang.String logTag
           
protected  int nIDs
          The number of IDs in the postings.
protected  int nSkips
          The number of skips in the skip table.
protected  Buffer post
          The compressed postings.
protected  int prevID
          The previous ID encountered during indexing.
protected  int[] skipID
          The IDs in the skip table.
protected  int[] skipPos
          The positions in the skip table.
protected  int skipSize
          The number of documents in a skip.
 
Constructor Summary
IDPostings()
          Makes a postings entry that is useful during indexing.
IDPostings(ReadableBuffer b)
          Makes a postings entry that is useful during querying.
IDPostings(ReadableBuffer b, int offset, int size)
          Makes a postings entry that is useful during querying.
 
Method Summary
 void add(Occurrence o)
          Adds an occurrence to the postings list.
protected  void addSkip(int id, int pos)
          Adds a skip to the skip table.
 void append(Postings p, int start)
          Appends another set of postings to this one.
 void append(Postings p, int start, int[] idMap)
          Appends another set of postings to this one, removing any data associated with deleted documents.
protected  int encode(int id)
          Encodes the data for a single ID.
 void finish()
          Finishes off the encoding by adding any data that we collected for the last document.
 WriteableBuffer[] getBuffers()
          Gets a number of WriteableBuffers whose contents represent the postings.
 int getLastID()
          Gets the last ID in the postings list.
 int getMaxFDT()
          Gets the maximum frequency in the postings associated with this entry.
 int getN()
          Gets the number of IDs in the postings list.
 long getTotalOccurrences()
          Gets the total number of occurrences in this postings list, which is always the number of postings, since we don't encode any frequencies.
 PostingsIterator iterator(PostingsIteratorFeatures features)
          Gets an iterator for the postings.
 void merge(MergeablePostings mp, int[] map)
          Merges another set of postings with this set of postings.
protected  void recodeID(int currID, int prevID, PostingsIterator pi)
          Re-encodes the data from another postings onto this one.
 void remap(int[] idMap)
          Remaps the IDs in this postings list according to the given old-to-new ID map.
 void setSkipSize(int size)
          Sets the skip size.
 int size()
          Gets the size of the postings, in bytes.
protected  void skip(ReadableBuffer b)
          Skips a set of postings from another postings entry.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

post

protected Buffer post
The compressed postings.


ids

protected int[] ids
The uncompressed postings.


curr

protected int curr
The ID we're collecting the frequency for.


prevID

protected int prevID
The previous ID encountered during indexing.


nIDs

protected int nIDs
The number of IDs in the postings.


lastID

protected int lastID
The last ID in this postings list.


skipID

protected int[] skipID
The IDs in the skip table.


skipPos

protected int[] skipPos
The positions in the skip table.


nSkips

protected int nSkips
The number of skips in the skip table.


dataStart

protected int dataStart
The position in the compressed representation where the data starts.


skipSize

protected int skipSize
The number of documents in a skip.


logTag

protected static java.lang.String logTag
Constructor Detail

IDPostings

public IDPostings()
Makes a postings entry that is useful during indexing.


IDPostings

public IDPostings(ReadableBuffer b)
Makes a postings entry that is useful during querying.

Parameters:
b - the data read from a postings file.

IDPostings

public IDPostings(ReadableBuffer b,
                  int offset,
                  int size)
Makes a postings entry that is useful during querying.

Parameters:
b - The buffer containing the postings.
offset - The offset in the buffer where our postings data starts.
size - The size of the data in our postings.
Method Detail

addSkip

protected void addSkip(int id,
                       int pos)
Adds a skip to the skip table.

Parameters:
id - The ID that the skip is pointing to.
pos - The position in the postings to skip to.
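
One way a skip table of this shape might be used at query time is sketched below. Everything here is hypothetical: the class SkipTableSketch, the contains method, and the use of an int array of deltas in place of the compressed buffer are illustration only. The sketch also assumes that a skip stores the last ID before the resume position, so that the delta chain can be restarted from it; the real class may define the pair differently.

```java
import java.util.Arrays;

// Hypothetical sketch of a skip table over delta-coded IDs; not the Minion implementation.
final class SkipTableSketch {
    private final int[] deltas;          // stands in for the compressed postings
    private int[] skipID = new int[4];
    private int[] skipPos = new int[4];
    private int nSkips;

    // ids must be positive and strictly increasing.
    SkipTableSketch(int[] ids, int skipSize) {
        deltas = new int[ids.length];
        int prev = 0;
        for (int i = 0; i < ids.length; i++) {
            deltas[i] = ids[i] - prev;
            prev = ids[i];
            if ((i + 1) % skipSize == 0) {
                addSkip(ids[i], i + 1);  // resume position is the next delta
            }
        }
    }

    // Appends one (id, pos) pair, growing the parallel arrays when they are full.
    void addSkip(int id, int pos) {
        if (nSkips == skipID.length) {
            skipID = Arrays.copyOf(skipID, nSkips * 2);
            skipPos = Arrays.copyOf(skipPos, nSkips * 2);
        }
        skipID[nSkips] = id;
        skipPos[nSkips] = pos;
        nSkips++;
    }

    // Binary-searches the skips, then scans forward from the chosen resume position.
    boolean contains(int target) {
        int lo = 0;
        int hi = nSkips - 1;
        int best = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (skipID[mid] <= target) {
                best = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        int prev = 0;
        int pos = 0;
        if (best >= 0) {
            if (skipID[best] == target) {
                return true;
            }
            prev = skipID[best];
            pos = skipPos[best];
        }
        while (pos < deltas.length) {
            prev += deltas[pos++];
            if (prev == target) {
                return true;
            }
            if (prev > target) {
                return false;
            }
        }
        return false;
    }
}
```

With a skip every skipSize IDs, a lookup decodes at most skipSize entries after the binary search instead of scanning the whole list.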

encode

protected int encode(int id)
Encodes the data for a single ID.

Returns:
The number of bytes used for the encoding.

setSkipSize

public void setSkipSize(int size)
Sets the skip size.

Specified by:
setSkipSize in interface Postings

add

public void add(Occurrence o)
Adds an occurrence to the postings list.

Specified by:
add in interface Postings
Parameters:
o - The occurrence.

getN

public int getN()
Description copied from interface: Postings
Gets the number of IDs in the postings list.

Specified by:
getN in interface Postings

getLastID

public int getLastID()
Description copied from interface: Postings
Gets the last ID in the postings list.

Specified by:
getLastID in interface Postings

getMaxFDT

public int getMaxFDT()
Gets the maximum frequency in the postings associated with this entry. For ID only postings, this value is always 1.

Specified by:
getMaxFDT in interface Postings
Returns:
1.

getTotalOccurrences

public long getTotalOccurrences()
Gets the total number of occurrences in this postings list, which is always the number of postings, since we don't encode any frequencies.

Specified by:
getTotalOccurrences in interface Postings
Returns:
The total number of occurrences associated with these postings.

finish

public void finish()
Finishes off the encoding by adding any data that we collected for the last document.

Specified by:
finish in interface Postings

size

public int size()
Gets the size of the postings, in bytes.

Specified by:
size in interface Postings

getBuffers

public WriteableBuffer[] getBuffers()
Gets a number of WriteableBuffers whose contents represent the postings. These buffers can then be written out. The format is as follows: NumIDs:LastID:NumSkipEntries[:skipID:skipPos]*:

Specified by:
getBuffers in interface Postings
Returns:
An array of WriteableBuffers containing the encoded postings data.

remap

public void remap(int[] idMap)
Remaps the IDs in this postings list according to the given old-to-new ID map.

This is tricky, because we can't assume that the remapped IDs will keep the same relative order as the original IDs. Thus, we need to uncompress all of the IDs, remap them, and then put them back together.

Specified by:
remap in interface Postings
Parameters:
idMap - A map from the IDs currently in use in the postings to new IDs.
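
The decode, remap, sort, and re-encode sequence described above might look roughly like the following. This is a minimal sketch under assumptions: RemapSketch is a hypothetical class, the postings are represented as a plain array of deltas rather than a compressed buffer, and idMap is assumed to be indexed directly by old ID.

```java
import java.util.Arrays;

// Hypothetical sketch of remapping delta-coded IDs; not the Minion implementation.
final class RemapSketch {

    // Takes delta-coded old IDs and returns delta-coded new IDs.
    static int[] remap(int[] deltas, int[] idMap) {
        int[] ids = new int[deltas.length];
        int prev = 0;
        for (int i = 0; i < deltas.length; i++) {
            prev += deltas[i];          // uncompress the old ID
            ids[i] = idMap[prev];       // assumed: idMap is indexed by old ID
        }
        Arrays.sort(ids);               // the new IDs may no longer be in order
        int[] out = new int[ids.length];
        prev = 0;
        for (int i = 0; i < ids.length; i++) {
            out[i] = ids[i] - prev;     // re-encode as deltas in the new ID space
            prev = ids[i];
        }
        return out;
    }
}
```

For example, old IDs 2, 5, 6 mapped to 9, 1, 4 come back as the deltas for 1, 4, 9, which is why the deltas cannot simply be patched in place.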

append

public void append(Postings p,
                   int start)
Appends another set of postings to this one.

Specified by:
append in interface Postings
Parameters:
p - The postings to append. Implementers can safely assume that the postings being passed in are of the same class as the implementing class.
start - The new starting document ID for the partition that the entry was drawn from.
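
As a rough illustration of why an append without deletions can be cheap: if IDs are delta coded, only the first ID of the appended list has to be re-encoded against the last ID of this list, and the remaining deltas carry over unchanged. The sketch below assumes 1-based partition-local IDs, so that a local ID maps to `id + start - 1`; that formula, the class AppendSketch, and the plain delta arrays are all assumptions made for the example.

```java
import java.util.Arrays;

// Hypothetical sketch of appending delta-coded postings; not the Minion implementation.
final class AppendSketch {

    // a and b are delta-coded ID lists; start is the new first document ID of b's partition.
    static int[] append(int[] a, int[] b, int start) {
        int lastA = 0;
        for (int d : a) {
            lastA += d;                               // last ID already in this list
        }
        int[] out = Arrays.copyOf(a, a.length + b.length);
        if (b.length > 0) {
            int firstB = b[0] + start - 1;            // assumed: IDs in b are 1-based
            out[a.length] = firstB - lastA;           // only this delta changes
            System.arraycopy(b, 1, out, a.length + 1, b.length - 1);
        }
        return out;
    }
}
```

The result is only valid when `start` is greater than the last ID already in the list, which holds when partitions are appended in document order.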

recodeID

protected void recodeID(int currID,
                        int prevID,
                        PostingsIterator pi)
Re-encodes the data from another postings onto this one.

Parameters:
currID - The current ID.
prevID - The previous ID.
pi - An iterator for the other postings list.

skip

protected void skip(ReadableBuffer b)
Skips a set of postings from another postings entry.


append

public void append(Postings p,
                   int start,
                   int[] idMap)
Appends another set of postings to this one, removing any data associated with deleted documents.

Specified by:
append in interface Postings
Parameters:
p - The postings to append. Implementers can safely assume that the postings being passed in are of the same class as the implementing class.
start - The new starting document ID for the partition that the entry was drawn from.
idMap - A map from old IDs in the given postings to new IDs with gaps removed for deleted data. If this is null, then there are no deleted documents.
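
When documents have been deleted, each ID has to be looked up individually, so the appended list is decoded and rebuilt rather than copied. The sketch below works on uncompressed ID arrays for brevity. It assumes that deleted documents are marked with a negative entry in idMap and that a mapped ID is shifted by `start - 1`; neither convention is confirmed by this page, and the class AppendWithMapSketch is hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch of appending with an ID map; not the Minion implementation.
final class AppendWithMapSketch {

    // a and b are uncompressed, increasing ID lists; idMap may be null (no deletions).
    static int[] append(int[] a, int[] b, int start, int[] idMap) {
        int[] out = Arrays.copyOf(a, a.length + b.length);
        int n = a.length;
        for (int oldID : b) {
            int mapped = (idMap == null) ? oldID : idMap[oldID];
            if (mapped < 0) {
                continue;                     // assumed marker for a deleted document
            }
            out[n++] = mapped + start - 1;    // assumed: mapped IDs are 1-based
        }
        return Arrays.copyOf(out, n);         // trim the slots left by deleted documents
    }
}
```

Passing null for idMap reduces this to the plain append case, matching the note that a null map means there are no deleted documents.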

merge

public void merge(MergeablePostings mp,
                  int[] map)
Description copied from interface: MergeablePostings
Merges another set of postings with this set of postings.

Specified by:
merge in interface MergeablePostings
Parameters:
mp - the postings to merge into these postings.
map - a map from IDs in the postings to IDs in the merged space.

iterator

public PostingsIterator iterator(PostingsIteratorFeatures features)
Gets an iterator for the postings.

Specified by:
iterator in interface Postings
Parameters:
features - A set of features that the iterator must support.
Returns:
A postings iterator. The iterators for these postings do not support any of the extra features available. If any extra features are requested, a warning will be logged and null will be returned.