java.lang.Object com.sun.labs.minion.indexer.postings.DFOPostings
public class DFOPostings
A postings class for storing IDs, frequencies, and field and word position information. The data is encoded into two buffers.
The first buffer contains document ID and frequency information for each document and an offset into the second buffer where field and position information is stored for a particular document.
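As a rough illustration of the two-buffer idea, the sketch below keeps (ID, frequency, offset) triples in one list and word positions in another. It is a simplified assumption, not the actual encoding used by this class: `TwoBufferSketch`, `addDocument`, and `positionsOf` are hypothetical names, plain Java lists stand in for the compressed buffers, and field information is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Minion code: two lists stand in for the two buffers.
class TwoBufferSketch {
    // First buffer: (docID, freq, offset) triples, flattened.
    final List<Integer> docFreqOff = new ArrayList<>();
    // Second buffer: word positions for every document, stored back to back.
    final List<Integer> positions = new ArrayList<>();

    // Records one document and its word positions.
    void addDocument(int docID, int[] wordPositions) {
        docFreqOff.add(docID);
        docFreqOff.add(wordPositions.length);  // frequency
        docFreqOff.add(positions.size());      // offset into the second buffer
        for (int p : wordPositions) {
            positions.add(p);
        }
    }

    // Looks up the positions of the i-th document through its stored offset.
    int[] positionsOf(int i) {
        int freq = docFreqOff.get(3 * i + 1);
        int off = docFreqOff.get(3 * i + 2);
        int[] out = new int[freq];
        for (int k = 0; k < freq; k++) {
            out[k] = positions.get(off + k);
        }
        return out;
    }
}
```

Storing the offset next to each ID means a reader that only needs IDs and frequencies never has to touch the second buffer.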
The second buffer contains encoded field and position information. For each document, the data is structured in the following way:
Nested Class Summary | |
---|---|
class | DFOPostings.DFOIterator |
Field Summary | |
---|---|
protected boolean | appending - Whether we're building these postings by appending. |
protected int | dataStart - The position in the compressed representation where the data starts. |
protected Buffer | dfo - The compressed document and frequency postings. |
protected int[] | ffreq - The field frequency information. |
protected Buffer | fnp - The compressed field and position information. |
protected WriteableBuffer[] | fposn - The field position information. |
protected int | freq - The frequency of current ID. |
protected int | lastID - The last ID in this postings list. |
protected int | lastOff - The last positions offset in this postings list. |
protected static java.lang.String | logTag |
protected int | maxfdt - The maximum frequency encountered in the postings. |
protected int | nFields - The number of fields in the current documents. |
protected int | nIDs - The number of IDs in the postings. |
protected int | nSkips - The number of skips in the skip table. |
protected int[] | prevFPosn - The previous field positions. |
protected int | prevID - The previous ID encountered during indexing. |
protected int[] | skipID - The IDs in the skip table. |
protected int[] | skipOff - The offsets in the skip table. |
protected int[] | skipPos - The positions in the skip table. |
protected int | skipSize - The number of documents in a skip. |
protected int | splitPoint - After getting the buffers, this member will contain the split point between the buffers for the documents, frequencies, and offsets and the buffers for the field and position information. |
protected long | to - The total number of occurrences in the postings. |
Constructor Summary | |
---|---|
DFOPostings() | Makes a postings entry that is useful during indexing. |
DFOPostings(ReadableBuffer input) | Makes a postings entry that is useful during querying. |
DFOPostings(ReadableBuffer input, int offset, int size, int fnpSize) | Makes a postings entry that is useful during querying. |
DFOPostings(ReadableBuffer b1, ReadableBuffer b2) | |
Method Summary | |
---|---|
void | add(Occurrence o) - Adds an occurrence to the postings list. |
protected void | addFields(FieldOccurrence fo) - Adds an occurrence to all relevant fields. |
protected void | addSkip(int id, int pos, int off) - Adds a skip to the skip table. |
void | append(Postings p, int start) - Appends another set of postings to this one. |
void | append(Postings p, int start, int[] idMap) - Appends another set of postings to this one, removing any data associated with deleted documents. |
protected int | encode() - Encodes the data for a single ID. |
protected int | encodeBasic() |
void | finish() - Finishes off the encoding by adding any data that we collected for the last document. |
WriteableBuffer[] | getBuffers() - Gets a ByteBuffer whose contents represent the postings. |
int | getLastID() - Gets the last ID in the postings list. |
int | getMaxFDT() - Gets the maximum frequency in the postings list. |
int | getN() - Gets the number of IDs in the postings list. |
long | getTotalOccurrences() - Gets the total number of occurrences in the postings list. |
protected void | init(ReadableBuffer b1, ReadableBuffer b2) |
PostingsIterator | iterator(PostingsIteratorFeatures features) - Gets an iterator for the postings. |
void | remap(int[] idMap) - Remaps the IDs in this postings list according to the given old-to-new ID map. |
void | setSkipSize(int size) - Sets the skip size. |
int | size() - Gets the size of the postings, in bytes. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
protected Buffer dfo
protected Buffer fnp
protected boolean appending
protected int nIDs
protected long to
protected int maxfdt
protected int splitPoint
protected int prevID
protected int lastID
protected int lastOff
protected int freq
protected int nFields
protected int[] ffreq
protected int[] prevFPosn
protected WriteableBuffer[] fposn
protected int[] skipID
protected int[] skipOff
protected int[] skipPos
protected int nSkips
protected int dataStart
protected int skipSize
protected static java.lang.String logTag
Constructor Detail |
---|
public DFOPostings()
public DFOPostings(ReadableBuffer input)
input - the data read from a postings file.
public DFOPostings(ReadableBuffer input, int offset, int size, int fnpSize)
input - the data read from a postings file.
offset - The offset in the buffer from which we should start reading. If this value is greater than 0, then we need to share the bit buffer, since we may be part of a larger postings entry that will need multiple readers.
size - The size of the data in the sub-buffer.
public DFOPostings(ReadableBuffer b1, ReadableBuffer b2)
Method Detail |
---|
protected void init(ReadableBuffer b1, ReadableBuffer b2)
protected void addSkip(int id, int pos, int off)
id - The ID that the skip is pointing to.
pos - The position in the postings to skip to.
protected int encodeBasic()
protected int encode()
public void setSkipSize(int size)
Specified by: setSkipSize in interface Postings
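To make the role of the skip size and the skip table more concrete, here is a minimal sketch of one way such a table could work. The field names mirror `skipID`, `skipPos`, `nSkips`, and `skipSize` from this class, but the logic, the `build` and `startFor` helpers, and the initial skip size of 4 are assumptions made for illustration, not the behavior of `DFOPostings` itself.

```java
import java.util.Arrays;

// Hypothetical sketch of a skip table; the logic is an assumption, not Minion's code.
class SkipTableSketch {
    int skipSize = 4;            // documents per skip (assumed value)
    int[] skipID = new int[2];   // ID each skip points to
    int[] skipPos = new int[2];  // position in the postings to jump to
    int nSkips = 0;

    void setSkipSize(int size) {
        skipSize = size;
    }

    // Appends one entry, growing the arrays when they are full.
    void addSkip(int id, int pos) {
        if (nSkips == skipID.length) {
            skipID = Arrays.copyOf(skipID, nSkips * 2);
            skipPos = Arrays.copyOf(skipPos, nSkips * 2);
        }
        skipID[nSkips] = id;
        skipPos[nSkips] = pos;
        nSkips++;
    }

    // Builds the table from sorted IDs and the position of each ID's entry,
    // recording a skip at every skipSize-th document after the first.
    void build(int[] ids, int[] entryPos) {
        for (int i = 0; i < ids.length; i++) {
            if (i > 0 && i % skipSize == 0) {
                addSkip(ids[i], entryPos[i]);
            }
        }
    }

    // Returns the position to start scanning from when looking for target,
    // or 0 (the start of the postings) if no skip entry applies.
    int startFor(int target) {
        int pos = 0;
        for (int s = 0; s < nSkips && skipID[s] <= target; s++) {
            pos = skipPos[s];
        }
        return pos;
    }
}
```

A smaller skip size gives a larger table but shorter scans after each jump; a larger one trades the other way.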
public void add(Occurrence o)
Specified by: add in interface Postings
o - The occurrence.
protected void addFields(FieldOccurrence fo)
fo - an occurrence that includes information about what fields are currently active.
public int getN()
Specified by: getN in interface Postings
public int getLastID()
Specified by: getLastID in interface Postings
public int getMaxFDT()
Specified by: getMaxFDT in interface Postings
public long getTotalOccurrences()
Specified by: getTotalOccurrences in interface Postings
public void finish()
Specified by: finish in interface Postings
public int size()
Specified by: size in interface Postings
public WriteableBuffer[] getBuffers()
Gets a ByteBuffer whose contents represent the postings. These buffers can safely be written to streams.
The format is as follows:
NumIDs:LastID:NumSkipEntries[:skipID:skipPos]*:
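The ordering in that format line can be sketched as follows. This is only an illustration of the documented field order: `HeaderSketch` and `header` are hypothetical names, plain integers are used in place of whatever byte-level encoding the real buffers use, and nothing after the final `:` is modeled because the source does not show it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: lays header values out in the documented order.
// The real byte-level encoding is not shown in the source and is not modeled here.
class HeaderSketch {
    static List<Integer> header(int nIDs, int lastID, int[] skipID, int[] skipPos) {
        List<Integer> out = new ArrayList<>();
        out.add(nIDs);             // NumIDs
        out.add(lastID);           // LastID
        out.add(skipID.length);    // NumSkipEntries
        for (int i = 0; i < skipID.length; i++) {
            out.add(skipID[i]);    // repeated skipID:skipPos pairs
            out.add(skipPos[i]);
        }
        return out;
    }
}
```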
Specified by: getBuffers in interface Postings
Returns: a ByteBuffer containing the encoded postings data.
public void remap(int[] idMap)
This is tricky, because we can't assume that the remapped IDs will keep the same relative order as the original IDs. Thus, we need to uncompress all of the IDs and then put them back together.
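The decode, remap, and reorder idea can be sketched as below. This is a hedged illustration rather than the method's actual implementation: `RemapSketch` is a hypothetical name, the postings are assumed to be already decoded into parallel arrays of IDs and frequencies, and IDs are assumed to be non-negative.

```java
import java.util.Arrays;

// Hypothetical sketch of the remapping idea; not Minion's implementation.
class RemapSketch {
    // ids: decoded document IDs in ascending order; freqs: matching frequencies.
    // idMap[oldID] gives the new ID. Returns {newIds, newFreqs}, sorted by new ID.
    static int[][] remap(int[] ids, int[] freqs, int[] idMap) {
        int n = ids.length;
        long[] packed = new long[n];
        for (int i = 0; i < n; i++) {
            // Pack (newID, freq) into one long so a single sort keeps each pair together.
            packed[i] = ((long) idMap[ids[i]] << 32) | (freqs[i] & 0xFFFFFFFFL);
        }
        Arrays.sort(packed);  // new IDs may be out of order after mapping
        int[] newIds = new int[n];
        int[] newFreqs = new int[n];
        for (int i = 0; i < n; i++) {
            newIds[i] = (int) (packed[i] >>> 32);
            newFreqs[i] = (int) packed[i];
        }
        return new int[][] {newIds, newFreqs};
    }
}
```

Once the pairs are back in ascending ID order, they can be re-encoded with the usual delta compression.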
Specified by: remap in interface Postings
idMap - A map from the IDs currently in use in the postings to new IDs.
public void append(Postings p, int start)
Specified by: append in interface Postings
p - The postings to append. Implementers can safely assume that the postings being passed in are of the same class as the implementing class.
start - The new starting document ID for the partition that the entry was drawn from.
public void append(Postings p, int start, int[] idMap)
Specified by: append in interface Postings
p - The postings to append. Implementers can safely assume that the postings being passed in are of the same class as the implementing class.
start - The new starting document ID for the partition that the entry was drawn from.
idMap - A map from old IDs in the given postings to new IDs with gaps removed for deleted data. If this is null, then there are no deleted documents.
public PostingsIterator iterator(PostingsIteratorFeatures features)
Specified by: iterator in interface Postings
features - A set of features that the iterator must support.
null will be returned.