java.lang.Object
  com.sun.labs.minion.indexer.partition.DocumentVectorLengths
public class DocumentVectorLengths
A class that holds the document vector lengths for a partition. It can be used at indexing or query time to build the document vector lengths and dump them to disk. The stored document vector lengths are used for document vector length normalization during querying and classification operations.
The lengths are represented using a file-backed buffer.
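As a rough illustration of what a file-backed store of lengths can look like, the sketch below keeps one 4-byte float per document in a `java.io.RandomAccessFile` and reads a single entry back by seeking to its slot. The class name `VectorLengthFileSketch`, the fixed-width layout, and the assumption that document IDs start at 1 are all hypothetical; this is not the on-disk format used by `DocumentVectorLengths`.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical sketch: per-document vector lengths stored as 4-byte floats in a file. */
final class VectorLengthFileSketch {

    /** Writes one float per document; document IDs are assumed to start at 1. */
    static void writeLengths(File f, float[] lengths) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            raf.setLength(0); // discard any previous contents
            for (float len : lengths) {
                raf.writeFloat(len);
            }
        }
    }

    /** Reads the length for one document by seeking to its fixed-width slot. */
    static float readLength(File f, int docID) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
            raf.seek((long) (docID - 1) * Float.BYTES);
            return raf.readFloat();
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("veclens", ".vl");
        f.deleteOnExit();
        writeLengths(f, new float[] {1.5f, 2.0f, 0.25f});
        System.out.println(readLength(f, 2)); // prints 2.0
    }
}
```

Fixed-width slots make each lookup a single seek, which is one plausible reason to back the lengths with a random access file rather than a sequential stream.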
| Field Summary | |
|---|---|
protected static int |
BUFF_SIZE
A standard buffer size to use, in bytes. |
protected ReadableBuffer[] |
fieldLens
Buffers containing vector lengths for the vectored fields. |
protected static java.lang.String |
logTag
|
protected DiskPartition |
part
The partition whose values we're storing. |
protected java.io.RandomAccessFile |
raf
The random access file that we'll use to back our buffer. |
protected ReadableBuffer |
vecLens
A buffer containing the vector lengths for the whole document. |
protected java.io.File |
vlFile
The file that (will) contain the document vector lengths. |
| Constructor Summary | |
|---|---|
DocumentVectorLengths(DiskPartition part,
boolean adjustStats)
Creates a set of vector lengths for a given partition. |
|
DocumentVectorLengths(DiskPartition part,
int buffSize,
boolean adjustStats)
Creates a set of vector lengths for a given partition. |
|
| Method Summary | |
|---|---|
void |
calculateLengths(DiskPartition p,
TermStatsDictionary gts,
boolean adjustStats)
Calculates a set of document vector lengths from a partition using a global set of term statistics. |
void |
close()
Closes the file associated with the document lengths. |
float |
getVectorLength(int docID)
Gets the length of a document associated with this partition. |
float |
getVectorLength(int docID,
int fieldID)
Gets the length of a document associated with this partition. |
void |
normalize(int[] docs,
float[] scores,
int p,
float qw,
int fieldID)
Normalizes a set of document scores all in one go, using a local buffer copy to avoid synchronization and churn in the buffer. |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
protected DiskPartition part
protected java.io.File vlFile
protected java.io.RandomAccessFile raf
protected ReadableBuffer vecLens
protected ReadableBuffer[] fieldLens
protected static int BUFF_SIZE
protected static java.lang.String logTag
| Constructor Detail |
|---|
public DocumentVectorLengths(DiskPartition part,
boolean adjustStats)
throws java.io.IOException
Parameters:
part - the partition whose document vector lengths we will calculate.
adjustStats - if true and the vector lengths have to be calculated, the global term stats will be modified.
Throws:
java.io.IOException - if there is any error reading or writing the vector lengths.
public DocumentVectorLengths(DiskPartition part,
int buffSize,
boolean adjustStats)
throws java.io.IOException
Parameters:
part - the partition whose vector lengths we're storing.
buffSize - the size of the buffer to use when storing the lengths.
adjustStats - if true and the vector lengths have to be calculated, the global term stats will be modified.
Throws:
java.io.IOException - if there is any error reading or writing the vector lengths.
| Method Detail |
|---|
public void calculateLengths(DiskPartition p,
TermStatsDictionary gts,
boolean adjustStats)
throws FileLockException,
java.io.IOException
Parameters:
p - the partition for which we're calculating document vector lengths.
gts - the dictionary of global term stats.
adjustStats - if true, the global term stats will be modified to include the statistics from the terms in the partition. This will be the case when computing vector lengths for a new partition, but not when computing vector lengths for a merged partition, since in that case the global term stats will already include data from the partitions that were merged. If this parameter is false, the global stats will not be rewritten.
Throws:
FileLockException - if we can't lock the vector length file.
java.io.IOException - if there is any error writing the vector lengths.
public float getVectorLength(int docID)
Parameters:
docID - the ID of the document whose vector length we wish to retrieve.
public void normalize(int[] docs,
float[] scores,
int p,
float qw,
int fieldID)
Normalizes a set of document scores all in one go, using a local buffer copy to avoid synchronization and churn in the buffer.
Parameters:
docs - the document IDs to normalize.
scores - the document scores.
p - the number of document IDs and scores in the arrays.
qw - the query weight to use for normalization.
fieldID - the ID of the field that the scores were computed from and that should be used for normalization.
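The page does not give the normalization formula. The sketch below assumes a cosine-style scheme in which each raw score is divided by the product of the query weight and the document's vector length, and it updates the first `p` entries of `scores` in place. The class `NormalizeSketch` and the in-memory `lengths` array (indexed by document ID, slot 0 unused) are hypothetical stand-ins for the stored lengths, not the library's implementation.

```java
/** Hypothetical sketch of batch score normalization; the formula is an assumption. */
final class NormalizeSketch {

    /**
     * Divides each of the first p scores by (qw * length of its document).
     * lengths is indexed by document ID; slot 0 is unused.
     */
    static void normalize(int[] docs, float[] scores, int p, float qw, float[] lengths) {
        for (int i = 0; i < p; i++) {
            scores[i] /= (qw * lengths[docs[i]]);
        }
    }

    public static void main(String[] args) {
        int[] docs = {1, 3};
        float[] scores = {6f, 8f};
        float[] lengths = {0f, 3f, 1f, 4f};
        normalize(docs, scores, 2, 2f, lengths);
        System.out.println(scores[0] + " " + scores[1]); // prints 1.0 1.0
    }
}
```

Handling the whole batch in one call lets an implementation fetch the lengths once, which fits the stated aim of avoiding per-document synchronization on a shared buffer.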
public float getVectorLength(int docID,
int fieldID)
Parameters:
docID - the ID of the document whose vector length we wish to retrieve.
fieldID - the ID of the field for which we're looking for the length. A field ID of -1 is interpreted as a request for the length using all vectored fields. If this field was not vectored, a length of 1 is returned, so that dividing weights by document lengths won't cause problems.
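The lookup rules described for `fieldID` can be sketched as follows: -1 selects the whole-document length, a vectored field selects its own length, and anything else falls back to 1 so that a later division is harmless. The class `FieldLengthSketch`, the in-memory arrays, and the use of a `null` entry to mark an unvectored field are illustrative assumptions, not the library's data structures.

```java
/** Hypothetical sketch of the fieldID lookup rules; the arrays stand in for the stored buffers. */
final class FieldLengthSketch {

    /**
     * docLens[docID] holds the whole-document length; fieldLens[fieldID] holds
     * per-document lengths for a vectored field, or null if the field was not vectored.
     */
    static float getVectorLength(float[] docLens, float[][] fieldLens, int docID, int fieldID) {
        if (fieldID == -1) {
            return docLens[docID]; // length over all vectored fields
        }
        if (fieldID < 0 || fieldID >= fieldLens.length || fieldLens[fieldID] == null) {
            return 1f; // unvectored field: dividing by 1 leaves weights unchanged
        }
        return fieldLens[fieldID][docID];
    }

    public static void main(String[] args) {
        float[] docLens = {0f, 5f, 7f};
        float[][] fieldLens = {{0f, 3f, 4f}, null};
        System.out.println(getVectorLength(docLens, fieldLens, 2, -1)); // prints 7.0
        System.out.println(getVectorLength(docLens, fieldLens, 2, 0));  // prints 4.0
        System.out.println(getVectorLength(docLens, fieldLens, 2, 1));  // prints 1.0
    }
}
```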
public void close()
throws java.io.IOException
Throws:
java.io.IOException - if there is any error closing the file.