DocumentVectorLengths (Minion Search Engine)

com.sun.labs.minion.indexer.partition
Class DocumentVectorLengths

java.lang.Object
  com.sun.labs.minion.indexer.partition.DocumentVectorLengths

Direct Known Subclasses: CachedDocumentVectorLengths

public class DocumentVectorLengths
extends java.lang.Object

A class that holds the document vector lengths for a partition. It can be used at indexing or query time to build the document vector lengths and dump them to disk. The stored document vector lengths are used for document vector length normalization during querying and classification operations.

The lengths are represented using a file-backed buffer.

Field Summary
`protected static int`	`BUFF_SIZE` A standard buffer size to use, in bytes.
`protected ReadableBuffer[]`	`fieldLens` Buffers containing vector lengths for the vectored fields.
`protected static java.lang.String`	`logTag`
`protected DiskPartition`	`part` The partition whose values we're storing.
`protected java.io.RandomAccessFile`	`raf` The random access file that we'll use to back our buffer.
`protected ReadableBuffer`	`vecLens` A buffer containing the vector lengths for the whole document.
`protected java.io.File`	`vlFile` The file that (will) contain the document vector lengths.

Constructor Summary
`DocumentVectorLengths(DiskPartition part, boolean adjustStats)` Creates a set of vector lengths for a given partition.
`DocumentVectorLengths(DiskPartition part, int buffSize, boolean adjustStats)` Creates a set of vector lengths for a given partition.

Method Summary
`void`	`calculateLengths(DiskPartition p, TermStatsDictionary gts, boolean adjustStats)` Calculates a set of document vector lengths from a partition using a global set of term statistics.
`void`	`close()` Closes the file associated with the document lengths.
`float`	`getVectorLength(int docID)` Gets the length of a document associated with this partition.
`float`	`getVectorLength(int docID, int fieldID)` Gets the length of a document associated with this partition.
`void`	`normalize(int[] docs, float[] scores, int p, float qw, int fieldID)` Normalizes a set of document scores all in one go, using a local buffer copy to avoid synchronization and churn in the buffer.

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Field Detail

part

protected DiskPartition part

The partition whose values we're storing.

vlFile

protected java.io.File vlFile

The file that (will) contain the document vector lengths.

raf

protected java.io.RandomAccessFile raf

The random access file that we'll use to back our buffer.

vecLens

protected ReadableBuffer vecLens

A buffer containing the vector lengths for the whole document.

fieldLens

protected ReadableBuffer[] fieldLens

Buffers containing vector lengths for the vectored fields.

BUFF_SIZE

protected static int BUFF_SIZE

A standard buffer size to use, in bytes.

logTag

protected static java.lang.String logTag

Constructor Detail

DocumentVectorLengths

public DocumentVectorLengths(DiskPartition part,
                             boolean adjustStats)
                      throws java.io.IOException

Creates a set of vector lengths for a given partition. If the file of vector lengths already exists, it is opened for use. If the file doesn't exist, then the vector lengths will be calculated and then stored to the file.

Parameters:
    part - the partition whose document vector lengths we will calculate.
    adjustStats - if true and the vector lengths have to be calculated, the global term stats will be modified.
Throws:
    java.io.IOException - if there is any error reading or writing the vector lengths.

DocumentVectorLengths

public DocumentVectorLengths(DiskPartition part,
                             int buffSize,
                             boolean adjustStats)
                      throws java.io.IOException

Creates a set of vector lengths for a given partition. If the file of vector lengths already exists, it is opened for use. If the file doesn't exist, then the vector lengths will be calculated and then stored to the file.

Parameters:
    part - the partition whose vector lengths we're storing.
    buffSize - the size of the buffer to use when storing the lengths.
    adjustStats - if true and the vector lengths have to be calculated, the global term stats will be modified.
Throws:
    java.io.IOException - if there is any error reading or writing the vector lengths.
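
The open-or-calculate behaviour described for the constructors can be illustrated with a minimal sketch. This is not the Minion implementation: the class `VectorLengthFileSketch`, its methods, and the on-disk layout (one 4-byte float per document, in document order) are assumptions made for the example.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

// Hypothetical stand-in, not the Minion class: shows only the
// "open if present, otherwise calculate and store" pattern.
class VectorLengthFileSketch {

    // Assumed layout: one 4-byte float per document, in document order.
    static RandomAccessFile openOrCreate(File vlFile, float[] computed)
            throws IOException {
        boolean exists = vlFile.exists() && vlFile.length() > 0;
        RandomAccessFile raf = new RandomAccessFile(vlFile, "rw");
        if (!exists) {
            // No stored lengths yet: write the freshly calculated ones.
            for (float len : computed) {
                raf.writeFloat(len);
            }
        }
        return raf;
    }

    // Reads the float stored at the given 0-based position.
    static float read(RandomAccessFile raf, int index) throws IOException {
        raf.seek(4L * index);
        return raf.readFloat();
    }

    // Usage: the second open finds the file and ignores the new values.
    static float demo() {
        try {
            File f = File.createTempFile("veclen", ".vl");
            f.delete(); // start from "file does not exist"
            try (RandomAccessFile first = openOrCreate(f, new float[] {1.5f, 2.5f})) {
                // Lengths were calculated and written on this first open.
            }
            try (RandomAccessFile second = openOrCreate(f, new float[] {9f, 9f})) {
                return read(second, 0);
            } finally {
                f.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch `demo()` returns the value written on the first open, because the second open reuses the existing file instead of recalculating.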

Method Detail

calculateLengths

public void calculateLengths(DiskPartition p,
                             TermStatsDictionary gts,
                             boolean adjustStats)
                      throws FileLockException,
                             java.io.IOException

Calculates a set of document vector lengths from a partition using a global set of term statistics. The global term stats may be re-written as a side effect.

Parameters:
    p - the partition for which we're calculating document vector lengths.
    gts - the dictionary of global term stats.
    adjustStats - if true, the global term stats will be modified to include the statistics from the terms in the partition. This will be the case when computing vector lengths for a new partition, but not when computing vector lengths for a merged partition, since in that case the global term stats will already include data from the partitions that were merged. If this parameter is false, the global stats will not be rewritten.
Throws:
    FileLockException - if we can't lock the vector length file.
    java.io.IOException - if there is any error writing the vector lengths.
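
The effect of adjustStats can be sketched as follows. Everything here is an assumption for illustration: the `GlobalStats` stand-in, the representation of a document as a term-to-frequency map, and the tf * log(1 + N/df) weighting are not taken from Minion. The sketch only shows that the global statistics are updated before lengths are computed when adjustStats is true, and left alone otherwise.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; the weighting formula and data structures are assumed.
class LengthCalculationSketch {

    // Stand-in for a dictionary of global term statistics.
    static class GlobalStats {
        final Map<String, Integer> docFreq = new HashMap<>();
        int nDocs;
    }

    // Each document is a map from term to its frequency in that document.
    static float[] calculateLengths(List<Map<String, Integer>> partitionDocs,
                                    GlobalStats gts,
                                    boolean adjustStats) {
        if (adjustStats) {
            // New partition: fold its statistics into the global ones first.
            for (Map<String, Integer> doc : partitionDocs) {
                for (String term : doc.keySet()) {
                    gts.docFreq.merge(term, 1, Integer::sum);
                }
            }
            gts.nDocs += partitionDocs.size();
        }
        float[] lengths = new float[partitionDocs.size()];
        for (int i = 0; i < lengths.length; i++) {
            double sumSq = 0;
            for (Map.Entry<String, Integer> e : partitionDocs.get(i).entrySet()) {
                int df = gts.docFreq.getOrDefault(e.getKey(), 1);
                // Assumed weight: tf * log(1 + N / df).
                double w = e.getValue() * Math.log(1.0 + (double) gts.nDocs / df);
                sumSq += w * w;
            }
            // Vector length is the Euclidean norm of the term weights.
            lengths[i] = (float) Math.sqrt(sumSq);
        }
        return lengths;
    }
}
```

Calling the sketch with adjustStats set to false on a partition whose terms are already counted, as in the merged-partition case, leaves `GlobalStats` unchanged.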

getVectorLength

public float getVectorLength(int docID)

Gets the length of a document associated with this partition. This will be used at query and classification time. Note that our buffer uses 0 based indexing, so we need to subtract one from the document ID!

Parameters:
    docID - the ID of the document whose vector length we wish to retrieve.
Returns:
    the vector length of the document with the given ID
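
The off-by-one between document IDs and buffer positions can be shown in a few lines. This sketch assumes document IDs start at 1 and uses a `java.nio.FloatBuffer` in place of the library's ReadableBuffer; the class name is hypothetical.

```java
import java.nio.FloatBuffer;

// Hypothetical sketch: a FloatBuffer stands in for the real buffer type.
class ZeroBasedLookupSketch {

    // Assumes document IDs start at 1 while the buffer is indexed from 0.
    static float getVectorLength(FloatBuffer vecLens, int docID) {
        return vecLens.get(docID - 1);
    }
}
```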

normalize

public void normalize(int[] docs,
                      float[] scores,
                      int p,
                      float qw,
                      int fieldID)

Normalizes a set of document scores all in one go, using a local buffer copy to avoid synchronization and churn in the buffer. This will modify the scores array.

Parameters:
    docs - the document IDs to normalize
    scores - the document scores
    p - the number of document IDs and scores in the array
    qw - the query weight to use for normalization
    fieldID - the ID of the field that the scores were computed from and that should be used for normalization.
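
A rough sketch of batch normalization over a private buffer view follows. The formula score / (length * qw) is an assumption, since this page does not state the exact computation, and the field selection is left out by passing the lengths buffer directly. `NormalizeSketch` is a hypothetical name, and a `java.nio.FloatBuffer` replaces the library's buffer type.

```java
import java.nio.FloatBuffer;

// Hypothetical sketch; the normalization formula is assumed.
class NormalizeSketch {

    static void normalize(int[] docs, float[] scores, int p, float qw,
                          FloatBuffer lengths) {
        // duplicate() shares the content but has its own position and limit,
        // so concurrent callers do not disturb each other's reads.
        FloatBuffer local = lengths.duplicate();
        for (int i = 0; i < p; i++) {
            // Document IDs are assumed to be 1-based, the buffer 0-based.
            scores[i] /= (local.get(docs[i] - 1) * qw);
        }
    }
}
```

Only the first p entries of the scores array are touched, and they are modified in place.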

getVectorLength

public float getVectorLength(int docID,
                             int fieldID)

Parameters:
    docID - the ID of the document whose vector length we wish to retrieve.
    fieldID - the ID of the field for which we're looking for the length. A field ID of -1 is interpreted as a request for the length using all vectored fields. If this field was not vectored, then a length of 1 is returned, so that dividing weights by document lengths won't cause problems.
Returns:
    the length of the vector for the given ID and vectored field
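
The fieldID rules above can be sketched as follows. The representation is an assumption: a null or out-of-range entry in the per-field array is taken to mean that the field was not vectored, and `java.nio.FloatBuffer` stands in for the library's buffer type. The class name is hypothetical.

```java
import java.nio.FloatBuffer;

// Hypothetical sketch; how unvectored fields are represented is assumed.
class FieldLengthSketch {

    static float getVectorLength(FloatBuffer vecLens, FloatBuffer[] fieldLens,
                                 int docID, int fieldID) {
        if (fieldID == -1) {
            // -1 requests the length over all vectored fields.
            return vecLens.get(docID - 1);
        }
        if (fieldID < 0 || fieldID >= fieldLens.length
                || fieldLens[fieldID] == null) {
            // Field not vectored: 1 keeps later divisions harmless.
            return 1f;
        }
        return fieldLens[fieldID].get(docID - 1);
    }
}
```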

close

public void close()
           throws java.io.IOException

Closes the file associated with the document lengths.

Throws:
    java.io.IOException - if there is any error closing the file
