java.lang.Object com.sun.labs.minion.indexer.dictionary.FeatureVector
public class FeatureVector
A class that can be used to save feature vectors in an index. A feature vector
is simply an array of doubles that represent the features. The
width of the feature vector is determined by the first vector that is indexed.
If a subsequent vector has a different width, a warning is issued.
Currently, this class will only store one feature vector per document.
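The storage behaviour described above can be sketched as follows. This is a minimal illustration, not the Minion implementation: the class name FeatureStoreSketch and the accessor methods are hypothetical, the field names only mirror the protected fields listed in the Field Summary, and two behaviours are assumptions because the documentation does not state them: a vector of the wrong width is skipped after the warning is counted, and a second vector for the same document overwrites the first.

```java
import java.util.Arrays;

/** Hypothetical sketch of fixed-width feature vector storage. */
final class FeatureStoreSketch {
    private double[] features = new double[8]; // all vectors stored back to back
    private int[] idToFeat = new int[4];       // doc ID -> offset into features, -1 if none
    private int pos = 0;                       // next free position in features
    private int width = -1;                    // fixed by the first vector added
    private int warnings = 0;                  // stands in for the logged warning

    FeatureStoreSketch() {
        Arrays.fill(idToFeat, -1);
    }

    void add(int docID, Object data) {
        double[] vec = (double[]) data; // throws ClassCastException for other types
        if (width < 0) {
            width = vec.length;         // the first vector determines the width
        } else if (vec.length != width) {
            warnings++;                 // assumption: warn, then skip the vector
            return;
        }
        if (docID >= idToFeat.length) { // grow the ID map, marking new slots empty
            int oldLen = idToFeat.length;
            idToFeat = Arrays.copyOf(idToFeat, Math.max(docID + 1, oldLen * 2));
            Arrays.fill(idToFeat, oldLen, idToFeat.length, -1);
        }
        int offset = idToFeat[docID];
        if (offset < 0) {               // first vector for this document
            if (pos + width > features.length) {
                features = Arrays.copyOf(features,
                        Math.max(pos + width, features.length * 2));
            }
            offset = pos;
            idToFeat[docID] = offset;
            pos += width;
        }
        // Assumption: a repeated add for the same document overwrites in place.
        System.arraycopy(vec, 0, features, offset, width);
    }

    /** Returns a copy of the stored vector, or null if the document has none. */
    double[] get(int docID) {
        if (docID < 0 || docID >= idToFeat.length || idToFeat[docID] < 0) {
            return null;
        }
        int offset = idToFeat[docID];
        return Arrays.copyOfRange(features, offset, offset + width);
    }

    int width() { return width; }

    int warnings() { return warnings; }

    public static void main(String[] args) {
        FeatureStoreSketch store = new FeatureStoreSketch();
        store.add(1, new double[] {1.0, 2.0, 3.0});
        store.add(2, new double[] {9.0}); // wrong width: counted as a warning
        System.out.println(Arrays.toString(store.get(1)));
        System.out.println(store.warnings());
    }
}
```

Keeping every vector in one flat array means a lookup is a single offset computation, which is why a fixed width matters in this sketch.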
Field Summary | |
---|---|
protected double[] | features: The features stored during indexing. |
protected FieldInfo | fi: The information for this field. |
protected int[] | idToFeat: A map from document IDs to the indices where feature vectors can be found in the stored features. |
protected static java.lang.String | logTag |
protected int | pos: The current position in the features array. |
protected int | width: The width of the feature vectors that we're storing. |
Constructor Summary | |
---|---|
FeatureVector(FieldInfo fi) | Creates a FeatureVector that can be used to store data at indexing time. |
FeatureVector(FieldInfo field, java.io.RandomAccessFile dictFile, java.io.RandomAccessFile[] postFiles, DiskPartition part) | Constructs a feature vector field that will be used to retrieve data during querying. |
Method Summary | |
---|---|
void | add(int docID, java.lang.Object data): Adds data to this saved field. |
long | bytesInUse() |
void | clear(): Clears a saved field, if it's open for indexing. |
int | compareTo(java.lang.Object o) |
double | distance(int id1, FeatureVector v, int id2): Gets the distance between two feature vectors stored in different partitions. |
double | distance(int d1, int d2) |
void | dump(java.lang.String path, java.io.RandomAccessFile dictFile, PostingsOutput[] postOut, int maxID): Dumps our saved data to the file. |
double[] | euclideanDistance(double[] vec): Computes the Euclidean distance from the given feature vector to all documents. |
double | euclideanDistance(double[] vec, int docID): Computes the Euclidean distance of the given feature vector to the vector for the given ID. |
double[] | euclideanDistance(int docID): Computes the Euclidean distance from the given document to all other documents. |
QueryEntry | get(java.lang.Object v, boolean caseSensitive): Unsupported operation. |
java.lang.Object | getDefault(): Gets the default value for a feature vector, which is null. |
FieldInfo | getField(): Get the field info object for this field. |
java.lang.Object | getSavedData(int docID, boolean all): Gets the data saved for a particular document ID. |
ArrayGroup | getUndefined(ArrayGroup ag): Gets a group of all the documents that do not have any values saved for this field. |
DictionaryIterator | iterator(java.lang.Object lowerBound, boolean includeLower, java.lang.Object upperBound, boolean includeUpper): Gets an iterator for the values in this field. |
void | merge(java.lang.String path, SavedField[] fields, int maxID, int[] starts, int[] nUndel, int[][] docIDMaps, java.io.RandomAccessFile dictFile, PostingsOutput postOut): Merges a number of saved fields. |
int | size(): Gets the number of saved items that we're storing. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
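protected FieldInfo fi
The information for this field.
protected int[] idToFeat
A map from document IDs to the indices where feature vectors can be found in the stored features.
protected double[] features
The features stored during indexing.
protected int pos
The current position in the features array.
protected int width
The width of the feature vectors that we're storing.
protected static java.lang.String logTag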
Constructor Detail |
---|
public FeatureVector(FieldInfo fi)
Creates a FeatureVector that can be used to store data at indexing time.
public FeatureVector(FieldInfo field, java.io.RandomAccessFile dictFile, java.io.RandomAccessFile[] postFiles, DiskPartition part) throws java.io.IOException
Constructs a feature vector field that will be used to retrieve data during querying.
- Parameters:
field - The FieldInfo for this saved field.
dictFile - The file containing the dictionary for this field.
postFiles - The files containing the postings for this field.
part - The disk partition that this field is associated with.
- Throws:
java.io.IOException - if there is any error loading the field data.
Method Detail |
---|
public void add(int docID, java.lang.Object data)
Adds data to this saved field.
- Specified by:
add in interface SavedField
- Parameters:
docID - the document ID for the data we're adding.
data - the data to add. We assume that this is an array of double.
- Throws:
java.lang.ClassCastException - if data is not an array of double.
public void dump(java.lang.String path, java.io.RandomAccessFile dictFile, PostingsOutput[] postOut, int maxID) throws java.io.IOException
Dumps our saved data to the file.
- Specified by:
dump in interface SavedField
- Parameters:
path - The path of the index directory.
dictFile - The file where the dictionary will be written.
postOut - A place to write the postings associated with the values.
maxID - The maximum document ID for this partition.
- Throws:
java.io.IOException - if there is an error during the writing.
public QueryEntry get(java.lang.Object v, boolean caseSensitive)
Unsupported operation.
- Specified by:
get in interface SavedField
- Parameters:
v - The value to get.
caseSensitive - If true, case should be taken into account when iterating through the values. This value will only be observed for character fields!
- Returns:
null if that term doesn't occur in the indexed material.
public FieldInfo getField()
Description copied from interface: SavedField
Get the field info object for this field.
- Specified by:
getField in interface SavedField
public java.lang.Object getSavedData(int docID, boolean all)
Gets the data saved for a particular document ID.
null is returned.
- Specified by:
getSavedData in interface SavedField
- Parameters:
docID - the document whose data we want
all - if true a list containing the single stored value for the document will be returned.
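public ArrayGroup getUndefined(ArrayGroup ag)
Description copied from interface: SavedField
Gets a group of all the documents that do not have any values saved for this field.
- Specified by:
getUndefined in interface SavedField
- Parameters:
ag - a set of documents to which we should restrict the search for documents with undefined field values. If this is null then there is no such restriction.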
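The restriction rule for getUndefined can be illustrated with a small sketch. It is an illustration only: java.util.BitSet stands in for ArrayGroup, the class and method names are hypothetical, and a negative entry in the ID map is an assumed marker for "no vector stored".

```java
import java.util.BitSet;

/** Hypothetical sketch: find documents with no stored value. */
final class UndefinedDocsSketch {
    /**
     * @param idToFeat doc ID -> offset of its vector; negative means none (assumed sentinel)
     * @param restrict documents to consider, or null for no restriction
     * @return the documents in scope that have no stored vector
     */
    static BitSet undefined(int[] idToFeat, BitSet restrict) {
        BitSet result = new BitSet(idToFeat.length);
        for (int docID = 0; docID < idToFeat.length; docID++) {
            boolean inScope = restrict == null || restrict.get(docID);
            if (inScope && idToFeat[docID] < 0) {
                result.set(docID);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[] idToFeat = {0, -1, 3, -1};
        System.out.println(undefined(idToFeat, null)); // prints {1, 3}
    }
}
```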
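public DictionaryIterator iterator(java.lang.Object lowerBound, boolean includeLower, java.lang.Object upperBound, boolean includeUpper)
Description copied from interface: SavedField
Gets an iterator for the values in this field.
- Specified by:
iterator in interface SavedField
public int size()
Description copied from interface: SavedField
Gets the number of saved items that we're storing.
- Specified by:
size in interface SavedField
public int compareTo(java.lang.Object o)
- Specified by:
compareTo in interface java.lang.Comparable
public void clear()
Description copied from interface: SavedField
Clears a saved field, if it's open for indexing.
- Specified by:
clear in interface SavedField
public long bytesInUse()
public java.lang.Object getDefault()
Gets the default value for a feature vector, which is null.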
public double euclideanDistance(double[] vec, int docID)
Computes the Euclidean distance of the given feature vector to the vector for the given ID.
- Parameters:
vec - a feature vector
docID - the id of the document to which we want to compute the distance. If there is no data stored for this document, Double.POSITIVE_INFINITY is returned.
public double[] euclideanDistance(int docID)
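Computes the Euclidean distance from the given document to all other documents.
- Parameters:
docID - the document.
- Returns:
null is returned. If a document does not have data associated with it, the value for that document will be Double.POSITIVE_INFINITY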
public double[] euclideanDistance(double[] vec)
Computes the Euclidean distance from the given feature vector to all documents.
- Parameters:
vec - the feature vector to which we're going to compute similarity.
Double.POSITIVE_INFINITY
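The distance convention used by the euclideanDistance methods can be sketched as below. This is a minimal illustration under stated assumptions, not the class's code: the names are hypothetical, a null array stands for a document with no stored vector, and all vectors are assumed to share one width.

```java
import java.util.Arrays;

/** Hypothetical sketch of Euclidean distance with missing vectors. */
final class EuclideanSketch {
    /** Distance between two equal-width vectors; infinite if either is missing. */
    static double distance(double[] a, double[] b) {
        if (a == null || b == null) {
            return Double.POSITIVE_INFINITY;
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Distance from vec to every document; docs[i] is null if document i has no vector. */
    static double[] distanceToAll(double[] vec, double[][] docs) {
        double[] result = new double[docs.length];
        for (int i = 0; i < docs.length; i++) {
            result[i] = distance(vec, docs[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] docs = {{3.0, 4.0}, null, {0.0, 0.0}};
        // prints [5.0, Infinity, 0.0]
        System.out.println(Arrays.toString(distanceToAll(new double[] {0.0, 0.0}, docs)));
    }
}
```

Using positive infinity for missing data keeps such documents at the end of any ascending sort by distance.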
public void merge(java.lang.String path, SavedField[] fields, int maxID, int[] starts, int[] nUndel, int[][] docIDMaps, java.io.RandomAccessFile dictFile, PostingsOutput postOut) throws java.io.IOException
Description copied from interface: SavedField
Merges a number of saved fields.
- Specified by:
merge in interface SavedField
- Parameters:
path - The path to the index directory.
fields - An array of fields to merge.
maxID - The max doc ID in the new partition
starts - The new starting document IDs for the partitions.
nUndel - The number of undeleted documents in each partition
docIDMaps - A map for each partition from old document IDs to new document IDs. IDs that map to a value less than 0 have been deleted. A null array means that the old IDs are the new IDs.
dictFile - The file to which the merged dictionaries will be written.
postOut - The output to which the merged postings will be written.
- Throws:
java.io.IOException - if there is an error during the merge.
public double distance(int id1, FeatureVector v, int id2)
Gets the distance between two feature vectors stored in different partitions.
- Parameters:
id1 - the id of the document containing the vector in this partition
v - the saved field holding the vector for the other partition
id2 - the id of the document containing the vector in the other partition
- Returns:
Double.POSITIVE_INFINITY if either of the vectors is undefined for the given IDs.
public double distance(int d1, int d2)
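As a closing illustration, the docIDMaps convention documented for merge (a negative target means the document was deleted, a null map means IDs are unchanged) can be sketched as follows. The class name, method name, and DELETED constant are hypothetical, and the sketch covers only the per-partition map, not the starts offsets.

```java
/** Hypothetical sketch of the document ID remapping rule used during a merge. */
final class DocIdRemapSketch {
    static final int DELETED = -1;

    /**
     * @param docIDMap old ID -> new ID for one partition, or null if IDs are unchanged
     * @param oldID    a document ID in the old partition
     * @return the new ID, or DELETED if the document was deleted
     */
    static int remap(int[] docIDMap, int oldID) {
        if (docIDMap == null) {
            return oldID;                // a null map means old IDs are the new IDs
        }
        int newID = docIDMap[oldID];
        return newID < 0 ? DELETED : newID; // any negative value marks a deletion
    }

    public static void main(String[] args) {
        int[] map = {-1, 1, -1, 2};
        System.out.println(remap(map, 3));  // prints 2
        System.out.println(remap(map, 2));  // prints -1
        System.out.println(remap(null, 5)); // prints 5
    }
}
```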