FeatureVector (Minion Search Engine)

com.sun.labs.minion.indexer.dictionary
Class FeatureVector

java.lang.Object
  com.sun.labs.minion.indexer.dictionary.FeatureVector

All Implemented Interfaces: SavedField, java.lang.Comparable

public class FeatureVector
extends java.lang.Object
implements SavedField

A class that can be used to save feature vectors in an index. A feature vector is simply an array of doubles representing the features. The width of the feature vectors is determined by the first vector that is indexed. If a subsequent vector has a different width, a warning is issued.

Currently, this class will only store one feature vector per document.
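
The fields listed in the summary (features, idToFeat, pos, width) suggest a flat storage layout: all vectors packed back to back in one array, with a per-document offset. The following is a minimal sketch of that idea, not the Minion implementation. The class name FlatVectorStoreSketch, the use of -1 to mark a document with no vector, the growth policy, and the choice to reject a mismatched vector (returning false in place of the warning) are all assumptions made for illustration.

```java
import java.util.Arrays;

/** Hypothetical sketch of a flat feature-vector store; not the Minion class. */
final class FlatVectorStoreSketch {
    private double[] features = new double[16]; // all vectors, packed back to back
    private int[] idToFeat = new int[4];         // docID -> offset in features, -1 if none
    private int pos = 0;                         // next free slot in features
    private int width = -1;                      // fixed by the first vector added

    FlatVectorStoreSketch() {
        Arrays.fill(idToFeat, -1);
    }

    /** Stores vec for a non-negative docID; returns false on a width mismatch (assumed behaviour). */
    boolean add(int docID, double[] vec) {
        if (width < 0) {
            width = vec.length;                  // the first vector determines the width
        } else if (vec.length != width) {
            return false;                        // stand-in for the warning
        }
        if (docID >= idToFeat.length) {          // grow the ID map, marking new slots empty
            int oldLen = idToFeat.length;
            idToFeat = Arrays.copyOf(idToFeat, Math.max(docID + 1, oldLen * 2));
            Arrays.fill(idToFeat, oldLen, idToFeat.length, -1);
        }
        if (pos + width > features.length) {     // grow the feature storage
            features = Arrays.copyOf(features, Math.max(pos + width, features.length * 2));
        }
        System.arraycopy(vec, 0, features, pos, width);
        idToFeat[docID] = pos;
        pos += width;
        return true;
    }

    /** Returns a copy of the stored vector, or null if nothing was stored for docID. */
    double[] get(int docID) {
        if (docID < 0 || docID >= idToFeat.length || idToFeat[docID] < 0) {
            return null;
        }
        int start = idToFeat[docID];
        return Arrays.copyOfRange(features, start, start + width);
    }

    public static void main(String[] args) {
        FlatVectorStoreSketch store = new FlatVectorStoreSketch();
        store.add(1, new double[] {0.5, 1.5});
        System.out.println(Arrays.toString(store.get(1))); // prints [0.5, 1.5]
    }
}
```

Packing the vectors into one array keeps lookups to a single offset read, which is one plausible reason for the one-vector-per-document restriction.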

Field Summary
`protected double[]`	`features` The features stored during indexing.
`protected FieldInfo`	`fi` The information for this field.
`protected int[]`	`idToFeat` A map from document IDs to the indices where feature vectors can be found in the stored features.
`protected static java.lang.String`	`logTag`
`protected int`	`pos` The current position in the features array.
`protected int`	`width` The width of the feature vectors that we're storing.

Constructor Summary
`FeatureVector(FieldInfo fi)` Creates a `FeatureVector` that can be used to store data at indexing time.
`FeatureVector(FieldInfo field, java.io.RandomAccessFile dictFile, java.io.RandomAccessFile[] postFiles, DiskPartition part)` Constructs a feature vector field that will be used to retrieve data during querying.

Method Summary
`void`	`add(int docID, java.lang.Object data)` Adds data to this saved field.
`long`	`bytesInUse()`
`void`	`clear()` Clears a saved field, if it's open for indexing.
`int`	`compareTo(java.lang.Object o)`
`double`	`distance(int id1, FeatureVector v, int id2)` Gets the distance between two feature vectors stored in different partitions.
`double`	`distance(int d1, int d2)`
`void`	`dump(java.lang.String path, java.io.RandomAccessFile dictFile, PostingsOutput[] postOut, int maxID)` Dumps our saved data to the file.
`double[]`	`euclideanDistance(double[] vec)` Computes the Euclidean distance from the given feature vector to all documents.
`double`	`euclideanDistance(double[] vec, int docID)` Computes the Euclidean distance of the given feature vector to the vector for the given ID.
`double[]`	`euclideanDistance(int docID)` Computes the Euclidean distance from the given document to all other documents.
`QueryEntry`	`get(java.lang.Object v, boolean caseSensitive)` Unsupported operation.
`java.lang.Object`	`getDefault()` Gets the default value for a feature vector, which is `null`
`FieldInfo`	`getField()` Get the field info object for this field.
`java.lang.Object`	`getSavedData(int docID, boolean all)` Gets the data saved for a particular document ID.
`ArrayGroup`	`getUndefined(ArrayGroup ag)` Gets a group of all the documents that do not have any values saved for this field.
`DictionaryIterator`	`iterator(java.lang.Object lowerBound, boolean includeLower, java.lang.Object upperBound, boolean includeUpper)` Gets an iterator for the values in this field.
`void`	`merge(java.lang.String path, SavedField[] fields, int maxID, int[] starts, int[] nUndel, int[][] docIDMaps, java.io.RandomAccessFile dictFile, PostingsOutput postOut)` Merges a number of saved fields.
`int`	`size()` Gets the number of saved items that we're storing.

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Field Detail

fi

protected FieldInfo fi

The information for this field.

idToFeat

protected int[] idToFeat

A map from document IDs to the indices where feature vectors can be found in the stored features.

features

protected double[] features

The features stored during indexing.

pos

protected int pos

The current position in the features array.

width

protected int width

The width of the feature vectors that we're storing.

logTag

protected static java.lang.String logTag

Constructor Detail

FeatureVector

public FeatureVector(FieldInfo fi)

Creates a FeatureVector that can be used to store data at indexing time.

FeatureVector

public FeatureVector(FieldInfo field,
                     java.io.RandomAccessFile dictFile,
                     java.io.RandomAccessFile[] postFiles,
                     DiskPartition part)
              throws java.io.IOException

Constructs a feature vector field that will be used to retrieve data during querying.

Parameters:
field - The FieldInfo for this saved field.
dictFile - The file containing the dictionary for this field.
postFiles - The files containing the postings for this field.
part - The disk partition that this field is associated with.
Throws:
java.io.IOException - if there is any error loading the field data.

Method Detail

add

public void add(int docID,
                java.lang.Object data)

Adds data to this saved field. Assumes that data is an array of double.

Specified by:
add in interface SavedField



Parameters:
docID - the document ID for the data we're adding.
data - the data to add. We assume that this is an array of double
Throws:
java.lang.ClassCastException - if data is not an array of double.
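
Because add takes an untyped Object, the ClassCastException comes from the cast to double[]. The sketch below illustrates standard Java casting behaviour only. The class and method names are hypothetical and it does not reproduce the actual body of add.

```java
/** Hypothetical illustration of casting an untyped payload to double[]. */
final class AddCastSketch {
    /** Returns data as a double[]; throws ClassCastException for any other non-null type. */
    static double[] asVector(Object data) {
        return (double[]) data; // a null reference passes the cast unchanged
    }

    public static void main(String[] args) {
        System.out.println(asVector(new double[] {1.0, 2.0}).length); // prints 2
        try {
            asVector(new Double[] {1.0, 2.0}); // boxed array is not a double[]
        } catch (ClassCastException e) {
            System.out.println("rejected Double[]");
        }
    }
}
```

Callers holding a Double[] or a List of Double would need to convert to a primitive double[] before calling add.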





dump
public void dump(java.lang.String path,
                 java.io.RandomAccessFile dictFile,
                 PostingsOutput[] postOut,
                 int maxID)
          throws java.io.IOException

Dumps our saved data to the file.  We won't actually store anything in the
 postings file, we'll just dump everything to the dictionary file.


Specified by:
dump in interface SavedField


Parameters:
path - The path of the index directory.
dictFile - The file where the dictionary will be written.
postOut - A place to write the postings associated with the
 values.
maxID - The maximum document ID for this partition.
Throws:
java.io.IOException - if there is an error during the
 writing.





get
public QueryEntry get(java.lang.Object v,
                      boolean caseSensitive)

Unsupported operation.


Specified by:
get in interface SavedField


Parameters:
v - The value to get.
caseSensitive - If true, case should be taken into account when
 iterating through the values.  This value will only be observed for
 character fields!
Returns:
The term associated with that name, or null if
 that term doesn't occur in the indexed material.





getField
public FieldInfo getField()

Description copied from interface: SavedField
Get the field info object for this field.


Specified by:
getField in interface SavedField



Returns:
the FieldInfo





getSavedData
public java.lang.Object getSavedData(int docID,
                                     boolean all)

Gets the data saved for a particular document ID.  If no data
 was stored for that ID, null is returned.


Specified by:
getSavedData in interface SavedField


Parameters:
docID - the document whose data we want
all - if true, a list containing the single stored value
 for the document will be returned.
Returns:
the data
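
The effect of the all flag on the return shape can be sketched as follows. This is an illustration under assumptions, not the Minion code: the class name SavedDataSketch is hypothetical, and returning null (rather than an empty list) when all is true and nothing is stored is a guess based on the description above.

```java
import java.util.Collections;

/** Hypothetical sketch of the two return shapes described for getSavedData. */
final class SavedDataSketch {
    /** stored is the vector held for a document, or null if there is none. */
    static Object savedData(double[] stored, boolean all) {
        if (stored == null) {
            return null;                             // assumed for both values of all
        }
        return all ? Collections.singletonList(stored) : stored;
    }

    public static void main(String[] args) {
        double[] v = {1.0, 2.0};
        System.out.println(savedData(v, false) instanceof double[]);       // prints true
        System.out.println(savedData(v, true) instanceof java.util.List);  // prints true
    }
}
```

Since the declared return type is java.lang.Object, callers have to check which shape they received before casting.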





getUndefined
public ArrayGroup getUndefined(ArrayGroup ag)

Description copied from interface: SavedField
Gets a group of all the documents that do not have any values saved for
 this field.


Specified by:
getUndefined in interface SavedField


Parameters:
ag - a set of documents to which we should restrict the search for
 documents with undefined field values.  If this is null then
 there is no such restriction.
Returns:
a set of documents that have no defined values for this field.
 This set may be restricted to documents occurring in the group that was
 passed in.





iterator
public DictionaryIterator iterator(java.lang.Object lowerBound,
                                   boolean includeLower,
                                   java.lang.Object upperBound,
                                   boolean includeUpper)

Description copied from interface: SavedField
Gets an iterator for the values in this field.


Specified by:
iterator in interface SavedField








size
public int size()

Description copied from interface: SavedField
Gets the number of saved items that we're storing.


Specified by:
size in interface SavedField








compareTo
public int compareTo(java.lang.Object o)


Specified by:
compareTo in interface java.lang.Comparable








clear
public void clear()

Description copied from interface: SavedField
Clears a saved field, if it's open for indexing.


Specified by:
clear in interface SavedField








bytesInUse
public long bytesInUse()











getDefault
public java.lang.Object getDefault()

Gets the default value for a feature vector, which is null











euclideanDistance
public double euclideanDistance(double[] vec,
                                int docID)

Computes the Euclidean distance of the given feature vector to the vector
 for the given ID.





Parameters:
vec - a feature vector
docID - the id of the document to which we want to compute the
 distance.  If there is no data stored for this document,
 Double.POSITIVE_INFINITY is returned.
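
The computation described here can be sketched as a standard Euclidean distance with a sentinel for missing data. The class name EuclideanSketch is hypothetical, and representing a document with no stored data as a null array is an assumption. Whether the real method takes the square root or checks for mismatched lengths is not stated in this page.

```java
/** Hypothetical sketch of a Euclidean distance with a missing-data sentinel. */
final class EuclideanSketch {
    /** Returns the Euclidean distance, or POSITIVE_INFINITY if either vector is missing. */
    static double distance(double[] a, double[] b) {
        if (a == null || b == null) {
            return Double.POSITIVE_INFINITY;
        }
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same width");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;                 // accumulate squared differences
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        System.out.println(distance(new double[] {0.0, 0.0}, new double[] {3.0, 4.0})); // prints 5.0
        System.out.println(distance(new double[] {0.0, 0.0}, null));                    // prints Infinity
    }
}
```

Using positive infinity for missing data means such documents sort last when results are ordered by ascending distance.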





euclideanDistance
public double[] euclideanDistance(int docID)

Computes the Euclidean distance from the given document to all other
 documents.





Parameters:
docID - the document.
Returns:
an array of double, indexed by document ID.  If there is no data
 associated with the document that we were given, null is
 returned.  If a document does not have data associated with it, the
 value for that document will be Double.POSITIVE_INFINITY.
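
The return contract above (null for a source document with no data, positive infinity for each target document with no data) can be sketched as follows. The class name AllDistancesSketch is hypothetical. Holding the vectors in a double[][] indexed by document ID, with null for documents that have no vector, is an assumption made to keep the example self-contained.

```java
/** Hypothetical sketch of the one-to-all distance contract. */
final class AllDistancesSketch {
    /** vectors[id] is the vector for document id, or null when the document has none. */
    static double[] distancesFrom(int docID, double[][] vectors) {
        double[] source = vectors[docID];
        if (source == null) {
            return null;                               // nothing stored for the source document
        }
        double[] out = new double[vectors.length];
        for (int id = 0; id < vectors.length; id++) {
            double[] other = vectors[id];
            if (other == null) {
                out[id] = Double.POSITIVE_INFINITY;    // target has no data
                continue;
            }
            double sum = 0.0;
            for (int i = 0; i < source.length; i++) {  // assumes equal widths
                double d = source[i] - other[i];
                sum += d * d;
            }
            out[id] = Math.sqrt(sum);
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] vectors = { {0.0, 0.0}, null, {3.0, 4.0} };
        System.out.println(java.util.Arrays.toString(distancesFrom(0, vectors)));
    }
}
```

Note that the entry for the source document itself is 0.0, so a caller ranking neighbours would typically skip that index.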





euclideanDistance
public double[] euclideanDistance(double[] vec)

Computes the Euclidean distance from the given feature vector to all
 documents.





Parameters:
vec - the feature vector from which we're going to compute distances.
Returns:
an array of double, indexed by document ID.  If a document does
 not have data associated with it, the value for that document will be
 Double.POSITIVE_INFINITY.





merge
public void merge(java.lang.String path,
                  SavedField[] fields,
                  int maxID,
                  int[] starts,
                  int[] nUndel,
                  int[][] docIDMaps,
                  java.io.RandomAccessFile dictFile,
                  PostingsOutput postOut)
           throws java.io.IOException

Description copied from interface: SavedField
Merges a number of saved fields.


Specified by:
merge in interface SavedField


Parameters:
path - The path to the index directory.
fields - An array of fields to merge.
maxID - The max doc ID in the new partition
starts - The new starting document IDs for the partitions.
nUndel - The number of undeleted documents in each partition
docIDMaps - A map for each partition from old document IDs to
 new document IDs.  IDs that map to a value less than 0 have been
 deleted.  A null array means that the old IDs are the new IDs.
dictFile - The file to which the merged dictionaries will be
 written.
postOut - The output to which the merged postings will be
 written.
Throws:
java.io.IOException - if there is an error during the merge.
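
The docIDMaps convention described above (a negative entry means the document was deleted, a null map means IDs are unchanged) can be sketched as a small lookup. The class and method names are hypothetical, and the sketch covers only the mapping rule. It does not attempt to show how starts or nUndel are used, since that is not specified here.

```java
/** Hypothetical sketch of the old-to-new document ID mapping rule. */
final class DocIdMapSketch {
    /** Returns the new ID for oldID, or -1 if the document has been deleted. */
    static int remap(int[] docIDMap, int oldID) {
        if (docIDMap == null) {
            return oldID;                 // null map: the old IDs are the new IDs
        }
        int newID = docIDMap[oldID];
        return newID < 0 ? -1 : newID;    // any negative entry marks a deleted document
    }

    public static void main(String[] args) {
        int[] map = {0, 3, -1};
        System.out.println(remap(map, 1));  // prints 3
        System.out.println(remap(map, 2));  // prints -1
        System.out.println(remap(null, 7)); // prints 7
    }
}
```

During a merge, a vector whose old ID maps to a negative value would simply be left out of the merged field.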





distance
public double distance(int id1,
                       FeatureVector v,
                       int id2)

Gets the distance between two feature vectors stored in different
 partitions.





Parameters:
id1 - the id of the document containing the vector in this partition
v - the saved field holding the vector for the other partition
id2 - the id of the document containing the vector in the other partition
Returns:
the distance between the feature vectors, or Double.POSITIVE_INFINITY
 if either of the vectors is undefined for the given IDs.





distance
public double distance(int d1,
                       int d2)




















  