java.lang.Object
  com.sun.labs.minion.retrieval.DocumentVectorImpl
public class DocumentVectorImpl
A class that holds a weighted document vector for a given document from a given partition. This implementation is meant to handle features from either the entire document or a single vectored field.
See Also: an implementation that can handle features from multiple vectored fields, Serialized Form

| Field Summary | |
|---|---|
| protected SearchEngine | e - The search engine that generated this vector. |
| protected java.lang.String | field |
| protected int | fieldID |
| protected int[] | fields - The field from which this document vector was generated. |
| protected StopWords | ignoreWords |
| protected DocKeyEntry | key - The document key for this entry. |
| protected java.lang.String | keyName - The name of the key, which will survive transport. |
| protected float | length - The length of this document vector. |
| protected static java.lang.String | logTag |
| protected boolean | normalized - Whether we've been normalized. |
| protected QueryStats | qs |
| protected WeightedFeature[] | v - An array to hold the features that make up our vector. |
| protected WeightingComponents | wc - A set of weighting components that can be used when calculating term weights. |
| protected WeightingFunction | wf - The weighting function to use for computing term weights. |

| Constructor Summary | |
|---|---|
| protected | DocumentVectorImpl() |
| public | DocumentVectorImpl(ResultImpl r) - Creates a document vector from a search result. |
| public | DocumentVectorImpl(ResultImpl r, java.lang.String field) - Creates a document vector for a particular field from a search result. |
| public | DocumentVectorImpl(SearchEngine e, DocKeyEntry key, java.lang.String field) - Creates a document vector for a given document. |
| public | DocumentVectorImpl(SearchEngine e, DocKeyEntry key, java.lang.String field, WeightingFunction wf, WeightingComponents wc) |
| public | DocumentVectorImpl(SearchEngine e, WeightedFeature[] basisFeatures) |

| Method Summary | |
|---|---|
| DocumentVector | copy() - Creates a copy of the current document vector and returns it. |
| float | dot(DocumentVectorImpl dvi) - Calculates the dot product of this document vector with another. |
| float | dot(WeightedFeature[] wfv) - Calculates the dot product of this feature vector and another feature vector. |
| boolean | equals(java.lang.Object dv) - Two document vectors are equal if all their weighted features are equal (in both name and weight). |
| ResultSet | findSimilar() - Finds documents that are similar to this one. |
| ResultSet | findSimilar(java.lang.String sortOrder) - Finds documents that are similar to this one. |
| ResultSet | findSimilar(java.lang.String sortOrder, double skimPercent) - Finds documents that are similar to this one. |
| SearchEngine | getEngine() |
| DocKeyEntry | getEntry() |
| WeightedFeature[] | getFeatures() |
| java.lang.String | getKey() - Gets the key for the document associated with this vector. |
| java.util.SortedSet | getSet() - Gets a sorted set of features. |
| float | getSimilarity(DocumentVector otherVector) - Computes the similarity between this document vector and the supplied vector. |
| float | getSimilarity(DocumentVectorImpl otherVector) |
| java.util.Map<java.lang.String,java.lang.Float> | getSimilarityTerms(DocumentVector dv) - Gets a map of term names to weights, where each weight represents the amount the term contributed to the similarity of the two documents. |
| java.util.SortedSet | getSimilarityTerms(DocumentVectorImpl dvi) - Gets a set, sorted by weight, of the terms contributing to document similarity with the provided document. |
| java.util.Set<java.lang.String> | getTerms() - Gets the set of terms in the document represented by this vector. |
| java.util.Map<java.lang.String,java.lang.Float> | getTopWeightedTerms(int nTerms) - Gets the n terms that have the highest document weight in this document vector. |
| java.util.SortedSet | getWeightOrderedSet() |
| float | length() - Gets the Euclidean length of this vector. |
| void | normalize() - Normalizes the length of this vector to 1. |
| void | setEngine(SearchEngine e) - Sets the search engine that this vector will use, which is useful when the vector has been deserialized and needs to be restored to a working state. |
| void | setField(java.lang.String field) |
| java.lang.String | toString() |

| Methods inherited from class java.lang.Object |
|---|
clone, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
| Field Detail |
|---|
protected transient SearchEngine e
protected transient DocKeyEntry key
protected java.lang.String keyName
protected transient int[] fields
protected transient WeightingFunction wf
protected transient WeightingComponents wc
protected WeightedFeature[] v
protected float length
protected boolean normalized
protected QueryStats qs
protected static java.lang.String logTag
protected transient StopWords ignoreWords
protected java.lang.String field
protected int fieldID
| Constructor Detail |
|---|
protected DocumentVectorImpl()
public DocumentVectorImpl(ResultImpl r)
r - The search result for which we want a document vector.
public DocumentVectorImpl(ResultImpl r,
java.lang.String field)
r - The search result for which we want a document vector.
field - The name of the field for which we want the document vector. If this value is null, a vector for the whole document will be returned. If the named field was not indexed with the vectored attribute set, the resulting document vector will be empty.
public DocumentVectorImpl(SearchEngine e,
WeightedFeature[] basisFeatures)
public DocumentVectorImpl(SearchEngine e,
DocKeyEntry key,
java.lang.String field)
e - The search engine with which the document is associated.
key - The entry from the document dictionary for the given document.
field - The name of the field for which we want the document vector. If this value is null, a vector for the whole document will be returned. If this value is the empty string, a vector for the text not in any defined field will be returned. If the named field was not indexed with the vectored attribute set, the resulting document vector will be empty.
public DocumentVectorImpl(SearchEngine e,
DocKeyEntry key,
java.lang.String field,
WeightingFunction wf,
WeightingComponents wc)
| Method Detail |
|---|
public DocumentVector copy()
copy in interface DocumentVector
public WeightedFeature[] getFeatures()
public void setEngine(SearchEngine e)
setEngine in interface DocumentVector
e - the engine to use
public DocKeyEntry getEntry()
public SearchEngine getEngine()
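The field list marks e, key, wf and wc as transient, and setEngine is described as useful after a vector has been deserialized. The sketch below illustrates that general Java behaviour only: a transient field comes back as null after a serialization round trip and has to be set again. The Engine and DocVec classes are hypothetical stand-ins, not Minion types.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientDemo {

    /** Hypothetical engine; deliberately not Serializable. */
    static final class Engine {
        final String name;
        Engine(String name) { this.name = name; }
    }

    /** Hypothetical vector: the key name is serialized, the engine is not. */
    static final class DocVec implements Serializable {
        private static final long serialVersionUID = 1L;
        final String keyName;
        transient Engine engine;

        DocVec(String keyName, Engine engine) {
            this.keyName = keyName;
            this.engine = engine;
        }

        void setEngine(Engine e) { this.engine = e; }
    }

    /** Serializes the vector to a byte array and reads it back. */
    static DocVec roundTrip(DocVec v) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(v);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (DocVec) in.readObject();
            }
        } catch (IOException | ClassNotFoundException ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        DocVec copy = roundTrip(new DocVec("doc-1", new Engine("main")));
        System.out.println(copy.keyName);        // doc-1
        System.out.println(copy.engine == null); // true: transient fields are not restored
        copy.setEngine(new Engine("main"));      // reattach before using the vector
        System.out.println(copy.engine.name);    // main
    }
}
```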
public float dot(DocumentVectorImpl dvi)
dvi - another document vector
public float dot(WeightedFeature[] wfv)
wfv - a weighted feature vector
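The two dot methods above are documented only by their signatures. As a minimal sketch of one way to compute a dot product over sparse weighted features, the code below assumes both arrays are sorted by feature name (this page does not say how features are ordered) and walks them in a single merge pass. Feature and SparseDot are hypothetical names, not Minion's WeightedFeature.

```java
public class SparseDot {

    /** Hypothetical stand-in for a weighted feature: a term name and its weight. */
    static final class Feature {
        final String name;
        final float weight;
        Feature(String name, float weight) {
            this.name = name;
            this.weight = weight;
        }
    }

    /** Dot product of two feature arrays, each assumed sorted by name. */
    static float dot(Feature[] a, Feature[] b) {
        float sum = 0f;
        int i = 0;
        int j = 0;
        while (i < a.length && j < b.length) {
            int cmp = a[i].name.compareTo(b[j].name);
            if (cmp == 0) {
                // Only features present in both vectors contribute.
                sum += a[i].weight * b[j].weight;
                i++;
                j++;
            } else if (cmp < 0) {
                i++;
            } else {
                j++;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        Feature[] a = { new Feature("apple", 1f), new Feature("cat", 2f), new Feature("dog", 3f) };
        Feature[] b = { new Feature("cat", 4f), new Feature("dog", 1f), new Feature("emu", 5f) };
        System.out.println(dot(a, b)); // 2*4 + 3*1 = 11.0
    }
}
```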
public boolean equals(java.lang.Object dv)
equals in interface DocumentVector
equals in class java.lang.Object
dv - the document vector to compare this one to
public java.util.Set<java.lang.String> getTerms()
getTerms in interface DocumentVector
public java.util.Map<java.lang.String,java.lang.Float> getSimilarityTerms(DocumentVector dv)
getSimilarityTerms in interface DocumentVector
dv - the document vector to compare this one to
public java.util.SortedSet getSimilarityTerms(DocumentVectorImpl dvi)
dvi - the document to compare this one to
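getSimilarityTerms is described as returning the amount each term contributed to the similarity of two documents. The sketch below assumes that a term's contribution is the product of its weights in the two vectors, as in a dot product; this page does not confirm that definition. SimilarityTerms and contributions are hypothetical names, and plain maps stand in for document vectors.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SimilarityTerms {

    /**
     * Returns the terms shared by both vectors mapped to the product of their
     * weights, ordered from largest to smallest contribution.
     */
    static Map<String, Float> contributions(Map<String, Float> a, Map<String, Float> b) {
        List<Map.Entry<String, Float>> shared = new ArrayList<>();
        for (Map.Entry<String, Float> e : a.entrySet()) {
            Float other = b.get(e.getKey());
            if (other != null) {
                shared.add(new AbstractMap.SimpleEntry<String, Float>(
                        e.getKey(), e.getValue() * other));
            }
        }
        shared.sort(Map.Entry.<String, Float>comparingByValue().reversed());

        // LinkedHashMap preserves the sorted order for callers.
        Map<String, Float> result = new LinkedHashMap<>();
        for (Map.Entry<String, Float> e : shared) {
            result.put(e.getKey(), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Float> a = new HashMap<>();
        a.put("cat", 2f);
        a.put("dog", 1f);
        a.put("emu", 4f);
        Map<String, Float> b = new HashMap<>();
        b.put("cat", 3f);
        b.put("dog", 5f);
        b.put("fox", 1f);
        System.out.println(contributions(a, b)); // {cat=6.0, dog=5.0}
    }
}
```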
public void normalize()
public float length()
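normalize() and length() are described in the summary as scaling the vector to length 1 and returning its Euclidean length. A minimal sketch of both operations on a plain array of weights follows. VectorLength is a hypothetical name, and the zero-length guard is an assumption, since this page does not say how an empty vector is handled.

```java
import java.util.Arrays;

public class VectorLength {

    /** Euclidean length: the square root of the sum of squared weights. */
    static float length(float[] weights) {
        double sumSq = 0.0;
        for (float w : weights) {
            sumSq += (double) w * w;
        }
        return (float) Math.sqrt(sumSq);
    }

    /** Scales the weights in place so that the vector has length 1. */
    static void normalize(float[] weights) {
        float len = length(weights);
        if (len == 0f) {
            return; // assumption: leave an all-zero vector unchanged
        }
        for (int i = 0; i < weights.length; i++) {
            weights[i] /= len;
        }
    }

    public static void main(String[] args) {
        float[] w = { 3f, 4f };
        System.out.println(length(w));          // 5.0
        normalize(w);
        System.out.println(Arrays.toString(w)); // [0.6, 0.8]
    }
}
```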
public java.util.SortedSet getSet()
public java.util.SortedSet getWeightOrderedSet()
public float getSimilarity(DocumentVector otherVector)
getSimilarity in interface DocumentVector
otherVector - the vector representing the document to compare this vector to
public float getSimilarity(DocumentVectorImpl otherVector)
public ResultSet findSimilar()
findSimilar in interface DocumentVector
public ResultSet findSimilar(java.lang.String sortOrder)
findSimilar in interface DocumentVector
sortOrder - a string describing the order in which to sort the results
public ResultSet findSimilar(java.lang.String sortOrder,
double skimPercent)
findSimilar in interface DocumentVector
sortOrder - a string describing the order in which to sort the results
skimPercent - a number between 0 and 1 giving the fraction of the features that should be used to perform findSimilar
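The skimPercent parameter limits how many features take part in findSimilar. This page does not say which features are kept, so the sketch below assumes the highest-weighted ones are retained and that the count is rounded up. FeatureSkim and skim are hypothetical names, not part of the Minion API.

```java
import java.util.Arrays;

public class FeatureSkim {

    /**
     * Returns the top fraction of weights, highest first.
     * Assumption: skimming keeps the heaviest features, rounding the count up.
     */
    static float[] skim(float[] weights, double skimPercent) {
        if (skimPercent < 0.0 || skimPercent > 1.0) {
            throw new IllegalArgumentException("skimPercent must be between 0 and 1");
        }
        float[] sorted = weights.clone();
        Arrays.sort(sorted); // ascending order
        int keep = (int) Math.ceil(skimPercent * sorted.length);
        float[] top = new float[keep];
        for (int i = 0; i < keep; i++) {
            top[i] = sorted[sorted.length - 1 - i]; // read from the heavy end
        }
        return top;
    }

    public static void main(String[] args) {
        float[] weights = { 0.1f, 0.9f, 0.4f, 0.7f };
        System.out.println(Arrays.toString(skim(weights, 0.5))); // [0.9, 0.7]
    }
}
```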
public java.util.Map<java.lang.String,java.lang.Float> getTopWeightedTerms(int nTerms)
getTopWeightedTerms in interface DocumentVector
nTerms - the number of terms to return
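getTopWeightedTerms returns the n terms with the highest weight as a map from term to weight. One way to do this over a plain map of weights is sketched below. TopTerms and topWeightedTerms are hypothetical names, and returning the entries in descending weight order is an assumption rather than documented behaviour.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class TopTerms {

    /** Returns at most nTerms entries with the highest weights, heaviest first. */
    static Map<String, Float> topWeightedTerms(Map<String, Float> weights, int nTerms) {
        return weights.entrySet().stream()
                .sorted(Map.Entry.<String, Float>comparingByValue().reversed())
                .limit(Math.max(0, nTerms))
                // LinkedHashMap keeps the descending order for callers.
                .collect(Collectors.toMap(
                        Map.Entry::getKey,
                        Map.Entry::getValue,
                        (x, y) -> x,
                        LinkedHashMap::new));
    }

    public static void main(String[] args) {
        Map<String, Float> weights = new HashMap<>();
        weights.put("cat", 0.9f);
        weights.put("dog", 0.1f);
        weights.put("emu", 0.5f);
        System.out.println(topWeightedTerms(weights, 2)); // {cat=0.9, emu=0.5}
    }
}
```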
public java.lang.String getKey()
getKey in interface DocumentVector
public java.lang.String toString()
toString in class java.lang.Object
public void setField(java.lang.String field)