java.lang.Object
  com.sun.labs.minion.retrieval.DocumentVectorImpl
public class DocumentVectorImpl
A class that holds a weighted document vector for a given document from a given partition. This implementation is meant to handle features from either the entire document or a single vectored field.
A separate implementation handles features from multiple vectored fields.
See Also: Serialized Form
Field Summary | |
---|---|
protected SearchEngine |
e
The search engine that generated this vector. |
protected java.lang.String |
field
|
protected int |
fieldID
|
protected int[] |
fields
The field from which this document vector was generated. |
protected StopWords |
ignoreWords
|
protected DocKeyEntry |
key
The document key for this entry. |
protected java.lang.String |
keyName
The name of the key, which will survive transport. |
protected float |
length
The length of this document vector. |
protected static java.lang.String |
logTag
|
protected boolean |
normalized
Whether we've been normalized. |
protected QueryStats |
qs
|
protected WeightedFeature[] |
v
An array to hold the features that make up our vector. |
protected WeightingComponents |
wc
A set of weighting components that can be used when calculating term weights. |
protected WeightingFunction |
wf
The weighting function to use for computing term weights. |
Constructor Summary | |
---|---|
protected |
DocumentVectorImpl()
|
|
DocumentVectorImpl(ResultImpl r)
Creates a document vector from a search result. |
|
DocumentVectorImpl(ResultImpl r,
java.lang.String field)
Creates a document vector for a particular field from a search result. |
|
DocumentVectorImpl(SearchEngine e,
DocKeyEntry key,
java.lang.String field)
Creates a document vector for a given document. |
|
DocumentVectorImpl(SearchEngine e,
DocKeyEntry key,
java.lang.String field,
WeightingFunction wf,
WeightingComponents wc)
|
|
DocumentVectorImpl(SearchEngine e,
WeightedFeature[] basisFeatures)
|
Method Summary | |
---|---|
DocumentVector |
copy()
Creates a copy of the current document vector and returns it. |
float |
dot(DocumentVectorImpl dvi)
Calculates the dot product of this document vector with another. |
float |
dot(WeightedFeature[] wfv)
Calculates the dot product of this feature vector and another feature vector. |
boolean |
equals(java.lang.Object dv)
Two document vectors are equal if all of their weighted features are equal (in both name and weight). |
ResultSet |
findSimilar()
Finds similar documents to this one. |
ResultSet |
findSimilar(java.lang.String sortOrder)
Finds documents that are similar to this one. |
ResultSet |
findSimilar(java.lang.String sortOrder,
double skimPercent)
Finds similar documents to this one. |
SearchEngine |
getEngine()
|
DocKeyEntry |
getEntry()
|
WeightedFeature[] |
getFeatures()
|
java.lang.String |
getKey()
Gets the key for the document associated with this vector. |
java.util.SortedSet |
getSet()
Gets a sorted set of features. |
float |
getSimilarity(DocumentVector otherVector)
Computes the similarity between this document vector and the supplied vector. |
float |
getSimilarity(DocumentVectorImpl otherVector)
|
java.util.Map<java.lang.String,java.lang.Float> |
getSimilarityTerms(DocumentVector dv)
Gets a map of term names to weights, where the weights represent the amount the term contributed to the similarity of the two documents. |
java.util.SortedSet |
getSimilarityTerms(DocumentVectorImpl dvi)
Gets a sorted (by weight) set of the terms contributing to document similarity with the provided document. |
java.util.Set<java.lang.String> |
getTerms()
Gets the set of terms in the document represented by this vector. |
java.util.Map<java.lang.String,java.lang.Float> |
getTopWeightedTerms(int nTerms)
Gets the n terms that have the highest document weight in this document vector. |
java.util.SortedSet |
getWeightOrderedSet()
|
float |
length()
Gets the Euclidean length of this vector. |
void |
normalize()
Normalizes the length of this vector to 1. |
void |
setEngine(SearchEngine e)
Sets the search engine that this vector will use. This is useful after the vector has been deserialized and needs its transient state restored. |
void |
setField(java.lang.String field)
|
java.lang.String |
toString()
|
Methods inherited from class java.lang.Object |
---|
clone, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Field Detail |
---|
protected transient SearchEngine e
protected transient DocKeyEntry key
protected java.lang.String keyName
protected transient int[] fields
protected transient WeightingFunction wf
protected transient WeightingComponents wc
protected WeightedFeature[] v
protected float length
protected boolean normalized
protected QueryStats qs
protected static java.lang.String logTag
protected transient StopWords ignoreWords
protected java.lang.String field
protected int fieldID
Constructor Detail |
---|
protected DocumentVectorImpl()
public DocumentVectorImpl(ResultImpl r)
r
- The search result for which we want a document vector.
public DocumentVectorImpl(ResultImpl r, java.lang.String field)
r
- The search result for which we want a document vector.
field
- The name of the field for which we want the document vector. If this value is null, a vector for the whole document will be returned. If the named field was not indexed with the vectored attribute set, the resulting document vector will be empty.
public DocumentVectorImpl(SearchEngine e, WeightedFeature[] basisFeatures)
public DocumentVectorImpl(SearchEngine e, DocKeyEntry key, java.lang.String field)
e
- The search engine with which the document is associated.
key
- The entry from the document dictionary for the given document.
field
- The name of the field for which we want the document vector. If this value is null, a vector for the whole document will be returned. If this value is the empty string, a vector for the text not in any defined field will be returned. If the named field was not indexed with the vectored attribute set, the resulting document vector will be empty.
public DocumentVectorImpl(SearchEngine e, DocKeyEntry key, java.lang.String field, WeightingFunction wf, WeightingComponents wc)
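The field parameter documented above selects what the vector covers: null means the whole document, the empty string means text outside any defined field, and any other value names a single field. The following is a minimal sketch of that three-way rule only. The class FieldScopeSketch, the Scope enum, and scopeFor are hypothetical names for illustration and are not part of this API; note also that the empty-string case is documented only for the SearchEngine-based constructor.

```java
/** Sketch only: mirrors the documented meaning of the field argument. */
final class FieldScopeSketch {

    /** Hypothetical labels for the three documented cases. */
    enum Scope { WHOLE_DOCUMENT, UNFIELDED_TEXT, NAMED_FIELD }

    /** Classifies a field argument the way the constructor documentation describes it. */
    static Scope scopeFor(String field) {
        if (field == null) {
            return Scope.WHOLE_DOCUMENT;   // null: vector for the whole document
        }
        if (field.isEmpty()) {
            return Scope.UNFIELDED_TEXT;   // "": text not in any defined field
        }
        return Scope.NAMED_FIELD;          // otherwise: the named vectored field
    }

    public static void main(String[] args) {
        System.out.println(scopeFor(null));
        System.out.println(scopeFor(""));
        System.out.println(scopeFor("title"));
    }
}
```

A named field that was not indexed as vectored still falls in the last case; per the documentation above, the resulting vector is simply empty.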
Method Detail |
---|
public DocumentVector copy()
Description copied from interface: DocumentVector
Creates a copy of the current document vector and returns it.
copy
in interface DocumentVector
public WeightedFeature[] getFeatures()
public void setEngine(SearchEngine e)
setEngine
in interface DocumentVector
e
- the engine to use
public DocKeyEntry getEntry()
public SearchEngine getEngine()
public float dot(DocumentVectorImpl dvi)
dvi
- another document vector
public float dot(WeightedFeature[] wfv)
wfv
- a weighted feature vector
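This page does not show how the two dot overloads are implemented. As an illustration of a sparse dot product over weighted features, here is a minimal sketch that assumes both feature arrays are sorted by feature name and walks them in step. Feature is a hypothetical stand-in for WeightedFeature, and the sorted-by-name precondition is an assumption, not something this page states.

```java
/** Sketch only: sparse dot product over name-sorted feature arrays. */
final class DotSketch {

    /** Hypothetical stand-in for WeightedFeature: a term name and its weight. */
    static final class Feature {
        final String name;
        final float weight;

        Feature(String name, float weight) {
            this.name = name;
            this.weight = weight;
        }
    }

    /** Sums weight products for names present in both arrays (both sorted by name). */
    static float dot(Feature[] a, Feature[] b) {
        float sum = 0f;
        int i = 0;
        int j = 0;
        while (i < a.length && j < b.length) {
            int cmp = a[i].name.compareTo(b[j].name);
            if (cmp == 0) {
                sum += a[i].weight * b[j].weight; // shared term contributes
                i++;
                j++;
            } else if (cmp < 0) {
                i++; // term only in a
            } else {
                j++; // term only in b
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        Feature[] a = { new Feature("apple", 1f), new Feature("cat", 2f), new Feature("dog", 3f) };
        Feature[] b = { new Feature("cat", 4f), new Feature("dog", 1f), new Feature("emu", 5f) };
        System.out.println(dot(a, b)); // prints 11.0 (cat: 2*4, dog: 3*1)
    }
}
```

The merge-style walk visits each feature at most once, so the cost is linear in the combined number of features.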
public boolean equals(java.lang.Object dv)
equals
in interface DocumentVector
equals
in class java.lang.Object
dv
- the document vector to compare this one to
public java.util.Set<java.lang.String> getTerms()
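Description copied from interface: DocumentVector
Gets the set of terms in the document represented by this vector.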
getTerms
in interface DocumentVector
public java.util.Map<java.lang.String,java.lang.Float> getSimilarityTerms(DocumentVector dv)
getSimilarityTerms
in interface DocumentVector
dv
- the document vector to compare this one to
public java.util.SortedSet getSimilarityTerms(DocumentVectorImpl dvi)
dvi
- the document to compare this one to
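The getSimilarityTerms methods report how much each term contributed to the similarity of two documents, but this page does not define "contribution". One plausible reading, sketched below as an assumption, is that a shared term contributes the product of its weights in the two vectors, so that the contributions sum to the dot product. The class and method names here are hypothetical, and plain maps stand in for the library's vector types.

```java
import java.util.Map;
import java.util.TreeMap;

/** Sketch only: per-term contributions to a dot-product style similarity. */
final class SimilarityTermsSketch {

    /**
     * Maps each term found in both vectors to the product of its two weights.
     * Assumption: "contribution" means this product; terms in only one vector
     * contribute nothing and are omitted.
     */
    static Map<String, Float> contributions(Map<String, Float> a, Map<String, Float> b) {
        Map<String, Float> shared = new TreeMap<>(); // ordered by term for stable output
        for (Map.Entry<String, Float> e : a.entrySet()) {
            Float other = b.get(e.getKey());
            if (other != null) {
                shared.put(e.getKey(), e.getValue() * other);
            }
        }
        return shared;
    }

    public static void main(String[] args) {
        Map<String, Float> a = new TreeMap<>();
        a.put("cat", 0.5f);
        a.put("dog", 0.5f);
        a.put("emu", 0.25f);
        Map<String, Float> b = new TreeMap<>();
        b.put("cat", 0.5f);
        b.put("dog", 0.25f);
        b.put("fox", 1f);
        System.out.println(contributions(a, b)); // prints {cat=0.25, dog=0.125}
    }
}
```

The sketch orders results by term name; the SortedSet overload above is documented as ordering by weight instead.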
public void normalize()
public float length()
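According to the method summary, length() returns the Euclidean length of the vector and normalize() scales the vector to length 1. The arithmetic can be sketched as follows on a bare weight array. NormalizeSketch is a hypothetical name, and leaving a zero vector unchanged is an assumption made for the sketch; this page does not say how the real class handles that case or how it uses its normalized flag.

```java
/** Sketch only: Euclidean length and unit normalization of a weight array. */
final class NormalizeSketch {

    /** Square root of the sum of squared weights. */
    static float length(float[] weights) {
        double sumOfSquares = 0.0;
        for (float w : weights) {
            sumOfSquares += (double) w * w;
        }
        return (float) Math.sqrt(sumOfSquares);
    }

    /** Divides every weight by the vector length, in place. */
    static void normalize(float[] weights) {
        float len = length(weights);
        if (len == 0f) {
            return; // assumption: a zero vector is left unchanged
        }
        for (int i = 0; i < weights.length; i++) {
            weights[i] /= len;
        }
    }

    public static void main(String[] args) {
        float[] weights = { 3f, 4f };
        System.out.println(length(weights)); // prints 5.0
        normalize(weights);
        System.out.println(length(weights)); // approximately 1.0 after normalization
    }
}
```

Once both vectors are normalized this way, their dot product is the cosine of the angle between them.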
public java.util.SortedSet getSet()
public java.util.SortedSet getWeightOrderedSet()
public float getSimilarity(DocumentVector otherVector)
getSimilarity
in interface DocumentVector
otherVector
- the vector representing the document to compare
this vector to
public float getSimilarity(DocumentVectorImpl otherVector)
public ResultSet findSimilar()
findSimilar
in interface DocumentVector
public ResultSet findSimilar(java.lang.String sortOrder)
Description copied from interface: DocumentVector
Finds documents that are similar to this one.
findSimilar
in interface DocumentVector
sortOrder
- a string describing the order in which to sort the results
public ResultSet findSimilar(java.lang.String sortOrder, double skimPercent)
findSimilar
in interface DocumentVector
sortOrder
- a string describing the order in which to sort the results
skimPercent
- a number between 0 and 1 representing the fraction of the features that should be used to perform findSimilar
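The skimPercent parameter limits findSimilar to a fraction of the vector's features. This page does not say which features are kept or how the count is rounded. The sketch below assumes the highest-weighted features are kept and that the count is rounded up; both choices, and the name SkimSketch, are assumptions for illustration only.

```java
import java.util.Arrays;

/** Sketch only: keep a fraction of the weights, highest first. */
final class SkimSketch {

    /**
     * Returns the top ceil(skimPercent * n) weights in descending order.
     * Assumptions: skimming keeps the heaviest features, and rounds up.
     */
    static float[] skim(float[] weights, double skimPercent) {
        if (!(skimPercent >= 0.0 && skimPercent <= 1.0)) {
            throw new IllegalArgumentException("skimPercent must be between 0 and 1");
        }
        float[] sorted = weights.clone();
        Arrays.sort(sorted); // ascending order
        int keep = (int) Math.ceil(skimPercent * sorted.length);
        float[] top = new float[keep];
        for (int i = 0; i < keep; i++) {
            top[i] = sorted[sorted.length - 1 - i]; // read from the heavy end
        }
        return top;
    }

    public static void main(String[] args) {
        float[] weights = { 0.1f, 0.9f, 0.5f, 0.3f };
        System.out.println(Arrays.toString(skim(weights, 0.5))); // prints [0.9, 0.5]
    }
}
```

Under these assumptions a skimPercent of 1 uses every feature and a skimPercent of 0 uses none.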
public java.util.Map<java.lang.String,java.lang.Float> getTopWeightedTerms(int nTerms)
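Description copied from interface: DocumentVector
Gets the n terms that have the highest document weight in this document vector.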
getTopWeightedTerms
in interface DocumentVector
nTerms
- the number of terms to return
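As an illustration of selecting the n highest-weighted terms, here is a minimal sketch over a plain map of term weights. TopTermsSketch and topTerms are hypothetical names, and returning the terms in descending weight order is an assumption; this page only says that a map of the top n terms is returned.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/** Sketch only: pick the n heaviest terms from a term-to-weight map. */
final class TopTermsSketch {

    /** Returns at most nTerms entries, heaviest first (ordering is an assumption). */
    static Map<String, Float> topTerms(Map<String, Float> weights, int nTerms) {
        return weights.entrySet().stream()
                .sorted(Map.Entry.<String, Float>comparingByValue().reversed())
                .limit(Math.max(0, nTerms))
                .collect(Collectors.toMap(
                        Map.Entry::getKey,
                        Map.Entry::getValue,
                        (first, second) -> first, // keys are unique, so never invoked
                        LinkedHashMap::new));     // preserves the sorted order
    }

    public static void main(String[] args) {
        Map<String, Float> weights = new LinkedHashMap<>();
        weights.put("apple", 0.1f);
        weights.put("cat", 0.9f);
        weights.put("dog", 0.5f);
        System.out.println(topTerms(weights, 2)); // prints {cat=0.9, dog=0.5}
    }
}
```

Asking for more terms than the vector holds simply returns all of them.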
public java.lang.String getKey()
Description copied from interface: DocumentVector
Gets the key for the document associated with this vector.
getKey
in interface DocumentVector
public java.lang.String toString()
toString
in class java.lang.Object
public void setField(java.lang.String field)