java.lang.Object
  com.sun.labs.minion.retrieval.CompositeDocumentVectorImpl
public class CompositeDocumentVectorImpl
An implementation of document vector that provides for a composite document vector, that is, a document vector made by taking a linear combination of more than one vectored field.
for an implementation that uses features from the
whole document or from just one vectored field.

| Field Summary | |
|---|---|
| protected SearchEngine | e: The search engine that generated this vector. |
| protected WeightedFeature[][] | fieldFeatures: The per-field weighted features for the document. |
| protected float[] | fieldLengths: The per-field lengths of the vectors. |
| protected WeightedField[] | fields: A linear combination of the fields composing this vector. |
| protected StopWords | ignoreWords |
| protected boolean | initialized: Whether we've had our features initialized. |
| protected static java.lang.String | logTag |
| protected boolean | normalized: Whether we've been normalized. |
| protected WeightingComponents | wc: A set of weighting components that can be used when calculating term weights. |
| protected WeightingFunction | wf: The weighting function to use for computing term weights. |
| Constructor Summary | |
|---|---|
| protected | CompositeDocumentVectorImpl() |
| | CompositeDocumentVectorImpl(ResultImpl r, WeightedField[] fields): Creates a document vector for a particular field from a search result. |
| | CompositeDocumentVectorImpl(SearchEngine e, DocKeyEntry key, WeightedField[] fields): Creates a document vector for a given document. |
| | CompositeDocumentVectorImpl(SearchEngine e, DocKeyEntry key, WeightedField[] fields, WeightingFunction wf, WeightingComponents wc) |
| | CompositeDocumentVectorImpl(SearchEngine e, WeightedFeature[] wf, WeightedField[] fields) |
| Method Summary | |
|---|---|
| DocumentVector | copy(): Creates a copy of the current document vector and returns it. |
| float | dot(CompositeDocumentVectorImpl dvi): Calculates the dot product of this document vector with another. |
| float | dot(WeightedFeature[] wfv1, WeightedFeature[] wfv2): Calculates the dot product of two sets of weighted features. |
| ResultSet | findSimilar(): Finds similar documents to this one. |
| ResultSet | findSimilar(java.lang.String sortOrder): Finds documents that are similar to this one. |
| ResultSet | findSimilar(java.lang.String sortOrder, double skimPercent): Finds documents that are similar to this one. |
| SearchEngine | getEngine() |
| DocKeyEntry | getEntry() |
| java.lang.String | getKey(): Gets the key for the document associated with this vector. |
| java.util.SortedSet<WeightedFeature> | getSet(): Gets a sorted set of features. |
| float | getSimilarity(CompositeDocumentVectorImpl otherVector) |
| float | getSimilarity(DocumentVector otherVector): Computes the similarity between this document vector and the supplied vector. |
| java.util.SortedSet<WeightedFeature> | getSimilarityTerms(CompositeDocumentVectorImpl dvi): Gets a sorted (by weight) set of the terms contributing to document similarity with the provided document. |
| java.util.Map<java.lang.String,java.lang.Float> | getSimilarityTerms(DocumentVector dv): Gets a map of term names to weights, where the weights represent the amount the term contributed to the similarity of the two documents. |
| java.util.Set<java.lang.String> | getTerms(): Gets the set of terms in the document represented by this vector. |
| java.util.Map<java.lang.String,java.lang.Float> | getTopWeightedTerms(int nTerms): Gets the n terms that have the highest document weight in this document vector. |
| java.util.SortedSet<WeightedFeature> | getWeightOrderedSet() |
| void | normalize(): Normalizes the length of this vector to 1. |
| void | setEngine(SearchEngine e): Sets the search engine to use with this document vector. |
| java.lang.String | toString() |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
| Methods inherited from interface com.sun.labs.minion.DocumentVector |
|---|
| equals |
| Field Detail |
|---|
protected transient SearchEngine e
protected WeightedField[] fields
protected transient WeightingFunction wf
protected transient WeightingComponents wc
protected WeightedFeature[][] fieldFeatures
protected float[] fieldLengths
protected boolean initialized
protected boolean normalized
protected static java.lang.String logTag
protected transient StopWords ignoreWords
| Constructor Detail |
|---|
protected CompositeDocumentVectorImpl()
public CompositeDocumentVectorImpl(ResultImpl r,
WeightedField[] fields)
r - The search result for which we want a document vector.
fields - a linear combination of fields and weights that should be
used to build this document vector. The field names provided in the array should
be the names of vectored fields. If a provided field name does not name
a vectored field, a warning will be logged, but the operation will proceed.
If this parameter contains a weighted field whose name is null,
the data from the unnamed body field will be used with
the associated weight.
The weights associated with the fields should preferably
sum to 1, although this is not required. If the weights do not sum to 1,
a findSimilar operation may return document similarities greater than 1.
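The recommendation that field weights sum to 1 can be met by rescaling a combination before passing it in. The following is a minimal sketch under stated assumptions: `FieldWeightSketch` and `normalizeWeights` are hypothetical stand-ins written for this illustration, not the Minion `WeightedField` class or any method of it.

```java
/**
 * Hypothetical stand-in for a (field name, weight) pair.
 * This is NOT the Minion WeightedField class.
 */
final class FieldWeightSketch {
    final String name;   // null stands for the unnamed body field
    final float weight;

    FieldWeightSketch(String name, float weight) {
        this.name = name;
        this.weight = weight;
    }

    /** Returns a copy of the combination with the weights rescaled to sum to 1. */
    static FieldWeightSketch[] normalizeWeights(FieldWeightSketch[] fields) {
        float sum = 0f;
        for (FieldWeightSketch f : fields) {
            sum += f.weight;
        }
        if (sum <= 0f) {
            throw new IllegalArgumentException("field weights must sum to a positive value");
        }
        FieldWeightSketch[] out = new FieldWeightSketch[fields.length];
        for (int i = 0; i < fields.length; i++) {
            out[i] = new FieldWeightSketch(fields[i].name, fields[i].weight / sum);
        }
        return out;
    }

    public static void main(String[] args) {
        FieldWeightSketch[] raw = {
            new FieldWeightSketch("title", 3f),
            new FieldWeightSketch(null, 1f)   // unnamed body field
        };
        for (FieldWeightSketch f : normalizeWeights(raw)) {
            System.out.println((f.name == null ? "<body>" : f.name) + " " + f.weight);
        }
    }
}
```

Rescaling keeps the relative importance of the fields unchanged while bounding the combined weight at 1.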
public CompositeDocumentVectorImpl(SearchEngine e,
DocKeyEntry key,
WeightedField[] fields)
e - The search engine with which the document is associated.
key - The entry from the document dictionary for the given
document.
fields - a linear combination of the vectored fields in this document
that we will use to build the document vector.
If this value is null, a vector for the whole document will
be returned. If one of the values in a non-null array has
the name of the field set to null, then the vector will include
data from the unnamed body field. If one of the fields provided is not
a vectored field, then a warning will be issued, but processing will
proceed.
public CompositeDocumentVectorImpl(SearchEngine e,
DocKeyEntry key,
WeightedField[] fields,
WeightingFunction wf,
WeightingComponents wc)
public CompositeDocumentVectorImpl(SearchEngine e,
WeightedFeature[] wf,
WeightedField[] fields)
| Method Detail |
|---|
public DocumentVector copy()
copy in interface DocumentVector
public DocKeyEntry getEntry()
public SearchEngine getEngine()
public void setEngine(SearchEngine e)
setEngine in interface DocumentVector
e - the engine
public float dot(CompositeDocumentVectorImpl dvi)
When the other document vector contains a field that this one does not (or vice versa), that field contributes nothing to the result; it is probably not a good idea to compute the dot product of such vectors. When the two document vectors have a field in common, the vectors for that field are multiplied by any associated field weights before they are multiplied together.
dvi - another document vector
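The field-by-field behaviour described above can be sketched as follows. This is one reading of the description, not the class's actual code: `CompositeDotSketch`, its maps, and its `add` helper are hypothetical, and the sketch assumes each side's per-field vector is scaled by that side's own field weight.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of a composite vector: per-field weights and per-field term weights. */
final class CompositeDotSketch {
    final Map<String, Float> fieldWeights = new HashMap<>();
    final Map<String, Map<String, Float>> features = new HashMap<>();

    /** Registers a field with its weight and its term-to-weight map. */
    CompositeDotSketch add(String field, float weight, Map<String, Float> terms) {
        fieldWeights.put(field, weight);
        features.put(field, terms);
        return this;
    }

    /** Sums weighted per-field dot products over the fields both vectors share. */
    float dot(CompositeDotSketch other) {
        float sum = 0f;
        for (Map.Entry<String, Map<String, Float>> e : features.entrySet()) {
            Map<String, Float> theirs = other.features.get(e.getKey());
            if (theirs == null) {
                continue; // field present on one side only: no contribution
            }
            // Scaling each vector by its field weight is equivalent to scaling
            // the per-field dot product by the product of the two weights.
            float scale = fieldWeights.get(e.getKey()) * other.fieldWeights.get(e.getKey());
            for (Map.Entry<String, Float> t : e.getValue().entrySet()) {
                Float w = theirs.get(t.getKey());
                if (w != null) {
                    sum += scale * t.getValue() * w;
                }
            }
        }
        return sum;
    }
}
```

Fields that appear in only one of the two vectors are skipped, which is why comparing vectors built from different field combinations tends to understate similarity.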
public float dot(WeightedFeature[] wfv1,
WeightedFeature[] wfv2)
wfv1 - a weighted feature vector
wfv2 - another weighted feature vector
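A dot product of two sparse feature arrays is commonly computed with a single merge pass. The sketch below assumes both arrays are sorted by feature name, which this page does not state; `SparseDotSketch.Feature` is a hypothetical stand-in for `WeightedFeature`.

```java
/** Merge-style dot product over two name-sorted sparse feature arrays (illustrative only). */
final class SparseDotSketch {

    /** Hypothetical stand-in for a weighted feature: a name and a weight. */
    static final class Feature {
        final String name;
        final float weight;

        Feature(String name, float weight) {
            this.name = name;
            this.weight = weight;
        }
    }

    /** Assumes both arrays are sorted by name in ascending order. */
    static float dot(Feature[] a, Feature[] b) {
        int i = 0;
        int j = 0;
        float sum = 0f;
        while (i < a.length && j < b.length) {
            int cmp = a[i].name.compareTo(b[j].name);
            if (cmp == 0) {
                // Same feature on both sides: multiply the weights.
                sum += a[i].weight * b[j].weight;
                i++;
                j++;
            } else if (cmp < 0) {
                i++; // feature only in a
            } else {
                j++; // feature only in b
            }
        }
        return sum;
    }
}
```

The merge visits each element at most once, so the cost is linear in the combined length of the two arrays.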
public java.util.Set<java.lang.String> getTerms()
getTerms in interface DocumentVector
public java.util.Map<java.lang.String,java.lang.Float> getSimilarityTerms(DocumentVector dv)
getSimilarityTerms in interface DocumentVector
dv - the document vector to compare this one to
public java.util.SortedSet<WeightedFeature> getSimilarityTerms(CompositeDocumentVectorImpl dvi)
dvi - the document to compare this one to
public void normalize()
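Normalizing a vector to length 1 usually means dividing every weight by the Euclidean length. The sketch below illustrates that on a plain `float[]`; `NormalizeSketch` is a hypothetical helper, and the choice of the Euclidean norm is an assumption, since this page does not say which norm the class uses.

```java
/** Euclidean (L2) normalization of a weight array (illustrative only). */
final class NormalizeSketch {

    /** Euclidean length of the vector, accumulated in double precision. */
    static float length(float[] w) {
        double sumOfSquares = 0.0;
        for (float x : w) {
            sumOfSquares += (double) x * x;
        }
        return (float) Math.sqrt(sumOfSquares);
    }

    /** Returns a new array scaled to unit length; a zero vector is returned unchanged. */
    static float[] normalize(float[] w) {
        float len = length(w);
        float[] out = w.clone();
        if (len == 0f) {
            return out; // nothing to scale
        }
        for (int i = 0; i < out.length; i++) {
            out[i] = w[i] / len;
        }
        return out;
    }
}
```

With unit-length vectors, the dot product of two vectors is their cosine similarity.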
public java.util.SortedSet<WeightedFeature> getSet()
public java.util.SortedSet<WeightedFeature> getWeightOrderedSet()
public float getSimilarity(DocumentVector otherVector)
getSimilarity in interface DocumentVector
otherVector - the vector representing the document to compare this vector to
public float getSimilarity(CompositeDocumentVectorImpl otherVector)
public ResultSet findSimilar()
findSimilar in interface DocumentVector
public ResultSet findSimilar(java.lang.String sortOrder)
findSimilar in interface DocumentVector
sortOrder - a string describing the order in which to sort the results
public ResultSet findSimilar(java.lang.String sortOrder,
double skimPercent)
findSimilar in interface DocumentVector
sortOrder - a string describing the order in which to sort the results
skimPercent - a number between 0 and 1 giving the fraction of the features that should be used to perform findSimilar
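One plausible way to apply a skim fraction is to keep only the highest-weighted share of the features. The sketch below is an assumption about what skimming means: this page does not say how features are chosen, and `SkimSketch.skim` is a hypothetical helper, not part of this class.

```java
import java.util.Arrays;

/** Keeps the top fraction of weights, highest first (illustrative only). */
final class SkimSketch {

    /**
     * Returns the highest ceil(skimPercent * n) weights in descending order.
     * Selecting by weight is an assumption made for this sketch.
     */
    static float[] skim(float[] weights, double skimPercent) {
        if (skimPercent < 0.0 || skimPercent > 1.0) {
            throw new IllegalArgumentException("skimPercent must be between 0 and 1");
        }
        float[] sorted = weights.clone();
        Arrays.sort(sorted); // ascending order
        int keep = (int) Math.ceil(skimPercent * sorted.length);
        float[] top = new float[keep];
        for (int i = 0; i < keep; i++) {
            top[i] = sorted[sorted.length - 1 - i]; // read from the high end
        }
        return top;
    }
}
```

A smaller fraction trades some recall for speed, because fewer features have to be looked up when searching for similar documents.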
public java.util.Map<java.lang.String,java.lang.Float> getTopWeightedTerms(int nTerms)
getTopWeightedTerms in interface DocumentVector
nTerms - the number of terms to return
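Selecting the n highest-weighted terms from a term-to-weight map can be sketched as below. `TopTermsSketch` is a hypothetical helper; returning the result in descending weight order is an assumption, since this page does not specify the iteration order of the returned map.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Picks the n highest-weighted terms from a term-to-weight map (illustrative only). */
final class TopTermsSketch {

    /** Returns at most nTerms entries, ordered from highest to lowest weight. */
    static Map<String, Float> topWeightedTerms(Map<String, Float> weights, int nTerms) {
        List<Map.Entry<String, Float>> entries = new ArrayList<>(weights.entrySet());
        entries.sort(Map.Entry.<String, Float>comparingByValue().reversed());
        int limit = Math.min(Math.max(nTerms, 0), entries.size());
        Map<String, Float> top = new LinkedHashMap<>(); // preserves insertion order
        for (Map.Entry<String, Float> e : entries.subList(0, limit)) {
            top.put(e.getKey(), e.getValue());
        }
        return top;
    }
}
```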
public java.lang.String getKey()
getKey in interface DocumentVector
public java.lang.String toString()
toString in class java.lang.Object