java.lang.Object
  com.sun.labs.minion.retrieval.WeightingComponents
public class WeightingComponents
A class that will hold all of the components necessary to implement any number of weighting functions. The names and descriptions here are (mostly) taken from the Moffat and Zobel paper Exploring the Similarity Space.
The components that this class contains comprise statistics at two levels of description. First, there are the collection-level statistics that are calculated across all of the partitions contained in an index. Second, there are the document-level statistics that are set per term or document being processed, depending on the context.
For example, in typical query processing scenarios we will create a set
of weighting components from the collection statistics at the start of
query evaluation. As each term in the query is processed, we will set
the document-level term statistics using the setTerm(java.lang.String)
method.
As we process each document in the postings list associated with a term,
we will set the document-level statistics directly.
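The three stages described above (collection, then term, then document) can be sketched roughly as follows. This is a minimal illustration, not Minion code: `ComponentsSketch` is a hypothetical stand-in that borrows only the field names `N`, `ft` and `fdt` from this class, the postings are plain arrays, and the `fdt * ln(N / ft)` weight is an assumed textbook TF-IDF form rather than any weighting function that Minion ships.

```java
// Hypothetical stand-in for WeightingComponents. Only the field names
// N, ft and fdt are taken from the documentation; everything else is assumed.
class ComponentsSketch {
    int N;    // total number of documents in the collection
    int ft;   // number of documents containing the current term
    int fdt;  // frequency of the current term in the current document

    // Collection-level statistics: set once, at the start of query evaluation.
    ComponentsSketch setCollection(int numDocs) { N = numDocs; return this; }

    // Term statistics: set once for each query term.
    ComponentsSketch setTerm(int docFreq) { ft = docFreq; return this; }
}

class QueryFlowSketch {
    /**
     * docFreqs[t] is the document frequency of query term t;
     * postings[t] is a list of {docId, fdt} pairs for that term.
     * Returns an accumulated score per document id.
     */
    static java.util.Map<Integer, Double> evaluate(int numDocs, int[] docFreqs, int[][][] postings) {
        ComponentsSketch wc = new ComponentsSketch().setCollection(numDocs);
        java.util.Map<Integer, Double> scores = new java.util.HashMap<>();
        for (int t = 0; t < docFreqs.length; t++) {
            if (docFreqs[t] == 0) {
                continue; // a term that occurs nowhere has no postings to score
            }
            wc.setTerm(docFreqs[t]);
            for (int[] posting : postings[t]) {
                wc.fdt = posting[1]; // document-level statistic, set directly
                double weight = wc.fdt * Math.log((double) wc.N / wc.ft); // assumed TF-IDF form
                scores.merge(posting[0], weight, Double::sum);
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        int[][][] postings = { { {1, 3}, {2, 1} }, { {1, 5} } };
        System.out.println(evaluate(8, new int[] {2, 8}, postings));
    }
}
```

Because each setter in the sketch returns the object itself, calls can be chained, which mirrors the `WeightingComponents` return type of the `setCollection`, `setTerm` and `setDocument` methods listed in the method summary.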
Note that this class is provided as a convenience and is merely intended as a container into which a number of statistics can be placed. No checking is done on the validity of the statistics that are placed into it. The use of inappropriate statistics may lead to strange results when calculating term weights.
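Since the class performs no validation itself, a caller may want to run a few sanity checks before computing weights. The sketch below is a hypothetical helper, not part of Minion. Its parameters are named after the fields `N`, `ft`, `Ft`, `fdt`, `ld` and `nd` described in the field summary, and the invariants are assumptions that follow from those descriptions (for example, a term cannot appear in more documents than the collection contains).

```java
// Hypothetical helper; the invariants are inferred from the field
// descriptions and are not checked anywhere by WeightingComponents itself.
class StatsCheckSketch {
    /** Returns the list of violated invariants; an empty list means the values look plausible. */
    static java.util.List<String> check(int N, int ft, long Ft, int fdt, long ld, int nd) {
        java.util.List<String> problems = new java.util.ArrayList<>();
        if (N < 0 || ft < 0 || Ft < 0 || fdt < 0 || ld < 0 || nd < 0) {
            problems.add("negative statistic");
        }
        if (ft > N) {
            problems.add("ft > N");   // more documents contain t than exist
        }
        if (ft > Ft) {
            problems.add("ft > Ft");  // each document containing t has at least one occurrence
        }
        if (fdt > ld) {
            problems.add("fdt > ld"); // t occurs more often than the document has words
        }
        if (nd > ld) {
            problems.add("nd > ld");  // more distinct terms than words in the document
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(check(10, 11, 7, 2, 100, 40));
    }
}
```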
Field Summary | |
---|---|
float | avgDocLen - The average document length, in words. |
CollectionStats | cs - A set of collection statistics. |
float | dvl - The length of the document vector for the current document. |
int | fdt - The frequency of term t in document d. |
int | ft - The total number of documents containing term t. |
long | Ft - The total number of occurrences of term t in the whole collection. |
long | ld - The total number of words in document d. |
protected static java.lang.String | logTag |
int | maxfdt - The maximum term frequency in the collection. |
int | maxft - The maximum document frequency in the collection. |
int | n - The number of distinct terms in the collection. |
int | N - The total number of documents in the collection. |
int | nd - The number of distinct terms in document d. |
long | nTokens - The number of tokens in the collection, i.e., the sum of the lengths of all the documents. |
TermStatsImpl | ts - The statistics that we were given or that we retrieved for the last call to setTerm. |
float | wt - A collection level term weight. |
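Some of these fields are related by their definitions. If `nTokens` and `avgDocLen` are both measured in words, the average document length is the token count divided by `N`, and a document's length relative to that average is `ld / avgDocLen`. The sketch below illustrates that arithmetic only; the class and method names are hypothetical, and how Minion itself fills in `avgDocLen` is not stated here.

```java
// Hypothetical helper illustrating how the length-related fields relate.
class LengthStatsSketch {
    /** Average document length: total tokens divided by the number of documents. */
    static float averageDocLength(long nTokens, int N) {
        return N == 0 ? 0f : (float) nTokens / N; // guard against an empty collection
    }

    /** Length of document d relative to the collection average (1.0 means average). */
    static float relativeLength(long ld, float avgDocLen) {
        return avgDocLen == 0f ? 0f : ld / avgDocLen;
    }

    public static void main(String[] args) {
        float avg = averageDocLength(1000L, 4);
        System.out.println(avg + " " + relativeLength(500L, avg));
    }
}
```

A relative length of this kind is a common input to length-normalized weighting functions, which is one reason a container like this carries both `ld` and `avgDocLen`.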
Constructor Summary | |
---|---|
WeightingComponents() | Creates a set of weighting components. |
WeightingComponents(CollectionStats s) | Initializes a set of weighting components from a set of collection statistics. |
Method Summary | |
---|---|
TermStatsImpl | getTermStats() |
TermStatsImpl | getTermStats(java.lang.String term) |
WeightingComponents | setCollection(CollectionStats s) - Initializes the collection-level statistics. |
WeightingComponents | setDocument(DocKeyEntry key) - Initializes any document-level statistics that can be determined from a document key. |
WeightingComponents | setDocument(DocKeyEntry key, java.lang.String field) |
WeightingComponents | setDocument(PostingsIterator pi) - Initializes any per-document statistics that can be obtained from a postings iterator. |
WeightingComponents | setTerm(java.lang.String name) - Initializes any document-level statistics that can be determined from a term. |
WeightingComponents | setTerm(TermStatsImpl s) - Initializes any document-level statistics that can be determined from a set of term statistics. |
void | setTermStats(java.lang.String term, TermStatsImpl ts) |
java.lang.String | toString() |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Field Detail |
---|
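public CollectionStats cs
A set of collection statistics. If these are supplied through the setCollection method, then the weighting components can handle term statistics lookups on their own.
public TermStatsImpl ts
The statistics that we were given or that we retrieved for the last call to setTerm.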
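public int N
The total number of documents in the collection.
public int n
The number of distinct terms in the collection.
public long nTokens
The number of tokens in the collection, i.e., the sum of the lengths of all the documents.
public int fdt
The frequency of term t in document d.
public long Ft
The total number of occurrences of term t in the whole collection.
public int ft
The total number of documents containing term t.
public int maxfdt
The maximum term frequency in the collection.
public int maxft
The maximum document frequency in the collection.
public int nd
The number of distinct terms in document d.
public long ld
The total number of words in document d.
public float dvl
The length of the document vector for the current document.
public float avgDocLen
The average document length, in words.
public float wt
A collection level term weight.
See Also: WeightingFunction.initTerm(WeightingComponents)
protected static java.lang.String logTag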
Constructor Detail |
---|
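public WeightingComponents()
Creates a set of weighting components.
public WeightingComponents(CollectionStats s)
Initializes a set of weighting components from a set of collection statistics.
See Also: setTerm(java.lang.String)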
Method Detail |
---|
public WeightingComponents setCollection(CollectionStats s)
Initializes the collection-level statistics.
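public WeightingComponents setTerm(java.lang.String name)
Initializes any document-level statistics that can be determined from a term. The term's statistics are looked up in the collection statistics supplied through the setCollection method. If there are no such statistics, a warning is issued and the components in this object will not be modified!
name - the name of the term whose statistics we need.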
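The "warn and leave unmodified" behavior noted for `setTerm(java.lang.String)` has a practical consequence: after a failed lookup, the object still holds the statistics of the previous term. The sketch below imitates that behavior with a hypothetical stand-in class backed by a plain map. The lookup mechanism, the `putStats` helper and the warning text are all assumptions made for illustration, not Minion's implementation.

```java
// Hypothetical stand-in that mimics the documented behavior of setTerm(String):
// an unknown term produces a warning and leaves the current values untouched.
class TermLookupSketch {
    int ft;   // number of documents containing the current term
    long Ft;  // total occurrences of the current term in the collection

    // Assumed storage: term name -> {ft, Ft}. The real class consults CollectionStats.
    private final java.util.Map<String, long[]> stats = new java.util.HashMap<>();

    void putStats(String term, int docFreq, long collFreq) {
        stats.put(term, new long[] {docFreq, collFreq});
    }

    TermLookupSketch setTerm(String name) {
        long[] s = stats.get(name);
        if (s == null) {
            System.err.println("warning: no statistics for term '" + name + "'");
            return this; // fields keep the values of the previous term
        }
        ft = (int) s[0];
        Ft = s[1];
        return this;
    }

    public static void main(String[] args) {
        TermLookupSketch wc = new TermLookupSketch();
        wc.putStats("index", 3, 12L);
        wc.setTerm("index").setTerm("missing");
        System.out.println(wc.ft + " " + wc.Ft);
    }
}
```

Code that reuses one components object across many query terms may therefore want to detect missing terms before weighting, for instance by checking `getTermStats(java.lang.String)` first, assuming that method signals a missing term in some way.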
public WeightingComponents setTerm(TermStatsImpl s)
Initializes any document-level statistics that can be determined from a set of term statistics.
s - a set of statistics for a term.
public TermStatsImpl getTermStats()
public TermStatsImpl getTermStats(java.lang.String term)
public void setTermStats(java.lang.String term, TermStatsImpl ts)
public WeightingComponents setDocument(DocKeyEntry key)
Initializes any document-level statistics that can be determined from a document key.
key - a document key entry from a dictionary.
public WeightingComponents setDocument(DocKeyEntry key, java.lang.String field)
public WeightingComponents setDocument(PostingsIterator pi)
Initializes any per-document statistics that can be obtained from a postings iterator.
pi - a postings iterator that is being processed.
public java.lang.String toString()
Overrides: toString in class java.lang.Object