com.sun.labs.minion.retrieval
Interface WeightingFunction

All Known Implementing Classes:
Okapi, TFIDF, TFIDFCount

public interface WeightingFunction

An interface for a term weighting function that can be used during retrieval, classification, and profiling operations. Classes that implement this interface will be used in two distinct ways:

  1. During standard query processing, terms will be retrieved from the dictionary and their postings lists will be processed. In this case, when each term is retrieved from the dictionary, the initTerm method will be called with the term statistics for the term. This should result in the calculation and caching of any collection-level weight for the term. Once initTerm has been called, the termWeight(WeightingComponents) method will be called repeatedly for each element in the postings list associated with the term.
  2. During classification operations and during the calculation of document vector lengths at document dictionary dump and merge time, the termWeight(WeightingComponents) method will be called repeatedly to calculate the weights associated with each term in the document.
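
The query-processing call pattern in item 1 can be sketched roughly as follows. This is an illustration under stated assumptions, not the engine's actual code: SketchComponents, SketchWeightingFunction, SketchTfIdf and QueryLoopSketch are hypothetical stand-ins, and of their fields only wt is named in this documentation. The numDocs, docFreq and termFreq fields, and the TF-IDF formula itself, are assumed for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for WeightingComponents. Only the wt member is
// documented on this page; the other fields are assumptions.
class SketchComponents {
    float wt;      // collection-level weight cached by initTerm
    int numDocs;   // assumed: number of documents in the collection
    int docFreq;   // assumed: number of documents containing the term
    int termFreq;  // assumed: frequency of the term in the current document
}

// Hypothetical mirror of the two methods declared by WeightingFunction.
interface SketchWeightingFunction {
    float initTerm(SketchComponents wc);
    float termWeight(SketchComponents wc);
}

// A simple TF-IDF style weighting, chosen only for illustration.
class SketchTfIdf implements SketchWeightingFunction {
    public float initTerm(SketchComponents wc) {
        // Compute the collection-level weight once and cache it in wt.
        wc.wt = (float) Math.log((double) wc.numDocs / wc.docFreq);
        return wc.wt;
    }

    public float termWeight(SketchComponents wc) {
        // Reuse the cached collection-level weight for every posting.
        return wc.termFreq * wc.wt;
    }
}

class QueryLoopSketch {
    // One initTerm call per term, then one termWeight call per posting,
    // always passing the same components object.
    static List<Float> scorePostings(SketchWeightingFunction wf,
                                     int numDocs, int[] termFreqs) {
        SketchComponents wc = new SketchComponents();
        wc.numDocs = numDocs;
        wc.docFreq = termFreqs.length; // one posting per containing document
        wf.initTerm(wc);

        List<Float> weights = new ArrayList<>();
        for (int tf : termFreqs) {
            wc.termFreq = tf;
            weights.add(wf.termWeight(wc));
        }
        return weights;
    }
}
```

Because the same components object is passed to every call, the logarithm is computed once per term rather than once per posting.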

See Also:
TFIDF, Okapi

Method Summary
 float initTerm(WeightingComponents wc)
          Initializes the weighting function for a particular term.
 float termWeight(WeightingComponents wc)
          Calculates the weight for a particular term in a particular document, given a set of weighting components.
 

Method Detail

initTerm

float initTerm(WeightingComponents wc)
Initializes the weighting function for a particular term. This method is expected to use the weighting components to calculate a collection-level weight for the term, which is then used repeatedly while the postings list associated with the term is processed.

If a collection-level weight is calculated as part of a weighting function, it must be placed into the WeightingComponents.wt member. During query processing, any calls to termWeight that follow the call to initTerm for a given term are guaranteed to pass in the same WeightingComponents object, so it can be used to safely cache the collection-level weights. Additionally, it will be safe to subclass WeightingComponents if necessary.

Note that the term weight computed by this method may be used as the weight of the terms in a query, so if no such weight needs to be calculated for a given implementation, a value of 1 should be used.

Parameters:
wc - a set of weighting components.
Returns:
the collection-level weight associated with this term.
See Also:
WeightingComponents.wt
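
As a minimal sketch of the "value of 1" case described above, a weighting function with no collection-level component might look like the following. CountComponents and RawCountWeighting are hypothetical names, the termFreq field is an assumption, and storing 1 in wt as well as returning it is one reading of the note rather than documented behaviour.

```java
// Hypothetical stand-in for WeightingComponents; termFreq is an assumed field.
class CountComponents {
    float wt;      // collection-level weight slot
    int termFreq;  // assumed: frequency of the term in the current document
}

// A weighting that needs no collection-level weight, so it uses 1.
class RawCountWeighting {
    float initTerm(CountComponents wc) {
        // Nothing to compute at the collection level: use a neutral weight
        // so that a query term weighted by this value is left unchanged.
        wc.wt = 1f;
        return wc.wt;
    }

    float termWeight(CountComponents wc) {
        // The document-level weight is just the raw term frequency.
        return wc.termFreq * wc.wt;
    }
}
```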

termWeight

float termWeight(WeightingComponents wc)
Calculates the weight for a particular term in a particular document, given a set of weighting components.

Parameters:
wc - a set of weighting components.
Returns:
the weight of the given term in the given document.
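
When termWeight is called once per term of a document, as in the document vector length calculation mentioned in the interface description, the loop might be sketched like this. Every name here is hypothetical, the components fields are assumptions, and the Euclidean norm and the TF-IDF formula are assumed for illustration; this page does not state how the engine combines the weights.

```java
// Hypothetical per-term components; all fields are assumptions.
class DocTermComponents {
    int numDocs;   // assumed: number of documents in the collection
    int docFreq;   // assumed: number of documents containing the term
    int termFreq;  // assumed: frequency of the term in this document
}

class VectorLengthSketch {
    // Illustrative TF-IDF weight computed directly from the components,
    // since this sketch makes no prior initTerm call for each term.
    static float termWeight(DocTermComponents wc) {
        float idf = (float) Math.log((double) wc.numDocs / wc.docFreq);
        return wc.termFreq * idf;
    }

    // Assumed definition: Euclidean length of the document's weight vector.
    static float vectorLength(DocTermComponents[] terms) {
        double sumOfSquares = 0.0;
        for (DocTermComponents wc : terms) {
            float w = termWeight(wc);
            sumOfSquares += (double) w * w;
        }
        return (float) Math.sqrt(sumOfSquares);
    }

    // Convenience factory for building example components.
    static DocTermComponents term(int numDocs, int docFreq, int termFreq) {
        DocTermComponents wc = new DocTermComponents();
        wc.numDocs = numDocs;
        wc.docFreq = docFreq;
        wc.termFreq = termFreq;
        return wc;
    }
}
```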