com.sun.labs.minion.retrieval
Class CollectionStats

java.lang.Object
  extended by com.sun.labs.minion.retrieval.CollectionStats

public class CollectionStats
extends java.lang.Object

A container for collection-level statistics that are coalesced from the statistics associated with a number of partitions.


Field Summary
 float avgDocLen
          The average document length in the collection, in words.
protected  DictionaryIterator di
           
 int maxfdt
          The maximum term frequency in the collection.
 int maxft
          The maximum document frequency in the collection.
 int nd
          The number of distinct terms in the collection.
protected  int nDocs
          The total number of documents in the collection.
protected  long nTokens
          The number of tokens in the collection.
protected  PartitionManager pm
          A partition manager that will allow us to fetch collection-wide term statistics.
protected  java.util.Map<java.lang.String,TermStatsImpl> termStats
          A local cache of term stats.
 
Constructor Summary
CollectionStats(PartitionManager pm)
           
 
Method Summary
 float getAvgerageDocumentLength()
           
 int getNDistinct()
           
 int getNDocs()
           
 long getNTokens()
           
 TermStatsImpl getTermStats(java.lang.String s)
          Gets the collection-wide statistics for a given term name.
 java.lang.String toString()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

pm

protected PartitionManager pm
A partition manager that will allow us to fetch collection-wide term statistics.


di

protected DictionaryIterator di

termStats

protected java.util.Map<java.lang.String,TermStatsImpl> termStats
A local cache of term stats.


nDocs

protected int nDocs
The total number of documents in the collection.


nTokens

protected long nTokens
The number of tokens in the collection.


maxfdt

public int maxfdt
The maximum term frequency in the collection. For all terms t in the collection and all documents d in the partition, this is the maximum value of f_{d,t}, the frequency of term t in document d.
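A maximum can be combined across partitions without loss: the largest f_{d,t} in the collection is the largest of the per-partition maxima. The following is a minimal sketch of that idea, assuming each partition can report its own maximum as an int. The class and method names are hypothetical and are not part of this API.

```java
// Hypothetical illustration; none of these names come from the Minion API.
public class MaxFdtSketch {

    // Combines per-partition maxima of f_{d,t} into a collection-wide maximum.
    static int combine(int[] perPartitionMax) {
        int max = 0;
        for (int m : perPartitionMax) {
            max = Math.max(max, m);
        }
        return max;
    }

    public static void main(String[] args) {
        // Three partitions whose largest term frequencies are 7, 12 and 3.
        System.out.println(combine(new int[] {7, 12, 3})); // 12
    }
}
```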


maxft

public int maxft
The maximum document frequency in the collection. This is given by the term that has the largest number of documents associated with it, across all dictionaries in the collection. This is most likely an underestimate, because it does not account for the same term occurring in more than one partition.
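The underestimate can be seen with a small worked example. This is only a sketch: it assumes each partition's dictionary is a map from term to document frequency and that no document belongs to more than one partition, so a term's collection-wide frequency is the sum of its per-partition frequencies. All names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; none of these names come from the Minion API.
public class MaxFtSketch {

    // What a per-dictionary scan sees: the largest document frequency
    // recorded in any single partition.
    static int maxOfPartitionMaxima(List<Map<String, Integer>> partitions) {
        int max = 0;
        for (Map<String, Integer> dict : partitions) {
            for (int ft : dict.values()) {
                max = Math.max(max, ft);
            }
        }
        return max;
    }

    // The collection-wide value: add up each term's document frequency
    // over all partitions first, then take the maximum.
    static int maxOfSummedFrequencies(List<Map<String, Integer>> partitions) {
        Map<String, Integer> total = new HashMap<>();
        for (Map<String, Integer> dict : partitions) {
            for (Map.Entry<String, Integer> e : dict.entrySet()) {
                total.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        int max = 0;
        for (int ft : total.values()) {
            max = Math.max(max, ft);
        }
        return max;
    }

    public static void main(String[] args) {
        Map<String, Integer> p1 = new HashMap<>();
        p1.put("cat", 5);
        p1.put("dog", 2);
        Map<String, Integer> p2 = new HashMap<>();
        p2.put("cat", 4);
        p2.put("fish", 6);
        List<Map<String, Integer>> partitions = Arrays.asList(p1, p2);
        System.out.println(maxOfPartitionMaxima(partitions));   // 6
        System.out.println(maxOfSummedFrequencies(partitions)); // 9
    }
}
```

Here "cat" occurs in 9 documents overall, but no single dictionary records more than 6 for any term, so the per-dictionary maximum falls short.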


nd

public int nd
The number of distinct terms in the collection. This is very likely an overestimate, as many terms will be shared among the various partitions' main dictionaries.
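The overestimate arises if the count is formed by adding up dictionary sizes, since a term that appears in several dictionaries is counted once per dictionary. The sketch below assumes that is how the count is formed, which this page does not state, and models each dictionary as a set of term strings. All names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; none of these names come from the Minion API.
public class DistinctTermsSketch {

    // Adds the sizes of the dictionaries; shared terms are counted repeatedly.
    static int sumOfDictionarySizes(List<Set<String>> dictionaries) {
        int n = 0;
        for (Set<String> dict : dictionaries) {
            n += dict.size();
        }
        return n;
    }

    // Counts each term once, no matter how many dictionaries contain it.
    static int sizeOfUnion(List<Set<String>> dictionaries) {
        Set<String> all = new HashSet<>();
        for (Set<String> dict : dictionaries) {
            all.addAll(dict);
        }
        return all.size();
    }

    public static void main(String[] args) {
        Set<String> d1 = new HashSet<>(Arrays.asList("cat", "dog"));
        Set<String> d2 = new HashSet<>(Arrays.asList("cat", "fish"));
        List<Set<String>> dictionaries = Arrays.asList(d1, d2);
        System.out.println(sumOfDictionarySizes(dictionaries)); // 4
        System.out.println(sizeOfUnion(dictionaries));          // 3
    }
}
```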


avgDocLen

public float avgDocLen
The average document length in the collection, in words.

Constructor Detail

CollectionStats

public CollectionStats(PartitionManager pm)
Method Detail

getTermStats

public TermStatsImpl getTermStats(java.lang.String s)
Gets the collection-wide statistics for a given term name.
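The termStats field is described as a local cache, which suggests a look-up that computes a term's statistics once and reuses them afterwards. The sketch below shows that general pattern only. It is an assumption about the behaviour, not the actual implementation, and the class, its loader function, and the value type are all hypothetical stand-ins.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration of a per-term cache; not the Minion implementation.
public class TermStatsCacheSketch<V> {

    private final Map<String, V> cache = new HashMap<>();

    // Stand-in for the expensive step of gathering statistics from every partition.
    private final Function<String, V> loader;

    public TermStatsCacheSketch(Function<String, V> loader) {
        this.loader = loader;
    }

    // Returns the cached value for the term, computing it on first request.
    public V get(String term) {
        return cache.computeIfAbsent(term, loader);
    }

    public static void main(String[] args) {
        int[] loads = {0};
        TermStatsCacheSketch<Integer> stats = new TermStatsCacheSketch<>(t -> {
            loads[0]++;          // count how often the loader runs
            return t.length();   // placeholder for real statistics
        });
        stats.get("cat");
        stats.get("cat");
        System.out.println(loads[0]); // 1
    }
}
```

A plain HashMap is used here for brevity; it is not safe for concurrent callers.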


getNDocs

public int getNDocs()

getNTokens

public long getNTokens()

getAvgerageDocumentLength

public float getAvgerageDocumentLength()

getNDistinct

public int getNDistinct()

toString

public java.lang.String toString()
Overrides:
toString in class java.lang.Object