java.lang.Object
com.sun.labs.minion.classification.BalancedWinnow
public class BalancedWinnow
An implementation of the Balanced Winnow classification algorithm. An instance of BalancedWinnow represents a classifier for a particular class. Classifiers can be trained and used to classify documents.
| Constructor Summary |
|---|
| BalancedWinnow() |
| Method Summary | | |
|---|---|---|
| float[] | classify(DiskPartition sdp) | Classifies a disk partition of documents. |
| void | dump(java.io.RandomAccessFile raf) | Writes out the threshold that describes the minimum closeness that a vector must have to this classifier. |
| Feature | getFeature() | Gets a single feature of the type that this classifier model uses. |
| FeatureClusterSet | getFeatures() | Gets the features that this classifier model will be using for classification. |
| java.lang.String | getFieldName() | Gets the field name where the results of this classifier will be stored. |
| java.lang.String | getFromField() | |
| java.lang.String | getModelName() | Gets the name of the model. |
| protected java.util.List | getStrengthArrays(java.util.Map<DiskPartition,java.util.List> partToDocList, FeatureClusterSet clusterSet, java.util.Map<DiskPartition,TermCache> termCaches) | |
| ClassifierModel | newInstance() | Creates a new instance of this classifier model. |
| protected void | nextStep(Progress p, java.lang.String str) | |
| void | read(java.io.RandomAccessFile raf) | Reads the threshold for this classifier. |
| void | setEngine(SearchEngine e) | Sets the search engine that this classifier is part of. |
| void | setFeatures(FeatureClusterSet f) | Sets the features that the classifier model will use for classification. |
| void | setFieldName(java.lang.String fieldName) | Sets the name of the field where the results of this classifier will be stored. |
| void | setFromField(java.lang.String fromField) | Sets the name of the field from which the classifier was built, since we'll want to classify against terms only from that field. |
| void | setModelName(java.lang.String modelName) | Sets the name of the model. |
| float | similarity(ClassifierModel cm) | Computes the similarity between this classifier model and another. |
| float | similarity(DocumentVector v) | Computes the similarity of the given document vector and the classifier. |
| float | similarity(java.lang.String key) | Computes the similarity of the given document and the classifier. |
| protected float | strength(int freq) | |
| void | train(java.lang.String name, java.lang.String fieldName, PartitionManager manager, ResultSetImpl training, FeatureClusterSet selectedFeatures, java.util.Map<java.lang.String,TermStatsImpl> termStats, java.util.Map<DiskPartition,TermCache> termCaches, Progress progress) | Trains a Balanced Winnow classifier. |
| protected boolean | winnow(float[] upperWeight, float[] lowerWeight, java.util.List strengthArrays, boolean expectPositive) | Computes the winnow sums and modifies the upper and lower weights according to Balanced Winnow, as described in the train method. |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
public BalancedWinnow()
| Method Detail |
|---|
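public void setModelName(java.lang.String modelName)
Description copied from interface: ClassifierModel
Sets the name of the model.
Specified by: setModelName in interface ClassifierModel
public java.lang.String getModelName()
Description copied from interface: ClassifierModel
Gets the name of the model.
Specified by: getModelName in interface ClassifierModel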
public void train(java.lang.String name,
java.lang.String fieldName,
PartitionManager manager,
ResultSetImpl training,
FeatureClusterSet selectedFeatures,
java.util.Map<java.lang.String,TermStatsImpl> termStats,
java.util.Map<DiskPartition,TermCache> termCaches,
Progress progress)
throws SearchEngineException
Trains a Balanced Winnow classifier.
Specified by: train in interface ClassifierModel
Parameters:
name - the name of the classifier
manager - the partition manager for the collection
training - the set of documents in the training set
selectedFeatures - the set of features (clusters) to use
progress - an object to use to report progress
fieldName - the name of the field where the results of this classifier will be stored
termStats - a map from names to term statistics for the feature clusters. This map will be populated with all of the elements of fcs when this method is called.
termCaches - a map from partitions to term caches containing the uncompressed postings for the feature clusters in fcs. The caches will be fully populated with the clusters from fcs when this method is called.
Throws:
SearchEngineException - if there is any error using the index while training the classifier
protected java.util.List getStrengthArrays(java.util.Map<DiskPartition,java.util.List> partToDocList,
FeatureClusterSet clusterSet,
java.util.Map<DiskPartition,TermCache> termCaches)
protected boolean winnow(float[] upperWeight,
float[] lowerWeight,
java.util.List strengthArrays,
boolean expectPositive)
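Computes the winnow sums and modifies the upper and lower weights according to Balanced Winnow, as described in the train method.
Parameters:
upperWeight - the upper/positive weight array
lowerWeight - the lower/negative weight array
strengthArrays - the arrays representing the example docs
expectPositive - true if the examples are positive examples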
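This page does not reproduce the update rule that winnow applies, so the following is a minimal sketch of the textbook Balanced Winnow step, not the Minion implementation. The class name BalancedWinnowSketch, the constants PROMOTION, DEMOTION and THRESHOLD, the helper score, and the meaning given to the boolean return value (true if any example was misclassified) are all assumptions made for illustration. Only the parameter names mirror the signature above.

```java
import java.util.List;

/** Textbook Balanced Winnow step; a sketch, not the Minion code. */
final class BalancedWinnowSketch {
    // Assumed constants: promotion > 1, 0 < demotion < 1, plus a decision threshold.
    static final float PROMOTION = 1.5f;
    static final float DEMOTION = 0.5f;
    static final float THRESHOLD = 1.0f;

    /** Winnow sum for one example: sum of (upper - lower) * strength over all features. */
    static float score(float[] upperWeight, float[] lowerWeight, float[] strengths) {
        float sum = 0f;
        for (int i = 0; i < strengths.length; i++) {
            sum += (upperWeight[i] - lowerWeight[i]) * strengths[i];
        }
        return sum;
    }

    /**
     * Runs one pass over the examples, adjusting weights in place on each mistake.
     * Returns true if at least one example was misclassified (assumed semantics).
     */
    static boolean winnow(float[] upperWeight, float[] lowerWeight,
                          List<float[]> strengthArrays, boolean expectPositive) {
        boolean mistake = false;
        for (float[] strengths : strengthArrays) {
            boolean predictedPositive = score(upperWeight, lowerWeight, strengths) > THRESHOLD;
            if (predictedPositive == expectPositive) {
                continue; // correct prediction, weights stay as they are
            }
            // Missed positive: promote upper, demote lower. False positive: the reverse.
            float upperFactor = expectPositive ? PROMOTION : DEMOTION;
            float lowerFactor = expectPositive ? DEMOTION : PROMOTION;
            for (int i = 0; i < strengths.length; i++) {
                if (strengths[i] > 0f) { // only features active in this example change
                    upperWeight[i] *= upperFactor;
                    lowerWeight[i] *= lowerFactor;
                }
            }
            mistake = true;
        }
        return mistake;
    }

    public static void main(String[] args) {
        float[] upper = {2f, 2f, 2f};
        float[] lower = {1f, 1f, 1f};
        List<float[]> positives = List.of(new float[] {1f, 0f, 0f});
        List<float[]> negatives = List.of(new float[] {0f, 1f, 1f});
        // Alternate over positive and negative examples until a pass makes no mistakes.
        for (int pass = 0; pass < 20; pass++) {
            boolean mistake = winnow(upper, lower, positives, true);
            mistake |= winnow(upper, lower, negatives, false);
            if (!mistake) {
                break;
            }
        }
        System.out.println(score(upper, lower, positives.get(0)) > THRESHOLD);
        System.out.println(score(upper, lower, negatives.get(0)) > THRESHOLD);
    }
}
```

Keeping two weight arrays lets the effective weight of a feature, upper minus lower, become negative even though each array stays positive under multiplicative updates.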
protected float strength(int freq)
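public FeatureClusterSet getFeatures()
Description copied from interface: ClassifierModel
Gets the features that this classifier model will be using for classification.
Specified by: getFeatures in interface ClassifierModel
See Also: Feature
public Feature getFeature()
Description copied from interface: ClassifierModel
Gets a single feature of the type that this classifier model uses.
Specified by: getFeature in interface ClassifierModel
public void setEngine(SearchEngine e)
Description copied from interface: ClassifierModel
Sets the search engine that this classifier is part of.
Specified by: setEngine in interface ClassifierModel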
public void dump(java.io.RandomAccessFile raf)
throws java.io.IOException
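Writes out the threshold that describes the minimum closeness that a vector must have to this classifier.
Specified by: dump in interface ClassifierModel
Parameters:
raf - the file (correctly positioned) to write the threshold to
Throws:
java.io.IOException
public void setFeatures(FeatureClusterSet f)
Description copied from interface: ClassifierModel
Sets the features that the classifier model will use for classification.
Specified by: setFeatures in interface ClassifierModel
Parameters:
f - the set of features.
See Also: Feature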
public void read(java.io.RandomAccessFile raf)
throws java.io.IOException
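Reads the threshold for this classifier.
Specified by: read in interface ClassifierModel
Parameters:
raf - the file (correctly positioned) to read the threshold from
Throws:
java.io.IOException
public float[] classify(DiskPartition sdp)
Description copied from interface: ClassifierModel
Classifies a disk partition of documents.
Specified by: classify in interface ClassifierModel
Parameters:
sdp - a disk partition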
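Both dump and read expect a RandomAccessFile that the caller has already positioned. The sketch below shows that contract as a round trip. It assumes the threshold is stored as a single 4-byte float, which this page does not state, and ThresholdIoSketch is a hypothetical stand-in rather than part of Minion. Only standard java.io calls are used.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

/** Round trip of a float threshold through a positioned RandomAccessFile; a sketch. */
final class ThresholdIoSketch {
    /** Writes the threshold at the file's current position (assumed format: one float). */
    static void dump(RandomAccessFile raf, float threshold) throws IOException {
        raf.writeFloat(threshold);
    }

    /** Reads a threshold from the file's current position. */
    static float read(RandomAccessFile raf) throws IOException {
        return raf.readFloat();
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("threshold", ".bin");
        file.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            long start = raf.getFilePointer(); // remember where the threshold begins
            dump(raf, 0.75f);
            raf.seek(start);                   // reposition before reading back
            System.out.println(read(raf));
        }
    }
}
```

The caller owns the seek in both directions, so a threshold can sit at any offset inside a larger model file.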
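public float similarity(java.lang.String key)
Description copied from interface: ClassifierModel
Computes the similarity of the given document and the classifier.
Specified by: similarity in interface ClassifierModel
Parameters:
key - the key of the document for which we wish to compute similarity
public float similarity(DocumentVector v)
Description copied from interface: ClassifierModel
Computes the similarity of the given document vector and the classifier.
Specified by: similarity in interface ClassifierModel
Parameters:
v - the document vector with which we want to calculate similarity
public float similarity(ClassifierModel cm)
Description copied from interface: ClassifierModel
Computes the similarity between this classifier model and another.
Specified by: similarity in interface ClassifierModel
Parameters:
cm - the model we want to compute the similarity to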
protected void nextStep(Progress p,
java.lang.String str)
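public ClassifierModel newInstance()
Description copied from interface: ClassifierModel
Creates a new instance of this classifier model.
Specified by: newInstance in interface ClassifierModel
public java.lang.String getFieldName()
Description copied from interface: ClassifierModel
Gets the field name where the results of this classifier will be stored.
Specified by: getFieldName in interface ClassifierModel
public void setFieldName(java.lang.String fieldName)
Description copied from interface: ClassifierModel
Sets the name of the field where the results of this classifier will be stored.
Specified by: setFieldName in interface ClassifierModel
public void setFromField(java.lang.String fromField)
Description copied from interface: ClassifierModel
Sets the name of the field from which the classifier was built, since we'll want to classify against terms only from that field.
Specified by: setFromField in interface ClassifierModel
Parameters:
fromField - the name of the field that was used to generate features
public java.lang.String getFromField()
Specified by: getFromField in interface ClassifierModel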