java.lang.Object com.sun.labs.minion.classification.Rocchio
public class Rocchio
A classifier model that does Rocchio-style classification.
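As a rough illustration of what "Rocchio-style" usually means, the sketch below applies the textbook Rocchio update: the new query vector is alpha times the old query, plus beta times the centroid of the relevant documents, minus gamma times the centroid of the non-relevant ones. It is not taken from the Minion source. The class name `RocchioSketch`, the use of `Map<String, Double>` as a sparse vector, and the clipping of non-positive weights are all assumptions made for this example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: a textbook Rocchio update, not the Minion implementation. */
class RocchioSketch {

    /** Averages a list of sparse term-weight vectors; an empty list gives an empty vector. */
    static Map<String, Double> centroid(List<Map<String, Double>> docs) {
        Map<String, Double> sum = new HashMap<>();
        if (docs.isEmpty()) {
            return sum;
        }
        for (Map<String, Double> doc : docs) {
            for (Map.Entry<String, Double> e : doc.entrySet()) {
                sum.merge(e.getKey(), e.getValue(), Double::sum);
            }
        }
        sum.replaceAll((term, weight) -> weight / docs.size());
        return sum;
    }

    /** Computes alpha*query + beta*centroid(rel) - gamma*centroid(nonRel). */
    static Map<String, Double> update(Map<String, Double> query,
                                      List<Map<String, Double>> rel,
                                      List<Map<String, Double>> nonRel,
                                      double alpha, double beta, double gamma) {
        Map<String, Double> out = new HashMap<>();
        query.forEach((term, w) -> out.merge(term, alpha * w, Double::sum));
        centroid(rel).forEach((term, w) -> out.merge(term, beta * w, Double::sum));
        centroid(nonRel).forEach((term, w) -> out.merge(term, -gamma * w, Double::sum));
        // Assumption: terms whose weight ends up zero or negative are dropped.
        out.values().removeIf(w -> w <= 0.0);
        return out;
    }
}
```

Trying several values of alpha, beta, and gamma, as the static arrays in the field summary suggest, would amount to calling `update` once per combination and keeping the best-scoring query.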
Nested Class Summary | |
---|---|
protected class | Rocchio.FQR: A class to collate and hold the results of a feedback query. |
protected class | Rocchio.HE: A class to hold a single element of the heap that we'll use to negotiate the results of queries. |
Field Summary | |
---|---|
protected static float[] | alpha: Values of the beta and gamma parameters to try. |
protected float | ba |
protected float | bb |
protected static float[] | beta |
protected float | bg |
protected FeatureClusterSet | clusters: A set of feature clusters. |
protected SearchEngine | e: The engine that this classifier is part of. |
protected FeatureClusterSet | features: The features that we will use for our model. |
protected java.lang.String | fieldName: The name of the field into which our classification results will go. |
protected java.text.DecimalFormat | form |
protected java.lang.String | fromField: The name of the vectored field whose contents were used to train the classifier. |
protected static float[] | gamma |
protected static java.lang.String | logTag |
protected PartitionManager | manager: A manager for the partitions we're classifying against. |
protected static int[] | rankCutoff: A set of rank cutoffs to use for dynamic query zoning. |
protected java.util.Map<DiskPartition,TermCache> | termCaches: A term cache to use when building classifiers. |
protected java.util.Map<java.lang.String,TermStatsImpl> | termStats |
protected float | threshold: The similarity threshold for our classifier. |
Constructor Summary | |
---|---|
Rocchio() | |
Method Summary | |
---|---|
float | checkThreshold(float score) |
float[] | classify(DiskPartition sdp): Classifies a set of documents. |
float[][] | classify(java.lang.String fromField, ClassifierDiskPartition cdp, DiskPartition sdp): Evaluates all of the classifiers in the given classifier disk partition against all of the new documents in the given disk partition. |
java.lang.String | describe(): Describes the classifier model. |
void | dump(java.io.RandomAccessFile raf): Dumps any classifier specific data to the given file. |
java.util.List<WeightedFeature> | explain(java.lang.String key): Explains the score that a given document would get for this classifier. |
java.lang.String | explain(java.lang.String key, boolean includeDocTerms): Explains why (or why not) the document with the given key would (or would not) be classified into this class. |
ResultSet | findSimilar() |
ResultSet | findSimilar(java.lang.String fromField): Finds the documents that are most similar to this classifier, whether they are in the class or not. |
Feature | getFeature(): Gets a single feature of the type that this classifier model uses. |
FeatureClusterSet | getFeatures(): Gets the features that this classifier model will be using for classification. |
java.lang.String | getFieldName(): Gets the field name where the results of this classifier will be stored. |
java.lang.String | getFromField() |
java.lang.String | getModelName(): Gets the name of the model. |
float | getThreshold() |
ClassifierModel | newInstance(): Creates a new instance of this classifier model. |
protected void | nextStep(Progress p, java.lang.String str) |
void | read(java.io.RandomAccessFile raf): Reads any classifier specific data from the given file. |
protected Rocchio.FQR | runFeedback(FeatureClusterSet cwFeatures, WeightedFeatureVector opt, java.util.List queryZone, int nRel, WeightingFunction wf, WeightingComponents wc): Runs a feedback query with the current estimate of the optimal query. |
void | setEngine(SearchEngine e): Sets the search engine that this classifier is part of. |
void | setFeatures(FeatureClusterSet f): Sets the feature clusters that the classifier model will use for classification. |
void | setFieldName(java.lang.String fieldName): Sets the name of the field where the results of this classifier will be stored. |
void | setFromField(java.lang.String fromField): Sets the name of the field from which the classifier was built, since we'll want to classify against terms only from that field. |
void | setModelName(java.lang.String modelName): Sets the name of the model. |
float | similarity(ClassifierModel cm): Computes the similarity between this classifier model and another. |
float | similarity(DocumentVector v): Computes the similarity of the given document vector and the classifier. |
float | similarity(java.lang.String key): Computes the similarity of the given document and the classifier. |
void | train(java.lang.String name, java.lang.String fieldName, PartitionManager manager, ResultSetImpl training, FeatureClusterSet selectedFeatures, java.util.Map<java.lang.String,TermStatsImpl> termStats, java.util.Map<DiskPartition,TermCache> termCaches, Progress progress): Trains the classifier on a set of documents. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
protected SearchEngine e
protected java.util.Map<DiskPartition,TermCache> termCaches
protected java.util.Map<java.lang.String,TermStatsImpl> termStats
protected FeatureClusterSet features
protected FeatureClusterSet clusters
protected float threshold
protected float ba
protected float bb
protected float bg
protected PartitionManager manager
protected java.lang.String fieldName
protected java.lang.String fromField
protected static int[] rankCutoff
protected static float[] alpha
protected static float[] beta
protected static float[] gamma
protected static java.lang.String logTag
protected java.text.DecimalFormat form
Constructor Detail |
---|
public Rocchio()
Method Detail |
---|
public void setModelName(java.lang.String modelName)
ClassifierModel
setModelName
in interface ClassifierModel
public java.lang.String getModelName()
ClassifierModel
getModelName
in interface ClassifierModel
public float getThreshold()
public void train(java.lang.String name, java.lang.String fieldName, PartitionManager manager, ResultSetImpl training, FeatureClusterSet selectedFeatures, java.util.Map<java.lang.String,TermStatsImpl> termStats, java.util.Map<DiskPartition,TermCache> termCaches, Progress progress) throws SearchEngineException
train
in interface ClassifierModel
name
- The name of the class, as specified by the application.
manager
- the manager for the partitions against which we're training
training
- A set of results containing the training documents for the class.
selectedFeatures
- the set of features to use when training this classifier
fieldName
- the name of the field where the results of this classifier will be stored
termStats
- A map from names to term statistics for the feature clusters. This map will be populated with all of the elements of fcs when this method is called.
termCaches
- A map from partitions to term caches containing the uncompressed postings for the feature clusters in fcs. The caches will be fully populated with the clusters from fcs when this method is called.
SearchEngineException
- if there is any problem training the classifier.
public void setEngine(SearchEngine e)
ClassifierModel
setEngine
in interface ClassifierModel
public float checkThreshold(float score)
public float similarity(java.lang.String key)
ClassifierModel
similarity
in interface ClassifierModel
key
- the key of the document for which we wish to compute
similarity
public float similarity(DocumentVector v)
ClassifierModel
similarity
in interface ClassifierModel
v
- the document vector with which we want to calculate
similarity
public float similarity(ClassifierModel cm)
ClassifierModel
similarity
in interface ClassifierModel
cm
- the model we want to compute the similarity to
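The page does not say which similarity measure the three `similarity` overloads use. As one plausible reading, the sketch below computes cosine similarity between two sparse term-weight vectors and compares the score with a threshold. The class `SimilaritySketch`, the `Map<String, Double>` representation, and the `score >= threshold` rule are assumptions for illustration, not the documented behaviour of `Rocchio`.

```java
import java.util.Map;

/** Illustrative only: cosine similarity over sparse vectors, not the Minion code. */
class SimilaritySketch {

    /** Dot product, iterating over the smaller map and probing the larger one. */
    static double dot(Map<String, Double> a, Map<String, Double> b) {
        Map<String, Double> small = a.size() <= b.size() ? a : b;
        Map<String, Double> large = small == a ? b : a;
        double sum = 0.0;
        for (Map.Entry<String, Double> e : small.entrySet()) {
            Double w = large.get(e.getKey());
            if (w != null) {
                sum += w * e.getValue();
            }
        }
        return sum;
    }

    /** Cosine of the angle between the two vectors; 0 if either vector has zero length. */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double normA = Math.sqrt(dot(a, a));
        double normB = Math.sqrt(dot(b, b));
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot(a, b) / (normA * normB);
    }

    /** Assumption: a document belongs to the class when its score reaches the threshold. */
    static boolean inClass(double score, double threshold) {
        return score >= threshold;
    }
}
```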
public java.lang.String explain(java.lang.String key, boolean includeDocTerms)
ExplainableClassifierModel
explain
in interface ExplainableClassifierModel
key
- the key of the document whose classification we want to explain
includeDocTerms
- if true, the explanation will include a description of the terms from the document.
public java.util.List<WeightedFeature> explain(java.lang.String key)
ExplainableClassifierModel
explain
in interface ExplainableClassifierModel
key
- the key of the document that we want to explain
protected Rocchio.FQR runFeedback(FeatureClusterSet cwFeatures, WeightedFeatureVector opt, java.util.List queryZone, int nRel, WeightingFunction wf, WeightingComponents wc)
opt
- the current optimal query, as calculated by the vector difference.
queryZone
- a query zone that we can use to drive per-partition processing
nRel
- the number of relevant documents, which is the number of training examples
public FeatureClusterSet getFeatures()
getFeatures
in interface BulkClassifier
getFeatures
in interface ClassifierModel
See Also: Feature
public Feature getFeature()
getFeature
in interface ClassifierModel
public void dump(java.io.RandomAccessFile raf) throws java.io.IOException
dump
in interface ClassifierModel
raf
- The file to which the data can be dumped.
java.io.IOException
public void setFeatures(FeatureClusterSet f)
setFeatures
in interface ClassifierModel
f
- the set of features.
See Also: FeatureCluster
public void read(java.io.RandomAccessFile raf) throws java.io.IOException
read
in interface ClassifierModel
raf
- The file from which the data can be read. The file will
be positioned appropriately so that the data can be read.
java.io.IOException
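The `dump` and `read` pair implies a symmetric layout: whatever `dump` writes at the current file position, `read` must consume in the same order. The on-disk format used by `Rocchio` is not described on this page, so the sketch below invents one (a threshold followed by a length-prefixed array of weights) purely to show the round trip with `java.io.RandomAccessFile`. `DumpReadSketch` and `ModelData` are hypothetical names.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

/** Illustrative only: the layout here is invented, not Minion's file format. */
class DumpReadSketch {

    /** Hypothetical classifier-specific data: a threshold plus feature weights. */
    static final class ModelData {
        final float threshold;
        final float[] weights;

        ModelData(float threshold, float[] weights) {
            this.threshold = threshold;
            this.weights = weights;
        }
    }

    /** Writes the data at the file's current position. */
    static void dump(RandomAccessFile raf, ModelData data) throws IOException {
        raf.writeFloat(data.threshold);
        raf.writeInt(data.weights.length);
        for (float w : data.weights) {
            raf.writeFloat(w);
        }
    }

    /** Reads the data back, assuming the file is positioned where dump started. */
    static ModelData read(RandomAccessFile raf) throws IOException {
        float threshold = raf.readFloat();
        float[] weights = new float[raf.readInt()];
        for (int i = 0; i < weights.length; i++) {
            weights[i] = raf.readFloat();
        }
        return new ModelData(threshold, weights);
    }

    /** Dumps to a temporary file, seeks back to the start, and reads the data again. */
    static ModelData roundTrip(ModelData data) {
        File tmp = null;
        try {
            tmp = File.createTempFile("rocchio-sketch", ".bin");
            try (RandomAccessFile raf = new RandomAccessFile(tmp, "rw")) {
                long start = raf.getFilePointer();
                dump(raf, data);
                raf.seek(start);
                return read(raf);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (tmp != null) {
                tmp.delete();
            }
        }
    }
}
```

Writing the array length before the weights lets `read` allocate the array without knowing the feature count in advance.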
public float[] classify(DiskPartition sdp)
classify
in interface ClassifierModel
sdp
- a disk partition representing the recently dumped
documents.
public float[][] classify(java.lang.String fromField, ClassifierDiskPartition cdp, DiskPartition sdp)
BulkClassifier
classify
in interface BulkClassifier
fromField
- the field from which the terms should be gathered.
cdp
- A partition of classifiers to evaluate
sdp
- A partition of documents to evaluate the classifiers against
public ResultSet findSimilar()
public ResultSet findSimilar(java.lang.String fromField)
public ClassifierModel newInstance()
ClassifierModel
newInstance
in interface ClassifierModel
protected void nextStep(Progress p, java.lang.String str)
public java.lang.String describe()
ExplainableClassifierModel
describe
in interface ExplainableClassifierModel
public java.lang.String getFieldName()
ClassifierModel
getFieldName
in interface ClassifierModel
public void setFieldName(java.lang.String fieldName)
ClassifierModel
setFieldName
in interface ClassifierModel
public void setFromField(java.lang.String fromField)
ClassifierModel
setFromField
in interface ClassifierModel
fromField
- the name of the field that was used to generate features
public java.lang.String getFromField()
getFromField
in interface ClassifierModel