com.sun.labs.minion.classification
Class ClassifierManager

java.lang.Object
  extended by com.sun.labs.minion.indexer.partition.PartitionManager
      extended by com.sun.labs.minion.classification.ClassifierManager
All Implemented Interfaces:
com.sun.labs.util.props.Component, com.sun.labs.util.props.Configurable

public class ClassifierManager
extends PartitionManager

The ClassifierManager is a specialization of the PartitionManager. It performs the same role for classifier partitions that the PartitionManager performs for partitions.


Nested Class Summary
 
Nested classes/interfaces inherited from class com.sun.labs.minion.indexer.partition.PartitionManager
PartitionManager.ExtFilter, PartitionManager.HouseKeeper, PartitionManager.Merger
 
Field Summary
protected  FeatureClusterer clustererInstance
          The feature clusterer used by all classifiers in this partition.
protected  ClassifierModel modelInstance
          The model used to classify documents in this partition.
static java.lang.String PROP_CLASSES_FIELD
           
static java.lang.String PROP_CLUSTERER_CLASS_NAME
           
static java.lang.String PROP_DO_CLASSIFICATION
           
static java.lang.String PROP_EXTRA_CLASSIFICATIONS
          A property for an optional set of "from" fields to use for each classifier.
static java.lang.String PROP_HUMAN_SELECTED
           
static java.lang.String PROP_MODEL_CLASS_NAME
           
static java.lang.String PROP_NUM_CLASSIFIER_FEATURES
           
static java.lang.String PROP_SELECTOR_CLASS_NAME
           
static java.lang.String PROP_SPLITTER_CLASS_NAME
           
protected  FeatureSelector selectorInstance
          The feature selector used by all classifiers in this partition
protected  java.lang.String splitterClassName
           
protected  ResultSplitter splitterInstance
          The result splitter used for classification in this partition.
 
Fields inherited from class com.sun.labs.minion.indexer.partition.PartitionManager
activeFile, activeLock, activeParts, engine, fieldsToLoad, indexConfig, indexDir, indexDirFile, keeper, keeperThread, lastPurgeTime, lockDirFile, logTag, mergedParts, mergeLock, mergeRate, mergeSpace, mergeThread, metaFile, name, PROP_ACTIVE_CHECK_INTERVAL, PROP_ASYNC_MERGES, PROP_CALCULATE_DVL, PROP_INDEX_CONFIG, PROP_LOCK_DIR, PROP_MAX_MERGE_SIZE, PROP_MERGE_RATE, PROP_OPEN_PARTITION_HIGH_WATER_MARK, PROP_OPEN_PARTITION_LOW_WATER_MARK, PROP_PART_CLOSE_DELAY, PROP_PART_REAP_DELAY, PROP_PARTITION_FACTORY, PROP_REAP_DOES_NOTHING, PROP_STARTING_DATA, PROP_TERMSTATS_DICT_FACTORY, queryTimer, randID, subDir, thingsToClose
 
Constructor Summary
ClassifierManager()
          Constructs the ClassifierManager.
 
Method Summary
 java.util.Map<java.lang.String,ClassificationResult> classify(DiskPartition sdp)
          Begins classification of a set of documents in memory.
 boolean doClassification()
           
 void dump()
          Signals the ClassifierManager that all the classifiers currently in memory should be dumped to disk so that they can be used for classifying new documents.
 java.util.List<WeightedFeature> explain(java.lang.String cname1, java.lang.String cname2, int n)
           
 java.util.List<FieldValue> findSimilar(java.lang.String cname, int n)
          Finds classifiers that are similar to the named classifier.
 java.lang.String getClassesField()
          Gets the name of the field to which classes will be assigned during classification.
 ClassifierModel getClassifier(java.lang.String cname)
          Gets a classifier model for the given class name.
 FeatureClusterer getClustererInstance()
           
 HumanSelected getHumanSelected(java.lang.String name)
           
 ClassifierModel getModelInstance()
           
 int getNumClassifierFeatures()
          Gets the number of features to use for classifiers.
 FeatureSelector getSelectorInstance()
           
 java.io.File makeModelSpecificFile(int partNumber)
          Gets a model-specific data file name for use when dumping or merging classifier partitions.
 void newProperties(com.sun.labs.util.props.PropertySheet ps)
           
protected  void reapPartition(int partNumber)
          A method to reap a single partition.
 float similarity(java.lang.String cname, java.lang.String key)
          Computes the similarity between a document and a classifier.
 void trainClassifier(java.lang.String className, ResultSet docs)
          Creates a new classifier based on the classifier model for this collection, the documents in the ResultSet, and the set of currently indexed documents.
 
Methods inherited from class com.sun.labs.minion.indexer.partition.PartitionManager
addIndexListener, addNewPartition, addNewPartition, checkHK, deleteDocument, deleteDocuments, deleteKeys, getActivePartitions, getAllFieldValues, getCalculateDVL, getDistance, getDistance, getDocumentTerm, getDocumentVector, getDocumentVector, getDocumentVector, getEngine, getFieldInfo, getFieldIterator, getFieldIterator, getFieldNames, getFieldValue, getIndexConfig, getIndexDir, getLastPurgeTime, getLockDir, getMatching, getMerger, getMerger, getMerger, getMergerFromNumbers, getMetaFile, getNActive, getName, getNDocs, getNextPartitionNumber, getNFields, getNTerms, getNTokens, getPartCloseDelay, getPartitions, getPartNumbers, getQueryConfig, getQueryTimer, getRandID, getSimilar, getTermStats, getTermStatsDict, getTopFieldValues, hasFieldedVectors, init, isCasedIndex, isIndexed, makeActiveFile, makeDeletedDocsFile, makeDeletedDocsFile, makeDictionaryFile, makeDictionaryFile, makeMetaFile, makePostingsFile, makePostingsFile, makePostingsFile, makeRemovedPartitionFile, makeRemovedPartitionFile, makeTaxonomyFile, makeTaxonomyFile, makeTermStatsFile, makeVectorLengthFile, makeVectorLengthFile, merge, mergeAll, mergeGeometric, mergeInPieces, newDiskPartition, noMoreMerges, purge, readActiveFile, realMerge, reap, recalculateTermStats, recover, removeIndexListener, setEngine, setLockDir, setMergeRate, setPartCloseDelay, shutdown, startHK, updateActiveParts, updateTermStats, writeActiveFile
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

selectorInstance

protected FeatureSelector selectorInstance
The feature selector used by all classifiers in this partition


clustererInstance

protected FeatureClusterer clustererInstance
The feature clusterer used by all classifiers in this partition


modelInstance

protected ClassifierModel modelInstance
The model used to classify documents in this partition.


splitterInstance

protected ResultSplitter splitterInstance
The result splitter used for classification in this partition.


PROP_DO_CLASSIFICATION

@ConfigBoolean(defaultValue=true)
public static final java.lang.String PROP_DO_CLASSIFICATION
See Also:
Constant Field Values

PROP_SELECTOR_CLASS_NAME

@ConfigString(defaultValue="com.sun.labs.minion.classification.ContingencyFeatureSelector")
public static final java.lang.String PROP_SELECTOR_CLASS_NAME
See Also:
Constant Field Values
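
The default selector is a ContingencyFeatureSelector, but this page does not say which contingency statistic it computes. As a rough illustration only, the sketch below assumes a chi-square score over a 2x2 table (class membership versus feature presence) and keeps the top n features, in the spirit of PROP_NUM_CLASSIFIER_FEATURES (default 200). The class ChiSquareSelectorSketch is hypothetical and is not Minion's implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: scores features with the chi-square statistic from a
 * 2x2 contingency table and keeps the top n. Not Minion's
 * ContingencyFeatureSelector; it only illustrates contingency-based selection.
 */
class ChiSquareSelectorSketch {

    /**
     * a = in-class docs containing the feature, b = out-of-class docs containing it,
     * c = in-class docs without it, d = out-of-class docs without it.
     */
    static double chiSquare(long a, long b, long c, long d) {
        double n = a + b + c + d;
        double num = n * Math.pow(a * d - b * c, 2);
        double den = (double) (a + b) * (c + d) * (a + c) * (b + d);
        return den == 0 ? 0.0 : num / den;   // degenerate tables carry no signal
    }

    /** Returns the n highest-scoring features; counts maps feature -> {a, b, c, d}. */
    static List<String> selectTop(Map<String, long[]> counts, int n) {
        List<String> names = new ArrayList<>(counts.keySet());
        names.sort(Comparator.comparingDouble((String f) -> {
            long[] t = counts.get(f);
            return chiSquare(t[0], t[1], t[2], t[3]);
        }).reversed());
        return names.subList(0, Math.min(n, names.size()));
    }

    public static void main(String[] args) {
        Map<String, long[]> counts = Map.of(
            "goal",   new long[] {40, 5, 10, 145},   // strongly tied to the class
            "the",    new long[] {50, 150, 0, 0},    // appears in every document
            "tennis", new long[] {2, 30, 48, 120});  // chi-square also rewards negative association
        System.out.println(selectTop(counts, 2));
    }
}
```

Note that chi-square scores positive and negative association alike, so a real selector may add a sign check.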

PROP_CLUSTERER_CLASS_NAME

@ConfigString(defaultValue="com.sun.labs.minion.classification.StemmingClusterer")
public static final java.lang.String PROP_CLUSTERER_CLASS_NAME
See Also:
Constant Field Values
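
The default clusterer is a StemmingClusterer, which suggests that features sharing a stem are grouped together. The sketch below shows that idea with a deliberately crude suffix stripper standing in for a real stemmer; StemClusterSketch and its suffix list are hypothetical and do not reflect Minion's stemming algorithm.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch of stem-based feature clustering. The crude suffix
 * stripper is a stand-in for a real stemmer and is NOT the algorithm used by
 * Minion's StemmingClusterer.
 */
class StemClusterSketch {

    private static final String[] SUFFIXES = {"ing", "ed", "es", "s"};

    /** Strips the first matching suffix, keeping a stem of at least three characters. */
    static String crudeStem(String word) {
        String w = word.toLowerCase();
        for (String suffix : SUFFIXES) {
            if (w.endsWith(suffix) && w.length() - suffix.length() >= 3) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;
    }

    /** Groups features that share a stem; each group can then act as a single feature. */
    static Map<String, List<String>> cluster(List<String> features) {
        Map<String, List<String>> clusters = new TreeMap<>();
        for (String f : features) {
            clusters.computeIfAbsent(crudeStem(f), k -> new ArrayList<>()).add(f);
        }
        return clusters;
    }

    public static void main(String[] args) {
        System.out.println(cluster(List.of("train", "trained", "training", "trains", "model", "models")));
    }
}
```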

PROP_MODEL_CLASS_NAME

@ConfigString(defaultValue="com.sun.labs.minion.classification.Rocchio")
public static final java.lang.String PROP_MODEL_CLASS_NAME
See Also:
Constant Field Values
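
The default model is Rocchio. As a general illustration of the Rocchio technique (not Minion's Rocchio class), the sketch below builds a class prototype as a weighted positive centroid minus a weighted negative centroid, then scores a document by cosine similarity to that prototype. RocchioSketch, BETA, GAMMA and the 0.5 threshold are hypothetical, illustrative choices.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of a Rocchio-style classifier over sparse term-weight
 * vectors. BETA, GAMMA and the threshold are illustrative, not Minion's values.
 */
class RocchioSketch {
    static final double BETA = 1.0, GAMMA = 0.25;

    /** Prototype = BETA * centroid(positives) - GAMMA * centroid(negatives). */
    static Map<String, Double> train(List<Map<String, Double>> pos, List<Map<String, Double>> neg) {
        Map<String, Double> proto = new HashMap<>();
        addScaled(proto, pos, BETA);
        addScaled(proto, neg, -GAMMA);
        return proto;
    }

    private static void addScaled(Map<String, Double> acc, List<Map<String, Double>> docs, double weight) {
        if (docs.isEmpty()) return;
        double scale = weight / docs.size();   // dividing by the count turns the sum into a centroid
        for (Map<String, Double> d : docs) {
            d.forEach((term, w) -> acc.merge(term, scale * w, Double::sum));
        }
    }

    /** Cosine similarity between two sparse vectors. */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
        }
        double na = norm(a), nb = norm(b);
        return (na == 0 || nb == 0) ? 0 : dot / (na * nb);
    }

    private static double norm(Map<String, Double> v) {
        double s = 0;
        for (double w : v.values()) s += w * w;
        return Math.sqrt(s);
    }

    public static void main(String[] args) {
        Map<String, Double> proto = train(
            List.of(Map.of("goal", 1.0, "match", 1.0), Map.of("goal", 1.0, "team", 1.0)),
            List.of(Map.of("stock", 1.0, "market", 1.0)));
        double score = cosine(Map.of("goal", 1.0, "team", 1.0), proto);
        System.out.println(score > 0.5 ? "in class" : "not in class");
    }
}
```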

PROP_SPLITTER_CLASS_NAME

@ConfigString(defaultValue="com.sun.labs.minion.classification.KFoldSplitter")
public static final java.lang.String PROP_SPLITTER_CLASS_NAME
See Also:
Constant Field Values
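
The default result splitter is a KFoldSplitter. Its exact behavior is not documented here, so the sketch below shows plain k-fold splitting: items are dealt round-robin into k folds, and in each round one fold is held out for testing while the rest form the training set. KFoldSketch and its round-robin assignment are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of k-fold splitting. Not Minion's KFoldSplitter.
 */
class KFoldSketch {

    /** Assigns items round-robin to k folds. */
    static <T> List<List<T>> folds(List<T> items, int k) {
        if (k < 2) throw new IllegalArgumentException("k must be at least 2");
        List<List<T>> folds = new ArrayList<>();
        for (int i = 0; i < k; i++) folds.add(new ArrayList<>());
        for (int i = 0; i < items.size(); i++) folds.get(i % k).add(items.get(i));
        return folds;
    }

    /** Training set for one round: every fold except the held-out one. */
    static <T> List<T> trainingSet(List<List<T>> folds, int held) {
        List<T> train = new ArrayList<>();
        for (int i = 0; i < folds.size(); i++) {
            if (i != held) train.addAll(folds.get(i));
        }
        return train;
    }

    public static void main(String[] args) {
        List<List<String>> f = folds(List.of("d1", "d2", "d3", "d4", "d5"), 3);
        for (int round = 0; round < f.size(); round++) {
            System.out.println("test=" + f.get(round) + " train=" + trainingSet(f, round));
        }
    }
}
```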

splitterClassName

protected java.lang.String splitterClassName

PROP_CLASSES_FIELD

@ConfigString(defaultValue="class")
public static final java.lang.String PROP_CLASSES_FIELD
See Also:
Constant Field Values

PROP_EXTRA_CLASSIFICATIONS

@ConfigComponentList(type=ExtraClassification.class,
                     defaultList={})
public static final java.lang.String PROP_EXTRA_CLASSIFICATIONS
A property for an optional set of "from" fields to use for each classifier.

See Also:
Constant Field Values

PROP_NUM_CLASSIFIER_FEATURES

@ConfigInteger(defaultValue=200)
public static final java.lang.String PROP_NUM_CLASSIFIER_FEATURES
See Also:
Constant Field Values

PROP_HUMAN_SELECTED

@ConfigString(mandatory=false)
public static final java.lang.String PROP_HUMAN_SELECTED
See Also:
Constant Field Values

Constructor Detail

ClassifierManager

public ClassifierManager()
Constructs the ClassifierManager. The feature selector and clusterer are provided at initialization time and are passed to other classes as needed.

Method Detail

getModelInstance

public ClassifierModel getModelInstance()

getSelectorInstance

public FeatureSelector getSelectorInstance()

getClustererInstance

public FeatureClusterer getClustererInstance()

doClassification

public boolean doClassification()

getClassesField

public java.lang.String getClassesField()
Gets the name of the field to which classes will be assigned during classification.

Returns:
the name of the field to which classes will be assigned

getNumClassifierFeatures

public int getNumClassifierFeatures()
Gets the number of features to use for classifiers.

Returns:
the number of features to use for classification

getHumanSelected

public HumanSelected getHumanSelected(java.lang.String name)

trainClassifier

public void trainClassifier(java.lang.String className,
                            ResultSet docs)
Creates a new classifier based on the classifier model for this collection, the documents in the ResultSet, and the set of currently indexed documents. If className names an existing classifier, that classifier is replaced.

Parameters:
className - the name of the class to create or replace
docs - the documents to use as exemplars for the class

dump

public void dump()
          throws java.io.IOException
Signals the ClassifierManager that all the classifiers currently in memory should be dumped to disk so that they can be used for classifying new documents. Classifiers that have just been trained, and so exist only in memory, cannot be used for classification until they are dumped.

Throws:
java.io.IOException - if there is any error dumping the partition
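
The contract described for trainClassifier and dump can be modeled in a few lines: training stores or replaces a classifier by name in memory, and only dumped classifiers are used for classification. ClassifierLifecycleSketch, its methods, and the use of strings as stand-in models are hypothetical; they mirror the documented behavior and are not Minion's API.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model of the trainClassifier/dump lifecycle. Names and types
 * are illustrative only.
 */
class ClassifierLifecycleSketch {
    private final Map<String, String> inMemory = new HashMap<>(); // trained, not yet dumped
    private final Map<String, String> onDisk = new HashMap<>();   // usable for classification

    /** Training with an existing name replaces the earlier classifier (after the next dump). */
    void train(String className, String model) {
        inMemory.put(className, model);
    }

    /** Moves every in-memory classifier to the "disk" side. */
    void dump() {
        onDisk.putAll(inMemory);
        inMemory.clear();
    }

    /** Returns the usable classifier, or null if none has been dumped under that name. */
    String usable(String className) {
        return onDisk.get(className);
    }

    public static void main(String[] args) {
        ClassifierLifecycleSketch m = new ClassifierLifecycleSketch();
        m.train("sports", "model-v1");
        System.out.println(m.usable("sports")); // null: trained but not dumped
        m.dump();
        m.train("sports", "model-v2");
        System.out.println(m.usable("sports")); // model-v1: v2 is not dumped yet
        m.dump();
        System.out.println(m.usable("sports")); // model-v2
    }
}
```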

getClassifier

public ClassifierModel getClassifier(java.lang.String cname)
Gets a classifier model for the given class name.

Parameters:
cname - the name of the classifier to retrieve
Returns:
a classifier model with the given name, or null if there is no such model

findSimilar

public java.util.List<FieldValue> findSimilar(java.lang.String cname,
                                              int n)
Finds classifiers that are similar to the named classifier.

Parameters:
cname - the name of the classifier for which to find similar classifiers
Returns:
a list of the similar classifiers, along with scores indicating the degree of similarity.
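
This page does not say how classifier similarity is measured or what n controls. The sketch below assumes each classifier is a sparse term-weight vector, compares vectors by cosine similarity, and treats n as the maximum number of results; FindSimilarSketch and Scored are hypothetical names, not Minion's FieldValue-based API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of ranking classifiers by similarity to a named one,
 * assuming cosine over sparse vectors and n as a result cap.
 */
class FindSimilarSketch {

    static final class Scored {
        final String name;
        final double score;
        Scored(String name, double score) { this.name = name; this.score = score; }
        @Override public String toString() { return name + "=" + score; }
    }

    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
            na += e.getValue() * e.getValue();
        }
        for (double w : b.values()) nb += w * w;
        return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
    }

    /** Returns up to n other classifiers, most similar first; empty if cname is unknown. */
    static List<Scored> findSimilar(Map<String, Map<String, Double>> classifiers, String cname, int n) {
        Map<String, Double> target = classifiers.get(cname);
        if (target == null) return List.of();
        List<Scored> out = new ArrayList<>();
        classifiers.forEach((name, vec) -> {
            if (!name.equals(cname)) out.add(new Scored(name, cosine(target, vec)));
        });
        out.sort(Comparator.comparingDouble((Scored s) -> s.score).reversed());
        return out.subList(0, Math.min(n, out.size()));
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> cs = Map.of(
            "soccer", Map.of("goal", 1.0, "team", 1.0),
            "hockey", Map.of("goal", 1.0, "ice", 1.0),
            "finance", Map.of("stock", 1.0, "market", 1.0));
        System.out.println(findSimilar(cs, "soccer", 1));
    }
}
```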

explain

public java.util.List<WeightedFeature> explain(java.lang.String cname1,
                                               java.lang.String cname2,
                                               int n)

similarity

public float similarity(java.lang.String cname,
                        java.lang.String key)
Computes the similarity between a document and a classifier.


classify

public java.util.Map<java.lang.String,ClassificationResult> classify(DiskPartition sdp)
Begins classification of a set of documents in memory. The argument should be the disk partition containing the documents to classify. The results are returned as a map from strings to ClassificationResult objects.

Parameters:
sdp - the disk partition to classify
Returns:
a map of ClassificationResult objects describing the classification of the documents in the partition

makeModelSpecificFile

public java.io.File makeModelSpecificFile(int partNumber)
Gets a model-specific data file name for use when dumping or merging classifier partitions.


reapPartition

protected void reapPartition(int partNumber)
A method to reap a single partition. Subclasses can override this method so that the reap method works for both the superclass and the subclass.

Overrides:
reapPartition in class PartitionManager
Parameters:
partNumber - the number of the partition to reap.
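
The override pattern described above is a template method: the base class's reap loop calls the per-partition hook, and a subclass extends the hook to clean up its own data (for instance, the model-specific file that makeModelSpecificFile names). The classes and file names below are hypothetical and only illustrate the pattern.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of overriding a per-partition reap hook. Class and file
 * names are illustrative, not Minion's.
 */
class ReapSketch {

    static class BaseManager {
        final List<String> removed = new ArrayList<>();

        /** Reaps every listed partition through the overridable hook. */
        final void reap(List<Integer> partNumbers) {
            for (int p : partNumbers) reapPartition(p);
        }

        protected void reapPartition(int partNumber) {
            removed.add("p" + partNumber + ".postings");
        }
    }

    static class ClassifierLikeManager extends BaseManager {
        @Override
        protected void reapPartition(int partNumber) {
            super.reapPartition(partNumber);            // keep the base clean-up
            removed.add("p" + partNumber + ".model");   // plus model-specific data
        }
    }

    public static void main(String[] args) {
        BaseManager m = new ClassifierLikeManager();
        m.reap(List.of(3));
        System.out.println(m.removed); // [p3.postings, p3.model]
    }
}
```

Calling super.reapPartition first keeps the base class's behavior intact, so a single reap call cleans up both kinds of data.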

newProperties

public void newProperties(com.sun.labs.util.props.PropertySheet ps)
                   throws com.sun.labs.util.props.PropertyException
Specified by:
newProperties in interface com.sun.labs.util.props.Configurable
Overrides:
newProperties in class PartitionManager
Throws:
com.sun.labs.util.props.PropertyException