com.sun.labs.minion.classification
Class ClassifierMemoryPartition

java.lang.Object
  extended by com.sun.labs.minion.indexer.partition.Partition
      extended by com.sun.labs.minion.indexer.partition.MemoryPartition
          extended by com.sun.labs.minion.classification.ClassifierMemoryPartition
All Implemented Interfaces:
com.sun.labs.util.props.Component, com.sun.labs.util.props.Configurable, java.lang.Comparable<Partition>

public class ClassifierMemoryPartition
extends MemoryPartition

A memory partition that will hold classifier data.


Field Summary
protected  FeatureClusterer clustererInstance
          The feature clusterer used by all classifiers in this partition.
protected static java.lang.String logTag
          The tag for this module.
protected  ClassifierModel modelInstance
          The model used to classify documents in this partition.
protected  java.util.List models
          A list of the models trained into this partition, kept so that their model-specific data can be dumped.
protected  int partClasses
          The number of classes indexed into this partition.
static java.lang.String PROP_CLUSTER_MEMORY_PARTITION
           
static java.lang.String PROP_PART_MANAGER
           
static java.lang.String PROP_STOPWORDS
           
protected  FeatureSelector selectorInstance
          The feature selector used by all classifiers in this partition.
 
Fields inherited from class com.sun.labs.minion.indexer.partition.MemoryPartition
ddo, del, deleted, docDict, dockey, mainDict, name, nWords, postBytes
 
Fields inherited from class com.sun.labs.minion.indexer.partition.Partition
DICT_OFFSETS_SIZE, docDictFactory, entryClass, entryName, indexConfig, mainDictFactory, mainDictFile, mainPostFiles, manager, maxID, nEntries, partNumber, PROP_DOC_DICT_FACTORY, PROP_INDEX_CONFIG, PROP_MAIN_DICT_FACTORY, PROP_PARTITION_MANAGER, stats
 
Constructor Summary
ClassifierMemoryPartition()
          Constructs a ClassifierMemoryPartition for general use.
 
Method Summary
protected  void dumpCustom(Entry[] sorted)
          Dumps the data that is specific to the classifier partition.
protected  void endTrainingClass(int featureSize)
           
 int getNDocs()
          Gets the number of documents in this partition.
 void newProperties(com.sun.labs.util.props.PropertySheet ps)
           
protected  void nextStep(Progress p, java.lang.String str)
           
protected  ClassifierModel selectBestModel(java.lang.String name, java.lang.String fieldName, java.lang.String fromField, ResultSetImpl results, ResultSplitter splitter, Progress progress)
           
 void train(java.lang.String name, java.lang.String fieldName, java.lang.String fromField, ResultSetImpl results, Progress progress)
          Train a classifier.
 
Methods inherited from class com.sun.labs.minion.indexer.partition.MemoryPartition
dump, dump, getDocumentTerm, shutdown
 
Methods inherited from class com.sun.labs.minion.indexer.partition.Partition
compareTo, getAllFiles, getAllFiles, getDocFiles, getDocFiles, getIndexConfig, getMainFiles, getMainFiles, getManager, getName, getNumPostingsChannels, getPartitionNumber, getQueryConfig, getStats
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

selectorInstance

protected FeatureSelector selectorInstance
The feature selector used by all classifiers in this partition.


clustererInstance

protected FeatureClusterer clustererInstance
The feature clusterer used by all classifiers in this partition.


modelInstance

protected ClassifierModel modelInstance
The model used to classify documents in this partition.


models

protected java.util.List models
A list of the models trained into this partition, kept so that their model-specific data can be dumped.


logTag

protected static java.lang.String logTag
The tag for this module.


partClasses

protected int partClasses
The number of classes indexed into this partition.


PROP_PART_MANAGER

@ConfigComponent(type=PartitionManager.class)
public static final java.lang.String PROP_PART_MANAGER
See Also:
Constant Field Values

PROP_CLUSTER_MEMORY_PARTITION

@ConfigComponent(type=ClusterMemoryPartition.class)
public static final java.lang.String PROP_CLUSTER_MEMORY_PARTITION
See Also:
Constant Field Values

PROP_STOPWORDS

@ConfigComponent(type=StopWords.class,
                 mandatory=false)
public static final java.lang.String PROP_STOPWORDS
See Also:
Constant Field Values
Constructor Detail

ClassifierMemoryPartition

public ClassifierMemoryPartition()
Constructs a ClassifierMemoryPartition for general use.

Method Detail

getNDocs

public int getNDocs()
Description copied from class: Partition
Gets the number of documents in this partition.

Specified by:
getNDocs in class Partition

train

public void train(java.lang.String name,
                  java.lang.String fieldName,
                  java.lang.String fromField,
                  ResultSetImpl results,
                  Progress progress)
           throws SearchEngineException
Trains a classifier. If no classifier with the given name exists, a new one is created; otherwise the existing classifier with that name is replaced.

Parameters:
name - the name of the new class, or an existing class
fieldName - the name of the field where classification results for this classifier will be stored
fromField - the vectored field from which features will be selected.
results - the set of results to use to train the classifier
progress - where to send progress events
Throws:
SearchEngineException - if there are any errors during training
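The create-or-replace behavior described for train can be pictured with a small stand-in. This is only a sketch: TrainReplaceSketch, its map of names to training-document counts, and its two-argument train method are hypothetical and are not part of Minion; the real method takes the parameters listed above and builds an actual classifier model.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in, not Minion code: illustrates "create or replace by name".
public class TrainReplaceSketch {
    // Maps a classifier name to the number of training documents it was last built from.
    private final Map<String, Integer> classifiers = new LinkedHashMap<>();

    // Returns true if a classifier with this name already existed and was replaced.
    public boolean train(String name, int trainingDocs) {
        // Map.put returns the previous value, or null when the name is new.
        return classifiers.put(name, trainingDocs) != null;
    }

    // Number of distinct classifier names currently held.
    public int size() {
        return classifiers.size();
    }

    // Training-document count recorded for the given (existing) name.
    public int docsFor(String name) {
        return classifiers.get(name);
    }

    public static void main(String[] args) {
        TrainReplaceSketch part = new TrainReplaceSketch();
        System.out.println(part.train("sports", 120)); // false: new classifier created
        System.out.println(part.train("sports", 200)); // true: same name, so replaced
        System.out.println(part.size());               // 1: still a single classifier
    }
}
```

Training twice under the same name leaves one entry, which mirrors the documented rule that an existing name is replaced rather than duplicated.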

selectBestModel

protected ClassifierModel selectBestModel(java.lang.String name,
                                          java.lang.String fieldName,
                                          java.lang.String fromField,
                                          ResultSetImpl results,
                                          ResultSplitter splitter,
                                          Progress progress)
                                   throws SearchEngineException
Throws:
SearchEngineException

endTrainingClass

protected void endTrainingClass(int featureSize)

dumpCustom

protected void dumpCustom(Entry[] sorted)
                   throws java.io.IOException
Dumps the data that is specific to the classifier partition. This will be any custom data that the classifier wants to store as well as the set of docs that contributed to training each classifier. This method is called automatically by the dump() method of MemoryPartition and doesn't need to be called directly.

Overrides:
dumpCustom in class MemoryPartition
Parameters:
sorted - a sorted list of all main dictionary entries
Throws:
java.io.IOException - if there is any error writing data.
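The relationship between dump() and dumpCustom described above is a hook pattern: the superclass drives the dump and calls a protected method that subclasses override. The sketch below shows that shape under stated assumptions. BasePartition, ClassifierLikePartition, the String[] argument, and the bytes written are all hypothetical; they do not reproduce Minion's on-disk format or the Entry type.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical classes, not Minion's: a base dump() that calls a protected hook.
public class DumpHookSketch {
    static class BasePartition {
        protected final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        protected final DataOutputStream out = new DataOutputStream(bytes);

        // Writes the common data, then lets subclasses append their own.
        public final void dump(String[] sorted) throws IOException {
            out.writeInt(sorted.length); // common header: 4 bytes
            dumpCustom(sorted);          // subclass hook, called automatically
            out.flush();
        }

        // Default hook writes nothing.
        protected void dumpCustom(String[] sorted) throws IOException {
        }
    }

    static class ClassifierLikePartition extends BasePartition {
        @Override
        protected void dumpCustom(String[] sorted) throws IOException {
            // Placeholder for model-specific data.
            out.writeUTF("model-data");
        }
    }

    // Dumps the partition and returns how many bytes were written in total.
    static int dumpedSize(BasePartition p, String[] sorted) {
        try {
            p.dump(sorted);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p.bytes.size();
    }

    public static void main(String[] args) {
        String[] sorted = {"alpha", "beta"};
        System.out.println(dumpedSize(new BasePartition(), sorted));
        System.out.println(dumpedSize(new ClassifierLikePartition(), sorted));
    }
}
```

Callers only ever invoke dump(); the subclass-specific bytes appear because the base method calls the overridden hook, which is why dumpCustom does not need to be called directly.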

nextStep

protected void nextStep(Progress p,
                        java.lang.String str)

newProperties

public void newProperties(com.sun.labs.util.props.PropertySheet ps)
                   throws com.sun.labs.util.props.PropertyException
Specified by:
newProperties in interface com.sun.labs.util.props.Configurable
Overrides:
newProperties in class MemoryPartition
Throws:
com.sun.labs.util.props.PropertyException