com.sun.labs.minion.classification
Class SimpleFeatureSelector

java.lang.Object
  extended by com.sun.labs.minion.classification.SimpleFeatureSelector
All Implemented Interfaces:
FeatureSelector

public class SimpleFeatureSelector
extends java.lang.Object
implements FeatureSelector

A class that selects the top n features from a set of documents based on the weights assigned by a term weighting function.
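
The idea of picking the top n features by weight can be illustrated with a minimal, self-contained sketch. It does not use the Minion API: the class name TopNFeatureSketch, the method selectTopN, and the plain Map of term weights are all hypothetical stand-ins for FeatureClusterSet and the term weighting function, and the alphabetical tie-break is an assumption made only to keep the result deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of com.sun.labs.minion.
public class TopNFeatureSketch {

    /**
     * Returns the names of the n highest-weighted features, heaviest first.
     * If fewer than n features exist, all of them are returned.
     */
    public static List<String> selectTopN(Map<String, Double> weights, int n) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(weights.entrySet());

        // Sort by descending weight; break ties alphabetically (an assumption)
        // so that the output does not depend on map iteration order.
        entries.sort((a, b) -> {
            int byWeight = Double.compare(b.getValue(), a.getValue());
            return byWeight != 0 ? byWeight : a.getKey().compareTo(b.getKey());
        });

        List<String> top = new ArrayList<>();
        int limit = Math.max(0, Math.min(n, entries.size()));
        for (int i = 0; i < limit; i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }
}
```

Calling selectTopN with a map of term weights and n = 2 returns the two heaviest terms in descending weight order. A full sort is the simplest approach; for very large vocabularies a bounded priority queue would avoid sorting every entry.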


Constructor Summary
SimpleFeatureSelector()
          Creates a SimpleFeatureSelector.
 
Method Summary
 FeatureClusterSet select(FeatureClusterSet set, WeightingComponents wc, int numTrainingDocs, int numFeatures, SearchEngine engine)
          Selects the features from the documents in the training set.
 void setHumanSelected(HumanSelected hs)
          Provides a set of human-selected terms that should be included in or excluded from consideration during the feature selection process.
 void setStopWords(StopWords stopWords)
          Sets a stopword list: words that should be ignored when selecting features.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

SimpleFeatureSelector

public SimpleFeatureSelector()
Creates a SimpleFeatureSelector.

Method Detail

setHumanSelected

public void setHumanSelected(HumanSelected hs)
Description copied from interface: FeatureSelector
Provides a set of human-selected terms that should be included in or excluded from consideration during the feature selection process.

Specified by:
setHumanSelected in interface FeatureSelector
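
One plausible way to honor human-selected terms is sketched here. This is an assumption about the behavior, not the documented implementation: the class HumanSelectedSketch, its apply method, and the use of two plain sets for included and excluded terms are hypothetical and do not reflect the real HumanSelected class. The sketch assumes that excluded terms are never chosen and that included terms are kept ahead of the automatically ranked ones.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not part of com.sun.labs.minion.
public class HumanSelectedSketch {

    /**
     * Combines a ranked candidate list with human include/exclude sets.
     *
     * @param ranked   candidate terms, best first
     * @param included terms a person asked to keep (assumed to take priority)
     * @param excluded terms a person asked to drop (assumed to win over included)
     * @param n        the target number of features
     */
    public static List<String> apply(List<String> ranked, Set<String> included,
                                     Set<String> excluded, int n) {
        // LinkedHashSet preserves insertion order and ignores duplicates.
        Set<String> chosen = new LinkedHashSet<>();

        // Forced terms go in first, unless they are also excluded.
        // Note: if more than n terms are forced, the result exceeds n.
        for (String term : included) {
            if (!excluded.contains(term)) {
                chosen.add(term);
            }
        }

        // Fill the remaining slots from the ranked candidates.
        for (String term : ranked) {
            if (chosen.size() >= n) {
                break;
            }
            if (!excluded.contains(term)) {
                chosen.add(term);
            }
        }
        return new ArrayList<>(chosen);
    }
}
```

Letting exclusion win over inclusion is a design choice made for the sketch; the source does not say how conflicts between the two lists are resolved.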

select

public FeatureClusterSet select(FeatureClusterSet set,
                                WeightingComponents wc,
                                int numTrainingDocs,
                                int numFeatures,
                                SearchEngine engine)
Description copied from interface: FeatureSelector
Selects the features from the documents in the training set.

Specified by:
select in interface FeatureSelector
Parameters:
set - the set of feature clusters from the training set.
numTrainingDocs - the number of documents in the training set.
numFeatures - the number of features to select.
Returns:
a sorted set of the selected features

setStopWords

public void setStopWords(StopWords stopWords)
Description copied from interface: FeatureSelector
Sets a stopword list: words that should be ignored when selecting features.

Specified by:
setStopWords in interface FeatureSelector
Parameters:
stopWords - the set of words to ignore when performing feature selection.
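
The effect of a stopword list on feature selection can be sketched as a filter applied before ranking. The names here are hypothetical: StopWordSketch and removeStopWords are not Minion methods, a plain Set of strings stands in for the StopWords class, and the case-insensitive comparison is an assumption about how matching might work.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not part of com.sun.labs.minion.
public class StopWordSketch {

    /**
     * Returns a copy of the weighted terms with stopwords removed.
     * The stopword set is assumed to hold lowercase entries.
     */
    public static Map<String, Double> removeStopWords(Map<String, Double> weights,
                                                      Set<String> stopWords) {
        // LinkedHashMap keeps the surviving terms in their original order.
        Map<String, Double> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Double> entry : weights.entrySet()) {
            // Assumption: matching ignores case, so "The" is treated like "the".
            String normalized = entry.getKey().toLowerCase(Locale.ROOT);
            if (!stopWords.contains(normalized)) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }
}
```

Filtering before ranking means that a stopword with a high weight can never occupy one of the numFeatures slots.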