java.lang.Object
  com.sun.labs.minion.classification.ContingencyFeatureSelector
public class ContingencyFeatureSelector
extends java.lang.Object
implements FeatureSelector
A feature selector that builds contingency features. The weights calculated from the contingency features depend on the type that is given to the constructor for this class.
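The two weight types this class refers to (ContingencyFeature.MUTUAL_INFORMATION and ContingencyFeature.CHI_SQUARED) can be illustrated with a standalone sketch. The class `ContingencySketch`, its method names, and the cell layout (`a`, `b`, `c`, `d`) are hypothetical and not part of Minion. The formulas are the textbook chi-squared statistic and expected mutual information for a 2x2 term/class contingency table, and may differ from what ContingencyFeature actually computes.

```java
/**
 * Hypothetical helper, not part of Minion: textbook weights for a 2x2
 * term/class contingency table with cells
 *   a = in-class documents that contain the term
 *   b = out-of-class documents that contain the term
 *   c = in-class documents that lack the term
 *   d = out-of-class documents that lack the term
 */
final class ContingencySketch {
    private ContingencySketch() {}

    /** Chi-squared statistic: N * (ad - bc)^2 / ((a+b)(c+d)(a+c)(b+d)). */
    public static double chiSquared(long a, long b, long c, long d) {
        double n = a + b + c + d;
        double denom = (double) (a + b) * (c + d) * (a + c) * (b + d);
        if (denom == 0) {
            return 0.0; // an empty row or column carries no information
        }
        double diff = (double) a * d - (double) b * c;
        return n * diff * diff / denom;
    }

    /** Expected mutual information (in nats) summed over the four cells. */
    public static double mutualInformation(long a, long b, long c, long d) {
        double n = a + b + c + d;
        if (n == 0) {
            return 0.0;
        }
        return cell(a, a + b, a + c, n)
             + cell(b, a + b, b + d, n)
             + cell(c, c + d, a + c, n)
             + cell(d, c + d, b + d, n);
    }

    /** One term of the sum: p(x,y) * ln(p(x,y) / (p(x) * p(y))). */
    private static double cell(long joint, long row, long col, double n) {
        if (joint == 0) {
            return 0.0; // the limit of p * ln(p) as p goes to 0 is 0
        }
        return (joint / n) * Math.log(joint * n / ((double) row * col));
    }
}
```

In this sketch, a term that is distributed independently of the class scores 0 under both measures, and a term that perfectly predicts the class scores highest.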
See Also: ContingencyFeature.MUTUAL_INFORMATION, ContingencyFeature.CHI_SQUARED

| Field Summary | | |
|---|---|---|
| protected static java.lang.String | logTag | A tag. |
| protected StopWords | stopWords | Words to ignore during selection. |
| protected int | type | How the weights should be calculated for the contingency table. |

| Constructor Summary | |
|---|---|
| ContingencyFeatureSelector() | |
| ContingencyFeatureSelector(int type) | Makes a feature selector that returns features that use a contingency table to calculate weight. |

| Method Summary | | |
|---|---|---|
| protected void | computeContingency(ContingencyFeatureCluster curr, SearchEngine engine, WeightingComponents wc, int tsize, int N) | |
| protected boolean | discardFeature(ContingencyFeature cf, SearchEngine engine) | Determines whether a given feature should be discarded from the set. |
| FeatureClusterSet | select(FeatureClusterSet training, WeightingComponents wc, int numTrainingDocs, int numFeatures, SearchEngine engine) | Selects the features from the documents that have the highest mutual information with the class represented by the given training set. |
| void | setHumanSelected(HumanSelected hs) | Provides a set of human selected terms that should be included or excluded from consideration during the feature selection process. |
| void | setStopWords(StopWords stopWords) | Sets a stopword list: words that should be ignored when selecting features. |

| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |

| Field Detail |
|---|
protected int type
protected StopWords stopWords
protected static java.lang.String logTag
| Constructor Detail |
|---|
public ContingencyFeatureSelector()
public ContingencyFeatureSelector(int type)
Parameters:
type - the type of weight to calculate
See Also: ContingencyFeature.MUTUAL_INFORMATION, ContingencyFeature.CHI_SQUARED

| Method Detail |
|---|
public void setHumanSelected(HumanSelected hs)
Specified by: setHumanSelected in interface FeatureSelector
Parameters:
hs - a set of human selected terms that should be included or excluded during feature selection.
public FeatureClusterSet select(FeatureClusterSet training,
WeightingComponents wc,
int numTrainingDocs,
int numFeatures,
SearchEngine engine)
Specified by: select in interface FeatureSelector
Parameters:
training - the set of features in the training set.
wc - a set of weighting components to use when weighting terms.
numTrainingDocs - the number of training documents.
numFeatures - the number of features to select.
engine - the search engine the features are from.
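The role of the numFeatures parameter can be sketched as a top-k selection over weighted features. This is only an illustration under stated assumptions: `TopFeaturesSketch` and `selectTop` are hypothetical names, features are reduced to plain strings with precomputed weights, and nothing here reflects how Minion's select is actually implemented.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical helper, not part of Minion: keep the k highest-weighted features. */
final class TopFeaturesSketch {
    private TopFeaturesSketch() {}

    /**
     * Returns up to numFeatures feature names, highest weight first.
     * The order of features with equal weights is unspecified.
     */
    public static List<String> selectTop(Map<String, Double> weights, int numFeatures) {
        List<String> result = new ArrayList<>();
        if (numFeatures <= 0) {
            return result;
        }
        // Min-heap on weight: the root is always the weakest feature kept so far.
        PriorityQueue<Map.Entry<String, Double>> heap = new PriorityQueue<>(
                Comparator.comparingDouble((Map.Entry<String, Double> e) -> e.getValue()));
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            heap.offer(e);
            if (heap.size() > numFeatures) {
                heap.poll(); // evict the current weakest feature
            }
        }
        // The heap drains weakest-first, so reverse to get highest weight first.
        while (!heap.isEmpty()) {
            result.add(heap.poll().getKey());
        }
        Collections.reverse(result);
        return result;
    }
}
```

A size-bounded heap keeps at most numFeatures entries in memory at once, which matters when the training set contributes many more candidate features than are finally selected.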
protected void computeContingency(ContingencyFeatureCluster curr,
SearchEngine engine,
WeightingComponents wc,
int tsize,
int N)
protected boolean discardFeature(ContingencyFeature cf,
SearchEngine engine)
Parameters:
cf - the feature we want to test
engine - the engine we're using to do the test
Returns:
true if the feature should be discarded, false if it should be kept
public void setStopWords(StopWords stopWords)
Specified by: setStopWords in interface FeatureSelector
Parameters:
stopWords - the set of words to ignore when performing feature selection.
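How a stop word list and human selected terms might combine into a single discard decision can be sketched as follows. Everything in this block is an assumption made for illustration: `DiscardSketch` and `shouldDiscard` are hypothetical, terms are plain strings rather than ContingencyFeature objects, and the precedence shown (explicit exclusion, then explicit inclusion, then stop words) is one plausible ordering, not documented Minion behavior.

```java
import java.util.Set;

/** Hypothetical helper, not part of Minion: decide whether to drop a candidate term. */
final class DiscardSketch {
    private DiscardSketch() {}

    /**
     * Returns true if the term should be discarded, false if it should be kept.
     * Assumed precedence: human exclusions win, then human inclusions,
     * then the stop word list.
     */
    public static boolean shouldDiscard(String term,
                                        Set<String> stopWords,
                                        Set<String> humanIncluded,
                                        Set<String> humanExcluded) {
        if (humanExcluded.contains(term)) {
            return true;  // explicitly excluded by a person
        }
        if (humanIncluded.contains(term)) {
            return false; // explicitly included, even if it is a stop word
        }
        return stopWords.contains(term);
    }
}
```

Under these assumptions, a term such as "the" is dropped when it appears only in the stop word list, but kept when it also appears in the human-included set.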