java.lang.Object
  com.sun.labs.minion.classification.ContingencyFeatureSelector
All Implemented Interfaces: FeatureSelector
public class ContingencyFeatureSelector
extends java.lang.Object
implements FeatureSelector
A feature selector that builds contingency features. The weights computed from the contingency features depend on the type passed to this class's constructor.
See Also: ContingencyFeature.MUTUAL_INFORMATION, ContingencyFeature.CHI_SQUARED
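
To make the two weight types concrete, here is a minimal sketch of the textbook chi-squared and mutual-information formulas for a 2x2 term/class contingency table. It is an illustration only: the class `ContingencyWeightsSketch`, its method names, and the integer values of its two constants are hypothetical stand-ins, and this page does not confirm that Minion computes its weights this way.

```java
/** Textbook 2x2 contingency weights; an illustrative sketch, not Minion's code. */
final class ContingencyWeightsSketch {
    // Hypothetical stand-ins for ContingencyFeature.MUTUAL_INFORMATION and
    // ContingencyFeature.CHI_SQUARED; the real constant values are not documented here.
    static final int MUTUAL_INFORMATION = 1;
    static final int CHI_SQUARED = 2;

    /**
     * a = documents in the class that contain the term
     * b = documents outside the class that contain the term
     * c = documents in the class that lack the term
     * d = documents outside the class that lack the term
     */
    static double weight(int type, long a, long b, long c, long d) {
        switch (type) {
            case MUTUAL_INFORMATION: return mutualInformation(a, b, c, d);
            case CHI_SQUARED:        return chiSquared(a, b, c, d);
            default: throw new IllegalArgumentException("Unknown weight type: " + type);
        }
    }

    /** Chi-squared statistic: N (ad - bc)^2 / ((a+b)(c+d)(a+c)(b+d)). */
    static double chiSquared(long a, long b, long c, long d) {
        double n = a + b + c + d;
        double denom = (double) (a + b) * (c + d) * (a + c) * (b + d);
        if (denom == 0) {
            return 0; // an empty row or column carries no evidence of association
        }
        double diff = (double) a * d - (double) b * c;
        return n * diff * diff / denom;
    }

    /** Mutual information in bits, summed over the four cells of the table. */
    static double mutualInformation(long a, long b, long c, long d) {
        double n = a + b + c + d;
        if (n == 0) {
            return 0;
        }
        return cell(a, a + b, a + c, n)
             + cell(b, a + b, b + d, n)
             + cell(c, c + d, a + c, n)
             + cell(d, c + d, b + d, n);
    }

    /** One cell's contribution: (count/N) * log2(N * count / (row * col)); 0 log 0 is taken as 0. */
    private static double cell(double count, double row, double col, double n) {
        if (count == 0) {
            return 0;
        }
        return (count / n) * Math.log((n * count) / (row * col)) / Math.log(2);
    }
}
```

Under these formulas a term that appears in exactly the documents of the class scores highest, while a term distributed independently of the class scores zero under both weight types.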
Field Summary | |
---|---|
protected static java.lang.String | logTag - A tag. |
protected StopWords | stopWords - Words to ignore during selection. |
protected int | type - How the weights should be calculated for the contingency table. |
Constructor Summary | |
---|---|
ContingencyFeatureSelector() | |
ContingencyFeatureSelector(int type) | Makes a feature selector that returns features that use a contingency table to calculate weight. |
Method Summary | |
---|---|
protected void | computeContingency(ContingencyFeatureCluster curr, SearchEngine engine, WeightingComponents wc, int tsize, int N) |
protected boolean | discardFeature(ContingencyFeature cf, SearchEngine engine) - Determines whether a given feature should be discarded from the set. |
FeatureClusterSet | select(FeatureClusterSet training, WeightingComponents wc, int numTrainingDocs, int numFeatures, SearchEngine engine) - Selects the features from the documents that have the highest mutual information with the class represented by the given training set. |
void | setHumanSelected(HumanSelected hs) - Provides a set of human selected terms that should be included or excluded from consideration during the feature selection process. |
void | setStopWords(StopWords stopWords) - Sets a stopword list: words that should be ignored when selecting features. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
protected int type
protected StopWords stopWords
protected static java.lang.String logTag
Constructor Detail |
---|
public ContingencyFeatureSelector()
public ContingencyFeatureSelector(int type)
Parameters:
type - the type of weight to calculate.
See Also: ContingencyFeature.MUTUAL_INFORMATION, ContingencyFeature.CHI_SQUARED
Method Detail |
---|
public void setHumanSelected(HumanSelected hs)
Specified by: setHumanSelected in interface FeatureSelector
Parameters:
hs - a set of human selected terms that should be included or excluded during feature selection.
public FeatureClusterSet select(FeatureClusterSet training, WeightingComponents wc, int numTrainingDocs, int numFeatures, SearchEngine engine)
Specified by: select in interface FeatureSelector
Parameters:
training - the set of features in the training set.
wc - a set of weighting components to use when weighting terms.
numTrainingDocs - the number of training documents.
numFeatures - the number of features to select.
engine - the search engine the features are from.
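
The idea of keeping only the numFeatures highest-weighted features can be sketched with a bounded min-heap. This is an assumption-laden illustration, not the body of select: the class `TopFeaturesSketch`, the method `selectTop`, and the use of a plain term-to-weight map in place of FeatureClusterSet are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Keeps the k highest-weighted terms; a sketch that does not use Minion's types. */
final class TopFeaturesSketch {
    /**
     * Returns up to numFeatures terms ordered from highest to lowest weight.
     * The map from term to weight is a hypothetical stand-in for a FeatureClusterSet.
     */
    static List<String> selectTop(Map<String, Double> weights, int numFeatures) {
        List<String> selected = new ArrayList<>();
        if (numFeatures <= 0) {
            return selected;
        }
        // Min-heap on weight: the weakest retained candidate sits at the head.
        PriorityQueue<Map.Entry<String, Double>> heap =
                new PriorityQueue<>((x, y) -> Double.compare(x.getValue(), y.getValue()));
        for (Map.Entry<String, Double> entry : weights.entrySet()) {
            heap.offer(entry);
            if (heap.size() > numFeatures) {
                heap.poll(); // evict the current lowest weight
            }
        }
        // Draining the heap yields ascending weights, so reverse at the end.
        while (!heap.isEmpty()) {
            selected.add(heap.poll().getKey());
        }
        Collections.reverse(selected);
        return selected;
    }
}
```

The heap never holds more than numFeatures + 1 entries, so memory stays proportional to the number of features requested rather than the size of the training vocabulary.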
protected void computeContingency(ContingencyFeatureCluster curr, SearchEngine engine, WeightingComponents wc, int tsize, int N)
protected boolean discardFeature(ContingencyFeature cf, SearchEngine engine)
Parameters:
cf - the feature we want to test.
engine - the engine we're using to do the test.
Returns: true if the feature should be discarded, false if it should be kept.
public void setStopWords(StopWords stopWords)
Description copied from interface: FeatureSelector
Sets a stopword list: words that should be ignored when selecting features.
Specified by: setStopWords in interface FeatureSelector
Parameters:
stopWords - the set of words to ignore when performing feature selection.
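
One plausible way a discard test could combine a stopword list with human-selected terms is sketched below. This page does not document the actual criteria used by discardFeature, so everything here is an assumption: the class `DiscardSketch`, the use of plain string sets in place of StopWords and HumanSelected, and the precedence order (human inclusion first, then human exclusion, then stopwords).

```java
import java.util.HashSet;
import java.util.Set;

/** A hypothetical discard rule; not the documented behavior of discardFeature. */
final class DiscardSketch {
    private final Set<String> stopWords;
    private final Set<String> humanIncluded;
    private final Set<String> humanExcluded;

    DiscardSketch(Set<String> stopWords, Set<String> humanIncluded, Set<String> humanExcluded) {
        // Defensive copies so later changes by the caller do not alter the rule.
        this.stopWords = new HashSet<>(stopWords);
        this.humanIncluded = new HashSet<>(humanIncluded);
        this.humanExcluded = new HashSet<>(humanExcluded);
    }

    /** Returns true if the term should be discarded, false if it should be kept. */
    boolean discard(String term) {
        if (humanIncluded.contains(term)) {
            return false; // assumed: an explicit human inclusion overrides everything else
        }
        if (humanExcluded.contains(term)) {
            return true;  // assumed: an explicit human exclusion always discards
        }
        return stopWords.contains(term); // otherwise only stopwords are dropped
    }
}
```

Matching here is exact and case-sensitive; a real selector might normalize case or consult the search engine before deciding.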