Interface Summary | |
---|---|
BulkClassifier | An interface for classifiers that can do bulk classification. |
ClassifierModel | An interface for training and using classifiers. |
ExplainableClassifierModel | An interface for classifier models that will allow explanations to be generated indicating why (or why not) particular documents were (or were not) classified into a given class. |
Feature | An interface for the features defined by classifiers. |
FeatureCluster | A cluster of features. |
FeatureClusterer | The Feature Clusterer provides the interface to create clusters of features. |
FeatureSelector | Selects terms from a given document or set of documents, relative to the collection the terms are part of. |
Profiler | An interface for profilers that will run after dump time for a new partition. |
ResultSplitter | Result Splitters split a result set into two distinct sets suitable for use in training and validation. |
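
The splitting idea described for ResultSplitter can be illustrated with a minimal sketch. This is not Minion's API: the class name `TwoThirdsSplitSketch`, the `split` method, and the `seed` parameter are all hypothetical, and the sketch assumes a result set can be treated as a plain `List`. It shuffles a copy of the results and cuts it at the two-thirds mark, yielding disjoint training and validation sets.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical illustration only; not the Minion ResultSplitter interface. */
final class TwoThirdsSplitSketch {

    /**
     * Returns a two-element list: [training, validation].
     * About two thirds of the results go to training, the rest to validation.
     */
    static <T> List<List<T>> split(List<T> results, long seed) {
        // Shuffle a copy so the caller's list is left untouched.
        List<T> shuffled = new ArrayList<>(results);
        Collections.shuffle(shuffled, new Random(seed));

        // Integer division: for 9 results the cut falls at index 6.
        int cut = (2 * shuffled.size()) / 3;

        List<List<T>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(shuffled.subList(0, cut)));
        parts.add(new ArrayList<>(shuffled.subList(cut, shuffled.size())));
        return parts;
    }
}
```

Passing a fixed seed makes the split repeatable, which is convenient when comparing classifiers trained on the same data.
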

Class Summary | |
---|---|
BalancedWinnow | An implementation of the Balanced Winnow classification algorithm. |
BigQuery | A helper class for running a big query during classification operations. |
ClassificationFeature | A class that holds a feature useful when classifying documents. |
ClassificationResult | The result of a classification operation for a particular classifier. |
ClassifierDiskPartition | A disk partition that will hold classifier data. |
ClassifierManager | The ClassifierManager is a specialization of the PartitionManager. |
ClassifierMemoryPartition | A memory partition that will hold classifier data. |
ClassifierPartitionFactory | A factory for the partitions used by classifiers. |
ClassifierScore | |
ClusterDiskPartition | A disk partition that will hold classifier data. |
ClusterEntry | An entry for the doc dictionary in the cluster partition. |
ClusterManager | The ClusterManager is a specialization of the PartitionManager. |
ClusterMemoryPartition | A memory partition that will hold classifier data. |
ClusterPartitionFactory | |
ClusterPostings | Postings for the cluster documents in the cluster partition. |
ClusterWeightComparator | A comparator for weighted features that compares features based on their weight. |
ContingencyFeature | A weighted feature class that contains a 2x2 contingency table that can be used to calculate the Mutual Information or Chi-squared measures. |
ContingencyFeatureCluster | A cluster of contingency features. |
ContingencyFeatureClusterer | This class provides an implementation of a feature clusterer that clusters contingency features. |
ContingencyFeatureSelector | A feature selector that builds contingency features. |
CSFeatureSelector | Chi-Squared Feature Selector that is implemented by using the ContingencyFeatureSelector. |
ExtraClassification | A configurable container that can be used to describe a set of classification operations to perform that would not be done by the standard classification approach. |
FastContingencyFeatureSelector | |
FeatureClusterSet | A set of feature clusters. |
FeatureEntry | |
FeaturePostings | An implementation of Postings that we can use to store classifier features. |
HumanSelected | A container for human selected terms that specifies terms that must or must not occur in particular classifiers. |
KeyWordProfiler | A profiler class that puts documents into classes based on the presence of particular keywords. |
KFoldSplitter | Provides a K-fold splitter. |
KnowledgeSourceClusterer | Provides an implementation of a feature clusterer built around a knowledge source. |
LiteMorphClusterer | Provides an implementation of a feature clusterer built around the light morphology engine. |
MIFeatureSelector | Mutual Information Feature Selector that is implemented by using the ContingencyFeatureSelector. |
MorphClusterer | Provides an implementation of a feature clusterer built around the full morphology engine. |
NoSplitsSplitter | Result Splitters split a result set into two distinct sets suitable for use in training and validation. |
QueryZone | A query zone is a set of documents that are centered around a set of feature clusters. |
RandomTwoThirdsSplitter | Provides two thirds/one third splits of a result set by selecting documents at random to place in either set. |
Rocchio | A classifier model that does Rocchio-style classification. |
SimpleClusterer | |
SimpleFeatureCluster | A feature cluster containing a single term and a weight assigned by a standard term weighting function. |
SimpleFeatureSelector | A class that selects the top n features from a set of documents based on the weights assigned by a term weighting function. |
StemmingClusterer | Provides a clusterer that groups features that have the same stems. |
StrFloat | A string and a float! |
WeightedFeature | |
WeightedFeatureCluster | |
WeightedFeatureClusterer | |
WeightedFeatureSelector | Selects the highest weighted features. |
WeightedFeatureVector | A class for holding a weighted feature vector. |
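
The ContingencyFeature entry above mentions a 2x2 contingency table used to calculate the Mutual Information or Chi-squared measures. The sketch below shows the textbook formulas for both, as an illustration rather than Minion's implementation. The class name `ContingencySketch` and the cell layout are assumptions: `a` counts documents in the class that contain the term, `b` documents outside the class that contain it, `c` documents in the class without it, and `d` documents outside the class without it.

```java
/** Hypothetical illustration only; not the Minion ContingencyFeature class. */
final class ContingencySketch {
    final double a; // in class, term present
    final double b; // not in class, term present
    final double c; // in class, term absent
    final double d; // not in class, term absent

    ContingencySketch(double a, double b, double c, double d) {
        this.a = a;
        this.b = b;
        this.c = c;
        this.d = d;
    }

    /** Chi-squared statistic: N (ad - bc)^2 / ((a+b)(c+d)(a+c)(b+d)). */
    double chiSquared() {
        double n = a + b + c + d;
        double denom = (a + b) * (c + d) * (a + c) * (b + d);
        if (denom == 0) {
            return 0; // a degenerate table carries no evidence of association
        }
        double diff = a * d - b * c;
        return n * diff * diff / denom;
    }

    /** Mutual information in bits between term presence and class membership. */
    double mutualInformation() {
        double n = a + b + c + d;
        return cell(a, a + b, a + c, n)
             + cell(b, a + b, b + d, n)
             + cell(c, c + d, a + c, n)
             + cell(d, c + d, b + d, n);
    }

    /** One cell's contribution: p(x,y) * log2(p(x,y) / (p(x) p(y))). */
    private static double cell(double count, double row, double col, double n) {
        if (count == 0) {
            return 0; // the limit of p log p as p goes to 0 is 0
        }
        return (count / n) * (Math.log(n * count / (row * col)) / Math.log(2));
    }
}
```

A term that appears in every document of the class and in no other document scores highest on both measures, while a term distributed independently of the class scores zero on both.
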

Provides the automatic document classification functionality in Minion.
This package implements the classification infrastructure in Minion, along
with the classifiers built on it. Two classifiers are currently provided:
Rocchio and BalancedWinnow. Training classifiers is broken down into several
steps.

FeatureClusterers, each implementing a different strategy for performing
clustering. Only one should be used at a time for a particular index.

ContingencyFeatureClusterer, which provides single-feature clusters.

Once classifiers are trained, they are automatically evaluated across sets of
documents as future documents are indexed to disk. Classifiers cannot be run
against documents that have already been indexed. If classifiers are added or
changed, the documents to be classified should be re-indexed.
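
As a rough illustration of training a classifier and then evaluating it against a new document, here is a minimal Rocchio-style sketch. It uses the textbook formulation (a prototype vector built from the centroid of positive examples minus a weighted centroid of negative examples, scored by dot product) and is not taken from Minion's Rocchio class. The class name `RocchioSketch`, the method names, the `beta` and `gamma` parameters, and the representation of documents as feature-to-weight maps are all assumptions.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not the Minion Rocchio class. */
final class RocchioSketch {

    /** Averages the feature weights of a set of documents. */
    static Map<String, Double> centroid(List<Map<String, Double>> docs) {
        Map<String, Double> mean = new HashMap<>();
        for (Map<String, Double> doc : docs) {
            for (Map.Entry<String, Double> e : doc.entrySet()) {
                mean.merge(e.getKey(), e.getValue() / docs.size(), Double::sum);
            }
        }
        return mean;
    }

    /** Builds the prototype: beta * centroid(positive) - gamma * centroid(negative). */
    static Map<String, Double> train(List<Map<String, Double>> positive,
                                     List<Map<String, Double>> negative,
                                     double beta, double gamma) {
        Map<String, Double> prototype = new HashMap<>();
        centroid(positive).forEach((f, w) -> prototype.merge(f, beta * w, Double::sum));
        centroid(negative).forEach((f, w) -> prototype.merge(f, -gamma * w, Double::sum));
        return prototype;
    }

    /** Dot product of the prototype with a document's feature vector. */
    static double score(Map<String, Double> prototype, Map<String, Double> doc) {
        double sum = 0;
        for (Map.Entry<String, Double> e : doc.entrySet()) {
            sum += e.getValue() * prototype.getOrDefault(e.getKey(), 0.0);
        }
        return sum;
    }
}
```

In a sketch like this, a document would be assigned to the class when its score exceeds some threshold; how the threshold is chosen is left open here.
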
Following the "Everything Is Dictionaries And
Postings" mantra, the classification package defines two new partition types
that are used for storing classifiers and feature clusters. The infrastructure
for these classes is included in this package. The ClassifierManager
handles the partitions used for
storing trained classifiers, and the ClusterManager
handles the partitions used for storing
generated feature clusters.