com.sun.labs.minion
Interface Classifier

All Known Subinterfaces:
SearchEngine
All Known Implementing Classes:
SearchEngineImpl

public interface Classifier

This interface represents the classification features of the search engine. SearchEngine extends this interface, in addition to Searcher.


Method Summary
 void classify(java.lang.String[] docKeys, java.lang.String[] classNames)
          Creates a manual assignment of a set of documents to a set of classes.
 java.lang.String[] getClasses()
          Returns the names of the classes for which classifiers are defined.
 ResultSet getTrainingDocuments(java.lang.String className)
          Returns the set of documents that was used to train the classifier for the class with the provided class name.
 void reclassifyIndex(java.lang.String className)
          Causes the engine to reclassify all documents against the classifier for the given class name.
 void trainClass(ResultSet results, java.lang.String className, java.lang.String fieldName)
          Generates a classifier based on the documents in the provided result set.
 void trainClass(ResultSet results, java.lang.String className, java.lang.String fieldName, Progress p)
          Generates a classifier based on the documents in the provided result set.
 void trainClass(ResultSet results, java.lang.String className, java.lang.String fieldName, java.lang.String fromField)
          Generates a classifier based on the documents in the provided result set.
 void trainClass(ResultSet results, java.lang.String className, java.lang.String fieldName, java.lang.String fromField, Progress progress)
          Generates a classifier based on the documents in the provided result set.
 

Method Detail

trainClass

void trainClass(ResultSet results,
                java.lang.String className,
                java.lang.String fieldName)
                throws SearchEngineException
Generates a classifier based on the documents in the provided result set. If the name provided is an existing class, then the existing classifier will be replaced. This method does not affect any documents that have already been indexed.

Parameters:
results - the set of documents to use for training the classifier
className - the name of the class to create or replace
fieldName - the name of the field where the results of the classifier should be stored.
Throws:
SearchEngineException - If there is any error training the classifier
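
The replacement behaviour described above can be illustrated with a minimal in-memory sketch. It is not the Minion implementation: TrainClassSketch, TrainedClass and trainingDocKeys are hypothetical names, a list of document keys stands in for ResultSet, and no real training takes place.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TrainClassSketch {

    /** Hypothetical stand-in for a trained classifier: it only remembers its inputs. */
    public static final class TrainedClass {
        final List<String> trainingDocKeys;
        final String fieldName;

        TrainedClass(List<String> trainingDocKeys, String fieldName) {
            this.trainingDocKeys = trainingDocKeys;
            this.fieldName = fieldName;
        }
    }

    // Class name -> classifier. Assumption: one classifier per class name.
    private final Map<String, TrainedClass> classifiers = new LinkedHashMap<>();

    /** Mirrors the shape of trainClass(results, className, fieldName); doc keys stand in for a ResultSet. */
    public void trainClass(List<String> trainingDocKeys, String className, String fieldName) {
        List<String> copy = Collections.unmodifiableList(new ArrayList<>(trainingDocKeys));
        // put() overwrites any earlier entry, modelling "the existing classifier will be replaced".
        classifiers.put(className, new TrainedClass(copy, fieldName));
    }

    /** Hypothetical accessor: the keys used to train the named class, or an empty list. */
    public List<String> trainingDocKeys(String className) {
        TrainedClass c = classifiers.get(className);
        return c == null ? Collections.<String>emptyList() : c.trainingDocKeys;
    }

    /** Names of the classes that currently have a classifier. */
    public String[] getClasses() {
        return classifiers.keySet().toArray(new String[0]);
    }

    public static void main(String[] args) {
        TrainClassSketch engine = new TrainClassSketch();
        engine.trainClass(Arrays.asList("doc1", "doc2"), "sports", "category");
        // Training "sports" again replaces the first classifier instead of adding a second one.
        engine.trainClass(Arrays.asList("doc3"), "sports", "category");
        System.out.println(Arrays.toString(engine.getClasses()));
        System.out.println(engine.trainingDocKeys("sports"));
    }
}
```

In the sketch, retraining a class touches only the classifier map, which matches the statement that already-indexed documents are not affected.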

trainClass

void trainClass(ResultSet results,
                java.lang.String className,
                java.lang.String fieldName,
                java.lang.String fromField)
                throws SearchEngineException
Generates a classifier based on the documents in the provided result set. If the name provided is an existing class, then the existing classifier will be replaced. This method does not affect any documents that have already been indexed.

Parameters:
results - the set of documents to use for training the classifier
className - the name of the class to create or replace
fieldName - the name of the field where the results of the classifier should be stored.
fromField - the vectored field from which the classifier should be built. If this parameter is null, then data from all indexed fields will be used. If this parameter is the empty string, then data from the "body" field will be used.
Throws:
SearchEngineException - If there is any error training the classifier
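
The three cases for fromField (null, empty string, explicit name) can be sketched as a small resolution helper. This is only an illustration of the documented rule: FromFieldSketch, resolveSourceField and the ALL_INDEXED_FIELDS marker are hypothetical and are not part of the Minion API.

```java
public class FromFieldSketch {

    /** Hypothetical marker meaning "use data from every indexed field". */
    public static final String ALL_INDEXED_FIELDS = "*";

    /** Applies the documented fromField rule and returns the field to read from. */
    public static String resolveSourceField(String fromField) {
        if (fromField == null) {
            return ALL_INDEXED_FIELDS; // null: all indexed fields
        }
        if (fromField.isEmpty()) {
            return "body";             // empty string: the "body" field
        }
        return fromField;              // otherwise: the named vectored field
    }

    public static void main(String[] args) {
        System.out.println(resolveSourceField(null));
        System.out.println(resolveSourceField(""));
        System.out.println(resolveSourceField("title"));
    }
}
```

Note that null and the empty string are deliberately different here, so callers should not normalise one into the other before calling trainClass.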

trainClass

void trainClass(ResultSet results,
                java.lang.String className,
                java.lang.String fieldName,
                Progress p)
                throws SearchEngineException
Generates a classifier based on the documents in the provided result set. If the name provided is an existing class, then the existing classifier will be replaced. This method does not affect any documents that have already been indexed.

Parameters:
results - the set of documents to use for training the classifier
className - the name of the class to create or replace
fieldName - the name of the field where the results of the classifier should be stored.
p - a progress monitor that will be notified as training proceeds
Throws:
SearchEngineException - If there is any error training the classifier

trainClass

void trainClass(ResultSet results,
                java.lang.String className,
                java.lang.String fieldName,
                java.lang.String fromField,
                Progress progress)
                throws SearchEngineException
Generates a classifier based on the documents in the provided result set. If the name provided is an existing class, then the existing classifier will be replaced. This method does not affect any documents that have already been indexed.

Parameters:
results - the set of documents to use for training the classifier
className - the name of the class to create or replace
fieldName - the name of the field where the results of the classifier should be stored.
fromField - the vectored field from which the classifier should be built. If this parameter is null, then data from all indexed fields will be used. If this parameter is the empty string, then data from the "body" field will be used.
progress - where to send progress events
Throws:
SearchEngineException - If there is any error training the classifier

reclassifyIndex

void reclassifyIndex(java.lang.String className)
                     throws SearchEngineException
Causes the engine to reclassify all documents against the classifier for the given class name. When classification completes, there is a short pause while the engine switches from the old set of classes to the new set; the exact characteristics of the switch depend on the implementation. This method is needed only when documents have already been indexed and the set of classifiers has changed. Because reclassifying is likely to be lengthy, none of the other methods triggers it implicitly. (Side note: Should this be a blocking call? If not, should there be a simple event/callback mechanism to notify a user of progress?)

Parameters:
className - the class to reclassify all documents against
Throws:
SearchEngineException - If there is any error reclassifying the documents
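
Whether this call blocks is left open in the side note above. Assuming it does block, a caller could move it off the calling thread as in the following sketch. Reclassifier is a hypothetical one-method stand-in for this interface (declared with throws Exception because SearchEngineException is not available here), and reclassifyAsync is an illustrative helper, not an engine method.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ReclassifySketch {

    /** Hypothetical stand-in exposing only the reclassifyIndex method. */
    public interface Reclassifier {
        void reclassifyIndex(String className) throws Exception;
    }

    /** Runs the (assumed blocking) call on the given pool; the future completes with the class name. */
    public static CompletableFuture<String> reclassifyAsync(
            Reclassifier engine, String className, ExecutorService pool) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                engine.reclassifyIndex(className);
                return className;
            } catch (Exception e) {
                // Checked exceptions cannot cross the Supplier boundary, so wrap them.
                throw new CompletionException(e);
            }
        }, pool);
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // A fake engine that just sleeps briefly to simulate a long reclassification.
            Reclassifier fake = name -> Thread.sleep(50);
            String done = reclassifyAsync(fake, "sports", pool).join();
            System.out.println("reclassified against: " + done);
        } finally {
            pool.shutdown();
        }
    }
}
```

A single-threaded pool is used so that two reclassification requests never run concurrently; whether the real engine tolerates concurrent calls is not stated in this document.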

classify

void classify(java.lang.String[] docKeys,
              java.lang.String[] classNames)
              throws SearchEngineException
Creates a manual assignment of a set of documents to a set of classes. All of the documents will be assigned to all of the classes. Manual assignments are stored independently of the automatic assignment the engine performs while indexing. The documents will also automatically be indexed and classified.

Parameters:
docKeys - the keys of the documents to classify
classNames - the classes to assign the documents to
Throws:
SearchEngineException - If there is any error running the classifiers
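
The "all documents to all classes" rule amounts to a cross product of the two arrays. The sketch below models only that bookkeeping; ManualAssignmentSketch and manualClassesFor are hypothetical names, and the indexing and automatic classification that the real method also performs are omitted.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class ManualAssignmentSketch {

    // Document key -> manually assigned class names, kept apart from any automatic assignments.
    private final Map<String, Set<String>> manual = new TreeMap<>();

    /** Assigns every document in docKeys to every class in classNames. */
    public void classify(String[] docKeys, String[] classNames) {
        for (String key : docKeys) {
            Set<String> assigned = manual.computeIfAbsent(key, k -> new TreeSet<>());
            for (String className : classNames) {
                assigned.add(className);
            }
        }
    }

    /** Hypothetical accessor: the manual classes of one document, or an empty set. */
    public Set<String> manualClassesFor(String docKey) {
        return manual.getOrDefault(docKey, Collections.<String>emptySet());
    }

    public static void main(String[] args) {
        ManualAssignmentSketch engine = new ManualAssignmentSketch();
        engine.classify(new String[] {"d1", "d2"}, new String[] {"sports", "news"});
        System.out.println(engine.manualClassesFor("d1"));
        System.out.println(engine.manualClassesFor("d2"));
    }
}
```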

getTrainingDocuments

ResultSet getTrainingDocuments(java.lang.String className)
                               throws SearchEngineException
Returns the set of documents that was used to train the classifier for the class with the provided class name. (Note: Depending on how the Document interface shapes up, maybe this method should return Document[] instead of ResultSet?)

Parameters:
className - the name of a class
Returns:
the set of documents that defines the named class
Throws:
SearchEngineException - If there is any error retrieving the training documents

getClasses

java.lang.String[] getClasses()
Returns the names of the classes for which classifiers are defined. If no classes are defined, an empty array is returned.

Returns:
an array of class names
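
Because an empty array is returned when no classes are defined, callers can test the length without a null check. A minimal sketch, in which ClassLister is a hypothetical one-method stand-in for this interface and summarize is an illustrative helper:

```java
public class GetClassesSketch {

    /** Hypothetical stand-in exposing only the getClasses method. */
    public interface ClassLister {
        String[] getClasses();
    }

    /** Builds a short description of the defined classes. */
    public static String summarize(ClassLister engine) {
        String[] names = engine.getClasses();
        // Relies on the documented contract: an empty array, not null, when nothing is defined.
        if (names.length == 0) {
            return "no classifiers defined";
        }
        return names.length + " classifier(s): " + String.join(", ", names);
    }

    public static void main(String[] args) {
        System.out.println(summarize(() -> new String[0]));
        System.out.println(summarize(() -> new String[] {"sports", "politics"}));
    }
}
```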