com.sun.labs.minion
Class IndexConfig

java.lang.Object
  extended by com.sun.labs.minion.IndexConfig
All Implemented Interfaces:
com.sun.labs.util.props.Component, com.sun.labs.util.props.Configurable, java.lang.Cloneable

public class IndexConfig
extends java.lang.Object
implements java.lang.Cloneable, com.sun.labs.util.props.Configurable

A class that holds configuration data for indexing documents.


Field Summary
protected  java.lang.String configName
          The name used by the configuration manager.
protected  FieldInfo defaultField
          The exemplar field info to use when encountering an unknown field name during indexing.
protected  boolean enableFeatureBackoff
          Turns on the ability to back off the number of features to try to make a better classifier.
protected  java.util.Map<java.lang.String,FieldInfo> fieldInfo
          A map from field names to the field information for those names.
protected  java.lang.String indexDir
          The directory that holds the index.
protected  java.lang.String indexName
          The symbolic name of the collection.
protected  int kFoldSplitterNumFolds
          The number of folds that should be made when the k-fold splitter is being used.
protected  Lexicon lexicon
          The lexicon.
protected  java.lang.String lexiconFile
          The file that contains the lexicon.
protected static java.lang.String logTag
          A tag that will be used for log entries.
static java.lang.String PROP_DEFAULT_FIELD_INFO
          A property that names the default field information to use when encountering an unknown field during indexing.
static java.lang.String PROP_ENABLE_FEATURE_BACKOFF
          The property indicating whether we should attempt to do feature backoff during classification.
static java.lang.String PROP_FIELD_INFO
          The property that contains a list of the names of the field information objects that this index should contain.
static java.lang.String PROP_INDEX_DIRECTORY
          The property for the name of the index directory.
static java.lang.String PROP_INDEX_NAME
          The property for the symbolic name of the index that we're using.
static java.lang.String PROP_KFOLD_SPLITTER_NUMFOLDS
          The property for the number of folds to use when doing k-fold cross validation when building classifiers.
static java.lang.String PROP_LEXICON_LOCATION
          The property that names the location of the lexicon.
static java.lang.String PROP_RANDOM_SPLITTER_NUMSPLITS
          The property for the number of random splits to use when doing validation using a random splitter when building classifiers.
static java.lang.String PROP_STORE_CLASSIFIER_SCORES
          The property indicating whether we should store the scores associated with classifiers when a new document is successfully classified.
static java.lang.String PROP_STORE_NON_CLASSIFIED
          The property indicating whether we should store the results of failed classifications.
static java.lang.String PROP_TAXONOMY_ENABLED
          The property indicating whether the taxonomy should be enabled.
protected  int randomSplitterNumSplits
          The number of random splits that should be made when the random splitter is being used.
protected  boolean storeClassifierScores
          Whether to store classifier scores in per-class saved fields.
protected  boolean storeNonClassified
          Whether to store the results of failed classifications.
protected  boolean taxonomyEnabled
          Flag indicating whether to enable the taxonomy, independent of whether a lexicon is specified.
 
Constructor Summary
IndexConfig()
          Default constructor.
IndexConfig(java.lang.String indexDir)
          Creates an index configuration for a given directory, using all of the default values.
 
Method Summary
 FieldInfo getDefaultFieldInfo(java.lang.String name)
          Gets the field information to use for an unknown field.
 boolean getDoFeatureBackoff()
           
 java.util.Map<java.lang.String,FieldInfo> getFieldInfo()
          Gets the map from field names to field information objects.
 java.lang.String getIndexDirectory()
          Gets the index directory.
 java.lang.String getIndexName()
          Gets the name of the index.
 int getKFoldSplitterNumFolds()
           
 Lexicon getLexicon()
           
 java.lang.String getLexiconFile()
           
 java.lang.String getName()
           
 int getRandomSplitterNumSplits()
           
 void newProperties(com.sun.labs.util.props.PropertySheet ps)
          Creates an indexing configuration from a property sheet described in an external XML file.
 void setConfigurationManager(com.sun.labs.util.props.ConfigurationManager cm)
          Sets the configuration manager so that the other mutator methods in this class also change the underlying configuration, making those changes available for saving.
 void setDefaultFieldInfo(FieldInfo fieldInfo)
          Sets the field information to use when encountering unknown fields during indexing.
 boolean storeClassifierScores()
           
 boolean storeNonClassified()
           
 boolean taxonomyEnabled()
          Indicates whether a taxonomy should be used.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

configName

protected java.lang.String configName
The name used by the configuration manager.


PROP_INDEX_DIRECTORY

@ConfigString
public static final java.lang.String PROP_INDEX_DIRECTORY
The property for the name of the index directory. This will typically be set to the value of the global index directory property.

See Also:
Constant Field Values

PROP_INDEX_NAME

@ConfigString(mandatory=false)
public static final java.lang.String PROP_INDEX_NAME
The property for the symbolic name of the index that we're using. This property can be set and used by an application that wishes to display a user-friendly name.

See Also:
Constant Field Values

PROP_RANDOM_SPLITTER_NUMSPLITS

@ConfigInteger(defaultValue=10)
public static final java.lang.String PROP_RANDOM_SPLITTER_NUMSPLITS
The property for the number of random splits to use when doing validation using a random splitter when building classifiers.

See Also:
Constant Field Values
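The page does not say how a random splitter draws its splits. As a rough illustration of what "number of random splits" means, the sketch below builds that many independent test sets by shuffling the document indices and holding out a fixed fraction each time. The class name `RandomSplitSketch`, the `testFraction` parameter, and the seeded `Random` are all hypothetical assumptions, not Minion's implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of random-split validation: each split shuffles the document
// indices and holds out the first testFraction of them as test data.
class RandomSplitSketch {
    static List<List<Integer>> testSets(int numDocs, int numSplits, double testFraction, long seed) {
        if (testFraction <= 0.0 || testFraction >= 1.0) {
            throw new IllegalArgumentException("testFraction must be in (0, 1)");
        }
        Random rnd = new Random(seed);               // seeded so the splits are reproducible
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < numDocs; i++) ids.add(i);
        int testSize = (int) Math.round(numDocs * testFraction);
        List<List<Integer>> result = new ArrayList<>();
        for (int s = 0; s < numSplits; s++) {
            Collections.shuffle(ids, rnd);           // fresh random order for every split
            result.add(new ArrayList<>(ids.subList(0, testSize)));
        }
        return result;                               // the remaining ids of each split are training data
    }
}
```

With the documented default of 10 splits, `testSets(numDocs, 10, 0.2, seed)` would yield ten overlapping 20% test sets.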

PROP_KFOLD_SPLITTER_NUMFOLDS

@ConfigInteger(defaultValue=10)
public static final java.lang.String PROP_KFOLD_SPLITTER_NUMFOLDS
The property for the number of folds to use when doing k-fold cross validation when building classifiers.

See Also:
Constant Field Values
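To make the folds property concrete, here is a minimal sketch of k-fold partitioning. Each document index lands in exactly one fold, and each fold serves once as the test set while the others form the training set. The round-robin assignment and the class name `KFoldSketch` are assumptions for illustration; a real splitter would likely shuffle first.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of k-fold partitioning with round-robin assignment.
class KFoldSketch {
    static List<List<Integer>> folds(int numDocs, int k) {
        if (k < 2 || k > numDocs) throw new IllegalArgumentException("need 2 <= k <= numDocs");
        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) folds.add(new ArrayList<>());
        for (int i = 0; i < numDocs; i++) folds.get(i % k).add(i); // document i goes to fold i mod k
        return folds;
    }

    // Training set for one round: every fold except the held-out test fold.
    static List<Integer> trainingFor(List<List<Integer>> folds, int testFold) {
        List<Integer> train = new ArrayList<>();
        for (int f = 0; f < folds.size(); f++) {
            if (f != testFold) train.addAll(folds.get(f));
        }
        return train;
    }
}
```

With the default of 10 folds and 25 documents, folds 0 to 4 hold three documents each and folds 5 to 9 hold two.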

PROP_ENABLE_FEATURE_BACKOFF

@ConfigBoolean(defaultValue=true)
public static final java.lang.String PROP_ENABLE_FEATURE_BACKOFF
The property indicating whether we should attempt to do feature backoff during classification. If this property is true, then the system will attempt to reduce the number of features that will be used to build the classifiers to see if doing that improves the classification performance on the test data.

See Also:
Constant Field Values
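The description of feature backoff leaves the search strategy open. One plausible reading is sketched below: start with all features, repeatedly halve the count, and keep a smaller count only if it scores strictly better on the test data. The halving schedule, the `minFeatures` floor, and the `evaluate` callback (standing in for "build a classifier and measure it") are hypothetical.

```java
import java.util.function.IntToDoubleFunction;

// Hypothetical sketch of feature backoff by repeated halving.
// evaluate.applyAsDouble(n) stands in for building a classifier with n features
// and scoring it on test data (higher is better).
class FeatureBackoffSketch {
    static int chooseNumFeatures(int maxFeatures, int minFeatures, IntToDoubleFunction evaluate) {
        if (minFeatures < 1 || minFeatures > maxFeatures) {
            throw new IllegalArgumentException("need 1 <= minFeatures <= maxFeatures");
        }
        int bestCount = maxFeatures;
        double bestScore = evaluate.applyAsDouble(maxFeatures);
        for (int n = maxFeatures / 2; n >= minFeatures; n /= 2) {
            double score = evaluate.applyAsDouble(n);
            if (score > bestScore) {   // back off only when performance strictly improves
                bestScore = score;
                bestCount = n;
            }
        }
        return bestCount;
    }
}
```

Requiring `minFeatures >= 1` keeps the halving loop from spinning forever at zero.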

PROP_STORE_CLASSIFIER_SCORES

@ConfigBoolean(defaultValue=false)
public static final java.lang.String PROP_STORE_CLASSIFIER_SCORES
The property indicating whether we should store the scores associated with classifiers when a new document is successfully classified. An application can use the stored classifier scores to show the user how likely it is that a given document belongs to a given class of documents.

See Also:
Constant Field Values

PROP_STORE_NON_CLASSIFIED

@ConfigBoolean(defaultValue=false)
public static final java.lang.String PROP_STORE_NON_CLASSIFIED
The property indicating whether we should store the results of failed classifications.

See Also:
Constant Field Values

PROP_FIELD_INFO

@ConfigComponentList(type=FieldInfo.class)
public static final java.lang.String PROP_FIELD_INFO
The property that contains a list of the names of the field information objects that this index should contain.

See Also:
Constant Field Values

PROP_LEXICON_LOCATION

@ConfigString(mandatory=false)
public static final java.lang.String PROP_LEXICON_LOCATION
The property that names the location of the lexicon.

See Also:
Constant Field Values

PROP_TAXONOMY_ENABLED

@ConfigBoolean(defaultValue=false)
public static final java.lang.String PROP_TAXONOMY_ENABLED
The property indicating whether the taxonomy should be enabled.

See Also:
Constant Field Values

PROP_DEFAULT_FIELD_INFO

@ConfigComponent(type=FieldInfo.class)
public static final java.lang.String PROP_DEFAULT_FIELD_INFO
A property that names the default field information to use when encountering an unknown field during indexing.

See Also:
Constant Field Values

indexDir

protected java.lang.String indexDir
The directory that holds the index.


indexName

protected java.lang.String indexName
The symbolic name of the collection.


fieldInfo

protected java.util.Map<java.lang.String,FieldInfo> fieldInfo
A map from field names to the field information for those names.


defaultField

protected FieldInfo defaultField
The exemplar field info to use when encountering an unknown field name during indexing.


lexicon

protected Lexicon lexicon
The lexicon.


lexiconFile

protected java.lang.String lexiconFile
The file that contains the lexicon.


randomSplitterNumSplits

protected int randomSplitterNumSplits
The number of random splits that should be made when the random splitter is being used.


kFoldSplitterNumFolds

protected int kFoldSplitterNumFolds
The number of folds that should be made when the k-fold splitter is being used.


enableFeatureBackoff

protected boolean enableFeatureBackoff
Turns on the ability to back off the number of features to try to make a better classifier.


storeClassifierScores

protected boolean storeClassifierScores
Whether to store classifier scores in per-class saved fields. Use this only if those fields are defined as saved (and not indexed) in an instance of the index config; it is intended mainly for the demo.
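As an illustration of "per-class saved fields", the sketch below turns a document's per-class scores into a map of saved-field values, writing nothing when the flag is off (the documented default). The field-naming scheme `<class>-score` and the class `ClassifierScoreFieldsSketch` are hypothetical; the page does not say how Minion names these fields.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: map each class score to a saved (not indexed) field value.
class ClassifierScoreFieldsSketch {
    static Map<String, Object> savedScoreFields(Map<String, Double> scoresByClass,
                                                boolean storeClassifierScores) {
        Map<String, Object> fields = new LinkedHashMap<>();
        if (!storeClassifierScores) {
            return fields; // flag off: no score fields are produced
        }
        for (Map.Entry<String, Double> e : scoresByClass.entrySet()) {
            // Assumed naming scheme "<class>-score", for illustration only.
            fields.put(e.getKey() + "-score", e.getValue());
        }
        return fields;
    }
}
```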


storeNonClassified

protected boolean storeNonClassified
Whether to store the results of failed classifications. This can be used to determine why things were not classified.


taxonomyEnabled

protected boolean taxonomyEnabled
Flag indicating whether to enable the taxonomy, independent of whether a lexicon is specified.


logTag

protected static java.lang.String logTag
A tag that will be used for log entries.

Constructor Detail

IndexConfig

public IndexConfig()
Default constructor, used by the configuration manager.


IndexConfig

public IndexConfig(java.lang.String indexDir)
Creates an index configuration for a given directory, using all of the default values.

Parameters:
indexDir - the directory where the index will be
Method Detail

setConfigurationManager

public void setConfigurationManager(com.sun.labs.util.props.ConfigurationManager cm)
Sets the configuration manager so that the other mutator methods in this class also change the underlying configuration, making those changes available for saving.

Parameters:
cm - the configuration manager that created this index config
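The write-through behaviour described here can be pictured with the sketch below. Mutators always update the in-memory value, and also record the change in the manager's saved configuration once a manager is attached. `FakeManager`, its `setProperty` method, the `"default_field_info"` key, and the string-valued setter are all hypothetical stand-ins; they are not the `com.sun.labs.util.props.ConfigurationManager` API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of mutators that write through to a configuration manager.
class ConfigWriteThroughSketch {
    // Stand-in for a configuration manager: component name -> (property -> value).
    static class FakeManager {
        final Map<String, Map<String, String>> saved = new HashMap<>();

        void setProperty(String component, String prop, String value) {
            saved.computeIfAbsent(component, c -> new HashMap<>()).put(prop, value);
        }
    }

    private final String configName;   // component name, like configName in IndexConfig
    private FakeManager cm;            // null until setConfigurationManager is called
    private String defaultFieldName;

    ConfigWriteThroughSketch(String configName) { this.configName = configName; }

    void setConfigurationManager(FakeManager cm) { this.cm = cm; }

    void setDefaultFieldName(String name) {
        defaultFieldName = name;                 // always update the in-memory value
        if (cm != null) {                        // persist only when a manager is attached
            cm.setProperty(configName, "default_field_info", name);
        }
    }

    String getDefaultFieldName() { return defaultFieldName; }
}
```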

getIndexDirectory

public java.lang.String getIndexDirectory()
Gets the index directory.

Returns:
the directory containing the index for this search engine

getIndexName

public java.lang.String getIndexName()
Gets the name of the index.

Returns:
the symbolic name for the index, or the name of the directory containing the index if no symbolic name has been assigned
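The documented fallback (symbolic name if set, otherwise the name of the index directory) can be sketched as follows. Treating an empty string as "not assigned" and the class name `IndexNameSketch` are assumptions.

```java
import java.io.File;

// Hypothetical sketch of the getIndexName() fallback described above.
class IndexNameSketch {
    private final String indexDir;   // directory that holds the index
    private final String indexName;  // optional symbolic name; may be null

    IndexNameSketch(String indexDir, String indexName) {
        this.indexDir = indexDir;
        this.indexName = indexName;
    }

    String getIndexName() {
        if (indexName != null && !indexName.isEmpty()) {
            return indexName;
        }
        // File.getName() returns the last path component, e.g. "news" for "/data/indexes/news".
        return new File(indexDir).getName();
    }
}
```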

getFieldInfo

public java.util.Map<java.lang.String,FieldInfo> getFieldInfo()
Gets the map from field names to field information objects.

Returns:
the map from field names to field information objects

getLexiconFile

public java.lang.String getLexiconFile()

getRandomSplitterNumSplits

public int getRandomSplitterNumSplits()

getKFoldSplitterNumFolds

public int getKFoldSplitterNumFolds()

getDoFeatureBackoff

public boolean getDoFeatureBackoff()

storeClassifierScores

public boolean storeClassifierScores()

storeNonClassified

public boolean storeNonClassified()

newProperties

public void newProperties(com.sun.labs.util.props.PropertySheet ps)
                   throws com.sun.labs.util.props.PropertyException
Creates an indexing configuration from a property sheet described in an external XML file. The individual properties are described in the field details for the PROP_* constants above.

Specified by:
newProperties in interface com.sun.labs.util.props.Configurable
Parameters:
ps - the property sheet containing the properties.
Throws:
com.sun.labs.util.props.PropertyException - if there is any error processing the provided properties
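To show how the annotation defaults on this page fit together, the sketch below loads the numeric and boolean settings from a `java.util.Properties` object, falling back to the documented defaults (10 random splits, 10 folds, feature backoff on, everything else off). It does not use the real `PropertySheet`, and the key strings are placeholders: the actual keys are the values of the PROP_* constants, which this page does not list.

```java
import java.util.Properties;

// Hypothetical sketch: apply the documented @ConfigInteger / @ConfigBoolean defaults.
// Property keys are placeholders, not the real PROP_* constant values.
class ConfigDefaultsSketch {
    int randomSplitterNumSplits;
    int kFoldSplitterNumFolds;
    boolean enableFeatureBackoff;
    boolean storeClassifierScores;
    boolean storeNonClassified;
    boolean taxonomyEnabled;

    void load(Properties p) {
        randomSplitterNumSplits = Integer.parseInt(p.getProperty("random_splitter_num_splits", "10"));
        kFoldSplitterNumFolds = Integer.parseInt(p.getProperty("kfold_splitter_num_folds", "10"));
        enableFeatureBackoff = Boolean.parseBoolean(p.getProperty("enable_feature_backoff", "true"));
        storeClassifierScores = Boolean.parseBoolean(p.getProperty("store_classifier_scores", "false"));
        storeNonClassified = Boolean.parseBoolean(p.getProperty("store_non_classified", "false"));
        taxonomyEnabled = Boolean.parseBoolean(p.getProperty("taxonomy_enabled", "false"));
    }
}
```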

getLexicon

public Lexicon getLexicon()

getName

public java.lang.String getName()

taxonomyEnabled

public boolean taxonomyEnabled()
Indicates whether a taxonomy should be used.

Returns:
true if a taxonomy should be created, false otherwise.

setDefaultFieldInfo

public void setDefaultFieldInfo(FieldInfo fieldInfo)
Sets the field information to use when encountering unknown fields during indexing.

Parameters:
fieldInfo - an exemplar field information object containing the attributes and type to use when encountering unknown fields during indexing.

getDefaultFieldInfo

public FieldInfo getDefaultFieldInfo(java.lang.String name)
Gets the field information to use for an unknown field.

Returns:
the field information for the unknown field.
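The exemplar idea behind setDefaultFieldInfo and getDefaultFieldInfo can be sketched as below: known fields are looked up in the field map, and an unknown name gets a copy of the exemplar's attributes under its own name. `SketchFieldInfo`, its `Attribute` enum, returning null when no exemplar is set, and caching the new entry in the map are all assumptions for illustration; they are not Minion's `FieldInfo` API.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for FieldInfo: a field name plus a set of attributes.
class SketchFieldInfo {
    enum Attribute { INDEXED, TOKENIZED, SAVED }

    final String name;
    final EnumSet<Attribute> attributes;

    SketchFieldInfo(String name, EnumSet<Attribute> attributes) {
        this.name = name;
        this.attributes = EnumSet.copyOf(attributes); // defensive copy
    }
}

// Hypothetical sketch of resolving field names with an exemplar for unknown fields.
class DefaultFieldLookup {
    private final Map<String, SketchFieldInfo> fieldInfo = new HashMap<>();
    private SketchFieldInfo defaultField; // exemplar; null means no default (assumption)

    void addField(SketchFieldInfo fi) { fieldInfo.put(fi.name, fi); }

    void setDefaultFieldInfo(SketchFieldInfo exemplar) { defaultField = exemplar; }

    // Copy of the exemplar renamed for the unknown field, or null if no exemplar is set.
    SketchFieldInfo getDefaultFieldInfo(String name) {
        return defaultField == null ? null : new SketchFieldInfo(name, defaultField.attributes);
    }

    // Known fields first, then the exemplar; new entries are cached for later documents.
    SketchFieldInfo resolve(String name) {
        SketchFieldInfo fi = fieldInfo.get(name);
        if (fi == null) {
            fi = getDefaultFieldInfo(name);
            if (fi != null) fieldInfo.put(name, fi);
        }
        return fi;
    }
}
```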