com.sun.labs.minion
Class QueryConfig

java.lang.Object
  extended by com.sun.labs.minion.QueryConfig
All Implemented Interfaces:
com.sun.labs.util.props.Component, com.sun.labs.util.props.Configurable, java.lang.Cloneable

public class QueryConfig
extends java.lang.Object
implements java.lang.Cloneable, com.sun.labs.util.props.Configurable

A class that holds configuration data for querying.


Field Summary
protected  boolean allLowerIsCI
          Whether all lower case terms should be treated case insensitively.
protected  boolean allUpperIsCI
          Whether all upper case terms should be treated case insensitively.
protected  boolean alwaysFindCaseVariants
          Whether we should always find all case variants.
protected  boolean boostPerfectProx
          Whether we should boost perfect proximity scores with term weights.
protected  java.lang.String configName
           
 CollectionStats cs
          Statistics for the collection we're evaluating the query in.
protected  boolean dvln
          Whether we should do document vector length normalization for queries.
protected  SearchEngine e
          The search engine associated with the collection that we're querying.
protected  boolean fieldCross
          Whether proximity queries are allowed to cross field boundaries.
protected  java.util.Map<java.lang.String,java.lang.Float> fieldMultipliers
          The map from field names to multipliers for those fields.
protected  KnowledgeSource knowledgeSource
          The knowledge source to use.
protected static java.lang.String logTag
          The log tag.
protected  long maxDictLookupTime
          The maximum amount of time to spend on a dictionary lookup, in milliseconds.
protected  int maxDictTerms
          The maximum number of terms to retrieve from the dictionary for any one term operator.
protected  long maxQueryTime
          The maximum amount of time to spend on a query, in ms.
protected  java.lang.String[] multFields
          The names of fields to which multipliers have been applied.
protected  float[] multValues
          The multiplier values to use for the multiplied fields.
static java.lang.String PROP_ALL_UPPER_IS_CI
          The property indicating whether a query term provided in all upper case characters should be considered in a case insensitive way (i.e., all case variants will be considered).
static java.lang.String PROP_BOOST_PERFECT_PROXIMITY
          The property indicating whether perfect proximity scores (i.e., when a passage exactly matches the query) should be boosted by the sum of the term weights associated with the document, in order to provide differentiation between multiple documents with perfect passages.
static java.lang.String PROP_DEFAULT_FIELDS
          A property for a list of fields to search by default.
static java.lang.String PROP_FIELD_CROSS
          The property indicating whether we should allow proximity queries to cross field boundaries.
static java.lang.String PROP_FIELD_MULTIPLIERS
          The property for the list of field multipliers to be used during querying.
static java.lang.String PROP_KNOWLEDGE_SOURCE
          The property for a knowledge source to consult during querying.
static java.lang.String PROP_MAX_QUERY_TIME
          The property for the maximum amount of time (in milliseconds) to spend on any query.
static java.lang.String PROP_MAX_TERMS
          The property for the maximum number of terms to return in response to wildcard dictionary lookups when processing queries.
static java.lang.String PROP_MAX_WC_TIME
          The property for the maximum amount of time (in milliseconds) to spend doing wildcard dictionary lookups during querying.
static java.lang.String PROP_PROXIMITY_LIMIT
          The property for the limit of the window size to consider when computing proximity queries.
static java.lang.String PROP_VECTOR_ZERO_WORDS
           
static java.lang.String PROP_WEIGHTING_FUNCTION
          The property naming the class containing the weighting function to use for querying and document similarity.
protected  long proxLimit
          The amount of time to spend on proximity queries, in milliseconds.
protected  java.lang.String sortSpec
          A string representation of the sorting specification to use for sorting results.
protected  StopWords vectorZeroWords
          Words to ignore during document vector creation.
 WeightingComponents wc
          A set of weighting components to use when computing term weights.
 WeightingFunction wf
          The weighting function to use to weight documents for a term.
 
Constructor Summary
QueryConfig()
          Creates a query configuration with the default values.
 
Method Summary
 void addDefaultField(java.lang.String field)
           
 boolean caseSensitive(java.lang.String s)
          Determines whether case should be taken into account when processing the given string.
 java.lang.Object clone()
          Clones the query configuration.
 boolean getBoostPerfectProx()
          Gets whether we want to modify perfect proximity scores by adding in the term weight scores for the documents in order to distinguish perfect hits.
 CollectionStats getCollectionStats()
          Gets the statistics for the collection against which we're evaluating.
 java.util.List<FieldInfo> getDefaultFields()
           
 java.util.Map<java.lang.String,java.lang.Float> getFieldMultipliers()
          Gets a map from field names to field multipliers.
 KnowledgeSource getKnowledgeSource()
          Gets the knowledge source.
 long getMaxDictLookupTime()
          Gets the maximum amount of time to spend on a dictionary lookup, in milliseconds.
 int getMaxDictTerms()
          Gets the maximum number of terms to retrieve from the dictionary.
 java.lang.String[] getMultFields()
          Gets the names of the fields that have multipliers associated with them.
 float[] getMultValues()
          Gets the multipliers associated with the fields.
 long getProxLimit()
          Gets the value of proxLimit.
 java.lang.String getSortSpec()
          Gets the sorting specification.
 StopWords getVectorZeroWords()
           
 WeightingComponents getWeightingComponents()
          Gets a set of weighting components that can be used when calculating term weights.
 WeightingComponents getWeightingComponents(CollectionStats cs)
          Gets a set of weighting components that can be used when calculating term weights.
 WeightingFunction getWeightingFunction()
          Gets a weighting function to use for weighting documents.
 void newProperties(com.sun.labs.util.props.PropertySheet ps)
          Creates a query configuration from a property sheet described in an external XML file.
 void removeDefaultField(java.lang.String field)
           
 void setCollectionStats(CollectionStats cs)
          Sets the statistics for the collection against which we're evaluating.
 void setEngine(SearchEngine e)
           
 void setFieldMultipliers(java.util.Map<java.lang.String,java.lang.Float> fieldMultipliers)
          Sets the field multipliers in use.
 void setMaxDictLookupTime(long maxDictLookupTime)
          Sets the maximum time that will be spent on dictionary lookups.
 void setMaxDictTerms(int maxDictTerms)
          Sets the maximum number of terms to retrieve from the dictionary.
 void setProxLimit(int proxLimit)
          Sets the value of proxLimit.
 void setSortSpec(java.lang.String sortSpec)
          Sets the sorting specification.
 void setWeightingFunction(WeightingFunction wf)
          Sets the function used for weighting terms in documents.
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

configName

protected java.lang.String configName

PROP_MAX_TERMS

@ConfigInteger(defaultValue=100)
public static final java.lang.String PROP_MAX_TERMS
The property for the maximum number of terms to return in response to wildcard dictionary lookups when processing queries.

See Also:
Constant Field Values

PROP_MAX_WC_TIME

@ConfigInteger(defaultValue=1000)
public static final java.lang.String PROP_MAX_WC_TIME
The property for the maximum amount of time (in milliseconds) to spend doing wildcard dictionary lookups during querying. A value less than zero indicates that there should be no limit on the time spent doing wildcard matches in the dictionary.

See Also:
Constant Field Values
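
The two wildcard limits above (a cap on the number of terms and a cap on lookup time, where a negative time means no limit) can be illustrated with a small stand-alone sketch. It does not use Minion's dictionary classes; BoundedLookupSketch, lookup, and the prefix-matching rule are hypothetical names and simplifications introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch; not part of Minion.
final class BoundedLookupSketch {

    /**
     * Collects dictionary terms that start with the given prefix, stopping
     * once maxTerms matches were found or maxTimeMillis has elapsed.
     * A negative maxTimeMillis means there is no time limit, mirroring the
     * documented behavior of PROP_MAX_WC_TIME.
     */
    static List<String> lookup(List<String> dictionary, String prefix,
                               int maxTerms, long maxTimeMillis) {
        long start = System.nanoTime();
        List<String> matches = new ArrayList<>();
        for (String term : dictionary) {
            if (matches.size() >= maxTerms) {
                break; // term limit reached
            }
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000L;
            if (maxTimeMillis >= 0 && elapsedMillis > maxTimeMillis) {
                break; // time limit reached
            }
            if (term.startsWith(prefix)) {
                matches.add(term);
            }
        }
        return matches;
    }
}
```

With the documented defaults, a caller of this sketch would pass 100 for maxTerms and 1000 for maxTimeMillis.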

PROP_PROXIMITY_LIMIT

@ConfigInteger(defaultValue=4000)
public static final java.lang.String PROP_PROXIMITY_LIMIT
The property for the limit of the window size to consider when computing proximity queries.

See Also:
Constant Field Values

PROP_FIELD_CROSS

@ConfigBoolean(defaultValue=false)
public static final java.lang.String PROP_FIELD_CROSS
The property indicating whether we should allow proximity queries to cross field boundaries.

See Also:
Constant Field Values

PROP_ALL_UPPER_IS_CI

@ConfigBoolean(defaultValue=true)
public static final java.lang.String PROP_ALL_UPPER_IS_CI
The property indicating whether a query term provided in all upper case characters should be considered in a case insensitive way (i.e., all case variants will be considered).

See Also:
Constant Field Values

PROP_MAX_QUERY_TIME

@ConfigInteger(defaultValue=-1)
public static final java.lang.String PROP_MAX_QUERY_TIME
The property for the maximum amount of time (in milliseconds) to spend on any query. A value less than zero indicates that there should be no time limit.

See Also:
Constant Field Values

PROP_WEIGHTING_FUNCTION

@ConfigString(defaultValue="com.sun.labs.minion.retrieval.TFIDF")
public static final java.lang.String PROP_WEIGHTING_FUNCTION
The property naming the class containing the weighting function to use for querying and document similarity. Any class named here must be a subclass of WeightingFunction.

See Also:
Constant Field Values
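
One way a class named in a property could be loaded and checked against the subclass requirement is sketched below. The WeightingFunction API is not shown on this page, so SketchWeightingFunction, SketchTfIdf, and WeightingLoaderSketch are hypothetical stand-ins, and the weight formula is a textbook tf * log(N/df), not necessarily what Minion's TFIDF class computes.

```java
// Hypothetical stand-in for Minion's WeightingFunction base class.
abstract class SketchWeightingFunction {
    abstract float weight(int termFreq, int docFreq, int numDocs);
}

// Hypothetical textbook tf-idf; an assumption, not Minion's implementation.
class SketchTfIdf extends SketchWeightingFunction {
    @Override
    float weight(int termFreq, int docFreq, int numDocs) {
        return (float) (termFreq * Math.log((double) numDocs / docFreq));
    }
}

final class WeightingLoaderSketch {

    /**
     * Instantiates the named class through its no-argument constructor.
     * Throws IllegalArgumentException if the class cannot be loaded or
     * instantiated, or is not a subclass of SketchWeightingFunction.
     */
    static SketchWeightingFunction load(String className) {
        try {
            Class<?> raw = Class.forName(className);
            // asSubclass throws ClassCastException for unrelated classes.
            Class<? extends SketchWeightingFunction> cls =
                    raw.asSubclass(SketchWeightingFunction.class);
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException(
                    "Not a usable weighting function: " + className, e);
        }
    }
}
```

Checking the type before instantiation means a misconfigured class name fails with a clear error instead of a later cast failure.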

PROP_BOOST_PERFECT_PROXIMITY

@ConfigBoolean(defaultValue=true)
public static final java.lang.String PROP_BOOST_PERFECT_PROXIMITY
The property indicating whether perfect proximity scores (i.e., when a passage exactly matches the query) should be boosted by the sum of the term weights associated with the document, in order to provide differentiation between multiple documents with perfect passages.

See Also:
Constant Field Values
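
The boosting described above can be sketched as follows. The sketch assumes that a perfect proximity score is exactly 1.0 and that the boost is a plain addition of the summed term weights (the description of getBoostPerfectProx() speaks of "adding in" the term weight scores); the score scale Minion actually uses is not documented here, and ProximityBoostSketch is a hypothetical name.

```java
// Hypothetical stand-alone sketch; not part of Minion.
final class ProximityBoostSketch {

    // Assumption: a passage that exactly matches the query scores 1.0.
    static final float PERFECT = 1.0f;

    /**
     * Returns the proximity score, adding the sum of the document's term
     * weights when boosting is enabled and the score is perfect.
     */
    static float score(float proximityScore, float[] termWeights,
                       boolean boostPerfectProx) {
        if (!boostPerfectProx || proximityScore < PERFECT) {
            return proximityScore; // imperfect scores are left untouched
        }
        float sum = 0f;
        for (float w : termWeights) {
            sum += w;
        }
        return proximityScore + sum;
    }
}
```

Under these assumptions, two documents with perfect passages end up ranked by their term weights, while imperfect passages keep their original scores.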

PROP_FIELD_MULTIPLIERS

@ConfigComponentList(type=FieldMultiplier.class)
public static final java.lang.String PROP_FIELD_MULTIPLIERS
The property for the list of field multipliers to be used during querying. Such multipliers can be used (for example) to boost the scores of hits occurring in titles or section headers.

See Also:
Constant Field Values

PROP_KNOWLEDGE_SOURCE

@ConfigComponent(type=KnowledgeSource.class)
public static final java.lang.String PROP_KNOWLEDGE_SOURCE
The property for a knowledge source to consult during querying.

See Also:
Constant Field Values

e

protected SearchEngine e
The search engine associated with the collection that we're querying.


proxLimit

protected long proxLimit
The amount of time to spend on proximity queries, in milliseconds.


maxDictTerms

protected int maxDictTerms
The maximum number of terms to retrieve from the dictionary for any one term operator.


fieldMultipliers

protected java.util.Map<java.lang.String,java.lang.Float> fieldMultipliers
The map from field names to multipliers for those fields.


multFields

protected java.lang.String[] multFields
The names of fields to which multipliers have been applied.


multValues

protected float[] multValues
The multiplier values to use for the multiplied fields. Elements correspond to the elements in multFields.

See Also:
multFields
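
The correspondence between multFields and multValues can be pictured as a map flattened into two parallel arrays. The sketch below is an assumption about how such arrays could be derived from fieldMultipliers, not a copy of the class's code; FieldMultiplierSketch, multiplierFor, and the fallback multiplier of 1.0 for unlisted fields are all hypothetical.

```java
import java.util.Map;

// Hypothetical stand-alone sketch; not part of Minion.
final class FieldMultiplierSketch {
    final String[] multFields;
    final float[] multValues;

    /** Flattens the map so that multValues[i] belongs to multFields[i]. */
    FieldMultiplierSketch(Map<String, Float> fieldMultipliers) {
        multFields = new String[fieldMultipliers.size()];
        multValues = new float[fieldMultipliers.size()];
        int i = 0;
        for (Map.Entry<String, Float> entry : fieldMultipliers.entrySet()) {
            multFields[i] = entry.getKey();
            multValues[i] = entry.getValue();
            i++;
        }
    }

    /** Returns the multiplier for a field, or 1.0 if none is set (assumed). */
    float multiplierFor(String field) {
        for (int i = 0; i < multFields.length; i++) {
            if (multFields[i].equals(field)) {
                return multValues[i];
            }
        }
        return 1.0f;
    }
}
```

An empty map yields two empty arrays, which matches the documented behavior that no multipliers are applied in that case.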

maxDictLookupTime

protected long maxDictLookupTime
The maximum amount of time to spend on a dictionary lookup, in milliseconds.


alwaysFindCaseVariants

protected boolean alwaysFindCaseVariants
Whether we should always find all case variants.


allUpperIsCI

protected boolean allUpperIsCI
Whether all upper case terms should be treated case insensitively. The default is to do so.


allLowerIsCI

protected boolean allLowerIsCI
Whether all lower case terms should be treated case insensitively. The default is to do so.


fieldCross

protected boolean fieldCross
Whether proximity queries are allowed to cross field boundaries. Default is false.


boostPerfectProx

protected boolean boostPerfectProx
Whether we should boost perfect proximity scores with term weights.


maxQueryTime

protected long maxQueryTime
The maximum amount of time to spend on a query, in ms. Less than zero indicates that there is no maximum.


sortSpec

protected java.lang.String sortSpec
A string representation of the sorting specification to use for sorting results.


wf

public WeightingFunction wf
The weighting function to use to weight documents for a term.


wc

public WeightingComponents wc
A set of weighting components to use when computing term weights.


dvln

protected boolean dvln
Whether we should do document vector length normalization for queries.


vectorZeroWords

protected StopWords vectorZeroWords
Words to ignore during document vector creation.


cs

public CollectionStats cs
Statistics for the collection we're evaluating the query in.


knowledgeSource

protected KnowledgeSource knowledgeSource
The knowledge source to use. Defaults to the English morphological analyzer.


logTag

protected static java.lang.String logTag
The log tag.


PROP_VECTOR_ZERO_WORDS

@ConfigComponent(type=StopWords.class)
public static final java.lang.String PROP_VECTOR_ZERO_WORDS
See Also:
Constant Field Values

PROP_DEFAULT_FIELDS

@ConfigComponentList(type=FieldInfo.class,
                     defaultList={})
public static final java.lang.String PROP_DEFAULT_FIELDS
A property for a list of fields to search by default.

See Also:
Constant Field Values

Constructor Detail

QueryConfig

public QueryConfig()
Creates a query configuration with the default values.

Method Detail

setEngine

public void setEngine(SearchEngine e)

getProxLimit

public long getProxLimit()
Gets the value of proxLimit.

Returns:
the value of proxLimit

setProxLimit

public void setProxLimit(int proxLimit)
Sets the value of proxLimit.

Parameters:
proxLimit - The new value of proxLimit

getMaxDictTerms

public int getMaxDictTerms()
Gets the maximum number of terms to retrieve from the dictionary.

Returns:
the maximum number of terms to retrieve from the dictionary

setMaxDictTerms

public void setMaxDictTerms(int maxDictTerms)
Sets the maximum number of terms to retrieve from the dictionary.

Parameters:
maxDictTerms - The maximum number of terms.

getMaxDictLookupTime

public long getMaxDictLookupTime()
Gets the maximum amount of time to spend on a dictionary lookup, in milliseconds.

Returns:
the maximum time to spend on a dictionary lookup, in milliseconds

setMaxDictLookupTime

public void setMaxDictLookupTime(long maxDictLookupTime)
Sets the maximum time that will be spent on dictionary lookups. A value of -1 indicates that there is no limit.

Parameters:
maxDictLookupTime - The maximum time to spend on a dictionary lookup.

setFieldMultipliers

public void setFieldMultipliers(java.util.Map<java.lang.String,java.lang.Float> fieldMultipliers)
Sets the field multipliers in use.

Parameters:
fieldMultipliers - A map from field names (as Strings) to field multipliers (as floats). If the map is empty, no multipliers will be applied.

getMultFields

public java.lang.String[] getMultFields()
Gets the names of the fields that have multipliers associated with them.

Returns:
the names of the fields that have multipliers associated with them

getMultValues

public float[] getMultValues()
Gets the multipliers associated with the fields.

Returns:
the multipliers associated with the multiplier fields
See Also:
getMultFields()

getFieldMultipliers

public java.util.Map<java.lang.String,java.lang.Float> getFieldMultipliers()
Gets a map from field names to field multipliers.

Returns:
a map from multiplier fields to multipliers for those fields

getWeightingFunction

public WeightingFunction getWeightingFunction()
Gets a weighting function to use for weighting documents.

Returns:
an instance of the weighting function, or null if no weighting function has been defined.

getWeightingComponents

public WeightingComponents getWeightingComponents()
Gets a set of weighting components that can be used when calculating term weights.

Returns:
an instance of the weighting components class, or null if there is some error instantiating the class. The set of weighting components that are returned will have been initialized with the collection-level statistics for this collection.

getWeightingComponents

public WeightingComponents getWeightingComponents(CollectionStats cs)
Gets a set of weighting components that can be used when calculating term weights.

Parameters:
cs - the collection statistics to use when calculating components.
Returns:
an instance of the weighting components class, or null if there is some error instantiating the class. The set of weighting components that are returned will have been initialized with the collection-level statistics for this collection.

setWeightingFunction

public void setWeightingFunction(WeightingFunction wf)
Sets the function used for weighting terms in documents.

Parameters:
wf - the weighting function to use for weighting terms

getVectorZeroWords

public StopWords getVectorZeroWords()

getCollectionStats

public CollectionStats getCollectionStats()
Gets the statistics for the collection against which we're evaluating.

Returns:
the collection level statistics for this engine

caseSensitive

public boolean caseSensitive(java.lang.String s)
Determines whether case should be taken into account when processing the given string. If the index upon which we're operating does not have case sensitive entries, then case will never be taken into account.

Parameters:
s - The string to process.
Returns:
true if the current configuration indicates that the string should be processed in a case sensitive fashion. The decision depends on:
  1. If we're always supposed to find case variants, then we aren't case sensitive.
  2. If the query is all upper case and we want such things to match case insensitively, then we aren't case sensitive.
  3. If the query is all lower case and we want such things to match case insensitively, then we aren't case sensitive.
  4. If the string is incapable of being cased, we are case sensitive.
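
The rules above can be sketched as a stand-alone decision function. This is an illustration under stated assumptions, not the class's code: the uncased check (rule 4) is evaluated before rules 2 and 3 so that a string such as "1234" is not mistaken for all upper case, the initial value of alwaysFindCaseVariants is a guess, and CaseSensitivitySketch and indexIsCased are hypothetical names.

```java
import java.util.Locale;

// Hypothetical stand-alone sketch; not part of Minion.
final class CaseSensitivitySketch {
    boolean indexIsCased = true;            // assumed: the index stores cased entries
    boolean alwaysFindCaseVariants = false; // assumed initial value
    boolean allUpperIsCI = true;            // documented default
    boolean allLowerIsCI = true;            // documented default

    /** Returns true if the string should be processed case sensitively. */
    boolean caseSensitive(String s) {
        if (!indexIsCased) {
            return false; // an uncased index never takes case into account
        }
        if (alwaysFindCaseVariants) {
            return false; // rule 1
        }
        boolean isUpper = s.equals(s.toUpperCase(Locale.ROOT));
        boolean isLower = s.equals(s.toLowerCase(Locale.ROOT));
        if (isUpper && isLower) {
            return true;  // rule 4: the string cannot be cased
        }
        if (isUpper && allUpperIsCI) {
            return false; // rule 2
        }
        if (isLower && allLowerIsCI) {
            return false; // rule 3
        }
        return true;      // mixed case, or a rule that is switched off
    }
}
```

In this sketch a mixed-case query such as "Java" is treated case sensitively, while "java" and "JAVA" match all case variants under the default settings.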

getBoostPerfectProx

public boolean getBoostPerfectProx()
Gets whether we want to modify perfect proximity scores by adding in the term weight scores for the documents in order to distinguish perfect hits.

Returns:
true if perfect proximity scores should be boosted by the term weights for the passages.

setSortSpec

public void setSortSpec(java.lang.String sortSpec)
Sets the sorting specification.

Parameters:
sortSpec - the sorting specification to use for this query

getSortSpec

public java.lang.String getSortSpec()
Gets the sorting specification.

Returns:
the sorting specification to use for this query

setCollectionStats

public void setCollectionStats(CollectionStats cs)
Sets the statistics for the collection against which we're evaluating.

Parameters:
cs - the statistics for the collection

getKnowledgeSource

public KnowledgeSource getKnowledgeSource()
Gets the knowledge source.

Returns:
the knowledge source to use to look up term variants.

clone

public java.lang.Object clone()
Clones the query configuration.

Overrides:
clone in class java.lang.Object

newProperties

public void newProperties(com.sun.labs.util.props.PropertySheet ps)
                   throws com.sun.labs.util.props.PropertyException
Creates a query configuration from a property sheet described in an external XML file.

Specified by:
newProperties in interface com.sun.labs.util.props.Configurable
Parameters:
ps - The properties.
Throws:
com.sun.labs.util.props.PropertyException - if there is any error retrieving the properties.

addDefaultField

public void addDefaultField(java.lang.String field)

removeDefaultField

public void removeDefaultField(java.lang.String field)

getDefaultFields

public java.util.List<FieldInfo> getDefaultFields()