com.sun.labs.minion
Class FieldInfo

java.lang.Object
  extended by com.sun.labs.minion.FieldInfo
All Implemented Interfaces:
com.sun.labs.util.props.Component, com.sun.labs.util.props.Configurable, java.lang.Cloneable

public class FieldInfo
extends java.lang.Object
implements java.lang.Cloneable, com.sun.labs.util.props.Configurable

A class that can be used to tell the indexer what to do with the data contained in a field.

Fields can be defined in the configuration file used to create a search engine or via the SearchEngine.defineField(com.sun.labs.minion.FieldInfo) method.

Each field has a name, which is a string. Field names are case insensitive: for example, title, Title, and TITLE all refer to the same field.
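To illustrate what case-insensitive names mean for calling code, the sketch below folds names to a canonical form before using them as map keys. The class FieldNameSketch and its methods are hypothetical and not part of Minion; how the engine itself normalizes names is not described here.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Minion: one way an application could
// treat field names case-insensitively on its own side.
final class FieldNameSketch {

    // Folds a field name to a canonical lower-case form.
    static String canonical(String name) {
        return name.toLowerCase(Locale.ROOT);
    }

    // True when two names would refer to the same field.
    static boolean sameField(String a, String b) {
        return canonical(a).equals(canonical(b));
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put(canonical("Title"), "Java in a Nutshell");
        // The lookup succeeds regardless of the caller's capitalization.
        System.out.println(values.get(canonical("TITLE")));
        System.out.println(sameField("title", "Title"));
    }
}
```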

The disposition of the field data by the indexer is controlled by the attributes that are assigned to the field. The following attributes are defined by the FieldInfo.Attribute enumeration:

TOKENIZED
The field value should be tokenized. If this attribute is set, then the data in the field will be tokenized according to the rules of whatever tokenizer is currently being used. Typically, this means that the data in the field will be broken into tokens at spaces and punctuation.
INDEXED
Any terms in the field (whether the field is tokenized or un-tokenized) will have entries added to the main dictionary and postings data added to the postings file for that dictionary. Fields with this attribute can be used in queries that use the <contains> operator. For example, if the title field has the INDEXED attribute, then the query:
title <contains> java
will return those documents that have the term java in the title field.
VECTORED
This attribute indicates that terms extracted from the field value should be added to a document vector specific to the field, as well as to the overall document vector for this document. Specifying this attribute allows applications to perform classification or document similarity computations against just this field. So, for example, you could find a set of documents that have titles similar to a given document's title.
TRIMMED
This attribute indicates that field values passed into the indexer should have any leading or trailing spaces trimmed from the values before they are processed any further.
CASE_SENSITIVE
This attribute indicates that a given saved string field should be treated in a case sensitive manner. If a saved field has the case sensitive attribute set, then relational queries against that field must match the case of the values stored in the field.
SAVED
This attribute indicates that the value for the field should be stored in the index exactly as it is provided. Values in saved fields are available for parametric searches (e.g., price < 10) and for results sorting. If SAVED is specified as an attribute, then one of the following types must be specified:
INTEGER
Some quantity storable in a 64-bit integer.
FLOAT
Some quantity storable in a 64-bit double.
DATE
The field value is a date, given in some text representation. This date will be parsed and then stored in a Java long as the number of milliseconds since the epoch (00:00:00 GMT, January 1, 1970).
STRING
A text field that consists of a variable number of characters. The default value for a string field is the empty string.
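The DATE representation described above (milliseconds since the epoch, held in a long) can be sketched with the standard java.time API. This sketch assumes ISO-8601 input text; which date formats the indexer actually accepts is not stated here, and DateFieldSketch is a hypothetical name.

```java
import java.time.Instant;

// Hypothetical helper, not part of Minion.
final class DateFieldSketch {

    // Assumption: the text is an ISO-8601 instant such as "1970-01-01T00:00:01Z".
    static long toEpochMillis(String isoText) {
        return Instant.parse(isoText).toEpochMilli();
    }

    public static void main(String[] args) {
        // One second after the epoch; prints 1000
        System.out.println(toEpochMillis("1970-01-01T00:00:01Z"));
    }
}
```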

The attributes of the FieldInfo object can be set by providing an EnumSet to the constructor, or by using the setAttributes, addAttribute, and removeAttribute methods.
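As a minimal sketch of assembling an attribute set, the example below builds an EnumSet for a hypothetical title field. The nested Attribute enum only mirrors the attribute names listed above so that the snippet compiles without Minion on the classpath; it is a stand-in, not the real FieldInfo.Attribute.

```java
import java.util.EnumSet;

final class AttributeSetSketch {

    // Stand-in for FieldInfo.Attribute, mirroring the names documented above.
    enum Attribute { TOKENIZED, INDEXED, VECTORED, TRIMMED, CASE_SENSITIVE, SAVED }

    // Attributes for a hypothetical "title" field: searchable text that is also stored.
    static EnumSet<Attribute> titleAttributes() {
        return EnumSet.of(Attribute.INDEXED, Attribute.TOKENIZED, Attribute.SAVED);
    }

    public static void main(String[] args) {
        EnumSet<Attribute> attrs = titleAttributes();
        attrs.add(Attribute.TRIMMED); // EnumSet is mutable, so attributes can be added later
        System.out.println(attrs);
    }
}
```

With the real class, a set like this would be passed to the FieldInfo(String, EnumSet<FieldInfo.Attribute>, FieldInfo.Type) constructor together with a saved type such as FieldInfo.Type.STRING.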


Nested Class Summary
static class FieldInfo.Attribute
          The various attributes that a field can have.
static class FieldInfo.Type
          The types that a saved field can have.
 
Field Summary
static java.lang.String logTag
           
static java.lang.String PROP_INDEXED
          The property name for the indexed attribute.
static java.lang.String PROP_SAVED
          The property name for the saved attribute.
static java.lang.String PROP_TOKENIZED
          The property name for the tokenized attribute.
static java.lang.String PROP_TRIMMED
          The property name for the trimmed attribute.
static java.lang.String PROP_TYPE
          The property name for the type.
static java.lang.String PROP_VECTORED
          The property name for the vectored attribute.
 
Constructor Summary
FieldInfo()
           
FieldInfo(int id, java.lang.String name)
          Constructs a FieldInfo instance for the given ID and field name.
FieldInfo(int id, java.lang.String name, java.util.EnumSet<FieldInfo.Attribute> attributes, FieldInfo.Type type)
          Constructs a FieldInfo object with the given ID, attributes, and type.
FieldInfo(java.lang.String name)
          Constructs a FieldInfo instance for the given field name.
FieldInfo(java.lang.String name, java.util.EnumSet<FieldInfo.Attribute> attributes)
          Constructs a FieldInfo object with the given attributes.
FieldInfo(java.lang.String name, java.util.EnumSet<FieldInfo.Attribute> attributes, FieldInfo.Type type)
          Constructs a FieldInfo object with the given attributes and type.
 
Method Summary
 FieldInfo addAttribute(FieldInfo.Attribute attr)
          Adds an attribute to the field.
 FieldInfo clone()
           
 java.util.EnumSet<FieldInfo.Attribute> getAttributes()
          Gets the attributes associated with this field.
 java.lang.Object getDefaultSavedValue()
          Gets the default saved value for a field of this type.
 int getID()
          Gets the numeric id of this field.
static java.util.EnumSet<FieldInfo.Attribute> getIndexedAttributes()
          Gets a set of the typical attributes for an indexed field.
 java.lang.String getName()
          Gets the field's name.
 FieldInfo.Type getType()
          Gets the field type.
 boolean isCaseSensitive()
          Tells whether this field is meant to be stored in a case sensitive fashion.
 boolean isIndexed()
          Indicates whether the field is indexed or not.
 boolean isSaved()
          Tells whether the field is saved or not.
 boolean isTokenized()
          Tells whether the field is tokenized or not.
 boolean isTrimmed()
          Tells whether a string saved field should have its values trimmed of spaces before the values are stored in the index.
 boolean isVectored()
          Tells whether this field should have tokens added to the document vector or not.
 void newProperties(com.sun.labs.util.props.PropertySheet ps)
          Sets the attributes and type of this field from a provided property sheet.
 void read(java.io.DataInput in)
          Reads a field information object from the provided input.
 FieldInfo removeAttribute(FieldInfo.Attribute attr)
          Removes an attribute from the field.
 void setAttributes(java.util.EnumSet<FieldInfo.Attribute> attributes)
          Sets the attributes associated with the field.
 java.lang.String toString()
           
 void write(java.io.DataOutput out)
          Writes this field information object to a data output.
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

PROP_TYPE

@ConfigEnum(type=FieldInfo.Type.class,
            defaultValue="NONE")
public static final java.lang.String PROP_TYPE
The property name for the type.

See Also:
Constant Field Values

PROP_VECTORED

@ConfigBoolean(defaultValue=false)
public static final java.lang.String PROP_VECTORED
The property name for the vectored attribute.

See Also:
Constant Field Values

PROP_TOKENIZED

@ConfigBoolean(defaultValue=false)
public static final java.lang.String PROP_TOKENIZED
The property name for the tokenized attribute.

See Also:
Constant Field Values

PROP_INDEXED

@ConfigBoolean(defaultValue=false)
public static final java.lang.String PROP_INDEXED
The property name for the indexed attribute.

See Also:
Constant Field Values

PROP_SAVED

@ConfigBoolean(defaultValue=false)
public static final java.lang.String PROP_SAVED
The property name for the saved attribute.

See Also:
Constant Field Values

PROP_TRIMMED

@ConfigBoolean(defaultValue=false)
public static final java.lang.String PROP_TRIMMED
The property name for the trimmed attribute.

See Also:
Constant Field Values

logTag

public static final java.lang.String logTag
See Also:
Constant Field Values
Constructor Detail

FieldInfo

public FieldInfo()

FieldInfo

public FieldInfo(java.lang.String name)
Constructs a FieldInfo instance for the given field name.

Parameters:
name - The name of the field.

FieldInfo

public FieldInfo(int id,
                 java.lang.String name)
Constructs a FieldInfo instance for the given ID and field name.

Parameters:
id - the ID of the field
name - The name of the field.

FieldInfo

public FieldInfo(java.lang.String name,
                 java.util.EnumSet<FieldInfo.Attribute> attributes)
Constructs a FieldInfo object with the given attributes.

Parameters:
name - The name of the field.
attributes - A set of field attributes. Note that we take a copy of this set, so it is safe to modify or reuse the attributes.
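The copy-on-construction behavior noted for the attributes parameter can be illustrated with EnumSet.copyOf. DefensiveCopySketch and its nested Attribute enum are hypothetical stand-ins; the sketch shows the general technique, not the actual FieldInfo source.

```java
import java.util.EnumSet;

final class DefensiveCopySketch {

    // Stand-in for FieldInfo.Attribute.
    enum Attribute { TOKENIZED, INDEXED, VECTORED, TRIMMED, CASE_SENSITIVE, SAVED }

    private final EnumSet<Attribute> attributes;

    DefensiveCopySketch(EnumSet<Attribute> attributes) {
        // Copy the caller's set so later changes to it do not leak into this object.
        this.attributes = EnumSet.copyOf(attributes);
    }

    boolean has(Attribute attr) {
        return attributes.contains(attr);
    }

    public static void main(String[] args) {
        EnumSet<Attribute> shared = EnumSet.of(Attribute.INDEXED);
        DefensiveCopySketch title = new DefensiveCopySketch(shared);
        shared.add(Attribute.SAVED); // reuse the set for another field
        System.out.println(title.has(Attribute.SAVED)); // the first field is unaffected
    }
}
```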

FieldInfo

public FieldInfo(java.lang.String name,
                 java.util.EnumSet<FieldInfo.Attribute> attributes,
                 FieldInfo.Type type)
Constructs a FieldInfo object with the given attributes and type.

Parameters:
name - The name of the field.
attributes - A set of field attributes. Note that we take a copy of this set, so it is safe to modify or reuse the attributes.
type - A type that will be used if the attributes indicate that the field is saved.

FieldInfo

public FieldInfo(int id,
                 java.lang.String name,
                 java.util.EnumSet<FieldInfo.Attribute> attributes,
                 FieldInfo.Type type)
Constructs a FieldInfo object with the given ID, attributes, and type.

Parameters:
id - the ID to assign to this field.
name - The name of the field.
attributes - A set of field attributes. Note that we take a copy of this set, so it is safe to modify or reuse the attributes.
type - A type that will be used if the attributes for the field include the SAVED attribute.
Throws:
java.lang.IllegalArgumentException - if the attributes of the field specify that a field is saved and the type of the field is null or Type.NONE, or if a type other than Type.NONE is specified and the attributes do not contain the SAVED attribute.
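The consistency rule between the SAVED attribute and the type can be sketched as a standalone check. The enums below are stand-ins that mirror the documented names, and treating a null type on a non-saved field as acceptable is an assumption of this sketch.

```java
import java.util.EnumSet;

final class SavedTypeCheckSketch {

    // Stand-ins for FieldInfo.Attribute and FieldInfo.Type.
    enum Attribute { TOKENIZED, INDEXED, VECTORED, TRIMMED, CASE_SENSITIVE, SAVED }
    enum Type { NONE, INTEGER, FLOAT, DATE, STRING }

    // Throws IllegalArgumentException when attributes and type disagree.
    static void check(EnumSet<Attribute> attributes, Type type) {
        boolean saved = attributes.contains(Attribute.SAVED);
        boolean typed = type != null && type != Type.NONE; // assumption: null means "no type"
        if (saved && !typed) {
            throw new IllegalArgumentException("SAVED field requires a type");
        }
        if (!saved && typed) {
            throw new IllegalArgumentException("type " + type + " given, but field is not SAVED");
        }
    }

    // Convenience wrapper that reports validity as a boolean.
    static boolean isValid(EnumSet<Attribute> attributes, Type type) {
        try {
            check(attributes, type);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(EnumSet.of(Attribute.SAVED), Type.STRING));
        System.out.println(isValid(EnumSet.of(Attribute.SAVED), Type.NONE));
        System.out.println(isValid(EnumSet.of(Attribute.INDEXED), Type.DATE));
    }
}
```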
Method Detail

clone

public FieldInfo clone()
Overrides:
clone in class java.lang.Object

getName

public java.lang.String getName()
Gets the field's name.

Returns:
the name of the field. Note that field names are case insensitive.

addAttribute

public FieldInfo addAttribute(FieldInfo.Attribute attr)
Adds an attribute to the field.

Parameters:
attr - the attribute to add.
Returns:
the current field information object, allowing chained invocations.

removeAttribute

public FieldInfo removeAttribute(FieldInfo.Attribute attr)
Removes an attribute from the field.

Parameters:
attr - the attribute to remove.
Returns:
the current field information object, allowing chained invocations.
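Because addAttribute and removeAttribute return the object they were called on, calls can be chained. The sketch below reproduces that pattern in a small hypothetical class, ChainingSketch, so it runs without Minion; the snapshot method is an invention of the sketch.

```java
import java.util.EnumSet;

final class ChainingSketch {

    // Stand-in for FieldInfo.Attribute.
    enum Attribute { TOKENIZED, INDEXED, VECTORED, TRIMMED, CASE_SENSITIVE, SAVED }

    private final EnumSet<Attribute> attributes = EnumSet.noneOf(Attribute.class);

    // Returning "this" is what makes chained invocations possible.
    ChainingSketch addAttribute(Attribute attr) {
        attributes.add(attr);
        return this;
    }

    ChainingSketch removeAttribute(Attribute attr) {
        attributes.remove(attr);
        return this;
    }

    // Hypothetical accessor: returns a copy of the current attributes.
    EnumSet<Attribute> snapshot() {
        return EnumSet.copyOf(attributes);
    }

    public static void main(String[] args) {
        ChainingSketch field = new ChainingSketch()
                .addAttribute(Attribute.INDEXED)
                .addAttribute(Attribute.TOKENIZED)
                .addAttribute(Attribute.VECTORED)
                .removeAttribute(Attribute.VECTORED);
        System.out.println(field.snapshot());
    }
}
```

With the real class, the same style would read new FieldInfo("title").addAttribute(FieldInfo.Attribute.INDEXED).addAttribute(FieldInfo.Attribute.TOKENIZED).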

setAttributes

public void setAttributes(java.util.EnumSet<FieldInfo.Attribute> attributes)
Sets the attributes associated with the field. The provided attributes will replace whatever attributes are currently associated with the field.

Parameters:
attributes - the set of attributes to assign to the field.

getAttributes

public java.util.EnumSet<FieldInfo.Attribute> getAttributes()
Gets the attributes associated with this field.

Returns:
the set of attributes that this field has.

getType

public FieldInfo.Type getType()
Gets the field type.

Returns:
the type of this field. If this field does not have the SAVED attribute, then the type NONE is returned.

getDefaultSavedValue

public java.lang.Object getDefaultSavedValue()
Gets the default saved value for a field of this type.

Returns:
a default value for the saved type of this field. For numeric fields (including DATE fields), 0 is returned. For string fields, the empty string is returned. If this field is not a saved field, null will be returned.
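The mapping from saved type to default value described above can be sketched as a switch. The Type enum is a stand-in for FieldInfo.Type, and the boxed classes chosen here (Long for INTEGER and DATE, Double for FLOAT) are assumptions based on the 64-bit storage described for those types; the real method's return classes are not stated.

```java
final class DefaultValueSketch {

    // Stand-in for FieldInfo.Type.
    enum Type { NONE, INTEGER, FLOAT, DATE, STRING }

    static Object defaultFor(Type type) {
        switch (type) {
            case INTEGER:
            case DATE:
                return 0L;   // assumption: boxed as Long
            case FLOAT:
                return 0.0d; // assumption: boxed as Double
            case STRING:
                return "";
            default:
                return null; // NONE: the field is not saved
        }
    }

    public static void main(String[] args) {
        for (Type type : Type.values()) {
            System.out.println(type + " -> " + defaultFor(type));
        }
    }
}
```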

getID

public int getID()
Gets the numeric id of this field.

Returns:
the id

isIndexed

public boolean isIndexed()
Indicates whether the field is indexed or not.

Returns:
true if this field has the indexed attribute, false otherwise.

isTokenized

public boolean isTokenized()
Tells whether the field is tokenized or not.

Returns:
true if this field has the tokenized attribute, false otherwise.

isSaved

public boolean isSaved()
Tells whether the field is saved or not.

Returns:
true if this field has the saved attribute, false otherwise.

isVectored

public boolean isVectored()
Tells whether this field should have tokens added to the document vector or not.

Returns:
true if this field has the vectored attribute, false otherwise.

isTrimmed

public boolean isTrimmed()
Tells whether a string saved field should have its values trimmed of spaces before the values are stored in the index.

Returns:
true if the field values should be trimmed, false otherwise.

isCaseSensitive

public boolean isCaseSensitive()
Tells whether this field is meant to be stored in a case sensitive fashion. This attribute only makes sense for fields of type STRING. If a saved field has the case sensitive attribute set, then relational queries against that field must match the case of the values stored in the field.

Returns:
true if this field is case sensitive, false otherwise
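The effect of case sensitivity on an equality-style relational query can be sketched as follows. CaseMatchSketch is hypothetical, and the sketch covers only equality; how the engine evaluates other relational operators is not described here.

```java
final class CaseMatchSketch {

    // Compares a stored string value with a query value, honoring the
    // field's case sensitivity setting.
    static boolean matches(String stored, String query, boolean caseSensitive) {
        return caseSensitive ? stored.equals(query) : stored.equalsIgnoreCase(query);
    }

    public static void main(String[] args) {
        System.out.println(matches("Java", "java", true));
        System.out.println(matches("Java", "java", false));
    }
}
```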

toString

public java.lang.String toString()
Overrides:
toString in class java.lang.Object

newProperties

public void newProperties(com.sun.labs.util.props.PropertySheet ps)
                   throws com.sun.labs.util.props.PropertyException
Sets the attributes and type of this field from a provided property sheet.

Specified by:
newProperties in interface com.sun.labs.util.props.Configurable
Parameters:
ps - a property sheet for this field
Throws:
com.sun.labs.util.props.PropertyException - if there is any error processing the properties
See Also:
Configurable.newProperties(com.sun.labs.util.props.PropertySheet)

write

public void write(java.io.DataOutput out)
           throws java.io.IOException
Writes this field information object to a data output.

Parameters:
out - the output where we will write the object
Throws:
java.io.IOException - if there is any error writing the information

read

public void read(java.io.DataInput in)
          throws java.io.IOException
Reads a field information object from the provided input.

Parameters:
in - the input from which we will read the field information
Throws:
java.io.IOException - if there is any error reading the field information
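The write and read methods follow the usual DataOutput/DataInput pairing, where fields are read back in the order they were written. The sketch below shows that pattern with a hypothetical record holding only an ID and a name; the actual byte layout used by FieldInfo is not documented here and is not implied by this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical record, not the real FieldInfo serialization format.
final class FieldRecordSketch {
    int id;
    String name;

    FieldRecordSketch() { }

    FieldRecordSketch(int id, String name) {
        this.id = id;
        this.name = name;
    }

    void write(DataOutput out) throws IOException {
        out.writeInt(id);
        out.writeUTF(name);
    }

    // Reads the fields in exactly the order write() emitted them.
    void read(DataInput in) throws IOException {
        id = in.readInt();
        name = in.readUTF();
    }

    // Serializes to an in-memory buffer and reads the bytes back into a new object.
    static FieldRecordSketch roundTrip(FieldRecordSketch original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                original.write(out);
            }
            FieldRecordSketch copy = new FieldRecordSketch();
            try (DataInputStream in = new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                copy.read(in);
            }
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        FieldRecordSketch copy = roundTrip(new FieldRecordSketch(7, "title"));
        System.out.println(copy.id + " " + copy.name);
    }
}
```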

getIndexedAttributes

public static java.util.EnumSet<FieldInfo.Attribute> getIndexedAttributes()
Gets a set of the typical attributes for an indexed field.

Returns:
a set of attributes containing the INDEXED, TOKENIZED, and VECTORED attributes
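A typical use of such a default set is to start from it and add what a particular field needs. The sketch below models that with a stand-in Attribute enum; returning a fresh, mutable set on each call is an assumption of the sketch, not a documented property of getIndexedAttributes.

```java
import java.util.EnumSet;

final class IndexedDefaultsSketch {

    // Stand-in for FieldInfo.Attribute.
    enum Attribute { TOKENIZED, INDEXED, VECTORED, TRIMMED, CASE_SENSITIVE, SAVED }

    // Mirrors the documented contents: INDEXED, TOKENIZED, and VECTORED.
    static EnumSet<Attribute> indexedAttributes() {
        return EnumSet.of(Attribute.INDEXED, Attribute.TOKENIZED, Attribute.VECTORED);
    }

    // Extends the typical set for a field that should also be stored.
    static EnumSet<Attribute> indexedAndSaved() {
        EnumSet<Attribute> attrs = indexedAttributes();
        attrs.add(Attribute.SAVED);
        return attrs;
    }

    public static void main(String[] args) {
        System.out.println(indexedAttributes());
        System.out.println(indexedAndSaved());
    }
}
```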