com.sun.labs.minion
Interface SimpleIndexer

All Known Implementing Classes:
HLPipelineImpl, SyncPipelineImpl

public interface SimpleIndexer

An interface that provides fairly straightforward access to the indexing API. An implementation of this interface is meant to be used by a single thread only. Using the same implementation instance from multiple threads will lead to unexpected results and will likely produce many exceptions.

A typical use of a simple indexer will be something like:

 SimpleIndexer indexer = engine.getSimpleIndexer();
 while(!done) {
    indexer.startDocument(key);
    indexer.addField(name, val);
    indexer.addTerm(t1, c1);
    indexer.addField(name2, val2);
    indexer.addTerm(t2, c2);
    ...
    indexer.endDocument();
 }
 indexer.finish();
 

Note that you must call finish when you are done using the simple indexer. If you do not, then some of your data may not be saved into the index.

Once you have called finish, you cannot use the simple indexer to do any more indexing. Any attempt to do so will result in an IllegalStateException being thrown.
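
The lifecycle rules above (one open document at a time, and finish called once at the end) can be sketched with a try/finally block. The RecordingIndexer class below is a hypothetical in-memory stand-in written for this example and is not part of Minion; it implements only the three lifecycle methods and mimics the documented IllegalStateException behaviour so that the sketch runs on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for a SimpleIndexer; NOT part of Minion.
// It only mimics the documented lifecycle rules so the sketch can run alone.
final class RecordingIndexer {
    private final List<String> indexedKeys = new ArrayList<>();
    private String currentKey;   // non-null while a document is open
    private boolean finished;    // true once finish() has been called

    void startDocument(String key) {
        if (key == null) {
            throw new NullPointerException("document key is null");
        }
        if (finished || currentKey != null) {
            throw new IllegalStateException("cannot start a document now");
        }
        currentKey = key;
    }

    void endDocument() {
        if (finished || currentKey == null) {
            throw new IllegalStateException("no document has been started");
        }
        indexedKeys.add(currentKey);
        currentKey = null;
    }

    void finish() {
        finished = true;
    }

    List<String> indexedKeys() {
        return indexedKeys;
    }
}

final class LifecycleSketch {
    // Indexes every key and calls finish() even if a document fails part-way.
    static List<String> indexAll(RecordingIndexer indexer, List<String> keys) {
        try {
            for (String key : keys) {
                indexer.startDocument(key);
                // addField / addTerm calls for this document would go here.
                indexer.endDocument();
            }
        } finally {
            indexer.finish();
        }
        return indexer.indexedKeys();
    }
}
```

With a real engine, the indexer would come from engine.getSimpleIndexer() as in the typical use shown earlier; the try/finally shape is the part this sketch is meant to illustrate.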

See Also:
SearchEngine, Document

Method Summary
 void addField(java.lang.String name, java.util.Collection<java.lang.Object> values)
          Adds a number of values to a particular field in the current document.
 void addField(java.lang.String name, java.util.Date value)
          Adds a date field value to the current document.
 void addField(java.lang.String name, java.lang.Double value)
          Adds a double field value to the current document.
 void addField(java.lang.String name, java.lang.Float value)
          Adds a float field value to the current document.
 void addField(java.lang.String name, IndexableString value)
          Adds an indexable string field value to the current document.
 void addField(java.lang.String name, java.lang.Integer value)
          Adds an integer field value to the current document.
 void addField(java.lang.String name, java.lang.Long value)
          Adds a long field value to the current document.
 void addField(java.lang.String name, java.lang.Object[] values)
          Adds a number of values to a particular field in the current document.
 void addField(java.lang.String name, java.lang.String value)
          Adds a string field value to the current document.
 void addTerm(java.lang.String term)
          Adds a term to the current document.
 void addTerm(java.lang.String term, int count)
          Adds a term to the current document.
 void addTerm(java.lang.String field, java.lang.String term, int count)
          Adds a term to the given field in the current document.
 void endDocument()
          Ends the current document.
 void finish()
          Finishes indexing.
 void indexDocument(Document doc)
          Indexes a whole document at once.
 void indexDocument(Indexable doc)
          Indexes a whole document at once.
 boolean isIndexed(java.lang.String key)
          Indicates whether a document has been indexed or not.
 void startDocument(java.lang.String key)
          Starts the indexing of a document.
 

Method Detail

indexDocument

void indexDocument(Indexable doc)
                   throws SearchEngineException
Indexes a whole document at once. Do not call this method while another document is open, that is, between a call to startDocument and the matching call to endDocument.

Parameters:
doc - a document to index.
Throws:
SearchEngineException - if there is any error indexing the document
See Also:
SearchEngine.index(Indexable)

indexDocument

void indexDocument(Document doc)
                   throws SearchEngineException
Indexes a whole document at once. Do not call this method while another document is open, that is, between a call to startDocument and the matching call to endDocument.

Parameters:
doc - a document to index.
Throws:
SearchEngineException - if there is any error indexing the document
See Also:
Document, SearchEngine.index(Document)

startDocument

void startDocument(java.lang.String key)
Starts the indexing of a document.

Parameters:
key - the document key for the document. If this is a duplicate, then any old data associated with the document will be removed.
Throws:
java.lang.NullPointerException - if the document key is null
java.lang.IllegalStateException - if this method is called while we are already indexing another document.

addField

void addField(java.lang.String name,
              java.lang.String value)
Adds a string field value to the current document.

Parameters:
name - the name of the field to which we want to add a value. If the named field is not currently defined, then the way that the field is treated depends on the engine configuration. By default, the field will be indexed, tokenized, and vectored. If the field name is null, then the value will be treated as part of the implicit document body and will therefore be indexed, tokenized, and vectored.
value - the value of the field. The value will be appropriately tokenized, indexed and vectored if the named field has those attributes. If the field is a saved field, then the engine will attempt to process the field according to the type of the saved field. If the field is a string saved field, this is straightforward. If the field is an integer or float saved field, then the engine will attempt to parse a numeric value of the appropriate type from the string value. If the field is a date saved field, then the engine will attempt to parse a date out of the string value using a number of different date formats.
Throws:
java.lang.IllegalStateException - if this method is called without having called startDocument
See Also:
IndexConfig, startDocument(java.lang.String)
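
As an illustration of what "attempt to parse a date out of the string value using a number of different date formats" can look like, here is a minimal sketch that tries a list of formats in order. The class name DateParseSketch and the two patterns are assumptions made for this example; the formats that the engine actually tries are not listed in this documentation.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;
import java.util.Optional;

final class DateParseSketch {
    // Assumed example patterns; the engine's real list of formats may differ.
    private static final List<DateTimeFormatter> FORMATS = List.of(
            DateTimeFormatter.ofPattern("uuuu-MM-dd"),
            DateTimeFormatter.ofPattern("MM/dd/uuuu"));

    // Tries each format in turn and returns the first successful parse.
    static Optional<LocalDate> parse(String value) {
        for (DateTimeFormatter format : FORMATS) {
            try {
                return Optional.of(LocalDate.parse(value, format));
            } catch (DateTimeParseException e) {
                // Not this format; fall through and try the next one.
            }
        }
        return Optional.empty();
    }
}
```

If you control the input, passing a java.util.Date to the Date overload of addField avoids relying on string parsing at all.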

addField

void addField(java.lang.String name,
              IndexableString value)
Adds an indexable string field value to the current document.

Parameters:
name - the name of the field to which we want to add a value. If the named field is not currently defined, then the way that the field is treated depends on the engine configuration. By default, the field will be indexed, tokenized, and vectored. If the field name is null, then the value will be treated as part of the implicit document body and will therefore be indexed, tokenized, and vectored.
value - the value of the field, which may contain some markup. Note that this may cause additional fields to be added to the document, depending on what the markup analyzer for the provided string will do.
Throws:
java.lang.IllegalStateException - if this method is called without having called startDocument
See Also:
IndexConfig, startDocument(java.lang.String)

addField

void addField(java.lang.String name,
              java.util.Date value)
Adds a date field value to the current document.

Parameters:
name - the name of the field to which we want to add a value. If the named field is not currently defined, then the way that the field is treated depends on the engine configuration. By default, the field will be indexed, tokenized, and vectored. If the field name is null, then the value will be treated as part of the implicit document body and will therefore be indexed, tokenized, and vectored.
value - the value of the field. If the named field is not a date saved field, then a warning will be issued. Note that this warning may not be seen unless you have set the logging level high enough for your indexer. If the named field is a date saved field, then this value will be stored so that it may be retrieved later.
Throws:
java.lang.IllegalStateException - if this method is called without having called startDocument
See Also:
IndexConfig, startDocument(java.lang.String)

addField

void addField(java.lang.String name,
              java.lang.Long value)
Adds a long field value to the current document.

Parameters:
name - the name of the field to which we want to add a value. If the named field is not currently defined, then the way that the field is treated depends on the engine configuration. By default, the field will be indexed, tokenized, and vectored. If the field name is null, then the value will be treated as part of the implicit document body and will therefore be indexed, tokenized, and vectored.
value - the value of the field. If the named field is an integer saved field then the value will be stored for later retrieval. If the named field is a date saved field, then this value will be treated as a number of milliseconds since the epoch. If the named field is a string saved field, then a string representation of the number (the representation provided by Long.toString()) will be saved in the index for later retrieval. Note that such a saved value will most likely not be suitable for sorting by numerical order!
Throws:
java.lang.IllegalStateException - if this method is called without having called startDocument
See Also:
IndexConfig, startDocument(java.lang.String)
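
The warning about sorting can be seen with plain Java strings. This sketch (the class name StringSortSketch is made up for the example) converts a few long values with Long.toString and sorts the resulting strings, which compares them character by character rather than by magnitude.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class StringSortSketch {
    // Converts each value with Long.toString and sorts the strings lexically.
    static List<String> sortAsStrings(List<Long> values) {
        List<String> strings = new ArrayList<>();
        for (Long v : values) {
            strings.add(Long.toString(v));
        }
        Collections.sort(strings);
        return strings;
    }

    public static void main(String[] args) {
        // Prints [10, 100, 9]: "9" sorts last because '9' > '1'.
        System.out.println(sortAsStrings(List.of(9L, 10L, 100L)));
    }
}
```

If numeric ordering matters, this suggests storing the value in an integer saved field rather than a string saved field.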

addField

void addField(java.lang.String name,
              java.lang.Integer value)
Adds an integer field value to the current document.

Parameters:
name - the name of the field to which we want to add a value. If the named field is not currently defined, then the way that the field is treated depends on the engine configuration. By default, the field will be indexed, tokenized, and vectored. If the field name is null, then the value will be treated as part of the implicit document body and will therefore be indexed, tokenized, and vectored.
value - the value of the field. If the named field is an integer saved field then the value will be stored for later retrieval. If the named field is a date saved field, then this value will be treated as the number of seconds since the epoch. If the named field is a string saved field, then a string representation of the number (the representation provided by Integer.toString()) will be saved in the index for later retrieval. Note that such a saved value will most likely not be suitable for sorting by numerical order!
Throws:
java.lang.IllegalStateException - if this method is called without having called startDocument
See Also:
IndexConfig, startDocument(java.lang.String)
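
Because the Long overload is described in milliseconds since the epoch and the Integer overload in seconds since the epoch, the same calendar instant needs different numeric values depending on the overload. The sketch below uses java.time.Instant only to show the two units side by side; the class and method names are made up for the example.

```java
import java.time.Instant;

final class EpochUnitsSketch {
    // A Long value for a date saved field is described as milliseconds since the epoch.
    static Instant fromLongValue(Long millis) {
        return Instant.ofEpochMilli(millis);
    }

    // An Integer value for a date saved field is described as seconds since the epoch.
    static Instant fromIntegerValue(Integer seconds) {
        return Instant.ofEpochSecond(seconds);
    }

    public static void main(String[] args) {
        // Both lines print 1970-01-02T00:00:00Z, one day after the epoch.
        System.out.println(fromLongValue(86_400_000L));
        System.out.println(fromIntegerValue(86_400));
    }
}
```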

addField

void addField(java.lang.String name,
              java.lang.Double value)
Adds a double field value to the current document.

Parameters:
name - the name of the field to which we want to add a value. If the named field is not currently defined, then the way that the field is treated depends on the engine configuration. By default, the field will be indexed, tokenized, and vectored. If the field name is null, then the value will be treated as part of the implicit document body and will therefore be indexed, tokenized, and vectored.
value - the value of the field. If the named field is a float saved field then the value will be stored for later retrieval. If the named field is a string saved field, then a string representation of the number (the representation provided by Double.toString()) will be saved in the index for later retrieval. Note that such a saved value will most likely not be suitable for sorting by numerical order!
Throws:
java.lang.IllegalStateException - if this method is called without having called startDocument
See Also:
IndexConfig, startDocument(java.lang.String)

addField

void addField(java.lang.String name,
              java.lang.Float value)
Adds a float field value to the current document.

Parameters:
name - the name of the field to which we want to add a value. If the named field is not currently defined, then the way that the field is treated depends on the engine configuration. By default, the field will be indexed, tokenized, and vectored. If the field name is null, then the value will be treated as part of the implicit document body and will therefore be indexed, tokenized, and vectored.
value - the value of the field. If the named field is a float saved field then the value will be stored for later retrieval. If the named field is a string saved field, then a string representation of the number (the representation provided by Double.toString()) will be saved in the index for later retrieval. Note that such a saved value will most likely not be suitable for sorting by numerical order!
Throws:
java.lang.IllegalStateException - if this method is called without having called startDocument
See Also:
IndexConfig, startDocument(java.lang.String)

addField

void addField(java.lang.String name,
              java.lang.Object[] values)
Adds a number of values to a particular field in the current document.

Parameters:
name - the name of the field to which we want to add values. If the named field is not currently defined, then the way that the field is treated depends on the engine configuration. By default, the field will be indexed, tokenized, and vectored. If the field name is null, then the value will be treated as part of the implicit document body and will therefore be indexed, tokenized, and vectored.
values - the values to add to the field. The disposition of the elements of the array is in accordance with the single-value instances of the addField method. If the array contains a type that is not specified in one of those methods, then we will call the toString method on the object and index the results.
Throws:
java.lang.IllegalStateException - if this method is called without having called startDocument
See Also:
IndexConfig, startDocument(java.lang.String)

addField

void addField(java.lang.String name,
              java.util.Collection<java.lang.Object> values)
Adds a number of values to a particular field in the current document.

Parameters:
name - the name of the field to which we want to add values. If the named field is not currently defined, then the way that the field is treated depends on the engine configuration. By default, the field will be indexed, tokenized, and vectored. If the field name is null, then the value will be treated as part of the implicit document body and will therefore be indexed, tokenized, and vectored.
values - the values to add to the field. The disposition of the elements of the collection is in accordance with the single-value instances of the addField method. If the collection contains a type that is not specified in one of those methods, then we will call the toString method on the object and index the results.
Throws:
java.lang.IllegalStateException - if this method is called without having called startDocument
See Also:
IndexConfig, startDocument(java.lang.String)
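
The per-element rule for the array and collection overloads (use the matching single-value overload, otherwise fall back to toString) can be pictured as a chain of type checks. This is only an illustration of the documented rule, not the engine's code; the class MultiValueDispatchSketch is hypothetical, and IndexableString is left out so that the sketch compiles without Minion on the classpath.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

final class MultiValueDispatchSketch {
    // Returns a label naming the single-value overload an element would map to.
    static String route(Object value) {
        if (value instanceof String)  return "String";
        if (value instanceof Date)    return "Date";
        if (value instanceof Long)    return "Long";
        if (value instanceof Integer) return "Integer";
        if (value instanceof Double)  return "Double";
        if (value instanceof Float)   return "Float";
        // Any other type: fall back to indexing the toString() result.
        return "toString:" + value;
    }

    // Applies the routing rule to every element of the array, in order.
    static List<String> routeAll(Object[] values) {
        List<String> routes = new ArrayList<>();
        for (Object value : values) {
            routes.add(route(value));
        }
        return routes;
    }
}
```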

addTerm

void addTerm(java.lang.String term)
Adds a term to the current document. The term is added to the implicit body field of the document.

Parameters:
term - the term
Throws:
java.lang.IllegalStateException - if this method is called without having called startDocument
See Also:
addTerm(String,int), startDocument(java.lang.String)

addTerm

void addTerm(java.lang.String term,
             int count)
Adds a term to the current document. The term is added to the implicit body field of the document.

Parameters:
term - the term
count - the number of times that the term occurs in the body of the document
Throws:
java.lang.IllegalStateException - if this method is called without having called startDocument
See Also:
addTerm(String), startDocument(java.lang.String)

addTerm

void addTerm(java.lang.String field,
             java.lang.String term,
             int count)
Adds a term to the given field in the current document. The field must have the indexed attribute. It may also have the vectored attribute, if you would like to use the field for document similarity calculations.

Note that if the named field is a saved field, this method will not cause the data to be saved into the index.

If the given field names an undefined field, a field will be defined using the default attributes and type.

Parameters:
field - the field to which we want to add the term.
term - the term
count - the number of times that the term occurs in the document
Throws:
java.lang.IllegalStateException - if this method is called without having called startDocument
java.lang.IllegalArgumentException - if an attempt is made to add terms to a field that does not have the indexed attribute
See Also:
startDocument(java.lang.String)
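
One way to use this overload is to count terms yourself and then make one call per distinct term. The sketch below assumes the tokens have already been produced by your own tokenizer; TermSink is a hypothetical callback with the same shape as addTerm(field, term, count), used so that the example does not depend on Minion.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class TermCountSketch {
    // Hypothetical callback with the same shape as addTerm(field, term, count).
    interface TermSink {
        void addTerm(String field, String term, int count);
    }

    // Counts occurrences of each token, preserving first-seen order.
    static Map<String, Integer> countTerms(String[] tokens) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    // Emits one addTerm-style call per distinct term with its total count.
    static void addCountedTerms(TermSink sink, String field, String[] tokens) {
        for (Map.Entry<String, Integer> e : countTerms(tokens).entrySet()) {
            sink.addTerm(field, e.getKey(), e.getValue());
        }
    }
}
```

With a real indexer, a method reference such as indexer::addTerm could be passed as the sink between startDocument and endDocument, assuming the named field has the indexed attribute.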

endDocument

void endDocument()
Ends the current document. This may cause some index data to be written to disk.

Throws:
java.lang.IllegalStateException - if this method is called without having called startDocument
See Also:
startDocument(java.lang.String)

finish

void finish()
Finishes indexing. You must call this method when you have finished any indexing that you wish to do. If you do not, some of your data may not be saved to the index. Once this method has been called, any further attempt to index data using the simple indexer will result in an IllegalStateException.


isIndexed

boolean isIndexed(java.lang.String key)
Indicates whether a document has been indexed or not. A document has been indexed if its document key is in the index and the document has not been deleted.

Parameters:
key - the document key for the document that we wish to check.
Returns:
true if the document key is in the index and the document has not been deleted, false otherwise.
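
Since starting a document with an existing key removes the old data for that key, isIndexed can be used to skip documents that are already present. The sketch below shows that filtering step on its own; the isIndexed check is passed in as a java.util.function.Predicate so that the example runs without an engine, and the class name SkipIndexedSketch is made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class SkipIndexedSketch {
    // Returns the keys that still need indexing, given an isIndexed-style check.
    static List<String> keysToIndex(List<String> candidates, Predicate<String> isIndexed) {
        List<String> pending = new ArrayList<>();
        for (String key : candidates) {
            if (!isIndexed.test(key)) {
                pending.add(key);
            }
        }
        return pending;
    }
}
```

With a real indexer, the predicate could be the method reference indexer::isIndexed.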