com.sun.labs.minion
Interface PipelineStage

All Known Subinterfaces:
Stage
All Known Implementing Classes:
BlurbStage, DropNumbersStage, Dropper, HighlightStage, InvFileMemoryPartition, JCCTokenizer, LowerCaseStage, PrintStage, PrintTokenStage, QuestioningStage, ReplacementStage, StageAdapter, StatStage, StemStage, StopWordsStage, TokenCollectorStage, Tokenizer, UniversalTokenizer

public interface PipelineStage

The interface to a stage in the indexing pipeline. When a CustomAnalyzer is used for indexing, these methods control the construction of a document. Generally, your analyzer is called once for each field, and the pipeline is ready to receive that field's text through the text(char[], int, int) method. If your analyzer encounters a value belonging to another field while processing the current one, it may call startField(com.sun.labs.minion.FieldInfo) to push a new current field onto the stack. Any text passed to the stage is then considered part of the new field. Calling endField(com.sun.labs.minion.FieldInfo) pops the new field off the stack, and any further text is considered part of the original field for which the analyzer was invoked.
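The push and pop behavior described above can be sketched with a small recording stage. This is only an illustration under stated assumptions: DemoFieldInfo and DemoPipelineStage are hypothetical stand-ins that mirror the four method names so the example compiles without Minion, the field names "body" and "author" are invented, and the end index e is assumed to be exclusive.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for com.sun.labs.minion.FieldInfo; only a name is modeled.
final class DemoFieldInfo {
    private final String name;
    DemoFieldInfo(String name) { this.name = name; }
    String getName() { return name; }
}

// Local mirror of the four PipelineStage methods (an assumption, not the Minion type).
interface DemoPipelineStage {
    void startField(DemoFieldInfo field);
    void text(char[] t, int b, int e);
    void savedData(Object sd);
    void endField(DemoFieldInfo field);
}

// Toy stage that records which field each chunk of text is attributed to.
final class RecordingStage implements DemoPipelineStage {
    private final Deque<DemoFieldInfo> stack = new ArrayDeque<>();
    final List<String> log = new ArrayList<>();

    @Override public void startField(DemoFieldInfo field) { stack.push(field); }

    @Override public void text(char[] t, int b, int e) {
        // Assumption: e is exclusive, so the slice length is e - b.
        log.add(stack.peek().getName() + ":" + new String(t, b, e - b));
    }

    @Override public void savedData(Object sd) {
        log.add(stack.peek().getName() + "=" + sd);
    }

    @Override public void endField(DemoFieldInfo field) { stack.pop(); }
}

final class NestedFieldDemo {
    static List<String> run() {
        RecordingStage stage = new RecordingStage();
        DemoFieldInfo body = new DemoFieldInfo("body");
        DemoFieldInfo author = new DemoFieldInfo("author");
        char[] buf = "Hello Jane world".toCharArray();

        stage.startField(body);    // in Minion the pipeline does this before invoking the analyzer
        stage.text(buf, 0, 5);     // "Hello" belongs to body
        stage.startField(author);  // the analyzer found an embedded author value
        stage.text(buf, 6, 10);    // "Jane" belongs to author
        stage.endField(author);    // pop back to body
        stage.text(buf, 11, 16);   // "world" belongs to body again
        stage.endField(body);
        return stage.log;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [body:Hello, author:Jane, body:world]
    }
}
```

The stack makes the attribution explicit: text always goes to the field on top, and ending the inner field restores the outer one.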


Method Summary
 void endField(FieldInfo field)
          Instructs the pipeline to stop collecting data for a field.
 void savedData(java.lang.Object sd)
          Saves some data verbatim in the field store.
 void startField(FieldInfo field)
          Instructs the pipeline to begin collecting data for a different field.
 void text(char[] t, int b, int e)
          Sends some text to be indexed as part of the field.
 

Method Detail

startField

void startField(FieldInfo field)
Instructs the pipeline to begin collecting data for a different field.

Parameters:
field - the object describing the field to start. The field must already be defined in the index configuration.

text

void text(char[] t,
          int b,
          int e)
Sends some text to be indexed as part of the field. The text will also be tokenized if the field's tokenized property is true in the index configuration. Keep in mind that text queries are tokenized before execution, so even if the text has been partially tokenized by a CustomAnalyzer, it may still be worthwhile to allow it to be tokenized by this method. This method accumulates the text passed in each call until endField is called (either by a CustomAnalyzer that called startField or by the Minion pipeline that called startField before invoking the CustomAnalyzer). If the field is described as a saved field in the Minion configuration, the accumulated text will also be saved in the field store.

Parameters:
t - The text to store and/or tokenize.
b - The beginning position in the text buffer.
e - The ending position in the text buffer.
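The accumulation behavior of text can be sketched as follows. This is a toy model, not Minion code: AccumulatingStage, its fieldStore map, and the boolean saved flag are hypothetical stand-ins for the real stage, field store, and field configuration, and the sketch assumes b is inclusive and e is exclusive, which this page does not state.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical toy stage: models only text accumulation for one field at a time.
final class AccumulatingStage {
    private final StringBuilder current = new StringBuilder();
    // Stand-in for the field store, keyed by field name.
    final Map<String, String> fieldStore = new LinkedHashMap<>();

    // Assumption: b is inclusive and e is exclusive, so the slice length is e - b.
    void text(char[] t, int b, int e) {
        current.append(t, b, e - b);
    }

    // 'saved' stands in for the field's saved attribute in the configuration.
    void endField(String fieldName, boolean saved) {
        if (saved) {
            fieldStore.put(fieldName, current.toString());
        }
        current.setLength(0); // the next field starts with an empty buffer
    }
}

final class AccumulationDemo {
    static Map<String, String> run() {
        AccumulatingStage stage = new AccumulatingStage();
        char[] buf = "Minion search engine".toCharArray();

        stage.text(buf, 0, 7);             // "Minion "
        stage.text(buf, 7, 20);            // "search engine"
        stage.endField("title", true);     // saved field: accumulated text is stored

        stage.text(buf, 0, 6);             // "Minion"
        stage.endField("scratch", false);  // not saved: nothing is stored
        return stage.fieldStore;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints {title=Minion search engine}
    }
}
```

Passing a buffer with offsets lets a caller reuse one char[] across calls instead of allocating a new string per chunk.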

savedData

void savedData(java.lang.Object sd)
Saves some data verbatim in the field store. This should only be used if the text(char[], int, int) method is not used.

Parameters:
sd - the data to store
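One way to read the restriction above is that a field receives either text or verbatim saved data, but not both. The sketch below models that reading with a guard; SavedDataStage is a hypothetical class, the field names and values are invented, and Minion itself may not enforce the rule or throw this exception.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy stage: rejects savedData once text has been sent for the current field.
final class SavedDataStage {
    // Stand-in for the field store, keyed by field name.
    final Map<String, Object> fieldStore = new HashMap<>();
    private String currentField;
    private boolean sawText;

    void startField(String name) {
        currentField = name;
        sawText = false;
    }

    void text(char[] t, int b, int e) {
        sawText = true; // indexing is not modeled here, only the bookkeeping
    }

    void savedData(Object sd) {
        if (sawText) {
            // Assumption: mixing the two calls is treated as an error in this sketch.
            throw new IllegalStateException("text() already used for field " + currentField);
        }
        fieldStore.put(currentField, sd); // stored verbatim, without tokenizing
    }

    void endField(String name) {
        currentField = null;
    }
}

final class SavedDataDemo {
    static Object run() {
        SavedDataStage stage = new SavedDataStage();
        stage.startField("price");
        stage.savedData(Long.valueOf(1999L)); // a non-text value saved as-is
        stage.endField("price");
        return stage.fieldStore.get("price");
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 1999
    }
}
```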

endField

void endField(FieldInfo field)
Instructs the pipeline to stop collecting data for a field.

Parameters:
field - the object describing the field that is ending