BasicField (Minion Search Engine)


com.sun.labs.minion.indexer.dictionary
Class BasicField

java.lang.Object
  com.sun.labs.minion.indexer.dictionary.BasicField

All Implemented Interfaces:: SavedField, java.lang.Comparable

public class BasicField
extends java.lang.Object
implements SavedField

A class to hold the data for a saved field during indexing.

See Also:: FieldInfo, MemoryFieldStore

Nested Class Summary
`class`	`BasicField.Fetcher` A class that can be used when you want to get a lot of field values for a particular field, for example, when sorting or clustering results by a particular field.

Field Summary
`protected DiskBiGramDictionary`	`bigrams` A bigram dictionary that we can use for character fields.
`protected CDateParser`	`dp` A date parser for date fields.
`protected ReadableBuffer`	`dtvData` A buffer containing the actual dtv data at query time.
`protected ReadableBuffer`	`dtvOffsets` A buffer containing the dtv offsets at query time.
`protected java.util.List[]`	`dv` An array of the sets of entries stored per document at indexing time.
`protected int`	`dvPos` The current position in the dtv array, that is, where the next document ID will be added.
`protected FieldInfo`	`field` The field info object for this field.
`protected com.sun.labs.minion.indexer.dictionary.SavedFieldHeader`	`header` The header for this field.
`protected static java.lang.String`	`logTag` The log tag.
`protected int`	`nBytes` The number of bytes we're using to store data.
`protected Dictionary`	`values` A dictionary to use for the saved field data.

Constructor Summary
`protected`	`BasicField()` Default constructor for subclasses.
	`BasicField(FieldInfo field)` Constructs a saved field that will be used to store data during indexing.
	`BasicField(FieldInfo field, java.io.RandomAccessFile dictFile, java.io.RandomAccessFile[] postFiles, DictionaryFactory fieldStoreDictFactory, DictionaryFactory bigramDictFactory, DiskPartition part)` Constructs a saved field that will be used to retrieve data during querying.

Method Summary
`void`	`add(int docID, java.lang.Object data)` Adds data to a saved field.
`void`	`clear()` Clears a saved field, if it's open for indexing.
`int`	`compareTo(java.lang.Object o)` Compares saved fields according to the field ID.
`void`	`dump(java.lang.String path, java.io.RandomAccessFile dictFile, PostingsOutput[] postOut, int maxID)` Writes the data to the provided stream.
`QueryEntry`	`get(java.lang.Object v, boolean caseSensitive)` Gets a particular value from the field.
`static java.lang.Class`	`getEntryClass(FieldInfo field)` Gets an entry class appropriate to the type of the given field.
`protected java.lang.Object`	`getEntryName(java.lang.Object val)` Gets a name for a given saved value, parsing as necessary.
`BasicField.Fetcher`	`getFetcher()`
`FieldInfo`	`getField()` Get the field info object for this field.
`java.util.SortedSet<FieldValue>`	`getMatching(java.lang.String pattern)`
`protected static NameDecoder`	`getNameDecoder(FieldInfo field)` Gets a name decoder of the appropriate type for the given field.
`protected static NameEncoder`	`getNameEncoder(FieldInfo field)` Gets a name encoder of the appropriate type for the given field.
`java.lang.Object`	`getSavedData(int docID, boolean all)` Retrieve data from a saved field.
`ArrayGroup`	`getSimilar(ArrayGroup ag, java.lang.String value, boolean matchCase)`
`ArrayGroup`	`getUndefined(ArrayGroup ag)` Gets a group of all the documents that do not have any values saved for this field.
`boolean`	`hasSavedValues(int docID)` Indicates whether a given document has saved data for this field.
`DictionaryIterator`	`iterator(java.lang.Object lowerBound, boolean includeLower, java.lang.Object upperBound, boolean includeUpper)` Gets an iterator for the values in this field.
`void`	`merge(java.lang.String path, SavedField[] fields, int maxID, int[] starts, int[] nUndel, int[][] docIDMaps, java.io.RandomAccessFile dictFile, PostingsOutput postOut)` Merges a number of saved fields.
`int`	`size()` Gets the number of saved terms that we're storing.
`protected java.util.Iterator`	`valueIterator()`

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Field Detail

field

protected FieldInfo field

The field info object for this field. Used during indexing.

values

protected Dictionary values

A dictionary to use for the saved field data.

dv

protected java.util.List[] dv

An array of the sets of entries stored per document at indexing time.

dvPos

protected int dvPos

The current position in the dtv array, that is, where the next document ID will be added.

dtvOffsets

protected ReadableBuffer dtvOffsets

A buffer containing the dtv offsets at query time.

dtvData

protected ReadableBuffer dtvData

A buffer containing the actual dtv data at query time.

bigrams

protected DiskBiGramDictionary bigrams

A bigram dictionary that we can use for character fields.

nBytes

protected int nBytes

The number of bytes we're using to store data.

header

protected com.sun.labs.minion.indexer.dictionary.SavedFieldHeader header

The header for this field.

dp

protected CDateParser dp

A date parser for date fields.

logTag

protected static java.lang.String logTag

The log tag.

Constructor Detail

BasicField

protected BasicField()

Default constructor for subclasses.

BasicField

public BasicField(FieldInfo field)

Constructs a saved field that will be used to store data during indexing.

Parameters:: field - The FieldInfo for this saved field.

BasicField

public BasicField(FieldInfo field,
                  java.io.RandomAccessFile dictFile,
                  java.io.RandomAccessFile[] postFiles,
                  DictionaryFactory fieldStoreDictFactory,
                  DictionaryFactory bigramDictFactory,
                  DiskPartition part)
           throws java.io.IOException

Constructs a saved field that will be used to retrieve data during querying.

Parameters:: field - The FieldInfo for this saved field.; dictFile - The file containing the dictionary for this field.; postFiles - The files containing the postings for this field.; part - The disk partition that this field is associated with.
Throws:: java.io.IOException - if there is any error loading the field data.

Method Detail

getEntryClass

public static java.lang.Class getEntryClass(FieldInfo field)

Gets an entry class appropriate to the type of the given field.

add

public void add(int docID,
                java.lang.Object data)

Adds data to a saved field.

Specified by:: add in interface SavedField

Parameters:: docID - the document ID for the document containing the saved data; data - The actual field data.

getNameDecoder

protected static NameDecoder getNameDecoder(FieldInfo field)

Gets a name decoder of the appropriate type for the given field.

Parameters:: field - The field for which we want a name decoder.

getNameEncoder

protected static NameEncoder getNameEncoder(FieldInfo field)

Gets a name encoder of the appropriate type for the given field.

Parameters:: field - The field for which we want a name encoder.

getEntryName

protected java.lang.Object getEntryName(java.lang.Object val)

Gets a name for a given saved value, parsing as necessary.

Parameters:: val - The value that we were passed.
Returns:: A name appropriate for the given value and field type.

dump

public void dump(java.lang.String path,
                 java.io.RandomAccessFile dictFile,
                 PostingsOutput[] postOut,
                 int maxID)
          throws java.io.IOException

Writes the data to the provided stream.

Specified by:: dump in interface SavedField

Parameters:: path - The path of the index directory.; dictFile - The file where the dictionary will be written.; postOut - A place to write the postings associated with the values.; maxID - The maximum document ID for this partition.
Throws:: java.io.IOException - if there is an error during the writing.

hasSavedValues

public boolean hasSavedValues(int docID)

Indicates whether a given document has saved data for this field.

Parameters:: docID - the document ID for the document that we wish to check.
Returns:: true if this document ID has saved values, false otherwise.

getSavedData

public java.lang.Object getSavedData(int docID,
                                     boolean all)

Retrieve data from a saved field.

Specified by:: getSavedData in interface SavedField

Parameters:: docID - the document ID that we want data for.; all - If true, return all known values for the field in the given document. If false, return only one value.
Returns:: If all is true, then return a List of the values stored in the given field in the given document. If all is false, a single value of the appropriate type will be returned.
If the given name is not the name of a saved field, or the document ID is invalid, null will be returned.
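
The contract of the `all` flag can be illustrated with a small stand-in. The sketch below is not Minion code: `SavedValuesSketch` and its map-backed storage are hypothetical, and the choice of which single value comes back when `all` is false (here, the first one stored) is an assumption that the documentation above does not settle.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a saved field; not part of Minion.
final class SavedValuesSketch {
    // Values stored per document ID, in insertion order.
    private final Map<Integer, List<Object>> byDoc = new HashMap<>();

    void add(int docID, Object data) {
        byDoc.computeIfAbsent(docID, k -> new ArrayList<>()).add(data);
    }

    boolean hasSavedValues(int docID) {
        List<Object> vals = byDoc.get(docID);
        return vals != null && !vals.isEmpty();
    }

    // Mirrors the documented contract: a List when all is true,
    // a single value when all is false, null for an unknown document.
    Object getSavedData(int docID, boolean all) {
        List<Object> vals = byDoc.get(docID);
        if (vals == null || vals.isEmpty()) {
            return null;
        }
        if (all) {
            return new ArrayList<>(vals);
        }
        // Assumption: "only one value" means the first value stored.
        return vals.get(0);
    }
}
```

Callers of a method shaped like this need to check for null and then cast according to the flag they passed, which is why `hasSavedValues` is useful as a cheap guard.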

get

public QueryEntry get(java.lang.Object v,
                      boolean caseSensitive)

Gets a particular value from the field.

Specified by:: get in interface SavedField

Parameters:: v - The value to get.; caseSensitive - If true, case should be taken into account when iterating through the values. This value will only be observed for character fields!
Returns:: The term associated with that name, or null if that term doesn't occur in the indexed material.

getUndefined

public ArrayGroup getUndefined(ArrayGroup ag)

Gets a group of all the documents that do not have any values saved for this field.

Specified by:: getUndefined in interface SavedField

Parameters:: ag - a set of documents to which we should restrict the search for documents with undefined field values. If this is null then there is no such restriction.
Returns:: a set of documents that have no defined values for this field. This set may be restricted to documents occurring in the group that was passed in.
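
The set arithmetic behind this description can be sketched with `java.util.BitSet`. This is only an illustration: `UndefinedDocsSketch` is a hypothetical name, `ArrayGroup` is replaced by a plain `BitSet`, and the assumption that document IDs run from 1 to `maxID` is not confirmed by this page.

```java
import java.util.BitSet;

// Hypothetical illustration of "documents with no saved values"; not Minion code.
final class UndefinedDocsSketch {
    // hasValues:  bit set for every document ID with at least one saved value.
    // maxID:      highest document ID (IDs are assumed to start at 1).
    // restrictTo: documents to restrict the result to, or null for no restriction.
    static BitSet undefined(BitSet hasValues, int maxID, BitSet restrictTo) {
        BitSet result = new BitSet(maxID + 1);
        result.set(1, maxID + 1);      // start with every ID in 1..maxID
        result.andNot(hasValues);      // drop documents that have values
        if (restrictTo != null) {
            result.and(restrictTo);    // keep only documents in the given group
        }
        return result;
    }
}
```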

getSimilar

public ArrayGroup getSimilar(ArrayGroup ag,
                             java.lang.String value,
                             boolean matchCase)

getMatching

public java.util.SortedSet<FieldValue> getMatching(java.lang.String pattern)

iterator

public DictionaryIterator iterator(java.lang.Object lowerBound,
                                   boolean includeLower,
                                   java.lang.Object upperBound,
                                   boolean includeUpper)

Gets an iterator for the values in this field. If this is a string field, only the values that actually occurred will be returned.

Specified by:: iterator in interface SavedField

Parameters:: lowerBound - the name of the entry that will be the lower bound of the iterator, or null if there is no such bound; includeLower - whether the lower bound should be included in the results of the iterator; upperBound - the name of the entry that will be the upper bound of the iterator, or null if there is no such bound; includeUpper - whether the upper bound should be included in the results of the iterator
Returns:: an iterator for the entries in the dictionary
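
The four bound parameters behave like the range views of a sorted set, where a null bound means the range is open on that side. The sketch below shows those semantics on a `java.util.NavigableSet` of strings in natural order; `BoundedRangeSketch` is a hypothetical name, and nothing here is meant to describe how `DictionaryIterator` is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

// Hypothetical illustration of the bound semantics; not Minion code.
final class BoundedRangeSketch {
    // Returns the values between the bounds. A null bound means "no bound".
    // Assumes the set uses the natural ordering of String.
    static List<String> range(NavigableSet<String> values,
                              String lowerBound, boolean includeLower,
                              String upperBound, boolean includeUpper) {
        NavigableSet<String> view;
        if (lowerBound != null && upperBound != null) {
            if (lowerBound.compareTo(upperBound) > 0) {
                return new ArrayList<>();   // inverted bounds: nothing matches
            }
            view = values.subSet(lowerBound, includeLower, upperBound, includeUpper);
        } else if (lowerBound != null) {
            view = values.tailSet(lowerBound, includeLower);
        } else if (upperBound != null) {
            view = values.headSet(upperBound, includeUpper);
        } else {
            view = values;
        }
        return new ArrayList<>(view);
    }
}
```

Checking for inverted bounds up front avoids the `IllegalArgumentException` that `subSet` throws when the lower bound is greater than the upper bound.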

merge

public void merge(java.lang.String path,
                  SavedField[] fields,
                  int maxID,
                  int[] starts,
                  int[] nUndel,
                  int[][] docIDMaps,
                  java.io.RandomAccessFile dictFile,
                  PostingsOutput postOut)
           throws java.io.IOException

Merges a number of saved fields.

Specified by:: merge in interface SavedField

Parameters:: path - The path to the index directory.; fields - An array of fields to merge.; maxID - The max doc ID in the new partition; starts - The new starting document IDs for the partitions.; docIDMaps - A map for each partition from old document IDs to new document IDs. IDs that map to a value less than 0 have been deleted. A null array means that the old IDs are the new IDs.; dictFile - The file to which the merged dictionaries will be written.; postOut - The output to which the merged postings will be written.; nUndel - The number of undeleted documents in each partition
Throws:: java.io.IOException - if there is an error during the merge.
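
The `docIDMaps` convention described above (a negative entry marks a deleted document, a null array means IDs are unchanged) can be sketched as follows. `DocIdRemapSketch` and its methods are hypothetical, and the sketch assumes each map is indexed directly by the old document ID; how `starts` and `nUndel` enter the real merge is not shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating the docIDMaps convention; not Minion code.
final class DocIdRemapSketch {
    static final int DELETED = -1;

    // Returns the new ID for oldID, or DELETED if the document was removed.
    // Assumption: docIDMap is indexed directly by the old document ID.
    static int remap(int[] docIDMap, int oldID) {
        if (docIDMap == null) {
            return oldID;               // null map: old IDs are the new IDs
        }
        int newID = docIDMap[oldID];
        return newID < 0 ? DELETED : newID;
    }

    // Maps a partition's old IDs to new IDs, skipping deleted documents.
    static List<Integer> surviving(int[] docIDMap, int[] oldIDs) {
        List<Integer> out = new ArrayList<>();
        for (int oldID : oldIDs) {
            int newID = remap(docIDMap, oldID);
            if (newID != DELETED) {
                out.add(newID);
            }
        }
        return out;
    }
}
```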

valueIterator

protected java.util.Iterator valueIterator()

size

public int size()

Gets the number of saved terms that we're storing.

Specified by:: size in interface SavedField

clear

public void clear()

Clears a saved field, if it's open for indexing.

Specified by:: clear in interface SavedField

compareTo

public int compareTo(java.lang.Object o)

Compares saved fields according to the field ID.

Specified by:: compareTo in interface java.lang.Comparable
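
Ordering by field ID means that a collection of saved fields sorts into ID order. A minimal sketch, using a hypothetical `FieldByIdSketch` class with a generic `Comparable` rather than the raw `compareTo(Object)` signature listed above:

```java
// Hypothetical class ordered by a numeric field ID; not Minion code.
final class FieldByIdSketch implements Comparable<FieldByIdSketch> {
    final int id;
    final String name;

    FieldByIdSketch(int id, String name) {
        this.id = id;
        this.name = name;
    }

    @Override
    public int compareTo(FieldByIdSketch other) {
        // Integer.compare avoids the overflow risk of subtracting IDs.
        return Integer.compare(id, other.id);
    }
}
```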

getField

public FieldInfo getField()

Get the field info object for this field. Used during indexing.

Specified by:: getField in interface SavedField

Returns:: the FieldInfo

getFetcher

public BasicField.Fetcher getFetcher()
