public interface SearchEngine
This is the main interface for the search engine, which handles both indexing
and retrieval operations. An implementation of this interface can be created
using the SearchEngineFactory.
Each document is expected to have a unique key associated with it. The key
can be any non-null Java String. It is up to the application to create these
unique keys for the documents that are to be indexed. If you re-use a key (for
example, if you're re-indexing a document whose contents have changed),
then the search engine will index the new document and mark the old one
as deleted.
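The key semantics described above can be pictured with a small, self-contained toy model. `ToyIndex` and all of its methods are hypothetical names made up for this illustration; they are not part of the Minion API and do not describe how the engine actually stores documents.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical toy model of key re-use; not the Minion implementation. */
public class ToyIndex {
    /** One indexed version of a document. */
    static final class Entry {
        final String key;
        final String contents;
        boolean deleted;

        Entry(String key, String contents) {
            this.key = key;
            this.contents = contents;
        }
    }

    /** Every version ever indexed, including those marked deleted. */
    private final List<Entry> entries = new ArrayList<>();
    /** The current (undeleted) version for each key. */
    private final Map<String, Entry> live = new HashMap<>();

    /** Indexes a document; an existing entry with the same key is marked deleted. */
    public void index(String key, String contents) {
        if (key == null) {
            throw new IllegalArgumentException("key must be non-null");
        }
        Entry fresh = new Entry(key, contents);
        Entry old = live.put(key, fresh);
        if (old != null) {
            old.deleted = true; // the old version is kept but flagged
        }
        entries.add(fresh);
    }

    /** Number of live (undeleted) documents. */
    public int liveCount() {
        return live.size();
    }

    /** Number of entries that have been marked deleted. */
    public int deletedCount() {
        int n = 0;
        for (Entry e : entries) {
            if (e.deleted) {
                n++;
            }
        }
        return n;
    }

    /** Contents of the live document for a key, or null if there is none. */
    public String contents(String key) {
        Entry e = live.get(key);
        return e == null ? null : e.contents;
    }

    public static void main(String[] args) {
        ToyIndex idx = new ToyIndex();
        idx.index("doc-1", "first version");
        idx.index("doc-1", "second version"); // re-use the key
        // prints "1 live, 1 deleted"
        System.out.println(idx.liveCount() + " live, " + idx.deletedCount() + " deleted");
    }
}
```

In this model, re-indexing under an existing key leaves exactly one live document for that key, which is the behaviour the paragraph above describes.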
Indexable
See the index(Indexable) method. Indexable is an interface that can
be implemented by objects that you want to be indexed by the search
engine. Objects that implement Indexable are indexed using the
second approach for indexing.
Map
See defineField(com.sun.labs.minion.FieldInfo) for how you can define fields
programmatically.
How the values in the map are handled depends on the attributes and
types of the fields being indexed. The values in the maps can be a
variety of types: the engine will recognize String, java.util.Date,
Integer, Long, Float, Double, as well as java.util.Collections and
arrays of these types.
The engine will do some type conversion as necessary. For example,
if the application defines a field of type INTEGER and a string is passed
as the value for that field, the string will be parsed as an integer.
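The kind of conversion described above might look roughly like the following sketch. The `ToyType` enum and the `coerce` method are assumptions introduced for this example only; the engine's real field types and conversion rules live in the library and may differ (for instance in how malformed values are handled).

```java
/** Hypothetical sketch of value coercion by field type; not Minion code. */
public class FieldCoercion {
    /** Stand-in for a field type; the names are chosen for this example only. */
    public enum ToyType { INTEGER, FLOAT, STRING }

    /** Converts a raw map value to the Java type implied by the field type. */
    public static Object coerce(ToyType type, Object value) {
        switch (type) {
            case INTEGER:
                if (value instanceof Number) {
                    return ((Number) value).longValue();
                }
                // A string passed for an INTEGER field is parsed as an integer.
                return Long.parseLong(value.toString().trim());
            case FLOAT:
                if (value instanceof Number) {
                    return ((Number) value).doubleValue();
                }
                return Double.parseDouble(value.toString().trim());
            default:
                // Anything else is treated as its string form.
                return value.toString();
        }
    }

    public static void main(String[] args) {
        System.out.println(coerce(ToyType.INTEGER, "42")); // prints 42
        System.out.println(coerce(ToyType.INTEGER, 7));    // prints 7
        System.out.println(coerce(ToyType.STRING, 3.5));   // prints 3.5
    }
}
```

In this sketch a string that cannot be parsed raises a NumberFormatException; how the engine itself reports such a value is not stated here.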
SimpleIndexer
The application can request an instance of a SimpleIndexer using the
getSimpleIndexer() method.
The application can use the simple indexer to add fields to a document one
at a time, rather than having to supply them all at the same time.
Field Summary |
---|
Fields inherited from interface com.sun.labs.minion.Searcher |
---|
GRAMMAR_LUCENE, GRAMMAR_STRICT, GRAMMAR_WEB, GRAMMARS, OP_AND, OP_OR, OP_PAND |
Method Summary | | |
---|---|---|
void | addIndexListener(IndexListener il) | Adds a listener for events in the index backing this search engine. |
void | close() | Closes the engine. |
Document | createDocument(java.lang.String key) | Creates a new document with a given key. |
FieldInfo | defineField(FieldInfo field) | Defines a given field. |
void | delete(java.util.List<java.lang.String> keys) | Deletes a number of documents from the index. |
void | delete(java.lang.String key) | Deletes a document from the index. |
void | export(java.io.PrintWriter o) | Outputs an XML representation of the search index including all saved and vectored fields. |
void | flush() | Flushes the indexed material currently held in memory to the disk, making it available for searching. |
void | flushClassifiers() | Flushes all the classifier data currently in memory to disk. |
java.util.List<java.lang.Object> | getAllFieldValues(java.lang.String field, java.lang.String key) | Gets all of the field values associated with a given field in a given document. |
double | getDistance(java.lang.String k1, java.lang.String k2, java.lang.String name) | Gets the distance between two documents, based on the values stored in a given feature vector saved field. |
Document | getDocument(java.lang.String key) | Gets a document with a given key. |
java.util.Iterator<Document> | getDocumentIterator() | Gets an iterator for all of the non-deleted documents in the collection. |
java.util.List<Document> | getDocuments(java.util.List<java.lang.String> keys) | Gets a list of documents with the given keys. |
DocumentVector | getDocumentVector(Document doc, java.lang.String field) | Creates a document vector for the given document as though it occurred in the index. |
DocumentVector | getDocumentVector(Document doc, WeightedField[] fields) | Creates a composite document vector for the given document as though it occurred in the index. |
DocumentVector | getDocumentVector(java.lang.String key) | Gets a document vector for the given key. |
DocumentVector | getDocumentVector(java.lang.String key, java.lang.String field) | Gets a document vector for the given vectored field for the given key. |
DocumentVector | getDocumentVector(java.lang.String key, WeightedField[] fields) | Gets a composite document vector for the given linear combination of vectored fields for the given key. |
FieldInfo | getFieldInfo(java.lang.String name) | Gets the information for a field. |
java.util.Iterator<java.lang.Object> | getFieldIterator(java.lang.String field) | Gets an iterator for all the values in a field. |
java.lang.Object | getFieldValue(java.lang.String field, java.lang.String key) | Gets a single field value associated with a given field in a given document. |
HLPipeline | getHLPipeline() | Gets a pipeline that can be used for highlighting. |
IndexConfig | getIndexConfig() | Gets the index configuration in use by this search engine. |
PartitionManager | getManager() | Gets the partition manager associated with this search engine. |
java.util.SortedSet<FieldValue> | getMatching(java.lang.String field, java.lang.String pattern) | Gets the values for the given field that match the given pattern. |
MetaDataStore | getMetaDataStore() | Gets the MetaDataStore for this index. |
java.lang.String | getName() | Gets the name of this engine, if one has been assigned by the application. |
int | getNDocs() | Gets the number of undeleted documents that the index contains. |
PartitionManager | getPM() | Gets the partition manager for this search engine. |
QueryConfig | getQueryConfig() | Gets the query configuration that the engine is currently using. |
QueryStats | getQueryStats() | Gets the combined query stats for any queries run by the engine. |
ResultSet | getResults(java.util.Collection<java.lang.String> keys) | Gets a set of results corresponding to the document keys passed in. |
ResultSet | getSimilar(java.lang.String key, java.lang.String name) | Gets a set of results ordered by similarity to the given document, calculated by computing the Euclidean distance based on the feature vector stored in the given field. |
SimpleIndexer | getSimpleIndexer() | Gets a simple indexer that can be used for simple indexing. |
TermStats | getTermStats(java.lang.String term) | Gets the collection level term statistics for the given term. |
java.util.Set<java.lang.String> | getTermVariations(java.lang.String term) | Gets the set of variations on a term that will be generated by default when searching for the term. |
java.util.List<FieldFrequency> | getTopFieldValues(java.lang.String field, int n, boolean ignoreCase) | Gets a list of the top n most frequent field values for a given named field. |
void | index(Document document) | Indexes a document into the database. |
void | index(Indexable document) | Indexes a document into the database. |
void | index(java.lang.String key, java.util.Map document) | Indexes a document into the database. |
boolean | isIndexed(java.lang.String key) | Checks to see if a document is in the index. |
boolean | merge() | Performs a merge in the index, if one is necessary. |
void | optimize() | Merges all of the partitions in the index into a single partition. |
void | purge() | Purges all of the data in the index. |
void | recover() | Attempts to recover the index after an unruly shutdown. |
void | removeIndexListener(IndexListener il) | Removes an index listener from the listeners. |
void | resetQueryStats() | Resets the query stats for the engine. |
ResultSet | search(Element el) | Runs a query against the index, returning a set of results. |
ResultSet | search(Element el, java.lang.String sortOrder) | Runs a query against the index, returning a set of results. |
ResultSet | search(java.lang.String query) | Runs a query against the index, returning a set of results. |
ResultSet | search(java.lang.String query, java.lang.String sortOrder) | Runs a query against the index, returning a set of results. |
ResultSet | search(java.lang.String query, java.lang.String sortOrder, int defaultOperator, int grammar) | Runs a query against the index, returning a set of results. |
void | setDefaultFieldInfo(FieldInfo field) | Sets the default field information to use when unknown fields are encountered during indexing. |
void | setLongIndexingRun(boolean longIndexingRun) | Sets the indicator that this is a long indexing run, in which case term statistics dictionaries and document vector lengths will not be calculated until the engine is shut down. |
void | setQueryConfig(QueryConfig queryConfig) | Sets the query configuration to use for subsequent queries. |
Methods inherited from interface com.sun.labs.minion.Classifier |
---|
classify, getClasses, getTrainingDocuments, reclassifyIndex, trainClass, trainClass, trainClass, trainClass |
Method Detail |
---|
void setDefaultFieldInfo(FieldInfo field)
Parameters:
field - an exemplar field information object that has the
attributes and type that should be used when an unknown field is
encountered during indexing. Note that any name associated with
this particular object will be ignored; we are only interested in
the attributes and type associated with this field.
See Also:
for how to define a field to use during indexing
void setLongIndexingRun(boolean longIndexingRun)
Parameters:
longIndexingRun - true if this is a long indexing run
void addIndexListener(IndexListener il)
Parameters:
il - the listener to add.
void removeIndexListener(IndexListener il)
Parameters:
il - the index listener to remove.
FieldInfo defineField(FieldInfo field) throws SearchEngineException
Parameters:
field - the field to define
Throws:
SearchEngineException - if the field is already defined and
there is a mismatch in the attributes or type of the given field or
if there is an error adding the field to the index
FieldInfo getFieldInfo(java.lang.String name)
Parameters:
name - the name of the field for which we want information
Returns:
null if this name is not the name of a defined field.
java.util.Set<java.lang.String> getTermVariations(java.lang.String term)
Parameters:
term - the term for which we want variants
TermStats getTermStats(java.lang.String term)
Parameters:
term - the term for which we want the statistics
Returns:
null if the term does not occur in the collection.
Document getDocument(java.lang.String key)
Parameters:
key - the key for the document to retrieve.
Returns:
null is returned.
See Also:
index(Document), createDocument(java.lang.String), SimpleIndexer.indexDocument(Document)
java.util.List<Document> getDocuments(java.util.List<java.lang.String> keys)
Parameters:
keys - the list of keys for which we want documents
Returns:
keys and the documents in the returned list.
Document createDocument(java.lang.String key)
Parameters:
key - the key for the new document
Returns:
null is returned.
See Also:
index(Document), getDocument(java.lang.String), SimpleIndexer.indexDocument(Document)
void index(java.lang.String key, java.util.Map document) throws SearchEngineException
Note that simply calling index will not make a document
available for searching. Documents are not available until they are
flushed to disk. This can be accomplished using the flush method.
Parameters:
key - The document key for this document. The key should be
unique in the index. If the key passed in matches a document that
is already in the index, the information for this document will
replace the existing one.
document - A map from field names to the value for that field.
If a particular field has a type or attributes associated with it,
they will be respected during indexing. If a field has no
attributes associated with it, the field will be tokenized and
indexed. If you desire consistent treatment of documents for both
indexing and highlighting, then we strongly suggest that you use a
LinkedHashMap for this parameter.
Throws:
SearchEngineException - if there are any errors during the indexing.
See Also:
LinkedHashMap
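The recommendation to use a LinkedHashMap comes down to iteration order: a LinkedHashMap hands back its entries in the order they were inserted, so any two passes over the same document see the fields in the same sequence. The sketch below shows only that standard-library property. The class name `OrderedDocument` and the field names are illustrative assumptions, and no Minion calls are made.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrates the insertion-order guarantee of LinkedHashMap. */
public class OrderedDocument {
    /** Builds a field map whose iteration order is the insertion order. */
    public static Map<String, Object> build() {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("title", "Release notes");   // hypothetical field names
        doc.put("author", "A. Writer");
        doc.put("body", "Text of the document");
        return doc;
    }

    /** Returns the field names in the order a consumer would iterate over them. */
    public static List<String> fieldOrder(Map<String, Object> doc) {
        return new ArrayList<>(doc.keySet());
    }

    public static void main(String[] args) {
        System.out.println(fieldOrder(build())); // prints [title, author, body]
    }
}
```

A map built this way would be the second argument to index(key, document); as noted above, the document becomes searchable only after a flush.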
void index(Indexable document) throws SearchEngineException
Note that simply calling index will not make a document
available for searching. Documents are not available until they are
flushed to disk. This can be accomplished using the flush method.
Parameters:
document - the document to index.
Throws:
SearchEngineException - if there are any errors during the indexing.
void index(Document document) throws SearchEngineException
In this case, the data for the document will be flushed to disk as
soon as the document is indexed. For indexing a large number of
documents, you may wish to consider the
SimpleIndexer.indexDocument(Document) method, which will allow you
more control over when the data will be flushed to disk.
Parameters:
document - a document to be indexed
Throws:
SearchEngineException - if there are any errors during the indexing.
See Also:
getDocument(java.lang.String), SimpleIndexer.indexDocument(Document)
void flush() throws SearchEngineException
Throws:
SearchEngineException - if there is any error flushing the data to disk
boolean isIndexed(java.lang.String key)
Parameters:
key - the key for the document that we wish to check.
Returns:
true if the document is in the index. A
document is considered to be in the index if a document with the
given key appears in the index and has not been deleted.
void delete(java.lang.String key)
Parameters:
key - The key for the document to delete.
void delete(java.util.List<java.lang.String> keys) throws SearchEngineException
Parameters:
keys - The keys of the documents to delete
Throws:
SearchEngineException - if there is any error
deleting the documents from the index
ResultSet search(java.lang.String query) throws SearchEngineException
Specified by:
search in interface Searcher
Parameters:
query - The query to run, in our query syntax.
Throws:
SearchEngineException
ResultSet search(java.lang.String query, java.lang.String sortOrder) throws SearchEngineException
Specified by:
search in interface Searcher
Parameters:
query - The query to run, in our query syntax.
sortOrder - How the results should be sorted. This is a set of
comma-separated field names, each preceded by a + (for
increasing order) or by a - (for decreasing order).
Throws:
SearchEngineException - if there are any errors evaluating the query
ResultSet search(java.lang.String query, java.lang.String sortOrder, int defaultOperator, int grammar) throws SearchEngineException
Specified by:
search in interface Searcher
Parameters:
query - The query to run, in our query syntax.
sortOrder - How the results should be sorted. This is a set of
comma-separated field names, each preceded by a + (for
increasing order) or by a - (for decreasing order).
defaultOperator - specifies the default operator to use when no
other operator is provided between terms in the query. Valid values are
defined in the Searcher interface
grammar - specifies the grammar to use to parse the query. Valid values
are defined in the Searcher interface
Throws:
SearchEngineException - if there is any error during the search.
ResultSet search(Element el) throws SearchEngineException
Parameters:
el - the query, expressed using the programmatic query API
Throws:
SearchEngineException - if there are any errors evaluating the query
ResultSet search(Element el, java.lang.String sortOrder) throws SearchEngineException
Parameters:
el - the query, expressed using the programmatic query API
sortOrder - How the results should be sorted. This is a set of
comma-separated field names, each preceded by a + (for
increasing order) or by a - (for decreasing order).
Throws:
SearchEngineException - if there are any errors evaluating the query
ResultSet getResults(java.util.Collection<java.lang.String> keys)
Parameters:
keys - a list of document keys for which we want results.
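The sortOrder parameter of the search methods above is described as a comma-separated list of field names, each with a leading + or -. As a rough illustration of that format only, the sketch below splits such a string into (field, direction) pairs. `SortSpec` and `Key` are hypothetical names invented for this example; the engine's own parser is not shown in this documentation and may accept or reject inputs differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parser for sort specifications such as "+title,-date". */
public class SortSpec {
    /** One sort key: a field name and a direction. */
    public static final class Key {
        public final String field;
        public final boolean increasing;

        Key(String field, boolean increasing) {
            this.field = field;
            this.increasing = increasing;
        }

        @Override
        public String toString() {
            return (increasing ? "+" : "-") + field;
        }
    }

    /** Splits a comma-separated specification into sort keys. */
    public static List<Key> parse(String spec) {
        List<Key> keys = new ArrayList<>();
        for (String part : spec.split(",")) {
            String p = part.trim();
            // This sketch requires an explicit + or - in front of every field name.
            if (p.length() < 2 || (p.charAt(0) != '+' && p.charAt(0) != '-')) {
                throw new IllegalArgumentException("Bad sort key: " + part);
            }
            keys.add(new Key(p.substring(1), p.charAt(0) == '+'));
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(parse("+title,-date")); // prints [+title, -date]
    }
}
```

Rejecting names without a prefix is a choice made for this sketch; the text above does not say how the engine treats them.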
java.util.SortedSet<FieldValue> getMatching(java.lang.String field, java.lang.String pattern)
Parameters:
field - the saved, string field against whose values we will match.
If the named field is not saved or is not a string field, then the empty
set will be returned.
pattern - the pattern for which we'll find matching field values.
java.util.Iterator<java.lang.Object> getFieldIterator(java.lang.String field)
Parameters:
field - The name of the field whose values we need an iterator for.
See Also:
FieldInfo.getType()
java.util.List<java.lang.Object> getAllFieldValues(java.lang.String field, java.lang.String key)
Parameters:
field - The name of the field for which we want the values.
key - The key of the document whose values we want.
Returns:
List containing values of the appropriate
type. If the named field is not a saved field, or if the given
document key is not in the index, then an empty list is returned.
java.lang.Object getFieldValue(java.lang.String field, java.lang.String key)
Parameters:
field - The name of the field for which we want the values.
key - The key of the document whose values we want.
Returns:
Object of the appropriate type for the named
field. If the named field is not a saved field, or if the given
document key is not in the index, then null is returned.
Note that if there are multiple values for the given field, there is no
guarantee which of the values will be returned by this method.
See Also:
getAllFieldValues(java.lang.String, java.lang.String)
java.util.List<FieldFrequency> getTopFieldValues(java.lang.String field, int n, boolean ignoreCase)
Parameters:
field - the name of the field to rank
n - the number of field values to return
Returns:
List containing field values of the appropriate
type for the field, ordered by frequency. The scores associated with the
field values are their frequency in the collection.
DocumentVector getDocumentVector(java.lang.String key) throws SearchEngineException
Parameters:
key - the key of the document whose vector we will return
Returns:
null if that key does not appear in this index.
Throws:
SearchEngineException - if there is any error
retrieving the document vector for the document associated with the
given key.
DocumentVector getDocumentVector(java.lang.String key, java.lang.String field)
Parameters:
key - the key of the document whose vector we will return
field - the field for which we want a document vector. If this
parameter is null, then a vector containing the terms from
all vectored fields in the document is returned. If this value is the
empty string, then a vector for the contents of the document that are
not in any field is returned. If this value is the name of a field that
was not vectored during indexing, an empty vector will be returned.
Returns:
null if that key does not appear in this index.
DocumentVector getDocumentVector(java.lang.String key, WeightedField[] fields)
Parameters:
key - the key of the document whose vector we will return
fields - the fields from which the document vector will be composed.
Returns:
null if that key does not appear in this index.
DocumentVector getDocumentVector(Document doc, java.lang.String field) throws SearchEngineException
Parameters:
doc - a document for which we want a document vector. This
document may be in the index or may be generated via the
createDocument(java.lang.String) method. The document will be processed as
though it were being indexed in order to extract the appropriate
document vector, but the data resulting from this processing will
not be added to the index.
field - the field for which we want a document vector. If this
parameter is null, then a vector containing the terms from
all vectored fields in the document is returned. If this value is the
empty string, then a vector for the contents of the document that are
not in any field is returned. If this value is the name of a field
that was not vectored during indexing, an empty vector will be returned.
Returns:
field parameter
Throws:
SearchEngineException
DocumentVector getDocumentVector(Document doc, WeightedField[] fields) throws SearchEngineException
Parameters:
doc - a document for which we want a document vector. This
document may be in the index or may be generated via the
createDocument(java.lang.String) method. The document will be processed as
though it were being indexed in order to extract the appropriate
document vector, but the data resulting from this processing will
not be added to the index.
fields - the fields for which we want a document vector.
Throws:
SearchEngineException
double getDistance(java.lang.String k1, java.lang.String k2, java.lang.String name)
Parameters:
k1 - the first key
k2 - the second key
name - the name of the feature vector field for which we want the
distance
ResultSet getSimilar(java.lang.String key, java.lang.String name)
Parameters:
key - the key of the document to which we'll compute similarity.
name - the name of the field containing the feature vectors that
we'll use in the similarity computation.
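The method summary states that getSimilar orders results by the Euclidean distance between stored feature vectors. To make that computation concrete, here is a minimal sketch that works on plain double arrays held in a map. `FeatureDistance`, `distance` and `similar` are hypothetical names for this example; how the engine stores feature vectors, and whether the query document itself appears in the results, is not specified here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of Euclidean-distance ranking; not Minion code. */
public class FeatureDistance {
    /** Euclidean distance between two feature vectors of equal length. */
    public static double distance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Keys of all other documents, nearest first (this sketch omits the query key). */
    public static List<String> similar(String key, Map<String, double[]> vectors) {
        double[] target = vectors.get(key);
        List<String> keys = new ArrayList<>();
        for (String k : vectors.keySet()) {
            if (!k.equals(key)) {
                keys.add(k);
            }
        }
        keys.sort(Comparator.comparingDouble((String k) -> distance(target, vectors.get(k))));
        return keys;
    }

    public static void main(String[] args) {
        Map<String, double[]> vectors = new LinkedHashMap<>();
        vectors.put("a", new double[] {0, 0});
        vectors.put("b", new double[] {3, 4});
        vectors.put("c", new double[] {1, 0});
        System.out.println(distance(vectors.get("a"), vectors.get("b"))); // prints 5.0
        System.out.println(similar("a", vectors));                        // prints [c, b]
    }
}
```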
QueryStats getQueryStats()
See Also:
resetQueryStats()
void resetQueryStats()
See Also:
getQueryStats()
void purge()
SimpleIndexer.finish() before calling purge.
boolean merge()
Returns:
true if a merge was performed, false otherwise.
void optimize() throws SearchEngineException
Throws:
SearchEngineException - if there was any error during the merge.
void recover() throws SearchEngineException
Throws:
SearchEngineException - if there is any problem recovering the index.
void export(java.io.PrintWriter o) throws java.io.IOException
Parameters:
o - a print writer to which the index will be exported.
Throws:
java.io.IOException - if there is any error writing the data
java.util.Iterator<Document> getDocumentIterator()
void close() throws SearchEngineException
Throws:
SearchEngineException - if there is any error closing the search engine
java.lang.String getName()
Returns:
null if none has been assigned.
int getNDocs()
void setQueryConfig(QueryConfig queryConfig)
Parameters:
queryConfig - a set of properties describing the query configuration.
QueryConfig getQueryConfig()
SimpleIndexer getSimpleIndexer()
HLPipeline getHLPipeline()
PartitionManager getPM()
void flushClassifiers() throws SearchEngineException
Throws:
SearchEngineException - if there is any error dumping the classifiers.
PartitionManager getManager()
IndexConfig getIndexConfig()
MetaDataStore getMetaDataStore() throws SearchEngineException
Throws:
SearchEngineException - if there is any error getting the metadata store