java.lang.Object
  extended by com.sun.labs.minion.engine.SearchEngineImpl
public class SearchEngineImpl
This is the main class for handling a search engine, for both indexing
and retrieval operations. The engine is configured by two sets of
properties: indexing properties and query properties. The valid set of
indexing properties can be found in the documentation for the IndexConfig class. The valid set of query properties can be found in
the documentation for the QueryConfig class.
Indexing is carried out by pipelines, which may be synchronous or asynchronous. A synchronous pipeline blocks the caller until indexing of the document has been completed. An asynchronous pipeline holds a queue of documents to index: once a document has been added to the queue, control returns to the caller, which is blocked only if the queue is full. Note that with asynchronous indexing, the map containing your document may sit on the indexing queue for some time, so you must not change or re-use that map!
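To illustrate the asynchronous case, here is a minimal sketch built on a bounded java.util.concurrent.ArrayBlockingQueue (the engine's indexingQueue field is a BlockingQueue, and PROP_INDEXING_QUEUE_LENGTH defaults to 256). The class AsyncIndexSketch, its single worker thread, and the "key"/"body" map entries are hypothetical and are not Minion's Pipeline API. The sketch only shows why a call returns before the document is processed, why it blocks when the queue is full, and why a queued map must not be reused.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical toy asynchronous indexing pipeline; not Minion's Pipeline class. */
final class AsyncIndexSketch {
    // Unique sentinel instance that tells the worker thread to stop.
    private static final Map<String, Object> STOP = new HashMap<>();

    private final BlockingQueue<Map<String, Object>> queue;
    private final Thread worker;
    /** Stand-in for the index: document key -> indexed body text. */
    final Map<String, String> indexed = new ConcurrentHashMap<>();

    AsyncIndexSketch(int queueLength) {
        queue = new ArrayBlockingQueue<>(queueLength);
        worker = new Thread(this::drain, "pipeline-0");
        worker.start();
    }

    /** Returns as soon as the map is queued; blocks only while the queue is full. */
    void index(Map<String, Object> document) {
        try {
            queue.put(document);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while queueing", e);
        }
    }

    private void drain() {
        try {
            while (true) {
                Map<String, Object> doc = queue.take();
                if (doc == STOP) {
                    return;
                }
                // The map is read here, possibly long after index() returned,
                // which is why callers must not modify or reuse it.
                // Assumes every map carries a non-null "key" entry.
                indexed.put((String) doc.get("key"), String.valueOf(doc.get("body")));
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Lets the worker finish everything already queued, then stops it. */
    void close() {
        index(STOP);
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while closing", e);
        }
    }
}
```

With a queue length of 2, a caller enqueuing ten documents is blocked intermittently while the worker catches up, but every document is still processed in order before close() returns.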
To index documents, you set up the engine using a set of index configuration
properties and then simply call the SearchEngine.index(java.lang.String, java.util.Map) method.
This will route the document to a pipeline that is ready to index. Once
you've indexed all of your documents, you can call the
SearchEngine.flush() method to make sure that all of your indexed data
is written to the disk.
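The following toy model, which is hypothetical and not Minion code, sketches the visibility rule described above for index(String, Map): documents sit in memory until flush is called, a second document with the same key replaces the first, and a deleted key stops being searchable. The name isSearchable is an invented stand-in and is not meant to reproduce the exact semantics of isIndexed. Note that index(Document) behaves differently, because it flushes as soon as the document is indexed.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical toy model of "indexed but not yet flushed"; not Minion code. */
final class BufferedIndexSketch {
    // Indexed since the last flush: held in memory, not yet searchable.
    private final Map<String, Map<String, Object>> pending = new HashMap<>();
    // Flushed "to disk": searchable.
    private final Map<String, Map<String, Object>> flushed = new HashMap<>();
    private final Set<String> deleted = new HashSet<>();

    void index(String key, Map<String, Object> document) {
        // Re-indexing a key replaces the earlier version once it is flushed.
        pending.put(key, document);
    }

    void flush() {
        deleted.removeAll(pending.keySet()); // re-indexed documents are live again
        flushed.putAll(pending);
        pending.clear();
    }

    void delete(String key) {
        deleted.add(key);
    }

    /** True only for documents that have been flushed and not deleted. */
    boolean isSearchable(String key) {
        return flushed.containsKey(key) && !deleted.contains(key);
    }

    Object fieldValue(String key, String field) {
        return isSearchable(key) ? flushed.get(key).get(field) : null;
    }
}
```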
| Field Summary | |
|---|---|
| protected ClassifierManager | classManager - The manager for the classifier partitions in this index. |
| protected ClassifierMemoryPartition | classMemoryPartition - The memory partition for building classifiers. |
| protected ClusterManager | clusterManager - The manager for the cluster partitions in this index. |
| protected ClusterMemoryPartition | clusterMemoryPartition - The memory partition for building feature clusters. |
| protected com.sun.labs.util.props.ConfigurationManager | cm - The configuration manager for this engine. |
| protected static java.text.DecimalFormat | form - A format object for formatting the output. |
| protected IndexConfig | indexConfig - The configuration for the index and the indexing engine. |
| protected java.util.concurrent.BlockingQueue | indexingQueue - A blocking queue onto which we can put indexable things. |
| protected PartitionManager | invFilePartitionManager - The manager for the partitions in this index. |
| protected static java.lang.String | logTag - Our log tag. |
| protected MetaDataStoreImpl | metaDataStore - The meta data storage for this engine/index. |
| protected Pipeline[] | pipes - The pipelines to use for indexing. |
| protected java.lang.Thread[] | pipeThreads - The threads that run our pipelines. |
| static java.lang.String | PROP_BUILD_CLASSIFIERS - A property indicating whether we should build classifiers while indexing. |
| static java.lang.String | PROP_CLASS_MANAGER |
| static java.lang.String | PROP_CLASS_MEMORY_PARTITION |
| static java.lang.String | PROP_CLASSIFIER_CLASS_NAME |
| static java.lang.String | PROP_CLUSTER_MANAGER |
| static java.lang.String | PROP_CLUSTER_MEMORY_PARTITION |
| static java.lang.String | PROP_DUMPER |
| static java.lang.String | PROP_INDEX_CONFIG |
| static java.lang.String | PROP_INDEXING_QUEUE_LENGTH |
| static java.lang.String | PROP_INV_FILE_PARTITION_MANAGER |
| static java.lang.String | PROP_LONG_INDEXING_RUN - A property that indicates that the search engine will be used for a long indexing run with no querying going on during that time. |
| static java.lang.String | PROP_MIN_MEMORY_PERCENT |
| static java.lang.String | PROP_NUM_PIPELINES |
| static java.lang.String | PROP_PIPELINE_FACTORY |
| static java.lang.String | PROP_PROFILERS |
| static java.lang.String | PROP_QUERY_CONFIG |
| protected QueryConfig | queryConfig - The configuration for the query engine. |
| Fields inherited from interface com.sun.labs.minion.Searcher |
|---|
| GRAMMAR_LUCENE, GRAMMAR_STRICT, GRAMMAR_WEB, GRAMMARS, OP_AND, OP_OR, OP_PAND |
| Constructor Summary | |
|---|---|
| SearchEngineImpl() | Gets a search engine implementation. |
| Method Summary | |
|---|---|
| void | addIndexListener(IndexListener il) - Adds a listener for events in the index backing this search engine. |
| void | addQueryStats(QueryStats qs) |
| ResultSet | allTerms(java.util.Collection<java.lang.String> terms, java.util.Collection<java.lang.String> fields) - Builds a result set containing all of the given terms in any of the given fields. |
| ResultSet | anyTerms(java.util.Collection<java.lang.String> terms, java.util.Collection<java.lang.String> fields) - Builds a result set of the documents containing any of the given terms in any of the given fields. |
| void | checkDump() |
| boolean | checkLowMemory() - Determines if available memory is low. |
| void | classify(java.lang.String[] docKeys, java.lang.String[] classNames) - Creates a manual assignment of a set of documents to a set of classes. |
| void | close() - Closes the engine. |
| Document | createDocument(java.lang.String key) - Creates a new document with a given key. |
| FieldInfo | defineField(FieldInfo field) - Defines a given field. |
| void | delete(java.util.List<java.lang.String> docs) - Deletes a number of documents from the index. |
| void | delete(java.lang.String key) - Deletes a document from the index. |
| protected void | dump() - Dumps any data currently held in memory to the disk via our configured dumper. |
| void | export(java.io.PrintWriter o) - Outputs an XML representation of the search index including all saved and vectored fields. |
| void | flush() - Flushes the indexed material currently held in memory to the disk, making it available for searching. |
| void | flushClassifiers() - Dumps all the classifiers that have been trained since the last dump, or since the search engine started. |
| java.util.List | getAllFieldValues(java.lang.String field, java.lang.String key) - Gets all of the field values associated with a given field in a given document. |
| java.lang.String[] | getClasses() - Returns the names of the classes for which classifiers are defined. |
| ClassifierModel | getClassifier(java.lang.String name) |
| ClassifierManager | getClassifierManager() - Gets the classifier manager for this search engine. |
| ClusterManager | getClusterManager() - Gets the cluster manager for this search engine. |
| com.sun.labs.util.props.ConfigurationManager | getConfigurationManager() |
| double | getDistance(java.lang.String k1, java.lang.String k2, java.lang.String name) - Gets the distance between two documents, based on the values stored in a given feature vector saved field. |
| Document | getDocument(java.lang.String key) - Gets a document with a given key. |
| java.util.Iterator<Document> | getDocumentIterator() - Gets an iterator for all of the non-deleted documents in the collection. |
| java.util.List<Document> | getDocuments(java.util.List<java.lang.String> keys) - Gets a list of documents with the given keys. |
| DocKeyEntry | getDocumentTerm(java.lang.String key) |
| DocumentVector | getDocumentVector(Document doc, java.lang.String field) - Creates a document vector for the given document as though it occurred in the index. |
| DocumentVector | getDocumentVector(Document doc, WeightedField[] fields) - Creates a composite document vector for the given document as though it occurred in the index. |
| DocumentVector | getDocumentVector(java.lang.String key) - Gets a document vector for the given key. |
| DocumentVector | getDocumentVector(java.lang.String key, java.lang.String field) - Gets a document vector for the given key. |
| DocumentVector | getDocumentVector(java.lang.String key, WeightedField[] fields) - Gets a composite document vector for the given linear combination of vectored fields for the given key. |
| FieldInfo | getFieldInfo(java.lang.String name) - Gets the information for a field. |
| java.util.Iterator | getFieldIterator(java.lang.String field) - Gets an iterator for all the values in a field. |
| java.util.Collection | getFieldNames() - Gets the names of all the fields known in the index. |
| java.lang.Object | getFieldValue(java.lang.String field, java.lang.String key) - Gets a single field value associated with a given field in a given document. |
| HLPipeline | getHLPipeline() - Gets a pipeline that can be used for highlighting. |
| IndexConfig | getIndexConfig() - Gets the index configuration in use by this search engine. |
| boolean | getLongIndexingRun() - Indicates whether this search engine is being used for a long indexing run. |
| PartitionManager | getManager() - Gets the partition manager associated with this search engine. |
| java.util.SortedSet<FieldValue> | getMatching(java.lang.String field, java.lang.String pattern) - Gets the values for the given field that match the given pattern. |
| MetaDataStore | getMetaDataStore() - Gets the MetaDataStore for this index. |
| java.lang.String | getName() - Gets the name of this engine, if one has been assigned by the application. |
| int | getNDocs() - Gets the number of documents that the index contains. |
| PartitionManager | getPM() - Gets the partition manager for this search engine. |
| java.util.List | getProfilers() |
| QueryConfig | getQC() |
| QueryConfig | getQueryConfig() - Gets the query configuration being used by this search engine. |
| QueryStats | getQueryStats() - Gets the combined query stats for any queries run by the engine. |
| ResultSet | getResults(java.util.Collection<java.lang.String> keys) - Gets a set of results corresponding to the document keys passed in. |
| ResultSet | getResults(java.util.Map<java.lang.String,java.lang.Float> keys) - Gets a set of results corresponding to the document keys and scores passed in. |
| ResultSet | getSimilar(java.lang.String key, java.lang.String name) - Gets a set of results ordered by similarity to the given document, calculated by computing the Euclidean distance based on the feature vector stored in the given field. |
| java.util.List<FieldValue> | getSimilarClassifiers(java.lang.String cname, int n) |
| java.util.List<WeightedFeature> | getSimilarClassifierTerms(java.lang.String cname1, java.lang.String cname2, int n) |
| SimpleIndexer | getSimpleIndexer() - Gets a simple indexer that can be used for simple indexing. |
| TermStats | getTermStats(java.lang.String term) - Gets the collection-level term statistics for the given term. |
| java.util.Set<java.lang.String> | getTermVariations(java.lang.String term) - Gets the set of variations on a term that will be generated by default when searching for the term. |
| java.util.List<FieldFrequency> | getTopFieldValues(java.lang.String field, int n, boolean ignoreCase) - Gets a list of the top n most frequent field values for a given named field. |
| ResultSet | getTrainingDocuments(java.lang.String className) - Returns the set of documents that was used to train the classifier for the class with the provided class name. |
| void | index(Document document) - Indexes a document into the database. |
| void | index(Indexable doc) - Indexes a document into the database. |
| void | index(java.lang.String key, java.util.Map document) - Indexes a document into the database. |
| boolean | isIndexed(java.lang.String key) - Checks to see if a document is in the index. |
| boolean | merge() - Performs a merge in the index, if one is necessary. |
| void | newProperties(com.sun.labs.util.props.PropertySheet ps) |
| void | optimize() - Merges all of the partitions in the index into a single partition. |
| void | purge() - Deletes all of the data in the index. |
| void | reclassifyIndex(java.lang.String className) - Causes the engine to reclassify all documents against the classifier for the given class name. |
| void | recover() - Attempts to recover the index after an unclean shutdown. |
| void | removeIndexListener(IndexListener il) - Removes an index listener from the listeners. |
| void | resetQueryStats() - Resets the query stats for the engine. |
| ResultSet | search(Element el) - Runs a query against the index, returning a set of results. |
| ResultSet | search(Element el, java.lang.String sortOrder) - Runs a query against the index, returning a set of results. |
| ResultSet | search(java.lang.String query) - Runs a query against the index, returning a set of results. |
| ResultSet | search(java.lang.String query, java.lang.String sortOrder) - Runs a query against the index, returning a set of results. |
| ResultSet | search(java.lang.String query, java.lang.String sortOrder, int defaultOperator, int grammar) - Runs a query against the index, returning a set of results. |
| void | setDefaultFieldInfo(FieldInfo field) - Sets the default field information to use when unknown fields are encountered during indexing. |
| void | setLongIndexingRun(boolean longIndexingRun) - Sets the indicator that this is a long indexing run, in which case term statistics dictionaries and document vector lengths will not be calculated until the engine is shut down. |
| void | setProfilers(java.util.List profilers) |
| void | setQueryConfig(QueryConfig queryConfig) - Sets the query configuration to use for subsequent queries. |
| protected double | toMB(long x) |
| java.lang.String | toString() - Gets a string description of the search engine. |
| void | trainClass(ResultSet results, java.lang.String className, java.lang.String fieldName) - Generates a classifier based on the documents in the provided result set. |
| void | trainClass(ResultSet results, java.lang.String className, java.lang.String fieldName, Progress p) - Generates a classifier based on the documents in the provided result set. |
| void | trainClass(ResultSet results, java.lang.String className, java.lang.String fieldName, java.lang.String fromField) - Generates a classifier based on the documents in the provided result set. |
| void | trainClass(ResultSet results, java.lang.String className, java.lang.String fieldName, java.lang.String fromField, Progress progress) - Generates a classifier based on the documents in the provided result set. |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
| Field Detail |
|---|
protected IndexConfig indexConfig
protected QueryConfig queryConfig
protected MetaDataStoreImpl metaDataStore
protected com.sun.labs.util.props.ConfigurationManager cm
protected PartitionManager invFilePartitionManager
protected ClassifierManager classManager
protected ClusterManager clusterManager
protected ClassifierMemoryPartition classMemoryPartition
protected ClusterMemoryPartition clusterMemoryPartition
protected java.util.concurrent.BlockingQueue indexingQueue
protected Pipeline[] pipes
protected java.lang.Thread[] pipeThreads
protected static java.text.DecimalFormat form
protected static java.lang.String logTag
@ConfigComponent(type=IndexConfig.class) public static final java.lang.String PROP_INDEX_CONFIG
@ConfigComponent(type=QueryConfig.class) public static final java.lang.String PROP_QUERY_CONFIG
@ConfigComponent(type=PipelineFactory.class) public static final java.lang.String PROP_PIPELINE_FACTORY
@ConfigComponent(type=PartitionManager.class) public static final java.lang.String PROP_INV_FILE_PARTITION_MANAGER
@ConfigBoolean(defaultValue=false) public static final java.lang.String PROP_BUILD_CLASSIFIERS
@ConfigComponent(type=ClassifierManager.class) public static final java.lang.String PROP_CLASS_MANAGER
@ConfigComponent(type=ClusterManager.class) public static final java.lang.String PROP_CLUSTER_MANAGER
@ConfigComponent(type=MemoryPartition.class) public static final java.lang.String PROP_CLASS_MEMORY_PARTITION
@ConfigComponent(type=ClusterMemoryPartition.class) public static final java.lang.String PROP_CLUSTER_MEMORY_PARTITION
@ConfigDouble(defaultValue=0.3) public static final java.lang.String PROP_MIN_MEMORY_PERCENT
@ConfigComponent(type=Dumper.class) public static final java.lang.String PROP_DUMPER
@ConfigInteger(defaultValue=1) public static final java.lang.String PROP_NUM_PIPELINES
@ConfigInteger(defaultValue=256) public static final java.lang.String PROP_INDEXING_QUEUE_LENGTH
@ConfigString(defaultValue="com.sun.labs.minion.classification.Rocchio") public static final java.lang.String PROP_CLASSIFIER_CLASS_NAME
@ConfigComponentList(type=Profiler.class) public static final java.lang.String PROP_PROFILERS
@ConfigBoolean(defaultValue=false) public static final java.lang.String PROP_LONG_INDEXING_RUN
If this property is set to true (the default is false),
then no term statistics dictionaries or document vector lengths will be
calculated during indexing or merging of partitions. Additionally, at
shutdown, the extant partitions will be merged into a single partition, and
term statistics and document vector lengths will then be calculated
for that single new partition.
| Constructor Detail |
|---|
public SearchEngineImpl()
| Method Detail |
|---|
public FieldInfo defineField(FieldInfo field)
throws SearchEngineException
Specified by: defineField in interface SearchEngine
Parameters: field - the field to define
Throws: SearchEngineException - if the field is already defined and there is a mismatch in the attributes or type of the given field, or if there is an error adding the field to the index
public void setDefaultFieldInfo(FieldInfo field)
Specified by: setDefaultFieldInfo in interface SearchEngine
Parameters: field - an exemplar field information object that has the attributes and type that should be used when an unknown field is encountered during indexing. Any name associated with this particular object is ignored; only its attributes and type are used.
See Also: … for how to define a field to use during indexing
public FieldInfo getFieldInfo(java.lang.String name)
Specified by: getFieldInfo in interface SearchEngine
Parameters: name - the name of the field for which we want information
Returns: the information for the field, or null if this name is not the name of a defined field.
public java.util.Set<java.lang.String> getTermVariations(java.lang.String term)
Specified by: getTermVariations in interface SearchEngine
Parameters: term - the term for which we want variants
public TermStats getTermStats(java.lang.String term)
Specified by: getTermStats in interface SearchEngine
Parameters: term - the term for which we want the statistics
Returns: the collection-level statistics for the term, or null if the term does not occur in the collection.
public Document getDocument(java.lang.String key)
Specified by: getDocument in interface SearchEngine
Parameters: key - the key for the document to retrieve.
Returns: … null is returned.
See Also: SearchEngine.index(Document), SearchEngine.createDocument(java.lang.String), SimpleIndexer.indexDocument(Document)
public java.util.List<Document> getDocuments(java.util.List<java.lang.String> keys)
Specified by: getDocuments in interface SearchEngine
Parameters: keys - the list of keys for which we want documents
Returns: … keys and the documents in the returned list.
public Document createDocument(java.lang.String key)
Specified by: createDocument in interface SearchEngine
Parameters: key - the key for the new document
Returns: … null is returned.
See Also: SearchEngine.index(Document), SearchEngine.getDocument(java.lang.String), SimpleIndexer.indexDocument(Document)
public void index(java.lang.String key,
java.util.Map document)
throws SearchEngineException
Note that simply calling index will not make a document available for searching. Documents are not available until they are flushed to disk, which can be done with the flush method.
Specified by: index in interface SearchEngine
Parameters: key - The document key for this document. The key should be unique in the index. If the key passed in matches a document that is already in the index, the information for this document will replace the existing one.
document - A map from field names to the value for that field. If a particular field has a type or attributes associated with it, they will be respected during indexing. If a field has no attributes associated with it, the field will be tokenized and indexed.
Throws: SearchEngineException - if there are any errors during the indexing.
See Also: IndexConfig.IndexConfig(java.lang.String)
public void index(Indexable doc)
throws SearchEngineException
Note that simply calling index will not make a document available for searching. Documents are not available until they are flushed to disk, which can be done with the flush method.
Specified by: index in interface SearchEngine
Parameters: doc - the document to index.
Throws: SearchEngineException - if there are any errors during the indexing.
public void index(Document document)
throws SearchEngineException
In this case, the data for the document will be flushed to disk as soon as the document is indexed. For indexing a large number of documents, you may wish to consider the SimpleIndexer.indexDocument(Document) method, which gives you more control over when the data will be flushed to disk.
Specified by: index in interface SearchEngine
Parameters: document - a document to be indexed
Throws: SearchEngineException - if there are any errors during the indexing.
See Also: SearchEngine.getDocument(java.lang.String), SimpleIndexer.indexDocument(Document)
public void addIndexListener(IndexListener il)
Specified by: addIndexListener in interface SearchEngine
Parameters: il - the listener to add.
public void removeIndexListener(IndexListener il)
Specified by: removeIndexListener in interface SearchEngine
Parameters: il - the index listener to remove.
public void checkDump()
throws SearchEngineException
Throws: SearchEngineException
protected void dump()
throws SearchEngineException
Throws: SearchEngineException
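checkDump, dump, checkLowMemory and the PROP_MIN_MEMORY_PERCENT property (default 0.3) suggest that in-memory data is written out when free memory runs low. The sketch below shows one way such a check could be written with java.lang.Runtime. Treating the threshold as a fraction of the maximum heap, and the conversion performed by this toMB, are assumptions for illustration, not a description of the actual implementation.

```java
/** Hypothetical low-memory heuristic; the real checkLowMemory may differ. */
final class MemoryCheckSketch {
    /** Converts a byte count to mebibytes (assumed meaning of "MB"). */
    static double toMB(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    /**
     * True when the memory still available to the JVM is below
     * minMemoryPercent (a fraction such as 0.3) of the maximum heap.
     */
    static boolean isLow(long maxBytes, long totalBytes, long freeBytes, double minMemoryPercent) {
        // Free space inside the allocated heap plus room the heap may still grow into.
        long available = freeBytes + (maxBytes - totalBytes);
        return available < (long) (maxBytes * minMemoryPercent);
    }

    /** Applies the check to the running JVM. */
    static boolean isLowNow(double minMemoryPercent) {
        Runtime rt = Runtime.getRuntime();
        return isLow(rt.maxMemory(), rt.totalMemory(), rt.freeMemory(), minMemoryPercent);
    }
}
```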
public void flush()
throws SearchEngineException
Specified by: flush in interface SearchEngine
Throws: SearchEngineException - If there is any error flushing the in-memory data.
public boolean isIndexed(java.lang.String key)
Specified by: isIndexed in interface SearchEngine
Parameters: key - the key for the document that we wish to check.
Returns: true if the document is in the index. A document is considered to be in the index if a document with the given key appears in the index and has not been deleted.
public void delete(java.lang.String key)
Specified by: delete in interface SearchEngine
Parameters: key - The key for the document to delete.
public void delete(java.util.List<java.lang.String> docs)
throws SearchEngineException
Specified by: delete in interface SearchEngine
Parameters: docs - The keys of the documents to delete
Throws: SearchEngineException - If there is any error deleting the documents.
public ResultSet search(java.lang.String query)
throws SearchEngineException
Specified by: search in interface SearchEngine
Specified by: search in interface Searcher
Parameters: query - The query to run, in our query syntax.
Returns: a ResultSet containing the results of the query.
Throws: SearchEngineException - If there is any error during the search.
See Also: ResultSet
public ResultSet search(java.lang.String query,
java.lang.String sortOrder)
throws SearchEngineException
Specified by: search in interface SearchEngine
Specified by: search in interface Searcher
Parameters: query - The query to run, in our query syntax.
sortOrder - How the results should be sorted. This is a set of comma-separated field names, each preceded by a + (for increasing order) or by a - (for decreasing order).
Returns: a ResultSet containing the results of the query.
Throws: SearchEngineException - If there is any error during the search.
See Also: ResultSet
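As a sketch of the sortOrder format described above, the hypothetical SortSpecSketch class below parses a specification such as "+date,-title" and turns it into a Comparator over documents represented as field-name-to-value maps. How Minion itself handles missing values or unknown fields is not documented here, so this version simply requires non-null, mutually comparable values.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Hypothetical parser for sort specs like "+date,-title"; not Minion's implementation. */
final class SortSpecSketch {
    static final class Key {
        final String field;
        final boolean increasing;

        Key(String field, boolean increasing) {
            this.field = field;
            this.increasing = increasing;
        }
    }

    /** Splits on commas; each name must start with '+' (increasing) or '-' (decreasing). */
    static List<Key> parse(String spec) {
        List<Key> keys = new ArrayList<>();
        for (String part : spec.split(",")) {
            String p = part.trim();
            if (p.isEmpty()) {
                continue;
            }
            char sign = p.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("expected + or - before field name: " + p);
            }
            keys.add(new Key(p.substring(1), sign == '+'));
        }
        return keys;
    }

    /** Compares documents key by key, in the order given; field values must be non-null. */
    @SuppressWarnings({"unchecked", "rawtypes"})
    static Comparator<Map<String, Object>> comparator(String spec) {
        Comparator<Map<String, Object>> result = (a, b) -> 0;
        for (Key k : parse(spec)) {
            Comparator<Map<String, Object>> byField =
                    (a, b) -> ((Comparable) a.get(k.field)).compareTo(b.get(k.field));
            result = result.thenComparing(k.increasing ? byField : byField.reversed());
        }
        return result;
    }
}
```

Later keys only break ties left by earlier ones, so "+date,-title" sorts by ascending date and, within a date, by descending title.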
public ResultSet search(java.lang.String query,
java.lang.String sortOrder,
int defaultOperator,
int grammar)
throws SearchEngineException
Specified by: search in interface SearchEngine
Specified by: search in interface Searcher
Parameters: query - The query to run, in our query syntax.
sortOrder - How the results should be sorted. This is a set of comma-separated field names, each preceded by a + (for increasing order) or by a - (for decreasing order).
defaultOperator - the default operator to use when no other operator is provided between terms in the query. Valid values are defined in the Searcher interface (OP_AND, OP_OR, OP_PAND).
grammar - the grammar to use to parse the query. Valid values are defined in the Searcher interface (GRAMMAR_LUCENE, GRAMMAR_STRICT, GRAMMAR_WEB).
Returns: a ResultSet containing the results of the query.
Throws: SearchEngineException - If there is any error during the search.
public ResultSet search(Element el)
throws SearchEngineException
Specified by: search in interface SearchEngine
Parameters: el - the query, expressed using the programmatic query API
Throws: SearchEngineException - if there are any errors evaluating the query
public ResultSet search(Element el,
java.lang.String sortOrder)
throws SearchEngineException
Specified by: search in interface SearchEngine
Parameters: el - the query, expressed using the programmatic query API
sortOrder - How the results should be sorted. This is a set of comma-separated field names, each preceded by a + (for increasing order) or by a - (for decreasing order).
Throws: SearchEngineException - if there are any errors evaluating the query
public QueryStats getQueryStats()
Specified by: getQueryStats in interface SearchEngine
See Also: SearchEngine.resetQueryStats()
public void resetQueryStats()
Specified by: resetQueryStats in interface SearchEngine
See Also: SearchEngine.getQueryStats()
public void addQueryStats(QueryStats qs)
public ResultSet getResults(java.util.Collection<java.lang.String> keys)
Specified by: getResults in interface SearchEngine
Parameters: keys - a collection of document keys for which we want results.
public ResultSet getResults(java.util.Map<java.lang.String,java.lang.Float> keys)
Parameters: keys - a map from the document keys for which we want results to the scores for those documents.
public ResultSet anyTerms(java.util.Collection<java.lang.String> terms,
java.util.Collection<java.lang.String> fields)
throws SearchEngineException
Parameters: terms - the terms to look for
fields - the fields to look for the terms in
Throws: SearchEngineException
public ResultSet allTerms(java.util.Collection<java.lang.String> terms,
java.util.Collection<java.lang.String> fields)
throws SearchEngineException
Parameters: terms - the terms that we want to find
fields - the fields that we must find the terms in
Throws: SearchEngineException - if there is an error during the search.
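One reading of anyTerms and allTerms is sketched below over a toy in-memory inverted index (field to term to document keys): anyTerms takes the union of the documents for every term, while allTerms keeps only documents that contain every term, with each term allowed to appear in any of the listed fields. The index layout and the class TermSetSketch are hypothetical, and the real methods return a ResultSet rather than a set of keys.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration of any/all term semantics; not Minion code. */
final class TermSetSketch {
    /** Documents containing the term in at least one of the fields. */
    static Set<String> postings(Map<String, Map<String, Set<String>>> index,
                                String term, Collection<String> fields) {
        Set<String> docs = new HashSet<>();
        for (String f : fields) {
            docs.addAll(index.getOrDefault(f, Map.of()).getOrDefault(term, Set.of()));
        }
        return docs;
    }

    /** Union: documents containing any of the terms in any of the fields. */
    static Set<String> anyTerms(Map<String, Map<String, Set<String>>> index,
                                Collection<String> terms, Collection<String> fields) {
        Set<String> result = new HashSet<>();
        for (String t : terms) {
            result.addAll(postings(index, t, fields));
        }
        return result;
    }

    /** Intersection: documents containing every term, each in any of the fields. */
    static Set<String> allTerms(Map<String, Map<String, Set<String>>> index,
                                Collection<String> terms, Collection<String> fields) {
        Set<String> result = null;
        for (String t : terms) {
            Set<String> docs = postings(index, t, fields); // always a fresh, mutable set
            if (result == null) {
                result = docs;
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new HashSet<>() : result;
    }
}
```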
public java.util.SortedSet<FieldValue> getMatching(java.lang.String field,
java.lang.String pattern)
Specified by: getMatching in interface SearchEngine
Parameters: field - the saved string field against whose values we will match. If the named field is not saved or is not a string field, then the empty set will be returned.
pattern - the pattern for which we'll find matching field values.
public java.util.Iterator getFieldIterator(java.lang.String field)
Specified by: getFieldIterator in interface SearchEngine
Parameters: field - The name of the field whose values we need an iterator for.
See Also: FieldInfo.getType()
public java.util.List getAllFieldValues(java.lang.String field,
java.lang.String key)
Specified by: getAllFieldValues in interface SearchEngine
Parameters: field - The name of the field for which we want the values.
key - The key of the document whose values we want.
Returns: a List containing values of the appropriate type. If the named field is not a saved field, or if the given document key is not in the index, then an empty list is returned.
public java.util.List<FieldFrequency> getTopFieldValues(java.lang.String field,
int n,
boolean ignoreCase)
Specified by: getTopFieldValues in interface SearchEngine
Parameters: field - the name of the field to rank
n - the number of field values to return
Returns: a List containing field values of the appropriate type for the field, ordered by frequency
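The sketch below shows how a top-n list of field values ranked by frequency, with an ignoreCase option, could be computed in plain Java. The input list of raw values, lower-casing with Locale.ROOT, and the alphabetical tie-break are assumptions made for the example; the documented method returns FieldFrequency objects instead of map entries.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical frequency ranking of field values; not Minion's implementation. */
final class TopValuesSketch {
    static List<Map.Entry<String, Integer>> top(List<String> values, int n, boolean ignoreCase) {
        Map<String, Integer> counts = new HashMap<>();
        for (String v : values) {
            // With ignoreCase, "Red", "red" and "RED" are counted as one value.
            String key = ignoreCase ? v.toLowerCase(Locale.ROOT) : v;
            counts.merge(key, 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Most frequent first; ties broken alphabetically so the output is deterministic.
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return new ArrayList<>(entries.subList(0, Math.max(0, Math.min(n, entries.size()))));
    }
}
```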
public java.util.List<FieldValue> getSimilarClassifiers(java.lang.String cname,
int n)
public java.util.List<WeightedFeature> getSimilarClassifierTerms(java.lang.String cname1,
java.lang.String cname2,
int n)
public java.lang.Object getFieldValue(java.lang.String field,
java.lang.String key)
Specified by: getFieldValue in interface SearchEngine
Parameters: field - The name of the field for which we want the value.
key - The key of the document whose value we want.
Returns: an Object of the appropriate type for the named field. If the named field is not a saved field, or if the given document key is not in the index, then null is returned.
Note that if there are multiple values for the given field, there is no guarantee which of the values will be returned by this method.
See Also: getAllFieldValues(java.lang.String, java.lang.String)
public java.util.Collection getFieldNames()
public DocumentVector getDocumentVector(java.lang.String key)
Specified by: getDocumentVector in interface SearchEngine
Parameters: key - The key for the document whose vector we are to retrieve.
Returns: a DocumentVector containing the vector for this document.
See Also: DocumentVector
public DocumentVector getDocumentVector(java.lang.String key,
java.lang.String field)
Specified by: getDocumentVector in interface SearchEngine
Parameters: key - The key for the document whose vector we are to retrieve.
field - the field for which we want a document vector. If this parameter is null, then a vector containing the terms from all vectored fields in the document is returned. If this value is the empty string, then a vector for the contents of the document that are not in any field is returned. If this value is the name of a field that was not vectored during indexing, an empty vector will be returned.
Returns: a DocumentVector containing the vector for this document.
See Also: DocumentVector
public DocumentVector getDocumentVector(java.lang.String key,
WeightedField[] fields)
Specified by: getDocumentVector in interface SearchEngine
Parameters: key - the key of the document whose vector we will return
fields - the fields from which the document vector will be composed.
Returns: the composite document vector, or null if that key does not appear in this index.
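The composite vector is described as a linear combination of vectored fields. A minimal sketch of that idea over sparse term-weight maps follows. Representing WeightedField[] as a field-name-to-weight map, and normalizing the result to unit length, are assumptions rather than documented behaviour; CompositeVectorSketch is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical weighted combination of per-field term vectors; not Minion code. */
final class CompositeVectorSketch {
    static Map<String, Double> combine(Map<String, Map<String, Double>> fieldVectors,
                                       Map<String, Double> fieldWeights) {
        Map<String, Double> out = new HashMap<>();
        for (Map.Entry<String, Double> fw : fieldWeights.entrySet()) {
            Map<String, Double> vec = fieldVectors.getOrDefault(fw.getKey(), Map.of());
            for (Map.Entry<String, Double> e : vec.entrySet()) {
                // Each term's weight is the sum over fields of (field weight * term weight).
                out.merge(e.getKey(), fw.getValue() * e.getValue(), Double::sum);
            }
        }
        // Assumed: scale the combined vector to unit Euclidean length.
        double norm = Math.sqrt(out.values().stream().mapToDouble(w -> w * w).sum());
        if (norm > 0) {
            out.replaceAll((term, w) -> w / norm);
        }
        return out;
    }
}
```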
public DocumentVector getDocumentVector(Document doc,
java.lang.String field)
throws SearchEngineException
Specified by: getDocumentVector in interface SearchEngine
Parameters: doc - a document for which we want a document vector. This document may be in the index or may be generated via the SearchEngine.createDocument(java.lang.String) method. The document will be processed as though it were being indexed in order to extract the appropriate document vector, but the data resulting from this processing will not be added to the index.
field - the field for which we want a document vector. If this parameter is null, then a vector containing the terms from all vectored fields in the document is returned. If this value is the empty string, then a vector for the contents of the document that are not in any field is returned. If this value is the name of a field that was not vectored during indexing, an empty vector will be returned.
Returns: … field parameter
Throws: SearchEngineException
public DocumentVector getDocumentVector(Document doc,
WeightedField[] fields)
throws SearchEngineException
Specified by: getDocumentVector in interface SearchEngine
Parameters: doc - a document for which we want a document vector. This document may be in the index or may be generated via the SearchEngine.createDocument(java.lang.String) method. The document will be processed as though it were being indexed in order to extract the appropriate document vector, but the data resulting from this processing will not be added to the index.
fields - the fields for which we want a document vector.
Throws: SearchEngineException
public DocKeyEntry getDocumentTerm(java.lang.String key)
public ResultSet getSimilar(java.lang.String key,
java.lang.String name)
Specified by: getSimilar in interface SearchEngine
Parameters: key - the key of the document to which we'll compute similarity.
name - the name of the field containing the feature vectors that we'll use in the similarity computation.
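getSimilar is documented as ordering documents by the Euclidean distance between feature vectors stored in a field. The hypothetical SimilaritySketch below shows that ranking over an in-memory map from document key to feature vector. Unlike the real method, it returns keys instead of a ResultSet, places the query document itself first (at distance 0), and returns an empty list for an unknown key; those choices are assumptions for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Hypothetical Euclidean-distance ranking; not Minion's getSimilar. */
final class SimilaritySketch {
    static double distance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Keys ordered from most to least similar, i.e. by increasing distance. */
    static List<String> rank(Map<String, double[]> features, String queryKey) {
        double[] q = features.get(queryKey);
        if (q == null) {
            return new ArrayList<>(); // unknown key: nothing to compare against
        }
        List<String> keys = new ArrayList<>(features.keySet());
        keys.sort(Comparator.comparingDouble((String k) -> distance(q, features.get(k))));
        return keys;
    }
}
```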
public double getDistance(java.lang.String k1,
java.lang.String k2,
java.lang.String name)
Specified by: getDistance in interface SearchEngine
Parameters: k1 - the first key
k2 - the second key
name - the name of the feature vector field for which we want the distance
Returns: … Double.POSITIVE_INFINITY is returned.
public void purge()
Specified by: purge in interface SearchEngine
public boolean merge()
If you set the asyncMerges property to true, you will need to call this method periodically to cause merges to happen. If you do not, you may run out of file handles, leading to exceptions.
Specified by: merge in interface SearchEngine
Returns: true if a merge was performed, false otherwise.
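When asyncMerges is enabled, the application has to call merge itself. A minimal sketch of doing that on a fixed schedule with java.util.concurrent.ScheduledExecutorService follows. PeriodicMergeSketch is hypothetical, and passing engine::merge as the BooleanSupplier assumes merge throws no checked exception, as its signature above suggests.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical helper that invokes a merge operation periodically. */
final class PeriodicMergeSketch {
    static ScheduledExecutorService start(BooleanSupplier merge, long periodMillis) {
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor();
        exec.scheduleWithFixedDelay(() -> {
            try {
                merge.getAsBoolean(); // true if a merge was performed
            } catch (RuntimeException e) {
                // If an exception escaped here, all future runs would be silently cancelled.
                e.printStackTrace();
            }
        }, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return exec;
    }
}
```

With a real engine this might be started as PeriodicMergeSketch.start(engine::merge, 60_000), and the returned executor shut down before the engine is closed so that no merge runs concurrently with close(). The one-minute period is arbitrary.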
public void optimize()
throws SearchEngineException
optimize in interface SearchEngineSearchEngineException - If there is any error during the merge.
public void recover()
throws SearchEngineException
recover in interface SearchEngineSearchEngineException - If there is any error during the recovery.public java.util.Iterator<Document> getDocumentIterator()
getDocumentIterator in interface SearchEngine
public void close()
throws SearchEngineException
Specified by: close in interface SearchEngine
Throws: SearchEngineException - If there is any error closing the engine.
public java.lang.String getName()
Specified by: getName in interface SearchEngine
null if none has been assigned.
public int getNDocs()
Specified by: getNDocs in interface SearchEngine
public QueryConfig getQueryConfig()
Specified by: getQueryConfig in interface SearchEngine
public SimpleIndexer getSimpleIndexer()
Specified by: getSimpleIndexer in interface SearchEngine
public HLPipeline getHLPipeline()
Specified by: getHLPipeline in interface SearchEngine
Returns: a Pipeline that can be used to highlight passages in
documents returned by a search.
public java.lang.String toString()
Overrides: toString in class java.lang.Object
public PartitionManager getPM()
Specified by: getPM in interface SearchEngine
public PartitionManager getManager()
Specified by: getManager in interface SearchEngine
public ClassifierManager getClassifierManager()
public ClusterManager getClusterManager()
public void flushClassifiers()
throws SearchEngineException
Specified by: flushClassifiers in interface SearchEngine
Throws: SearchEngineException - if there is any error
dumping the classifiers.
public void trainClass(ResultSet results,
java.lang.String className,
java.lang.String fieldName)
throws SearchEngineException
Description copied from interface: Classifier
Specified by: trainClass in interface Classifier
Parameters: results - the set of documents to use for training the classifier
className - the name of the class to create or replace
fieldName - the name of the field where the results of the classifier
should be stored.
Throws: SearchEngineException - If there is any error training the classifier
public void trainClass(ResultSet results,
java.lang.String className,
java.lang.String fieldName,
java.lang.String fromField)
throws SearchEngineException
Description copied from interface: Classifier
Specified by: trainClass in interface Classifier
Parameters: results - the set of documents to use for training the classifier
className - the name of the class to create or replace
fieldName - the name of the field where the results of the classifier
should be stored.
fromField - the vectored field from which we should build the classifiers.
If this parameter is null then data from all indexed fields will be used. If this
parameter is the empty string, then data from the "body" field will be used.
Throws: SearchEngineException - If there is any error training the classifier
public void trainClass(ResultSet results,
java.lang.String className,
java.lang.String fieldName,
Progress p)
throws SearchEngineException
Description copied from interface: Classifier
Specified by: trainClass in interface Classifier
Parameters: results - the set of documents to use for training the classifier
className - the name of the class to create or replace
fieldName - the name of the field where the results of the classifier
should be stored.
p - a progress monitor that will be notified as training proceeds
Throws: SearchEngineException - If there is any error training the classifier
public void trainClass(ResultSet results,
java.lang.String className,
java.lang.String fieldName,
java.lang.String fromField,
Progress progress)
throws SearchEngineException
Specified by: trainClass in interface Classifier
Parameters: results - the set of documents to use for training the classifier
className - the name of the class to create or replace
fieldName - the name of the field where the results of the classifier
should be stored.
fromField - the vectored field from which we should build the classifiers.
If this parameter is null then data from all indexed fields will be used. If this
parameter is the empty string, then data from the "body" field will be used.
progress - where to send progress events
Throws: SearchEngineException - If there is any error training the classifier
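The three cases of the fromField parameter can be restated as a small helper. This is a hypothetical sketch: FromFieldRule and fieldsForTraining are illustrative names, not Minion methods, and the code only mirrors the documented contract; it makes no claim about how SearchEngineImpl actually gathers field data internally.

```java
import java.util.List;

class FromFieldRule {

    /**
     * Hypothetical helper mirroring the documented fromField contract of trainClass:
     * null selects all indexed fields, "" selects the "body" field, and any other
     * value names a single vectored field.
     */
    static List<String> fieldsForTraining(String fromField, List<String> indexedFields) {
        if (fromField == null) {
            return List.copyOf(indexedFields); // null: data from all indexed fields
        }
        if (fromField.isEmpty()) {
            return List.of("body");            // empty string: the "body" field
        }
        return List.of(fromField);             // otherwise: the named vectored field
    }
}
```

Note the distinction between null and the empty string: passing "" does not mean "no field", it selects the body field.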
public void classify(java.lang.String[] docKeys,
java.lang.String[] classNames)
throws SearchEngineException
Specified by: classify in interface Classifier
Parameters: docKeys - the keys of the documents to classify
classNames - the classes to assign the documents to
Throws: SearchEngineException - if there is any error running the classifiers
public void reclassifyIndex(java.lang.String className)
throws SearchEngineException
Specified by: reclassifyIndex in interface Classifier
Parameters: className - the class to reclassify all documents against
Throws: SearchEngineException - If there is any error training the classifiers
public ResultSet getTrainingDocuments(java.lang.String className)
throws SearchEngineException
Specified by: getTrainingDocuments in interface Classifier
Parameters: className - the name of a class
Throws: SearchEngineException - If there is any error retrieving the training documents
public java.lang.String[] getClasses()
Specified by: getClasses in interface Classifier
public ClassifierModel getClassifier(java.lang.String name)
public IndexConfig getIndexConfig()
Specified by: getIndexConfig in interface SearchEngine
public QueryConfig getQC()
public MetaDataStore getMetaDataStore()
throws SearchEngineException
Specified by: getMetaDataStore in interface SearchEngine
Throws: SearchEngineException - if there is any error
getting the metadata store
public boolean checkLowMemory()
protected double toMB(long x)
public com.sun.labs.util.props.ConfigurationManager getConfigurationManager()
public void newProperties(com.sun.labs.util.props.PropertySheet ps)
throws com.sun.labs.util.props.PropertyException
Specified by: newProperties in interface com.sun.labs.util.props.Configurable
Throws: com.sun.labs.util.props.PropertyException
public boolean getLongIndexingRun()
Returns: true if this engine is being used for a long indexing
run, in which case term statistics dictionaries and document vector lengths
will not be calculated until the engine is shut down.
public void setLongIndexingRun(boolean longIndexingRun)
Specified by: setLongIndexingRun in interface SearchEngine
Parameters: longIndexingRun - true if this is a long indexing run
public void setQueryConfig(QueryConfig queryConfig)
Description copied from interface: SearchEngine
Specified by: setQueryConfig in interface SearchEngine
Parameters: queryConfig - a set of properties describing the query
configuration.
public void export(java.io.PrintWriter o)
throws java.io.IOException
Description copied from interface: SearchEngine
Specified by: export in interface SearchEngine
Parameters: o - a print writer to which the index will be exported.
Throws: java.io.IOException - if there is any error writing the data
public java.util.List getProfilers()
public void setProfilers(java.util.List profilers)