ClassifierDiskPartition (Minion Search Engine)

com.sun.labs.minion.classification
Class ClassifierDiskPartition

java.lang.Object
  com.sun.labs.minion.indexer.partition.Partition
      com.sun.labs.minion.indexer.partition.DiskPartition
          com.sun.labs.minion.classification.ClassifierDiskPartition

All Implemented Interfaces: Closeable, com.sun.labs.util.props.Component, com.sun.labs.util.props.Configurable, java.lang.Comparable<Partition>

public class ClassifierDiskPartition
extends DiskPartition

A disk partition that will hold classifier data.

Field Summary
`protected ClassifierModel[]`	`allModels`
`protected long`	`dataStart` The place where the model specific data starts in the file.
`protected java.util.Map<java.lang.String,ClassificationFeature>`	`features` The result of inverting the document vectors, computed once and kept here. The main dictionary in a classifier partition doesn't store the feature scores for the documents (i.e., the classifiers), so bulk evaluation is not possible without this inversion.
`protected static java.lang.String`	`logTag`
`protected ClassifierModel`	`modelInstance`
`protected java.util.Map<java.lang.String,ClassifierModel>`	`modelMap`
`protected java.io.RandomAccessFile`	`msd` The file containing the model specific data for this partition.
`protected ReadableBuffer`	`msdOff` A buffer containing the offsets for the model specific data for each of our classifiers.
`protected int`	`nModels` The number of models that we're storing.

Fields inherited from class com.sun.labs.minion.indexer.partition.DiskPartition
`BUFF_SIZE, deletions, delFile, delFileLock, docDict, docDictFile, docPostFile, documentDictFactory, dvl, ignored, mainDict, mainFiles, MATCH_CUT_OFF, MIN_LEN, removedFile, termCache`

Fields inherited from class com.sun.labs.minion.indexer.partition.Partition
`DICT_OFFSETS_SIZE, docDictFactory, entryClass, entryName, indexConfig, mainDictFactory, mainDictFile, mainPostFiles, manager, maxID, nEntries, partNumber, PROP_DOC_DICT_FACTORY, PROP_INDEX_CONFIG, PROP_MAIN_DICT_FACTORY, PROP_PARTITION_MANAGER, stats`

Constructor Summary
`ClassifierDiskPartition(java.lang.Integer partNum, ClassifierManager manager, DictionaryFactory mainDictFactory, DictionaryFactory documentDictFactory)` Constructs a disk partition for a specific partition number.

Method Summary
`int`	`assembleResults(float[] scores, java.lang.String modelName, java.lang.String resultField, java.util.Map<java.lang.String,ClassificationResult> results)`
`void`	`classify(DiskPartition sdp, ExtraClassification ec, java.util.Map<java.lang.String,ClassificationResult> results)` Classifies all the documents in a disk partition.
`boolean`	`close()` Close the files associated with this partition.
`void`	`findSimilar(ClassifierModel cm, java.util.Map<java.lang.String,java.lang.Float> scores)`
`protected ClassifierModel[]`	`getAllModels()`
`protected ClassifierModel`	`getClassifier(FeatureEntry fe)` Gets a classifier model from an entry in our document dictionary.
`protected ClassifierModel`	`getClassifier(java.lang.String cname)`
`float`	`getDocumentVectorLength(int docID)` Gets the length of a document vector for a given document.
`java.util.Set`	`getFeatures(java.lang.String cname)`
`protected java.util.Map<java.lang.String,ClassificationFeature>`	`invert()`
`protected java.util.Set`	`makeFeatures(FeatureEntry entry)`
`protected void`	`mergeCustom(int newPartNumber, DiskPartition[] sortedParts, int[][] idMaps, int newMaxDocID, int[] docIDStart, int[] nUndel, int[][] docIDMaps)` Merges the model specific data for these classifiers.
`protected static void`	`reap(PartitionManager m, int n)` Reaps the given classifier partition.

Methods inherited from class com.sun.labs.minion.indexer.partition.DiskPartition
`close, createRemoveFile, delete, deleteDocument, deleteDocument, docsAreMerged, getAverageDocumentLength, getCloseTime, getDeletedDocumentsMap, getDelMap, getDocIDMap, getDocumentIterator, getDocumentIterator, getDocumentLength, getDocumentTerm, getDocumentTerm, getDocumentVectorLength, getDocumentVectorLength, getDVL, getInputBuffers, getMainDictionary, getMainDictionaryIterator, getMainDictionaryIterator, getMainIterator, getMaxDocumentID, getMaxTermID, getNDocs, getNEntries, getNTokens, getTerm, getTerm, getTerm, getTerm, getTermCache, initAll, initDocDict, initDVL, initMainDict, initMainFiles, isDeleted, isIndexed, merge, merge, normalize, setCloseTime, syncDeletedMap, toString, updatePartition`

Methods inherited from class com.sun.labs.minion.indexer.partition.Partition
`compareTo, getAllFiles, getAllFiles, getDocFiles, getDocFiles, getIndexConfig, getMainFiles, getMainFiles, getManager, getName, getNumPostingsChannels, getPartitionNumber, getQueryConfig, getStats, newProperties`

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait`

Field Detail

msd

protected java.io.RandomAccessFile msd

The file containing the model specific data for this partition.

msdOff

protected ReadableBuffer msdOff

A buffer containing the offsets for the model specific data for each of our classifiers.

nModels

protected int nModels

The number of models that we're storing.

dataStart

protected long dataStart

The place where the model specific data starts in the file.
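The fields `msd`, `msdOff`, `nModels`, and `dataStart` suggest a file with a table of per-classifier offsets followed by a data region. The following is a minimal sketch of that general idea, not Minion's actual on-disk format: the class `ModelDataLayoutSketch`, the leading model count, the use of `long` offsets relative to the data region, and both method names are assumptions made for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical sketch of an offset table followed by per-model data. */
class ModelDataLayoutSketch {

    /**
     * Writes an assumed layout: an int model count, then one long offset per
     * model (relative to the start of the data region), then the data itself.
     */
    static void write(File f, byte[][] models) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            raf.setLength(0);
            raf.writeInt(models.length);
            long off = 0;
            for (byte[] m : models) {
                raf.writeLong(off);
                off += m.length;
            }
            for (byte[] m : models) {
                raf.write(m);
            }
        }
    }

    /** Reads the data for one model by seeking to dataStart + offset. */
    static byte[] read(File f, int model) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
            int nModels = raf.readInt();
            long[] offsets = new long[nModels];
            for (int i = 0; i < nModels; i++) {
                offsets[i] = raf.readLong();
            }
            // The data region begins right after the offset table.
            long dataStart = raf.getFilePointer();
            long end = (model + 1 < nModels)
                    ? offsets[model + 1]
                    : raf.length() - dataStart;
            byte[] buf = new byte[(int) (end - offsets[model])];
            raf.seek(dataStart + offsets[model]);
            raf.readFully(buf);
            return buf;
        }
    }
}
```

Keeping the offsets separate from the data means a single model can be loaded with one seek, without scanning the data that precedes it.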

logTag

protected static java.lang.String logTag

modelInstance

protected ClassifierModel modelInstance

features

protected java.util.Map<java.lang.String,ClassificationFeature> features

Known limitation (to fix after the open house): the main dictionary in a classifier partition doesn't store the feature scores for the documents (i.e., the classifiers), so bulk evaluation is not possible without inverting the document vectors. The inversion is done once and the result is kept here.
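To illustrate what inverting the document vectors once and keeping the result can look like, here is a minimal sketch. It does not use Minion's types: the class `InvertSketch`, the nested `Map<String, Map<String, Float>>` representation (classifier name to feature weights), and the method names are all hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: invert classifier vectors once and cache the result. */
class InvertSketch {

    // Cached inverted map, built on first use.
    private Map<String, Map<String, Float>> cached;

    /**
     * Turns "classifier -> (feature -> weight)" into
     * "feature -> (classifier -> weight)".
     */
    static Map<String, Map<String, Float>> invert(
            Map<String, Map<String, Float>> vectors) {
        Map<String, Map<String, Float>> inverted = new TreeMap<>();
        for (Map.Entry<String, Map<String, Float>> doc : vectors.entrySet()) {
            for (Map.Entry<String, Float> f : doc.getValue().entrySet()) {
                inverted.computeIfAbsent(f.getKey(), k -> new TreeMap<>())
                        .put(doc.getKey(), f.getValue());
            }
        }
        return inverted;
    }

    /** Returns the cached inversion, computing it only on the first call. */
    synchronized Map<String, Map<String, Float>> features(
            Map<String, Map<String, Float>> vectors) {
        if (cached == null) {
            cached = invert(vectors);
        }
        return cached;
    }
}
```

With the inverted form, every classifier that uses a given feature can be found with a single lookup, which is what bulk evaluation needs.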

allModels

protected ClassifierModel[] allModels

modelMap

protected java.util.Map<java.lang.String,ClassifierModel> modelMap

Constructor Detail

ClassifierDiskPartition

public ClassifierDiskPartition(java.lang.Integer partNum,
                               ClassifierManager manager,
                               DictionaryFactory mainDictFactory,
                               DictionaryFactory documentDictFactory)
                        throws java.io.IOException

Constructs a disk partition for a specific partition number.

Parameters:
  partNum - the number of this partition
  manager - the classifier manager for this partition
Throws:
  java.io.IOException

Method Detail

getClassifier

protected ClassifierModel getClassifier(java.lang.String cname)

findSimilar

public void findSimilar(ClassifierModel cm,
                        java.util.Map<java.lang.String,java.lang.Float> scores)

getAllModels

protected ClassifierModel[] getAllModels()

invert

protected java.util.Map<java.lang.String,ClassificationFeature> invert()

getClassifier

protected ClassifierModel getClassifier(FeatureEntry fe)

Gets a classifier model from an entry in our document dictionary.

classify

public void classify(DiskPartition sdp,
                     ExtraClassification ec,
                     java.util.Map<java.lang.String,ClassificationResult> results)

Classifies all the documents in a disk partition. Uses the classifier model that is defined by the index/query configuration. The result is an array of collections of strings. Each position in the array corresponds to a document with the id of the position. The collection contains strings that represent the names of the classes to which the document belongs. If a position is null, the document belongs to no classes defined in this partition.

Parameters:
  sdp - a disk partition
  ec - a (possibly null) pair of field names. One is the name of the field from which classifiers were built (the "classifier from" field); the other names the document field to classify against (the "document from" field). If this pair is non-null, only classifiers that were built from the contents of the "classifier from" field are considered, and whatever classifiers are applied are applied against the contents of the "document from" field. If this pair is null, classification proceeds as usual.
  results - a map to fill up with classification results
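The description above mentions a result shape in which each array position corresponds to a document id and holds the names of the classes for that document, or null if there are none. A minimal sketch of that shape follows. It is not the Minion implementation: the class `ClassifyShapeSketch`, the per-model score arrays, and the fixed threshold are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of a per-document-id collection of class names. */
class ClassifyShapeSketch {

    /**
     * scoresByModel maps a class (model) name to scores indexed by document id.
     * Position i of the result holds the classes for document i, or null when
     * no score for that document exceeds the (assumed) threshold.
     */
    static List<Set<String>> classesByDocId(
            Map<String, float[]> scoresByModel, int maxDocId, float threshold) {
        List<Set<String>> out = new ArrayList<>(
                Collections.<Set<String>>nCopies(maxDocId + 1, null));
        for (Map.Entry<String, float[]> e : scoresByModel.entrySet()) {
            float[] scores = e.getValue();
            for (int docId = 0; docId <= maxDocId && docId < scores.length; docId++) {
                if (scores[docId] > threshold) {
                    Set<String> classes = out.get(docId);
                    if (classes == null) {
                        classes = new TreeSet<>();
                        out.set(docId, classes);
                    }
                    classes.add(e.getKey());
                }
            }
        }
        return out;
    }
}
```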

assembleResults

public int assembleResults(float[] scores,
                           java.lang.String modelName,
                           java.lang.String resultField,
                           java.util.Map<java.lang.String,ClassificationResult> results)

getFeatures

public java.util.Set getFeatures(java.lang.String cname)

makeFeatures

protected java.util.Set makeFeatures(FeatureEntry entry)

getDocumentVectorLength

public float getDocumentVectorLength(int docID)

Gets the length of a document vector for a given document. For classifier partitions, this is assumed to always be 1.

Overrides: getDocumentVectorLength in class DiskPartition

Parameters:
  docID - the ID of the document for whose vector we want the length
Returns:
  1.
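One consequence of a fixed vector length of 1 can be shown with a small sketch: if a caller normalizes a dot product by the vector lengths, dividing by a document length of 1 leaves the score unchanged. Whether Minion uses the length in exactly this way is an assumption; the class `UnitLengthSketch` and its methods are hypothetical.

```java
/** Hypothetical sketch: normalizing by a document vector length of 1. */
class UnitLengthSketch {

    /** Dot product over the shared prefix of two dense vectors. */
    static float dot(float[] a, float[] b) {
        float sum = 0f;
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /** Length-normalized score; with docLength == 1 only queryLength matters. */
    static float score(float[] query, float queryLength,
                       float[] doc, float docLength) {
        return dot(query, doc) / (queryLength * docLength);
    }
}
```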

mergeCustom

protected void mergeCustom(int newPartNumber,
                           DiskPartition[] sortedParts,
                           int[][] idMaps,
                           int newMaxDocID,
                           int[] docIDStart,
                           int[] nUndel,
                           int[][] docIDMaps)
                    throws java.lang.Exception

Merges the model specific data for these classifiers.

Overrides: mergeCustom in class DiskPartition

Parameters:
  newPartNumber - the number of the new partition
  sortedParts - the sorted list of partitions
  idMaps - a set of maps from old entry ids in the main dictionary to new entry ids in the merged dictionary
  newMaxDocID - the new maximum document id
  docIDStart - the starting doc ids
  nUndel - the number of undeleted documents in each partition
  docIDMaps - doc id maps (see merge)
Throws:
  java.lang.Exception
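As an illustration of how maps from old entry ids to new entry ids can be applied during a merge, here is a minimal sketch. It assumes that `idMaps[p][oldId]` holds the new id for entry `oldId` of partition `p`, which the signature suggests but this page does not state; the class `IdRemapSketch` and its methods are hypothetical.

```java
/** Hypothetical sketch: rewriting entry ids with per-partition id maps. */
class IdRemapSketch {

    /** Maps each old id through idMap, where idMap[oldId] is the new id. */
    static int[] remap(int[] oldIds, int[] idMap) {
        int[] out = new int[oldIds.length];
        for (int i = 0; i < oldIds.length; i++) {
            out[i] = idMap[oldIds[i]];
        }
        return out;
    }

    /** Applies the id map of partition p to the ids stored for partition p. */
    static int[][] remapAll(int[][] idsByPartition, int[][] idMaps) {
        int[][] out = new int[idsByPartition.length][];
        for (int p = 0; p < idsByPartition.length; p++) {
            out[p] = remap(idsByPartition[p], idMaps[p]);
        }
        return out;
    }
}
```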

close

public boolean close()

Close the files associated with this partition.

Overrides: close in class DiskPartition

Returns:
  true if the files were successfully closed.

reap

protected static void reap(PartitionManager m,
                           int n)

Reaps the given classifier partition.

Parameters:
  m - The manager associated with the partition.
  n - The partition number to reap.
