com.sun.labs.minion.indexer.dictionary
Class MemoryDictionary

java.lang.Object
  extended by com.sun.labs.minion.indexer.dictionary.MemoryDictionary
All Implemented Interfaces:
Dictionary, java.lang.Iterable<QueryEntry>
Direct Known Subclasses:
MemoryBiGramDictionary

public class MemoryDictionary
extends java.lang.Object
implements Dictionary

A dictionary that will be used during indexing. The entries will be stored in a Map.

The dictionary is instantiated with the class of the entries that it will contain. It can create entries of the appropriate type, given the name that will map to each entry. There is a special case for entries that store information for the case-insensitive version of a particular name.

When the dictionary is written to disk, the entries are sorted by name. At that point, the IDs of the entries may be reassigned in name order (they were originally assigned in order of addition to the dictionary). If so, a mapping between old and new IDs is stored and can be retrieved by any caller that needs it.
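
The renumbering described above can be sketched as follows. This is a minimal illustration, not the Minion implementation: RenumberSketch, SketchEntry, and sortAndRenumber are hypothetical names, and the sketch assumes IDs start at 1 and that the map runs from old ID to new ID.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for illustration; these are not the Minion classes.
final class RenumberSketch {
    static final class SketchEntry {
        final String name;
        int id;

        SketchEntry(String name, int id) {
            this.name = name;
            this.id = id;
        }
    }

    /**
     * Sorts the entries by name in place, renumbers them 1..n in that order,
     * and returns a map where oldToNew[oldId] is the new ID.
     * Assumes the old IDs are exactly 1..n (index 0 is unused).
     */
    static int[] sortAndRenumber(List<SketchEntry> entries) {
        entries.sort(Comparator.comparing((SketchEntry e) -> e.name));
        int[] oldToNew = new int[entries.size() + 1];
        int newId = 0;
        for (SketchEntry e : entries) {
            oldToNew[e.id] = ++newId;
            e.id = newId;
        }
        return oldToNew;
    }

    public static void main(String[] args) {
        List<SketchEntry> entries = new ArrayList<>();
        int id = 0;
        for (String name : new String[] {"pear", "apple", "mango"}) {
            entries.add(new SketchEntry(name, ++id)); // IDs in order of addition
        }
        int[] oldToNew = sortAndRenumber(entries);
        for (SketchEntry e : entries) {
            System.out.println(e.name + " -> " + e.id);
        }
        System.out.println(Arrays.toString(oldToNew));
    }
}
```

Keeping the map as a plain int array indexed by old ID makes lookups constant-time, which matters when many postings have to be remapped later.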

Note that the size of the dictionary at dump time and the


Nested Class Summary
static class MemoryDictionary.IDMap
          An enumeration of the kinds of ID maps that may have to be built when dumping a dictionary.
 class MemoryDictionary.MemoryDictionaryIterator
          A class that implements a dictionary iterator for this dictionary.
static class MemoryDictionary.Renumber
          An enumeration of the kinds of renumbering that may need to be done when dumping a dictionary to disk.
 
Field Summary
protected  java.lang.Class entryClass
          The class of the entries that we will be holding.
protected  int id
          The ID that we will assign to entries as they are added.
protected  int[] idMap
          A map from the IDs assigned before sorting to the IDs assigned after sorting.
protected static java.lang.String logTag
          The tag for this module.
protected  java.util.Map<java.lang.Object,Entry> map
          A map to hold the entries.
protected  Partition part
          The partition with which this dictionary is associated.
 
Constructor Summary
MemoryDictionary(java.lang.Class entryClass)
          Creates a dictionary that can be used during indexing.
 
Method Summary
 void clear()
          Clears the dictionary, emptying it of all data.
 IndexEntry[] dump(java.lang.String path, NameEncoder encoder, PartitionStats partStats, java.io.RandomAccessFile dictFile, PostingsOutput[] postOut, MemoryDictionary.Renumber renumber, MemoryDictionary.IDMap idMapType, int[] postIDMap)
          Dumps the dictionary and the associated postings to files.
 IndexEntry[] dump(java.lang.String path, NameEncoder encoder, java.io.RandomAccessFile dictFile, PostingsOutput[] postOut, MemoryDictionary.Renumber renumber, MemoryDictionary.IDMap idMap, int[] postIDMap)
          Dumps the dictionary and the associated postings to files.
 void dumpPrepare(IndexEntry[] sortedEntries)
          Prepares a dictionary for dumping.
 QueryEntry get(java.lang.Object name)
          Gets an entry from the dictionary, given the name for the entry.
 java.lang.Class getEntryClass()
           
 int[] getIdMap()
          Gets a map from the IDs assigned before sorting to the IDs assigned after sorting.
 java.util.Set<java.lang.Object> getKeys()
           
 int getMaxId()
          Gets the largest ID in this dictionary as of the time the method is called.
 Partition getPartition()
          Gets the partition to which this dictionary belongs.
protected static int getSize(java.lang.Object name)
          Given a name, figure out how big it is in bytes.
 DictionaryIterator iterator()
          Gets an iterator for the entries in the dictionary.
 IndexEntry newEntry(java.lang.Object name)
          Gets a new, possibly cased, entry that can be added to this dictionary.
 void processEntry(IndexEntry e)
          Processes a single entry before dumping it.
 IndexEntry put(java.lang.Object name, IndexEntry e)
          Puts an entry into the dictionary.
 Entry remove(java.lang.Object name)
          Deletes an entry from the dictionary, given the name for the entry.
 void setPartition(Partition partition)
           
protected  IndexEntry simpleNewEntry(java.lang.Object name)
          Gets a new entry that can be added to this dictionary.
 int size()
          Gets the number of entries in the dictionary.
protected  IndexEntry[] sort(MemoryDictionary.Renumber renumber, MemoryDictionary.IDMap idMapType)
          Sorts the dictionary entries.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

part

protected Partition part
The partition with which this dictionary is associated.


map

protected java.util.Map<java.lang.Object,Entry> map
A map to hold the entries.


entryClass

protected java.lang.Class entryClass
The class of the entries that we will be holding.


id

protected int id
The ID that we will assign to entries as they are added.


idMap

protected int[] idMap
A map from the IDs assigned before sorting to the IDs assigned after sorting. Which way the map goes depends on the renumber parameter of dump(String, NameEncoder, RandomAccessFile, PostingsOutput[], MemoryDictionary.Renumber, MemoryDictionary.IDMap, int[]).


logTag

protected static java.lang.String logTag
The tag for this module.

Constructor Detail

MemoryDictionary

public MemoryDictionary(java.lang.Class entryClass)
Creates a dictionary that can be used during indexing.

Method Detail

getEntryClass

public java.lang.Class getEntryClass()

getKeys

public java.util.Set<java.lang.Object> getKeys()

simpleNewEntry

protected IndexEntry simpleNewEntry(java.lang.Object name)
Gets a new entry that can be added to this dictionary.

Parameters:
name - The name of the entry.
Returns:
a new entry for the dictionary, or null if there is an error instantiating the entry.
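
One way to create entries from a stored entry class, returning null on failure as described above, is reflection. The sketch below is an assumption about the general technique only: EntryFactorySketch and NamedEntry are hypothetical, and the real entry classes may expose a different constructor.

```java
import java.lang.reflect.Constructor;

// Hypothetical stand-in for illustration; these are not the Minion classes.
final class EntryFactorySketch {
    /** A hypothetical entry type with a public constructor that takes the name. */
    public static class NamedEntry {
        final Object name;

        public NamedEntry(Object name) {
            this.name = name;
        }
    }

    /**
     * Instantiates entryClass through its public (Object) constructor.
     * Returns null if the class has no such constructor or instantiation fails.
     */
    static Object newEntry(Class<?> entryClass, Object name) {
        try {
            Constructor<?> ctor = entryClass.getConstructor(Object.class);
            return ctor.newInstance(name);
        } catch (ReflectiveOperationException ex) {
            return null;
        }
    }

    public static void main(String[] args) {
        Object ok = newEntry(NamedEntry.class, "dog");
        Object bad = newEntry(String.class, "dog"); // String has no (Object) constructor
        System.out.println((ok != null) + " " + (bad == null));
    }
}
```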

newEntry

public IndexEntry newEntry(java.lang.Object name)
Gets a new, possibly cased, entry that can be added to this dictionary. If the entry is cased, this method assumes that the name provided is a String. It also sets the new entry's pointer to the lower-case version of the entry so that cased postings can be added correctly.

Parameters:
name - The name of the new entry.
Returns:
a new entry or null if there is an error instantiating the entry.
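
The link between a cased entry and its lower-case counterpart might look like the sketch below. Everything here is hypothetical (CasedEntrySketch, CasedEntry, entryFor, the lowerCase field); in particular, the source does not say how the real method finds or creates the lower-case entry, so the sketch simply registers it in the same map.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for illustration; these are not the Minion classes.
final class CasedEntrySketch {
    static final class CasedEntry {
        final String name;
        CasedEntry lowerCase; // points at the case-insensitive entry

        CasedEntry(String name) {
            this.name = name;
        }
    }

    private final Map<String, CasedEntry> map = new HashMap<>();

    /** Returns the entry for name, creating it and its lower-case entry if needed. */
    CasedEntry entryFor(Object name) {
        String s = (String) name; // cased entries assume a String name
        CasedEntry e = map.get(s);
        if (e != null) {
            return e;
        }
        e = new CasedEntry(s);
        map.put(s, e);
        String lower = s.toLowerCase(Locale.ROOT);
        // A name that is already lower case points at itself.
        e.lowerCase = lower.equals(s) ? e : entryFor(lower);
        return e;
    }

    public static void main(String[] args) {
        CasedEntrySketch dict = new CasedEntrySketch();
        CasedEntry a = dict.entryFor("Dog");
        CasedEntry b = dict.entryFor("DOG");
        System.out.println(a.lowerCase == b.lowerCase); // both share the "dog" entry
    }
}
```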

put

public IndexEntry put(java.lang.Object name,
                      IndexEntry e)
Puts an entry into the dictionary. This will assign an ID to the entry if it is not already in the dictionary.

Specified by:
put in interface Dictionary
Parameters:
name - The name of the entry.
e - The entry to put in the dictionary.
Returns:
Any previous value stored in the dictionary under the name of the given entry.
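
The contract above (assign an ID only for a new name, return any previous value) can be illustrated with a small stand-in. PutSketch and SketchEntry are hypothetical, IDs are assumed to start at 1, and reusing the old ID when a name is replaced is an assumption the source does not confirm.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for illustration; these are not the Minion classes.
final class PutSketch {
    static final class SketchEntry {
        final Object name;
        int id;

        SketchEntry(Object name) {
            this.name = name;
        }
    }

    private final Map<Object, SketchEntry> map = new HashMap<>();
    private int id = 0; // the last ID handed out

    /** Stores e under name and returns the previous entry for that name, or null. */
    SketchEntry put(Object name, SketchEntry e) {
        SketchEntry old = map.put(name, e);
        if (old == null) {
            e.id = ++id;   // new name: assign the next ID
        } else {
            e.id = old.id; // assumption: a replacement keeps the existing ID
        }
        return old;
    }

    int getMaxId() {
        return id;
    }

    public static void main(String[] args) {
        PutSketch dict = new PutSketch();
        dict.put("a", new SketchEntry("a"));
        dict.put("b", new SketchEntry("b"));
        SketchEntry previous = dict.put("a", new SketchEntry("a"));
        System.out.println(previous.id + " " + dict.getMaxId());
    }
}
```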

getSize

protected static int getSize(java.lang.Object name)
Given a name, figure out how big it is in bytes.


get

public QueryEntry get(java.lang.Object name)
Gets an entry from the dictionary, given the name for the entry.

Specified by:
get in interface Dictionary
Parameters:
name - The name of the entry.
Returns:
The entry associated with the name, or null if the name doesn't appear in the dictionary.

remove

public Entry remove(java.lang.Object name)
Deletes an entry from the dictionary, given the name for the entry.

Parameters:
name - The name of the entry.
Returns:
The entry associated with the name, or null if the name doesn't appear in the dictionary.

getPartition

public Partition getPartition()
Gets the partition to which this dictionary belongs.

Specified by:
getPartition in interface Dictionary
Returns:
the partition

setPartition

public void setPartition(Partition partition)

size

public int size()
Gets the number of entries in the dictionary.

Specified by:
size in interface Dictionary
Returns:
the number of entries in the dictionary.

iterator

public DictionaryIterator iterator()
Gets an iterator for the entries in the dictionary.

Specified by:
iterator in interface Dictionary
Specified by:
iterator in interface java.lang.Iterable<QueryEntry>
Returns:
An iterator for the entries in the dictionary.

clear

public void clear()
Clears the dictionary, emptying it of all data.


sort

protected IndexEntry[] sort(MemoryDictionary.Renumber renumber,
                            MemoryDictionary.IDMap idMapType)
Sorts the dictionary entries. Depending on the value of renumber, new IDs may be assigned to the entries in their new, sorted order.

Parameters:
renumber - whether the entries in the dictionary should be renumbered in order of the names
idMapType - what kind of map (if any) should be kept between the old and new IDs
Returns:
The entries, in sorted order.

getMaxId

public int getMaxId()
Gets the largest ID in this dictionary as of the time the method is called.

Returns:
the most recently assigned ID

getIdMap

public int[] getIdMap()
Gets a map from the IDs assigned before sorting to the IDs assigned after sorting.

Returns:
an array of int containing the map.

dumpPrepare

public void dumpPrepare(IndexEntry[] sortedEntries)
Prepares a dictionary for dumping. This can be used by subclasses to do anything that needs doing before dumping begins.

Parameters:
sortedEntries - entries from another dictionary.

dump

public IndexEntry[] dump(java.lang.String path,
                         NameEncoder encoder,
                         java.io.RandomAccessFile dictFile,
                         PostingsOutput[] postOut,
                         MemoryDictionary.Renumber renumber,
                         MemoryDictionary.IDMap idMap,
                         int[] postIDMap)
                  throws java.io.IOException
Dumps the dictionary and the associated postings to files. Once the dumping is complete, the pointer in the file must be pointing to a position *after* the data just written. This is so that we may dump multiple dictionaries and postings types to the same channel.

Parameters:
path - The path to the directory where the dictionary should be dumped.
encoder - An encoder for the names of the entries.
dictFile - The file where the dictionary will be dumped.
postOut - The place where the postings will be dumped.
renumber - How entries should be renumbered at dump time.
idMap - what kind of map from old to new IDs should be kept
postIDMap - A map from old IDs used in the postings to new IDs. This map will be given to the postings from the dictionary before they are dumped to disk, allowing the postings to be remapped before the dump. This is useful when the postings in one dictionary contain IDs that have been remapped during a dump operation, such as those in a document dictionary. If this value is null, no remapping will take place.
Returns:
An array of entries in the order that they were dumped.
Throws:
java.io.IOException - When there is an error writing to either of the channels.
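
The requirement that the file pointer end up after the data just written is what lets several dumps share one file. The sketch below shows that idea with java.io.RandomAccessFile only; SequentialDumpSketch, dumpNames, readNames, and the length-prefixed layout are all hypothetical and do not reflect the actual on-disk format.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for illustration; this is not the Minion file format.
final class SequentialDumpSketch {
    /** Writes names at the current position and leaves the pointer after them. */
    static long dumpNames(RandomAccessFile file, String[] names) throws IOException {
        long start = file.getFilePointer();
        file.writeInt(names.length);
        for (String name : names) {
            byte[] bytes = name.getBytes(StandardCharsets.UTF_8);
            file.writeInt(bytes.length);
            file.write(bytes);
        }
        return start; // the caller records where this block begins
    }

    /** Reads back a block written by dumpNames, starting at offset. */
    static String[] readNames(RandomAccessFile file, long offset) throws IOException {
        file.seek(offset);
        String[] names = new String[file.readInt()];
        for (int i = 0; i < names.length; i++) {
            byte[] bytes = new byte[file.readInt()];
            file.readFully(bytes);
            names[i] = new String(bytes, StandardCharsets.UTF_8);
        }
        return names;
    }

    /** Dumps two blocks back to back into one temp file and reads both back. */
    static String[][] roundTrip(String[] first, String[] second) {
        try {
            File tmp = File.createTempFile("dict-sketch", ".bin");
            tmp.deleteOnExit();
            try (RandomAccessFile file = new RandomAccessFile(tmp, "rw")) {
                long firstStart = dumpNames(file, first);
                long secondStart = dumpNames(file, second); // starts where the first ended
                return new String[][] {
                    readNames(file, firstStart), readNames(file, secondStart)
                };
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        String[][] back = roundTrip(new String[] {"apple", "pear"}, new String[] {"doc1"});
        System.out.println(back[0].length + " + " + back[1].length + " names read back");
    }
}
```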

dump

public IndexEntry[] dump(java.lang.String path,
                         NameEncoder encoder,
                         PartitionStats partStats,
                         java.io.RandomAccessFile dictFile,
                         PostingsOutput[] postOut,
                         MemoryDictionary.Renumber renumber,
                         MemoryDictionary.IDMap idMapType,
                         int[] postIDMap)
                  throws java.io.IOException
Dumps the dictionary and the associated postings to files. Once the dumping is complete, the pointer in the file must be pointing to a position *after* the data just written. This is so that we may dump multiple dictionaries and postings types to the same channel.

Parameters:
path - The path to the directory where the dictionary should be dumped.
encoder - An encoder for the names of the entries.
partStats - a set of partition statistics that we will contribute to while dumping the dictionary. May be null.
dictFile - The file where the dictionary will be dumped.
postOut - The place where the postings will be dumped.
renumber - How entries should be renumbered at dump time.
idMapType - what kind of map from old to new IDs should be kept
postIDMap - A map from old IDs used in the postings to new IDs. This map will be given to the postings from the dictionary before they are dumped to disk, allowing the postings to be remapped before the dump. This is useful when the postings in one dictionary contain IDs that have been remapped during a dump operation, such as those in a document dictionary. If this value is null, no remapping will take place.
Returns:
An array of entries in the order that they were dumped.
Throws:
java.io.IOException - When there is an error writing to either of the channels.
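
The effect of the postIDMap parameter can be pictured as a lookup applied to every ID stored in a postings list before it is written. This is a sketch under assumptions: PostingsRemapSketch and remap are hypothetical, postings are modeled as a plain int array, and re-sorting after remapping is an assumption rather than documented behavior.

```java
import java.util.Arrays;

// Hypothetical stand-in for illustration; these are not the Minion classes.
final class PostingsRemapSketch {
    /**
     * Rewrites each ID in postingIds through postIDMap (postIDMap[oldId] == newId).
     * A null map means no remapping, so the input is returned unchanged.
     */
    static int[] remap(int[] postingIds, int[] postIDMap) {
        if (postIDMap == null) {
            return postingIds;
        }
        int[] remapped = new int[postingIds.length];
        for (int i = 0; i < postingIds.length; i++) {
            remapped[i] = postIDMap[postingIds[i]];
        }
        Arrays.sort(remapped); // assumption: postings are kept in increasing ID order
        return remapped;
    }

    public static void main(String[] args) {
        int[] oldToNew = {0, 3, 1, 2}; // index 0 unused; old ID 1 became 3, and so on
        System.out.println(Arrays.toString(remap(new int[] {1, 3}, oldToNew)));
    }
}
```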

processEntry

public void processEntry(IndexEntry e)
Processes a single entry before dumping it. Does nothing at this level.

Parameters:
e - the entry to process.
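
A method that does nothing at this level is typically a hook for subclasses. The sketch below shows that pattern in isolation: HookSketch and CountingHookSketch are hypothetical, entries are modeled as strings, and the assumption that the dump loop calls the hook once per entry is inferred from the description rather than confirmed by the source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for illustration; these are not the Minion classes.
class HookSketch {
    /** Hook called for each entry before it is written; does nothing here. */
    protected void processEntry(String entry) {
    }

    /** Walks the sorted entries, giving the hook a chance to act on each one. */
    final List<String> dump(List<String> sortedEntries) {
        List<String> written = new ArrayList<>();
        for (String entry : sortedEntries) {
            processEntry(entry);
            written.add(entry); // stands in for writing the entry to disk
        }
        return written;
    }

    public static void main(String[] args) {
        CountingHookSketch dict = new CountingHookSketch();
        dict.dump(Arrays.asList("apple", "pear"));
        System.out.println(dict.processed);
    }
}

// A subclass that overrides the hook to count the entries it sees.
class CountingHookSketch extends HookSketch {
    int processed;

    @Override
    protected void processEntry(String entry) {
        processed++;
    }
}
```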