com.sun.labs.minion.retrieval.parser
Class LuceneTransformer

java.lang.Object
  extended by com.sun.labs.minion.retrieval.parser.Transformer
      extended by com.sun.labs.minion.retrieval.parser.LuceneTransformer

public class LuceneTransformer
extends Transformer

This class transforms the abstract syntax tree produced by a JavaCC/JJTree parser into a tree of query elements that the query evaluator can understand.


Constructor Summary
LuceneTransformer()
           
 
Method Summary
protected static java.util.ArrayList collapseAnds(SimpleNode node, boolean parentIsAnd)
          Collapse any AND nodes that are just a chain of ANDs into a single AND node with many children.
protected static java.util.ArrayList collapseOrs(SimpleNode node, boolean parentIsOr)
          Collapse any OR nodes that are just a chain of ORs into a single OR node with many children.
protected static void collapsePhrases(SimpleNode node, TokenCollectorStage tcs)
          Creates phrases for terms that would have been tokenized by the tokenizer upon indexing.
protected static void handleAttributes(SimpleNode node)
          Handles the extra attributes attached to terms in Lucene.
protected static boolean isPassThrough(SimpleNode node)
          Determines if the node has any significance or if it is just a passthrough that can be skipped.
static void main(java.lang.String[] args)
           
protected static QueryElement makeQueryElements(SimpleNode node, int defaultOperator)
          This recursive method creates the tree of query elements based on the root node passed in.
protected static java.util.ArrayList removeClutter(SimpleNode node)
          Removes clutter nodes -- the nodes that have only a single child and provide no additional context.
 QueryElement transformTree(SimpleNode root)
          Transforms an abstract syntax tree provided by JJTree+JavaCC into a tree of QueryElements that can be used by the query evaluator.
 QueryElement transformTree(SimpleNode root, int defaultOperator)
          Transforms an abstract syntax tree provided by JJTree+JavaCC into a tree of QueryElements that can be used by the query evaluator.
 
Methods inherited from class com.sun.labs.minion.retrieval.parser.Transformer
isDoubleQuoted, isQuoted, isSingleQuoted
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

LuceneTransformer

public LuceneTransformer()
Method Detail

main

public static void main(java.lang.String[] args)
                 throws java.lang.Exception
Throws:
java.lang.Exception

transformTree

public QueryElement transformTree(SimpleNode root)
                           throws java.text.ParseException
Transforms an abstract syntax tree provided by JJTree+JavaCC into a tree of QueryElements that can be used by the query evaluator.

Specified by:
transformTree in class Transformer
Parameters:
root - the root node of the tree returned from the Parser
Returns:
the root node of a tree describing a query
Throws:
java.text.ParseException

transformTree

public QueryElement transformTree(SimpleNode root,
                                  int defaultOperator)
                           throws java.text.ParseException
Transforms an abstract syntax tree provided by JJTree+JavaCC into a tree of QueryElements that can be used by the query evaluator.

Specified by:
transformTree in class Transformer
Parameters:
root - the root node of the tree returned from the Parser
defaultOperator - specifies the default operator to use when no other operator is provided between terms in the query. Valid values are defined in the Searcher interface.
Returns:
the root node of a tree describing a query
Throws:
java.text.ParseException

makeQueryElements

protected static QueryElement makeQueryElements(SimpleNode node,
                                                int defaultOperator)
                                         throws java.text.ParseException
This recursive method creates the tree of query elements based on the root node passed in. The tree passed in should already have been pruned as appropriate by the various methods in this class.

Parameters:
node - the root node of the (already pruned) tree to convert
defaultOperator - specifies the default operator to use when no other operator is provided between terms in the query. Valid values are defined in the Searcher interface.
Returns:
the top-level query element.
Throws:
java.text.ParseException

isPassThrough

protected static boolean isPassThrough(SimpleNode node)
Determines if the node has any significance or if it is just a passthrough that can be skipped.

Parameters:
node - the node to check
Returns:
true if the node is insignificant (semantically speaking)

removeClutter

protected static java.util.ArrayList removeClutter(SimpleNode node)
Removes clutter nodes -- the nodes that have only a single child and provide no additional context. The return value is used internally by the recursive calls: if a node determines itself to be clutter, it returns its children in the ArrayList.

Parameters:
node - the node to clean: pass in the root when calling externally
Returns:
the node's children if the node is clutter (and not the root)
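
The pruning described above can be illustrated with a minimal sketch. It does not use the Minion classes: `Node` is a hypothetical stand-in for SimpleNode, the `isRoot` flag is an assumption made for illustration, and returning the node itself when it is not clutter is also an assumption, since the documentation only states what is returned for clutter nodes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in types; not the Minion SimpleNode API.
class ClutterSketch {
    static class Node {
        final String label;   // e.g. "q", "term"
        final String value;   // non-null only when the node carries content of its own
        final List<Node> children = new ArrayList<>();

        Node(String label, String value) {
            this.label = label;
            this.value = value;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    // Assumption: a node is clutter when it has exactly one child and no content.
    static boolean isClutter(Node n) {
        return n.children.size() == 1 && n.value == null;
    }

    // Cleans the subtree below n, then returns what should replace n in its parent:
    // n's children if n is clutter (and not the root), otherwise n itself.
    static List<Node> removeClutter(Node n, boolean isRoot) {
        List<Node> cleaned = new ArrayList<>();
        for (Node child : n.children) {
            cleaned.addAll(removeClutter(child, false));
        }
        n.children.clear();
        n.children.addAll(cleaned);

        if (!isRoot && isClutter(n)) {
            return new ArrayList<>(n.children);
        }
        List<Node> self = new ArrayList<>();
        self.add(n);
        return self;
    }
}
```

In this sketch, a chain such as root, q, q, term collapses so that the term hangs directly off the root.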

collapseOrs

protected static java.util.ArrayList collapseOrs(SimpleNode node,
                                                 boolean parentIsOr)
Collapse any OR nodes that are just a chain of ORs into a single OR node with many children.

Parameters:
node - the node at which to start collapsing
parentIsOr - true if the parent of node is an OR node
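
As an illustration of the collapsing step, the following is a minimal sketch under stated assumptions: `Node` is a hypothetical stand-in for SimpleNode, the label "OR" is an assumed way of identifying OR nodes, and the return convention (hoisted children for an OR under an OR, otherwise the node itself) is a guess at how the internal list might be used. The AND case would be symmetric.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in types; not the Minion SimpleNode API.
class CollapseOrSketch {
    static class Node {
        final String label;
        final List<Node> children = new ArrayList<>();

        Node(String label, Node... kids) {
            this.label = label;
            for (Node k : kids) {
                children.add(k);
            }
        }
    }

    // Flattens chains of OR nodes below 'node'. An OR node whose parent is also
    // an OR hands its children up so the parent can adopt them directly.
    static List<Node> collapseOrs(Node node, boolean parentIsOr) {
        boolean isOr = node.label.equals("OR");

        List<Node> flattened = new ArrayList<>();
        for (Node child : node.children) {
            flattened.addAll(collapseOrs(child, isOr));
        }
        node.children.clear();
        node.children.addAll(flattened);

        if (isOr && parentIsOr) {
            return new ArrayList<>(node.children);
        }
        List<Node> self = new ArrayList<>();
        self.add(node);
        return self;
    }
}
```

With this sketch, a left-nested chain such as OR(OR(OR(a, b), c), d) becomes a single OR with the four children a, b, c, d, while an OR nested under an AND is left in place.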

collapseAnds

protected static java.util.ArrayList collapseAnds(SimpleNode node,
                                                  boolean parentIsAnd)
Collapse any AND nodes that are just a chain of ANDs into a single AND node with many children.

Parameters:
node - the node at which to start collapsing
parentIsAnd - true if the parent of node is an AND node

handleAttributes

protected static void handleAttributes(SimpleNode node)
Handles the extra attributes attached to terms in Lucene. Detects the fuzzy "~" operator (for now this uses morph instead of the Levenshtein distance), the boost "^" operator, and the proximity "~" operator on phrases. Also sets the NOT operator for "-" (or "!" as a bonus) and makes an AND node for "+". Finally, it checks for and removes attributes that don't apply, since the grammar is lenient with them. This may recreate some otherwise empty "q" nodes that hold the necessary attributes.

Parameters:
node - the node whose children should be analyzed
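
To make the operators above concrete, here is a rough sketch that reads them from a raw term string. This is an assumption-heavy simplification: the real method works on SimpleNode children rather than strings, and `TermAttrs`, `parse`, and all field names are hypothetical. It only shows how "+", "-", "!", "^", and the two meanings of "~" might be told apart.

```java
// Hypothetical string-based illustration; the real method operates on parse-tree nodes.
class AttributeSketch {
    static class TermAttrs {
        String text;            // the term or quoted phrase with operators removed
        boolean required;       // leading "+"
        boolean prohibited;     // leading "-" or "!"
        boolean fuzzy;          // trailing "~" on a single term
        int proximity = -1;     // trailing "~N" on a quoted phrase, -1 if absent
        float boost = 1.0f;     // trailing "^N"
    }

    static TermAttrs parse(String raw) {
        TermAttrs t = new TermAttrs();
        String s = raw;

        // Prefix operators.
        if (s.startsWith("+")) {
            t.required = true;
            s = s.substring(1);
        } else if (s.startsWith("-") || s.startsWith("!")) {
            t.prohibited = true;
            s = s.substring(1);
        }

        // Boost: "^" followed by a number at the end of the token.
        int caret = s.lastIndexOf('^');
        if (caret >= 0) {
            String arg = s.substring(caret + 1);
            if (!arg.isEmpty()) {
                t.boost = Float.parseFloat(arg);
            }
            s = s.substring(0, caret);
        }

        // "~" after a quoted phrase means proximity; after a bare term it means fuzzy.
        boolean phrase = s.startsWith("\"");
        int tilde = s.lastIndexOf('~');
        if (tilde >= 0 && tilde > s.lastIndexOf('"')) {
            String arg = s.substring(tilde + 1);
            s = s.substring(0, tilde);
            if (phrase) {
                if (!arg.isEmpty()) {
                    t.proximity = Integer.parseInt(arg);
                }
            } else {
                t.fuzzy = true;
            }
        }

        t.text = s;
        return t;
    }
}
```

The sketch treats "~" on a bare term as fuzzy and on a quoted phrase as proximity, mirroring the distinction drawn in the description; it does not attempt the cleanup of inapplicable attributes.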

collapsePhrases

protected static void collapsePhrases(SimpleNode node,
                                      TokenCollectorStage tcs)
Creates phrases for terms that would have been tokenized by the tokenizer upon indexing. Also strips quotes from terms. This method traverses to the nodes just above the leaf Term nodes and checks whether any of those terms need to be converted into Phrase nodes with the individual terms hanging off of them.

Parameters:
node - the node at which to start the traversal
tcs - the token collector stage used to tokenize terms
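
The idea behind phrase creation can be sketched as follows. This is only an approximation: the real method relies on a TokenCollectorStage, whereas the sketch assumes a simple tokenizer that splits on anything other than letters and digits. `PhraseSketch`, `tokenize`, and `toQueryNode` are hypothetical names, and the string output stands in for Term and Phrase nodes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; the split rule below is an assumed tokenizer.
class PhraseSketch {
    // Removes one pair of matching single or double quotes, if present.
    static String stripQuotes(String s) {
        if (s.length() >= 2) {
            char first = s.charAt(0);
            char last = s.charAt(s.length() - 1);
            if (first == last && (first == '"' || first == '\'')) {
                return s.substring(1, s.length() - 1);
            }
        }
        return s;
    }

    // Assumed tokenizer: break on any run of characters that are not letters or digits.
    static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        for (String part : s.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // A term that the tokenizer would break into several tokens becomes a phrase
    // over those tokens; otherwise it stays a single term.
    static String toQueryNode(String rawTerm) {
        List<String> tokens = tokenize(stripQuotes(rawTerm));
        if (tokens.size() > 1) {
            return "PHRASE" + tokens;
        }
        return "TERM[" + (tokens.isEmpty() ? "" : tokens.get(0)) + "]";
    }
}
```

Under these assumptions a query term like wi-fi is treated as a phrase of two tokens, since an indexer that splits on the hyphen would never have stored it as one term.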