com.sun.labs.minion.retrieval
Class PassageImpl

java.lang.Object
  extended by com.sun.labs.minion.retrieval.PassageImpl
All Implemented Interfaces:
Passage, java.lang.Comparable

public class PassageImpl
extends java.lang.Object
implements Passage, java.lang.Comparable


Nested Class Summary
 
Nested classes/interfaces inherited from interface com.sun.labs.minion.Passage
Passage.Type
 
Field Summary
protected  int charSize
          The size of the passage, in characters.
protected  int context
          The amount of context to keep.
protected  java.lang.String elidedHLValue
          The elided passage, highlighted.
protected  java.lang.String elidedUnHLValue
          The elided passage, unhighlighted.
protected  int end
          The end of the range we want to collect for this passage.
protected  boolean finished
          Whether we're finished collecting this field.
protected  java.lang.String fullHLValue
          The full passage, highlighted.
protected  java.lang.String fullUnHLValue
          The full passage, unhighlighted.
protected static java.lang.String logTag
           
protected  int maxSize
          The maximum length, in characters, of any highlighted passage that we'll return.
protected  java.lang.String[] mt
          The terms from the document matching the query terms.
protected  float penalty
          The penalty score associated with this passage.
protected  int[] posns
          The word numbers associated with this passage.
protected  java.lang.String[] qt
          The query terms making up this passage.
protected  int size
          The size of the passage, in tokens.
protected  int start
          The start of the range we want to collect for this passage.
protected  int[] tokenEnds
          The list of ending positions for the highlighted tokens in this passage.
protected  int[] tokenPosns
          The positions in the tokens array of the words associated with this passage.
protected  Token[] tokens
          The tokens and punctuation making up this passage, collected while parsing the document.
protected  int[] tokenStarts
          The list of starting positions for the highlighted tokens in this passage.
 
Constructor Summary
PassageImpl(int[] posns, float penalty, java.lang.String[] qt, int context, int maxSize)
          Creates a passage for the given set of positions.
 
Method Summary
protected  boolean add(Token t)
          Tries to add a token to this passage.
 int compareTo(java.lang.Object o)
           
 java.lang.String elide(PassageHighlighter ph, boolean htmlEncode)
          Creates a string from a set of tokens that does not exceed the given length.
protected  PassageImpl endField()
          Tells us that our field has ended.
 java.lang.String getHLValue()
          Gets the highlighted value for this passage.
 java.lang.String getHLValue(boolean elided)
          Gets the highlighted value for this passage.
 int[] getMatchEnd()
          Gets the ending character positions in the document for the terms that make up the passage.
 java.lang.String[] getMatchingTerms()
          Gets the terms from the passage that match the terms in the query.
 int[] getMatchStart()
          Gets the starting character positions in the document for the terms that make up the passage.
 float getScore()
          Gets the penalty score associated with this passage.
 java.lang.String getUnHLValue(boolean elided)
          Gets the unhighlighted value for this passage.
 int[] getWordPositions()
          Gets the character positions of the passage words in the highlighted passage string that was returned earlier.
 java.lang.String highlight(PassageHighlighter highlighter)
          Marks up the passage using the highlighter.
 java.lang.String highlight(PassageHighlighter highlighter, boolean htmlEncode)
          Marks up the passage using the highlighter.
 java.lang.String toString()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

penalty

protected float penalty
The penalty score associated with this passage.


qt

protected java.lang.String[] qt
The query terms making up this passage.


mt

protected java.lang.String[] mt
The terms from the document matching the query terms.


posns

protected int[] posns
The word numbers associated with this passage.


tokenPosns

protected int[] tokenPosns
The positions in the tokens array of the words associated with this passage.


tokenStarts

protected int[] tokenStarts
The list of starting positions for the highlighted tokens in this passage.


tokenEnds

protected int[] tokenEnds
The list of ending positions for the highlighted tokens in this passage.


context

protected int context
The amount of context to keep.


start

protected int start
The start of the range we want to collect for this passage.


end

protected int end
The end of the range we want to collect for this passage.


size

protected int size
The size of the passage, in tokens.


charSize

protected int charSize
The size of the passage, in characters.


maxSize

protected int maxSize
The maximum length, in characters, of any highlighted passage that we'll return. A value of -1 means the whole passage will be returned.


finished

protected boolean finished
Whether we're finished collecting this field.


tokens

protected Token[] tokens
The tokens and punctuation making up this passage, collected while parsing the document.


fullHLValue

protected java.lang.String fullHLValue
The full passage, highlighted.


fullUnHLValue

protected java.lang.String fullUnHLValue
The full passage, unhighlighted.


elidedHLValue

protected java.lang.String elidedHLValue
The elided passage, highlighted.


elidedUnHLValue

protected java.lang.String elidedUnHLValue
The elided passage, unhighlighted.


logTag

protected static java.lang.String logTag
Constructor Detail

PassageImpl

public PassageImpl(int[] posns,
                   float penalty,
                   java.lang.String[] qt,
                   int context,
                   int maxSize)
Creates a passage for the given set of positions.

Parameters:
posns - The word positions of the terms making up the passage.
penalty - The penalty associated with this passage.
qt - The terms used in the query.
context - The size of the surrounding context to put in the passage, in words. -1 means take the entire containing field.
maxSize - The maximum length, in characters, of any highlighted passage that we will return. -1 means that there is no maximum length.
Method Detail

getScore

public float getScore()
Gets the penalty score associated with this passage.

Specified by:
getScore in interface Passage
Returns:
the score associated with this passage

add

protected boolean add(Token t)
Tries to add a token to this passage. The token will only be added if it falls in the range defined by the passage.

Parameters:
t - The token to try to add.
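
The range check described for add can be illustrated with a small stand-alone sketch. This is not the Minion source: the class RangeCollectorSketch, the use of plain strings with explicit word numbers in place of Token, and the way the range is derived (first position minus context to last position plus context, both inclusive) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the range check; not the Minion implementation.
final class RangeCollectorSketch {
    private final int start;   // first word number to collect (assumed inclusive)
    private final int end;     // last word number to collect (assumed inclusive)
    private final List<String> collected = new ArrayList<>();

    /** posns must be non-empty; context < 0 models "take the entire field". */
    RangeCollectorSketch(int[] posns, int context) {
        if (context < 0) {
            // No bounds: every word offered is accepted.
            start = Integer.MIN_VALUE;
            end = Integer.MAX_VALUE;
        } else {
            int min = posns[0];
            int max = posns[0];
            for (int p : posns) {
                min = Math.min(min, p);
                max = Math.max(max, p);
            }
            // Assumed rule: widen the hit span by 'context' words on each side.
            start = min - context;
            end = max + context;
        }
    }

    /** Adds the word only if its word number falls inside the range. */
    boolean add(String word, int wordNum) {
        if (wordNum < start || wordNum > end) {
            return false;
        }
        collected.add(word);
        return true;
    }

    List<String> collected() {
        return collected;
    }
}
```

With positions {10, 12} and a context of 2, this sketch accepts word numbers 8 through 14 and rejects everything else.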

endField

protected PassageImpl endField()
Tells us that our field has ended. If we're collecting an entire field, we return another instance of passage so that we can collect more occurrences of this field, should they pop up.


highlight

public java.lang.String highlight(PassageHighlighter highlighter,
                                  boolean htmlEncode)
Description copied from interface: Passage
Marks up the passage using the highlighter.

Specified by:
highlight in interface Passage
Parameters:
highlighter - The highlighter that will be used to mark up the passage. If this is null no highlighting will be done.
htmlEncode - If true the highlighted passage will have its text HTML encoded so that it may be safely given to a Web browser.
Returns:
the highlighted passage, cut down to the size specified when the passage was defined.
See Also:
Passage.getHLValue(boolean), Passage.getUnHLValue(boolean)

highlight

public java.lang.String highlight(PassageHighlighter highlighter)
Description copied from interface: Passage
Marks up the passage using the highlighter.

Specified by:
highlight in interface Passage
Parameters:
highlighter - The highlighter that will be used to mark up the passage. If this is null no highlighting will be done.
Returns:
the highlighted passage, cut down to the size specified when the passage was defined.
See Also:
Passage.getHLValue(boolean), Passage.getUnHLValue(boolean)

getHLValue

public java.lang.String getHLValue()
Description copied from interface: Passage
Gets the highlighted value for this passage. The passage will be cut down to the size specified when the passages were made.

Specified by:
getHLValue in interface Passage
Returns:
a highlighted string containing the passage.

getHLValue

public java.lang.String getHLValue(boolean elided)
Description copied from interface: Passage
Gets the highlighted value for this passage.

Specified by:
getHLValue in interface Passage
Parameters:
elided - If true returns the passage cut down to the size specified when the passages were made. If false the unelided, highlighted passage is returned.
Returns:
the highlighted value for this passage

getUnHLValue

public java.lang.String getUnHLValue(boolean elided)
Description copied from interface: Passage
Gets the unhighlighted value for this passage.

Specified by:
getUnHLValue in interface Passage
Parameters:
elided - If true returns the passage cut down to the size specified when the passages were made. If false the unelided, unhighlighted passage is returned.
Returns:
the unhighlighted value of the passage

elide

public java.lang.String elide(PassageHighlighter ph,
                              boolean htmlEncode)
Creates a string from a set of tokens that does not exceed the given length. This length is exclusive of any highlighting markup.

The basic idea: produce the passage in chunks centered around the hit terms. Begin with chunks that are just the hit terms and then at each step, add tokens before and after the hit terms until the string length limit is reached.

Parameters:
ph - the highlighter to use on the passage
htmlEncode - whether the string should be HTML encoded while highlighting
Returns:
a highlighted, elided string of the passage
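
The chunk-growing idea described above can be sketched in a few lines. This is a simplified illustration and not the Minion implementation: the class ElideSketch, the use of a plain String[] in place of Token objects, the " ... " separator between chunks, and the cost model (token length plus one separator character, with the separators between chunks not counted) are assumptions. Highlighting and HTML encoding are left out.

```java
// Hypothetical sketch of eliding around hit terms; not the Minion implementation.
final class ElideSketch {

    /**
     * Grows a chunk around each hit, one token per side per round, until no
     * further token fits in maxChars. A negative maxChars means no limit.
     * Hit tokens are always kept, even if they alone exceed the limit.
     */
    static String elide(String[] tokens, int[] hits, int maxChars) {
        if (maxChars < 0) {
            return String.join(" ", tokens);
        }
        boolean[] keep = new boolean[tokens.length];
        int used = 0;
        // Begin with chunks that are just the hit terms.
        for (int h : hits) {
            if (!keep[h]) {
                keep[h] = true;
                used += tokens[h].length() + 1;
            }
        }
        int[] lo = hits.clone();   // leftmost token of each chunk
        int[] hi = hits.clone();   // rightmost token of each chunk
        boolean grew = true;
        while (grew) {
            grew = false;
            for (int i = 0; i < hits.length; i++) {
                for (int dir = -1; dir <= 1; dir += 2) {
                    int c = (dir < 0) ? lo[i] - 1 : hi[i] + 1;
                    if (c < 0 || c >= tokens.length) {
                        continue;
                    }
                    if (!keep[c]) {
                        int cost = tokens[c].length() + 1;
                        if (used + cost > maxChars) {
                            continue;   // this token does not fit
                        }
                        keep[c] = true;
                        used += cost;
                    }
                    // Widen the chunk (free if a neighbouring chunk already kept c).
                    if (dir < 0) {
                        lo[i] = c;
                    } else {
                        hi[i] = c;
                    }
                    grew = true;
                }
            }
        }
        // Emit kept tokens in order, marking gaps between chunks.
        StringBuilder sb = new StringBuilder();
        int prev = -1;
        for (int i = 0; i < tokens.length; i++) {
            if (!keep[i]) {
                continue;
            }
            if (sb.length() > 0) {
                sb.append(prev == i - 1 ? " " : " ... ");
            }
            sb.append(tokens[i]);
            prev = i;
        }
        return sb.toString();
    }
}
```

Growing every chunk by one token per round keeps the context roughly balanced across hits, so a single hit does not consume the whole character budget.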

getWordPositions

public int[] getWordPositions()
Gets the character positions of the passage words in the highlighted passage string that was returned earlier. This really only makes sense if you didn't ask for highlighting!

Specified by:
getWordPositions in interface Passage
Returns:
the character positions of the words in the highlighted passage string

getMatchingTerms

public java.lang.String[] getMatchingTerms()
Gets the terms from the passage that match the terms in the query.

Specified by:
getMatchingTerms in interface Passage
Returns:
an array of strings containing the terms that were found in the document that match the terms from the query that generated this document. If an element of the array is null, then that means that a term is missing from the passage.
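
Because a null element marks a query term that is missing from the passage, callers will usually want to filter the array before displaying it. A minimal sketch follows; the class MatchingTermsSketch and its helper names are hypothetical, and a plain String[] stands in for the value returned by getMatchingTerms.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helpers for handling null entries; not part of Minion.
final class MatchingTermsSketch {

    /** Returns the non-null entries, i.e. the terms actually found. */
    static List<String> presentTerms(String[] matchingTerms) {
        List<String> present = new ArrayList<>();
        for (String term : matchingTerms) {
            if (term != null) {
                present.add(term);
            }
        }
        return present;
    }

    /** True when no entry is null, i.e. no term is missing from the passage. */
    static boolean isComplete(String[] matchingTerms) {
        return presentTerms(matchingTerms).size() == matchingTerms.length;
    }
}
```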

getMatchStart

public int[] getMatchStart()
Gets the starting character positions in the document for the terms that make up the passage.

Specified by:
getMatchStart in interface Passage
Returns:
an array containing the character positions of the start of each term that appears in the passage. The position of the start of the term is defined as the position of the first letter in the term. This information can be used if it is necessary to highlight the actual document for display.

getMatchEnd

public int[] getMatchEnd()
Gets the ending character positions in the document for the terms that make up the passage.

Specified by:
getMatchEnd in interface Passage
Returns:
an array containing the character positions of the end of each term that appears in the passage. The position of the end of a term is defined as the position after the position of the last character in the word. This information can be used if it is necessary to highlight the actual document for display.
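
Since the end position is defined as one past the last character of the term, each start and end pair can be used directly as a half-open range when marking up the original document text. The sketch below shows one way to do that. The class MatchOffsetsSketch, the markUp method, and the tag strings are hypothetical, and the code assumes the two arrays are parallel, sorted in ascending order, non-overlapping, and contain only valid offsets into the document string.

```java
// Hypothetical helper for highlighting a document by offsets; not part of Minion.
final class MatchOffsetsSketch {

    /**
     * Wraps each [starts[i], ends[i]) range of doc in the given open and
     * close markers and copies the text between ranges unchanged.
     */
    static String markUp(String doc, int[] starts, int[] ends,
                         String open, String close) {
        StringBuilder sb = new StringBuilder();
        int cursor = 0;
        for (int i = 0; i < starts.length; i++) {
            // Text between the previous match and this one.
            sb.append(doc, cursor, starts[i]);
            // The matched term itself; ends[i] is exclusive.
            sb.append(open).append(doc, starts[i], ends[i]).append(close);
            cursor = ends[i];
        }
        // Remainder of the document after the last match.
        sb.append(doc, cursor, doc.length());
        return sb.toString();
    }

    public static void main(String[] args) {
        String doc = "The quick brown fox";
        System.out.println(
            markUp(doc, new int[]{4, 16}, new int[]{9, 19}, "<b>", "</b>"));
    }
}
```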

compareTo

public int compareTo(java.lang.Object o)
Specified by:
compareTo in interface java.lang.Comparable

toString

public java.lang.String toString()
Overrides:
toString in class java.lang.Object