Token (Minion Search Engine)

com.sun.labs.minion.pipeline
Class Token

java.lang.Object
  com.sun.labs.minion.pipeline.Token

All Implemented Interfaces: FieldOccurrence, Occurrence

public class Token
extends java.lang.Object
implements FieldOccurrence

A class encapsulating all of our knowledge about a given token. Instances of this class are passed down an indexing pipeline as they are parsed from the file.

Field Summary
`static int`	`BIGRAM`
`protected boolean`	`containsDigits` Indicates whether this token contains digits. (The taxonomy classifier ignores such tokens.)
`protected int`	`count` The occurrence count for this token.
`protected int`	`end` The ending character offset for the token.
`protected int[]`	`fields` A set of fields active for this token.
`protected int`	`id` An ID assigned to this token.
`static int`	`NORMAL`
`static int`	`PUNCT`
`protected int`	`start` The starting character offset for the token.
`protected java.lang.String`	`token` The string for a token.
`protected int`	`type` The type of this token, whether standard, bigram, or punctuation.
`protected int`	`wordNum` The ordinal number of this word in the document.

Constructor Summary
`Token()`
`Token(java.lang.String token, int count)` Creates a token.
`Token(java.lang.String token, int wordNum, int type)` Creates a token.
`Token(java.lang.String token, int wordNum, int start, int end)` Creates a token that can be passed down the pipeline.
`Token(java.lang.String token, int wordNum, int type, int start, int end)` Creates a token that can be passed down the pipeline.
`Token(java.lang.String token, int wordNum, int type, int start, int end, int count)` Creates a token that can be passed down the pipeline.

Method Summary
`boolean`	`containsDigits()`
`int`	`getCount()` Gets the count of occurrences for this token.
`int`	`getEnd()`
`int[]`	`getFields()` Gets the fields that are active at the time of the occurrence.
`int`	`getID()` Gets the ID of the term in this occurrence.
`int`	`getPos()` Gets the position at which the occurrence was found.
`int`	`getStart()`
`java.lang.String`	`getToken()`
`int`	`getType()`
`int`	`getWordNum()`
`void`	`incrWordNum()`
`int`	`length()`
`Token`	`reset(java.lang.String token, int wordNum, int start, int end)`
`Token`	`reset(java.lang.String token, int wordNum, int type, int start, int end)`
`Token`	`reset(java.lang.String token, int wordNum, int type, int start, int end, int count)`
`void`	`setCount(int count)` Sets the count of occurrences that this occurrence represents.
`void`	`setFields(int[] fields)`
`void`	`setID(int id)` Sets the ID for this token.
`void`	`setPos(int pos)` Sets the position for this token.
`void`	`setToken(java.lang.String token)` This method is intentionally package-private.
`void`	`setType(int type)`
`void`	`setWordNum(int wordNum)` Sets the word number for this token.
`java.lang.String`	`toString()`

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait`

Field Detail

token

protected java.lang.String token

The string for a token.

wordNum

protected int wordNum

The ordinal number of this word in the document.

type

protected int type

The type of this token, whether standard, bigram, or punctuation.

start

protected int start

The starting character offset for the token.

end

protected int end

The ending character offset for the token.

count

protected int count

The occurrence count for this token.

id

protected int id

An ID assigned to this token.

fields

protected int[] fields

A set of fields active for this token.

containsDigits

protected boolean containsDigits

Indicates whether this token contains digits. (The taxonomy classifier ignores such tokens.)

NORMAL

public static final int NORMAL

See Also: Constant Field Values

BIGRAM

public static final int BIGRAM

See Also: Constant Field Values

PUNCT

public static final int PUNCT

See Also: Constant Field Values

Constructor Detail

Token

public Token()

Token

public Token(java.lang.String token,
             int count)

Creates a token.

Token

public Token(java.lang.String token,
             int wordNum,
             int type)

Creates a token.

Token

public Token(java.lang.String token,
             int wordNum,
             int start,
             int end)

Creates a token that can be passed down the pipeline.

Parameters:
  token - The string tokenized from the input data
  wordNum - The ordinal word number of this token in the indexed material
  start - The starting character offset of this token
  end - The ending character offset of this token

Token

public Token(java.lang.String token,
             int wordNum,
             int type,
             int start,
             int end)

Creates a token that can be passed down the pipeline.

Parameters:
  token - The string tokenized from the input data
  wordNum - The ordinal word number of this token in the indexed material
  type - The type of this token, from our constant types
  start - The beginning character offset of this token
  end - The ending character offset of this token

Token

public Token(java.lang.String token,
             int wordNum,
             int type,
             int start,
             int end,
             int count)

Creates a token that can be passed down the pipeline.

Parameters:
  token - The string tokenized from the input data
  wordNum - The ordinal word number of this token in the indexed material
  type - The type of this token, from our constant types
  start - The beginning character offset of this token
  end - The ending character offset of this token
  count - The occurrence count for this token
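
As a rough illustration of how the constructor parameters relate to the input text, the sketch below splits a string on whitespace and records a word number and character offsets for each piece. `SketchToken` is a hypothetical stand-in that only mirrors the documented six-argument parameter order; it is not the Minion class. The exclusive `end` offset, the 1-based word numbers, the count of 1, and the placeholder value of `NORMAL` are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in mirroring the documented six-argument constructor.
// It is NOT the Minion Token class.
class SketchToken {
    static final int NORMAL = 0; // placeholder value, assumed for this sketch

    final String token;
    final int wordNum;
    final int type;
    final int start;
    final int end;
    final int count;

    SketchToken(String token, int wordNum, int type, int start, int end, int count) {
        this.token = token;
        this.wordNum = wordNum;
        this.type = type;
        this.start = start;
        this.end = end;
        this.count = count;
    }
}

class ConstructorSketch {
    // Splits text on runs of whitespace, assigning 1-based word numbers and
    // [start, end) character offsets (both conventions are assumptions).
    static List<SketchToken> tokenize(String text) {
        List<SketchToken> out = new ArrayList<>();
        int wordNum = 0;
        int i = 0;
        while (i < text.length()) {
            if (Character.isWhitespace(text.charAt(i))) {
                i++;
                continue;
            }
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            wordNum++;
            out.add(new SketchToken(text.substring(start, i), wordNum,
                                    SketchToken.NORMAL, start, i, 1));
        }
        return out;
    }

    public static void main(String[] args) {
        for (SketchToken t : tokenize("index this  text")) {
            System.out.println(t.wordNum + " " + t.token + " [" + t.start + "," + t.end + ")");
        }
    }
}
```

Whether the real class treats `end` as inclusive or exclusive is not stated on this page, so check the source before relying on either convention.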

Method Detail

reset

public Token reset(java.lang.String token,
                   int wordNum,
                   int type,
                   int start,
                   int end)

reset

public Token reset(java.lang.String token,
                   int wordNum,
                   int start,
                   int end)

reset

public Token reset(java.lang.String token,
                   int wordNum,
                   int type,
                   int start,
                   int end,
                   int count)
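
The `reset` overloads take the same arguments as the constructors and return a `Token`, which suggests that one instance can be refilled for each new word instead of allocating a new object. The sketch below shows that reuse pattern with a hypothetical `ReusableToken` stand-in. That `reset` overwrites the state and returns the same instance is an assumption, not something this page states, and the single-space separator used to compute offsets is also assumed.

```java
// Hypothetical stand-in; NOT the Minion Token class.
class ReusableToken {
    private String token;
    private int wordNum;
    private int start;
    private int end;

    // Mirrors the documented four-argument reset signature.
    // Assumption: the method overwrites the state and returns this instance.
    ReusableToken reset(String token, int wordNum, int start, int end) {
        this.token = token;
        this.wordNum = wordNum;
        this.start = start;
        this.end = end;
        return this;
    }

    String getToken() { return token; }
    int getWordNum() { return wordNum; }
    int getStart() { return start; }
    int getEnd() { return end; }
}

class ResetSketch {
    // Passes every word through one shared token object and renders each
    // state as wordNum:token@start-end.
    static String render(String[] words) {
        ReusableToken shared = new ReusableToken();
        StringBuilder sb = new StringBuilder();
        int offset = 0;
        for (int i = 0; i < words.length; i++) {
            // Refill the same object instead of allocating a new one per word.
            ReusableToken t = shared.reset(words[i], i + 1, offset, offset + words[i].length());
            sb.append(t.getWordNum()).append(':').append(t.getToken())
              .append('@').append(t.getStart()).append('-').append(t.getEnd()).append(' ');
            offset += words[i].length() + 1; // assume single-space separators
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(render(new String[] {"a", "bc"}));
    }
}
```

Under this pattern, any consumer that keeps a reference to the token after the next `reset` call sees the new contents, so values that must outlive the call need to be copied out first.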

length

public int length()

getToken

public java.lang.String getToken()

setToken

public void setToken(java.lang.String token)

This method is intentionally package-private. In classification, we'll reset the token to a stemmed token. Otherwise, this object should be immutable.

getType

public int getType()

setType

public void setType(int type)

getWordNum

public int getWordNum()

incrWordNum

public void incrWordNum()

getStart

public int getStart()

getEnd

public int getEnd()

toString

public java.lang.String toString()

Overrides: toString in class java.lang.Object

getID

public int getID()

Gets the ID of the term in this occurrence.

Specified by: getID in interface Occurrence

Returns: the ID for the term.

setID

public void setID(int id)

Sets the ID for this token.

Specified by: setID in interface Occurrence

Parameters: id - the ID.

getCount

public int getCount()

Gets the count of occurrences for this token.

Specified by: getCount in interface Occurrence

Returns: the number of occurrences.

setWordNum

public void setWordNum(int wordNum)

Sets the word number for this token.

setCount

public void setCount(int count)

Sets the count of occurrences that this occurrence represents.

Specified by: setCount in interface Occurrence

Parameters: count - the number of occurrences.

getPos

public int getPos()

Gets the position at which the occurrence was found.

Specified by: getPos in interface FieldOccurrence

Returns: the position where the occurrence was found.

setPos

public void setPos(int pos)

Sets the position for this token.

getFields

public int[] getFields()

Gets the fields that are active at the time of the occurrence.

Specified by: getFields in interface FieldOccurrence

Returns: an array that is as long as the number of defined fields. The i-th element of this array indicates the current position in the field whose ID is i. If element 0 of this array is greater than zero, then no fields are currently active.
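
To make the return-value description concrete, the sketch below interprets an array of that shape. The helper name `activeFieldIds` is hypothetical, and it assumes that a field with ID i of 1 or more counts as active when its element is greater than zero; the page itself only states what element 0 means.

```java
import java.util.ArrayList;
import java.util.List;

class FieldsSketch {
    // Returns the IDs of fields treated as active in a getFields()-style array.
    // Per the description above, element 0 > 0 means no fields are active.
    // Assumption: for i >= 1, fields[i] > 0 means the field with ID i is active.
    static List<Integer> activeFieldIds(int[] fields) {
        List<Integer> active = new ArrayList<>();
        if (fields == null || fields.length == 0 || fields[0] > 0) {
            return active; // nothing to report
        }
        for (int i = 1; i < fields.length; i++) {
            if (fields[i] > 0) {
                active.add(i);
            }
        }
        return active;
    }

    public static void main(String[] args) {
        System.out.println(activeFieldIds(new int[] {0, 7, 0, 3}));
        System.out.println(activeFieldIds(new int[] {5, 0, 0, 0}));
    }
}
```

With the assumption above, the first call reports fields 1 and 3 and the second reports none, because element 0 is greater than zero.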

setFields

public void setFields(int[] fields)

containsDigits

public boolean containsDigits()
