com.sun.labs.minion.document
Class MarkUpAnalyzer

java.lang.Object
  extended by com.sun.labs.minion.document.MarkUpAnalyzer
Direct Known Subclasses:
MarkUpAnalyzer_html, MarkUpAnalyzer_txt, MarkUpAnalyzer_xml

public abstract class MarkUpAnalyzer
extends java.lang.Object

An abstract class intended to be the superclass of all mark-up analyzers.


Field Summary
protected  java.io.Reader r
          A reader that can be used to read the data from the file.
 
Constructor Summary
MarkUpAnalyzer(java.io.Reader r, int pos, java.lang.String key)
          Makes a markup analyzer that will read from an input stream.
MarkUpAnalyzer(java.lang.String s, java.lang.String key)
          Makes a markup analyzer that will read from a string.
 
Method Summary
abstract  void analyze(Stage stage)
          Analyzes the current document.
static MarkUpAnalyzer getMarkUpAnalyzer(java.io.File f, java.io.Reader r, java.lang.String key)
          Gets a markup analyzer that's appropriate for the given file.
static MarkUpAnalyzer getMarkUpAnalyzer(java.lang.String mimeType, java.io.Reader r, java.lang.String key)
          Gets a markup analyzer that is appropriate for the given MIME type.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

r

protected java.io.Reader r
A reader that can be used to read the data from the file.

Constructor Detail

MarkUpAnalyzer

public MarkUpAnalyzer(java.io.Reader r,
                      int pos,
                      java.lang.String key)
Makes a markup analyzer that will read from an input stream.

Parameters:
r - The reader to read data from.
pos - The current position in the input stream, so that we can keep accurate counts of where things are.
key - The key for the document we're analyzing, so that we can report errors usefully.
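
The role of pos can be illustrated with a minimal sketch. It assumes the position is a character offset that advances by one for each character read. That is an assumption, as the documentation above only says the value is used to keep accurate counts. PositionSketch and its methods are hypothetical names and are not part of com.sun.labs.minion.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical illustration only; not part of com.sun.labs.minion. */
final class PositionSketch {
    private final Reader r;   // source of characters, like the protected field r
    private int pos;          // assumed meaning: offset of the next character to read

    PositionSketch(Reader r, int pos) {
        this.r = r;
        this.pos = pos;
    }

    /** Reads one character and advances pos; returns -1 at end of input. */
    int read() throws IOException {
        int c = r.read();
        if (c >= 0) {
            pos++;
        }
        return c;
    }

    int position() {
        return pos;
    }

    /** Drains the text and reports the final position (helper for the demo). */
    static int positionAfterReading(String text, int startPos) {
        PositionSketch p = new PositionSketch(new StringReader(text), startPos);
        try {
            while (p.read() >= 0) {
                // discard characters; only the count matters here
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p.position();
    }

    public static void main(String[] args) {
        // Nine characters read, starting from offset 100.
        System.out.println(positionAfterReading("<p>hi</p>", 100)); // prints 109
    }
}
```

Starting from a non-zero pos would let offsets reported for a document stay meaningful when the reader has already been partly consumed by the caller.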

MarkUpAnalyzer

public MarkUpAnalyzer(java.lang.String s,
                      java.lang.String key)
Makes a markup analyzer that will read from a string.

Parameters:
s - The string to read data from.
key - The key for the document we're analyzing, so that we can report errors usefully.

Method Detail

getMarkUpAnalyzer

public static MarkUpAnalyzer getMarkUpAnalyzer(java.io.File f,
                                               java.io.Reader r,
                                               java.lang.String key)
Gets a markup analyzer that's appropriate for the given file. The choice is currently based on the file's extension.

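Parameters:
f - The file that we're going to read from.
r - The reader that we can read data from.
key - The key for the document we're analyzing, so that we can report errors usefully.
Returns:
a markup analyzer appropriate for the given file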
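
Extension-based selection might look roughly like the sketch below. The subclass names come from the Direct Known Subclasses list. The extension-to-subclass mapping, the handling of "htm", and the fallback to the text analyzer for unknown extensions are all assumptions. ExtensionSketch, extensionOf, and analyzerNameFor are hypothetical names, and the sketch returns a class name rather than constructing an analyzer.

```java
import java.io.File;
import java.util.Locale;

/** Hypothetical illustration only; not the Minion implementation. */
final class ExtensionSketch {

    /** Returns the lower-cased extension without the dot, or "" if there is none. */
    static String extensionOf(File f) {
        String name = f.getName();
        int dot = name.lastIndexOf('.');
        // No dot, a leading dot (".bashrc"), or a trailing dot means no usable extension.
        if (dot <= 0 || dot == name.length() - 1) {
            return "";
        }
        return name.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** Maps a file to the name of the subclass that would plausibly handle it. */
    static String analyzerNameFor(File f) {
        switch (extensionOf(f)) {
            case "html":
            case "htm":                       // assumed alias
                return "MarkUpAnalyzer_html";
            case "xml":
                return "MarkUpAnalyzer_xml";
            default:                          // assumed fallback
                return "MarkUpAnalyzer_txt";
        }
    }

    public static void main(String[] args) {
        System.out.println(analyzerNameFor(new File("docs/index.HTML"))); // prints MarkUpAnalyzer_html
        System.out.println(analyzerNameFor(new File("README")));          // prints MarkUpAnalyzer_txt
    }
}
```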

getMarkUpAnalyzer

public static MarkUpAnalyzer getMarkUpAnalyzer(java.lang.String mimeType,
                                               java.io.Reader r,
                                               java.lang.String key)
Gets a markup analyzer that is appropriate for the given MIME type. Because MIME types are specified in many variant forms, this currently checks only whether the subtype contains certain keywords (xml and html). If no specific analyzer can be chosen, a default text analyzer that simply reads the file as text is used.

Parameters:
mimeType - the RFC 2046 MIME type
r - a reader with the contents of the data
key - the document key
Returns:
an appropriate markup analyzer
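
The subtype check described above can be sketched as follows. This is a minimal illustration and not the actual implementation. MimeSubtypeSketch, Kind, and choose are hypothetical names. Stripping parameters such as charset, matching case-insensitively, and testing "html" before "xml" (so that a subtype like xhtml+xml is treated as HTML) are assumptions the documentation does not confirm.

```java
import java.util.Locale;

/** Hypothetical illustration only; not the Minion implementation. */
final class MimeSubtypeSketch {

    /** One value per known analyzer subclass. */
    enum Kind { XML, HTML, TEXT }

    static Kind choose(String mimeType) {
        if (mimeType == null) {
            return Kind.TEXT;                 // nothing to inspect: use the default
        }
        String type = mimeType;
        int semi = type.indexOf(';');
        if (semi >= 0) {
            type = type.substring(0, semi);   // drop parameters such as "; charset=UTF-8"
        }
        int slash = type.indexOf('/');
        String subtype = (slash >= 0 ? type.substring(slash + 1) : type)
                .trim()
                .toLowerCase(Locale.ROOT);
        if (subtype.contains("html")) {       // assumed precedence over "xml"
            return Kind.HTML;
        }
        if (subtype.contains("xml")) {
            return Kind.XML;
        }
        return Kind.TEXT;                     // default text analyzer
    }

    public static void main(String[] args) {
        System.out.println(choose("text/html; charset=UTF-8")); // prints HTML
        System.out.println(choose("application/rss+xml"));      // prints XML
        System.out.println(choose("text/plain"));               // prints TEXT
    }
}
```

Matching on a substring of the subtype, rather than on an exact type, is what lets structured-syntax forms such as application/rss+xml reach the XML analyzer without a table of every registered type.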

analyze

public abstract void analyze(Stage stage)
                      throws java.io.IOException
Analyzes the current document. This will pass the markup and text events through to the tokenizer.

Parameters:
stage - the head of the pipeline that will process the text of the document
Throws:
java.io.IOException - If there is any error reading.