com.sun.labs.minion.classification
Class KFoldSplitter

java.lang.Object
  extended by com.sun.labs.minion.classification.KFoldSplitter
All Implemented Interfaces:
ResultSplitter

public class KFoldSplitter
extends java.lang.Object
implements ResultSplitter

Provides a K-fold splitter. The results are divided into K equal-sized subsets. At each of the K iterations, a different subset is withheld from the training set and used as the validation set.
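
The partitioning described above can be sketched in plain Java. This is a minimal illustration over an ordinary list, not the Minion implementation: the class KFoldSketch and its two methods are hypothetical, and the handling of leftover items when the list size is not a multiple of K (they are simply ignored here) is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Stand-alone illustration of K-fold splitting; not the Minion implementation. */
class KFoldSketch {

    /** Returns the items of the given 0-based fold, which serve as the validation set. */
    static <T> List<T> validateSet(List<T> items, int k, int fold) {
        int foldSize = items.size() / k;   // assumption: leftover items are ignored
        int start = fold * foldSize;
        return new ArrayList<>(items.subList(start, start + foldSize));
    }

    /** Returns every item in the other K-1 folds, which serve as the training set. */
    static <T> List<T> trainSet(List<T> items, int k, int fold) {
        int foldSize = items.size() / k;
        int start = fold * foldSize;
        int end = start + foldSize;
        List<T> train = new ArrayList<>(items.subList(0, start));
        train.addAll(items.subList(end, foldSize * k));
        return train;
    }

    public static void main(String[] args) {
        List<Integer> docs = Arrays.asList(0, 1, 2, 3, 4, 5);
        int k = 3;
        // Each fold is withheld exactly once across the K iterations.
        for (int fold = 0; fold < k; fold++) {
            System.out.println("fold " + fold
                    + " train=" + trainSet(docs, k, fold)
                    + " validate=" + validateSet(docs, k, fold));
        }
    }
}
```

With six items and K = 3, each iteration validates on two items and trains on the remaining four.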


Nested Class Summary
 class KFoldSplitter.Doc
           
 
Field Summary
protected  KFoldSplitter.Doc[][] allDocs
          All the docs, paired with the array group each comes from
 int currFold
          The current fold when iterating
 int foldSize
          The number of documents in each fold.
 int k
           
protected static java.lang.String logTag
          The tag for this module.
protected  int numArrayGroups
          The number of array groups in the result set
protected  ResultSetImpl parent
          The full results, as passed in
protected  ResultSetImpl train
          The set that should be trained on
protected  ResultSetImpl validate
          The set that should be used for validation
 
Constructor Summary
KFoldSplitter()
          Default constructor.
 
Method Summary
 int getMinDocs()
          Gets the minimum number of docs needed for this splitter to be useful.
 ResultSetImpl getTrainSet()
          Gets the first of the two subsets
 ResultSetImpl getValidateSet()
          Gets the second of the two subsets
 void init(ResultSetImpl parent, IndexConfig iC)
          Initializes the class.
 boolean nextSplit()
          Advances to the next split, if there is one.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

k

public int k

foldSize

public int foldSize
The number of documents in each fold.


currFold

public int currFold
The current fold when iterating


train

protected ResultSetImpl train
The set that should be trained on


validate

protected ResultSetImpl validate
The set that should be used for validation


parent

protected ResultSetImpl parent
The full results, as passed in


numArrayGroups

protected int numArrayGroups
The number of array groups in the result set


allDocs

protected KFoldSplitter.Doc[][] allDocs
All the docs, paired with the array group each comes from


logTag

protected static java.lang.String logTag
The tag for this module.

Constructor Detail

KFoldSplitter

public KFoldSplitter()
Default constructor. Setup is performed in the init method.

Method Detail

init

public void init(ResultSetImpl parent,
                 IndexConfig iC)
Description copied from interface: ResultSplitter
Initializes the class. Since implementors of ResultSplitter will be created via reflection, the default constructor is used. After instantiation, this init method is called.

Specified by:
init in interface ResultSplitter
Parameters:
parent - the result set to split up
iC - the index config, possibly containing relevant settings for this splitter
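
The construct-then-init pattern described above can be sketched as follows. The Splitter interface and DemoSplitter class are hypothetical stand-ins with simplified String parameters; the real ResultSplitter, ResultSetImpl, and IndexConfig types are not reproduced here, and how Minion itself performs the reflective lookup is not shown in this page.

```java
/** Hypothetical stand-ins illustrating reflective creation followed by init. */
class ReflectiveInitSketch {

    /** Simplified stand-in for a splitter interface (assumed shape). */
    interface Splitter {
        void init(String parent, String config);
        boolean isInitialized();
    }

    /** Example implementor: the default constructor does no work. */
    static class DemoSplitter implements Splitter {
        private boolean initialized;

        public DemoSplitter() { }

        @Override
        public void init(String parent, String config) {
            // Real setup would read the result set and any settings here.
            initialized = true;
        }

        @Override
        public boolean isInitialized() {
            return initialized;
        }
    }

    /** Instantiates the named class via its no-argument constructor, then calls init. */
    static Splitter create(String className, String parent, String config)
            throws ReflectiveOperationException {
        Splitter s = (Splitter) Class.forName(className)
                .getDeclaredConstructor()
                .newInstance();
        s.init(parent, config);
        return s;
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Splitter s = create(DemoSplitter.class.getName(), "results", "config");
        System.out.println(s.isInitialized());
    }
}
```

Because the instance is built from a class name alone, no arguments can reach the constructor, which is why all configuration has to arrive through init.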

getMinDocs

public int getMinDocs()
Gets the minimum number of docs needed for this splitter to be useful.

Specified by:
getMinDocs in interface ResultSplitter
Returns:
the number of folds

getTrainSet

public ResultSetImpl getTrainSet()
Gets the first of the two subsets

Specified by:
getTrainSet in interface ResultSplitter
Returns:
a result set that is a subset of the one passed in

getValidateSet

public ResultSetImpl getValidateSet()
Gets the second of the two subsets

Specified by:
getValidateSet in interface ResultSplitter
Returns:
a result set that is a subset of the one passed in

nextSplit

public boolean nextSplit()
Advances to the next split, if there is one. getTrainSet() and getValidateSet() will return the new splits after this method is called.

Specified by:
nextSplit in interface ResultSplitter
Returns:
true if there is another split available
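
A typical driver loop over nextSplit(), getTrainSet(), and getValidateSet() might look like the sketch below. ListSplitter is a hypothetical stand-in that works on a list of strings rather than a ResultSetImpl, and it assumes that no split is current until nextSplit() has been called once; this page does not state whether the real class behaves that way.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical driver loop; ListSplitter is a stand-in, not the Minion class. */
class SplitLoopSketch {

    /** Minimal stand-in exposing the three iteration calls. */
    static class ListSplitter {
        private final List<String> docs;
        private final int k;
        private int currFold = -1;   // assumption: nothing is current before nextSplit()

        ListSplitter(List<String> docs, int k) {
            this.docs = docs;
            this.k = k;
        }

        /** Moves to the next fold; returns false once all K folds have been used. */
        boolean nextSplit() {
            if (currFold + 1 >= k) {
                return false;
            }
            currFold++;
            return true;
        }

        /** The withheld fold for the current split. */
        List<String> getValidateSet() {
            int size = docs.size() / k;
            return docs.subList(currFold * size, (currFold + 1) * size);
        }

        /** Everything in the other folds for the current split. */
        List<String> getTrainSet() {
            int size = docs.size() / k;
            List<String> train = new ArrayList<>(docs.subList(0, currFold * size));
            train.addAll(docs.subList((currFold + 1) * size, size * k));
            return train;
        }
    }

    /** Visits every split and records which items were trained on and validated on. */
    static List<String> run(List<String> docs, int k) {
        ListSplitter splitter = new ListSplitter(docs, k);
        List<String> log = new ArrayList<>();
        while (splitter.nextSplit()) {
            log.add("train=" + splitter.getTrainSet()
                    + " validate=" + splitter.getValidateSet());
        }
        return log;
    }

    public static void main(String[] args) {
        for (String line : run(Arrays.asList("a", "b", "c", "d"), 2)) {
            System.out.println(line);
        }
    }
}
```

The loop ends when nextSplit() returns false, so the body runs once per fold.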