|
||||||||||
| PREV CLASS NEXT CLASS | FRAMES NO FRAMES | |||||||||
| SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD | |||||||||
java.lang.Objectcom.sun.labs.minion.util.BitBuffer
public class BitBuffer
The BitBuffer class is used to represent a string of bits. We're not using a java.util.BitSet because it doesn't support operations, like append, which we would like to have and it doesn't think that zeros at the end of the buffer are significant (they are for us.) BitSet has a solid basis implementation, however, so we'll use it (we can't extend the class because too many variables we need are private.)
The basic idea: An array of ints (called bits) is maintained to store the bits. Each int in this array is called a unit. We store bits starting at unit 0, bit 0, which is the least-significant bit of the int stored in bits[0]. Bits are stored from the least-significant position to the most-significant position in each unit runnning from 0 to the size of the bits array. An example:
Direction Units filled
-------------------------------------------->
----------------------+----------------------+
Unit 0 | Unit 1 |
3 0|3 0|
2 ... 0|2 ... 0|
----------------------+----------------------+
<---------------------|<---------------------|
Direction bits filled |Direction bits filled |
A "bit index" is simply a number giving a specific bit position in a BitBuffer, so, for example, a bit index of 69 indicates the 5th least-siginificant bit in bits[2] (i.e, bit 5 in bits[2]).
A "bit position" is a position within a unit, i.e., a number between 0 and 31. So, the bit index 69 has bit position 5.
There are three numbers that describe the size of the array and where bits are to be written to or read from: currUnit, which points to the unit where the next bit will be written, currBit, which points to the next bit to be set in the current unit, and head, which contains the bit index of the next bit to be read from the head of the array. All three numbers are indexed from 0.
In general, we expect that bits will be read from the head of the BitBuffer and written to the end of the BitBuffer. BitSet provides methods for setting and clearing a specific bit, which we will use with some modification.
We also provide methods for writing and reading BitBuffers. BitBuffers
can be written to any output stream that implements the
DataOutput interface. A buffer is written in the following
manner:
Buffers can be read from any stream that implements the
DataInput interface.
The methods in the class have been synchronized where appropriate to prevent overlapping writes, but you probably don't want to use the same instance in different threads without some other synchronization method. But hey, it's your funeral.
| Field Summary | |
|---|---|
protected static int |
ADDRESS_BITS_PER_UNIT
|
protected static int |
BIT_INDEX_MASK
|
protected int[] |
bits
The bits in this BitSet. |
protected static int |
BITS_PER_UNIT
|
protected static int[] |
ca
Some zeros we can use to clear out buffers. |
protected int |
currBit
The current bit in the last unit. |
protected int |
currUnit
The unit where we will write the next bit. |
protected static int |
FULL_UNIT
|
protected int |
head
A pointer to the head of the current Buffer. |
protected int |
markPoint
A mark that we can reset to. |
protected static int[] |
maxIntPerBits
The maximum number that can be encoded in a given number of bits. |
| Constructor Summary | |
|---|---|
BitBuffer()
Creates a new bit buffer. |
|
BitBuffer(BitBuffer b)
Creates a bit buffer whose data is initialized by the bit buffer given as a parameter. |
|
BitBuffer(boolean resetp,
BitBuffer b)
Creates a bit buffer whose data shares the data of the bit buffer given as a parameter. |
|
BitBuffer(byte[] bytes,
int offset)
Creates a bit buffer from an array of bytes. |
|
BitBuffer(java.io.DataInput di)
Creates a bit buffer by reading it from the provided stream. |
|
BitBuffer(int nbits)
Creates a bit buffer whose initial size is large enough to explicitly represent bits with indices in the range 0 through
nbits-1. |
|
BitBuffer(java.lang.String b)
|
|
| Method Summary | |
|---|---|
void |
and(BitBuffer r)
Calculate the bitwise and of this buffer and another. |
void |
append(BitBuffer b)
Append another BitBuffer to this one. |
protected static int |
bit(int bitIndex)
Given a bit index, return a unit that masks that bit in its unit. |
protected static int |
bitPos(int bitIndex)
Find the position in a unit of a given bit index. |
int |
byteDecode()
Decodes an integer stored using the byte encoding. |
int |
byteEncode(int n)
Encodes an integer in a byte-aligned fashion, using the minimal number of bytes. |
int |
chunkDecode()
A chunk decoder. |
int |
chunkEncode(int n)
A "chunk" encoder, after an idea by Doug Cutting. |
void |
clear()
Clears out a bit buffer entirely. |
java.lang.Object |
clone()
Cloning this BitBuffer produces a new
BitBuffer that is equal to it. |
void |
complement()
Calculate the bitwise complement of this buffer. |
void |
copyNBitsFrom(BitBuffer b,
int n)
Copies n bits from the given buffer onto our buffer. |
int |
countBits(boolean b)
Counts the number of bits of the given type in the buffer. |
protected int |
countTrue()
Counts the number of on bits in the buffer. |
static void |
createBuffers(BitBuffer[] buffers,
byte[] bytes,
int[] offsets)
Creates an array of BitBuffers from the provided array
of bytes. |
char[] |
decodeChars()
Decodes an array of characters from the head of the buffer. |
char[] |
decodeChars(int utflen)
|
void |
decodeChars(int utflen,
CACont str)
|
int |
decodeInt()
Decodes any Java int. |
long |
decodeLong()
Decodes any Java long. |
java.lang.String |
decodeUTF()
Decodes a string from the head of the buffer. |
int |
deltaDecode()
Delta decode an int from the front of the buffer. |
int |
deltaEncode(int n)
Delta encode an int onto the end of the buffer. |
int[] |
differenceDecodeArray()
Decodes an array of integers using the standard difference method. |
int |
differenceEncodeArray(int n,
int[] a)
Encodes an array integers using a standard difference method. |
int |
directDecode(int nBits)
Decode a positive integer that was coded using a specific number of bits. |
long |
directDecodeLong(int nBits)
Direct decodes a long. |
int |
directEncode(int n,
int nBits)
Encode a positive integer directly, using a given number of bits. |
int |
directEncode(long n,
int nBits)
Encodes a long directly in the given number of bits. |
int |
encodeChars(char[] a)
Encodes an array of characters using a modified UTF-8 encoding in a machine independant manner. |
int |
encodeChars(char[] a,
int b,
int e)
Encodes an array of characters using a modified UTF-8 encoding in a machine independant manner. |
static int |
encodeChars(char[] a,
int b,
int e,
BitBuffer buff)
Encodes an array of characters using a modified UTF-8 encoding in a machine independant manner. |
int |
encodeInt(int n)
Encodes any Java int. |
int |
encodeLong(long n)
Encodes any Java long. |
int |
encodeUTF(java.lang.String s)
Encodes a string using a modified UTF-8 encoding in a machine independant manner. |
int |
encodeUTFLen(int n)
Encodes a number of bytes in a manner suitable for use by decodeChars. |
protected void |
ensureCapacity(int unitsRequired)
|
int |
findFirst(boolean b)
Find the first instance of a bit matching the bit passed in. |
static int |
findFirstInUnit(boolean b,
int unit,
int start)
Find the positon of the first bit matching b in the unit held in unit, starting from position start. |
static int |
flLog2(int n)
Calculate the floor of the base 2 log of an integer n. |
static int |
flLog2(long n)
Calculates the floor of the base 2 log for a long. |
int |
gammaDecode()
Gamma decode the head of the buffer. |
int |
gammaEncode(int n)
Gamma encode an int onto the end of the buffer. |
int |
getCurrBitIndex()
Get the index of currBit. |
int |
getNBytes()
Get the number of bytes that it will take to store this BitBuffer in a file. |
int |
getWBytes()
Gets the total number of bytes that it will take to write this buffer into a file. |
protected static java.lang.String |
intToBinaryString(int n)
Build a string representation of an int with the bits in the right order. |
int |
length()
Get the length of the bit string. |
protected static java.lang.String |
longToBinaryString(long n)
|
void |
or(BitBuffer r)
Calculate the bitwise or of this buffer and another. |
boolean |
peek()
|
boolean |
pop()
Pop a bit from the head of the current buffer. |
void |
push(boolean newbit)
Push a bit onto the end of the buffer. |
void |
push(boolean newbit,
int n)
Push a bit onto the end of the buffer n times. |
void |
quickClear()
Does a quick clear out. |
void |
read(java.io.DataInput din)
Read a BitBuffer from an object that implements
DataInput. |
void |
reset()
Reset the head to point at the start of bits array. |
void |
seek(int i)
Seeks to the given bit index. |
void |
set(int bitIndex)
Sets the bit specified by the index to true. |
static BitBuffer |
share(BitBuffer shared)
Share a BitBuffer among multiple readers. |
static void |
skip(java.io.DataInput in)
Skips a bit buffer without reading it in. |
void |
skip(int n)
Skip the given number of bits. |
int |
tell()
Tells where the read head currently is. |
boolean |
test(int bitIndex)
Test whether a given bit is true or false. |
java.lang.String |
toActualString()
Print the bits in the buffer in the order in which they actually occur. |
java.lang.String |
toLogicalString()
Provide a logical string representation of the bit buffer. |
java.lang.String |
toString()
|
int |
unaryDecode()
Unary decode an int from the head of the BitBuffer. |
int |
unaryEncode(int n)
Unary encode an integer onto the end of the buffer. |
protected static int |
unitIndex(int bitIndex)
Given a bit index return unit index containing it. |
protected java.lang.String |
unitToLogicalString(int unit,
int pos)
Get the bits in a unit from right to left, put them in a string left to right, starting at the given position. |
void |
write(BitBuffer b)
Write this BitBuffer to another. |
void |
write(java.io.DataOutput dout)
Write a buffer to a stream with the length. |
void |
write(java.io.DataOutput dout,
boolean writeLength)
Write a BitBuffer to a class implementing
DataOuput. |
void |
xor(BitBuffer r)
Calculate the bitwise xor of this buffer and another. |
| Methods inherited from class java.lang.Object |
|---|
equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
| Field Detail |
|---|
protected static final int ADDRESS_BITS_PER_UNIT
protected static final int BITS_PER_UNIT
protected static final int BIT_INDEX_MASK
protected static final int FULL_UNIT
protected int[] bits
protected int currUnit
protected int currBit
protected int head
protected int markPoint
protected static int[] ca
protected static int[] maxIntPerBits
| Constructor Detail |
|---|
public BitBuffer()
false.
public BitBuffer(int nbits)
0 through
nbits-1. All bits are initially false.
nbits - the initial size of the bit set.
java.lang.NegativeArraySizeException - if the specified initial size
is negative.public BitBuffer(BitBuffer b)
b - The bit buffer to use.
public BitBuffer(boolean resetp,
BitBuffer b)
resetp - true resets the read pointer in the new BBb - The bit buffer to share the data from..
public BitBuffer(java.io.DataInput di)
throws java.io.IOException
di - The stream to read from.
java.io.IOException - if there is an error during reading.
public BitBuffer(byte[] bytes,
int offset)
write method. The first four bytes of the array must
consist of the length of the buffer in bits. The remaining bytes
encode the integers of the actual data.
bytes - The array of bytes to convert to a
BitBufferoffset - The offset in the array to start at.public BitBuffer(java.lang.String b)
| Method Detail |
|---|
protected static final int unitIndex(int bitIndex)
protected static final int bit(int bitIndex)
protected static final int bitPos(int bitIndex)
public final int getCurrBitIndex()
public int getNBytes()
BitBuffer in a file.
getNBytes in interface IntEncoderpublic int getWBytes()
public static void createBuffers(BitBuffer[] buffers,
byte[] bytes,
int[] offsets)
BitBuffers from the provided array
of bytes. The idea is that if we've read multiple buffers from a
file, this method can convert them into a list of buffers useful for
things like iterating through postings files.
public java.lang.Object clone()
BitBuffer produces a new
BitBuffer that is equal to it. The clone of the bit
set is another bit set that has exactly the same bits set to
true as this bit set and the same current size.
Overrides the clone method of Object.
clone in class java.lang.Objectpublic static BitBuffer share(BitBuffer shared)
BitBuffer among multiple readers. There is a
strong assumption for this case that there will be no writers! This
will be most useful for multiple iterators through a compressed
inverted file entry, where we don't wish to duplicate the array.
shared - The BitBuffer that we will share.
BitBuffer sharing that data.protected void ensureCapacity(int unitsRequired)
public int length()
public void set(int bitIndex)
true. If the
bit that we set is further out than the current bit index, then
reset the position to just after the bit set.
bitIndex - the bit index to set.public boolean test(int bitIndex)
bitIndex - the index of the bit to test.
public void push(boolean newbit)
public void push(boolean newbit,
int n)
newbit - The bit to push.n - The number of times to push it.public boolean pop()
public boolean peek()
public int countBits(boolean b)
b - Whether we want to count true or
false.protected int countTrue()
public final void clear()
public void quickClear()
public void skip(int n)
n - The number of bits to skip.public void seek(int i)
i - The index to set the head to.public int tell()
public static final int findFirstInUnit(boolean b,
int unit,
int start)
public final int findFirst(boolean b)
public static final int flLog2(int n)
This is implemented as an unrolled binary search and is therefore totally out of control (but hopefully fast.)
n - The int to calculate flLog2 for
public static final int flLog2(long n)
public void reset()
public final int directEncode(int n,
int nBits)
n - The number to encode.nBits - The number of bits to use in the encoding.
public final int directEncode(long n,
int nBits)
public final long directDecodeLong(int nBits)
public final int directDecode(int nBits)
nBits - The number of bits to use.
public final int unaryEncode(int n)
n - The number to encode.
public final int unaryDecode()
public final int chunkEncode(int n)
We could probably try the same trick at the nybble level, if we're getting lots of numbers in the 1-128 range, which should show an improvement over the byte encoding.
public final int chunkDecode()
public final int byteEncode(int n)
byteEncode in interface IntEncoderpublic final int byteDecode()
byteDecode in interface IntEncoderbyteEncode(int)public int gammaEncode(int n)
n - The int to encode
public int gammaDecode()
public int deltaEncode(int n)
n - The int to encode.
public int deltaDecode()
public int encodeChars(char[] a)
First, we write the number of bytes in the encoding, using either 10, 18, or 34 bits. Following the length, each character of the string is output, in sequence, using the UTF-8 encoding for the character.
This code is based on that in DataOutputStream.
a - The array of characters to encode.
public int encodeUTFLen(int n)
n - The number of bytes to encode.
public int encodeChars(char[] a,
int b,
int e)
a - The array of characters to encode.b - The beginning offset in the arraye - The exclusive ending offset in the array.
public static int encodeChars(char[] a,
int b,
int e,
BitBuffer buff)
First, we write the number of bytes in the encoding, using either 10, 18, or 34 bits. Following the length, each character of the string is output, in sequence, using the UTF-8 encoding for the character.
This code is based on that in DataOutputStream.
a - The array of characters to encode.b - The beginning offset in the arraye - The exclusive ending offset in the array.
encodeChars(char[], int, int)public char[] decodeChars()
public char[] decodeChars(int utflen)
public void decodeChars(int utflen,
CACont str)
public int encodeUTF(java.lang.String s)
s - The string to encode.
encodeChars(char[])public java.lang.String decodeUTF()
public int differenceEncodeArray(int n,
int[] a)
public int[] differenceDecodeArray()
public int encodeInt(int n)
n - The int to encode.
public int decodeInt()
public int encodeLong(long n)
public long decodeLong()
public void append(BitBuffer b)
b - The BitBuffer to append.
public void copyNBitsFrom(BitBuffer b,
int n)
b - The BitBuffer to copy bits from.n - The number of bits to copy.public void or(BitBuffer r)
r - The BitBuffer to or with this one.public void and(BitBuffer r)
r - The BitBuffer to and with this one.public void xor(BitBuffer r)
r - the BitBuffer to xor with this one.public void complement()
public void write(BitBuffer b)
public void write(java.io.DataOutput dout)
throws java.io.IOException
java.io.IOException
public void write(java.io.DataOutput dout,
boolean writeLength)
throws java.io.IOException
BitBuffer to a class implementing
DataOuput. If the second parameter is true, then the length of the
buffer will be written out first as a 4 byte integer.
If the second parameter is false, then the length will not be
written out. This option is meant to be used in situations where we
want to batch up writes to provide better performance. The proviso
is the user must write the number of bits to read in to the
head of the string. If this is not done, you will not be
able to read the BitBuffer back in.
dout - The DataOutput object to write to.writeLength - If true, the length of the buffer will be written
out first as a 32 bit number.
java.io.IOException - if the write fails.
public void read(java.io.DataInput din)
throws java.io.IOException
BitBuffer from an object that implements
DataInput. This will allow us to use BitBuffers with random access
files.
din - The DataInput object to read from.
java.io.IOException - if the read fails.
public static void skip(java.io.DataInput in)
throws java.io.IOException
java.io.IOException
protected java.lang.String unitToLogicalString(int unit,
int pos)
unit - the unit to processpos - the place to start.
protected static java.lang.String intToBinaryString(int n)
n - The number to representprotected static java.lang.String longToBinaryString(long n)
public java.lang.String toActualString()
public java.lang.String toLogicalString()
public java.lang.String toString()
toString in class java.lang.Object
|
||||||||||
| PREV CLASS NEXT CLASS | FRAMES NO FRAMES | |||||||||
| SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD | |||||||||