Package org.apache.uima.internal.util
Class PositiveIntSet_impl
java.lang.Object
org.apache.uima.internal.util.PositiveIntSet_impl
- All Implemented Interfaces:
PositiveIntSet
An set of non-zero integers, ability to iterate over them (possibly in a sorted way),
with O(1) operations for adding, removing, and testing for contains.
Optimized for adding mostly increasing integers (with some space for less-than-first-one-added).
- major uses: indexes of Feature Structures.
Optimize for small sets
Graceful degradation for completely random integer operations on large sets;
keep O(1) operations, at the expense of extra space (< 3 x size).
Uses several different representations depending on the range of ints stored and the total number of ints stored.
Sizes: Tiny, medium, large
Ranges: semi-knowable, unknowable;
if semi-knowable, dense, small-range (< 65K), large-range
For all sizes,
if dense, use IntBitSet (with offset)
else
For large, (implies large-range, too) use IntHashSet
For medium,
if small-range < 65K, use IntHashSet with offset
else use IntHashSet
Arrange switching between representations
to occur infrequently, especially as cost of switching (size) grows
Arrange checking for switching to occur infrequently, taking into account how underlying data structures grow
(which is often by doubling)
Switch when adding new member(s) if alternative representation is sufficiently smaller
-
Nested Class Summary
Nested Classes -
Field Summary
FieldsModifier and TypeFieldDescriptionprivate static final int
static final int[]
private static final int
private static final int
private static final int
private static final int
private PositiveIntSet
(package private) boolean
(package private) boolean
(package private) boolean
(package private) boolean
(package private) boolean
Set false once we find we have to reduce the bit offset -
Constructor Summary
ConstructorsConstructorDescriptionPositiveIntSet_impl
(int initialSize, int estMin, int estMax) Set up a Positive Bit Set -
Method Summary
Modifier and TypeMethodDescriptionboolean
add
(int key) private void
adjustBitSetForLowerOffset
(int key) When a key is added which is lower than the offset, we need to adjust the whole int set table.private boolean
allocateIntBitSet
(int estMax, int estMin, int offsetSpec) Allocate a new IntBitSetvoid
add all elements in this set to the IntVector v as a bulk operationvoid
clear()
remove all members of the setboolean
contains
(int key) private static int
estimatedBitSetOffset
(int estMin, int offsetSpec) int
find
(int element) int
get
(int position) For FSBagIndex low level iterator use DOESN"T WORK WITH INCREMENTING position VALUESprivate static int
getBitSetSpaceFromRange
(int adjKey, int spaceUsed) Only called if key won't fit in existing IntHashSet, IntSet or IntBitSet allocation If IntBitSet (spaceUsed != 0) compute what its size() (capacity) would be if the key were added (causing doubling); include the overhead of the intbitset class.private static int
getHashSetOffset
(int estMax, int estMin) private static int
getHashSetOverAllocateSize
(int existingSize) When converting from bitset to hash set, the initial hash set should be big enough so that it takes a while before it is doubled-expanded.private int
Only called if key won't fit in existing allocation returns new hash table size ( usually double) + overheadprivate static int
getHashSetSpace
(int numberOfElements, int estMax, int estMin) For the case where the intHashSet doesn't yet existprivate static int
getIntSetSpace
(int size) private void
handleBitSet
(int newKey) Bit Setprivate void
handleHashSet
(int newKey) Hash Setprivate void
handleIntSet
(int newKey) IntSet *(package private) boolean
(package private) boolean
boolean
isValid
(int position) For FSBagIndex low level iterator useiterator()
private void
maybeSwitchRepresentation
(int newKey) logic for switching representation BitSet preferred - is faster, can be more compact, and permits ordered iteration HashSet used when BitSet is too space inefficient.int
For FSBagIndex low level iterator useint
For FSBagIndex low level iterator useint
moveToNext
(int position) For FSBagIndex low level iterator useint
moveToPrevious
(int position) For FSBagIndex low level iterator useboolean
remove
(int key) int
size()
private void
switchFromBitSet
(int size, int offset) switching from bit set to either IntSet or IntHashSetprivate void
switchToBitSet
(int estMax, int estMin, int offset) private void
switchToHashSet
(int size, int offset) Switch from any intSet impl to Hash Setprivate void
switchToIntSet
(int size) switch to IntSetint[]
int[]
toString()
int[]
-
Field Details
-
HYSTERESIS
private static final int HYSTERESIS- See Also:
-
INT_SET_MAX_SIZE
private static final int INT_SET_MAX_SIZE- See Also:
-
HASH_SET_SHORT_MAX_SIZE
private static final int HASH_SET_SHORT_MAX_SIZE- See Also:
-
BIT_SET_OVERALLOCATE
private static final int BIT_SET_OVERALLOCATE- See Also:
-
HASH_SET_OVERALLOCATE_DIVIDER_SHIFT
private static final int HASH_SET_OVERALLOCATE_DIVIDER_SHIFT- See Also:
-
EMPTY_INT_ARRAY
public static final int[] EMPTY_INT_ARRAY -
intSet
-
isBitSet
boolean isBitSet -
isIntSet
boolean isIntSet -
isHashSet
boolean isHashSet -
secondTimeShrinkable
boolean secondTimeShrinkable -
useOffset
boolean useOffsetSet false once we find we have to reduce the bit offset
-
-
Constructor Details
-
PositiveIntSet_impl
public PositiveIntSet_impl() -
PositiveIntSet_impl
public PositiveIntSet_impl(int initialSize, int estMin, int estMax) Set up a Positive Bit Set- Parameters:
initialSize
- - if 0, don't allocate yet, wait for first add. If isBitSetDense, then this number is interpreted as the first int to be added, typically with an offset. The next two params are used only if initialSize is not 0.estMin
- - the estimated minimum int value to be addedestMax
- - the estimated max int value to be added
-
-
Method Details
-
iterator
- Specified by:
iterator
in interfacePositiveIntSet
- Returns:
- an iterator (may be ordered or unordered) over the members of the set
-
getUnorderedIterator
-
getOrderedIterator
-
clear
public void clear()Description copied from interface:PositiveIntSet
remove all members of the set- Specified by:
clear
in interfacePositiveIntSet
-
contains
public boolean contains(int key) - Specified by:
contains
in interfacePositiveIntSet
- Parameters:
key
- -- Returns:
- true if key is in the set
-
find
public int find(int element) - Specified by:
find
in interfacePositiveIntSet
- Parameters:
element
- an item which may be in the set- Returns:
- -1 if the item is not in the set, or a position value that can be used with iterators to start at that item.
-
add
public boolean add(int key) - Specified by:
add
in interfacePositiveIntSet
- Parameters:
key
- -- Returns:
- true if this set did not already contain the specified element
-
remove
public boolean remove(int key) - Specified by:
remove
in interfacePositiveIntSet
- Parameters:
key
- -- Returns:
- true if the set had this element before the remove
-
size
public int size()- Specified by:
size
in interfacePositiveIntSet
- Returns:
- number of elements in the set
-
adjustBitSetForLowerOffset
private void adjustBitSetForLowerOffset(int key) When a key is added which is lower than the offset, we need to adjust the whole int set table. Although an array copy could do this, we don't have access to the bitset impl. So we just set the offset to 0, and either convert the whole thing to a inthashset or copy all the bits to a new bit set.- Parameters:
key
-
-
getHashSetOffset
private static int getHashSetOffset(int estMax, int estMin) -
toUnorderedIntArray
public int[] toUnorderedIntArray() -
toOrderedIntArray
public int[] toOrderedIntArray() -
getBitSetSpaceFromRange
private static int getBitSetSpaceFromRange(int adjKey, int spaceUsed) Only called if key won't fit in existing IntHashSet, IntSet or IntBitSet allocation If IntBitSet (spaceUsed != 0) compute what its size() (capacity) would be if the key were added (causing doubling); include the overhead of the intbitset class. long words required = larger of double the current size, or the number of long words required- Parameters:
adjKey
- - the range of bits neededspaceUsed
- - the number of bits currently allocated (if currently a bit set), else 0- Returns:
- number of words needed to store the range,
-
getIntSetSpace
private static int getIntSetSpace(int size) - Parameters:
size
- including new element to be added- Returns:
- number of words including overhead to store that many elements unless is bigger than INT_SET_MAX_SIZE, then return MAX_VALUE
-
getHashSetSpace
private int getHashSetSpace()Only called if key won't fit in existing allocation returns new hash table size ( usually double) + overhead- Parameters:
numberOfEntries
-- Returns:
- new hash table size ( usually double) + overhead
-
getHashSetSpace
private static int getHashSetSpace(int numberOfElements, int estMax, int estMin) For the case where the intHashSet doesn't yet exist- Parameters:
numberOfElements
- -estMax
- - the largest int to be storedestMin
- - the smallest int to be stored- Returns:
- the size in words
-
getHashSetOverAllocateSize
private static int getHashSetOverAllocateSize(int existingSize) When converting from bitset to hash set, the initial hash set should be big enough so that it takes a while before it is doubled-expanded. Reduce overallocation for small sizes because the cost to switch is low, and the % overallocate could be a large.- Parameters:
existingSize
-- Returns:
- overallocate size
-
maybeSwitchRepresentation
private void maybeSwitchRepresentation(int newKey) logic for switching representation BitSet preferred - is faster, can be more compact, and permits ordered iteration HashSet used when BitSet is too space inefficient. Avoid switching if the new key being added won't increase the representation size - for hash: if number of elements +1 won't cause doubling of the hash table - for bitset: if key will fit in existing "capacity" Compute space needed after adding - bitset: array doubles - hashset: array doubles Preventing jitters back and forth: - removes don't cause switching - compare against other size including its non-linear space jumps. -
handleBitSet
private void handleBitSet(int newKey) Bit Set -
handleIntSet
private void handleIntSet(int newKey) IntSet * -
handleHashSet
private void handleHashSet(int newKey) Hash Set -
allocateIntBitSet
private boolean allocateIntBitSet(int estMax, int estMin, int offsetSpec) Allocate a new IntBitSet- Parameters:
estMax
- the largest int to store in the bit set - determines the initial sizeestMin
- this is a lower bound on the value that can be in the bit setoffsetSpec
- this is the offset to use, or -1, to calculate the offset from the initial_size (assuming a small amount of adds will be for ints somewhat less than the initial_size entry (being treated as the first int to be added)- Returns:
- true if allocated with offset, implying guarantee the estMax key fits.
-
estimatedBitSetOffset
private static int estimatedBitSetOffset(int estMin, int offsetSpec) -
switchFromBitSet
private void switchFromBitSet(int size, int offset) switching from bit set to either IntSet or IntHashSet- Parameters:
size
- - space needed including new itemoffset
- - for IntHashSet values kept as short, the offset subtracted from values before storing them offset == Integer.MIN_VALUE means use 4 byte ints
-
switchToIntSet
private void switchToIntSet(int size) switch to IntSet- Parameters:
size
- number of elements to be able to store without expanding
-
switchToHashSet
private void switchToHashSet(int size, int offset) Switch from any intSet impl to Hash Set- Parameters:
size
- - the capacity - this many elements could be stored before swtichingoffset
- - used only when the size < 65K, and is a value subtracted from elements before storing them, in an attempt to fit the results into just "short"s instead of "int"s if == MIN_VALUE, then force 4 byte ints
-
switchToBitSet
private void switchToBitSet(int estMax, int estMin, int offset) -
get
public int get(int position) Description copied from interface:PositiveIntSet
For FSBagIndex low level iterator use DOESN"T WORK WITH INCREMENTING position VALUES- Specified by:
get
in interfacePositiveIntSet
- Parameters:
position
- - get the element at this position. This is for iterator use only, and is not related to any key- Returns:
- the element
-
moveToFirst
public int moveToFirst()Description copied from interface:PositiveIntSet
For FSBagIndex low level iterator use- Specified by:
moveToFirst
in interfacePositiveIntSet
- Returns:
- the position of the first element, or -1;
-
moveToLast
public int moveToLast()Description copied from interface:PositiveIntSet
For FSBagIndex low level iterator use- Specified by:
moveToLast
in interfacePositiveIntSet
- Returns:
- the position of the last element, or -1;
-
moveToNext
public int moveToNext(int position) Description copied from interface:PositiveIntSet
For FSBagIndex low level iterator use- Specified by:
moveToNext
in interfacePositiveIntSet
- Parameters:
position
- -- Returns:
- the position of the next element, or -1;
-
moveToPrevious
public int moveToPrevious(int position) Description copied from interface:PositiveIntSet
For FSBagIndex low level iterator use- Specified by:
moveToPrevious
in interfacePositiveIntSet
- Parameters:
position
- -- Returns:
- the position of the next element, or -1;
-
isValid
public boolean isValid(int position) Description copied from interface:PositiveIntSet
For FSBagIndex low level iterator use- Specified by:
isValid
in interfacePositiveIntSet
- Parameters:
position
- -- Returns:
- true if the position is between the first and last element inclusive.
-
bulkAddTo
Description copied from interface:PositiveIntSet
add all elements in this set to the IntVector v as a bulk operation- Specified by:
bulkAddTo
in interfacePositiveIntSet
- Parameters:
v
- - to be added to
-
toIntArray
public int[] toIntArray()- Specified by:
toIntArray
in interfacePositiveIntSet
- Returns:
- the set as an arbitrarily ordered int array
-
toString
-
isOffsetBitSet
boolean isOffsetBitSet() -
isShortHashSet
boolean isShortHashSet()
-