libdatrie  0.2.13
Macros | Typedefs | Functions
trie.h File Reference

Trie data type and functions. More...

Macros

#define trie_state_is_terminal(s)   trie_state_is_walkable((s),0)
 Check for terminal state. More...
 
#define trie_state_is_leaf(s)    (trie_state_is_single(s) && trie_state_is_terminal(s))
 Check for leaf state. More...
 

Typedefs

typedef struct _Trie Trie
 Trie data type.
 
typedef Bool(* TrieEnumFunc) (const AlphaChar *key, TrieData key_data, void *user_data)
 Trie enumeration function. More...
 
typedef struct _TrieState TrieState
 Trie walking state.
 
typedef struct _TrieIterator TrieIterator
 Trie iteration state.
 

Functions

Trietrie_new (const AlphaMap *alpha_map)
 Create a new trie. More...
 
Trietrie_new_from_file (const char *path)
 Create a new trie by loading from a file. More...
 
Trietrie_fread (FILE *file)
 Create a new trie by reading from an open file. More...
 
void trie_free (Trie *trie)
 Free a trie object. More...
 
size_t trie_get_serialized_size (Trie *trie)
 Get trie serialized size. More...
 
void trie_serialize (Trie *trie, uint8 *ptr)
 Serializes trie data into a memory buffer (including mapping) More...
 
int trie_save (Trie *trie, const char *path)
 Save a trie to file. More...
 
int trie_fwrite (Trie *trie, FILE *file)
 Write trie data to an open file. More...
 
Bool trie_is_dirty (const Trie *trie)
 Check pending changes. More...
 
Bool trie_retrieve (const Trie *trie, const AlphaChar *key, TrieData *o_data)
 Retrieve an entry from trie. More...
 
Bool trie_store (Trie *trie, const AlphaChar *key, TrieData data)
 Store a value for an entry to trie. More...
 
Bool trie_store_if_absent (Trie *trie, const AlphaChar *key, TrieData data)
 Store a value for an entry to trie only if the key is not present. More...
 
Bool trie_delete (Trie *trie, const AlphaChar *key)
 Delete an entry from trie. More...
 
Bool trie_enumerate (const Trie *trie, TrieEnumFunc enum_func, void *user_data)
 Enumerate entries in trie. More...
 
TrieStatetrie_root (const Trie *trie)
 Get root state of a trie. More...
 
TrieStatetrie_state_clone (const TrieState *s)
 Clone a trie state. More...
 
void trie_state_copy (TrieState *dst, const TrieState *src)
 Copy trie state to another. More...
 
void trie_state_free (TrieState *s)
 Free a trie state. More...
 
void trie_state_rewind (TrieState *s)
 Rewind a trie state. More...
 
Bool trie_state_walk (TrieState *s, AlphaChar c)
 Walk the trie from the state. More...
 
Bool trie_state_is_walkable (const TrieState *s, AlphaChar c)
 Test walkability of character from state. More...
 
int trie_state_walkable_chars (const TrieState *s, AlphaChar chars[], int chars_nelm)
 Get all walkable characters from state. More...
 
Bool trie_state_is_single (const TrieState *s)
 Check for single path. More...
 
TrieData trie_state_get_data (const TrieState *s)
 Get data from terminal state. More...
 
TrieIteratortrie_iterator_new (TrieState *s)
 Create a new trie iterator. More...
 
void trie_iterator_free (TrieIterator *iter)
 Free a trie iterator. More...
 
Bool trie_iterator_next (TrieIterator *iter)
 Move trie iterator to the next entry. More...
 
AlphaChartrie_iterator_get_key (const TrieIterator *iter)
 Get key for a trie iterator. More...
 
TrieData trie_iterator_get_data (const TrieIterator *iter)
 Get data for the entry referenced by an iterator. More...
 

Detailed Description

Trie data type and functions.

Trie is a kind of digital search tree, an efficient indexing method with O(1) time complexity for searching. Comparably as efficient as hashing, trie also provides flexibility on incremental matching and key spelling manipulation. This makes it ideal for lexical analyzers, as well as spelling dictionaries.

This library is an implementation of double-array structure for representing trie, as proposed by Junichi Aoe. The details of the implementation can be found at http://linux.thai.net/~thep/datrie/datrie.html

A Trie is associated with an AlphaMap, a map between actual alphabet characters and the raw characters used to walk through trie. You can define the alphabet set by adding ranges of character codes to it before associating it to a trie. And the keys to be added to the trie must comprise only characters in such ranges. Note that the size of the alphabet set is limited to 256 (TRIE_CHAR_MAX + 1), and the AlphaMap will map the alphabet characters to raw codes in the range 0..255 (0..TRIE_CHAR_MAX). The alphabet character ranges need not be continuous, but the mapped raw codes will be continuous, for the sake of compactness of the trie.

A new Trie can be created in memory using trie_new(), saved to file using trie_save(), and loaded later with trie_new_from_file(). It can even be embeded in another file using trie_fwrite() and read back using trie_fread(). After use, Trie objects must be freed using trie_free().

Operations on trie include:

Macro Definition Documentation

◆ trie_state_is_leaf

#define trie_state_is_leaf (   s)     (trie_state_is_single(s) && trie_state_is_terminal(s))

Check for leaf state.

Parameters
s: the state to check
Returns
boolean value indicating whether it is a leaf state

Check if the given state is a leaf state. A leaf state is a terminal state that has no other branch.

◆ trie_state_is_terminal

#define trie_state_is_terminal (   s)    trie_state_is_walkable((s),0)

Check for terminal state.

Parameters
s: the state to check
Returns
boolean value indicating whether it is a terminal state

Check if the given state is a terminal state. A terminal state is a trie state that terminates a key, and stores a value associated with it.

Typedef Documentation

◆ TrieEnumFunc

typedef Bool(* TrieEnumFunc) (const AlphaChar *key, TrieData key_data, void *user_data)

Trie enumeration function.

Parameters
key: the key of the entry
data: the data of the entry
user_data: the user-supplied data on enumerate call
Returns
TRUE to continue enumeration, FALSE to stop

Function Documentation

◆ trie_delete()

Bool trie_delete ( Trie trie,
const AlphaChar key 
)

Delete an entry from trie.

Parameters
trie: the trie
key: the key for the entry to delete
Returns
boolean value indicating whether the key exists and is removed

Delete an entry for the given key from trie.

◆ trie_enumerate()

Bool trie_enumerate ( const Trie trie,
TrieEnumFunc  enum_func,
void *  user_data 
)

Enumerate entries in trie.

Parameters
trie: the trie
enum_func: the callback function to be called on each key
user_data: user-supplied data to send as an argument to enum_func
Returns
boolean value indicating whether all the keys are visited

Enumerate all entries in trie. For each entry, the user-supplied enum_func callback function is called, with the entry key and data. Returning FALSE from such callback will stop enumeration and return FALSE.

◆ trie_fread()

Trie* trie_fread ( FILE *  file)

Create a new trie by reading from an open file.

Parameters
file: the handle of the open file
Returns
a pointer to the created trie, NULL on failure

Create a new trie and initialize its contents by reading from the open file. After reading, the file pointer is left at the end of the trie data. This can be useful for reading embedded trie index as part of a file data.

The created object must be freed with trie_free().

Available since: 0.2.4

◆ trie_free()

void trie_free ( Trie trie)

Free a trie object.

Parameters
trie: the trie object to free

Destruct the trie and free its allocated memory.

◆ trie_fwrite()

int trie_fwrite ( Trie trie,
FILE *  file 
)

Write trie data to an open file.

Parameters
trie: the trie
file: the open file
Returns
0 on success, non-zero on failure

Write trie data to file which is opened for writing. After writing, the file pointer is left at the end of the trie data. This can be useful for embedding trie index as part of a file data.

Available since: 0.2.4

◆ trie_get_serialized_size()

size_t trie_get_serialized_size ( Trie trie)

Get trie serialized size.

Parameters
trie: the trie
Returns
size of the trie in bytes

Returns size that would be occupied by a trie if it was serialized into a binary blob or file.

Available since: 0.2.13

◆ trie_is_dirty()

Bool trie_is_dirty ( const Trie trie)

Check pending changes.

Parameters
trie: the trie object
Returns
TRUE if there are pending changes, FALSE otherwise

Check if the trie is dirty with some pending changes and needs saving to keep the file synchronized.

◆ trie_iterator_free()

void trie_iterator_free ( TrieIterator iter)

Free a trie iterator.

Parameters
iter: the trie iterator to free

Destruct the iterator iter and free its allocated memory.

Available since: 0.2.6

◆ trie_iterator_get_data()

TrieData trie_iterator_get_data ( const TrieIterator iter)

Get data for the entry referenced by an iterator.

Parameters
iter: an iterator
Returns
the data associated with the entry referenced by iterator iter, or TRIE_DATA_ERROR if iter does not reference to a unique entry

Get value for the entry referenced by an iterator. Getting value from an un-iterated (or broken for any reason) iterator will result in TRIE_DATA_ERROR.

Available since: 0.2.6

◆ trie_iterator_get_key()

AlphaChar* trie_iterator_get_key ( const TrieIterator iter)

Get key for a trie iterator.

Parameters
iter: an iterator
Returns
the allocated key string; NULL on failure

Get key for the current entry referenced by the trie iterator iter.

The return string must be freed with free().

Available since: 0.2.6

◆ trie_iterator_new()

TrieIterator* trie_iterator_new ( TrieState s)

Create a new trie iterator.

Parameters
s: the TrieState to start iteration from
Returns
a pointer to the newly created TrieIterator, or NULL on failure

Create a new trie iterator for iterating entries of a sub-trie rooted at state s.

Use it with the result of trie_root() to iterate the whole trie.

The created object must be freed with trie_iterator_free().

Available since: 0.2.6

◆ trie_iterator_next()

Bool trie_iterator_next ( TrieIterator iter)

Move trie iterator to the next entry.

Parameters
iter: an iterator
Returns
boolean value indicating the availability of the entry

Move trie iterator to the next entry. On return, the iterator iter is updated to reference to the new entry if successfully moved.

Available since: 0.2.6

◆ trie_new()

Trie* trie_new ( const AlphaMap alpha_map)

Create a new trie.

Parameters
alpha_map: the alphabet set for the trie
Returns
a pointer to the newly created trie, NULL on failure

Create a new empty trie object based on the given alpha_map alphabet set. The trie contents can then be added and deleted with trie_store() and trie_delete() respectively.

The created object must be freed with trie_free().

◆ trie_new_from_file()

Trie* trie_new_from_file ( const char *  path)

Create a new trie by loading from a file.

Parameters
path: the path to the file
Returns
a pointer to the created trie, NULL on failure

Create a new trie and initialize its contents by loading from the file at given path.

The created object must be freed with trie_free().

◆ trie_retrieve()

Bool trie_retrieve ( const Trie trie,
const AlphaChar key,
TrieData o_data 
)

Retrieve an entry from trie.

Parameters
trie: the trie
key: the key for the entry to retrieve
o_data: the storage for storing the entry data on return
Returns
boolean value indicating the existence of the entry.

Retrieve an entry for the given key from trie. On return, if key is found and o_data is not NULL, *o_data is set to the data associated to key.

◆ trie_root()

TrieState* trie_root ( const Trie trie)

Get root state of a trie.

Parameters
trie: the trie
Returns
the root state of the trie

Get root state of trie, for stepwise walking.

The returned state is allocated and must be freed with trie_state_free()

◆ trie_save()

int trie_save ( Trie trie,
const char *  path 
)

Save a trie to file.

Parameters
trie: the trie
path: the path to the file
Returns
0 on success, non-zero on failure

Create a new file at the given path and write trie data to it. If path already exists, its contents will be replaced.

◆ trie_serialize()

void trie_serialize ( Trie trie,
uint8 *  ptr 
)

Serializes trie data into a memory buffer (including mapping)

Parameters
trie: the trie
ptr: a pointer to current position inside of a preallocated buffer.

Write trie data to a current position in a buffer pointed by ptr. This can be useful for embedding trie index as part of a file data.

The size that the trie will occupy can be calculated using trie_get_serialized_size()

Available since: 0.2.13

◆ trie_state_clone()

TrieState* trie_state_clone ( const TrieState s)

Clone a trie state.

Parameters
s: the state to clone
Returns
an duplicated instance of s

Make a copy of trie state.

The returned state is allocated and must be freed with trie_state_free()

◆ trie_state_copy()

void trie_state_copy ( TrieState dst,
const TrieState src 
)

Copy trie state to another.

Parameters
dst: the destination state
src: the source state

Copy trie state data from src to dst. All existing data in dst is overwritten.

◆ trie_state_free()

void trie_state_free ( TrieState s)

Free a trie state.

Parameters
s: the state to free

Free the trie state.

◆ trie_state_get_data()

TrieData trie_state_get_data ( const TrieState s)

Get data from terminal state.

Parameters
s: a terminal state
Returns
the data associated with the terminal state s, or TRIE_DATA_ERROR if s is not a terminal state

Get value from a terminal state of trie. Getting value from a non-terminal state will result in TRIE_DATA_ERROR.

◆ trie_state_is_single()

Bool trie_state_is_single ( const TrieState s)

Check for single path.

Parameters
s: the state to check
Returns
boolean value indicating whether it is in a single path

Check if the given state is in a single path, that is, there is no other branch from it to leaf.

◆ trie_state_is_walkable()

Bool trie_state_is_walkable ( const TrieState s,
AlphaChar  c 
)

Test walkability of character from state.

Parameters
s: the state to check
c: the input character
Returns
boolean indicating walkability

Test if there is a transition from state s with input character c.

◆ trie_state_rewind()

void trie_state_rewind ( TrieState s)

Rewind a trie state.

Parameters
s: the state to rewind

Put the state at root.

◆ trie_state_walk()

Bool trie_state_walk ( TrieState s,
AlphaChar  c 
)

Walk the trie from the state.

Parameters
s: current state
c: key character for walking
Returns
boolean value indicating the success of the walk

Walk the trie stepwise, using a given character c. On return, the state s is updated to the new state if successfully walked.

◆ trie_state_walkable_chars()

int trie_state_walkable_chars ( const TrieState s,
AlphaChar  chars[],
int  chars_nelm 
)

Get all walkable characters from state.

Parameters
s: the state to get
chars: the storage for the result
chars_nelm: the size of chars[] in number of elements
Returns
total walkable characters

Get the list of all walkable characters from state s. At most chars_nelm walkable characters are stored in chars[] on return.

The function returns the actual number of walkable characters from s. Note that this may not equal the number of characters stored in chars[] if chars_nelm is less than the actual number.

Available since: 0.2.6

◆ trie_store()

Bool trie_store ( Trie trie,
const AlphaChar key,
TrieData  data 
)

Store a value for an entry to trie.

Parameters
trie: the trie
key: the key for the entry to store
data: the data associated to the entry
Returns
boolean value indicating the success of the operation

Store a data for the given key in trie. If key does not exist in trie, it will be appended. If it does, its current data will be overwritten.

◆ trie_store_if_absent()

Bool trie_store_if_absent ( Trie trie,
const AlphaChar key,
TrieData  data 
)

Store a value for an entry to trie only if the key is not present.

Parameters
trie: the trie
key: the key for the entry to store
data: the data associated to the entry
Returns
boolean value indicating the success of the operation

Store a data for the given key in trie. If key does not exist in trie, it will be inserted. If it does, the function will return failure and the existing value will not be touched.

This can be useful for multi-thread applications, as race condition can be avoided.

Available since: 0.2.4