uart

package module
v0.12.16 Latest
Published: Jan 12, 2026 License: MIT Imports: 18 Imported by: 0

README

ART Trees in Go: an enhanced radix trie with path compression

  • is also an Order-Statistics tree

This means you can access your dictionary like a Go slice, with integer indexes.

See "Memory Use" and "Benchmarks" below for a comparison to in-memory B-trees

(In-memory B-trees like https://github.com/google/btree or https://github.com/tidwall/btree are more space- and cache-efficient if you do a lot of sequential-key queries.)

Overview

Naming? This is a minimal-dependency version of my Adaptive Radix Tree (ART) implementation and it comes without serialization support. Thus it is unserialized ART, or uart.

What exactly? This project provides an enhanced implementation of the Adaptive Radix Tree (ART) data structure[1]. It is both a memory-based sorted key/value store and an Order-Statistic tree. It offers ordered lookups, range queries, and integer-based indexing.

Why? In read-mostly situations, ART trees can have very good performance (e.g. unlocked reads are 35 ns/op for ART vs 10 ns/op for a standard Go map on a 10M key dictionary) while also providing sorted order lookups (e.g. find the next key greater-than this one) and range queries, things that hash tables cannot do.

Who else? Modern analytics-oriented databases like Tableau and DuckDB leverage ART trees to implement their indexes because of their read-heavy workloads[6]. ART trees are radix trees with variable-sized inner nodes. They were designed in 2013 by the German computer scientists whose HyPer database became the backend engine for Tableau[7].

Here is a nice slide deck summarizing the paper.

https://bu-disc.github.io/CS561-Spring2023/slides/CAS-CS561-Class12.pdf

  • more detail

An ART tree is a sorted, key-value, in-memory dictionary. It maps arbitrary []byte keys to an any value. The ART tree is a trie, and provides both path compression (vertical compression) and variable sized inner nodes (horizontal compression) for space-efficient fanout.

Path compression is particularly attractive in situations where many keys have shared or redundant prefixes. This is the common case for many ordered key/value store use cases, such as database indexes and file-system hierarchies. The Google File System paper, for example, mentions the efficiencies obtained by exploiting prefix compression in their distributed file system[2]. FoundationDB's new Redwood backend provides it as a feature[3], and users wish the API could be improved by offering it[4] in query result APIs.

Ease of use: efficient greater-than/less-than key lookup and range iteration, as well as the ability to "treat the tree as a slice" using integer indexes (based on the counted B-tree idea -- see the tree.At(i int) method), make this ART tree implementation particularly easy to use in practice.

A practical feature is that it is safe to delete from the tree, or insert into it, during iteration. The iterator will simply resume at the next available key beyond the previously returned key. Reverse iteration and prefix-only scanning are both supported. Each iterator Next() call is an efficient O(log N). A complete pass through the tree, even with interleaved deletes, is still only O(N log N).

The integer indexing makes this ART implementation also an Order-Statistic tree, much like the Counted B-tree[5], so it is ideal for quickly computing quantiles, medians, and other statistics of interest. Jumping forward by 500 keys, for example, is an efficient O(log N) time operation for N keys in the tree.
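To illustrate the order-statistic pattern, here is a minimal stand-alone sketch. A plain sorted slice stands in for the tree, with slice indexing playing the role of tree.At(i); uart does the same job in O(log N) without ever materializing a slice.

```go
package main

import (
	"fmt"
	"sort"
)

func main() {
	// A sorted slice stands in for the tree's leaves here;
	// with uart, tree.At(i) returns the i-th leaf in O(log N).
	keys := []string{"delta", "alpha", "echo", "bravo", "charlie"}
	sort.Strings(keys)

	n := len(keys)
	median := keys[n/2]   // the role tree.At(n/2) plays in uart
	p90 := keys[(n*9)/10] // 90th-percentile key

	fmt.Println(median) // "charlie"
	fmt.Println(p90)    // "echo"
}
```

Because jumping to any index is O(log N) in the tree, quantiles over millions of keys are cheap even though no flat array exists.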

Trie operations are sometimes described as O(k) time, where k is the string length of the key. That may be technically more correct, but I've opted for the more familiar O(log N) description under the assumption that, in practice, k will approximate log(N). Path compression means there is an inner node in the radix trie only where two keys differ, which closely resembles an (unbalanced) binary search tree. ART trees never have to rebalance. See the journal paper for a full description[1].

This ART tree supports only a single value for each key -- it is not a "multi-map" in the C++ sense. This makes it simple to use and implement. Note that the user can store any value, so being a unique-key-map is not really a limitation. The user can simply point to a struct, slice or map holding the same-key values in the Leaf.Value field.
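The multi-value workaround looks like the following sketch; a plain Go map stands in for the tree here, but the pattern is the same, with the slice living in the Leaf.Value field in uart.

```go
package main

import "fmt"

func main() {
	// One key, many values: store a slice as the value.
	// In uart this slice would be held in the Leaf.Value field,
	// so a unique-key map still handles duplicate-key data.
	byKey := map[string][]string{}
	byKey["color"] = append(byKey["color"], "red")
	byKey["color"] = append(byKey["color"], "blue")
	fmt.Println(byKey["color"]) // [red blue]
}
```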

Concurrency: by default this ART implementation is goroutine safe, as it uses a sync.RWMutex for synchronization. Thus it allows only a single writer at a time, and any number of readers when there is no writing. Readers will block until the writer is done, and thus they see a fully consistent view of the tree. The SkipLocking flag can be set to omit all locking if goroutine coordination is provided by other means, or unneeded (in the case of single goroutine only access).
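The default locking discipline is the standard sync.RWMutex pattern. Here is a self-contained sketch of the same discipline applied to a plain map (not uart's actual code): RLock for any number of concurrent readers, Lock for the single exclusive writer.

```go
package main

import (
	"fmt"
	"sync"
)

// guardedMap shows the discipline uart.Tree uses by default:
// RLock for reads (many at once), Lock for writes (exclusive).
type guardedMap struct {
	mu sync.RWMutex
	m  map[string]int
}

func (g *guardedMap) get(k string) (int, bool) {
	g.mu.RLock()
	defer g.mu.RUnlock()
	v, ok := g.m[k]
	return v, ok
}

func (g *guardedMap) set(k string, v int) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.m[k] = v
}

func main() {
	g := &guardedMap{m: make(map[string]int)}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			g.set(fmt.Sprintf("k%d", i), i) // writers serialize on Lock
		}(i)
	}
	wg.Wait()
	v, ok := g.get("k2")
	fmt.Println(v, ok) // 2 true
}
```

Setting SkipLocking is equivalent to dropping the mutex calls above, which is only safe when a single goroutine (or external synchronization) owns the tree.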

Iterators are available. Be aware they do no locking of their own, much like the built-in Go map.

Full package docs: https://pkg.go.dev/github.com/glycerine/uart

[1] "The Adaptive Radix Tree: ARTful Indexing for Main-Memory Databases" by Viktor Leis, Alfons Kemper, Thomas Neumann. https://db.in.tum.de/~leis/papers/ART.pdf

[2] "The Google File System" SOSP’03, October 19–22, 2003, Bolton Landing, New York, USA. by Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. https://pdos.csail.mit.edu/6.824/papers/gfs.pdf

[3] "How does FoundationDB store keys with duplicate prefixes?" https://forums.foundationdb.org/t/how-does-foundationdb-store-keys-with-duplicate-prefixes/1234

[4] "Issue #2189: Prefix compress read range results" https://github.com/apple/foundationdb/issues/2189

[5] "Counted B-Trees" https://www.chiark.greenend.org.uk/~sgtatham/algorithms/cbtree.html

[6] "Persistent Storage of Adaptive Radix Trees in DuckDB" https://duckdb.org/2022/07/27/art-storage.html

[7] https://hyper-db.de/ https://tableau.github.io/hyper-db/journey


Author: Jason E. Aten, Ph.D.

Licence: MIT

Originally based on, but much diverged from, the upstream repo https://github.com/WenyXu/sync-adaptive-radix-tree .

In particular, the racy and unfinished optimistic locking was removed, many bugs were fixed, and code was added to support ordered-key queries (FindGE, FindGTE, FindLE, FindLTE) and integer index access. A comprehensive test suite is included to verify all operations.

Memory use

I measured memory using the runtime.MemStats.HeapAlloc counter on four simple programs that did nothing else besides load the dictionary with one, two, or three copies of the 93790 linux kernel source tree paths in assets/linux.txt. The code for these simple programs is in the mem/ subdirectory.

So that we could see what difference path compression makes, the second and third loads used exactly the same paths, but with __J appended to each, where J is the additional copy number.

go run map.go     # the built-in Go map (go1.24 Swiss tables based)
mstat.HeapAlloc = '21_518_728' (copies = 1; diff = 21_518_728 bytes)
mstat.HeapAlloc = '37_891_552' (copies = 2; diff = 16_372_824 bytes)
mstat.HeapAlloc = '55_186_712' (copies = 3; diff = 17_295_160 bytes)

go run art.go     # our ART tree
mstat.HeapAlloc = '34_789_632' (copies = 1; diff = 34_789_632 bytes)
mstat.HeapAlloc = '70_789_824' (copies = 2; diff = 36_000_192 bytes)
mstat.HeapAlloc = '101_813_560' (copies = 3; diff = 31_023_736 bytes)

go run rbtree.go  # a red-black tree
mstat.HeapAlloc = '22_911_096' (copies = 1; diff = 22_911_096 bytes)
mstat.HeapAlloc = '42_296_912' (copies = 2; diff = 19_385_816 bytes)
mstat.HeapAlloc = '55_081_880' (copies = 3; diff = 12_784_968 bytes)

go run googbtree.go  # github.com/google/btree degree=30 b-tree
mstat.HeapAlloc = '18_536_872' (copies = 1; diff = 18_536_872 bytes)
mstat.HeapAlloc = '30_895_664' (copies = 2; diff = 12_358_792 bytes)
mstat.HeapAlloc = '37_989_176' (copies = 3; diff = 7_093_512 bytes)

Conclusions: the Go map and the red-black tree use about the same amount of memory. The ART tree uses about 2x the memory of those.

The btree consumes much less memory than the map or the red-black tree. ART looks memory-hungry in comparison, using 2x to 3x or more memory than the btree. The extra inner and leaf nodes are to blame: the additional nodes alone can consume about as much memory as the entire, more compact btree, so the total can reach three times the btree's footprint. This is a serious drawback of ART trees. For random-access (rather than sequential) reads, ART can be slightly faster than even a well-tuned btree, but only by about 15%. To my thinking, that speed hardly makes up for 3x more memory.

Path/prefix compression also seems like a wash on this data set. The second copy actually consumed more memory (3.5% more) than the initial set plus the runtime's baseline memory. The third copy consumed about 14% less than either of the first two. Prefix compression is heavily data dependent, of course. The longest available shared prefix in this data set was only 48 bytes, and occurred only twice. The most frequent compressed-out prefix was only 1 byte long, with 103094 instances. I added the tree.CompressedStats() method to let you analyze your own data sets. Here is the output for the linux.txt paths:

compressed stats: 'map[int]int{0:42130, 1:103094, 2:7770, 3:9886, 4:11357, 5:10560, 6:10469, 7:10634, 8:6340, 9:4769, 10:3635, 11:3080, 12:2293, 13:1839, 14:1444, 15:1277, 16:990, 17:782, 18:628, 19:474, 20:346, 21:283, 22:255, 23:168, 24:143, 25:89, 26:92, 27:58, 28:38, 29:27, 30:26, 31:18, 32:8, 33:19, 34:9, 35:4, 36:6, 37:3, 38:3, 39:1, 40:2, 41:2, 44:1, 46:2, 48:2}'

The total bytes saved through prefix compression here was 12_628_922 bytes (12MB). That is, the ART total of 101_813_560 bytes (97MB) would have been at least 12MB (about 12%) larger without prefix compression, and probably much larger still given all the extra inner nodes it would need.

While your mileage may vary if you have a lot of shared sub-strings in your key space (remember that ART will compress "prefixes" in the middle of a key as well as at the beginning), my take-home summary is that the prefix compression feature of ART should not be a big deciding factor.

ART trees are about 2-5x as fast as the red-black tree used in my measurements (depending on the read/write mix), so in a sense this is a straight time-for-space trade-off: roughly 2-3x as fast, for 2-3x the memory use, versus the red-black tree. However, the in-memory btree does much better than the red-black tree; it is the real competition for ART.

A note about reading keys sequentially, the "full-table" scan case:

Without synchronization, a degree-32 b-tree (github.com/google/btree), when reading sequential values in order (its sweet spot), kicks ART's bootie to the curb in both time and space.

The google/btree reads are 2x faster than the Go map Swiss table, and 9x faster than my ART. Writes are 15% faster than the Go map, and 75% faster than my ART. Measurements below. Code in mem/googbtree.go and commented in tree_test.go Test620 (on branch). If no deletions are needed, the btree with bigger degree (say degree 3000) performs even better than the degree 32 btree, with the understanding that deletes are then 6x slower due to the large copies involved.

Still without synchronization: for random access, this ART is slightly faster than the btree on reads, and ART is slightly faster than the btree on writes (inserts). So the access pattern matters a great deal. For random writes and reads (single goroutine, unsynchronized access), this ART is only a little slower than the built-in Go map (which does not provide next-key-greater-than queries). You can use the tests in tree_bench_test.go to measure performance on your own data and access patterns.

In short, this ART is faster than sync.Map in many cases, competitive with the built-in Go map, and offers a sorted dictionary and fast order statistics. Nonetheless, if a sequential full read of all keys (a full table scan) is common, an in-memory btree will save you a ton of time and space, and should be preferred to the ART tree. If your key access is random, you have memory to spare, and you absolutely need the last ounce of speed, the ART tree may be the marginally faster choice.

To go deeper into the rationale as to why in-memory B-trees do so well: The article here http://google-opensource.blogspot.com/2013/01/c-containers-that-save-memory-and-time.html says

For small data types, B-tree containers typically reduce memory use by 50 to 80% compared with Red-Black tree containers.

and

Storing multiple elements per node can also improve performance, especially in large containers with inexpensive key-compare functions (e.g., integer keys using less-than), relative to Red-Black tree containers. This is because performance in these cases is effectively governed by the number of cache misses, not the number of key-compare operations. For large data sets, using these B-tree containers will save memory and improve performance.

In the sequential read/full table scan, the btree has most reads cached from the last read, and so suffers very few L1 cache misses.

Benchmarks

For code, see tree_bench_test.go.

frac_x means 0.x read fraction. frac_0 means write-only, frac_10 means read-only.

Note that our ART tree provides sorted key queries, range queries, and integer-indexed access.

As hash tables, the Go map and sync.Map provide only unordered exact-match lookups.


Unlocked apples-to-apples versus the Go map and google/btree:

(To take synchronization overhead out of the picture.)

=== RUN   Test620_unlocked_read_comparison
map time to store 10_000_000 keys: 2.466118634s (246ns/op)
map reads 10_000_000 keys: elapsed 101.076718ms (10ns/op)
map deletes 10_000_000 keys: elapsed 1.36421433s (136ns/op)

uart.Tree time to store 10_000_000 keys: 3.665080254s (366ns/op)
Ascend(tree) reads 10_000_000 keys: elapsed 368.400458ms (36ns/op)
uart Iter() reads 10_000_000 keys: elapsed 354.294422ms (35ns/op)
tree.At(i) reads 10_000_000 keys: elapsed 381.973196ms (38ns/op)
tree.At(i) reads from 10: 9999990 keys: elapsed 381.521877ms (38ns/op)
my ART: delete 10000000 keys: elapsed 1.059195931s (105ns/op)

Notice: because Atfar does not use an iterator to cache sequential At calls, it is 6x slower than the iterator or sequential At calls. This was the motivation for adding the Tree.atCache mechanism to speed up sequential At(i), At(i+1), At(i+2), ... calls.

tree.Atfar(i) reads 10_000_000 keys: elapsed 2.431009745s (243ns/op)

// degree 32 b-tree github.com/google/btree
//
// (Code kept on a branch to keep zero dependencies).
// See https://github.com/glycerine/uart/tree/bench_goog_btree
//
google/btree time to store 10_000_000 keys: 2.097327599s (209ns/op)
google/btree read all keys sequentially: elapsed 49.415972ms (4ns/op)
google/btree delete all keys: elapsed 2.024685985s (202ns/op)

--- PASS: Test620_unlocked_read_comparison (12.14s)


started at Tue 2025 Mar 12 18:46:28

go test -v -run=blah -bench=. -benchmem
goos: darwin
goarch: amd64
pkg: github.com/glycerine/uart
cpu: Intel(R) Core(TM) i7-1068NG7 CPU @ 2.30GHz

// our ART tree, using the default sync.RWMutex.

// The first line represents 100% write/write conflict.
// The last line shows 100% reading.

BenchmarkArtReadWrite
BenchmarkArtReadWrite/frac_0
BenchmarkArtReadWrite/frac_0-8                 	 2639517	       530.6 ns/op	     118 B/op	       4 allocs/op
BenchmarkArtReadWrite/frac_1
BenchmarkArtReadWrite/frac_1-8                 	 1934983	       748.4 ns/op	     106 B/op	       3 allocs/op
BenchmarkArtReadWrite/frac_2
BenchmarkArtReadWrite/frac_2-8                 	 2144061	       687.2 ns/op	      94 B/op	       3 allocs/op
BenchmarkArtReadWrite/frac_3
BenchmarkArtReadWrite/frac_3-8                 	 2418021	       627.3 ns/op	      83 B/op	       2 allocs/op
BenchmarkArtReadWrite/frac_4
BenchmarkArtReadWrite/frac_4-8                 	 2569453	       601.9 ns/op	      71 B/op	       2 allocs/op
BenchmarkArtReadWrite/frac_5
BenchmarkArtReadWrite/frac_5-8                 	 2924372	       565.7 ns/op	      59 B/op	       2 allocs/op
BenchmarkArtReadWrite/frac_6
BenchmarkArtReadWrite/frac_6-8                 	 3415107	       493.2 ns/op	      47 B/op	       1 allocs/op
BenchmarkArtReadWrite/frac_7
BenchmarkArtReadWrite/frac_7-8                 	 3958209	       432.9 ns/op	      35 B/op	       1 allocs/op
BenchmarkArtReadWrite/frac_8
BenchmarkArtReadWrite/frac_8-8                 	 4604090	       372.3 ns/op	      23 B/op	       0 allocs/op
BenchmarkArtReadWrite/frac_9
BenchmarkArtReadWrite/frac_9-8                 	 4809241	       295.1 ns/op	      11 B/op	       0 allocs/op
BenchmarkArtReadWrite/frac_10
BenchmarkArtReadWrite/frac_10-8                	23420194	        49.64 ns/op	       0 B/op	       0 allocs/op


// standard Go map wrapped with a sync.RWMutex (no range queries)

BenchmarkReadWrite_map_RWMutex_wrapped
BenchmarkReadWrite_map_RWMutex_wrapped/frac_0
BenchmarkReadWrite_map_RWMutex_wrapped/frac_0-8         	 5315512	       269.8 ns/op	      26 B/op	       1 allocs/op
BenchmarkReadWrite_map_RWMutex_wrapped/frac_1
BenchmarkReadWrite_map_RWMutex_wrapped/frac_1-8         	 5972695	       255.8 ns/op	      24 B/op	       0 allocs/op
BenchmarkReadWrite_map_RWMutex_wrapped/frac_2
BenchmarkReadWrite_map_RWMutex_wrapped/frac_2-8         	 6723937	       206.8 ns/op	      21 B/op	       0 allocs/op
BenchmarkReadWrite_map_RWMutex_wrapped/frac_3
BenchmarkReadWrite_map_RWMutex_wrapped/frac_3-8         	 7515493	       194.9 ns/op	      19 B/op	       0 allocs/op
BenchmarkReadWrite_map_RWMutex_wrapped/frac_4
BenchmarkReadWrite_map_RWMutex_wrapped/frac_4-8         	 8147653	       179.1 ns/op	      17 B/op	       0 allocs/op
BenchmarkReadWrite_map_RWMutex_wrapped/frac_5
BenchmarkReadWrite_map_RWMutex_wrapped/frac_5-8         	 8596716	       174.9 ns/op	      15 B/op	       0 allocs/op
BenchmarkReadWrite_map_RWMutex_wrapped/frac_6
BenchmarkReadWrite_map_RWMutex_wrapped/frac_6-8         	 9224728	       165.4 ns/op	      13 B/op	       0 allocs/op
BenchmarkReadWrite_map_RWMutex_wrapped/frac_7
BenchmarkReadWrite_map_RWMutex_wrapped/frac_7-8         	 9719365	       151.7 ns/op	       7 B/op	       0 allocs/op
BenchmarkReadWrite_map_RWMutex_wrapped/frac_8
BenchmarkReadWrite_map_RWMutex_wrapped/frac_8-8         	10205706	       148.0 ns/op	       6 B/op	       0 allocs/op
BenchmarkReadWrite_map_RWMutex_wrapped/frac_9
BenchmarkReadWrite_map_RWMutex_wrapped/frac_9-8         	 9078074	       149.2 ns/op	       3 B/op	       0 allocs/op
BenchmarkReadWrite_map_RWMutex_wrapped/frac_10
BenchmarkReadWrite_map_RWMutex_wrapped/frac_10-8        	35471313	        33.72 ns/op	       0 B/op	       0 allocs/op


// standard library sync.Map (no range queries)

BenchmarkReadWriteSyncMap
BenchmarkReadWriteSyncMap/frac_0
BenchmarkReadWriteSyncMap/frac_0-8                      	11585511	       128.9 ns/op	     111 B/op	       5 allocs/op
BenchmarkReadWriteSyncMap/frac_1
BenchmarkReadWriteSyncMap/frac_1-8                      	12977352	       131.1 ns/op	     101 B/op	       4 allocs/op
BenchmarkReadWriteSyncMap/frac_2
BenchmarkReadWriteSyncMap/frac_2-8                      	14838945	       122.2 ns/op	      90 B/op	       4 allocs/op
BenchmarkReadWriteSyncMap/frac_3
BenchmarkReadWriteSyncMap/frac_3-8                      	14907528	       114.5 ns/op	      80 B/op	       3 allocs/op
BenchmarkReadWriteSyncMap/frac_4
BenchmarkReadWriteSyncMap/frac_4-8                      	18078240	       100.8 ns/op	      70 B/op	       3 allocs/op
BenchmarkReadWriteSyncMap/frac_5
BenchmarkReadWriteSyncMap/frac_5-8                      	18791480	        89.52 ns/op	      59 B/op	       3 allocs/op
BenchmarkReadWriteSyncMap/frac_6
BenchmarkReadWriteSyncMap/frac_6-8                      	21172922	        79.55 ns/op	      49 B/op	       2 allocs/op
BenchmarkReadWriteSyncMap/frac_7
BenchmarkReadWriteSyncMap/frac_7-8                      	25711459	        73.04 ns/op	      38 B/op	       2 allocs/op
BenchmarkReadWriteSyncMap/frac_8
BenchmarkReadWriteSyncMap/frac_8-8                      	29925382	        60.51 ns/op	      28 B/op	       1 allocs/op
BenchmarkReadWriteSyncMap/frac_9
BenchmarkReadWriteSyncMap/frac_9-8                      	41599728	        46.34 ns/op	      18 B/op	       1 allocs/op
BenchmarkReadWriteSyncMap/frac_10
BenchmarkReadWriteSyncMap/frac_10-8                     	100000000	        11.96 ns/op	       8 B/op	       1 allocs/op


// a red-black tree; "github.com/glycerine/rbtree"

BenchmarkReadWrite_RedBlackTree
BenchmarkReadWrite_RedBlackTree/frac_0
BenchmarkReadWrite_RedBlackTree/frac_0-8                	 1000000	      1557 ns/op	      87 B/op	       2 allocs/op
BenchmarkReadWrite_RedBlackTree/frac_1
BenchmarkReadWrite_RedBlackTree/frac_1-8                	 1000000	      1521 ns/op	      40 B/op	       2 allocs/op
BenchmarkReadWrite_RedBlackTree/frac_2
BenchmarkReadWrite_RedBlackTree/frac_2-8                	 1000000	      1554 ns/op	      40 B/op	       2 allocs/op
BenchmarkReadWrite_RedBlackTree/frac_3
BenchmarkReadWrite_RedBlackTree/frac_3-8                	 1000000	      1485 ns/op	      40 B/op	       2 allocs/op
BenchmarkReadWrite_RedBlackTree/frac_4
BenchmarkReadWrite_RedBlackTree/frac_4-8                	 1000000	      1476 ns/op	      40 B/op	       2 allocs/op
BenchmarkReadWrite_RedBlackTree/frac_5
BenchmarkReadWrite_RedBlackTree/frac_5-8                	 1000000	      1548 ns/op	      40 B/op	       2 allocs/op
BenchmarkReadWrite_RedBlackTree/frac_6
BenchmarkReadWrite_RedBlackTree/frac_6-8                	 1000000	      1524 ns/op	      40 B/op	       2 allocs/op
BenchmarkReadWrite_RedBlackTree/frac_7
BenchmarkReadWrite_RedBlackTree/frac_7-8                	 1000000	      1638 ns/op	      40 B/op	       2 allocs/op
BenchmarkReadWrite_RedBlackTree/frac_8
BenchmarkReadWrite_RedBlackTree/frac_8-8                	 1000000	      1543 ns/op	      40 B/op	       2 allocs/op
BenchmarkReadWrite_RedBlackTree/frac_9
BenchmarkReadWrite_RedBlackTree/frac_9-8                	 1000000	      1548 ns/op	      40 B/op	       2 allocs/op
BenchmarkReadWrite_RedBlackTree/frac_10
BenchmarkReadWrite_RedBlackTree/frac_10-8               	 1000000	      1467 ns/op	      40 B/op	       2 allocs/op


PASS
ok  	github.com/glycerine/uart	137.808s

finished at Wed 2025 Mar 12 18:47:16

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Ascend

func Ascend(t *Tree, beg, endx Key) iter.Seq2[Key, any]

Ascend wraps a tree.Iter() iteration in ascending lexicographic (shortlex) order. See the Tree.Iter description for details.

func Descend added in v0.7.4

func Descend(t *Tree, endx, start Key) iter.Seq2[Key, any]

Descend iterates from highest to lowest key in lexicographic (shortlex) order. Perhaps counter-intuitively, the smaller (endx) key is always the first argument. "Smallest-first" is an easy way to remember this, as it applies to both directions. Descend is a simple wrapper around the RevIter method. In reverse iteration, the range covered is (endx, start], so the endx key itself will not be seen. Use Descend(nil, nil) to see all keys in the tree.

Types

type ByteSlice

type ByteSlice []byte

ByteSlice is an alias for []byte. It can be ignored in the uart (Unserialized ART) project, as it is only used for serialization purposes elsewhere.

ByteSlice is a simple wrapper header on all msgpack messages; it carries the length and the bytes. This allows length-delimited messages, with the length known up front.

type Iterator added in v0.11.0

type Iterator interface {

	// Next() must be called to start the iteration before
	// Key(), Value(), or Leaf() will be meaningful.
	//
	// Next will iterate over all leaf nodes in
	// the specified range in the chosen direction.
	//
	// When the iteration is done, Next returns false.
	//
	// If the tree is modified between calls to Next,
	// a version change will be recognized, and
	// Next will transparently resume iteration
	// from the successor to the last returned key.
	Next() (ok bool)

	// Leaf returns the current leaf in
	// the iterator after the first successful
	// Next() call.
	Leaf() *Leaf

	// Value returns the current value of
	// the iterator after the first successful
	// Next() call. This is the same as Leaf().Value
	Value() any

	// Index returns the current integer index of
	// the iterator after the first successful
	// Next() call.
	Index() int

	// Key returns the current key of
	// the iterator after the first successful
	// Next() call.
	//
	// Warning: the user must not modify
	// this returned key! The tree depends on
	// its value for correctness. Make a copy
	// before modifying the copy. To change
	// a key in the tree, call tree.Remove(old)
	// and then tree.Insert(new) with the new key.
	Key() Key
}

An Iterator will scan the tree in lexicographic (shortlex) order. See the Iter() and RevIter() methods on Tree.

type Key

type Key []byte

Key is the []byte which the tree sorts in lexicographic (shortlex) order, which means that shorter keys sort before longer keys with the same prefix.

Key is an arbitrary string of bytes, and in particular can contain the 0 byte anywhere in the slice.

func (Key) At

func (key Key) At(pos int) byte

At() returns the byte at key[pos], or 0 if pos is out of bounds.
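The out-of-bounds behavior is easy to state in code. This sketch mirrors the documented semantics (it is a hypothetical reimplementation, not the package's source):

```go
package main

import "fmt"

type Key []byte

// At returns key[pos], or 0 when pos is out of bounds,
// mirroring the documented Key.At behavior.
func (key Key) At(pos int) byte {
	if pos < 0 || pos >= len(key) {
		return 0
	}
	return key[pos]
}

func main() {
	k := Key("ab")
	fmt.Println(k.At(0), k.At(1), k.At(2), k.At(-1)) // 97 98 0 0
}
```

Returning 0 past the end is what makes shortlex ordering work naturally during trie descent, since a shorter key reads as "smaller" at the first missing byte.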

type Leaf

type Leaf struct {
	Key   Key         `zid:"0"`
	Value interface{} `msg:"-"`
	// contains filtered or unexported fields
}

Leaf holds a Key and a Value together, and is stored in the Tree.

Users must take care not to modify the Key on any leaf still in the tree (for example, the leaf returned from a Find() call), since it is used internally to maintain the sorted order of the Tree. The leaf must "own" its Key bytes, and the user must copy them if they want to make changes.

The leaf returned from Remove can be modified in any way desired, as it is no longer in the tree.

In contrast, users should feel free to update the leaf.Value on any leaf. This can be much more efficient than doing an insert to update a Key's value if the leaf is already in hand.
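Updating in place works because the value is held by reference. A minimal sketch of the pattern, with a plain map standing in for the tree (in uart you would mutate leaf.Value, or the struct it points to, after a Find):

```go
package main

import "fmt"

type counter struct{ n int }

func main() {
	// A map stands in for the tree; the pattern is the same:
	// store a pointer once, then mutate through it instead of
	// re-inserting the key on every update.
	m := map[string]*counter{"hits": {n: 0}}

	for i := 0; i < 3; i++ {
		m["hits"].n++ // no re-insert, no new tree traversal
	}
	fmt.Println(m["hits"].n) // 3
}
```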

func NewLeaf

func NewLeaf(key Key, v any, x []byte) *Leaf

func (*Leaf) FlatString

func (n *Leaf) FlatString(depth int, recurse int) (s string)

func (*Leaf) String

func (lf *Leaf) String() string

type SearchModifier

type SearchModifier int
const (
	// Exact is the default.
	Exact SearchModifier = 0 // exact matches only; like a hash table
	GTE   SearchModifier = 1 // greater than or equal to this key.
	LTE   SearchModifier = 2 // less than or equal to this key.
	GT    SearchModifier = 3 // strictly greater than this key.
	LT    SearchModifier = 4 // strictly less than this key.
)

func (SearchModifier) String

func (smod SearchModifier) String() string

type TestBytes

type TestBytes struct {
	Slc []byte `zid:"0"`
}

type Tree

type Tree struct {
	RWmut sync.RWMutex `msg:"-"`

	// SkipLocking means do no internal
	// synchronization, because a higher
	// component is doing so.
	//
	// Warning: when using SkipLocking
	// the user's code _must_ synchronize (prevent
	// overlap) of readers and writers from
	// different goroutines who access the Tree
	// simultaneously. Under this setting,
	// the Tree will not do locking itself
	// (it does by default, with SkipLocking false).
	// Without synchronization, multiple goroutines
	// will create data races, lost data, and
	// segfaults from torn reads.
	//
	// The easiest way to do this is with a sync.RWMutex.
	// One such, the RWmut on this Tree, will be
	// employed if SkipLocking is allowed to
	// default to false.
	SkipLocking bool `msg:"-"`
	// contains filtered or unexported fields
}

Tree is a trie that implements the Adaptive Radix Tree (ART) algorithm to provide a sorted, key-value, in-memory dictionary[1]. The ART tree provides both path compression (vertical compression) and variable sized inner nodes (horizontal compression) for space-efficient fanout.

Path compression is particularly attractive in situations where many keys have redundant prefixes. This is the common case for many ordered-key-value-map use cases, such as database indexes and file-system hierarchies. The Google File System paper, for example, mentions the efficiencies obtained by exploiting prefix compression in their distributed file system[2]. FoundationDB's new Redwood backend provides it as a feature[3], and users wish the API could be improved by offering it[4] in query result APIs.

As an alternative to red-black trees, AVL trees, and other kinds of balanced binary trees, ART is particularly attractive. Like those trees, ART offers an ordered index of sorted keys allowing efficient O(log N) access for each unique key.

Efficient key-range lookup and iteration, as well as the ability to treat the tree as an array using integer indexes (based on the counted B-tree idea[5]), make this ART tree implementation particularly easy to use in practice.

ART supports just a single value for each key -- it is not a "multi-map" in the C++ sense.

Concurrency: this ART implementation is goroutine safe, as it uses the Tree.RWmut sync.RWMutex for synchronization. Thus it allows only a single writer at a time, and any number of readers. Readers will block until the writer is done, and thus they see a fully consistent view of the tree. The RWMutex approach was the fastest and easiest to reason about in our applications without overly complicating the code base. The SkipLocking flag can be set to omit all locking if goroutine coordination is provided by other means, or unneeded (in the case of single goroutine only access).

[1] "The Adaptive Radix Tree: ARTful Indexing for Main-Memory Databases" by Viktor Leis, Alfons Kemper, Thomas Neumann.

[2] "The Google File System" SOSP’03, October 19–22, 2003, Bolton Landing, New York, USA. by Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. https://pdos.csail.mit.edu/6.824/papers/gfs.pdf

[3] "How does FoundationDB store keys with duplicate prefixes?" https://forums.foundationdb.org/t/how-does-foundationdb-store-keys-with-duplicate-prefixes/1234

[4] "Issue #2189: Prefix compress read range results" https://github.com/apple/foundationdb/issues/2189

[5] "Counted B-Trees" https://www.chiark.greenend.org.uk/~sgtatham/algorithms/cbtree.html

func NewArtTree

func NewArtTree() *Tree

NewArtTree creates and returns a new ART Tree, ready for use.

func (*Tree) At

func (t *Tree) At(i int) (lf *Leaf, ok bool)

At(i) lets us think of the tree as an array, returning the i-th leaf from the sorted leaf nodes, using an efficient O(log N) time algorithm. Here N is the count of elements stored in the tree.

At() uses the counted B-tree approach described by Simon Tatham[1]. This is also known as an Order-Statistic tree in the literature[2].

Optimization: for the common case of sequential calls going forward, At(i) will transparently use an iterator to cache the tree traversal point, so that the next At(i+1) call can pick up where the previous tree search left off. Since we don't have to repeat the mostly-the-same traversal down the tree again, the speed-up in the benchmark (see Test620) for this common case is a dramatic 6x, from 240 nsec to 40 nsec per call. The trade-off is that we must do a small amount of allocation to maintain a stack (the checkpoint frames) during the calls. We keep a small freelist pool of iterator checkpoint frames to minimize this, but completely zero-alloc is almost impossible (and the iterator itself is an allocation). My call is that this small-space-for-large-time-speed-up trade-off is well worth making.

If you will be doing a lot of random (non-sequential/non-linear) access to the tree, use Atfar() instead of At(). They are the same, except that Atfar will not spend any time trying to cache the tree traversal paths to your random access points.

[1] https://www.chiark.greenend.org.uk/~sgtatham/algorithms/cbtree.html

[2] https://en.wikipedia.org/wiki/Order_statistic_tree

func (*Tree) Atfar added in v0.9.9

func (t *Tree) Atfar(i int) (lf *Leaf, ok bool)

Atfar is more suitable than At for random (non-sequential) access to the tree.

Atfar() is the same as At() except that it does not attempt to do any caching of the tree traversal point to speed up sequential calls such as At(i), At(i+1), At(i+2), ... like At() does.

Hence Atfar() saves the (relatively small) time that At() spends filling the iterator checkpoint cache.

If you will be doing a lot of random (non-sequential) access to the tree, use Atfar() instead of At(). The name suggests that each access is "far" away from the others.

func (*Tree) Atv

func (t *Tree) Atv(i int) (val any, ok bool)

Atv(i) is like At(i) but returns the value from the Leaf instead of the actual *Leaf itself, simply for convenience.

func (*Tree) Clone added in v0.12.14

func (t *Tree) Clone() (r *Tree)

func (*Tree) CompressedStats added in v0.11.6

func (t *Tree) CompressedStats() (cs map[int]int, bytesSaved int)

CompressedStats returns the count of inner nodes with a given compressed prefix length.

func (*Tree) Find

func (t *Tree) Find(smod SearchModifier, key Key) (lf *Leaf, idx int, found bool)

Find allows GTE, GT, LTE, LT, and Exact searches.

GTE: find a leaf greater-than-or-equal to key; the smallest such key.

GT: find a leaf strictly greater-than key; the smallest such key.

LTE: find a leaf less-than-or-equal to key; the largest such key.

LT: find a leaf less-than key; the largest such key.

Exact: find the leaf whose key matches the supplied key exactly. This is the default, and it acts like a hash table lookup. A key can only be stored once in the tree. (It is not a multi-map in the C++ STL sense).

If key is nil, then GTE and GT return the first leaf in the tree, while LTE and LT return the last leaf in the tree.

Clients must take care not to modify the returned Leaf.Key, as it is not copied to keep memory use low. Doing so will result in undefined behavior.

The FindGTE, FindGT, FindLTE, and FindLT methods provide a more convenient interface to obtain the stored Leaf.Value if the full generality of Leaf access is not required.

By default, Find obtains a read-lock on the Tree.RWmut. This can be omitted by setting the Tree.SkipLocking option to true.

func (*Tree) FindExact

func (t *Tree) FindExact(key Key) (val any, idx int, found bool)

FindExact returns the element whose key matches the supplied key.

func (*Tree) FindGT

func (t *Tree) FindGT(key Key) (val any, idx int, found bool)

FindGT returns the first element whose key is greater than the supplied key.

func (*Tree) FindGTE

func (t *Tree) FindGTE(key Key) (val any, idx int, found bool)

FindGTE returns the first element whose key is greater than, or equal to, the supplied key.

func (*Tree) FindLT

func (t *Tree) FindLT(key Key) (val any, idx int, found bool)

FindLT returns the last element whose key is less than the supplied key.

func (*Tree) FindLTE

func (t *Tree) FindLTE(key Key) (val any, idx int, found bool)

FindLTE returns the last element whose key is less than, or equal to, the supplied key.

func (*Tree) FirstLeaf

func (t *Tree) FirstLeaf() (lf *Leaf, idx int, found bool)

FirstLeaf returns the first leaf in the Tree.

func (*Tree) Insert

func (t *Tree) Insert(key Key, value any) (updated bool)

Insert could be called "insert, or, if this key is already in the tree, replace its value". Insert makes a copy of key to avoid sharing bugs. The value is only stored, not copied. The returned `updated` flag will be true if the size of the tree did not change because the key was already in the tree; an "update" occurred. If updated is true, the existing leaf with the identical key has had its previous value discarded, and leaf.Value now holds the value from this Insert call.

func (*Tree) InsertLeaf

func (t *Tree) InsertLeaf(lf *Leaf) (updated bool)

InsertLeaf: the *Leaf lf *must* own the lf.Key it holds. It cannot be shared. Callers must guarantee this, copying the slice if necessary before submitting the Leaf.

func (*Tree) InsertX

func (t *Tree) InsertX(key Key, value any, x []byte) (updated bool)

InsertX copies the key to avoid sharing bugs. The value is held by pointer in the interface. The x slice is not copied.

func (*Tree) IsEmpty

func (t *Tree) IsEmpty() (empty bool)

IsEmpty returns true iff the Tree is empty.

func (*Tree) Iter added in v0.6.0

func (t *Tree) Iter(start, end []byte) (iter *iterator)

Iter starts a traversal over the range [start, end) in ascending order.

iter.Next() must be called to start the iteration before iter.Key(), iter.Value(), or iter.Leaf() will be meaningful.

When iter.Next() returns false, the iteration has completed.

We begin with the first key that is >= start and < end.

The end key must be > the start key, or no values will be returned. Either start or end can be nil to indicate the furthest possible range in that direction.

For example, note that [x, x) will return the empty set, unless x is nil. If x _is_ nil, this will return the entire tree in ascending order.

For another example, suppose the keys {0, 1, 2} are in the tree, and tree.Iter(0, 2) is called. Forward iteration will return 0, then 1.

The returned iterator is not concurrent/multiple goroutine safe. Iteration does no synchronization. This allows for single goroutine code that deletes from (or inserts into) the tree during the iteration, which is not an uncommon need.

func (*Tree) LastLeaf

func (t *Tree) LastLeaf() (lf *Leaf, idx int, found bool)

LastLeaf returns the last leaf in the Tree.

func (*Tree) LeafIndex added in v0.5.0

func (t *Tree) LeafIndex(leaf *Leaf) (idx int, ok bool)

LeafIndex returns the integer index of the leaf in the tree using exact key matching. The index represents the position in the lexicographic (shortlex) sorted order of keys, and so can be used to compute quantile and other statistics efficiently. The time complexity is O(log N).

func (*Tree) Remove

func (t *Tree) Remove(key Key) (deleted bool, deletedLeaf *Leaf)

Remove deletes the key from the Tree. If the key is not present, deleted will be false. If the key was present, deletedLeaf supplies its associated Leaf, from which the value can be obtained via the deletedLeaf.Value field.

func (*Tree) RevIter added in v0.6.0

func (t *Tree) RevIter(end, start []byte) (iter *iterator)

RevIter starts a traversal over the range (end, start] in descending order.

Note that the first argument to RevIter() is the smaller (if the two differ), assuming you don't want the empty set. This is true for Iter() as well.

iter.Next() must be called to start the iteration before iter.Key(), iter.Value(), or iter.Leaf() will be meaningful.

When iter.Next() returns false, the iteration has completed.

We begin with the first key that is <= start and > end.

The end key must be < the start key, or no values will be returned. Either start or end can be nil to indicate the furthest possible range in that direction.

For example, note that (x, x] will return the empty set, unless x is nil. If x _is_ nil, this will return the entire tree in descending order.

For another example, suppose the keys {0, 1, 2} are in the tree, and tree.RevIter(0, 2) is called. Reverse iteration will return 2, then 1. The same holds true if start (2 here) is replaced by any integer > 2.

tree.RevIter(nil, 2) will yield 2, then 1, then 0; as will tree.RevIter(nil, nil).

The returned iterator is not concurrent/multiple goroutine safe. Iteration does no synchronization. This allows for single goroutine code that deletes from (or inserts into) the tree during the iteration, which is not an uncommon need.

func (*Tree) Size

func (t *Tree) Size() (sz int)

Size returns the number of keys (leaf nodes) stored in the tree.

func (*Tree) String

func (t *Tree) String() string

String does no locking. It returns a string representation of the tree. This is mostly for debugging, and can be quite slow for large trees.
