This repository contains our implementations of four approximate filters: the Bloom filter, the Cuckoo filter, the Morton filter, and the Xor filter. We used the code in our paper [A four-dimensional Analysis of Partitioned Filters](https://www.db.in.tum.de).
This repository contains our implementations of four approximate filters: the Bloom filter [1], the Cuckoo filter [2], the Morton filter [3], and the Xor filter [4]. We used the code in our paper [A four-dimensional Analysis of Partitioned Filters](https://www.db.in.tum.de).
In addition to our optimized filter implementations, the repository contains the code of the state-of-the-art competitors we compare against and extensive test cases. We generate the benchmarks using Python scripts and include our results on an Intel i9-9900X (Skylake-X) with 64 GiB of memory.
...
...
@@ -24,28 +24,52 @@ In addition to our optimized filter implementations, the repository also contain
Executing all benchmarks takes roughly one week and requires 64 GiB of memory. Some benchmarks measure only the false-positive rate and the number of failed builds and should therefore be executed with all available threads.
| branch-misses | float | branch misses per iteration |
| cycles | float | cycles per iteration |
| instruction | float | executed instructions per iteration |
| FPR | float | false-positive rate of the filter (only available when lookup is benchmarked) |
| failures | integer | number of failed builds |
| retries | integer | number of retries needed to build the filter |
| bits | float | number of bits per key allocated to the filter |
| size | integer | size of the filter in bytes |
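
The `bits`, `size`, and `FPR` columns are related by simple arithmetic. The following is a minimal sketch, not code from this repository: the function names are hypothetical, and it assumes that `bits` is the filter size in bits divided by the number of inserted keys and that `FPR` is the fraction of lookups for absent keys that the filter reports as present. The benchmark code may compute these values differently.

```python
def bits_per_key(size_bytes: int, num_keys: int) -> float:
    """Bits allocated per inserted key, given the filter size in bytes.

    Assumption: this mirrors the `bits` column (8 * size / number of keys).
    """
    if num_keys <= 0:
        raise ValueError("num_keys must be positive")
    return 8 * size_bytes / num_keys


def false_positive_rate(false_positives: int, negative_lookups: int) -> float:
    """Fraction of lookups for absent keys that the filter wrongly accepts.

    Assumption: this mirrors the `FPR` column.
    """
    if negative_lookups <= 0:
        raise ValueError("negative_lookups must be positive")
    return false_positives / negative_lookups


if __name__ == "__main__":
    # A 1,250,000-byte filter holding 1,000,000 keys uses 10 bits per key.
    print(bits_per_key(1_250_000, 1_000_000))
    # 39 false positives in 10,000 lookups of absent keys.
    print(false_positive_rate(39, 10_000))
```

Under these assumptions, a row with `size = 1250000` and one million keys would report `bits = 10.0`.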
### Repository Structure
* `benchmark`: code for benchmarking the filters and the definition files with our results on the Skylake-X machine.
* `cmake`: optional packages.
* `lib`: external dependencies and existing filter implementations. *The code in this folder is not licensed under the MIT License (see Dependencies).*
* `python`: scripts for generating, executing, and plotting benchmarks.
* `src`: filter implementations.
* `test`: extensive test cases for our filter implementations and the integration of the competitors.
* `vendor`: external packages.
### Dependencies
* `lib/amd_mortonfilter`: original [Morton Filter](https://github.com/AMDComputeLibraries/morton_filter) implementation used in [3], licensed under **the MIT License**.
* `lib/bsd`: [(register-)blocked and (cache-)sectorized Bloom Filter](https://github.com/peterboncz/bloomfilter-bsd) implementations with SIMD support and external competitors used in [1], licensed under **the Apache License (Version 2.0), the 2-clause BSD License, and the 3-clause BSD License**.
* `lib/cityhash`: [Google's CityHash](https://github.com/google/cityhash) implementation, licensed under **the MIT License**.
* `lib/efficient_cuckoofilter`: original [Cuckoo Filter](https://github.com/efficient/cuckoofilter) implementation used in [2], licensed under **the Apache License (Version 2.0)**.
* `lib/fastfilter`: original [Xor Filter](https://github.com/FastFilter/fastfilter_cpp) implementation used in [4], licensed under **the Apache License (Version 2.0)**.
* `lib/impala`: original [sectorized Bloom Filter](https://github.com/apache/impala) used in Impala, licensed under **the Apache License (Version 2.0)**.
* `lib/libdivide`: the [LibDivide](https://github.com/ridiculousfish/libdivide) library, which computes magic numbers for optimizing integer divisions, licensed under **the zlib License**.
* `lib/perfevent`: library for reading perf counters in C(++), licensed under **the MIT License**.
After publication, an error in the collision resolution of
cuckoo filters with arbitrarily sized tables was found and fixed.
See our blog post
["Cuckoo Filters with arbitrarily sized tables"](https://databasearchitects.blogspot.com/2019/07/cuckoo-filters-with-arbitrarily-sized.html) for details.
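
As background, the sketch below illustrates why table sizes that are not a power of two need care. It is not this repository's code, and it does not reproduce the error or the fix described in the blog post. Partial-key cuckoo hashing needs a function that maps either candidate bucket to the other, using only the fingerprint hash. XOR has this property, but its result can fall outside the table when the number of buckets is not a power of two. One self-inverse alternative for arbitrary sizes is modular subtraction. All names here are hypothetical.

```python
def alt_bucket_xor(index: int, fp_hash: int, num_buckets: int) -> int:
    """Classic XOR mapping; only stays in range for power-of-two table sizes."""
    return index ^ (fp_hash % num_buckets)


def alt_bucket_sub(index: int, fp_hash: int, num_buckets: int) -> int:
    """Self-inverse mapping for any table size: applying it twice returns index.

    Note: a bucket maps to itself whenever 2 * index == fp_hash (mod num_buckets),
    so the two candidate buckets can coincide.
    """
    return (fp_hash - index) % num_buckets


if __name__ == "__main__":
    n = 6  # not a power of two
    # XOR can leave the table: 5 ^ 2 == 7, which is not a valid bucket for n = 6.
    print(alt_bucket_xor(5, 2, n))
    # Modular subtraction stays in range and round-trips.
    other = alt_bucket_sub(5, 2, n)
    print(other, alt_bucket_sub(other, 2, n))
```

The round trip holds because `fp_hash - (fp_hash - index)` equals `index` modulo the table size.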
Using the Code
--------------
### Prerequisites
* A C++14 compliant compiler; only GCC has been tested.
* [CMake](http://www.cmake.org/) version 3.5 or later.
* The [Boost C++ Libraries](https://www.boost.org/), version 1.58 or later.
* A Linux environment (including the BASH shell and the typical GNU tools).
* SQLite version 3.x
* A TeX distribution, e.g., TeX Live (optional)
### Repository structure
* `benchmarks/`: the benchmark runner
* `module/dtl/`: git submodule for the SIMD-optimized filter implementations
  * In particular, our Bloom filter implementation can be found in `./filter/blocked_bloomfilter/` and our Cuckoo filter implementation in `./filter/cuckoofilter/`
* `scripts/`: several shell scripts that drive the benchmark
* `src/`: the C++ header and implementation of the (original) cuckoo filter, the