Commit a4612caf authored by Tobias Schmidt

update Readme and add license information

parent 5ba8c550
Pipeline #46348 passed
# Partitioned Filters
This repository contains our implementations of four approximate filters: the Bloom filter [1], the Cuckoo filter [2], the Morton filter [3], and the Xor filter [4]. We used the code in our paper [A four-dimensional Analysis of Partitioned Filters](https://www.db.in.tum.de).
In addition to our optimized filter implementations, the repository also contains the code of the state-of-the-art competitors we compare against and extensive test cases. We generate the benchmarks with Python scripts and include our results from an Intel i9-9900X (Skylake-X) with 64 GiB of memory.
@@ -24,28 +24,52 @@ In addition to our optimized filter implementations, the repository also contain
Executing all benchmarks takes roughly one week and requires 64 GiB of memory. Some of the benchmarks measure only the false-positive rate and the number of failed builds and can therefore be executed with all available threads.
The CSV includes the following fields:
| Field | Unit | Description |
| ------------- | ------------ | ------------------------------------------------------------ |
| name | Text | Configuration: `<BenchmarkName>_<k>`/ `<Fixture>`/ `<s>` / `<n_threads>` / `<n_partitions>` / `<elements_build>` / `<elements_lookup>` / `<shared_elements>` / `_` / `_` |
| real_time | milliseconds | execution time per iteration |
| DTLB-misses | float | data translation lookaside buffer misses per iteration |
| ITLB-misses | float | instruction translation lookaside buffer misses per iteration |
| L1D-misses | float | level 1 data cache misses per iteration |
| L1I-misses | float | level 1 instruction cache misses per iteration |
| LLC-misses | float | last-level (L3) cache misses per iteration |
| branch-misses | float | branch misses per iteration |
| cycles | float | cycles per iteration |
| instruction | float | executed instructions per iteration |
| FPR | float | false-positive rate of the filter (only available when lookup is benchmarked) |
| failures | integer | number of failed builds |
| retries | integer | number of retries needed to build the filter |
| bits | float | number of bits per key allocated to the filter |
| size | integer | size of the filter in bytes |
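
The `name` field packs the whole benchmark configuration into one slash-separated string, so it usually has to be split before the results can be grouped or plotted. The following is a minimal sketch of that step, assuming the components appear in exactly the order listed in the table and that `/` never occurs inside a component. The key names and the example value are hypothetical and only mirror the placeholders above; they are not taken from the repository's scripts.

```python
from typing import Dict

# Component labels follow the `name` row of the table above (assumed order).
# The two trailing `_` placeholders are ignored.
NAME_PARTS = [
    "benchmark_k", "fixture", "s", "n_threads", "n_partitions",
    "elements_build", "elements_lookup", "shared_elements",
]


def parse_name(name: str) -> Dict[str, str]:
    """Split a `name` value into its labelled components (values stay strings)."""
    parts = [p.strip() for p in name.split("/")]
    if len(parts) < len(NAME_PARTS):
        raise ValueError(
            f"expected at least {len(NAME_PARTS)} components, got {len(parts)}"
        )
    return dict(zip(NAME_PARTS, parts))


# Hypothetical example value, not taken from the published results.
example = "Lookup_8/BloomFixture/100/1/16/1000000/1000000/500000/_/_"
config = parse_name(example)
print(config["fixture"], config["n_partitions"])
```

The values are deliberately left as strings, because the table does not state the type of each component.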
### Repository Structure
* `benchmark`: code for benchmarking the filter and the definition files with our results on the Skylake-X machine.
* `cmake`: optional packages.
* `lib`: external dependencies and existing filter implementations. *The code in this folder is not licensed under the MIT License (see Dependencies).*
* `python`: scripts for generating, executing and plotting benchmarks.
* `src`: filter implementations.
* `test`: extensive test cases for our filter implementations and the integration of the competitors.
* `vendor`: external packages.
### Dependencies
* `lib/amd_mortonfilter`: original [Morton Filter](https://github.com/AMDComputeLibraries/morton_filter) implementation used in [3], licensed under **the MIT License**.
* `lib/bsd`: [(register-)blocked and (cache-)sectorized Bloom Filter](https://github.com/peterboncz/bloomfilter-bsd) implementations with SIMD support and external competitors used in [1], licensed under **the Apache License (Version 2.0), the 2-clause BSD License, and the 3-clause BSD License**.
* `lib/cityhash`: [Google's CityHash](https://github.com/google/cityhash) implementation, licensed under **the MIT License**.
* `lib/efficient_cuckoofilter`: original [Cuckoo Filter](https://github.com/efficient/cuckoofilter) implementation used in [2], licensed under **the Apache License (Version 2.0)**.
* `lib/fastfilter`: original [Xor Filter](https://github.com/FastFilter/fastfilter_cpp) implementation used in [4], licensed under **the Apache License (Version 2.0)**.
* `lib/impala`: original [sectorized Bloom Filter](https://github.com/apache/impala) used in Apache Impala, licensed under **the Apache License (Version 2.0)**.
* `lib/libdivide`: the [LibDivide](https://github.com/ridiculousfish/libdivide) library computes magic numbers for optimizing integer divisions, licensed under **the zlib License**.
* `lib/perfevent`: library for reading perf counters in C(++), licensed under **the MIT License**.
* `lib/vacuumfilter`: [Vacuum Filter](https://github.com/wuwuz/Vacuum-Filter) implementation.
## Related Work
We included the following state-of-the-art filter implementations:
[1] [Performance-Optimal Filtering: Bloom Overtakes Cuckoo at High Throughput](http://www.vldb.org/pvldb/vol12/p502-lang.pdf)
[2] [Cuckoo Filter: Practically Better Than Bloom](http://www.cs.cmu.edu/~binfan/papers/conext14_cuckoofilter.pdf)
[3] [Morton Filters: Faster, Space-Efficient Cuckoo Filters via Biasing, Compression, and Decoupled Logical Sparsity](https://www.vldb.org/pvldb/vol11/p1041-breslow.pdf)
[4] [Xor Filters: Faster and Smaller Than Bloom and Cuckoo Filters](https://arxiv.org/pdf/1912.08258.pdf)
Cuckoo filter
-------------
Copyright (C) 2013, Carnegie Mellon University and Intel Corporation
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Vectorized Bloom filter
-----------------------
Copyright (c) 2014, Orestis Polychroniou
Department of Computer Science, Columbia University
All rights reserved.
Material for research paper:
Venue: Data Management on New Hardware (DaMoN) 2014
Title: Vectorized Bloom Filters for Advanced SIMD Processors
Authors: Orestis Polychroniou (orestis@cs.columbia.edu)
Kenneth A. Ross (kar@cs.columbia.edu)
Affiliation: Department of Computer Science, Columbia University
License:
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Impala Bloom filter
-------------------
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
Bloom and Cuckoo Filter Benchmark
=================================
This repo contains the benchmark runner that was used to evaluate
Bloom and Cuckoo filters for the VLDB'19 paper [*Performance-Optimal Filtering:
Bloom Overtakes Cuckoo at High Throughput*](http://www.vldb.org/pvldb/vol12/p502-lang.pdf).
The repo is based on the Cuckoo filter repo and includes a slightly modified
version of the Cuckoo filter as presented in the ACM CoNEXT'14 paper
[*Cuckoo Filter: Practically Better Than Bloom*](http://www.cs.cmu.edu/~binfan/papers/conext14_cuckoofilter.pdf).
If you are looking for the latest version of the Cuckoo filter, please refer to
[https://github.com/efficient/cuckoofilter](https://github.com/efficient/cuckoofilter).
Furthermore, we include a copy of the Bloom filter implementation from the
[Impala](https://impala.apache.org/) database system (see 'src/simd-block.h')
and the [vectorized Bloom filter](http://www.cs.columbia.edu/~orestis/vbf.c)
as presented in the DaMoN'14 paper
[*Vectorized Bloom Filters for Advanced SIMD Processors*](http://www.cs.columbia.edu/~orestis/damon14.pdf).
Our SIMD-optimized implementations of Bloom and Cuckoo filters are included
as a git submodule. The source code can be found in the GitHub repo
[bloomfilter-bsd](https://github.com/peterboncz/bloomfilter-bsd).
### Erratum
After publication, an error in the collision resolution of cuckoo filters with
arbitrarily sized tables was found and fixed.
See our blog post
["Cuckoo Filters with arbitrarily sized tables"](https://databasearchitects.blogspot.com/2019/07/cuckoo-filters-with-arbitrarily-sized.html) for details.
Using the Code
--------------
### Prerequisites
* A C++14 compliant compiler; only GCC has been tested.
* [CMake](http://www.cmake.org/) version 3.5 or later.
* The [Boost C++ Libraries](https://www.boost.org/), version 1.58 or later.
* A Linux environment (including the BASH shell and the typical GNU tools).
* SQLite version 3.x.
* A TeX distribution, e.g. TeX Live (optional).
### Repository structure
* `benchmarks/`: the benchmark runner
* `module/dtl/`: git submodule for the SIMD-optimized filter implementations
  * In particular, our Bloom filter implementation can be found in `./filter/blocked_bloomfilter/` and our Cuckoo implementation in
    `./filter/cuckoofilter/`
* `scripts/`: several shell scripts that drive the benchmark
* `src/`: the C++ header and implementation of the (original) cuckoo filter, the
  Impala and vectorized Bloom filters
* `tex/`: LaTeX files to typeset the results
### Building
```
git clone git@github.com:peterboncz/bloomfilter-repro.git
cd bloomfilter-repro
git submodule update --remote --recursive --init
mkdir build
cd build/
cmake -DCMAKE_BUILD_TYPE=Release ..
make -j 8 n_filter
make -j 8 get_cache_size
make -j 8 benchmark_`./determine_arch.sh`
```
The benchmark runner can be compiled for the following architectures:
| Architecture | Description |
| ------------ | ---------------------------------------------------------------------------------------- |
| `corei7` | targets pre-AVX2 processor generations. All SIMD optimizations are disabled. |
| `core-avx2` | targets Intel Haswell (or later) and AMD Ryzen processors with the AVX2 instruction set. |
| `knl` | targets Intel Knights Landing (KNL) processor with the AVX-512F instruction set. |
| `skx` | targets Intel Skylake-X (or later) processors with the AVX-512F/BW instruction set. |
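
The table above ties each target to an instruction-set level, which suggests a simple decision order when picking a target by hand. The sketch below illustrates that order in Python. It is not the logic of `determine_arch.sh`, whose contents are not shown here; the function name `select_arch` is hypothetical, and the flag names (`avx2`, `avx512f`, `avx512bw`) are the ones Linux reports in the `flags` line of `/proc/cpuinfo`.

```python
from typing import Iterable


def select_arch(cpu_flags: Iterable[str]) -> str:
    """Map CPU feature flags to one of the four target names in the table.

    Assumption: the most capable matching target is preferred.
    """
    flags = set(cpu_flags)
    if "avx512f" in flags and "avx512bw" in flags:
        return "skx"        # AVX-512F/BW (Skylake-X or later)
    if "avx512f" in flags:
        return "knl"        # AVX-512F without BW (Knights Landing)
    if "avx2" in flags:
        return "core-avx2"  # Haswell or later, AMD Ryzen
    return "corei7"         # pre-AVX2, SIMD optimizations disabled


print(select_arch(["sse4_2", "avx", "avx2"]))
```

On a Linux machine the flags could be read from `/proc/cpuinfo` and passed in as a list of strings.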
### Benchmarking
For a quick start, we provide a *scripted* benchmark which automatically
performs several performance measurements and imports the results into a
SQLite database. Optionally a summary sheet is generated.
The following scripts need to be executed in the given order:
```
./benchmark.sh
./aggr_results.sh
./summary.sh
```
The `benchmark.sh` script performs the actual measurements and stores the CSV results in
the directory `./results`.
The `aggr_results.sh` script imports the raw results into a SQLite database
stored in `./results/skyline.sqlite3`.
Optionally, the `summary.sh` script typesets a summary PDF.
To perform other analyses, we refer to the source code of the scripts
mentioned above.
Further details on the output format and
the benchmark options can be found [here](BENCHMARK.md).
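
Before writing custom analyses against the SQLite database, it can help to see which tables the import step created. The sketch below lists every table with its row count using Python's standard `sqlite3` module. The schema is not described here, so nothing about table or column names is assumed; the function name `table_row_counts` is hypothetical, and the path in the usage comment is the one stated above.

```python
import sqlite3
from typing import Dict


def table_row_counts(db_path: str) -> Dict[str, int]:
    """Return {table name: row count} for every table in a SQLite database."""
    conn = sqlite3.connect(db_path)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        # Table names come from sqlite_master, so quoting them is sufficient here.
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names
        }
    finally:
        conn.close()


# Usage, after running aggr_results.sh:
#   print(table_row_counts("./results/skyline.sqlite3"))
```

Note that `sqlite3.connect` creates an empty database file if the path does not exist, so an empty result may simply mean the path is wrong.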
Related Work
------------
* [Morton Filter](https://github.com/AMDComputeLibraries/morton_filter)
> A Morton filter is a modified cuckoo filter [...] that is optimized for bandwidth-constrained systems.
* [Fluid Co-Processing](https://github.com/t1mm3/fluid_coprocessing)
Licenses
--------
* The [Cuckoo filter](https://github.com/efficient/cuckoofilter) and the
[Impala](https://impala.apache.org/) Bloom filter implementation are licensed
under the Apache License, Version 2.0.
* [Vectorized Bloom filters](http://www.cs.columbia.edu/~orestis/vbf.c) are
licensed under the 2-clause BSD license.
* Our [SIMD-optimized implementations](https://github.com/peterboncz/bloomfilter-bsd)
are dual licensed under the Apache License, Version 2.0 and the 3-clause BSD
license.
CityHash, a family of hash functions for strings.
Introduction
============
CityHash provides hash functions for strings. The functions mix the
input bits thoroughly but are not suitable for cryptography. See
"Hash Quality," below, for details on how CityHash was tested and so on.
We provide reference implementations in C++, with a friendly MIT license.
CityHash32() returns a 32-bit hash.
CityHash64() and similar return a 64-bit hash.
CityHash128() and similar return a 128-bit hash and are tuned for
strings of at least a few hundred bytes. Depending on your compiler
and hardware, it's likely faster than CityHash64() on sufficiently long
strings. It's slower than necessary on shorter strings, but we expect
that case to be relatively unimportant.
CityHashCrc128() and similar are variants of CityHash128() that depend
on _mm_crc32_u64(), an intrinsic that compiles to a CRC32 instruction
on some CPUs. However, none of the functions we provide are CRCs.
CityHashCrc256() is a variant of CityHashCrc128() that also depends
on _mm_crc32_u64(). It returns a 256-bit hash.
All members of the CityHash family were designed with heavy reliance
on previous work by Austin Appleby, Bob Jenkins, and others.
For example, CityHash32 has many similarities with Murmur3a.
Performance on long strings: 64-bit CPUs
========================================
We are most excited by the performance of CityHash64() and its variants on
short strings, but long strings are interesting as well.
CityHash is intended to be fast, under the constraint that it hash very
well. For CPUs with the CRC32 instruction, CRC is speedy, but CRC wasn't
designed as a hash function and shouldn't be used as one. CityHashCrc128()
is not a CRC, but it uses the CRC32 machinery.
On a single core of a 2.67GHz Intel Xeon X5550, CityHashCrc256 peaks at about
5 to 5.5 bytes/cycle. The other CityHashCrc functions are wrappers around
CityHashCrc256 and should have similar performance on long strings.
(CityHashCrc256 in v1.0.3 was even faster, but we decided it wasn't as thorough
as it should be.) CityHash128 peaks at about 4.3 bytes/cycle. The fastest
Murmur variant on that hardware, Murmur3F, peaks at about 2.4 bytes/cycle.
We expect the peak speed of CityHash128 to dominate CityHash64, which is
aimed more toward short strings or use in hash tables.
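
To put the bytes/cycle figures above into more familiar units, they can be multiplied by the clock rate of the test machine. The sketch below does this arithmetic for the 2.67 GHz Xeon X5550 mentioned in the text. It assumes the core actually runs at its nominal clock (no turbo or throttling), and it uses 1 GB = 10^9 bytes; the function name is hypothetical.

```python
def peak_throughput_gb_per_s(bytes_per_cycle: float, clock_ghz: float) -> float:
    """bytes/cycle * 10^9 cycles/s = 10^9 bytes/s, i.e. GB/s."""
    return bytes_per_cycle * clock_ghz


CLOCK_GHZ = 2.67  # nominal clock of the Intel Xeon X5550 named in the text

# Peak bytes/cycle figures quoted in the text above.
figures = {
    "CityHashCrc256 (low)": 5.0,
    "CityHashCrc256 (high)": 5.5,
    "CityHash128": 4.3,
    "Murmur3F": 2.4,
}

for name, bpc in figures.items():
    print(f"{name}: {peak_throughput_gb_per_s(bpc, CLOCK_GHZ):.2f} GB/s")
```

Under these assumptions CityHash128 peaks at roughly 11.5 GB/s and Murmur3F at roughly 6.4 GB/s on a single core.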
For long strings, a new function by Bob Jenkins, SpookyHash, is just
slightly slower than CityHash128 on Intel x86-64 CPUs, but noticeably
faster on AMD x86-64 CPUs. For hashing long strings on AMD CPUs
and/or CPUs without the CRC instruction, SpookyHash may be just as
good or better than any of the CityHash variants.
Performance on short strings: 64-bit CPUs
=========================================
For short strings, e.g., most hash table keys, CityHash64 is faster than
CityHash128, and probably faster than all the aforementioned functions,
depending on the mix of string lengths. Here are a few results from that
same hardware, where we (unrealistically) tested a single string length over
and over again:
Hash              Results
------------------------------------------------------------------------------
CityHash64 v1.0.3 7ns for 1 byte, or 6ns for 8 bytes, or 9ns for 64 bytes
Murmur2 (64-bit)  6ns for 1 byte, or 6ns for 8 bytes, or 15ns for 64 bytes
Murmur3F          14ns for 1 byte, or 15ns for 8 bytes, or 23ns for 64 bytes
We don't have CityHash64 benchmark results for v1.1, but we expect the
numbers to be similar.
Performance: 32-bit CPUs
========================
CityHash32 is the newest variant of CityHash. It is intended for
32-bit hardware in general but has been mostly tested on x86. Our benchmarks
suggest that Murmur3 is the nearest competitor to CityHash32 on x86.
We don't know of anything faster that has comparable quality. The speed rankings
in our testing: CityHash32 > Murmur3f > Murmur3a (for long strings), and
CityHash32 > Murmur3a > Murmur3f (for short strings).
Installation
============
We provide reference implementations of several CityHash functions, written
in C++. The build system is based on autoconf. It defaults the C++
compiler flags to "-g -O2", which is probably slower than -O3 if you are
using gcc. YMMV.
On systems with gcc, we generally recommend:
    ./configure
    make all check CXXFLAGS="-g -O3"
    sudo make install
Or, if your system has the CRC32 instruction, and you want to build everything:
    ./configure --enable-sse4.2
    make all check CXXFLAGS="-g -O3 -msse4.2"
    sudo make install
Note that our build system doesn't try to determine the appropriate compiler
flag for enabling SSE4.2. For gcc it is "-msse4.2". The --enable-sse4.2
flag to the configure script controls whether citycrc.h is installed when
you "make install." In general, picking the right compiler flags can be
tricky, and may depend on your compiler, your hardware, and even how you
plan to use the library.
For generic information about how to configure this software, please try:
    ./configure --help
Failing that, please work from city.cc and city*.h, as they contain all the
necessary code.
Usage
=====
The above installation instructions will produce a single library. It will
contain CityHash32(), CityHash64(), and CityHash128(), and their variants,
and possibly CityHashCrc128(), CityHashCrc128WithSeed(), and
CityHashCrc256(). The functions with Crc in the name are declared in
citycrc.h; the rest are declared in city.h.
Limitations
===========
1) CityHash32 is intended for little-endian 32-bit code, and everything else in
the current version of CityHash is intended for little-endian 64-bit CPUs.
All functions that don't use the CRC32 instruction should work in
little-endian 32-bit or 64-bit code. CityHash should work on big-endian CPUs
as well, but we haven't tested that very thoroughly yet.
2) CityHash is fairly complex. As a result of its complexity, it may not
perform as expected on some compilers. For example, preliminary reports
suggest that some Microsoft compilers compile CityHash to assembly that's
10-20% slower than it could be.
Hash Quality
============
We like to test hash functions with SMHasher, among other things.
SMHasher isn't perfect, but it seems to find almost any significant flaw.
SMHasher is available at http://code.google.com/p/smhasher/
SMHasher is designed to pass a 32-bit seed to the hash functions it tests.
No CityHash function is designed to work that way, so we adapt as follows:
For our functions that accept a seed, we use the given seed directly (padded
with zeroes); for our functions that don't accept a seed, we hash the
concatenation of the given seed and the input string.
The CityHash functions have the following flaws according to SMHasher:
(1) CityHash64: none
(2) CityHash64WithSeed: none
(3) CityHash64WithSeeds: did not test
(4) CityHash128: none
(5) CityHash128WithSeed: none
(6) CityHashCrc128: none
(7) CityHashCrc128WithSeed: none
(8) CityHashCrc256: none
(9) CityHash32: none
Some minor flaws in 32-bit and 64-bit functions are harmless, as we
expect the primary use of these functions will be in hash tables. We
may have gone slightly overboard in trying to please SMHasher and other
similar tests, but we don't want anyone to choose a different hash function
because of some minor issue reported by a quality test.
For more information
====================
http://code.google.com/p/cityhash/
cityhash-discuss@googlegroups.com
Please feel free to send us comments, questions, bug reports, or patches.