diff --git a/README.md b/README.md
index e14fde3325485e46cccdb92f82c9b01da69d7d42..69965edeed5d8c5d4dbdd2750d467701864d5178 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
 # Partitioned Filters
 
-This repository contains our implementations of four approximate filters: the Bloom filter, the Cuckoo filter, theMorton filter, and the Xor filter. We used the code in our paper [A four-dimensional Analysis of Partitioned Filters](https://www.db.in.tum.de).
+This repository contains our implementations of four approximate filters: the Bloom filter [1], the Cuckoo filter [2], the Morton filter [3], and the Xor filter [4]. We used the code in our paper [A four-dimensional Analysis of Partitioned Filters](https://www.db.in.tum.de).
 
 In addition to our optimized filter implementations, the repository also contains the code of state-of-the-art competitors we compare to and extensive test cases. We generate the benchmarks using python scripts and included our results on an Intel i9-9900x (Skylake-X) with 64 GiB memory.
 
@@ -24,28 +24,52 @@ In addition to our optimized filter implementations, the repository also contain
 Executing all benchmarks takes roughly 1 week and requires 64 GiB memory. Some of the benchmarks do measure only the false-positive rate and the failed builds and, thus, should be executed with all available threads.
 
 The csv includes the following fields:
-| Field | Unit | Description |
-| ------------- | ------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| name | Text | Configuration: `<BenchmarkName>_<k>`/ `<Fixture>`/ `<s>` / `<n_threads>` / `<n_partitions>` / `<elements_build>` / `<elements_lookup>` / `<shared_elements` / `_` / `_` |
-| real_time | milliseconds | execution time per iteration |
-| DTLB-misses | float | data translation lookaside buffer misses per iteration |
-| ITLB-misses | float | instruction translation lookaside buffer misses per iteration |
-| L1D-misses | float | level 1 data cache misses per iteration |
-| L1I-misses | float | level 1 instruction cache misses per iteration |
-| LLC-misses | float | last-level (L3) cache misses per iteration |
-| branch-misses | float | branch misses per iteration |
-| cycles | float | cycles per iteration |
-| instruction | float | executed instructions per iteration |
-| FPR | float | false-positive rate of the filter (only available when lookup is benchmarked) |
-| failures | integer | number of failed builds |
-| retries | integer | number of retries needed to build the filter |
-| bits | float | number of bits per key allocated to the filter |
-| size | integer | size of the filter in bytes |
+| Field | Unit | Description |
+| ------------- | ------------ | ------------------------------------------------------------ |
+| name | Text | Configuration: `<BenchmarkName>_<k>` / `<Fixture>` / `<s>` / `<n_threads>` / `<n_partitions>` / `<elements_build>` / `<elements_lookup>` / `<shared_elements>` / `_` / `_` |
+| real_time | milliseconds | execution time per iteration |
+| DTLB-misses | float | data translation lookaside buffer misses per iteration |
+| ITLB-misses | float | instruction translation lookaside buffer misses per iteration |
+| L1D-misses | float | level 1 data cache misses per iteration |
+| L1I-misses | float | level 1 instruction cache misses per iteration |
+| LLC-misses | float | last-level (L3) cache misses per iteration |
+| branch-misses | float | branch misses per iteration |
+| cycles | float | cycles per iteration |
+| instruction | float | executed instructions per iteration |
+| FPR | float | false-positive rate of the filter (only available when lookup is benchmarked) |
+| failures | integer | number of failed builds |
+| retries | integer | number of retries needed to build the filter |
+| bits | float | number of bits per key allocated to the filter |
+| size | integer | size of the filter in bytes |
 
 ### Repository Structure
 
+* `benchmark`: code for benchmarking the filters and the definition files with our results on the Skylake-X machine.
+* `cmake`: optional packages.
+* `lib`: external dependencies and existing filter implementations. *The code in this folder is not licensed under the MIT License (see Dependencies).*
+* `python`: scripts for generating, executing and plotting benchmarks.
+* `src`: filter implementations.
+* `test`: extensive test cases for our filter implementations and for the integration of the competitors.
+* `vendor`: external packages.
+
 ### Dependencies
 
+* `lib/amd_mortonfilter`: original [Morton Filter](https://github.com/AMDComputeLibraries/morton_filter) implementation used in [3], licensed under **the MIT License**.
+* `lib/bsd`: [(register-)blocked and (cache-)sectorized Bloom Filter](https://github.com/peterboncz/bloomfilter-bsd) implementations with SIMD support and external competitors used in [1], licensed under **the Apache License (Version 2.0), the 2-clause BSD License, and the 3-clause BSD License**.
+* `lib/cityhash`: [Google's CityHash](https://github.com/google/cityhash) implementation, licensed under **the MIT License**.
+* `lib/efficient_cuckoofilter`: original [Cuckoo Filter](https://github.com/efficient/cuckoofilter) implementation used in [2], licensed under **the Apache License (Version 2.0)**.
+* `lib/fastfilter`: original [Xor Filter](https://github.com/FastFilter/fastfilter_cpp) implementation used in [4], licensed under **the Apache License (Version 2.0)**.
+* `lib/impala`: original [sectorized Bloom Filter](https://github.com/apache/impala) implementation used in Impala, licensed under **the Apache License (Version 2.0)**.
+* `lib/libdivide`: the [LibDivide](https://github.com/ridiculousfish/libdivide) library computes magic numbers for optimizing integer divisions, licensed under **the zlib License**.
+* `lib/perfevent`: library for reading perf counters in C(++), licensed under **the MIT License**.
+* `lib/vacuumfilter`: [Vacuum Filter](https://github.com/wuwuz/Vacuum-Filter) implementation.
+
 ## Related Work
 
-We included the following state-the-art-filter implementations:
\ No newline at end of file
+[1] [Performance-Optimal Filtering: Bloom Overtakes Cuckoo at High Throughput](http://www.vldb.org/pvldb/vol12/p502-lang.pdf)
+
+[2] [Cuckoo Filter: Practically Better Than Bloom](http://www.cs.cmu.edu/~binfan/papers/conext14_cuckoofilter.pdf)
+
+[3] [Morton Filters: Faster, Space-Efficient Cuckoo Filters via Biasing, Compression, and Decoupled Logical Sparsity](https://www.vldb.org/pvldb/vol11/p1041-breslow.pdf)
+
+[4] [Xor Filters: Faster and Smaller Than Bloom and Cuckoo Filters](https://arxiv.org/pdf/1912.08258.pdf)
\ No newline at end of file
diff --git a/lib/bsd/LICENSE b/lib/bsd/LICENSE
new file mode 100644
index 0000000000000000000000000000000000000000..c85708fb43e2e2804a97ed38ea4560fb09f7289a
--- /dev/null
+++ b/lib/bsd/LICENSE
@@ -0,0 +1,73 @@
+Cuckoo filter
+-------------
+
+Copyright (C) 2013, Carnegie Mellon University and Intel Corporation
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+
+
+Vectorized Bloom filter
+-----------------------
+
+Copyright (c) 2014, Orestis Polychroniou
+Department of Computer Science, Columbia University
+All rights reserved.
+
+Material for research paper:
+ Venue: Data Management on New Hardware (DaMoN) 2014
+ Title: Vectorized Bloom Filters for Advanced SIMD Processors
+ Authors: Orestis Polychroniou (orestis@cs.columbia.edu)
+ Kenneth A. Ross (kar@cs.columbia.edu)
+ Affiliation: Department of Computer Science, Columbia University
+
+License:
+ Redistribution and use in source and binary forms, with or without
+ modification, are permitted provided that the following conditions
+ are met:
+ 1. Redistributions of source code must retain the above copyright
+ notice, this list of conditions and the following disclaimer.
+ 2. Redistributions in binary form must reproduce the above copyright
+ notice, this list of conditions and the following disclaimer in the
+ documentation and/or other materials provided with the distribution.
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+
+Impala Bloom filter
+-------------------
+
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
\ No newline at end of file
diff --git a/lib/bsd/README.md b/lib/bsd/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..a5821eaaba3d920b122f4256fa82d3b7ab4efbe7
--- /dev/null
+++ b/lib/bsd/README.md
@@ -0,0 +1,118 @@
+Bloom and Cuckoo Filter Benchmark
+=================================
+
+This repo contains the benchmark runner that was used to evaluate
+Bloom and Cuckoo filters for the VLDB'19 paper [*Performance-Optimal Filtering:
+Bloom Overtakes Cuckoo at High Throughput*](http://www.vldb.org/pvldb/vol12/p502-lang.pdf).
+
+The repo is based on the Cuckoo filter repo and includes a slightly modified
+version of the Cuckoo filter as presented in the ACM CoNEXT'14 paper
+[*Cuckoo Filter: Practically Better Than Bloom*](http://www.cs.cmu.edu/~binfan/papers/conext14_cuckoofilter.pdf).
+If you are looking for the latest version of the Cuckoo filter, please refer to
+[https://github.com/efficient/cuckoofilter](https://github.com/efficient/cuckoofilter).
+
+Further we include a copy of the Bloom filter implementation from the
+[Impala](https://impala.apache.org/) database system (see 'src/simd-block.h')
+and the [vectorized Bloom filter](http://www.cs.columbia.edu/~orestis/vbf.c)
+as presented in the DaMoN'14 paper
+[*Vectorized Bloom Filters for Advanced SIMD Processors*](http://www.cs.columbia.edu/~orestis/damon14.pdf).
+
+Our SIMD-optimized implementations of Bloom and Cuckoo filters are included
+as a git submodule. The source code can be found in the GitHub repo
+[bloomfilter-bsd](https://github.com/peterboncz/bloomfilter-bsd).
+
+
+### Erratum
+Post-publication an error was found (and fixed) in the collision resolution of
+cuckoo filters with arbitrarily sized tables.
+We refer to our blog post
+["Cuckoo Filters with arbitrarily sized tables"](https://databasearchitects.blogspot.com/2019/07/cuckoo-filters-with-arbitrarily-sized.html) for details.
+
+
+Using the Code
+--------------
+### Prerequisites
+* A C++14 compliant compiler; only GCC has been tested.
+* [CMake](http://www.cmake.org/) version 3.5 or later.
+* The [Boost C++ Libraries](https://www.boost.org/), version 1.58 or later.
+* A Linux environment (including the BASH shell and the typical GNU tools).
+* SQLite version 3.x
+* a TeX distribution, e.g. TeX Live (optional)
+
+
+### Repository structure
+* `benchmarks/`: the benchmark runner
+* `module/dtl/`: git submodule for the SIMD-optimized filter implementations
+* In particular, our Bloom filter implementation can be found in `./filter/blocked_bloomfilter/` and our Cuckoo implementation in
+ `./filter/cuckoofilter/`
+* `scripts/`: several shell scripts that drive the benchmark
+* `src/`: the C++ header and implementation of the (original) cuckoo filter, the
+ Impala and vectorized Bloom filters
+* `tex/`: LaTeX files to typeset the results
+
+### Building
+```
+git clone git@github.com:peterboncz/bloomfilter-repro.git
+cd bloomfilter-repro
+git submodule update --remote --recursive --init
+mkdir build
+cd build/
+cmake -DCMAKE_BUILD_TYPE=Release ..
+make -j 8 n_filter
+make -j 8 get_cache_size
+make -j 8 benchmark_`./determine_arch.sh`
+```
+The benchmark runner can be compiled for the following architectures:
+
+| Architecture | Description |
+| ------------ | ---------------------------------------------------------------------------------------- |
+| `corei7` | targets pre-AVX2 processor generations. All SIMD optimizations are disabled. |
+| `core-avx2` | targets Intel Haswell (or later) and AMD Ryzen processors with the AVX2 instruction set. |
+| `knl` | targets Intel Knights Landing (KNL) processor with the AVX-512F instruction set. |
+| `skx` | targets Intel Skylake-X (or later) processors with the AVX-512F/BW instruction set. |
+
+### Benchmarking
+
+For a quick start, we provide a *scripted* benchmark which automatically
+performs several performance measurements and imports the results into a
+SQLite database. Optionally a summary sheet is generated.
+
+The following scripts need to be executed in the given order:
+```
+./benchmark.sh
+./aggr_results.sh
+./summary.sh
+```
+The `benchmark.sh` script performs the actual measurements and stores the CSV results in
+the directory `./results`.
+The `aggr_results.sh` script imports the raw results into a SQLite database
+stored in `./results/skyline.sqlite3`.
+Optionally, the `summary.sh` script typesets a summary PDF.
+
+To perform other analyses, we refer to the source code of the scripts
+mentioned above.
+Further details on the output format and
+the benchmark options can be found [here](BENCHMARK.md).
+
+Related Work
+------------
+
+* [Morton Filter](https://github.com/AMDComputeLibraries/morton_filter)
+> A Morton filter is a modified cuckoo filter [...] that is optimized for bandwidth-constrained systems.
+
+* [Fluid Co-Processing](https://github.com/t1mm3/fluid_coprocessing)
+
+
+
+
+Licenses
+--------
+
+* The [Cuckoo filter](https://github.com/efficient/cuckoofilter) and the
+ [Impala](https://impala.apache.org/) Bloom filter implementation are licensed
+ under the Apache License, Version 2.0.
+* [Vectorized Bloom filters](http://www.cs.columbia.edu/~orestis/vbf.c) are
+ licensed under the 2-clause BSD license.
+* Our [SIMD-optimized implementations](https://github.com/peterboncz/bloomfilter-bsd)
+ are dual licensed under the Apache License, Version 2.0 and the 3-clause BSD
+ license.
\ No newline at end of file
diff --git a/lib/cityhash/README b/lib/cityhash/README
new file mode 100644
index 0000000000000000000000000000000000000000..4d868b16a846929b251d84f12cbe0e43db39dff8
--- /dev/null
+++ b/lib/cityhash/README
@@ -0,0 +1,196 @@
+CityHash, a family of hash functions for strings.
+
+
+Introduction
+============
+
+CityHash provides hash functions for strings. The functions mix the
+input bits thoroughly but are not suitable for cryptography. See
+"Hash Quality," below, for details on how CityHash was tested and so on.
+
+We provide reference implementations in C++, with a friendly MIT license.
+
+CityHash32() returns a 32-bit hash.
+
+CityHash64() and similar return a 64-bit hash.
+
+CityHash128() and similar return a 128-bit hash and are tuned for
+strings of at least a few hundred bytes. Depending on your compiler
+and hardware, it's likely faster than CityHash64() on sufficiently long
+strings. It's slower than necessary on shorter strings, but we expect
+that case to be relatively unimportant.
+
+CityHashCrc128() and similar are variants of CityHash128() that depend
+on _mm_crc32_u64(), an intrinsic that compiles to a CRC32 instruction
+on some CPUs. However, none of the functions we provide are CRCs.
+
+CityHashCrc256() is a variant of CityHashCrc128() that also depends
+on _mm_crc32_u64(). It returns a 256-bit hash.
+
+All members of the CityHash family were designed with heavy reliance
+on previous work by Austin Appleby, Bob Jenkins, and others.
+For example, CityHash32 has many similarities with Murmur3a.
+
+Performance on long strings: 64-bit CPUs
+========================================
+
+We are most excited by the performance of CityHash64() and its variants on
+short strings, but long strings are interesting as well.
+
+CityHash is intended to be fast, under the constraint that it hash very
+well. For CPUs with the CRC32 instruction, CRC is speedy, but CRC wasn't
+designed as a hash function and shouldn't be used as one. CityHashCrc128()
+is not a CRC, but it uses the CRC32 machinery.
+
+On a single core of a 2.67GHz Intel Xeon X5550, CityHashCrc256 peaks at about
+5 to 5.5 bytes/cycle. The other CityHashCrc functions are wrappers around
+CityHashCrc256 and should have similar performance on long strings.
+(CityHashCrc256 in v1.0.3 was even faster, but we decided it wasn't as thorough
+as it should be.) CityHash128 peaks at about 4.3 bytes/cycle. The fastest
+Murmur variant on that hardware, Murmur3F, peaks at about 2.4 bytes/cycle.
+We expect the peak speed of CityHash128 to dominate CityHash64, which is
+aimed more toward short strings or use in hash tables.
+
+For long strings, a new function by Bob Jenkins, SpookyHash, is just
+slightly slower than CityHash128 on Intel x86-64 CPUs, but noticeably
+faster on AMD x86-64 CPUs. For hashing long strings on AMD CPUs
+and/or CPUs without the CRC instruction, SpookyHash may be just as
+good or better than any of the CityHash variants.
+
+Performance on short strings: 64-bit CPUs
+=========================================
+
+For short strings, e.g., most hash table keys, CityHash64 is faster than
+CityHash128, and probably faster than all the aforementioned functions,
+depending on the mix of string lengths. Here are a few results from that
+same hardware, where we (unrealistically) tested a single string length over
+and over again:
+
+Hash Results
+------------------------------------------------------------------------------
+CityHash64 v1.0.3 7ns for 1 byte, or 6ns for 8 bytes, or 9ns for 64 bytes
+Murmur2 (64-bit) 6ns for 1 byte, or 6ns for 8 bytes, or 15ns for 64 bytes
+Murmur3F 14ns for 1 byte, or 15ns for 8 bytes, or 23ns for 64 bytes
+
+We don't have CityHash64 benchmarks results for v1.1, but we expect the
+numbers to be similar.
+
+Performance: 32-bit CPUs
+========================
+
+CityHash32 is the newest variant of CityHash. It is intended for
+32-bit hardware in general but has been mostly tested on x86. Our benchmarks
+suggest that Murmur3 is the nearest competitor to CityHash32 on x86.
+We don't know of anything faster that has comparable quality. The speed rankings
+in our testing: CityHash32 > Murmur3f > Murmur3a (for long strings), and
+CityHash32 > Murmur3a > Murmur3f (for short strings).
+
+Installation
+============
+
+We provide reference implementations of several CityHash functions, written
+in C++. The build system is based on autoconf. It defaults the C++
+compiler flags to "-g -O2", which is probably slower than -O3 if you are
+using gcc. YMMV.
+
+On systems with gcc, we generally recommend:
+
+./configure
+make all check CXXFLAGS="-g -O3"
+sudo make install
+
+Or, if your system has the CRC32 instruction, and you want to build everything:
+
+./configure --enable-sse4.2
+make all check CXXFLAGS="-g -O3 -msse4.2"
+sudo make install
+
+Note that our build system doesn't try to determine the appropriate compiler
+flag for enabling SSE4.2. For gcc it is "-msse4.2". The --enable-sse4.2
+flag to the configure script controls whether citycrc.h is installed when
+you "make install." In general, picking the right compiler flags can be
+tricky, and may depend on your compiler, your hardware, and even how you
+plan to use the library.
+
+For generic information about how to configure this software, please try:
+
+./configure --help
+
+Failing that, please work from city.cc and city*.h, as they contain all the
+necessary code.
+
+
+Usage
+=====
+
+The above installation instructions will produce a single library. It will
+contain CityHash32(), CityHash64(), and CityHash128(), and their variants,
+and possibly CityHashCrc128(), CityHashCrc128WithSeed(), and
+CityHashCrc256(). The functions with Crc in the name are declared in
+citycrc.h; the rest are declared in city.h.
+
+
+Limitations
+===========
+
+1) CityHash32 is intended for little-endian 32-bit code, and everything else in
+the current version of CityHash is intended for little-endian 64-bit CPUs.
+
+All functions that don't use the CRC32 instruction should work in
+little-endian 32-bit or 64-bit code. CityHash should work on big-endian CPUs
+as well, but we haven't tested that very thoroughly yet.
+
+2) CityHash is fairly complex. As a result of its complexity, it may not
+perform as expected on some compilers. For example, preliminary reports
+suggest that some Microsoft compilers compile CityHash to assembly that's
+10-20% slower than it could be.
+
+
+Hash Quality
+============
+
+We like to test hash functions with SMHasher, among other things.
+SMHasher isn't perfect, but it seems to find almost any significant flaw.
+SMHasher is available at http://code.google.com/p/smhasher/
+
+SMHasher is designed to pass a 32-bit seed to the hash functions it tests.
+No CityHash function is designed to work that way, so we adapt as follows:
+For our functions that accept a seed, we use the given seed directly (padded
+with zeroes); for our functions that don't accept a seed, we hash the
+concatenation of the given seed and the input string.
+
+The CityHash functions have the following flaws according to SMHasher:
+
+(1) CityHash64: none
+
+(2) CityHash64WithSeed: none
+
+(3) CityHash64WithSeeds: did not test
+
+(4) CityHash128: none
+
+(5) CityHash128WithSeed: none
+
+(6) CityHashCrc128: none
+
+(7) CityHashCrc128WithSeed: none
+
+(8) CityHashCrc256: none
+
+(9) CityHash32: none
+
+Some minor flaws in 32-bit and 64-bit functions are harmless, as we
+expect the primary use of these functions will be in hash tables. We
+may have gone slightly overboard in trying to please SMHasher and other
+similar tests, but we don't want anyone to choose a different hash function
+because of some minor issue reported by a quality test.
+
+
+For more information
+====================
+
+http://code.google.com/p/cityhash/
+
+cityhash-discuss@googlegroups.com
+
+Please feel free to send us comments, questions, bug reports, or patches.
\ No newline at end of file