update Readme and add license information

Verified Commit a4612caf authored 4 years ago by Tobias Schmidt
--- a/README.md
+++ b/README.md
 # Partitioned Filters

-This repository contains our implementations of four approximate filters: the Bloom filter, the Cuckoo filter, theMorton filter, and the Xor filter. We used the code in our paper [A four-dimensional Analysis of Partitioned Filters](https://www.db.in.tum.de).
+This repository contains our implementations of four approximate filters: the Bloom filter [1], the Cuckoo filter [2], the Morton filter [3], and the Xor filter [4]. We used the code in our paper [A four-dimensional Analysis of Partitioned Filters](https://www.db.in.tum.de).

 In addition to our optimized filter implementations, the repository also contains the code of state-of-the-art competitors we compare to and extensive test cases. We generate the benchmarks using python scripts and included our results on an Intel i9-9900x (Skylake-X) with 64 GiB memory.

@@ -24,28 +24,52 @@ In addition to our optimized filter implementations, the repository also contain
 Executing all benchmarks takes roughly 1 week and requires 64 GiB memory. Some of the benchmarks do measure only the false-positive rate and the failed builds and, thus, should be executed with all available threads.

 The csv includes the following fields:
-| Field         | Unit         | Description                                                                                                                                                               |
-| ------------- | ------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| name          | Text         | Configuration: `<BenchmarkName>_<k>`/  `<Fixture>`/  `<s>` / `<n_threads>` / `<n_partitions>` / `<elements_build>` / `<elements_lookup>` / `<shared_elements` / `_` / `_` |
-| real_time     | milliseconds | execution time per iteration                                                                                                                                              |
-| DTLB-misses   | float        | data translation lookaside buffer misses per iteration                                                                                                                    |
-| ITLB-misses   | float        | instruction translation lookaside buffer misses per iteration                                                                                                             |
-| L1D-misses    | float        | level 1 data cache misses per iteration                                                                                                                                   |
-| L1I-misses    | float        | level 1 instruction cache misses per iteration                                                                                                                            |
-| LLC-misses    | float        | last-level (L3) cache misses per iteration                                                                                                                                |
-| branch-misses | float        | branch misses per iteration                                                                                                                                               |
-| cycles        | float        | cycles per iteration                                                                                                                                                      |
-| instruction   | float        | executed instructions per iteration                                                                                                                                       |
-| FPR           | float        | false-positive rate of the filter (only available when lookup is benchmarked)                                                                                             |
-| failures      | integer      | number of failed builds                                                                                                                                                   |
-| retries       | integer      | number of retries needed to build the filter                                                                                                                              |
-| bits          | float        | number of bits per key allocated to the filter                                                                                                                            |
-| size          | integer      | size of the filter in bytes                                                                                                                                               |
+| Field         | Unit         | Description                                                  |
+| ------------- | ------------ | ------------------------------------------------------------ |
+| name          | Text         | Configuration: `<BenchmarkName>_<k>` / `<Fixture>` / `<s>` / `<n_threads>` / `<n_partitions>` / `<elements_build>` / `<elements_lookup>` / `<shared_elements>` / `_` / `_` |
+| real_time     | milliseconds | execution time per iteration                                 |
+| DTLB-misses   | float        | data translation lookaside buffer misses per iteration       |
+| ITLB-misses   | float        | instruction translation lookaside buffer misses per iteration |
+| L1D-misses    | float        | level 1 data cache misses per iteration                      |
+| L1I-misses    | float        | level 1 instruction cache misses per iteration               |
+| LLC-misses    | float        | last-level (L3) cache misses per iteration                   |
+| branch-misses | float        | branch misses per iteration                                  |
+| cycles        | float        | cycles per iteration                                         |
+| instruction   | float        | executed instructions per iteration                          |
+| FPR           | float        | false-positive rate of the filter (only available when lookup is benchmarked) |
+| failures      | integer      | number of failed builds                                      |
+| retries       | integer      | number of retries needed to build the filter                 |
+| bits          | float        | number of bits per key allocated to the filter               |
+| size          | integer      | size of the filter in bytes                                  |

 ### Repository Structure

+* `benchmark`: code for benchmarking the filter and the definition files with our results on the Skylake-X machine.
+* `cmake`: optional packages.
+* `lib`: external dependencies and existing filter implementations. *The code in this folder is not licensed under the MIT License (see Dependencies).*
+* `python`: scripts for generating, executing and plotting benchmarks.
+* `src`: filter implementations.
+* `test`: extensive test cases for our filter implementations and the integration of the competitors.
+* `vendor`: external packages.
+
 ### Dependencies

+* `lib/amd_mortonfilter`: original [Morton Filter](https://github.com/AMDComputeLibraries/morton_filter) implementation used in [3], licensed under **the MIT License**.
+* `lib/bsd`: [(register-)blocked and (cache-)sectorized Bloom Filter](https://github.com/peterboncz/bloomfilter-bsd) implementations with SIMD support and external competitors used in [1], licensed under **the Apache License (Version 2.0), the 2-clause BSD License, and the 3-clause BSD License**.
+* `lib/cityhash`: [Google's CityHash](https://github.com/google/cityhash) implementation, licensed under **the MIT License**.
+* `lib/efficient_cuckoofilter`: original [Cuckoo Filter](https://github.com/efficient/cuckoofilter) implementation used in [2], licensed under **the Apache License (Version 2.0)**.
+* `lib/fastfilter`: original [Xor Filter](https://github.com/FastFilter/fastfilter_cpp) implementation used in [4], licensed under **the Apache License (Version 2.0)**.
+* `lib/impala`: original [sectorized Bloom Filter](https://github.com/apache/impala) used in Impala, licensed under **the Apache License (Version 2.0)**.
+* `lib/libdivide`: the [LibDivide](https://github.com/ridiculousfish/libdivide) library computes magic numbers for optimizing integer divisions, licensed under **the zlib License**.
+* `lib/perfevent`: library for reading perf counters in C(++), licensed under **the MIT License**.
+* `lib/vacuumfilter`: [Vacuum Filter](https://github.com/wuwuz/Vacuum-Filter) implementation.
+
 ## Related Work

-We included the following state-the-art-filter implementations:
\ No newline at end of file
+[1] [Performance-Optimal Filtering: Bloom Overtakes Cuckoo at High Throughput](http://www.vldb.org/pvldb/vol12/p502-lang.pdf)
+
+[2] [Cuckoo Filter: Practically Better Than Bloom](http://www.cs.cmu.edu/~binfan/papers/conext14_cuckoofilter.pdf)
+
+[3] [Morton Filters: Faster, Space-Efficient Cuckoo Filters via Biasing, Compression, and Decoupled Logical Sparsity](https://www.vldb.org/pvldb/vol11/p1041-breslow.pdf)
+
+[4] [Xor Filters: Faster and Smaller Than Bloom and Cuckoo Filters](https://arxiv.org/pdf/1912.08258.pdf)
\ No newline at end of file
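
Note on the CSV field table in the README above: the `name` column packs the whole benchmark configuration into one slash-separated string. A minimal Python sketch of unpacking it follows, assuming the components appear in the order the table lists them and that `<k>` is whatever follows the last underscore of the first component. The function `parse_name` and the sample row are made up for illustration and are not part of the repository.

```python
import csv
import io

# Component names follow the `name` row of the CSV field table. The last two
# components are given as "_" placeholders there and are ignored here.
NAME_COMPONENTS = [
    "benchmark",  # "<BenchmarkName>_<k>", split further in parse_name
    "fixture",
    "s",
    "n_threads",
    "n_partitions",
    "elements_build",
    "elements_lookup",
    "shared_elements",
]


def parse_name(name):
    """Split a `name` value into a dict of configuration components.

    All values stay strings, because the README does not state their types.
    """
    parts = [part.strip() for part in name.split("/")]
    if len(parts) < len(NAME_COMPONENTS):
        raise ValueError(
            f"expected at least {len(NAME_COMPONENTS)} components, "
            f"got {len(parts)}: {name!r}"
        )
    config = dict(zip(NAME_COMPONENTS, parts))

    # Assumption: <k> is the text after the last underscore.
    combined = config.pop("benchmark")
    benchmark_name, separator, k = combined.rpartition("_")
    if not separator:  # no underscore at all: treat everything as the name
        benchmark_name, k = combined, ""
    config["benchmark_name"] = benchmark_name
    config["k"] = k
    return config


if __name__ == "__main__":
    # Invented sample data in the documented layout (not a real result).
    sample_csv = (
        "name,real_time,FPR\n"
        "Example_8/Fixture/1/4/16/1000/2000/100/_/_,12.5,0.01\n"
    )
    for row in csv.DictReader(io.StringIO(sample_csv)):
        print(parse_name(row["name"]), row["real_time"], row["FPR"])
```

Keeping every component as a string avoids guessing at units or numeric types that the table does not specify; a caller can convert individual fields once their meaning is known.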
--- a/lib/bsd/LICENSE
+++ b/lib/bsd/LICENSE
+Cuckoo filter
+-------------
+
+Copyright (C) 2013, Carnegie Mellon University and Intel Corporation
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+     http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+
+
+Vectorized Bloom filter
+-----------------------
+
+Copyright (c) 2014, Orestis Polychroniou
+Department of Computer Science, Columbia University
+All rights reserved.
+
+Material for research paper:
+  Venue:  Data Management on New Hardware (DaMoN) 2014
+  Title:  Vectorized Bloom Filters for Advanced SIMD Processors
+  Authors:  Orestis Polychroniou (orestis@cs.columbia.edu)
+            Kenneth A. Ross (kar@cs.columbia.edu)
+  Affiliation:  Department of Computer Science, Columbia University
+
+License:
+  Redistribution and use in source and binary forms, with or without
+  modification, are permitted provided that the following conditions
+  are met:
+  1. Redistributions of source code must retain the above copyright
+     notice, this list of conditions and the following disclaimer.
+  2. Redistributions in binary form must reproduce the above copyright
+     notice, this list of conditions and the following disclaimer in the
+     documentation and/or other materials provided with the distribution.
+  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+  "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+  LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+  A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+  OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+  SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+  LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+  DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+  THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+  (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+  OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+
+Impala Bloom filter
+-------------------
+
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
\ No newline at end of file
--- a/lib/bsd/README.md
+++ b/lib/bsd/README.md
+Bloom and Cuckoo Filter Benchmark
+=================================
+
+This repo contains the benchmark runner that was used to evaluate
+Bloom and Cuckoo filters for the VLDB'19 paper  [*Performance-Optimal Filtering:
+Bloom Overtakes Cuckoo at High Throughput*](http://www.vldb.org/pvldb/vol12/p502-lang.pdf).
+
+The repo is based on the Cuckoo filter repo and includes a slightly modified
+version of the Cuckoo filter as presented in the ACM CoNEXT'14 paper
+[*Cuckoo Filter: Practically Better Than Bloom*](http://www.cs.cmu.edu/~binfan/papers/conext14_cuckoofilter.pdf).
+If you are looking for the latest version of the Cuckoo filter, please refer to
+[https://github.com/efficient/cuckoofilter](https://github.com/efficient/cuckoofilter).
+
+Further, we include a copy of the Bloom filter implementation from the
+[Impala](https://impala.apache.org/) database system (see 'src/simd-block.h')
+and the [vectorized Bloom filter](http://www.cs.columbia.edu/~orestis/vbf.c)
+as presented in the DaMoN'14 paper
+[*Vectorized Bloom Filters for Advanced SIMD Processors*](http://www.cs.columbia.edu/~orestis/damon14.pdf).
+
+Our SIMD-optimized implementations of Bloom and Cuckoo filters are included
+as a git submodule. The source code can be found in the GitHub repo
+[bloomfilter-bsd](https://github.com/peterboncz/bloomfilter-bsd).
+
+
+### Erratum
+Post-publication, an error was found (and fixed) in the collision resolution of
+cuckoo filters with arbitrarily sized tables.
+We refer to our blog post
+["Cuckoo Filters with arbitrarily sized tables"](https://databasearchitects.blogspot.com/2019/07/cuckoo-filters-with-arbitrarily-sized.html) for details.
+
+
+Using the Code
+--------------
+### Prerequisites
+* A C++14 compliant compiler; only GCC has been tested.
+* [CMake](http://www.cmake.org/) version 3.5 or later.
+* The [Boost C++ Libraries](https://www.boost.org/), version 1.58 or later.
+* A Linux environment (including the BASH shell and the typical GNU tools).
+* SQLite version 3.x.
+* A TeX distribution, e.g. TeX Live (optional).
+
+
+### Repository structure
+* `benchmarks/`: the benchmark runner
+* `module/dtl/`: git submodule for the SIMD-optimized filter implementations
+  * In particular, our Bloom filter implementation can be found in `./filter/blocked_bloomfilter/` and our Cuckoo filter implementation in
+    `./filter/cuckoofilter/`
+* `scripts/`: several shell scripts that drive the benchmark
+* `src/`: the C++ header and implementation of the (original) cuckoo filter, the
+  Impala and vectorized Bloom filters
+* `tex/`: LaTeX files to typeset the results
+
+### Building
+```
+git clone git@github.com:peterboncz/bloomfilter-repro.git
+cd bloomfilter-repro
+git submodule update --remote --recursive --init
+mkdir build
+cd build/
+cmake -DCMAKE_BUILD_TYPE=Release ..
+make -j 8 n_filter
+make -j 8 get_cache_size
+make -j 8 benchmark_`./determine_arch.sh`
+```
+The benchmark runner can be compiled for the following architectures:
+
+| Architecture | Description                                                                              |
+| ------------ | ---------------------------------------------------------------------------------------- |
+| `corei7`     | targets pre-AVX2 processor generations. All SIMD optimizations are disabled.             |
+| `core-avx2`  | targets Intel Haswell (or later) and AMD Ryzen processors with the AVX2 instruction set. |
+| `knl`        | targets Intel Knights Landing (KNL) processor with the AVX-512F instruction set.         |
+| `skx`        | targets Intel Skylake-X (or later) processors with the AVX-512F/BW instruction set.      |
+
+### Benchmarking
+
+For a quick start, we provide a *scripted* benchmark which automatically
+performs several performance measurements and imports the results into a
+SQLite database. Optionally a summary sheet is generated.
+
+The following scripts need to be executed in the given order:
+```
+./benchmark.sh
+./aggr_results.sh
+./summary.sh
+```
+The `benchmark.sh` script performs the actual measurements and stores the CSV results in
+the directory `./results`.
+The `aggr_results.sh` script imports the raw results into a SQLite database
+stored in `./results/skyline.sqlite3`.
+Optionally, the `summary.sh` script typesets a summary PDF.
+
+To perform other analyses, we refer to the source code of the scripts
+mentioned above.
+Further details on the output format and
+the benchmark options can be found [here](BENCHMARK.md).
+
+Related Work
+------------
+
+* [Morton Filter](https://github.com/AMDComputeLibraries/morton_filter)
+> A Morton filter is a modified cuckoo filter [...] that is optimized for bandwidth-constrained systems.
+
+* [Fluid Co-Processing](https://github.com/t1mm3/fluid_coprocessing)
+
+
+
+
+Licenses
+--------
+
+* The [Cuckoo filter](https://github.com/efficient/cuckoofilter) and the
+  [Impala](https://impala.apache.org/) Bloom filter implementation are licensed
+  under the Apache License, Version 2.0.
+* [Vectorized Bloom filters](http://www.cs.columbia.edu/~orestis/vbf.c) are
+  licensed under the 2-clause BSD license.
+* Our [SIMD-optimized implementations](https://github.com/peterboncz/bloomfilter-bsd)
+  are dual licensed under the Apache License, Version 2.0 and the 3-clause BSD
+  license.  
\ No newline at end of file
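
Note on the architecture table in the `lib/bsd` README above: the four build targets differ in the SIMD instruction sets they require. The sketch below shows one way such a target could be chosen from the CPU feature flags that Linux reports in `/proc/cpuinfo`. It is an illustration under stated assumptions only: the logic of `determine_arch.sh` is not shown in this commit, and the helper names `pick_architecture` and `read_cpu_flags` are hypothetical.

```python
def pick_architecture(cpu_flags):
    """Map CPU feature flags to one of the target names from the table.

    Assumed rule set, derived only from the table's descriptions:
    AVX-512F plus AVX-512BW -> skx, AVX-512F alone -> knl,
    AVX2 -> core-avx2, anything else -> corei7.
    """
    flags = set(cpu_flags)
    if {"avx512f", "avx512bw"} <= flags:
        return "skx"
    if "avx512f" in flags:
        return "knl"
    if "avx2" in flags:
        return "core-avx2"
    return "corei7"


def read_cpu_flags(path="/proc/cpuinfo"):
    """Return the feature flags of the first CPU listed, or [] if unavailable."""
    try:
        with open(path) as cpuinfo:
            for line in cpuinfo:
                if line.startswith("flags"):
                    return line.split(":", 1)[1].split()
    except OSError:
        pass  # not Linux, or the file is not readable
    return []


if __name__ == "__main__":
    print(pick_architecture(read_cpu_flags()))
```

The checks run from the most to the least capable instruction set, so a processor that supports several of them maps to the most specific target.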
--- a/lib/cityhash/README
+++ b/lib/cityhash/README
+CityHash, a family of hash functions for strings.
+
+
+Introduction
+============
+
+CityHash provides hash functions for strings.  The functions mix the
+input bits thoroughly but are not suitable for cryptography.  See
+"Hash Quality," below, for details on how CityHash was tested and so on.
+
+We provide reference implementations in C++, with a friendly MIT license.
+
+CityHash32() returns a 32-bit hash.
+
+CityHash64() and similar return a 64-bit hash.
+
+CityHash128() and similar return a 128-bit hash and are tuned for
+strings of at least a few hundred bytes.  Depending on your compiler
+and hardware, it's likely faster than CityHash64() on sufficiently long
+strings.  It's slower than necessary on shorter strings, but we expect
+that case to be relatively unimportant.
+
+CityHashCrc128() and similar are variants of CityHash128() that depend
+on _mm_crc32_u64(), an intrinsic that compiles to a CRC32 instruction
+on some CPUs.  However, none of the functions we provide are CRCs.
+
+CityHashCrc256() is a variant of CityHashCrc128() that also depends
+on _mm_crc32_u64().  It returns a 256-bit hash.
+
+All members of the CityHash family were designed with heavy reliance
+on previous work by Austin Appleby, Bob Jenkins, and others.
+For example, CityHash32 has many similarities with Murmur3a.
+
+Performance on long strings: 64-bit CPUs
+========================================
+
+We are most excited by the performance of CityHash64() and its variants on
+short strings, but long strings are interesting as well.
+
+CityHash is intended to be fast, under the constraint that it hash very
+well.  For CPUs with the CRC32 instruction, CRC is speedy, but CRC wasn't
+designed as a hash function and shouldn't be used as one.  CityHashCrc128()
+is not a CRC, but it uses the CRC32 machinery.
+
+On a single core of a 2.67GHz Intel Xeon X5550, CityHashCrc256 peaks at about
+5 to 5.5 bytes/cycle.  The other CityHashCrc functions are wrappers around
+CityHashCrc256 and should have similar performance on long strings.
+(CityHashCrc256 in v1.0.3 was even faster, but we decided it wasn't as thorough
+as it should be.)  CityHash128 peaks at about 4.3 bytes/cycle.  The fastest
+Murmur variant on that hardware, Murmur3F, peaks at about 2.4 bytes/cycle.
+We expect the peak speed of CityHash128 to dominate CityHash64, which is
+aimed more toward short strings or use in hash tables.
+
+For long strings, a new function by Bob Jenkins, SpookyHash, is just
+slightly slower than CityHash128 on Intel x86-64 CPUs, but noticeably
+faster on AMD x86-64 CPUs.  For hashing long strings on AMD CPUs
+and/or CPUs without the CRC instruction, SpookyHash may be just as
+good as or better than any of the CityHash variants.
+
+Performance on short strings: 64-bit CPUs
+=========================================
+
+For short strings, e.g., most hash table keys, CityHash64 is faster than
+CityHash128, and probably faster than all the aforementioned functions,
+depending on the mix of string lengths.  Here are a few results from that
+same hardware, where we (unrealistically) tested a single string length over
+and over again:
+
+Hash              Results
+------------------------------------------------------------------------------
+CityHash64 v1.0.3 7ns for 1 byte, or 6ns for 8 bytes, or 9ns for 64 bytes
+Murmur2 (64-bit)  6ns for 1 byte, or 6ns for 8 bytes, or 15ns for 64 bytes
+Murmur3F          14ns for 1 byte, or 15ns for 8 bytes, or 23ns for 64 bytes
+
+We don't have CityHash64 benchmark results for v1.1, but we expect the
+numbers to be similar.
+
+Performance: 32-bit CPUs
+========================
+
+CityHash32 is the newest variant of CityHash.  It is intended for
+32-bit hardware in general but has been mostly tested on x86.  Our benchmarks
+suggest that Murmur3 is the nearest competitor to CityHash32 on x86.
+We don't know of anything faster that has comparable quality.  The speed rankings
+in our testing: CityHash32 > Murmur3f > Murmur3a (for long strings), and
+CityHash32 > Murmur3a > Murmur3f (for short strings).
+
+Installation
+============
+
+We provide reference implementations of several CityHash functions, written
+in C++.  The build system is based on autoconf.  It defaults the C++
+compiler flags to "-g -O2", which is probably slower than -O3 if you are
+using gcc.  YMMV.
+
+On systems with gcc, we generally recommend:
+
+./configure
+make all check CXXFLAGS="-g -O3"
+sudo make install
+
+Or, if your system has the CRC32 instruction, and you want to build everything:
+
+./configure --enable-sse4.2
+make all check CXXFLAGS="-g -O3 -msse4.2"
+sudo make install
+
+Note that our build system doesn't try to determine the appropriate compiler
+flag for enabling SSE4.2.  For gcc it is "-msse4.2".  The --enable-sse4.2
+flag to the configure script controls whether citycrc.h is installed when
+you "make install."  In general, picking the right compiler flags can be
+tricky, and may depend on your compiler, your hardware, and even how you
+plan to use the library.
+
+For generic information about how to configure this software, please try:
+
+./configure --help
+
+Failing that, please work from city.cc and city*.h, as they contain all the
+necessary code.
+
+
+Usage
+=====
+
+The above installation instructions will produce a single library.  It will
+contain CityHash32(), CityHash64(), and CityHash128(), and their variants,
+and possibly CityHashCrc128(), CityHashCrc128WithSeed(), and
+CityHashCrc256().  The functions with Crc in the name are declared in
+citycrc.h; the rest are declared in city.h.
+
+
+Limitations
+===========
+
+1) CityHash32 is intended for little-endian 32-bit code, and everything else in
+the current version of CityHash is intended for little-endian 64-bit CPUs.
+
+All functions that don't use the CRC32 instruction should work in
+little-endian 32-bit or 64-bit code.  CityHash should work on big-endian CPUs
+as well, but we haven't tested that very thoroughly yet.
+
+2) CityHash is fairly complex.  As a result of its complexity, it may not
+perform as expected on some compilers.  For example, preliminary reports
+suggest that some Microsoft compilers compile CityHash to assembly that's
+10-20% slower than it could be.
+
+
+Hash Quality
+============
+
+We like to test hash functions with SMHasher, among other things.
+SMHasher isn't perfect, but it seems to find almost any significant flaw.
+SMHasher is available at http://code.google.com/p/smhasher/
+
+SMHasher is designed to pass a 32-bit seed to the hash functions it tests.
+No CityHash function is designed to work that way, so we adapt as follows:
+For our functions that accept a seed, we use the given seed directly (padded
+with zeroes); for our functions that don't accept a seed, we hash the
+concatenation of the given seed and the input string.
+
+The CityHash functions have the following flaws according to SMHasher:
+
+(1) CityHash64: none
+
+(2) CityHash64WithSeed: none
+
+(3) CityHash64WithSeeds: did not test
+
+(4) CityHash128: none
+
+(5) CityHash128WithSeed: none
+
+(6) CityHashCrc128: none
+
+(7) CityHashCrc128WithSeed: none
+
+(8) CityHashCrc256: none
+
+(9) CityHash32: none
+
+Some minor flaws in 32-bit and 64-bit functions are harmless, as we
+expect the primary use of these functions will be in hash tables.  We
+may have gone slightly overboard in trying to please SMHasher and other
+similar tests, but we don't want anyone to choose a different hash function
+because of some minor issue reported by a quality test.
+
+
+For more information
+====================
+
+http://code.google.com/p/cityhash/
+
+cityhash-discuss@googlegroups.com
+
+Please feel free to send us comments, questions, bug reports, or patches.
\ No newline at end of file