{"id":249,"date":"2020-11-11T20:14:31","date_gmt":"2020-11-11T20:14:31","guid":{"rendered":"https:\/\/maboc.nl\/?p=249"},"modified":"2020-11-11T20:14:31","modified_gmt":"2020-11-11T20:14:31","slug":"bloomfilters","status":"publish","type":"post","link":"https:\/\/maboc.nl\/?p=249","title":{"rendered":"Bloomfilters"},"content":{"rendered":"<p>From time to time I read something and the term &#8220;bloomfilter&#8221; comes up. This weekend I wanted to know what a bloomfilter actually is. For a more thorough explanation, go to <a href=\"https:\/\/en.wikipedia.org\/wiki\/Bloom_filter\" target=\"_blank\" rel=\"noopener noreferrer\">wiki<\/a>. What follows is my &#8220;layman&#8217;s&#8221; explanation.<\/p>\n<p>If you have a set of values, and you want to check whether items in a second set are in the first set, you can make use of a bloomfilter. I suppose the easiest way to understand the how and the why is to just explain what happens when constructing and using the bloomfilter.<\/p>\n<p>Some characteristics of a bloomfilter:<\/p>\n<ul>\n<li>Relatively small memory footprint<\/li>\n<li>It will NEVER deliver a false negative (it will not discard a value of the second set if it actually is in the first set)<\/li>\n<li>It may deliver some false positives: a value in the second set might be selected although it&#8217;s not in the first set.<\/li>\n<\/ul>\n<p>OK&#8230;let&#8217;s see how this works:<\/p>\n<ol>\n<li>Define the datasets\n<ol>\n<li>The first set may look like: 2, 5. This is the set we are going to test against.<\/li>\n<li>The second set may look like: 2, 6, 7 (for each of these values we want to know whether it is in the first set).<\/li>\n<\/ol>\n<\/li>\n<li>Constructing the bloomfilter\n<ol>\n<li>Define an array of bits (all zero), let&#8217;s say 10 bits wide. In practice the array will be much wider. (The filter looks like: 0000000000)<\/li>\n<li>Define a number of hashes, let&#8217;s say 3 hashes. 
If a value is fed to a hash, it will return an integer representing the position of a bit in the filter. Our filter is 10 bits wide, so every hash will generate a value between 1 and 10 (inclusive).<\/li>\n<li>Process the first value of the first set (2)\n<ol>\n<li>Feed this value to hash 1. Let&#8217;s say the hash returns 4. So the fourth bit of the filter will be set to 1 (0001000000)<\/li>\n<li>Feed this value to hash 2. Let&#8217;s say the hash returns 8. So the eighth bit of the filter will be set to 1 (0001000100)<\/li>\n<li>Feed this value to hash 3. Let&#8217;s say the hash returns 5. So the fifth bit of the filter will be set to 1 (0001100100)<\/li>\n<\/ol>\n<\/li>\n<li>OK&#8230;we&#8217;re done with the first value, let&#8217;s go to the second value: 5\n<ol>\n<li>Feed this value to hash 1. Let&#8217;s say it returns 8. So the eighth bit of the filter will be set to 1 (0001100100). Actually this bit was already set&#8230;no problem&#8230;it stays 1<\/li>\n<li>Feed this value to hash 2. Let&#8217;s say it returns 10. So the tenth bit of the filter will be set to 1 (0001100101).<\/li>\n<li>Feed this value to hash 3. Let&#8217;s say it returns 1. So the first bit of the filter is set to 1 (1001100101)<\/li>\n<\/ol>\n<\/li>\n<li>Done with the second value&#8230;in a real example there would be many, many more values to feed to the filter&#8230;not here \ud83d\ude42<\/li>\n<\/ol>\n<\/li>\n<li>Now we want to test whether a value in the second set is in the first set.\n<ol>\n<li>Start with the first value of the second set: 2\n<ol>\n<li>Feed the value into the 3 hashes. 
The 3 hashes return 4, 8 and 5 (a hash will return the same output every time you feed it the same input)<\/li>\n<li>Test whether bits 4, 8 and 5 are set in the bloomfilter\n<ol>\n<li>If all 3 are set&#8230;then YES this value is (probably) in the first set<\/li>\n<li>If not all 3 positions in the bloomfilter are set, then this value does not belong to the first set.<\/li>\n<li>Bits 4, 8 and 5 are set&#8230;Yes, value 2 is probably in the first set<\/li>\n<\/ol>\n<\/li>\n<\/ol>\n<\/li>\n<li>Now the second value of the second set: 6\n<ol>\n<li>Feed the value 6 to the 3 hashes. Let&#8217;s say that they return 5, 6 and 9.<\/li>\n<li>Test whether the bits in the bloomfilter on positions 5, 6 and 9 are set.\n<ol>\n<li>If all 3 are set&#8230;then YES this value is (probably) in the first set<\/li>\n<li>If not all 3 positions in the bloomfilter are set, then this value does not belong to the first set.<\/li>\n<li>Bit 5 is set. Bits 6 and 9 are not set. This value (6) is definitely not in the first set.<\/li>\n<\/ol>\n<\/li>\n<\/ol>\n<\/li>\n<li>Now the 3rd value (7)\n<ol>\n<li>Feed the value to the 3 hashes. Let&#8217;s say that they return the values 1, 8 and 10<\/li>\n<li>Test whether the bits in the bloomfilter on positions 1, 8 and 10 are set.\n<ol>\n<li>If all 3 are set&#8230;then YES this value is (probably) in the first set<\/li>\n<li>If not all 3 positions in the bloomfilter are set, then this value does not belong to the first set.<\/li>\n<li>Bits 1, 8 and 10 are set, so YES (???) the value 7 is probably in the first set. But 7 is not in the first set: bits 1 and 10 were set by the value 5, and bit 8 by both 2 and 5. This is a false positive, and it is an important detail of the bloomfilter. It guarantees that it will select the values which are definitely in the first set, but it might also let some other values pass (false positives).<\/li>\n<\/ol>\n<\/li>\n<\/ol>\n<\/li>\n<\/ol>\n<\/li>\n<\/ol>\n<p>That&#8217;s it. 
I was thinking it would be a very, very difficult process to understand, but actually&#8230;I do understand it.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>From time to time I read something and the term &#8220;bloomfilter&#8221; comes up. This weekend I wanted to know what a bloomfilter actually is. For a more thorough explanation, go to wiki. What follows is my &#8220;layman&#8217;s&#8221; explanation. If you have a set of values, and you want to check whether items in a second [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[41],"tags":[42,43],"class_list":["post-249","post","type-post","status-publish","format-standard","hentry","category-algorithm","tag-algorithm","tag-bloomfilter"],"_links":{"self":[{"href":"https:\/\/maboc.nl\/index.php?rest_route=\/wp\/v2\/posts\/249","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/maboc.nl\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/maboc.nl\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/maboc.nl\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/maboc.nl\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=249"}],"version-history":[{"count":5,"href":"https:\/\/maboc.nl\/index.php?rest_route=\/wp\/v2\/posts\/249\/revisions"}],"predecessor-version":[{"id":254,"href":"https:\/\/maboc.nl\/index.php?rest_route=\/wp\/v2\/posts\/249\/revisions\/254"}],"wp:attachment":[{"href":"https:\/\/maboc.nl\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=249"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/maboc.nl\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=249"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/maboc.nl\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=249"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}"
,"templated":true}]}}