BloomFilter
http://bloomfilter.codeplex.com/project/feeds/rss

Updated Wiki: Home
https://bloomfilter.codeplex.com/wikipage?version=6
<div class="wikidoc">A Bloom filter is a data structure optimized for fast, space-efficient set membership tests. Bloom filters have the unusual property of requiring constant time to add an element to the set or test for membership, regardless of the size of the elements or the number of elements already in the set. No other constant-space set data structure has this property. <br /><br />It works by storing a bit vector representing the set S' = {h[i](x) | x in S, i = 1, …, k}, where h[1], …, h[k] : {0, 1}* -> [n lg(1/ε) lg e] are hash functions. Additions simply set k bits to 1, specifically those at h[1](x), …, h[k](x). Checks are implemented by applying those same hash functions and returning true if all of the resulting positions are 1. <br /><br />Because the set stored is a proper superset of the set of items added, false positives may occur, though false negatives cannot. The false positive rate can be specified. <br /><br />Bloom filters offer the following advantages:
<ul><li>Space: Approximately n * lg(1/ε) bits, where ε is the false positive rate and n is the number of elements in the set.
<ul><li>Example: There are approximately 170k words in the English language. If we consider that to be our set (therefore n = 1.7E5), and we wish to search a corpus for them with a 1% false positive rate, the filter would require about (1.7E5 * lg(1 / 0.01)) bits ≈ 138 KB. Contrast this with a hashtable, which would require (1.7E5 elements * 32 bits per element) ≈ 664 KB. Obviously explicit string storage would require significantly more. </li></ul></li>
<li>Precision: Arbitrary precision, where increasing precision requires more space (following the above size equation) but, for a fixed number of hash functions, not more time.
<ul><li>Example: If we wanted to reduce our false positive rate in the above example from one percent to one permille, the space requirement would go from about 138 KB to about 207 KB. </li></ul></li>
<li>Time: O(k), where k is the number of hash functions. The optimal number of hash functions is ceiling(lg(1/ε)), though the user can supply a different number if desired.
<ul><li>Example: In keeping with our above example, if the accepted false positive rate is 0.001, k = 10. </li></ul></li></ul>
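<p>The space and time formulas in the list above can be checked numerically. The following is a minimal sketch in Python, not code from this library (which is written in C#); the function names are made up for illustration, and 1 KB is taken to be 1024 bytes.</p>

```python
import math


def estimated_bits(n: int, eps: float) -> float:
    """Space estimate from the list above: n * lg(1/eps) bits."""
    return n * math.log2(1 / eps)


def optimal_hash_count(eps: float) -> int:
    """Suggested number of hash functions: ceiling(lg(1/eps))."""
    return math.ceil(math.log2(1 / eps))


def bits_to_kb(bits: float) -> float:
    """Convert bits to kilobytes, taking 1 KB = 1024 bytes."""
    return bits / 8 / 1024


if __name__ == "__main__":
    n = 170_000  # approximate number of English words, as in the example
    for eps in (0.01, 0.001):
        kb = bits_to_kb(estimated_bits(n, eps))
        print(f"eps={eps}: about {kb:.0f} KB, k={optimal_hash_count(eps)}")
    # Comparison point from the example: one 32-bit value per element.
    print(f"32-bit hash table: about {bits_to_kb(n * 32):.0f} KB")
```

<p>For n = 1.7E5 this gives roughly 138 KB with k = 7 at ε = 0.01, roughly 207 KB with k = 10 at ε = 0.001, and roughly 664 KB for the 32-bit hash table.</p>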
<br />This implementation uses Dillinger & Manolios double hashing to provide all but the first two hash functions. By default the first hash function is the type's GetHashCode() method. This implementation also includes default secondary hash functions for strings (<a href="http://www.burtleburtle.net/bob/hash/doobs.html">Jenkins's "One at a time" method</a>) and integers (<a href="http://burtleburtle.net/bob/hash/integer.html">Wang's method</a>). <br /><br />Bloom filters are due to Burton H. Bloom, as described in the Communications of the ACM in July 1970. The full paper is available <a href="http://portal.acm.org/citation.cfm?doid=362686.362692">here</a>. </div><div class="ClearBoth"></div>
JustinRussell
Tue, 11 Nov 2014 21:38:58 GMT

Reviewed: 1.0 Production (Nov 19, 2013)
https://bloomfilter.codeplex.com/releases/view/25930#ReviewBy-juidan
Rated 5 Stars (out of 5) - I appreciate what you did. Please contact me if you see this: director@busyneed.com. If there is anything I can help you with, I will, to repay you.
juidan
Tue, 19 Nov 2013 11:31:11 GMT

New Post: integer errorRate
http://bloomfilter.codeplex.com/discussions/468038
<div style="line-height: normal;">The errorRate param in the two-parameter constructor is an int. Shouldn't it be a float?<br />
</div>
thoqbk
Mon, 11 Nov 2013 17:27:53 GMT

Reviewed: 1.0 Production (Jan 30, 2013)
http://bloomfilter.codeplex.com/releases/view/25930#ReviewBy-youngcoder
Rated 5 Stars (out of 5) - thank you very much!
youngcoder
Thu, 31 Jan 2013 01:17:17 GMT

Created Issue: Invalid constructor [23487]
http://bloomfilter.codeplex.com/workitem/23487
Now:<br />public Filter(int capacity, int errorRate)<br /><br />Should be:<br />public Filter(int capacity, float errorRate)<br />
vjevdokimov
Thu, 11 Oct 2012 09:50:21 GMT

Source code checked in, #69709
http://bloomfilter.codeplex.com/SourceControl/changeset/changes/69709
Upgrade: New Version of LabDefaultTemplate.xaml. To upgrade your build definitions, please visit the following link: http://go.microsoft.com/fwlink/?LinkId=254563
Project Collection Service Accounts
Mon, 01 Oct 2012 22:10:56 GMT

Source code checked in, #69708
http://bloomfilter.codeplex.com/SourceControl/changeset/changes/69708
Checked in by server upgrade
Project Collection Service Accounts
Mon, 01 Oct 2012 22:06:45 GMT

New Post: Invalid casting in hashInt32
http://bloomfilter.codeplex.com/discussions/361612
<div style="line-height: normal;">
<p>In the function <span style="text-decoration:underline">hashInt32</span> the input is safe-cast to a Nullable&lt;UInt32&gt; with disastrous effects: no Int32 can be hashed, as the input, "x", remains null.</p>
<p>The only alternative I can see is to box the "T" input via Convert.ToUInt32 or to make an Int32-specific filter to avoid boxing.</p>
</div>
codekaizen
Sun, 01 Jul 2012 22:55:02 GMT

Source code checked in, #60451
http://bloomfilter.codeplex.com/SourceControl/changeset/changes/60451
Upgrading solution to VS2010. Fixing typo in readme.
fatcat1111
Mon, 02 May 2011 20:05:26 GMT

Updated Wiki: Home
http://bloomfilter.codeplex.com/wikipage?version=5
<div class="wikidoc">A Bloom filter is a data structure optimized for fast, space-efficient set membership tests. Bloom filters have the unusual property of requiring constant time to add an element to the set or test for membership, regardless of the size of the elements or the number of elements already in the set. No other constant-space set data structure has this property. <br /><br />It works by storing a bit vector representing the set S' = {h[i](x) | x in S, i = 1, …, k}, where h[1], …, h[k] : {0, 1}* -> [n lg(1/ε) lg e] are hash functions. Additions simply set k bits to 1, specifically those at h[1](x), …, h[k](x). Checks are implemented by applying those same hash functions and returning true if all of the resulting positions are 1. <br /><br />Because the set stored is a proper superset of the set of items added, false positives may occur, though false negatives cannot. The false positive rate can be specified. <br /><br />Bloom filters offer the following advantages:
<ul><li>Space: Approximately n * lg(1/ε) bits, where ε is the false positive rate and n is the number of elements in the set.
<ul><li>Example: There are approximately 170k words in the English language. If we consider that to be our set (therefore n = 1.7E5), and we wish to search a corpus for them with a 1% false positive rate, the filter would require about (1.7E5 * lg(1 / 0.01)) bits ≈ 138 KB. Contrast this with a hashtable, which would require (1.7E5 elements * 32 bits per element) ≈ 664 KB. Obviously explicit string storage would require significantly more. </li></ul></li>
<li>Precision: Arbitrary precision, where increasing precision requires more space (following the above size equation) but, for a fixed number of hash functions, not more time.
<ul><li>Example: If we wanted to reduce our false positive rate in the above example from one percent to one permille, the space requirement would go from about 138 KB to about 207 KB. </li></ul></li>
<li>Time: O(k), where k is the number of hash functions. The optimal number of hash functions is ceiling(lg(1/ε)), though the user can supply a different number if desired.
<ul><li>Example: In keeping with our above example, if the accepted false positive rate is 0.001, k = 10. </li></ul></li></ul>
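<p>The add and check operations can be sketched in a few lines. This is an illustrative Python sketch rather than this library's C# code: the class name, the use of two halves of a SHA-256 digest as the two base hashes, and the restriction to string items are all assumptions made for the example. The k bit positions are derived from two base hashes as h1 + i * h2 (mod m), a common double-hashing scheme.</p>

```python
import hashlib
import math


class BloomFilter:
    """Toy Bloom filter for strings (hypothetical, for illustration only)."""

    def __init__(self, capacity: int, error_rate: float):
        if capacity <= 0 or not 0.0 < error_rate < 1.0:
            raise ValueError("capacity must be > 0 and 0 < error_rate < 1")
        lg_inv = math.log2(1 / error_rate)
        # Bit-vector length n * lg(1/eps) * lg(e), as in the description.
        self.m = math.ceil(capacity * lg_inv * math.log2(math.e))
        # Number of hash functions: ceiling(lg(1/eps)).
        self.k = max(1, math.ceil(lg_inv))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str):
        # Assumption: two base hashes taken from one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd, never zero
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        # Adding sets the k selected bits to 1.
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        # Membership holds only if every selected bit is 1.
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))


if __name__ == "__main__":
    words = BloomFilter(capacity=1000, error_rate=0.01)
    words.add("bloom")
    print("bloom" in words)   # an added item is always reported as present
    print("filter" in words)  # usually False; True would be a false positive
```

<p>Because bits are only ever set and never cleared, an item that was added can never be reported as absent, while an item that was never added may occasionally be reported as present.</p>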
<br />This implementation uses Dillinger & Manolios double hashing to provide all but the first two hash functions. By default the first hash function is the type's GetHashCode() method. This implementation also includes default secondary hash functions for strings (Jenkins's "One at a time" method) and integers (Wang's method). <br /><br />Bloom filters are due to Burton H. Bloom, as described in the Communications of the ACM in July 1970. The full paper is available here: <a href="http://portal.acm.org/citation.cfm?doid=362686.362692" class="externalLink">http://portal.acm.org/citation.cfm?doid=362686.362692<span class="externalLinkIcon"></span></a>. </div><div class="ClearBoth"></div>
fatcat1111
Mon, 02 May 2011 19:35:38 GMT

Reviewed: 1.0 Production (Oct 23, 2010)
http://bloomfilter.codeplex.com/releases/view/25930#ReviewBy-lxw2012
Rated 4 Stars (out of 5) - I need a Bloom filter, thanks for your release
lxw2012
Sun, 24 Oct 2010 02:26:48 GMT

Source code checked in, #52373
http://bloomfilter.codeplex.com/SourceControl/changeset/changes/52373
Checked in by server upgrade
_TFSSERVICE
Wed, 28 Jul 2010 17:08:20 GMT

Updated Wiki: Home
http://bloomfilter.codeplex.com/wikipage?version=4
<div class="wikidoc">A Bloom filter is a data structure optimized for fast, space-efficient set membership tests. Bloom filters have the unusual property of requiring constant time to add an element to the set or test for membership, regardless of the size of the elements or the number of elements already in the set. No other constant-space set data structure has this property. <br /><br />It works by storing a bit vector representing the set S' = {h[i](x) | x in S, i = 1, …, k}, where h[1], …, h[k] : {0, 1}* -> [n lg(1/ε) lg e] are hash functions. Additions simply set k bits to 1, specifically those at h[1](x), …, h[k](x). Checks are implemented by applying those same hash functions and returning true if all of the resulting positions are 1. <br /><br />Because the set stored is a proper superset of the set of items added, false positives may occur, though false negatives cannot. The false positive rate can be specified. <br /><br />Bloom filters offer the following advantages:
<ul><li>Space: Approximately n * lg(1/ε) bits, where ε is the false positive rate and n is the number of elements in the set.
<ul><li>Example: There are approximately 170k words in the English language. If we consider that to be our set (therefore n = 1.7E5), and we wish to search a corpus for them with a 1% false positive rate, the filter would require about (1.7E5 * lg(1 / 0.01)) bits ≈ 138 KB. Contrast this with a hashtable, which would require (1.7E5 elements * 32 bits per element) ≈ 664 KB. Obviously explicit string storage would require significantly more. </li></ul></li>
<li>Precision: Arbitrary precision, where increasing precision requires more space (following the above size equation) but, for a fixed number of hash functions, not more time.
<ul><li>Example: If we wanted to reduce our false positive rate in the above example from one percent to one permille, the space requirement would go from about 138 KB to about 207 KB. </li></ul></li>
<li>Time: O(k), where k is the number of hash functions. The optimal number of hash functions is ceiling(lg(1/ε)), though the user can supply a different number if desired.
<ul><li>Example: In keeping with our above example, if the accepted false positive rate is 0.001, k = 10. </li></ul></li></ul>
<br />This implementation uses Dillinger & Manolios double hashing to provide all but the first two hash functions. By default the first hash function is the type's GetHashCode() method. This implementation also includes default secondary hash functions for strings (Jenkins's "One at a time" method) and integers (Wang's method). <br /><br />Bloom filters are due to Burton H. Bloom, as described in the Communications of the ACM in July 1970. The full paper is available here: <a href="http://portal.acm.org/citation.cfm?doid=362686.362692" class="externalLink">http://portal.acm.org/citation.cfm?doid=362686.362692<span class="externalLinkIcon"></span></a>. </div><div class="ClearBoth"></div>
fatcat1111
Wed, 03 Mar 2010 20:24:55 GMT

Updated Release: 1.0 Production (Apr 09, 2009)
http://bloomfilter.codeplex.com/Release/ProjectReleases.aspx?ReleaseId=25930
<div class="wikidoc"><ul><li>Simplified usage by providing secondary hash functions for strings and ints.</li>
<li>Added constructors to provide additional control for those that need it.</li>
<li>Improved calculation of optimal double-hashing function count and underlying data structure size.</li>
<li>Added a default false-positive rate (calculated based on capacity) for those that do not wish to pass one.</li>
<li>Now catching when a provided capacity and error rate would result in an overflow.</li>
<li>Added a readme to explain usage. </li></ul></div><div class="ClearBoth"></div>
fatcat1111
Wed, 16 Sep 2009 22:57:48 GMT

Released: 1.0 Production (Apr 09, 2009)
http://bloomfilter.codeplex.com/Release/ProjectReleases.aspx?ReleaseId=25930
<div><ul><li>Simplified usage by providing secondary hash functions for strings and ints.</li>
<li>Added constructors to provide additional control for those that need it.</li>
<li>Improved calculation of optimal double-hashing function count and underlying data structure size.</li>
<li>Added a default false-positive rate (calculated based on capacity) for those that do not wish to pass one.</li>
<li>Now catching when a provided capacity and error rate would result in an overflow.</li>
<li>Added a readme to explain usage. </li></ul></div><div></div>
Wed, 16 Sep 2009 22:57:48 GMT

Source code checked in, #34019
http://bloomfilter.codeplex.com/SourceControl/ListDownloadableCommits.aspx
Fixing error rate calculation.
fatcat1111
Sun, 17 May 2009 18:24:39 GMT

New Post: bestErrorRate constant
http://bloomfilter.codeplex.com/Thread/View.aspx?ThreadId=55760
<div style="line-height: normal;"><p>Great catch! Thank you, this will be corrected immediately.</p></div>
fatcat1111
Sun, 17 May 2009 17:17:26 GMT

New Post: bestErrorRate constant
http://bloomfilter.codeplex.com/Thread/View.aspx?ThreadId=55760
<div style="line-height: normal;"><p>Shouldn't the constant be:</p>
<p>0.6185</p>
<p>not 0.06185 as it appears in the code base.</p>
<p>Even the PDF that is in the comments seems to confirm that...</p>
<pre> <span style="color:blue">return</span> (<span style="color:blue">float</span>)Math.Pow(0.06185, <span style="color:blue">int</span>.MaxValue / capacity); <span style="color:green">// http://www.cs.princeton.edu/courses/archive/spring02/cs493/lec7.pdf</span><br></pre></div>
torial
Fri, 08 May 2009 21:19:17 GMT

Updated Wiki: Home
http://bloomfilter.codeplex.com/Wiki/View.aspx?title=Home&version=3
<div class="wikidoc">A Bloom filter is a data structure optimized for fast, space-efficient set membership tests. Bloom filters have the unusual property of requiring constant time to add an element to the set or test for membership, regardless of the size of the elements or the number of elements already in the set. No other constant-space set data structure has this property. <br /><br />It works by storing a bit vector representing the set S' = {h[i](x) | x in S, i = 1, …, k}, where h[1], …, h[k] : {0, 1}* -> [n lg(1/ε) lg e] are hash functions. Additions simply set k bits to 1, specifically those at h[1](x), …, h[k](x). Checks are implemented by applying those same hash functions and returning true if all of the resulting positions are 1. <br /><br />Because the set stored is a proper superset of the set of items added, false positives may occur, though false negatives cannot. The false positive rate can be specified. <br /><br />Bloom filters offer the following advantages:<br />• Space: Approximately n * lg(1/ε) bits, where ε is the false positive rate and n is the number of elements in the set. <br /> ○ Example: There are approximately 170k words in the English language. If we consider that to be our set (therefore n = 1.7E5), and we wish to search a corpus for them with a 1% false positive rate, the filter would require about (1.7E5 * lg(1 / 0.01)) bits ≈ 138 KB. Contrast this with a hashtable, which would require (1.7E5 elements * 32 bits per element) ≈ 664 KB. Obviously explicit string storage would require significantly more. 
<br />• Precision: Arbitrary precision, where increasing precision requires more space (following the above size equation) but, for a fixed number of hash functions, not more time. <br /> ○ Example: If we wanted to reduce our false positive rate in the above example from one percent to one permille, the space requirement would go from about 138 KB to about 207 KB. <br />• Time: O(k), where k is the number of hash functions. The optimal number of hash functions is ceiling(lg(1/ε)), though the user can supply a different number if desired. <br /> ○ Example: In keeping with our above example, if the accepted false positive rate is 0.001, k = 10. <br /><br />This implementation uses Dillinger & Manolios double hashing to provide all but the first two hash functions. By default the first hash function is the type's GetHashCode() method. This implementation also includes default secondary hash functions for strings (Jenkins's "One at a time" method) and integers (Wang's method). <br /><br />Bloom filters are due to Burton H. Bloom, as described in the Communications of the ACM in July 1970. The full paper is available here: <a href="http://portal.acm.org/citation.cfm?doid=362686.362692" class="externalLink">http://portal.acm.org/citation.cfm?doid=362686.362692<span class="externalLinkIcon"></span></a>. </div>
fatcat1111
Sun, 26 Apr 2009 05:15:37 GMT

Updated Release: 1.0 Production (Apr 09, 2009)
http://bloomfilter.codeplex.com/Release/ProjectReleases.aspx?ReleaseId=25930
<div><ul><li>Simplified usage by providing secondary hash functions for strings and ints.</li>
<li>Added constructors to provide additional control for those that need it.</li>
<li>Improved calculation of optimal double-hashing function count and underlying data structure size.</li>
<li>Added a default false-positive rate (calculated based on capacity) for those that do not wish to pass one.</li>
<li>Now catching when a provided capacity and error rate would result in an overflow.</li>
<li>Added a readme to explain usage. </li></ul></div>
fatcat1111
Fri, 10 Apr 2009 16:46:27 GMT