From: Olaoluwa Osuntokun <laolu32@gmail.com>
Date: Thu, 01 Jun 2017 19:01:14 +0000
Message-ID: <CAO3Pvs8ccTkgrecJG6KFbBW+9moHF-FTU+4qNfayeE3hM9uRrg@mail.gmail.com>
To: Arnoud Kouwenhoven - Pukaki Corp via bitcoin-dev <bitcoin-dev@lists.linuxfoundation.org>
Subject: [bitcoin-dev] BIP Proposal: Compact Client Side Filtering for Light Clients

Hi y'all,

Alex Akselrod and I would like to propose a new light client BIP for
consideration:

   * https://github.com/Roasbeef/bips/blob/master/gcs_light_client.mediawiki

This BIP proposal describes a concrete specification (along with
reference implementations[1][2][3]) for the much discussed client-side
filtering reversal of BIP-37.
The precise details are described in the BIP, but as a summary: we've
implemented a new light-client mode that uses client-side filtering based
on Golomb-Rice coded sets. Full nodes maintain an additional index of the
chain, and serve these compact filters (the index) to light clients which
request them. Light clients then fetch these filters, query them locally,
and _maybe_ fetch the block if a relevant item matches. The cool part is
that blocks can be fetched from _any_ source, once the light client deems
it necessary. Our primary motivation for this work was enabling a light
client mode for lnd[4] in order to support a more light-weight back end,
paving the way for the usage of Lightning on mobile phones and other
devices. We've integrated neutrino as a back end for lnd, and will be
making the updated code public very soon.

One specific area we'd like feedback on is the parameter selection. Unlike
BIP-37, which allows clients to dynamically tune their false positive rate,
our proposal uses a _fixed_ false positive rate. Within the document, it's
currently specified as P = 1/2^20. We've done a bit of analysis attempting
to minimize the following sum:
filter_download_bandwidth + expected_block_false_positive_bandwidth. Alex
has made a JS calculator that allows y'all to explore the effect of
tweaking the false positive rate in addition to the following variables:
the number of items the wallet is scanning for, the size of the blocks,
the number of blocks fetched, and the size of the filters themselves. The
calculator computes the expected bandwidth utilization using the CDF of
the Geometric Distribution. The calculator can be found here:
https://aakselrod.github.io/gcs_calc.html. Alex also has an empirical
script he's been running on actual data, and the results seem to match up
rather nicely.

We were excited to see that Karl Johan Alm (kallewoof) has done some
(rather extensive!) analysis of his own, focusing on a distinct encoding
type [5].
I haven't had the time to dig into his report yet, but I think I've
read enough to extract the key difference in our encodings: his filters
use a binomial encoding _directly_ on the filter contents, while we
instead create a Golomb-Coded set with the contents being _hashes_ (we use
siphash) of the filter items.

Using a fixed fp=20 (i.e. P = 1/2^20), I have some stats detailing the
total index size, as well as averages for both mainnet and testnet. For
mainnet, using the filter contents as currently described in the BIP
(basic + extended), the total size of the index comes out to 6.9GB. The
breakdown is as follows:

    * total size:  6976047156
    * total avg:  14997.220622758816
    * total median:  3801
    * total max:  79155
    * regular size:  3117183743
    * regular avg:  6701.372750217131
    * regular median:  1734
    * regular max:  67533
    * extended size:  3858863413
    * extended avg:  8295.847872541684
    * extended median:  2041
    * extended max:  52508

In order to consider the average+median filter sizes in a world with
larger blocks, I also ran the index for testnet:

    * total size:  2753238530
    * total avg:  5918.95736054141
    * total median:  60202
    * total max:  74983
    * regular size:  1165148878
    * regular avg:  2504.856172982827
    * regular median:  24812
    * regular max:  64554
    * extended size:  1588089652
    * extended avg:  3414.1011875585823
    * extended median:  35260
    * extended max:  41731

Finally, here are the testnet stats which take into account the increase
in the maximum filter size due to segwit's block-size increase. The max
filter sizes are a bit larger due to some of the habitual blocks I
created last year when testing segwit (transactions with 30k inputs, 30k
outputs, etc).

     * total size:  585087597
     * total avg:  520.8839608674402
     * total median:  20
     * total max:  164598
     * regular size:  299325029
     * regular avg:  266.4790836307566
     * regular median:  13
     * regular max:  164583
     * extended size:  285762568
     * extended avg:  254.4048772366836
     * extended median:  7
     * extended max:  127631

For those that are interested in the raw data, I've uploaded CSV files
with the raw per-block data (mainnet + testnet), which can be found here:

     * mainnet: (14MB): https://www.dropbox.com/s/4yk2u8dj06njbuv/mainnet-gcs-stats.csv?dl=0
     * testnet: (25MB): https://www.dropbox.com/s/w7dmmcbocnmjfbo/gcs-stats-testnet.csv?dl=0

We look forward to getting feedback from all of y'all!

-- Laolu

[1]: https://github.com/lightninglabs/neutrino
[2]: https://github.com/Roasbeef/btcd/tree/segwit-cbf
[3]: https://github.com/Roasbeef/btcutil/tree/gcs/gcs
[4]: https://github.com/lightningnetwork/lnd/
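
As a rough illustration of the sum discussed in the parameter-selection
paragraph (filter_download_bandwidth +
expected_block_false_positive_bandwidth), here is a minimal Python sketch.
It is not the model used by the JS calculator. It assumes each watched
item falsely matches a given filter independently with probability P, and
it treats the average filter size and block size as fixed inputs (in
reality the filter size also changes with P). All function and parameter
names are hypothetical.

```python
def match_probability(fp_rate: float, watched_items: int) -> float:
    """Chance that at least one watched item falsely matches one filter.

    Assumption: each watched item matches independently with
    probability fp_rate.
    """
    return 1.0 - (1.0 - fp_rate) ** watched_items


def expected_bandwidth(num_blocks: int,
                       avg_filter_bytes: float,
                       avg_block_bytes: float,
                       fp_rate: float,
                       watched_items: int) -> float:
    """Expected bytes downloaded: every filter, plus blocks fetched
    only because of a false positive match."""
    filter_download = num_blocks * avg_filter_bytes
    false_positive_blocks = num_blocks * match_probability(fp_rate, watched_items)
    return filter_download + false_positive_blocks * avg_block_bytes


if __name__ == "__main__":
    # P = 1/2^20 as in the proposal; 6701 bytes is the mainnet "regular avg"
    # from the stats above. The 1 MB block size, 1000 watched items and
    # 1000 blocks are illustrative assumptions only.
    total = expected_bandwidth(num_blocks=1000,
                               avg_filter_bytes=6701,
                               avg_block_bytes=1_000_000,
                               fp_rate=1 / 2**20,
                               watched_items=1000)
    print(f"expected download: {total:.0f} bytes")
```

Under these assumptions, lowering P shrinks the second term but (in a
real filter) grows the first, which is the trade-off the calculator lets
you explore.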