Return-Path: Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id B4DCFBC8 for ; Fri, 2 Jun 2017 17:55:34 +0000 (UTC) X-Greylist: whitelisted by SQLgrey-1.7.6 Received: from mail-qk0-f177.google.com (mail-qk0-f177.google.com [209.85.220.177]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 89E9F163 for ; Fri, 2 Jun 2017 17:55:33 +0000 (UTC) Received: by mail-qk0-f177.google.com with SMTP id d14so65299569qkb.1 for ; Fri, 02 Jun 2017 10:55:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to; bh=IoGFXj7mNtDlAt2lqU1fAXrz5SZCKa8JP4KRs2N7kZI=; b=ruNzOoGTlsrTyZBfbfsPMXbYjnR6WwUCagfHC2nMJwE8ZmDj80/GQZ9YMg+ZOVieaN St/iRofzaWHAre8xIQb2Z7tXAY1SR28Pb4XTS8KZdwb0vWJCFNRuJpTVDI7U5RbyJ8tw fAWDZw1Clf8ezBqCiNuQ6nhxoq8NBrxNaUOjydBiatByfEryZP+lLpFvt22VfJpjZLPL mSHVY1ICLkUl61VfsxO1tmlni7u8mq9lV0Ov3h5vRKwznHM7E0LKoAEsuQt78ukEg3u8 slMUnrWZzXpAs+ENprus/ovEAv5U7mbTew/YAN3uHV3Lo7rpI02dzZ0FKAQBc2PmuB8a ZrQg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:sender:in-reply-to:references:from :date:message-id:subject:to; bh=IoGFXj7mNtDlAt2lqU1fAXrz5SZCKa8JP4KRs2N7kZI=; b=gNB7XHFCPOH9yuOkpFsdgZ+AQO2n5lqcveCFruinT6AkW/PHZNv7zxx85UfAZU6nlN AudXdvYualhenpHqMOm65OzJOs/4gyRS8fAIUB7TsKePgPZ54dS0XLJj8I53f5iJV2Op 2BYTc76A8EmmAFRBN5S5eM87txS2JALCxjng9iSzxn4pk4WJP7EPLXYQ2Kdh50NRZpK3 /iwDqs3eWzbnfE4Zztdm5c36r3eiiWWqg7dGQu0H1N87Kqjpz7Q+un/Z79VPtF3MTTKG 1HZcEcBNvN2cijNYwxERqgRucJ5cYE7tmViL2Y4odXu0fCqiMm++FRqPpKnXSY3TA60x xlIQ== X-Gm-Message-State: AODbwcBcRyar9onQyUDDDB6FNGN83PC6Vh6E2Dfn1adNH11l9TnpOKM9 TvhyTFTw5wTD1yvEn1qZCCS3Y5DglVKi X-Received: by 10.55.144.5 with SMTP id s5mr9118886qkd.39.1496426132270; Fri, 02 Jun 2017 10:55:32 -0700 (PDT) MIME-Version: 1.0 Sender: blueadept@gmail.com Received: by 10.12.144.139 with HTTP; Fri, 2 Jun 2017 10:55:31 -0700 (PDT) In-Reply-To: References: From: Alex Akselrod Date: Fri, 2 Jun 2017 13:55:31 -0400 X-Google-Sender-Auth: q5yXWqCB3djhftYiJ_m7r9SutWg Message-ID: To: Bitcoin Dev Content-Type: multipart/alternative; boundary="94eb2c08504a0bf8b00550fddc89" X-Spam-Status: No, score=-1.4 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID, FREEMAIL_FROM, HTML_MESSAGE, RCVD_IN_DNSWL_NONE, RCVD_IN_SORBS_SPAM autolearn=no version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org X-Mailman-Approved-At: Fri, 02 Jun 2017 17:57:23 +0000 Subject: Re: [bitcoin-dev] BIP Proposal: Compact Client Side Filtering for Light Clients X-BeenThere: bitcoin-dev@lists.linuxfoundation.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: Bitcoin Protocol Discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 02 Jun 2017 17:55:34 -0000 --94eb2c08504a0bf8b00550fddc89 Content-Type: text/plain; charset="UTF-8" On Jun 2, 2017 8:09 AM, "Karl Johan Alm via bitcoin-dev" < bitcoin-dev@lists.linuxfoundation.org> wrote: Hello, Really wish I'd known you were working on this a few weeks ago, but such is life. Hopefully I can provide some useful feedback. Your feedback is greatly appreciated! On Fri, Jun 2, 2017 at 4:01 AM, Olaoluwa Osuntokun via bitcoin-dev wrote: > Full-nodes > maintain an additional index of the chain, and serve this compact filter > (the index) to light clients which request them. Light clients then fetch > these filters, query the locally and _maybe_ fetch the block if a relevant > item matches. Is it necessary to maintain the index all the way to the beginning of the chain? When would clients request "really old digests" and why? Without a soft fork, this is the only way for light clients to verify that peers aren't lying to them. Clients can request headers (just hashes of the filters and the previous headers, creating a chain) and look for conflicts between peers. If a conflict is found at a certain block, the client can download the block, generate a filter, calculate the header by hashing together the previous header and the generated filter, and banning any peers that don't match. A full node could prune old filters if you wanted and recalculate them as necessary if you just keep the filter header chain info as really old filters are unlikely to be requested by correctly written software but you can't guarantee every client will follow best practices either. > The calculator can be found here: > https://aakselrod.github.io/gcs_calc.html. Alex also has an empirical > script he's been running on actual data, and the results seem to match up > rather nicely. I haven't tried the tool yet, and maybe it will answer some of my questions. On what data were the simulated wallets on actual data based? How did false positive rates for wallets with lots of items (pubkeys etc) play out? Is there a maximum number of items for a wallet before it becomes too bandwidth costly to use digests? The simulations are based on completely random data within given parameters. For example, it will generate a wallet of a specified size and generate blocks of specified size with specified number of transactions of specified format, all guaranteed to not match the wallet. It then tries to match the wallet and tracks the filter size and the bandwidth used by block downloads which are all due to false positives. The maximum wallet size can be millions or more of addresses and outpoints before the filter isn't worth it. I published the simulation code at https://gist.github.com/aakselrod/0ee665205f7c9538c2339876b0424b26 but the calculation code gives you the same results (on average but very close with a big enough sample size) much faster. I will definitely try to reproduce my experiments with Golomb-Coded sets and see what I come up with. It seems like you've got a little less than half the size of my digests for 1-block digests but I haven't tried making digests for all blocks (and lots of early blocks are empty). Filters for empty blocks only take a few bytes and sometimes zero when the coinbase output is a burn that doesn't push any data (example will be in the test vectors that I'll have ready shortly). On the BIP proposal itself: In Compact Filter Header Chain, you mention that clients should download filters from nodes if filter_headers is not identical, and ban offending nodes. What about temporary forks in the chain? What about longer forks? In general, I am curious how you will deal with reorgs and temporary non-consensus related chain splits. The cfheaders messages give you the hash of the final block for which there's a header in the message. This means you can ignore the message as necessary rather than ban the peer, or track cfheaders for multiple forks if desired. I am also curious if you have considered digests containing multiple blocks. Retaining a permanent binsearchable record of the entire chain is obviously too space costly, but keeping the last X blocks as binsearchable could speed up syncing for clients tremendously, I feel. We hadn't (or I hadn't) until we read your recent post/paper and are considering it now. It may also be space efficient to ONLY store older digests in chunks of e.g. 8 blocks. A client syncing up finding a match in an 8-block chunk would have to grab those 8 blocks, but if it's not recent, that may be acceptable. It may even be possible to make 4-, 2-, 1-block digests on demand. This is also something we (or at least I) hadn't considered before your recent post. We have been working on this for a few months now so didn't have time to work on trying out and possibly incorporating the idea before release. How fast are these to create? Would it make sense to provide digests on demand in some cases, rather than keeping them around indefinitely? They're pretty fast and can be pruned if desired, as mentioned above, as long as the header chain is kept. --94eb2c08504a0bf8b00550fddc89 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
On Jun 2, 2017 8:09 AM, "Karl Johan Alm via bitcoin-d= ev" <bitcoin-dev@lists.linuxfoundation.org> wrote:
Hello,

Really wish I'd known you were working on this a few weeks ago, but
such is life. Hopefully I can provide some useful feedback.

You= r feedback is greatly appreciated!


On Fri, Jun 2, 2017 at 4:01 AM, Olaoluwa Osuntokun via bitcoin-dev
<bitcoin-dev@lists.linuxfoundation.org> wrote:
> Full-nodes
> maintain an additional index of the chain, and serve this compact filt= er
> (the index) to light clients which request them. Light clients then fe= tch
> these filters, query the locally and _maybe_ fetch the block if a rele= vant
> item matches.

Is it necessary to maintain the index all the way to the beginning of=
the chain? When would clients request "really old digests" and wh= y?

Without a soft fork, this is the only way for light clients = to verify that peers aren't lying to them. Clients can request headers = (just hashes of the filters and the previous headers, creating a chain) and= look for conflicts between peers. If a conflict is found at a certain bloc= k, the client can download the block, generate a filter, calculate the head= er by hashing together the previous header and the generated filter, and ba= nning any peers that don't match. A full node could prune old filters i= f you wanted and recalculate them as necessary if you just keep the filter = header chain info as really old filters are unlikely to be requested by cor= rectly written software but you can't guarantee every client will follo= w best practices either.

=

> The calculator can be found here:
> https://aakselrod.github.io/gcs_calc.html. Al= ex also has an empirical
> script he's been running on actual data, and the results seem to m= atch up
> rather nicely.

I haven't tried the tool yet, and maybe it will answer some of my= questions.

On what data were the simulated wallets on actual data based? How did
false positive rates for wallets with lots of items (pubkeys etc) play
out? Is there a maximum number of items for a wallet before it becomes
too bandwidth costly to use digests?

The simulations are based = on completely random data within given parameters. For example, it will gen= erate a wallet of a specified size and generate blocks of specified size wi= th specified number of transactions of specified format, all guaranteed to = not match the wallet. It then tries to match the wallet and tracks the filt= er size and the bandwidth used by block downloads which are all due to fals= e positives. The maximum wallet size can be millions or more of addresses a= nd outpoints before the filter isn't worth it.
<= br>
I published the simulation code at https://g= ist.github.com/aakselrod/0ee665205f7c9538c2339876b0424b26 but the calcu= lation code gives you the same results (on average but very close with a bi= g enough sample size) much faster.

I will definitely try to reproduce my exp= eriments with Golomb-Coded
sets and see what I come up with. It seems like you've got a little
less than half the size of my digests for 1-block digests but I
haven't tried making digests for all blocks (and lots of early blocks are empty).

<= /span>
Filters for empty blocks only take a few bytes and = sometimes zero when the coinbase output is a burn that doesn't push any= data (example will be in the test vectors that I'll have ready shortly= ).


The cf= headers messages give you the hash of the final block for which there's= a header in the message. This means you can ignore the message as necessar= y rather than ban the peer, or track cfheaders for multiple forks if desire= d.


We hadn't (or I hadn't) until we read your recent post/pa= per and are considering it now.


It may also be space efficient to ONLY store older digests in chunks
of e.g. 8 blocks. A client syncing up finding a match in an 8-block
chunk would have to grab those 8 blocks, but if it's not recent, that may be acceptable. It may even be possible to make 4-, 2-, 1-block
digests on demand.

=
This is also something we (or at least I) ha= dn't considered before your recent post. We have been working on this f= or a few months now so didn't have time to work on trying out and possi= bly incorporating the idea before release.

How fast are these to create? Would= it make sense to provide digests
on demand in some cases, rather than keeping them around indefinitely?

They= 're pretty fast and can be pruned if desired, as mentioned above, as lo= ng as the header chain is kept.

--94eb2c08504a0bf8b00550fddc89--