Return-Path: Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id 1E7981318 for ; Wed, 14 Mar 2018 00:38:04 +0000 (UTC) X-Greylist: from auto-whitelisted by SQLgrey-1.7.6 Received: from outmail149084.authsmtp.net (outmail149084.authsmtp.net [62.13.149.84]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id A8C96576 for ; Wed, 14 Mar 2018 00:38:02 +0000 (UTC) Received: from mail-c245.authsmtp.com (mail-c245.authsmtp.com [62.13.128.245]) by punt21.authsmtp.com. (8.15.2/8.15.2) with ESMTP id w2E0bxFG056161; Wed, 14 Mar 2018 00:37:59 GMT (envelope-from pete@petertodd.org) Received: from petertodd.org (ec2-52-5-185-120.compute-1.amazonaws.com [52.5.185.120]) (authenticated bits=0) by mail.authsmtp.com (8.15.2/8.15.2) with ESMTPSA id w2E0bvAk074166 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Wed, 14 Mar 2018 00:37:58 GMT (envelope-from pete@petertodd.org) Received: from [127.0.0.1] (localhost [127.0.0.1]) by petertodd.org (Postfix) with ESMTPSA id EEB1840118; Wed, 14 Mar 2018 00:37:56 +0000 (UTC) Received: by localhost (Postfix, from userid 1000) id 0780320065; Tue, 13 Mar 2018 20:37:53 -0400 (EDT) Date: Tue, 13 Mar 2018 20:37:52 -0400 From: Peter Todd To: Daniel Robinson Message-ID: <20180314003752.GA3853@fedora-23-dvm> References: MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="17pEHd4RhPHOinZp" Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.23 (2014-03-12) X-Server-Quench: f1a8a1ad-271f-11e8-9f3c-9cb654bb2504 X-AuthReport-Spam: If SPAM / abuse - report it at: http://www.authsmtp.com/abuse X-AuthRoute: OCd2Yg0TA1ZNQRgX IjsJECJaVQIpKltL GxAVKBZePFsRUQkR aAdMdwYUFVQGAgsB Am4bW1ReU1p7XGs7 bghPaBtcak9QXgdq T0pMXVMcUwcdAU5H d0UeVR1zcQMIeXh3 bUIsCHEKVBV7JBJg QElVQ3AHZDJodWlJ UxNFflAGdgZOLE1H b1B7GhFYa3VzNz4x VxQvJS06IShFJWxb RRtFIFwcQE0KEzgg DwgYGjIhBgUCSW01 KBpjK1gXGFsKM0I0 WQAA X-Authentic-SMTP: 61633532353630.1039:706 X-AuthFastPath: 0 (Was 255) X-AuthSMTP-Origin: 52.5.185.120/25 X-AuthVirus-Status: No virus detected - but ensure you scan with your own anti-virus system. X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_LOW autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org Cc: bitcoin-dev@lists.linuxfoundation.org Subject: Re: [bitcoin-dev] Data structure for efficient proofs of non-inclusion X-BeenThere: bitcoin-dev@lists.linuxfoundation.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: Bitcoin Protocol Discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 14 Mar 2018 00:38:04 -0000 --17pEHd4RhPHOinZp Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Wed, Feb 14, 2018 at 09:09:18PM +0000, Daniel Robinson wrote: > Hi Peter, >=20 > Thanks for the mind-expanding presentation on client-side validation at > Chaincode last week. >=20 CCing bitcoin-dev because this is of general interest... For background, Daniel is asking about my client-side verified single-use-s= eals via proof-of-publication model, previously published here=B9, which creates= an anti-double-spend primitive via a proof-of-publication, and many proofs-of-non-publication. tl;dr: A seal is closed by publishing a valid signature for the seal to a ledger. The first signature is the valid one, so if Alice want to convince = Bob you she closed a seal correctly (e.g. to pay Bob), she has to supply Bob wi= th a proof that the signature _was_ published - a proof-of-publication - as well= as proof that prior to being published, no valid signature was previously published - a proof-of-non-publication. It's the proofs-of-non-publication that take up the most space. > I'm trying to sketch out a system that would use some of those ideas, and > am looking at what Merkle-tree-like structure for the "transaction root" > would best allow efficient proofs of non-inclusion in a block. The simple= st > solution seems to just be a Merkle tree with the transactions sorted by > public key, but it seems like having intermediate information about ranges > higher up in the branches might help make non-inclusion proofs more > compact. Other solutions I've seen like Patricia trees and sparse Merkle > trees seem to be optimizing for easy recomputation on update, which doesn= 't > seem to be necessary for at least this version of the idea. > > Are there any verifiable data structures you've found that improve on > sorted Merkle trees with respect to proofs of non-inclusion? So remember that the system I proposed=B9 used sorted merkle trees only wit= hin a block; for blocks themselves you mostly can't do any better than a linear l= ist. Though I think there may be some exceptions which deserves another email. :) As you know, asymptotically merkle trees have excellent log2(n) proof size growth. But as you correctly suggest, their high overhead in the small-n ca= se suggests that we can do better. In fact, Bram Cohen previously proposed=B2 = a "TXO Bitfield" for the somewhat similar use-case of committing to the spentness = of outputs efficiently. # Naive Analysis So suppose at an intermediate node you commit to a simple bitfield where ea= ch possible value under that node is represented by a single bit. Thus for m values under that point in the tree, the marginal size of the non-inclusion proof is m bits. By comparison a naive merkle tree built from a hash functi= on with k bits takes approximately k*log2(m) bits to prove non-inclusion. For = an rough, unsophisticated, analysis just solve: m =3D k * log2(m) Apparently you can do this analytically, but as this analysis is only approximate a much better idea is to just plot it on a graph: for k=3D256bi= ts the crossover point is roughly m=3D3000. # Merkle Tree Structure But what is the bitfield competing against exactly? Just saying "merkle tre= e" isn't very precise. Most designs along these lines use something like a merkelized patricia tree, where each bit of the key is compared individually and each node is a two-way (left vs right) branch. Better yet is the radix tree, where each inner node that has only one child is merged with its pare= nt. Regardless, the logic of these kinds of trees can be thought of a recursive query, where each type of node has a `get(key)` operation that returns eith= er a value or None. So let's define a new type of inner node that commits to a merkle-mountain-range (MMR) tip and a 2^j element bitfield. `get(key)` is t= hen this pseudo-rust: fn get(self, prefix) -> Option { let idx =3D Int::from(prefix[0 .. j]); if self.bitfield[idx] { let mmr_idx =3D node.bitfield[0 .. idx].count_ones() - 1; Some(node.mmr[mmr_idx].get(prefix) } else { None } } The hard part with this is choosing when to use a bitfield-based inner node instead of a standard one. Assuming keys are randomly distributed, it makes sense to do so when the bitfield table is partially empty, but how empty? I= t's a trade-off between multiple parameters, including the size of proofs-of-publication - although the latter may be OK to ignore as the numb= er of proof-of-non-publication needed should usually greatly outnumber proofs-of-publication. Question: is it OK for this choice to not be part of the deterministic consensus? Is that even possible to enforce? # Security For a proof-of-publication to be secure, it must ensure that any attempt to construct a false proof-of-non-publication will fail. In the pruning model, that means that a proof-of-publication is simply the data necessary for the proof-of-non-publication verifier to return false. Concretely: fn verify_pop(tree, key) -> bool { !verify_non_pop(tree, key) } However note the subtle difference in trust model with respect to censorship between the following two possible non-pop verifiers: fn verify_non_pop(tree, key) -> bool { !tree.exists(key) } fn verify_non_pop(tree, key) -> bool { match tree.get(key) { Some(value) =3D> !verify_signature(value), None =3D> true, } } ## False Positives Note how if we use the second `verify_non_pop()` function shown above we can also use probabilistic data structures such as bloom filters in place of a bitfield. This works because a false-positive is acceptable, as it will sti= ll fail signature verification (or sooner if the key is committed in the leaf node). For example, it's plausible that a compressed bloom filter would be more sp= ace efficient than a bitfield, as the multiple hashing steps might use the bits= in the filter more efficiently. Investigating this further would be a good research topic. # References 1) "[bitcoin-dev] Scalable Semi-Trustless Asset Transfer via Single-Use-Sea= ls and Proof-of-Publication", Peter Todd, Dec 5th 2017, https://lists.linuxfoundation.org/pipermail/bi= tcoin-dev/2017-December/015350.html 2) "[bitcoin-dev] The TXO bitfield", Bram Cohen, Mar 31st 2017, https://lists.linuxfoundation.org/pipermail/b= itcoin-dev/2017-March/013928.html 3) "Bloom filters", Wikipedia, Jan 27th 2018, https://en.wikipedia.org/w/index.php?title=3D= Bloom_filter&oldid=3D822632093 --=20 https://petertodd.org 'peter'[:-1]@petertodd.org --17pEHd4RhPHOinZp Content-Type: application/pgp-signature; name="signature.asc" Content-Description: Digital signature -----BEGIN PGP SIGNATURE----- iQGSBAEBCAB8FiEEFcyURjhyM68BBPYTJIFAPaXwkfsFAlqobt5eFIAAAAAAFQBA YmxvY2toYXNoQGJpdGNvaW4ub3JnMDAwMDAwMDAwMDAwMDAwMDAwMzFmOTYxY2Ri YjQ3MjliNjcxMGVmOTg5MWY4M2QxMDljODA0ODg5NTllMDFjYwAKCRAkgUA9pfCR +wL2B/4svFyIc4JdPpLonz2KVwM+YM5MiWxnf7xQnaVE4QnN2eoShUQ2YWIdcqaf NbIEgX+z5F5G8WYGZsymiaYqkSRHVDdg3FeXG/b/DKvdo4JcsvAWI7AegxRiPgUy rxIDbWZ+pjcsvMwKBonXry7EknNSqRq5iP8lEoZE6K5jh85o5mDzM/fsIYznB8gp hEkuQyuc4SfjDJpy4hNHGaJ6YkA901wBvMNLHm5wzCHJEAl23L2VSvi2nfpAUKcb u7hGckNddVCyMIXBEWCC6UXusf6ttfMnj3onTRGC7cb0aMY2nRyVnTn1N+pdT8bg 6LO/zgroGxCMG9IRe2prQFeLxfQQ =985T -----END PGP SIGNATURE----- --17pEHd4RhPHOinZp--