From: Eric Voskuil
Date: Fri, 18 Nov 2016 10:47:09 -0600
To: Johnson Lau
Cc: bitcoin-dev
Subject: Re: [bitcoin-dev] BIP30 and BIP34 interaction (was Re: [BIP Proposal] Buried Deployments)

What is the difference between downloading a hash and comparing it to a hash, versus downloading a hash and then a block and comparing it to a block?

You are talking about breaking a system in order to make it run faster. Using the hash is a non-optimization, traded against correctness. There is no "first seen" rule; there is only valid and invalid.
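As a minimal sketch of the distinction (hypothetical names, not code from libbitcoin or Core), the two checks differ as follows:

```python
import hashlib

def block_hash(serialized: bytes) -> bytes:
    # Bitcoin identifies blocks by double-SHA256 of the 80-byte header;
    # hashing the whole serialization here is enough to illustrate the point.
    return hashlib.sha256(hashlib.sha256(serialized).digest()).digest()

# Hash-keyed invalidity cache: the check being criticized.
invalid_hashes: set[bytes] = set()

def reject_by_hash(serialized: bytes) -> bool:
    # Conflates hash identity with block identity: a colliding but
    # distinct (possibly valid) block would be wrongly rejected.
    return block_hash(serialized) in invalid_hashes

# Identity-keyed cache: compares the full serialization, trading
# space for correctness.
invalid_blocks: set[bytes] = set()

def reject_by_identity(serialized: bytes) -> bool:
    return serialized in invalid_blocks
```

The hash-keyed check answers "have I seen this hash before", not "is this that block"; under a hash collision the two answers diverge.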
Even the name exposes the error of this thinking, as "first" requires order.

Caching invalidity for DoS protection is fine. It should be quite obvious that the blockchain is nothing more than a cache of validity. If it's faster in some cases to store both validity and all of the invalidity you are aware of, that is fine; you are trading space for time.

But caching information that is neither validity nor invalidity, and using it to validate blocks, is a break.

I cannot emphasize this point enough. A technique that provides varied results based on communication history, such as this "rule", is an attack vector. It allows the attacker to place information into your cache and read it back later from another connection. Even optimizing correct results based on communication history exposes the node in this manner. These sorts of attacks have been shown to be very effective at deanonymizing hidden nodes.

The p2p protocol actually makes this sort of attack a matter of communication standard via the sharing of address information, but this can be disabled without impacting correctness. Due to such non-optimizations as the first seen "rule", however, a node becomes a candy store of fingerprinting attack vectors.

Bitcoin provides the mechanism to reject cheaply-produced invalid blocks quickly. This is, after all, the fundamental principle of hashcash: force the attacker to pay to spam attack. By obtaining headers first, a node can obtain proof of work and perform correct and fast validation before ever obtaining the block's transactions. This technique is probably no more time-costly than the incorrect technique of checking a cache of hashes (ironically, a "hash cache" is an incorrect "hash cash"), and avoids the extra space of a secondary cache (the blockchain is the primary cache). It also avoids the varied time response that a secondary cache creates.
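The headers-first check described above can be sketched roughly as follows (an illustration, not any implementation's actual code; the target handling is simplified):

```python
import hashlib

def header_pow_ok(header80: bytes, target: int) -> bool:
    """Verify a block header's proof of work before ever downloading
    the block's transactions (headers-first validation)."""
    assert len(header80) == 80  # Bitcoin block headers are exactly 80 bytes
    digest = hashlib.sha256(hashlib.sha256(header80).digest()).digest()
    # The header hash, read as a little-endian integer, must not
    # exceed the difficulty target.
    return int.from_bytes(digest, "little") <= target

# A node requests headers, checks each header's work (and chain linkage),
# and only then fetches transactions for blocks whose work is proven.
```

An attacker who wants such a node to waste bandwidth on an invalid block must first pay the proof-of-work cost for its header, which is the hashcash principle at work.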
So once again, premature optimization erupts from the underlying design flaw, and creates more problems than proper design. The p2p network standard didn't have headers-first at one point, making correct checks more costly. That is no longer the case. But nevertheless, one cannot trade correctness for time.

The tx pool, like the orphan pool, as I mentioned previously, is an optimization. It is not a part of consensus, so it isn't relevant to a discussion about forks. It is also a design flaw that nodes are expected to hold invalid transactions. It exposes nodes to both DoS and fingerprinting attacks. Proper tx handling implies that a tx connect to a valid block. There is no "header" for a transaction, so correctness requires that the tx be downloaded before it can be validated.

e

> On Nov 18, 2016, at 8:43 AM, Johnson Lau wrote:
>
> In this case I don't understand how your implementation won't be DoS-ed. An attacker could keep sending you inv for the same block / transaction. Since you don't assume the hash is unique, each time you have to download the block/tx again before you could tell if that is the same one you have already known. Otherwise, you are implementing the "first seen" rule.
>
> Also, you can't ban a peer just because you get an invalid tx from him, because he might be referring to a hash-colliding UTXO that you don't know. In that case you need to request for the parent tx to verify. I wonder if you are really doing that.
>
>> On 18 Nov 2016, at 11:20, Eric Voskuil wrote:
>>
>> You are suggesting that, since a node implements a denial of service policy that actually denies itself otherwise valid blocks, those blocks are conditionally invalid. And that, since the validity condition is based on order of arrival and therefore independently unverifiable, Bitcoin consensus is broken in the face of a hash collision.
>>
>> I am aware of two other hash collision scenarios that cause Core to declare blocks invalid based on ordering: the block hash duplicate check (it's not fork-point relative) and signature verification caching. Like the "block banning" issue above, the latter is related to an internal optimization. I would categorize the former as a simple oversight that presumably goes way back.
>>
>> What then is the consequence of validity that is unverifiable? You believe this means that Bitcoin consensus is broken. This is incorrect. First understand that it is not possible for consensus rules to invalidate blocks based on order of arrival. As such any *implementation* that invalidates blocks based on order of arrival is broken. It is an error to claim that these behaviors are part of consensus, despite being implemented in the satoshi node(s).
>>
>> Validity must be verifiable independent of the state of other nodes. Consensus is a function of block history and time alone. Time is presumed to be universally consistent. To be a consensus rule all nodes must be able to independently reach the same validity conclusion, given the same set of blocks, independent of order. If this is not the case the behavior is not a consensus rule, it is simply a bug.
>>
>> Deviating from such bugs is not a break with consensus, since such non-rules cannot be part of consensus. One node implementation can behave deterministically while others are behaving non-deterministically, with the two nodes remaining consistent from a consensus standpoint (deterministic produces a subset of non-deterministic results). But, unlike arbitrary nodes, deterministic nodes will not cause disruption on the network.
>>
>> You imply that these determinism bugs are necessary, that there is no fix. This is also incorrect.
>>
>> The block banning hash collision bug is avoided by not using non-chain/clock state to determine validity.
>> Doing otherwise is clearly a bug. The hash of a block is not the block itself; a logically-correct ban would be to compare the wire serialization of the block as opposed to the hash, or not maintain the feature at all.
>>
>> The signature verification caching hash collision bug is the same problem, an optimization based on an invalid assumption. A full serialization comparison (true identity), or elimination of the feature, resolves the bug.
>>
>> The block hash check collision bug is trivially resolved by checking at the fork point as opposed to the tip. This prevents arbitrary (and irrational) invalidity based on conflict with irrelevant blocks that may or may not exist above the fork point.
>>
>> Libbitcoin is deterministic in all three cases (although the third issue is not made consistent until v3). I am not aware of any other non-determinism in Core, but I don't spend a lot of time there. There is no need to study other implementations to ensure determinism, as that can be verified independently.
>>
>> Any situation in which a node cannot provide deterministic validation of unordered blocks constitutes a non-consensus bug, as the behavior is not consistently verifiable by others under any conditions. Fixing/preventing these bugs is responsible development behavior, and does not require forks or BIPs, since Bitcoin doesn't inherently contain any such bugs. They are the consequence of incorrect implementation, and in two of the three cases above have resulted from supposed optimizations. But any code that creates non-determinism in exchange for speed, etc. is not an optimization, it's a bug. A node must implement its optimizations in a manner that does not alter consensus.
>>
>> The BIP30 regression hard fork is not a case of non-determinism. This will produce deterministic results (apart from the impact of unrelated bugs).
>> However the results are both a clear break from previous (and documented) consensus and also produce a very undesirable outcome: destruction of all unspent outputs in the "replaced" transaction, for starters. So this is a distinct category, not a determinism bug but a hard fork that produces undesired consequences.
>>
>> The BIP30 regression hard fork actually enables the various pathological scenarios that you were describing, where no such issues existed in Bitcoin consensus previously. It is now possible to produce a block that mutates another arbitrarily deep block, and forces a reorg all the way back to the mutated block. This was done to save microseconds per block. Despite the improbability of hash collisions, I find this deplorable, and the lack of public discussion on the decision concerning.
>>
>> With respect to the original post, the point at issue is the introduction of another hard fork, with some odd behaviors, but without any justification apart from tidying up the small amount of necessary code. These issues are related in that they are both consensus forks that have been introduced as supposed optimizations, with no public discussion prior to release (or at least merging to master with the presumption of shipping, in the latter case). Two of the three hash collision issues above are also related in that they are bugs introduced by a desire to optimize internals.
>>
>> The engineering lesson here should be clear: watch out for developers bearing optimizations. A trade against correctness is not an optimization, it's a break. Satoshi was clearly a fan of the premature optimization. FindAndDelete is a howler. So this is a tradition in Bitcoin. My intent is not to sling mud but to improve the situation.
>>
>> It is very possible to produce straightforward and deterministic code that abides consensus and materially outperforms Core, without any of the above optimization breaks, even avoiding the utxo set optimization. Even the tx (memory) and block (orphan) pools are complex store denormalizations implemented as optimizations. Optimizing before producing a clean conceptual model, architecture, and design is a software development anti-pattern (premature optimization). The proposed fork is a premature optimization. There are much more significant opportunities to better organize code (and improve performance). I cannot support the decision to advance it.
>>
>> I was unaware Core had regressed BIP30. Given that the behavior is catastrophic and that it introduces the *only* hash-collision consensus misbehavior (unless we consider a deep reorg sans the otherwise necessary proof of work desirable behavior), I strongly recommend it be reverted, with a post-mortem BIP.
>>
>> Finally, I recommend people contemplate the difference between unlikely and impossible. The chance of random collision is very small, but not zero. Colliding hashes is extremely difficult, but not impossible. But Bitcoin does not rely on impossibility for correct behavior. It relies on difficulty. This is a subtle but important distinction that people are missing.
>>
>> Difficulty is a knowable quantity - a function of computing power. If hash operations remain difficult, Bitcoin is undeterred. Collisions will have no impact, even if they happen with unexpected frequency (which would still be vanishingly infrequent). If the difficulty of producing a collision is reduced to the point where people cannot rely on addresses (for example), then Bitcoin has a problem, as it has become a leaky ship (and then there's mining). But with the unnecessary problems described above, a single hash collision can be catastrophic.
>> Unlike difficulty, which is known, nobody can know when a single collision will show up. Betting Bitcoin, and potentially the world's money, on the unknowable is poor reasoning, especially given that the cost of not doing so is so very low.
>>
>> e
>>
>>> On Nov 17, 2016, at 10:08 AM, Johnson Lau wrote:
>>>
>>> The fact that some implementations ban an invalid block hash and some do not suggests that it's not a pure p2p protocol issue. A pure p2p split should be unified by a bridge node. However, a bridge node is not helpful in this case. Banning an invalid block hash is an implicit "first seen" consensus rule.
>>>
>>> jl2012
>>>
>>>> On 18 Nov 2016, at 01:49, Eric Voskuil wrote:
>>>>
>>>> Actually both possibilities were specifically covered in my description. Sorry if it wasn't clear.
>>>>
>>>> If you create a new valid block out of an old one, it has the potential to cause a reorg. The blocks that previously built on the original are still able to do so, but presumably cannot build forever on the *new* block as it has a different tx. But other new blocks can. There is no chain split due to a different interpretation of validity; there are simply two valid competing chains.
>>>>
>>>> Note that this scenario requires not only block and tx validity with a tx hash collision, but also that the tx be valid within the block. Pretty far to reach to not even get a chain split, but it could produce a deep reorg with a very low chance of success. As I keep telling people, deep reorgs can happen; they are just unlikely, as is this scenario.
>>>>
>>>> If you create a new invalid block it is discarded by everyone. That does not invalidate the hash of that block. Permanent blocking as you describe it would be a p2p protocol design choice, having nothing to do with consensus. Libbitcoin for example does not ban invalidated hashes at all. It just discards the block and drops the peer.
>>>>
>>>> e