Date: Thu, 18 Jul 2024 10:39:06 -0700 (PDT)
From: Antoine Riard
To: Bitcoin Development Mailing List
Subject: Re: [bitcoindev] Re: Great Consensus Cleanup Revival

Hi Eric,

> While at some level the block message buffer would generally be referenced by one or more C pointers, the difference between a valid coinbase input (i.e. with a "null point") and any other input, is not nullptr vs. !nullptr. A "null point" is a 36 byte value, 32 0x00 bytes followed by 4 0xff bytes. In his infinite wisdom Satoshi decided it was better (or easier) to serialize a first block tx (coinbase) with an input containing an unusable script and pointing to an invalid [tx:index] tuple (input point) as opposed to just not having any input. That invalid input point is called a "null point", and of course cannot be pointed to by a "null pointer". The coinbase must be identified by comparing those 36 bytes to the well-known null point value (and if this does not match the Merkle hash cannot have been type64 malleated).

Thanks for the clarification here. I had in mind Bitcoin Core's `CheckBlock` path, where the first block transaction pointer is dereferenced to verify whether the transaction is a coinbase (i.e. a "null point", where the prevout is null). Zooming out and back to my remark, I think it is correct that adding a new 64 byte size check on all block transactions to detect block hash invalidity could have a low memory overhead (implementation dependent), rather than making that 64 byte check on the coinbase transaction alone, as in my understanding you're proposing.

> We call this type64 malleability (or malleation where it is not only possible but occurs).

Yes, the problem which has been described as the lack of "domain separation".

> The second one is the bip141 wtxid commitment in one of the coinbase transaction `scriptpubkey` output, which is itself covered by a txid in the merkle tree.

> While symmetry seems to imply that the witness commitment would be malleable, just as the txs commitment, this is not the case. If the tx commitment is correct it is computationally infeasible for the witness commitment to be malleated, as the witness commitment incorporates each full tx (with witness, sentinel, and marker). As such the block identifier, which relies only on the header and tx commitment, is a sufficient identifier. Yet it remains necessary to validate the witness commitment to ensure that the correct witness data has been provided in the block message.
>
> The second type of malleability, in addition to type64, is what we call type32. This is the consequence of duplicated trailing sets of txs (and therefore tx hashes) in a block message. This is applicable to some but not all blocks, as a function of the number of txs contained.

To make your statement on the sources of malleability more precise: the witness stack can be malleated, altering the wtxid, and yet still be valid. I think you can still have the case where you're fed a block header with a merkle root commitment deserializing to a valid coinbase transaction with an invalid witness commitment. This is the case of a "block message with valid header but malleated committed valid tx data". Validation of the witness commitment, to ensure the correct witness data has been provided in the block message, is indeed necessary.

>> Background: A fully-validated block has established identity in its block hash. However an invalid block message may include the same block header, producing the same hash, but with any kind of nonsense following the header. The purpose of the transaction and witness commitments is of course to establish this identity, so these two checks are therefore necessary even under checkpoint/milestone. And then of course the two Merkle tree issues complicate the tx commitment (the integrity of the witness commitment is assured by that of the tx commitment).
>>
>> So what does it mean to speak of a block hash derived from:
>> (1) a block message with an unparseable header?
>> (2) a block message with parseable but invalid header?
>> (3) a block message with valid header but unparseable tx data?
>> (4) a block message with valid header but parseable invalid uncommitted tx data?
>> (5) a block message with valid header but parseable invalid malleated committed tx data?
>> (6) a block message with valid header but parseable invalid unmalleated committed tx data?
>> (7) a block message with valid header but uncommitted valid tx data?
>> (8) a block message with valid header but malleated committed valid tx data?
>> (9) a block message with valid header but unmalleated committed valid tx data?
>>
>> Note that only the #9 p2p block message contains an actual Bitcoin block, the others are bogus messages. In all cases the message can be sha256 hashed to establish the identity of the *message*. And if one's objective is to reject repeating bogus messages, this might be a useful strategy. It's already part of the p2p protocol, is orders of magnitude cheaper to produce than a Merkle root, and has no identity issues.

> I think I mostly agree with the identity issue as laid out so far, there is one caveat to add if you're considering identity caching as the problem solved. A validation node might have to consider differently block messages processed if they connect on the longest most PoW valid chain for which all blocks have been validated. Or alternatively if they have to be added on a candidate longest most PoW valid chain.

> Certainly an important consideration. We store both types. Once there is a stronger candidate header chain we store the headers and proceed to obtaining the blocks (if we don't already have them). The blocks are stored in the same table; the confirmed vs. candidate indexes simply point to them as applicable. It is feasible (and has happened twice) for two blocks to share the very same coinbase tx, even with either/all bip30/34/90 active (and setting aside future issues here for the sake of simplicity). This remains only because two competing branches can have blocks at the same height, and bip34 requires only height in the coinbase input script. This therefore implies the same transaction but distinct blocks. It is however infeasible for one block to exist in multiple distinct chains. In order for this to happen two blocks at the same height must have the same coinbase (ok), and also the same parent (ok). But this then means that they either (1) have distinct identity due to another header property deviation, or (2) are the same block with the same parent and are therefore in just one chain. So I don't see an actual caveat. I'm not certain if this is the ambiguity that you were referring to. If not please feel free to clarify.

If you assume no network partition and the consensus rule of no blocks more than 2h in the future, I cannot see how one block with no header property deviation can exist in multiple distinct chains. The ambiguity I was referring to comes from a different angle. If the design goal of introducing a 64 byte size check is, to quote, "it was about being able to cache the hash of a (non-malleated) invalid block as permanently invalid to avoid re-downloading and re-validating it", then in my thinking we should consider the whole block headers caching strategy and be sure we don't get situations where an attacker can attach a chain of low-PoW block headers with malleated committed valid tx data yielding a block invalidity at the end, provoking as a side effect a network-wide data download blowup. So I think any implementation of block validation, of which identity is a sub-problem, should be strictly ordered by adequate proof-of-work checks.

> We don't do this and I don't see how it would be relevant. If a peer provides any invalid message or otherwise violates the protocol it is simply dropped.
>
> The "problematic" that I'm referring to is the reliance on the block hash as a message identifier, because it does not identify the message and cannot be useful in an effectively unlimited number of zero-cost cases.
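
To illustrate the point quoted just above, that the block hash does not identify the message, here is a minimal Python sketch. It assumes only that the block hash is the double SHA-256 of the 80-byte header and, as suggested in the quoted background, that a message identity can be taken as a plain SHA-256 over the whole message. The header and payload bytes are hypothetical placeholders, not valid Bitcoin data, and the function names are made up for this example.

```python
import hashlib

HEADER_SIZE = 80  # a serialized block header is 80 bytes


def sha256d(data: bytes) -> bytes:
    """Bitcoin's double SHA-256."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def block_hash(block_message: bytes) -> bytes:
    """Block hash: depends on the 80-byte header only."""
    return sha256d(block_message[:HEADER_SIZE])


def message_hash(block_message: bytes) -> bytes:
    """Identity of the *message*: depends on every byte received."""
    return hashlib.sha256(block_message).digest()


# Hypothetical placeholder header (all zeros), followed by two different tails.
header = bytes(HEADER_SIZE)
msg_1 = header + b"\x01" + b"some transaction bytes"
msg_2 = header + b"\x01" + b"any kind of nonsense"

# Same header, so same block hash, although the messages differ.
assert block_hash(msg_1) == block_hash(msg_2)
# The message hash distinguishes them.
assert message_hash(msg_1) != message_hash(msg_2)
```

The sketch says nothing about whether caching either identifier is worthwhile, which is the actual point of disagreement in this thread.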

Historically, it was to isolate transaction-relay from block-relay, to optimistically harden in the face of network partition, as it is easy to infer the transaction-relay topology with a lot of heuristics.

I think it is correct that the block hash cannot be relied on as a message identifier, as it cannot be useful in an unlimited number of zero-cost cases. As I was pointing out, Bitcoin Core partially mitigates that by discouraging connections to block-relay peers servicing block messages (`MaybePunishNodeForBlocks`).

> #4 and #5 refer to "uncommitted" and "malleated committed". It may not be clear, but "uncommitted" means that the tx commitment is not valid (Merkle root doesn't match the header's value) and "malleated committed" means that the (matching) commitment cannot be relied upon because the txs represent malleation, invalidating the identifier. So neither of these are usable identifiers.
>
> It seems you may be referring to "unconfirmed" txs as opposed to "uncommitted" txs. This doesn't pertain to tx storage or identifiers. Neither #7 nor #8 are usable for the same reasons.
>
> I'm making no reference to tx malleability. This concerns only Merkle tree (block hash) malleability, the two types described in detail in the paper I referenced earlier, here again:
>
> https://lists.linuxfoundation.org/pipermail/bitcoin-dev/attachments/20190225/a27d8837/attachment-0001.pdf

I believe the bottleneck we're circling around is somehow defining, computationally, what the "usable" identifiers for block messages are.
The most straightforward answer to this question is the full block in one single peer message, at least from my perspective.
In reality, since headers-first synchronization (`getheaders`), block validation has been split into steps for performance reasons, among others.

> Again, this has no relation to tx hashes/identifiers. Libbitcoin has a tx pool, we just don't store them in RAM (memory).
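
For readers following along, the two types of Merkle tree malleability referred to above can be reproduced in a few lines. This is a minimal Python sketch, assuming the usual Bitcoin Merkle construction (double SHA-256 over the concatenation of two 32-byte children, with the last hash repeated on odd-length levels). The leaf values are hypothetical placeholders rather than real transaction hashes, and `merkle_root` is written for this example, not taken from any implementation.

```python
import hashlib


def sha256d(data: bytes) -> bytes:
    """Bitcoin's double SHA-256."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(hashes: list[bytes]) -> bytes:
    """Bitcoin-style Merkle root: an odd-length level repeats its last hash."""
    if not hashes:
        raise ValueError("at least one hash is required")
    level = list(hashes)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [sha256d(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


# Hypothetical placeholder leaves; only their 32-byte hashes matter here.
a, b, c = (sha256d(bytes([i])) for i in range(3))

# type64: a 64-byte string is hashed the same way whether it is read as
# "two child hashes" or as "one 64-byte transaction serialization".
fake_tx = a + b
assert len(fake_tx) == 64
assert sha256d(fake_tx) == merkle_root([a, b])

# type32: duplicating the trailing hash of an odd-length list leaves the
# Merkle root unchanged, so two different tx lists commit to the same root.
assert merkle_root([a, b, c]) == merkle_root([a, b, c, c])
```

In both cases the header commitment matches, which is what "malleated committed" means in the list of cases above.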

> I don't follow this. An invalid 64 byte tx consensus rule would definitely not make it harder to exploit block message invalidity. In fact it would just slow down validation by adding a redundant rule. Furthermore, as I have detailed in a previous message, caching invalidity does absolutely nothing to increase protection. In fact it makes the situation materially worse.

Just to recall, in my understanding the proposal we're discussing is about outlawing 64 byte transactions at the consensus level to minimize denial-of-service vectors during block validation. I think we're talking past each other because the mempool already introduces a layer of caching in Bitcoin Core, the results of which are re-used at block validation, such as signature verification results. I'm not sure we can fully set aside performance considerations, though I agree implementation architecture subsystems like the mempool should only be a sideline consideration.

> No, this is not the case. As I detailed in my previous message, there is no possible scenario where invalidation caching does anything but make the situation materially worse.

I think it can be correct that invalidation caching makes the situation materially worse, or is denial-of-service neutral, as I believe a full node is only trading space for time in matters of block message validation. I still believe such analysis, as detailed in your previous message, would benefit from being more detailed.

> On the other hand, just dealing with parse failure on the spot by introducing a leading pattern in the stream just inflates the size of p2p messages, and the transaction-relay bandwidth cost.

> I think you misunderstood me. I am suggesting no change to serialization. I can see how it might be unclear, but I said, "nothing precludes incorporating a requirement for a necessary leading pattern in the stream." I meant that the parser can simply incorporate the *requirement* that the byte stream starts with a null input point. That identifies the malleation or invalidity without a single hash operation and while only reading a handful of bytes. No change to any messages.

Indeed, this is clearer with the re-explanation above about what you meant by the "null point". In my understanding, you're suggesting the following algorithm:
- receive transaction p2p messages
- deserialize transaction p2p messages
- if the transaction is a coinbase candidate, verify null input point
- if null input point pattern invalid, reject the transaction

If I'm understanding correctly, the last rule has the effect of constraining the transaction space that can be used to brute-force and mount a Merkle root forgery with a 64-byte coinbase transaction, as described in section 3.1.1 of the paper:
https://lists.linuxfoundation.org/pipermail/bitcoin-dev/attachments/20190225/a27d8837/attachment-0001.pdf

> I'm referring to DoS mitigation (the only relevant security consideration here). I'm pointing out that invalidity caching is pointless in all cases, and in this case is the most pointless as type64 malleation is the cheapest of all invalidity to detect. I would prefer that all bogus blocks sent to my node are of this type. The worst types of invalidity detection have no mitigation and from a security standpoint are counterproductive to cache. I'm describing what overall is actually not a tradeoff. It's all negative and no positive.
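
As a rough illustration of the leading-pattern requirement discussed above, the following Python sketch checks whether the first transaction of a block starts with a null input point, without any hash operation. It is not libbitcoin's or Bitcoin Core's code. It assumes the standard transaction serialization (4-byte version, optional BIP144 marker and flag bytes, input count, then the 36-byte previous output of the first input), and the function name `starts_with_null_point` is hypothetical.

```python
import hashlib

# The "null point": 32 bytes of 0x00 followed by 4 bytes of 0xff.
NULL_POINT = bytes(32) + b"\xff" * 4


def starts_with_null_point(first_tx: bytes) -> bool:
    """Return True if the first input of `first_tx` spends the null point.

    Only a handful of leading bytes are read and nothing is hashed.
    """
    offset = 4  # skip the 4-byte version
    if first_tx[offset:offset + 2] == b"\x00\x01":
        offset += 2  # skip the BIP144 marker and flag of a witness serialization
    if first_tx[offset:offset + 1] != b"\x01":
        return False  # a coinbase has exactly one input
    offset += 1
    return first_tx[offset:offset + 36] == NULL_POINT


# Hypothetical minimal coinbase-like prefix (not a complete valid transaction).
coinbase_like = (
    (1).to_bytes(4, "little")  # version
    + b"\x01"                  # input count
    + NULL_POINT               # previous output: the null point
    + b"\x02\x51\x51"          # script length and placeholder script bytes
    + b"\xff\xff\xff\xff"      # sequence
)
assert starts_with_null_point(coinbase_like)

# A 64-byte concatenation of two hashes, as in a type64 malleation attempt,
# fails the check because its leading bytes do not form this pattern.
two_hashes = hashlib.sha256(b"a").digest() + hashlib.sha256(b"b").digest()
assert not starts_with_null_point(two_hashes)
```

A 64-byte string that did pass this check would have to carry 36 fixed bytes, which is the constraint on the brute-force space mentioned above.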

I think we're both discussing the same DoS mitigation issue, for sure. Again, I think that the statement that "invalidity caching" is pointless in all cases cannot be fully grounded without specifying (a) the internal cache layout(s) of the full node processing block messages and (b) the sha256 mining resources available during N difficulty period, and whether any miner engages in a selfish-mining-like strategy.

About (a), I'll maintain my point: I think it's a classic time-space trade-off to weigh as a function of the internal cache layouts. About (b), I think we'll be back to the headers synchronization strategy as implemented by a full node, to discuss whether there are exploitable asymmetries for selfish-mining-like strategies.

If you can give a pseudo-code example of the "null point" validation implementation in libbitcoin code (?) I think this can make the conversation more concrete on the caching aspect.

> Rust has its own set of problems. No need to get into a language Jihad here. My point was to clarify that the particular question was not about a C (or C++) null pointer value, either on the surface or underneath an abstraction.

Thanks for the additional comments on libbitcoin's usage of dependencies. Yes, I don't think there is a need to get into a language jihad here. It's just that all languages have their memory model (stack, dynamic alloc, smart pointers, etc.) and when you're talking about performance it's useful to have them in mind, imho.

Best,
Antoine
ots hash: 058d7b3adb154a3e64d5f8ccf1944903bcd0c49dbb525f7212adf4f7ac7f8c55

On Tuesday, 9 July 2024 at 02:16:20 UTC+1, Eric Voskuil wrote:

> This is why we don't use C - unsafe, unclear, unnecessary.
>
> Actually, I think libbitcoin is using its own maintained fork of secp256k1, which is written in C.
>
>
> We do not maintain secp256k1 code. For years that library carried the same version, despite regular breaking changes to its API. This compelled us to place these different versions on distinct git branches. When it finally became versioned we started phasing this unfortunate practice out.
>
> Out of the 10 repositories and at least half million lines of code, apart from an embedded copy of qrencode that we don’t independently maintain, I believe there is only one .c file in use in the entire project - the database mmap.c implementation for msvc builds. This includes hash functions, with vectorization optimizations, etc.
>
> For sure, I wouldn't recommend using C across a whole codebase as it's not memory-safe (euphemism) though it's still unmatched if you wish to understand low-level memory management in hot paths.
>
>
> This is a commonly held misperception.
>
> It can be easier to use C++ or Rust, though it doesn't mean it will be as (a) perf optimal and (b) hardened against side-channels.
>
>
> Rust has its own set of problems. No need to get into a language Jihad here. My point was to clarify that the particular question was not about a C (or C++) null pointer value, either on the surface or underneath an abstraction.
>
> e
Hi Eric,

> While at some level the block message buffer = would generally be referenced by one or more C pointers, the difference bet= ween a valid coinbase input (i.e. with a "null point") and any other input,= is not nullptr vs. !nullptr. A "null point" is a 36 byte value, 32 0x00 by= es followed by 4 0xff bytes. In his infinite wisdom Satoshi decided it was = better (or easier) to serialize a first block tx (coinbase) with an input c= ontaining an unusable script and pointing to an invalid [tx:index] tuple (i= nput point) as opposed to just not having any input. That invalid input poi= nt is called a "null point", and of course cannot be pointed to by a "null = pointer". The coinbase must be identified by comparing those 36 bytes to th= e well-known null point value (and if this does not match the Merkle hash c= annot have been type64 malleated).

Good for the clarification he= re, I had in mind the core's `CheckBlock` path where the first block transa= ction pointer is dereferenced to verify if the transaction is a coinbase (i= .e a "null point" where the prevout is null). Zooming out and back to my re= mark, I think this is correct that adding a new 64 byte size check on all b= lock transactions to detect block hash invalidity could be a low memory ove= rhead (implementation dependant), rather than making that 64 byte check alo= ne on the coinbase transaction as in my understanding you're proposing.

> We call this type64 malleability (or malleation where it is no= t only possible but occurs).

Yes, the problem which has been des= cribed as the lack of "domain separation".

> The second one i= s the bip141 wtxid commitment in one of the coinbase transaction `scriptpub= key` output, which is itself covered by a txid in the merkle tree.
> While symmetry seems to imply that the witness commitment would be = malleable, just as the txs commitment, this is not the case. If the tx comm= itment is correct it is computationally infeasible for the witness commitme= nt to be malleated, as the witness commitment incorporates each full tx (wi= th witness, sentinel, and marker). As such the block identifier, which reli= es only on the header and tx commitment, is a sufficient identifier. Yet it= remains necessary to validate the witness commitment to ensure that the co= rrect witness data has been provided in the block message.
>
= > The second type of malleability, in addition to type64, is what we cal= l type32. This is the consequence of duplicated trailing sets of txs (and t= herefore tx hashes) in a block message. This is applicable to some but not = all blocks, as a function of the number of txs contained.

To pre= cise more your statement in describing source of malleability. The witness = stack can be malleated altering the wtxid and yet still valid. I think you = can still have the case where you're feeded a block header with a merkle ro= ot commitment deserializing to a valid coinbase transaction with an invalid= witness commitment. This is the case of a "block message with valid header= but malleatead committed valid tx data". Validation of the witness commitm= ent to ensure the correct witness data has been provided in the block messa= ge is indeed necessary.

>> Background: A fully-validated b= lock has established identity in its block hash. However an invalid block m= essage may include the same block header, producing the same hash, but with= any kind of nonsense following the header. The purpose of the transaction = and witness commitments is of course to establish this identity, so these t= wo checks are therefore necessary even under checkpoint/milestone. And then= of course the two Merkle tree issues complicate the tx commitment (the int= egrity of the witness commitment is assured by that of the tx commitment).<= br />>>
>> So what does it mean to speak of a block hash d= erived from:
>> (1) a block message with an unparseable header?<= br />>> (2) a block message with parseable but invalid header?
&= gt;> (3) a block message with valid header but unparseable tx data?
>> (4) a block message with valid header but parseable invalid uncom= mitted tx data?
>> (5) a block message with valid header but par= seable invalid malleated committed tx data?
>> (6) a block messa= ge with valid header but parseable invalid unmalleated committed tx data?>> (7) a block message with valid header but uncommitted valid tx= data?
>> (8) a block message with valid header but malleated co= mmitted valid tx data?
>> (9) a block message with valid header = but unmalleated committed valid tx data?
>>
>> Note t= hat only the #9 p2p block message contains an actual Bitcoin block, the oth= ers are bogus messages. In all cases the message can be sha256 hashed to es= tablish the identity of the *message*. And if one's objective is to reject = repeating bogus messages, this might be a useful strategy. It's already par= t of the p2p protocol, is orders of magnitude cheaper to produce than a Mer= kle root, and has no identity issues.

> I think I mostly agre= e with the identity issue as laid out so far, there is one caveat to add if= you're considering identity caching as the problem solved. A validation no= de might have to consider differently block messages processed if they conn= ect on the longest most PoW valid chain for which all blocks have been vali= dated. Or alternatively if they have to be added on a candidate longest mos= t PoW valid chain.

> Certainly an important consideration. We= store both types. Once there is a stronger candidate header chain we store= the headers and proceed to obtaining the blocks (if we don't already have = them). The blocks are stored in the same table; the confirmed vs. candidate= indexes simply point to them as applicable. It is feasible (and has happen= ed twice) for two blocks to share the very same coinbase tx, even with eith= er/all bip30/34/90 active (and setting aside future issues here for the sak= e of simplicity). This remains only because two competing branches can have= blocks at the same height, and bip34 requires only height in the coinbase = input script. This therefore implies the same transaction but distinct bloc= ks. It is however infeasible for one block to exist in multiple distinct ch= ains. In order for this to happen two blocks at the same height must have t= he same coinbase (ok), and also the same parent (ok). But this then means t= hat they either (1) have distinct identity due to another header property d= eviation, or (2) are the same block with the same parent and are therefore = in just one chain. So I don't see an actual caveat. I'm not certain if this= is the ambiguity that you were referring to. If not please feel free to cl= arify.

If you assume no network partition and the no blocks more= than 2h in the future consensus rule, I cannot see how one block with no h= eader property deviation can exist in multiple distinct chains. The ambigui= ty I was referring was about a different angle, if the design goal of intro= ducing a 64 byte size check is to "it was about being able to cache the has= h of a (non-malleated) invalid block as permanently invalid to avoid re-dow= nloading and re-validating it", in my thinking we shall consider the whole = block headers caching strategy and be sure we don't get situations where an= attacker can attach a chain of low-pow block headers with malleated commit= ted valid tx data yielding a block invalidity at the end, provoking as a si= de-effect a network-wide data download blowup. So I think any implementatio= n of the validation of a block validity, of which identity is a sub-problem= , should be strictly ordered by adequate proof-of-work checks.

&= gt; We don't do this and I don't see how it would be relevant. If a peer pr= ovides any invalid message or otherwise violates the protocol it is simply = dropped.
>
> The "problematic" that I'm referring to is th= e reliance on the block hash as a message identifier, because it does not i= dentify the message and cannot be useful in an effectively unlimited number= of zero-cost cases.

Historically, it was to isolate transaction= -relay from block-relay to optimistically harden in face of network partiti= on, as this is easy to infer transaction-relay topology with a lot of heuri= stics.

I think this is correct that block hash message cannot be= relied on as it cannot be useful in an unlimited number of zero-cost cases= , as I was pointing that bitcoin core partially mitigate that with discoura= ging connections to block-relay peers servicing block messages (`MaybePunis= hNodeForBlocks`).

> #4 and #5 refer to "uncommitted" and "malleated committed". It may not be clear, but "uncommitted" means that the tx commitment is not valid (Merkle root doesn't match the header's value) and "malleated committed" means that the (matching) commitment cannot be relied upon because the txs represent malleation, invalidating the identifier. So neither of these are usable identifiers.
>
> It seems you may be referring to "unconfirmed" txs as opposed to "uncommitted" txs. This doesn't pertain to tx storage or identifiers. Neither #7 nor #8 are usable for the same reasons.
>
> I'm making no reference to tx malleability. This concerns only Merkle tree (block hash) malleability, the two types described in detail in the paper I referenced earlier, here again:
>
> https://lists.linuxfoundation.org/pipermail/bitcoin-dev/attachments/20190225/a27d8837/attachment-0001.pdf
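
For readers who have not gone through the paper, here is a minimal Python sketch of the two Merkle tree malleation types as I understand them. The function names are mine and the "transactions" are dummy byte strings, not valid Bitcoin transactions. The first case duplicates the trailing hash; the second shows why a 64-byte leaf is ambiguous with an inner node of the tree.

```python
import hashlib

def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def merkle_root(leaves) -> bytes:
    """Bitcoin-style Merkle root: an odd-length level repeats its last hash."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [double_sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

def root_from_raw_txs(raw_txs) -> bytes:
    """Commit to serialized transactions: leaves are their double-SHA256."""
    return merkle_root([double_sha256(tx) for tx in raw_txs])

# Type 1: duplicating the trailing hash leaves the root unchanged.
a, b, c = (double_sha256(x) for x in (b"a", b"b", b"c"))
same_root_with_duplicate = merkle_root([a, b, c]) == merkle_root([a, b, c, c])

# Type 2: two 64-byte blobs, each the concatenation of two leaf hashes,
# commit to the same root as the four original transactions.
txs = [b"dummy-tx-1", b"dummy-tx-2", b"dummy-tx-3", b"dummy-tx-4"]
h = [double_sha256(tx) for tx in txs]
blobs = [h[0] + h[1], h[2] + h[3]]
same_root_with_64_byte_leaves = root_from_raw_txs(txs) == root_from_raw_txs(blobs)
```

Both booleans evaluate to `True`, which is the sense in which a matching commitment alone does not identify the transaction set.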

I believe the bottleneck we're somehow circling around is computationally defining what the "usable" identifiers for block messages are.
The most straightforward answer to this question is the full block in one single peer message, at least from my perspective.
In reality, since headers-first synchronization (`getheaders`), block validation has been split into steps for performance reasons, among others.

> Again, this has no relation to tx hashes/identifiers. Libbitcoin has a tx pool, we just don't store them in RAM (memory).

> I don't follow this. An invalid 64 byte tx consensus rule would definitely not make it harder to exploit block message invalidity. In fact it would just slow down validation by adding a redundant rule. Furthermore, as I have detailed in a previous message, caching invalidity does absolutely nothing to increase protection. In fact it makes the situation materially worse.

Just to recall, in my understanding the proposal we're discussing is about outlawing 64-byte transactions at the consensus level to minimize denial-of-service vectors during block validation. I think we're talking past each other because the mempool already introduces a layer of caching in Bitcoin Core, whose results are re-used at block validation, such as signature verification results. I'm not sure we can fully set aside performance considerations, though I agree that implementation architecture subsystems like the mempool should only be a sideline consideration.
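
As a point of reference for the rule itself, here is a minimal Python sketch of the check as I understand the proposal: a block fails if any transaction, serialized without witness data, is exactly 64 bytes. The function name and the convention of passing already-stripped serializations are assumptions for illustration, not an existing Bitcoin Core or libbitcoin API.

```python
FORBIDDEN_TX_SIZE = 64  # size in bytes of a Merkle inner node preimage

def block_passes_64_byte_rule(stripped_txs) -> bool:
    """Return False if any witness-stripped transaction is exactly 64 bytes.

    The check costs one length comparison per transaction and no hashing.
    """
    return all(len(tx) != FORBIDDEN_TX_SIZE for tx in stripped_txs)
```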

> No, this is not the case. As I detailed in my previous message, there is no possible scenario where invalidation caching does anything but make the situation materially worse.

I think it can be correct that invalidation caching makes the situation materially worse, or is denial-of-service neutral, as I believe a full node is only trading space for time resources in matters of block message validation. I still believe such an analysis, as laid out in your previous message, would benefit from being more detailed.
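
To make the space-for-time trade concrete, here is a minimal Python sketch of an invalidity cache. It is an illustration under my own assumptions (the class name, the `malleated` flag and the bounded least-recently-recorded eviction are hypothetical), not a description of Bitcoin Core's or libbitcoin's internals. The constraint it encodes is that a block hash is only safe to remember as invalid when the failure cannot be explained by malleation, since a malleated block shares its hash with a potentially valid one.

```python
from collections import OrderedDict

class InvalidBlockCache:
    """Bounded set of block hashes known to be invalid (hypothetical sketch)."""

    def __init__(self, max_entries: int = 1024):
        self._entries = OrderedDict()
        self._max_entries = max_entries

    def record_failure(self, block_hash: bytes, malleated: bool) -> None:
        """Remember a failed block, unless the failure may be a malleation."""
        if malleated:
            return  # the same hash may still identify a valid block
        self._entries[block_hash] = True
        self._entries.move_to_end(block_hash)
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict the oldest entry

    def is_known_invalid(self, block_hash: bytes) -> bool:
        """A hit saves a download and a validation; a miss costs nothing extra."""
        return block_hash in self._entries
```

The memory bound is the "space" side of the trade, and whether the saved validations are worth it depends on how expensive the cached kind of invalidity is to detect in the first place.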

> > On the other hand, just dealing with parse failure on the spot by introducing a leading pattern in the stream just inflates the size of p2p messages, and the transaction-relay bandwidth cost.

> I think you misunderstood me. I am suggesting no change to serialization. I can see how it might be unclear, but I said, "nothing precludes incorporating a requirement for a necessary leading pattern in the stream." I meant that the parser can simply incorporate the *requirement* that the byte stream starts with a null input point. That identifies the malleation or invalidity without a single hash operation and while only reading a handful of bytes. No change to any messages.

Indeed, this is clearer with the re-explanation above of what you meant by the "null point". In my understanding, you're suggesting the following algorithm:
- receive transaction p2p messages
- deserialize transaction p2p messages
- if the transaction is a coinbase candidate, verify null input point
- if null input point pattern invalid, reject the transaction

If I'm understanding correctly, the last rule has the effect of constraining the transaction space that can be used to brute-force and mount a Merkle root forgery with a 64-byte coinbase transaction.

As described in section 3.1.1 of the paper: https://lists.linuxfoundation.org/pipermail/bitcoin-dev/attachments/20190225/a27d8837/attachment-0001.pdf
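
The last two steps of the algorithm can be sketched in Python as follows. This is my own reading of the suggestion, not libbitcoin code: the function name is hypothetical and the byte offsets assume the standard serialization (4-byte version, optional segwit marker and flag, input count, then the first input's 36-byte outpoint). A coinbase candidate is expected to carry a null input point, i.e. 32 zero bytes followed by the index 0xffffffff.

```python
NULL_INPUT_POINT = b"\x00" * 32 + b"\xff" * 4  # null hash + index 0xffffffff

def starts_with_null_input_point(tx: bytes) -> bool:
    """Check the coinbase pattern by reading only the leading bytes of tx."""
    offset = 4  # skip the 4-byte version
    # Segwit serialization inserts marker 0x00 and flag 0x01 after the version.
    if tx[offset:offset + 2] == b"\x00\x01":
        offset += 2
    # A coinbase has exactly one input, so the count is the single byte 0x01.
    if tx[offset:offset + 1] != b"\x01":
        return False
    offset += 1
    # Truncated data yields a short slice, which simply compares unequal.
    return tx[offset:offset + 36] == NULL_INPUT_POINT
```

No hash is computed and at most 43 bytes are read. For a 64-byte blob made of two concatenated hashes to pass, most of its leading bytes would have to take these fixed values, which is how I read the "constrained transaction space" above.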

> I'm referring to DoS mitigation (the only relevant security consideration here). I'm pointing out that invalidity caching is pointless in all cases, and in this case is the most pointless as type64 malleation is the cheapest of all invalidity to detect. I would prefer that all bogus blocks sent to my node are of this type. The worst types of invalidity detection have no mitigation and from a security standpoint are counterproductive to cache. I'm describing what overall is actually not a tradeoff. It's all negative and no positive.

I think we're both discussing the same DoS mitigation issue, for sure. Again, I think the statement that "invalidity caching" is pointless in all cases cannot be fully grounded without specifying (a) the internal cache layout(s) of the full node processing block messages and (b) the SHA256 mining resources available during N difficulty periods, and whether any miner engages in a selfish-mining-like strategy.

About (a), I'll maintain my point: I think it's a classic time-space trade-off to weigh as a function of the internal cache layouts. About (b), I think we'll be back to the headers synchronization strategy as implemented by a full node, to discuss whether there are exploitable asymmetries for selfish-mining-like strategies.

If you can give a pseudo-code example of the "null point" validation implementation in libbitcoin code (?), I think it would make the conversation more concrete on the caching aspect.

> Rust has its own set of problems. No need to get into a language Jihad here. My point was to clarify that the particular question was not about a C (or C++) null pointer value, either on the surface or underneath an abstraction.

Thanks for the additional comments on libbitcoin's usage of dependencies. Yes, I don't think there is a need to get into a language jihad here. It's just that all languages have their memory model (stack, dynamic allocation, smart pointers, etc.), and when you're talking about performance it's useful to have them in mind, imho.

Best,
Antoine
ots hash: 058d7b3adb154a3e64d5f8ccf1944903bcd0c49dbb525f7212adf4f7ac7f8c55

On Tuesday, July 9, 2024 at 02:16:20 UTC+1, Eric Voskuil wrote:
> > > This is why we don't use C - unsafe, unclear, unnecessary.
> >
> > Actually, I think libbitcoin is using its own maintained fork of secp256k1, which is written in C.
>
> We do not maintain secp256k1 code. For years that library carried the same version, despite regular breaking changes to its API. This compelled us to place these different versions on distinct git branches. When it finally became versioned we started phasing this unfortunate practice out.
>
> Out of the 10 repositories and at least half million lines of code, apart from an embedded copy of qrencode that we don't independently maintain, I believe there is only one .c file in use in the entire project - the database mmap.c implementation for msvc builds. This includes hash functions, with vectorization optimizations, etc.
>
> > For sure, I wouldn't recommend using C across a whole codebase as it's not memory-safe (euphemism) though it's still unmatched if you wish to understand low-level memory management in hot paths.
>
> This is a commonly held misperception.
>
> > It can be easier to use C++ or Rust, though it doesn't mean it will be as (a) perf optimal and (b) hardened against side-channels.
>
> Rust has its own set of problems. No need to get into a language Jihad here. My point was to clarify that the particular question was not about a C (or C++) null pointer value, either on the surface or underneath an abstraction.
>
> e
--
You received this message because you are subscribed to the Google Groups "Bitcoin Development Mailing List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bitcoindev+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bitcoindev/ac6cc3b8-43e5-4cd6-aabe-f5ffc4672812n%40googlegroups.com.