Delivery-date: Sat, 20 Jul 2024 13:51:36 -0700
Received: from mail-yw1-f191.google.com ([209.85.128.191])
	by mail.fairlystable.org with esmtps  (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
	(Exim 4.94.2)
	(envelope-from <bitcoindev+bncBC5P5KEHZQLBBTOG6C2AMGQEY5WMVPA@googlegroups.com>)
	id 1sVH3G-0001fN-20
	for bitcoindev@gnusha.org; Sat, 20 Jul 2024 13:51:36 -0700
Received: by mail-yw1-f191.google.com with SMTP id 00721157ae682-650fccfd1dfsf79506027b3.0
        for <bitcoindev@gnusha.org>; Sat, 20 Jul 2024 13:51:33 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=googlegroups.com; s=20230601; t=1721508688; x=1722113488; darn=gnusha.org;
        h=list-unsubscribe:list-subscribe:list-archive:list-help:list-post
         :list-id:mailing-list:precedence:x-original-sender:mime-version
         :subject:references:in-reply-to:message-id:to:from:date:sender:from
         :to:cc:subject:date:message-id:reply-to;
        bh=xGl/4+RCgz6K8bpLfD1f5disjh8h6zKbiYYYjL3gVQ8=;
        b=W2ls2VMLRVvBJ396TJTGgqI+iI/WyonFOFtlQ3iY3Uy+yAqojrTsjqLqU2ZsYrVFGh
         FS0ti+Si9fDrhXdu9GbOboRZhwMIZiND8EPxqbqnTyNxJhw726O9wg/NcaxYGFa+BAnO
         mFHnQ+glirLK4ljZC3p56MwEfMjeBsMWgvXZU0T6/m8icsqbNl5PumeCEXAQC8HoI7tT
         GxQR5+gAFlHx837h5rPPeuvvRIjuHuidhansDQJwXgyi46Yoa2v4IRIyqNXgF8fHo4z4
         D20JD2abdyPGvhos7l1vKtxS8dTFPs9KnGBkLgeFXfzV/HLV7iLGMHEUGh97f6K54NR0
         GjOw==
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=googlegroups-com.20230601.gappssmtp.com; s=20230601; t=1721508688; x=1722113488; darn=gnusha.org;
        h=list-unsubscribe:list-subscribe:list-archive:list-help:list-post
         :list-id:mailing-list:precedence:x-original-sender:mime-version
         :subject:references:in-reply-to:message-id:to:from:date:from:to:cc
         :subject:date:message-id:reply-to;
        bh=xGl/4+RCgz6K8bpLfD1f5disjh8h6zKbiYYYjL3gVQ8=;
        b=EfTsgxqq4rbtClY7YRn/oDO8SX1hiv0QKd64GHBieazbFz+TsO/o+lwpp8Y4mh7/oM
         kqnX/fqsqknJ6cuOGeWhKmG5IYriObffV7bWv6yT9zqIVmo9DCRBko0M77fKSA1bfKKY
         YkxCTZnlm5BQpoA/VgraZ1BTTVE3H9hwb+DIzoEGKR8+GGkzB2M6Le7MTWTLRAC9eESN
         2O+T1mR1w0AwbIVrJbIEc9lda1fkCNiB3TbH3NM8djhanudqeuC2sX1GGzulem0T4nlP
         AfpSkSeqccMTuqv76YHY/G/qx9LewdbZzMacEoo5z5Bv/Rz8NX9S9jzt1yD/LyIgfmF1
         ro2Q==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20230601; t=1721508688; x=1722113488;
        h=list-unsubscribe:list-subscribe:list-archive:list-help:list-post
         :list-id:mailing-list:precedence:x-original-sender:mime-version
         :subject:references:in-reply-to:message-id:to:from:date:x-beenthere
         :x-gm-message-state:sender:from:to:cc:subject:date:message-id
         :reply-to;
        bh=xGl/4+RCgz6K8bpLfD1f5disjh8h6zKbiYYYjL3gVQ8=;
        b=TB2/PfXHEfVU09PkkKTxDlfFyA15kjs0F9YD7THFbGMjc8yOfC0SZhn/UucSl+P9u9
         gFudbYfJPxrb7/Bk4cJECBnw9p+lLLS/eG6LKO/oEhALfEj+uTctONji/JulDsGhAose
         SjS52MAagHjS17v0EIwezMCOzV1Q6sGD0pwW6utJDmSskqqV/PkRoqP3fqhngrZUDlLM
         Fo0KOMjk4pwYyK5YTPDAwxt4fa422D6+jUy/je3BsWAr7TsFX1RYml0sDEFkjert4Pkh
         XgO6/8ULuRc/AQKvrm6VDGG6fefncz2Fa9555INQCu8EOOb/f5NnVuKyLKKopw1t6tLx
         cSMQ==
Sender: bitcoindev@googlegroups.com
X-Forwarded-Encrypted: i=1; AJvYcCU+ZEuTc3upccqfcZu1QDuBJaMtAFKlOPBoJjS8+M5BM2JDK3zHQ5thC4gDms1EqTkFQ8eQzaj3cIdJE6R1fVXfH/EOe6w=
X-Gm-Message-State: AOJu0Yzi7KHOY3HRGTHkV7iVFyrXOWpSaBj+6+eQ39ps2fbthvTfZU7y
	0405oMsPfYH47YtepyAG4Zh+sf0BF4rSv+EaDXZjf7ZGzYU18mDM
X-Google-Smtp-Source: AGHT+IF1ppFb8GtFGcbv5Sk21nR+VPfcg9MG6ratBM5PlLJIYwaW6fOyL97BLIrkZyA3acWi0NaqQw==
X-Received: by 2002:a05:6902:e81:b0:e05:674f:fad6 with SMTP id 3f1490d57ef6-e086fe6187amr4095235276.5.1721508687668;
        Sat, 20 Jul 2024 13:51:27 -0700 (PDT)
X-BeenThere: bitcoindev@googlegroups.com
Received: by 2002:a05:6902:1348:b0:dfe:f69f:99 with SMTP id
 3f1490d57ef6-e05fdb6fc73ls5233444276.2.-pod-prod-02-us; Sat, 20 Jul 2024
 13:51:25 -0700 (PDT)
X-Received: by 2002:a05:690c:298:b0:64a:d8f6:b9ed with SMTP id 00721157ae682-66a66e0debbmr871627b3.9.1721508685029;
        Sat, 20 Jul 2024 13:51:25 -0700 (PDT)
Received: by 2002:a05:690c:3104:b0:664:87b6:d9e0 with SMTP id 00721157ae682-66918fcc18bms7b3;
        Sat, 20 Jul 2024 13:29:54 -0700 (PDT)
X-Received: by 2002:a05:690c:389:b0:62a:4932:68de with SMTP id 00721157ae682-66a65c72308mr2574167b3.8.1721507393222;
        Sat, 20 Jul 2024 13:29:53 -0700 (PDT)
Date: Sat, 20 Jul 2024 13:29:53 -0700 (PDT)
From: Eric Voskuil <eric@voskuil.org>
To: Bitcoin Development Mailing List <bitcoindev@googlegroups.com>
Message-Id: <926fdd12-4e50-433d-bd62-9cc41c7b22a0n@googlegroups.com>
In-Reply-To: <ac6cc3b8-43e5-4cd6-aabe-f5ffc4672812n@googlegroups.com>
References: <gnM89sIQ7MhDgI62JciQEGy63DassEv7YZAMhj0IEuIo0EdnafykF6RH4OqjTTHIHsIoZvC2MnTUzJI7EfET4o-UQoD-XAQRDcct994VarE=@protonmail.com>
 <72e83c31-408f-4c13-bff5-bf0789302e23n@googlegroups.com>
 <heKH68GFJr4Zuf6lBozPJrb-StyBJPMNvmZL0xvKFBnBGVA3fVSgTLdWc-_8igYWX8z3zCGvzflH-CsRv0QCJQcfwizNyYXlBJa_Kteb2zg=@protonmail.com>
 <5b0331a5-4e94-465d-a51d-02166e2c1937n@googlegroups.com>
 <yt1O1F7NiVj-WkmnYeta1fSqCYNFx8h6OiJaTBmwhmJ2MWAZkmmjPlUST6FM7t6_-2NwWKdglWh77vcnEKA8swiAnQCZJY2SSCAh4DOKt2I=@protonmail.com>
 <be78e733-6e9f-4f4e-8dc2-67b79ddbf677n@googlegroups.com>
 <jJLDrYTXvTgoslhl1n7Fk9-pL1mMC-0k6gtoniQINmioJpzgtqrJ_WqyFZkLltsCUusnQ4jZ6HbvRC-mGuaUlDi3kcqcFHALd10-JQl-FMY=@protonmail.com>
 <9a4c4151-36ed-425a-a535-aa2837919a04n@googlegroups.com>
 <3f0064f9-54bd-46a7-9d9a-c54b99aca7b2n@googlegroups.com>
 <26b7321b-cc64-44b9-bc95-a4d8feb701e5n@googlegroups.com>
 <CALZpt+EwVyaz1=A6hOOycqFGJs+zxyYYocZixTJgVmzZezUs9Q@mail.gmail.com>
 <607a2233-ac12-4a80-ae4a-08341b3549b3n@googlegroups.com>
 <3dceca4d-03a8-44f3-be64-396702247fadn@googlegroups.com>
 <301c64c7-0f0f-476a-90c4-913659477276n@googlegroups.com>
 <e2c61ee5-68c4-461e-a132-bb86a4c3e2ccn@googlegroups.com>
 <33dfd007-ac28-44a5-acee-cec4b381e854n@googlegroups.com>
 <CALZpt+Fs1U5f3S6_tR7AFfEMEkgBPSp3OaNEq+eqYoCSSYXD7g@mail.gmail.com>
 <a76b8dc5-d37f-4059-882b-207004874887n@googlegroups.com>
 <ac6cc3b8-43e5-4cd6-aabe-f5ffc4672812n@googlegroups.com>
Subject: Re: [bitcoindev] Re: Great Consensus Cleanup Revival
MIME-Version: 1.0
Content-Type: multipart/mixed; 
	boundary="----=_Part_713942_506106341.1721507393009"
X-Original-Sender: eric@voskuil.org
Precedence: list
Mailing-list: list bitcoindev@googlegroups.com; contact bitcoindev+owners@googlegroups.com
List-ID: <bitcoindev.googlegroups.com>
X-Google-Group-Id: 786775582512
List-Post: <https://groups.google.com/group/bitcoindev/post>, <mailto:bitcoindev@googlegroups.com>
List-Help: <https://groups.google.com/support/>, <mailto:bitcoindev+help@googlegroups.com>
List-Archive: <https://groups.google.com/group/bitcoindev
List-Subscribe: <https://groups.google.com/group/bitcoindev/subscribe>, <mailto:bitcoindev+subscribe@googlegroups.com>
List-Unsubscribe: <mailto:googlegroups-manage+786775582512+unsubscribe@googlegroups.com>,
 <https://groups.google.com/group/bitcoindev/subscribe>
X-Spam-Score: -0.7 (/)

------=_Part_713942_506106341.1721507393009
Content-Type: multipart/alternative; 
	boundary="----=_Part_713943_64256413.1721507393009"

------=_Part_713943_64256413.1721507393009
Content-Type: text/plain; charset="UTF-8"

Hi Antoine R,

>> While at some level the block message buffer would generally be 
referenced by one or more C pointers, the difference between a valid 
coinbase input (i.e. with a "null point") and any other input, is not 
nullptr vs. !nullptr. A "null point" is a 36 byte value, 32 0x00 bytes 
followed by 4 0xff bytes. In his infinite wisdom Satoshi decided it was 
better (or easier) to serialize a first block tx (coinbase) with an input 
containing an unusable script and pointing to an invalid [tx:index] tuple 
(input point) as opposed to just not having any input. That invalid input 
point is called a "null point", and of course cannot be pointed to by a 
"null pointer". The coinbase must be identified by comparing those 36 bytes 
to the well-known null point value (and if this does not match the Merkle 
hash cannot have been type64 malleated).

> Thanks for the clarification here. I had in mind Core's `CheckBlock` 
path, where the first block transaction pointer is dereferenced to verify 
whether the transaction is a coinbase (i.e. a "null point" where the 
prevout is null). Zooming out and back to my remark, I think it is correct 
that adding a new 64 byte size check on all block transactions to detect 
block hash invalidity could have a low memory overhead (implementation 
dependent), compared with making that 64 byte check on the coinbase 
transaction alone, as in my understanding you're proposing.

I'm not sure what you mean by stating that a new consensus rule, "could be 
a low memory overhead". Checking all tx sizes is far more overhead than 
validating the coinbase for a null point. As AntoineP agreed, it cannot be 
done earlier, and I have shown that it is *significantly* more 
computationally intensive. It makes the determination much more costly, and 
in all other cases it adds a check that serves no purpose.

>>> The second one is the bip141 wtxid commitment in one of the coinbase 
transaction `scriptpubkey` output, which is itself covered by a txid in the 
merkle tree.

>> While symmetry seems to imply that the witness commitment would be 
malleable, just as the txs commitment, this is not the case. If the tx 
commitment is correct it is computationally infeasible for the witness 
commitment to be malleated, as the witness commitment incorporates each 
full tx (with witness, sentinel, and marker). As such the block identifier, 
which relies only on the header and tx commitment, is a sufficient 
identifier. Yet it remains necessary to validate the witness commitment to 
ensure that the correct witness data has been provided in the block message.
>>
>> The second type of malleability, in addition to type64, is what we call 
type32. This is the consequence of duplicated trailing sets of txs (and 
therefore tx hashes) in a block message. This is applicable to some but not 
all blocks, as a function of the number of txs contained.
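
To make the type32 case concrete, here is a minimal Python sketch (an 
illustration only, not libbitcoin or Core code; `sha256d` and `merkle_root` 
are hypothetical helper names, and the "txids" are stand-in hashes). It 
assumes the usual Merkle construction, in which an odd-length level 
duplicates its last hash, and shows that a three-tx list and the same list 
with its last tx repeated produce the same root, while a four-tx list is 
not affected by the same trick.

```python
import hashlib


def sha256d(data: bytes) -> bytes:
    """Double SHA-256, as used for txids and Merkle nodes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(hashes: list) -> bytes:
    """Merkle root over a non-empty list of 32 byte hashes."""
    level = list(hashes)
    while len(level) > 1:
        if len(level) % 2:
            # An odd-length level duplicates its last hash.
            level.append(level[-1])
        level = [sha256d(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


# Three stand-in txids, and the same list with the last one repeated.
txids = [sha256d(bytes([i])) for i in range(3)]
malleated = txids + [txids[-1]]

# Different tx lists, identical root (and therefore identical block hash).
same_root = merkle_root(txids) == merkle_root(malleated)

# With four txs, repeating the last one changes the root.
four = [sha256d(bytes([i])) for i in range(4)]
four_differs = merkle_root(four) != merkle_root(four + [four[-1]])
```

Both `same_root` and `four_differs` should be true, which is the sense in 
which this applies to some but not all blocks.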

> To make your statement on the source of malleability more precise: the 
witness stack can be malleated, altering the wtxid, and yet still be valid. 
I think you can still have the case where you're fed a block header with a 
merkle root commitment deserializing to a valid coinbase transaction with 
an invalid witness commitment. This is the case of a "block message with 
valid header but malleated committed valid tx data". Validation of the 
witness commitment to ensure the correct witness data has been provided in 
the block message is indeed necessary.

I think you misunderstood me. Of course the witness commitment must be 
validated (as I said, "Yet it remains necessary to validate the witness 
commitment..."), as otherwise the witnesses within a block can be anything 
without affecting the block hash. And of course the witness commitment is 
computed in the same manner as the tx commitment and is therefore subject 
to the same malleations. However, because the coinbase tx is committed to 
the block hash, there is no need to guard the witness commitment for 
malleation. And to my knowledge nobody has proposed doing so.

>>> I think I mostly agree with the identity issue as laid out so far, 
there is one caveat to add if you're considering identity caching as the 
problem solved. A validation node might have to consider differently block 
messages processed if they connect on the longest most PoW valid chain for 
which all blocks have been validated. Or alternatively if they have to be 
added on a candidate longest most PoW valid chain.

>> Certainly an important consideration. We store both types. Once there is 
a stronger candidate header chain we store the headers and proceed to 
obtaining the blocks (if we don't already have them). The blocks are stored 
in the same table; the confirmed vs. candidate indexes simply point to them 
as applicable. It is feasible (and has happened twice) for two blocks to 
share the very same coinbase tx, even with either/all bip30/34/90 active 
(and setting aside future issues here for the sake of simplicity). This 
remains only because two competing branches can have blocks at the same 
height, and bip34 requires only height in the coinbase input script. This 
therefore implies the same transaction but distinct blocks. It is however 
infeasible for one block to exist in multiple distinct chains. In order for 
this to happen two blocks at the same height must have the same coinbase 
(ok), and also the same parent (ok). But this then means that they either 
(1) have distinct identity due to another header property deviation, or (2) 
are the same block with the same parent and are therefore in just one 
chain. So I don't see an actual caveat. I'm not certain if this is the 
ambiguity that you were referring to. If not please feel free to clarify.

> If you assume no network partition and the consensus rule of no blocks 
more than 2h in the future, I cannot see how one block with no header 
property deviation can exist in multiple distinct chains.

It cannot, that was my point: "(1) have distinct identity due to another 
header property deviation, or (2) are the same block..."

> The ambiguity I was referring to was from a different angle. If the design 
goal of introducing a 64 byte size check is, as stated, "it was about being 
able to cache the hash of a (non-malleated) invalid block as permanently 
invalid to avoid re-downloading and re-validating it", then in my thinking 
we should consider the whole block header caching strategy and be sure we 
don't get situations where an attacker can attach a chain of low-PoW block 
headers with malleated committed valid tx data yielding a block invalidity 
at the end, provoking as a side effect a network-wide data download blowup. 
So I think any implementation of block validation, of which identity is a 
sub-problem, should be strictly ordered by adequate proof-of-work checks.

This was already the presumption.

>> We don't do this and I don't see how it would be relevant. If a peer 
provides any invalid message or otherwise violates the protocol it is 
simply dropped.
>>
>> The "problematic" that I'm referring to is the reliance on the block 
hash as a message identifier, because it does not identify the message and 
cannot be useful in an effectively unlimited number of zero-cost cases.

> Historically, it was to isolate transaction-relay from block-relay to 
optimistically harden in face of network partition, as this is easy to 
infer transaction-relay topology with a lot of heuristics.

I'm not seeing the connection here. Are you suggesting that tx and block 
hashes may collide with each other? Or that a block message may be 
confused with a transaction message?

> I think it is correct that the block hash cannot be relied on as a 
message identifier, as it cannot be useful in an unlimited number of 
zero-cost cases. I was pointing out that bitcoin core partially mitigates 
that by discouraging connections to block-relay peers serving such block 
messages (`MaybePunishNodeForBlocks`).

This does not mitigate the issue. It's essentially dead code. It's exactly 
like saying, "there's an arbitrary number of holes in the bucket, but we 
can plug a subset of those holes." Infinite minus any number is still 
infinite.

> I believe somehow the bottleneck we're circling around is computationally 
defining what are the "usable" identifiers for block messages. The most 
straightforward answer to this question is the full block in one single 
peer message, at least in my perspective.

I don't follow this statement. The term "usable" was specifically 
addressing the proposal - that a header hash must uniquely identify a block 
(a header and committed set of txs) as valid or otherwise. As I have 
pointed out, this will still not be the case if 64 byte txs are 
invalidated. It is also not the case that detection of type64 malleated 
blocks can be made more performant if 64 byte txs are globally invalid. In 
fact the opposite is true, it becomes more costly (and complex) and is 
therefore just dead code.

> In reality, since headers-first synchronization (`getheaders`), block 
validation has been split into steps for performance reasons, among 
others.

Headers first only defers malleation checks. The same checks are necessary 
whether you perform blocks first or headers first sync (we support both 
protocol levels). The only difference is that for headers first, a stored 
header might later become invalidated. However, this is the case with and 
without the possibility of malleation.

>> Again, this has no relation to tx hashes/identifiers. Libbitcoin has a 
tx pool, we just don't store them in RAM (memory).
>>
>> I don't follow this. An invalid 64 byte tx consensus rule would 
definitely not make it harder to exploit block message invalidity. In fact 
it would just slow down validation by adding a redundant rule. Furthermore, 
as I have detailed in a previous message, caching invalidity does 
absolutely nothing to increase protection. In fact it makes the situation 
materially worse.

> Just to recall, in my understanding the proposal we're discussing is 
about outlawing 64 byte transactions at the consensus level to minimize 
denial-of-service vectors during block validation. I think we're talking 
past each other because the mempool already introduces a layer of caching 
in bitcoin core, the results of which are re-used at block validation, such 
as signature verification results. I'm not sure we can fully waive 
performance considerations, though I agree that implementation architecture 
subsystems like the mempool should only be a sideline consideration.

I have not suggested that anything is waived or ignored here. I'm stating 
that there is no "mempool" performance benefit whatsoever to invalidating 
64 byte txs. Mempool caching could only rely on tx identifiers, not block 
identifiers. Tx identifiers are not at issue.

>> No, this is not the case. As I detailed in my previous message, there is 
no possible scenario where invalidation caching does anything but make the 
situation materially worse.

> I think it can be correct that invalidation caching makes the situation 
materially worse, or is denial-of-service neutral, as I believe a full node 
is only trading space for time resources in matters of block message 
validation. I still believe such analysis, as detailed in your previous 
message, would benefit from more detail.

I don't know how to add any more detail than I already have. There are 
five relevant considerations:

(1) block hashes will not become unique identifiers for block messages.
(2) the earliest point at which type64 malleation can be detected will not 
be reduced.
(3) the necessary cost of type64 malleation determination will not be 
reduced.
(4) the additional consensus rule will increase validation cost and code 
complexity.
(5) invalid blocks can still be produced at no cost that require full 
double tx hashing/Merkle root computations.

Which of these statements are not evident at this point?

>> On the other hand, just dealing with parse failure on the spot by 
introducing a leading pattern in the stream just inflates the size of p2p 
messages, and the transaction-relay bandwidth cost.
>>
>> I think you misunderstood me. I am suggesting no change to 
serialization. I can see how it might be unclear, but I said, "nothing 
precludes incorporating a requirement for a necessary leading pattern in 
the stream." I meant that the parser can simply incorporate the 
*requirement* that the byte stream starts with a null input point. That 
identifies the malleation or invalidity without a single hash operation and 
while only reading a handful of bytes. No change to any messages.

> Indeed, this is clearer with the re-explanation above about what you 
meant by the "null point".

Ok

> In my understanding, you're suggesting the following algorithm:
> - receive transaction p2p messages
> - deserialize transaction p2p messages
> - if the transaction is a coinbase candidate, verify null input point
> - if null input point pattern invalid, reject the transaction

No, no part of this thread has any bearing on p2p transaction messages - 
nor are coinbase transactions relayed as transaction messages. You could 
restate it as:

- receive block p2p messages
- if the first tx's first input does not have a null point, reject the block

> If I'm understanding correctly, the last rule has the effect of 
constraining the transaction space that can be used to brute-force and 
mount a Merkle root forgery with a 64-byte coinbase transaction.
>
> As described in the 3.1.1 of the paper: 
https://lists.linuxfoundation.org/pipermail/bitcoin-dev/attachments/20190225/a27d8837/attachment-0001.pdf

The above approach makes this malleation computationally infeasible.
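
As a hedged illustration of why type64 arises at all (my own sketch, not 
taken from the paper or from libbitcoin; `sha256d` is a hypothetical helper 
and the two "txids" are stand-in hashes): an interior Merkle node hashes 
exactly 64 bytes, so those same 64 bytes presented as a single tx hash to 
the same root. The last line shows that such bytes do not carry a null 
point where a coinbase's first input point would be (after a 4 byte version 
and a 1 byte input count), which is what the check relies on.

```python
import hashlib


def sha256d(data: bytes) -> bytes:
    """Double SHA-256, as used for txids and Merkle nodes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


# Null point: 32 0x00 bytes (hash) followed by 4 0xff bytes (index).
NULL_POINT = bytes(32) + b"\xff" * 4

# Stand-ins for the txids of two ordinary transactions.
left = sha256d(b"first tx")
right = sha256d(b"second tx")

# Merkle root of a block holding those two txs.
root_of_two = sha256d(left + right)

# The same 64 bytes presented as the serialization of a single tx.
# A one-tx block's Merkle root is simply that tx's hash.
fake_tx = left + right
root_of_one = sha256d(fake_tx)

# Same root, so the block hash cannot tell the two messages apart.
roots_match = root_of_one == root_of_two

# Bytes 5..40 are where a 64 byte coinbase's input point would sit.
looks_like_coinbase = fake_tx[5:41] == NULL_POINT
```

Here `roots_match` is true and `looks_like_coinbase` is false, so the 
malleated message is rejected after reading a handful of bytes.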

>> I'm referring to DoS mitigation (the only relevant security 
consideration here). I'm pointing out that invalidity caching is pointless 
in all cases, and in this case is the most pointless as type64 malleation 
is the cheapest of all invalidity to detect. I would prefer that all bogus 
blocks sent to my node are of this type. The worst types of invalidity 
detection have no mitigation and from a security standpoint are 
counterproductive to cache. I'm describing what overall is actually not a 
tradeoff. It's all negative and no positive.

> I think we're both discussing the same issue about DoS mitigation for 
sure. Again, I think that saying "invalidity caching" is pointless in all 
cases cannot be fully grounded as a statement without specifying (a) the 
internal cache layout(s) of the full node processing block messages and (b) 
the sha256 mining resources available during N difficulty period and 
whether any miner engages in a selfish-mining-like strategy.

It has nothing to do with internal cache layout and nothing to do with 
mining resources. Not having a cache is clearly more efficient than having 
a cache that provides no advantage, regardless of how the cache is laid 
out. There is no cost to forcing a node to perform far more block 
validation computations than can be precluded by invalidity caching. The 
caching simply increases the overall computational cost (as would another 
redundant rule to try and make it more efficient). Discarding invalid 
blocks after the minimal amount of work is the most efficient resolution. 
What one does with the peer at that point is orthogonal (e.g. drop, ban).

> About (a), I'll maintain my point I think it's a classic time-space 
trade-off to ponder in function of the internal cache layouts.

An attacker can throw a nearly infinite number of distinct invalid blocks 
at your node (and all will connect to the chain and show proper PoW). As 
such you will encounter zero cache hits and therefore nothing but overhead 
from the cache. Please explain to me in detail how "cache layout" is going 
to make any difference at all.

> About (b), I think we'll be back to the headers synchronization strategy 
as implemented by a full node, to discuss whether there are exploitable 
asymmetries for selfish-mining-like strategies.

I don't see this as a related/relevant topic. There are zero mining 
resources required to overflow the invalidity cache. Just as Core recently 
published regarding the overflow of its "ban" store, resulting in process 
termination, this then introduces another attack vector that must be 
mitigated.

> If you can give a pseudo-code example of the "null point" validation 
implementation in libbitcoin code (?) I think this can make the 
conversation more concrete on the caching aspect.

pseudo-code, not from libbitcoin...

```
bool malleated64(block)
{
    // Assumes a one-byte tx count (fewer than 253 txs) after the header.
    version = 80 + 1
    segregated = ((block[version + 4] == 0) and (block[version + 5] == 1))
    // Skip version, marker and flag (if segregated), and the input count.
    point = version + 4 + (segregated ? 2 : 0) + 1
    // The null point is 32 0x00 bytes followed by 4 0xff bytes.
    return block[point .. point + 36] != null_point
}
```

Obviously there is no error handling (e.g. block too small, too many 
inputs, etc.) but that is not relevant to the particular question. The 
block.header is fixed size, always 80 bytes. The tx count follows, and is a 
single byte for blocks with fewer than 253 txs. The tx.version is also 
fixed, always 4 bytes. A following 0 implies a segregated witness 
(otherwise it's the input count), assuming there is a following 1. The 
input count is next, followed by the first and only input for the coinbase 
tx, which must be the first block tx. If its point does not match 
0xffffffff0000000000000000000000000000000000000000000000000000000000000000 
then the block is invalid. If it does match, it is computationally 
infeasible that the merkle root is type64 malleated. That's it, absolutely 
trivial and with no prerequisites. The only thing that even makes it 
interesting is the segwit bifurcation.
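
For anyone who wants to run it, here is a hedged Python sketch of the same 
check (an illustration only, not libbitcoin code; `compact_size_length` and 
`first_input_is_null_point` are hypothetical names). It assumes the standard 
block message layout: an 80 byte header, a variable-length tx count, then 
the coinbase tx with an optional segwit marker and flag before its input 
count. It reads the tx count length rather than skipping it.

```python
HEADER_SIZE = 80

# Null point: 32 0x00 bytes (hash) followed by 4 0xff bytes (index).
NULL_POINT = bytes(32) + b"\xff" * 4


def compact_size_length(first: int) -> int:
    """Length in bytes of a variable-length integer, given its first byte."""
    if first < 0xFD:
        return 1
    if first == 0xFD:
        return 3
    if first == 0xFE:
        return 5
    return 9


def first_input_is_null_point(block: bytes) -> bool:
    """True if the first tx's first input point is the null point."""
    if len(block) <= HEADER_SIZE:
        return False
    offset = HEADER_SIZE
    offset += compact_size_length(block[offset])  # tx count
    offset += 4                                   # tx version
    if block[offset:offset + 2] == b"\x00\x01":   # segwit marker and flag
        offset += 2
    if block[offset:offset + 1] != b"\x01":       # coinbase has one input
        return False
    offset += 1
    return block[offset:offset + 36] == NULL_POINT


# Minimal fabricated prefixes, just enough bytes to exercise the check.
header = bytes(HEADER_SIZE)
version = (1).to_bytes(4, "little")
legacy = header + b"\x01" + version + b"\x01" + NULL_POINT
segwit = header + b"\x01" + version + b"\x00\x01" + b"\x01" + NULL_POINT
not_coinbase = header + b"\x01" + b"\x11" * 64
```

With these inputs, `first_input_is_null_point` returns true for `legacy` 
and `segwit` and false for `not_coinbase`. No hashing is involved, and 
slicing past the end of a short buffer simply fails the comparison.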

>> Rust has its own set of problems. No need to get into a language Jihad 
here. My point was to clarify that the particular question was not about a 
C (or C++) null pointer value, either on the surface or underneath an 
abstraction.

> Thanks for the additional comments on libbitcoin usage of dependencies, 
yes I don't think there is a need to get into a language jihad here. It's 
just that all languages have their memory model (stack, dynamic alloc, 
smart pointers, etc.) and when you're talking about performance it's useful 
to keep them in mind, imho.

Sure, but no language difference that I'm aware of could have any bearing 
on this particular question.

Best,
Eric


------=_Part_713943_64256413.1721507393009
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Hi Antoine R,<br /><br />&gt;&gt; While at some level the block message buf=
fer would generally be referenced by one or more C pointers, the difference=
 between a valid coinbase input (i.e. with a "null point") and any other in=
put, is not nullptr vs. !nullptr. A "null point" is a 36 byte value, 32 0x0=
0 byes followed by 4 0xff bytes. In his infinite wisdom Satoshi decided it =
was better (or easier) to serialize a first block tx (coinbase) with an inp=
ut containing an unusable script and pointing to an invalid [tx:index] tupl=
e (input point) as opposed to just not having any input. That invalid input=
 point is called a "null point", and of course cannot be pointed to by a "n=
ull pointer". The coinbase must be identified by comparing those 36 bytes t=
o the well-known null point value (and if this does not match the Merkle ha=
sh cannot have been type64 malleated).<br /><br />&gt; Good for the clarifi=
cation here, I had in mind the core's `CheckBlock` path where the first blo=
ck transaction pointer is dereferenced to verify if the transaction is a co=
inbase (i.e a "null point" where the prevout is null). Zooming out and back=
 to my remark, I think this is correct that adding a new 64 byte size check=
 on all block transactions to detect block hash invalidity could be a low m=
emory overhead (implementation dependant), rather than making that 64 byte =
check alone on the coinbase transaction as in my understanding you're propo=
sing.<br /><br />I'm not sure what you mean by stating that a new consensus=
 rule, "could be a low memory overhead". Checking all tx sizes is far more =
overhead than validating the coinbase for a null point. As AntoineP agreed,=
 it cannot be done earlier, and I have shown that it is *significantly* mor=
e computationally intensive. It makes the determination much more costly an=
d in all other cases by adding an additional check that serves no purpose.<=
br /><br />&gt;&gt;&gt; The second one is the bip141 wtxid commitment in on=
e of the coinbase transaction `scriptpubkey` output, which is itself covere=
d by a txid in the merkle tree.<br /><br />&gt;&gt; While symmetry seems to=
 imply that the witness commitment would be malleable, just as the txs comm=
itment, this is not the case. If the tx commitment is correct it is computa=
tionally infeasible for the witness commitment to be malleated, as the witn=
ess commitment incorporates each full tx (with witness, sentinel, and marke=
r). As such the block identifier, which relies only on the header and tx co=
mmitment, is a sufficient identifier. Yet it remains necessary to validate =
the witness commitment to ensure that the correct witness data has been pro=
vided in the block message.<br />&gt;&gt;<br />&gt;&gt; The second type of =
malleability, in addition to type64, is what we call type32. This is the co=
nsequence of duplicated trailing sets of txs (and therefore tx hashes) in a=
 block message. This is applicable to some but not all blocks, as a functio=
n of the number of txs contained.<br /><br />&gt; To precise more your stat=
ement in describing source of malleability. The witness stack can be mallea=
ted altering the wtxid and yet still valid. I think you can still have the =
case where you're fed a block header with a merkle root commitment deser=
ializing to a valid coinbase transaction with an invalid witness commitment=
. This is the case of a "block message with valid header but malleated com=
mitted valid tx data". Validation of the witness commitment to ensure the c=
orrect witness data has been provided in the block message is indeed necess=
ary.<br /><br />I think you misunderstood me. Of course the witness commitm=
ent must be validated (as I said, "Yet it remains necessary to validate the=
 witness commitment..."), as otherwise the witnesses within a block can be =
anything without affecting the block hash. And of course the witness commit=
ment is computed in the same manner as the tx commitment and is therefore s=
ubject to the same malleations. However, because the coinbase tx is committ=
ed to the block hash, there is no need to guard the witness commitment for =
malleation. And to my knowledge nobody has proposed doing so.<br /><br />&g=
t;&gt;&gt; I think I mostly agree with the identity issue as laid out so fa=
r, there is one caveat to add if you're considering identity caching as the=
 problem solved. A validation node might have to consider differently block=
 messages processed if they connect on the longest most PoW valid chain for=
 which all blocks have been validated. Or alternatively if they have to be =
added on a candidate longest most PoW valid chain.<br /><br />&gt;&gt; Cert=
ainly an important consideration. We store both types. Once there is a stro=
nger candidate header chain we store the headers and proceed to obtaining t=
he blocks (if we don't already have them). The blocks are stored in the sam=
e table; the confirmed vs. candidate indexes simply point to them as applic=
able. It is feasible (and has happened twice) for two blocks to share the v=
ery same coinbase tx, even with either/all bip30/34/90 active (and setting =
aside future issues here for the sake of simplicity). This remains only bec=
ause two competing branches can have blocks at the same height, and bip34 r=
equires only height in the coinbase input script. This therefore implies th=
e same transaction but distinct blocks. It is however infeasible for one bl=
ock to exist in multiple distinct chains. In order for this to happen two b=
locks at the same height must have the same coinbase (ok), and also the sam=
e parent (ok). But this then means that they either (1) have distinct ident=
ity due to another header property deviation, or (2) are the same block wit=
h the same parent and are therefore in just one chain. So I don't see an ac=
tual caveat. I'm not certain if this is the ambiguity that you were referri=
ng to. If not please feel free to clarify.<br /><br />&gt; If you assume no=
 network partition and the no blocks more than 2h in the future consensus r=
ule, I cannot see how one block with no header property deviation can exist=
 in multiple distinct chains.<br /><br />It cannot, that was my point: "(1)=
 have distinct identity due to another header property deviation, or (2) ar=
e the same block..."<br /><br />&gt; The ambiguity I was referring was abou=
t a different angle, if the design goal of introducing a 64 byte size check=
 is to "it was about being able to cache the hash of a (non-malleated) inva=
lid block as permanently invalid to avoid re-downloading and re-validating =
it", in my thinking we shall consider the whole block headers caching strat=
egy and be sure we don't get situations where an attacker can attach a chai=
n of low-pow block headers with malleated committed valid tx data yielding =
a block invalidity at the end, provoking as a side-effect a network-wide da=
ta download blowup. So I think any implementation of the validation of a bl=
ock validity, of which identity is a sub-problem, should be strictly ordere=
d by adequate proof-of-work checks.<br /><br />This was already the presump=
tion.<br /><br />&gt;&gt; We don't do this and I don't see how it would be =
relevant. If a peer provides any invalid message or otherwise violates the =
protocol it is simply dropped.<br />&gt;&gt;<br />&gt;&gt; The "problematic=
" that I'm referring to is the reliance on the block hash as a message iden=
tifier, because it does not identify the message and cannot be useful in an=
 effectively unlimited number of zero-cost cases.<br /><br />&gt; Historica=
lly, it was to isolate transaction-relay from block-relay to optimistically=
 harden in face of network partition, as this is easy to infer transaction-=
relay topology with a lot of heuristics.<br /><br />I'm not seeing the conn=
ection here. Are you suggesting that tx and block hashes may collide with e=
ach other? Or that a block message may be confused with a transaction =
message?<br /><br />&gt; I think this is correct that block hash message ca=
nnot be relied on as it cannot be useful in an unlimited number of zero-cos=
t cases, as I was pointing that bitcoin core partially mitigate that with d=
iscouraging connections to block-relay peers servicing block messages (`May=
bePunishNodeForBlocks`).<br /><br />This does not mitigate the issue. It's =
essentially dead code. It's exactly like saying, "there's an arbitrary numb=
er of holes in the bucket, but we can plug a subset of those holes." Infini=
te minus any number is still infinite.<br /><br />&gt; I believe somehow th=
e bottleneck we're circling around is computationally defining what are t=
he "usable" identifiers for block messages. The most straightforward answer=
 to this question is the full block in one single peer message, at least in=
 my perspective.<br /><br />I don't follow this statement. The term "usable=
" was specifically addressing the proposal - that a header hash must unique=
ly identify a block (a header and committed set of txs) as valid or otherwi=
se. As I have pointed out, this will still not be the case if 64 byte block=
s are invalidated. It is also not the case that detection of type64 malleat=
ed blocks can be made more performant if 64 byte txs are globally invalid. =
In fact the opposite is true, it becomes more costly (and complex) and is t=
herefore just dead code.<br /><br />&gt; Reality since headers first synchr=
onization (`getheaders`), block validation has been dissociated in steps fo=
r performance reasons, among others.<br /><br />Headers first only defers m=
alleation checks. The same checks are necessary whether you perform blocks =
first or headers first sync (we support both protocol levels). The only dif=
ference is that for headers first, a stored header might later become inval=
idated. However, this is the case with and without the possibility of malle=
ation.<br /><br />&gt;&gt; Again, this has no relation to tx hashes/identif=
iers. Libbitcoin has a tx pool, we just don't store them in RAM (memory).<b=
r />&gt;&gt;<br />&gt;&gt; I don't follow this. An invalid 64 byte tx conse=
nsus rule would definitely not make it harder to exploit block message inva=
lidity. In fact it would just slow down validation by adding a redundant ru=
le. Furthermore, as I have detailed in a previous message, caching invalidi=
ty does absolutely nothing to increase protection. In fact it makes the sit=
uation materially worse.<br /><br />&gt; Just to recall, in my understandin=
g the proposal we're discussing is about outlawing 64 bytes size transactio=
ns at the consensus-level to minimize denial-of-service vectors during bloc=
k validation. I think we're talking past each other because the mempool al=
ready introduce a layer of caching in bitcoin core, of which the result are=
 re-used at block validation, such as signature verification results. I'm n=
ot sure we can fully waive apart performance considerations, though I agree=
 implementation architecture subsystems like mempool should only be a sidel=
ine considerations.<br /><br />I have not suggested that anything is waived=
 or ignored here. I'm stating that there is no "mempool" performance benefi=
t whatsoever to invalidating 64 byte txs. Mempool caching could only rely o=
n tx identifiers, not block identifiers. Tx identifiers are not at issue.<b=
r /><br />&gt;&gt; No, this is not the case. As I detailed in my previous m=
essage, there is no possible scenario where invalidation caching does anyth=
ing but make the situation materially worse.<br /><br />&gt; I think this c=
an be correct that invalidation caching make the situation materially worse=
, or is denial-of-service neutral, as I believe a full node is only trading=
 space for time resources in matters of block messages validation. I still =
believe such analysis, as detailed in your previous message, would benefit =
to be more detailed.<br /><br />I don't know how to add any more detail tha=
n I already have. There are five relevant considerations:<br /><br />(1) b=
lock hashes will not become unique identifiers for block messages.<br />(2)=
 the earliest point at which type64 malleation can be detected will not be =
reduced.<br />(3) the necessary cost of type64=C2=A0malleated=C2=A0determin=
ation will not be reduced.<br />(4) the additional consensus rule will incr=
ease validation cost and code complexity.<br />(5) invalid blocks can still=
 be produced at no cost that require full double tx hashing/Merkle root com=
putations.<br /><br />Which of these statements are not evident at this poi=
nt?<br /><br />&gt;&gt; On the other hand, just dealing with parse failure =
on the spot by introducing a leading pattern in the stream just inflates th=
e size of p2p messages, and the transaction-relay bandwidth cost.<br />&gt;=
&gt;<br />&gt;&gt; I think you misunderstood me. I am suggesting no change =
to serialization. I can see how it might be unclear, but I said, "nothing p=
recludes incorporating a requirement for a necessary leading pattern in the=
 stream." I meant that the parser can simply incorporate the *requirement* =
that the byte stream starts with a null input point. That identifies the ma=
lleation or invalidity without a single hash operation and while only readi=
ng a handful of bytes. No change to any messages.<br /><br />&gt; Indeed, t=
his is clearer with the re-explanation above about what you meant by the "n=
ull point".<br /><br />Ok<br /><br />&gt; In my understanding, you're sugge=
sting the following algorithm:<br />&gt; - receive transaction p2p messages=
<br />&gt; - deserialize transaction p2p messages<br />&gt; - if the transa=
ction is a coinbase candidate, verify null input point<br />&gt; - if null =
input point pattern invalid, reject the transaction<br /><br />No, no part =
of this thread has any bearing on p2p transaction messages - nor are coinba=
se transactions relayed as transaction messages. You could restate it as:<b=
r /><br />- receive block p2p messages<br />- if the first tx's first input=
 does not have a null point, reject the block<br /><br />&gt; If I'm unders=
tanding correctly, the last rule has for effect to constraint the transacti=
on space that can be used to brute-force and mount a Merkle root forgery wi=
th a 64-byte coinbase transaction.<br />&gt;<br />&gt; As described in the =
3.1.1 of the paper: https://lists.linuxfoundation.org/pipermail/bitcoin-dev=
/attachments/20190225/a27d8837/attachment-0001.pdf<br /><br />The above app=
roach makes this malleation computationally infeasible.<br /><br />&gt;&gt;=
 I'm referring to DoS mitigation (the only relevant security consideration =
here). I'm pointing out that invalidity caching is pointless in all cases, =
and in this case is the most pointless as type64 malleation is the cheapest=
 of all invalidity to detect. I would prefer that all bogus blocks sent to =
my node are of this type. The worst types of invalidity detection have no m=
itigation and from a security standpoint are counterproductive to cache. I'=
m describing what overall is actually not a tradeoff. It's all negative and=
 no positive.<br /><br />&gt; I think we're both discussing the same issue =
about DoS mitigation for sure. Again, I think that saying the "invalidity c=
aching" is pointless in all cases cannot be fully grounded as a statement w=
ithout precising (a) what is the internal cache(s) layout of the full node =
processing block messages and (b) the sha256 mining resources available dur=
ing N difficulty period and if any miner engage in selfish mining like st=
rategy.<br /><br />It has nothing to do with internal cache layout and noth=
ing to do with mining resources. Not having a cache is clearly more efficie=
nt than having a cache that provides no advantage, regardless of how the ca=
che is laid out. There is no cost to forcing a node to perform far more blo=
ck validation computations than can be precluded by invalidity caching. The=
 caching simply increases the overall computational cost (as would another =
redundant rule to try and make it more efficient). Discarding invalid block=
s after the minimal amount of work is the most efficient resolution. What o=
ne does with the peer at that point is orthogonal (e.g. drop, ban).<br /><b=
r />&gt; About (a), I'll maintain my point I think it's a classic time-spac=
e trade-off to ponder in function of the internal cache layouts.<br /><br /=
>An attacker can throw a nearly infinite number of distinct invalid blocks =
at your node (and all will connect to the chain and show proper PoW). As su=
ch you will encounter zero cache hits and therefore nothing but overhead fr=
om the cache. Please explain to me in detail how "cache layout" is =
going to make any difference at all.<br /><br />&gt; About (b) I think =
we'll be ba=
ck to the headers synchronization strategy as implemented by a full node =
to discuss if they're exploitable asymmetries for selfish mining like =
strat=
egies.<br /><br />I don't see this as a related/relevant topic. There are z=
ero mining resources required to overflow the invalidity cache. Just as Cor=
e recently published regarding overflowing to its "ban" store, resulting in=
 process termination, this then introduces another attack vector that must =
be mitigated.<br /><br />&gt; If you can give a pseudo-code example of the =
"null point" validation implementation in libbitcoin code (?) I think this =
can make the conversation more concrete on the caching aspect.<br /><br />p=
seudo-code=C2=A0, not from libbitcoin...<br /><br />```<br />bool malleated=
64(block)<br />{<br />=C2=A0 =C2=A0 // 80 byte header, 1 byte tx count =
(fewer than 253 txs), 4 byte tx version<br />=C2=A0 =C2=A0 segregated =3D =
((block[85] =3D=3D 0) and (block[86] =3D=3D 1))<br />=C2=A0 =C2=A0 // the =
36 byte point follows the (marker, flag and) 1 byte input count<br />=
=C2=A0 =C2=A0 return block[segregated ? 88 : 86] !=3D 0xffffffff00000000=
00000000000000000000000000000000000000000000000000000000<br />}<br />```=
<br /><br />Obviously there is no error handling (e.g. block too small, =
too many inputs, etc.) but that is not relevant to the particular =
question. The block.header is fixed size, always 80 bytes. The tx count =
follows, taken here to be a single byte (fewer than 253 txs). The =
tx.version is also fixed, always 4 bytes. A following 0 implies =
a segregated witness (otherwise it's the input count), assuming there is a =
following 1. The first and only input for the coinbase tx, which must be th=
e first block tx, follows. If it does not match 0xffffffff00000000000000000=
00000000000000000000000000000000000000000000000 then the block is invalid. =
If it does match, it is computationally infeasible that the merkle root is =
type64 malleated. That's it, absolutely trivial and with no prerequisites. =
The only thing that even makes it interesting is the segwit bifurcation.<br=
 /><br />&gt;&gt; Rust has its own set of problems. No need to get into a l=
anguage Jihad here. My point was to clarify that the particular question wa=
s not about a C (or C++) null pointer value, either on the surface or under=
neath an abstraction.<br /><br />&gt; Thanks for the additional comments on=
 libbitcoin usage of dependencies, yes I don't think there is a need to get=
 into a language jihad here. It's just like all languages have their memory=
 model (stack, dynamic alloc, smart pointers, etc) and when you're talking =
about performance it's useful to have their minds, imho.<br /><br />Sure, b=
ut no language difference that I'm aware of could have any bearing on this =
particular question.<br /><br />Best,<br />Eric<br />


------=_Part_713943_64256413.1721507393009--

------=_Part_713942_506106341.1721507393009--