Delivery-date: Thu, 18 Jul 2024 11:07:43 -0700
Date: Thu, 18 Jul 2024 10:39:06 -0700 (PDT)
From: Antoine Riard <antoine.riard@gmail.com>
To: Bitcoin Development Mailing List <bitcoindev@googlegroups.com>
Message-Id: <ac6cc3b8-43e5-4cd6-aabe-f5ffc4672812n@googlegroups.com>
In-Reply-To: <a76b8dc5-d37f-4059-882b-207004874887n@googlegroups.com>
References: <gnM89sIQ7MhDgI62JciQEGy63DassEv7YZAMhj0IEuIo0EdnafykF6RH4OqjTTHIHsIoZvC2MnTUzJI7EfET4o-UQoD-XAQRDcct994VarE=@protonmail.com>
 <72e83c31-408f-4c13-bff5-bf0789302e23n@googlegroups.com>
 <heKH68GFJr4Zuf6lBozPJrb-StyBJPMNvmZL0xvKFBnBGVA3fVSgTLdWc-_8igYWX8z3zCGvzflH-CsRv0QCJQcfwizNyYXlBJa_Kteb2zg=@protonmail.com>
 <5b0331a5-4e94-465d-a51d-02166e2c1937n@googlegroups.com>
 <yt1O1F7NiVj-WkmnYeta1fSqCYNFx8h6OiJaTBmwhmJ2MWAZkmmjPlUST6FM7t6_-2NwWKdglWh77vcnEKA8swiAnQCZJY2SSCAh4DOKt2I=@protonmail.com>
 <be78e733-6e9f-4f4e-8dc2-67b79ddbf677n@googlegroups.com>
 <jJLDrYTXvTgoslhl1n7Fk9-pL1mMC-0k6gtoniQINmioJpzgtqrJ_WqyFZkLltsCUusnQ4jZ6HbvRC-mGuaUlDi3kcqcFHALd10-JQl-FMY=@protonmail.com>
 <9a4c4151-36ed-425a-a535-aa2837919a04n@googlegroups.com>
 <3f0064f9-54bd-46a7-9d9a-c54b99aca7b2n@googlegroups.com>
 <26b7321b-cc64-44b9-bc95-a4d8feb701e5n@googlegroups.com>
 <CALZpt+EwVyaz1=A6hOOycqFGJs+zxyYYocZixTJgVmzZezUs9Q@mail.gmail.com>
 <607a2233-ac12-4a80-ae4a-08341b3549b3n@googlegroups.com>
 <3dceca4d-03a8-44f3-be64-396702247fadn@googlegroups.com>
 <301c64c7-0f0f-476a-90c4-913659477276n@googlegroups.com>
 <e2c61ee5-68c4-461e-a132-bb86a4c3e2ccn@googlegroups.com>
 <33dfd007-ac28-44a5-acee-cec4b381e854n@googlegroups.com>
 <CALZpt+Fs1U5f3S6_tR7AFfEMEkgBPSp3OaNEq+eqYoCSSYXD7g@mail.gmail.com>
 <a76b8dc5-d37f-4059-882b-207004874887n@googlegroups.com>
Subject: Re: [bitcoindev] Re: Great Consensus Cleanup Revival
MIME-Version: 1.0
Content-Type: multipart/mixed; 
	boundary="----=_Part_200236_1021125574.1721324346776"
X-Original-Sender: antoine.riard@gmail.com
Precedence: list
Mailing-list: list bitcoindev@googlegroups.com; contact bitcoindev+owners@googlegroups.com
List-ID: <bitcoindev.googlegroups.com>
X-Google-Group-Id: 786775582512
List-Post: <https://groups.google.com/group/bitcoindev/post>, <mailto:bitcoindev@googlegroups.com>
List-Help: <https://groups.google.com/support/>, <mailto:bitcoindev+help@googlegroups.com>
List-Archive: <https://groups.google.com/group/bitcoindev
List-Subscribe: <https://groups.google.com/group/bitcoindev/subscribe>, <mailto:bitcoindev+subscribe@googlegroups.com>
List-Unsubscribe: <mailto:googlegroups-manage+786775582512+unsubscribe@googlegroups.com>,
 <https://groups.google.com/group/bitcoindev/subscribe>
X-Spam-Score: -0.5 (/)

------=_Part_200236_1021125574.1721324346776
Content-Type: multipart/alternative; 
	boundary="----=_Part_200237_850757959.1721324346776"

------=_Part_200237_850757959.1721324346776
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Hi Eric,

> While at some level the block message buffer would generally be
> referenced by one or more C pointers, the difference between a valid
> coinbase input (i.e. with a "null point") and any other input, is not
> nullptr vs. !nullptr. A "null point" is a 36 byte value, 32 0x00 bytes
> followed by 4 0xff bytes. In his infinite wisdom Satoshi decided it was
> better (or easier) to serialize a first block tx (coinbase) with an
> input containing an unusable script and pointing to an invalid
> [tx:index] tuple (input point) as opposed to just not having any input.
> That invalid input point is called a "null point", and of course cannot
> be pointed to by a "null pointer". The coinbase must be identified by
> comparing those 36 bytes to the well-known null point value (and if this
> does not match the Merkle hash cannot have been type64 malleated).

Thanks for the clarification. I had in mind Bitcoin Core's `CheckBlock`
path, where the pointer to the first block transaction is dereferenced to
verify that the transaction is a coinbase (i.e. it has a "null point", a
null prevout). Zooming out and back to my remark, I think it is correct
that adding a new 64-byte size check on all block transactions to detect
block hash invalidity could have a low memory overhead (implementation
dependent), compared with applying that 64-byte check to the coinbase
transaction alone, as in my understanding you're proposing.
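
To make the two variants concrete, here is a minimal Python sketch. The
function names are mine and purely illustrative; they are not taken from
libbitcoin or Bitcoin Core, and the transactions are assumed to be given
as their serialized bytes without witness data.

```python
from typing import Sequence

# The well-known null point: 32 zero bytes followed by index 0xffffffff.
NULL_POINT = bytes(32) + b"\xff" * 4


def is_null_point(input_point: bytes) -> bool:
    """Return True if the 36-byte input point equals the null point."""
    return input_point == NULL_POINT


def any_tx_is_64_bytes(serialized_txs: Sequence[bytes]) -> bool:
    """Variant 1 (hypothetical): size check applied to every block tx."""
    return any(len(tx) == 64 for tx in serialized_txs)


def coinbase_is_64_bytes(serialized_txs: Sequence[bytes]) -> bool:
    """Variant 2 (hypothetical): size check applied to the first tx only."""
    return len(serialized_txs) > 0 and len(serialized_txs[0]) == 64
```

Both size checks only need the serialized length of each transaction,
which is why the memory overhead should stay small in either variant.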

> We call this type64 malleability (or malleation where it is not only
> possible but occurs).

Yes, this is the problem that has been described as a lack of "domain
separation".
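
A small Python sketch of why the missing domain separation matters. The
txids are placeholder values and the helper names are mine; only the
double-SHA256 pairing rule is the Bitcoin Merkle tree construction.

```python
import hashlib
from typing import List, Sequence


def sha256d(data: bytes) -> bytes:
    """Double SHA-256, used for both txids and interior Merkle nodes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(hashes: Sequence[bytes]) -> bytes:
    """Bitcoin-style Merkle root over a non-empty list of 32-byte hashes."""
    level: List[bytes] = list(hashes)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # an odd level repeats its last hash
        level = [sha256d(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


# Four placeholder txids standing in for real transactions.
txids = [sha256d(bytes([i])) for i in range(4)]
root = merkle_root(txids)

# Present each pair of txids as if it were one 64-byte "transaction".
fake_txs = [txids[0] + txids[1], txids[2] + txids[3]]
malleated_root = merkle_root([sha256d(tx) for tx in fake_txs])

# A leaf and an interior node are hashed the same way, so the roots match.
assert root == malleated_root
```

The two-"transaction" view and the four-transaction view commit to the
same root, which is the type64 ambiguity discussed here.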

> The second one is the bip141 wtxid commitment in one of the coinbase
> transaction `scriptpubkey` output, which is itself covered by a txid in
> the merkle tree.

> While symmetry seems to imply that the witness commitment would be
> malleable, just as the txs commitment, this is not the case. If the tx
> commitment is correct it is computationally infeasible for the witness
> commitment to be malleated, as the witness commitment incorporates each
> full tx (with witness, sentinel, and marker). As such the block
> identifier, which relies only on the header and tx commitment, is a
> sufficient identifier. Yet it remains necessary to validate the witness
> commitment to ensure that the correct witness data has been provided in
> the block message.
>
> The second type of malleability, in addition to type64, is what we call
> type32. This is the consequence of duplicated trailing sets of txs (and
> therefore tx hashes) in a block message. This is applicable to some but
> not all blocks, as a function of the number of txs contained.

To be more precise about your statement describing the sources of
malleability: the witness stack can be malleated, altering the wtxid,
while the transaction remains valid. I think you can still have the case
where you're fed a block header with a merkle root commitment
deserializing to a valid coinbase transaction with an invalid witness
commitment. This is the case of a "block message with valid header but
malleated committed valid tx data". Validation of the witness commitment
to ensure the correct witness data has been provided in the block message
is indeed necessary.
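
For reference, the type32 case quoted above (duplicated trailing txs) can
be reproduced with a few lines of Python. This is only a sketch: the txids
are placeholders and the helper names are mine.

```python
import hashlib
from typing import List, Sequence


def sha256d(data: bytes) -> bytes:
    """Double SHA-256, as used for Bitcoin Merkle nodes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(hashes: Sequence[bytes]) -> bytes:
    """Bitcoin-style Merkle root over a non-empty list of 32-byte hashes."""
    level: List[bytes] = list(hashes)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # an odd level repeats its last hash
        level = [sha256d(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


a, b, c, d, e, f = (sha256d(bytes([i])) for i in range(6))

# Three txs: the odd leaf level repeats c, so appending c changes nothing.
root_3 = merkle_root([a, b, c])
root_3_dup = merkle_root([a, b, c, c])

# Six txs: the odd second level repeats H(e || f), so appending the
# trailing pair (e, f) again yields the same root.
root_6 = merkle_root([a, b, c, d, e, f])
root_6_dup = merkle_root([a, b, c, d, e, f, e, f])

assert root_3 == root_3_dup
assert root_6 == root_6_dup
```

Whether such a duplication is available depends on the number of txs,
which matches the "some but not all blocks" remark in the quote.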

>> Background: A fully-validated block has established identity in its
>> block hash. However an invalid block message may include the same
>> block header, producing the same hash, but with any kind of nonsense
>> following the header. The purpose of the transaction and witness
>> commitments is of course to establish this identity, so these two
>> checks are therefore necessary even under checkpoint/milestone. And
>> then of course the two Merkle tree issues complicate the tx commitment
>> (the integrity of the witness commitment is assured by that of the tx
>> commitment).
>>
>> So what does it mean to speak of a block hash derived from:
>> (1) a block message with an unparseable header?
>> (2) a block message with parseable but invalid header?
>> (3) a block message with valid header but unparseable tx data?
>> (4) a block message with valid header but parseable invalid
>>     uncommitted tx data?
>> (5) a block message with valid header but parseable invalid malleated
>>     committed tx data?
>> (6) a block message with valid header but parseable invalid
>>     unmalleated committed tx data?
>> (7) a block message with valid header but uncommitted valid tx data?
>> (8) a block message with valid header but malleated committed valid tx
>>     data?
>> (9) a block message with valid header but unmalleated committed valid
>>     tx data?
>>
>> Note that only the #9 p2p block message contains an actual Bitcoin
>> block, the others are bogus messages. In all cases the message can be
>> sha256 hashed to establish the identity of the *message*. And if one's
>> objective is to reject repeating bogus messages, this might be a
>> useful strategy. It's already part of the p2p protocol, is orders of
>> magnitude cheaper to produce than a Merkle root, and has no identity
>> issues.
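
As an aside on the message-identity point quoted above, a rough Python
sketch of the difference between the block hash and a hash of the whole
message. The p2p message header does carry a checksum made of the first 4
bytes of the double SHA-256 of the payload; using the full digest as a key
for a filter of bogus messages is my assumption, as are all names below.

```python
import hashlib


def sha256d(data: bytes) -> bytes:
    """Double SHA-256."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def block_hash(block_message: bytes) -> bytes:
    """Identity of the *block*: hash of the 80-byte header only."""
    return sha256d(block_message[:80])


def message_id(payload: bytes) -> bytes:
    """Identity of the *message*: hash of every byte of the payload."""
    return sha256d(payload)


def p2p_checksum(payload: bytes) -> bytes:
    """Checksum carried in the p2p message header."""
    return sha256d(payload)[:4]


class BogusMessageFilter:
    """Hypothetical filter that remembers bogus messages by message id."""

    def __init__(self) -> None:
        self._seen = set()

    def mark_bogus(self, payload: bytes) -> None:
        self._seen.add(message_id(payload))

    def is_known_bogus(self, payload: bytes) -> bool:
        return message_id(payload) in self._seen
```

Two messages sharing a header but carrying different bytes after it get
the same `block_hash` and different `message_id` values, so marking one of
them as bogus says nothing about the other.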

> I think I mostly agree with the identity issue as laid out so far,
> there is one caveat to add if you're considering identity caching as
> the problem solved. A validation node might have to consider
> differently block messages processed if they connect on the longest
> most PoW valid chain for which all blocks have been validated. Or
> alternatively if they have to be added on a candidate longest most PoW
> valid chain.

> Certainly an important consideration. We store both types. Once there
> is a stronger candidate header chain we store the headers and proceed
> to obtaining the blocks (if we don't already have them). The blocks are
> stored in the same table; the confirmed vs. candidate indexes simply
> point to them as applicable. It is feasible (and has happened twice)
> for two blocks to share the very same coinbase tx, even with either/all
> bip30/34/90 active (and setting aside future issues here for the sake
> of simplicity). This remains only because two competing branches can
> have blocks at the same height, and bip34 requires only height in the
> coinbase input script. This therefore implies the same transaction but
> distinct blocks. It is however infeasible for one block to exist in
> multiple distinct chains. In order for this to happen two blocks at the
> same height must have the same coinbase (ok), and also the same parent
> (ok). But this then means that they either (1) have distinct identity
> due to another header property deviation, or (2) are the same block
> with the same parent and are therefore in just one chain. So I don't
> see an actual caveat. I'm not certain if this is the ambiguity that you
> were referring to. If not please feel free to clarify.

If you assume no network partition and the consensus rule rejecting
blocks more than 2h in the future, I cannot see how one block with no
header property deviation can exist in multiple distinct chains. The
ambiguity I was referring to came from a different angle. If the design
goal of introducing a 64-byte size check is, quoting, "about being able
to cache the hash of a (non-malleated) invalid block as permanently
invalid to avoid re-downloading and re-validating it", then in my
thinking we should consider the whole block header caching strategy and
be sure we don't get situations where an attacker can attach a chain of
low-PoW block headers with malleated committed valid tx data yielding a
block invalidity at the end, provoking as a side effect a network-wide
data download blowup. So I think any implementation of block validation,
of which identity is a sub-problem, should be strictly ordered by
adequate proof-of-work checks.
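
To illustrate what I mean by ordering, a simplified Python sketch of a
proof-of-work gate evaluated on the 80-byte header before any tx data is
requested or validated. It only checks the header against its own claimed
`bits` and ignores the sign and overflow edge cases of the compact
encoding; a real node must also check that `bits` is the difficulty
required at that height and compare accumulated chain work. Function names
are mine.

```python
import hashlib


def sha256d(data: bytes) -> bytes:
    """Double SHA-256."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def compact_to_target(bits: int) -> int:
    """Expand the compact nBits encoding into an integer target."""
    exponent = bits >> 24
    mantissa = bits & 0x007FFFFF
    if exponent <= 3:
        return mantissa >> (8 * (3 - exponent))
    return mantissa << (8 * (exponent - 3))


def header_has_valid_pow(header: bytes) -> bool:
    """Return True if the header hash meets the target claimed in bits."""
    if len(header) != 80:
        return False
    # Layout: version(4) prev(32) merkle(32) time(4) bits(4) nonce(4).
    bits = int.from_bytes(header[72:76], "little")
    target = compact_to_target(bits)
    hash_value = int.from_bytes(sha256d(header), "little")
    return target > 0 and hash_value <= target


def should_fetch_block_body(header: bytes) -> bool:
    """Hypothetical gate: only spend bandwidth on headers with valid PoW."""
    return header_has_valid_pow(header)
```

With such a gate, the Merkle and identity checks on the block body only
run for headers that already cost the sender some work.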

> We don't do this and I don't see how it would be relevant. If a peer
> provides any invalid message or otherwise violates the protocol it is
> simply dropped.
>
> The "problematic" that I'm referring to is the reliance on the block
> hash as a message identifier, because it does not identify the message
> and cannot be useful in an effectively unlimited number of zero-cost
> cases.

Historically, the goal was to isolate transaction relay from block relay,
to optimistically harden the node in the face of network partitions, as
it is easy to infer the transaction-relay topology with a lot of
heuristics.

I think it is correct that the block hash cannot be relied on as a
message identifier, since it cannot be useful in an unlimited number of
zero-cost cases. As I was pointing out, Bitcoin Core partially mitigates
that by discouraging connections to block-relay peers serving such block
messages (`MaybePunishNodeForBlock`).

> #4 and #5 refer to "uncommitted" and "malleated committed". It may not
> be clear, but "uncommitted" means that the tx commitment is not valid
> (Merkle root doesn't match the header's value) and "malleated
> committed" means that the (matching) commitment cannot be relied upon
> because the txs represent malleation, invalidating the identifier. So
> neither of these are usable identifiers.
>
> It seems you may be referring to "unconfirmed" txs as opposed to
> "uncommitted" txs. This doesn't pertain to tx storage or identifiers.
> Neither #7 nor #8 are usable for the same reasons.
>
> I'm making no reference to tx malleability. This concerns only Merkle
> tree (block hash) malleability, the two types described in detail in
> the paper I referenced earlier, here again:
>
> https://lists.linuxfoundation.org/pipermail/bitcoin-dev/attachments/20190225/a27d8837/attachment-0001.pdf

I believe the bottleneck we're circling around is computationally
defining what the "usable" identifiers for block messages are.
The most straightforward answer to this question is the full block in one
single peer message, at least from my perspective.
In reality, since headers-first synchronization (`getheaders`), block
validation has been split into steps for performance reasons, among
others.

> Again, this has no relation to tx hashes/identifiers. Libbitcoin has a
> tx pool, we just don't store them in RAM (memory).

> I don't follow this. An invalid 64 byte tx consensus rule would
> definitely not make it harder to exploit block message invalidity. In
> fact it would just slow down validation by adding a redundant rule.
> Furthermore, as I have detailed in a previous message, caching
> invalidity does absolutely nothing to increase protection. In fact it
> makes the situation materially worse.

Just to recall, in my understanding the proposal we're discussing is
about outlawing 64-byte transactions at the consensus level to minimize
denial-of-service vectors during block validation. I think we're talking
past each other because the mempool already introduces a layer of caching
in Bitcoin Core, whose results are reused at block validation, such as
signature verification results. I'm not sure we can fully set aside
performance considerations, though I agree implementation architecture
subsystems like the mempool should only be a sideline consideration.

> No, this is not the case. As I detailed in my previous message, there
> is no possible scenario where invalidation caching does anything but
> make the situation materially worse.

I think it can be correct that invalidation caching makes the situation
materially worse, or is denial-of-service neutral, as I believe a full
node is only trading space for time resources in matters of block message
validation. I still believe such analysis, as detailed in your previous
message, would benefit from being more detailed.

> On the other hand, just dealing with parse failure on the spot by
> introducing a leading pattern in the stream just inflates the size of
> p2p messages, and the transaction-relay bandwidth cost.

> I think you misunderstood me. I am suggesting no change to
> serialization. I can see how it might be unclear, but I said, "nothing
> precludes incorporating a requirement for a necessary leading pattern
> in the stream." I meant that the parser can simply incorporate the
> *requirement* that the byte stream starts with a null input point. That
> identifies the malleation or invalidity without a single hash operation
> and while only reading a handful of bytes. No change to any messages.

Indeed, this is clearer with the re-explanation above of what you meant
by the "null point". In my understanding, you're suggesting the following
algorithm:
- receive transaction p2p messages
- deserialize transaction p2p messages
- if the transaction is a coinbase candidate, verify null input point
- if null input point pattern invalid, reject the transaction

If I'm understanding correctly, the last rule has the effect of
constraining the transaction space that can be used to brute-force and
mount a Merkle root forgery with a 64-byte coinbase transaction.

As described in section 3.1.1 of the paper:
https://lists.linuxfoundation.org/pipermail/bitcoin-dev/attachments/20190225/a27d8837/attachment-0001.pdf
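
As I understand the idea, the check could look roughly like the Python
sketch below, which reads only a handful of bytes after the header and
performs no hashing. This is my own illustration, not libbitcoin code: the
function names are hypothetical and the parsing is deliberately minimal
(header, tx count, version, optional segwit marker and flag, input count,
first input point).

```python
from typing import Optional, Tuple

NULL_POINT = bytes(32) + b"\xff" * 4  # 32 zero bytes, then 0xffffffff
HEADER_SIZE = 80


def read_varint(buf: bytes, offset: int) -> Tuple[int, int]:
    """Decode a Bitcoin variable-length integer: (value, next offset)."""
    first = buf[offset]  # raises IndexError on a truncated buffer
    if first < 0xFD:
        return first, offset + 1
    size = {0xFD: 2, 0xFE: 4, 0xFF: 8}[first]
    end = offset + 1 + size
    return int.from_bytes(buf[offset + 1:end], "little"), end


def first_input_point(block_message: bytes) -> Optional[bytes]:
    """Return the input point of the first input of the first tx, if any."""
    try:
        _tx_count, offset = read_varint(block_message, HEADER_SIZE)
        offset += 4  # skip the transaction version
        if block_message[offset:offset + 2] == b"\x00\x01":
            offset += 2  # skip the segwit marker and flag
        input_count, offset = read_varint(block_message, offset)
    except IndexError:
        return None
    point = block_message[offset:offset + 36]
    if input_count < 1 or len(point) != 36:
        return None
    return point


def passes_null_point_check(block_message: bytes) -> bool:
    """Reject early when the first tx does not start with a null point."""
    return first_input_point(block_message) == NULL_POINT
```

A type64-malleated message would need 36 consecutive hash-derived bytes to
equal the null point in order to pass, which is the brute-force constraint
mentioned above.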

> I'm referring to DoS mitigation (the only relevant security
> consideration here). I'm pointing out that invalidity caching is
> pointless in all cases, and in this case is the most pointless as
> type64 malleation is the cheapest of all invalidity to detect. I would
> prefer that all bogus blocks sent to my node are of this type. The
> worst types of invalidity detection have no mitigation and from a
> security standpoint are counterproductive to cache. I'm describing what
> overall is actually not a tradeoff. It's all negative and no positive.

I think we're both discussing the same issue about DoS mitigation, for
sure. Again, I think that saying "invalidity caching" is pointless in all
cases cannot be fully grounded as a statement without specifying (a) the
internal cache layout(s) of the full node processing block messages and
(b) the sha256 mining resources available during N difficulty periods and
whether any miner engages in a selfish-mining-like strategy.

About (a), I'll maintain my point: I think it's a classic time-space
trade-off to weigh as a function of the internal cache layouts. About
(b), I think we'll be back to the headers synchronization strategy as
implemented by a full node, to discuss whether there are exploitable
asymmetries for selfish-mining-like strategies.

If you can give a pseudo-code example of the "null point" validation
implementation in libbitcoin code (?), I think this can make the
conversation more concrete on the caching aspect.

> Rust has its own set of problems. No need to get into a language Jihad
> here. My point was to clarify that the particular question was not
> about a C (or C++) null pointer value, either on the surface or
> underneath an abstraction.

Thanks for the additional comments on libbitcoin's usage of dependencies.
Yes, I don't think there is a need to get into a language jihad here.
It's just that all languages have their memory model (stack, dynamic
alloc, smart pointers, etc.), and when you're talking about performance
it's useful to have them in mind, imho.

Best,
Antoine
ots hash: 058d7b3adb154a3e64d5f8ccf1944903bcd0c49dbb525f7212adf4f7ac7f8c55
On Tuesday, 9 July 2024 at 02:16:20 UTC+1, Eric Voskuil wrote:

> > This is why we don't use C - unsafe, unclear, unnecessary.
>
> Actually, I think libbitcoin is using its own maintained fork of
> secp256k1, which is written in C.
>
>
> We do not maintain secp256k1 code. For years that library carried the
> same version, despite regular breaking changes to its API. This
> compelled us to place these different versions on distinct git
> branches. When it finally became versioned we started phasing this
> unfortunate practice out.
>
> Out of the 10 repositories and at least half million lines of code,
> apart from an embedded copy of qrencode that we don't independently
> maintain, I believe there is only one .c file in use in the entire
> project - the database mmap.c implementation for msvc builds. This
> includes hash functions, with vectorization optimizations, etc.
>
>
> For sure, I wouldn't recommend using C across a whole codebase as it's
> not memory-safe (euphemism) though it's still un-match if you wish to
> understand low-level memory management in hot paths.
>
>
> This is a commonly held misperception.
>
> It can be easier to use C++ or Rust, though it doesn't mean it will be
> as (a) perf optimal and (b) hardened against side-channels.
>
>
> Rust has its own set of problems. No need to get into a language Jihad
> here. My point was to clarify that the particular question was not
> about a C (or C++) null pointer value, either on the surface or
> underneath an abstraction.
>
> e
>


------=_Part_200237_850757959.1721324346776
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div>Hi Eric,<br /><br />&gt; While at some level the block message buffer =
would generally be referenced by one or more C pointers, the difference bet=
ween a valid coinbase input (i.e. with a "null point") and any other input,=
 is not nullptr vs. !nullptr. A "null point" is a 36 byte value, 32 0x00 by=
es followed by 4 0xff bytes. In his infinite wisdom Satoshi decided it was =
better (or easier) to serialize a first block tx (coinbase) with an input c=
ontaining an unusable script and pointing to an invalid [tx:index] tuple (i=
nput point) as opposed to just not having any input. That invalid input poi=
nt is called a "null point", and of course cannot be pointed to by a "null =
pointer". The coinbase must be identified by comparing those 36 bytes to th=
e well-known null point value (and if this does not match the Merkle hash c=
annot have been type64 malleated).<br /><br />Good for the clarification he=
re, I had in mind the core's `CheckBlock` path where the first block transa=
ction pointer is dereferenced to verify if the transaction is a coinbase (i=
.e a "null point" where the prevout is null). Zooming out and back to my re=
mark, I think this is correct that adding a new 64 byte size check on all b=
lock transactions to detect block hash invalidity could be a low memory ove=
rhead (implementation dependant), rather than making that 64 byte check alo=
ne on the coinbase transaction as in my understanding you're proposing.<br =
/><br />&gt; We call this type64 malleability (or malleation where it is no=
t only possible but occurs).<br /><br />Yes, the problem which has been des=
cribed as the lack of "domain separation".<br /><br />&gt; The second one i=
s the bip141 wtxid commitment in one of the coinbase transaction `scriptpub=
key` output, which is itself covered by a txid in the merkle tree.<br /><br=
 />&gt; While symmetry seems to imply that the witness commitment would be =
malleable, just as the txs commitment, this is not the case. If the tx comm=
itment is correct it is computationally infeasible for the witness commitme=
nt to be malleated, as the witness commitment incorporates each full tx (wi=
th witness, sentinel, and marker). As such the block identifier, which reli=
es only on the header and tx commitment, is a sufficient identifier. Yet it=
 remains necessary to validate the witness commitment to ensure that the co=
rrect witness data has been provided in the block message.<br />&gt; <br />=
&gt; The second type of malleability, in addition to type64, is what we cal=
l type32. This is the consequence of duplicated trailing sets of txs (and t=
herefore tx hashes) in a block message. This is applicable to some but not =
all blocks, as a function of the number of txs contained.<br /><br />To pre=
cise more your statement in describing source of malleability. The witness =
stack can be malleated altering the wtxid and yet still valid. I think you =
can still have the case where you're feeded a block header with a merkle ro=
ot commitment deserializing to a valid coinbase transaction with an invalid=
 witness commitment. This is the case of a "block message with valid header=
 but malleatead committed valid tx data". Validation of the witness commitm=
ent to ensure the correct witness data has been provided in the block messa=
ge is indeed necessary.<br /><br />&gt;&gt; Background: A fully-validated b=
lock has established identity in its block hash. However an invalid block m=
essage may include the same block header, producing the same hash, but with=
 any kind of nonsense following the header. The purpose of the transaction =
and witness commitments is of course to establish this identity, so these t=
wo checks are therefore necessary even under checkpoint/milestone. And then=
 of course the two Merkle tree issues complicate the tx commitment (the int=
egrity of the witness commitment is assured by that of the tx commitment).<=
br />&gt;&gt;<br />&gt;&gt; So what does it mean to speak of a block hash d=
erived from:<br />&gt;&gt; (1) a block message with an unparseable header?<=
br />&gt;&gt; (2) a block message with parseable but invalid header?<br />&=
gt;&gt; (3) a block message with valid header but unparseable tx data?<br /=
>&gt;&gt; (4) a block message with valid header but parseable invalid uncom=
mitted tx data?<br />&gt;&gt; (5) a block message with valid header but par=
seable invalid malleated committed tx data?<br />&gt;&gt; (6) a block messa=
ge with valid header but parseable invalid unmalleated committed tx data?<b=
r />&gt;&gt; (7) a block message with valid header but uncommitted valid tx=
 data?<br />&gt;&gt; (8) a block message with valid header but malleated co=
mmitted valid tx data?<br />&gt;&gt; (9) a block message with valid header =
but unmalleated committed valid tx data?<br />&gt;&gt;<br />&gt;&gt; Note t=
hat only the #9 p2p block message contains an actual Bitcoin block, the oth=
ers are bogus messages. In all cases the message can be sha256 hashed to es=
tablish the identity of the *message*. And if one's objective is to reject =
repeating bogus messages, this might be a useful strategy. It's already par=
t of the p2p protocol, is orders of magnitude cheaper to produce than a Mer=
kle root, and has no identity issues.<br /><br />&gt; I think I mostly agre=
e with the identity issue as laid out so far, there is one caveat to add if=
 you're considering identity caching as the problem solved. A validation no=
de might have to consider differently block messages processed if they conn=
ect on the longest most PoW valid chain for which all blocks have been vali=
dated. Or alternatively if they have to be added on a candidate longest mos=
t PoW valid chain.<br /><br />&gt; Certainly an important consideration. We=
 store both types. Once there is a stronger candidate header chain we store=
 the headers and proceed to obtaining the blocks (if we don't already have =
them). The blocks are stored in the same table; the confirmed vs. candidate=
 indexes simply point to them as applicable. It is feasible (and has happen=
ed twice) for two blocks to share the very same coinbase tx, even with eith=
er/all bip30/34/90 active (and setting aside future issues here for the sak=
e of simplicity). This remains only because two competing branches can have=
 blocks at the same height, and bip34 requires only height in the coinbase =
input script. This therefore implies the same transaction but distinct bloc=
ks. It is however infeasible for one block to exist in multiple distinct ch=
ains. In order for this to happen two blocks at the same height must have t=
he same coinbase (ok), and also the same parent (ok). But this then means t=
hat they either (1) have distinct identity due to another header property d=
eviation, or (2) are the same block with the same parent and are therefore =
in just one chain. So I don't see an actual caveat. I'm not certain if this=
 is the ambiguity that you were referring to. If not please feel free to cl=
arify.<br /><br />If you assume no network partition and the consensus =
rule rejecting blocks more than 2h in the future, I cannot see how one =
block with no header property deviation can exist in multiple distinct =
chains. The ambiguity I was referring to comes from a different angle. =
If the design goal of introducing a 64-byte size check is the one stated =
as "it was about being able to cache the hash of a (non-malleated) =
invalid block as permanently invalid to avoid re-downloading and =
re-validating it", then in my thinking we should consider the whole =
block header caching strategy and be sure we don't get situations =
where an attacker can attach a chain of low-PoW block headers with =
malleated committed valid tx data yielding block invalidity at the =
end, provoking as a side effect a network-wide data download blowup. =
So I think any implementation of block validation, of which identity =
is a sub-problem, should be strictly ordered by adequate =
proof-of-work checks.<br /><br />&=
gt; We don't do this and I don't see how it would be relevant. If a peer pr=
ovides any invalid message or otherwise violates the protocol it is simply =
dropped.<br />&gt; <br />&gt; The "problematic" that I'm referring to is th=
e reliance on the block hash as a message identifier, because it does not i=
dentify the message and cannot be useful in an effectively unlimited number=
 of zero-cost cases.<br /><br />Historically, it was to isolate =
transaction relay from block relay, to optimistically harden against =
network partition, as it is easy to infer the transaction-relay =
topology with a lot of heuristics.<br /><br />I think it is correct =
that the block hash cannot be relied on as a message identifier, as it =
cannot be useful in an unlimited number of zero-cost cases. As I was =
pointing out, Bitcoin Core partially mitigates that by discouraging =
connections to block-relay peers serving such block messages =
(`MaybePunishNodeForBlock`).<br /><br />&gt; #4 and #5 refer to =
"uncommitted" and "mal=
leated committed". It may not be clear, but "uncommitted" means that the tx=
 commitment is not valid (Merkle root doesn't match the header's value) and=
 "malleated committed" means that the (matching) commitment cannot be relie=
d upon because the txs represent malleation, invalidating the identifier. S=
o neither of these are usable identifiers.<br />&gt; <br />&gt; It seems yo=
u may be referring to "unconfirmed" txs as opposed to "uncommitted" txs. Th=
is doesn't pertain to tx storage or identifiers. Neither #7 nor #8 are usab=
le for the same reasons.<br />&gt; <br />&gt; I'm making no reference to tx=
 malleability. This concerns only Merkle tree (block hash) malleability, th=
e two types described in detail in the paper I referenced earlier, here aga=
in:<br />&gt; <br />&gt; https://lists.linuxfoundation.org/pipermail/bitcoi=
n-dev/attachments/20190225/a27d8837/attachment-0001.pdf<br /><br />I =
believe the bottleneck we're circling around is computationally =
defining what the "usable" identifiers for block messages are.<br />The =
most straightforward answer to this question is the full block in one =
single peer message, at least in my perspective.<br />In reality, since =
headers-first synchronization (`getheaders`), block validation has been =
split into steps for performance reasons, among others.<br /><br />&gt; =
Again, this has no relat=
ion to tx hashes/identifiers. Libbitcoin has a tx pool, we just don't store=
 them in RAM (memory).<br /><br />&gt; I don't follow this. An invalid 64 b=
yte tx consensus rule would definitely not make it harder to exploit block =
message invalidity. In fact it would just slow down validation by adding a =
redundant rule. Furthermore, as I have detailed in a previous message, cach=
ing invalidity does absolutely nothing to increase protection. In fact it m=
akes the situation materially worse.<br /><br />Just to recall, in my =
understanding the proposal we're discussing is about outlawing 64-byte =
transactions at the consensus level to minimize denial-of-service =
vectors during block validation. I think we're talking past each other =
because the mempool already introduces a layer of caching in Bitcoin =
Core, the results of which are re-used at block validation, such as =
signature verification results. I'm not sure we can fully set aside =
performance considerations, though I agree implementation architecture =
subsystems like the mempool should only be a sideline consideration.</div>=
<div><br />&gt; No, this is not the case. A=
s I detailed in my previous message, there is no possible scenario where in=
validation caching does anything but make the situation materially worse.<b=
r /><br />I think it can be correct that invalidation caching makes the =
situation materially worse, or is denial-of-service neutral, as I =
believe a full node is only trading space for time resources in matters =
of block message validation. I still believe such an analysis, as =
given in your previous message, would benefit from being more =
detailed.<br /><br />&gt; On the other h=
and, just dealing with parse failure on the spot by introducing a leading p=
attern in the stream just inflates the size of p2p messages, and the transa=
ction-relay bandwidth cost.<br /><br />&gt; I think you misunderstood me. I=
 am suggesting no change to serialization. I can see how it might be unclea=
r, but I said, "nothing precludes incorporating a requirement for a necessa=
ry leading pattern in the stream." I meant that the parser can simply incor=
porate the *requirement* that the byte stream starts with a null input poin=
t. That identifies the malleation or invalidity without a single hash opera=
tion and while only reading a handful of bytes. No change to any messages.<=
br /><br />Indeed, this is clearer with the re-explanation above about what=
 you meant by the "null point". In my understanding, you're suggesting the =
following algorithm:<br />- receive transaction p2p messages<br />- deseria=
lize transaction p2p messages<br />- if the transaction is a coinbase candi=
date, verify null input point<br />- if null input point pattern invalid, r=
eject the transaction<br /><br />If I'm understanding correctly, the =
last rule has the effect of constraining the transaction space that can =
be used to brute-force and mount a Merkle root forgery with a 64-byte =
coinbase transaction.</div><div><br /></div><div>As described in =
section 3.1.1 of the paper: ht=
tps://lists.linuxfoundation.org/pipermail/bitcoin-dev/attachments/20190225/=
a27d8837/attachment-0001.pdf<br /><br />&gt; I'm referring to DoS mitigatio=
n (the only relevant security consideration here). I'm pointing out that in=
validity caching is pointless in all cases, and in this case is the most po=
intless as type64 malleation is the cheapest of all invalidity to detect. I=
 would prefer that all bogus blocks sent to my node are of this type. The w=
orst types of invalidity detection have no mitigation and from a security s=
tandpoint are counterproductive to cache. I'm describing what overall is ac=
tually not a tradeoff. It's all negative and no positive.<br /><br />I =
think we're both discussing the same issue about DoS mitigation, for =
sure. Again, I think that saying "invalidity caching" is pointless in =
all cases cannot be fully grounded as a statement without specifying =
(a) what the internal cache layout(s) of the full node processing =
block messages are and (b) the sha256 mining resources available =
during N difficulty periods and whether any miner engages in a =
selfish-mining-like strategy.<br /><br />About (a), I'll maintain my =
point: I think it's a classic time-space trade-off to weigh as a =
function of the internal cache layouts. About (b), I think we'll be =
back to the headers synchronization strategy as implemented by a full =
node, to discuss whether there are exploitable asymmetries for =
selfish-mining-like strategies.<br /><br />If you can give a =
pseudo-code example of the "null point" validation implementation in =
libbitcoin code (?), I think this can make the conversation more =
concrete on the caching aspect.<br /><br />&gt; Rust has =
its own set of problems. No need to get into a language Jihad here. My poin=
t was to clarify that the particular question was not about a C (or C++) nu=
ll pointer value, either on the surface or underneath an abstraction.<br />=
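<br />To be explicit about the "null point" here: as I understand it, this
is the previous-output reference of a coinbase input (32 zero bytes
followed by the index 0xffffffff), not a language-level null pointer.
Below is a minimal Python sketch of the early parse check, under my own
assumptions: the function names are hypothetical, this is not libbitcoin
code, and it only inspects the first input of the first transaction in a
serialized block message.<br />
<pre>
```python
HEADER_SIZE = 80  # serialized block header length in bytes
NULL_POINT = bytes(32) + b"\xff\xff\xff\xff"  # zero hash, index 0xffffffff
WIDTHS = {0xfd: 2, 0xfe: 4, 0xff: 8}  # CompactSize prefix to payload width


def read_compact_size(data, offset):
    """Return (value, next_offset); raise IndexError on truncated input."""
    first = data[offset]
    if first not in WIDTHS:
        return first, offset + 1
    end = offset + 1 + WIDTHS[first]
    if end != len(data[:end]):
        raise IndexError("truncated CompactSize")
    return int.from_bytes(data[offset + 1:end], "little"), end


def starts_with_null_point(block):
    """True if the first tx's first input spends the coinbase null point."""
    try:
        tx_count, offset = read_compact_size(block, HEADER_SIZE)
        offset += 4  # skip the 4-byte transaction version
        if block[offset:offset + 2] == b"\x00\x01":
            offset += 2  # skip the segwit marker and flag, if present
        input_count, offset = read_compact_size(block, offset)
    except IndexError:
        return False
    if tx_count == 0 or input_count != 1:
        return False
    return block[offset:offset + 36] == NULL_POINT
```
</pre>
The check reads only a handful of bytes after the 80-byte header and
performs no hash operation.<br />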
<br />Thanks for the additional comments on libbitcoin's usage of =
dependencies. Yes, I don't think there is a need to get into a language =
jihad here. It's just that all languages have their own memory model =
(stack, dynamic alloc, smart pointers, etc.), and when you're talking =
about performance it's useful to keep them in mind, imho.<br /></div>=
<div><br />Best,<br />Antoine</div><div=
>ots hash:=C2=A0058d7b3adb154a3e64d5f8ccf1944903bcd0c49dbb525f7212adf4f7ac7=
f8c55<br /></div><div class=3D"gmail_quote"><div dir=3D"auto" class=3D"gmai=
l_attr">Le mardi 9 juillet 2024 =C3=A0 02:16:20 UTC+1, Eric Voskuil a =C3=
=A9crit=C2=A0:<br/></div><blockquote class=3D"gmail_quote" style=3D"margin:=
 0 0 0 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;=
"><div><blockquote style=3D"margin:0px 0px 0px 0.8ex;border-left-width:1px;=
border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex=
"><div dir=3D"ltr"><div>&gt; This is why we don&#39;t use C - unsafe, uncle=
ar, unnecessary.<br></div><div><br></div></div><div dir=3D"ltr"><div>Actual=
ly, I think libbitcoin is using its own maintained fork of secp256k1, which=
 is written in C.</div></div></blockquote><div><br></div></div><div><div>We=
 do not maintain secp256k1 code. For years that library carried the same ve=
rsion, despite regular breaking changes to its API. This compelled us to pl=
ace these different versions on distinct git branches. When<span>=C2=A0it f=
inally became versioned we started phasing this unfortunate practice out.</=
span></div><div><span><br></span></div><div><span>Out of the 10 repositorie=
s and at least half million lines of code, apart from an embedded copy of q=
rencode that we don=E2=80=99t independently maintain, I believe there is on=
ly one .c file in use in the entire project - the database mmap.c implement=
ation for msvc builds. This includes hash functions, with vectorization opt=
imizations, etc.</span></div></div><div><div>=C2=A0</div><blockquote style=
=3D"margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;=
border-left-color:rgb(204,204,204);padding-left:1ex"><div dir=3D"ltr"><div>=
For sure, I wouldn&#39;t recommend using C across a whole codebase as it&#3=
9;s not memory-safe (euphemism) though it&#39;s still un-match if you wish =
to understand low-level memory management in hot paths.</div></div></blockq=
uote><div><br></div></div><div><div>This is a commonly held misperception.<=
/div></div><div><div><br></div><blockquote style=3D"margin:0px 0px 0px 0.8e=
x;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,2=
04,204);padding-left:1ex"><div dir=3D"ltr"><div>It can be easier to use C++=
 or Rust, though it doesn&#39;t mean it will be as (a) perf optimal and (b)=
 hardened against side-channels.</div></div></blockquote><div><br></div></d=
iv><div><div>Rust has its own set of problems. No need to get into a langua=
ge Jihad here. My point was to clarify that the particular question was not=
 about a C (or C++) null pointer value, either on the surface or underneath=
 an abstraction.</div><div><br></div><div>e=C2=A0</div></div></blockquote><=
/div>


------=_Part_200237_850757959.1721324346776--

------=_Part_200236_1021125574.1721324346776--