1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
|
Delivery-date: Thu, 01 May 2025 23:48:17 -0700
Received: from mail-yb1-f185.google.com ([209.85.219.185])
by mail.fairlystable.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
(Exim 4.94.2)
(envelope-from <bitcoindev+bncBCJNLJPWXAIBBJ6V2HAAMGQEYMUT62A@googlegroups.com>)
id 1uAkBz-0004s0-Ty
for bitcoindev@gnusha.org; Thu, 01 May 2025 23:48:17 -0700
Received: by mail-yb1-f185.google.com with SMTP id 3f1490d57ef6-e5740c858besf2450781276.2
for <bitcoindev@gnusha.org>; Thu, 01 May 2025 23:48:15 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
d=googlegroups.com; s=20230601; t=1746168490; x=1746773290; darn=gnusha.org;
h=list-unsubscribe:list-subscribe:list-archive:list-help:list-post
:list-id:mailing-list:precedence:x-original-sender:mime-version
:subject:references:in-reply-to:message-id:to:from:date:sender:from
:to:cc:subject:date:message-id:reply-to;
bh=vd1pM1x674wLmpfMpkQEJ8Ylvw6T98jBJRRZ+lked38=;
b=P5dUpP2wEwQf+118IdPZGxR5tVkWgI2kx9QXAJYhDWkJbTdEUn3JVfyQ0hjysNSxqY
Xak/+/th+9Ho7NPhSFu129lnoMIU+d67vbyeONzrobpHU8M9/zmAOaUqILngNi4vFJw4
epO+vmRSTkbXoVXuKcO5ELVZJFICH+RHBelE7Om0t2Qmh/jTiuEogaOn6iCL8D+A/E12
dLElo8pWetacxafCVXZDEpri4XY9UKd2W3oUIekpUXJ6NuqpjiLgS9YOrTmIkmaKyAHT
WfyVquYN7JjXUNWRYmysEFDMZCwBFewCmVfmbD+ppUfe+lxteCZ525UKh7jtKeNCeAJx
tPoQ==
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
d=gmail.com; s=20230601; t=1746168490; x=1746773290; darn=gnusha.org;
h=list-unsubscribe:list-subscribe:list-archive:list-help:list-post
:list-id:mailing-list:precedence:x-original-sender:mime-version
:subject:references:in-reply-to:message-id:to:from:date:from:to:cc
:subject:date:message-id:reply-to;
bh=vd1pM1x674wLmpfMpkQEJ8Ylvw6T98jBJRRZ+lked38=;
b=O8sgW7Me6d7+ISl+Uj81OC3whHni0Xwpt38bOnrSBKRayNlK2/4yRCUhdoqfpGEggH
58Tk2NMeiFSjirnp0XNvwqHoLVfLun5VlhUosn8Eh5Fkm44UOq3hExGlCJ96QhdEOLl+
7MgvaaHYb7DVKZHcpXW2jV5NA5a/ivfm5RXFEKJaQh0+kkZng/a4gxkNHEl5hv6eJz2X
Nec4gzw5eiApY6aibZGaq5muyHo7Em02Fk8jFGROvjF2lBxmUm+OlEo8+4aw/7/ekDMf
FrNryNKL93/yr/UMWClOaLmsn1R85BTPSI/d+uSHmsk+uedpOKzTVzvY6AdrAsGTADQi
8Dsg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
d=1e100.net; s=20230601; t=1746168490; x=1746773290;
h=list-unsubscribe:list-subscribe:list-archive:list-help:list-post
:list-id:mailing-list:precedence:x-original-sender:mime-version
:subject:references:in-reply-to:message-id:to:from:date:x-beenthere
:x-gm-message-state:sender:from:to:cc:subject:date:message-id
:reply-to;
bh=vd1pM1x674wLmpfMpkQEJ8Ylvw6T98jBJRRZ+lked38=;
b=EF+0p2sv9x4i+PDgWfKSb4pmW430Aq3vUB4z5m4lGEwUGQZ5mv1RwTLVAbjcoCLkNO
D4F3ycDQqCLgsXm82uIX7hGXXybQgXAdgcNGQaiUvCPtOtJyGSacVTZr/6BlQQAXFyYv
esaU15PbON1QxtkN3q0HbxFys7LyO3Q9vfNooGQNPyCpE9yVL9kXXojflbyPfQwjuH2E
xzHxSZ43D75EF+kkobU7BuMyuIgIFW8DnwuqNavHd0669AI+ifN9f6reNppdgMzUOt8i
YlRN8TKhq5sM7KJC5EeVJ52smXmdu5f4TN4+JCJ8jKd35oijOIA7oTEGU8IUzQY5kEwM
ktfg==
Sender: bitcoindev@googlegroups.com
X-Forwarded-Encrypted: i=1; AJvYcCWc4L6hGuEGz8A45BCSDQdaTYVd4lW7hJdX5CVPAGFsymYqOk/NqdSAIefSNX0e0Y9BtYUay2L43Z9V@gnusha.org
X-Gm-Message-State: AOJu0Yy/BUeO4qd2VfTdxAYVHqnwTs11BF6RxnZiALifRy6zqFr1MblS
Hc6Lbxi5BPqUFxxItnYpCOxkIjFKpOxKopnrvbSNSUtTv8qaTCzD
X-Google-Smtp-Source: AGHT+IEKa31mmaFHkPq1T1hJQcYOvfmWZG7mgmTgQfr/EV9QvKz8hIi+9csItDgcCF58AHKxrVdi1A==
X-Received: by 2002:a05:6902:4812:b0:e6d:e985:5eb1 with SMTP id 3f1490d57ef6-e75655f7e32mr2322537276.38.1746168489852;
Thu, 01 May 2025 23:48:09 -0700 (PDT)
X-BeenThere: bitcoindev@googlegroups.com; h=AVT/gBFFpSGVHdbkGHOAro2TNHahFIkTN/sevrsKlVDefCDVFg==
Received: by 2002:a25:7bc1:0:b0:e75:60d4:3256 with SMTP id 3f1490d57ef6-e7560d43323ls165585276.0.-pod-prod-00-us;
Thu, 01 May 2025 23:48:06 -0700 (PDT)
X-Received: by 2002:a05:690c:6892:b0:702:5886:7804 with SMTP id 00721157ae682-708bcf646bcmr68393037b3.19.1746168486626;
Thu, 01 May 2025 23:48:06 -0700 (PDT)
Received: by 2002:a81:f801:0:b0:6ef:590d:3213 with SMTP id 00721157ae682-708cfb0b4cems7b3;
Thu, 1 May 2025 23:47:22 -0700 (PDT)
X-Received: by 2002:a05:690c:a8f:b0:6ff:2777:92b7 with SMTP id 00721157ae682-708ced7059bmr23985057b3.9.1746168441212;
Thu, 01 May 2025 23:47:21 -0700 (PDT)
Date: Thu, 1 May 2025 23:47:20 -0700 (PDT)
From: Greg Maxwell <gmaxwell@gmail.com>
To: Bitcoin Development Mailing List <bitcoindev@googlegroups.com>
Message-Id: <cc2dfa79-89f0-4170-9725-894ea189a0e2n@googlegroups.com>
In-Reply-To: <CAPv7TjaM0tfbcBTRa0_713Bk6Y9jr+ShOC1KZi2V3V2zooTXyg@mail.gmail.com>
References: <CAPv7TjaM0tfbcBTRa0_713Bk6Y9jr+ShOC1KZi2V3V2zooTXyg@mail.gmail.com>
Subject: [bitcoindev] Re: SwiftSync - smarter synchronization with hints
MIME-Version: 1.0
Content-Type: multipart/mixed;
boundary="----=_Part_32778_194082566.1746168440844"
X-Original-Sender: gmaxwell@gmail.com
Precedence: list
Mailing-list: list bitcoindev@googlegroups.com; contact bitcoindev+owners@googlegroups.com
List-ID: <bitcoindev.googlegroups.com>
X-Google-Group-Id: 786775582512
List-Post: <https://groups.google.com/group/bitcoindev/post>, <mailto:bitcoindev@googlegroups.com>
List-Help: <https://groups.google.com/support/>, <mailto:bitcoindev+help@googlegroups.com>
List-Archive: <https://groups.google.com/group/bitcoindev
List-Subscribe: <https://groups.google.com/group/bitcoindev/subscribe>, <mailto:bitcoindev+subscribe@googlegroups.com>
List-Unsubscribe: <mailto:googlegroups-manage+786775582512+unsubscribe@googlegroups.com>,
<https://groups.google.com/group/bitcoindev/subscribe>
X-Spam-Score: -0.5 (/)
------=_Part_32778_194082566.1746168440844
Content-Type: multipart/alternative;
boundary="----=_Part_32779_1552355224.1746168440844"
------=_Part_32779_1552355224.1746168440844
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
This sounds like an excellent idea that preserves the assume valid-like=20
security properties but with more performance and more optimization=20
oppturnities.
I like particularly -- if I understand it correctly-- that the hints=20
themselves are not security relevant, if they're wrong you'll just fail=20
rather than end up with something incorrect. Also I like the lack of=20
basically anything being normative, it's easier to feel comfortable with=20
something when you won't be stuck with it forever... the whole scheme could=
=20
just be reworked every version with no particular harm because its effects=
=20
are all ephemeral. At worst you might have problems if you started IBD=20
with one version and tried to complete it with another.
I haven't seen any code, but if hash() is a MD style hash function such as=
=20
sha256 you should place the salt first so that it can be absorbed then the=
=20
midstate reused. For sha256 you could get potentially double the hashing=20
performance and I assume/hope that hash is actually fairly high in the=20
profile cost.. maybe more like a 1/3 improvement considering the size of=20
the entry, care should be taken to try to minimize compression function=20
runs. ... but at least the utxo representation there is entirely=20
implementation defined.
You may be able to optimize the size of the hints further with the=20
observation that false positives are harmless as long as they're rare, as=
=20
in you can save some extra txouts and if you later find them being spent,=
=20
then just go ahead and remove them. So for example ribbon filters give you=
=20
a memory space and read efficient hash table that is constructed by solving=
=20
a linear system to make sure all the inputs match. One could solve for=20
successively larger filters to target a specific false positive rate. (I=
=20
mean just a 2,4 cuckoo filter or similar would also work, but that's two=20
memory accesses required and you don't actually need runtime modification)=
.
On Wednesday, April 9, 2025 at 10:11:29=E2=80=AFAM UTC Ruben Somsen wrote:
> Hi everyone,
>
> SwiftSync is a new validation method that allows for near-stateless, full=
y=20
> parallelizable validation of the Bitcoin blockchain via hints about which=
=20
> outputs remain unspent (<100MB total). All other inputs/outputs are=20
> efficiently crossed off inside a single hash aggregate that only reaches=
=20
> zero if validation was successful and the hints were correct.
>
> The main observation is that it can be much easier to validate that a=20
> given UTXO set is correct than to compute it yourself. It allows us to no=
=20
> longer require a stateful moment-to-moment UTXO set during IBD and allows=
=20
> everything to be order independent. I'll briefly summarize the protocol,=
=20
> before sharing the link to the full write-up.
>
> Each output gets a boolean hint (e.g. committed into Bitcoin Core) about=
=20
> whether or not it will still be in the UTXO set after validation complete=
s.=20
> If it does, we write it to disk (append-only - it won't be used until=20
> SwiftSync finishes). If it does not, we hash the UTXO data and add it to =
an=20
> aggregate. For each input, we once again hash the UTXO data and remove it=
=20
> from the aggregate.
>
> At the end, for every added output there should have been exactly one=20
> removed input, bringing the end total of the aggregate to zero. If this i=
s=20
> indeed the case, we will have validated that the hints, and the resulting=
=20
> UTXO set, were correct.
>
> E.g. For spent outputs A, B and inputs C, D we calculate=20
> hash(UTXO_A||salt) + hash(UTXO_B||salt) - hash(UTXO_C||salt) -=20
> hash(UTXO_D||salt) =3D=3D 0 (proving (A=3D=3DC && B=3D=3DD) || (A=3D=3DD =
&& B=3D=3DC)).
>
> There is one missing step. The UTXO data is only available when processin=
g=20
> the output, but not when processing the input. We resolve this by either=
=20
> downloading the outputs that were spent for each block (equivalent to the=
=20
> undo data, maybe 10-15% of a block), or we lean on assumevalid, making it=
=20
> sufficient to only hash the outpoints (which are available in both the=20
> output and input) rather than the full UTXO data.
>
> Ignoring bandwidth, the expectation is that the speedup will be most=20
> significant on either RAM limited devices and/or devices with many CPU=20
> cores. Initial PoC benchmarks (thanks to theStack) show a 5.28x speed-up,=
=20
> while currently still being largely sequential.
>
> Many more details are in the full write-up:
> https://gist.github.com/RubenSomsen/a61a37d14182ccd78760e477c78133cd
>
> It will answer the following questions (and more):
>
> - How the hash aggregate can be made secure against the generalized=20
> birthday problem
> - How exactly assumevalid is utilized and what the implications are
> - How we can still check for inflation when we skip the amounts with=20
> assumevalid
> - How we can validate transaction order while doing everything in paralle=
l
> - How we can perform the BIP30 duplicate output check without the UTXO se=
t
> - How this all relates to assumeutxo
>
> To my knowledge, every validation check involving the UTXO set is covered=
,=20
> but I'd be curious to hear if anything was overlooked or if you spot any=
=20
> other issues.
>
> Thanks for reading, and thanks to everyone who provided invaluable=20
> feedback while the idea was coming together.
>
> -- Ruben Somsen
>
--=20
You received this message because you are subscribed to the Google Groups "=
Bitcoin Development Mailing List" group.
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to bitcoindev+unsubscribe@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/bitcoindev/=
cc2dfa79-89f0-4170-9725-894ea189a0e2n%40googlegroups.com.
------=_Part_32779_1552355224.1746168440844
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
<div>This sounds like an excellent idea that preserves the assume valid-lik=
e security properties but with more performance and more optimization opptu=
rnities.</div><div><br /></div><div>I like particularly -- if I understand =
it correctly-- that the hints themselves are not security relevant, if they=
're wrong you'll just fail rather than end up with something incorrect.=C2=
=A0 Also I like the lack of basically anything being normative, it's easier=
to feel comfortable with something when you won't be stuck with it forever=
... the whole scheme could just be reworked every version with no particula=
r harm because its effects are all ephemeral.=C2=A0 At worst you might have=
problems if you started IBD with one version and tried to complete it with=
another.</div><div><br /></div><div>I haven't seen any code,=C2=A0 but if =
hash() is a MD style hash function such as sha256 you should place the salt=
first so that it can be absorbed then the midstate reused. For sha256 you =
could get potentially double the hashing performance and I assume/hope that=
hash is actually fairly high in the profile cost.. maybe more like a 1/3 i=
mprovement considering the size of the entry, care should be taken to try t=
o minimize compression function runs. ... but at least the utxo representat=
ion there is entirely implementation defined.</div><div><br /></div><div>Yo=
u may be able to optimize the size of the hints further with the observatio=
n that false positives are harmless as long as they're rare,=C2=A0 as in yo=
u can save some extra txouts and if you later find them being spent, then j=
ust go ahead and remove them.=C2=A0 So for example ribbon filters give you =
a memory space and read efficient hash table that is constructed by solving=
a linear system to make sure all the inputs match.=C2=A0 One could solve f=
or successively larger filters to target a specific false positive rate.=C2=
=A0 (I mean just a 2,4 cuckoo filter or similar would also work, but that's=
two memory accesses required and=C2=A0 you don't actually need runtime mod=
ification).</div><div><br /></div><div><br /></div><div><br /></div><br /><=
div class=3D"gmail_quote"><div dir=3D"auto" class=3D"gmail_attr">On Wednesd=
ay, April 9, 2025 at 10:11:29=E2=80=AFAM UTC Ruben Somsen wrote:<br/></div>=
<blockquote class=3D"gmail_quote" style=3D"margin: 0 0 0 0.8ex; border-left=
: 1px solid rgb(204, 204, 204); padding-left: 1ex;"><div dir=3D"ltr"><div>H=
i everyone,</div><div><br></div>SwiftSync is a new validation method that a=
llows for near-stateless, fully parallelizable validation of the Bitcoin bl=
ockchain via hints about which outputs remain unspent (<100MB total). Al=
l other inputs/outputs are efficiently crossed off inside a single hash agg=
regate that only reaches zero if validation was successful and the hints we=
re correct.<br><br>The main observation is that it can be much easier to va=
lidate that a given UTXO set is correct than to compute it yourself. It all=
ows us to no longer require a stateful moment-to-moment UTXO set during IBD=
and allows everything to be order independent. I'll briefly summarize =
the protocol, before sharing the link to the full write-up.<br><br>Each out=
put gets a boolean hint (e.g. committed into Bitcoin Core) about whether or=
not it will still be in the UTXO set after validation completes. If it doe=
s, we write it to disk (append-only - it won't be used until SwiftSync =
finishes). If it does not, we hash the UTXO data and add it to an aggregate=
. For each input, we once again hash the UTXO data and remove it from the a=
ggregate.<br><br>At the end, for every added output there should have been =
exactly one removed input, bringing the end total of the aggregate to zero.=
If this is indeed the case, we will have validated that the hints, and the=
resulting UTXO set, were correct.<br><br>E.g. For spent outputs A, B and i=
nputs C, D we calculate hash(UTXO_A||salt) + hash(UTXO_B||salt) - hash(UTXO=
_C||salt) - hash(UTXO_D||salt) =3D=3D 0 (proving (A=3D=3DC && B=3D=
=3DD) || (A=3D=3DD && B=3D=3DC)).<br><br>There is one missing step.=
The UTXO data is only available when processing the output, but not when p=
rocessing the input. We resolve this by either downloading the outputs that=
were spent for each block (equivalent to the undo data, maybe 10-15% of a =
block), or we lean on assumevalid, making it sufficient to only hash the ou=
tpoints (which are available in both the output and input) rather than the =
full UTXO data.<br><br>Ignoring bandwidth, the expectation is that the spee=
dup will be most significant on either RAM limited devices and/or devices w=
ith many CPU cores. Initial PoC benchmarks (thanks to theStack) show a 5.28=
x speed-up, while currently still being largely sequential.<br><br>Many mor=
e details are in the full write-up:<br><a href=3D"https://gist.github.com/R=
ubenSomsen/a61a37d14182ccd78760e477c78133cd" target=3D"_blank" rel=3D"nofol=
low" data-saferedirecturl=3D"https://www.google.com/url?hl=3Den&q=3Dhtt=
ps://gist.github.com/RubenSomsen/a61a37d14182ccd78760e477c78133cd&sourc=
e=3Dgmail&ust=3D1746253793388000&usg=3DAOvVaw3cgzmLrC_4mJ_G1AQp-qnL=
">https://gist.github.com/RubenSomsen/a61a37d14182ccd78760e477c78133cd</a><=
br><br>It will answer the following questions (and more):<br><br>- How the =
hash aggregate can be made secure against the generalized birthday problem<=
div>- How exactly assumevalid is utilized and what the implications are</di=
v><div>- How we can still check for inflation when we skip the amounts with=
assumevalid</div><div>- How we can validate transaction order while doing =
everything in parallel</div><div>- How we can perform the BIP30 duplicate o=
utput check without the UTXO set</div><div>- How this all relates to assume=
utxo<br><br>To my knowledge, every validation check involving the UTXO set =
is covered, but I'd be curious to hear if anything was overlooked or if=
you spot any other issues.<br><br>Thanks for reading, and thanks to everyo=
ne who provided invaluable feedback while the idea was coming together.<br>=
<br>-- Ruben Somsen</div></div>
</blockquote></div>
<p></p>
-- <br />
You received this message because you are subscribed to the Google Groups &=
quot;Bitcoin Development Mailing List" group.<br />
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to <a href=3D"mailto:bitcoindev+unsubscribe@googlegroups.com">bitcoind=
ev+unsubscribe@googlegroups.com</a>.<br />
To view this discussion visit <a href=3D"https://groups.google.com/d/msgid/=
bitcoindev/cc2dfa79-89f0-4170-9725-894ea189a0e2n%40googlegroups.com?utm_med=
ium=3Demail&utm_source=3Dfooter">https://groups.google.com/d/msgid/bitcoind=
ev/cc2dfa79-89f0-4170-9725-894ea189a0e2n%40googlegroups.com</a>.<br />
------=_Part_32779_1552355224.1746168440844--
------=_Part_32778_194082566.1746168440844--
|