6d/a971de1b0cff9d2c66b2b3c82da515a7cbf827


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359

Return-Path: <jaredr26@gmail.com>
Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org
	[172.17.192.35])
	by mail.linuxfoundation.org (Postfix) with ESMTPS id 50DBAB62
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Sat,  1 Apr 2017 06:15:13 +0000 (UTC)
X-Greylist: whitelisted by SQLgrey-1.7.6
Received: from mail-vk0-f48.google.com (mail-vk0-f48.google.com
	[209.85.213.48])
	by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 366FE90
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Sat,  1 Apr 2017 06:15:12 +0000 (UTC)
Received: by mail-vk0-f48.google.com with SMTP id d188so106045479vka.0
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Fri, 31 Mar 2017 23:15:12 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025;
	h=mime-version:in-reply-to:references:from:date:message-id:subject:to
	:cc:content-transfer-encoding;
	bh=pF4KWDpv19Hs619apQZ6iFK66YSEJrHUh+1Quzad8Dk=;
	b=EMRls8D/44JTcUZADZKuahYMMM7GWNgoFTEPIVw3SCUDjG1c7F/CdClP/YEj2DJ/Pc
	GtKwCJto8ilIKoDIKrNJrFdEJhmEBKrFJ6GBZpZtH6MFPtNESwv+FtrpBWIyUvdJLUfu
	Cea8ITdVfKokTFoqAdnwf+EISb3e2n0mtlvIWgqwvpXCUskJf+uQ9B5SmtQ4g93wd0tI
	sy63kOXjpzzk/kT7+Swx3Kn9UE2mxfGK5yUWuwr/z+9k3aRpoD+JG4TLo4HYdy/modtw
	rrDQnteF7x8y1XpTFDZvBSK6CgUHUn9HdY6v9JBP4uChKwGZhIo8bRUhOBhrGEszVL3G
	DhIA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=1e100.net; s=20161025;
	h=x-gm-message-state:mime-version:in-reply-to:references:from:date
	:message-id:subject:to:cc:content-transfer-encoding;
	bh=pF4KWDpv19Hs619apQZ6iFK66YSEJrHUh+1Quzad8Dk=;
	b=TMGZJFX2SzpwALMfXQawEQ84YKgbrSFD962zddX8R8UwLNmeg1xTZHPU8Jkp7jHfS3
	3pHMlx3i6XGOXFy0179auLp98jOZSGmFFx6pgx+TTqEE4AS6azUfzfr+fCWV3RmMEt+E
	AGT/TkC6Tke0dZqMBKwg18QHLe5XNUENkG3EmeZWwTE6i+cSZ9YIteJMGQyStWqR4+1g
	y5GMr+3iaOroD2MEyh4NFOOpBl/DCAjtV9jCFFjrtmwYvPvGwaLxXG8BXzLpDAQntY4D
	LoSP/8qrsbbm93b6UNzjuQgcB5MgF9Be9SYUuqj7AtVdVagSCeOCc6n8/qiJFlV7LWAr
	qx2g==
X-Gm-Message-State: AFeK/H274vb3IMMQqJXLMRCuhqDBvNGS7Kx7WFgL/v0hMONszwMGQSJbDlVX05EOPnojhT9GtcTblJxbcnLkzA==
X-Received: by 10.31.92.69 with SMTP id q66mr3011197vkb.119.1491027311034;
	Fri, 31 Mar 2017 23:15:11 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.31.157.143 with HTTP; Fri, 31 Mar 2017 23:15:09 -0700 (PDT)
In-Reply-To: <CAFVRnyr-Z9YWtT3r+7-fGejzgxKhH3-kQuo8JQFqKDpyZNBBdg@mail.gmail.com>
References: <CAEgR2PEG1UMqY0hzUH4YE_an=qOvQUgfXreSRsoMWfFWxG3Vqg@mail.gmail.com>
	<CAFVRnyq9Qgw88RZqenjQTDZHEWeuNCdh12Dq7wCGZdu9ZuEN9w@mail.gmail.com>
	<CAD1TkXvd4yLHZDAdMi78WwJ_siO1Vt7=DgYiBmP45ffVveuHBg@mail.gmail.com>
	<SINPR04MB1949AB581C6870184445E0C4C2340@SINPR04MB1949.apcprd04.prod.outlook.com>
	<CAD1TkXsj53JRYhqot2aHSQR+HEDKm7+6S5kGtaLYBCoc24PuWg@mail.gmail.com>
	<SINPR04MB1949A0AF3AD33B4664417068C2370@SINPR04MB1949.apcprd04.prod.outlook.com>
	<CAD1TkXtPZ7w+qYqr_hvyeq95aJ2ge1YYkoC1taDkzv1vEMKpog@mail.gmail.com>
	<SINPR04MB1949BE883C69CFF1477AFAEFC2370@SINPR04MB1949.apcprd04.prod.outlook.com>
	<CAD1TkXvXYX0f+jMMc41vhANuKfw-rNg9tUOG0bCS=T-YGYYjPw@mail.gmail.com>
	<CAFVRnyqSMVj2Ttc4_5vuk73Z5yRJdxeSodvkdjqsrHbgghcmUQ@mail.gmail.com>
	<CAD1TkXvmo8ygbFdPJxdiL5-QiN6+ujeSgOJ7_4eit43aZ2bzyg@mail.gmail.com>
	<CAFVRnyr-Z9YWtT3r+7-fGejzgxKhH3-kQuo8JQFqKDpyZNBBdg@mail.gmail.com>
From: Jared Lee Richardson <jaredr26@gmail.com>
Date: Fri, 31 Mar 2017 23:15:09 -0700
Message-ID: <CAD1TkXuyAH++1nwzZ-4BYv=kT=-8jycMsYx3xFriBiYeLmwvNg@mail.gmail.com>
To: David Vorick <david.vorick@gmail.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
X-Spam-Status: No, score=-1.2 required=5.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID,DKIM_VALID_AU,FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM,
	RCVD_IN_DNSWL_NONE,RCVD_IN_SORBS_SPAM autolearn=no version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	smtp1.linux-foundation.org
X-Mailman-Approved-At: Sat, 01 Apr 2017 14:36:40 +0000
Cc: Bitcoin Dev <bitcoin-dev@lists.linuxfoundation.org>
Subject: Re: [bitcoin-dev] Hard fork proposal from last week's meeting
X-BeenThere: bitcoin-dev@lists.linuxfoundation.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Bitcoin Protocol Discussion <bitcoin-dev.lists.linuxfoundation.org>
List-Unsubscribe: <https://lists.linuxfoundation.org/mailman/options/bitcoin-dev>,
	<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=unsubscribe>
List-Archive: <http://lists.linuxfoundation.org/pipermail/bitcoin-dev/>
List-Post: <mailto:bitcoin-dev@lists.linuxfoundation.org>
List-Help: <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=help>
List-Subscribe: <https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev>,
	<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=subscribe>
X-List-Received-Date: Sat, 01 Apr 2017 06:15:13 -0000

> So your cluster isn't going to need to plan to handle 15k transactions pe=
r second, you're really looking at more like 200k or even 500k transactions=
 per second to handle peak-volumes. And if it can't, you're still going to =
see full blocks.

When I first began to enter the blocksize debate slime-trap that we
have all found ourselves in, I had the same line of reasoning that you
have now.  It is clearly untenable that blockchains are an incredibly
inefficient and poorly designed system for massive scales of
transactions, as I'm sure you would agree.  Therefore, I felt it was
an important point for people to accept this reality now and stop
trying to use Blockchains for things they weren't good for, as much
for their own good as anyone elses.  I backed this by calculating some
miner fee requirements as well as the very issue you raised.  A few
people argued with me rationally, and gradually I was forced to look
at a different question: Granted that we cannot fit all desired
transactions on a blockchain, how many CAN we effectively fit?

It took another month before I actually changed my mind.  What changed
it was when I tried to make estimations, assuming all the reasonable
trends I could find held, about future transaction fees and future
node costs.  Did they need to go up exponentially?  How fast, what
would we be dealing with in the future?  After seeing the huge
divergence in node operational costs without size increases($3 vs
$3000 after some number of years stands out in my memory), I tried to
adjust various things, until I started comparing the costs in BTC
terms.  I eventually realized that comparing node operational costs in
BTC per unit time versus transaction costs in dollars revealed that
node operational costs per unit time could decrease without causing
transaction fees to rise.  The transaction fees still had to hit $1 or
$2, sometimes $4, to remain a viable protection, but otherwise they
could become stable around those points and node operational costs per
unit time still decreased.

None of that may mean anything to you, so you may ignore it all if you
like, but my point in all of that is that I once used similar logic,
but any disagreements we may have does not mean I magically think as
you implied above.  Some people think blockchains should fit any
transaction of any size, and I'm sure you and I would both agree
that's ridiculous.  Blocks will nearly always be full in the future.
There is no need to attempt to handle unusual volume increases - The
fee markets will balance it and the use-cases that can barely afford
to fit on-chain will simply have to wait for awhile.  The question is
not "can we handle all traffic," it is "how many use-cases can we
enable without sacrificing our most essential features?"  (And for
that matter, what is each essential feature, and what is it worth?)

There are many distinct cut-off points that we could consider.  On the
extreme end, Raspberry Pi's and toasters are out.  Data-bound mobile
phones are out for at least the next few years if ever.  Currently the
concern is around home user bandwidth limits.  The next limit after
that may either be the CPU, memory, or bandwidth of a single top-end
PC.  The limit after that may be the highest dataspeeds that large,
remote Bitcoin mining facilities are able to afford, but after fees
rise and a few years, they may remove that limit for us.  Then the
next limit might be on the maximum amount of memory available within a
single datacenter server.

At each limit we consider, we have a choice of killing off a number of
on-chain usecases versus the cost of losing the nodes who can't reach
the next limit effectively.  I have my inclinations about where the
limits would be best set, but the reality is I don't know the numbers
on the vulnerability and security risks associated with various node
distributions.  I'd really like to, because if I did I could begin
evaluating the costs on each side.

> How much RAM do you need to process blocks like that?

That's a good question, and one I don't have a good handle on.  How
does Bitcoin's current memory usage scale?  It can't be based on the
UTXO, which is 1.7 GB while my node is only using ~450mb of ram.  How
does ram consumption increase with a large block versus small ones?
Are there trade-offs that can be made to write to disk if ram usage
grew too large?

If that proved to be a prohibitively large growth number, that becomes
a worthwhile number to consider for scaling.  Of note, you can
currently buy EC2 instances with 256gb of ram easily, and in 14 years
that will be even higher.

> So you have to rework the code to operate on a computer cluster.

I believe this is exactly the kind of discussion we should be having
14 years before it might be needed.  Also, this wouldn't be unique -
Some software I have used in the past (graphite metric collection)
came pre-packaged with the ability to scale out to multiple machines
split loads and replicate the data, and so could future node software.

> Further, are storage costs consistent when we're talking about setting up=
 clusters? Are bandwidth costs consistent when we're talking about setting =
up clusters? Are RAM and CPU costs consistent when we're talking about sett=
ing up clusters? No, they aren't.

Bandwidth costs are, as intra-datacenter bandwidth is generally free.
The other ones warrant evaluation for the distant future.  I would
expect that CPU resources is the first thing we would have to change -
13 thousand transactions per second is an awful lot to process.  I'm
not intimately familiar with the processing - Isn't it largely
signature verification of the transaction itself, plus a minority of
time spent checking and updating utxo values, and finally a small
number of hashes to check block validity?  If signature verification
was controlling, a specialized asic chip(on a plug-in card) might be
able to verify signatures hundreds of times faster, and it could even
be on a cheap 130nm chipset like the first asic miners rushed to
market.  Point being, there are options and it may warrant looking
into after the risk to node reductions.

> You'd need a handful of experts just to maintain such a thing.

I don't think this is as big a deal as it first might seem.  The
software would already come written to be spanned onto multiple
machines - it just needs to be configured.  For the specific question
at hand, the exchange would already have IT staff and datacenter
capacity/operations for their other operations.  In the more general
case, the numbers involved don't work out to extreme concerns at that
level.  The highest cpu usage I've observed on my nodes is less than
5%, less than 1% for the time I just checked, handling ~3 tx/s.  So
being conservative, if it hits 100% on one core at 60-120 tx/s, that
works out to ~25-50 8-core machines.  But again, that's a 2-year old
laptop CPU and we're talking about 14 years into the future.  Even if
it was 25 machines, that's the kind of operation a one or two man IT
team just runs on the side with their extra duties.  It isn't enough
to hire a fulltime tech for.

> Disks are going to be failing every day when you are storing multiple PB,=
 so you can't just count a flat cost of $20/TB and expect that to work.

I mean, that's literally what Amazon does for you with S3, which was
even cheaper than the EBS datastore pricing I was looking at.  So....
Even disregarding that, raid operation was a solved thing more than 10
years ago, and hard drives 14 years out would be roughly ~110 TB for a
$240 hard drive at a 14%/year growth rate.  In 2034 the blockchain
would fit on 10 of those.  Not exactly a "failing every day" kind of
problem.  By 2040, you'd need *gasp* 22 $240 hard drives.  I mean, it
is a lot, but not a lot like you're implying.

> And you need a way to rebuild everything without taking the system offlin=
e.

That depends heavily upon the tradeoffs the businesses can make.  I
don't think node operation at an exchange is a five-nines uptime
operation.  They could probably tolerate 3 nines.  The worst that
happens is occasionally people's withdrawals and deposit are delayed
slightly.  It won't shut down trading.

> I'm sure there are a dozen other significant issues that one of the Visa =
architects could tell you about when dealing with mission-critical data at =
this scale.

Visa stores the only copy.  They can't afford to lose the data.
Bitcoin isn't like that, as others pointed out.  And for most
businesses, if their node must be rebooted periodically, it isn't a
huge deal.

> Once we grow the blocksize large enough that a single computer can't do a=
ll the processing all by itself we get into a world of much harder, much mo=
re expensive scaling problems.

Ok, when is that point, and what is the tradeoff in terms of nodes?
Just because something is hard doesn't mean it isn't worth doing.
That's just a defeatist attitude.  How big can we get, for what
tradeoffs, and what do we need to do to get there?

> You have to check each transaction against each other transaction to make=
 sure that they aren't double spending eachother.

This is really not that hard.  Have a central database, update/check
the utxo values in block-store increments.  If a utxo has already been
used this increment, the block is invalid.  If the database somehow
got too big(not going to happen at these scales, but if it did), it
can be sharded trivially on the transaction information.  These are
solved problems, the free database software that's available is pretty
powerful.

> You have to be a lot more clever than that to get things working and cons=
istent.

NO, NOT CLEVER.  WE CAN'T DO THAT.

Sorry, I had to. :)

> None of them have cost structures in the 6 digit range, and I'd bet (with=
out actually knowing) that none of them have cost structures in the 7 digit=
 range either.

I know of and have experience working with systems that handled
several orders of magnitude more data than this.  None of the issues
brought up above are problems that someone hasn't solved.  Transaction
commitments to databases?  Data consistency across multiple workers?
Data storage measured in exabytes?  Data storage and updates
approaching hundreds of millions of datapoints per second?  These
things are done every single day at numerous companies.

On Fri, Mar 31, 2017 at 11:23 AM, David Vorick <david.vorick@gmail.com> wro=
te:
> Sure, your math is pretty much entirely irrelevant because scaling system=
s
> to massive sizes doesn't work that way.
>
> At 400B transactions per year we're looking at block sizes of 4.5 GB, and=
 a
> database size of petabytes. How much RAM do you need to process blocks li=
ke
> that? Can you fit that much RAM into a single machine? Okay, you can't fi=
t
> that much RAM into a single machine. So you have to rework the code to
> operate on a computer cluster.
>
> Already we've hit a significant problem. You aren't going to rewrite Bitc=
oin
> to do block validation on a computer cluster overnight. Further, are stor=
age
> costs consistent when we're talking about setting up clusters? Are bandwi=
dth
> costs consistent when we're talking about setting up clusters? Are RAM an=
d
> CPU costs consistent when we're talking about setting up clusters? No, th=
ey
> aren't. Clusters are a lot more expensive to set up per-resource because
> they need to talk to eachother and synchronize with eachother and you hav=
e a
> LOT more parts, so you have to build in redundancies that aren't necessar=
y
> in non-clusters.
>
> Also worth pointing out that peak transaction volumes are typically 20-50=
x
> the size of typical transaction volumes. So your cluster isn't going to n=
eed
> to plan to handle 15k transactions per second, you're really looking at m=
ore
> like 200k or even 500k transactions per second to handle peak-volumes. An=
d
> if it can't, you're still going to see full blocks.
>
> You'd need a handful of experts just to maintain such a thing. Disks are
> going to be failing every day when you are storing multiple PB, so you ca=
n't
> just count a flat cost of $20/TB and expect that to work. You're going to
> need redundancy and tolerance so that you don't lose the system when a fe=
w
> of your hard drives all fail within minutes of eachother. And you need a =
way
> to rebuild everything without taking the system offline.
>
> This isn't even my area of expertise. I'm sure there are a dozen other
> significant issues that one of the Visa architects could tell you about w=
hen
> dealing with mission-critical data at this scale.
>
> --------
>
> Massive systems operate very differently and are much more costly per-uni=
t
> than tiny systems. Once we grow the blocksize large enough that a single
> computer can't do all the processing all by itself we get into a world of
> much harder, much more expensive scaling problems. Especially because we'=
re
> talking about a distributed system where the nodes don't even trust each
> other. And transaction processing is largely non-parallel. You have to ch=
eck
> each transaction against each other transaction to make sure that they
> aren't double spending eachother. This takes synchronization and prevents
> 500 CPUs from all crunching the data concurrently. You have to be a lot m=
ore
> clever than that to get things working and consistent.
>
> When talking about scalability problems, you should ask yourself what oth=
er
> systems in the world operate at the scales you are talking about. None of
> them have cost structures in the 6 digit range, and I'd bet (without
> actually knowing) that none of them have cost structures in the 7 digit
> range either. In fact I know from working in a related industry that the
> cost structures for the datacenters (plus the support engineers, plus the
> software management, etc.) that do airline ticket processing are above $5
> million per year for the larger airlines. Visa is probably even more
> expensive than that (though I can only speculate).