1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
|
Return-Path: <tomas@tomasvdw.nl>
Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org
[172.17.192.35])
by mail.linuxfoundation.org (Postfix) with ESMTPS id CDECB360
for <bitcoin-dev@lists.linuxfoundation.org>;
Sat, 8 Apr 2017 23:58:04 +0000 (UTC)
X-Greylist: from auto-whitelisted by SQLgrey-1.7.6
Received: from out1-smtp.messagingengine.com (out1-smtp.messagingengine.com
[66.111.4.25])
by smtp1.linuxfoundation.org (Postfix) with ESMTPS id A245214C
for <bitcoin-dev@lists.linuxfoundation.org>;
Sat, 8 Apr 2017 23:58:03 +0000 (UTC)
Received: from compute2.internal (compute2.nyi.internal [10.202.2.42])
by mailout.nyi.internal (Postfix) with ESMTP id EC769208CD;
Sat, 8 Apr 2017 19:58:02 -0400 (EDT)
Received: from web3 ([10.202.2.213])
by compute2.internal (MEProxy); Sat, 08 Apr 2017 19:58:02 -0400
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=
messagingengine.com; h=content-transfer-encoding:content-type
:date:from:in-reply-to:message-id:mime-version:references
:subject:to:x-me-sender:x-me-sender:x-sasl-enc; s=fm1; bh=p0YHVR
qd1jkgVTsR0ymuueuqd2YekIw81G8kZbXFhlw=; b=Z1FmZOrdcU8Ttfai1Do3Hf
803qUzYakeqSCu7sp22jBv8BIjOZM0yYXPJAaFMbzGUKNb7/wWQdxwQ2cWElyKnx
INGtGSzQUxoZEPuqFWGgLB4K3uj7ERfv1lGrxkODzfdYoNd55Vr0BHCcWPXlmWWt
DGFJ6PUlDxCaRyrfnP0u8G9xuk1CqoaCIhoa+cl0/BFqe371FfbASWhoAaeGl4PX
KbYT8slgYsCWnP79Z99iOnFluJWWY3+lJcquUqER+L0YfN1C/4p4OSWi/My4Qq3P
7ggPNLBX2LPsE3/qd3m3w5TxF95QTUYr/7FYNB9nVeYek2a8Xp11zNBMyeQqNfcQ
==
X-ME-Sender: <xms:CnnpWPueyRcT7VHhYFEr6PFoQ8fZXGrWSO06Z0S4Gd3f_8dhBTiwFQ>
Received: by mailuser.nyi.internal (Postfix, from userid 99)
id BF6F49EC4C; Sat, 8 Apr 2017 19:58:02 -0400 (EDT)
Message-Id: <1491695882.3440363.938700256.78C37AC3@webmail.messagingengine.com>
From: Tomas <tomas@tomasvdw.nl>
To: Eric Voskuil <eric@voskuil.org>,
Bitcoin Protocol Discussion <bitcoin-dev@lists.linuxfoundation.org>,
Libbitcoin Development <libbitcoin@lists.dyne.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset="utf-8"
X-Mailer: MessagingEngine.com Webmail Interface - ajax-7c174d5d
In-Reply-To: <83618cca-c6a3-301c-af70-ab7807474f30@voskuil.org>
References: <1491516747.3791700.936828232.69F82904@webmail.messagingengine.com>
<eebc06a4-5ab8-46a8-2f50-a472cb57a775@voskuil.org>
<1491524267.715789.936916864.1156D8CB@webmail.messagingengine.com>
<83618cca-c6a3-301c-af70-ab7807474f30@voskuil.org>
Date: Sun, 09 Apr 2017 01:58:02 +0200
X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00,DKIM_SIGNED,
DKIM_VALID,RCVD_IN_DNSWL_LOW autolearn=ham version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
smtp1.linux-foundation.org
X-Mailman-Approved-At: Sat, 08 Apr 2017 23:59:06 +0000
Subject: Re: [bitcoin-dev] Using a storage engine without UTXO-index
X-BeenThere: bitcoin-dev@lists.linuxfoundation.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Bitcoin Protocol Discussion <bitcoin-dev.lists.linuxfoundation.org>
List-Unsubscribe: <https://lists.linuxfoundation.org/mailman/options/bitcoin-dev>,
<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=unsubscribe>
List-Archive: <http://lists.linuxfoundation.org/pipermail/bitcoin-dev/>
List-Post: <mailto:bitcoin-dev@lists.linuxfoundation.org>
List-Help: <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=help>
List-Subscribe: <https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev>,
<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=subscribe>
X-List-Received-Date: Sat, 08 Apr 2017 23:58:04 -0000
Thank you for your elaborate response Eric,
On Sun, Apr 9, 2017, at 00:37, Eric Voskuil wrote:
> My point was that "Using a storage engine without UTXO-index" has been
> done, and may be a useful reference, not that implementation details
> are the same.
I haven't dived into libbitcoin V2/V3 enough to fully grasp it and
though your comments help, I still not fully do. I will answer below
what is related to bitcrust itself.
My post wasn't posted to claim innovation; I merely try to explain how
Bitcrust works and why it performs well.
> First, I remain confused on your comments pertaining to UTXO growth
> and network protocol. I followed your conversation with Greg and it
> remains unclear to me. From what I understand you have isolated order
> (double spend) from script validation. I think we all understand that
> script validation requires inputs and outputs while double spend
> detection requires correlation of inputs. What I do not understand is
> your choice of optimization axis.
>
> Detection of double spend is not useful in isolation. One must also
> validate scripts, which requires outputs. I can see that there is an
> opportunity to reject blocks (within the same branch) faster by
> validating for double spends before validating script. But unconfirmed
> transactions do not exist in a branch, and are therefore not truly
> conflicting, until they are mined. And even after they are mined
> conflicting txs remain potentially valid in other branches. So
> rejecting txs due to conflict comes down to a denial of service
> policy, which ultimately must be based on fee increment (e.g. RBF).
> But fees are based on the amount of the output value that remains
> unspent in the transaction. So this in turn requires the retrieval of
> outputs.
>
> And yet the remaining scenario of fast rejection of invalid blocks is
> not a meaningful optimization. Optimizing for the case where a block
> has valid and sufficient PoW and yet is invalid (for double spend) is
> counterproductive. And even so, the txs within the invalid block may
> be entirely valid independent of the block, so you are back to looking
> up their outputs to obtain fees in the case of a double spend or to
> validate script otherwise. In all cases you need to get the outputs.
>
> > Bitcrust simply scans the tree. Although earlier designs used a
> > skip-list, it turns out that accompanied by a spent-index lagging a
> > few blocks behind, raw scanning is faster then anything even though
> > it needs to scan ~5 blocks times ~4000 inputs before reaching the
> > first spent-index, the actual scan is highly cache efficient and
> > little more then a "REP SCASQ", reaching sub-microsecond per input
> > on each core *including* the lookup in the spend index.
>
> I realize that you see the implementation of the ordering validation
> as interesting detail, but I find it hard to justify contemplating the
> implementation in isolation from the output lookup requirement. And if
> one must looking up both outputs and spends for each validation, it
> makes more sense to co-locate that data.
>
> Recovering in one step all data necessary to validate a tx has real
> advantages over either interleaving queries and validation or
> splitting input vs. output validation queries into two steps. It is a
> significantly more test-friendly approach, has better performance
> characteristics, and simplifies code. I cannot see any reason to
> perform the data read for double spend validation in isolation of that
> for script validation.
You seem to ignore here the difference between base load and peak load.
If Compact blocks/XThin with further optimizations can presync nearly
100% of the transactions, and nodes can do as much as possible when a
transaction comes in, the time spent when a block comes in can be
minimized and a lot more transactions can be handled with the same
resources.
The reason for "splitting" is that for an incoming transaction the
spent-state of the outputs being spent isn't particularly relevant as
you seem to acknowledge. When the block comes in, the actual output data
isn't relevant.
The *only* thing that needs to be checked when a block comes in is the
order, and the spend-tree approach absolves the need to access outputs
here.
As it also absolves the need for reorgs this greatly simplifies the
design. I am not sure why you say that a one-step approach is more
"test-friendly" as this seems to be unrelated.
>
> If by results you are referring to performance numbers, it's very hard
> to draw any conclusions without a full benchmark. It's great that if
> you are able to boost Core, but from my perspective the numbers aren't
> especially compelling.
>
I fully agree and hopefully do not pretend to hide that my numbers are
premature without a full implementation. I just think they are promising
enough to convince at least myself to move on with this model.
> Despite the site's explanation I cannot think of any reason to ever
> validate two blocks at the same time. You would always prioritize the
> block with the greatest PoW. Doing otherwise just slows down the net
> validation in all but the pathological case where a miner has produced
> an *invalid* block with *more* PoW than another valid block which
> arrived at the node within the same second. Rejecting a *valid* block
> with more PoW in favor of one with *less* "processing" is a hard fork,
> so you probably wouldn't want to do that either. But with compact
> block validation times approaching 25ms it's hard to justify stopping
> a block validation for any reason.
I don't get what you are saying. Why pick the greatest PoW of two
competing blocks? If two blocks come in, an implementation is free to
choose whichever block to build on. Choosing so is not a "hardfork".
Parallel validation simply makes it easier to make an optimal choice,
for if two blocks come in, the one that is validated fastest can be
build upon without the risk of validationless mining.
>
> That's not to say parallel block validation difficult to do. If you
> can validate one block's full set of inputs in parallel (which is not
> novel) doing the same with additional blocks has trivial additional
> complexity.
I am not trying to claim novelty here.
> I am also interested in your previous comments about soft forks. These
> are material considerations that Greg touched on but it doesn't sound
> like you fully appreciate just yet. When a tx is pre-validated the
> rules applied must be the same rules as those of some future block.
> Yet a tx can be included in more than one block (different branches).
> Across branches and even in one branch, validation rules change, and
> can change back. The changes are based on accumulated branch history.
> Pre-validation can later become invalidated, and differently in
> different branches. And maintaining proper context requires either
> storing state that you are apparently not storing, or invalidating
> optimizations. Based on your comments you do not seem to be accounting
> for this in your storage assumptions or in your results. A recent post
> by Greg highlights the complexity and consensus criticality of these
> considerations.
Frankly, I think this is a bit of an exaggeration. Soft forks are
counted on a hand, and I don't think there are many - if any -
transactions in the current chain that have changed compliance based on
height. This makes this a compliance issue and not a performance issue
and the solution I have explained, to add height-based compliance as
meta data of validation seems to
be adequate and safe.
> The hash table store that I described can fully navigate the block
> tree and transaction DAG, since the stored tx, parent and point hashes
> are also natural keys and each link is navigable in constant time. It
> is also lock-free, can concurrently write any number of blocks during
> initial block download and supports read/write concurrency. It has
> successfully indexed and stored the entire blockchain from the P2P
> network in 16 minutes (locally). It also stores both confirmed and
> unconfirmed transactions in the same store, so there is nothing to
> write when a block is confirmed except for the block header/hashes and
> updates to spender heights for any output spent by the new block's
> txs. It is similarly capable of storage in the block table of weak
> chain blocks...
>
I think I get the gist of your approach and it sounds very interesting
and I will definitely dive in deeper.
It also seems sufficiently different from Bitcrust to merit competing on
(eventual) results instead of the complicated theory alone.
Best,
Tomas
|