1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
|
Return-Path: <peter_r@gmx.com>
Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org
[172.17.192.35])
by mail.linuxfoundation.org (Postfix) with ESMTPS id 1D7BE267
for <bitcoin-dev@lists.linuxfoundation.org>;
Mon, 9 May 2016 18:40:00 +0000 (UTC)
X-Greylist: delayed 00:05:06 by SQLgrey-1.7.6
Received: from mout.gmx.net (mout.gmx.net [212.227.15.18])
by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 1302C115
for <bitcoin-dev@lists.linuxfoundation.org>;
Mon, 9 May 2016 18:39:58 +0000 (UTC)
Received: from [192.168.1.59] ([205.250.126.165]) by mail.gmx.com (mrgmx002)
with ESMTPSA (Nemesis) id 0MexFh-1bFPzn0Qzh-00OZBs;
Mon, 09 May 2016 20:34:50 +0200
Content-Type: text/plain; charset=utf-8
Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\))
From: Peter R <peter_r@gmx.com>
In-Reply-To: <5730C37E.2000004@gmail.com>
Date: Mon, 9 May 2016 11:34:47 -0700
Content-Transfer-Encoding: quoted-printable
Message-Id: <D0F57BE3-EFD2-4557-8D05-704D9C4E5EA1@gmx.com>
References: <5727D102.1020807@mattcorallo.com> <5730C37E.2000004@gmail.com>
To: Pieter Wuille <pieter.wuille@gmail.com>,
Bitcoin Protocol Discussion <bitcoin-dev@lists.linuxfoundation.org>
X-Mailer: Apple Mail (2.2104)
X-Provags-ID: V03:K0:tIOTP7Ck6WWlgO70r6mNTR7Z8m4X6Io7yx/xPRwn1aAh099mHWb
YftNaa9rVwis151erITocEf7ioRK1hZ+lCvRhwHKj+xewmIhI320H2K9I6+fprSRSMGE2Tm
l08dziE4fsfrgFVYUUJRoRkTPWKslEbbMZ7i2uZfdkpIPJKIxv3AfJre/SoUJ4wxFiPr7W/
2L5t7OHUcJWZnlgmY92Jg==
X-UI-Out-Filterresults: notjunk:1;V01:K0:cZ2zFL4tUXs=:y7JWyxB2Vmn9c1eAZ2F292
xboPKHOhjYy34lnK6TPmYYFwTtTAJ39jiF3lyEEGuhWFxkNndTrVt50Qr62Oe1b6j8AqgDhEt
LVDwF6gRMv/3HUxUe/QWaOJVBr2cmSXeHOI1pjVyz0/epjIF3X2Enhs/2dVrtFd8cdeZue4R8
d0F8axPogNsu9ZeTKTQ9uUn8LfvIArzR/O/GFWpun3svxrmoZb5UieONxfTAg3yzZD2H6klzQ
6WXaQwLByw8zP7FNh3jyo61HXtSCZibbrI+uWzr+0snRx1sT4qT2CnYcqD2rdkYEVbxeufa9x
33cW3Xue7lHKmfjxR1tRby6vBbmoQfYIrjWmQKigwcrl+9VNZpJXwAV8HrVkYzzQpjDQDMRol
5FmT+8HXT8PwNxrUzNepTdhQ5XDGOE+dUmwAVemEodCluHqDY5sglqsGT/NL6iAVQoHxGGxpM
s4Ui9lD4ZsJKlU7hZl6fCWD2majQfSgpe68bmuVCjAI1sgYjuet7VblxQ1ca5lx8ykR7L+bTK
a34kk2NCvNXB4tWRR+NIrVSvzkP7wMSryUQTifr5WQA6ylr7jEiKDSjlPh0I0qQl6wT4GXe4h
301G/dyn2RKE7YRUTOkFpkQHDcHt/9SD6n/i8AkgZaXF99nBMhK4LTnXlpQduOSjY/ysceTfr
lue4PzoYuP7JNV5I04ek7dnMCpkkci+xGlp0LivNu/BS4OAGkh8FK9Kis3q1J4DEZgph1cIkH
FCj1O/47rYrrZz1MbgP0Mw6ngAnM3Po+ZmXDMU9lR50JWqPHXQokzmIcNQ2SRSKhwrxSkBohA
MvDCYeh
X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00,FREEMAIL_FROM,
RCVD_IN_DNSWL_LOW autolearn=ham version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
smtp1.linux-foundation.org
X-Mailman-Approved-At: Mon, 09 May 2016 18:52:07 +0000
Subject: Re: [bitcoin-dev] Compact Block Relay BIP
X-BeenThere: bitcoin-dev@lists.linuxfoundation.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Bitcoin Protocol Discussion <bitcoin-dev.lists.linuxfoundation.org>
List-Unsubscribe: <https://lists.linuxfoundation.org/mailman/options/bitcoin-dev>,
<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=unsubscribe>
List-Archive: <http://lists.linuxfoundation.org/pipermail/bitcoin-dev/>
List-Post: <mailto:bitcoin-dev@lists.linuxfoundation.org>
List-Help: <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=help>
List-Subscribe: <https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev>,
<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=subscribe>
X-List-Received-Date: Mon, 09 May 2016 18:40:00 -0000
Hi Pieter,
> I tried to derive what length of short ids is actually necessary (some
> write-up is on
> https://gist.github.com/sipa/b2eb2e486156b5509ac711edd16153ed but it's
> incomplete).
>=20
> For any reasonable numbers I can come up with (in a very wide range),
> the number of bits needed is very well approximated by:
>=20
> log2(#receiver_mempool_txn * #block_txn_not_in_receiver_mempool /
> acceptable_per_block_failure_rate)
>=20
> For example, with 20000 mempool transactions, 2500 transactions in a
> block, 95% hitrate, and a chance of 1 in 10000 blocks to fail to
> reconstruct, needed_bits =3D log2(20000 * 2500 * (1 - 0.95) / 0.0001) =
=3D
> 34.54, or 5 byte txids would suffice.
>=20
> Note that 1 in 10000 failures may sound like a lot, but this is for =
each
> individual connection, and since every transmission uses separately
> salted identifiers, occasional failures should not affect global
> propagation. Given that transmission failures due to timeouts, network
> connectivity, ... already occur much more frequently than once every =
few
> gigabytes (what 10000 blocks corresponds to), that's probably already
> more than enough.
>=20
> In short: I believe 5 or 6 byte txids should be enough, but perhaps it
> makes sense to allow the sender to choose (so he can weigh trying
> multiple nonces against increasing the short txid length).
[9 May 16 @ 11am PDT] =20
We worked on this with respect to =E2=80=9CXthin" for Bitcoin Unlimited, =
and came to a similar conclusion. =20
But we (I think it was theZerg) also noticed another trick: if the node =
receiving the thin blocks has a small number of collisions with =
transactions in its mempool (e.g., 1 or 2), then it can test each =
possible block against the Merkle root in the block header to determine =
the correct one. Using this technique, it should be possible to further =
reduce the number of bytes used for the txids. That being said, even =
thin blocks built from 64-bit short IDs represent a tremendous savings =
compared to standard block propagation. So we (Bitcoin Unlimited) =
decided not to pursue this optimization any further at that time.
***
It=E2=80=99s also interesting to ask what the information-theoretic =
minimum amount of information necessary for a node to re-construct a =
block is. The way I=E2=80=99m thinking about this currently[1] is that =
the node needs all of the transactions in the block that were not =
initially part of its mempool, plus enough information to select and =
ordered subset from that mempool that represents the block. If m is the =
number of transactions in mempool and n is the number of transactions in =
the block, then the number of possible subsets (C') is given by the =
binomial coefficient:
C' =3D m! / [n! (m - n)!]
Since there are n! possible orderings for each subset, the total number =
of possible blocks (C) of size n from a mempool of size m is
C =3D n! C=E2=80=99 =3D m! / (m-n)!
Assuming that all possible blocks are equally likely, the Shannon =
entropy (the information that must be communicated) is the base-2 =
logarithm of the number of possible blocks. After making some =
approximations, this works out very close to
minimum information ~=3D n * log2(m),
which for your case of 20,000 transactions in mempool (m =3D 20,000) and =
a 2500-transaction block (n =3D 2500), yields
minimum information =3D 2500 * log2(20,000) ~ 2500 * 15 bits.
In other words, a lower bound on the information required is about 2 =
bytes per transactions for every transaction in the block that the node =
is already aware of, as well as all the missing transactions in full.=20
Of course, this assumes an unlimited number of round trips, and it is =
probably complicated by other factors that I haven=E2=80=99t considered =
(queue the =E2=80=9Cspherical cow=E2=80=9D jokes :), but I thought it =
was interesting that a technique like Xthin or compact blocks is already =
pretty close to this limit. =20
Cheers,
Peter=20
[1] There are still some things that I can=E2=80=99t wrap my mind around =
that I=E2=80=99d love to discuss with another math geek :)
|