Date: Fri, 6 Jul 2012 18:52:04 +0200
From: Pieter Wuille <pieter.wuille@gmail.com>
To: bitcoin-development@lists.sourceforge.net
Message-ID: <20120706165204.GA27215@vps7135.xlshosting.net>
Subject: [Bitcoin-development] Pruning in the reference client: ultraprune mode

Hello all,

I've implemented a new block/transaction validation system for the 
reference client, called "ultraprune".

Traditionally, pruning for bitcoin is considered to be the ability to 
delete full transactions from storage once all their outputs are spent, 
and they are buried deeply enough in the chain. That is not what this 
is about.

Given that almost all operations performed on the blockchain do not 
require full previous transactions, but only their unspent outputs, it 
seemed wasteful to use the fully indexed blockchain for everything. 
Instead, we keep a database with only the unspent transaction outputs. 
After some effort to write custom compact serializations for these, I 
could reduce the storage required for such a dataset to less than 70 
MB. This is kept in a BDB database file (coins.dat), which, with indexing 
and overhead, takes around 130 MB.
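To make the idea concrete, here is a toy in-memory model of such an unspent-output set: validating a transaction only needs to look up and remove the coins it spends, then add the coins it creates, without ever touching the full previous transactions. This is a sketch under stated assumptions, not the ultraprune code: the names OutPoint, TxOut, Tx and CoinsView are hypothetical, coins are keyed per output for simplicity, and the compact serialization, BDB storage, script checks and value checks are all left out.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OutPoint:
    txid: str   # id of the transaction that created the output
    index: int  # position of the output within that transaction

@dataclass(frozen=True)
class TxOut:
    value: int            # amount, in satoshis
    script_pubkey: bytes  # spending condition (not evaluated in this sketch)

@dataclass(frozen=True)
class Tx:
    txid: str
    inputs: tuple   # OutPoints this transaction spends
    outputs: tuple  # TxOuts this transaction creates

class CoinsView:
    """Hypothetical stand-in for coins.dat: holds only unspent outputs."""

    def __init__(self):
        self.coins = {}  # OutPoint -> TxOut

    def apply_tx(self, tx):
        # Reject the transaction if an input is listed twice, unknown, or
        # already spent; nothing is modified in that case.
        if len(set(tx.inputs)) != len(tx.inputs):
            raise ValueError("duplicate input")
        for prevout in tx.inputs:
            if prevout not in self.coins:
                raise ValueError(f"missing or spent input {prevout}")
        # Spending removes the coin; the full previous transaction is never read.
        for prevout in tx.inputs:
            del self.coins[prevout]
        # Newly created outputs become unspent coins.
        for i, out in enumerate(tx.outputs):
            self.coins[OutPoint(tx.txid, i)] = out

view = CoinsView()
view.apply_tx(Tx("aa", (), (TxOut(50, b"A"),)))  # coinbase-like: no inputs
view.apply_tx(Tx("bb", (OutPoint("aa", 0),), (TxOut(30, b"B"), TxOut(20, b"A"))))
print(sorted((o.txid, o.index) for o in view.coins))  # [('bb', 0), ('bb', 1)]
```

Once "aa" is fully spent, nothing about it remains in the set, which is why the set stays much smaller than the chain itself.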

Now, this is not enough. You cannot have a full node with just these 
outputs. In particular, such a node cannot undo block connections, 
cannot rescan for wallet transactions, and cannot serve the chain to 
other nodes. These are, however, quite infrequent operations. To 
compensate, we keep entire (non-indexed) blocks around in addition to 
coins.dat (for now, simply each block in a separate file), plus "undo" 
information for connected blocks. We also need a block index with metadata about all stored 
blocks, which takes around 40 MB for now (though that could easily be 
reduced too). Note that this means we lose the ability to find an 
actual transaction in the chain from a txid, but this is not necessary 
for normal operation. Such an index could be re-added later, if 
necessary.
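A rough illustration of what the "undo" information buys: if connecting a block records which coins it spent, the block can later be disconnected using only the block itself and that record, without any transaction index. This is a sketch under assumptions, not the actual undo format: connect_block and disconnect_block are hypothetical names, coins are a plain dict of OutPoint to value, and failure handling (partial rollback, duplicate txids) is ignored. Blocks are assumed to be disconnected tip-first, as in a reorganization.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OutPoint:
    txid: str
    index: int

@dataclass(frozen=True)
class Tx:
    txid: str
    inputs: tuple   # OutPoints spent by this transaction
    outputs: tuple  # output values created by this transaction

def connect_block(coins, block):
    """Apply a block to the coin set; return undo data (spent coins per tx)."""
    undo = []
    for tx in block:
        spent = []
        for prevout in tx.inputs:
            # KeyError if the coin is missing; this sketch does not roll back.
            spent.append((prevout, coins.pop(prevout)))
        for i, value in enumerate(tx.outputs):
            coins[OutPoint(tx.txid, i)] = value
        undo.append(spent)
    return undo

def disconnect_block(coins, block, undo):
    """Reverse connect_block using only the block and its undo data."""
    # Undo transactions newest-first, so an output spent later in the same
    # block is restored before the transaction that created it is undone.
    for tx, spent in zip(reversed(block), reversed(undo)):
        for i in range(len(tx.outputs)):
            del coins[OutPoint(tx.txid, i)]
        for prevout, value in reversed(spent):
            coins[prevout] = value

coins = {OutPoint("gen", 0): 50}
before = dict(coins)
block = (
    Tx("t1", (OutPoint("gen", 0),), (30, 20)),
    Tx("t2", (OutPoint("t1", 0),), (30,)),  # spends an output from the same block
)
undo = connect_block(coins, block)
after = dict(coins)  # t1:1 -> 20 and t2:0 -> 30 remain unspent
disconnect_block(coins, block, undo)
assert coins == before
```

The undo record only holds the spent coins, which is consistent with it being a modest fraction of the block data it accompanies.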

Once you have this, the step to pruning is easily made: just delete 
block files and undo information for old blocks. This isn't implemented 
for now, but there shouldn't be a problem. It simply means you cannot 
rescan/reorg/serve those old blocks, but once they are deep enough 
(say a few thousand blocks), we can tolerate that.
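Since pruning is described as not yet implemented, the following is only a hypothetical sketch of the "just delete old files" step. It assumes a made-up layout with one blk<height>.dat and one undo<height>.dat file per block, and a keep depth of 3000 as an illustrative reading of "a few thousand blocks"; neither the file names nor the number come from the ultraprune branch.

```python
import tempfile
from pathlib import Path

KEEP_DEPTH = 3000  # illustrative stand-in for "a few thousand blocks"

def prune(data_dir: Path, tip_height: int, keep_depth: int = KEEP_DEPTH) -> list:
    """Keep the keep_depth most recent blocks; delete block and undo files
    for everything older.

    Assumes a hypothetical layout with one blk<height>.dat and one
    undo<height>.dat file per block. Returns the pruned heights, sorted.
    """
    cutoff = tip_height - keep_depth  # heights <= cutoff are pruned
    pruned = []
    for path in list(data_dir.glob("blk*.dat")):  # materialize before deleting
        height = int(path.stem[len("blk"):])
        if height <= cutoff:
            path.unlink()
            undo_path = data_dir / f"undo{height}.dat"
            if undo_path.exists():
                undo_path.unlink()
            pruned.append(height)
    return sorted(pruned)

with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    for h in range(10):
        (root / f"blk{h}.dat").write_bytes(b"block")
        (root / f"undo{h}.dat").write_bytes(b"undo")
    removed = prune(root, tip_height=9, keep_depth=3)
    remaining = sorted(p.name for p in root.iterdir())

print(removed)  # [0, 1, 2, 3, 4, 5, 6]
```

Note that coins.dat and the block index are untouched: pruning only gives up the ability to rescan, reorg past, or serve the deleted blocks.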

So, in summary, it allows one to run a full node (now) with:
* 130 MB coins.dat
* 40 MB chain.dat
* the size of the retained blocks
  * plus roughly 12% of that for undo information
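The summary above is simple enough to turn into a back-of-the-envelope estimate. The defaults below are just the figures quoted in this mail (130 MB, 40 MB, about 12%); the function name and the 1000 MB input are hypothetical, and real sizes will drift as the chain grows.

```python
def estimate_disk_mb(retained_blocks_mb: float,
                     coins_mb: float = 130.0,   # coins.dat, figure from this mail
                     index_mb: float = 40.0,    # chain.dat block index
                     undo_ratio: float = 0.12   # undo data, ~12% of retained blocks
                     ) -> float:
    """Rough disk usage: fixed coin set and index, plus retained blocks and undo."""
    undo_mb = retained_blocks_mb * undo_ratio
    return coins_mb + index_mb + retained_blocks_mb + undo_mb

print(f"{estimate_disk_mb(1000):.0f} MB")  # 1290 MB
```

With pruning, retained_blocks_mb would cover only the last few thousand blocks, so the fixed 170 MB would dominate.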

Oh, it's also faster. A full import of the blockchain (187800 blocks) 
took 22 minutes on my laptop (2.2 GHz i7, 8 GiB RAM, 640 GB spinning 
hard disk). That was from a local disk, not from the network (which has 
extra overhead and is limited by bandwidth constraints).
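For scale, that is an average of roughly 142 blocks per second over the whole import. This is plain arithmetic on the two figures above, not a separately measured rate:

```latex
\frac{187\,800\ \text{blocks}}{22 \times 60\ \text{s}}
  = \frac{187\,800}{1\,320}\ \text{blocks/s}
  \approx 142\ \text{blocks/s}
```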

If people want to experiment with it, see my "ultraprune" branch on 
github: https://github.com/sipa/bitcoin/tree/ultraprune

Note that this is experimental, and has some disadvantages:

* you cannot find a (full) transaction from just its txid. 
* if you have transactions that depend on unconfirmed transactions, 
  those will not get rebroadcast
* only block download and reorganization are somewhat tested; use at 
  your own risk
* fewer consistency checks are possible on the database, and even fewer 
  are implemented

Also note that this is not directly related to the recent pruning 
proposals that use an alt chain with an index of unspent coins (and 
addresses), merge-mined with the main chain. This could be a step 
towards such a system, however.

-- 
Pieter