
Received: from sog-mx-2.v43.ch3.sourceforge.com ([172.29.43.192]
	helo=mx.sourceforge.net)
	by sfs-ml-1.v29.ch3.sourceforge.com with esmtp (Exim 4.76)
	(envelope-from <gustav.simonsson@gmail.com>) id 1WMMw0-0003Gu-03
	for bitcoin-development@lists.sourceforge.net;
	Sat, 08 Mar 2014 19:29:24 +0000
Received-SPF: pass (sog-mx-2.v43.ch3.sourceforge.com: domain of gmail.com
	designates 209.85.213.180 as permitted sender)
	client-ip=209.85.213.180;
	envelope-from=gustav.simonsson@gmail.com;
	helo=mail-ig0-f180.google.com; 
Received: from mail-ig0-f180.google.com ([209.85.213.180])
	by sog-mx-2.v43.ch3.sourceforge.com with esmtps (TLSv1:RC4-SHA:128)
	(Exim 4.76) id 1WMMvw-0003tX-VL
	for bitcoin-development@lists.sourceforge.net;
	Sat, 08 Mar 2014 19:29:23 +0000
Received: by mail-ig0-f180.google.com with SMTP id hl1so4890119igb.1
	for <bitcoin-development@lists.sourceforge.net>;
	Sat, 08 Mar 2014 11:29:15 -0800 (PST)
MIME-Version: 1.0
X-Received: by 10.50.66.129 with SMTP id f1mr9791767igt.26.1394306955603; Sat,
	08 Mar 2014 11:29:15 -0800 (PST)
Received: by 10.64.32.10 with HTTP; Sat, 8 Mar 2014 11:29:15 -0800 (PST)
In-Reply-To: <CANEZrP1+=JY0RGEMvm9iL09L-tZAWqsSOOwFaroYUKkWumx+xg@mail.gmail.com>
References: <CANEZrP25N7W_MeZin_pyVQP5pC8bt5yqJzTXt_tN1P6kWb5i2w@mail.gmail.com>
	<0720C223-E9DD-4E76-AD6F-0308CA5B5289@gmail.com>
	<CAAS2fgTGDzPFDP=ii08VXcXYpWr2akYWxqJCNHW-ABuN=ESc8A@mail.gmail.com>
	<7E50E1D6-3A9F-419B-B01E-50C6DE044E0F@gmail.com>
	<CAAS2fgScLKgq8_V0oVpvP1gYAKxiyVNGVWA86XfecSmPqsMKUg@mail.gmail.com>
	<CANEZrP1+=JY0RGEMvm9iL09L-tZAWqsSOOwFaroYUKkWumx+xg@mail.gmail.com>
Date: Sat, 8 Mar 2014 20:29:15 +0100
Message-ID: <CANeYco-Zno1xAETTFoYA12K2TAqJN9u+ttEEuNgATjcLuUek+w@mail.gmail.com>
From: Gustav Simonsson <gustav.simonsson@gmail.com>
To: Mike Hearn <mike@plan99.net>
Content-Type: multipart/alternative; boundary=047d7bdc0cf4cbf37904f41d628e
X-Spam-Score: -0.6 (/)
X-Spam-Report: Spam Filtering performed by mx.sourceforge.net.
	See http://spamassassin.org/tag/ for more details.
	-1.5 SPF_CHECK_PASS SPF reports sender host as permitted sender for
	sender-domain
	0.0 FREEMAIL_FROM Sender email is commonly abused enduser mail provider
	(gustav.simonsson[at]gmail.com)
	-0.0 SPF_PASS               SPF: sender matches SPF record
	1.0 HTML_MESSAGE           BODY: HTML included in message
	-0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from
	author's domain
	0.1 DKIM_SIGNED            Message has a DKIM or DK signature,
	not necessarily valid
	-0.1 DKIM_VALID Message has at least one valid DKIM or DK signature
X-Headers-End: 1WMMvw-0003tX-VL
Cc: Bitcoin Development <bitcoin-development@lists.sourceforge.net>
Subject: Re: [Bitcoin-development] New side channel attack that can recover
 Bitcoin keys
X-BeenThere: bitcoin-development@lists.sourceforge.net
X-Mailman-Version: 2.1.9
Precedence: list
List-Id: <bitcoin-development.lists.sourceforge.net>
List-Unsubscribe: <https://lists.sourceforge.net/lists/listinfo/bitcoin-development>,
	<mailto:bitcoin-development-request@lists.sourceforge.net?subject=unsubscribe>
List-Archive: <http://sourceforge.net/mailarchive/forum.php?forum_name=bitcoin-development>
List-Post: <mailto:bitcoin-development@lists.sourceforge.net>
List-Help: <mailto:bitcoin-development-request@lists.sourceforge.net?subject=help>
List-Subscribe: <https://lists.sourceforge.net/lists/listinfo/bitcoin-development>,
	<mailto:bitcoin-development-request@lists.sourceforge.net?subject=subscribe>
X-List-Received-Date: Sat, 08 Mar 2014 19:29:24 -0000

--047d7bdc0cf4cbf37904f41d628e
Content-Type: text/plain; charset=ISO-8859-1

While there is no mention of virtualization in the side-channel article,
the FLUSH+RELOAD paper [1] does discuss it and claims that the attack,
which relies on the clflush instruction, works not only against processes
on the same OS, but also against processes in a separate guest OS if
executed on the host OS (type 2 hypervisor) [2]. It also works if executed
from within another guest OS (though that reduces the efficiency of the
attack) [3].

Both the authors [4] and Vulnerability Note VU#976534 [5] claim that
disabling hypervisor memory page de-duplication prevents the attack. This
could perhaps be a first step for bitcoin companies running their software
on shared hosts: demand that their instances be allocated on hosts with
de-duplication disabled. The question is how common it is for
virtualization providers to offer that as an option.
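
On a Linux/KVM host, page de-duplication is usually done by Kernel Samepage
Merging (KSM), whose state is exposed in /sys/kernel/mm/ksm/run. A rough
sketch of how a host operator might check it follows. It assumes a Linux
host (it tells you nothing from inside a guest, and other hypervisors use
different mechanisms), and the function name is made up for illustration.

```python
from pathlib import Path

# Linux sysfs file: 0 = ksmd stopped (already merged pages stay merged),
# 1 = ksmd running, 2 = stop ksmd and unmerge all merged pages.
KSM_RUN = "/sys/kernel/mm/ksm/run"

def ksm_merging_active(path=KSM_RUN):
    """Return True/False for the KSM run state, or None if it cannot be read."""
    try:
        state = Path(path).read_text().strip()
    except OSError:
        return None  # no KSM support, or not a Linux host
    return state == "1"

if __name__ == "__main__":
    print("KSM merging active:", ksm_merging_active())
```

Note that a state of 0 only stops further merging, so writing 2 is what
actually undoes existing sharing.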

TRESOR is only applicable if running in a non-virtualized OS [6].

While TRESOR only implements AES, it seems it could work for ECDSA as well,
as they use the four x86 debug registers to hold a 256-bit privkey [7] for
the entire machine uptime, and then use other registers when doing the
actual AES ops. They use the Intel AES-NI instruction set though, and since
there is no corresponding instruction set for EC, extra work would be
required to manually implement the EC math in assembler.
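
To make the size argument concrete: on x86-64 the four breakpoint address
registers dr0-dr3 are 64 bits each, so together they hold exactly
4 * 64 = 256 bits, the size of a secp256k1 private key. A small sketch of
the packing follows. It is plain Python just to show the arithmetic; the
word order is an arbitrary choice of mine, the function names are made up,
and real code would move the words into the registers from kernel-mode
assembler.

```python
WORD_BITS = 64
WORD_MASK = (1 << WORD_BITS) - 1

def split_key(key):
    """Split a 32-byte key into four 64-bit words (index 0 = least significant)."""
    if len(key) != 32:
        raise ValueError("expected a 256-bit (32-byte) key")
    value = int.from_bytes(key, "big")
    return [(value >> (WORD_BITS * i)) & WORD_MASK for i in range(4)]

def join_key(words):
    """Inverse of split_key: rebuild the 32-byte key from four 64-bit words."""
    value = 0
    for i, word in enumerate(words):
        value |= (word & WORD_MASK) << (WORD_BITS * i)
    return value.to_bytes(32, "big")
```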

They actually do what Mike Hearn mentioned and disable preemption in Linux
(their code runs in kernel space; they patched the kernel) to ensure
atomicity. Not only do they manage to protect against memory attacks (and
RAM/cache timing attacks) from other processes running on the same host,
but even from root on the same host (from userland, the debug registers are
only accessible through ptrace, which they patched, and they also disabled
LKM & KMEM).

One could imagine different levels of TRESOR-like ECDSA with different
tradeoffs of complexity vs security. For example, if one is fine with
keeping the privkey(s) in RAM but wants to avoid cache timing attacks, the
signing could be implemented as a userspace program holding the key(s) in
RAM together with a kernel module providing a syscall for signing. Signing
is then run with preemption disabled, using only x86 registers for
intermediate data and then using e.g. movntps [8] to write to RAM without
the data being cached. The benefit of this compared with the full TRESOR
approach is that it would not require a patched kernel, only a kernel
module. It would also be simpler to implement than keeping the privkey in
the debug registers for the entire machine uptime, especially if multiple
privkeys are used. It would not protect against root though, since an
adversary getting root could load their own kernel module and read the
registers.

To handle multiple keys (maybe as one-time-use) and get full TRESOR
benefits, one could perhaps (with the original TRESOR approach, i.e. with a
patched kernel) store a BIP 0032 starting string / seed + counter in the
debug registers and have the atomic kernel code generate new keys and do
the signing.
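
As a rough illustration of the seed + counter idea, here is BIP 0032 master
key generation followed by one hardened child derivation per counter value,
using only HMAC-SHA512 and integer arithmetic. This is plain Python to show
the data flow; in the scheme above it would have to run inside the atomic
kernel code using registers only. Deriving directly from the master key
with the counter as a hardened index is my own simplification, and the
function names are made up.

```python
import hashlib
import hmac

# Order of the secp256k1 group
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
HARDENED = 0x80000000  # BIP 0032 hardened index offset (2**31)

def master_key(seed):
    """BIP 0032 master key generation: returns (private key int, chain code)."""
    digest = hmac.new(b"Bitcoin seed", seed, hashlib.sha512).digest()
    key = int.from_bytes(digest[:32], "big")
    if key == 0 or key >= N:
        raise ValueError("invalid seed, pick another one")
    return key, digest[32:]

def hardened_child(parent_key, chain_code, index):
    """BIP 0032 hardened private child derivation for index + 2**31."""
    data = (b"\x00" + parent_key.to_bytes(32, "big")
            + (HARDENED + index).to_bytes(4, "big"))
    digest = hmac.new(chain_code, data, hashlib.sha512).digest()
    tweak = int.from_bytes(digest[:32], "big")
    child = (tweak + parent_key) % N
    if tweak >= N or child == 0:
        raise ValueError("invalid child, skip to the next index")
    return child, digest[32:]

def key_for_counter(seed, counter):
    """Derive the signing key for one counter value from the seed alone."""
    key, chain_code = master_key(seed)
    child, _ = hardened_child(key, chain_code, counter)
    return child
```

Only the seed and the counter need to persist between signatures; the
master key, chain code and child key are recomputed each time.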

Cheers,
Gustav Simonsson

1. http://eprint.iacr.org/2013/448.pdf
2. page 1 of [1]
3. page 5 of [1]
4. page 8 (end of conclusions section) of [1]
5. http://www.kb.cert.org/vuls/id/976534
6. page 8, "3.2 Hardware compatibility",
https://www.usenix.org/legacy/event/sec11/tech/full_papers/Muller.pdf
7. page 3, "2.2 Key Management" of [6]
8. page 1041 of
http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf


On Thu, Mar 6, 2014 at 9:38 AM, Mike Hearn <mike@plan99.net> wrote:

> I'm wondering about whether (don't laugh) moving signing into the kernel
> and then using the MTRRs to disable caching entirely for a small scratch
> region of memory would also work. You could then disable pre-emption and
> prevent anything on the same core from interrupting or timing the signing
> operation.
>
> However I suspect just making a hardened secp256k1 signer implementation
> in userspace would be of similar difficulty, in which case it would
> naturally be preferable.
>
>
> On Wed, Mar 5, 2014 at 11:25 PM, Gregory Maxwell <gmaxwell@gmail.com> wrote:
>
>> On Wed, Mar 5, 2014 at 2:14 PM, Eric Lombrozo <elombrozo@gmail.com>
>> wrote:
>> > Everything you say is true.
>> >
>> > However, branchless does reduce the attack surface considerably - if
>> > nothing else, it significantly ups the difficulty of an attack for a
>> > relatively low cost in program complexity, and that might still make it
>> > worth doing.
>>
>> Absolutely. I believe these things are worth doing.
>>
>> My comment on it being insufficient was only that "my signer is
>> branchless" doesn't make other defense measures (avoiding reuse,
>> multisig with multiple devices, not sharing hardware, etc.)
>> unimportant.
>>
>> > As for uniform memory access, if we avoided any kind of heap
>> > allocation, wouldn't we avoid such issues?
>>
>> No. At a minimum, to hide a memory timing side-channel you must perform
>> no data-dependent loads (e.g. no operation where an offset into memory
>> is calculated). A strategy for this is to always load the same values,
>> but then mask out the ones you didn't intend to read... even that I'd
>> worry about on sufficiently advanced hardware, since I would very much
>> not be surprised if the processor was able to determine that the load
>> had no effect and eliminate it! :)
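
The "always load the same values, then mask" strategy quoted above can be
sketched roughly as follows. Python gives no timing guarantees at all, so
this only shows the access pattern and the branch-free mask arithmetic; a
real signer would do this in assembler or in C whose compiled output has
been checked. The function name and the 256-bit default width are my own
choices.

```python
def select_masked(table, secret_index, width=256):
    """Return table[secret_index], reading every entry and masking the rest.

    Entries must be non-negative integers below 2**width.
    """
    all_ones = (1 << width) - 1
    result = 0
    for i, value in enumerate(table):
        diff = i ^ secret_index
        # nonzero is 1 if diff != 0, else 0 (valid for 0 <= diff < 2**63)
        nonzero = ((diff | -diff) >> 63) & 1
        # mask is all ones when i == secret_index, otherwise zero
        mask = (nonzero - 1) & all_ones
        result |= value & mask
    return result
```

Every entry is read on every call regardless of secret_index, and the index
is never used as a memory offset, which is the property the quoted text
asks for.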
>>
>> Maybe in practice if your data dependencies end up only picking around
>> in the same cache-line it doesn't actually matter... but it's hard to
>> be sure, and unclear when a future optimization in the rest of the
>> system might leave it exposed again.
>>
>> (In particular, you can't generally write timing side-channel immune
>> code in C (or other high level language) because the compiler is
>> freely permitted to optimize things in a way that breaks the property.
>> ... It may be _unlikely_ for it to do this, but it's permitted, and it
>> will actually do so in some cases, so you cannot be completely sure
>> unless you check and freeze the toolchain.)
>>
>> > Anyhow, without having gone into the full details of this particular
>> > attack, it seems the main attack point is differences in how squaring and
>> > multiplication (in the case of field exponentiation) or doubling and point
>> > addition (in the case of ECDSA) are performed. I believe using a branchless
>> > implementation where each phase of the operation executes the exact same
>> > code and accesses the exact same stack frames would not be vulnerable to
>> > FLUSH+RELOAD.
>>
>> I wouldn't be surprised.
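
One well-known way to get "each phase executes the exact same code" is a
Montgomery ladder with a branch-free conditional swap. The sketch below
uses modular exponentiation for brevity; for ECDSA the multiply would
become a point addition and the square a point doubling. It is not taken
from any implementation discussed in this thread, and since Python big
integer arithmetic is not constant time it shows only the control flow.
The function names are made up.

```python
def cswap(a, b, bit):
    """Swap a and b when bit == 1, using masks instead of a branch."""
    mask = -bit              # 0 when bit == 0, all ones (-1) when bit == 1
    t = (a ^ b) & mask
    return a ^ t, b ^ t

def ladder_pow(base, exponent, modulus, bits=256):
    """Compute base**exponent % modulus with a fixed operation sequence."""
    r0, r1 = 1 % modulus, base % modulus
    for i in reversed(range(bits)):
        bit = (exponent >> i) & 1
        r0, r1 = cswap(r0, r1, bit)
        # Exactly one multiply and one square per bit, whatever its value
        r1 = (r0 * r1) % modulus
        r0 = (r0 * r0) % modulus
        r0, r1 = cswap(r0, r1, bit)
    return r0
```

The loop always runs for the full bit length, so neither the number of
iterations nor the order of operations depends on the exponent.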
>>
>>
>> ------------------------------------------------------------------------------
>> Subversion Kills Productivity. Get off Subversion & Make the Move to
>> Perforce.
>> With Perforce, you get hassle-free workflows. Merge that actually works.
>> Faster operations. Version large binaries.  Built-in WAN optimization and
>> the
>> freedom to use Git, Perforce or both. Make the move to Perforce.
>>
>> http://pubads.g.doubleclick.net/gampad/clk?id=122218951&iu=/4140/ostg.clktrk
>> _______________________________________________
>> Bitcoin-development mailing list
>> Bitcoin-development@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/bitcoin-development
>>

--047d7bdc0cf4cbf37904f41d628e--