From: Allen Piscitello
To: Brooks Boyd
Cc: Bitcoin Development
Date: Fri, 1 Nov 2013 18:41:53 -0500
Subject: Re: [Bitcoin-development] BIP39 word list

The problem with this is that you might have word A which is similar to
B, but B is also similar to C. So we scrub B from the list, someone
enters B, and we have no way to know if it means A or C. It leads to a
much more complicated scheme to ensure that all errors are correctable.

Scrubbing A, B, and C is preferable, since it leads to no ambiguity and
there is no need to try to correct an error.

On Fri, Nov 1, 2013 at 3:14 PM, Brooks Boyd wrote:

> I was inspired to join the mailing list to comment on some of these
> discussions about BIP39, which I think will have great use in the Bitcoin
> community and outside it as a way to transcribe binary data.
>
> The one thought I had as the discussions about similar characters are
> resulting in culling words from the list, is that it only helps to validate
> input, not help the user if it is incorrect.
>
> For example, if both "cat" and "eat" were in the word list, and someone
> wrote down "eat", but later mis-translated it and put "cat" back into
> translator, the result would be a checksum error; "cat" is a different
> number, so the checksum would fail.
>
> As it currently stands, "cat" would not be a valid word ("eat" is the real
> word, and no other number is "cat"), so the translator can throw a
> different error which is more helpful (i.e. "'cat' isn't a valid word
> choice), but still doesn't get the user to the proper translation.
>
> What about if the wordlist included those "words that are so similar to
> each other that we only kept one of them" and had them all refer to the
> same number? I propose the wordlist have the possibility of multiple words
> on a single line, with the first word on the line being the "primary" or
> "real" word to be used, with the other similar words be included so that a
> translation program if it wanted to assist the user could fix their input
> for them (verbosely or not), along the lines of "'cat' isn't a valid word
> choice; assuming you meant 'eat', which is valid". You might still hit a
> checksum error if that similar word is still the wrong word, but as it
> stands now, I know you culled a bunch of words from the wordlist as "too
> similar", but if I want to try and help the user fix a bad input, I need to
> write a translation program with a full english dictionary alongside the
> BIP39 dictionary.
>
> I'd be willing to create a pull request for such an update, but before I
> delve into that, does this sound like a good idea? I could see it devolving
> into a slippery slope if every number in the 2048 set had a dozen word
> variations (misspellings, similar words, slang terms for the real word,
> etc.) which could get confusing of how similar is similar enough to be
> added as an alternate, and the standard would need to be clear that when
> translating binary to words, you only use the "main" word for that row, not
> any of the variations.
>
> MidnightLightning
>
> > I've just pushed updated wordlist which is filtered to similar
> > characters taken from this matrix.
> >
> > BIP39 now consider following character pairs as similar:
> >
> >         similar = (
> >             ('a', 'c'), ('a', 'e'), ('a', 'o'),
> >             ('b', 'd'), ('b', 'h'), ('b', 'p'), ('b', 'q'), ('b', 'r'),
> >             ('c', 'e'), ('c', 'g'), ('c', 'n'), ('c', 'o'), ('c', 'q'), ('c', 'u'),
> >             ('d', 'g'), ('d', 'h'), ('d', 'o'), ('d', 'p'), ('d', 'q'),
> >             ('e', 'f'), ('e', 'o'),
> >             ('f', 'i'), ('f', 'j'), ('f', 'l'), ('f', 'p'), ('f', 't'),
> >             ('g', 'j'), ('g', 'o'), ('g', 'p'), ('g', 'q'), ('g', 'y'),
> >             ('h', 'k'), ('h', 'l'), ('h', 'm'), ('h', 'n'), ('h', 'r'),
> >             ('i', 'j'), ('i', 'l'), ('i', 't'), ('i', 'y'),
> >             ('j', 'l'), ('j', 'p'), ('j', 'q'), ('j', 'y'),
> >             ('k', 'x'),
> >             ('l', 't'),
> >             ('m', 'n'), ('m', 'w'),
> >             ('n', 'u'), ('n', 'z'),
> >             ('o', 'p'), ('o', 'q'), ('o', 'u'), ('o', 'v'),
> >             ('p', 'q'), ('p', 'r'),
> >             ('q', 'y'),
> >             ('s', 'z'),
> >             ('u', 'v'), ('u', 'w'), ('u', 'y'),
> >             ('v', 'w'), ('v', 'y')
> >         )
> >
> > Feel free to review and comment current wordlist, but I think we're
> > slowly moving forward final list.
> >
> > slush
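The ambiguity described in the reply can be made concrete with a small
Python sketch. It assumes, for illustration only, that two words count as
similar when they have the same length and differ in exactly one position
by a pair from slush's matrix; the thread does not spell out the exact
rule. The helper names (words_similar, candidates, scrub_all_similar) and
the sample words are hypothetical and are not taken from the BIP39 list.

```python
# Similar character pairs, copied from slush's quoted message.
SIMILAR = (
    ('a', 'c'), ('a', 'e'), ('a', 'o'),
    ('b', 'd'), ('b', 'h'), ('b', 'p'), ('b', 'q'), ('b', 'r'),
    ('c', 'e'), ('c', 'g'), ('c', 'n'), ('c', 'o'), ('c', 'q'), ('c', 'u'),
    ('d', 'g'), ('d', 'h'), ('d', 'o'), ('d', 'p'), ('d', 'q'),
    ('e', 'f'), ('e', 'o'),
    ('f', 'i'), ('f', 'j'), ('f', 'l'), ('f', 'p'), ('f', 't'),
    ('g', 'j'), ('g', 'o'), ('g', 'p'), ('g', 'q'), ('g', 'y'),
    ('h', 'k'), ('h', 'l'), ('h', 'm'), ('h', 'n'), ('h', 'r'),
    ('i', 'j'), ('i', 'l'), ('i', 't'), ('i', 'y'),
    ('j', 'l'), ('j', 'p'), ('j', 'q'), ('j', 'y'),
    ('k', 'x'),
    ('l', 't'),
    ('m', 'n'), ('m', 'w'),
    ('n', 'u'), ('n', 'z'),
    ('o', 'p'), ('o', 'q'), ('o', 'u'), ('o', 'v'),
    ('p', 'q'), ('p', 'r'),
    ('q', 'y'),
    ('s', 'z'),
    ('u', 'v'), ('u', 'w'), ('u', 'y'),
    ('v', 'w'), ('v', 'y'),
)

# Order-independent lookup: ('a', 'c') and ('c', 'a') are the same pair.
_SIMILAR_SET = {frozenset(pair) for pair in SIMILAR}


def words_similar(w1, w2):
    """Assumed rule: same length, exactly one differing position,
    and the two differing characters form a listed similar pair."""
    if len(w1) != len(w2) or w1 == w2:
        return False
    diffs = [(a, b) for a, b in zip(w1, w2) if a != b]
    return len(diffs) == 1 and frozenset(diffs[0]) in _SIMILAR_SET


def candidates(entered, wordlist):
    """Words in the list that an invalid entry could be 'corrected' to."""
    return sorted(w for w in wordlist if words_similar(entered, w))


def scrub_all_similar(words):
    """Drop every word that is similar to any other word (A, B and C all go)."""
    return [w for w in words if not any(words_similar(w, o) for o in words)]


# Hypothetical list where "cat" (B) was scrubbed but "eat" (A) and "cot" (C) stayed.
kept = ["eat", "cot"]
print(candidates("cat", kept))           # ['cot', 'eat']: two possible corrections
print(words_similar("eat", "cot"))       # False: A and C are not similar to each other
print(scrub_all_similar(["cat", "eat", "cot", "dog"]))  # ['dog']
```

Here "cat" is similar to both "eat" and "cot", while "eat" and "cot" differ
in two positions and so are not similar to each other. Listing "cat" as an
alternate for one of them would silently pick a side, which is the case
the reply argues against; scrubbing all three leaves nothing to guess.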