From: jan <jan.marecek@gmail.com>
To: bitcoin-development@lists.sourceforge.net
Date: Sat, 19 Oct 2013 10:52:58 +1100
Message-ID: <87iowuuof9.fsf@gmail.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
Subject: [Bitcoin-development] BIP39 word list


The words 'public', 'private' and 'secret' could be confusing when
encoding public and private keys, e.g. a private key whose encoding
begins with the word 'public'.

I think avoiding words that could look similar when written down would
be a good idea as well. I searched for pairs of words that differ only
in the letters c & e, g & y, or u & v and found the following:

car ear
cat eat
gear year
value valve

Other letter pairs could also be problematic, depending on the
handwriting style: f & t, a & o, i & j, v & y, and possibly even l & t
and i & l.

I've included the search utility I used below.


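#include <stdbool.h>
#include <string.h>
#include <stdio.h>

/* pairs of letters that can be confused in handwriting;
   the list is terminated by NULL */
static const char *similar_char_pairs[] = { "ce", "gy", "uv", NULL };

/* true if c1 and c2 form one of the pairs above, in either order */
bool is_similar_char(char c1, char c2)
{
  const char **pairs;
  for (pairs = similar_char_pairs; *pairs; pairs++) {
    const char *p = *pairs;
    if ((c1 == p[0] && c2 == p[1]) ||
        (c1 == p[1] && c2 == p[0]))
      return true;
  }

  return false;
}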

/* print the two words and return true if they have the same length and
   every position either matches or holds a pair of similar letters */
bool print_words_if_similar(const char *word1, const char *word2)
{
  size_t len = strlen(word1);

  /* reject words of different lengths */
  if (len != strlen(word2))
    return false;

  size_t i, similarcount = 0;

  for (i = 0; i < len; i++) {
    /* skip identical letters */
    if (word1[i] == word2[i])
      continue;

    /* reject words that differ by a letter that is not similar */
    if (!is_similar_char(word1[i], word2[i]))
      return false;

    similarcount++;
  }

  /* identical words are not a match */
  if (similarcount == 0)
    return false;

  /* uncomment to reject words with more than 1 different letter */
  //if (similarcount > 1)
  //  return false;

  printf("%s %s\n", word1, word2);

  return true;
}

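/* a BIP39 word list has exactly 2048 entries; 16 bytes per entry is
   comfortably more than any word in the English list needs */
#define NWORDS 2048
#define MAXWORD 16

int main(void)
{
  /* english.txt is assumed to exist in the working directory;
     download the raw file from:
     https://github.com/trezor/python-mnemonic/blob/master/mnemonic/wordlist/english.txt */
  FILE *f = fopen("english.txt", "r");
  if (!f) {
    fprintf(stderr, "failed to open english.txt\n");
    return 1;
  }

  /* read in word list, assumes one word per line; never read more
     than NWORDS lines so wordlist cannot overflow */
  static char wordlist[NWORDS][MAXWORD];
  int word = 0;
  while (word < NWORDS && fgets(wordlist[word], MAXWORD, f)) {
    /* strip trailing whitespace (including the '\r' of CRLF line
       endings), assumes no leading whitespace */
    wordlist[word][strcspn(wordlist[word], " \r\n\t")] = '\0';
    word++;
  }

  /* any line left after NWORDS words also means the list is wrong */
  char extra[MAXWORD];
  bool leftover = fgets(extra, sizeof extra, f) != NULL;
  fclose(f);

  if (word != NWORDS || leftover) {
    fprintf(stderr, "word list incorrect length\n");
    return 1;
  }

  /* check each word for similarity against every other word */
  int i, j, count = 0;
  for (i = 0; i < NWORDS; i++) {
    for (j = i + 1; j < NWORDS; j++) {
      if (print_words_if_similar(wordlist[i], wordlist[j]))
        count++;
    }
  }

  printf("%d matches\n", count);

  return 0;
}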