Tuesday, September 14, 2010

What is the probability that a pronounceable domain is registered?

I saw someone griping on Hacker News that all the short, pronounceable domains are taken.

I realized that I knew how to quantify this frustration. One, how many pronounceable domains exist for a given number of characters? Two, given a finite answer to One, what is the probability that a domain in this space is registered?

I recently made a tool to find pronounceable domains, partially using Markov techniques. I started with a published frequency table of letter trigrams in English and simply brute-forced my way to find every possible overlapping combination of trigrams that would fit in a given length of characters. I assumed that a domain had to start with a valid starting trigram but could end at the length limit with any trigram. This process produces a set of some real but mostly nonsense words that should be pronounceable to English speakers. All results are limited to dot-com domains.
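The enumeration described above can be sketched roughly as follows. This is a minimal illustration, not the actual tool. It assumes that "overlapping" means every consecutive three-letter window of a name must be a known trigram, so each step adds one letter. The names `pronounceable_words`, `TOY_TRIGRAMS` and `TOY_STARTS` are hypothetical, and the tiny trigram table stands in for the published frequency table.

```python
from typing import Dict, Iterator, List, Set


def pronounceable_words(trigrams: Set[str], starts: Set[str], length: int) -> Iterator[str]:
    """Yield every `length`-letter string that begins with a starting trigram
    and whose every 3-letter window is in `trigrams` (assumed overlap rule)."""
    if length < 3:
        return
    # Index trigrams by their first two letters: prefix -> possible next letters.
    next_letters: Dict[str, List[str]] = {}
    for tri in trigrams:
        next_letters.setdefault(tri[:2], []).append(tri[2])

    # Depth-first search; reverse-sorted pushes make the output alphabetical.
    stack = sorted(starts, reverse=True)
    while stack:
        word = stack.pop()
        if len(word) == length:
            yield word
            continue
        for letter in sorted(next_letters.get(word[-2:], []), reverse=True):
            stack.append(word + letter)


# Toy data for illustration only, not the real trigram table.
TOY_TRIGRAMS = {"ant", "nto", "ton", "top", "anc", "ncl", "cle"}
TOY_STARTS = {"ant", "anc"}

if __name__ == "__main__":
    for name in pronounceable_words(TOY_TRIGRAMS, TOY_STARTS, 5):
        print(name.upper() + ".com")
```

With a real trigram table the search space grows quickly with length, which is why the counts are reported per length.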

The results:

  • For length = 5, there are 265,722 total possible pronounceable domains
  • For length = 6, there are 1,702,669 total possible pronounceable domains
  • For length = 7, there are 10,843,465 total possible pronounceable domains

Next I randomly sampled from these spaces to figure out the percentage of the possible dot-com domains for each length that are already registered. The results:

  • For length = 5, of 481 sampled from 265,722 possible names, 92.52% were registered
  • For length = 6, of 941 sampled from 1,702,669 possible names, 45.48% were registered
  • For length = 7, of 1,906 sampled from 10,843,465 possible names, 12.8% were registered
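Because these percentages come from samples, each carries some sampling error. The sketch below shows one way such an estimate and its rough 95% margin of error could be computed. It uses the normal approximation for a proportion, which is my choice and not something stated in the original analysis. The function names and the `is_registered` callback are hypothetical; the actual registration lookup is left to the caller.

```python
import math
import random
from typing import Callable, Sequence, Tuple


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a sampled proportion p from n draws
    (normal approximation; ignores the finite-population correction)."""
    return z * math.sqrt(p * (1.0 - p) / n)


def estimate_registered(
    names: Sequence[str],
    sample_size: int,
    is_registered: Callable[[str], bool],
    seed: int = 0,
) -> Tuple[float, float]:
    """Sample names without replacement and return (fraction registered, margin).

    `is_registered` is a caller-supplied check (hypothetical here), for
    example a WHOIS or DNS lookup."""
    rng = random.Random(seed)
    sample = rng.sample(list(names), sample_size)
    hits = sum(1 for name in sample if is_registered(name))
    p = hits / sample_size
    return p, margin_of_error(p, sample_size)


if __name__ == "__main__":
    # Margins for the reported figures: (fraction registered, sample size).
    for length, (p, n) in {5: (0.9252, 481), 6: (0.4548, 941), 7: (0.128, 1906)}.items():
        print(f"length {length}: {p:.2%} +/- {margin_of_error(p, n):.2%}")
```

Under this approximation, each reported percentage is uncertain by a few percentage points at most, so the large gaps between the three lengths are not sampling noise.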

So 92.52% of the sampled pronounceable five-letter dot-com domains were registered, compared with 45.48% of six-letter domains and only 12.8% of seven-letter domains. (By the way, I've never been able to find a four-letter domain made up only of letters that is not already registered.) Eyeballing the results, the registered domains did tend to have a better "ring" to them than the unregistered ones, and seven-letter nonsense words often had no ring at all, but I did not attempt to assess domain quality automatically in this analysis.
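The sampled percentages can be turned into rough absolute counts by scaling each space by its unregistered fraction. The short sketch below does that arithmetic with the figures reported above. The names `RESULTS` and `estimated_unregistered` are made up for illustration, and the outputs are extrapolations from the samples, not exact counts.

```python
# Reported figures per length: (total pronounceable names, fraction registered).
RESULTS = {
    5: (265_722, 0.9252),
    6: (1_702_669, 0.4548),
    7: (10_843_465, 0.128),
}


def estimated_unregistered(total: int, registered_fraction: float) -> int:
    """Extrapolate the sampled rate to the whole space of names."""
    return round(total * (1.0 - registered_fraction))


if __name__ == "__main__":
    for length, (total, registered) in RESULTS.items():
        free = estimated_unregistered(total, registered)
        print(f"length {length}: about {free:,} of {total:,} names unregistered")
```

For five letters this comes to about 19,876 unregistered names.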

This should put a number on our frustrations, and maybe a year from now I can run the same script and see how the numbers have changed.

The area representing unregistered pronounceable five-letter domains, an estimated 19,876 names, is barely visible. The chart plots the absolute number of domains on a linear scale. The next chart shows the percentage available per length:

To give you an idea of what the output of this process looks like, I reproduced some output for each character length below.

len = 5

ANTON.com registered
ANTOP.com registered
ANCLE.com registered
ANCRA.com registered
ANCOM.com registered
ANGLA.com registered
ANGRO.com registered
ANSED.com registered
ANSEC.com registered
ANNER.com registered
ANORT.com registered
ANORD.com registered
ANORR.com available
ANOME.com registered
ANONA.com registered
ANONO.com registered
ANEDI.com registered
ANEST.com registered
ANEEN.com registered
ANECA.com registered
ANETI.com registered

len = 6

ANDENO.com available
ANDEMA.com registered
ANDITI.com available
ANDARN.com available
ANTERR.com available
ANTESS.com available
ANCESI.com available
ANCERO.com registered
ANCLEM.com available
ANGLEG.com registered
ANSTON.com registered
ANSENI.com available
ANICER.com registered
ANNICE.com registered
ANNOWI.com available
ANORNE.com available
ANONES.com available
ANECON.com registered
ANETAC.com registered
ANELOC.com available
ANELDE.com available

len = 7

ANDEAST.com available
ANTORRI.com available
ANTOWEA.com available
ANTARRE.com available
ANTABIT.com available
ANTHARI.com available
ANTHIRE.com registered
ANTUTHE.com available
ANCEDEP.com available
ANCHATH.com available
ANCHITA.com available
ANCHREL.com available
ANCLEAK.com available
ANCRIME.com available
ANCROAT.com available
ANGERRO.com available
ANGEDIT.com available
ANGICIE.com available
ANGRANO.com available
ANSITUT.com available
ANSIMPE.com available