Thursday, September 16, 2010

Get that 4 letter domain you've always wanted: include one digit

In our last experiment we attempted to estimate the proportion of pronounceable 5, 6 and 7-letter domains that are already registered.

But what about 4 letter domains?

I wrote a series of random domain generators to test different character distributions of 4 letter domains. (For all experiments n=500 and p=~0.0, chi-square test; all domains are dot-com.)

First, I assumed that pronounceability would not be a factor, and generated 500 domains consisting of 4 random letters. The results were what I expected:

  • random a-z : 100%

All 500 random domains that I generated were registered. I let it go for a half-hour or so and generated thousands of random domains and was not able to find one that was unregistered.
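A generator for this first experiment could look something like the minimal Python sketch below. The name random_letter_domain and the fixed seed are illustrative assumptions, not the code actually used, and the registration check itself is left out because the post does not say how it was done.

```python
import random
import string

def random_letter_domain(rng, length=4):
    """Return a dot-com name made of `length` random letters a-z (hypothetical helper)."""
    label = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
    return label + ".com"

rng = random.Random(0)  # fixed seed (an assumption) so the sample is repeatable
sample = [random_letter_domain(rng) for _ in range(500)]  # n=500, as in the experiments
print(sample[:5])
```

Each name in sample would then be passed to whatever registration lookup you prefer.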

So, all 4 letter .com's are registered then?

Of course not. The secret? Digits!

Domains can also contain characters in the range 0-9. So, I tested a second domain generator that would produce random 4 character domains consisting of one digit in any position and 3 other characters that could be digits or letters chosen randomly from the set [a-z, 0-9]. 

The result:

  • random a-z, 0-9 with at least one digit: 22.4%

Only 22.4% were registered! So, if you want a 4 letter domain, use a digit.
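As a rough sketch of the second generator as described (one guaranteed digit in a random position, plus 3 characters drawn from a-z and 0-9), something like the following Python would do. The function name domain_with_digit is hypothetical, and this is one reading of the description rather than the original code.

```python
import random
import string

ALNUM = string.ascii_lowercase + string.digits  # the set [a-z, 0-9]

def domain_with_digit(rng):
    """Return a 4-character dot-com name containing at least one digit (hypothetical helper)."""
    chars = [rng.choice(ALNUM) for _ in range(3)]   # 3 characters, letters or digits
    position = rng.randrange(4)                     # slot 0..3 for the guaranteed digit
    chars.insert(position, rng.choice(string.digits))
    return "".join(chars) + ".com"

rng = random.Random(0)  # fixed seed, an assumption for repeatability
sample = [domain_with_digit(rng) for _ in range(500)]
print(sample[:5])
```

Note that this construction is not perfectly uniform over all names with at least one digit: a name with several digits can be produced in more than one way, so it turns up slightly more often.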

This got me thinking. Any of the domains I generated could have from 1-4 digits. What if I controlled the number of digits?

  • 1 digit, 3 letters: 16.2%
  • 2 digits, 2 letters: 24.6%
  • 3 digits, 1 letter: 30.6%
  • 4 digits: 100%

The results surprised me. Apparently the optimal number of digits to include is 1, and the more digits a domain has, the more likely it is to be registered. In fact, it appears that someone has registered every dot-com combination of 4 digits.
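Controlling the digit count takes only a small change to the generator. Here is a minimal Python sketch, assuming the digit positions are picked at random; domain_with_k_digits is an illustrative name, not the original code.

```python
import random
import string

def domain_with_k_digits(rng, k, length=4):
    """Return a dot-com name with exactly k digits and (length - k) letters (hypothetical helper)."""
    digit_positions = set(rng.sample(range(length), k))  # which slots hold digits
    label = "".join(
        rng.choice(string.digits if i in digit_positions else string.ascii_lowercase)
        for i in range(length)
    )
    return label + ".com"

rng = random.Random(0)  # fixed seed, an assumption for repeatability
for k in range(1, 5):
    print(k, [domain_with_k_digits(rng, k) for _ in range(3)])
```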

The story is not over yet, though: what if you include a hyphen? I tried several experiments with 3 letters or digits plus a hyphen to find out.

  • 3 random letters a-z + hyphen: 60.6%
  • 3 random characters a-z, 0-9 + hyphen: 17.4%
  • 3 random digits 0-9 + hyphen: 48%

Including a hyphen does not beat the 1 digit + 3 letter domain space, but the mixed letters-and-digits + hyphen set comes close (17.4% vs. 16.2%). Interestingly, domains in the digits + hyphen set were dramatically more likely to be registered than those in the letters-and-digits + hyphen set.

So, contrary to what you might believe, it turns out there are plenty of available 4 letter domains.

There are 456,976 possible 4 letter combinations of the letters a-z (26^4), and there are 703,040 possible combinations of 3 letters a-z plus one digit 0-9 ((10 * 26^3) * 4). Assuming the 16.2% registered proportion is safe to extrapolate from, the remaining 83.8% works out to about 589,147 unregistered one digit + 3 letter domains, more than the total number of possible 4 letter a-z domains. Popular wisdom suggests that getting a 4 letter dot-com is nearly impossible. These results suggest that is not the case, if you're willing to include a digit.
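The arithmetic is easy to check. The short Python snippet below just redoes the counts from the paragraph above; the variable names are illustrative.

```python
letters_only = 26 ** 4                 # 4 letters a-z
one_digit = 10 * 26 ** 3 * 4           # 3 letters + 1 digit, digit in any of 4 positions

# 16.2% of the sample was registered, so 83.8% is assumed to be unregistered.
# Integer arithmetic avoids floating-point rounding; the result is rounded down.
unregistered = one_digit * (1000 - 162) // 1000

print(letters_only)                    # 456976
print(one_digit)                       # 703040
print(unregistered)                    # 589147
print(unregistered > letters_only)     # True
```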

Finally, I attempted one last experiment: sample the complete space of possible domains, with each character chosen at random from letters, digits and hyphens. The first and last characters are drawn from 36 options and the middle two from 37, because a hyphen cannot start or end a domain. Results:

  • any legal combination of letters, digits and hyphens: 43.0%

Out of the 1,774,224 possible 4 character dot-com domains (36^2 * 37^2), fewer than half are registered.
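For the curious, sampling that full space might look like the following Python sketch. It assumes only the rule stated above (no hyphen in the first or last position); the names EDGE, MIDDLE and random_legal_domain are illustrative, not the original code.

```python
import random
import string

EDGE = string.ascii_lowercase + string.digits  # 36 options for the first and last character
MIDDLE = EDGE + "-"                            # 37 options for the two middle characters

def random_legal_domain(rng):
    """Return a random 4-character dot-com name with no leading or trailing hyphen."""
    label = rng.choice(EDGE) + rng.choice(MIDDLE) + rng.choice(MIDDLE) + rng.choice(EDGE)
    return label + ".com"

total = len(EDGE) ** 2 * len(MIDDLE) ** 2
print(total)  # 1774224

rng = random.Random(0)  # fixed seed, an assumption for repeatability
print([random_legal_domain(rng) for _ in range(5)])
```

Because every position is drawn independently, each of the 1,774,224 names is equally likely to come up.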





Update: Well, this post has generated significant interest. Check out my domain name generator, and here are some useful registrar coupon codes: for GoDaddy use code FALL99, and for Namecheap use BACK2REALITY. Let me know if they work.