Monday, July 25, 2011

Google ranking algorithm no longer secret

Or, if you prefer, Black boxes have hair

A "black box" is something that is known only by its inputs and its outputs; in principle, nothing is known about what is actually inside the box. It's reasonably well known that if you let someone study a sufficiently large set of the box's inputs and outputs, they can work out what is going on inside it. This is the principle behind the belief held by most cryptographers that there is little point in keeping algorithms secret, and now it has happened to Google's search ranking algorithm.

Researchers took a limited number of ranking criteria and a set of search results and fed them to a machine learning algorithm. After a bit of churning, the machine learning algorithm spat out a formula that gives a reasonably close match to Google's actual ranking. They used only 17 factors, while Google actually uses over 200, but they have proved the point that this type of reverse engineering can reveal what's going on inside the black box that is the Googleplex. More Here
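To give a feel for the idea, here is a toy sketch in Python. It is not the researchers' method, which I haven't seen in detail; it swaps in a simple pairwise perceptron and a made-up three-factor "black box". The hidden weights, the function names and the random pages are all assumptions for illustration. The learner only ever sees the order the box returns, never the scores or the weights.

```python
import random
from itertools import combinations

# Hypothetical hidden weights standing in for the black box.
# The learning code below never reads them directly.
_HIDDEN_WEIGHTS = (0.6, 0.3, 0.1)


def black_box_rank(pages):
    """Return pages best-first; only the order is exposed, not the scores."""
    def score(page):
        return sum(w * x for w, x in zip(_HIDDEN_WEIGHTS, page))
    return sorted(pages, key=score, reverse=True)


def make_pages(rng, n, n_factors=3):
    """Make n fake pages, each a tuple of factor values in [0, 1)."""
    return [tuple(rng.random() for _ in range(n_factors)) for _ in range(n)]


def fit_pairwise_perceptron(ranked, epochs=20):
    """Learn weights so that higher-ranked pages get higher scores."""
    w = [0.0] * len(ranked[0])
    for _ in range(epochs):
        # combinations() keeps list order, so hi is ranked above lo.
        for hi, lo in combinations(ranked, 2):
            diff = [a - b for a, b in zip(hi, lo)]
            if sum(wi * di for wi, di in zip(w, diff)) <= 0:
                # The pair is ordered wrongly (or tied): nudge the weights.
                w = [wi + di for wi, di in zip(w, diff)]
    return w


def pairwise_agreement(w, ranked):
    """Fraction of page pairs that the learned weights order the same way."""
    pairs = list(combinations(ranked, 2))
    good = sum(
        1 for hi, lo in pairs
        if sum(wi * (a - b) for wi, a, b in zip(w, hi, lo)) > 0
    )
    return good / len(pairs)


rng = random.Random(0)
train = black_box_rank(make_pages(rng, 40))  # rankings we get to study
test = black_box_rank(make_pages(rng, 40))   # rankings held back for checking
weights = fit_pairwise_perceptron(train)
agreement = pairwise_agreement(weights, test)
print("learned weights:", weights)
print("agreement on unseen pages:", agreement)
```

The agreement figure is the share of unseen page pairs that the learned formula puts in the same order as the box. With a real search engine the factors would be noisy and far more numerous, so expect a much rougher fit than this toy gives.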

I can see others taking their work forward and doing the analysis on many more factors. Maybe they will publish, maybe they won't, but either way the genie's out of the bottle.

It will be interesting to see what Google's reaction is.

Thanks to Leo Kobes for the pointer.

