Friday, February 12, 2010

Chocolate eggs threatened by witches' broom

With Valentine's Day this weekend and Easter just around the corner, I Googled a bit and found that New Scientist reported a year ago that cocoa supplies were threatened by a fungus named witches' broom and a virus called CSSV (cacao swollen shoot virus). Cacao is the plant that grows the beans that we ferment, roast and grind to make cocoa ... the raw stuff of chocolate.

Thursday, February 11, 2010

Site Cloning Experiment

A few months back I had a few of my domains developed into mini-sites, but for various reasons I didn't switch on Adsense monitoring until a few days ago, so I didn't actually know how well or poorly they were doing. It's a silly mistake that I always advise other people against.
Because there are some related keyword pairs I might go after, I've decided not to say publicly exactly which domain name I'm talking about.
While doing some research last night I noticed that one of my better performing mini-sites, which has its name formed as XxxYyy, was receiving a lot of hits for the search term Xxx Zzz. It was on the front page of Google for this search, but near the bottom.
As an experiment I decided to see what would happen if I cloned it to a new domain that is a perfect match for the keyword. Already I'm seeing some things that surprise me.

Day 1


Here's my log and results (All times NZDT):
9-Feb-10 22:20
Registered the domain name
9-Feb-10 22:20-23:00
  • Cloned the site to the new domain on my server.
  • Changed all occurrences of XxxYyy to Xxx and fixed a couple of grammatical errors.
  • Added links to the new site on 3 of my sites.
  • Registered it with Google Analytics and placed the link code in the pages.
9-Feb-10 23:09
.nz zone file reloaded, DNS resolving and site live. Visited the site and checked it out. The Adsense Mediabot spider visited all the pages I visited.
10-Feb-10 07:28
Googlebot visited the site, retrieving robots.txt and / (index.php)
10-Feb-10 09:06
Googlebot returned retrieving all the pages linked from index.php.
10-Feb-10 14:08
Yahoo! Slurp retrieved robots.txt, /, and /default.css
10-Feb-10 18:24
Googlebot returned and re-read / (NB: Did not read robots.txt)
11-Feb-10 00:01
Realising I had missed something, I created a Google Alert for Xxx Zzz

Analysis of day 1.

  • Mediabot is practically real-time (expected).
  • Googlebot found the site fairly quickly. Not sure if this was because of Analytics, Mediabot, or from scanning my other sites. Having read the first page it came back and read the rest fairly promptly.
  • Yahoo! Slurp is none too shabby on the initial read either. It didn't have the extra aid I gave Google, yet found a brand new domain in only 15 hours.
  • Either Slurp didn't like what it found, or is a lot slower than Googlebot in reading the other new pages. I find it a mystery that it gave priority to reading the css file over reading the other php files though.
Based on my experience of adding pages to Tessa's Sweet Expectations site, where it took 8 or 9 days for Google to notice a page added to an existing site, I'm not expecting anything for a few days. I'll check periodically and update this blog with news of how it progresses.
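The delays quoted in the analysis can be checked with a little date arithmetic. This is only a sketch: the timestamps are copied from the day 1 log above, and it assumes the clock starts when the site went live (23:09) rather than when the domain was registered (22:20). The names `site_live`, `first_visits` and `hours_since_live` are made up for this example.

```python
from datetime import datetime

# Format of the timestamps in the day 1 log, e.g. "9-Feb-10 23:09".
FMT = "%d-%b-%y %H:%M"

# Assumption: measure from the moment the site went live, not from registration.
site_live = datetime.strptime("9-Feb-10 23:09", FMT)

# First visit by each search engine spider, copied from the log (all NZDT).
first_visits = {
    "Googlebot": datetime.strptime("10-Feb-10 07:28", FMT),
    "Yahoo! Slurp": datetime.strptime("10-Feb-10 14:08", FMT),
}


def hours_since_live(visit, live=site_live):
    """Return the time between the site going live and a visit, in hours."""
    return (visit - live).total_seconds() / 3600


delays = {bot: hours_since_live(when) for bot, when in first_visits.items()}

for bot, hours in delays.items():
    print(f"{bot}: {hours:.1f} hours after the site went live")
```

On these figures Googlebot arrived a little over 8 hours after the site went live and Slurp just under 15 hours, which is where the "only 15 hours" above comes from.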

Day 2

Very quiet day. The only activity on the site was a return visit by Googlebot at 01:45, which requested only robots.txt (a 404). No sign of Yahoo! Slurp or Microsoft's Bing spider.

Day 3

A couple more spiders visited: ScoutJet and Yandex, with a return visit by Googlebot. In each case they fetched robots.txt and the homepage. Still no sign of Yahoo! Slurp or Microsoft's Bing spider.
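For anyone wanting to keep a similar log, spider visits like these can be picked out of a web server access log by looking at the User-Agent string. The following is a rough sketch, not the method used for this log: it assumes the server writes the common "combined" log format, the token list is an illustration rather than a complete catalogue (in particular, treating "msnbot" or "bingbot" as Bing is an assumption), and the sample lines are made up.

```python
import re

# Substrings that identify each spider in its User-Agent header.
# Illustrative only; this list is an assumption, not a complete catalogue.
CRAWLER_TOKENS = {
    "Mediapartners-Google": "Adsense Mediabot",
    "Googlebot": "Googlebot",
    "Slurp": "Yahoo! Slurp",
    "msnbot": "Bing",
    "bingbot": "Bing",
    "ScoutJet": "ScoutJet",
    "Yandex": "Yandex",
}

# Combined log format: ... "METHOD path protocol" status bytes "referer" "agent"
LINE_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def classify(agent):
    """Return the spider name for a User-Agent string, or None for other visitors."""
    lowered = agent.lower()
    for token, name in CRAWLER_TOKENS.items():
        if token.lower() in lowered:
            return name
    return None


def crawler_hits(lines):
    """Return (spider, path, status) for every log line sent by a known spider."""
    hits = []
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # not a combined-format line
        name = classify(match.group("agent"))
        if name is not None:
            hits.append((name, match.group("path"), int(match.group("status"))))
    return hits


# Made-up sample lines for illustration (not taken from the real log).
SAMPLE = [
    '192.0.2.10 - - [01/Jan/2010:01:45:00 +1300] "GET /robots.txt HTTP/1.1" 404 209 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.20 - - [01/Jan/2010:02:10:00 +1300] "GET / HTTP/1.0" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"',
    '192.0.2.30 - - [01/Jan/2010:02:15:00 +1300] "GET / HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (Windows NT 6.1) Firefox/3.6"',
]

for name, path, status in crawler_hits(SAMPLE):
    print(name, path, status)
```

Matching on substrings keeps the sketch short, but User-Agent strings can be faked, so a hit here only shows that something claiming to be that spider visited.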

Day 4

ScoutJet returned and spidered the site. So far Google and ScoutJet have both spidered the site. ScoutJet's directory isn't publicly usable yet, and Google isn't showing any pages for the site yet.

Day 5

Yandex returned and read the robots.txt (I'm going to stop reporting minor search engines now; they've obviously found the site). Google re-read the index page and robots.txt only. ScoutJet returned and spidered the site. So far Google and ScoutJet have both spidered the site. ScoutJet's directory isn't publicly usable yet, and Google isn't showing any pages for the site yet.

Google sent its first real traffic today (1 visitor), and it was on the search I expected. I need to wait a few more days to check it wasn't just an aberration before declaring success, but for now at least it's looking hopeful.

Interestingly enough, this is only day 5 for a newly registered site, nearly twice as fast as the 9 days it took for a new page on an existing site, the Sweet Expectations Valentine's Day page.

Day 6

Google revisited and re-spidered the site. They also sent another visitor, and I received a notification from Google Alerts that the site exists. The visitor sent by Google came in on a bit of a long tail keyword phrase that it matched up against the privacy page, of all places. Still, the visitor went on to a couple of other pages in the site; not ideal, but better than nothing.

Day 7

Quiet day. Yahoo! Slurp visited today and spidered the site. No sign of Google or any traffic from them, but this doesn't matter as they have obviously decided to accept the site. Interestingly enough, still no sign of Bing. Bing sends traffic to one of the sites where I posted a link to the new site, so it's either not interested in this site or has just given it a low priority for its spider.

As I've partially achieved my objective and want to watch things for a few weeks, I'm going to shut down this log now, with just occasional updates as significant changes occur.


Google was very fast off the mark spidering the site, and added it to the index faster than I expected. Slurp read the front page pretty fast but was quite a sluggard at reading the rest of the site. Bing has been conspicuous by its absence, and I'm wondering if they are sharing their spidering with Slurp already.

Google has also sent traffic a lot sooner than I expected. Given that it's a brand new site (the one I cloned had been registered since 2006 and been in its current form as a mini-site for 3 months) and there are almost no incoming links, I'm surprised it got out of the sandbox at all. The only question in my mind now is: will I get enough traffic to justify the registration fee?