For most of the time I’ve been running this blog it’s been gaining visibility in search engines. Most posts pick up a few searchers in the long tail, and as more posts have been added the search traffic for the blog has increased. That is, until a month or so ago, when I discovered what might be a problem with the way Google indexes WordPress blogs, one that was causing most posts to go into the supplemental index.
As I said this blog was gaining visibility in search results, mainly Google, and then all of a sudden nearly all that search traffic stopped. At first I thought little about it as search results can vary from day to day, which can be especially true for keyphrases in the long tail. It was also possible my posts simply weren’t ranking as well as they had been. Not something I wanted to see, but certainly possible. But as the days went by and it was happening to every post I figured there was a sitewide or blogwide problem.
I’m not particularly aggressive with my optimization, and as near as I could tell I hadn’t done anything that should cause problems. Eventually, as I dug further, I saw that nearly every post here had gone supplemental. When I did a site: search for the site, the results went supplemental after about the first 20 listings. I still didn’t do too much, since I don’t think it’s good to overreact to changes in search. Sometimes things change and need to settle a bit before you can really know the best thing to do. I did continue to investigate.
The Solution To My Supplemental Problems
Late last week I discovered two threads on the WebmasterWorld forum that seemed to deal with the issue. The first, “WordPress and dup content issues,” talks about the potential for having the same post listed under more than one URL. This issue may occur if you post full feeds on your index page and consequently also have that post show up on its own page or on specific category pages. All that content might be seen as duplicate by Googlebot, causing many pages to go supplemental and others to drop completely from Google’s radar.
The solutions are generally one of two options. One is to use a little PHP and WordPress magic to tell Googlebot not to index certain URLs via dynamically generated noindex meta tags. The other is to use 301 redirects in your .htaccess file to get things in order. I’ll let you read through the thread for more details on how to achieve both, as several different methods are offered there.
This didn’t seem like the solution for me, since my posts other than the most recent are all published as partial feeds. While there is some overlap of content, it didn’t seem likely to be enough overlap to cause duplicate content issues.
The other thread I found, “Google indexing /feed URLs,” discusses how Googlebot may index the feeds for posts instead of the posts themselves and place those feeds in the supplemental index. This is the problem I’ve been seeing, as the pages listed as supplemental for this blog all seemed to end in something like /feed/.
The solution offered in the thread was a simple one. Using my robots.txt file, I disallowed Googlebot from URLs ending in /feed/, /feed/rss/, and /trackback/, since I had seen each listed as a supplemental result. Because Googlebot does respond to wildcard characters in the robots.txt file, the rules were trivial to add.
In those rules the asterisk matches any sequence of characters and the dollar sign matches the end of the URL. So each rule tells Googlebot not to crawl URLs ending in /feed/, /feed/rss/, or /trackback/.
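To make the wildcard behavior a little more concrete, here is a minimal Python sketch of how such patterns match. The three patterns are only my assumption of what rules like these look like, based on the description above, and the function names and example paths are made up for illustration. This is not how Googlebot itself is implemented, just a small model of the asterisk and dollar sign rules.

```python
import re

# Assumed patterns, reconstructed from the description in this post.
# Each would appear in robots.txt after "Disallow: " under "User-agent: Googlebot".
ASSUMED_RULES = ["/*/feed/$", "/*/feed/rss/$", "/*/trackback/$"]


def rule_to_regex(rule):
    """Translate a wildcard Disallow pattern into a compiled regex."""
    anchored = rule.endswith("$")          # '$' pins the match to the end of the URL
    body = rule[:-1] if anchored else rule
    # Escape the pattern literally, then turn each escaped '*' into '.*'.
    pattern = re.escape(body).replace(r"\*", ".*")
    return re.compile("^" + pattern + ("$" if anchored else ""))


def is_blocked(path, rules=ASSUMED_RULES):
    """Return True if any rule matches the given URL path."""
    return any(rule_to_regex(rule).match(path) for rule in rules)


if __name__ == "__main__":
    # Hypothetical paths, not real URLs from this blog.
    for path in ["/2007/01/some-post/feed/",
                 "/2007/01/some-post/trackback/",
                 "/2007/01/some-post/"]:
        print(path, is_blocked(path))
```

One thing the sketch makes visible: with patterns written this way, a feed or trackback URL underneath a post is matched while the post itself is not. A top-level /feed/ would also go unmatched, since the pattern expects a path segment before /feed/.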
I added those lines just a few days ago and have already begun to see search traffic from Google coming back to the blog. That same site: search is now showing supplemental results starting at the 38th URL, and more of the actual posts are now being listed in the regular index. I would imagine that before too long the rest of the supplemental results will be gone.
Other Possible Reasons For The Recent Improvement
I have made a few other changes to the blog recently, so there might be other reasons why things are returning to where they were. If you remember, I created a blog sitemap and also experimented with reducing the click distance to some posts by adding them to the main navigation for the blog. The sitemap, too, should reduce the click distance for all the posts here.
While it’s possible either could be the reason for the reduction in supplemental results I’m inclined to believe it was the addition to the robots.txt file that’s remedying things. In part because of the timing, but mostly because I had been seeing the feeds indexed as supplemental.
If you are running a WordPress blog you should read the two threads I mentioned above. It was about six months after starting this blog before the issue occurred, so while it may not be happening to you at the moment, it might at some point. Quite honestly this seems to me a problem with Googlebot, as most blog owners could likely have the problem and it seems to happen with some pretty standard WordPress settings. But Googlebot’s fault or not, there is a solution. Keep a watch on your own blog, WordPress or not, as I would think this issue isn’t specific to WordPress itself. It’s more a problem in the way Googlebot is indexing feeds.
I’ll keep you updated on the progress with the posts here coming out of the supplemental index. Hopefully in the very near future all the supplemental results will be gone.