Archive for September, 2009
This is part 2 of a 3-part explanation of basic canonicalization in SEO. Part 1 defined canonicalization and provided examples of the SEO issues it can create. This post shows how to detect a problem. Part 3 will explain how to fix it.
Detecting canonicalization issues is easier (thank God) than defining it.
I usually apply one or all of the following 3 techniques:
1: Detecting Canonical Issues Using Google Webmaster Tools
This section assumes you've already got a Google Webmaster Tools account, and that your site's verified. If you don't, or it isn't, go do it. I don't care if you think Google is spying on you - they're doing that anyway. You may as well get the benefit of the toolset.
Here's how you check for canonicalization issues:
In Webmaster Tools, click 'Diagnostics', then 'HTML Suggestions', and check for pages with duplicate title tags.
If any are listed, Google's detecting duplicate title tags on your site. Assuming you've used unique title tags on your site, canonicalization is the most likely cause of these duplicates.
For each duplicate title tag, click the '+' sign. If you see two URLs that are really similar, I'll bet my hat you've got a canonicalization issue.
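If the list is long, a script can do the squinting for you. Here's a minimal Python sketch that scores every pair of URLs with the standard library's difflib.SequenceMatcher and keeps the pairs that look nearly identical. The function name `similar_url_pairs` and the 0.85 cutoff are arbitrary assumptions, not anything Webmaster Tools provides:

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar_url_pairs(urls, threshold=0.85):
    """Return (url_a, url_b, score) for every pair at least `threshold` similar.

    The score is SequenceMatcher.ratio() on the raw URL strings:
    1.0 means identical, values near 1.0 mean 'differs by a few characters'.
    """
    pairs = []
    for a, b in combinations(urls, 2):
        score = SequenceMatcher(None, a, b).ratio()
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    return pairs
```

Feed it the URLs listed under one duplicate title; a 'www' and non-'www' version of the same address scores well above the cutoff, while unrelated pages don't.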

Regardless, click each page URL and view the pages. If the content matches, you're in canonicalization purgatory.
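Eyeballing works for a handful of pages. For more, you could save each page's HTML and compare fingerprints. This Python sketch treats two pages as duplicates only if their text matches exactly after stripping tags and collapsing whitespace. The regex tag-stripping is a deliberately crude shortcut, and `content_fingerprint` and `same_content` are hypothetical names:

```python
import hashlib
import re


def content_fingerprint(html):
    """SHA-256 of the page text with tags removed and whitespace collapsed."""
    text = re.sub(r"<[^>]+>", " ", html)           # crude: drop anything tag-shaped
    text = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def same_content(html_a, html_b):
    """True if both pages reduce to the same text fingerprint."""
    return content_fingerprint(html_a) == content_fingerprint(html_b)
```

Because it's an exact match on the reduced text, pages that differ only by a timestamp or a rotating ad won't be flagged; those still need a human look.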

Record the URLs for every instance of canonical chaos (sorry, couldn't resist). Google Webmaster Tools makes this easy: You can just click the handy 'Download this table' link and get a CSV file.
2: Detecting Canonical Issues Using A Link Checker
You can also use a link-checking tool that crawls your site and lists each page's title tag. Again, look for duplicate titles and check each set for canonical confusion.
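If your link checker can't flag duplicate titles on its own, a few lines of Python can. This sketch assumes you've already saved each page's HTML in a dict keyed by URL; `duplicate_titles`, `page_title` and the example URLs are hypothetical. It uses the standard library's html.parser to pull out the first title element and then groups URLs by title:

```python
from collections import defaultdict
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = None
        self._title_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and self.title is None:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.title = "".join(self._title_parts).strip()

    def handle_data(self, data):
        if self.in_title:
            self._title_parts.append(data)


def page_title(html):
    parser = TitleParser()
    parser.feed(html)
    return parser.title


def duplicate_titles(pages):
    """pages maps URL -> HTML; return {title: [urls]} for titles used more than once."""
    by_title = defaultdict(list)
    for url, html in pages.items():
        title = page_title(html)
        if title:
            by_title[title].append(url)
    return {title: urls for title, urls in by_title.items() if len(urls) > 1}
```

Each group it returns is a set of pages to open side by side and check for canonical confusion.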
3: Using Search Results
If you used the same title tag on every page of your site, then title tag reviews won't help. Here's what to do:
- Go to a page of your site.
- Copy one phrase that you think is likely unique. So "Click here for more information" isn't a good candidate. "Curmudgeon's Web Copywriting Site Clinic on Tuesday" is a good one.
- Go to the search engine of your choice, and search for site:www.yoursite.com "your phrase".
- If the search returns several pages containing that phrase, you may have a problem.
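If you'd rather not hand-type the search from the third step, this short Python sketch builds the query and URL-encodes it into a Google search address. Google is just one choice here, the only thing the sketch relies on is the standard `q` query parameter, and www.yoursite.com and the phrase are placeholders:

```python
from urllib.parse import urlencode


def site_phrase_search_url(site, phrase):
    """Build a Google search URL for: site:<site> "<phrase>"."""
    query = f'site:{site} "{phrase}"'
    return "https://www.google.com/search?" + urlencode({"q": query})
```

For example, site_phrase_search_url("www.yoursite.com", "Curmudgeon's Web Copywriting Site Clinic on Tuesday") produces a link you can paste straight into a browser.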
This is a very poor method for detecting canonicalization issues, because it'll detect every single instance of duplication, throughout your site, regardless of the cause. It's also hit-and-miss, because search engines may actually drop some duplicates.
But, if you click 'repeat the search', you'll get a list of duplicate pages. Then you can sift through the list and check for problems.
Another option: Getting fancy
You can also build your own site crawler, if you're a hardcore geek. We did that at Portent years ago, and use it to automatically detect canonical problems.
If you don't live and breathe Perl, PHP or Python, though, I don't recommend it. You'll end up like me.
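For the brave, here's a minimal Python sketch of the idea. It isn't Portent's crawler. It assumes that pages returning byte-identical HTML are duplicates, it follows links on both the 'www' and non-'www' host, and it leaves fetching to a `fetch` function you supply, so `find_duplicate_urls` and `fetch` are both hypothetical names:

```python
import hashlib
from collections import defaultdict, deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def _bare_host(url):
    # Treat 'www' and non-'www' as the same site so both versions get crawled.
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def find_duplicate_urls(start_url, fetch, max_pages=500):
    """Crawl one site breadth-first; return groups of URLs with identical HTML.

    fetch(url) must return the page HTML as a string, or None to skip the URL.
    URLs are deliberately NOT normalized (only #fragments are dropped), because
    the goal is to find distinct addresses that serve the same page.
    """
    site = _bare_host(start_url)
    seen = {start_url}
    queue = deque([start_url])
    by_hash = defaultdict(list)
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        fetched += 1
        by_hash[hashlib.sha256(html.encode("utf-8")).hexdigest()].append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link, _fragment = urldefrag(urljoin(url, href))
            if (urlparse(link).scheme in ("http", "https")
                    and _bare_host(link) == site
                    and link not in seen):
                seen.add(link)
                queue.append(link)
    return [urls for urls in by_hash.values() if len(urls) > 1]
```

For a live site, `fetch` might wrap urllib.request.urlopen(url).read().decode("utf-8", "replace") in a try/except that returns None on errors. Byte-identical matching misses pages that differ only in a timestamp or ad slot, so you may want a looser fingerprint.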
Back to Part 1: Canonicalization defined
or
On to Part 3: How to: Fix canonicalization problems
Ever onward...
Tomorrow, I'll write about fixing canonicalization issues.
Recommended reading:
- SEO 101: Canonicalization (part 1) - part 1 of this series.
- How to: Fix canonicalization issues - part 3 of this series.
- 3 reasons to use rel=canonical, and 4 reasons not to.
- Google Webmaster Tools gets a new coat of paint: A video walkthru by yours truly
This is part one of a 3-part series on canonicalization. Why 3 parts? Because it ended up being too dang long. Parts 2 and 3 will go live tomorrow and Thursday, respectively.
It's a long word, I know, but canonicalization (not to be confused with 'canonization') is at the heart of SEO. Get it right and the world is your oyster. Get it wrong and you're slogging uphill forever.

Canonicalization defined
According to Wikipedia, canonicalization is "a process for converting data that has more than one possible representation into a 'standard' canonical representation".
Okaaaayyyy.
For our purposes, canonicalization means 'having one address and only one address for one page of my web site'.
BTW, it starts with 'can', not 'cann'. There's no such thing as 'cannonical'.

An example
You're still puzzled. I can practically hear your eyebrows knitting together. So here's an example of a canonicalization problem:
I run a blog called Cocoa Heaven, all about chocolate. On it, let's say there's a page about dark chocolate bacon cupcakes (there really is). That page exists at the non-'www' version of its address.
OK, no problem. That address represents the location of my article.
But maybe I link to it from another page on my site using the 'www' version of the same address.
These are two different canonical addresses. We're representing the same information at two different virtual locations.
D'oh. That's a canonicalization problem. Even though you and I know perfectly well they're the same thing, search engines don't.
A search engine comes along, crawls the link from the home page to the non-'www' address, then crawls the 'www' link. It sees two unique web addresses with duplicate content.

The problem
Now the search engine has to decide which page to index. Search engines try to filter out duplicates. This is not a penalty - it's their effort to provide unique, relevant results.
This filtering will create 3 problems for your search engine optimization efforts:
- Diluted link authority. Say two bloggers visit the bacon cupcake article. One finds it at the non-'www' address, by clicking the home page article link. The other happens upon it from the post where I mistakenly used the 'www' address. They each link back to it, but they've used different canonical addresses. Since every link is a vote, your votes have been split. Instead of a single address having 2 votes, each canonical address has 1.
- The content flip. If both canonical addresses have the same authority, you may find they 'flip flop' in the search results, with one address showing up one day and the other showing up the next. I can't prove it but I strongly suspect this hurts your SEO efforts, as your content doesn't 'age'.
- The maintenance nightmare. Your marketing team dutifully interlinks blog posts on your site, but they use both versions of each address. 3 years later, you log in to move some pages around and find you have to chase down both canonical addresses for every page. Annoying.
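To put numbers on the split vote from the bacon cupcake example, here's a toy Python tally. It treats every link as one equal vote, which is a big simplification of how search engines actually weigh links, and the example.com URLs and the `drop_www` rule are hypothetical:

```python
from collections import Counter


def count_votes(inbound_links, canonical_for=None):
    """Count inbound links per address.

    With canonical_for=None, every distinct URL string counts as its own
    address; pass a function to merge variants into one canonical address.
    """
    if canonical_for is not None:
        inbound_links = [canonical_for(u) for u in inbound_links]
    return Counter(inbound_links)


def drop_www(url):
    # Hypothetical rule for this example: the non-'www' address is canonical.
    return url.replace("://www.", "://", 1)
```

With one blogger linking to each version, count_votes gives each address 1 vote; with drop_www applied, the single canonical address gets both.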
In most cases, number 1 is the real crisis. If the canonicalization problem is minor (such as mixing 'www' and non-'www' addresses) then it's all about loss of link authority.
However, other forms of canonicalization issues can throw an entire site structure into flux, and make number 2 into the biggest problem. When that happens, large portions of your site may drop out of the index. I'm talking cats-and-dogs-living-together, find-your-rosary-beads kind of issues.
Types of SEO canonicalization problems
From most to least serious, here are the types of canonical issues I've run into:
- Session IDs. For whatever reason, your site tacks a unique session ID onto every page, so www.mysite.com becomes www.mysite.com?jsessionid=asdf230498q234. This is unique for every visit, so there are infinite canonical addresses for every page on the site. There's Trouble in River City.
- Inconsistent URLs. You have a dynamic site that generates URLs like www.mysite.com?catid=1&subcatid=234&prodid=33. Not a problem, SEO-wise. Unfortunately, you can reach the same product at www.mysite.com?prodid=33 and www.mysite.com?catid=1&prodid=33, and all 3 links are used at random. Not good.
- The blown rewrite. You've just set up a nice, clean URL structure on your site, so that you now have URLs like www.mysite.com/shoes/running. But many pages on the site still use www.mysite.com?catid=1&subcatid=234, and that doesn't redirect to the new, friendlier URL. Yikes.
- The tracking code. You use a special tracking code like www.mysite.com/?source=a0923 for every link on your site, so that you know what folks click. Or you use those codes for banners you place on ad networks. Either way, those links are now in the wild, and create huge canonical tangles. Get your conditioner out.
- Default page confusion. On your site, www.mysite.com/shoes/ and www.mysite.com/shoes/index.php go to the same page. Sadly, you and/or a few dozen partner web sites use these two links interchangeably. Another yikes, but easy to fix (learn how in Part 3).
- WWW mixups. I covered this one above. Both 'www' and non-'www' addresses work on your site. There's no redirection. And you've used both versions interchangeably. Sigh. Don't worry, though - it's fixable.
- Case issues. To a computer, 'a' and 'A' are different characters. If you capitalize part of your URL one time, and don't capitalize it the next, you may cause all sorts of duplication problems. Hard to detect once it's done, so I suggest keeping everything lower case.
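Several of these problems can be spotted mechanically: if two URLs collapse to the same string after normalization, they're probably two addresses for one page. Here's a minimal Python sketch, assuming a site that ignores case in paths, doesn't care about query-parameter order, and uses parameter names like the ones in the examples above (jsessionid, source). Those choices, and names like `canonical_url` and `DROP_PARAMS`, are hypothetical. It handles session IDs, tracking codes, default pages, 'www' mixups and case; inconsistent URLs and blown rewrites need site-specific rules it doesn't attempt:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names based on the examples above; use your site's own.
SESSION_PARAMS = {"jsessionid", "phpsessid", "sid"}
TRACKING_PARAMS = {"source"}
DROP_PARAMS = SESSION_PARAMS | TRACKING_PARAMS
DEFAULT_PAGES = ("index.php", "index.html")


def canonical_url(url, preferred_host):
    """Collapse common duplicate-URL variants into one string for comparison."""
    scheme, netloc, path, query, _fragment = urlsplit(url)

    # WWW mixups: map both the 'www' and non-'www' host to the preferred one.
    host = netloc.lower()
    bare = preferred_host[4:] if preferred_host.startswith("www.") else preferred_host
    if host in (bare, "www." + bare):
        host = preferred_host

    # Session IDs embedded in the path, e.g. /shoes;jsessionid=ABC
    path = path.split(";", 1)[0]

    # Case issues: lowercase the path (only safe if your server ignores case).
    path = path.lower() or "/"

    # Default page confusion: /shoes/index.php and /shoes/ are the same page.
    for page in DEFAULT_PAGES:
        if path.endswith("/" + page):
            path = path[: -len(page)]
            break

    # Drop session IDs and tracking codes from the query string, then sort the
    # remaining parameters so their order can't create extra addresses.
    params = [(k, v) for k, v in parse_qsl(query, keep_blank_values=True)
              if k.lower() not in DROP_PARAMS]
    query = urlencode(sorted(params))

    return urlunsplit((scheme.lower(), host, path, query, ""))
```

Group your crawled URLs by canonical_url(url, "www.mysite.com"); any group with more than one member is a candidate for the fixes in Part 3.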
There are more. If you think of some, post 'em below.
On to part 2: How to detect canonical problems on your site. It's easy-peasy!
Things you can do
- Read part 2 of this series: How to: Find canonicalization issues.
- Read part 3 of this series: How to: Fix canonicalization issues
- Sign up for my 1 hour SEO consult.
- Read about rel=canonical to get warmed up for parts 2 and 3.
- Follow me on Twitter.
Reading online, 10-15 words on a line is best.
Don't take my word for it anymore. I'm tired of arguing with you.
Just go over and subscribe to the Designer Bookshop Newsletter.
They're design geniuses, and they say stuff like this:
It is important that the columns are set to a length that is "proportional" to the type size. A practical guide to find the right length is that a columns row should contain approximately 10 words...
K?
- (Video) Quick way to get website traffic (It will surprise you!) http://bit.ly/4ApIc #
- (Free Video) easy way to get traffic (this will surprise you) http://bit.ly/4ApIc #
- Morning Tweeple! If you haven't checked out my new blog post on getting traffic & customers from Squidoo.com it's at: http://bit.ly/1vnVXi #
- RT: @Twt2Us Tweet! Tweet! Join us at Twt2.us! http://is.gd/3ARX7 #


