Why is it that it’s almost 2007, and Yahoo still doesn’t get HTML entities right? Of particular concern is the &raquo; entity (»), which is part of the default WordPress install. To say WordPress is pervasive would be an understatement. And yet Yahoo can’t properly display the titles of millions of WordPress articles because it fails to render the » character. This is simple stuff, guys; when will you get it straight?
I suppose an example is in order, although I’m certain that anyone with access to search.yahoo.com could find one quickly. This evening I entered the domain of my wife’s new site, analyzingmind.com, into Yahoo. The results show " instead of » in the page titles, and the HTML source of the results page contains a plain quotation mark where the » should be.
My first thought, when I noticed this oddity months ago, was that it was a browser issue. Today, however, I’m running Firefox 2, which reports that it rendered the page in Standards compliance mode and that the page’s character set is UTF-8. With those settings in order, displaying this simple character should be no problem. Besides, when I view the site directly, the glyph displays properly.
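To show just how simple this character is, here is a small Python sketch (my own illustration, not anything from Yahoo or WordPress) using only the standard library. It decodes the named entity the way an HTML parser would, confirms it is the single code point U+00BB, and shows the two bytes it occupies in a UTF-8 page.

```python
import html
import unicodedata

# Decode the named entity the way an HTML parser would.
raquo = html.unescape("&raquo;")

# The entity maps to exactly one code point, U+00BB.
print(hex(ord(raquo)))          # 0xbb
print(unicodedata.name(raquo))  # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK

# In a UTF-8 page, that code point is stored as two bytes.
print(raquo.encode("utf-8"))    # b'\xc2\xbb'

# The decimal and hex numeric references decode to the same character.
print(html.unescape("&#187;") == raquo)  # True
print(html.unescape("&#xBB;") == raquo)  # True
```

Whether the page uses the named entity, a numeric reference, or raw UTF-8 bytes, a parser ends up with the same character, so there is nothing ambiguous for a search engine to resolve.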
I am beginning to think that Yahoo, when crawling and indexing the site, translated the » to a " and stored it that way in its index. The HTML 4 specification describes &raquo; as a “right-pointing double angle quotation mark,” and that’s sort of similar to a standard, non-pointing, non-angled quotation mark, right? I’m not sure of the reasoning behind this, but I suspect it has to do with not wanting to serve characters that can’t be displayed in certain browser/font combinations. Whatever the reasoning, something is wrong.
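If my guess is right, the difference between what Yahoo seems to do and what it could do looks roughly like the Python sketch below. Everything here is hypothetical: `FOLD_TO_ASCII`, `lossy_fold`, `safe_ascii`, and the sample title are made up for illustration, and Yahoo’s actual indexing code is unknown. The point is that an indexer worried about displayability does not have to substitute a lookalike; it can emit a numeric character reference, which every browser understands and which preserves the original character.

```python
import html

# Hypothetical "fold to ASCII" table of the kind speculated about above.
# This is NOT Yahoo's real code, which is unknown.
FOLD_TO_ASCII = {"\u00ab": '"', "\u00bb": '"'}

def lossy_fold(text: str) -> str:
    """Replace characters found in the table with plain ASCII lookalikes."""
    return text.translate(str.maketrans(FOLD_TO_ASCII))

def safe_ascii(text: str) -> str:
    """Keep every character by emitting non-ASCII ones as numeric references."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

title = "Example Blog \u00bb Hello world"  # hypothetical WordPress-style title

folded = lossy_fold(title)
escaped = safe_ascii(title)

print(folded)   # Example Blog " Hello world
print(escaped)  # Example Blog &#187; Hello world

# The folded title can't be turned back into the original; the escaped one can.
print(html.unescape(folded) == title)   # False
print(html.unescape(escaped) == title)  # True
```

The escaped form is pure ASCII, so it survives any character set, yet it still renders as » in the browser.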