Sunday, July 27, 2008

Google geocoding charset encoding currently broken

Update (2008-09-09): Google seems to have finally fixed this.

About two weeks ago I started seeing weird characters when geocoding addresses via Google using the YM4R gem. The addresses are outside the US, so they contain plenty of accented characters, which used to come back properly encoded in UTF-8. Although Google's XML still claims to be UTF-8, it currently isn't: some fields contain what looks like ISO-8859-1 encoded characters instead. My guess is that this is a problem with their outsourced partners not having properly set up UTF-8 environments and updating GIS information using local encodings.

I searched for the issue and found it mentioned in ep's blog. He ended up seeing the same error message:

#<REXML::ParseException: Missing end tag for 'DependentLocalityName' (got "DependentLocality")

That'll usually result in a nice 500 error in your Rails app (if that's what you use) because it raises an exception.
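If you'd rather degrade gracefully than serve a 500, one option is to rescue the parse error around the XML parsing step. This is only a minimal sketch: `parse_geocoder_xml` is a hypothetical wrapper of my own, not something YM4R provides, and where you would hook it in depends on how your app calls the geocoder.

```ruby
require "rexml/document"

# Hypothetical wrapper (not part of YM4R): returns the parsed document,
# or nil when REXML cannot parse the response.
def parse_geocoder_xml(xml)
  REXML::Document.new(xml)
rescue REXML::ParseException => e
  # Log only the first line of the (multi-line) REXML message.
  warn "Geocoder response could not be parsed: #{e.message.to_s.lines.first}"
  nil
end

doc = parse_geocoder_xml("<kml><Response></kml>")
puts "no usable geocoding result" if doc.nil?
```

The caller then has to handle a `nil` result, for example by skipping the address or retrying later.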

His solution (forcing a further charset translation) worked quite well, except that in my case I had to name the source charset. So instead of calling to_utf8 plainly, I pass it the charset: to_utf8('iso-8859-1'). All in all this is a very ugly hack, so I hope that Google fixes the issue soon. I personally didn't report the bug 'cause I've never gotten any feedback on anything I've sent their way or on any request I've made in the past.
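For illustration, here is roughly what that kind of repair looks like using only the String encoding API built into newer Rubies (1.9 and later for `encode`, 2.1 and later for `scrub`). This is a sketch under assumptions, not the to_utf8 helper mentioned above: `repair_latin1` is a name I made up, and it assumes the stray bytes really are ISO-8859-1.

```ruby
# Hypothetical helper (not the to_utf8 method from the post).
# Treats the input as UTF-8 and re-encodes only the byte sequences that
# are not valid UTF-8, assuming those bytes are ISO-8859-1.
def repair_latin1(raw)
  raw.dup.force_encoding(Encoding::UTF_8).scrub do |bad|
    bad.b.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
  end
end

# "\xE3" is the ISO-8859-1 byte for "ã"; on its own it is invalid UTF-8.
broken = "S\xE3o Paulo".b
puts repair_latin1(broken)
```

Because only the invalid sequences are converted, fields that already arrive as proper UTF-8 pass through untouched, which matters when just some of the fields are affected. It can't catch ISO-8859-1 bytes that happen to form a valid UTF-8 sequence, but that should be rare in real addresses.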
