Android DefaultHttpClient Converts Unicode Into � Black Diamond & Question Mark Symbol Unlike Apache’s Version

This had me confused for quite a while. It seems that Android's HttpClient and Apache's HttpClient behave differently.

I tried to access a URL that contained Unicode characters. Apache's version converted each Unicode character into HTML code. Android's version converted it into a special glyph: a black diamond with a question mark inside it.

This is what it looks like: �.

I scoured the forums for a workaround and sadly found none. The closest explanation I found was that Android does not have a font that can display the correct Unicode symbol in the app.

Since the affected characters in my String value were all the same Unicode character rather than a mix of different ones, I used the replaceAll() method of the String class to convert these � symbols into the Unicode symbol that I wanted.

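The Unicode escape for � is \uFFFD, the Unicode replacement character. Because it is a real character inside the String, replaceAll() can match it like any other.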
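Here is a minimal sketch of that replacement. The class name, the restore() helper and the sample text are made up for illustration and are not from my app. It assumes every � in the String stands for the same character, which was true in my case but will not be true in general.

```java
import java.util.regex.Matcher;

class ReplacementCharFix {
    // Hypothetical helper: swaps every U+FFFD in the text for the single
    // character the caller expects. Only safe when all the lost characters
    // were the same one.
    static String restore(String body, String intended) {
        // quoteReplacement() keeps '$' and '\' in the replacement literal,
        // since replaceAll() would otherwise treat them specially.
        return body.replaceAll("\uFFFD", Matcher.quoteReplacement(intended));
    }

    public static void main(String[] args) {
        // Sample text (assumed): the e-acute in "Cafe" arrived as U+FFFD.
        String fetched = "Caf\uFFFD au lait";
        System.out.println(restore(fetched, "\u00E9"));
    }
}
```

If the page mixes several different non-ASCII characters, this trick cannot tell them apart, because they have all become the same � by that point.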

At first I thought there was something wrong with my code, and it took me quite some time to figure out that there was nothing wrong with it. The problem lay in how Android's HttpClient class handles Unicode characters found in HTML pages.
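One mechanism worth checking, and this is an assumption on my part rather than something I confirmed for HttpClient: U+FFFD is what Java's decoders emit when the bytes they are given are not valid in the charset being used. The sketch below uses only the standard library, with made-up class and method names and sample text, and shows Latin-1 bytes being read as UTF-8.

```java
import java.nio.charset.StandardCharsets;

class CharsetMismatchDemo {
    // Hypothetical helper: decodes raw bytes as UTF-8. The String constructor
    // substitutes U+FFFD for any byte sequence that is not valid UTF-8.
    static String decodeAsUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Assumed scenario: the server sent the text encoded as ISO-8859-1.
        byte[] latin1 = "Caf\u00E9 au lait".getBytes(StandardCharsets.ISO_8859_1);

        String wrong = decodeAsUtf8(latin1);                            // mismatched charset
        String right = new String(latin1, StandardCharsets.ISO_8859_1); // matching charset

        System.out.println(wrong.contains("\uFFFD"));
        System.out.println(right.equals("Caf\u00E9 au lait"));
    }
}
```

If this is what happens in the app, decoding the response bytes with the charset the server actually used would avoid the � in the first place, instead of patching it afterwards.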
