Aug 29, 2011

UTF-8 Encoding problems with file_get_contents() and DOMDocument

I recently bumped into an encoding issue on a project I was working on.
I was trying to scrape some content from a website served with the ISO-8859-1 charset, and I needed to capture some text and store it in a database as UTF-8.

After some trial and error I discovered a way to properly change the encoding before saving it in the DB.

A simplified version of what I did:

 $url = 'http://www.smooka.com/blog/';
 $html = file_get_contents($url);

 // Change encoding from ISO-8859-1 to UTF-8.
 // iconv() takes the source encoding first, then the target encoding.
 $html = iconv('ISO-8859-1', 'UTF-8//TRANSLIT', $html);
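
To check the conversion without hitting the network, here is a minimal sketch that runs the same kind of iconv() call on a hard-coded ISO-8859-1 string. The helper name latin1_to_utf8() and the sample string are made up for illustration and are not part of the original scraper. The preg_match('//u', ...) call is used here only as a quick UTF-8 validity test.

```php
<?php
// Hypothetical helper (not from the original post): ISO-8859-1 bytes in, UTF-8 bytes out.
function latin1_to_utf8(string $text): string
{
    // iconv(from, to, string): source encoding first, target encoding second.
    $converted = iconv('ISO-8859-1', 'UTF-8', $text);
    if ($converted === false) {
        throw new RuntimeException('iconv() could not convert the input');
    }
    return $converted;
}

// "café" as ISO-8859-1: the é is the single byte 0xE9.
$latin1 = "caf\xE9";
$utf8   = latin1_to_utf8($latin1);

// With the /u modifier, preg_match() only succeeds on valid UTF-8 subjects.
var_dump(preg_match('//u', $latin1) === 1); // bool(false): raw input is not valid UTF-8
var_dump(preg_match('//u', $utf8) === 1);   // bool(true): converted string is valid UTF-8
var_dump(bin2hex($utf8));                   // string(10) "636166c3a9"
```

The hex dump shows the single byte e9 replaced by the two-byte UTF-8 sequence c3 a9, which is what the database should receive.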
