php - Font or Unicode issue on scraping



I am trying to scrape information from a site.

The site has this:

127 east zhongshan no 2 rd; 中山东二路127号 

But when I scrape it and echo the result, it shows:

127 east zhongshan no 2 rd; 中山ä¸äºè·¯127å·  

I also tried UTF-8.

Here is my PHP code. Please help me solve this problem.

function grabpage($site) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
    curl_setopt($ch, CURLOPT_TIMEOUT, 40);
    curl_setopt($ch, CURLOPT_COOKIEFILE, "cookie.txt");
    curl_setopt($ch, CURLOPT_URL, $site);
    // Fetch first, then close the handle; code after a return never runs.
    $result = curl_exec($ch);
    curl_close($ch);
    return $result;
}

$grabdata = grabpage($site);

$dom = new DOMDocument();
@$dom->loadHTML($grabdata);

$xpath = new DOMXPath($dom);

$mainelements = $xpath->query("//div[@class='col--one-whole mv--col--one-half wv--col--one-whole'][1]/dl/dt");

foreach ($mainelements as $names2) {
    $name2 = $names2->nodeValue;
    echo "$name2";
}

First off, you need to set the charset before anything else, at the top of the PHP file:

header('Content-Type: text/html; charset=utf-8');

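You also need to convert the HTML markup you fetched with mb_convert_encoding before loading it, because DOMDocument does not assume UTF-8 when the markup does not declare a charset: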

@$dom->loadHTML(mb_convert_encoding($grabdata, 'HTML-ENTITIES', 'UTF-8'));
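As a rough, self-contained sketch of the same idea: the 'HTML-ENTITIES' target of mb_convert_encoding is deprecated as of PHP 8.2, so the snippet below uses a commonly used alternative instead, which is to prepend an XML encoding hint so that libxml reads the bytes as UTF-8. The sample markup and the extractTerms helper are hypothetical stand-ins for the scraped page and are not part of the original code.

```php
<?php
// Hypothetical sample markup standing in for the scraped page:
// UTF-8 bytes with no charset declaration, like the problem case.
$html = '<dl><dt>127 East Zhongshan No 2 Rd; 中山东二路127号</dt></dl>';

/**
 * Parse UTF-8 HTML and return the trimmed text of every <dt> element.
 * (Illustrative helper; the name is an assumption.)
 */
function extractTerms(string $html): array
{
    $dom = new DOMDocument();

    // Collect parser warnings instead of suppressing them with @.
    libxml_use_internal_errors(true);

    // The encoding hint makes libxml treat the input as UTF-8
    // instead of falling back to a single-byte default.
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $terms = [];
    foreach ($xpath->query('//dt') as $node) {
        $terms[] = trim($node->nodeValue);
    }
    return $terms;
}

$terms = extractTerms($html);
echo $terms[0], PHP_EOL;
```

If the page is served in some other encoding, the hint would need to name that encoding instead, or the markup would need to be converted to UTF-8 first.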


