php - Font or Unicode issue on scraping
I am trying to scrape information from a site.
The site has this:
127 east zhongshan no 2 rd; 中山东二路127号
But when I scrape it and echo it, it shows:
127 east zhongshan no 2 rd; ä¸å±±ä¸äºè·¯127å·
I tried UTF-8, but that did not help.
Here is my PHP code.
Please help me solve this problem.
function grabpage($site) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
    curl_setopt($ch, CURLOPT_TIMEOUT, 40);
    curl_setopt($ch, CURLOPT_COOKIEFILE, "cookie.txt");
    curl_setopt($ch, CURLOPT_URL, $site);
    ob_start();
    // Store the result so the cleanup below still runs before returning
    $result = curl_exec($ch);
    ob_end_clean();
    curl_close($ch);
    return $result;
}

$grabdata = grabpage($site);
$dom = new DOMDocument();
@$dom->loadHTML($grabdata);
$xpath = new DOMXPath($dom);
$mainelements = array();
$mainelements = $xpath->query("//div[@class='col--one-whole mv--col--one-half wv--col--one-whole'][1]/dl/dt");
foreach ($mainelements as $names2) {
    $name2 = $names2->nodeValue;
    echo "$name2";
}
First off, you need to set the charset before anything else, at the top of your PHP file:
header('Content-Type: text/html; charset=utf-8');
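Then you need to convert the HTML markup you got with mb_convert_encoding, because loadHTML() treats the input as ISO-8859-1 unless the markup declares a charset: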
@$dom->loadHTML(mb_convert_encoding($grabdata, 'HTML-ENTITIES', 'UTF-8'));
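To try the idea without hitting the network, here is a minimal sketch that parses a small inline string instead of the scraped page. The sample markup, the simplified XPath, and the helper name firstDt are assumptions made for illustration. It also swaps in mb_encode_numericentity for the conversion step, since the 'HTML-ENTITIES' pseudo-encoding is deprecated as of PHP 8.2; the effect on the parser should be the same, because every non-ASCII character reaches loadHTML() as a numeric entity.

```php
<?php
// Hypothetical sample markup standing in for the scraped page
// (note that it declares no charset).
$html = '<div><dl><dt>127 East Zhongshan No 2 Rd; 中山东二路127号</dt></dl></div>';

// Hypothetical helper: parse the markup and return the text of the first <dt>.
function firstDt(string $markup): string
{
    $dom = new DOMDocument();
    @$dom->loadHTML($markup); // suppress warnings about imperfect markup
    $xpath = new DOMXPath($dom);
    $node = $xpath->query('//dl/dt')->item(0);
    return $node === null ? '' : $node->nodeValue;
}

// Turn every character at or above U+0080 into a numeric entity,
// so the parser no longer has to guess the byte encoding.
$entityMap = [0x80, 0x10FFFF, 0, 0x1FFFFF];
$converted = mb_encode_numericentity($html, $entityMap, 'UTF-8');

$fixed = firstDt($converted);
echo $fixed, "\n";
```

On PHP versions before 8.2, the mb_convert_encoding call from the answer can be dropped in where mb_encode_numericentity is used here.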