Scraping the Sina homepage with Python returns garbled text


Here's my code:


```python
#-*-coding:utf-8-*-
import requests

url = "http://www.sina.com.cn"
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.132 Safari/537.36'}
html = requests.get(url, headers=headers)
print html.text
```

The string it returns:
[screenshot of the output, not preserved]

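The Sina homepage uses the gb2312 charset. I've tried converting back and forth between encodings in the program, but nothing seemed to work, so I changed it back. I also set the encoding in my editor, still no luck. I searched online all night and couldn't find a solution, so thanks, everyone.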
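Why the text is garbled can be reproduced offline. This is a minimal Python 3 sketch, assuming the server sends gb2312 bytes and that `html.text` decodes them as ISO-8859-1 (what requests does when the Content-Type header carries no charset). The sample string stands in for a piece of the real page. Because ISO-8859-1 maps each of the 256 byte values to exactly one character, the mojibake is reversible: encode back to ISO-8859-1, then decode as gb2312.

```python
# Python 3. The sample string stands in for part of the Sina page (assumed gb2312).
original = "<title>新浪首页</title>"

# Bytes as the server sends them.
raw = original.encode("gb2312")

# What html.text yields when requests falls back to ISO-8859-1:
# each byte becomes one Latin-1 character, so the Chinese turns into mojibake.
garbled = raw.decode("iso-8859-1")

# ISO-8859-1 is a one-to-one byte/character mapping, so this round trip restores the bytes exactly.
recovered = garbled.encode("iso-8859-1").decode("gb2312")

print(repr(garbled))
print(recovered)  # <title>新浪首页</title>
```

In requests terms, decoding the raw bytes directly with `html.content.decode('gb2312')` gives the same result without the round trip.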

python python2.7 python-crawler

kiririn 8 years, 10 months ago
H.ero answered 8 years, 9 months ago

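requests lets you set the encoding it uses to decode `html.text`. It normally takes the encoding from the charset in the Content-Type response header; when a `text/*` response has no charset there, it falls back to ISO-8859-1, which is why the Chinese comes out garbled. Set `html.encoding` before reading `html.text`: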


```python
In [1]: html.encoding
Out[1]: 'ISO-8859-1'

In [2]: html.encoding = 'gb2312'

In [3]: print html.text[:500]
<!DOCTYPE html>
<!--[30,131,1] published at 2015-08-06 23:33:28 from #130 by system-->
<html>
<head>
<link rel="icon" sizes="any" mask href="http://www.sina.com.cn/favicon.svg">
<meta name="theme-color" content="red">
<link rel="icon" type="image/x-icon" href="http://www.sina.com.cn/favicon.ico">
    <meta http-equiv="Content-type" content="text/html; charset=gb2312" />
    <title>新浪首页</title>

    <meta name="keywords" content="新浪,新浪网,SINA,sina,sina.com.cn,新浪首页,门户,资讯" />
    <meta name="descript
```
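If you'd rather not hard-code `'gb2312'`, here is a minimal stdlib sketch (Python 3; `decode_html` is a hypothetical helper, not part of requests). It reads the charset first from the HTTP Content-Type header, which is the only place requests looks, then from a `<meta ... charset=...>` tag like the one in the output above, and only then falls back to ISO-8859-1. The meta-tag step and widening gb2312/gbk to gb18030 (a superset, so gb2312 bytes decode identically) are assumptions added here, not requests behavior.

```python
import re

# Finds charset=... in a Content-Type header value or in a <meta> tag.
CHARSET_RE = re.compile(rb"""charset\s*=\s*["']?([\w-]+)""", re.IGNORECASE)

# Declared charsets widened to a superset codec (assumption: safe for Chinese pages).
SUPERSETS = {"gb2312": "gb18030", "gbk": "gb18030"}


def decode_html(body: bytes, content_type: str = "") -> str:
    """Hypothetical helper: decode raw HTML bytes using the declared charset."""
    # 1. Charset in the HTTP header (the only place requests looks).
    match = CHARSET_RE.search(content_type.encode("ascii", "ignore"))
    # 2. Otherwise a charset declared in the first 2 KB of the page (assumption, not requests).
    if match is None:
        match = CHARSET_RE.search(body[:2048])
    if match is not None:
        charset = match.group(1).decode("ascii").lower()
        charset = SUPERSETS.get(charset, charset)
        try:
            return body.decode(charset, errors="replace")
        except LookupError:
            pass  # unknown codec name; fall through to the default
    # 3. Nothing usable: the same ISO-8859-1 default requests uses for text/* responses.
    return body.decode("iso-8859-1")


sample = ('<meta http-equiv="Content-type" content="text/html; charset=gb2312" />'
          "<title>新浪首页</title>")
page = sample.encode("gb2312")
print(decode_html(page, "text/html"))
```

With requests you could call `decode_html(html.content, html.headers.get('Content-Type', ''))`. Alternatively, `html.encoding = html.apparent_encoding` lets requests guess the encoding from the bytes themselves.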

一日就是一天 answered 8 years, 9 months ago
