Question

0 0

Scrapy: scraped Chinese content is garbled when saved

The spider:


# -*- coding: utf-8 -*-
import sys

import scrapy
from maizi.items import MaiziItem

# Python 2 only: force the process-wide default encoding to UTF-8.
reload(sys)
sys.setdefaultencoding('utf-8')


class MaimaiSpider(scrapy.Spider):
    name = "maimai"
    # allowed_domains takes bare domain names, not full URLs.
    allowed_domains = ["www.zaobao.com"]
    start_urls = (
        'http://www.zaobao.com/news/china/',
    )

    def parse(self, response):
        for li in response.xpath('//*[@id="l_title"]/ul/li'):
            item = MaiziItem()
            item['title'] = li.xpath('a[1]/p/text()').extract()
            item['link'] = li.xpath('a[1]/@href').extract()
            item['desc'] = li.xpath('a[2]/p/text()').extract()

            yield item  # yield the item

The item definition:


import scrapy


class MaiziItem(scrapy.Item):
    title = scrapy.Field()
    link = scrapy.Field()
    desc = scrapy.Field()

The spider's output:
The Chinese text comes out garbled. How can I fix this?
[Image]

python scrapy

8 years, 7 months ago

三日月扶子


Answer 1

0

Are you saving this to an .xls file? Try printing the values first to see what type they are.
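
A minimal sketch of that check, assuming Python 3 (the spider in the question is Python 2, where the text type is `unicode` rather than `str`). The `describe` helper and the sample value are hypothetical, standing in for whatever `extract()` returned:

```python
def describe(value):
    """Return the type name and repr of a value, to tell text from bytes."""
    return type(value).__name__, repr(value)


# Hypothetical sample: extract() returns a list of text strings.
sample = ["中文标题"]

print(describe(sample))     # the container type
print(describe(sample[0]))  # the type of one extracted value
print(sample[0])            # the value itself, as the console renders it
```

If the values turn out to be a list of text strings, the scraped data is probably fine and the garbling is more likely introduced when the file is written or opened.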

answered 8 years, 7 months ago

非洲黑大帅


Answer 2

0

Try adding:


import sys
reload(sys)  # Python 2 only
sys.setdefaultencoding('UTF-8')

This is almost a universal fix for garbled Chinese text in Python.
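
Changing the process-wide default is one approach. A narrower alternative (a different technique, not what this answer proposes) is to convert explicitly wherever bytes and text meet. A sketch assuming Python 3, with hypothetical helper names `to_text` and `to_bytes`:

```python
def to_text(value, encoding="utf-8"):
    """Decode bytes to text; pass text through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def to_bytes(value, encoding="utf-8"):
    """Encode text to bytes; pass bytes through unchanged."""
    if isinstance(value, str):
        return value.encode(encoding)
    return value


raw = to_bytes("中文")   # bytes as they might arrive from a file or socket
print(to_text(raw))      # decoded back to text with the same encoding
```

The encoding passed to `to_text` has to match the one the bytes were actually written in, otherwise the result is garbled or raises `UnicodeDecodeError`.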

answered 8 years, 7 months ago

做怪兽自豪！


Answer 3

0

I believe .xls files default to the local character set, which for Chinese is GBK.
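
If the file is a CSV that gets opened in Excel, one way to act on this is to write it in an encoding Excel can recognize. A sketch assuming Python 3; `rows_to_csv_bytes` and the sample rows are hypothetical. Python's `utf-8-sig` codec prepends a byte order mark, which Excel generally uses to detect UTF-8, while `gbk` matches the local Chinese code page mentioned above:

```python
import csv
import io


def rows_to_csv_bytes(rows, encoding="utf-8-sig"):
    """Serialize rows to CSV and encode them for a spreadsheet program."""
    buf = io.StringIO(newline="")
    csv.writer(buf).writerows(rows)
    return buf.getvalue().encode(encoding)


# Hypothetical rows shaped like the item fields in the question.
rows = [["title", "link"], ["中文标题", "http://www.zaobao.com/news/china/"]]

with_bom = rows_to_csv_bytes(rows)                  # UTF-8 with BOM
as_gbk = rows_to_csv_bytes(rows, encoding="gbk")    # local Chinese code page
```

Either result can be written out with `open(path, "wb")`. For files produced by Scrapy's own feed exports, the `FEED_EXPORT_ENCODING` setting (available in newer Scrapy releases) selects the output encoding instead.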

answered 8 years, 7 months ago

iceblue

