BeautifulSoup: .find(text=True) cannot find the text inside a table


.find(text=True) does not work for some of the text inside the table. My code is below:


import urllib
import urllib2
import cookielib
import re
import csv
import codecs
from bs4 import BeautifulSoup

listmain = 'http://gdemba.gicp.net:84/ListMain.asp'
header = {'User-Agent': 'Mozilla/5.0'}
req = urllib2.Request(listmain,headers=header)
page = urllib2.urlopen(req)
soup = BeautifulSoup(page)

table = soup.find(id='Table11')
f = open('table.csv', 'w')
csv_writer = csv.writer(f)
td = re.compile('td')

client = ""
tag = ""
tel = ""
catalogue = ""
region = ""
client_type = ""
email = ""
creater = ""
department = ""
action = ""

for row in table.find_all("tr"):
    cells = row.find_all("td")
    if len(cells) == 10:
        client = cells[0].find(text=True)
        tag = cells[1].find(text=True)
        tel = cells[2].find(text=True)
        catalogue = cells[3].find(text=True)
        region = cells[4].find(text=True)
        client_type = cells[5].find(text=True)
        email = cells[6].find(text=True)
        creater = cells[7].find(text=True)
        department = cells[8].find(text=True)
        action = cells[9].find(text=True)

    csv_writer.writerow([x.encode('utf-8') for x in [client, tag, tel, catalogue, region, client_type, email, creater, department, action]])

f.close()

One of the <tr> rows to be processed looks like this:


<tr class="ListTableRow" id="Row0" onclick="javascript:setRowFocus(this,false,0);FirstDataFormat('0000008688')" ondblclick="viewcoinfo('interunit','0000008688','{A31618B2-90CC-456F-A2E7-4C5B0D577E25}')">
<td nowrap=""> <span id="spanshare0000008688"></span>深圳营业部</td>
<td id="0000008688sign" nowrap=""> 福田</td>
<td nowrap=""> 0755-66666666</td>
<td nowrap=""> 手机配件</td>
<td nowrap=""> 深圳市</td>
<td nowrap=""> 普通客户</td>
<td nowrap=""> <span class="BlueText" onclick="javascript:EmailTo('0000008688','[email protected]')" onmouseout="javascript:this.style.textDecoration=''" onmouseover="javascript:this.style.textDecoration='underline'">[email protected]</span></td>
<td nowrap=""> 信息资源部</td>
<td nowrap=""> 信息资源部</td>
<td height="16" nowrap="" style="width: 78px"> </td>
</tr>

However, the text inside two of the <td> cells, the client name and the Email, cannot be extracted.


What is the cause? Is it related to the <span> tag?

python beautifulsoup HTML web-scraping

ルカ女王控 8 years, 9 months ago

There is no need to use cells[0].find(text=True); just use cells[0].text directly.
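
To illustrate why this helps, here is a minimal sketch written for Python 3, assuming the "html.parser" backend and a trimmed copy of the row from the question. find(text=True) returns only the first text node of a cell, and in the client-name cell that node is the space in front of the <span>, so the real text is never reached. The helper name row_to_text is hypothetical and not part of the original code; get_text(strip=True) is used here as a whitespace-stripping variant of .text.

```python
from bs4 import BeautifulSoup

# Trimmed copy of two cells from the row shown in the question (an assumption
# made for this sketch; the real page has ten cells per row).
html = (
    '<table><tr>'
    '<td nowrap=""> <span id="spanshare0000008688"></span>深圳营业部</td>'
    '<td nowrap=""> 福田</td>'
    '</tr></table>'
)

soup = BeautifulSoup(html, "html.parser")
cells = soup.find_all("td")

# find(text=True) stops at the FIRST text node, here the space before <span>.
# (Newer Beautiful Soup releases spell this argument string=True.)
first_node = cells[0].find(text=True)

# .text / get_text() concatenates EVERY text node under the cell;
# strip=True additionally trims the whitespace around each piece.
all_text = cells[0].get_text(strip=True)


def row_to_text(row):
    """Hypothetical helper: stripped text of every <td> in one <tr>."""
    return [td.get_text(strip=True) for td in row.find_all("td")]


print(repr(first_node))
print(all_text)
print(row_to_text(soup.find("tr")))
```

In the original loop, the same idea would mean replacing each cells[i].find(text=True) with cells[i].text (or get_text(strip=True) if the leading spaces are unwanted).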

飙车的神父 answered 8 years, 9 months ago
