
Python Web Crawling with Scrapy, Hands-On Part 2: A Weather Forecast Spider

Posted: 2023-10-26 17:54:40


1. Project preparation. Target website: /

2. Create and edit the Scrapy project:

scrapy startproject weather

scrapy genspider HQUSpider

The project file structure is shown in the figure:
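The original screenshot is not preserved; a freshly generated Scrapy project conventionally has this standard layout (shown here as a sketch, not taken from the figure):

```
weather/
    scrapy.cfg            # deploy configuration
    weather/
        __init__.py
        items.py          # item field definitions
        pipelines.py      # item processing
        settings.py       # project settings
        spiders/
            __init__.py
            HQUSpider.py  # generated by scrapy genspider
```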

3. Edit items.py:

4. Edit the spider file HQUSpider.py:

(1) First run scrapy shell / to test and work out the selectors:

(2) Experiment with the selectors: open the page in Chrome and view the page source:

(3) Run commands in the shell to inspect the response:
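It helps to see how the spider's XPath expressions map onto the page structure. This stdlib sketch mimics the extraction logic on invented sample markup (the element layout is inferred from the selectors; the real page's content will differ):

```python
# Markup implied by the spider's XPath paths: div.tqshow1 containing
# an <h3>, a <p>, and a four-item <ul>. Sample values are invented.
import xml.etree.ElementTree as ET

html = (
    '<div class="tqshow1">'
    '<h3>Quanzhou <b>Oct 26</b></h3>'
    '<p>Thursday</p>'
    '<ul>'
    '<li><img src="icons/sunny.png"/></li>'
    '<li>12<span>~</span>20</li>'
    '<li>Sunny</li>'
    '<li>NE wind, level 3</li>'
    '</ul>'
    '</div>'
)

sub = ET.fromstring(html)                            # //div[@class="tqshow1"]
city_date = ''.join(sub.find('h3').itertext())       # ./h3//text(), joined
week = sub.find('p').text                            # ./p//text()
img = sub.find('ul')[0].find('img').get('src')       # ./ul/li[1]/img/@src
temperature = ''.join(sub.find('ul')[1].itertext())  # ./ul/li[2]//text(), joined
print(city_date, week, img, temperature)
# -> Quanzhou Oct 26 Thursday icons/sunny.png 12~20
```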

(4) Write HQUSpider.py:

    # -*- coding: utf-8 -*-
    import scrapy

    from weather.items import WeatherItem


    class HquspiderSpider(scrapy.Spider):
        name = 'HQUSpider'
        allowed_domains = ['']
        cities = ['quanzhou', 'datong']
        start_urls = []
        for city in cities:
            start_urls.append('http://' + city + './')

        def parse(self, response):
            # Each day's forecast sits in a div with class "tqshow1".
            for sub in response.xpath('//div[@class="tqshow1"]'):
                item = WeatherItem()
                # Join the text fragments of the <h3> (city and date).
                item['cityDate'] = ''.join(sub.xpath('./h3//text()').extract())
                item['week'] = sub.xpath('./p//text()').extract()[0]
                item['img'] = sub.xpath('./ul/li[1]/img/@src').extract()[0]
                # Join the temperature fragments (low, separator, high).
                item['temperature'] = ''.join(sub.xpath('./ul/li[2]//text()').extract())
                item['weather'] = sub.xpath('./ul/li[3]//text()').extract()[0]
                item['wind'] = sub.xpath('./ul/li[4]//text()').extract()[0]
                yield item

(5) Edit pipelines.py to process the items returned by the spider:

    # -*- coding: utf-8 -*-
    # Define your item pipelines here
    #
    # Don't forget to add your pipeline to the ITEM_PIPELINES setting
    # See: /en/latest/topics/item-pipeline.html
    import os.path
    import time
    import urllib.request


    class WeatherPipeline(object):
        def process_item(self, item, spider):
            # Append each record to a text file named after today's date.
            today = time.strftime('%Y%m%d', time.localtime())
            file_name = today + '.txt'
            with open(file_name, 'a', encoding='utf-8') as fp:
                fp.write(item['cityDate'] + '\t')
                fp.write(item['week'] + '\t')
                img_name = os.path.basename(item['img'])
                fp.write(img_name + '\t')
                # Download the weather icon once; skip if it already exists.
                if not os.path.exists(img_name):
                    with open(img_name, 'wb') as img_file:
                        response = urllib.request.urlopen(item['img'])
                        img_file.write(response.read())
                fp.write(item['temperature'] + '\t')
                fp.write(item['weather'] + '\t')
                fp.write(item['wind'] + '\n\n')
            time.sleep(1)
            return item

(6) Edit settings.py to decide which pipeline handles the scraped items:
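The concrete setting is not shown in the original; registering the pipeline conventionally looks like this (the priority value 300 is just the customary default, not taken from the source):

```python
# settings.py (fragment): route scraped items through WeatherPipeline
ITEM_PIPELINES = {
    'weather.pipelines.WeatherPipeline': 300,
}
```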

(7) Run the spider: scrapy crawl HQUSpider
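Each run appends tab-separated records to a file named after the current date. This sketch reproduces the record format the pipeline writes (all field values are invented for illustration):

```python
import os.path
import time

# Invented field values, mirroring the fields WeatherPipeline writes.
item = {
    'cityDate': 'Quanzhou Oct 26',
    'week': 'Thursday',
    'img': 'icons/sunny.png',
    'temperature': '12~20',
    'weather': 'Sunny',
    'wind': 'NE wind, level 3',
}

file_name = time.strftime('%Y%m%d', time.localtime()) + '.txt'  # e.g. 20231026.txt
record = '\t'.join([
    item['cityDate'], item['week'], os.path.basename(item['img']),
    item['temperature'], item['weather'], item['wind'],
])
print(file_name, record)
```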

With that, a complete Scrapy spider is finished.
