
Scraping the Maoyan Top 100 Board with Python [Including Multithreading]





# -*- coding: utf-8 -*-
import json
import re
from multiprocessing import Pool

import requests
from requests.exceptions import RequestException


def get_one_page(url):
    """
    Fetch a single page.
    :param url: URL of the page to fetch
    :return: the page content as text, or None if the request fails
    """
    try:
        response = requests.get(url)
        if response.status_code == 200:
            return response.text
        return None
    except RequestException:
        return None


def parse_one_page(html):
    """
    Extract the required fields from the page content.
    :param html: page content
    :return: a generator yielding one dict per film
    """
    pattern = re.compile(
        r'<dd>.*?board-index.*?>(\d+)</i>.*?data-src="(.*?)".*?name"><a'
        r'.*?>(.*?)</a>.*?star">(.*?)</p>.*?releasetime">(.*?)</p>'
        r'.*?integer">(.*?)</i>.*?fraction">(.*?)</i>.*?</dd>', re.S)
    items = re.findall(pattern, html)
    for item in items:
        yield {
            'index': item[0],
            'image': item[1],
            'title': item[2],
            'actor': item[3].strip()[3:],   # drop the 3-character label prefix
            'time': item[4].strip()[5:],    # drop the 5-character label prefix
            'score': item[5] + item[6]      # integer part + fractional part
        }


def write_to_file(content):
    """
    Append one result record to the output file.
    :param content: the record to write
    :return:
    """
    # The with statement closes the file, so no explicit close() is needed.
    with open('result.txt', 'a', encoding='utf-8') as f:
        f.write(json.dumps(content, ensure_ascii=False) + "\n")


def main(offset):
    """
    Main function.
    :param offset: offset value used to build the URL
    :return:
    """
    url = "http://maoyan.com/board/4?offset=" + str(offset)
    html = get_one_page(url)
    if html is None:
        # The request failed, so there is nothing to parse.
        return
    for item in parse_one_page(html):
        print(item)
        write_to_file(item)


if __name__ == '__main__':
    # Sequential version:
    # for i in range(10):
    #     main(i*10)
    # Note: multiprocessing.Pool starts worker processes, not threads.
    pool = Pool()
    pool.map(main, [i*10 for i in range(10)])
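
The title mentions multithreading, while the code above uses multiprocessing.Pool, which starts worker processes rather than threads. Since fetching pages is I/O-bound, a thread pool is a plausible alternative. Below is a minimal sketch, assuming the standard library's concurrent.futures.ThreadPoolExecutor. The helper name crawl_with_threads and the max_workers default are hypothetical and do not come from the original post, and the stand-in worker in the usage part only returns its offset so that the sketch runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor


def crawl_with_threads(worker, offsets, max_workers=5):
    """
    Hypothetical helper (not in the original post): call worker(offset)
    for every offset on a pool of threads.
    :param worker: callable taking one offset, e.g. main from the script above
    :param offsets: iterable of offset values
    :param max_workers: assumed thread count, tune as needed
    :return: list of worker results, in the same order as offsets
    """
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map keeps the results in input order, like Pool.map
        return list(executor.map(worker, offsets))


if __name__ == '__main__':
    # Stand-in worker so the sketch runs offline; pass main to crawl for real.
    def fake_worker(offset):
        return offset

    print(crawl_with_threads(fake_worker, [i * 10 for i in range(10)]))
```

Passing main instead of fake_worker would fetch the ten pages on threads. As with the process pool, records from different pages may land in result.txt in any order.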

[From 天善智能]: https://edu.hellobi.com/course/156/play/lesson/2453

Master Cui's code is simply a pleasure to read...
