
[Python] A Little Scraper I Wrote Out of Boredom


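Scraping a batch of articles from Jianshu (簡書) with Python. The script reads the article list on the Jianshu home page, saves the paragraph text of each article to a .txt file named after its title, and logs every article that fails to error.log.
I haven't really had time to write a detailed walkthrough...
In hindsight, I should have screen-recorded the session I just finished and posted that.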

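import requests
from bs4 import BeautifulSoup


def find_data(title, url):
    """Fetch one article and save its paragraph text to <title>.txt."""
    r = requests.get(url, timeout=10)
    soup = BeautifulSoup(r.text, "html.parser")
    # Article body container (class name from Jianshu's markup at the time of writing).
    # If it is missing, div_data is None and the next find_all raises AttributeError,
    # which the loop at the bottom catches and writes to error.log.
    div_data = soup.find("div", class_="show-content-free")

    # One line of output per <p> paragraph
    data = ""
    for p in div_data.find_all("p"):
        data += p.text + "\n"

    path = title + ".txt"
    # Explicit UTF-8 so Chinese text can be written even where the default encoding is not UTF-8
    with open(path, "w", encoding="utf-8") as f:
        f.write(data)
    return path


Title_url = "https://www.jianshu.com"
TitleAndUrl = {}  # article title -> absolute article URL

# Fetch the home page and locate the article list
r = requests.get("https://www.jianshu.com/", timeout=10)
soup = BeautifulSoup(r.text, "html.parser")
ul_data = soup.find("ul", class_="note-list")

# Each <li> is one article; its <a class="title"> holds the title and a relative link
for li in ul_data.find_all("li"):
    a_data = li.find("a", class_="title")
    if a_data is None:
        continue  # skip list items without a title link
    TitleAndUrl[a_data.text.strip()] = Title_url + a_data["href"]

# Save every article; on failure, append the title and the error message to error.log
for title, url in TitleAndUrl.items():
    try:
        find_data(title, url)
    except Exception as e:
        with open("error.log", "a", encoding="utf-8") as f:
            f.write(title + "\n")
            f.write(str(e) + "\n")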
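The script above needs network access and depends on Jianshu's markup as it was when this post was written. Below is a minimal offline sketch of the same extraction steps, run against a small hard-coded HTML snippet: ul.note-list → li → a.title for the article links, then div.show-content-free → p for the article text. The sample HTML and the helper names extract_links, extract_body and safe_filename are hypothetical. safe_filename is an added assumption, not part of the original script: it shows one way to replace the printable characters Windows forbids in file names, because a title such as "What is 1/2?" makes open() fail and lands in error.log. The sketch also uses urllib.parse.urljoin instead of string concatenation to build absolute URLs.

```python
import re
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.jianshu.com"

# Hypothetical sample markup mimicking the structure the script expects
# (class names copied from the script; the HTML itself is made up)
HOME_HTML = """
<ul class="note-list">
  <li><a class="title" href="/p/abc123">First post</a></li>
  <li><a class="title" href="/p/def456">What is 1/2?</a></li>
  <li><span>advert without a title link</span></li>
</ul>
"""

ARTICLE_HTML = """
<div class="show-content-free">
  <p>Paragraph one.</p>
  <p>Paragraph two.</p>
</div>
"""


def extract_links(html):
    """Return {title: absolute_url} for every <a class="title"> in ul.note-list."""
    soup = BeautifulSoup(html, "html.parser")
    ul = soup.find("ul", class_="note-list")
    links = {}
    if ul is None:
        return links
    for li in ul.find_all("li"):
        a = li.find("a", class_="title")
        if a is None or not a.get("href"):
            continue  # skip items that carry no article link
        # urljoin resolves relative hrefs ("/p/...") against the site root
        links[a.get_text(strip=True)] = urljoin(BASE_URL, a["href"])
    return links


def extract_body(html):
    """Join the text of each <p> inside div.show-content-free, one per line."""
    soup = BeautifulSoup(html, "html.parser")
    div = soup.find("div", class_="show-content-free")
    if div is None:
        return ""
    return "".join(p.get_text() + "\n" for p in div.find_all("p"))


def safe_filename(title):
    """Replace the printable characters Windows forbids in file names: \\ / : * ? " < > |"""
    return re.sub(r'[\\/:*?"<>|]', "_", title).strip() or "untitled"


if __name__ == "__main__":
    for title, url in extract_links(HOME_HTML).items():
        print(safe_filename(title) + ".txt", url)
    print(extract_body(ARTICLE_HTML), end="")
```

To apply the helper in the original script, build the path in find_data as safe_filename(title) + ".txt" instead of title + ".txt".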
Reposted by 小影
QQ: 1539747235
Email: [email protected]

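This article is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
Article link: https://www.allsrc.cn/requests/pythonfindjianshu.html (when reposting, please credit the source and include this link)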
