
Basic Web Crawler Exercises


1. Extract the text of the h1 tag

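import requests
from bs4 import BeautifulSoup

# Fetch the news page and decode the response as UTF-8
url = 'http://news.gzcc.cn/html/2018/xiaoyuanxinwen_0328/9113.html'
res = requests.get(url)
res.encoding = 'utf-8'

# Parse the HTML with the built-in parser and read the first h1's text
soup = BeautifulSoup(res.text, 'html.parser')
soup.h1.text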
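Note that `soup.h1` is `None` when a page has no h1 tag, in which case `soup.h1.text` raises an `AttributeError`. The following is a minimal sketch of a guarded version, run on an inline HTML string rather than the live page. The helper name `first_h1_text` and the sample markup are assumptions made for illustration.

```python
from bs4 import BeautifulSoup

def first_h1_text(html):
    """Return the stripped text of the first <h1>, or None if there is none."""
    soup = BeautifulSoup(html, 'html.parser')
    h1 = soup.h1  # None when the document has no <h1>
    return h1.get_text(strip=True) if h1 is not None else None

# Hypothetical sample markup, not copied from the real page
sample = '<html><body><h1> Campus News </h1><p>body</p></body></html>'
print(first_h1_text(sample))
print(first_h1_text('<p>no heading here</p>'))
```

Passing `res.text` from the request above in place of `sample` should work the same way.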

2. Extract the link of the a tag

soup.a.attrs.get('href')
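An href taken from a page is often relative, so it may need to be resolved against the page URL before it can be requested. A small sketch using the standard library's `urljoin` follows. The helper name `absolute_link` and the three href values are hypothetical examples, not links taken from the actual page.

```python
from urllib.parse import urljoin

# Page URL from the exercise; the href values below are hypothetical examples
page_url = 'http://news.gzcc.cn/html/2018/xiaoyuanxinwen_0328/9113.html'

def absolute_link(base, href):
    """Resolve an href (relative or absolute) against the page it came from."""
    return urljoin(base, href)

print(absolute_link(page_url, '/html/xiaoyuanxinwen/'))   # root-relative
print(absolute_link(page_url, '9112.html'))               # sibling page
print(absolute_link(page_url, 'http://www.gzcc.cn/'))     # already absolute
```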

3. Extract the contents of all li tags

for i in soup.select('li'):
    print(i.text)
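When the texts are needed as data rather than printed, the same `select('li')` call can feed a list. The sketch below runs on a hypothetical inline snippet instead of the fetched page, and uses `get_text(strip=True)` to trim surrounding whitespace.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the fetched page
sample = '''
<ul>
  <li><a href="/a.html">First item</a></li>
  <li>  Second item  </li>
  <li></li>
</ul>
'''

soup = BeautifulSoup(sample, 'html.parser')

# select() returns a list of every matching tag, in document order
items = [li.get_text(strip=True) for li in soup.select('li')]

# Drop entries that had no text at all
items = [text for text in items if text]
print(items)
```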

4. Extract the title, link, publication time, and source of one news item

soup.select('.news-list-title')[0].text              # title
soup.select('li')[1].a.attrs['href']                 # link
soup.select('.news-list-info')[0].contents[0].text   # publication time
soup.select('.news-list-info')[0].contents[1].text   # source
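The four lookups can also be grouped so that one news item comes back as a single record. This is only a sketch: the class names are taken from the exercise, but the nesting of the sample markup, the sample values, and the helper name `parse_news_item` are assumptions, since the real page structure is not shown here.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the class names come from the exercise, but the
# nesting and the sample values are assumptions about the list page
sample = '''
<ul>
  <li><a href="/html/example/1.html">
    <div class="news-list-title">Sample headline</div>
    <div class="news-list-info"><span>2018-03-28</span><span>News Center</span></div>
  </a></li>
</ul>
'''

def parse_news_item(li):
    """Collect title, link, time and source from one <li> into a dict."""
    info = li.select_one('.news-list-info').find_all('span')
    return {
        'title': li.select_one('.news-list-title').get_text(strip=True),
        'link': li.a.attrs.get('href'),
        'time': info[0].get_text(strip=True),
        'source': info[1].get_text(strip=True),
    }

soup = BeautifulSoup(sample, 'html.parser')
item = parse_news_item(soup.select_one('li'))
print(item)
```

Scoping every selector to one `li` keeps the title, link, time, and source tied to the same news item, which indexing the whole page with separate `[0]` and `[1]` positions does not guarantee.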
