
Scraping job listings from Lagou with Python

The Lagou website is at https://www.lagou.com/

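Press F12 to open the browser's developer tools and inspect the page structure. All of the job listings turn out to be delivered in this JSON file: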

Note the request URL and the request method shown under Headers:
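Judging from the script further down, the request is a POST to the positionAjax.json endpoint, with the city passed as a percent-encoded query parameter. A minimal sketch of building that URL follows; the helper name build_ajax_url is illustrative and not part of the original script.

```python
from urllib.parse import quote

# Endpoint template taken from the script in this article.
AJAX_TEMPLATE = 'https://www.lagou.com/jobs/positionAjax.json?city={}&needAddtionalResult=false'

def build_ajax_url(city):
    """Return the Ajax URL for a city name (hypothetical helper)."""
    # quote() percent-encodes the UTF-8 bytes of non-ASCII characters.
    return AJAX_TEMPLATE.format(quote(city))

print(build_ajax_url('成都'))
```

Encoding the city up front keeps the URL pure ASCII, so the same value can also be reused safely inside the Referer header.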


And the form data:
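The script in this article sends three form fields: first, pn (the page number), and kd (the search keyword). As a rough sketch, this is how such a form serializes into a URL-encoded request body, which is essentially what requests does with a dict passed as data. The helper name build_form is illustrative.

```python
from urllib.parse import urlencode

def build_form(keyword, page_number):
    """Return the form fields for one results page (hypothetical helper)."""
    # Field names match the script in this article; values vary per request.
    return {'first': 'true', 'pn': page_number, 'kd': keyword}

# Serialize the form the way a browser would for a form-encoded POST.
body = urlencode(build_form('python', 2))
print(body)
```

Only pn changes from page to page, which is why the script can loop over page numbers with an otherwise fixed form.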


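With this information in hand, you can basically start scraping. Note that Lagou has anti-scraping measures, so the requests need to carry Cookie, Referer, and User-Agent headers to imitate a logged-in browser session.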
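One way to keep these three headers consistent is to build them in a single place. The sketch below assumes the Referer should point at the search-results page for the same keyword and city; build_headers and its parameters are illustrative names, and the cookie and user-agent values are placeholders to be copied from your own browser.

```python
from urllib.parse import quote

# Search-page URL pattern taken from the script in this article.
REFERER_TEMPLATE = ('https://www.lagou.com/jobs/list_{0}?city={1}'
                    '&cl=false&fromSearch=true&labelWords=&suginput=')

def build_headers(keyword, city, cookie, user_agent):
    """Assemble browser-like request headers (hypothetical helper)."""
    return {
        'Cookie': cookie,
        # Header values must be ASCII-safe, so both parts are percent-encoded.
        'Referer': REFERER_TEMPLATE.format(quote(keyword), quote(city)),
        'User-Agent': user_agent,
    }

headers = build_headers('python', '成都',
                        cookie='<cookie copied from your browser>',
                        user_agent='Mozilla/5.0')
print(headers['Referer'])
```

The resulting dict can be passed directly as the headers argument of requests.get or requests.post.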

The code is as follows:

# Import modules
import requests
from bs4 import BeautifulSoup
from urllib.parse import quote

# Read the search parameters
keyword = input("Keyword: ")
city = input("City: ")
# Percent-encode the city and keyword for use in URLs and headers
city = quote(city)
keyword_quoted = quote(keyword)

# Search-results page for this keyword and city
demo_URL = 'https://www.lagou.com/jobs/list_{0}?city={1}&cl=false&fromSearch=true&labelWords=&suginput='
URL = demo_URL.format(keyword_quoted, city)

headers = {
    # Cookie captured from a browser session; replace it with the one from your own session
    'Cookie': (
        '_ga=GA1.2.1209754414.1514967030; '
        'user_trace_token=20180103161031-90df0df0-f05d-11e7-9fc4-5254005c3644; '
        'LGUID=20180103161031-90df13b0-f05d-11e7-9fc4-5254005c3644; '
        '_gid=GA1.2.1398638690.1528077740; '
        'index_location_city=%E6%88%90%E9%83%BD; '
        'WEBTJ-ID=20180604211403-163caeee34932f-0d26f742560af7-3c60460e-1049088-163caeee34a74b; '
        '_gat=1; '
        'PRE_HOST=www.baidu.com; '
        'LGSID=20180604211405-287e8893-67f9-11e8-9199-525400f775ce; '
        'PRE_UTM=m_cf_cpc_baidu_pc; '
        'PRE_SITE=https%3A%2F%2Fwww.baidu.com%2Fbaidu.php%3Fsc.K000000fJeHuq9k182ORUWSBOwQf0uubLYJOnccqK-6lsOf9B--xNbB1V0Oak5wAYokuFvNP9W5EWMgVVbG7h4DURdIbdtIKzQccpCTHJe_BvkYwDT-P7rrahydjnpGo9b-DSOk6Sf9CVzYSzYH_KJs7FQ2sTKX7lyFxD_yEKva762AyN6.DD_NR2Ar5Od663rj6tJQrGvKD7ZZKNfYYmcgpIQC8xxKfYt_U_DY2yP5Qjo4mTT5QX1BsT8rZoG4XL6mEukmryZZjzsLTJplePXO-zIr4PXE-sSxH9vX8ZuEsSXOjEzmxUEsSxW9qx-9LdoDkbLyNSPhHWzdvT85R_nYQAHWEotN.U1Yk0ZDqs2v4_tL30A7bTgbqs2v4_tL30A7bTgfqn6KspynqnfKY5TaV8U5PS0KGUHYznjf0u1dsTLwz0ZNG5yF9pywdUAY0TA-b5Hc30APGujYznWm0UgfqnH0krNtknjDLg1DsnWPxn10kPNt1PW0k0AVG5H00TMfqnWDL0ANGujY0mhbqnW0Y0AdW5HDsnj7xP1nsnHRYrjcYg17xnH0zg100TgKGujYs0Z7Wpyfqn0KzuLw9u1Ys0A7B5HKxn0K-ThTqn0KsTjYknjf1njRvrHbv0A4vTjYsQW0snj0snj0s0AdYTjYs0AwbUL0qn0KzpWYs0Aw-IWdsmsKhIjYs0ZKC5H00ULnqn0KBI1Ykn0K8IjYs0ZPl5fKYIgnqn1mvPWb1nHb3PW0YnjTvP1msP0Kzug7Y5HDdnW6knH6sn1TvrjR0Tv-b5H-buWb3Pjubnj0snAm3Pj00mLPV5HKKP1uDrDRYwWfdwDDYfWf0mynqnfKsUWYs0Z7VIjYs0Z7VT1Ys0ZGY5H00UyPxuMFEUHYsg1Kxn7tsg100uA78IyF-gLK_my4GuZnqn7tsg1Kxn1D3PWbkg100TA7Ygvu_myTqn0Kbmv-b5Hcvrjf1PHfdP6K-IA-b5iYk0A71TAPW5H00IgKGUhPW5H00Tydh5HDv0AuWIgfqn0KhXh6qn0Khmgfqn0KlTAkdT1Ys0A7buhk9u1Yk0Akhm1Ys0APzm1Ydnj01n0%26ck%3D3433.2.110.206.561.239.621.215%26shh%3Dwww.baidu.com%26sht%3D25017023_10_pg%26us%3D1.0.2.0.0.0.0%26ie%3Dutf-8%26f%3D8%26tn%3D25017023_10_pg%26wd%3D%25E6%258B%2589%25E9%2592%25A9%25E7%25BD%2591%26oq%3D%25E6%258B%2589%25E9%2592%25A9%25E7%25BD%2591%26rqlang%3Dcn%26lm%3D-1%26ssl_s%3D1%26ssl_c%3Dssl1_163caeed355%26bc%3D110101; '
        'PRE_LAND=https%3A%2F%2Fwww.lagou.com%2Flp%2Fhtml%2Fcommon.html%3Futm_source%3Dm_cf_cpc_baidu_pc%26m_kw%3Dbaidu_cpc_cd_e110f9_265e1f_%25E6%258B%2589%25E9%2592%25A9%25E7%25BD%2591; '
        'JSESSIONID=ABAAABAAAGFABEF3C6E46C38A26E7FFF00985171CC476C0; '
        'Hm_lvt_4233e74dff0ae5bd0a3d81c6ccf756e6=1528077739,1528118044,1528118048,1528118062; '
        'TG-TRACK-CODE=index_search; '
        'Hm_lpvt_4233e74dff0ae5bd0a3d81c6ccf756e6=1528118071; '
        'LGRID=20180604211429-368ac179-67f9-11e8-9199-525400f775ce; '
        'SEARCH_ID=1976fcf584114b59811d845ae44421b1'
    ),
    # Referer is the search page for the entered keyword (the original hardcoded "python")
    'Referer': URL,
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.108 Safari/537.36'
}

# Fetch the search page and read the total page count from the pager text ("current/total")
HTML = requests.get(URL, headers=headers)
soup = BeautifulSoup(HTML.content, 'html.parser')
page_total = int(soup.select('.page-number')[0].text.strip().replace(' ', '').replace('\n', '').split('/')[1])
# Print how many result pages this search has
print(page_total)

# Iterate over every page
for page_number in range(1, page_total + 1):
    data = {
        'first': 'true',
        'pn': page_number,
        'kd': keyword
    }
    demo_url = 'https://www.lagou.com/jobs/positionAjax.json?city={}&needAddtionalResult=false'
    url = demo_url.format(city)
    html = requests.post(url, headers=headers, data=data)
    result = html.json()
    info = result['content']['positionResult']['result']
    for i in info:
        # Print position name, required experience, education, salary, company short name, company full name, city
        print(i['positionName'], i['workYear'], i['education'], i['salary'], i['companyShortName'], i['companyFullName'], i['city'])
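The two parsing steps in the script can be tried without touching the network. The sketch below assumes the pager text has the form "current / total" and that the JSON response nests the listings under content, positionResult, result, as the script accesses them. The function names and the sample record are made up for illustration.

```python
def parse_page_total(text):
    """Parse the total page count from pager text such as ' 1 / 30 '."""
    compact = ''.join(text.split())  # drop all spaces and newlines
    return int(compact.split('/')[1])

# The fields the script prints for each listing, in order.
FIELDS = ('positionName', 'workYear', 'education', 'salary',
          'companyShortName', 'companyFullName', 'city')

def extract_rows(result):
    """Pull the printed fields out of one decoded JSON response."""
    postings = result['content']['positionResult']['result']
    return [tuple(posting[field] for field in FIELDS) for posting in postings]

# Made-up sample shaped like the response the script expects; not real data.
sample = {'content': {'positionResult': {'result': [{
    'positionName': 'Python Developer', 'workYear': '1-3 years',
    'education': 'Bachelor', 'salary': '10k-20k',
    'companyShortName': 'ExampleCo', 'companyFullName': 'Example Company Ltd.',
    'city': 'Chengdu',
}]}}}

print(parse_page_total(' 1 / 30 \n'))
for row in extract_rows(sample):
    print(*row)
```

Separating parsing from the HTTP calls also makes it easier to notice when the site changes its markup or JSON layout, since a failure then points at one small function.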

Demo:

First, enter the keyword:


Then enter the city:


Press Enter and the scraped listings are printed (the output is long, so only part of it is shown):