Scraping job postings from Lagou and a recruitment site based on a search keyword

阿新 • Published: 2019-01-29

Code:

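import requests
import time
import random

# Proxy addresses picked at random for each request. They have no port and are very
# likely unreachable by now; replace them with working host:port entries.
ip_list = ['117.135.132.107', '121.8.98.196',  '194.116.198.212']

# HTTP request headers copied from the browser. The Cookie below belongs to the
# author's logged-in session and has long expired; paste your own from DevTools (F12).
# Content-Length is not set by hand: requests computes it from the actual form body.
# 'br' is left out of Accept-Encoding because requests can only decode Brotli
# responses when the optional brotli package is installed.
headers={
'Accept':'application/json, text/javascript, */*; q=0.01',
'Accept-Encoding':'gzip, deflate',
'Accept-Language':'zh-CN,zh;q=0.8',
'Connection':'keep-alive',
'Content-Type':'application/x-www-form-urlencoded; charset=UTF-8',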
'Cookie':'user_trace_token=20170214020222-9151732d-f216-11e6-acb5-525400f775ce; LGUID=20170214020222-91517b06-f216-11e6-acb5-525400f775ce; JSESSIONID=ABAAABAAAGFABEF53B117A40684BFB6190FCDFF136B2AE8; _putrc=ECA3D429446342E9; login=true; unick=yz; showExpriedIndex=1; showExpriedCompanyHome=1; showExpriedMyPublish=1; hasDeliver=0; PRE_UTM=; PRE_HOST=; PRE_SITE=; PRE_LAND=https%3A%2F%2Fwww.lagou.com%2F; TG-TRACK-CODE=index_navigation; Hm_lvt_4233e74dff0ae5bd0a3d81c6ccf756e6=1494688520,1494690499,1496044502,1496048593; Hm_lpvt_4233e74dff0ae5bd0a3d81c6ccf756e6=1496061497; _gid=GA1.2.2090691601.1496061497; _gat=1; _ga=GA1.2.1759377285.1487008943; LGSID=20170529203716-8c254049-446b-11e7-947e-5254005c3644; LGRID=20170529203828-b6fc4c8e-446b-11e7-ba7f-525400f775ce; SEARCH_ID=13c3482b5ddc4bb7bfda721bbe6d71c7; index_location_city=%E6%9D%AD%E5%B7%9E',
'Host':'www.lagou.com',
'Origin':'https://www.lagou.com',
'Referer':'https://www.lagou.com/jobs/list_Python?',
'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36',
'X-Anit-Forge-Code':'0',
'X-Anit-Forge-Token':'None',
'X-Requested-With':'XMLHttpRequest'
}
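The Cookie header above is tied to a single browser session from May 2017 (note the 20170529 timestamps in LGSID and LGRID), so it will not be accepted today. A commonly used alternative is to let a `requests.Session` collect cookies by first loading the HTML list page and then posting to `positionAjax.json` with the same session. This is a sketch under the assumption that Lagou still issues the cookies its Ajax endpoint checks when the list page is loaded, which is not verified here; `LIST_URL`, `AJAX_URL`, `make_session` and `fetch_page` are hypothetical names. The query parameters and form fields are the ones the original script sends.

```python
import requests

# Hypothetical constants: the list-page pattern follows the Referer header of the
# original script; the Ajax URL is the endpoint the script posts to.
LIST_URL = 'https://www.lagou.com/jobs/list_{keyword}?'
AJAX_URL = 'https://www.lagou.com/jobs/positionAjax.json'


def make_session(keyword):
    """Create a session with browser-like headers but no hard-coded Cookie."""
    session = requests.Session()
    session.headers.update({
        'User-Agent': ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                       'AppleWebKit/537.36 (KHTML, like Gecko) '
                       'Chrome/58.0.3029.110 Safari/537.36'),
        'Referer': LIST_URL.format(keyword=keyword),
        'X-Requested-With': 'XMLHttpRequest',
    })
    return session


def fetch_page(session, keyword, city, page):
    """Load the list page to receive cookies, then POST for one page of JSON."""
    # Assumption: this GET makes the server set the cookies the Ajax call needs;
    # the session stores them and sends them with the following POST.
    session.get(LIST_URL.format(keyword=keyword), timeout=10)
    response = session.post(
        AJAX_URL,
        params={'px': 'default', 'city': city,
                'needAddtionalResult': 'false', 'isSchoolJob': '0'},
        data={'first': 'true', 'pn': page, 'kd': keyword},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()
```

Used in place of `get_json`, a single session created with `make_session('Python')` can be passed to `fetch_page` for every page, so cookies set by the server are carried from one request to the next.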

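def get_json(url,page,lange_name):
    # Build the POST form data: first-page flag, page number (pn) and search keyword (kd)
    FramData = {'first':'true','pn':page,'kd':lange_name}
    # requests picks a proxy by the URL's scheme, so the key must be 'https' for this
    # https:// URL (an 'http' key is silently ignored). No proxy if ip_list is empty.
    proxies = {'https': 'http://' + random.choice(ip_list)} if ip_list else None
    # POST the form; a successful (200) response carries the job listings as JSON
    response = requests.post(url, data=FramData, headers=headers, proxies=proxies, timeout=10)
    # Decode the JSON body into a dict
    JsonDatas = response.json()
    return JsonDatas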
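get_json posts three form fields (a first-page flag, the page number `pn` and the keyword `kd`), URL-encoded into the request body. The stdlib sketch below reproduces that encoding, assuming the first-page flag is named `first` as in the browser's form data; `form_body` is a hypothetical helper. It also shows why a Content-Length value copied from the browser (25 in the captured headers) fits only one request: `first=true&pn=1&kd=Python` is exactly 25 bytes, and any other page number or keyword changes the length.

```python
from urllib.parse import urlencode


def form_body(keyword, page):
    """URL-encode the POST fields the way a browser form submission does."""
    # Hypothetical helper; field names follow the form data used by get_json
    return urlencode({'first': 'true', 'pn': page, 'kd': keyword})


for kw, pn in [('Python', 1), ('Python', 10), ('数据分析', 1)]:
    body = form_body(kw, pn)
    # Content-Length counts bytes of the encoded body, not characters
    print(body, len(body.encode('utf-8')))
```

Non-ASCII keywords are percent-encoded as UTF-8, so each Chinese character adds nine bytes to the body, another reason to let requests compute the header.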

def parser_json(page,JsonDatas):  # JsonDatas is the decoded response (a dict)
    companyInfos = []
    # If the request is rejected (for example by rate limiting), the response may
    # lack the 'content' key; report it and return no rows for this page
    if 'content' not in JsonDatas:
        print("Page {0}: no job data in the response: {1}".format(page, JsonDatas.get('msg')))
        return companyInfos
    # List of job postings on this page
    companyInfo = JsonDatas['content']['positionResult']['result']
    print("Parsing job postings on page {0}".format(page))
    # Iterate over each posting
    for company in companyInfo:
        # Temporary list holding one posting's fields
        comInfo = []
        # City of the company, plus the district when available
        if company['district'] is not None:
            city = company['city'] + '-' + company['district']
        else:
            city = company['city']
        comInfo.append(city)
        # Position name
        positionName = company['positionName']
        comInfo.append(positionName)
        # Company full name, with the short name in parentheses
        companyFullName = company['companyFullName'] + '(' + company['companyShortName'] + ')'
        comInfo.append(companyFullName)
        # Required education
        education = company['education']
        comInfo.append(education)
        # Job type
        jobNature = company['jobNature']
        comInfo.append(jobNature)
        # Benefits; commas become semicolons so they don't split the CSV columns
        positionAdvantages = company['positionAdvantage'] or ''
        positionAdvantage = positionAdvantages.replace('，','；').replace(',','；')
        comInfo.append(positionAdvantage)
        # Salary
        salary = company['salary']
        comInfo.append(salary)
        # Required work experience
        workYear = company['workYear']
        comInfo.append(workYear)
        # Posting time (named createTime so it doesn't shadow the time module)
        createTime = company['createTime']
        comInfo.append(createTime)
        # Add this posting to the page's results
        companyInfos.append(comInfo)
    print("Page {0} parsed".format(page))
    return companyInfos
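To check the extraction without touching the site, the same logic can be run on a hand-made dict shaped like the path parser_json reads (`content` → `positionResult` → `result`). The record below is invented for illustration, not real Lagou data, and `record_to_row` / `parse_payload` are hypothetical names for a compact variant of parser_json that returns an empty list when the expected keys are missing.

```python
# Hypothetical, hand-made response in the shape parser_json expects;
# every value is invented for illustration only.
SAMPLE = {
    'content': {'positionResult': {'result': [{
        'city': '杭州', 'district': None,
        'positionName': 'Python开发工程师',
        'companyFullName': '示例科技有限公司', 'companyShortName': '示例科技',
        'education': '本科', 'jobNature': '全职',
        'positionAdvantage': '五险一金,弹性工作',
        'salary': '10k-20k', 'workYear': '1-3年',
        'createTime': '2019-01-29 10:00:00',
    }]}}
}


def record_to_row(rec):
    """Flatten one posting into the same nine columns parser_json produces."""
    if rec.get('district') is not None:
        city = rec['city'] + '-' + rec['district']
    else:
        city = rec['city']
    company = rec['companyFullName'] + '(' + rec['companyShortName'] + ')'
    # Same comma replacement as parser_json, so CSV columns stay aligned
    benefits = (rec.get('positionAdvantage') or '').replace('，', '；').replace(',', '；')
    return [city, rec['positionName'], company, rec['education'], rec['jobNature'],
            benefits, rec['salary'], rec['workYear'], rec['createTime']]


def parse_payload(payload):
    """Return one row per posting, or [] if the payload lacks the expected keys."""
    try:
        records = payload['content']['positionResult']['result']
    except (KeyError, TypeError):
        return []
    return [record_to_row(r) for r in records]


for row in parse_payload(SAMPLE):
    print(row)
```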

def writeCSV(page,fw,companyInfos):
    # Write one CSV line per posting (fields are joined with plain commas)
    for companyInfo in companyInfos:
        fw.write(",".join(companyInfo) + '\n')
    print("Page {0} written".format(page))
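writeCSV joins fields with plain commas, which is why parser_json has to replace commas inside the benefits text; a comma in any other field would still shift the columns. Python's standard `csv` module quotes such fields automatically. A minimal sketch (the `write_rows` name is hypothetical):

```python
import csv
import io


def write_rows(fw, rows):
    """Write rows with csv.writer, which quotes fields containing commas or quotes."""
    writer = csv.writer(fw)
    writer.writerows(rows)


buf = io.StringIO()
write_rows(buf, [['杭州-西湖区', '五险一金,弹性工作', '10k-20k']])
print(buf.getvalue())
```

The field with the comma is written as `"五险一金,弹性工作"`, so it stays in one column. When writing to a real file, open it with `newline=''` as the csv module documentation recommends, e.g. `open(name, 'w', newline='', encoding='utf-8-sig')`.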
    
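def main():
    path = 'F:\\'  # directory where the CSV file is saved
    start_page = 1
    end_page = 20  # last page to fetch (default)
    lange_name = input("Enter the job title to search for: ")
    city = input("Enter the work location (city): ")
    # Build the Ajax URL for the chosen city
    start_url = 'https://www.lagou.com/jobs/positionAjax.json?px=default&city='
    end_url = '&needAddtionalResult=false&isSchoolJob=0'
    url = start_url + city + end_url
    row = ['Location', 'Position', 'Company', 'Education', 'Job type', 'Benefits', 'Salary', 'Experience required', 'Posted at']
    # Create the CSV file: 'w' writes the header only once per run (append mode would
    # repeat it), and utf-8-sig lets Excel display the Chinese text correctly
    with open(path + 'lagou_' + lange_name + '.csv', 'w', encoding='utf-8-sig') as fw:
        fw.write(",".join(row) + '\n')
        # Fetch pages start_page..end_page inclusive
        for page in range(start_page, end_page + 1):
            # Wait between requests to reduce the chance of being blocked
            time.sleep(12)
            print("Fetching job data from page {0}".format(page))
            # Fetch the JSON data
            JsonDatas = get_json(url, page, lange_name)
            # Parse the data into rows
            companyInfos = parser_json(page, JsonDatas)
            # Write the rows to the CSV file
            writeCSV(page, fw, companyInfos)
    print("All data written")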

if __name__ == '__main__':
    main()
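The main loop waits a fixed 12 seconds before every request. If a page still comes back without usable data, a common remedy is to retry that page with a growing delay. The sketch below assumes that a payload without the `content` key means the request was rejected; `fetch_with_retry`, its parameters and the doubling schedule are illustrative choices, not part of the original script. Passing the fetch and sleep functions in keeps it testable without network access.

```python
import time


def fetch_with_retry(fetch, page, retries=3, base_delay=12, sleep=time.sleep):
    """Call fetch(page) until the payload has a 'content' key, doubling the wait.

    fetch: any callable returning the decoded JSON dict for one page.
    """
    delay = base_delay
    for attempt in range(retries + 1):
        payload = fetch(page)
        if isinstance(payload, dict) and 'content' in payload:
            return payload
        if attempt < retries:
            # Assumed rejection: wait longer before trying the same page again
            sleep(delay)
            delay *= 2
    raise RuntimeError('page {0}: no usable data after {1} attempts'.format(page, retries + 1))
```

Inside main, the fetch step could then read `JsonDatas = fetch_with_retry(lambda p: get_json(url, p, lange_name), page)`.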
