程式人生 > 根據搜尋內容爬取招聘網的職位招聘資訊

根據搜尋內容爬取招聘網的職位招聘資訊

程式碼:

import requests
from bs4 import BeautifulSoup
import time


def getHtml(url, code='gbk', timeout=10):
    """Fetch *url* and return the response body decoded with *code*.

    Best-effort: returns an empty string on any network/HTTP failure,
    which downstream parsing treats as "no results on this page".

    Args:
        url: page URL to fetch.
        code: character encoding forced onto the response (51job serves gbk).
        timeout: seconds to wait before giving up (new, backward-compatible;
            the original request could hang forever without one).
    """
    try:
        r = requests.get(url, timeout=timeout)
        r.raise_for_status()
        r.encoding = code
        return r.text
    # Catch only requests' own errors instead of a bare `except:`, which
    # also swallowed KeyboardInterrupt/SystemExit and hid real bugs.
    except requests.RequestException:
        return ""

def ParserHtml(i, htmlText):
    """Parse one 51job search-result page.

    Args:
        i: 1-based page number (used only for progress output).
        htmlText: raw HTML of the result page (may be "" on fetch failure).

    Returns:
        A list of rows, each [position, company, place, salary, publish_date].
    """
    print("正在解析第{0}頁".format(i))
    RecuitInfos = []
    soup = BeautifulSoup(htmlText, 'lxml')
    # 職位 / 公司 / 地點 / 薪資 / 釋出時間 columns of the result table.
    InfoPositions = soup.find_all('p', attrs={'class': 't1 '})
    InfoNames = soup.find_all('span', attrs={'class': 't2'})
    InfoPlaces = soup.find_all('span', attrs={'class': 't3'})
    InfoSalarys = soup.find_all('span', attrs={'class': 't4'})
    InfoTimes = soup.find_all('span', attrs={'class': 't5'})
    # NOTE(review): the t2..t5 lists appear to include a header row at
    # index 0 that the 'p.t1 ' list does not, hence the m-1 offset for
    # positions — confirm against the live page structure.
    for m in range(1, len(InfoPositions)):
        position = InfoPositions[m - 1].text.split()
        name = InfoNames[m].text.split()
        place = InfoPlaces[m].text.split()
        salary = InfoSalarys[m].text.split()
        pub_time = InfoTimes[m].text.split()
        # BUG FIX: the original tested `(InfoTimes[m].text.split()) == 0`
        # (missing len()), which is always False, so rows with an empty
        # publish date slipped through and raised IndexError below.
        # Short-circuiting `and` also replaces the eager bitwise `|`.
        if position and name and place and salary and pub_time:
            RecuitInfos.append(
                [position[0], name[0], place[0], salary[0], pub_time[0]])
    return RecuitInfos

def writeCSV(i, fw, Recruit_info):
    """Append one page of scraped rows to the already-open CSV handle *fw*.

    Args:
        i: 1-based page number (progress output only).
        fw: writable text file object.
        Recruit_info: list of rows, each a list of string fields.
    """
    for record in Recruit_info:
        print("正在寫入第{0}頁".format(i))
        line = ",".join(record) + '\n'
        fw.write(line)
    print("第{0}資料抓取完畢".format(i))

def main():
    """Prompt for a job title and page count, then scrape 51job search
    results for that title into F:\\zhaopin_<title>.csv."""
    path = 'F:'
    posttion = input("請輸入要抓取的職位名稱:")
    star_url = 'http://search.51job.com/list/000000,000000,0000,00,9,99,'
    mid_url = ',2,'
    end_url = '.html?'
    # Renamed from `max` (shadowed the builtin); convert once up front so a
    # non-numeric answer fails before the file is created.
    max_pages = int(input("請輸入最大抓取頁數:"))
    # `with` guarantees the CSV handle is closed/flushed even on error
    # (the original leaked it). '\\z' replaces the invalid escape '\z'
    # (same resulting string, no DeprecationWarning).
    with open(path + '\\zhaopin_' + posttion + '.csv', 'a+') as fw:
        row = ["職位名", "公司名", "工作地點", "薪資", "釋出時間"]
        fw.write(",".join(row) + "\n")
        # BUG FIX: range(1, max_pages) stopped one page short of the
        # requested maximum; +1 includes the last page.
        for i in range(1, max_pages + 1):
            time.sleep(3)  # throttle: be polite to the server
            url = star_url + posttion + mid_url + str(i) + end_url
            htmlText = getHtml(url)
            Recruit_info = ParserHtml(i, htmlText)
            writeCSV(i, fw, Recruit_info)

if __name__ == '__main__':
    main()