Scraping Baidu Images with Python: notes on shipping an exe (encoding is a big pitfall)

阿新 • Published: 2018-11-24

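# -*- coding: utf-8 -*-
# Python 2 script: reload() and sys.setdefaultencoding() do not exist in Python 3.
import os
import sys

import requests
import sitecustomize

reload(sys)  # restores sys.setdefaultencoding, which site.py removes at startup
sys.setdefaultencoding('utf-8')
# Console / file system encoding ('mbcs' on Windows); used by the exe variant.
# Note: this name shadows the built-in type().
type = sys.getfilesystemencoding()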

x = 0  # running index used to name the downloaded files
def getManyPages(keyword, pages, pathdir):
    # Build one query-parameter dict per result page (30 results per page).
    params = []
    for i in range(0, 30 * pages, 30):
        params.append({
            'tn': 'resultjson_com',
            'ipn': 'rj',
            'ct': 201326592,
            'is': '',
            'fp': 'result',
            'queryWord': keyword,
            'cl': 2,
            'lm': -1,
            'ie': 'utf-8',
            'oe': 'utf-8',
            'adpicid': '',
            'st': -1,
            'z': '',
            'ic': 0,
            'word': keyword,
            's': '',
            'se': '',
            'tab': '',
            'width': '',
            'height': '',
            'face': 0,
            'istype': 2,
            'qc': '',
            'nc': 1,
            'fr': '',
            'pn': i,   # offset of the first result on this page
            'rn': 30,  # step size (results per page)
            'gsm': '1e',
            '1517211097507': ''
        })
    url = 'https://image.baidu.com/search/acjson'
    urls = []
    for i in params:
        try:
            l = requests.get(url, params=i).json().get('data')
        except Exception:
            # Request failed or the response was not valid JSON: skip this page.
            continue
        if not l:
            # 'data' missing or empty: nothing to download for this page.
            continue
        urls.append(l)
        try:
            getImg(urls, pathdir)  # argument 2: directory to save into
            urls.pop()
        except Exception:
            # A page whose download failed stays in urls and is returned.
            pass

    return urls


def getImg(dataList, localPath):
    # exe build: print('開始下載'.decode('utf-8').encode(type))
    print('開始下載')  # "Starting download"
    global x
    if not os.path.exists(localPath):  # create the target folder
        os.makedirs(localPath)

    for page in dataList:
        for i in page:
            if i.get('thumbURL') is not None:
                # exe build: print('正在下載：%s'.decode('utf-8').encode(type) % i.get('thumbURL').decode('utf-8').encode(type))
                print('正在下載：%s' % i.get('thumbURL'))  # "Downloading: <url>"
                ir = requests.get(i.get('thumbURL'))
                # Close the file handle explicitly instead of relying on garbage collection.
                with open(localPath + '%d.jpg' % x, 'wb') as f:
                    f.write(ir.content)
                x += 1
            else:
                # exe build: print('圖片連結不存在'.decode('utf-8').encode(type))
                print('圖片連結不存在')  # "Image link does not exist"

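if __name__ == '__main__':
    li = []
    pages = []
    # exe build: prompts are re-encoded from UTF-8 to the console encoding,
    # and each typed keyword is converted back to UTF-8 before the request.
    # pathdir=raw_input('請輸入儲存路徑(英文 半形)：'.decode('utf-8').encode(type))
    # count=int(raw_input("請輸入查詢的類別數量：".decode('utf-8').encode(type)))
    # for xxx in range(0,count):
    #     li.append(raw_input('請輸入要查詢的關鍵字：'.decode('utf-8').encode(type)))
    #     pages.append(int(raw_input('請輸入下載的總頁數：'.decode('utf-8').encode(type))))
    # for yyy in range(0,count):
    #     getManyPages(li[yyy].decode(type).encode('utf-8'),pages[yyy],(pathdir+str(yyy)+'/'))  # argument 1: keyword, argument 2: number of pages to download
    # PyCharm build:
    # int(raw_input(...)) replaces Python 2's input(), which evaluates whatever is typed.
    pathdir = raw_input('請輸入儲存路徑(英文 半形)：')  # "Save path (English, half-width characters):"
    count = int(raw_input("請輸入查詢的類別數量："))  # "Number of categories to search:"
    for xxx in range(0, count):
        li.append(raw_input('請輸入要查詢的關鍵字：'))  # "Search keyword:"
        pages.append(int(raw_input('請輸入下載的總頁數：')))  # "Total pages to download:"
    for yyy in range(0, count):
        getManyPages(li[yyy], pages[yyy], (pathdir + str(yyy) + '/'))  # argument 1: keyword, argument 2: number of pages to download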

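The full source code is attached above!

This picks up the crawler from my previous post!

A new colleague joined the company and does roughly the same work as I do, so I wanted to package that .py script as an exe!! That is where the pitfalls began!

Windows applications use the MBCS encoding (the system ANSI code page)!! It is a huge pitfall. Put plainly, the fix is nothing more than an encoding conversion, but getting there was really painful!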
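The conversion used in the exe build can be sketched roughly as follows. This is a minimal sketch rather than the exact code from the script: the helper names `to_console_bytes` and `from_console_bytes` are hypothetical, and `'gbk'` is only an assumed stand-in for the Windows code page so that the example also runs on machines where the `mbcs` codec is unavailable.

```python
# -*- coding: utf-8 -*-
import sys


def to_console_bytes(utf8_bytes, target_encoding=None):
    """Re-encode UTF-8 bytes into the encoding the console expects.

    Defaults to sys.getfilesystemencoding(), the same value the script
    stores in `type` ('mbcs' on Windows under Python 2).
    """
    if target_encoding is None:
        target_encoding = sys.getfilesystemencoding()
    text = utf8_bytes.decode('utf-8')
    # 'replace' avoids a crash on characters the target code page lacks.
    return text.encode(target_encoding, 'replace')


def from_console_bytes(raw_bytes, source_encoding=None):
    """Convert bytes typed at the console back to UTF-8 for the HTTP request."""
    if source_encoding is None:
        source_encoding = sys.getfilesystemencoding()
    return raw_bytes.decode(source_encoding).encode('utf-8')


# 'gbk' is an assumed stand-in for the Windows ANSI code page.
utf8_prompt = u'開始下載'.encode('utf-8')
console_bytes = to_console_bytes(utf8_prompt, 'gbk')
round_trip = from_console_bytes(console_bytes, 'gbk')
```

Output goes from UTF-8 to the console encoding, and input goes the other way. This matches the commented-out `decode('utf-8').encode(type)` and `decode(type).encode('utf-8')` calls in the exe variant of the script.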
