Python crawler problem: TypeError: cannot use a string pattern on a bytes-like object

阿新 • Published: 2018-12-09

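When you start writing crawlers in Python 3.x, note one difference from Python 2.x: urlopen(...).read() returns bytes, so the downloaded HTML has to be decoded into a string before a string regex pattern can be applied to it: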

import re
import urllib.error
import urllib.request

def download(url, user_agent='XD', num_retries=2):
    """Return the raw response body as bytes, or None if the download fails."""
    print('Downloading:', url)
    headers = {'User-agent': user_agent}
    request = urllib.request.Request(url, headers=headers)
    try:
        # pass the Request object so the custom User-agent header is actually sent
        html = urllib.request.urlopen(request).read()
    except urllib.error.URLError as e:
        print('Download error:', e.reason)
        html = None
        if num_retries > 0:
            if hasattr(e, 'code') and 500 <= e.code < 600:
                # recursively retry 5xx HTTP errors
                return download(url, user_agent, num_retries - 1)
    return html

def crawl_sitemap(url):
    # download the sitemap file
    sitemap = download(url)
    if sitemap is None:
        # the download failed, so there is nothing to parse
        return
    # the body is bytes in Python 3; decode it before using a str pattern
    sitemap = sitemap.decode('utf-8')
    # extract the sitemap links
    links = re.findall('<loc>(.*?)</loc>', sitemap)
    # download each link
    for link in links:
        html = download(link)
        # scrape html here
        # ...

if __name__ == '__main__':
    crawl_sitemap('http://www.baidu.com/sitemap.xml')

The fix is in crawl_sitemap(url): add sitemap = sitemap.decode('utf-8') so that the downloaded bytes are decoded into a string before re.findall is called.
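To see the error and the fix in isolation, here is a minimal sketch that needs no network access. The sample sitemap bytes and the helper names extract_links and extract_links_bytes are made up for illustration and are not part of the original code. It assumes the data is UTF-8 encoded, as the post does. It also shows an alternative that the post does not use: keeping the data as bytes and matching it with a bytes pattern.

```python
import re

# Hypothetical stand-in for what urlopen(...).read() returns in Python 3: bytes, not str.
sitemap_bytes = (
    b"<urlset>"
    b"<url><loc>http://example.com/a</loc></url>"
    b"<url><loc>http://example.com/b</loc></url>"
    b"</urlset>"
)

def extract_links(raw, encoding="utf-8"):
    """Decode the raw bytes first, then match with a str pattern (the fix from the post)."""
    text = raw.decode(encoding)
    return re.findall(r"<loc>(.*?)</loc>", text)

def extract_links_bytes(raw):
    """Alternative: leave the data as bytes and use a bytes pattern; results are bytes."""
    return re.findall(rb"<loc>(.*?)</loc>", raw)

# Reproduce the error: a str pattern applied directly to bytes raises TypeError.
error_message = None
try:
    re.findall(r"<loc>(.*?)</loc>", sitemap_bytes)
except TypeError as exc:
    error_message = str(exc)

links = extract_links(sitemap_bytes)
links_as_bytes = extract_links_bytes(sitemap_bytes)

print(error_message)
print(links)
print(links_as_bytes)
```

Decoding is usually the more convenient choice, since the extracted links are then ordinary strings that can be passed straight back to download(). With the bytes pattern, each link would still have to be decoded before it could be used as a URL.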
