##Hands-on: scraping product categories from 易物天下 with requests

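  1. Decide what to scrape

    1. The target is the list of industry categories on the home page

    2. Find where the data comes from

      • Fetching the page with requests.get returns HTML that does not contain the industry categories

          response = requests.get(url, params=qs, headers=headers)
        
      • So the data is probably loaded by a separate AJAX request

        Open the URL in a browser, right-click and choose Inspect, then switch to the Network tab: the categories are indeed requested via AJAX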
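A quick way to confirm this from code is to check whether text that is visible in the browser occurs in the raw HTML at all. The following is a minimal sketch: the helper `missing_markers`, the sample HTML, and the category names are all hypothetical and only illustrate the check.

```python
def missing_markers(html, markers):
    """Return the markers that do not occur anywhere in the raw HTML text."""
    return [marker for marker in markers if marker not in html]


# Hypothetical stand-in for response.text; in practice, pass the body
# returned by requests.get for the home page.
static_html = "<html><body><div id='category'></div></body></html>"

# Hypothetical category names as they would appear in the rendered page.
print(missing_markers(static_html, ["Machinery", "Clothing"]))
```

If every marker comes back as missing, the content is most likely filled in later by JavaScript, which is the cue to look at the Network tab.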

  2. Start scraping

    • Because the data is requested via AJAX, I simply copied every field under Request Headers in the browser into a dictionary

        headers = {
            'Accept': 'application/json, text/javascript, */*; q=0.01',
            'Accept-Encoding': 'gzip, deflate',
            'Accept-Language': 'zh-CN,zh;q=0.9',
            'Connection': 'keep-alive',
            # Copied as-is; requests overwrites it with the actual body length
            'Content-Length': '4',
            'Content-Type': 'application/x-www-form-urlencoded',
            # Session-specific value; replace it with the cookie from your own browser
            'Cookie': 'JSESSIONID=C7BD7DFF7031A1A7EE3B71336BE03419; gr_user_id=47edb7df-b13e-4c2c-8c0c-0db4c28f09ff; Hm_lvt_10bdb52fd1832ac4eeceeabdc4df132f=1537604218; Hm_lpvt_10bdb52fd1832ac4eeceeabdc4df132f=1537608109; gr_session_id_a08ca0a390ddd043=9646c1ab-1d0c-41be-819e-51a04b592b26; gr_session_id_a08ca0a390ddd043_9646c1ab-1d0c-41be-819e-51a04b592b26=true',
            'Host': 'www.i1515.com',
            'Origin': 'http://www.i1515.com',
            'Referer': 'http://www.i1515.com/',
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.92 Safari/537.36',
            'X-Requested-With': 'XMLHttpRequest'
        }
      
    • Check whether the request carries any Form Data fields; if it does, turn them into a dictionary as well

        data={
            "id":"1"
        }
      
    • Finally, I found that the site sends 12 AJAX requests in total, each with a different id, so a loop can send them all and save each response to a JSON file for now

        for i in range(1, 13):  # ids 1..12, assuming the 12 requests use consecutive ids
            data["id"] = str(i)
            try:
                response = requests.post(url=url, headers=headers, data=data)
                result = response.json()
                # Save only responses that parse to a JSON object (a dict)
                if isinstance(result, dict):
                    print(i, result)
                    with open('type{}.json'.format(i), 'w', encoding='utf-8') as f:
                        json.dump(result, f, ensure_ascii=False)
            except Exception as ex:
                print(i, ex)
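Copying every browser header works, but most of them are probably not needed. The sketch below assumes (this is not verified against the site) that the endpoint only requires the AJAX-related headers, and wraps the request in two hypothetical helpers, `fetch_category` and `crawl`. The session is passed in as a parameter so the helpers can be tried with any object that offers a compatible `post` method.

```python
URL = "http://www.i1515.com/v2/category/getOtherCategory.html"

# Assumption: these headers are enough. If the server rejects the request,
# add back the remaining fields copied from the browser (Cookie, Origin, ...).
MINIMAL_HEADERS = {
    "Accept": "application/json, text/javascript, */*; q=0.01",
    "Referer": "http://www.i1515.com/",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/69.0.3497.92 Safari/537.36",
    "X-Requested-With": "XMLHttpRequest",
}


def fetch_category(session, category_id, timeout=10):
    """POST one category id; return the parsed JSON, or None if it is not a dict."""
    response = session.post(URL, headers=MINIMAL_HEADERS,
                            data={"id": str(category_id)}, timeout=timeout)
    response.raise_for_status()  # turn HTTP error statuses into exceptions
    result = response.json()
    return result if isinstance(result, dict) else None


def crawl(session, category_ids):
    """Fetch every id and return {id: category} for the usable responses."""
    results = {}
    for category_id in category_ids:
        try:
            category = fetch_category(session, category_id)
        except Exception as ex:  # network failure, HTTP error, or invalid JSON
            print(category_id, ex)
            continue
        if category is not None:
            results[category_id] = category
    return results
```

With the real library this would be called as `crawl(requests.Session(), range(1, 13))`; a session reuses the underlying connection across the requests, and the `timeout` keeps a stalled request from hanging the loop.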
      
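  3. Save the data from the JSON files into the database

    • Loop over the files and load each one

        with open('myspiders/type{}.json'.format(index), 'r', encoding='utf-8') as f:
            data = json.load(f)
      
    • Open the database connection

        conn = pymysql.connect(host='127.0.0.1', port=3306, user='root', password='',
                               database='orsp', charset='utf8')
      
    • Finally, insert the data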
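Before writing any SQL it helps to see the shape of the saved JSON. Judging from the fields the loader script reads (`name`, `sCate`, `tCategorys`), each file holds one first-level category with nested second-level and third-level entries. A minimal sketch, using a hypothetical helper `flatten_category` and made-up sample data, that turns one file's content into (first, second, third) name tuples:

```python
def flatten_category(data):
    """Yield (first, second, third) names from one category document."""
    for second in data.get("sCate", []):
        for third in second.get("tCategorys", []):
            yield data["name"], second["name"], third["name"]


# Made-up sample in the assumed shape; real files come from the scraper.
sample = {
    "name": "A",
    "sCate": [
        {"name": "A1", "tCategorys": [{"name": "A1a"}, {"name": "A1b"}]},
        {"name": "A2", "tCategorys": []},
    ],
}

for row in flatten_category(sample):
    print(row)
```

Note that a second-level entry without third-level children produces no tuples here, so this view suits inspection more than loading; the loader still has to insert second-level rows on their own.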

##Source code

  • Script that fetches the categories

      import json
      import requests

      url = 'http://www.i1515.com/v2/category/getOtherCategory.html'
      headers = {
          'Accept': 'application/json, text/javascript, */*; q=0.01',
          'Accept-Encoding': 'gzip, deflate',
          'Accept-Language': 'zh-CN,zh;q=0.9',
          'Connection': 'keep-alive',
          # Copied as-is; requests overwrites it with the actual body length
          'Content-Length': '4',
          'Content-Type': 'application/x-www-form-urlencoded',
          # Session-specific value; replace it with the cookie from your own browser
          'Cookie': 'JSESSIONID=C7BD7DFF7031A1A7EE3B71336BE03419; gr_user_id=47edb7df-b13e-4c2c-8c0c-0db4c28f09ff; Hm_lvt_10bdb52fd1832ac4eeceeabdc4df132f=1537604218; Hm_lpvt_10bdb52fd1832ac4eeceeabdc4df132f=1537608109; gr_session_id_a08ca0a390ddd043=9646c1ab-1d0c-41be-819e-51a04b592b26; gr_session_id_a08ca0a390ddd043_9646c1ab-1d0c-41be-819e-51a04b592b26=true',
          'Host': 'www.i1515.com',
          'Origin': 'http://www.i1515.com',
          'Referer': 'http://www.i1515.com/',
          'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.92 Safari/537.36',
          'X-Requested-With': 'XMLHttpRequest'
      }

      data = {"id": "1"}
      for i in range(1, 13):  # ids 1..12, assuming the 12 requests use consecutive ids
          data["id"] = str(i)
          try:
              response = requests.post(url=url, headers=headers, data=data)
              result = response.json()
              # Save only responses that parse to a JSON object (a dict)
              if isinstance(result, dict):
                  print(i, result)
                  with open('type{}.json'.format(i), 'w', encoding='utf-8') as f:
                      json.dump(result, f, ensure_ascii=False)
          except Exception as ex:
              print(i, ex)
    
  • write_data.py, which writes the data into the database

      import json
      import pymysql

      # Open one connection for the whole run instead of one per file
      conn = pymysql.connect(host='127.0.0.1', port=3306, user='root', password='',
                             database='orsp', charset='utf8')
      try:
          with conn.cursor() as cursor:
              for index in range(1, 13):  # assumes files type1.json .. type12.json
                  try:
                      # The scraper's output files are expected under myspiders/
                      with open('myspiders/type{}.json'.format(index), 'r', encoding='utf-8') as f:
                          data = json.load(f)
                      # Look up the id of the first-level type by name
                      cursor.execute('SELECT id FROM product_type WHERE product_type=%s',
                                     (data["name"],))
                      row = cursor.fetchone()
                      if row is None:
                          print("no product_type row for", data["name"])
                          continue
                      res_id = row[0]
                      # Insert the second-level types; parameters are passed separately
                      # so that quotes in a name cannot break the SQL
                      for second in data["sCate"]:
                          cursor.execute(
                              "INSERT INTO `product_type_two` (`product_type_one_id`, `type_two_name`) "
                              "VALUES (%s, %s)", (res_id, second["name"]))
                          insert_id = cursor.lastrowid  # id of the row just inserted
                          # Insert the third-level types under it
                          for third in second["tCategorys"]:
                              cursor.execute(
                                  "INSERT INTO `product_type_three` (`product_type_two_id`, `type_three_name`) "
                                  "VALUES (%s, %s)", (insert_id, third["name"]))
                      conn.commit()
                  except Exception as ex:
                      conn.rollback()  # discard the partial inserts for this file
                      print(index, ex)
      finally:
          conn.close()
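To try the insert logic without a MySQL server, the same parent/child pattern can be exercised against an in-memory SQLite database. This is a stand-in rather than the setup used above: it relies on the standard-library `sqlite3` module (whose placeholder is `?` instead of pymysql's `%s`), a simplified schema that only reuses the table and column names from write_data.py, a hypothetical first-level id, and made-up data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product_type_two ("
             "id INTEGER PRIMARY KEY, product_type_one_id INTEGER, type_two_name TEXT)")
conn.execute("CREATE TABLE product_type_three ("
             "id INTEGER PRIMARY KEY, product_type_two_id INTEGER, type_three_name TEXT)")

# Made-up data in the shape the loader reads from the JSON files.
data = {"sCate": [{"name": "A1", "tCategorys": [{"name": "A1a"}, {"name": "A1b"}]}]}
first_level_id = 1  # hypothetical id of the matching product_type row

for second in data["sCate"]:
    cursor = conn.execute(
        "INSERT INTO product_type_two (product_type_one_id, type_two_name) VALUES (?, ?)",
        (first_level_id, second["name"]),
    )
    second_id = cursor.lastrowid  # id generated for the row just inserted
    conn.executemany(
        "INSERT INTO product_type_three (product_type_two_id, type_three_name) VALUES (?, ?)",
        [(second_id, third["name"]) for third in second["tCategorys"]],
    )
conn.commit()

rows = conn.execute(
    "SELECT product_type_two_id, type_three_name FROM product_type_three ORDER BY id"
).fetchall()
print(rows)
conn.close()
```

The point of the dry run is the order of operations: insert the parent row, read back its generated id, then insert the children that reference it, all with parameters passed separately from the SQL text.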