
Parsing and Scraping TV Series Episode Lists from iQiyi and Tencent Video


import re
import urllib.request

import pymysql

def getHtml(url):
    # download the page and return the raw bytes
    page = urllib.request.urlopen(url)
    html = page.read()
    return html

# iqiyi:<a data-pb=""  href="(http://www.iqiyi.com/v_[\s\S]*?html)"[\s\S]*?title="([\s\S]*?)"
# tencent:<a\s*href="(http://v\.qq\.com/x/cover.*?html)"\s*target="_blank[\s\S]*?<span[\s\S]*?episodeNumber">([\s\S]*?)</span>
def parse(url, regular):
    html = getHtml(url)
    html = html.decode("utf-8")
    # each match is a tuple: (episode URL, episode title/number)
    urls = re.findall(regular, html, re.I)
    lst = {}
    for u in urls:
        key = u[1]
        value = u[0]
        lst[str(key)] = str(value)  # repeated episodes collapse into one entry
    result = ""
    for v, k in lst.items():
        # d_playurl format: 第N集$URL entries joined by "#"
        result += "第{v}集${k}#".format(v=v, k=k)
    # print(result)
    result = result[:-1]  # drop the trailing "#"
    return result, len(lst)

def exceDB(url, vod_id, regular):
    result, count = parse(url, regular)
    remarks = "同步更新至{n}集".format(n=count)  # "updated to episode N"
    # PyMySQL 1.0+ only accepts keyword arguments here
    conn = pymysql.connect(host="localhost", user="root", password="sa", database="m8",
                           use_unicode=True, charset="utf8")
    cur = conn.cursor()
    sql = "update mac_vod set d_remarks=%s,d_playurl=%s where d_id=%s"
    sta = cur.execute(sql, (remarks, result, vod_id))
    print(sta)  # number of affected rows
    cur.close()
    conn.commit()
    conn.close()

# 鬼吹燈之牧野詭事: one new episode every Monday and Tuesday at 20:00
url = "http://www.iqiyi.com/lib/m_211070614.html"
vod_id = 39097
regular = r'<a data-pb="" href="(http://www.iqiyi.com/v_[\s\S]*?html)"[\s\S]*?title="([\s\S]*?)"'
exceDB(url, vod_id, regular)

# 雙世寵妃: two new episodes every Monday and Tuesday at 20:00
url = "http://v.qq.com/detail/4/47xswolfi4iamlx.html"
vod_id = 21271
regular = r'<a\s*href="(http://v\.qq\.com/x/cover.*?html)"\s*target="_blank[\s\S]*?<span[\s\S]*?episodeNumber">([\s\S]*?)</span>'
exceDB(url, vod_id, regular)
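To see what the parsing step produces without touching the network or the database, here is a minimal offline sketch. The HTML fragment and the episode URLs in it are made-up stand-ins for a Tencent Video detail page (the real page markup may differ), and build_playlist is a hypothetical helper that repeats the script's matching and formatting logic minus the download. The regex is the Tencent one used above.

```python
import re

# Tencent Video regex from the script: group 1 = episode URL, group 2 = episode number
TENCENT_RE = r'<a\s*href="(http://v\.qq\.com/x/cover.*?html)"\s*target="_blank[\s\S]*?<span[\s\S]*?episodeNumber">([\s\S]*?)</span>'

# Hypothetical page fragment (assumed markup); episode 1 appears twice on purpose
SAMPLE_HTML = '''
<a href="http://v.qq.com/x/cover/abc/ep1.html" target="_blank"><span class="episodeNumber">1</span></a>
<a href="http://v.qq.com/x/cover/abc/ep2.html" target="_blank"><span class="episodeNumber">2</span></a>
<a href="http://v.qq.com/x/cover/abc/ep1.html" target="_blank"><span class="episodeNumber">1</span></a>
'''

def build_playlist(html, regular):
    """Same matching and formatting as the script's parse(), without the download."""
    episodes = {}
    for url, number in re.findall(regular, html, re.I):
        episodes[number] = url  # a repeated episode number overwrites, not duplicates
    # joining with "#" is equivalent to appending "#" each time and trimming the last one
    playlist = "#".join("第{n}集${u}".format(n=n, u=u) for n, u in episodes.items())
    return playlist, len(episodes)

playlist, count = build_playlist(SAMPLE_HTML, TENCENT_RE)
print(count)
print(playlist)
```

Because the episodes are kept in a dict, the duplicated episode 1 link is counted once, so the remark written to the database reflects distinct episodes rather than raw link count.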

Environment: Python 3.6.1 + pymysql.

To install pymysql, just run: pip install pymysql
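pymysql needs a running MySQL server. To try the UPDATE step on its own, the sketch below swaps in Python's built-in sqlite3 module with an in-memory database as a stand-in. The mac_vod table here is a hypothetical cut-down schema holding only the three columns the script touches, update_vod is a hypothetical helper, and the example URL is a placeholder. Note that sqlite3 uses ? placeholders where pymysql uses %s.

```python
import sqlite3

def update_vod(conn, vod_id, playlist, count):
    """Mirror of the script's UPDATE, using sqlite3 instead of pymysql."""
    remarks = "同步更新至{n}集".format(n=count)  # "updated to episode N"
    cur = conn.cursor()
    # parameterized query: values are passed separately, never pasted into the SQL text
    cur.execute(
        "UPDATE mac_vod SET d_remarks = ?, d_playurl = ? WHERE d_id = ?",
        (remarks, playlist, vod_id),
    )
    conn.commit()
    return cur.rowcount  # rows changed; 0 means no row has that d_id

# in-memory stand-in for the real database (schema is an assumption)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mac_vod (d_id INTEGER PRIMARY KEY, d_remarks TEXT, d_playurl TEXT)")
conn.execute("INSERT INTO mac_vod (d_id, d_remarks, d_playurl) VALUES (21271, '', '')")

changed = update_vod(conn, 21271, "第1集$http://example.com/ep1.html", 1)
row = conn.execute("SELECT d_remarks, d_playurl FROM mac_vod WHERE d_id = 21271").fetchone()
print(changed, row)
```

Checking the returned row count is a cheap way to notice a wrong d_id, since an UPDATE that matches nothing does not raise an error.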

You can see how the scraped episode lists look on the live site at: www.shurua.com

For Youku I took the easy route and used the 火車頭 (LocoySpider) scraping tool, so it is not covered here.
