[Python Web Scraping Self-Study Notes] Scraping the Lyrics of Songs in a NetEase Cloud Music Playlist

Tools: Python 3.6, PyCharm

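When I first tried to fetch the page with requests, passing only the URL, I got no usable response. I switched to urllib, added request headers, decoded the response body as UTF-8, parsed it into a BeautifulSoup object, and pretty-printed the result.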

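The most important step in this crawler is finding the URL that serves the lyrics. That URL is not visible in the page source; following the article I consulted, I use the API endpoint that NetEase Cloud Music exposes.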

# Scrape the lyrics of every song in my NetEase Cloud Music playlist
import json
import re
import urllib.request  # a bare "import urllib" does not reliably expose urllib.request

import requests
from bs4 import BeautifulSoup

myurl = "http://music.163.com/playlist?id=2251736705"
headers = {
    "Host": "music.163.com",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0",
}
request = urllib.request.Request(myurl, headers=headers)
response = urllib.request.urlopen(request)
# read() returns bytes; without decoding, Chinese text prints as escaped bytes instead of characters
html = response.read().decode('utf-8', 'ignore')
soup = BeautifulSoup(html, 'lxml')
print(soup.prettify())
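
To illustrate the decoding step on its own, here is a minimal offline sketch. The byte string is made up for the example (it is not taken from the real response), and the trailing invalid byte is added deliberately to show what the 'ignore' error handler does.

```python
# Made-up bytes: valid UTF-8 for two Chinese characters plus one invalid byte
raw = "歌詞".encode('utf-8') + b'\xff'

# Printing bytes directly shows escape sequences, not readable text
print(raw)

# 'ignore' silently drops bytes that cannot be decoded as UTF-8
text = raw.decode('utf-8', 'ignore')
print(text)
```

Note that 'ignore' hides encoding problems; if the page were served in a different encoding, characters would be dropped without any error.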

# The useful part of the printed output
<ul class="f-hide">
         <li>
          <a href="/song?id=5048569">
           Wonderful Tonight
          </a>
         </li>
         <li>
          <a href="/song?id=1299217">
           Tears in Heaven
          </a>
         </li>
         <li>
          <a href="/song?id=17541009">
           Autumn Leaves
          </a>
         </li>
         <li>
          <a href="/song?id=28851137">
           Sensitive Kind
          </a>
         </li>
         <li>
          <a href="/song?id=25542198">
           My Back Pages
          </a>
         </li>
         <li>
          <a href="/song?id=17541090">
           Lay Down Sally
          </a>
         </li>
         <li>
          <a href="/song?id=26641658">
           Riding With the King
          </a>
         </li>
         <li>
          <a href="/song?id=17540892">
           Change The World
          </a>
         </li>
         <li>
          <a href="/song?id=28040815">
           Layla
          </a>
         </li>
         <li>
          <a href="/song?id=26641663">
           Help the Poor
          </a>
         </li>
         <li>
          <a href="/song?id=5201813">
           Tears In Heaven
          </a>
         </li>
         <li>
          <a href="/song?id=17540496">
           Piece Of My Heart (Album Version)
          </a>
         </li>
         <li>
          <a href="/song?id=28851139">
           Magnolia
          </a>
         </li>
         <li>
          <a href="/song?id=17540498">
           One Track Mind (Album Version)
          </a>
         </li>
         <li>
          <a href="/song?id=26641661">
           Marry You
          </a>
         </li>
         <li>
          <a href="/song?id=26641665">
           Worried Life Blues
          </a>
         </li>
         <li>
          <a href="/song?id=28851135">
           Someday
          </a>
         </li>
         <li>
          <a href="/song?id=28851134">
           Rock And Roll Records
          </a>
         </li>
         <li>
          <a href="/song?id=17541200">
           Old Love
          </a>
         </li>
         <li>
          <a href="/song?id=17541190">
           Hey Hey
          </a>
         </li>
         <li>
          <a href="/song?id=26641669">
           Come Rain or Come Shine
          </a>
         </li>
         <li>
          <a href="/song?id=1077606">
           Change the World (Live)
          </a>
         </li>
         <li>
          <a href="/song?id=28851141">
           Songbird
          </a>
         </li>
         <li>
          <a href="/song?id=413961594">
           I Will Be There
          </a>
         </li>
         <li>
          <a href="/song?id=18610067">
           Last Will And Testament (Album Version)
          </a>
         </li>
         <li>
          <a href="/song?id=28851136">
           Lies
          </a>
         </li>
         <li>
          <a href="/song?id=1298826">
           Knockin' on Heaven's Door
          </a>
         </li>
         <li>
          <a href="/song?id=17540893">
           My Father's Eyes
          </a>
         </li>
         <li>
          <a href="/song?id=27490248">
           Everytime I Sing the Blues
          </a>
         </li>
         <li>
          <a href="/song?id=17540856">
           Cocaine
          </a>
         </li>
         <li>
          <a href="/song?id=18610066">
           Don't Cry Sister (Album Version)
          </a>
         </li>
         <li>
          <a href="/song?id=31918662">
           Riding With The King
          </a>
         </li>
         <li>
          <a href="/song?id=26641662">
           Three O'Clock Blues
          </a>
         </li>
         <li>
          <a href="/song?id=1299044">
           Jeff's Blues
          </a>
         </li>
         <li>
          <a href="/song?id=26641668">
           Hold On! I'm Comin'
          </a>
         </li>
         <li>
          <a href="/song?id=17540639">
           Golden Ring
          </a>
         </li>
         <li>
          <a href="/song?id=31918653">
           Behind The Mask
          </a>
         </li>
         <li>
          <a href="/song?id=28851140">
           I Got The Same Old Blues
          </a>
         </li>
         <li>
          <a href="/song?id=1297898">
           Over The Rainbow
          </a>
         </li>
         <li>
          <a href="/song?id=17540956">
           Tears In Heaven
          </a>
         </li>
         <li>
          <a href="/song?id=17540890">
           Running On Faith - Unplugged
          </a>
         </li>
         <li>
          <a href="/song?id=26641659">
           Ten Long Years
          </a>
         </li>
         <li>
          <a href="/song?id=26641660">
           Key to the Highway
          </a>
         </li>
         <li>
          <a href="/song?id=26641664">
           I Wanna Be
          </a>
         </li>
         <li>
          <a href="/song?id=31918654">
           Sweet Home Chicago
          </a>
         </li>
         <li>
          <a href="/song?id=28040813">
           Driftin'
          </a>
         </li>
         <li>
          <a href="/song?id=413961593">
           Can't Let You Do It
          </a>
         </li>
         <li>
          <a href="/song?id=28851133">
           They Call Me The Breeze
          </a>
         </li>
         <li>
          <a href="/song?id=18610062">
           It's Easy (Album Version)
          </a>
         </li>
         <li>
          <a href="/song?id=17541198">
           San Francisco Bay Blues
          </a>
         </li>
        </ul>

Write the scraped lyrics to a file:

# Open jazz.txt; the lyrics of the songs in the playlist are written to it
f = open('jazz.txt', 'w', encoding='utf-8')
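
As a side note, a file opened this way stays open until f.close() is reached, so an exception inside the loop would leave it unclosed. A minimal sketch of the same write step using a with block follows. The helper name write_lyrics and the sample entries are hypothetical, and a temporary file stands in for jazz.txt.

```python
import os
import tempfile


def write_lyrics(path, entries):
    """Write each lyric entry to path; the file is closed even if an error occurs."""
    with open(path, 'w', encoding='utf-8') as f:
        for entry in entries:
            f.write(entry + '\n')


# Usage with made-up entries and a temporary file standing in for jazz.txt
demo_path = os.path.join(tempfile.mkdtemp(), 'jazz.txt')
write_lyrics(demo_path, ["1-Song A", "2-Song B"])
with open(demo_path, encoding='utf-8') as f:
    written = f.read()
```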

First, get each song's ID. The printed HTML structure shows that the songs sit inside a ul tag, with each song in its own li tag.


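for item in soup.find('ul', class_='f-hide').find_all('li'):
    # href of the song link, in the form /song?id=11111111
    song_id = item.a.get("href", None)
    # Song title
    song_name = item.string
    # Use a regular expression to pull the numeric part (sid) out of song_id
    pat = re.compile(r'\d+$')  # one or more digits at the end of the string
    sid = re.findall(pat, song_id)[0]  # the song ID
    # Print the song ID and title
    print(sid + "-" + song_name)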

5048569-Wonderful Tonight
1299217-Tears in Heaven
17541009-Autumn Leaves
28851137-Sensitive Kind 
25542198-My Back Pages
17541090-Lay Down Sally
26641658-Riding With the King
17540892-Change The World
28040815-Layla
26641663-Help the Poor
5201813-Tears In Heaven
17540496-Piece Of My Heart (Album Version)
28851139-Magnolia 
17540498-One Track Mind (Album Version)
26641661-Marry You
26641665-Worried Life Blues
28851135-Someday 
28851134-Rock And Roll Records 
17541200-Old Love
17541190-Hey Hey
26641669-Come Rain or Come Shine
1077606-Change the World (Live)
28851141-Songbird
413961594-I Will Be There
18610067-Last Will And Testament (Album Version)
28851136-Lies
1298826-Knockin' on Heaven's Door
17540893-My Father's Eyes
27490248-Everytime I Sing the Blues
17540856-Cocaine
18610066-Don't Cry Sister (Album Version)
31918662-Riding With The King
26641662-Three O'Clock Blues
1299044-Jeff's Blues
26641668-Hold On! I'm Comin'
17540639-Golden Ring
31918653-Behind The Mask
28851140-I Got The Same Old Blues 
1297898-Over The Rainbow
17540956-Tears In Heaven
17540890-Running On Faith - Unplugged
26641659-Ten Long Years
26641660-Key to the Highway
26641664-I Wanna Be
31918654-Sweet Home Chicago
28040813-Driftin'
413961593-Can't Let You Do It
28851133-They Call Me The Breeze
18610062-It's Easy (Album Version)
17541198-San Francisco Bay Blues
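
The regular expression above works for hrefs of the form /song?id=..., but the ID can also be read as a query parameter. Here is a small sketch using the standard library's urllib.parse; it is an alternative to the regex, not the approach used in this post, and song_id_from_href is a hypothetical helper name.

```python
from urllib.parse import parse_qs, urlparse


def song_id_from_href(href):
    """Return the value of the id query parameter, or None if it is absent."""
    query = parse_qs(urlparse(href).query)
    values = query.get('id')
    return values[0] if values else None


print(song_id_from_href("/song?id=5048569"))
```

Reading the parameter by name keeps working if other query parameters are ever appended to the link, which a pattern anchored at the end of the string would not.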

The lyrics come back in JSON format; parse and print them:

    # This URL is the actual lyrics endpoint
    url = "http://music.163.com/api/song/lyric?" + "id=" + str(sid) + "&lv=1&kv=1&tv=-1"
    html = requests.post(url)
    json_obj = html.text
    # The response body is a JSON object; parse it
    j = json.loads(json_obj)
    print(j)
{'sgc': True, 'sfy': False, 'qfy': False, 'transUser': {'id': 5048569, 'status': 99, 'demand': 1, 'userid': 121424, 'nickname': '老白怪蜀黍', 'uptime': 1522309673919}, 'lrc': {'version': 12, 'lyric': "[00:22.270]It's late in the evening\n[00:27.140]she's wondering what clothes to wear\n[00:32.200]She puts on her make-up\n[00:37.410]and brushes her long blonde hair\n[00:42.600]And then she asks me Do I look all right\n[00:50.690]And I say Yes you look wonderful tonight\n[01:07.890]We go to a party and everyone turns to see\n[01:17.760]This beautiful lady that's walking around with me\n[01:27.790]And then she asks me Do you feel all right\n[01:36.160]And I say Yes I feel wonderful tonight\n[01:46.030]I feel wonderful because I see\n[01:51.720]The love light in your eyes\n[01:57.140]And the wonder of it all\n[02:01.770]Is that you just don't realize how much I love you\n[02:29.420]It's time to go home now and I've got an aching head\n[02:39.040]So I give her the car keys and she helps me to bed\n[02:49.400]And then I tell her as I turn out the light\n[02:57.860]I say My darling you were wonderful tonight\n[03:07.960]Oh my darling you were wonderful tonight\n"}, 'klyric': {'version': 0, 'lyric': None}, 'tlyric': {'version': 1, 'lyric': '[by:阿坤_Arcane]\n[00:22.270]那是一個傍晚\n[00:27.140]她在想穿什麼衣服\n[00:32.200]她打扮好自己\n[00:37.410]然後梳理妥金色的長髮\n[00:42.600]然後她問我:我看起來還好嗎?\n[00:50.690]我說:是的,今晚的你美極了\n[01:07.890]我們去參加派對,所有的人都轉過頭\n[01:17.760]看著這位陪在我身邊的美麗的女士\n[01:27.790]然後她問我:你感覺還好吧\n[01:36.160]我說:是的,今晚感覺棒極了\n[01:46.030]我感到美妙,是因為我看到了\n[01:51.720]你眼中愛的光芒\n[01:57.140]而其中最最美妙的\n[02:01.770]恰是你不會明白我有多麼的愛你\n[02:29.420]是時候回家了,我有一點酒醉頭痛\n[02:39.040]我把車鑰匙給她,她會服侍我回家躺下\n[02:49.400]當我走出派對最後一縷燈光\n[02:57.860]我說:親愛的,今晚你真的很美\n[03:07.960]哦,我的愛人,今晚你真的很美\n'}, 'code': 200}
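
The request URL above is built by string concatenation. As a sketch, the same query string can be assembled with urllib.parse.urlencode, which keeps the parameters together in one dict. The parameter values are copied from the URL in this post; what lv, kv and tv mean is not explained here, so they are treated as opaque. lyric_url is a hypothetical helper name.

```python
from urllib.parse import urlencode

LYRIC_API = "http://music.163.com/api/song/lyric"


def lyric_url(sid):
    """Build the lyrics URL for one song ID using the parameters from this post."""
    params = {"id": sid, "lv": 1, "kv": 1, "tv": -1}
    return LYRIC_API + "?" + urlencode(params)


print(lyric_url("5048569"))
```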

From the JSON response, take the lyric fields: the original lyrics and the translated lyrics.

    try:
        lyric = j['lrc']['lyric']
        tlyric = j['tlyric']['lyric']
        print(lyric)
        print(tlyric)
    except KeyError:
        # A key may be missing; define both names so the later steps still run
        lyric = "No lyrics"
        tlyric = ""

[00:22.270]It's late in the evening
[00:27.140]she's wondering what clothes to wear
[00:32.200]She puts on her make-up
[00:37.410]and brushes her long blonde hair
[00:42.600]And then she asks me Do I look all right
[00:50.690]And I say Yes you look wonderful tonight
[01:07.890]We go to a party and everyone turns to see
[01:17.760]This beautiful lady that's walking around with me
[01:27.790]And then she asks me Do you feel all right
[01:36.160]And I say Yes I feel wonderful tonight
[01:46.030]I feel wonderful because I see
[01:51.720]The love light in your eyes
[01:57.140]And the wonder of it all
[02:01.770]Is that you just don't realize how much I love you
[02:29.420]It's time to go home now and I've got an aching head
[02:39.040]So I give her the car keys and she helps me to bed
[02:49.400]And then I tell her as I turn out the light
[02:57.860]I say My darling you were wonderful tonight
[03:07.960]Oh my darling you were wonderful tonight

[by:阿坤_Arcane]
[00:22.270]那是一個傍晚
[00:27.140]她在想穿什麼衣服
[00:32.200]她打扮好自己
[00:37.410]然後梳理妥金色的長髮
[00:42.600]然後她問我:我看起來還好嗎?
[00:50.690]我說:是的,今晚的你美極了
[01:07.890]我們去參加派對,所有的人都轉過頭
[01:17.760]看著這位陪在我身邊的美麗的女士
[01:27.790]然後她問我:你感覺還好吧
[01:36.160]我說:是的,今晚感覺棒極了
[01:46.030]我感到美妙,是因為我看到了
[01:51.720]你眼中愛的光芒
[01:57.140]而其中最最美妙的
[02:01.770]恰是你不會明白我有多麼的愛你
[02:29.420]是時候回家了,我有一點酒醉頭痛
[02:39.040]我把車鑰匙給她,她會服侍我回家躺下
[02:49.400]當我走出派對最後一縷燈光
[02:57.860]我說:親愛的,今晚你真的很美
[03:07.960]哦,我的愛人,今晚你真的很美
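
The response printed earlier shows that a lyric field can be None (klyric in that example), and the KeyError handler suggests that a key may be missing altogether. Here is a hedged sketch of a lookup that tolerates both cases; extract_lyrics is a hypothetical helper, and the sample payloads are trimmed-down stand-ins for real responses.

```python
def extract_lyrics(payload):
    """Return (original, translated) lyric strings, using "" for anything missing."""
    def field(name):
        # The block may be absent, and its 'lyric' value may be None
        return (payload.get(name) or {}).get('lyric') or ""
    return field('lrc'), field('tlyric')


full = {'lrc': {'version': 12, 'lyric': "[00:22.270]It's late in the evening\n"},
        'tlyric': {'version': 1, 'lyric': None}}
print(extract_lyrics(full))
print(extract_lyrics({'code': 200}))
```

Returning empty strings rather than None means the later re.sub() and strip() calls can run without extra checks.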

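Use a regular expression to match tags such as [00:22.270] and replace them with an empty string. For details on re.sub(), see the usage notes for the re regular expression module; for string.strip(), see the usage notes for string.strip().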

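    pat = re.compile(r'\[.*?\]')  # non-greedy, so each [...] tag is matched on its own
    lrc = re.sub(pat, "", lyric or "")
    tlrc = re.sub(pat, "", tlyric or "")  # a lyric field can be None, as klyric is in the response above
    lrc = sid + "-" + song_name + '\n' + lrc.strip() + '\n' + tlrc.strip() + '\n'
    print(lrc)
    f.write(lrc)
f.close()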

5048569-Wonderful Tonight
It's late in the evening
she's wondering what clothes to wear
She puts on her make-up
and brushes her long blonde hair
And then she asks me Do I look all right
And I say Yes you look wonderful tonight
We go to a party and everyone turns to see
This beautiful lady that's walking around with me
And then she asks me Do you feel all right
And I say Yes I feel wonderful tonight
I feel wonderful because I see
The love light in your eyes
And the wonder of it all
Is that you just don't realize how much I love you
It's time to go home now and I've got an aching head
So I give her the car keys and she helps me to bed
And then I tell her as I turn out the light
I say My darling you were wonderful tonight
Oh my darling you were wonderful tonight
那是一個傍晚
她在想穿什麼衣服
她打扮好自己
然後梳理妥金色的長髮
然後她問我:我看起來還好嗎?
我說:是的,今晚的你美極了
我們去參加派對,所有的人都轉過頭
看著這位陪在我身邊的美麗的女士
然後她問我:你感覺還好吧
我說:是的,今晚感覺棒極了
我感到美妙,是因為我看到了
你眼中愛的光芒
而其中最最美妙的
恰是你不會明白我有多麼的愛你
是時候回家了,我有一點酒醉頭痛
我把車鑰匙給她,她會服侍我回家躺下
當我走出派對最後一縷燈光
我說:親愛的,今晚你真的很美
哦,我的愛人,今晚你真的很美
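
A pattern that matches any bracketed text can also remove brackets that belong to the lyric itself. Here is a sketch of a stricter variant that strips only the tags at the start of each line and drops lines left empty, such as the [by:...] credit line. strip_lrc_tags is a hypothetical helper and the sample text is made up.

```python
import re

# One or more [...] tags at the very start of a line
LEADING_TAGS = re.compile(r'^(?:\[[^\]]*\])+')


def strip_lrc_tags(text):
    """Remove leading [...] tags from each line and drop lines left empty."""
    if not text:
        return ""
    cleaned = (LEADING_TAGS.sub("", line).strip() for line in text.splitlines())
    return "\n".join(line for line in cleaned if line)


sample = "[by:someone]\n[00:22.270]It's late [live]\n[00:27.140][01:10.000]she's wondering\n"
print(strip_lrc_tags(sample))
```

Processing line by line also handles a line that carries several timestamps in a row, which is the case the third sample line imitates.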