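Python requests: auto-login to a certain finance BBS, auto check-in to collect copper coins, plus a plist to run it every day

Users of this finance site probably know it: the "Check-in Rewards" (簽到有禮) board gets a new thread every day, and replying with the keyword given in that thread earns copper coins. Supposedly the coins can be exchanged for cash, and for books too.

Great. Someone who quit without a backup plan and now sits at home unemployed needs exactly this, so I collect them every day.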

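Basic approach:

  1. First use a packet-capture tool to analyze carefully which data is POSTed at login and when replying, and where that data comes from (I used Firefox + Firebug, which works well: enable Persist and All, and every request stays visible even across page redirects);
  2. With the Python requests library, use requests.Session().post to log in and reply, and get to read page content;
  3. After logging in, fetch the HTML source of the BBS home page and use regex + BeautifulSoup to find the relative URL of the "Check-in Rewards" (簽到有禮) sub-board and its forum ID; join the relative URL with the base URL to get the sub-board URL;
  4. GET the sub-board URL to fetch the HTML source of its front page, and use the same method to obtain the relative URL and thread ID of the day's check-in thread;
  5. GET the check-in thread URL, find the keyword in the thread, and POST it as a reply;
  6. Then navigate to my own profile page and check my coin count;
  7. Finally, write the intermediate steps and states along the way to a log file, so problems can be traced afterwards.
  8. Last of all, write a .sh script that runs this Python program and configure a matching plist so that it runs automatically every day (macOS)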
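
Steps 3 and 4 both come down to pulling a numeric ID out of a relative URL and joining that URL with the base URL. The following is a minimal sketch, assuming the board and thread links look like `forum-<fid>-<page>.html` and `thread-<tid>-<page>-<n>.html`; the helper name `extract_id` and the sample hrefs are hypothetical and not taken from the site.

```python
import re

try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2


def extract_id(relative_url, kind):
    """Return the numeric ID from a 'forum-16-1.html' or 'thread-123-1-1.html' style URL."""
    match = re.search(r'(?:^|/)' + re.escape(kind) + r'-(\d+)-', relative_url)
    return match.group(1) if match else None


base_url = 'http://bbs.wacai.com/'
forum_href = 'forum-16-1.html'            # hypothetical href
thread_href = 'thread-1234567-1-1.html'   # hypothetical href

forum_id = extract_id(forum_href, 'forum')
thread_id = extract_id(thread_href, 'thread')
forum_url = urljoin(base_url, forum_href)
```

Using `urljoin` instead of string concatenation has one practical benefit: if the page ever returns an absolute href, it is passed through unchanged rather than being glued onto the base URL.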

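First, the pitfall I fell into:

  • After the login POST the response status was 200, i.e. success, yet the reply POST always reported "not logged in". The cookies had to be the problem, but requests keeps cookies and the session automatically. After struggling for most of a day, one remark from crab (a guru) made it click: look carefully at the relevant packets in the capture tool, and look carefully at the content of the response to the POST!!!
    For this site, the content of the response to the login POST is a block of scripts containing two links that at first glance look like some kind of API... The capture tool also shows clearly that the login POST is followed immediately by two GETs, and the URLs they request are exactly the two URLs in the response content. Both GET responses carry Set-Cookie headers; yes, these two URLs are the ones that "plant the cookies". So after the login POST, GET these two URLs as well, and everything afterwards works. Whatever site you crawl, analyzing the request and response packets thoroughly is the foundation of everything!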
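
The cookie-planting step can be sketched on its own. This is a minimal illustration, assuming the login response body is a run of `<script src="..."></script>` tags; the function names `extract_script_srcs` and `seed_cookies` and the sample URLs are hypothetical.

```python
import re


def extract_script_srcs(html):
    """Return the src URL of every <script src="..."> tag, in document order."""
    return re.findall(r'<script[^>]*\bsrc="([^"]+)"', html)


def seed_cookies(session, login_response_text):
    """GET each script URL so that its Set-Cookie headers land in the session."""
    urls = extract_script_srcs(login_response_text)
    for url in urls:
        session.get(url)
    return urls


# Hypothetical login response body; the real URLs differ.
sample = ('<script src="https://example.com/api/a?token=1"></script>'
          '<script src="https://example.com/api/b?token=2"></script>'
          '<script>done()</script>')
urls = extract_script_srcs(sample)
```

With a real `requests.Session()`, calling `seed_cookies(s, login_r.text)` right after the login POST would replay the two GETs seen in the capture tool, and it keeps working if the site ever adds a third URL.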

A small thing I learned:

  • f = open(file,'r+')  
  • f = open(file,'w+')
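  • At first glance r+ and w+ look the same, but they differ a lot. With r+ the file must exist or an error is raised, and writing starts at the beginning and overwrites only as far as the new data reaches. With w+ the file is created if it does not exist, and it is emptied completely before writing. a+ is readable and writable, and writes are appended to the end.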
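
The difference is easy to check in a scratch directory. A small sketch (the file name `demo.txt` is arbitrary):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

# r+ requires an existing file.
try:
    open(path, 'r+')
    r_plus_missing = 'opened'
except IOError:  # on Python 3, IOError is an alias of OSError
    r_plus_missing = 'error'

with open(path, 'w+') as f:   # creates the file
    f.write('abcdef')

with open(path, 'r+') as f:   # overwrites from the start, leaves the rest untouched
    f.write('XY')
with open(path) as f:
    after_r_plus = f.read()

with open(path, 'a+') as f:   # always writes at the end
    f.write('!')
with open(path) as f:
    after_a_plus = f.read()

with open(path, 'w+') as f:   # empties the file first, then writes
    f.write('Z')
with open(path) as f:
    after_w_plus = f.read()
```

After this runs, `after_r_plus` is `'XYcdef'`, `after_a_plus` is `'XYcdef!'`, and `after_w_plus` is `'Z'`.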


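To run the program below: python path/Auto_Login_Reply.py username/password. The advantage of this form is that the username and password do not have to be edited inside the .py file; they are passed in as an argument.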
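
The username/password splitting can be sketched in isolation. This is a minimal version, assuming the first slash separates the two parts; the helper name `split_credentials` and the sample credentials are hypothetical.

```python
def split_credentials(arg):
    """Split 'username/password' at the first slash; the password may contain slashes."""
    username, sep, password = arg.partition('/')
    if not sep or not username or not password:
        raise ValueError('expected an argument of the form username/password')
    return username, password


username, pwd = split_credentials('alice/s3cr/et')  # hypothetical credentials
```

In the script this would be called as `split_credentials(sys.argv[1])`. Raising `ValueError` on a malformed argument fails early, instead of sending an empty username or password to the login URL.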

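The program (written for Python 2) is procedural: it runs from top to bottom in one go.

Python is great: it supports both object-oriented and procedural styles, flexible and neat. Thumbs up!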

#!/usr/bin/env python
#-*- coding:utf-8 -*-

__author__ = 'Sophie2805'

import re
import os.path

import requests
from bs4 import BeautifulSoup

import time
import sys

'''~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
if log.txt does not exist under current executing path, create it.
if the log file is larger than 1 MB, empty it and write from the beginning
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'''

file = os.path.abspath('.')+'/log.txt'#log.txt under the current working directory
if not os.path.isfile(file):#not exist, create a new one
    f = open(file,'w')
    f.close()

if os.path.getsize(file)/1024 > 1024:#larger than 1MB
    f = open(file,'w')
    try:
        f.write('')
    finally:
        f.close()

'''~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
python Auto_Login_Reply.py user/pwd
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'''

args = sys.argv[1]
#print args
username = args[0:args.find('/')]
pwd = args[args.find('/')+1:len(args)]
#print username , pwd

'''~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
using log_list[] to log the whole process
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'''

#print os.path.abspath('.')
log_list = []
log_list.append('+++++++++++++++++++++++++++++++++++++++++++++\n')
log_list.append('++++挖財簽到有禮'+(time.strftime("%m.%d %T"))+' 每天簽到得銅錢++++\n')
log_list.append('+++++++++++++++++++++++++++++++++++++++++++++\n')

s = requests.Session()

agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:38.0) Gecko/20100101 Firefox/38.0'
connection = 'keep-alive'

s.headers.update({'User-Agent':agent,
                  'Connection':connection})

'''~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
post login request to this URL, observed in Firebug
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'''

login_url = 'https://www.wacai.com/user/user!login.action?cmd=null'

login_post_data ={
    'user.account':username,
    'user.pwd':pwd
}

try:
    login_r = s.post(login_url,login_post_data)
except Exception as e:
    #str(e): concatenating the exception object itself would raise TypeError
    log_list.append(time.strftime("%m.%d %T") + '--Login Exception: '+ str(e) + '.\n')

f = open(file,'a')#append
try:
    f.writelines(log_list)
finally:
    f.close()
log_list=[]

'''~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
these two get() are very important!!!
login_r.content returns these 2 api URLs.
Without getting these 2 URLs, the BBS will not treat our session as logged in.
I assume that getting these 2 URLs returns some critical cookies.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'''

src1 = login_r.content[login_r.content.find('src')+5:login_r.content.find('"></script>')]
src2 = login_r.content[login_r.content.rfind('src')+5:login_r.content.rfind('"></script><script>')]
#print src1
#print src2
s.get(src1)
s.get(src2)

base_url = 'http://bbs.wacai.com/'
homepage_r = s.get(base_url)
if '我的挖財' in homepage_r.content:
    log_list.append(time.strftime("%m.%d %T") + '--Successfully login.\n')
#print homepage_r.content
'''~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
find the checkin forum URL and ID, which is used as fid parameter in the reply post URL
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'''
pattern = '<.+>簽到有禮<.+>'
p = re.compile(pattern)
soup = BeautifulSoup(p.findall(homepage_r.content)[0])
checkin_postfix = soup.a['href']
checkin_forum_url = checkin_postfix
#print checkin_postfix
forum_id = checkin_postfix[checkin_postfix.find('-')+1:checkin_postfix.rfind('-')]
#print forum_id
if forum_id != '':
    log_list.append(time.strftime("%m.%d %T") + '--Successfully find the checkin forum ID.\n')
    print '--Successfully find the checkin forum ID'
    f = open(file,'a')#append
    try:
        f.writelines(log_list)
    finally:
        f.close()
    log_list=[]

'''~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
get the checkin forum portal page and find today's thread URL and ID, which is used as tid parameter in the reply post URL
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'''
checkin_forum_page=s.get(checkin_forum_url)
#print checkin_forum_page.content
#print checkin_forum_page.status_code
title = '簽到有禮'+(time.strftime("%m.%d")).lstrip('0')+'每天簽到得銅錢,每人限回一次'
print title
pattern_1 = '<.+>'+title + '<.+>'
p_1 = re.compile(pattern_1)
soup = BeautifulSoup(p_1.findall(checkin_forum_page.content)[0])
thread_postfix = soup.a['href']
thread_url = base_url + thread_postfix
thread_id= thread_postfix[thread_postfix.find('-')+1:thread_postfix.rfind('-')-2]
#print thread_id

if thread_id != '':
    log_list.append(time.strftime("%m.%d %T") + '--Successfully find the thread ID.\n')
    f = open(file,'a')#append
    try:
        f.writelines(log_list)
    finally:
        f.close()
    log_list=[]
t = s.get(thread_url)

'''~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
formhash is a must in the post data, observed in Firebug.
So get the formhash from the html of the page
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'''
pattern_2 = '<input type="hidden" name="formhash" .+/>'
p_2 = re.compile(pattern_2)
soup = BeautifulSoup(p_2.findall(t.content)[0])
formhash = soup.input['value']

pattern_3 = '回帖內容必須為'+'.+'+'</font>非此內容將收回銅錢獎勵'
result_3 = re.compile(pattern_3).findall(t.content)
#print result_3
key = result_3[0][result_3[0].find('>')+1:result_3[0].rfind('<')-1]
if key != '':
    log_list.append(time.strftime("%m.%d %T") + '--Successfully find the key word.\n')
    f = open(file,'a')#append
    try:
        f.writelines(log_list)
    finally:
        f.close()
    log_list=[]

'''~~~~~~~
auto reply
~~~~~~~~~~'''

host='bbs.wacai.com'
s.headers.update({'Referer':thread_url})
s.headers.update({'Host':host})
reply_data={
    'formhash':formhash,
    'message':key,
    'subject':'',
    'usesig':''
}
reply_post_url = 'http://bbs.wacai.com/forum.php?mod=post&action=reply&fid='+forum_id+'&tid='+thread_id+'&extra=&replysubmit=yes&infloat=yes&handlekey=fastpost&inajax=1'
try:
    reply_r = s.post(reply_post_url,data=reply_data)
except Exception as e:
    log_list.append(time.strftime("%m.%d %T") + '--Reply exception: '+ str(e) +'.\n' )
else:#only inspect reply_r when the post itself did not raise
    if '非常感謝,回覆釋出成功,現在將轉入主題頁,請稍候……' in reply_r.content:#success
        log_list.append(time.strftime("%m.%d %T") + '--Successfully auto reply.\n')
    else:
        log_list.append(time.strftime("%m.%d %T") + '--Fail to reply: '+ reply_r.content + '.\n')
f = open(file,'a')#append
try:
    f.writelines(log_list)
finally:
    f.close()
log_list=[]
'''~~~~~~~~~~~~~~
find my WaCai URL
~~~~~~~~~~~~~~~~~'''
pattern_4 = '<.+訪問我的空間.+</a>'
p_4 = re.compile(pattern_4)
soup = BeautifulSoup(p_4.findall(t.content)[0])
if soup.a['href'] != '':
    log_list.append(time.strftime("%m.%d %T") + '--Successfully find my WaCai link.\n' )
    f = open(file,'a')#append
    try:
        f.writelines(log_list)
    finally:
        f.close()
    log_list=[]
mywacai_url = soup.a['href']
mywacai_page = s.get(mywacai_url)

'''~~~~~~~~~~~~~
find my info URL
~~~~~~~~~~~~~~~~'''
pattern_5 = '<.+個人資料</a>'
p_5 = re.compile(pattern_5)
soup = BeautifulSoup(p_5.findall(mywacai_page.content)[0])
if soup.a['href'] != '':
    log_list.append(time.strftime("%m.%d %T") + '--Successfully find my info link.\n' )
    f = open(file,'a')#append
    try:
        f.writelines(log_list)
    finally:
        f.close()
    log_list=[]
myinfo_url = base_url+ soup.a['href']
myinfo_page = s.get(myinfo_url)

'''~~~~~~~~~~~~~~
find my coin info
~~~~~~~~~~~~~~~~~'''
pattern_6 = '<em>銅錢.+\n.+\n'
p_6 = re.compile(pattern_6)
coin = p_6.findall(myinfo_page.content)[0]
coin = coin[coin.find('</em>')+5:coin.find('</li>')]
if int(coin.strip()) != 0:
    log_list.append(time.strftime("%m.%d %T") + '--Successfully get my coin amount: %s.\n'% int(coin.strip()))
    f = open(file,'a')#append
    try:
        f.writelines(log_list)
    finally:
        f.close()
    log_list=[]


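Last comes the plist, which is what a Mac uses to configure scheduled tasks; on Windows you can apparently write a .bat file and schedule that instead.

First write a test.sh script, and remember to make it executable with chmod 777 test.sh (chmod +x test.sh is sufficient):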

cd /Users/Sophie/PycharmProjects/Auto_Login_Reply_BBS_WaCai

python Auto_Login_Reply.py username/password

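Then go to ~/Library/LaunchAgents and create a plist file named wacai.bbs.auto.login.reply.plist.

Make sure the Label does not clash with any other; pick something distinctive. ProgramArguments holds the absolute path of test.sh, and StartCalendarInterval sets the hour and minute at which the job runs. The final StandardOutPath and StandardErrorPath are optional but worth having, so that you can read the error output when something goes wrong.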

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
	<key>Label</key>
	<string>wacai.bbs.auto.login.reply</string>
	<key>ProgramArguments</key>
	<array>
		<string>/Users/Sophie/PycharmProjects/Auto_Login_Reply_BBS_WaCai/test.sh</string>
	</array>
	<key>StartCalendarInterval</key>
	<dict>
		<key>Minute</key>
		<integer>30</integer>
		<key>Hour</key>
		<integer>1</integer>
	</dict>
	<key>StandardOutPath</key>
	<string>/Users/Sophie/PycharmProjects/Auto_Login_Reply_BBS_WaCai/run.log</string>
	<key>StandardErrorPath</key>
	<string>/Users/Sophie/PycharmProjects/Auto_Login_Reply_BBS_WaCai/runerror.log</string>
</dict>
</plist>
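
The same job definition can also be generated instead of typed by hand. A minimal sketch using the standard-library `plistlib` (the `dumps`/`loads` functions require Python 3.4 or later, unlike the Python 2 script above); the variable names are arbitrary and the values are copied from the plist shown here.

```python
import plistlib

project_dir = '/Users/Sophie/PycharmProjects/Auto_Login_Reply_BBS_WaCai'

job = {
    'Label': 'wacai.bbs.auto.login.reply',
    'ProgramArguments': [project_dir + '/test.sh'],
    'StartCalendarInterval': {'Hour': 1, 'Minute': 30},  # every day at 01:30
    'StandardOutPath': project_dir + '/run.log',
    'StandardErrorPath': project_dir + '/runerror.log',
}

# dumps() produces the XML plist format by default.
plist_bytes = plistlib.dumps(job)
```

Writing `plist_bytes` to the plist file in binary mode gives an equivalent file, and generating it this way rules out mismatched tags or stray markup.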

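After editing it:

launchctl load wacai.bbs.auto.login.reply.plist enables this plist

launchctl start wacai.bbs.auto.login.reply runs it once immediately; note that this takes the Label value, without the .plist suffix

After modifying the plist, run launchctl unload .... and then launchctl load... to reload it

You can also check the run state with launchctl list | grep wacai. Generally, a PID together with a status of 0 means everything is fine; otherwise something went wrong during the run.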

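Finally, here is the log (I have so few coins it makes me cry!) along with a screenshot of the site. I hope the pages do not change often, or I will have to debug and fix the script = =#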

+++++++++++++++++++++++++++++++++++++++++++++
++++挖財簽到有禮06.29 02:57:03 每天簽到得銅錢++++
+++++++++++++++++++++++++++++++++++++++++++++
06.29 02:57:19--Successfully login.
06.29 02:57:19--Successfully find the checkin forum ID.
06.29 02:57:19--Successfully find the thread ID.
06.29 02:57:19--Successfully find the key word.
06.29 02:57:19--Successfully auto reply.
06.29 02:57:19--Successfully find my WaCai link.
06.29 02:57:19--Successfully find my info link.
06.29 02:57:20--Successfully get my coin amount: 463.