Common modules (subprocess/hashlib/configparser/logging/re)

Python Java Regular expressions · Published 2018-10-28 18:53:00


I. subprocess (for running system commands)

import subprocess

cmd = r'dir D:xxx | findstr "py"'
res = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
# Read data from the pipes; a pipe is the medium through which two processes communicate
# print(type(res.stdout.read().decode("GBK")))
print(res.stdout.read().decode("GBK"))
print(res.stderr.read().decode("GBK"))

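The pipes return bytes encoded in the system's default encoding. On a Chinese-locale Windows system that encoding is GBK, so the output has to be decoded with GBK.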

Conclusion

subprocess is mainly used to run system commands by starting a child process. Unlike os.system, subprocess can exchange data with that child process.
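To illustrate the data exchange mentioned above, here is a minimal sketch that uses subprocess.run instead of Popen. The child command is an assumption chosen only so the example runs on any platform: it starts the current Python interpreter (sys.executable) rather than the Windows dir command used earlier.

```python
import subprocess
import sys

# Pass the command as a list so that no shell is involved.
# sys.executable is used only to keep the sketch platform independent.
result = subprocess.run(
    [sys.executable, "-c", "print('hello from child')"],
    capture_output=True,  # collect stdout and stderr through pipes
    text=True,            # decode the bytes to str with the default encoding
)

print(result.returncode)      # exit status of the child process
print(result.stdout.strip())  # what the child wrote to stdout
```

With text=True the decoding step is done for you; pass encoding="gbk" instead if the child is known to write GBK.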

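II. hashlib (hashing)

A hash is an algorithm that computes a fixed-length digest (a fingerprint) from input data of any length.

Properties: different inputs can produce the same result, but the probability is extremely small; the same input always produces the same result.

Because information is discarded when the digest is computed, the original input cannot in principle be recovered from it.

Hashes are used to check whether two inputs are identical.

Use cases:

1. Password verification

2. Checking whether data has been tampered with, for example whether a game installer has been modified. To make lookup-table attacks (matching a stolen digest against a database of precomputed digests) less likely to succeed, make passwords more complex and add a salt (extra content mixed in before hashing).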
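The tamper check in use case 2 can be sketched as follows. The byte strings stand in for a real installer file and digest_of is a hypothetical helper; SHA-256 is an assumption, any hashlib algorithm is used the same way.

```python
import hashlib


def digest_of(data: bytes, chunk_size: int = 4) -> str:
    """Hash data piece by piece, the way a large file would be read in chunks."""
    h = hashlib.sha256()
    for start in range(0, len(data), chunk_size):
        h.update(data[start:start + chunk_size])
    return h.hexdigest()


original = b"installer contents"
tampered = b"installer c0ntents"  # one byte differs

published = digest_of(original)          # digest the publisher would announce
print(digest_of(original) == published)  # identical input gives an identical digest
print(digest_of(tampered) == published)  # a single changed byte gives a different digest
```

Feeding the data through update() in chunks yields the same digest as hashing it in one call, which is what makes this approach usable for files too large to hold in memory.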

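import hashlib

m = hashlib.md5("aaa".encode("utf-8"))
print(len(m.hexdigest()))  # 32

# How a lookup-table attack works: someone has stored the mapping between
# common plaintexts and their digests in a database in advance;
# with luck, a digest can simply be looked up
pwds = {"aaa": "47bce5c74f589f4867dbd57e9ca9f808"}


h1 = hashlib.sha512("123".encode("utf-8"))
h2 = hashlib.sha3_512("123".encode("utf-8"))

# print(len(h1.hexdigest()))
print(h1.hexdigest())
print(h2.hexdigest())

# 2b70683ef3fa64572aa50775acc84855

# Salting
m = hashlib.md5("321".encode("utf-8"))
# Add the salt
m.update("abcdefplkjoujhh".encode("utf-8"))

print(m.hexdigest())

import hmac
# Much the same, except that a key (the salt) is required when the object is created;
# since Python 3.8 the digestmod argument must also be given explicitly
h = hmac.new("abcdefjjjj".encode("utf-8"), digestmod="md5")

h.update("123".encode("utf-8"))

print(h.hexdigest())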
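For password verification, a per-user random salt combined with a deliberately slow hash is a common alternative to MD5 plus a fixed salt. The sketch below swaps in hashlib.pbkdf2_hmac for that purpose; the iteration count and the helper names hash_password and verify are assumptions made for illustration.

```python
import hashlib
import hmac
import os


def hash_password(password: str, salt: bytes) -> bytes:
    # PBKDF2 repeats the hash many times to slow down brute-force guessing.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def verify(password: str, salt: bytes, stored: bytes) -> bool:
    # compare_digest compares in constant time to avoid leaking timing information.
    return hmac.compare_digest(hash_password(password, salt), stored)


salt = os.urandom(16)                # random salt, stored next to the digest
stored = hash_password("123", salt)

print(verify("123", salt, stored))   # correct password
print(verify("124", salt, stored))   # wrong password
```

Because the salt is random per user, the same password produces different stored digests, so a precomputed lookup table is of no use.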

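III. configparser (a module for parsing configuration files)

What is a configuration file?

A file that holds a program's configuration information is called a configuration file.

What kind of data belongs in the configuration?

Information that needs to change, but not often, such as the path to a data file (DB_PATH).

A configuration file contains only two kinds of content:

sections, and

options, each written in key=value form.

The most frequently used feature is get, which reads one option from the configuration file.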

import configparser
# Create a parser
config = configparser.ConfigParser()
# Read and parse test.cfg
config.read("test.cfg", encoding="utf-8")
# Fetch the information you need
# List all sections
# print(config.sections())
# List all options in a section
# print(config.options("user"))
# Get the value of one option
# print(config.get("path","DB_PATH"))
# print(type(config.get("user","age")))
#
# # get always returns a string; to convert, use get + the type name (getboolean, getint, getfloat)
# print(type(config.getint("user","age")))
# print(type(config.get("user","age")))

# Check whether an option exists (both the section and the option name are required)
# config.has_option("user", "age")
# Check whether a section exists
# config.has_section("user")

# Less commonly used
# Add
# config.add_section("server")
# config.set("server","url","192.168.1.2")
# Remove
# config.remove_option("user","age")
# Modify
# config.set("server","url","192.168.1.2")

# Write back to the file
# with open("test.cfg", "wt", encoding="utf-8") as f:
#     config.write(f)
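Since test.cfg itself is not shown, the following sketch parses an in-memory sample with read_string so that it runs without any file. The section and option names in SAMPLE are assumptions modelled on the calls above.

```python
import configparser

# Hypothetical contents standing in for test.cfg
SAMPLE = """
[path]
DB_PATH = /tmp/data.db

[user]
name = wwl
age = 20
vip = yes
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE)

print(config.sections())                  # section names
print(config.options("user"))             # option names, stored in lower case
print(config.get("path", "DB_PATH"))      # option lookup is case-insensitive
print(config.getint("user", "age") + 1)   # converted to int
print(config.getboolean("user", "vip"))   # "yes" is parsed as True
print(config.get("user", "email", fallback="n/a"))  # default for a missing option
```

The fallback keyword avoids a NoOptionError when an option may legitimately be absent.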

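Exercise:

Write a login program. First check whether the configuration file already contains a username and password. If it does, log in directly; if not, prompt for a username and password.

After a successful login, ask whether to save the password; if so, write it to the configuration file.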

import configparser

config = configparser.ConfigParser()
config.read('login.ini', encoding='utf-8')
username1 = 'wwl'
password1 = '123'
# The credentials are saved under [login], so that is the section to check
if config.has_option('login', 'username') and config.has_option('login', 'password'):
    print('welcome logging')
    exit()
else:
    username = input('>>> Enter username: ').strip()
    password = input('>>> Enter password: ').strip()
    if username == username1 and password == password1:
        print('welcome logging')
        print('Enter 1 to save the password, 2 to quit')
        choice = input('Enter: ')
        if choice == '1':
            with open('login.ini', 'wt', encoding='utf-8') as f:
                config.add_section("login")
                config.set("login", "Username", username)
                config.set("login", "Password", password)
                print(config.get('login', 'Username'))
                config.write(f)
        elif choice == '2':
            exit()
    else:
        print('wrong username or password')

login.ini  # the newly generated configuration file

[login]
username = wwl
password = 123

IV. logging

1. Log levels:

CRITICAL = 50 #FATAL = CRITICAL
ERROR = 40
WARNING = 30 #WARN = WARNING
INFO = 20
DEBUG = 10
NOTSET = 0 # not set

2. The default level is WARNING, and by default records are printed to the terminal:

import logging

logging.debug('debug message')
logging.info('info message')
logging.warning('warning message')
logging.error('error message')
logging.critical('critical message')

'''
WARNING:root:warning message
ERROR:root:error message
CRITICAL:root:critical message
'''

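3. Specify a global configuration for the logging module; it applies to all loggers and can direct output to a file

The arguments of logging.basicConfig() change the default behaviour of the logging module. The available arguments include:
filename: create a FileHandler (the handler concept is explained later) with the given file name, so that the log is stored in that file.
filemode: the mode used to open the file when filename is given; the default is "a", and "w" can also be specified.
format: the log format string used by the handler.
datefmt: the date/time format.
level: the level of the root logger (explained later).
stream: create a StreamHandler with the given stream, such as sys.stderr, sys.stdout or an open file; the default is sys.stderr. filename and stream cannot be combined: in Python 3, passing both raises ValueError.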

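# Format attributes
%(name)s: name of the Logger (not a user name)

%(levelno)s: the log level as a number

%(levelname)s: the log level as text

%(pathname)s: full path of the module that made the logging call, if available

%(filename)s: file name of the module that made the logging call

%(module)s: name of the module that made the logging call

%(funcName)s: name of the function that made the logging call

%(lineno)d: source line number of the logging call

%(created)f: time the record was created, as a UNIX timestamp in floating point

%(relativeCreated)d: milliseconds between the loading of the logging module and the creation of the record

%(asctime)s: time the record was created, as a string. The default format is "2003-07-08 16:49:45,896"; the digits after the comma are milliseconds

%(thread)d: thread ID, if available

%(threadName)s: thread name, if available

%(process)d: process ID, if available

%(message)s: the message supplied by the user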

logging.basicConfig()
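As a sketch of how the arguments and format attributes listed above fit together, the following configures the root logger to write to a file. The temporary log path and the messages are assumptions; force=True (available since Python 3.8) is added only so the call takes effect even if the root logger already has handlers.

```python
import logging
import os
import tempfile

# Hypothetical log location inside a fresh temporary directory
log_path = os.path.join(tempfile.mkdtemp(), "app.log")

logging.basicConfig(
    filename=log_path,            # creates a FileHandler
    filemode="w",                 # overwrite instead of the default append
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
    level=logging.INFO,           # level of the root logger
    force=True,                   # replace any handlers already on the root logger
)

logging.debug("not recorded: below INFO")
logging.info("service started")
logging.warning("disk almost full")

logging.shutdown()                # flush and close the handler before reading back
with open(log_path) as f:
    content = f.read()
print(content)
```

Only the INFO and WARNING lines reach the file, because the root logger's level filters out DEBUG.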

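4. The Formatter, Handler, Logger and Filter objects of the logging module

# Logger: the object that produces log records

# Filter: the object that filters log records

# Handler: receives log records and sends them to different destinations; FileHandler writes to a file, StreamHandler prints to the terminal

# Formatter: defines a log format; different Formatter objects can be bound to different Handler objects to control each handler's output format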

'''
critical=50
error =40
warning =30
info = 20
debug =10
'''


import logging

#1. Logger object: produces log records, passes them through Filters, then hands them to the Handlers for output
logger=logging.getLogger(__file__)

#2. Filter object: rarely used, skipped

#3. Handler objects: receive the records from the logger and control where they go
h1=logging.FileHandler('t1.log') # write to a file
h2=logging.FileHandler('t2.log') # write to a file
h3=logging.StreamHandler() # print to the terminal

#4. Formatter objects: log formats
formatter1=logging.Formatter('%(asctime)s - %(name)s - %(levelname)s -%(module)s:%(message)s',
                             datefmt='%Y-%m-%d %H:%M:%S %p',)

formatter2=logging.Formatter('%(asctime)s :%(message)s',
                             datefmt='%Y-%m-%d %H:%M:%S %p',)

formatter3=logging.Formatter('%(name)s %(message)s',)


#5. Bind a format to each Handler
h1.setFormatter(formatter1)
h2.setFormatter(formatter2)
h3.setFormatter(formatter3)

#6. Add the Handlers to the logger and set the log level
logger.addHandler(h1)
logger.addHandler(h2)
logger.addHandler(h3)
logger.setLevel(10)

#7. Test
logger.debug('debug')
logger.info('info')
logger.warning('warning')
logger.error('error')
logger.critical('critical')
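The Filter object is skipped in the walkthrough above. For completeness, here is a minimal sketch of one attached to a handler; the class name OnlyBelowError, the logger name filter_demo and the in-memory stream are all assumptions made for illustration.

```python
import io
import logging


class OnlyBelowError(logging.Filter):
    """Let a record through only if its level is below ERROR."""

    def filter(self, record):
        return record.levelno < logging.ERROR


stream = io.StringIO()                     # in-memory stand-in for the terminal
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s:%(message)s"))
handler.addFilter(OnlyBelowError())

demo_logger = logging.getLogger("filter_demo")
demo_logger.setLevel(logging.DEBUG)
demo_logger.addHandler(handler)
demo_logger.propagate = False              # keep records away from the root logger

demo_logger.info("kept")
demo_logger.error("dropped by the filter")

output = stream.getvalue()
print(output)
```

A filter on a handler affects only that handler, so the same logger could still send ERROR records to a different handler.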

5. Application

"""
logging configuration
"""

import os
import logging.config

# Define three log output formats: start

standard_format = '[%(asctime)s][%(threadName)s:%(thread)d][task_id:%(name)s][%(filename)s:%(lineno)d]' \
                  '[%(levelname)s][%(message)s]'  # name is the name passed to getLogger

simple_format = '[%(levelname)s][%(asctime)s][%(filename)s:%(lineno)d]%(message)s'

id_simple_format = '[%(levelname)s][%(asctime)s] %(message)s'

# Define log output formats: end

logfile_dir = os.path.dirname(os.path.abspath(__file__))  # directory for the log file

logfile_name = 'all2.log'  # log file name

# Create the log directory if it does not exist
if not os.path.isdir(logfile_dir):
    os.mkdir(logfile_dir)

# Full path of the log file
logfile_path = os.path.join(logfile_dir, logfile_name)
# Logging configuration dictionary
LOGGING_DIC = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'standard': {
            'format': standard_format
        },
        'simple': {
            'format': simple_format
        },
    },
    'filters': {},
    'handlers': {
        # Records printed to the terminal
        'console': {
            'level': 'DEBUG',
            'class': 'logging.StreamHandler',  # print to the screen
            'formatter': 'simple'
        },
        # Records written to a file; collects DEBUG and above
        'default': {
            'level': 'DEBUG',
            'class': 'logging.handlers.RotatingFileHandler',  # save to a file
            'formatter': 'standard',
            'filename': logfile_path,  # log file
            'maxBytes': 1024*1024*5,  # maximum log size: 5 MB
            'backupCount': 5,
            'encoding': 'utf-8',  # encoding of the log file, so Chinese text is not garbled
        },
    },
    'loggers': {
        # '' configures the root logger, which loggers obtained with
        # logging.getLogger(__name__) fall back to
        '': {
            'handlers': ['default', 'console'],  # both handlers: records go to the file and to the screen
            'level': 'DEBUG',
            'propagate': True,  # pass records up to ancestor loggers
        },
    },
}


def load_my_logging_cfg():
    logging.config.dictConfig(LOGGING_DIC)  # load the configuration defined above
    logger = logging.getLogger(__name__)  # create a logger instance
    logger.info('It works!')  # record that this file ran


if __name__ == '__main__':
    load_my_logging_cfg()


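V. re (regular expressions)

What is a regular expression?

An expression made up of symbols that carry special meaning.

Its purpose is to process strings: matching, searching and replacing.

1. Regular expressions are used heavily in web crawlers, although frameworks often wrap the complicated patterns for you.

2. They are used heavily in the registration flows of websites and mobile apps, for example to check whether an email address is valid.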

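import re

# ========= Single-character matching =========
print(re.findall(r"\n","1\n"))# matches a newline
print(re.findall(r"\t","1asasas121\t"))# matches a tab

# ========== Range matching ===========
print(re.findall(r"\w","1aA_*")) # matches digits, letters and underscore
print(re.findall(r"\W","1aA_*,")) # matches anything except digits, letters and underscore
print(re.findall(r"\s","\n\r\t\f")) # matches any whitespace character
print(re.findall(r"\S","\n\r\t\f")) # matches any non-whitespace character
print(re.findall(r"\d","123abc1*")) # matches any digit
print(re.findall(r"\D","123abc1*")) # matches any non-digit
# print(re.findall("[abc]","AaBbCc")) # any one of a, b, c
# print(re.findall("[^abc]","AaBbCc")) # anything except a, b, c
# print(re.findall("[0-9]","AaBbCc12349")) # any digit from 0 to 9
print(re.findall("[a-z]","AaBbCc12349")) # a-z: lowercase English letters
print(re.findall("[A-z]","AaBbC:c:grinning:2349[]")) # A-z follows ASCII order, so it also matches the characters between Z and a, such as [ and ]

# =========== Position matching ======
print(re.findall(r"\A\d","123abc1*")) # \A anchors at the start of the string
print(re.findall(r"\d\Z","123abc1*9\n")) # \Z anchors at the very end and goes at the right end of the pattern; this string ends with \n, so nothing matches
print(re.findall(r"\d$","123abc1*9"))# $ anchors at the end; if the string ends with a newline, $ also matches just before it
print(re.findall(r"^\d","s1asasas121\t"))# ^ anchors at the start; this string starts with 's', so nothing matches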
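The difference between the anchors is easiest to see on a string that ends with a newline. The sample strings in this short sketch are assumptions chosen for illustration.

```python
import re

text = "abc9\n"

end_dollar = re.findall(r"\d$", text)   # $ also matches just before a trailing newline
end_z = re.findall(r"\d\Z", text)       # \Z matches only at the very end of the string

lines = "1a\n2b"
start_default = re.findall(r"^\d", lines)             # ^ matches at the start of the string only
start_multi = re.findall(r"^\d", lines, re.MULTILINE) # with MULTILINE, ^ matches at every line start
start_a = re.findall(r"\A\d", lines, re.MULTILINE)    # \A ignores MULTILINE

print(end_dollar, end_z)
print(start_default, start_multi, start_a)
```

When validating input, \Z (or re.fullmatch) is the stricter choice, because $ silently accepts one trailing newline.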

import re


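# [] range matching: join the two endpoints with -
# re.findall("[a-zA-Z0-9]","a ab abc abcd a123c")
# To match a literal -, put it at the very start or end of the brackets
# print(re.findall("[-ab]","a ab abc abcd a123c a--"))

# Repetition: how many times the expression may match
# * means any number of times, so zero times also counts (hence the empty strings in the result)
print(re.findall("[a-zA-Z]*","a ab abc abcdssdsjad a123c"))
#[a-zA-Z]*
# + one or more times
print(re.findall("[a-zA-Z]+","a ab abc abcdssdsjad a123c"))
#[a-zA-Z]+
# ? zero or one time
print(re.findall("[a-zA-Z]?","a ab abc abcdssdsjad a123c"))

# {1,2} custom repetition count; {1,} means 1 to unlimited, {,1} means 0 to 1
print(re.findall("[a-zA-Z]{1,2}","a ab abc abcdsdssjad a123c"))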


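Non-greedy matching is what you want in most cases:

# + and * are greedy: they take as much as possible (matching continues until it no longer can)

# print(re.findall(r"\w*","jjsahdjshdjssadsa dssddsads"))
# print(re.findall(r"\w+","jjsahdjshdjssadsa dssddsads"))
# Non-greedy matching: append ? to the quantifier
# print(re.findall(r"\w+?","jjsahdjshdjssadsa dssddsads")) # non-greedy; a bare \w? would only mean "zero or one"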
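A small sketch makes the difference between greedy and non-greedy quantifiers concrete. The HTML-like sample string is an assumption chosen for illustration.

```python
import re

html = "<b>one</b><b>two</b>"

greedy = re.findall(r"<b>.*</b>", html)   # .* runs to the last </b>
lazy = re.findall(r"<b>.*?</b>", html)    # .*? stops at the first </b> it can

one_at_a_time = re.findall(r"\w+?", "abc")  # +? takes the minimum: one character per match
optional = re.findall(r"\w?", "ab")         # ? alone is a greedy "zero or one"

print(greedy)
print(lazy)
print(one_at_a_time)
print(optional)
```

Note that the trailing ? only changes greediness when it follows another quantifier such as *, + or {m,n}.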

Groups

# Adding a group does not change what the pattern matches; it only extracts the part inside the parentheses separately
print(re.findall("([a-zA-Z]+)_dsb","aigen_dsb cxx_dsb alex_dsb zxx_xsb _dsb"))
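How findall reports its results depends on the number of capturing groups in the pattern. The sketch below reuses the sample string from above; the two-group and non-capturing patterns are assumptions added for comparison.

```python
import re

text = "aigen_dsb cxx_dsb alex_dsb zxx_xsb _dsb"

no_group = re.findall(r"[a-zA-Z]+_dsb", text)            # whole matches
one_group = re.findall(r"([a-zA-Z]+)_dsb", text)         # only the group's content
two_groups = re.findall(r"([a-zA-Z]+)_(dsb|xsb)", text)  # one tuple per match
non_capturing = re.findall(r"(?:[a-zA-Z]+)_dsb", text)   # (?:...) groups without capturing

print(no_group)
print(one_group)
print(two_groups)
print(non_capturing)
```

The last item "_dsb" never matches because [a-zA-Z]+ needs at least one letter before the underscore.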

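Commonly used functions in the module

# Commonly used functions in the re module
# match: matches only at the start of the string and returns a single match
print(re.match(r"\w*","abc").group(0)) # get the matched text
# group returns the content of one group; the default is group 0, the whole match
print(re.match("([a-zA-Z]+)(_dsb)","aigen_dsb cxx_dsb alex_dsb zxx_xsb _dsb").group(2))
print(re.match(r"\w*","abc").span()) # get the (start, end) indices of the match

print(re.search(r"\w*","abc").group())
# search: scans the whole string and returns the first match
print(re.search("([a-zA-Z]+)(_dsb)","xxx aigen_dsb cxx_dsb alex_dsb zxx_xsb _dsb"))
# match starts at the beginning of the string, so here it returns None and calling .group() would raise AttributeError
# print(re.match("([a-zA-Z]+)(_dsb)","xxx aigen_dsb cxx_dsb alex_dsb zxx_xsb _dsb").group())
# compile turns a regular expression into an object, which can then be reused without writing the pattern again
# print(re.compile(r"\w*").findall("abcd"))
# print(re.split(r"\|_*\|","python|____|js|____|java"))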

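# Replace
print(re.sub("python","PYTHON","js|python|java"))
# Use a regular expression to swap positions
text = "java|C++|js|C|python"
# text1 = "java|C++|js|C|python"
# Split the whole text into three parts: java, |C++xxxxxx|, python
pattern = r"(.+?)(\|.+\|)(.+)"
# ?: makes a group non-capturing: it still groups, but its content gets no group number
# pattern = r"(?:.+?)(\|.+\|)(.+)"
# print(re.search(pattern,text).group(0))
# \3\2\1 puts the last part first and the first part last: python|C++|js|C|java
print(re.sub(pattern,r"\3\2\1",text))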
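The three-part split used for the swap can be inspected with groups(), and named groups make the replacement template easier to read. The group names first, middle and last in this sketch are assumptions; the text is the same sample as above.

```python
import re

text = "java|C++|js|C|python"
pattern = r"(?P<first>.+?)(?P<middle>\|.+\|)(?P<last>.+)"

parts = re.search(pattern, text).groups()  # the three captured pieces

# \g<name> refers to a named group inside the replacement template
swapped = re.sub(pattern, r"\g<last>\g<middle>\g<first>", text)

print(parts)
print(swapped)
```

The lazy .+? keeps the first group as short as possible, while the greedy .+ inside the middle group stretches to the last | in the string.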


# When the text to match contains a backslash
text = "a\\p"  # "\\" in the source code is a single backslash in the string

print(text)
print(re.findall(r"a\\p",text))

Exercises:

# A QQ password: length 6 to 16; digits, letters and special characters, but no ^
# If it contains ^, nothing should match
# Anything other than ^ is accepted
# [^\^]{6,16}
import re
# print(re.search("[^^]{6,16}","1234567^as^"))
# print(re.search(r"[^[\^]+.{6,16}","1234567as"))

# print(re.match("[^@]{6,16}","1234567@"))
#
# print(re.match("[a-z]{6,16}","abasadsasasa^"))
# Length must be 6 to 8 and the string must not contain ^
print(re.match("^[^^]{6,8}$","1111111^56781111"))

# print(re.match("[0-9]{6,7}","1234567"))
# print(re.match("^\"[^@]{6,16}\"$", '"1234567io1u"'))

# ^...$ matches the string as a whole instead of finding pieces inside it as before
print(re.match("^[^^]{3,6}$","1234567"))

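# Mobile number validation: length 11, starts with 1, all digits (this pattern only accepts the prefixes 189, 180 and 132)
print(re.match(r"^1(89|80|32)\d{8}$","18921999093"))

# Email validation: letters, digits, underscore (at least 6) @ letters, digits, underscore (at least one) . (one of cn, com, org, edu), for example [email protected]
pattern = r"^\w{6,}@\w+\.(cn|com|org|edu)$"
# Accept only qq, sina, 163

print(re.match(pattern,"18921999as [email protected]"))

# ID card number: either 15 digits, or 18 characters where the last one may be X
# pattern = r"^\d{17}(X|\d)$"
pattern2 = r"(^\d{15}$)|(^\d{17}(X|\d)$)"
print(re.match(pattern2,"123321200010100"))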
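The validation exercises can be wrapped into small reusable checks. This sketch assumes re.fullmatch in place of the ^...$ anchors, because $ would also accept a trailing newline; the helper names is_phone and is_id_number are hypothetical.

```python
import re

# Compile once and reuse; the patterns mirror the exercises above without ^ and $
PHONE = re.compile(r"1(89|80|32)\d{8}")
ID_NUMBER = re.compile(r"\d{15}|\d{17}[\dX]")


def is_phone(s: str) -> bool:
    # fullmatch succeeds only if the entire string matches the pattern
    return PHONE.fullmatch(s) is not None


def is_id_number(s: str) -> bool:
    return ID_NUMBER.fullmatch(s) is not None


print(is_phone("18921999093"))            # 11 digits with an accepted prefix
print(is_phone("18921999093\n"))          # trailing newline is rejected
print(is_id_number("123321200010100"))    # 15 digits
print(is_id_number("12332120001010012X")) # 17 digits followed by X
```

Returning a plain bool keeps the callers free of match objects, which is usually all a form validator needs.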