Replacing Strings with Regular Expressions in Python

阿新 • Published: 2019-01-16

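Description

Requirements:
1. Replace every substring of a given string that matches a regular expression.
2. Make it easy for users to add or remove replacement rules through configuration.
3. Implement it with the decorator pattern.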

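Implementation

The implementation is built on the re package and the decorator pattern.
For background, see the decorator pattern reference; it is a good resource in which someone has implemented all of the design patterns in Python.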
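
As a rough illustration of how the decorator pattern applies here: each wrapper holds another object that exposes the same get_log() method and post-processes whatever that object returns. The class names Text, WhitespaceWrapper and DigitMaskWrapper below are hypothetical and exist only for this sketch; they are not part of the article's code.

```python
import re


class Text:
    """Innermost component: returns the stored text unchanged."""

    def __init__(self, text):
        self._text = text

    def get_log(self):
        return self._text


class WhitespaceWrapper:
    """Hypothetical decorator: collapses runs of whitespace into one space."""

    def __init__(self, inner):
        self._inner = inner

    def get_log(self):
        return re.sub(r"\s+", " ", self._inner.get_log())


class DigitMaskWrapper:
    """Hypothetical decorator: replaces every digit with '#'."""

    def __init__(self, inner):
        self._inner = inner

    def get_log(self):
        return re.sub(r"\d", "#", self._inner.get_log())


# Wrappers nest freely because they all share the get_log() interface.
wrapped = DigitMaskWrapper(WhitespaceWrapper(Text("id  42   ok")))
print(wrapped.get_log())  # id ## ok
```

The innermost wrapper runs first, so adding or removing a rule amounts to adding or removing one layer of nesting.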

E-mail address regex:

email_regex = r'[0-9a-zA-Z_]{0,19}@[0-9a-zA-Z]{1,13}\.(?:com|cn|net)'
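
A quick check of what this pattern does and does not match may help. The sample strings are made up for illustration. Note that the {0,19} quantifier allows an empty local part, and only the .com, .cn and .net suffixes are accepted.

```python
import re

email_regex = r'[0-9a-zA-Z_]{0,19}@[0-9a-zA-Z]{1,13}\.(?:com|cn|net)'

# Made-up sample: the .org address is skipped because of the suffix list.
sample = "contact alice_01@example.com or bob@test.org"
email_matches = re.findall(email_regex, sample)
print(email_matches)  # ['alice_01@example.com']

# The local part may be empty, so a bare "@example.com" also matches.
print(re.findall(email_regex, "@example.com"))  # ['@example.com']
```

If the empty local part is unwanted, changing {0,19} to {1,19} would be one possible tightening.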

URL regex:

url_regex = r"\"?http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+\"?"
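
The following sketch shows how the URL pattern behaves on a few made-up inputs. One detail worth knowing: inside the character class, $-_ is a range from '$' (0x24) to '_' (0x5F), so it covers characters such as '/', ':', '?', '=' and '%', but not '#' or '~'. Matching also stops at the first non-ASCII character.

```python
import re

url_regex = r"\"?http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+\"?"

# Made-up sample: one quoted URL and one URL followed by Chinese text.
sample = 'see "https://example.com/a?b=1&c=%20" and http://www.baidu.com方法'
url_matches = re.findall(url_regex, sample)
print(url_matches)
# ['"https://example.com/a?b=1&c=%20"', 'http://www.baidu.com']

# '~' lies outside every character class in the pattern, so the match ends before it.
print(re.findall(url_regex, "http://example.com/~user"))  # ['http://example.com/']
```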

Date regex:

date_regex_standard = r"\d{1,4}[./-]\d{1,2}[./-]\d{1,2}.\d{1,2}[./: -]\d{1,2}[./: -]\d{1,2}"
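
To see which timestamp shapes the pattern accepts, here is a small sketch using made-up strings. The unescaped '.' between the date part and the time part matches any single character, and a date without a time part is not matched at all.

```python
import re

date_regex_standard = r"\d{1,4}[./-]\d{1,2}[./-]\d{1,2}.\d{1,2}[./: -]\d{1,2}[./: -]\d{1,2}"

# Made-up candidates with different separators.
candidates = [
    "2016-07-09 09:21:23",   # dashes and colons
    "2016/07/09T09:21:23",   # any single character may separate date and time
    "2016.7.9 9.21.23",      # dots, single-digit fields
    "2016-07-09",            # no time part
]
date_results = {s: bool(re.fullmatch(date_regex_standard, s)) for s in candidates}
for text, matched in date_results.items():
    print(text, matched)

# '$' has no special meaning in a Python replacement string, so '$date$' is inserted literally.
replaced = re.sub(date_regex_standard, '$date$', "start 2016-07-09 09:21:23 end")
print(replaced)  # start $date$ end
```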

IP address regex:

ip_regex = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"
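
The pattern only checks the shape of the address, so an out-of-range octet such as 999 still matches. If that matters, one possible refinement (an assumption of this sketch, not something the article does) is to filter the regex candidates through the standard-library ipaddress module. The helper name find_valid_ipv4 is hypothetical.

```python
import ipaddress
import re

ip_regex = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"


def find_valid_ipv4(text):
    """Return the regex candidates that ipaddress accepts as IPv4 addresses."""
    valid = []
    for candidate in re.findall(ip_regex, text):
        try:
            ipaddress.IPv4Address(candidate)
        except ValueError:
            # Raised for out-of-range octets such as 999.
            continue
        valid.append(candidate)
    return valid


sample = "ok 192.168.1.10, bad 999.1.1.1"
ip_candidates = re.findall(ip_regex, sample)
print(ip_candidates)            # ['192.168.1.10', '999.1.1.1']
print(find_valid_ipv4(sample))  # ['192.168.1.10']
```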

The code is as follows:

# -*- coding: utf-8 -*-

import re

# E-mail: up to 19 letters/digits/underscores, "@", a 1-13 character domain, then .com/.cn/.net
email_regex = r'[0-9a-zA-Z_]{0,19}@[0-9a-zA-Z]{1,13}\.(?:com|cn|net)'
# URL: http or https, optionally wrapped in double quotes
url_regex = r"\"?http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+\"?"
# IPv4-shaped address, without and with a port
ip_regex = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"
ip_port_regex = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}:\d{1,5}"
# Date followed by time, with flexible separators
date_regex_standard = r"\d{1,4}[./-]\d{1,2}[./-]\d{1,2}.\d{1,2}[./: -]\d{1,2}[./: -]\d{1,2}"


class LogText(object):
    """Innermost component: holds the raw text."""

    def __init__(self, text):
        self._text = text

    def get_log(self):
        return self._text


class EmailWrapper(object):
    def __init__(self, log_text):
        self.log_text = log_text

    def get_log(self):
        """
        Replace each whole e-mail address match with $email$.
        """
        return re.sub(email_regex, '$email$', self.log_text.get_log())


class UrlWrapper(object):
    def __init__(self, log_text):
        self.log_text = log_text

    def get_log(self):
        """
        Replace each whole URL match with $url$.
        """
        return re.sub(url_regex, '$url$', self.log_text.get_log())


class DataStandardWrapper(object):
    def __init__(self, log_text):
        self.log_text = log_text

    def get_log(self):
        """
        Replace dates of the form 2016*09*25*03*16*20 with $date$,
        where * stands for a space, '/', '-', ':', '.' or a similar symbol.
        """
        return re.sub(date_regex_standard, '$date$', self.log_text.get_log())


class IpWrapper(object):
    def __init__(self, log_text):
        self.log_text = log_text

    def get_log(self):
        """
        Replace ip+port matches first, then the remaining bare ip matches.
        """
        return re.sub(ip_regex, '$ip$', re.sub(ip_port_regex, '$ip+port$', self.log_text.get_log()))


if __name__ == "__main__":
    # NOTE: the e-mail addresses of the original sample did not survive;
    # user@example.com is a placeholder, and the "http" of the last URL is assumed.
    log_text = LogText('user@example.com得到http://www.baiddu.com方法2016-07-09 09:21:23擦撒user@example.comhttp://www.baidu.com')

    # The innermost wrapper runs first: URL, then e-mail, then date.
    log_text_wrap = DataStandardWrapper(EmailWrapper(UrlWrapper(log_text)))
    print(log_text_wrap.get_log())

    # Wrappers can be combined freely: here e-mail first, then URL.
    log_text_wrap = UrlWrapper(EmailWrapper(log_text))
    print(log_text_wrap.get_log())
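
The second requirement (rules that are easy to add or remove) could also be met by driving the wrappers from a plain list of rules. The sketch below is one possible way to do that, not the article's implementation: RegexWrapper, RULES and build_chain are hypothetical names introduced here. It also shows why the ip+port rule has to run before the bare ip rule.

```python
import re


class LogText:
    """Innermost component: holds the raw text."""

    def __init__(self, text):
        self._text = text

    def get_log(self):
        return self._text


class RegexWrapper:
    """Hypothetical generic decorator: one pattern, one placeholder."""

    def __init__(self, inner, pattern, placeholder):
        self._inner = inner
        self._pattern = re.compile(pattern)
        self._placeholder = placeholder

    def get_log(self):
        # A function replacement inserts the placeholder literally,
        # without any backslash-escape processing.
        return self._pattern.sub(lambda _match: self._placeholder,
                                 self._inner.get_log())


# Rules are applied in list order; edit this list to add or remove rules.
RULES = [
    (r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}:\d{1,5}", "$ip+port$"),
    (r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}", "$ip$"),
]


def build_chain(text, rules):
    """Wrap the text once per rule; the first rule ends up innermost."""
    wrapped = LogText(text)
    for pattern, placeholder in rules:
        wrapped = RegexWrapper(wrapped, pattern, placeholder)
    return wrapped


sample = "from 10.0.0.1:8080 to 10.0.0.2"
in_order = build_chain(sample, RULES).get_log()
reversed_order = build_chain(sample, list(reversed(RULES))).get_log()
print(in_order)        # from $ip+port$ to $ip$
print(reversed_order)  # from $ip$:8080 to $ip$
```

With the rules reversed, the bare ip rule consumes the address first and the port is left behind, which is why the more specific pattern should come first.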

Result

[Figure: screenshot of the program output]
