
Text Filtering (Python Regular Expressions)

1. Keep only Chinese characters (replace every non-Chinese character with " ")

import re

def filterCharacter(s):
    # Replace every character outside U+4E00..U+9FA5 (CJK ideographs) with a space.
    r1 = re.sub("[^\u4e00-\u9fa5]", " ", s)
    return r1
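A minimal sketch of the same filter with the pattern compiled once at module level, assuming Python 3 `str` input. The name `keep_chinese` is hypothetical. It shows one consequence of the range U+4E00 to U+9FA5: full-width punctuation such as `，` lies outside it and is replaced by a space too, as are ideographs added after U+9FA5 or in the CJK extension blocks.

```python
import re

# Hypothetical name; same character range as step 1.
# Any character outside U+4E00..U+9FA5 (basic CJK ideographs) becomes a space.
NON_CHINESE = re.compile(r"[^\u4e00-\u9fa5]")


def keep_chinese(s: str) -> str:
    """Replace every non-Chinese character with a single space."""
    return NON_CHINESE.sub(" ", s)


if __name__ == "__main__":
    # "Python" -> 6 spaces, full-width comma -> space, "!" -> space
    print(repr(keep_chinese("Python正則，好用!")))
```

Each removed character leaves its own space behind, which is why runs of spaces appear and step 2 is useful afterwards.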

2. Replace runs of consecutive whitespace with a single space

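import re

def filterCharacter(s):
    # Collapse each run of whitespace (spaces, tabs, newlines, ...) into one space.
    # The original passed the undefined name r1 here; the input is s.
    r1 = re.sub(r"\s+", " ", s)
    return r1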
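A minimal sketch, assuming Python 3 `str` input, where `\s` matches Unicode whitespace, including tabs, newlines and the full-width ideographic space U+3000 common in Chinese text. The names `collapse_spaces` and `collapse_and_trim` are hypothetical. The second function uses `str.split()` instead of a regex; unlike the regex version, it also drops leading and trailing whitespace.

```python
import re

# Hypothetical helper: one ASCII space per run of whitespace.
WHITESPACE_RUN = re.compile(r"\s+")


def collapse_spaces(s: str) -> str:
    return WHITESPACE_RUN.sub(" ", s)


def collapse_and_trim(s: str) -> str:
    # str.split() with no argument splits on runs of whitespace and
    # discards empty pieces, so outer whitespace disappears as well.
    return " ".join(s.split())


if __name__ == "__main__":
    text = "  正則\t\t表達式\u3000\u3000過濾\n"
    print(repr(collapse_spaces(text)))    # keeps one leading and one trailing space
    print(repr(collapse_and_trim(text)))  # no outer spaces
```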

3. Remove punctuation, digits, and similar characters (ASCII letters are removed too)

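import re

def filterCharacter(s):
    # Delete ASCII letters, digits and the listed ASCII punctuation.
    # A raw string avoids invalid-escape warnings; duplicate characters were dropped.
    r1 = re.sub(r"[A-Za-z0-9\[\]`~!@#$%^&*()=|{}':;,.<>/?\\]", "", s)
    return r1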
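All three snippets above are named `filterCharacter`, so defining them in one module would leave only the last definition in effect. The sketch below, assuming the steps are meant to be combined, gives each step a distinct hypothetical name and shows two possible chains: step 1 then step 2 yields space-separated runs of Chinese characters, while step 3 then step 2 keeps full-width punctuation such as `，`, because the step 3 character class lists only ASCII characters. The final `.strip()` is an addition that removes the single outer space step 2 can leave.

```python
import re

# Hypothetical names; each pattern corresponds to one numbered step above.
_NON_CHINESE = re.compile(r"[^\u4e00-\u9fa5]")
_WHITESPACE_RUN = re.compile(r"\s+")
_ASCII_NOISE = re.compile(r"[A-Za-z0-9\[\]`~!@#$%^&*()=|{}':;,.<>/?\\]")


def keep_chinese(s: str) -> str:       # step 1
    return _NON_CHINESE.sub(" ", s)


def collapse_spaces(s: str) -> str:    # step 2
    return _WHITESPACE_RUN.sub(" ", s)


def drop_ascii_noise(s: str) -> str:   # step 3
    return _ASCII_NOISE.sub("", s)


def chinese_tokens(s: str) -> str:
    """Step 1 then step 2: Chinese runs separated by single spaces."""
    return collapse_spaces(keep_chinese(s)).strip()


def strip_ascii(s: str) -> str:
    """Step 3 then step 2: full-width (Chinese) punctuation survives."""
    return collapse_spaces(drop_ascii_noise(s)).strip()


if __name__ == "__main__":
    sample = "NLP 2024: 文字過濾，很實用! (v1.0)"
    print(chinese_tokens(sample))  # 文字過濾 很實用
    print(strip_ascii(sample))     # 文字過濾，很實用
```

Which chain fits depends on whether Chinese punctuation should be kept as a sentence boundary or discarded.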

<<< To be continued