1. 程式人生 > >python 正確字串處理(自己踩過的坑)

python 正確字串處理(自己踩過的坑)

不管是誰,只要處理過由使用者提交的調查資料,就能明白這種亂七八糟的資料是怎麼一回事。為了得到一組能用於分析工作的格式統一的字串,需要做很多事情:去除空白符、刪除各種標點符號、正確的大寫格式等。做法之一是使用內建的字串方法和正則表示式re模組:

一般寫法

states = ['   Alabama ', 'Georgia!', 'Georgia', 'georgia', 'FlOrIda',
         'south   carolina##', 'West virginia?']

import re

def clean_strings(strings):  # 一般對資料的處理步驟
    result = []
    for value in strings:
        value = value.strip()
        value = re.sub('[!#?]', '', value)
        value = value.title()
        result.append(value)
    return result

In [173]: clean_strings(states)
Out[173]: 
['Alabama',
 'Georgia',
 'Georgia',
 'Georgia',
 'Florida',
 'South   Carolina',
 'West Virginia']

推薦寫法

def remove_punctuation(value):
    return re.sub('[!#?]', '', value)

clean_ops = [str.strip, remove_punctuation, str.title]  # 函式也是物件

def clean_strings(strings, ops):
    result = []
    for value in strings:
        for function in ops:
            value = function(value)
        result.append(value)
    return result

In [175]: clean_strings(states, clean_ops)
Out[175]: 
['Alabama',
 'Georgia',
 'Georgia',
 'Georgia',
 'Florida',
 'South   Carolina',
 'West Virginia']

# 或者
In [176]: for x in map(remove_punctuation, states):  #  
   .....:     print(x)
Alabama 
Georgia
Georgia
georgia
FlOrIda
south   carolina
West virginia