
Python's Built-in String Methods

1. str.split()

Return a list of the words in the string, using sep as the delimiter string.
sep
The delimiter according which to split the string.
None (the default value) means split according to any whitespace,
and discard empty strings from the result.
maxsplit
Maximum number of splits to do.
-1 (the default value) means no limit.

1.)str.split(sep=None, maxsplit=-1)
Return a list of the words in the string, using sep as the delimiter string. If maxsplit is given, at most maxsplit splits are done (thus, the list will have at most maxsplit+1 elements). If maxsplit is not specified or -1, then there is no limit on the number of splits (all possible splits are made).
2.)If sep is given, consecutive delimiters are not grouped together and are deemed to delimit empty strings (for example, '1,,2'.split(',') returns ['1', '', '2']). The sep argument may consist of multiple characters (for example, '1<>2<>3'.split('<>') returns ['1', '2', '3']). Splitting an empty string with a specified separator returns [''].
3.)If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace. Consequently, splitting an empty string or a string consisting of just whitespace with a None separator returns [].
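
To make the maxsplit rule concrete, here is a minimal sketch. The log-style string `record` is made up for illustration and does not come from the documentation quoted above.

```python
# Hypothetical sample data, chosen only to illustrate maxsplit.
record = "2024-01-15 ERROR disk quota exceeded on /var"

# No limit: every run of whitespace is a split point.
all_fields = record.split()

# maxsplit=2: only the first two splits are made, so the message stays whole.
limited = record.split(maxsplit=2)
date, level, message = limited

print(all_fields)    # ['2024-01-15', 'ERROR', 'disk', 'quota', 'exceeded', 'on', '/var']
print(len(limited))  # 3, i.e. maxsplit + 1
print(message)       # disk quota exceeded on /var
```

Limiting the number of splits is handy when only the leading fields are structured and the remainder is free text.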

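split() splits a string and returns a list, using sep as the delimiter. maxsplit defaults to -1, which means no limit on the number of splits; if it is given, at most maxsplit splits are made.
sep falls into two cases:
(1) sep not given:
a. The string is split on whitespace, and a run of consecutive whitespace counts as one separator, so the result contains no empty strings;
b. Newlines, tabs, and other whitespace escape characters are treated like spaces: they act as separators and never appear in the result;
c. Splitting an empty string returns an empty list.

(2) sep given:
a. sep cannot be the empty string (doing so raises ValueError);
b. Consecutive delimiters are counted one by one and treated as delimiting empty strings, so each extra delimiter in a run adds one empty string to the result;
c. The delimiter may be more than one character long;
d. Splitting an empty string returns a list containing one empty string, [''].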


The example below covers these cases; be sure to keep them apart when using split():

# a contains three spaces after 'a' and two spaces before '""'
a='it is a   big company is \n \a \r  "" '
b=''
print(a.split(maxsplit=5))
print(a.split())
print(a.split(' '))  # a single space
print(a.split('is'))
print(b.split())
print(b.split('is'))
print(len(b.split('is')))
#result
['it', 'is', 'a', 'big', 'company', 'is \n \x07 \r  "" ']
['it', 'is', 'a', 'big', 'company', 'is', '\x07', '""']
['it', 'is', 'a', '', '', 'big', 'company', 'is', '\n', '\x07', '\r', '', '""', '']
['it ', ' a   big company ', ' \n \x07 \r  "" ']
[]
['']
1
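
The two cases summarized above can also be checked side by side. This is a small sketch with made-up sample strings (`csv_like` and `padded` are illustrative names, not part of the original example).

```python
# Illustrative sample strings only.
csv_like = "1,,2"
padded = "  a \t b\n"

with_sep = csv_like.split(",")      # consecutive separators yield empty strings
multi_char = "1<>2<>3".split("<>")  # a separator may be several characters long
no_sep = padded.split()             # whitespace runs collapse, both ends are trimmed

# An empty separator is rejected with ValueError.
try:
    csv_like.split("")
    empty_sep_error = None
except ValueError as exc:
    empty_sep_error = exc

print(with_sep)    # ['1', '', '2']
print(multi_char)  # ['1', '2', '3']
print(no_sep)      # ['a', 'b']
print(type(empty_sep_error).__name__)  # ValueError
```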

2. str.splitlines()

str.splitlines([keepends])
Return a list of the lines in the string, breaking at line boundaries. Line breaks are not included in the resulting list unless keepends is given and true.
This method splits on the following line boundaries. In particular, the boundaries are a superset of universal newlines.

Representation    Description
\n                Line Feed
\r                Carriage Return
\r\n              Carriage Return + Line Feed
\v or \x0b        Line Tabulation
\f or \x0c        Form Feed
\x1c              File Separator
\x1d              Group Separator
\x1e              Record Separator
\x85              Next Line (C1 Control Code)
\u2028            Line Separator
\u2029            Paragraph Separator

Unlike split() when a delimiter string sep is given, this method returns an empty list for the empty string, and a terminal line break does not result in an extra line.
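
As a quick illustration of the wider set of line boundaries, the sketch below compares splitlines() with split('\n') on a made-up string that mixes several of the separators from the table.

```python
# Illustrative string mixing Line Tabulation, File Separator,
# Line Separator and a plain Line Feed.
text = "alpha\x0bbeta\x1cgamma\u2028delta\nepsilon"

by_splitlines = text.splitlines()  # breaks at every boundary in the table
by_newline = text.split("\n")      # breaks only at '\n'

print(by_splitlines)    # ['alpha', 'beta', 'gamma', 'delta', 'epsilon']
print(len(by_newline))  # 2
```

If only '\n' should count as a line break, split('\n') may be the safer choice for text that can contain these other control characters.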

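splitlines() splits a string at line boundaries and returns a list. The line breaks are dropped from the result unless keepends is set to True, in which case they are kept.
A line break at the end of the string does not produce an extra line (the list does not end with an empty string), and calling the method on an empty string returns an empty list. These two points distinguish it from split() with a separator.

Example: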

a='\nit is a\r\n   big\n\n company is \n'
b='   \n '
c=''
print(a.splitlines())
print(a.splitlines(keepends=True))
print(b.splitlines())
print(c.splitlines())
print(a.split('\n'))
print(b.split('\n'))
print(c.split())
#result
['', 'it is a', '   big', '', ' company is ']
['\n', 'it is a\r\n', '   big\n', '\n', ' company is \n']
['   ', ' ']
[]
['', 'it is a\r', '   big', '', ' company is ', '']
['   ', ' ']
[]
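
The two differences from split('\n') can be isolated in a shorter sketch. The string `text` is an illustrative sample, and the round-trip check at the end is an extra observation rather than something stated in the original.

```python
# Illustrative sample ending in a line break.
text = "one\ntwo\n"

lines = text.splitlines()       # no trailing empty string
pieces = text.split("\n")       # trailing '' because of the final '\n'

empty_lines = "".splitlines()   # empty list
empty_pieces = "".split("\n")   # list holding one empty string

# With keepends=True the pieces join back into the original string.
kept = text.splitlines(keepends=True)
round_trip = "".join(kept) == text

print(lines)         # ['one', 'two']
print(pieces)        # ['one', 'two', '']
print(empty_lines)   # []
print(empty_pieces)  # ['']
print(round_trip)    # True
```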

3. str.strip()

str.strip([chars])
1)Return a copy of the string with the leading and trailing characters removed. The chars argument is a string specifying the set of characters to be removed. If omitted or None, the chars argument defaults to removing whitespace. The chars argument is not a prefix or suffix; rather, all combinations of its values are stripped.
2)The outermost leading and trailing chars argument values are stripped from the string. Characters are removed from the leading end until reaching a string character that is not contained in the set of characters in chars. A similar action takes place on the trailing end.

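strip() removes the specified characters from both ends of a string and returns a processed copy. Two points need particular care:
1. chars may contain several characters. It is treated as a set: matching proceeds one character at a time from each end, every character found in chars is removed, and matching stops at the first character that is not in chars.
2. With no argument, strip() removes whitespace from both ends (not only spaces: \n, \t, and so on are removed as well), whereas with an argument it removes only the specified characters. This parallels how split() behaves with and without sep.

Example: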

>>> 'www.example.com'.strip('cmowz.')
'example'
>>> a='   \nit is a\r\n   big\n\n company is \n   '
>>> a.strip()
'it is a\r\n   big\n\n company is'
>>> a.strip(' ') # strip spaces only
'\nit is a\r\n   big\n\n company is \n'
>>> a.strip('\n is')
't is a\r\n   big\n\n company'
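
To show why chars behaves as a set rather than a prefix or suffix, here is a rough sketch of the matching loop. `strip_sketch` is a hypothetical helper written for illustration; it is not how CPython implements str.strip().

```python
def strip_sketch(text, chars):
    """Hypothetical stand-in for str.strip(chars), for illustration only."""
    char_set = set(chars)
    start, end = 0, len(text)
    # Advance from the left while the character belongs to the set.
    while start < end and text[start] in char_set:
        start += 1
    # Retreat from the right under the same rule.
    while end > start and text[end - 1] in char_set:
        end -= 1
    return text[start:end]


sample = "www.example.com"
print(strip_sketch(sample, "cmowz."))  # example
print(sample.strip(".wzcom"))          # example (order of chars is irrelevant)

# chars is not a suffix: '.com' also eats the leading 'moc' here.
print("mocha.com".strip(".com"))       # ha
```

The last call is the usual pitfall: strip('.com') does not mean "remove the .com suffix".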