Python course notes - Module 2 - 01 - File handling

阿新 • Published: 2019-01-05

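1. The file-handling workflow

1) Open the file, get a file handle, and assign it to a variable
2) Operate on the file through the handle
3) Close the file

Example: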

In [6]: f1 = open('data.txt', 'r', encoding='utf8')

In [7]: print(f1)
<_io.TextIOWrapper name='data.txt' mode='r' encoding='utf8'>

In [8]: f1.read()  # show the text that was read
Out[8]: 'Rain雨\n\nRain is falling all around, 雨兒在到處降落，\nIt falls on field and tree, 它落在田野和樹梢， \nIt rains on the umbrella here, 它落在這邊的雨傘上，\nAnd on the ships at sea. 又落在航行海上的船隻。\n'

In [9]: ! cat data.txt  # the original text
Rain雨

Rain is falling all around, 雨兒在到處降落，
It falls on field and tree, 它落在田野和樹梢， 
It rains on the umbrella here, 它落在這邊的雨傘上，
And on the ships at sea. 又落在航行海上的船隻。

In [10]: f1.close()

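The open() function opens a file. Its main signature is:

open(file, mode='r', buffering=-1, encoding=None)
open(filename, open mode, buffering, encoding)

The mode, buffering and encoding arguments are all optional.
mode: the mode in which the file is opened (see section 2)
buffering: 0, 1, or an integer > 1. 0 turns buffering off (binary mode only), 1 selects line buffering (text mode only), and > 1 sets the buffer size; the default, -1, uses the default buffering policy.
encoding: the encoding used to decode the data read (and encode the data written) in text mode, typically utf8 or gbk; if it is omitted, a platform-dependent default is used.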
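To make the encoding parameter concrete, here is a minimal sketch that writes the same short string with utf-8 and with gbk and compares the resulting file sizes. It assumes a throwaway directory from tempfile in place of the real data.txt, and it uses the with statement (covered in section 5) so the files are closed automatically.

```python
import os
import tempfile

# Assumption: a temporary directory stands in for the folder holding data.txt
tmpdir = tempfile.mkdtemp()
text = 'Rain雨\n'
sizes = {}

for enc in ('utf-8', 'gbk'):
    path = os.path.join(tmpdir, f'demo-{enc}.txt')
    # newline='' writes '\n' unchanged, so the byte count is the same on every OS
    with open(path, 'w', encoding=enc, newline='') as f:
        f.write(text)
    sizes[enc] = os.path.getsize(path)
    # Reading back must use the same encoding that was used for writing
    with open(path, 'r', encoding=enc) as f:
        assert f.read() == text

print(sizes)  # {'utf-8': 8, 'gbk': 7}
```

The four ASCII letters and the newline take one byte each in both encodings; the difference comes from 雨, which is 3 bytes in UTF-8 and 2 bytes in GBK.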

If the file does not exist:

In [16]: f1 = open('data-test.txt', 'r')
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-16-918e552dcaa9> in <module>()
----> 1 f1 = open('data-test.txt', 'r')

FileNotFoundError: [Errno 2] No such file or directory: 'data-test.txt'
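If a missing file is an expected case rather than a bug, the exception can be caught. The sketch below is one way to do it; the helper name read_text_or_none is hypothetical, and a path inside a fresh temporary directory stands in for 'data-test.txt' so that it is guaranteed not to exist.

```python
import os
import tempfile


def read_text_or_none(path, encoding='utf-8'):
    """Hypothetical helper: return the file's text, or None if the file does not exist."""
    try:
        f = open(path, 'r', encoding=encoding)
    except FileNotFoundError:
        return None
    try:
        return f.read()
    finally:
        f.close()  # close the handle even if read() raises


# Assumption: a path in an empty temporary directory, so the file cannot exist
missing = os.path.join(tempfile.mkdtemp(), 'data-test.txt')
print(read_text_or_none(missing))  # None
```

Only FileNotFoundError is caught, so other problems (for example a permission error) still surface as exceptions.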

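2. File open modes

r: read-only [the default mode; the file must exist, otherwise an exception is raised]
w: write-only [not readable; created if it does not exist; emptied if it does]
x: write-only [not readable; created if it does not exist; raises FileExistsError if it does]
a: append [not readable; created if it does not exist; if it exists, new content is only appended]; the file position is moved to the end of the file automatically.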
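The four basic modes can be compared side by side on one file. This is a small sketch; the file name modes.txt inside a temporary directory is an assumption made only for the demo.

```python
import os
import tempfile

# Assumption: a scratch file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'modes.txt')

with open(path, 'x', encoding='utf-8') as f:   # 'x': create; the file must not exist yet
    f.write('first\n')

try:
    open(path, 'x', encoding='utf-8')           # a second 'x' open on the same path fails
    x_refused = False
except FileExistsError:
    x_refused = True

with open(path, 'a', encoding='utf-8') as f:   # 'a': keep the content, write at the end
    f.write('second\n')
with open(path, encoding='utf-8') as f:        # 'r' is the default mode
    after_append = f.read()                    # 'first\nsecond\n'

with open(path, 'w', encoding='utf-8') as f:   # 'w': empty the file, then write
    f.write('third\n')
with open(path, encoding='utf-8') as f:
    after_write = f.read()                     # 'third\n'
```

'x' is the safe choice when overwriting an existing file would be a mistake, because it fails instead of silently emptying the file the way 'w' does.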

# 1. Open the file
>>> f1 = open('data.txt', 'r')
>>> f1.read()
'Rain雨\n\nRain is falling all around, 雨兒在到處降落，\nIt falls on field and tree, 它落在田野和樹梢， \nIt rains on the umbrella here, 它落在這邊的雨傘上，\nAnd on the ships at sea. 又落在航行海上的船隻。\n'
>>> f1.close()

# Write to the file
>>> f1 = open('data.txt', 'w')
>>> f1.write('hello world\n')
12
>>> f1.close()
Exit the interpreter and check the file:
# cat data.txt
hello world
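When writing, a file that does not exist is created; an existing file is emptied before anything is written. As you can see, the original text is gone: 'w' mode erases all existing content and writes from scratch. To add to the end of a file instead, use 'a' mode. (In the example below, data.txt contains the original poem again before the append.)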


# Append
>>> f1 = open('data.txt', 'a')  # append mode
>>> f1.write('hello world\n')   # write new data
12
>>> f1.read()   # cannot read: 'a' mode is write-only
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
io.UnsupportedOperation: not readable
>>> f1.close()

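>>> f1 = open('data.txt', 'r')  # read the updated text
>>> f1.read()
'Rain雨\n\nRain is falling all around, 雨兒在到處降落，\nIt falls on field and tree, 它落在田野和樹梢， \nIt rains on the umbrella here, 它落在這邊的雨傘上，\nAnd on the ships at sea. 又落在航行海上的船隻。\nhello world\n'   # 'hello world' has been appended at the end
>>> f1.read().strip('\n').split('\n') # try to process the text further, but the file cannot be read twice: the position is already at the end, so read() returns ''. It is better to assign the result of read() to a variable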
['']
>>> f1.close()
>>> 
>>> f1 = open('data.txt', 'r')  # open it again
>>> f1.read().strip('\n').split('\n') # process the text
['Rain雨', '', 'Rain is falling all around, 雨兒在到處降落，', 'It falls on field and tree, 它落在田野和樹梢， ', 'It rains on the umbrella here, 它落在這邊的雨傘上，', 'And on the ships at sea. 又落在航行海上的船隻。', 'hello world']
>>> 
>>> f1.close()
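The "read once, keep the result" advice can be sketched as follows. A small stand-in for data.txt is written to a temporary directory (an assumption for the demo); the sketch also shows seek(0) as the way to rewind when a second read really is needed.

```python
import os
import tempfile

# Assumption: a short stand-in for data.txt in a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'data.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write('Rain雨\n\nhello world\n')

with open(path, 'r', encoding='utf-8') as f:
    text = f.read()      # read once and keep the result in a variable
    again = f.read()     # the position is now at the end, so this returns ''
    f.seek(0)            # rewind to the start if you really need to read again
    rewound = f.read()

lines = text.splitlines()
print(lines)             # ['Rain雨', '', 'hello world']
```

str.splitlines() gives the same result as strip('\n').split('\n') here, but it does not drop leading blank lines and it also understands '\r\n' line endings.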

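"+" means the file can be both read and written:
r+: read/write [readable and writable; the file must exist]
w+: write/read [readable and writable]; empties the file, then opens it for reading and writing.
x+: write/read [readable and writable]; creates the file, raising an error if it already exists.
a+: write/read [readable and writable]; opens the file for reading and writing and moves the file position to the end of the file.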
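The main practical difference between the "+" modes is where the file position starts. The sketch below, using a scratch file plus.txt in a temporary directory (an assumption for the demo), shows why seek(0) is needed before reading in w+ and a+.

```python
import os
import tempfile

# Assumption: a scratch file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'plus.txt')

with open(path, 'w+', encoding='utf-8') as f:   # empty the file, then read/write
    f.write('hello\n')
    f.seek(0)                 # go back to the start to read what was just written
    after_w_plus = f.read()   # 'hello\n'

with open(path, 'r+', encoding='utf-8') as f:   # file must exist; starts at the beginning
    first = f.read()          # 'hello\n'; the position is now at the end
    f.write('world\n')        # so this write lands after the existing text

with open(path, 'a+', encoding='utf-8') as f:   # starts at the end of the file
    at_end = f.read()         # '' because the position is already at the end
    f.write('!\n')            # appended after the existing content
    f.seek(0)
    after_a_plus = f.read()   # 'hello\nworld\n!\n'
```
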

"b" means operating on bytes: the file is opened in binary mode rather than text mode.
rb, r+b
wb, w+b
xb, x+b
ab, a+b

>>> f1 = open('data.txt', 'rb')
>>> f1.read()
b'Rain\xe9\x9b\xa8\n\nRain is falling all around, \xe9\x9b\xa8\xe5\x84\xbf\xe5\x9c\xa8\xe5\x88\xb0\xe5\xa4\x84\xe9\x99\x8d\xe8\x90\xbd\xef\xbc\x8c\nIt falls on field and tree, \xe5\xae\x83\xe8\x90\xbd\xe5\x9c\xa8\xe7\x94\xb0\xe9\x87\x8e\xe5\x92\x8c\xe6\xa0\x91\xe6\xa2\xa2\xef\xbc\x8c \nIt rains on the umbrella here, \xe5\xae\x83\xe8\x90\xbd\xe5\x9c\xa8\xe8\xbf\x99\xe8\xbe\xb9\xe7\x9a\x84\xe9\x9b\xa8\xe4\xbc\x9e\xe4\xb8\x8a\xef\xbc\x8c\nAnd on the ships at sea. \xe5\x8f\x88\xe8\x90\xbd\xe5\x9c\xa8\xe8\x88\xaa\xe8\xa1\x8c\xe6\xb5\xb7\xe4\xb8\x8a\xe7\x9a\x84\xe8\x88\xb9\xe5\x8f\xaa\xe3\x80\x82\nhello world\n'
>>> f1.close()
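As you can see, the data read is returned as a bytes object.

Note: in b mode, reading returns bytes and writing requires bytes, and no encoding can be specified (passing encoding raises ValueError).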
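In binary mode the conversion between str and bytes is done explicitly with encode() and decode(). A minimal sketch, assuming a scratch file data.bin in a temporary directory:

```python
import os
import tempfile

# Assumption: a scratch file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'data.bin')

with open(path, 'wb') as f:
    f.write('Rain雨\n'.encode('utf-8'))   # binary mode takes bytes: encode the str first

with open(path, 'rb') as f:
    raw = f.read()                         # b'Rain\xe9\x9b\xa8\n'

text = raw.decode('utf-8')                 # decode the bytes back to str: 'Rain雨\n'

try:
    open(path, 'rb', encoding='utf-8')     # binary mode does not accept an encoding
    encoding_rejected = False
except ValueError:
    encoding_rejected = True

try:
    with open(path, 'wb') as f:
        f.write('Rain雨\n')                # writing a str in binary mode fails
    str_rejected = False
except TypeError:
    str_rejected = True
```

The bytes \xe9\x9b\xa8 are the UTF-8 encoding of 雨, the same sequence that appears in the rb output above.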

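3. The difference between read, readline and readlines

read()      # reads the entire file content at once
readline()  # reads one line per call
readlines() # reads all lines and returns a list of strings

The following example illustrates the difference: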

>>> f1 = open('data.txt', 'r')
>>> f2 = open('data.txt', 'r')
>>> f3 = open('data.txt', 'r')
>>> 
>>> f1.read()  # read() reads everything at once and does no processing on the text
'Rain雨\n\nRain is falling all around, 雨兒在到處降落，\nIt falls on field and tree, 它落在田野和樹梢， \nIt rains on the umbrella here, 它落在這邊的雨傘上，\nAnd on the ships at sea. 又落在航行海上的船隻。\nhello world\n'
>>> f2.readline()  # readline() reads only one line per call, much like an iterator, so a large file does not take up much memory
'Rain雨\n'
>>> f2.readline()
'\n'
>>> f2.readline()
'Rain is falling all around, 雨兒在到處降落，\n'
>>> f2.readline()
'It falls on field and tree, 它落在田野和樹梢， \n'
>>> 
>>> f2.readlines()  # readlines() reads the rest of f2 and returns it as a list
['It rains on the umbrella here, 它落在這邊的雨傘上，\n', 'And on the ships at sea. 又落在航行海上的船隻。\n', 'hello world\n']
>>> f3.readlines()  # of course, it can also read the whole file into a list at once
['Rain雨\n', '\n', 'Rain is falling all around, 雨兒在到處降落，\n', 'It falls on field and tree, 它落在田野和樹梢， \n', 'It rains on the umbrella here, 它落在這邊的雨傘上，\n', 'And on the ships at sea. 又落在航行海上的船隻。\n', 'hello world\n']

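4. Modifying a file

There are two ways to modify a file:

4.1 Disk-based

Read the file line by line, replace any matching content, and write every line to a new file, calling flush() to push buffered data to disk if there is a lot of content. Then rename the new file to the old file name (os.rename). This approach uses more disk space, but only one line needs to be held in memory at a time.

Example: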

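import os

file = 'novel'
new_file = 'new_novel'
keys = '雨傘'

# Read the original one line at a time and write every line, modified or not, to a new file
with open(file, 'r', encoding='utf-8') as f1, \
        open(new_file, 'w', encoding='utf-8') as f2:
    for line in f1:  # iterating the file object reads one line at a time
        if keys in line:
            print("find 雨傘")
            f2.write(line.replace(keys, '花折傘'))
        else:
            f2.write(line)

os.remove(file)            # delete the original first (needed on Windows, see below)
os.rename(new_file, file)  # then give the new file the original name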
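I originally wanted to call os.rename directly, but on Windows it raised an error because the target file already existed, so the original has to be deleted first and the new file renamed afterwards. On macOS and Linux, os.rename replaces the existing file without an error. os.replace() (Python 3.3+) overwrites the destination on all of these platforms.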
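One way to make the disk-based approach portable and safer is to write into a temporary file in the same directory and swap it in with os.replace(). This is a sketch under stated assumptions, not the method used above; the helper name replace_in_file is hypothetical. Note that the replacement file is created by tempfile.mkstemp, so it does not carry over the original file's permissions.

```python
import os
import tempfile


def replace_in_file(path, old, new, encoding='utf-8'):
    """Hypothetical helper: stream path line by line, replacing old with new.

    Writes to a temporary file in the same directory, then swaps it in with
    os.replace(), which overwrites the destination on Windows as well as on
    macOS/Linux.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        # Open the temp file first so its descriptor is closed even if the source is missing;
        # newline='' on both sides keeps the original line endings unchanged
        with open(fd, 'w', encoding=encoding, newline='') as dst, \
                open(path, 'r', encoding=encoding, newline='') as src:
            for line in src:
                dst.write(line.replace(old, new))
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)  # clean up the half-written temp file
        raise
```

Usage would be replace_in_file('novel', '雨傘', '花折傘'). Creating the temporary file in the same directory matters because os.replace() cannot move a file across file systems.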

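4.2 In-memory

Read the whole file into memory, modify it there, and then write it back to the original file, overwriting the old content. This approach is fast but uses more memory, so it is not suitable when the file is very large.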

file = 'novel'
keys = '雨傘'

# Read the whole file into memory
with open(file, 'r', encoding='utf-8') as fr:
    data = fr.readlines()

# Reopen in write mode ('w' empties the file), then write the modified lines back
with open(file, 'w', encoding='utf-8') as fw:
    for line in data:
        if keys in line:
            print("find 雨傘")
            fw.write(line.replace(keys, '花折傘'))
        else:
            fw.write(line)
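The same in-memory edit can also be done with a single 'r+' handle: read everything, seek back to the start, write the new text, and truncate() whatever is left over if the new text is shorter. A sketch, with the helper name replace_in_place and the sample file being hypothetical:

```python
import os
import tempfile


def replace_in_place(path, old, new, encoding='utf-8'):
    """Hypothetical helper: the in-memory approach with a single 'r+' handle."""
    with open(path, 'r+', encoding=encoding) as f:
        text = f.read()                   # the whole file is now in memory
        f.seek(0)                         # go back to the start
        f.write(text.replace(old, new))   # overwrite from the beginning
        f.truncate()                      # cut off leftovers if the new text is shorter
    return text.count(old)                # number of replacements made


# Assumption: a small sample file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'novel')
with open(path, 'w', encoding='utf-8') as f:
    f.write('雨傘 and 雨傘\n')
replaced = replace_in_place(path, '雨傘', 'X')
with open(path, encoding='utf-8') as f:
    result = f.read()                     # 'X and X\n'
```

Without the truncate() call, the shorter replacement text would leave the tail of the old content at the end of the file.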

5. with…open…as

When handling a file, you get a file handle, read data from the file, and then close the handle.

Normally, the code looks like this:

file = open("/tmp/foo.txt")
data = file.read()
file.close()

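There are two problems here: you might forget to close the file handle, and if an exception is raised while reading, nothing handles it and close() is never reached.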
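Without with, both problems are usually solved with try/finally, which guarantees that close() runs whether read() succeeds or raises. A minimal sketch; the helper name read_all and the file foo.txt in a temporary directory are assumptions for the demo.

```python
import os
import tempfile


def read_all(path, encoding='utf-8'):
    """Hypothetical helper: read a whole file and always close it, without `with`."""
    file = open(path, encoding=encoding)
    try:
        return file.read()
    finally:
        file.close()  # runs whether read() returned normally or raised


# Assumption: a scratch file in a temporary directory instead of /tmp/foo.txt
path = os.path.join(tempfile.mkdtemp(), 'foo.txt')
f = open(path, 'w', encoding='utf-8')
try:
    f.write('hello\n')
finally:
    f.close()
```
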

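The with statement handles this cleanly: the file is closed when the block ends, even if an exception is raised inside it (the exception itself still propagates). Here is the with version:

with open("/tmp/foo.txt") as file: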
    data = file.read()

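The basic idea of with is that the object produced by the expression after with must have an __enter__() method and an __exit__() method. After that expression is evaluated, __enter__() is called on the resulting object, and its return value is bound to the variable after as. When the code block under with finishes, whether normally or because of an exception, __exit__() is called on that object.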
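The protocol described above can be seen in a tiny hand-written context manager. The class ManagedFile is hypothetical and only illustrates what open() already does for you; in real code, use open() directly with with.

```python
import os
import tempfile


class ManagedFile:
    """Hypothetical minimal context manager showing the __enter__/__exit__ protocol."""

    def __init__(self, path, mode='r', encoding='utf-8'):
        self.path = path
        self.mode = mode
        self.encoding = encoding
        self.file = None

    def __enter__(self):
        self.file = open(self.path, self.mode, encoding=self.encoding)
        return self.file            # this value is bound to the name after `as`

    def __exit__(self, exc_type, exc_value, traceback):
        self.file.close()           # called when the block ends, even on an exception
        return False                # False: let any exception propagate


# Assumption: a scratch file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'foo.txt')
with ManagedFile(path, 'w') as f:
    f.write('hello\n')
print(f.closed)                     # True
```

Returning a true value from __exit__() would suppress the exception, which is rarely what you want for file handling.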

6. Other file methods

Reading a given size (in text mode, size counts characters; in binary mode, it counts bytes):

In [14]: f1 = open('novel', 'r')

In [15]: data = f1.read(10)

In [16]: data
Out[16]: 'Rain雨\n\nRai'

In [17]: f1.read()
Out[17]: 'n is falling all around, 雨兒在到處降落，\nIt falls on field and tree, 它落在田野和樹梢， \nIt rains on the umbrella here, 它落在這邊的花折傘上，\nAnd on the ships at sea. 又落在航行海上的船隻。\nhello world\n'

In [18]: f1.close()
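The sketch below checks the characters-versus-bytes point and then uses read(size) for a common job: processing a file of any size in fixed-size chunks. The helper copy_in_chunks and the sample file are hypothetical.

```python
import os
import tempfile

# Assumption: a small sample file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'novel')
with open(path, 'w', encoding='utf-8') as f:
    f.write('Rain雨\n')

with open(path, 'r', encoding='utf-8') as f:
    five_chars = f.read(5)     # text mode: size counts characters -> 'Rain雨'
with open(path, 'rb') as f:
    five_bytes = f.read(5)     # binary mode: size counts bytes -> b'Rain\xe9'


def copy_in_chunks(src, dst, chunk_size=64 * 1024):
    """Hypothetical helper: copy a file of any size using fixed-size read(size) calls."""
    with open(src, 'rb') as fin, open(dst, 'wb') as fout:
        while True:
            chunk = fin.read(chunk_size)   # at most chunk_size bytes
            if not chunk:                  # b'' means end of file
                break
            fout.write(chunk)
```

In binary mode, read(5) stops in the middle of the 3-byte UTF-8 sequence for 雨, which is why chunked binary data should only be decoded once it is complete.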

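fp.flush()   # write the buffered data out to the operating system (os.fsync() forces it onto the disk)
fp.fileno()  # return the integer file descriptor
fp.isatty()  # whether the file is a terminal (tty) device (Unix)
fp.tell()    # return the current file position, measured from the start of the file
next(fp)     # return the next line and move the position past it (fp.next() in Python 2). A for … in fp loop uses this to iterate over the file.
fp.seek(offset[, whence])  # move the file position to offset. The offset is normally counted from the start of the file and is usually positive, but whence changes the origin: 0 means the start of the file (the default), 1 the current position, and 2 the end of the file. In text mode, only seeks relative to the start (with offsets returned by tell()) and seek(0, 2) are allowed; nonzero relative seeks raise io.UnsupportedOperation. Note that if the file was opened in a or a+ mode, every write goes to the end of the file regardless of the position.
fp.truncate([size])        # resize the file to size bytes, defaulting to the current position. If size is larger than the file, the result depends on the platform; on most systems the file is extended with zero bytes. The file must be opened for writing (for example 'r+', 'w+' or 'a').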
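A short sketch of tell(), seek() with each whence value, truncate() and next() on a scratch file seek.bin (an assumption for the demo). Binary mode is used for the relative seeks, because text mode rejects them.

```python
import io
import os
import tempfile

# Assumption: a scratch file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'seek.bin')

with open(path, 'w+b') as f:
    f.write(b'hello world\n')
    end_pos = f.tell()            # 12: bytes from the start of the file
    f.seek(0)                     # whence=0 (os.SEEK_SET): offset from the start
    first = f.read(5)             # b'hello'
    f.seek(1, os.SEEK_CUR)        # whence=1: relative to the current position (skip the space)
    second = f.read(5)            # b'world'
    f.seek(-1, os.SEEK_END)       # whence=2: relative to the end of the file
    last = f.read()               # b'\n'
    f.truncate(5)                 # keep only the first 5 bytes
    f.seek(0)
    remaining = f.read()          # b'hello'

with open(path, 'r', encoding='utf-8') as f:
    first_line = next(f)          # Python 3 replacement for fp.next()
    try:
        f.seek(1, os.SEEK_CUR)    # text mode rejects nonzero relative seeks
        text_relative_seek_ok = True
    except io.UnsupportedOperation:
        text_relative_seek_ok = False
```

os.SEEK_SET, os.SEEK_CUR and os.SEEK_END are named constants for whence values 0, 1 and 2.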
