
Reading a Word document, then extracting and writing data (based on Python 3.6)


#!/usr/bin/python3
# -*- coding: utf-8 -*-
# @File : delete_file
# @Author : moucong
# @Date : 2018/4/1 16:33
# @Software: PyCharm

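# Example: reading the text of a .docx file (import docx comes from the python-docx package)
import docx
import re

# Open the source document
file = docx.Document("E:\\python_word\\word.docx")
print("Number of paragraphs: " + str(len(file.paragraphs)))  # print the paragraph count
file_word = docx.Document()  # new, empty document that will receive the output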

# Print the text of every paragraph
for para in file.paragraphs:
    print(para.text)

# Extract the digits from each space-separated token of every paragraph
# and write them to a new document
para_data = []
for i in range(len(file.paragraphs)):
    # for j in map(lambda x: x.split(' '), file.paragraphs[i].text):
    para_single = file.paragraphs[i].text.split(' ')
    while '' in para_single:  # drop the empty strings left by consecutive spaces
        para_single.remove('')
    # para_data.append(para_single)
    for data_number in range(len(para_single)):
        data_num = re.findall(r"\d", para_single[data_number])
        data_num = ''.join(data_num)
        para_data.append(data_num + ' ')
# add_paragraph() takes a string, so join the collected pieces before writing
file_word.add_paragraph(''.join(para_data))
file_word.save("E:\\python_word\\number.docx")
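
The digit extraction in the loop above can also be pulled out into a small helper that works on plain strings, which makes it easy to check without a Word file at hand. This is only a sketch: the name `extract_digit_groups` is hypothetical and not part of the original script, and it makes two assumptions that differ from the listing. It splits on any whitespace rather than only on spaces, and it skips tokens that contain no digits instead of keeping a blank for them.

```python
import re


def extract_digit_groups(text):
    """Return the digits of each whitespace-separated token, one string per token.

    Hypothetical helper, not part of the original script. Tokens without
    any digit are skipped (an assumption; the script above keeps a blank).
    """
    groups = []
    for token in text.split():  # split() with no argument also discards empty tokens
        digits = ''.join(re.findall(r"\d", token))
        if digits:
            groups.append(digits)
    return groups


if __name__ == "__main__":
    print(extract_digit_groups("order 66  shipped on 2018/4/1"))
```

With a helper like this, the text for each output paragraph could be built as `' '.join(extract_digit_groups(para.text))` and passed to `file_word.add_paragraph()`.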
