Reading data from Word (.docx) with Python and writing it to Excel (.xlsx)

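Each line of the Word document on the left is one paragraph of unstructured data. The goal is to represent it in structured form, in the Excel layout on the right.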


Packages to import:

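import docx  # provided by the python-docx package
from docx import Document
from openpyxl import Workbook
from tools import *  # the author's own helper module (key_filter, is_bookname, re_filter); not shown in this post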
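The `tools` module is not included in the post, so the script cannot be run as published. Below is a minimal sketch of what its three helpers might look like. The function names come from the script, but every function body here is an assumption inferred from how the helpers are called, not the author's actual code.

```python
import re


def key_filter(text):
    """Assumed behavior: collapse tabs, full-width spaces and runs of
    spaces into a single ASCII space, then trim both ends."""
    return re.sub(r'[\t\u3000 ]+', ' ', text).strip()


def is_bookname(text):
    """Assumed behavior: a token counts as a book title if it contains
    a pair of Chinese title marks."""
    return re.search(r'《.+》', text) is not None


def re_filter(text, pattern):
    """Assumed behavior: remove every match of `pattern` from `text`."""
    return re.sub(pattern, '', text)
```

With these definitions, `re_filter('1.《大衛上學去》', r'[1-9]\d*.')` drops the leading item number and leaves the title. Note that the pattern is not anchored, so it would also remove digits that appear later in a title; anchoring it with `^` is one way to avoid that.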

Create the objects used to write the xlsx file:

workbook = Workbook()
booksheet = workbook.active

Read the docx document and store its contents in the xlsx file:

dir = '/Users/b/'
file = '南京親近母語2017年書目.docx'
f = docx.Document(dir + file)
level = ''
# Iterate over the paragraphs in the document
for para in f.paragraphs:
    bookname = ''
    author = ''
    publisher = ''
    resource = '南京親近母語2017年書目'
    text = para.text
    if len(text) == 0:
        continue
    text = key_filter(text)  # helper from tools: filters the raw text
    textlist = text.split(' ')
    if len(textlist) == 1:
        # A paragraph with a single token is a section heading (the level)
        level = textlist[0]
        print('level1', level)
        continue
    print('level2', level)
    # Drop the empty strings that consecutive spaces leave behind
    while '' in textlist:
        textlist.remove('')
    row = []
    if is_bookname(textlist[0].strip()):
        # re_filter (from tools) is called with a pattern that matches an item number such as "12."
        bookname = re_filter(textlist[0].strip(), r'[1-9]\d*.')
        print(bookname)
    else:
        continue
    author = textlist[1].strip()
    row.append(bookname.strip())
    row.append(author)
    row.append(publisher.strip())
    row.append(resource.strip())
    row.append(level.strip())
    booksheet.append(row)
# Save once, after all rows have been appended
workbook.save(file.split('.')[0] + '.xlsx')

That is the complete script. The individual parts are explained separately below.

Reading the Word document:

f = docx.Document(dir + file)
for para in f.paragraphs:
    text = para.text
    print(text)
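Because `para.text` is a plain string, the heading-tracking logic of the full script can be tried out on a list of strings without opening a Word file. The following is a minimal sketch under stated assumptions, not the author's code: the function name `rows_from_texts` is hypothetical, it uses `str.split()` (any whitespace) instead of `split(' ')`, it treats a token containing `《` as a title, and it anchors the number-stripping pattern at the start of the token.

```python
import re


def rows_from_texts(texts, source):
    """Yield [title, author, publisher, source, level] for each book line.

    `texts` is any iterable of paragraph strings. A paragraph holding a
    single token is taken as a section heading and remembered as `level`.
    """
    level = ''
    for text in texts:
        parts = text.split()      # splits on any whitespace, drops empties
        if not parts:
            continue              # blank paragraph
        if len(parts) == 1:
            level = parts[0]      # heading line: remember it for later rows
            continue
        if '《' not in parts[0]:
            continue              # assumed rule: book titles contain 《
        # Strip a leading item number such as "12." (assumed numbering style)
        title = re.sub(r'^[1-9]\d*\.', '', parts[0])
        # Publisher is left empty, as in the original script
        yield [title, parts[1], '', source, level]
```

With a real document this could be fed as `rows_from_texts((p.text for p in f.paragraphs), '南京親近母語2017年書目')`, and each yielded list passed to `booksheet.append`.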

Create a new Excel file and write data to it. Each row is appended to the sheet as a list:

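from openpyxl import Workbook
workbook = Workbook()
booksheet = workbook.active
# One row: title, author, publisher (empty), source, level
row = ['《大衛上學去》', '[美]大衛·夏農', '', '南京親近母語2017年書目', '一年級課程書目(圖畫書書目']
booksheet.append(row)
# file is the .docx filename defined earlier; the output is saved next to the script
workbook.save(file.split('.')[0] + '.xlsx')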
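To check that the rows landed where expected, the saved file can be read back with openpyxl. The sketch below is an illustration under stated assumptions: the original script writes no header row, so the `HEADER` column names and the function names `write_rows`, `read_rows` and `round_trip` are hypothetical additions.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

# Hypothetical column names; the original script writes no header row
HEADER = ['title', 'author', 'publisher', 'source', 'level']


def write_rows(path, rows):
    """Write HEADER followed by `rows` to a new .xlsx file at `path`."""
    workbook = Workbook()
    sheet = workbook.active
    sheet.append(HEADER)
    for row in rows:
        sheet.append(row)
    workbook.save(path)


def read_rows(path):
    """Return every row of the active sheet as a list of cell values."""
    sheet = load_workbook(path).active
    return [list(values) for values in sheet.iter_rows(values_only=True)]


def round_trip(rows):
    """Write `rows` to a temporary .xlsx file and read them back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'books.xlsx')
        write_rows(path, rows)
        return read_rows(path)
```

Calling `round_trip` with the sample row above returns the header as the first row and the book as the second. How an empty string such as the publisher field comes back when the file is reloaded is worth checking before relying on it.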