Python in Practice: Updating Excel Data by Index

阿新 • Published: 2018-11-14

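In day-to-day work we often need to update data in bulk. For example, a destination sheet has one column whose values must be updated, and the values to use come from a reference sheet.

[Figure: the data column before and after the Python script runs]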
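The idea can be sketched without Excel at all. The snippet below is a minimal illustration, not the article's script: the rows, the `id` and `price` column names, and the `update_by_key` helper are all hypothetical stand-ins for the two sheets.

```python
# Hypothetical in-memory stand-ins for the reference and destination sheets.
reference_rows = [
    {"id": "1001", "price": "12"},
    {"id": "1002", "price": "15"},
]
destination_rows = [
    {"id": "1001", "price": "10"},
    {"id": "1002", "price": "15"},
    {"id": "1003", "price": "7"},
]


def update_by_key(destination, reference, key, column):
    """Overwrite `column` in destination rows whose `key` appears in reference.

    Returns the number of rows whose value actually changed.
    """
    lookup = {row[key]: row[column] for row in reference}
    changed = 0
    for row in destination:
        new_value = lookup.get(row[key])
        # Skip rows with no reference entry and rows that are already up to date.
        if new_value is None or new_value == row[column]:
            continue
        row[column] = new_value
        changed += 1
    return changed


changed = update_by_key(destination_rows, reference_rows, "id", "price")
print(changed, destination_rows)
```

Building the lookup dictionary once keeps the update a single pass over the destination rows, which is the same shape the full script uses.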

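We use an Excel file as the config parameter table, and the reference and destination data are Excel files as well. The Python example reads the parameters from the config file and uses their values to locate the source and destination data.

[Figure: the config parameters]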
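As a rough sketch of what the config holds, the snippet below models the `main` sheet as a list of rows. The parameter names match the keys used by the script; the header row, the file names, the sheet names, and the column names are assumptions made up for illustration, and `read_params` is a hypothetical helper. Note that it looks values up by the name in the first column, whereas the script itself reads them by fixed row position.

```python
# Hypothetical stand-in for the "main" sheet: row 0 is a header row,
# column 0 holds the parameter name and column 1 its value.
# The file, sheet, and column names below are made up for illustration.
main_sheet_rows = [
    ["Parameter", "Value"],
    ["ReferenceFileName", "reference.xls"],
    ["ReferenceSheetName", "Sheet1"],
    ["ReferenceColumnName", "ID"],
    ["ReferenceDataColumnName", "Price"],
    ["DestinationFileName", "destination.xls"],
    ["DestinationSheetName", "Sheet1"],
    ["DestinationColumnName", "ID"],
    ["DestinationDataColumnName", "Price"],
]


def read_params(rows):
    """Map each parameter name (column 0) to its value (column 1), skipping the header."""
    return {row[0]: row[1] for row in rows[1:]}


main_paras = read_params(main_sheet_rows)
print(main_paras["DestinationFileName"])
```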

The implementation is split into several functions, with the following responsibilities:

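Function	Purpose
get_str_for_cell(cell_value)	Converts the given value to a string, mainly to handle floats. When Python reads an Excel cell, numeric values come back as floats; this function turns whole numbers back into integer strings.
get_saveas_name(origin_name)	Returns the given file name with a timestamp added. For example, abc.xls becomes abc_2018-11-13-10-10-07.xls.
get_mainpara()	Reads the parameters on the main sheet of the config file. These locate the source and destination data.
get_optionpara()	Reads the parameters on the option sheet of the config file. These control the replacement, for example whether to force the replacement.
get_reference_dict(main_paras)	Builds the lookup dictionary of reference data from the main parameters.
update_xlsx_file(main_paras, option_paras, reference_dict)	Updates the destination sheet.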
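The two small helpers in the table can be tried in isolation. The sketch below uses hypothetical names (`cell_to_str`, `timestamped_name`) and adds an optional `now` argument, which the article's version does not have, so that the result is reproducible.

```python
import datetime
import os


def cell_to_str(cell_value):
    """Render a cell value as text, writing whole-number floats without '.0'."""
    if isinstance(cell_value, float) and cell_value.is_integer():
        cell_value = int(cell_value)
    return str(cell_value)


def timestamped_name(origin_name, now=None):
    """Insert a timestamp between the file stem and its extension."""
    now = now or datetime.datetime.now()
    stem, ext = os.path.splitext(origin_name)
    return stem + "_" + now.strftime('%Y-%m-%d-%H-%M-%S') + ext


print(cell_to_str(1001.0))  # a key read from a numeric cell
print(cell_to_str(3.5))     # non-integer values are left alone
print(timestamped_name("abc.xls", datetime.datetime(2018, 11, 13, 10, 10, 7)))
```

Normalizing keys this way matters because a key read as `1001.0` from one sheet would otherwise never match the text `1001` from another.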

The complete code is as follows:

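import datetime
import os
import time

# Note: xlrd 2.0 and later read only .xls files; opening config.xlsx as done
# below requires an xlrd version earlier than 2.0.
import xlrd
import xlutils.copy


def get_str_for_cell(cell_value):
    # xlrd returns numeric cells as floats; turn whole numbers such as 1001.0
    # back into "1001" so that they can be compared as keys.
    if isinstance(cell_value, float) and cell_value.is_integer():
        cell_value = int(cell_value)
    return str(cell_value)


def get_saveas_name(origin_name):
    # "abc.xls" -> "abc_2018-11-13-10-10-07.xls". splitext also copes with
    # names that have no extension or contain more than one dot.
    now_time = datetime.datetime.now().strftime('%Y-%m-%d-%H-%M-%S')
    stem, ext = os.path.splitext(origin_name)
    return stem + "_" + now_time + ext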
    

def get_mainpara():
    # The "main" sheet of config.xlsx holds one parameter per row: the values
    # sit in column index 1, rows 1 to 8, in the order listed here.
    keys = [
        "ReferenceFileName", "ReferenceSheetName",
        "ReferenceColumnName", "ReferenceDataColumnName",
        "DestinationFileName", "DestinationSheetName",
        "DestinationColumnName", "DestinationDataColumnName",
    ]
    wb = xlrd.open_workbook("config.xlsx")
    ws = wb.sheet_by_name("main")
    main_paras = {key: ws.cell_value(rown, 1) for rown, key in enumerate(keys, start=1)}

    # Resolve the header names to column indexes in the reference sheet.
    wb = xlrd.open_workbook(main_paras["ReferenceFileName"])
    ws = wb.sheet_by_name(main_paras["ReferenceSheetName"])
    main_paras["reference_column_index"] = get_column_index(ws, main_paras["ReferenceColumnName"])
    main_paras["reference_data_column_index"] = get_column_index(ws, main_paras["ReferenceDataColumnName"])

    # Do the same for the destination sheet.
    wb = xlrd.open_workbook(main_paras["DestinationFileName"])
    ws = wb.sheet_by_name(main_paras["DestinationSheetName"])
    main_paras["dest_column_index"] = get_column_index(ws, main_paras["DestinationColumnName"])
    main_paras["dest_data_column_index"] = get_column_index(ws, main_paras["DestinationDataColumnName"])

    return main_paras


def get_optionpara():
    # The "option" sheet holds the replacement options; ForceReplace is read
    # from row index 1, column index 1.
    option_paras = {}
    wb = xlrd.open_workbook("config.xlsx")
    ws = wb.sheet_by_name("option")
    option_paras["ForceReplace"] = ws.cell_value(1, 1)
    return option_paras


def get_column_index(table, column_name):
    # Look for column_name in the header row (row 0); return -1 if it is absent.
    for i in range(table.ncols):
        if table.cell_value(0, i) == column_name:
            return i
    return -1


def get_reference_dict(main_paras):
    # Build {key cell: data cell} from the reference sheet, skipping the header row.
    reference_dict = {}
    wb = xlrd.open_workbook(main_paras["ReferenceFileName"])
    ws = wb.sheet_by_name(main_paras["ReferenceSheetName"])

    reference_column_index = main_paras["reference_column_index"]
    reference_data_column_index = main_paras["reference_data_column_index"]

    for rown in range(1, ws.nrows):
        key = get_str_for_cell(ws.cell_value(rown, reference_column_index))
        value = get_str_for_cell(ws.cell_value(rown, reference_data_column_index))
        reference_dict[key] = value

    return reference_dict

def update_xlsx_file(main_paras, option_paras, reference_dict):
    # formatting_info=True keeps cell formatting in the copy; xlrd supports it
    # only for .xls files, so the destination must be an .xls workbook.
    # option_paras (ForceReplace) is accepted but not used in this version.
    rb = xlrd.open_workbook(main_paras["DestinationFileName"], formatting_info=True)
    wb = xlutils.copy.copy(rb)
    ws_origin = rb.sheet_by_name(main_paras["DestinationSheetName"])
    ws = wb.get_sheet(main_paras["DestinationSheetName"])
    dest_column_index = main_paras["dest_column_index"]
    dest_data_column_index = main_paras["dest_data_column_index"]

    written_count = 0
    # The first five rows are skipped (presumably header rows in this sheet).
    for rown in range(5, ws_origin.nrows):
        key_cell = get_str_for_cell(ws_origin.cell_value(rown, dest_column_index))
        if key_cell not in reference_dict:
            print("error! can't find the value for key:", key_cell)
            continue
        data_value = reference_dict[key_cell]
        # Normalize the old value the same way as the dictionary values, so a
        # numeric cell (read as a float) can compare equal to its string form.
        data_value_old = get_str_for_cell(ws_origin.cell_value(rown, dest_data_column_index))
        if data_value == data_value_old:
            continue
        ws.write(rown, dest_data_column_index, data_value)
        ws.write(rown, 1, "M")  # flag the changed row with "M" in column index 1
        written_count += 1

    print("totally modified rows:", written_count)
    # Save under a timestamped name so the original file is left untouched.
    wb.save(get_saveas_name(main_paras["DestinationFileName"]))


print(time.strftime('%Y-%m-%d %H:%M:%S'), "PlanDataReplacer started work, please wait...")
main_paras = get_mainpara()
print("main_paras:", main_paras)

option_paras = get_optionpara()
print("option_paras:", option_paras)

reference_dict = get_reference_dict(main_paras)
print("reference_dict length:", len(reference_dict))

update_xlsx_file(main_paras, option_paras, reference_dict)

print(time.strftime('%Y-%m-%d %H:%M:%S'), "PlanDataReplacer work complete.")

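After the run, the "totally modified rows:" line in the printed output reported 479, meaning that 479 rows of data had been updated.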
