Converting Between CSV Files and Lists or Dictionaries

阿新 • Published: 2019-01-12

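Converting between CSV files and lists

Converting a list to a CSV file
Converting a list of dictionaries to a CSV file

Converting a list to a CSV file

The most basic conversion: write each element of the list to the CSV file, one element per row.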

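import csv

def list2csv(items, file):
    # Python 3: open in text mode; newline='' lets the csv module control line endings
    with open(file, 'w', newline='') as f:
        wr = csv.writer(f, quoting=csv.QUOTE_ALL)
        for word in items:
            wr.writerow([word])  # one element per row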

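Converting a list of dictionaries to a CSV file

This is the typical CSV read/write case. A CSV file usually has a header row naming each field, and every following row holds the values for those fields. When such a file is read, each row is commonly stored as a dictionary (the keys come from the header row, the values from that row), for example: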

my_list = [{'players.vis_name': 'Khazri', 'players.role': 'Midfielder', 'players.country': 'Tunisia',
            'players.last_name': 'Khazri', 'players.player_id': '989', 'players.first_name': 'Wahbi',
            'players.date_of_birth': '08/02/1991', 'players.team': 'Bordeaux'},
           {'players.vis_name': 'Khazri', 'players.role': 'Midfielder', 'players.country': 'Tunisia',
            'players.last_name': 'Khazri', 'players.player_id': '989', 'players.first_name': 'Wahbi',
            'players.date_of_birth': '08/02/1991', 'players.team': 'Sunderland'},
           {'players.vis_name': 'Lewis Baker', 'players.role': 'Midfielder', 'players.country': 'England',
            'players.last_name': 'Baker', 'players.player_id': '9574', 'players.first_name': 'Lewis',
            'players.date_of_birth': '25/04/1995', 'players.team': 'Vitesse'}
           ]

All of these dictionaries are then collected in a list. What follows is the reverse process: writing such a list of dictionaries back out as a CSV file.

# write nested list of dict to csv (Python 3; assumes `import csv`)
def nestedlist2csv(rows, out_file):
    with open(out_file, 'w', newline='') as f:
        w = csv.writer(f)
        fieldnames = list(rows[0].keys())  # take the header from the first dict
        w.writerow(fieldnames)
        for row in rows:
            # look values up by key so every row follows the header order
            w.writerow([row[k] for k in fieldnames])

Note that fieldnames holds the keys, which become the header row.
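The standard library's csv.DictWriter can take care of the header and the column order, which helps when the dictionaries do not all list their keys in the same order. The following is a minimal sketch under the assumption of Python 3; the name nestedlist2csv_dictwriter and the two-column sample rows are illustrative and not part of the original code, and the function writes to an open text stream rather than a file name so it can be tried in memory.

```python
import csv
import io


def nestedlist2csv_dictwriter(rows, out_stream):
    """Write a list of dicts to an open text stream, header row first."""
    if not rows:
        return  # nothing to write, and no header can be derived
    fieldnames = list(rows[0].keys())
    writer = csv.DictWriter(out_stream, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)  # values are matched to columns by key


# Illustrative sample: the second dict lists its keys in a different order.
sample = [
    {"players.player_id": "989", "players.team": "Bordeaux"},
    {"players.team": "Vitesse", "players.player_id": "9574"},
]
buffer = io.StringIO(newline="")
nestedlist2csv_dictwriter(sample, buffer)
csv_text = buffer.getvalue()
```

With a real file, the stream would come from open(out_file, 'w', newline='').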

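Converting between CSV files and dictionaries

Converting a CSV file to a dictionary
- First row holds the keys, remaining rows hold the values
- Each row is a key,value record
Converting a CSV file to a two-level dictionary
Converting a dictionary to a CSV file
- First row holds the keys, remaining rows hold the values
- Each row is a key,value record

Converting a CSV file to a dictionary

First row holds the keys, remaining rows hold the values

This covers the common case where the first row is the header and the remaining rows are values.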

# convert csv file to dict
# @params:
# key/value: the column of original csv file to set as the key and value of dict
def csv2dict(in_file,key,value):
    new_dict = {}
    with open(in_file, newline='') as f:
        # DictReader takes the field names from the first row
        reader = csv.DictReader(f, delimiter=',')
        for row in reader:
            new_dict[row[key]] = row[value]
    return new_dict

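In new_dict[row[key]] = row[value], key and value are column names from the header row of the CSV file. Note that this assumes a simple CSV file in which the chosen key column is unique. Otherwise, converting the CSV directly to a dict lets later rows overwrite earlier rows with the same key, and data is lost. If the column chosen as the key contains duplicates, build a dictionary of lists instead, with each value stored as a list; see the code in the section on dictionaries of lists.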
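The overwrite problem is easy to see on a small in-memory example. This is a sketch assuming Python 3; the column names season and move to are borrowed from the sample call later in this article, while the three data rows are invented purely for illustration.

```python
import csv
import io

# Invented sample data: the key column "season" contains a duplicate.
csv_text = "season,move to\n2015,Sunderland\n2015,Vitesse\n2016,Bordeaux\n"

plain = {}
grouped = {}
for row in csv.DictReader(io.StringIO(csv_text)):
    # Plain assignment: a later row with the same key replaces the earlier one.
    plain[row["season"]] = row["move to"]
    # List values: every row with the same key is kept.
    grouped.setdefault(row["season"], []).append(row["move to"])
```

Here plain keeps only one value for 2015, while grouped keeps both.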

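Each row is a key,value record

This covers the special case where every row is a key/value pair.
By default the first column is used as the dictionary key and the second column as the value; adjust as needed.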

# convert csv file to dict(key-value pairs each row)
def row_csv2dict(csv_file):
    dict_club={}
    with open(csv_file, newline='') as f:
        reader=csv.reader(f,delimiter=',')
        for row in reader:
            dict_club[row[0]]=row[1]
    return dict_club
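The function above indexes row[0] and row[1] directly, so a blank line or a row with a single column raises IndexError. If the input may contain such rows, one option is to skip them. The sketch below assumes Python 3 and that skipping is the desired behavior; rows_to_dict is a hypothetical helper name, and it accepts any iterable of text lines so it can be tried without a file.

```python
import csv


def rows_to_dict(lines):
    """Build {first column: second column}, skipping rows with fewer than two columns."""
    result = {}
    for row in csv.reader(lines):
        if len(row) < 2:
            continue  # blank line or malformed record
        result[row[0]] = row[1]  # any extra columns are ignored
    return result


# Illustrative input: one blank line, one row with an extra column, one short row.
pairs = rows_to_dict(["a,1", "", "b,2,extra", "c"])
```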

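[Update]

Dictionary of lists

Build a dictionary whose values are lists. This is mainly useful when the values of certain columns should be collected under the value of another column,
or, put differently, when the data does not fit a plain dictionary because one key maps to more than one value.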

# build a dict of list like {key:[...element of lst_inner_value...]}
# key is certain column name of csv file
# the lst_inner_value is a list of specific column name of csv file
def build_list_dict(source_file, key, lst_inner_value):
    new_dict = {}
    with open(source_file, newline='') as csv_file:
        data = csv.DictReader(csv_file, delimiter=",")
        for row in data:
            for element in lst_inner_value:
                new_dict.setdefault(row[key], []).append(row[element])
    return new_dict
# sample:
# test_club=build_list_dict('test_info.csv','season',['move from','move to'])
# print(test_club)

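Converting a CSV file to a two-level dictionary

This is usually for special purposes. It structures the CSV file further: the values of one column (field) become the outer keys, and the remaining key/value pairs form a sub-dictionary stored as the value. It is typically used when matching records, where filtering on the outer key first builds a hierarchy that improves matching accuracy.
For example, suppose the CSV file contains the following records (shown as a table):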

id	name	age	country
1	danny	21	China
2	Lancelot	22	America
…	…	…	…

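After conversion to a two-level dictionary (assuming the two levels are country and name), the result is the following dictionary:

dct={'China':{'danny':{'id':'1','age':'21'}},
     'America':{'Lancelot':{'id':'2','age':'22'}}}

The code is as follows: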

# build specific nested dict from csv files(country->name)
def build_level2_dict(source_file):
    new_dict = {}
    with open(source_file, newline='') as csv_file:
        data = csv.DictReader(csv_file, delimiter=",")
        for row in data:
            item = new_dict.get(row['country'], dict())
            item[row['name']] = {k: row[k] for k in ('id','age')}
            new_dict[row['country']] = item
    return new_dict

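[Update]
A further improvement makes building the two-level dictionary more flexible: instead of modifying the code inside the function, the key and value columns are passed in as arguments. There are two variants; use whichever fits.

In the first, the key at each level and the innermost value are each set explicitly to a chosen column.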

# build specific nested dict from csv files
# @params:
#   source_file
#   outer_key:the outer level key of nested dict
#   inner_key:the inner level key of nested dict
#   inner_value:set the inner value for the inner key
def build_level2_dict2(source_file,outer_key,inner_key,inner_value):
    new_dict = {}
    with open(source_file, newline='') as csv_file:
        data = csv.DictReader(csv_file, delimiter=",")
        for row in data:
            item = new_dict.get(row[outer_key], dict())
            item[row[inner_key]] = row[inner_value]
            new_dict[row[outer_key]] = item
    return new_dict

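In the second, the keys of the first and second levels are specified, and the remaining key/value pairs of each CSV row are stored as the innermost value.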

# build specific nested dict from csv files
# @params:
#   source_file
#   outer_key:the outer level key of nested dict
#   inner_key:the inner level key of nested dict,and rest key-value will be store as the value of inner key
def build_level2_dict(source_file,outer_key,inner_key):
    new_dict = {}
    with open(source_file, newline='') as csv_file:
        data = csv.DictReader(csv_file, delimiter=",")
        # every column except the two key columns goes into the innermost dict
        inner_keyset = [k for k in data.fieldnames if k not in (outer_key, inner_key)]
        for row in data:
            item = new_dict.get(row[outer_key], dict())
            item[row[inner_key]] = {k: row[k] for k in inner_keyset}
            new_dict[row[outer_key]] = item
    return new_dict
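To check this against the id/name/age/country sample table shown earlier, the same grouping can be run on an in-memory string. This is a minimal sketch assuming Python 3; build_level2_dict_from_stream is an illustrative helper rather than part of the original toolkit, and it takes an open text stream instead of a file name.

```python
import csv
import io


def build_level2_dict_from_stream(stream, outer_key, inner_key):
    """Group rows as {outer_key value: {inner_key value: {remaining columns}}}."""
    reader = csv.DictReader(stream)
    # Columns that are not used as keys end up in the innermost dict.
    rest = [k for k in reader.fieldnames if k not in (outer_key, inner_key)]
    result = {}
    for row in reader:
        result.setdefault(row[outer_key], {})[row[inner_key]] = {k: row[k] for k in rest}
    return result


# The two complete rows of the sample table, written as comma-separated text.
sample = "id,name,age,country\n1,danny,21,China\n2,Lancelot,22,America\n"
nested = build_level2_dict_from_stream(io.StringIO(sample), "country", "name")
```

All values stay strings, because the csv module does not convert types.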

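There is another way to build a two-level dictionary, using the pop() method. Personally I find it less intuitive than the one above, but here it is: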

from collections import defaultdict

def build_dict(source_file):
    projects = defaultdict(dict)
    # if there is no header within the csv file you need to set the header 
    # and utilize fieldnames parameter in csv.DictReader method
    # headers = ['id', 'name', 'age', 'country']
    with open(source_file, newline='') as fp:
        reader = csv.DictReader(fp, dialect='excel', skipinitialspace=True)
        for rowdict in reader:
            # DictReader collects surplus columns under the key None; drop them
            if None in rowdict:
                del rowdict[None]
            # pop() removes the two key columns, leaving the rest as the inner value
            nationality = rowdict.pop("country")
            name = rowdict.pop("name")
            projects[nationality][name] = rowdict
    return dict(projects)

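[Update]
Here are two more ways of building a two-level dictionary, for CSV files that do not fit a plain dictionary structure because some keys map to multiple values. The values then need to be stored in lists internally: either one list of key/value dictionaries per outer key, or one list of values per inner key.

Storing the key/value pairs in a list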

# build specific nested dict from csv files
# @params:
#   source_file
#   outer_key:the outer level key of nested dict
#   lst_inner_value: a list of column name,for circumstance that the inner value of the same outer_key are not distinct
#   {outer_key:[{pairs of lst_inner_value}]}
def build_level2_dict3(source_file,outer_key,lst_inner_value):
    new_dict = {}
    with open(source_file, newline='') as csv_file:
        data = csv.DictReader(csv_file, delimiter=",")
        for row in data:
            new_dict.setdefault(row[outer_key], []).append({k: row[k] for k in lst_inner_value})
    return new_dict

Storing the values in lists

# build specific nested dict from csv files
# @params:
#   source_file
#   outer_key:the outer level key of nested dict
#   lst_inner_value: a list of column name,for circumstance that the inner value of the same outer_key are not distinct
#   {outer_key:{key of lst_inner_value:[...value of lst_inner_value...]}}
def build_level2_dict4(source_file,outer_key,lst_inner_value):
    new_dict = {}
    with open(source_file, newline='') as csv_file:
        data = csv.DictReader(csv_file, delimiter=",")
        for row in data:
            # print row
            item = new_dict.get(row[outer_key], dict())
            # item.setdefault('move from',[]).append(row['move from'])
            # item.setdefault('move to', []).append(row['move to'])
            for element in lst_inner_value:
                item.setdefault(element, []).append(row[element])
            new_dict[row[outer_key]] = item
    return new_dict

# build specific nested dict from csv files
# @params:
#   source_file
#   outer_key:the outer level key of nested dict
#   lst_inner_key:a list of column name
#   lst_inner_value: a list of column name,for circumstance that the inner value of the same lst_inner_key are not distinct
#   {outer_key:{lst_inner_key:[...lst_inner_value...]}}
def build_list_dict2(source_file,outer_key,lst_inner_key,lst_inner_value):
    new_dict = {}
    with open(source_file, newline='') as csv_file:
        data = csv.DictReader(csv_file, delimiter=",")
        for row in data:
            # print row
            item = new_dict.get(row[outer_key], dict())
            item.setdefault(row[lst_inner_key], []).append(row[lst_inner_value])
            new_dict[row[outer_key]] = item
    return new_dict

# dct=build_list_dict2('test_info.csv','season','move from','move to')

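Building a three-level dictionary

Similarly, a three-level or even deeper dictionary can be built from a CSV file. The approach is the same as above, so only the code is given here.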

# build specific nested dict from csv files
# a dict like {outer_key:{inner_key1:{inner_key2:{rest_key:rest_value...}}}}
# the params are extract from the csv column name as you like
def build_level3_dict(source_file,outer_key,inner_key1,inner_key2):
    new_dict = {}
    with open(source_file, newline='') as csv_file:
        data = csv.DictReader(csv_file, delimiter=",")
        # every column except the three key columns goes into the innermost dict
        key_columns = (outer_key, inner_key1, inner_key2)
        inner_keyset = [k for k in data.fieldnames if k not in key_columns]
        for row in data:
            item = new_dict.get(row[outer_key], dict())
            sub_item = item.get(row[inner_key1], dict())
            sub_item[row[inner_key2]] = {k: row[k] for k in inner_keyset}
            item[row[inner_key1]] = sub_item
            new_dict[row[outer_key]] = item
    return new_dict

# build specific nested dict from csv files
# a dict like {outer_key:{inner_key1:{inner_key2:inner_value}}}
# the params are extract from the csv column name as you like
def build_level3_dict2(source_file,outer_key,inner_key1,inner_key2,inner_value):
    new_dict = {}
    with open(source_file, newline='') as csv_file:
        data = csv.DictReader(csv_file, delimiter=",")
        for row in data:
            item = new_dict.get(row[outer_key], dict())
            sub_item = item.get(row[inner_key1], dict())
            sub_item[row[inner_key2]] = row[inner_value]
            item[row[inner_key1]] = sub_item
            new_dict[row[outer_key]] = item
    return new_dict

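Again there are two ways to build the dictionary, depending on need: one stores all remaining key/value pairs unchanged as the innermost value, the other keeps only the value that is needed.

There is also a special case: when the innermost value should not be a single element but a list holding several elements that share the same key, only the innermost assignment needs to change.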

# build specific nested dict from csv files
# a dict like {outer_key:{inner_key1:{inner_key2:[inner_value]}}}
# for multiple inner_value with the same inner_key2,thus gather them in a list
# the params are extract from the csv column name as you like
def build_level3_dict3(source_file,outer_key,inner_key1,inner_key2,inner_value):
    new_dict = {}
    with open(source_file, newline='') as csv_file:
        data = csv.DictReader(csv_file, delimiter=",")
        for row in data:
            item = new_dict.get(row[outer_key], dict())
            sub_item = item.get(row[inner_key1], dict())
            sub_item.setdefault(row[inner_key2], []).append(row[inner_value])
            item[row[inner_key1]] = sub_item
            new_dict[row[outer_key]] = item
    return new_dict

The core of it is this line:
sub_item.setdefault(row[inner_key2], []).append(row[inner_value])
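Since the two-level and three-level functions differ only in how many key columns they walk through, the nesting depth can also be made a parameter. The following is a sketch of that idea, assuming Python 3 and a non-empty list of key columns; build_nested_dict and the sample rows are illustrative and do not come from the original toolkit.

```python
import csv
import io


def build_nested_dict(stream, key_columns):
    """Nest rows by the given columns, in order; remaining columns form the innermost dict."""
    reader = csv.DictReader(stream)
    rest = [k for k in reader.fieldnames if k not in key_columns]
    result = {}
    for row in reader:
        node = result
        # Walk (and create on demand) one level per key column except the last.
        for column in key_columns[:-1]:
            node = node.setdefault(row[column], {})
        # The last key column maps to the dict of remaining columns.
        node[row[key_columns[-1]]] = {k: row[k] for k in rest}
    return result


# Invented sample data with three key columns and one remaining column.
sample = (
    "season,country,name,age\n"
    "2015,China,danny,21\n"
    "2015,America,Lancelot,22\n"
    "2016,China,danny,22\n"
)
nested = build_nested_dict(io.StringIO(sample), ["season", "country", "name"])
```

Passing two column names gives the two-level form, three gives the three-level form. As with the plain dictionary case, rows that repeat the full key path overwrite each other.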

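Converting a dictionary to a CSV file

Each row is a key,value record
First row holds the keys, remaining rows hold the values
Writing out a dictionary of lists

Each row is a key,value record

This is the reverse of the CSV-to-dictionary conversion described earlier. It is simple enough that the code is given directly.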

def dict2csv(dct, file):
    with open(file, 'w', newline='') as f:
        w = csv.writer(f)
        # write each key/value pair on a separate row
        w.writerows(dct.items())
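A quick way to confirm that this really is the inverse of the key,value reading shown earlier is a round trip through an in-memory buffer. This is a sketch assuming Python 3; the two sample entries reuse names from the player list near the top of the article. Note that the round trip is only exact for string values, since everything read back from CSV is a string.

```python
import csv
import io

original = {"Khazri": "Sunderland", "Baker": "Vitesse"}

# Write one key,value pair per row.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(original.items())

# Read the rows back: first column is the key, second the value.
buffer.seek(0)
restored = {row[0]: row[1] for row in csv.reader(buffer)}
```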

First row holds the keys, remaining rows hold the values

def dict2csv(dct, file):
    with open(file, 'w', newline='') as f:
        w = csv.writer(f)
        # write all keys on one row and all values on the next
        w.writerow(dct.keys())
        w.writerow(dct.values())

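Writing out a dictionary of lists

This is not needed very often. The reverse is more common: loading an ordinary CSV file into a dictionary of lists (a single dictionary whose keys come from the first row of the CSV file, with the values from the remaining rows of each column collected into a list under that key). If such a structure does need to be saved as a CSV file, it can be done as follows: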

import csv
import pandas as pd
from collections import OrderedDict

dct=OrderedDict()
dct['a']=[1,2,3,4]
dct['b']=[5,6,7,8]
dct['c']=[9,10,11,12]

header = dct.keys()
rows=pd.DataFrame(dct).to_dict('records')

with open('outTest.csv', 'w') as f:
    f.write(','.join(header))
    f.write('\n')
    for data in rows:
        f.write(",".join(str(data[h]) for h in header))
        f.write('\n')

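Three packages are used here. The csv package is for ordinary CSV reading and writing, OrderedDict keeps the columns in their original order when the CSV file is written, and pandas handles the intermediate step of converting a dictionary of lists into a list of dictionaries. For example: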

[('a', [1, 2, 3, 4]), ('b', [5, 6, 7, 8]), ('c', [9, 10, 11, 12])]
to
[{'a': 1, 'c': 9, 'b': 5}, {'a': 2, 'c': 10, 'b': 6}, {'a': 3, 'c': 11, 'b': 7}, {'a': 4, 'c': 12, 'b': 8}]
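If pandas is not otherwise needed, the same transposition can be sketched with the built-in zip(), which pairs up the i-th element of every column. This is an alternative to the pandas step, not the method used above. It assumes Python 3.7 or later (where a plain dict keeps insertion order) and columns of equal length, since zip() stops at the shortest one.

```python
import csv
import io

columns = {"a": [1, 2, 3, 4], "b": [5, 6, 7, 8], "c": [9, 10, 11, 12]}

header = list(columns)
# zip(*values) turns the column lists into row tuples: (1, 5, 9), (2, 6, 10), ...
records = [dict(zip(header, values)) for values in zip(*columns.values())]

# The row tuples can be handed to csv.writer directly.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerows(zip(*columns.values()))
csv_text = buffer.getvalue()
```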

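Reading unusual CSV files

This mainly concerns CSV files with unusual delimiters. A CSV file that uses one delimiter throughout causes little trouble (the operations above all assume the delimiter is always ,). A file like the one handled below, where the header row is delimited by , but all the value rows are delimited by ;, has to be read a little differently. Typically each row is converted to a dictionary before further processing. The code is as follows: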

def func(id_list,input_file,output_file):
    wanted_ids = set(id_list)  # build the lookup set once, not once per row
    with open(input_file, newline='') as f:
        # if the delimiter for header is ',' while ';' for rows
        reader = csv.reader(f, delimiter=',')
        fieldnames = next(reader)

        reader = csv.DictReader(f, fieldnames=fieldnames, delimiter=';')
        rows = [row for row in reader if row['players.player_id'] in wanted_ids]
        # operation on rows...

Change the delimiters as needed.
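The two-reader technique can be tried on an in-memory string before pointing it at a real file. The sketch below assumes Python 3; the three column names are taken from the player list earlier in the article, and the rows are a cut-down, illustrative version of that data.

```python
import csv
import io

# Header row delimited by ',' while the data rows are delimited by ';'.
sample = (
    "players.player_id,players.first_name,players.team\n"
    "989;Wahbi;Sunderland\n"
    "9574;Lewis;Vitesse\n"
)

stream = io.StringIO(sample)
# The first reader consumes only the header line.
fieldnames = next(csv.reader(stream, delimiter=","))
# The second reader continues from the same stream with the other delimiter.
rows = list(csv.DictReader(stream, fieldnames=fieldnames, delimiter=";"))

wanted = {"989"}
selected = [row for row in rows if row["players.player_id"] in wanted]
```

Both readers pull lines from the same stream, so the second one starts exactly where the first one stopped.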

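These are more or less all the problems I ran into with CSV files during my experiments. Most of them can be solved by searching Stack Overflow or by asking a question there; the people on the site are very helpful. Later I will summarize some other data processing from the experiments, such as format conversion, deduplication and duplicate detection.

Finally, the source code is published on GitHub in csv_toolkit. Feel free to play around with it.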
