Building a city-to-province dictionary in Python (finding which province a city belongs to)
阿新 • Published: 2018-10-07
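First, I found a dataset online: a table mapping provinces to cities.
Figure 1 shows the provinces and their IDs.
Figure 2 shows the provinces and cities, each with its own ID.
The basic idea is to build a dictionary with each province as a key and the list of its cities as the value, for example: {"江蘇省": ["南京市", "蘇州市", ..., "徐州市"]}.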
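The title's real goal is the reverse lookup: given a city, find its province. A minimal sketch, assuming a dictionary shaped like the one above (the sample data and the helper name `invert_province_dict` are hypothetical), inverts province → cities into city → province:

```python
def invert_province_dict(province_to_cities):
    """Build a city -> province mapping from a province -> [cities] mapping."""
    city_to_province = {}
    for province, cities in province_to_cities.items():
        for city in cities:
            # If a city name appeared under two provinces, the later one would win
            city_to_province[city] = province
    return city_to_province


# Hypothetical sample data in the shape described above
dicts = {
    "江蘇省": ["南京市", "蘇州市", "徐州市"],
    "浙江省": ["杭州市", "寧波市"],
}

lookup = invert_province_dict(dicts)
print(lookup["蘇州市"])  # 江蘇省
```

Building the inverted dictionary once makes each city lookup a single dictionary access instead of a scan over every province's list.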
The implementation follows.
1. Change the working directory
import os
import pandas as pd
# Folder that holds Provinces.txt and Cities.txt
os.chdir(r'D:\inde\machineLearning\python\Province_city\xml')
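2.1 Load the province data (Figure 1)
with open('Provinces.txt', 'r', encoding='utf-8') as f:
    file = f.read().strip().split('\n')
num = []       # province IDs
province = []  # province names
# file[2:36] skips the header lines; the range is specific to this file
for fi in file[2:36]:
    str1 = fi.split('"')  # quoted values end up at the odd indices
    m = str1[1]           # first quoted value: province ID
    n = str1[3]           # second quoted value: province name
    num.append(m)
    province.append(n)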
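Splitting on the double-quote character works because every other piece of the result is a quoted value. The sample line below is hypothetical (the real layout of Provinces.txt is only visible in the original screenshot); it assumes an XML-style line whose first quoted value is the ID and whose second is the name:

```python
# Hypothetical line; the real Provinces.txt layout is assumed, not confirmed
line = '<Province ID="10" ProvinceName="江蘇省"/>'

parts = line.split('"')
# parts == ['<Province ID=', '10', ' ProvinceName=', '江蘇省', '/>']
# Odd indices (1, 3, ...) hold the quoted values
province_id = parts[1]
province_name = parts[3]
print(province_id, province_name)  # 10 江蘇省
```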
3.1 Combine the province data into one DataFrame
province = pd.concat([pd.DataFrame(num), pd.DataFrame(province)], axis=1)
province.columns = ['id', 'province']
province.head(2)
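The two single-column DataFrames joined with `pd.concat(..., axis=1)` can also be built in one call by passing a dict of columns to `pd.DataFrame`. A minimal sketch with made-up sample values standing in for the parsed lists:

```python
import pandas as pd

# Hypothetical sample values standing in for the parsed lists
num = ["1", "2"]
province_names = ["北京市", "天津市"]

# One step instead of concat + renaming the columns
province = pd.DataFrame({"id": num, "province": province_names})

# The concat-based construction used above, for comparison
via_concat = pd.concat([pd.DataFrame(num), pd.DataFrame(province_names)], axis=1)
via_concat.columns = ["id", "province"]
print(province.values.tolist() == via_concat.values.tolist())  # True
```

The dict form names the columns at construction time, so there is no separate renaming step to keep in sync with the column order.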
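2.2 Load the province and city data (Figure 2)
p_id = []    # ID of the province each city belongs to
c_id = []    # city IDs
c_name = []  # city names
with open('Cities.txt', 'r', encoding='utf-8') as f:
    file = f.read().strip().split('\n')
# file[2:347] skips the header lines; the range is specific to this file
for fi in file[2:347]:
    str2 = fi.split('"')
    m = str2[1]  # city ID
    c = str2[3]  # city name
    i = str2[5]  # province ID
    p_id.append(i)
    c_id.append(m)
    c_name.append(c)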
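Instead of relying on fixed positions after `split('"')`, the quoted values can be pulled out with a regular expression (a swapped-in technique, not the author's). The sample line, its attribute names and the helper `parse_city_line` are hypothetical; the field order (city ID, city name, province ID) is inferred from how the code above uses `str2[1]`, `str2[3]` and `str2[5]`:

```python
import re

QUOTED = re.compile(r'"([^"]*)"')  # captures the text between each pair of double quotes


def parse_city_line(line):
    """Return (city_id, city_name, province_id) from one line.

    Assumes the first three quoted values appear in that order,
    matching the str2[1], str2[3], str2[5] indexing used above.
    """
    city_id, city_name, province_id = QUOTED.findall(line)[:3]
    return city_id, city_name, province_id


# Hypothetical line; the real Cities.txt layout is not shown in the text
sample = '<City ID="75" CityName="南京市" PID="10"/>'
print(parse_city_line(sample))  # ('75', '南京市', '10')
```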
3.2 Combine the city data and each city's province ID
city = pd.concat([pd.DataFrame(p_id), pd.DataFrame(c_id), pd.DataFrame(c_name)], axis=1)
city.columns = ['id', 'c_id', 'city']  # 'id' is the province ID, used as the merge key
city.head(2)
4. Merge the two datasets on the province ID
province_city = pd.merge(city, province, on='id', how='left')
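With `how='left'`, every row of `city` is kept and the matching province is attached by `id`; a city whose province ID has no match gets `NaN` in the `province` column. A toy example with made-up rows (the ID "99" and the name 某市 are hypothetical):

```python
import pandas as pd

city = pd.DataFrame({
    "id": ["10", "10", "99"],  # "99" has no matching province (hypothetical)
    "c_id": ["75", "76", "500"],
    "city": ["南京市", "蘇州市", "某市"],
})
province = pd.DataFrame({"id": ["10"], "province": ["江蘇省"]})

province_city = pd.merge(city, province, on="id", how="left")
print(province_city)
# The row with id "99" has NaN in the province column
```

Checking `province_city.province.isna().sum()` after the merge is a quick way to spot cities whose province ID did not match.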
The merged output is shown in the figure below:
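5.1 The data I want to process has no "省" (province) or "市" (city) suffixes, so strip them here as well
def delete_postfix1(s, suffix):
    # Remove a single trailing character (e.g. '市' or '省') if present
    if s[-1] == suffix:
        return s[0:-1]
    else:
        return s
province_city.city = province_city.city.apply(lambda s: delete_postfix1(s, '市'))
province_city.province = province_city.province.apply(lambda s: delete_postfix1(s, '省'))
# Province-level municipalities such as 北京市 end in '市', so strip it from province names too
province_city.province = province_city.province.apply(lambda s: delete_postfix1(s, '市'))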
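pandas can also strip a trailing character from a whole column without a Python-level `apply`, using a regular expression anchored at the end of the string. This is an alternative to `delete_postfix1`, shown on made-up data; unlike `delete_postfix1`, it leaves missing values (`NaN`) untouched instead of raising an error:

```python
import pandas as pd

s = pd.Series(["南京市", "北京市", "江蘇省", "香港"])

# '市$' matches 市 only at the end of the string, so at most one trailing 市 is removed
no_city_suffix = s.str.replace("市$", "", regex=True)
no_suffix = no_city_suffix.str.replace("省$", "", regex=True)
print(no_suffix.tolist())  # ['南京', '北京', '江蘇', '香港']
```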
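5.2 Likewise, strip the "自治區" (autonomous region) suffix
def delete_postfix2(s, suffix):
    if s[0] == '內':        # 內蒙古自治區 -> 內蒙古 (three-character short name)
        return s[0:3]
    elif s[-3:] == suffix:  # e.g. 廣西壯族自治區 -> 廣西 (two-character short name)
        return s[0:2]
    else:
        return s
province_city.province = province_city.province.apply(lambda s: delete_postfix2(s, '自治區'))
# These two lines repeat the step 5.1 cleanup and should have no further effect
province_city.province = province_city.province.apply(lambda s: delete_postfix1(s, '省'))
province_city.province = province_city.province.apply(lambda s: delete_postfix1(s, '市'))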
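Applied to the five autonomous-region names, the rules in `delete_postfix2` give the results below (the name list is supplied for illustration and may not match the dataset's exact spelling). 內蒙古 is the special case because its short name has three characters while the others have two. The function is repeated so the snippet runs on its own:

```python
def delete_postfix2(s, suffix):
    # Same logic as delete_postfix2 above
    if s[0] == '內':
        return s[0:3]
    elif s[-3:] == suffix:
        return s[0:2]
    else:
        return s


regions = ["內蒙古自治區", "廣西壯族自治區", "西藏自治區", "寧夏回族自治區", "新疆維吾爾自治區", "江蘇"]
print([delete_postfix2(r, '自治區') for r in regions])
# ['內蒙古', '廣西', '西藏', '寧夏', '新疆', '江蘇']
```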
6. Save the data
province_city.to_csv('province_city.csv', index=False)
7. Convert to a dictionary
dicts = {}
for i in range(len(province)):
    k = province.province[i]                   # key: province name from the province table
    mask = province_city.id == province.id[i]  # rows whose province ID matches
    dicts[k] = list(province_city[mask].city)  # value: list of that province's cities
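The loop filters `province_city` once per province. A `groupby` builds the same kind of mapping in one pass (a swapped-in approach, not the author's). Note that its keys come from `province_city.province` (the names cleaned in step 5), whereas the loop above takes its keys from `province.province` (the original names). Toy data:

```python
import pandas as pd

province_city = pd.DataFrame({
    "id": ["10", "10", "11"],
    "city": ["南京", "蘇州", "杭州"],
    "province": ["江蘇", "江蘇", "浙江"],
})

# One list of cities per province, keeping the row order within each group
dicts = province_city.groupby("province")["city"].apply(list).to_dict()
print(dicts)  # {'江蘇': ['南京', '蘇州'], '浙江': ['杭州']}
```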
8. A first look at the output
So far, the result is exactly what we expected.
9. Next, save the dictionary so it can be reused later
import pickle
# The file must be opened in binary write mode ('wb')
with open("dicts.pkl", "wb") as pickle_file:
    pickle.dump(dicts, pickle_file)  # write the dictionary to the file
# Load it back in binary read mode ('rb')
with open("dicts.pkl", "rb") as pickle_file:
    mylist2 = pickle.load(pickle_file)
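pickle files are Python-specific and should only be loaded from trusted sources. If the dictionary needs to be readable by other tools, JSON is an alternative (a swapped-in format, not what the author used); `ensure_ascii=False` keeps the Chinese characters readable in the file. A minimal round-trip sketch with a made-up dictionary and a hypothetical file name `dicts.json`, written to the system temp directory:

```python
import json
import os
import tempfile

dicts = {"江蘇省": ["南京", "蘇州"], "浙江省": ["杭州"]}  # hypothetical sample

path = os.path.join(tempfile.gettempdir(), "dicts.json")

# Write: ensure_ascii=False stores 江蘇 as-is instead of \u escapes
with open(path, "w", encoding="utf-8") as f:
    json.dump(dicts, f, ensure_ascii=False)

# Read it back
with open(path, "r", encoding="utf-8") as f:
    loaded = json.load(f)

print(loaded == dicts)  # True
```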
10. Check the output