
Python: loading a comma-delimited file into a DataFrame when fields wrapped in triple quotes must not be split on their commas

A question: how can data like 0,"""哎,想當年來佘山的時候,類來,空了。""",-2,-2,-2,0,-2,-2,-2,1,-2,-2,-2,-2,-2,-2,-2,0,-2,-2,1,0 be read into a DataFrame with pandas? The text wrapped in """ is a single field, and the remaining columns are separated by commas.
 

The file test_data/sentiment_analysis_testa.csv contains the following:

id,content,location_traffic_convenience,location_distance_from_business_district,location_easy_to_find,service_wait_time,service_waiters_attitude,service_parking_convenience,service_serving_speed,price_level,price_cost_effective,price_discount,environment_decoration,environment_noise,environment_space,environment_cleaness,dish_portion,dish_taste,dish_look,dish_recommendation,others_overall_experience,others_willing_to_consume_again

0,"""哎,想當年來佘山的時候,類來,空了。""",-2,-2,-2,0,-2,-2,-2,1,-2,-2,-2,-2,-2,-2,-2,0,-2,-2,1,0

1,"""哎,想如同人體讓他人突然候,類來,空了。""",-2,-2,-2,0,-2,-2,-2,1,-2,-2,-2,-2,-2,-2,-2,0,-2,-2,1,0
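Under the usual CSV quoting rules, a field that opens with `"""` is a quoted field whose content starts with an escaped literal quote (`""`), so the commas inside it are not treated as separators. The following is a minimal sketch using only Python's standard `csv` module on an in-memory copy of the first sample row. The `line`, `row` and `content` names are illustrative and do not come from the original code.

```python
import csv
import io

# In-memory copy of the first data row shown above (illustrative only).
line = '0,"""哎,想當年來佘山的時候,類來,空了。""",-2,-2,-2,0,-2,-2,-2,1,-2,-2,-2,-2,-2,-2,-2,0,-2,-2,1,0\n'

# csv.reader applies the default rules: quotechar='"' and doublequote=True.
row = next(csv.reader(io.StringIO(line)))

print(len(row))   # number of fields found in the row
print(row[1])     # the text field, still wrapped in one literal quote on each side

# Drop the literal quote characters left over from the "" escapes.
content = row[1].strip('"')
print(content)
```

The parsed field keeps one literal `"` at each end, which is why the sketch strips it afterwards. Whether that cleanup is wanted depends on how the text is used later.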

import pandas as pd

# Load the data
def load_data_from_csv(file_name, header=0, encoding="utf-8"):
    data_df = pd.read_csv(file_name, header=header, encoding=encoding)
    return data_df
Calling this original code directly produces an incorrect DataFrame.

However, reading the training set directly with
data = pd.read_csv("./ai_challenger_sentiment_analysis_trainingset_20180816/sentiment_analysis_trainingset.csv")
gives a correct DataFrame.
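Since one file loads correctly and the other does not, one way to narrow down the difference is to compare each row's field count with the header's. This is only a diagnostic sketch: `find_bad_rows` is a hypothetical helper name, and the file is assumed to be UTF-8 encoded.

```python
import csv

def find_bad_rows(file_name, encoding="utf-8"):
    """Return (line_number, field_count) for rows whose width differs from the header."""
    bad = []
    with open(file_name, newline="", encoding=encoding) as csvfile:
        reader = csv.reader(csvfile)
        header = next(reader, None)
        if header is None:
            return bad  # empty file, nothing to check
        for row in reader:
            # csv.reader yields [] for blank lines, so skip those.
            if row and len(row) != len(header):
                # line_num is the number of physical lines read so far.
                bad.append((reader.line_num, len(row)))
    return bad
```

An empty result suggests the quoting is parsed consistently, so the problem may lie elsewhere, for example in the encoding or in the arguments passed to `pd.read_csv`.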


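import csv
import pandas as pd

def load_data_from_csv(file_name, header=0, encoding="utf-8"):
    # Take the column names from the header row.
    # (header is kept in the signature for compatibility but is not used here.)
    cols = pd.read_csv(file_name, nrows=1, encoding=encoding).columns
    data_lt = []
    with open(file_name, newline='', encoding=encoding) as csvfile:
        freader = csv.reader(csvfile, delimiter=',', skipinitialspace=True)
        next(freader, None)  # skip the header row
        for row in freader:
            if row:  # skip blank lines, which csv.reader returns as []
                data_lt.append(row)
    # Every value is a string at this point; convert numeric columns later if needed.
    data_df = pd.DataFrame(data_lt, columns=cols)
    return data_df

df2 = load_data_from_csv("data/test_data/sentiment_analysis_testa.csv")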

The DataFrame was generated successfully. The newline='' argument could perhaps be removed; I have not tried that yet.
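On the `newline=''` question: the Python `csv` documentation recommends opening files with `newline=''`, because otherwise line endings embedded inside quoted fields get translated by the file object. The sketch below illustrates the difference on a small temporary file. The file contents and the `read_rows` helper are made up for illustration and are not part of the original data.

```python
import csv
import os
import tempfile

# A tiny CSV whose quoted field contains an embedded Windows line ending.
raw = b'id,content\r\n0,"line1\r\nline2"\r\n'
with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as tmp:
    tmp.write(raw)
    path = tmp.name

def read_rows(path, **open_kwargs):
    """Read all rows with csv.reader, forwarding extra arguments to open()."""
    with open(path, encoding="utf-8", **open_kwargs) as f:
        return list(csv.reader(f))

rows_with = read_rows(path, newline="")  # as recommended by the csv docs
rows_without = read_rows(path)           # default universal-newline translation
os.remove(path)

print(repr(rows_with[1][1]))     # embedded line ending kept as written
print(repr(rows_without[1][1]))  # embedded line ending translated to \n
```

For a file like the sample above, with no line breaks inside quoted fields, both calls would likely return the same rows, so dropping `newline=''` would probably still work. Keeping it is the safer default.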

Either way, this experiment turned up another way to build a DataFrame.