DataCamp Data Scientist with Python track: study notes
阿新 • Published: 2018-11-21
Importing Data in Python:
Customizing your pandas import:
# Import pandas as pd and matplotlib.pyplot as plt
import pandas as pd
import matplotlib.pyplot as plt
# Assign filename: file
file = 'titanic_corrupt.txt'
# Import file: data (tab-delimited, '#' starts a comment, 'Nothing' marks a missing value)
data = pd.read_csv(file, sep='\t', comment='#', na_values='Nothing')
# Print the head of the DataFrame
print(data.head())
# Plot 'Age' variable in a histogram
pd.DataFrame.hist(data[['Age']])
plt.xlabel('Age (years)')
plt.ylabel('count')
plt.show()
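Sometimes the values pandas treats as missing by default are not enough. Setting na_values tells pandas which additional values to read as NaN: in the statement above, every occurrence of 'Nothing' is replaced with NaN.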
'sep' is the pandas counterpart of the 'delimiter' argument used by NumPy functions such as np.loadtxt; here it is set to '\t' because the file is tab-delimited.
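To see what sep, comment and na_values do without the original titanic_corrupt.txt, here is a minimal sketch that reads a small made-up tab-separated string through io.StringIO. The column names and values are hypothetical and only stand in for the real file.

```python
import io
import pandas as pd

# Hypothetical tab-separated contents: '#' starts a comment,
# and the string 'Nothing' stands for a missing value.
raw = (
    "Name\tAge\n"
    "Alice\t22#everything after '#' is ignored\n"
    "Bob\tNothing\n"
)

# Without na_values, 'Nothing' is kept as text, so Age holds strings.
plain = pd.read_csv(io.StringIO(raw), sep='\t', comment='#')

# With na_values='Nothing', that entry becomes NaN and Age parses as float.
data = pd.read_csv(io.StringIO(raw), sep='\t', comment='#', na_values='Nothing')

print(plain['Age'].tolist())
print(data['Age'].tolist())
```

Comparing the two results shows why na_values matters: a single placeholder string is enough to stop pandas from treating the whole column as numeric.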
data.head()  # shows the first 5 rows by default; pass a number, e.g. data.head(10), to show a different count
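A quick check of head()'s default row count, using a small made-up DataFrame rather than the Titanic data:

```python
import pandas as pd

# Hypothetical DataFrame with ten rows.
df = pd.DataFrame({'x': range(10)})

first_five = df.head()    # n defaults to 5
first_three = df.head(3)  # explicit row count

print(first_five)
print(first_three)
```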
Introduction to other file types:
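pickle provides simple persistence: it can store objects on disk as files. Almost every Python data type (lists, dictionaries, sets, classes, etc.) can be serialized with pickle, but pickled data is not human-readable.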
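A minimal sketch of that persistence round trip, assuming a throwaway temporary directory and a hypothetical dictionary in place of real data:

```python
import os
import pickle
import tempfile

# Hypothetical object mixing several built-in types.
record = {'columns': ['Age', 'Fare'], 'tags': {'csv', 'demo'}, 'rows': 3}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'record.pkl')

    # Serialize the object to a file; pickle needs binary mode ('wb').
    with open(path, 'wb') as f:
        pickle.dump(record, f)

    # Load it back, e.g. in a later session ('rb' = read binary).
    with open(path, 'rb') as f:
        restored = pickle.load(f)

print(restored == record)  # True: same contents
print(restored is record)  # False: a new object rebuilt from the file
```

Because loading a pickle can run arbitrary code while rebuilding objects, only unpickle files you trust.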
If you merely want to be able to import such objects back into Python, you can serialize them, which simply means converting each object into a sequence of bytes, or bytestream.
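The bytestream itself can be seen without touching the disk: pickle.dumps returns the serialized bytes and pickle.loads rebuilds the object. The list below is a hypothetical example.

```python
import pickle

# Hypothetical list of mixed built-in objects.
obj = [1, 2.5, 'three', {'four': 4}]

stream = pickle.dumps(obj)       # serialize to a bytes object (a bytestream)
restored = pickle.loads(stream)  # rebuild an equal object from those bytes

print(type(stream))     # <class 'bytes'>
print(restored == obj)  # True
```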
Customizing your spreadsheet import:
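# Import pandas and load the spreadsheet: xl
import pandas as pd
file = 'battledeath.xlsx'  # assumed filename; these notes do not name the workbook
xl = pd.ExcelFile(file)
# Parse the first sheet and rename the columns: df1
df1 = xl.parse(0, skiprows=[0], names=['Country', 'AAM due to War (2002)'])
# Print the head of the DataFrame df1
print(df1.head())
# Parse the first column of the second sheet and rename the column: df2
# (older pandas versions called this argument parse_cols; newer ones use usecols)
df2 = xl.parse(1, usecols=[0], skiprows=[0], names=['Country'])
# Print the head of the DataFrame df2
print(df2.head())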
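The spreadsheet code needs the actual workbook. As a runnable stand-in, this sketch applies the same skiprows, names and usecols ideas to a made-up in-memory CSV with pd.read_csv, since building a real .xlsx would require an extra engine such as openpyxl. One difference to keep in mind: read_excel and ExcelFile.parse default to header=0, and the pandas docs recommend passing header=None when you supply names for data without a header row; the sketch passes header=None explicitly so every remaining row is kept as data.

```python
import io
import pandas as pd

# Made-up sheet contents: a title row to drop, then two data rows.
raw = (
    "Deaths by country,2002\n"
    "Country A,1.5\n"
    "Country B,0.25\n"
)

# skiprows=[0] drops the title row; header=None treats every remaining row
# as data; names supplies the new column labels.
df1 = pd.read_csv(io.StringIO(raw), header=None, skiprows=[0],
                  names=['Country', 'AAM due to War (2002)'])

# usecols=[0] keeps only the first column (labelled 0 because header=None),
# which is then renamed to 'Country'.
df2 = pd.read_csv(io.StringIO(raw), header=None, skiprows=[0],
                  usecols=[0]).rename(columns={0: 'Country'})

print(df1)
print(df2)
```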