
DataCamp Data Scientist with Python track: Study Notes

Importing Data in Python: 

Customizing your pandas import: 

# Import pandas as pd and matplotlib.pyplot as plt
import pandas as pd
import matplotlib.pyplot as plt

# Assign filename: file
file = 'titanic_corrupt.txt'

# Import file: data
data = pd.read_csv(file, sep='\t', comment='#', na_values='Nothing')

# Print the head of the DataFrame
print(data.head())

# Plot 'Age' variable in a histogram
data[['Age']].hist()
plt.xlabel('Age (years)')
plt.ylabel('count')
plt.show()

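Sometimes the values that pandas treats as missing by default are not enough. By setting na_values, we can have additional values read as NaN. In the statement above, na_values='Nothing' means that every occurrence of 'Nothing' in the file is replaced with NaN.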
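To see the effect of na_values in isolation, here is a minimal sketch. The in-memory sample and its column names are made up for illustration and are not taken from titanic_corrupt.txt.

```python
import io

import pandas as pd

# Hypothetical tab-separated sample; 'Nothing' marks a missing age.
raw = "Name\tAge\nAlice\t29\nBob\tNothing\nCarol\t41\n"

# Without na_values, 'Nothing' is kept as text, so 'Age' holds strings.
plain = pd.read_csv(io.StringIO(raw), sep="\t")

# With na_values, 'Nothing' is read as NaN and 'Age' is parsed as float.
cleaned = pd.read_csv(io.StringIO(raw), sep="\t", na_values="Nothing")

print(plain["Age"].tolist())
print(cleaned["Age"].isna().sum())
print(cleaned["Age"].dtype)
```

na_values also accepts a list of strings, or a dict mapping column names to the markers to treat as missing in that column.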

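'sep' is the pandas counterpart of the 'delim' (delimiter) argument; here it is '\t' because the file is tab-delimited. comment='#' makes pandas ignore anything that follows a '#' on a line.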
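The comparison can be sketched as follows, assuming that 'delim' refers to the delimiter argument of NumPy's loadtxt. The sample data is hypothetical.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical tab-delimited sample with a header row and one comment line.
raw = "x\ty\n# a comment line\n1\t2\n3\t4\n"

# pandas: 'sep' sets the field separator, 'comment' sets the comment character.
df = pd.read_csv(io.StringIO(raw), sep="\t", comment="#")

# NumPy: 'delimiter' plays the role of 'sep'; lines starting with '#' are
# skipped by default, and skiprows=1 drops the header row.
arr = np.loadtxt(io.StringIO(raw), delimiter="\t", skiprows=1)

print(df)
print(arr)
```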

data.head() # Returns the first 5 rows by default; pass a different number in the parentheses, e.g. data.head(10).

 

Introduction to other file types: 

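pickle provides a simple persistence mechanism: it stores objects as files on disk. Almost every Python data type (lists, dictionaries, sets, classes, and so on) can be serialized with pickle, but the serialized data is not human-readable.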

If you merely want to be able to import such objects back into Python, you can serialize them. All this means is converting the object into a sequence of bytes, or a bytestream.
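A minimal sketch of a pickle round trip, using only the standard library. The dictionary and the file name record.pkl are hypothetical.

```python
import os
import pickle
import tempfile

# Hypothetical object mixing several built-in types.
record = {"name": "example", "values": [1, 2, 3], "tags": {"a", "b"}}

# Write to a temporary directory so the example leaves no stray files behind.
path = os.path.join(tempfile.mkdtemp(), "record.pkl")

# Pickle files are binary, so open them with 'wb' and 'rb'.
with open(path, "wb") as f:
    pickle.dump(record, f)

with open(path, "rb") as f:
    restored = pickle.load(f)

# dumps() returns the bytestream directly instead of writing it to a file.
as_bytes = pickle.dumps(record)

print(type(as_bytes))
print(restored == record)
```

Only unpickle data from sources you trust, because loading a pickle can execute arbitrary code.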

 

Customizing your spreadsheet import: 

# xl is assumed to be a pd.ExcelFile created earlier, e.g. xl = pd.ExcelFile(file)
# Parse the first sheet and rename the columns: df1
df1 = xl.parse(0, skiprows=[0], names=['Country', 'AAM due to War (2002)'])

# Print the head of the DataFrame df1
print(df1.head())

# Parse the first column of the second sheet and rename the column: df2
# (usecols replaces parse_cols, which newer pandas versions no longer accept)
df2 = xl.parse(1, usecols=[0], skiprows=[0], names=['Country'])

# Print the head of the DataFrame df2
print(df2.head())