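You have decided to learn Python, but you have no prior programming experience. With so much Python to learn, it is easy to feel unsure where to start. These are the questions that beginners in Python data analysis ask most often: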

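How long does it take to learn Python?
How much Python do I need to learn before I can do data analysis?
What are the best books or courses for learning Python?
Do I need to become an expert Python programmer to work with data sets?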

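Such confusion is understandable when you start learning a new skill, as the author of "Learn Anything in 20 Hours" points out. Don't worry: I will show you how to get started quickly without having to become a Python programming "ninja".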

Don't repeat my mistake

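Before I started using Python, I had a misconception about doing data analysis with it: I believed I first had to become highly proficient in Python programming. So I took Udacity's introductory Python programming course, completed the Python tutorial on Codecademy, and read several Python programming books. This went on for 3 months (averaging 3 hours a day), during which I learned Python by building small software projects. Writing code was fun, but my goal was not to become a Python developer; it was to use Python for data analysis. Eventually I realized that I had spent a great deal of time learning software development in Python rather than data analysis.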

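After a few hours of careful thought, I concluded that I needed to learn 5 Python libraries to solve a range of data analysis problems effectively. I then started learning these libraries one at a time.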

In my view, you do not need to be good at developing software in Python in order to do data analysis efficiently.

Skip resources aimed at a general audience

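There are many excellent Python books and online courses, but I do not recommend some of them, because they are written for a general audience rather than for people doing data analysis. There are also many books on "scientific programming with Python", but they are oriented toward various mathematical topics rather than data analysis and statistics. Don't waste your time reading Python books written for a general audience.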

Before going any further, set up your programming environment, then learn how to use the IPython notebook.

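Learning path

Start with Codecademy and complete all the exercises there. At 3 hours a day, you should finish them within 20 days. Codecademy covers the basic concepts of Python. It is not project-oriented the way Udacity is, but that is fine, because your goal is to do data science, not to develop software in Python.

Once you have finished the Codecademy exercises, take a look at this IPython notebook:

Python Essentials Tutorial (I have provided a download link in the summary section).

It covers some concepts that Codecademy does not mention. You can get through this tutorial in 1 to 2 hours.

You now know enough of the basics to start learning the Python libraries.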

Numpy

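Start with Numpy, since it is the fundamental package for scientific computing with Python. A good grasp of Numpy will help you use other tools, such as Pandas, effectively.

I have prepared an IPython notebook that covers the basic concepts of Numpy. The tutorial covers the most frequently used Numpy operations: N-dimensional arrays, indexing, array slicing, integer indexing, array transformations, universal functions, data processing with arrays, common statistical methods, and so on.

Numpy index: look up the usage of any unfamiliar Numpy function here. Recommended!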
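The operations listed above can be tried out in a few lines. The following is only a minimal sketch, not the contents of the notebook mentioned above; the array `a` and all variable names are made up for illustration.

```python
import numpy as np

# A 2-D array (3 rows, 4 columns) holding the integers 0..11.
a = np.arange(12).reshape(3, 4)

# Basic indexing and slicing.
first_row = a[0]          # the first row
last_column = a[:, -1]    # the last column
block = a[:2, 1:3]        # rows 0-1, columns 1-2

# Integer indexing: pick rows 2 and 0, in that order.
picked = a[[2, 0]]

# A simple array transformation: the transpose has shape (4, 3).
transposed = a.T

# A universal function (ufunc) is applied element by element.
roots = np.sqrt(a)

# Common statistical methods.
total = a.sum()               # sum of all elements
col_means = a.mean(axis=0)    # mean of each column

print(block)
print(col_means)
```

Passing `axis=0` computes the statistic down each column; `axis=1` would do the same along each row.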

Pandas

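Pandas provides high-level data structures and manipulation tools that make data analysis in Python faster and easier.

The tutorial covers Series, DataFrames, dropping entries from an axis, handling missing data, and so on.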
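As a rough illustration of those topics, here is a minimal sketch. The city, temperature, and rainfall values are invented for the example and do not come from the tutorial.

```python
import numpy as np
import pandas as pd

# A Series is a labelled 1-D array; NaN marks a missing value.
s = pd.Series([1.5, np.nan, 3.0], index=["a", "b", "c"])

# A DataFrame is a table of labelled columns (hypothetical sample data).
df = pd.DataFrame({
    "city": ["Taipei", "Tokyo", "Seoul"],
    "temp": [28.0, np.nan, 24.0],
    "rain": [10.0, 5.0, np.nan],
})

# Dropping entries from an axis: axis=1 drops a column, axis=0 drops a row.
no_rain = df.drop("rain", axis=1)
no_first = df.drop(0, axis=0)

# Handling missing data.
missing_per_column = df.isnull().sum()   # count of NaN values per column
complete = df.dropna()                   # keep only rows with no NaN
filled = df.fillna({"temp": df["temp"].mean(), "rain": 0.0})

print(missing_per_column)
print(filled)
```

None of these calls modify `df` in place; each returns a new object, so the original table stays available for comparison.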

Matplotlib

This is a four-part Matplotlib tutorial.

Part 1:

Part 1 introduces the basic features of Matplotlib and the basic figure types.

Simple Plotting example

In [113]:

%matplotlib inline
import matplotlib.pyplot as plt  # import the Matplotlib plotting interface
import numpy as np
x = np.arange(100)  # a Numpy array rather than a plain range, so x**2 also works in later cells
# print(x) to check what x contains
y = [val**2 for val in x]
# print(y)
plt.plot(x, y)  # plot y against x

Out[113]:

[<matplotlib.lines.Line2D at 0x7857bb0>]

fig, axes = plt.subplots(nrows=1, ncols=2)

for ax in axes:
    ax.plot(x, y, 'r')
    ax.set_xlabel('x')
    ax.set_ylabel('y')
    ax.set_title('title')
    
fig.tight_layout()

fig, ax = plt.subplots()

ax.plot(x, x**2, label="y = x**2")
ax.plot(x, x**3, label="y = x**3")
ax.legend(loc=2); # upper left corner
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_title('title');

fig, axes = plt.subplots(1, 2, figsize=(10,4))
      
axes[0].plot(x, x**2, x, np.exp(x))
axes[0].set_title("Normal scale")

axes[1].plot(x, x**2, x, np.exp(x))
axes[1].set_yscale("log")
axes[1].set_title("Logarithmic scale (y)");

n = np.array([0,1,2,3,4,5])

In [47]:

fig, axes = plt.subplots(1, 4, figsize=(12,3))

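xx = np.linspace(-0.75, 1.0, 100)  # xx is not defined in the original; this sample range is an assumption
axes[0].scatter(xx, xx + 0.25*np.random.randn(len(xx)))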
axes[0].set_title("scatter")

axes[1].step(n, n**2, lw=2)
axes[1].set_title("step")

axes[2].bar(n, n**2, align="center", width=0.5, alpha=0.5)
axes[2].set_title("bar")

axes[3].fill_between(x, x**2, x**3, color="green", alpha=0.5);
axes[3].set_title("fill_between");

Using Numpy

In [17]:

x = np.linspace(0, 2*np.pi, 100)
y =np.sin(x)
plt.plot(x,y)

Out[17]:

[<matplotlib.lines.Line2D at 0x579aef0>]

In [24]:

x= np.linspace(-3,2, 200)
Y = x ** 2 - 2 * x + 1.
plt.plot(x,Y)

Out[24]:

[<matplotlib.lines.Line2D at 0x6ffb310>]

In [32]:

# plotting multiple plots
x =np.linspace(0, 2 * np.pi, 100)
y = np.sin(x)
z = np.cos(x)
plt.plot(x,y) 
plt.plot(x,z)
plt.show()

# Matplotlib picks a different color for each line.

In [35]:

cd C:\Users\tk\Desktop\Matplot

C:\Users\tk\Desktop\Matplot

In [39]:

data = np.loadtxt('numpy.txt')
plt.plot(data[:,0], data[:,1]) # plotting column 1 vs column 2
# The text in the numpy.txt should look like this
# 0 0
# 1 1
# 2 4
# 4 16
# 5 25
# 6 36

Out[39]:

[<matplotlib.lines.Line2D at 0x740f090>]
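The cell above needs a `numpy.txt` file in the current directory. To try `np.loadtxt` without creating that file, the same text can be fed in from memory. This is a minimal sketch: using `io.StringIO` as a stand-in for the file is my substitution, not part of the original tutorial.

```python
import io

import numpy as np

# The same whitespace-separated text the tutorial expects in numpy.txt.
text = """0 0
1 1
2 4
4 16
5 25
6 36
"""

# np.loadtxt accepts a file-like object as well as a file name.
data = np.loadtxt(io.StringIO(text))

x = data[:, 0]  # first column
y = data[:, 1]  # second column

print(data.shape)
print(y)
```

Each line of text becomes one row of a float array, so `data[:, 0]` and `data[:, 1]` are the two columns passed to `plt.plot`.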

In [56]:

data1 = np.loadtxt('scipy.txt')  # load the file
print(data1.T)

for val in data1.T:  # loop over each row of data1.T, i.e. each column of the file
    plt.plot(data1[:,0], val)  # data1[:,0] is the first column of the file (the first row of data1.T)
    
# data in scipy.txt looks like this:
# 0 0  6
# 1 1  5
# 2 4  4 
# 4 16 3
# 5 25 2
# 6 36 1

[[  0.   1.   2.   4.   5.   6.]
 [  0.   1.   4.  16.  25.  36.]
 [  6.   5.   4.   3.   2.   1.]]
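Note that the loop above also plots the first column against itself. One possible variation, sketched here under my own assumptions (in-memory text instead of `scipy.txt`, and made-up labels), uses the `unpack=True` option of `np.loadtxt` to get one array per column and then plots only the two data columns.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this also runs as a plain script; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

# The same text the tutorial expects in scipy.txt.
text = """0 0  6
1 1  5
2 4  4
4 16 3
5 25 2
6 36 1
"""

# unpack=True transposes the result, so each column comes back as its own array.
x, y1, y2 = np.loadtxt(io.StringIO(text), unpack=True)

fig, ax = plt.subplots()
ax.plot(x, y1, label="column 2")  # second column against the first
ax.plot(x, y2, label="column 3")  # third column against the first
ax.legend()
```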

Scatter Plots and Bar Graphs

In [64]:

sct = np.random.rand(20, 2)  # 20 random points: column 0 is x, column 1 is y
print(sct)
plt.scatter(sct[:,0], sct[:,1])  # draw the scatter plot

[[ 0.51454542  0.61859101]
 [ 0.45115993  0.69774873]
 [ 0.29051205  0.28594808]
 [ 0.73240446  0.41905186]
 [ 0.23869394  0.5238878 ]
 [ 0.38422814  0.31108919]
 [ 0.52218967  0.56526379]
 [ 0.60760426  0.80247073]
 [ 0.37239096  0.51279078]
 [ 0.45864677  0.28952167]
 [ 0.8325996   0.28479446]
 [ 0.14609382  0.8275477 ]
 [ 0.86338279  0.87428696]
 [ 0.55481585  0.24481165]
 [ 0.99553336  0.79511137]
 [ 0.55025277  0.67267026]
 [ 0.39052024  0.65924857]
 [ 0.66868207  0.25186664]
 [ 0.64066313  0.74589812]
 [ 0.20587731  0.64977807]]

Out[64]:

<matplotlib.collections.PathCollection at 0x78a7110>

In [65]:

ghj =[5, 10 ,15, 20, 25]
it =[ 1, 2, 3, 4, 5]
plt.bar(ghj, it) # simple bar graph

Out[65]:

<Container object of 5 artists>

In [74]:

ghj =[5, 10 ,15, 20, 25]
it =[ 1, 2, 3, 4, 5]
plt.bar(ghj, it, width =5)# you can change the thickness of a bar, by default the bar will have a thickness of 0.8 units

Out[74]:

<Container object of 5 artists>

In [75]:

ghj =[5, 10 ,15, 20, 25]
it =[ 1, 2, 3, 4, 5]
plt.barh(ghj, it) # barh is a horizontal bar graph

Out[75]:

<Container object of 5 artists>

Multiple bar charts
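The original content for this heading is not included here. As a stand-in only, here is a minimal sketch of one common way to draw several bar series side by side: shift each series by half a bar width. The second series and all names below are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this also runs as a plain script; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

labels = [5, 10, 15, 20, 25]
series_a = [1, 2, 3, 4, 5]   # same heights as the bar examples above
series_b = [2, 3, 4, 5, 6]   # hypothetical second series

positions = np.arange(len(labels))  # one group per label
width = 0.4                         # width of each bar

fig, ax = plt.subplots()
# Shift the two series left and right of each group position so they do not overlap.
ax.bar(positions - width / 2, series_a, width=width, label="series A")
ax.bar(positions + width / 2, series_b, width=width, label="series B")

ax.set_xticks(positions)
ax.set_xticklabels(labels)
ax.legend()
```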

Repost: Python data analysis summary
