Python: the DataFrame structure in the pandas module and common operations
阿新 • Published: 2019-01-28
Reposted from: http://blog.csdn.net/u014607457/article/details/51290582
1. Introduction
DataFrame unifies two or more Series into a single data structure. Each Series then represents a named column of the DataFrame, and instead of each column having its own index, the DataFrame provides a single index and the data in all columns is aligned to the master index of the DataFrame.
In other words, a DataFrame provides a table-like structure made up of several Series, and each Series is a column of the DataFrame.
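As a minimal sketch of that idea (the fruit labels and the column names `price` and `stock` are made up for illustration), two Series that share the same labels can be combined into one DataFrame, and each column can be pulled back out as a Series that carries the DataFrame's single index:

```python
import pandas as pd

# Two Series with the same (hypothetical) labels; each becomes one column.
price = pd.Series([1.5, 2.0, 0.5], index=["apple", "pear", "plum"])
stock = pd.Series([10, 0, 25], index=["apple", "pear", "plum"])

df = pd.DataFrame({"price": price, "stock": stock})

# Selecting a column gives back a Series ...
col = df["price"]
print(type(col).__name__)          # Series
# ... whose index is the DataFrame's single shared index.
print(col.index.equals(df.index))  # True
print(df.loc["pear"])              # one row, drawn from every column
```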
2. Common operations
a. Create
pd.DataFrame()
Arguments:
1. a 2-D array;
2. a list of Series;
3. a dict whose values are Series.
a.1 2-D array
import numpy as np
import pandas as pd

# Each array becomes one row of the DataFrame
s1 = np.array([1, 2, 3, 4])
s2 = np.array([5, 6, 7, 8])
df = pd.DataFrame([s1, s2])
print(df)
a.2 A list of Series (same result as the 2-D array)
import numpy as np
import pandas as pd

# Each Series becomes one row of the DataFrame
s1 = pd.Series(np.array([1, 2, 3, 4]))
s2 = pd.Series(np.array([5, 6, 7, 8]))
df = pd.DataFrame([s1, s2])
print(df)
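To make "same result" concrete, here is a small sketch comparing the two constructions. The variable names `df_arrays` and `df_series` are illustrative only. In both cases each input sequence ends up as a row, not a column:

```python
import numpy as np
import pandas as pd

rows = [np.array([1, 2, 3, 4]), np.array([5, 6, 7, 8])]

df_arrays = pd.DataFrame(rows)                          # list of arrays
df_series = pd.DataFrame([pd.Series(r) for r in rows])  # list of Series

print(df_arrays.shape)  # (2, 4): two rows, four columns
print((df_arrays.values == df_series.values).all())  # True
```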
a.3 A dict whose values are Series
import numpy as np
import pandas as pd

# Each dict key becomes a column name; each Series becomes that column
s1 = pd.Series(np.array([1, 2, 3, 4]))
s2 = pd.Series(np.array([5, 6, 7, 8]))
df = pd.DataFrame({"a": s1, "b": s2})
print(df)
Note: when the arrays or Series passed to the constructor have different lengths, any position that has no value for a given index is filled with NaN.
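A minimal sketch of the NaN filling, using a dict of Series of unequal length (the names `short` and `long` are illustrative):

```python
import pandas as pd

short = pd.Series([1, 2])        # index 0, 1
long = pd.Series([5, 6, 7, 8])   # index 0, 1, 2, 3

# The columns are aligned on the union of both indexes.
df = pd.DataFrame({"a": short, "b": long})
print(df)
print(df["a"].isna().tolist())  # [False, False, True, True]
```

Column "a" has no values at index 2 and 3, so those cells hold NaN.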
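b. Attributes
b.1 .columns: the labels (keys) of the columns
b.2 .shape: the shape (a, b), where a is the length of the index and b is the number of columns
b.3 .index; .values: the index of the DataFrame; the values as a 2-D array
b.4 .head(); .tail(): the first rows; the last rows (5 by default)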
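The attributes above can be tried on a small frame. This is only a sketch, with the expected values for this particular data shown in the comments:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8]})

print(list(df.columns))     # ['a', 'b']
print(df.shape)             # (4, 2)
print(list(df.index))       # [0, 1, 2, 3]
print(df.values.tolist())   # [[1, 5], [2, 6], [3, 7], [4, 8]]
print(df.head(2))           # first two rows
print(df.tail(1))           # last row
```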
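c. if-then operations
c.1 Using .loc[] (the older .ix[] indexer was removed in pandas 1.0)
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [1, 1, 1, 1]})
# Where A > 1, set B to -1
df.loc[df.A > 1, "B"] = -1
print(df)
df.loc[condition, columns the "then" assignment applies to]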
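The same condition-then-target pattern can also write to several columns at once. A minimal sketch with `.loc[]`, where the threshold and the replacement value 0 are arbitrary choices for illustration:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [1, 1, 1, 1]})

# Rows where A > 2: overwrite both B and C in one statement.
df.loc[df["A"] > 2, ["B", "C"]] = 0

print(df["B"].tolist())  # [5, 6, 0, 0]
print(df["C"].tolist())  # [1, 1, 0, 0]
```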
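c.2 Using numpy.where
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [1, 1, 1, 1]})
# New column "then": 1 where A < 3, otherwise 0
df["then"] = np.where(df.A < 3, 1, 0)
print(df)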
np.where(condition, then, else)
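The then and else arguments do not have to be constants; they can be whole columns. A small sketch, where the column name `signed_B` is made up for this example:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8]})

# Keep B where A < 3, otherwise use the negated B.
df["signed_B"] = np.where(df["A"] < 3, df["B"], -df["B"])

print(df["signed_B"].tolist())  # [5, 6, -7, -8]
```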
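d. Selecting part of a DataFrame by condition
d.1 Direct indexing with df[]
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [1, 1, 1, 1]})
# Keep only the rows where A >= 2
df = df[df.A >= 2]
print(df)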
d.2 Using .loc[]
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [1, 1, 1, 1]})
# Keep only the rows where A > 2
df = df.loc[df.A > 2]
print(df)
(There are many other ways to do this; they are not all listed here.)
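For reference, a few of those other ways can be sketched as follows. The choice of combined boolean masks, `DataFrame.query`, and `Series.isin` is this sketch's assumption about what "other ways" covers, not something the original post lists:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [1, 1, 1, 1]})

# Combine conditions with & / | (each condition needs its own parentheses).
both = df[(df["A"] >= 2) & (df["B"] < 8)]

# The same filter written as a query string.
queried = df.query("A >= 2 and B < 8")

# Membership test against a list of allowed values.
members = df[df["A"].isin([1, 4])]

print(both["A"].tolist())     # [2, 3]
print(queried["A"].tolist())  # [2, 3]
print(members["A"].tolist())  # [1, 4]
```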
e. Grouping
e.1 Forming groups with groupby
import pandas as pd

df = pd.DataFrame({'animal': 'cat dog cat fish dog cat cat'.split(),
                   'size': list('SSMMMLL'),
                   'weight': [8, 10, 11, 1, 20, 12, 12],
                   'adult': [False] * 5 + [True] * 2})
# For each animal, find the size of the heaviest individual
group = df.groupby("animal").apply(lambda subf: subf['size'][subf['weight'].idxmax()])
print(group)
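The same question (the size of the heaviest individual per animal) can also be answered without `apply`. The sketch below is an alternative formulation, not the original author's method: it takes the row label of the maximum weight in each group with `idxmax` and looks those rows up with `.loc[]`. The names `heaviest` and `result` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "animal": "cat dog cat fish dog cat cat".split(),
    "size": list("SSMMMLL"),
    "weight": [8, 10, 11, 1, 20, 12, 12],
})

# Row label of the heaviest individual in each group.
heaviest = df.groupby("animal")["weight"].idxmax()

# Look those rows up and keep the size, indexed by animal.
result = df.loc[heaviest].set_index("animal")["size"]

print(result.to_dict())  # {'cat': 'L', 'dog': 'M', 'fish': 'M'}
```

When two rows tie for the maximum, as the two 12-weight cats do here, `idxmax` returns the first one.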
e.2 Using get_group to extract a single group
import pandas as pd

df = pd.DataFrame({'animal': 'cat dog cat fish dog cat cat'.split(),
                   'size': list('SSMMMLL'),
                   'weight': [8, 10, 11, 1, 20, 12, 12],
                   'adult': [False] * 5 + [True] * 2})
group = df.groupby("animal")
# Extract only the rows that belong to the "cat" group
cat = group.get_group("cat")
print(cat)
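To see what a groupby object holds before picking one group out, it can be iterated. A minimal sketch on a reduced version of the data (only the `animal` and `weight` columns are kept here to keep it short; the dict name `sizes` is illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "animal": "cat dog cat fish dog cat cat".split(),
    "weight": [8, 10, 11, 1, 20, 12, 12],
})

grouped = df.groupby("animal")

# Iterating yields (group key, sub-DataFrame) pairs in sorted key order.
sizes = {name: len(sub) for name, sub in grouped}
print(sizes)  # {'cat': 4, 'dog': 2, 'fish': 1}

# get_group picks out one of those sub-DataFrames by key.
cat = grouped.get_group("cat")
print(cat["weight"].tolist())  # [8, 11, 12, 12]
print(list(cat.index))         # [0, 2, 5, 6]
```

The extracted group keeps the row labels of the original DataFrame.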