The Series Data Object in pandas

阿新 • Published: 2019-01-05


Overview

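I have been using pandas for a while but have rarely written up what I learned. This article briefly summarizes some commonly used Series methods in pandas; for anything beyond that, consult the official documentation.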

Constructing a Series object

import numpy as np
import pandas as pd

s = pd.Series(np.arange(6), index=list("ABCDEF"))

A Series consists of two parts:

index: an index object that stores the label information
values: an ndarray that stores the element values

s
A    0
B    1
C    2
D    3
E    4
F    5
dtype: int64
s.index 
Index(['A', 'B', 'C', 'D', 'E', 'F'], dtype='object')
s.values
array([0, 1, 2, 3, 4, 5])

A Series can be accessed by position or by label, and it also supports Python slicing.

1. Indexing by position

s[0]
0
s[1]
1
s[3]
3
s[2:4]
C    2
D    3
dtype: int64
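
In recent pandas releases, passing an integer to s[...] on a Series whose index holds string labels is deprecated, so the positional lookups above may emit a FutureWarning. A minimal sketch of the explicit alternative, assuming the same s as above, uses .iloc, which always treats its argument as a position:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(6), index=list("ABCDEF"))

# .iloc always interprets its argument as a position, never as a label
first = s.iloc[0]      # value stored at position 0
fourth = s.iloc[3]     # value stored at position 3
middle = s.iloc[2:4]   # positions 2 and 3; the stop position is excluded

print(first, fourth)
print(middle)
```

Using .iloc keeps the intent unambiguous even when the index itself contains integers.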

2. Indexing by label
When slicing by index label, both endpoints are included, which differs from ordinary Python slicing.

s
A    0
B    1
C    2
D    3
E    4
F    5
dtype: int64

s['D']
3
s['B']
1
s['C':'E']
C    2
D    3
E    4
dtype: int64

s['C':'F']
C    2
D    3
E    4
F    5
dtype: int64
s['C':'G']
C    2
D    3
E    4
F    5
dtype: int64
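
The difference between the two slicing styles can be sketched side by side with .loc and .iloc. The second Series, named unsorted here, is a hypothetical example added only to illustrate that a missing endpoint such as 'G' is tolerated when the index is sorted, as it is for s:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(6), index=list("ABCDEF"))

by_label = s.loc["C":"E"]    # label slice: both 'C' and 'E' are included
by_position = s.iloc[2:4]    # position slice: position 4 is excluded

# 'G' is not in the index, but the slice works because the index is sorted
clipped = s.loc["C":"G"]

# With an unsorted index, a missing endpoint raises KeyError instead
unsorted = pd.Series([1, 2, 3], index=list("BAC"))
try:
    unsorted.loc["A":"G"]
    raised = False
except KeyError:
    raised = True

print(by_label.tolist(), by_position.tolist(), clipped.tolist(), raised)
```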

If no index is specified when constructing a Series, a RangeIndex running from 0 to len(series) - 1 is generated automatically.

s2 = pd.Series([20,30,40,50])
s2
0    20
1    30
2    40
3    50
dtype: int64
s2.index
RangeIndex(start=0, stop=4, step=1)
s2.values
array([20, 30, 40, 50])

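Series operations

1. Sorting

sort_values

sort_values(self, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last')

na_position : {'first', 'last'}, default 'last'
inplace : bool, default False

sort_values takes several parameters:

ascending: whether to sort in ascending order, True or False.
inplace: whether to sort in place. The default is False, which returns a new Series and leaves the original unchanged; with True, the Series itself is modified.
na_position: where missing values are placed, at the start or at the end. The default is 'last'; 'first' is also accepted.
kind: which sorting algorithm to use, {'quicksort', 'mergesort', 'heapsort'}; the default is 'quicksort'.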

s = pd.Series([np.nan, 1, 3, 10, 5])

# Ascending sort (the default)
s.sort_values(ascending=True)
1     1.0
2     3.0
4     5.0
3    10.0
0     NaN
dtype: float64

# Descending sort
s.sort_values(ascending=False)
3    10.0
4     5.0
2     3.0
1     1.0
0     NaN
dtype: float64

# inplace controls whether the original Series is modified directly; the default is False
# Construct a new Series
s = pd.Series([np.nan, 1, 3, 10, 8, 2, 5])
s
0     NaN
1     1.0
2     3.0
3    10.0
4     8.0
5     2.0
6     5.0
dtype: float64
s.sort_values(inplace=True)
s
1     1.0
5     2.0
2     3.0
6     5.0
4     8.0
3    10.0
0     NaN
dtype: float64


s = pd.Series([np.nan, 1, 3, 10,8,2, 5])
s.sort_values()
1     1.0
5     2.0
2     3.0
6     5.0
4     8.0
3    10.0
0     NaN
dtype: float64


s.sort_values(na_position='first')
0     NaN
1     1.0
5     2.0
2     3.0
6     5.0
4     8.0
3    10.0
dtype: float64
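
As a small sketch of the non-inplace behaviour described above, the following assumes the same seven-element Series and checks that sort_values returns a new object while the original keeps its order. Using sort_index to restore the original order is an addition for illustration, not something the text above covers:

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 1, 3, 10, 8, 2, 5])

ordered = s.sort_values(ascending=False)  # new Series; s itself is untouched
restored = ordered.sort_index()           # sorting by index label undoes the reordering

print(ordered.index.tolist())
print(s.index.tolist())
print(restored.equals(s))
```

Because the index labels travel with the values, the sorted result can always be mapped back to the original positions.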

2. Removing duplicates from a Series

Method 1: drop_duplicates

drop_duplicates(self, keep='first', inplace=False)
    keep : {'first', 'last', False}, default 'first'
        'first' : Drop duplicates except for the first occurrence.
        'last' : Drop duplicates except for the last occurrence.
        False : Drop all duplicates.
    inplace : boolean, default False
        If True, performs operation inplace and returns None.

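A brief explanation:

drop_duplicates removes duplicated values.
The keep parameter decides which occurrence of a duplicated value is retained:

first: keep the first occurrence.
last: keep the last occurrence.
False: keep none of them, so every duplicated value is dropped. (Nothing is actually deleted from the original; a new Series is returned. To modify the original, combine this with inplace=True.)

The inplace parameter (True or False) controls whether the values are removed in place.

An example: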

s = pd.Series(['lama', 'cow', 'frank', 'lama', 'frank', 'beetle', 'lama', 'hippo'], name='animal')


s
0      lama
1       cow
2     frank
3      lama
4     frank
5    beetle
6      lama
7     hippo
Name: animal, dtype: object

# Keep the first occurrence
s.drop_duplicates(keep='first')
0      lama
1       cow
2     frank
5    beetle
7     hippo
Name: animal, dtype: object
# Keep the last occurrence
s.drop_duplicates(keep='last')
1       cow
4     frank
5    beetle
6      lama
7     hippo
Name: animal, dtype: object

# Drop every value that is duplicated
s.drop_duplicates(keep=False)
1       cow
5    beetle
7     hippo
Name: animal, dtype: object


# To actually remove them from s itself, pass inplace=True
s.drop_duplicates(keep=False, inplace=True)
s
1       cow
5    beetle
7     hippo
Name: animal, dtype: object
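
One detail of inplace=True is easy to miss: as the docstring above says, the call then returns None. A minimal sketch, assuming the same animal data; the names deduped, renumbered, and t are illustrative, and reset_index is an optional extra step not discussed above:

```python
import pandas as pd

s = pd.Series(['lama', 'cow', 'frank', 'lama', 'frank', 'beetle', 'lama', 'hippo'], name='animal')

# Default call: returns a new Series, keeping the first occurrence of each value
deduped = s.drop_duplicates()

# Optional: renumber the surviving rows 0..n-1
renumbered = deduped.reset_index(drop=True)

# In-place call on a copy: the copy is modified and the return value is None
t = s.copy()
result = t.drop_duplicates(inplace=True)

print(deduped.tolist())
print(renumbered.index.tolist())
print(result, len(s), len(t))
```

Assigning the return value of an inplace=True call back to the variable would therefore replace the Series with None.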

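A related method worth mentioning is Series.duplicated, for example s.duplicated(keep=False).

It marks each element as True or False depending on whether it is a duplicate.

Series.duplicated(keep='first')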
Indicate duplicate Series values.
Duplicated values are indicated as True values in the resulting Series. Either all duplicates, all except the first or all except the last occurrence of duplicates can be indicated.

Parameters: keep : {‘first’, ‘last’, False}, default ‘first’
    ‘first’ : Mark duplicates as True except for the first occurrence.
    ‘last’ : Mark duplicates as True except for the last occurrence.
    False : Mark all duplicates as True.
Returns: pandas.core.series.Series


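keep accepts three values: 'first', 'last', and False.

'first': the first occurrence is marked False and every later occurrence True.
'last': the last occurrence is marked False and every other occurrence True.
False: every value that occurs more than once is marked True.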

s2 = pd.Series(['lama', 'cow', 'frank','lama', 'frank','beetle', 'lama', 'hippo'], name='animal')

s2
0      lama
1       cow
2     frank
3      lama
4     frank
5    beetle
6      lama
7     hippo
Name: animal, dtype: object

s2.duplicated(keep='first')
0    False
1    False
2    False
3     True   # second occurrence of lama
4     True   # second occurrence of frank
5    False
6     True   # third occurrence of lama
7    False 
Name: animal, dtype: bool

s2.duplicated(keep='last')
0     True #  first occurrence of lama
1    False
2     True
3     True #  second occurrence of lama
4    False
5    False
6    False #  third and last occurrence of lama
7    False
Name: animal, dtype: bool

# True for every value that occurs more than once
s2.duplicated(keep=False)
0     True
1    False
2     True
3     True
4     True
5    False
6     True
7    False
Name: animal, dtype: bool
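
Since duplicated returns a boolean Series, it can be used as a mask. The sketch below, with illustrative variable names, shows that filtering with the negated mask gives the same result as drop_duplicates, and that summing the mask counts the redundant rows:

```python
import pandas as pd

s2 = pd.Series(['lama', 'cow', 'frank', 'lama', 'frank', 'beetle', 'lama', 'hippo'], name='animal')

mask = s2.duplicated(keep='first')    # True for every repeat after the first occurrence
unique_rows = s2[~mask]               # same rows that drop_duplicates() would keep
repeated_rows = s2[s2.duplicated(keep=False)]  # every row whose value occurs more than once
n_redundant = int(mask.sum())         # True counts as 1 when summed

print(unique_rows.equals(s2.drop_duplicates()))
print(repeated_rows.index.tolist())
print(n_redundant)
```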

Method 2: Series.unique
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.unique.html#pandas.Series.unique

Series.unique()
Return unique values of Series object.
Uniques are returned in order of appearance. Hash table-based unique, therefore does NOT sort.

Returns: ndarray or Categorical
    The unique values returned as a NumPy array. In case of categorical data type, returned as a Categorical.

This method takes no parameters. It keeps one copy of each value and returns the result directly as an array object rather than as a Series.

s = pd.Series([2, 1, 3, 3,4,1,2], name='number')

s.unique()
array([2, 1, 3, 4])
s
0    2
1    1
2    3
3    3
4    4
5    1
6    2
Name: number, dtype: int64
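
To make the return type concrete, here is a short sketch using the same numbers. The calls to nunique and value_counts are related methods added for comparison; they are not covered in the text above:

```python
import numpy as np
import pandas as pd

s = pd.Series([2, 1, 3, 3, 4, 1, 2], name='number')

values = s.unique()           # NumPy array, in order of first appearance (not sorted)
sorted_values = np.sort(values)  # sort explicitly if a sorted result is needed
count = s.nunique()           # number of distinct values
freq = s.value_counts()       # how often each distinct value occurs

print(type(values).__name__, values.tolist())
print(sorted_values.tolist())
print(count)
print(freq.loc[3], freq.loc[4])
```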

3. Some common Series operations

import pandas as pd
import numpy as np
s = pd.Series(data=range(10), name='numbers', index=[chr(i) for i in range(97, 107)])


# Get the name of the Series
s.name
'numbers'

# Get the whole index
print(s.index)

# Get all the values
print(s.values)

# Check each element for NaN
print(s.isnull())

print(pd.isnull(s))

# Access a value by index label
print(s.get('b'))
print(s['b'])

# Access several non-contiguous values at once
# by passing a list of index labels: [label, label, label]
print(s[['a', 'b', 'f', 'j']])


# Get the number of elements in the Series (the size attribute)
print(s.size)
10
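
The two lookups s.get('b') and s['b'] behave differently when the label is absent. A minimal sketch, assuming the same s; the label 'z' and the default -1 are made up for illustration:

```python
import pandas as pd

s = pd.Series(data=range(10), name='numbers', index=[chr(i) for i in range(97, 107)])

present = s.get('b')        # value stored under label 'b'
missing = s.get('z')        # label absent: get returns None instead of raising
fallback = s.get('z', -1)   # label absent: the supplied default is returned

try:
    s['z']                  # square brackets raise KeyError for an absent label
    raised = False
except KeyError:
    raised = True

has_b = 'b' in s            # membership is tested against the index, not the values

print(present, missing, fallback, raised, has_b)
```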

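4. The diff function

Reference documentation
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.diff.html#pandas.Series.diff

Calculates the difference of a Series element compared with another element in the Series (default is element in previous row).

Parameters: periods : int, default 1
    Periods to shift for calculating difference, accepts negative values.
Returns: diffed : Series

By default, diff subtracts the previous value from each value, so the first result is NaN and every later position has a result.
The periods parameter sets the offset: 2 subtracts the value two positions earlier, and 0 subtracts each value from itself (the value at the current index).
With -1, the following value is subtracted from each value (the element at the smaller index minus the element at the larger index), so the last result is NaN.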

s = pd.Series([1, 1, 2, 3, 5, 8])
s.diff()
0    NaN
1    0.0
2    1.0
3    1.0
4    2.0
5    3.0
dtype: float64
s.diff(3)
0    NaN
1    NaN
2    NaN
3    2.0    #  3-1 
4    4.0    #  5-1 
5    6.0    #  8-2 
dtype: float64
s
0    1
1    1
2    2
3    3
4    5
5    8
dtype: int64
s.diff(2)
0    NaN
1    NaN
2    1.0
3    2.0
4    3.0
5    5.0
dtype: float64
s.diff(periods=2)
0    NaN
1    NaN
2    1.0
3    2.0
4    3.0
5    5.0
dtype: float64
s.diff(periods=-1)
0    0.0
1   -1.0
2   -1.0
3   -2.0
4   -3.0
5    NaN
dtype: float64
s.diff(periods=-2)
0   -1.0
1   -2.0
2   -3.0
3   -5.0
4    NaN
5    NaN
dtype: float64

# Subtracting a value from itself always gives 0
s.diff(0)
0    0.0
1    0.0
2    0.0
3    0.0
4    0.0
5    0.0
dtype: float64
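
One way to read the outputs above is that diff(periods) matches subtracting a shifted copy of the Series from itself. The sketch below checks this for a few non-zero offsets; treating diff as s - s.shift(periods) is offered as a mental model under that assumption, not as a description of how pandas implements it:

```python
import pandas as pd

s = pd.Series([1, 1, 2, 3, 5, 8])

# shift(periods) moves the values down (positive) or up (negative), padding with NaN
manual_2 = s - s.shift(2)
builtin_2 = s.diff(2)

# The same relationship holds for other non-zero offsets, including negative ones
matches = [s.diff(p).equals(s - s.shift(p)) for p in (1, 2, 3, -1, -2)]

print(builtin_2.tolist())
print(builtin_2.equals(manual_2))
print(matches)
```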

5. Checking a Series for missing values

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.isna.html#pandas.Series.isna

# Test for missing values (None or NaN); returns True where a value is missing
series.isnull()
series.isna()


# Test for non-missing values; returns True where a value is present
series.notnull()
series.notna()

s = pd.Series([1, 8, 10, np.nan])
s
0     1.0
1     8.0
2    10.0
3     NaN
dtype: float64

# Is the value missing?
s.isna()
0    False
1    False
2    False
3     True
dtype: bool
s.isnull()
0    False
1    False
2    False
3     True
dtype: bool

# Is the value present?
s.notnull()
0     True
1     True
2     True
3    False
dtype: bool
s.notna()
0     True
1     True
2     True
3    False
dtype: bool
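
The boolean results are usually a means to an end. A small sketch, assuming a Series that contains both None and NaN, of counting, dropping, and filling missing values; dropna and fillna are related methods added for illustration and the fill value 0 is arbitrary:

```python
import numpy as np
import pandas as pd

# In a numeric Series, None is converted to NaN on construction
s = pd.Series([1, None, 10, np.nan])

missing_mask = s.isna()              # True where the value is missing
n_missing = int(missing_mask.sum())  # True counts as 1 when summed
present_only = s[s.notna()]          # boolean filtering keeps the non-missing rows
dropped = s.dropna()                 # same rows as the filter above
filled = s.fillna(0)                 # replace missing values with a chosen default

print(missing_mask.tolist())
print(n_missing)
print(dropped.tolist())
print(filled.tolist())
```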

6. The Series.tolist() method

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.tolist.html#pandas.Series.tolist
s.tolist()

Return a list of the values.

It converts the values of a Series into a Python list object.

s = pd.Series([1, 8, 10, np.nan])
s
0     1.0
1     8.0
2    10.0
3     NaN
dtype: float64
s.tolist()
[1.0, 8.0, 10.0, nan]


s = pd.Series(['laoda', 'laoer', 'laosan'], name='person')
s
0     laoda
1     laoer
2    laosan
Name: person, dtype: object
s.tolist()
['laoda', 'laoer', 'laosan']
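
A practical difference between tolist() and the values attribute is the element type. The sketch below uses an integer Series chosen for illustration and shows that tolist() yields plain Python scalars while values keeps NumPy scalars:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 8, 10])

as_list = s.tolist()   # Python list holding Python int objects
as_array = s.values    # NumPy array holding NumPy integer scalars

print(type(as_list).__name__, type(as_list[0]).__name__)
print(type(as_array).__name__, isinstance(as_array[0], np.integer))
```

Plain Python scalars can matter when the result is handed to code that does not accept NumPy types.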

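Summary

This article summarized common Series operations. Series is one of the core objects in pandas, and many of its methods closely mirror those of DataFrame. For more detail, consult the official documentation, which covers them thoroughly.

References

Official Series documentation: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.html

Share the joy, keep the memories. 2018-12-01 16:03:11 --frank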
