Confusion about time formats in Python

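Time formats in Python get confusing, especially once pandas and numpy are involved, and working with time-stamped data makes my head spin. Below, drawing on answers from Stack Overflow, I record the differences between the time objects in Python's standard datetime module, in numpy, and in pandas.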

The datetime standard library of Python

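There are only four main objects here:

  • time - time of day only, measured in hours, minutes, seconds and microseconds
  • date - year, month and day only
  • datetime - combines all the fields of date and time
  • timedelta - a duration whose largest unit is days

import datetime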

datetime.time(hour=1,minute=25,second=61,microsecond=6333)
Traceback (most recent call last):

  File "<ipython-input-2-8e2667fea8f6>", line 1, in <module>
    datetime.time(hour=1,minute=25,second=61,microsecond=6333)

ValueError: second must be in 0..59

datetime.time(hour=1,minute=25,second=22,microsecond=6333)
Out[3]: datetime.time(1, 25, 22, 6333)

datetime.date(year=2018,month=9,day=23)
Out[4]: datetime.date(2018, 9, 23)

datetime.datetime(year =2018,month=9,day=23,hour=20,minute=22,second=30,microsecond=3155)
Out[5]: datetime.datetime(2018, 9, 23, 20, 22, 30, 3155)

datetime.timedelta(days=3,minutes=55)
Out[6]: datetime.timedelta(3, 3300)

datetime.timedelta(days=3,minutes=55) + datetime.datetime(year =2018,month=9,day=23,hour=20,minute=22,second=30,microsecond=3155)
Out[7]: datetime.datetime(2018, 9, 26, 21, 17, 30, 3155)

datetime.date(2018,9,23)
Out[8]: datetime.date(2018, 9, 23)

datetime.date(2018,23,9)
Traceback (most recent call last):

  File "<ipython-input-40-258ea9b432d0>", line 1, in <module>
    datetime.date(2018,23,9)

ValueError: month must be in 1..12

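As you can see, I tried second > 59 just to see what happens, and it is not allowed. Also, if you pass the arguments in the default order (year, month, day, hour, minute, second), you don't need to write year=, month=, and so on.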
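The points from the session can be collected into one small script. This is only a sketch; the variable names (dt, rejected, delta) are made up for illustration.

```python
import datetime

# Positional arguments follow the order
# year, month, day, hour, minute, second, microsecond.
dt = datetime.datetime(2018, 9, 23, 20, 22, 30, 3155)

# Out-of-range fields raise ValueError instead of rolling over.
try:
    datetime.time(1, 25, 61)
    rejected = False
except ValueError:
    rejected = True

# timedelta normalizes its arguments to days, seconds and microseconds,
# so 55 minutes is stored as 3300 seconds.
delta = datetime.timedelta(days=3, minutes=55)

print(rejected, delta.days, delta.seconds)
print(dt + delta)
```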

Numpy's datetime64 and timedelta64 objects

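Numpy does not have separate date and time objects; there is a single datetime64 object that represents a moment in time. The datetime object in the datetime module has microsecond precision (10^-6 s), whereas Numpy's datetime64 can go down to attoseconds (10^-18 s). It is more flexible and accepts more kinds of input.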

import numpy as np

np.datetime64(5,'ns')
Out[9]: numpy.datetime64('1970-01-01T00:00:00.000000005')

np.datetime64('2018-09-23')
Out[10]: numpy.datetime64('2018-09-23')

np.datetime64('2018-9-23')
Traceback (most recent call last):

  File "<ipython-input-11-5f3797908da0>", line 1, in <module>
    np.datetime64('2018-9-23')

ValueError: Error parsing datetime string "2018-9-23" at position 5

np.datetime64('2018/09/23')
Traceback (most recent call last):

  File "<ipython-input-12-fbe5ac53716b>", line 1, in <module>
    np.datetime64('2018/09/23')

ValueError: Error parsing datetime string "2018/09/23" at position 4

np.datetime64('2018-09-23 05:00')
Out[13]: numpy.datetime64('2018-09-23T05:00')

np.timedelta64(5,'D')
Out[15]: numpy.timedelta64(5,'D')

np.datetime64('2018-09-23 05:00') - np.datetime64('2018-09-23 04:00:59')
Out[16]: numpy.timedelta64(3541,'s')

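This shows that datetime64 is quite strict about the time format. An integer passed to datetime64 must come with a unit, and a string must follow the zero-padded form xxxx-xx-xx xx:xx:xx (the time part can be shortened or left out). For example, writing 2018-09-24 as 2018-9-24 is rejected.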
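A small sketch of the same rules in script form. The helper parses() is hypothetical, written only for this illustration; everything else is plain NumPy.

```python
import numpy as np

# The unit is inferred from the string: a date-only string gives days,
# a string with hours and minutes gives minutes.
d = np.datetime64('2018-09-23')
m = np.datetime64('2018-09-23T05:00')
print(d.dtype, m.dtype)

# An integer counts units since 1970-01-01, so the unit must be given.
five_ns = np.datetime64(5, 'ns')


def parses(text):
    """Hypothetical helper: True if np.datetime64 accepts the string."""
    try:
        np.datetime64(text)
        return True
    except ValueError:
        return False


for candidate in ('2018-09-23', '2018-9-23', '2018/09/23', '2018-09-23 05:00'):
    print(candidate, parses(candidate))

# Subtracting values with different units yields a timedelta64
# in the finer of the two units (seconds here).
gap = np.datetime64('2018-09-23 05:00') - np.datetime64('2018-09-23 04:00:59')
print(gap)
```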

Timestamp and Timedelta in Pandas

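These two build on Numpy's time types. A pandas Timestamp also represents a single moment in time. It is very similar to datetime but has more features, and it can be constructed with pd.Timestamp or pd.to_datetime.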

import pandas as pd

pd.Timestamp(1234.1256537)#default ns
Out[19]: Timestamp('1970-01-01 00:00:00.000001234')

pd.Timestamp(1234.1256537, unit='h')#change units
Out[21]: Timestamp('1970-02-21 10:07:32.354399999')

pd.Timestamp('2018-9-23 5:00')
Out[22]: Timestamp('2018-09-23 05:00:00')

pd.to_datetime('2018-9-23 5:00')
Out[23]: Timestamp('2018-09-23 05:00:00')

pd.to_datetime(['2018-9-23 5:00','2018-9-23 15:00'])
Out[24]: DatetimeIndex(['2018-09-23 05:00:00', '2018-09-23 15:00:00'], dtype='datetime64[ns]', freq=None)

pd.to_datetime(['2018-9-23 5:00'])
Out[25]: DatetimeIndex(['2018-09-23 05:00:00'], dtype='datetime64[ns]', freq=None)

pd.to_datetime(['2018-9-23 5:00','2018-9-23 15:00'])[0]
Out[26]: Timestamp('2018-09-23 05:00:00')

a = pd.DataFrame([['2018-9-24 12:00',1,3],['2018-9-24 11:00',2,4],['2018-9-24 10:00',5,9]],columns=['date','num1','num2'])

a
Out[27]: 
              date  num1  num2
0  2018-9-24 12:00     1     3
1  2018-9-24 11:00     2     4
2  2018-9-24 10:00     5     9

a.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
date    3 non-null object
num1    3 non-null int64
num2    3 non-null int64
dtypes: int64(2), object(1)
memory usage: 152.0+ bytes

b = a.date.apply(lambda x:pd.Timestamp(x))

b
Out[28]: 
0   2018-09-24 12:00:00
1   2018-09-24 11:00:00
2   2018-09-24 10:00:00
Name: date, dtype: datetime64[ns]

b[0]
Out[29]: Timestamp('2018-09-24 12:00:00')

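This shows that pandas is lenient about the time format: 2018-9-24 is accepted too. But it also brings up something that puzzles me. For a Series, the printed output reports the dtype as datetime64[ns], yet each individual element is a Timestamp???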
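As far as I can tell, the two observations are consistent: a Series keeps its values in a NumPy datetime64 array (hence the dtype), and pandas wraps an element in a Timestamp when it is pulled out on its own. A minimal sketch, assuming a reasonably recent pandas; the variable names are illustrative, and the resolution shown in the dtype may vary with the pandas version.

```python
import numpy as np
import pandas as pd

s = pd.to_datetime(pd.Series(['2018-09-24 12:00', '2018-09-24 11:00']))

# The column as a whole is backed by a NumPy datetime64 array.
print(s.dtype)

# Taking one element through pandas boxes it as a Timestamp...
boxed = s.iloc[0]
print(type(boxed))

# ...while the underlying NumPy array still holds datetime64 scalars.
raw = s.to_numpy()[0]
print(type(raw))
```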

Convert Python datetime to datetime64 and Timestamp

Both conversions are simple, as shown below.

dt = datetime.datetime(2018,9,24,13,39,40,34676)

dt
Out[59]: datetime.datetime(2018, 9, 24, 13, 39, 40, 34676)

np.datetime64(dt)
Out[60]: numpy.datetime64('2018-09-24T13:39:40.034676')

pd.Timestamp(dt)
Out[61]: Timestamp('2018-09-24 13:39:40.034676')

pd.to_datetime(dt)
Out[62]: Timestamp('2018-09-24 13:39:40.034676')

Convert datetime64 to datetime and Timestamp

The former is more cumbersome: convert to a float first, then to a datetime. The latter is easier: use pd.Timestamp or pd.to_datetime().

dt64  = np.datetime64('2017-10-24 05:34:00.136562')
dt64
Out[30]: numpy.datetime64('2017-10-24T05:34:00.136562')

unix_epoch = np.datetime64(0, 's')

one_second = np.timedelta64(1, 's')

seconds_since_epoch = (dt64 - unix_epoch) / one_second

seconds_since_epoch
Out[32]: 1508823240.1365621

datetime.datetime.utcfromtimestamp(seconds_since_epoch)
Out[33]: datetime.datetime(2017, 10, 24, 5, 34, 0, 136562)

pd.to_datetime(dt64)
Out[34]: Timestamp('2017-10-24 05:34:00.136562')

pd.Timestamp(dt64)
Out[35]: Timestamp('2017-10-24 05:34:00.136562')
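A shorter route to a plain datetime that skips the float step may also work: calling .item() on a datetime64 with microsecond resolution returns a datetime.datetime. With nanosecond resolution .item() returns an integer instead, so the sketch below casts to microseconds first. The name as_dt is only for illustration.

```python
import datetime
import numpy as np

dt64 = np.datetime64('2017-10-24T05:34:00.136562')

# Cast to microsecond resolution (the finest that datetime.datetime
# can hold), then unwrap the scalar into a Python object.
as_dt = dt64.astype('datetime64[us]').item()

print(type(as_dt), as_dt)
```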

Convert Timestamp to datetime and datetime64

This one is also fairly simple, as the code shows.

ts = pd.Timestamp('2018-9-24 10:22:46.3654')

ts.to_pydatetime()#python's datetime
Out[37]: datetime.datetime(2018, 9, 24, 10, 22, 46, 365400)

ts.to_datetime64()
Out[38]: numpy.datetime64('2018-09-24T10:22:46.365400000')
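One thing to watch in this direction is precision: a Timestamp can carry nanoseconds, but datetime.datetime stops at microseconds. A small sketch with a made-up value (ts_ns is not from the session above) to show what each conversion keeps:

```python
import datetime
import numpy as np
import pandas as pd

# Hypothetical timestamp with a non-zero nanosecond part.
ts_ns = pd.Timestamp('2018-09-24 10:22:46.365400123')

# to_pydatetime drops the nanoseconds; warn=False silences the
# warning pandas would otherwise emit about the discarded part.
py_dt = ts_ns.to_pydatetime(warn=False)

# to_datetime64 keeps the full nanosecond value.
np_dt = ts_ns.to_datetime64()

print(ts_ns.nanosecond, py_dt, np_dt)
```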

Can all of these types be compared with each other??

dt64
Out[63]: numpy.datetime64('2017-10-24T05:34:00.136562')

dt
Out[64]: datetime.datetime(2018, 9, 24, 13, 39, 40, 34676)

ts
Out[65]: Timestamp('2018-09-24 10:22:46.365400')

dt64>dt
Out[66]: False

ts>dt
Out[67]: False

ts>dt64
Out[68]: True

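So those two types each print as a timestamp on their own, but comparing them raises an error saying that float and timestamp cannot be compared. Why is that??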

A bit baffled.
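The session above does not show which comparison raised the float error, so the sketch below does not try to reproduce it. It only shows one way to sidestep mixed-type comparisons: convert everything to pd.Timestamp first, then compare like with like. The names as_ts, latest and earliest are only for illustration.

```python
import datetime
import numpy as np
import pandas as pd

dt = datetime.datetime(2018, 9, 24, 13, 39, 40, 34676)
dt64 = np.datetime64('2017-10-24T05:34:00.136562')
ts = pd.Timestamp('2018-09-24 10:22:46.3654')

# pd.Timestamp accepts all three types, so after this step
# every value has the same type and ordinary comparisons apply.
as_ts = [pd.Timestamp(x) for x in (dt, dt64, ts)]

latest = max(as_ts)
earliest = min(as_ts)
print(earliest, latest)
```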