python - How to subclass pandas DataFrame?
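There are a few SO threads on this topic, but I am hoping someone here can give a more systematic account of the current best way to subclass pandas.DataFrame so that it satisfies two requirements that I think are fairly general: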
    import numpy as np
    import pandas as pd

    class MyDF(pd.DataFrame):
        # how to subclass pandas DataFrame?
        pass

    mydf = MyDF(np.random.randn(3, 4), columns=['A', 'B', 'C', 'D'])
    print(type(mydf))  # <class '__main__.MyDF'>

    # Requirement 1: Instances of MyDF, when calling standard methods of DataFrame,
    # should produce instances of MyDF.
    mydf_sub = mydf[['A', 'C']]
    print(type(mydf_sub))  # <class 'pandas.core.frame.DataFrame'>

    # Requirement 2: Attributes attached to instances of MyDF, when calling standard
    # methods of DataFrame, should still attach to the output.
    mydf.myattr = 1
    mydf_cp1 = MyDF(mydf)
    mydf_cp2 = mydf.copy()
    print(hasattr(mydf_cp1, 'myattr'))  # False
    print(hasattr(mydf_cp2, 'myattr'))  # False
Are there any significant differences when subclassing pandas.Series? Thank you.
For requirement 1, just define _constructor:
    import pandas as pd
    import numpy as np

    class MyDF(pd.DataFrame):
        @property
        def _constructor(self):
            # pandas calls this to build the result of most operations
            return MyDF

    mydf = MyDF(np.random.randn(3, 4), columns=['A', 'B', 'C', 'D'])
    print(type(mydf))

    mydf_sub = mydf[['A', 'C']]
    print(type(mydf_sub))
I don't think there is a simple solution for requirement 2. You would need to define __init__ or copy, or do something in _constructor, for example:
    import pandas as pd
    import numpy as np

    class MyDF(pd.DataFrame):
        # comma-separated names of the custom attributes to carry over
        _attributes_ = "myattr1,myattr2"

        def __init__(self, *args, **kw):
            super(MyDF, self).__init__(*args, **kw)
            # when built from another MyDF, copy its custom attributes
            if len(args) == 1 and isinstance(args[0], MyDF):
                args[0]._copy_attrs(self)

        def _copy_attrs(self, df):
            # write into __dict__ directly to bypass DataFrame.__setattr__
            for attr in self._attributes_.split(","):
                df.__dict__[attr] = getattr(self, attr, None)

        @property
        def _constructor(self):
            # return a factory that builds a MyDF and copies attributes from self
            def f(*args, **kw):
                df = MyDF(*args, **kw)
                self._copy_attrs(df)
                return df
            return f

    mydf = MyDF(np.random.randn(3, 4), columns=['A', 'B', 'C', 'D'])
    print(type(mydf))

    mydf_sub = mydf[['A', 'C']]
    print(type(mydf_sub))

    mydf.myattr1 = 1
    mydf_cp1 = MyDF(mydf)
    mydf_cp2 = mydf.copy()
    print(mydf_cp1.myattr1, mydf_cp2.myattr1)
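As a possible alternative for requirement 2, and as a partial answer to the Series question, here is a minimal sketch. It assumes a reasonably recent pandas release in which the `_metadata` class attribute and the `_constructor_sliced` / `_constructor_expanddim` hooks behave as described in the pandas notes on subclassing. The class name `MySeries` and the `None` default for `myattr` are illustrative choices, not part of the original answer.

```python
import numpy as np
import pandas as pd


class MySeries(pd.Series):
    # names listed here are propagated to results by pandas (assumption:
    # a pandas version that supports _metadata on subclasses)
    _metadata = ["myattr"]
    myattr = None  # illustrative default so the attribute always exists

    @property
    def _constructor(self):
        return MySeries

    @property
    def _constructor_expanddim(self):
        # used when a Series operation produces a DataFrame (e.g. to_frame)
        return MyDF


class MyDF(pd.DataFrame):
    _metadata = ["myattr"]
    myattr = None  # illustrative default

    @property
    def _constructor(self):
        return MyDF

    @property
    def _constructor_sliced(self):
        # used when a DataFrame operation produces a Series (e.g. df['A'])
        return MySeries


mydf = MyDF(np.random.randn(3, 4), columns=["A", "B", "C", "D"])
mydf.myattr = 1

mydf_sub = mydf[["A", "C"]]   # column selection keeps the subclass
mydf_cp = mydf.copy()         # copy() keeps the subclass as well
col = mydf["A"]               # a single column comes back as MySeries

print(type(mydf_sub), mydf_sub.myattr)
print(type(mydf_cp), mydf_cp.myattr)
print(type(col))
```

The main difference for Series appears to be the pair of cross-type hooks: a DataFrame subclass points at its Series counterpart through `_constructor_sliced`, and the Series subclass points back through `_constructor_expanddim`. Note that `_metadata` only covers results that pandas builds internally; constructing `MyDF(mydf)` by hand may not carry the attribute over, in which case the `__init__` approach above would still be needed.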
http://stackoverflow.com/questions/22155951/how-to-subclass-pandas-dataframe