Python Data Analysis 1

Category: Programming  Date: 2017-02-13

Collections in pandas: DataFrame and Series

DataFrame: a two-dimensional table of rows and named columns; here it is used to turn JSON records into tabular form
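
As a rough sketch of that conversion, assume the input is a few lines of JSON, one object per line. The sample records and the name sample_frame below are made up for illustration; only the field names (tz, c, nk) are borrowed from the table in this post. Each line is parsed with json.loads, and the resulting list of dicts is passed to pandas.DataFrame.

```python
import json

import pandas as pd

# Hypothetical sample input: one JSON object per line.
# The last record has no "tz" key on purpose.
raw_lines = [
    '{"tz": "America/New_York", "c": "US", "nk": 1}',
    '{"tz": "America/Denver", "c": "US", "nk": 0}',
    '{"c": "BR", "nk": 0}',
]

# Parse every line into a dict, then build the table from the list of dicts.
records = [json.loads(line) for line in raw_lines]
sample_frame = pd.DataFrame(records)

print(sample_frame)
```

A key that is absent from a record (tz in the last line here) shows up as a missing value (NaN) in that row, which is why the cleanup step at the end of this post is needed.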

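If frame is a DataFrame, you can think of it as a table.

frame['column_name'] returns a single column of data, which is called a Series.

series.value_counts() returns how many times each value occurs, sorted from most to least frequent.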

 
In [64]: frame
Out[64]: 
                                                   a              al   c  \
0  Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi...  en-US,en;q=0.8  US   
1                             GoogleMaps/RochesterNY             NaN  US   
2  Mozilla/4.0 (compatible; MSIE 8.0; Windows NT ...           en-US  US   
3  Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8)...           pt-br  BR   
4  Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi...  en-US,en;q=0.8  US   

           cy       g  gr       h          hc         hh         l  \
0     Danvers  A6qOVH  MA  wfLQtf  1331822918  1.usa.gov   orofrog   
1       Provo  mwszkS  UT  mwszkS  1308262393       j.mp     bitly   
2  Washington  xxr3Qb  DC  xxr3Qb  1331919941  1.usa.gov     bitly   
3        Braz  zCaLwp  27  zUtuOu  1331923068  1.usa.gov  alelex88   
4  Shrewsbury  9b6kNl  MA  9b6kNl  1273672411     bit.ly     bitly   

                         ll  nk  \
0   [42.576698, -70.954903]   1   
1  [40.218102, -111.613297]   0   
2     [38.9007, -77.043098]   1   
3  [-23.549999, -46.616699]   0   
4   [42.286499, -71.714699]   0   

                                                   r           t  \
0  http://www.Facebook.com/l/7AQEFzjSi/1.usa.gov/...  1331923247   
1                           http://www.AwareMap.com/  1331923249   
2                               http://t.co/03elZC4Q  1331923250   
3                                             direct  1331923249   
4                http://www.shrewsbury-ma.gov/selco/  1331923251   

                  tz                                                  u  
0   America/New_York        http://www.ncbi.nlm.nih.gov/pubmed/22415991  
1     America/Denver        http://www.monroecounty.gov/etc/911/rss.php  
2   America/New_York  http://boxer.senate.gov/en/press/releases/0316...  
3  America/Sao_Paulo            http://apod.nasa.gov/apod/ap120312.html  
4   America/New_York  http://www.shrewsbury-ma.gov/egov/gallery/1341...  

In [65]: frame['tz']
Out[65]: 
0     America/New_York
1       America/Denver
2     America/New_York
3    America/Sao_Paulo
4     America/New_York
Name: tz, dtype: object

In [66]: frame['tz'].value_counts()
Out[66]: 
America/New_York     3
America/Sao_Paulo    1
America/Denver       1
Name: tz, dtype: int64


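Two ways to fill in unknown values: fillna() replaces missing values (NaN), and assignment through a boolean mask replaces empty strings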
clean_tz = frame['tz'].fillna("Missing")

clean_tz[clean_tz == ''] = "unknown"
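
The two steps can be tried end to end on a small Series. This is only a sketch on made-up data: the values in tz are hypothetical and stand in for frame['tz'], with one NaN and one empty string so that both replacements have something to act on.

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value (NaN) and one empty string.
tz = pd.Series(["America/New_York", np.nan, "", "America/New_York"], name="tz")

# Step 1: fillna returns a new Series with NaN replaced by a label.
clean_tz = tz.fillna("Missing")

# Step 2: a boolean mask selects the empty strings and overwrites them.
clean_tz[clean_tz == ""] = "unknown"

# Count how often each cleaned value occurs.
counts = clean_tz.value_counts()
print(counts)
```

Doing the cleanup before value_counts() matters because value_counts() leaves NaN out by default, so unfilled missing values would not appear in the counts at all.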


