Common pandas features and functions (with worked examples, continuously updated)

阿新 • Published: 2019-01-17

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# %matplotlib inline  # Jupyter notebooks only
###################1 original data##################
train_df = pd.read_csv('input/train.csv', index_col=0)  # load the data
test_df = pd.read_csv('input/test.csv', index_col=0)
print("type of train_df:" + str(type(train_df)))

#print(train_df.columns)
print("shape of train_df:" + str(train_df.shape))
print("shape of test_df:" + str(test_df.shape))
train_df.head()  # preview the data (displayed only in a notebook)
#print(train_df.head())
###################2 smooth label######################
prices = pd.DataFrame({"price":train_df["SalePrice"], "log(price+1)":np.log1p(train_df["SalePrice"])})  # build a DataFrame

print("shape of prices:" + str(prices.shape))
prices.hist()  # histogram of each column
# plt.plot(alphas, test_scores)
# plt.title("Alpha vs CV Error")
plt.show()
y_train = np.log1p(train_df.pop('SalePrice'))
print("shape of y_train:" + str(y_train.shape))
###################3 take train and test data together##############

all_df = pd.concat((train_df, test_df), axis=0)  # concatenate the data row-wise
print("shape of all_df:" + str(all_df.shape))
###################4 convert categorical data to string####################
print(all_df['MSSubClass'].dtypes)
all_df['MSSubClass'] = all_df['MSSubClass'].astype(str)  # convert the dtype
all_df['MSSubClass'].value_counts()  # count occurrences of each value
print(all_df['MSSubClass'].value_counts())
##################5 fill null###########################
all_dummy_df = pd.get_dummies(all_df)  # one-hot encoding, e.g. for a color feature with values R, G, B, R is encoded as [1 0 0]
print(all_dummy_df.head())
# data cleaning: find the columns that contain nulls and sort them by null count
print(all_dummy_df.isnull().sum().sort_values(ascending=False).head())
mean_cols = all_dummy_df.mean()  # summary statistic: per-column mean
print(mean_cols.head(10))
all_dummy_df = all_dummy_df.fillna(mean_cols)  # data cleaning: replace nulls with the values passed in
print(all_dummy_df.isnull().sum().sum())
###############6 smooth numeric cols########################
numeric_cols = all_df.columns[all_df.dtypes != 'object']  # select the columns whose dtype is not object, i.e. the numeric columns
print(numeric_cols)
numeric_col_means = all_dummy_df.loc[:, numeric_cols].mean()  # select data by the index in brackets and take the mean
numeric_col_std = all_dummy_df.loc[:, numeric_cols].std()
all_dummy_df.loc[:, numeric_cols] = (all_dummy_df.loc[:, numeric_cols] - numeric_col_means) / numeric_col_std
###############7 train model#######################
dummy_train_df = all_dummy_df.loc[train_df.index]
dummy_test_df = all_dummy_df.loc[test_df.index]
print("shape of dummy_train_df:" + str(dummy_train_df.shape))
print("shape of dummy_test_df:" + str(dummy_test_df.shape))
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
X_train = dummy_train_df.values
X_test = dummy_test_df.values
alphas = np.logspace(-3, 2, 50)
test_scores = []
for alpha in alphas:
    clf = Ridge(alpha=alpha)
    # cross_val_score returns negative MSE, so negate it before taking the root
    test_score = np.sqrt(-cross_val_score(clf, X_train, y_train, cv=10, scoring='neg_mean_squared_error'))
    test_scores.append(np.mean(test_score))
plt.plot(alphas, test_scores)
plt.title("Alpha vs CV Error")
plt.show()
from sklearn.ensemble import RandomForestRegressor
max_features = [.1, .3, .5, .7, .9, .99]
test_scores = []
for max_feat in max_features:
    clf = RandomForestRegressor(n_estimators=200, max_features=max_feat)
    test_score = np.sqrt(-cross_val_score(clf, X_train, y_train, cv=5, scoring='neg_mean_squared_error'))
    test_scores.append(np.mean(test_score))
plt.plot(max_features, test_scores)
plt.title("Max Features vs CV Error")
plt.show()
#########################8 ensemble (simple averaging)#####################
ridge = Ridge(alpha=15)
rf = RandomForestRegressor(n_estimators=200, max_features=.3)
ridge.fit(X_train, y_train)
rf.fit(X_train, y_train)
# the models were trained on log1p(SalePrice), so expm1 maps predictions back to prices
y_ridge = np.expm1(ridge.predict(X_test))
y_rf = np.expm1(rf.predict(X_test))
y_final = (y_ridge + y_rf)/2  # average the two models' predictions
######################9 submission############################
submission_df = pd.DataFrame(data = {'Id':test_df.index, 'SalePrice':y_final})
print(submission_df.head())
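
The preprocessing steps above (string conversion, one-hot encoding, mean imputation, standardization, and the log1p/expm1 transform) can be tried without the Kaggle CSV files. What follows is only a minimal sketch on a made-up four-row frame: every `toy_*` name and every value in it is an illustrative assumption, not part of the original data, and it selects numeric columns with `select_dtypes` rather than the dtype comparison used above.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the combined train/test frame (hypothetical values).
toy_df = pd.DataFrame({
    "LotArea": [8450.0, 9600.0, np.nan, 11250.0],
    "MSSubClass": [60, 20, 60, 70],
})

# Treat the class code as a category, as done above for MSSubClass.
toy_df["MSSubClass"] = toy_df["MSSubClass"].astype(str)

# Remember which columns are numeric before encoding.
toy_numeric_cols = toy_df.select_dtypes(include="number").columns

# One-hot encode the string column; numeric columns pass through unchanged.
toy_dummy_df = pd.get_dummies(toy_df)

# Fill missing values with the column mean.
toy_dummy_df = toy_dummy_df.fillna(toy_dummy_df.mean())

# Standardize the numeric columns to zero mean and unit sample standard deviation.
toy_dummy_df[toy_numeric_cols] = (
    toy_dummy_df[toy_numeric_cols] - toy_dummy_df[toy_numeric_cols].mean()
) / toy_dummy_df[toy_numeric_cols].std()

# log1p and expm1 are inverses, which is why predictions made on the
# log scale are mapped back to prices with expm1.
toy_prices = np.array([208500.0, 181500.0, 223500.0])
toy_roundtrip = np.expm1(np.log1p(toy_prices))

print(toy_dummy_df)
print(toy_roundtrip)
```

Note that filling a column with its own mean leaves that mean unchanged, so the order "impute, then standardize" still centers the column at zero.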
