RandomizedSearchCV and GridSearchCV: handling the 'list' object has no attribute 'values' error raised when calling fit
阿新 • Published: 2018-12-31
[Python 3.5.0, scikit-learn 0.18.1]
Yesterday I ran into a problem: I could not get RandomizedSearchCV to run, whatever I tried:
# Split the dataset into training (75%) and test (25%) sets
X_train, X_test, y_train, y_test = train_test_split(
    data, label, test_size=0.25, random_state=0)

# Set the parameters by cross-validation
tuned_parameters = [{'n_neighbors': range(2,7)},
                    {'leaf_size':range(9,100,3)},
                    {'p':range(1,5)}]

svr = KNeighborsClassifier()
scores = ['precision', 'recall']

for score in scores:
    print("# Tuning hyper-parameters for %s" % score)
    print()

    # Flatten the (n_samples, 1) label column into a 1-D array
    labels = y_train.values
    c, r = labels.shape
    labels = labels.reshape(c,)

    clf = RandomizedSearchCV(svr, tuned_parameters, cv=5, n_jobs=-1, verbose=3)
    # clf = GridSearchCV(svr, tuned_parameters, cv=5, n_jobs=-1, verbose=3)
    clf.fit(X_train, labels)
The error is as follows:
  File "C:\ProgramData\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "C:/Users/gzhuangzhongyi/Desktop/NetEase/test/RandomSearchCV_Functional.py", line 46, in <module>
    clf.fit(X_train, labels)
  File "C:\ProgramData\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py", line 1190, in fit
    return self._fit(X, y, groups, sampled_params)
  File "C:\ProgramData\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py", line 564, in _fit
    for parameters in parameter_iterable
  File "C:\ProgramData\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py", line 758, in __call__
    while self.dispatch_one_batch(iterator):
  File "C:\ProgramData\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py", line 603, in dispatch_one_batch
    tasks = BatchedCalls(itertools.islice(iterator, batch_size))
  File "C:\ProgramData\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py", line 127, in __init__
    self.items = list(iterator_slice)
  File "C:\ProgramData\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py", line 557, in <genexpr>
    )(delayed(_fit_and_score)(clone(base_estimator), X, y, self.scorer_,
  File "C:\ProgramData\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py", line 230, in __iter__
    for v in self.param_distributions.values()])
AttributeError: 'list' object has no attribute 'values'
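After inspecting the fit method, I found that no matter how I adjusted the arguments passed to fit, the call still failed.
Switching to GridSearchCV, however, makes the same code run.
Looking at the class implementations, both classes end up calling the same _fit method, but RandomizedSearchCV's fit passes it an extra, implicit argument: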
sampled_params = ParameterSampler(self.param_distributions,
                                  self.n_iter,
                                  random_state=self.random_state)
return self._fit(X, y, groups, sampled_params)
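Here sampled_params is a sample drawn from the parameters that were passed in.
Those parameters are supplied when the search object is constructed: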
clf = RandomizedSearchCV(svr, tuned_parameters,cv=5,n_jobs=-1,verbose=3)
That argument, in turn, is defined by this statement:
tuned_parameters = [{'n_neighbors': range(2,7)},
{'leaf_size':range(9,100,3)},
{'p':range(1,5)}]
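This statement builds a list of three separate dictionaries. The correct form puts all three parameters in a single dictionary: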
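# A plain dict, not a list: the sampler in this scikit-learn version calls
# .values() directly on whatever is passed as param_distributions
tuned_parameters = {'n_neighbors': range(2,7),
                    'leaf_size':range(9,100,3),
                    'p':range(1,5)}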
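For reference, here is a self-contained sketch of a corrected call that passes a single plain dict. The iris dataset, n_iter=5, cv=3 and random_state=0 are choices made for this example only, not settings from the original script, and the candidate ranges are wrapped in list() so the snippet does not depend on how a given scikit-learn version treats range objects.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Example data only; the original post uses its own `data` and `label`
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# One plain dict holding every parameter to be searched
param_distributions = {'n_neighbors': list(range(2, 7)),
                       'leaf_size': list(range(9, 100, 3)),
                       'p': list(range(1, 5))}

clf = RandomizedSearchCV(KNeighborsClassifier(), param_distributions,
                         n_iter=5, cv=3, random_state=0)
clf.fit(X_train, y_train)

print(clf.best_params_)            # one value per searched parameter
print(clf.score(X_test, y_test))   # accuracy of the refitted best model
```

Each of the five sampled candidates sets all three parameters at once, which is what a joint search over n_neighbors, leaf_size and p requires.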
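GridSearchCV accepts a list of dictionaries: it loops over each dictionary in turn and enumerates every combination of the parameters inside it. The three-dictionary form therefore runs without error there, although it then searches each parameter on its own rather than jointly.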
for p in self.param_grid:
    # Always sort the keys of a dictionary, for reproducibility
    items = sorted(p.items())
    if not items:
        yield {}
    else:
        keys, values = zip(*items)
        for v in product(*values):
            params = dict(zip(keys, v))
            yield params
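As a rough illustration of the loop above, the following sketch re-implements the expansion with only the standard library. The function name expand_grid is made up for this example and is not part of scikit-learn. It shows that a list of three dictionaries yields 5 + 31 + 4 = 40 single-parameter settings, while one dictionary yields 5 × 31 × 4 = 620 joint settings.

```python
from itertools import product


def expand_grid(param_grid):
    """Yield one parameter dict per combination, dictionary by dictionary.

    Hypothetical helper that mirrors the loop quoted above; it is not the
    scikit-learn implementation itself.
    """
    for p in param_grid:
        # Sort the keys so the order of the output is reproducible
        items = sorted(p.items())
        if not items:
            yield {}
        else:
            keys, values = zip(*items)
            for v in product(*values):
                yield dict(zip(keys, v))


three_dicts = [{'n_neighbors': range(2, 7)},
               {'leaf_size': range(9, 100, 3)},
               {'p': range(1, 5)}]
one_dict = [{'n_neighbors': range(2, 7),
             'leaf_size': range(9, 100, 3),
             'p': range(1, 5)}]

separate = list(expand_grid(three_dicts))
joint = list(expand_grid(one_dict))

print(len(separate))  # 5 + 31 + 4 settings, one parameter each
print(len(joint))     # 5 * 31 * 4 settings, all three parameters each
print(separate[0])
print(joint[0])
```

Both inputs are valid for the grid loop; they simply describe different search spaces.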
RandomizedSearchCV, by contrast, treats its argument as one dictionary of distributions and draws a value for each key of that dictionary:
# Always sort the keys of a dictionary, for reproducibility
items = sorted(self.param_distributions.items())
for _ in six.moves.range(self.n_iter):
    params = dict()
    for k, v in items:
        if hasattr(v, "rvs"):
            if sp_version < (0, 16):
                params[k] = v.rvs()
            else:
                params[k] = v.rvs(random_state=rnd)
        else:
            params[k] = v[rnd.randint(len(v))]
    yield params
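RandomizedSearchCV inspects each value in the dictionary: if the value has an rvs method it is treated as a distribution and sampled from; otherwise it is treated as a sequence and one element is picked at a random index.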
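A minimal sketch of that check, using only the standard library. The class UniformInt and the function sample_params are hypothetical stand-ins invented for this example; in practice the distribution would typically be a scipy.stats object, and the real code also passes a random state to rvs.

```python
import random


class UniformInt:
    """Hypothetical distribution-like object: anything exposing rvs()."""

    def __init__(self, low, high):
        self.low, self.high = low, high

    def rvs(self):
        return random.randint(self.low, self.high)  # inclusive bounds


def sample_params(param_distributions, n_iter):
    """Draw n_iter parameter dicts from a single dict of candidates."""
    items = sorted(param_distributions.items())
    for _ in range(n_iter):
        params = {}
        for k, v in items:
            if hasattr(v, "rvs"):
                # Distribution-like value: draw a sample
                params[k] = v.rvs()
            else:
                # Sequence: pick one element at a random index
                params[k] = v[random.randrange(len(v))]
        yield params


random.seed(0)
space = {'n_neighbors': UniformInt(2, 6),   # sampled through rvs()
         'p': range(1, 5)}                  # indexed at random
samples = list(sample_params(space, 3))
for s in samples:
    print(s)
```

Every sampled dict contains all keys of the input, which is why the sampler needs the parameters in one dictionary.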
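The sampler therefore expects param_distributions to be a dictionary, on which it can call .values() and .items(). What was passed instead is a list of dictionaries, and a list has neither method, so no value can be drawn and the AttributeError is raised.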
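The failure can be reproduced without scikit-learn. The sketch below copies only the first check that the sampler performs according to the traceback above; the function name has_only_lists is made up for this example.

```python
def has_only_lists(param_distributions):
    """Mimic the sampler's first step: look at every value of the dict."""
    return all(not hasattr(v, "rvs") for v in param_distributions.values())


as_list = [{'n_neighbors': range(2, 7)},
           {'leaf_size': range(9, 100, 3)},
           {'p': range(1, 5)}]
as_dict = {'n_neighbors': range(2, 7),
           'leaf_size': range(9, 100, 3),
           'p': range(1, 5)}

try:
    has_only_lists(as_list)
    message = None
except AttributeError as exc:
    message = str(exc)

print(message)                   # the AttributeError text for the list input
print(has_only_lists(as_dict))   # a plain dict passes the check
```

The list input fails before any sampling happens, which matches the traceback ending in the line that calls self.param_distributions.values().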
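Of the two functions above, only the GridSearchCV one yields a parameter set that I could inspect. The parameter set that RandomizedSearchCV would generate cannot be shown, because I could not get it out of the debugger.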