Stock Prediction Based on the Pearson Coefficient in Python (Multithreaded)

阿新 • Published: 2018-12-06


# -*- coding: utf-8 -*-
"""
Created on Tue Dec  4 08:53:08 2018

@author: zhen
"""
# from dtw import fastdtw  # only needed for the commented-out DTW alternative in search()
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import threading
from datetime import datetime


def normalization(x):
    # z-score: subtract the mean, divide by the standard deviation
    # (np.std is the square root of the variance)
    return (x - np.mean(x)) / np.std(x)


def corrcoef(a, b):
    # Pearson correlation coefficient: measures how strongly two variables
    # are linearly related; its value lies between -1 and 1
    corrc = np.corrcoef(a, b)
    corrc = corrc[0, 1]
    # Turn the correlation into a distance: 0 when corrc is 1, larger as corrc falls.
    # ** is exponentiation; the exponent is 1 here, so it has no effect.
    return 16 * ((1 - corrc) / (1 + corrc)) ** 1


startTimeStamp = datetime.now()  # record the start time
# Load the data
filename = 'C:/Users/zhen/.spyder-py3/sh000300_2017.csv'
# Read the first and second columns (used below as date and time labels)
all_date = pd.read_csv(filename, usecols=[0, 1], dtype=str)
all_date = np.array(all_date)
data = all_date[:, 0]
times = all_date[:, 1]

# Read the fourth column: the values to match
data_points = pd.read_csv(filename, usecols=[3])
data_points = np.array(data_points)
data_points = data_points[:, 0]

topk = 10       # show only the top 10 (not used below)
baselen = 100   # the window length is varied between 50 and 150
basebegin = 365
basedata = data[basebegin] + ' ' + times[basebegin] + '~' + data[basebegin + baselen - 1] + ' ' + times[basebegin + baselen - 1]
length = len(data_points)  # number of data points


# Custom thread class
class Thread_Local(threading.Thread):
    def __init__(self, thread_id, name, counter):
        threading.Thread.__init__(self)
        self.thread_id = thread_id
        self.name = name
        self.counter = counter
        self.__running = threading.Event()  # flag marking the thread as running
        self.__running.set()  # set to True

    def run(self):
        print("starting %s" % self.name)
        split_data(self, self.counter)  # run the matching logic

    def stop(self):
        # Only clears the flag; the thread itself ends when run() returns
        self.__running.clear()


# Split the series into windows and run the matching (called from each thread)
def split_data(self, split_len):
    base = data_points[basebegin:basebegin + split_len]  # the pattern to match
    subseries = []
    dateseries = []
    for j in range(0, length):
        # Skip windows that overlap the base pattern or run past the end of the data
        if (j < (basebegin - split_len) or j > (basebegin + split_len - 1)) and j < length - split_len:
            subseries.append(data_points[j:j + split_len])
            dateseries.append(j)  # start position
    search(self, subseries, base, dateseries)  # run the pattern matching


# Shared result containers (one entry per thread)
result = []
base_list = []
date_list = []
result_lock = threading.Lock()  # keeps the three lists aligned across threads


def search(self, subseries, base, dateseries):
    # Window search
    listdistance = []
    for i in range(0, len(subseries)):
        tt = np.array(subseries[i])
        # dist, cost, acc, path = fastdtw(base, tt, dist='euclidean')
        # listdistance.append(dist)
        distance = corrcoef(base, tt)
        listdistance.append(distance)
    # Sort: argsort returns the indices that put the distances in ascending order
    index = np.argsort(listdistance, kind='quicksort')
    # Append under a lock so entry i of each list comes from the same thread
    with result_lock:
        result.append(subseries[index[0]])
        print("result length is %d" % len(result))
        base_list.append(base)
        date_list.append(dateseries[index[0]])
    # Mark the thread as stopped
    self.stop()


# Shrink or stretch the pattern: window lengths 50, 60, ..., 140
# (the upper bound of 150 is exclusive)
loc = 0
threads = []
for split_len in range(round(0.5 * baselen), round(1.5 * baselen), 10):
    # Run the matching
    thread = Thread_Local(1, "Thread" + str(loc), split_len)
    loc += 1
    # Start the thread
    thread.start()
    threads.append(thread)

# Wait for every worker to finish. join() replaces polling on len(result),
# which could let the main thread continue before base_list and date_list were filled.
for thread in threads:
    thread.join()

# Final search across the best match found for each window length
listdistance = []
for i in range(0, len(result)):
    tt = np.array(result[i])
    distance = corrcoef(base_list[i], tt)
    listdistance.append(distance)
# Final sort: indices in ascending order of distance
index = np.argsort(listdistance, kind='quicksort')
print("closed Main Thread")
endTimeStamp = datetime.now()
# Compare the results
plt.figure(0)
plt.plot(normalization(base_list[index[0]]), label=basedata, linewidth=2)
match_len = len(result[index[0]])
begin = data[date_list[index[0]]] + ' ' + times[date_list[index[0]]]
end = data[date_list[index[0]] + match_len - 1] + ' ' + times[date_list[index[0]] + match_len - 1]
label = begin + '~' + end
plt.plot(normalization(result[index[0]]), label=label, linewidth=2)
plt.legend(loc='upper left')
plt.title('normal similarity search')
plt.show()
print('run time', (endTimeStamp - startTimeStamp).seconds, "s")

Result:

[Figure: output plot titled "normal similarity search"; image not preserved]
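
The script above depends on a local CSV file, so the result cannot be reproduced directly. As a rough illustration of the same idea, here is a minimal sketch of the sliding-window search on synthetic data. Everything in it is an assumption made for the example: the helper names `pearson`, `pearson_distance` and `best_match`, the random-walk series, and the planted position. `pearson` is a plain-Python stand-in for `np.corrcoef(a, b)[0, 1]`, used only so the sketch runs without third-party packages.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences.

    Plain-Python stand-in for np.corrcoef(a, b)[0, 1] (an assumption of
    this sketch, not the article's code).
    """
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # undefined for a constant window; treated as uncorrelated here
    r = cov / math.sqrt(var_a * var_b)
    return max(-1.0, min(1.0, r))  # clamp floating-point overshoot


def pearson_distance(a, b):
    """The article's transform: 16 * (1 - r) / (1 + r)."""
    r = pearson(a, b)
    if r == -1.0:
        return math.inf  # avoid dividing by zero
    return 16 * (1 - r) / (1 + r)


def best_match(series, base_begin, window):
    """Return (start, distance) of the window closest to the base pattern."""
    base = series[base_begin:base_begin + window]
    best_start, best_dist = None, math.inf
    for j in range(len(series) - window):
        # Same exclusion rule as split_data(): skip windows overlapping the base.
        if base_begin - window <= j <= base_begin + window - 1:
            continue
        dist = pearson_distance(base, series[j:j + window])
        if dist < best_dist:
            best_start, best_dist = j, dist
    return best_start, best_dist


# Synthetic random walk standing in for the CSV column (hypothetical data).
random.seed(0)
series = [100.0]
for _ in range(599):
    series.append(series[-1] + random.gauss(0, 1))

base_begin, window, planted_at = 300, 50, 100
base = series[base_begin:base_begin + window]
# Plant a rescaled, shifted copy of the base pattern earlier in the series.
series[planted_at:planted_at + window] = [2 * x + 5 for x in base]

best_start, best_dist = best_match(series, base_begin, window)
print(best_start, best_dist)
```

Because the Pearson coefficient does not change when one series is multiplied by a positive constant and shifted, the planted copy has a correlation of about 1 with the base pattern, so the search should return start index 100 with a distance close to zero. This is also why the article only needs `normalization` for plotting, not for the matching itself.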

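Analysis:

The Pearson correlation coefficient (corrcoef) runs far faster than DTW or FastDTW. DTW and FastDTW apply more broadly, though: they can compare sequences of equal or different lengths, whereas the Pearson coefficient requires both sequences to have the same length.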
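
The trade-off can be seen in a small sketch. The code below is not the `dtw` package used in the commented-out line of the script; `dtw_distance` is a textbook dynamic-programming DTW written for illustration, and `pearson`, `short` and `stretched` are likewise hypothetical names. Pearson pairs values index by index in a single pass, so it needs equal lengths, while classic DTW fills a table of size len(a) × len(b), which costs more but lets it align a pattern with a stretched copy of itself.

```python
import math


def pearson(a, b):
    """Pearson correlation; defined only for equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("Pearson correlation needs equal-length inputs")
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def dtw_distance(a, b):
    """Classic dynamic time warping with absolute-difference cost.

    Runs in O(len(a) * len(b)) time, compared with one pass for Pearson.
    """
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


short = [0, 1, 2, 3, 2, 1, 0]
stretched = [0, 0, 1, 1, 2, 2, 3, 3, 2, 2, 1, 1, 0, 0]  # same shape, twice as long

dtw_same_shape = dtw_distance(short, stretched)

try:
    pearson(short, stretched)
    pearson_failed = False
except ValueError:
    pearson_failed = True

print(dtw_same_shape, pearson_failed)  # prints: 0.0 True
```

DTW reports a distance of 0 because every value in `short` can be matched to its two repeated copies in `stretched`, while the Pearson helper rejects the pair outright. This is why the script gives each window length its own base segment of matching size before calling `corrcoef`.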
