Understanding and Using Python's coo_matrix

阿新 • Published: 2019-01-22

1. Understanding and usage

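The input data is in FFM format, where each feature is written as primary key:secondary key:value (the value is 1 for categorical features). The first column of each line is the label; the remaining columns are the features x.

For example, 2:3:1 means the feature comes from column 2 of the source data, has index 3, and has value 1.

Source data test.txt (column 8 is a continuous feature that has not been discretized; the other columns are categorical features):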

1 2:3:1 3:5:1 5:7:1 7:10:1 8:14:1.2
0 1:1:1 2:4:1 6:9:1 7:10:1 8:14:2.3
1 2:3:1 3:5:1 7:11:1 8:14:1.5
1 1:2:1 5:7:1 7:12:1 8:14:2.2 9:15:1
0 3:6:1 5:8:1 7:13:1 9:16:1
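To make the format concrete, here is a minimal sketch that splits one line of the file into its label and its (field, index, value) triples. The helper name parse_ffm_line is hypothetical and only used for illustration; it is not part of the original code.

```python
def parse_ffm_line(line):
    """Split one FFM-formatted line into a label and (field, index, value) triples."""
    tokens = line.strip().split()
    label = float(tokens[0])            # first column is the label
    triples = []
    for token in tokens[1:]:
        field, index, value = token.split(":")
        triples.append((int(field), int(index), float(value)))
    return label, triples


# First line of test.txt
label, triples = parse_ffm_line("1 2:3:1 3:5:1 5:7:1 7:10:1 8:14:1.2")
print(label)    # 1.0
print(triples)  # [(2, 3, 1.0), (3, 5, 1.0), (5, 7, 1.0), (7, 10, 1.0), (8, 14, 1.2)]
```

Only the index (the secondary key) and the value end up in the sparse matrix; the field number is not used when building it.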

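import numpy as np
from scipy.sparse import coo_matrix

def libsvm_2_coo(libsvm_data, shape):
    coo_rows = []
    coo_cols = []
    coo_data = []
    n = 0
    for x, d in libsvm_data:
        # one row index per feature in this sample
        # (extend needs a list, so repeat n once per feature)
        coo_rows.extend([n] * len(x))
        coo_cols.extend(x)
        coo_data.extend(d)
        n += 1
    coo_rows = np.array(coo_rows)
    coo_cols = np.array(coo_cols)
    coo_data = np.array(coo_data)
    # coo_rows: the row number n of each entry, starting at 0
    # coo_cols: the secondary keys [ 3  5  7 10 14  1  4  9 10 14  3  5 11 14  2  7 12 14 15  6  8 13 16]
    # coo_data: the feature values (1 for categorical features)
    return coo_matrix((coo_data, (coo_rows, coo_cols)), shape=shape)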

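Building data = coo_matrix((coo_data, (coo_rows, coo_cols)), shape=shape) and printing it as a dense array gives the result below. Indexing starts at row 0 and column 0, and the source data has no column 0, so that column is filled with zeros: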

[[0. 0. 0. 1. 0. 1. 0. 1. 0. 0. 1. 0. 0. 0. 1.2 0. 0.]
 [0. 1. 0. 0. 1. 0. 0. 0. 0. 1. 1. 0. 0. 0. 2.3 0. 0.]
 [0. 0. 0. 1. 0. 1. 0. 0. 0. 0. 0. 1. 0. 0. 1.5 0. 0.]
 [0. 0. 1. 0. 0. 0. 0. 1. 0. 0. 0. 0. 1. 0. 2.2 1. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 1. 0. 0. 0. 0. 1. 0. 0. 1.]]
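The idea behind the COO format can be shown with a much smaller example: each non-zero entry is described by one row index, one column index and one value, and everything else is implicitly zero. This is only an illustrative sketch; the three entries are picked from the first two rows of the data above.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Three non-zero entries:
# (row 0, col 3) = 1.0, (row 0, col 14) = 1.2, (row 1, col 1) = 1.0
rows = np.array([0, 0, 1])
cols = np.array([3, 14, 1])
vals = np.array([1.0, 1.2, 1.0])

small = coo_matrix((vals, (rows, cols)), shape=(2, 17))
dense = small.toarray()   # expand to a regular NumPy array

print(dense.shape)   # (2, 17)
print(dense[0, 14])  # 1.2
print(small.nnz)     # 3 stored entries
```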

Printing data.tocsr() gives the result below. A csr_matrix uses roughly 70% of the memory of the equivalent coo_matrix, so we convert the matrix to CSR:

(0, 3) 1.0
(0, 5) 1.0
(0, 7) 1.0
(0, 10) 1.0
(0, 14) 1.2
(1, 1) 1.0
(1, 4) 1.0
(1, 9) 1.0
(1, 10) 1.0
(1, 14) 2.3
(2, 3) 1.0
(2, 5) 1.0
(2, 11) 1.0
(2, 14) 1.5
(3, 2) 1.0
(3, 7) 1.0
(3, 12) 1.0
(3, 14) 2.2
(3, 15) 1.0
(4, 6) 1.0
(4, 8) 1.0
(4, 13) 1.0
(4, 16) 1.0
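The memory difference comes from how row positions are stored. The sketch below compares the underlying arrays of the two formats on a small made-up matrix. The exact ratio depends on the index dtype and on how many rows there are relative to the number of non-zero entries, so treat the printed numbers as an illustration rather than a benchmark.

```python
import numpy as np
from scipy.sparse import coo_matrix

rows = np.array([0, 0, 0, 1, 1, 2])
cols = np.array([3, 5, 14, 1, 4, 3])
vals = np.array([1.0, 1.0, 1.2, 1.0, 1.0, 1.0])

coo = coo_matrix((vals, (rows, cols)), shape=(3, 17))
csr = coo.tocsr()

# COO keeps a row index, a column index and a value for every stored entry.
coo_bytes = coo.data.nbytes + coo.row.nbytes + coo.col.nbytes
# CSR keeps a column index and a value per entry, plus n_rows + 1 row pointers.
csr_bytes = csr.data.nbytes + csr.indices.nbytes + csr.indptr.nbytes

print("COO bytes:", coo_bytes)
print("CSR bytes:", csr_bytes)
```

Because CSR replaces the per-entry row index with one pointer per row, the saving grows as rows contain more non-zero entries.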

Reference code for reading the data file:

import numpy as np

INPUT_DIM = 17  # number of columns; must be larger than the largest feature index (16 here)

def read_data(file_name="test.txt"):
    X = []
    D = []
    y = []
    with open(file_name) as file:
        fin = file.readlines()
    for line in fin:
        X_i = []
        D_i = []
        line = line.strip().split()
        if not line:
            continue  # skip empty lines
        # the first field is the label
        y_i = float(line[0])
        for x in line[1:]:
            # each feature is written as field:index:value
            # To keep only categorical features, filter on the value:
            # if x.split(':')[2] == '1.0':
            X_i.append(int(x.split(':')[1]))
            D_i.append(float(x.split(':')[2]))
        y.append(y_i)
        X.append(X_i)
        D.append(D_i)
    y = np.reshape(np.array(y), [-1])
    X = libsvm_2_coo(zip(X, D), (len(X), INPUT_DIM)).tocsr()
    return X, y

2. Problems encountered in use:

column index exceeds matrix dimensions

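Solution: every column index (coo_cols above) must be smaller than the number of columns given in the shape argument of coo_matrix((coo_data, (coo_rows, coo_cols)), shape=shape), which is INPUT_DIM above.

For example, the largest value in coo_cols here is 16, so INPUT_DIM must be at least 17 (columns 0 through 16, 17 columns in total). A value larger than 17 only appends all-zero columns; it does no harm but serves no purpose.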
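The error and one way to avoid it can be reproduced with a few lines. This is a sketch under the assumption that deriving the column count from the data is acceptable for your use case; the variable name input_dim is only illustrative, and the exact wording of the error message may differ between SciPy versions.

```python
import numpy as np
from scipy.sparse import coo_matrix

rows = np.array([0, 4])
cols = np.array([3, 16])   # largest column index is 16
vals = np.array([1.0, 1.0])

# Too few columns: index 16 does not fit into 16 columns (0..15).
try:
    coo_matrix((vals, (rows, cols)), shape=(5, 16))
    failed = False
except ValueError as err:
    failed = True
    print("ValueError:", err)

# Derive a safe column count from the data instead of hard-coding it.
input_dim = int(cols.max()) + 1
ok = coo_matrix((vals, (rows, cols)), shape=(5, input_dim))
print(ok.shape)  # (5, 17)
```

When training and test files are read separately, the column count should be taken from the largest index across all files so that both matrices end up with the same width.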
