
SVD and LSI Tutorial (3): Computing the Full SVD of a Matrix

Dr. E. Garcia

Mi Islita.com

Email | Last Update: 01/07/07

Revisiting Singular Values

In Part 2 of this tutorial you learned that SVD decomposes a matrix A into the product of three matrices

Equation 1: A = USV^T

S was computed by the following procedure:

  1. A^T and A^T A were computed.
  2. the eigenvalues of A^T A were determined and sorted in descending order, in the absolute sense. The nonnegative square roots of these are the singular values of A.
  3. S was constructed by placing the singular values in descending order along its diagonal.
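Since the worked matrix from Part 2 survives only in the figures, these three steps can be sketched in plain Python for a hypothetical stand-in matrix, A = [[4, 0], [3, -5]], chosen only because its A^T A has the same eigenvalues (40 and 10) as the tutorial's example:

```python
# Hypothetical stand-in matrix: its A^T A = [[25, -15], [-15, 25]] has
# eigenvalues 40 and 10, matching the tutorial's example.
import math

A = [[4, 0], [3, -5]]

# Step 1: compute A^T and A^T A.
At = [[A[j][i] for j in range(2)] for i in range(2)]
AtA = [[sum(At[i][k] * A[k][j] for k in range(2)) for j in range(2)]
       for i in range(2)]

# Step 2: for a 2x2 matrix the characteristic equation is
# c^2 - trace*c + det = 0; solve it and sort the roots in descending order.
trace = AtA[0][0] + AtA[1][1]
det = AtA[0][0] * AtA[1][1] - AtA[0][1] * AtA[1][0]
disc = math.sqrt(trace ** 2 - 4 * det)
eigenvalues = sorted([(trace + disc) / 2, (trace - disc) / 2], reverse=True)

# The nonnegative square roots of the eigenvalues are the singular values.
singular_values = [math.sqrt(c) for c in eigenvalues]

# Step 3: place the singular values in descending order along the diagonal of S.
S = [[singular_values[0], 0], [0, singular_values[1]]]
print(eigenvalues)  # [40.0, 10.0]
```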

You learned that the Rank of a Matrix is the number of nonzero singular values.

You also learned that since S is a diagonal matrix, its nondiagonal elements are equal to zero. This can be verified by computing S from U^T A V. However, one would need to know U first, which we have not defined yet. Either way, the alternate expression for S is obtained by postmultiplying Equation 1 by V and premultiplying by U^T:

Equation 2: AV = USV^T V = US

Equation 3: U^T AV = S

This step relies on U and V being orthogonal matrices. As discussed in Matrix Tutorial 2: Basic Matrix Operations, if a matrix M is orthogonal then

Equation 4: MM^T = M^T M = I

where I is the identity matrix. But we know that MM^(-1) = I. Consequently, M^T = M^(-1).
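A quick numeric sanity check of Equation 4, using a 2 x 2 rotation matrix as an example (rotation matrices are orthogonal by construction; the angle pi/6 is arbitrary):

```python
# Numeric check that a rotation matrix M satisfies M M^T = M^T M = I,
# hence M^T = M^(-1). Any angle t works; t = pi/6 is arbitrary.
import math

t = math.pi / 6
M = [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]
Mt = [[M[j][i] for j in range(2)] for i in range(2)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

MMt = matmul(M, Mt)
MtM = matmul(Mt, M)
for P in (MMt, MtM):
    for i in range(2):
        for j in range(2):
            assert abs(P[i][j] - (1.0 if i == j else 0.0)) < 1e-12
```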

Computing "right" eigenvectors, V, and V^T

In the example given in Part 2 you learned that the eigenvalues of AA^T and A^T A are identical, since both satisfy the same characteristic equation; e.g.

Figure 1. Characteristic equation and eigenvalues for AA^T and A^T A.


Let's use these eigenvalues to compute the eigenvectors of A^T A. This is done by solving

Equation 5: (A^T A - c_i I)X_i = 0

As mentioned in Matrix Tutorial 3: Eigenvalues and Eigenvectors, for large matrices one would need to resort to the Power Method or other numerical methods to do this. Fortunately, in this case we are dealing with a small matrix, so simple algebra suffices.

We first compute an eigenvector for each eigenvalue, c1 = 40 and c2 = 10. Once computed, we convert the eigenvectors to unit vectors by normalizing their lengths. Figure 2 illustrates these steps.

Figure 2. "Right" eigenvectors of A^T A.
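The algebra in Figure 2 can be sketched as follows for a stand-in A^T A = [[25, -15], [-15, 25]], a hypothetical matrix chosen only because its eigenvalues are c1 = 40 and c2 = 10: fix one coordinate, solve the first row of (A^T A - cI)x = 0 for the other, then normalize.

```python
# Solving (A^T A - c I)x = 0 for a stand-in A^T A = [[25, -15], [-15, 25]]
# (eigenvalues c1 = 40, c2 = 10): fix x1 = 1, read x2 off the first row,
# then normalize to unit length.
import math

AtA = [[25, -15], [-15, 25]]

def unit_eigenvector(M, c):
    # First row of (M - c I) gives (m11 - c) * x1 + m12 * x2 = 0; set x1 = 1.
    x1 = 1.0
    x2 = -(M[0][0] - c) * x1 / M[0][1]
    length = math.sqrt(x1 ** 2 + x2 ** 2)
    return [x1 / length, x2 / length]

v1 = unit_eigenvector(AtA, 40)  # [0.7071..., -0.7071...]
v2 = unit_eigenvector(AtA, 10)  # [0.7071...,  0.7071...]
```

Choosing a different arbitrary value for x1 only rescales the vector; normalization gives the same unit eigenvector either way, which is the point made in the text below.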


We would have arrived at identical results if during normalization we had assumed an arbitrary coordinate value for either x1 or x2. We now construct V by placing the eigenvectors along its columns, and compute V^T.

Figure 3. V and its transpose V^T.


Hey! That wasn't that hard.

Note that we constructed V by preserving the order in which the singular values were placed along the diagonal of S. That is, we placed the eigenvector corresponding to the largest eigenvalue in the first column and the second eigenvector in the second column. These end up paired with the singular values placed along the diagonal of S. Preserving the order in which singular values, eigenvalues and eigenvectors are placed in their corresponding matrices is very important. Otherwise we end up with the wrong SVD.

Let's compute now the "left" eigenvectors and U.

Computing "left" eigenvectors and U

To compute U we could reuse the eigenvalues and compute, in exactly the same manner, the eigenvectors of AA^T, then place these along the columns of U. However, with large matrices this is time consuming: one would again need to resort to the Power Method or other suitable methods to compute the eigenvectors.

In practice, it is common to use the following shortcut. Postmultiply Equation 2 by S^(-1) to obtain

Equation 6: AVS^(-1) = USS^(-1)

Equation 7: U = AVS^(-1)

and then compute U. Since A and V are already known, we just need to invert S. Since S is a diagonal matrix, it follows that

Figure 4. The inverted singular value matrix S^(-1).


Since s1 = 40^(1/2) = 6.3245 and s2 = 10^(1/2) = 3.1622 (expressed to four decimal places), then

Figure 5. "Left" eigenvectors and U.
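The shortcut in Equation 7 can be sketched for a hypothetical stand-in matrix A = [[4, 0], [3, -5]] (chosen only because its A^T A has eigenvalues 40 and 10, so its singular values and eigenvectors mirror the tutorial's numbers): invert the diagonal S by taking reciprocals, then form U = AVS^(-1).

```python
# Stand-in example: A = [[4, 0], [3, -5]] has singular values sqrt(40) and
# sqrt(10); the unit eigenvectors [1, -1]/sqrt(2) and [1, 1]/sqrt(2) of its
# A^T A are the columns of V. Form U = A V S^(-1) as in Equation 7.
import math

A = [[4, 0], [3, -5]]
s1, s2 = math.sqrt(40), math.sqrt(10)
r = 1 / math.sqrt(2)
V = [[r, r], [-r, r]]
S_inv = [[1 / s1, 0], [0, 1 / s2]]  # inverse of a diagonal matrix: reciprocals

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

U = matmul(matmul(A, V), S_inv)
print(U)  # [[0.4472..., 0.8944...], [0.8944..., -0.4472...]]
```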


That was quite a mechanical task. Huh?

This shortcut is very popular since it simplifies calculations. Unfortunately, its widespread use has resulted in many overlooking important information contained in the AA^T matrix. In recent years, LSI researchers have found that the high-order term-term co-occurrence patterns contained in this matrix might be important. At least two studies (1, 2), one a 2005 thesis, indicate that high-order term-term co-occurrence present in this matrix might be at the heart of LSI.

These studies are:

In the first issue of our IR Watch - The Newsletter (which is free) our subscribers learned about this thesis and other equally interesting LSI resources.

The orthogonal nature of the V and U matrices is evident from their eigenvectors. It can be demonstrated by computing dot products between column vectors: all dot products are equal to zero. A visual inspection is also possible in this case. In Figure 6 we have plotted the eigenvectors. Observe that they are all orthogonal and lie to the right and left of each other; hence the reference to these as "right" and "left" eigenvectors.

Figure 6. "Right" and "Left" eigenvectors.
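The dot-product check is a one-liner per pair of columns. Here it is for a hypothetical stand-in pair of matrices with the same structure as V and U above (columns [1, -1]/sqrt(2), [1, 1]/sqrt(2) and [1, 2]/sqrt(5), [2, -1]/sqrt(5)):

```python
# Dot products between the column vectors of stand-in V and U matrices
# are zero, confirming that their columns are mutually orthogonal.
import math

r = 1 / math.sqrt(2)
s = 1 / math.sqrt(5)
V = [[r, r], [-r, r]]
U = [[s, 2 * s], [2 * s, -s]]

for M in (V, U):
    dot = M[0][0] * M[0][1] + M[1][0] * M[1][1]  # column 0 . column 1
    assert abs(dot) < 1e-12
```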


Computing the Full SVD

So, we finally know U, S, V and V^T. To complete the proof, we reconstruct A by computing its full SVD.

Figure 7. Computing the full SVD.


So as we can see, SVD is a straightforward matrix decomposition and reconstruction technique.
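The reconstruction can be verified numerically. For a hypothetical stand-in matrix A = [[4, 0], [3, -5]] (whose A^T A has the tutorial's eigenvalues 40 and 10), multiplying U, S and V^T recovers A:

```python
# Full SVD reconstruction for a stand-in example: U S V^T recovers
# A = [[4, 0], [3, -5]] (U and V written in exact radical form).
import math

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

r = 1 / math.sqrt(2)
s = 1 / math.sqrt(5)
U = [[s, 2 * s], [2 * s, -s]]
S = [[math.sqrt(40), 0], [0, math.sqrt(10)]]
Vt = [[r, -r], [r, r]]  # transpose of V = [[r, r], [-r, r]]

A_rebuilt = matmul(matmul(U, S), Vt)  # ≈ [[4, 0], [3, -5]]
```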

The Reduced SVD

Obtaining an approximation of the original matrix is quite easy. This is done by truncating the three matrices obtained from the full SVD. Essentially, we keep the first k columns of U, the first k rows of V^T, and the first k rows and columns of S; that is, the first k singular values. This removes noisy dimensions and exposes the effect of the largest k singular values on the original data. This effect is hidden, masked, latent, in the full SVD.

The reduction process is illustrated in Figure 8 and is often referred to as "computing the reduced SVD", dimensionality reduction or the Rank k Approximation.

Figure 8. The Reduced SVD or Rank k Approximation.


The shaded areas indicate the parts of the matrices that are retained. The approximated matrix Ak is the Rank k Approximation of the original matrix and is defined as

Equation 8: Ak = Uk Sk Vk^T

So, once these matrices are approximated we simply compute their products to get Ak.
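For the hypothetical stand-in matrix A = [[4, 0], [3, -5]] (same eigenvalues, 40 and 10, as the tutorial's example), the Rank 1 Approximation keeps only the first column of U, the largest singular value, and the first row of V^T:

```python
# Rank 1 approximation of the stand-in example: keep only the first column
# of U, the largest singular value s1, and the first row of V^T.
import math

s = 1 / math.sqrt(5)
r = 1 / math.sqrt(2)
u1 = [s, 2 * s]      # first column of U
s1 = math.sqrt(40)   # largest singular value
v1t = [r, -r]        # first row of V^T

# A1 = u1 * s1 * v1t is an outer product: the rank 1 approximation of A.
A1 = [[u1[i] * s1 * v1t[j] for j in range(2)] for i in range(2)]
print(A1)  # [[2.0, -2.0], [4.0, -4.0]] up to rounding
```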

Quite easy. Huh?

Summary

So far we have learned that the full SVD of a matrix A can be computed by the following procedure:

  1. compute its transpose A^T and the product A^T A.
  2. determine the eigenvalues of A^T A and sort these in descending order, in the absolute sense. Take the nonnegative square roots of these to obtain the singular values of A.
  3. construct the diagonal matrix S by placing the singular values in descending order along its diagonal, and compute its inverse, S^(-1).
  4. use the ordered eigenvalues from step 2 to compute the eigenvectors of A^T A. Place these eigenvectors along the columns of V and compute its transpose, V^T.
  5. compute U as U = AVS^(-1). To complete the proof, compute the full SVD using A = USV^T.

Before concluding, let me mention this: in this tutorial you have learned how SVD is applied to a matrix where m = n. This is just one possible scenario. In general, if

  1. m = n and all singular values are greater than zero, the pseudoinverse of A coincides with its inverse, A^(-1) = VS^(-1)U^T.
  2. m < n, S is m x n with its last columns all zero, and the SVD gives the solution with minimum norm.
  3. m > n, S is n x n, there are more equations than unknowns, and the SVD gives the least-squares solution.

Movellan discusses these cases in great detail.
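Case 1 can be checked numerically with the hypothetical stand-in square matrix A = [[4, 0], [3, -5]] used for illustration throughout: since all of its singular values are nonzero, VS^(-1)U^T reproduces A^(-1).

```python
# Checking case 1 with the stand-in square matrix A = [[4, 0], [3, -5]]:
# since all singular values are nonzero, V S^(-1) U^T equals A^(-1).
import math

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

r = 1 / math.sqrt(2)
s = 1 / math.sqrt(5)
A = [[4, 0], [3, -5]]
V = [[r, r], [-r, r]]
S_inv = [[1 / math.sqrt(40), 0], [0, 1 / math.sqrt(10)]]
Ut = [[s, 2 * s], [2 * s, -s]]  # U happens to be symmetric for this example

A_inv = matmul(matmul(V, S_inv), Ut)
identity = matmul(A, A_inv)  # ≈ [[1, 0], [0, 1]]
```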

So, how is SVD used in Latent Semantic Indexing (LSI)?

In LSI, the intent is not to reconstruct A. The goal is to find the best rank k approximation of A, one that improves retrieval. The selection of k and of the number of singular values in S to use is still an open area of research. During her tenure at Bellcore (now Telcordia), Microsoft's Susan Dumais mentioned in a 1995 presentation (Transcription of the Application) that her research group experimented with k values largely "by seat of the pants".

Early studies with the MED database, using a few hundred documents and dozens of queries, indicate that performance versus k is not entirely proportional, but tends to describe an inverted U-shaped curve peaking at around k = 100. These results might change under other experimental conditions. At the time of writing, optimum k values are still determined via trial-and-error experimentation.

Now that we have the basic calculations out of the way, let's move forward and learn how LSI scores documents and queries. It is time to demystify these calculations. Wait for Part 4 and see.

This is getting exciting.

Tutorial Review

For the matrix

Rank 2 Example
  1. Compute the eigenvalues of A^T A.
  2. Prove that this is a matrix of Rank 2.
  3. Compute its full SVD.
  4. Compute its Rank 2 Approximation.

