Forging a Regex Weapon: A Java Source Code Analyzer, V0.01

Java source code analysis · Published 2019-01-18 15:56:41


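Purpose of this article:

Briefly analyze what a source file is made of, so you get a rough feel for how heavyweight it is.

File reading (simple) + regex operations (the focus)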

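I. Reading and preparing the source string

Let's start with an easy target: the Bundle class is a moderate size, roughly 1,270 lines, so that's the one.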

[Figure: Bundle]

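1. Reading

Android Studio shows the source file's path on disk at the top of the editor. Create a new class, JavaSourceParser.java.

Since the source is plain text, use a FileReader; to read it line by line, wrap it in a BufferedReader.

To keep the code short, exceptions are simply thrown.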

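import org.junit.Test;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class JavaSourceParser {

    @Test
    public void parse() throws IOException {
        read("H:\\sdk\\sources\\android-27\\android\\os\\Bundle.java");
    }

    //read the file line by line and echo every line
    private void read(String name) throws IOException {
        //try-with-resources closes the reader even if an exception is thrown
        try (BufferedReader br = new BufferedReader(new FileReader(name))) {
            String line;
            while ((line = br.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}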
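One caveat: FileReader decodes with the platform default charset unless you use the JDK 11+ FileReader(File, Charset) constructor, and before JDK 18 that default was not necessarily UTF-8. On a Windows machine with a Chinese locale, a UTF-8 source file can therefore come out garbled. A minimal sketch that pins the charset explicitly, assuming the sources are UTF-8; the class name SourceReader is hypothetical:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reads a source file line by line with an explicit charset
final class SourceReader {

    // Assumes the file is UTF-8; try-with-resources closes the reader automatically
    static List<String> readLines(String name) throws IOException {
        Path path = Paths.get(name);
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

The returned list can then be fed to the same per-line regex checks used in the rest of the article.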

[Figure: read OK]

2. Source entity class: SourceBean.java

Define these fields first. Again for ease of viewing and use, the member variables are public.

import java.util.List;

/**
 * Author: 張風捷特烈<br/>
 * Time: 2019/1/18/018:8:30<br/>
 * Description: source file object
 */
public class SourceBean {
    public String name;//class name
    public String pkgName;//package name
    public String fatherName;//superclass name
    public List<String> itfName;//names of implemented interfaces
    public String fullName;//fully qualified name: package name + class name
    public List<String> importClass;//imported classes
    public int lineCount;//number of source lines
    public int realLineCount;//real source lines (comments and blank lines removed)
    public List<String> attrs;//member variables
    public List<String> methods;//method signatures
}

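II. Parsing the data with regex

1. Capturing the class's own package name

Let's warm up with regex first: how do we match package android.os; precisely?

You might say: "Are you kidding me? A single contains does the job."

1.1: A small test

As you can see, contains is not precise: it also returns true when the letters package merely appear inside a variable name.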

public void match() {
    String str1 = "package android.os;";
    String str2 = "int countOfpackage = 1;";
    System.out.println("str1:" + str1.contains("package"));//str1:true
    System.out.println("str2:" + str2.contains("package"));//str2:true
}

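1.2: Using a regex

What does \\bpackage\\b.* mean? \b is a word-boundary assertion, and with a boundary on each side, package must appear as a whole word. On top of that, String.matches() must match the entire line, so the line has to begin with package.

Since package is a keyword, it cannot be used as a variable name, so the match is precise.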

public void match() {
    String str1 = "package android.os;";
    String str2 = "int countOfpackage = 1;";
    String regex = "\\bpackage\\b.*";
    System.out.println("str1:" + str1.matches(regex));//str1:true
    System.out.println("str2:" + str2.matches(regex));//str2:false
}
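String.matches only answers yes or no, so the package name still has to be cut out afterwards with split and replace. A capture group can test and extract in one step. A minimal sketch, assuming the declaration sits on a single line; PackageMatcher and packageOf are hypothetical names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts the package name with a capture group
final class PackageMatcher {

    // ^\s*       : optional leading whitespace
    // package\s+ : the keyword followed by whitespace
    // ([\w.]+)   : group 1, the dotted package name
    // \s*;       : optional spaces, then the semicolon
    private static final Pattern PACKAGE = Pattern.compile("^\\s*package\\s+([\\w.]+)\\s*;");

    // Returns the package name, or null if the line is not a package declaration
    static String packageOf(String line) {
        Matcher m = PACKAGE.matcher(line);
        return m.find() ? m.group(1) : null;
    }
}
```

With the two test strings above, packageOf("package android.os;") returns "android.os" with no leading space to trim, and the countOfpackage line returns null.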

1.3: Putting it to use

OK, now capture the class's package name:

private void read(String name) throws IOException {
    SourceBean sourceBean = new SourceBean();
    File file = new File(name);
    BufferedReader br = new BufferedReader(new FileReader(file));
    String line;
    String packageRegx = "\\bpackage\\b.*";
    while ((line = br.readLine()) != null) {
        if (line.matches(packageRegx)) {
            //"package android.os;" -> "android.os" (trim drops the leading space)
            sourceBean.pkgName = line.split("package")[1].replace(";", "").trim();
        }
    }
    br.close();
    System.out.println(sourceBean.pkgName);//android.os
}

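2. Capturing imported class names

Analysis: importClasses is a list of strings, since a file usually has many imports. The method is the same as above. The lines to capture look like this: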

import android.annotation.Nullable;
import android.util.ArrayMap;
import android.util.Size;
import android.util.SizeF;
import android.util.SparseArray;
import com.android.internal.annotations.VisibleForTesting;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

//list of imported class names
ArrayList<String> importClasses = new ArrayList<>();
String importRegx = "\\bimport\\b.*";

---->[inside the loop]-------------
if (line.matches(importRegx)) {
    String importClass = line.split("import ")[1].replace(";", "");
    importClasses.add(importClass);
}
--------------------------

sourceBean.importClass = importClasses;
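The split("import ") approach keeps the word static for static imports: import static java.lang.Math.max; yields "static java.lang.Math.max". A minimal sketch with capture groups that separates the two cases and also accepts wildcard imports, assuming one import per line; ImportMatcher is a hypothetical name:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: parses one import line with capture groups
final class ImportMatcher {

    // group(1): "static " if present; group(2): the imported name, possibly ending in .*
    private static final Pattern IMPORT =
            Pattern.compile("^\\s*import\\s+(static\\s+)?([\\w.]+(?:\\.\\*)?)\\s*;");

    // Returns the imported name, or null if the line is not an import
    static String importOf(String line) {
        Matcher m = IMPORT.matcher(line);
        return m.find() ? m.group(2) : null;
    }

    // True for "import static ..." lines
    static boolean isStatic(String line) {
        Matcher m = IMPORT.matcher(line);
        return m.find() && m.group(1) != null;
    }
}
```

importOf("import java.util.*;") returns "java.util.*", and a line such as int important = 1; returns null.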

[Figure: capture succeeded]

3. Capturing some counts

For convenience, wrap the counts in a class of their own:

/**
 * Author: 張風捷特烈<br/>
 * Time: 2019/1/18/018:10:16<br/>
 * Description: counts bean
 */
public class CountBean {
    public int lineCount;//number of source lines
    public int annoLineCount;//comment lines
    public int blankLineCount;//blank lines

    public int realLineCount;//real source lines (comments and blank lines removed)
    public int methodCount;//number of methods
    public int attrCount;//number of member fields
}

---->[SourceBean.java]----------
public CountBean countBean;//counts object

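[Aside] I finally understand why Javadoc comments have a column of asterisks

The question came from often copying comments out of the source to translate them, and having to delete every * by hand each time. Everything exists for a reason.

It turns out the asterisks are really handy for parsing: comments may contain keywords, which would make the parse inaccurate.

Every line of such a comment starts with *, so when a line contains *, just continue; as a bonus, this counts the comment lines. The check is rough: a line with multiplication such as a * b is counted as a comment too, and // comments are not caught.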

/**
 * Constructs a new, empty Bundle sized to hold the given number of
 * elements. The Bundle will grow as needed.
 *
 * @param capacity the initial capacity of the Bundle
 */
public Bundle(int capacity) {
    super(capacity);
    mFlags = FLAG_HAS_FDS_KNOWN | FLAG_ALLOW_FDS;
}

Get the total line count, blank line count, comment line count and real source line count:

CountBean countBean = new CountBean();
int annoLineCount = 0;//comment lines
int blankLineCount = 0;//blank lines
int allCount = 0;//total lines

while ((line = br.readLine()) != null) {
    allCount++;
    if (line.contains("*")) {//comment line
        annoLineCount++;
        continue;
    }
    if (line.equals("")) {//blank line
        blankLineCount++;
        continue;
    }
    //...same as above
}
-------------------------------------

countBean.annoLineCount = annoLineCount;
countBean.blankLineCount = blankLineCount;
countBean.lineCount = allCount;
countBean.realLineCount = allCount - blankLineCount - annoLineCount;
sourceBean.countBean = countBean;

System.out.println(sourceBean.countBean.annoLineCount);//560
System.out.println(sourceBean.countBean.blankLineCount);//96
System.out.println(sourceBean.countBean.lineCount);//1275
System.out.println(sourceBean.countBean.realLineCount);//619

[Figure: getting the line counts]
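For a stricter count, each line can be trimmed and classified by how it starts, while remembering whether we are inside a /* ... */ block. A minimal sketch, assuming comments start at the beginning of a line and ignoring * or // inside string literals; LineCounter is a hypothetical class whose fields mirror CountBean:

```java
import java.util.List;

// Hypothetical counter: classifies each line as blank, comment or code
final class LineCounter {
    int lineCount;       // all lines
    int blankLineCount;  // empty or whitespace-only lines
    int annoLineCount;   // comment lines
    int realLineCount;   // everything else

    static LineCounter count(List<String> lines) {
        LineCounter c = new LineCounter();
        boolean inBlock = false; // inside a /* ... */ comment?
        for (String raw : lines) {
            c.lineCount++;
            String line = raw.trim();
            if (inBlock) {
                c.annoLineCount++;
                if (line.contains("*/")) inBlock = false; // the block comment ends here
            } else if (line.isEmpty()) {
                c.blankLineCount++;
            } else if (line.startsWith("//")) {
                c.annoLineCount++;
            } else if (line.startsWith("/*")) {
                c.annoLineCount++;
                // a one-line /* ... */ comment does not open a block
                inBlock = line.indexOf("*/", 2) < 0;
            } else {
                c.realLineCount++;
            }
        }
        return c;
    }
}
```

On a sample containing int area = w * h; and // note, the contains("*") rule would count the first as a comment and the second as code; this version does the opposite.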

4. Capturing the class name, superclass name and implemented interface names

4.1. Wrapping them up: ClassBean.java

import java.util.List;

/**
 * Author: 張風捷特烈<br/>
 * Time: 2019/1/18/018:10:36<br/>
 * Description: basic class information
 */
public class ClassBean {
    public String perFix;//prefix modifiers
    public String name;//class name
    public String fatherName;//superclass name
    public List<String> itfNames;//names of implemented interfaces
    public String fullName;//fully qualified name: package name + class name
}

---->[SourceBean.java]----------
public ClassBean classBean;//basic class information

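4.2: Parsing the basic class information

A helper that returns the word following a target word. It only works when words are separated by exactly one space, which holds for this source.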

/**
 * Gets the word after target (//TODO only works when words are separated by a single space)
 * @param line   the string
 * @param target the target word
 * @return the next word
 */
private String getNextWordBy(String line, String target) {
    if (!line.contains(target + " ") || line.endsWith(target)) {
        return "NOT FOUND";
    }
    return line.split(target + " ")[1].split(" ")[0];
}
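getNextWordBy depends on exactly one space after the target and on the target not appearing inside another word: getNextWordBy("int subclass = 1;", "class") returns "=". A minimal regex-based variant, assuming the next word is a plain Java identifier; Words.nextWord is a hypothetical name, and it returns null instead of a sentinel string:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical variant of getNextWordBy that tolerates any run of whitespace
final class Words {

    // \b + quoted target + \s+ + (\w+): the identifier right after the whole word target
    static String nextWord(String line, String target) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(target) + "\\s+(\\w+)");
        Matcher m = p.matcher(line);
        return m.find() ? m.group(1) : null; // null when the target is not followed by a word
    }
}
```

Pattern.quote keeps the target literal, so a target containing regex metacharacters is still matched as plain text.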

4.3: Parsing the class name, superclass name and implemented interface names

ClassBean classBean = new ClassBean();
String classRegx = ".*\\bclass\\b.*";

String className = "";//class name
String fatherName = "";//superclass name
String perFix = "";//prefix modifiers
ArrayList<String> itfNames = new ArrayList<>();//interface names

---->[inside the loop]-------------
//class name, superclass name, interface names
if (line.matches(classRegx)) {
    perFix = line.split(" class ")[0];
    className = getNextWordBy(line, "class");//class name
    if (line.contains("extends")) {//superclass name
        fatherName = getNextWordBy(line, "extends");
    } else {
        fatherName = "Object";
    }
    if (line.contains("implements")) {
        String implementsStr = line.split("implements ")[1].split(" \\{")[0];
        String[] split = implementsStr.replaceAll(" ", "").split(",");
        itfNames.addAll(Arrays.asList(split));
    }
}
----------------------------

classBean.name = className;
classBean.fatherName = fatherName;
classBean.fullName = pkgName + "." + className;
classBean.itfNames = itfNames;
classBean.perFix = perFix;
sourceBean.classBean = classBean;
System.out.println(sourceBean.classBean.name);//Bundle
System.out.println(sourceBean.classBean.fatherName);//BaseBundle
System.out.println(sourceBean.classBean.fullName);//android.os.Bundle
System.out.println(sourceBean.classBean.perFix);//public final

[Figure: result]
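The same three pieces of information can also come out of a single regex with named groups ((?<name>...), available since Java 7). A sketch for single-line headers without generics or annotations; ClassHeader is a hypothetical class whose fields mirror ClassBean:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical one-regex parser for a single-line, non-generic class header
final class ClassHeader {
    String perFix;                             // modifiers before "class"
    String name;                               // class name
    String fatherName;                         // superclass, "Object" when there is no extends
    List<String> itfNames = new ArrayList<>(); // implemented interfaces

    private static final Pattern HEADER = Pattern.compile(
            "^\\s*(?<prefix>[\\w\\s]*?)\\s*\\bclass\\s+(?<name>\\w+)"
            + "(?:\\s+extends\\s+(?<father>[\\w.]+))?"
            + "(?:\\s+implements\\s+(?<itfs>[\\w.,\\s]+?))?\\s*\\{");

    // Returns null when the line is not a class header of this simple shape
    static ClassHeader parse(String line) {
        Matcher m = HEADER.matcher(line);
        if (!m.find()) return null;
        ClassHeader h = new ClassHeader();
        h.perFix = m.group("prefix").trim();
        h.name = m.group("name");
        h.fatherName = m.group("father") != null ? m.group("father") : "Object";
        if (m.group("itfs") != null) {
            for (String itf : m.group("itfs").split(",")) {
                h.itfNames.add(itf.trim());
            }
        }
        return h;
    }
}
```

Because the prefix group only allows word characters and whitespace, a line such as ClassLoader cl = Bundle.class.getClassLoader(); is rejected instead of being mistaken for a header.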

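5. Getting field information

For now, just collect each field as a string: public List<String> attrs;//member variables

5.1 Matching member variables

Looking at real code, a member variable is declared as: (access modifier) (modifiers) Type name (= default value);

The parts in parentheses are optional. After some thought, I concluded that a line-by-line match cannot tell local variables inside methods from member variables.

So take a global view instead: join the code into one string and split it, relying on the fact that member variables come at the top of the class. Everything between the class's opening { and the next { is treated as the field area.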

StringBuffer pureCodeSb = new StringBuffer();//code without comments

---->[inside the loop, after skipping blank and comment lines]-------------
pureCodeSb.append(line + "\n");
---------------------------------------------

String pureCode = pureCodeSb.toString();//pure code without comments
String attrDirty = pureCode.split("\\{")[1];//messy field text: between the first and second {
System.out.println(attrDirty);

[Figure: getting the field string]
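split("\\{")[1] only sees fields declared before the first method or block, and a line-based regex cannot tell member variables from local ones. A different technique, tracking brace depth, handles both: a statement ending in ; at depth 1 sits directly in the class body. This is a sketch under loud assumptions: braces and semicolons never appear inside string literals or comments (comment lines are already removed), and abstract or interface method declarations, which also end in ;, are collected too. FieldScanner is a hypothetical name:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical field extractor based on brace depth instead of split("\\{")[1]
final class FieldScanner {

    // Collects ;-terminated statements that sit directly in the outermost class body (depth 1).
    // Assumes no braces or semicolons inside string/char literals or comments.
    static List<String> fields(String code) {
        List<String> result = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;
        for (char ch : code.toCharArray()) {
            if (ch == '{') {
                depth++;
                current.setLength(0);      // drop the class or method header text
            } else if (ch == '}') {
                depth--;
                current.setLength(0);      // leaving a body: start a fresh statement
            } else if (depth == 1) {
                if (ch == ';') {
                    String stmt = current.toString().trim().replaceAll("\\s+", " ");
                    if (!stmt.isEmpty()) result.add(stmt);
                    current.setLength(0);
                } else {
                    current.append(ch);
                }
            }
        }
        return result;
    }
}
```

For a class with fields x and S, then a method f() holding a local variable, then a field y, this returns x, S and y but not the local. The split("\\{")[1] region for the same input ends at the header of f() and never reaches y.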

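5.2: Parsing member variables

Split the string on ;. The last piece is the header of the first method (or block) rather than a field, so the loop stops before it: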

private void handleAttr(String code) {
    String attrDirty = code.split("\\{")[1];//messy field text
    String[] split = attrDirty.split(";");
    //the last piece is not a field, so skip it
    for (int i = 0; i < split.length - 1; i++) {
        System.out.println(split[i]);
    }
}

[Figure: after splitting]

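5.3: Tidying up member variables

We don't want the line breaks or the runs of spaces, so the regex \n|( {2,}) replaces each newline and each run of two or more spaces with -: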

//member variable list
attrs = new ArrayList<>();

private void handleAttr(String code) {
    String attrDirty = code.split("\\{")[1];//messy field text
    String[] split = attrDirty.split(";");
    for (int i = 0; i < split.length - 1; i++) {
        //each newline and each run of 2+ spaces becomes "-"
        String result = split[i].replaceAll("\n|( {2,})", "-");
        attrs.add(result);
    }
}

[Figure: after tidying]
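To see exactly what \n|( {2,}) does: every newline and every run of two or more spaces turns into a single -, while single spaces stay. If the goal is just a clean one-line declaration, collapsing all whitespace to one space is an alternative. A small comparison sketch; Tidy, dash and normalize are hypothetical names:

```java
// Hypothetical helpers contrasting the article's tidy-up regex with whitespace normalization
final class Tidy {

    // The article's regex: each newline and each run of 2+ spaces becomes "-"
    static String dash(String s) {
        return s.replaceAll("\n|( {2,})", "-");
    }

    // Alternative: trim, then collapse any whitespace run (newline, tab, spaces) to one space
    static String normalize(String s) {
        return s.trim().replaceAll("\\s+", " ");
    }
}
```

For "\n    private int count", dash gives "--private int count" and normalize gives "private int count"; normalize also joins a declaration that was wrapped over two lines.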

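6. Matching methods

Regex for methods that carry an access modifier: (.*(private|public|protected).*\(.*)\{. The line must contain one of the modifiers and a (, and must end with {.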

[Figure: matching methods]

String methodRegex = "(.*(private|public|protected).*\\(.*)\\{";
ArrayList<String> methods = new ArrayList<>();

//parse method signatures
if (line.matches(methodRegex)) {
    //drop the { and runs of spaces
    String result = line.replaceAll("\\{|( {2,})", "");
    methods.add(result);
}
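The matched lines are whole signatures. If only the name is wanted (for example to count overloads), the identifier right before the first ( is enough. The modifier filter above is still needed, since a line like if (x > 0) { would otherwise yield "if". A minimal sketch, assuming no annotation with arguments such as @SuppressWarnings("x") precedes the name on the same line; MethodNames is a hypothetical name:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls the method (or constructor) name out of a matched signature line
final class MethodNames {

    // The identifier directly before the first '(' is taken as the name
    private static final Pattern NAME = Pattern.compile("(\\w+)\\s*\\(");

    static String nameOf(String signature) {
        Matcher m = NAME.matcher(signature);
        return m.find() ? m.group(1) : null;
    }
}
```

Constructors come out under the class name, for example "Bundle" for public Bundle(int capacity) {.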

With the data in hand, the world is ours. Let's display it.

[Figure: displaying the results]

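III. Optimization and adaptation

There are still limitations: inner classes spoil the result, and reading line by line is no longer enough, so let's swallow the whole file in one go.

1. A small adjustment

I didn't think of the case below earlier. The fix is simple: put a space before class and require the line to end with {: (.*\b class\b.*)\{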

[Figure: adjustment]
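A quick way to see what the adjustment buys: the two regex strings below are copied from the article, and ClassLineFilter is a hypothetical wrapper. One side effect worth knowing: \b must sit between a word character and the space, so a line where class is preceded only by indentation (a package-private inner class such as "    class Inner {") no longer matches.

```java
// Hypothetical wrapper comparing the earlier class regex with the adjusted one
final class ClassLineFilter {

    static final String OLD_CLASS_REGEX = ".*\\bclass\\b.*";         // section 4.3
    static final String NEW_CLASS_REGEX = "(.*\\b class\\b.*)\\{";   // adjusted version

    // String.matches requires the whole line to match
    static boolean oldMatches(String line) {
        return line.matches(OLD_CLASS_REGEX);
    }

    static boolean newMatches(String line) {
        return line.matches(NEW_CLASS_REGEX);
    }
}
```

A class literal such as Bundle.class passes the old regex (a . counts as a word boundary) but fails the new one, which needs a space before class and a trailing {.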

2. Getting the names of inner classes, interfaces and enums

Use the regex (.*\\b (class|interface|enum)\\b.*)\\{ to grab the information:

[Figure: getting inner class, interface and enum names]

ArrayList<String> sonClasses = new ArrayList<>();//class, interface and enum declarations
String code = pureCodeSb.toString();
String classRegx = "(.*\\b (class|interface|enum)\\b.*)\\{";
Pattern pattern = Pattern.compile(classRegx);
Matcher matcher = pattern.matcher(code);

//'.' does not match '\n', so each match stays within one line
while (matcher.find()) {
    String aClass = matcher.group(0);
    String cleaned = aClass.replaceAll("\\{|( {2,})", "");
    System.out.println(cleaned);
    sonClasses.add(cleaned);
}

mMainSource.innerClassName = sonClasses;

---->[SourceBean.java]----------
public List<String> innerClassName;//inner classes, interfaces and enums

[Figure: PowerManager]

[Figure: parsing OK]
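matcher.group(0) returns the whole header line; capture groups can split it into the kind and the name. Unlike (.*\\b (class|interface|enum)\\b.*)\\{, the pattern below does not need a word character before the keyword, so indentation-only declarations such as a package-private interface are found too. A minimal sketch over the comment-free code string, assuming the words class, interface and enum do not appear inside string literals; TypeDeclarations is a hypothetical name:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: lists "kind name" pairs for class/interface/enum declarations
final class TypeDeclarations {

    // \b(class|interface|enum)\s+(\w+) : keyword, whitespace, then the type name;
    // "Foo.class" is skipped because '.' rather than whitespace follows the keyword
    private static final Pattern DECL = Pattern.compile("\\b(class|interface|enum)\\s+(\\w+)");

    static List<String> find(String code) {
        List<String> result = new ArrayList<>();
        Matcher m = DECL.matcher(code);
        while (m.find()) {
            result.add(m.group(1) + " " + m.group(2));
        }
        return result;
    }
}
```

The first entry is the top-level type itself; everything after it is an inner type.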

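That's it for V0.01. There is of course plenty left to improve, for example:

parsing the inner classes further;

parsing the field and method strings further;

building a custom View from the parsed data to present the source information nicely, such as a different color for each modifier, or a chart of the ratio of private to public methods;

presenting the comments as well.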
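The private versus public chart only needs a count per access modifier. A minimal sketch over method signature strings, assuming the first public/protected/private keyword on a line is the method's own modifier; ModifierStats is a hypothetical name. Note that the methodRegex from section 6 only keeps lines with an explicit modifier, so the package-private bucket stays empty unless that filter is relaxed.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical tally behind a "private vs public methods" chart
final class ModifierStats {

    private static final Pattern ACCESS = Pattern.compile("\\b(public|protected|private)\\b");

    // Counts signatures per access level; no modifier means package-private
    static Map<String, Integer> tally(List<String> signatures) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("public", 0);
        counts.put("protected", 0);
        counts.put("private", 0);
        counts.put("package-private", 0);
        for (String sig : signatures) {
            Matcher m = ACCESS.matcher(sig);
            String key = m.find() ? m.group(1) : "package-private";
            counts.put(key, counts.get(key) + 1);
        }
        return counts;
    }
}
```

The insertion-ordered map can be handed straight to a chart view as label/value pairs.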

Finally, here is the complete source code:

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Author: 張風捷特烈<br/>
 * Time: 2019/1/18/018:8:33<br/>
 * Description: source code analyzer
 */
public class JavaSourceParser {

    private List<String> attrs;
    private int annoLineCount;//comment lines
    private int blankLineCount;//blank lines
    private int allCount;//total lines
    private StringBuffer pureCodeSb;
    //private StringBuffer codeSb;

    private boolean mainOk;
    private final SourceBean mMainSource;

    public JavaSourceParser() {
        mMainSource = new SourceBean();
    }

    public SourceBean parse(String name) {
        File file = new File(name);
        BufferedReader br = null;
        //codeSb = new StringBuffer();
        //code without comments
        pureCodeSb = new StringBuffer();
        //member variables
        attrs = new ArrayList<>();

        String aLine;

        String packageRegx = "\\bpackage\\b.*";
        String importRegx = "\\bimport\\b.*";
        String pkgName = "";
        ArrayList<String> importClasses = new ArrayList<>();//imported class names
        ArrayList<String> sonClasses = new ArrayList<>();//class, interface and enum declarations

        try {
            br = new BufferedReader(new FileReader(file));

            while ((aLine = br.readLine()) != null) {
                //codeSb.append(aLine + "\n");
                //counting
                allCount++;
                if (aLine.contains("*")) {
                    annoLineCount++;
                    continue;
                }
                if (aLine.equals("")) {
                    blankLineCount++;
                    continue;
                }
                pureCodeSb.append(aLine + "\n");
                //package name
                if (aLine.matches(packageRegx)) {
                    pkgName = aLine.split("package ")[1].replace(";", "");
                }
                //imported class names
                if (aLine.matches(importRegx)) {
                    String importClass = aLine.split("import ")[1].replace(";", "");
                    importClasses.add(importClass);
                }
            }

        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                if (br != null) {
                    br.close();
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }

        String code = pureCodeSb.toString();
        String classRegx = "(.*\\b (class|interface|enum)\\b.*)\\{";

        Pattern pattern = Pattern.compile(classRegx);
        Matcher matcher = pattern.matcher(code);
        while (matcher.find()) {
            String aClass = matcher.group(0);
            System.out.println(aClass.replaceAll("\\{|( {2,})", ""));
            sonClasses.add(aClass.replaceAll("\\{|( {2,})", ""));
        }
        SourceBean sourceBean = parseCode(code, mMainSource);
        mMainSource.pkgName = pkgName;
        mMainSource.importClass = importClasses;
        mMainSource.innerClassName = sonClasses;
        mMainSource.classBean.fullName = mMainSource.pkgName + "." + mMainSource.classBean.name;

        return sourceBean;
    }

    private SourceBean parseCode(String code, SourceBean sourceBean) {
        CountBean countBean = new CountBean();
        ClassBean classBean = new ClassBean();

        String classRegx = "(.*\\b class\\b.*)\\{";
        String methodRegex = "(.*(private|public|protected).*\\(.*)\\{";

        ArrayList<String> methods = new ArrayList<>();

        String className = "";//class name
        String fatherName = "";//superclass name
        String perFix = "";//prefix modifiers
        ArrayList<String> itfNames = new ArrayList<>();//interface names

        String[] lines = code.split("\n");
        for (String line : lines) {
            //method signatures
            if (line.matches(methodRegex)) {
                String result = line.replaceAll("\\{|( {2,})", "");
                methods.add(result);
            }
            //class name, superclass name, interface names (first class header only)
            if (line.matches(classRegx) && !mainOk) {
                perFix = line.split(" class ")[0];
                className = getNextWordBy(line, "class");//class name
                if (line.contains("extends")) {//superclass name
                    fatherName = getNextWordBy(line, "extends");
                } else {
                    fatherName = "Object";
                }
                if (line.contains("implements")) {
                    String implementsStr = line.split("implements ")[1].split(" \\{")[0];
                    String[] split = implementsStr.replaceAll(" ", "").split(",");
                    itfNames.addAll(Arrays.asList(split));
                }
                mainOk = true;
            }
        }

        handleAttr(code);//pure code without comments
        countBean.annoLineCount = annoLineCount;
        countBean.blankLineCount = blankLineCount;
        countBean.lineCount = allCount;
        countBean.realLineCount = allCount - blankLineCount - annoLineCount;
        countBean.attrCount = attrs.size();
        countBean.methodCount = methods.size();

        sourceBean.countBean = countBean;

        classBean.name = className;
        classBean.fatherName = fatherName;

        classBean.itfNames = itfNames;
        classBean.perFix = perFix;
        sourceBean.classBean = classBean;

        sourceBean.attrs = attrs;
        sourceBean.methods = methods;

        return sourceBean;
    }

    private void handleAttr(String code) {
        String attrDirty = code.split("\\{")[1];//messy field text
        String[] split = attrDirty.split(";");
        for (int i = 0; i < split.length - 1; i++) {
            String result = split[i].replaceAll("\n|( {2,})", "-");
            attrs.add(result);
        }
    }

    /**
     * Gets the word after target (//TODO only works when words are separated by a single space)
     *
     * @param line   the string
     * @param target the target word
     * @return the next word
     */
    private String getNextWordBy(String line, String target) {
        if (!line.contains(target + " ") || line.endsWith(target)) {
            return "NOT FOUND";
        }
        return line.split(target + " ")[1].split(" ")[0];
    }
}

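Postscript: 捷文 conventions

1. Growth record and errata for this article

Project source	Date	Notes
V0.1-github	2018-1-18	Forging a Regex Weapon: A Java Source Code Analyzer, V0.01

2. More about me

Pen name	QQ	WeChat	Hobby
張風捷特烈	1981462002	zdl1994328	Languages
My GitHub	My Jianshu	My Juejin	Personal website

3. Statement

1----This article is original work by 張風捷特烈; please credit the source when reposting

2----Programming enthusiasts are welcome to exchange ideas

3----My abilities are limited; if anything is wrong, criticism and corrections are welcome, and I will humbly fix it

4----Thank you for reading this far and for your support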
