Writing a Compiler from Scratch (13): Code Generation by Traversing the AST

The complete code for this project is in C2j-Compiler

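Preface

In the previous article we finished generating JVM instructions, so now we can get into code generation proper. Modern compilers usually generate an IR first, optimize it, and only then compile it to code for the target platform. Given my limited time and skill, this project has neither an IR nor any optimization: it generates Java bytecode directly from the AST.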

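Entry point

Code generation starts in CodeGen and works the same way as the earlier interpreter: it first fetches the head node of the main function, and from that node it enters the function definition first and then the code block.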
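One way to picture this traversal is as a recursive walk that visits a node's children left to right before handling the node itself, so each parent sees the attributes its children computed; for a function, that means the function definition is visited before the code block. The sketch below assumes that post-order shape. Node, Walker and the production names are hypothetical stand-ins, not C2j-Compiler classes, and "handling" a node is reduced to recording its production.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified AST node: a production name plus its children.
class Node {
    final String production;
    final List<Node> children = new ArrayList<>();

    Node(String production, Node... kids) {
        this.production = production;
        for (Node k : kids) {
            children.add(k);
        }
    }
}

// Hypothetical walker: children first (left to right), then the node itself,
// so a parent is handled only after its children have produced their results.
class Walker {
    final List<String> visited = new ArrayList<>();

    void walk(Node node) {
        for (Node child : node.children) {
            walk(child);
        }
        // Stand-in for "generate code for this production".
        visited.add(node.production);
    }
}
```

For a tree ExtDef(FunctDecl, CompoundStmt(Statement)), the walk handles FunctDecl, then Statement, then CompoundStmt, and finally ExtDef.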

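Function definition nodes

When we enter a function definition node, we generate the Java bytecode for that function definition, namely a static method. The whole C source file is compiled into a single class: main becomes public static main, every other function becomes a static method of that class, and structs become separate classes.

  • For a function definition, first get the function name and its parameters from the node
  • emitArgs processes the parameters and produces the matching bytecode descriptor for them (for quicksort in the example at the end of this article, ([III)V)
  • main has already been handled earlier, with much the same logic (see start.Start)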
case SyntaxProductionInit.NewName_LP_RP_TO_FunctDecl:
    root.reverseChildren();
    AstNode n = root.getChildren().get(0);
    String name = (String) n.getAttribute(NodeKey.TEXT);
    symbol = (Symbol) root.getAttribute(NodeKey.SYMBOL);
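    generator.setCurrentFuncName(name);
    // main is emitted separately (see start.Start); every other function
    // becomes a public static method whose descriptor is built by emitArgs
    if (name != null && !name.equals("main")) {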
        String declaration = name + emitArgs(symbol);
        generator.emitDirective(Directive.METHOD_PUBBLIC_STATIC, declaration);
        generator.setNameAndDeclaration(name, declaration);
    }
    copyChild(root, root.getChildren().get(0));
    break;

case SyntaxProductionInit.NewName_LP_VarList_RP_TO_FunctDecl:
    n = root.getChildren().get(0);
    name = (String) n.getAttribute(NodeKey.TEXT);
    symbol = (Symbol) root.getAttribute(NodeKey.SYMBOL);
    generator.setCurrentFuncName(name);
    if (name != null && !name.equals("main")) {
        String declaration = name + emitArgs(symbol);
        generator.emitDirective(Directive.METHOD_PUBBLIC_STATIC, declaration);
        generator.setNameAndDeclaration(name, declaration);
    }

    // argsList is declared outside this excerpt (it is not a local of this case)
    Symbol args = symbol.getArgList();

    if (args == null || argsList == null || argsList.isEmpty()) {
        System.err.println("generate function with arg list but arg list is null");
        System.exit(1);
    }
    break;
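The declaration string name + emitArgs(symbol) ends up in the .method directive, e.g. quicksort([III)V in the generated output at the end of this article. The mapping from C parameter types to JVM descriptors can be sketched as follows. This is an assumed, simplified version that only supports int, int arrays and a void or int return; Param, Descriptors and their methods are hypothetical, not the real emitArgs.

```java
import java.util.List;

// Hypothetical parameter description: a C scalar type name and whether the
// parameter was declared as an array (e.g. int A[10]).
class Param {
    final String type;
    final boolean isArray;

    Param(String type, boolean isArray) {
        this.type = type;
        this.isArray = isArray;
    }
}

class Descriptors {
    // Maps a supported C scalar type to its JVM field descriptor.
    static String typeDescriptor(String cType) {
        if (cType.equals("int")) {
            return "I";
        }
        throw new IllegalArgumentException("unsupported type: " + cType);
    }

    // Builds "(<param descriptors>)<return descriptor>"; an array parameter
    // gets a leading '[' and a void return becomes "V".
    static String methodDescriptor(List<Param> params, String returnType) {
        StringBuilder sb = new StringBuilder("(");
        for (Param p : params) {
            if (p.isArray) {
                sb.append('[');
            }
            sb.append(typeDescriptor(p.type));
        }
        sb.append(')');
        sb.append(returnType.equals("void") ? "V" : typeDescriptor(returnType));
        return sb.toString();
    }
}
```

With quicksort's parameters (int A[10], int p, int r) and a void return, "quicksort" + methodDescriptor(...) yields quicksort([III)V.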

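Creating structs and arrays

Arrays

Definitions of structs and arrays are handled in DefGenerate. As you can see, only arrays and ordinary variables are handled there; a struct is handled the first time it is used. Note also that code generation does not handle initializers (initial values given in a declaration).

  • If it is an array, call ProgramGenerator directly to emit the instructions that create the array
  • If it is an ordinary variable, look up its slot and assign 0 to it (the variable's position in the local variable table is computed from the symbol table; see getLocalVariableIndex in the previous article)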
public class DefGenerate extends BaseGenerate {
    @Override
    public Object generate(AstNode root) {
        int production = (int) root.getAttribute(NodeKey.PRODUCTION);
        ProgramGenerator generator = ProgramGenerator.getInstance();
        Symbol symbol = (Symbol) root.getAttribute(NodeKey.SYMBOL);

        switch (production) {
            case SyntaxProductionInit.Specifiers_DeclList_Semi_TO_Def:
                Declarator declarator = symbol.getDeclarator(Declarator.ARRAY);
                if (declarator != null) {
                    // arrays of structs are created on first use, not here
                    if (symbol.getSpecifierByType(Specifier.STRUCTURE) == null) {
                        generator.createArray(symbol);
                    }
                } else {
                    // plain variable: store 0 into its local-variable slot
                    int i = generator.getLocalVariableIndex(symbol);
                    generator.emit(Instruction.SIPUSH, "" + 0);
                    generator.emit(Instruction.ISTORE, "" + i);
                }

                break;

            default:
                break;
        }

        return root;
    }
}
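For the declarations in main in the example at the end of this article, this produces sipush 10, newarray int, astore 0 for int a[10]; and sipush 0, istore 1 for int i;. A minimal sketch of those two paths, using a hypothetical Emitter that just collects instruction text and hands out local-variable slots in declaration order (in the real compiler the slot comes from getLocalVariableIndex and the symbol table):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical emitter: collects assembly-style instruction lines and
// gives each newly seen variable the next free local slot.
class Emitter {
    final List<String> code = new ArrayList<>();
    private final Map<String, Integer> slots = new HashMap<>();

    int slotOf(String name) {
        Integer slot = slots.get(name);
        if (slot == null) {
            slot = slots.size();
            slots.put(name, slot);
        }
        return slot;
    }

    // int name[size]; -> allocate an int array and store its reference.
    // (sipush only covers sizes in the range -32768..32767.)
    void defineArray(String name, int size) {
        code.add("sipush " + size);
        code.add("newarray int");
        code.add("astore " + slotOf(name));
    }

    // int name; -> initialize the variable to 0.
    void defineInt(String name) {
        code.add("sipush 0");
        code.add("istore " + slotOf(name));
    }
}
```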

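Structs

Struct definitions are handled in UnaryNodeGenerate, which means a struct is only defined at the point where it is first used.

  • First get the symbol of the current UNARY node. If it is an instanceof ArrayValueSetter, it is an array of structs, so getStructSymbolFromStructArray creates the struct array and returns the struct object at the current index
  • Set the scope of the current struct
  • Declare the struct as a class
  • Then read the requested field of the struct
  • The pointer-related parts can be ignored, since code generation does not simulate pointers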
case SyntaxProductionInit.Unary_StructOP_Name_TO_Unary:
    child = root.getChildren().get(0);
    String fieldName = (String) root.getAttribute(NodeKey.TEXT);
    Object object = child.getAttribute(NodeKey.SYMBOL);
    boolean isStructArray = false;

    if (object instanceof ArrayValueSetter) {
        symbol = getStructSymbolFromStructArray(object);
        symbol.addValueSetter(object);
        isStructArray = true;
    } else {
        symbol = (Symbol) child.getAttribute(NodeKey.SYMBOL);
    }

    if (isStructArray) {
        ArrayValueSetter vs = (ArrayValueSetter) object;
        Symbol structArray = vs.getSymbol();
        structArray.addScope(ProgramGenerator.getInstance().getCurrentFuncName());
    } else {
        symbol.addScope(ProgramGenerator.getInstance().getCurrentFuncName());
    }

    ProgramGenerator.getInstance().putStructToClassDeclaration(symbol);

    if (isSymbolStructPointer(symbol)) {
        copyBetweenStructAndMem(symbol, false);
    }

    Symbol args = symbol.getArgList();
    while (args != null) {
        if (args.getName().equals(fieldName)) {
            args.setStructParent(symbol);
            break;
        }

        args = args.getNextSymbol();
    }

    if (args == null) {
        System.err.println("access a field not in struct object!");
        System.exit(1);
    }

    if (args.getValue() != null) {
        ProgramGenerator.getInstance().readValueFromStructMember(symbol, args);
    }

    root.setAttribute(NodeKey.SYMBOL, args);
    root.setAttribute(NodeKey.VALUE, args.getValue());

    if (isSymbolStructPointer(symbol)) {
        checkValidPointer(symbol);
        structObjSymbol = symbol;
        monitorSymbol = args;

        GenerateBrocasterImpl.getInstance().registerReceiverForAfterExe(this);
    } else {
        structObjSymbol = null;
    }
    break;
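The field lookup in the middle of this case walks the struct's members as a singly linked list (getArgList(), then getNextSymbol()) until the name matches, and treats running off the end as an error. The same idea in isolation, with a hypothetical Member class standing in for Symbol:

```java
// Hypothetical stand-in for a struct member symbol: the members form a
// singly linked list, like getArgList()/getNextSymbol() above.
class Member {
    final String name;
    final Member next;

    Member(String name, Member next) {
        this.name = name;
        this.next = next;
    }

    // Returns the member named fieldName, or throws if the struct has no such field.
    static Member findField(Member first, String fieldName) {
        Member m = first;
        while (m != null) {
            if (m.name.equals(fieldName)) {
                return m;
            }
            m = m.next;
        }
        throw new IllegalArgumentException("access a field not in struct object: " + fieldName);
    }
}
```

Throwing an exception instead of calling System.exit keeps the lookup testable; the caller can still report the error and stop.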

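Unary operation nodes

This node has a lot in common with its interpreter counterpart. Besides the struct operations above, the other cases also play a very important role:

  • As before, numbers, strings, variables and the like simply pass their information up to the parent node, which does the actual processing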
case SyntaxProductionInit.Number_TO_Unary:
    text = (String) root.getAttribute(NodeKey.TEXT);
    boolean isFloat = text.indexOf('.') != -1;
    if (isFloat) {
        value = Float.valueOf(text);
        root.setAttribute(NodeKey.VALUE, value);
    } else {
        value = Integer.valueOf(text);
        root.setAttribute(NodeKey.VALUE, value);
    }
    break;

case SyntaxProductionInit.Name_TO_Unary:
    symbol = (Symbol) root.getAttribute(NodeKey.SYMBOL);
    if (symbol != null) {
        root.setAttribute(NodeKey.VALUE, symbol.getValue());
        root.setAttribute(NodeKey.TEXT, symbol.getName());
    }
    break;

case SyntaxProductionInit.String_TO_Unary:
    text = (String) root.getAttribute(NodeKey.TEXT);
    root.setAttribute(NodeKey.VALUE, text);
    break;

case SyntaxProductionInit.Unary_LB_Expr_RB_TO_Unary:
    child = root.getChildren().get(0);
    symbol = (Symbol) child.getAttribute(NodeKey.SYMBOL);

    child = root.getChildren().get(1);
    int index = 0;
    if (child.getAttribute(NodeKey.VALUE) != null) {
        index = (Integer) child.getAttribute(NodeKey.VALUE);
    }
    Object idxObj = child.getAttribute(NodeKey.SYMBOL);

    try {
        Declarator declarator = symbol.getDeclarator(Declarator.ARRAY);
        if (declarator != null) {
            Object val = declarator.getElement((int) index);
            root.setAttribute(NodeKey.VALUE, val);
            ArrayValueSetter setter;
            if (idxObj == null) {
                setter = new ArrayValueSetter(symbol, index);
            } else {
                setter = new ArrayValueSetter(symbol, idxObj);
            }

            root.setAttribute(NodeKey.SYMBOL, setter);
            root.setAttribute(NodeKey.TEXT, symbol.getName());

        }
        Declarator pointer = symbol.getDeclarator(Declarator.POINTER);
        if (pointer != null) {
            setPointerValue(root, symbol, index);
            PointerValueSetter pv = new PointerValueSetter(symbol, index);
            root.setAttribute(NodeKey.SYMBOL, pv);
            root.setAttribute(NodeKey.TEXT, symbol.getName());
        }
    } catch (Exception e) {
        e.printStackTrace();
        System.exit(1);
    }
    break;
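The Number_TO_Unary case above decides between float and int purely by looking for a decimal point. Pulled out as a standalone helper (Literals and parseNumber are hypothetical names), it behaves like this. As in the original, a literal such as 1e3 contains no '.', so it goes to Integer.valueOf and throws NumberFormatException.

```java
// Hypothetical helper mirroring the Number_TO_Unary case: a '.' in the
// literal text means float, otherwise the text is parsed as an int.
class Literals {
    static Object parseNumber(String text) {
        boolean isFloat = text.indexOf('.') != -1;
        if (isFloat) {
            return Float.valueOf(text);
        }
        return Integer.valueOf(text);
    }
}
```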

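Assignment

This code belongs to the symbol being assigned to (note this.value and getLocalVariableIndex(this)):

  • If the value being assigned is an array element, first get the array's symbol and the index
  • If it is not an array of structs, pass the index to readArrayElement to emit the instructions that read the array element
  • If it is a symbol, use getLocalVariableIndex to find its slot and load its value with ILOAD
  • If it is a constant, emit a SIPUSH instruction directly
  • Finally perform the assignment itself: if the target is not a struct field, use getLocalVariableIndex to get its slot in the local variable table and emit ISTORE
  • If the target is a field of an element of a struct array, call assignValueToStructMemberFromArray to generate the code; if it is a field of a plain struct, call assignValueToStructMember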
ProgramGenerator generator = ProgramGenerator.getInstance();

if (BaseGenerate.resultOnStack) {
    this.value = obj;
    BaseGenerate.resultOnStack = false;
} else if (obj instanceof ArrayValueSetter) {
    ArrayValueSetter setter = (ArrayValueSetter) obj;
    Symbol symbol = setter.getSymbol();
    Object index = setter.getIndex();
    if (symbol.getSpecifierByType(Specifier.STRUCTURE) == null) {
        if (index instanceof Symbol) {
            ProgramGenerator.getInstance().readArrayElement(symbol, index);
            if (((Symbol) index).getValue() != null) {
                int i = (int) ((Symbol) index).getValue();
                try {
                    this.value = symbol.getDeclarator(Declarator.ARRAY).getElement(i);
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        } else {
            int i = (int) index;
            try {
                this.value = symbol.getDeclarator(Declarator.ARRAY).getElement(i);
            } catch (Exception e) {
                e.printStackTrace();
            }

            ProgramGenerator.getInstance().readArrayElement(symbol, index);
        }
    }
} else if (obj instanceof Symbol) {
    Symbol symbol = (Symbol) obj;
    this.value = symbol.value;
    int i = generator.getLocalVariableIndex(symbol);
    generator.emit(Instruction.ILOAD, "" + i);
} else if (obj instanceof Integer) {
    Integer val = (Integer) obj;
    generator.emit(Instruction.SIPUSH, "" + val);
    this.value = obj;
}

if (!this.isStructMember()) {
    int idx = generator.getLocalVariableIndex(this);
    if (!generator.isPassingArguments()) {
        generator.emit(Instruction.ISTORE, "" + idx);
    }
} else {
    if (this.getStructSymbol().getValueSetter() != null) {
        generator.assignValueToStructMemberFromArray(this.getStructSymbol().getValueSetter(), this, this.value);
    } else {
        generator.assignValueToStructMember(this.getStructSymbol(), this, this.value);
    }
}
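Ignoring structs and the resultOnStack shortcut, the instruction selection above boils down to: load the right-hand side according to its kind, then istore into the target's slot. A sketch with hypothetical types (Rhs, its three kinds and Assign are assumptions, and slot numbers are passed in rather than looked up in a symbol table):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical right-hand side: an int constant, a local int variable,
// or an element arr[idx] where both the array and the index are locals.
class Rhs {
    enum Kind { CONSTANT, VARIABLE, ARRAY_ELEMENT }

    final Kind kind;
    final int a; // constant value, variable slot, or array slot
    final int b; // index slot (ARRAY_ELEMENT only)

    private Rhs(Kind kind, int a, int b) {
        this.kind = kind;
        this.a = a;
        this.b = b;
    }

    static Rhs constant(int value) { return new Rhs(Kind.CONSTANT, value, -1); }
    static Rhs variable(int slot) { return new Rhs(Kind.VARIABLE, slot, -1); }
    static Rhs element(int arraySlot, int indexSlot) { return new Rhs(Kind.ARRAY_ELEMENT, arraySlot, indexSlot); }
}

class Assign {
    // Emits the code for "target = rhs;" where target is a local int in targetSlot.
    static List<String> emit(int targetSlot, Rhs rhs) {
        List<String> code = new ArrayList<>();
        switch (rhs.kind) {
            case CONSTANT:
                code.add("sipush " + rhs.a);
                break;
            case VARIABLE:
                code.add("iload " + rhs.a);
                break;
            case ARRAY_ELEMENT:
                code.add("aload " + rhs.a); // array reference
                code.add("iload " + rhs.b); // index
                code.add("iaload");         // pushes arr[index]
                break;
        }
        code.add("istore " + targetSlot);
        return code;
    }
}
```

For t = A[i]; in quicksort (A in slot 0, i in slot 3, t in slot 7), Assign.emit(7, Rhs.element(0, 3)) gives aload 0, iload 3, iaload, istore 7, the same sequence that appears in the generated quicksort method.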

Finally

With this part complete, for the following code:

void quicksort(int A[10], int p, int r) {
    int x;
    int i;
    i = p - 1;
    int j;
    int t;
    int v;
    v = r - 1;
    if (p < r) {
        x = A[r];
        for (j = p; j <= v; j++) {
            if (A[j] <= x) {
                i++;
                t = A[i];
                A[i] = A[j];
                A[j] = t;
            }
        }
        v = i + 1;
        t = A[v];
        A[v] = A[r];
        A[r] = t;
        t = v - 1;
        quicksort(A, p, t);
        t = v + 1;
        quicksort(A, t, r);
    }
}

void main () {
    int a[10];
    int i;
    int t;
    printf("before quick sort:");
    for(i = 0; i < 10; i++) {
        t = (10 - i);
        a[i] = t;
        printf("value of a[%d] is %d", i, a[i]);
    }
    quicksort(a, 0, 9);
    printf("after quick sort:");
    for (i = 0; i < 10; i++) {
        printf("value of a[%d] is %d", i, a[i]);
    }
}

the following Java bytecode is generated:

.class public C2Bytecode
.super java/lang/Object

.method public static main([Ljava/lang/String;)V
    sipush  10
    newarray    int
    astore  0
    sipush  0
    istore  1
    sipush  0
    istore  2
    getstatic   java/lang/System/out Ljava/io/PrintStream;
    ldc "before quick sort:"
    invokevirtual   java/io/PrintStream/print(Ljava/lang/String;)V
    getstatic   java/lang/System/out Ljava/io/PrintStream;
    ldc "
"
    invokevirtual   java/io/PrintStream/print(Ljava/lang/String;)V
    sipush  0
    istore  1

loop0:
    iload   1
    sipush  10
if_icmpge branch0
    sipush  10
    iload   1
    isub
    istore  2
    aload   0
    iload   1
    iload   2
    iastore
    aload   0
    iload   1
    iaload
    istore  3
    iload   1
    istore  4
    getstatic   java/lang/System/out Ljava/io/PrintStream;
    ldc "value of a["
    invokevirtual   java/io/PrintStream/print(Ljava/lang/String;)V
    getstatic   java/lang/System/out Ljava/io/PrintStream;
    iload   4
    invokevirtual   java/io/PrintStream/print(I)V
    getstatic   java/lang/System/out Ljava/io/PrintStream;
    ldc "] is "
    invokevirtual   java/io/PrintStream/print(Ljava/lang/String;)V
    getstatic   java/lang/System/out Ljava/io/PrintStream;
    iload   3
    invokevirtual   java/io/PrintStream/print(I)V
    getstatic   java/lang/System/out Ljava/io/PrintStream;
    ldc "
"
    invokevirtual   java/io/PrintStream/print(Ljava/lang/String;)V
    iload   1
    sipush  1
    iadd
    istore  1
goto loop0
branch0:
    aload   0
    sipush  0
    sipush  9
    invokestatic    C2Bytecode/quicksort([III)V
    getstatic   java/lang/System/out Ljava/io/PrintStream;
    ldc "after quick sort:"
    invokevirtual   java/io/PrintStream/print(Ljava/lang/String;)V
    getstatic   java/lang/System/out Ljava/io/PrintStream;
    ldc "
"
    invokevirtual   java/io/PrintStream/print(Ljava/lang/String;)V
    sipush  0
    istore  1

loop2:
    iload   1
    sipush  10
if_icmpge branch4
    aload   0
    iload   1
    iaload
    istore  3
    iload   1
    istore  4
    getstatic   java/lang/System/out Ljava/io/PrintStream;
    ldc "value of a["
    invokevirtual   java/io/PrintStream/print(Ljava/lang/String;)V
    getstatic   java/lang/System/out Ljava/io/PrintStream;
    iload   4
    invokevirtual   java/io/PrintStream/print(I)V
    getstatic   java/lang/System/out Ljava/io/PrintStream;
    ldc "] is "
    invokevirtual   java/io/PrintStream/print(Ljava/lang/String;)V
    getstatic   java/lang/System/out Ljava/io/PrintStream;
    iload   3
    invokevirtual   java/io/PrintStream/print(I)V
    getstatic   java/lang/System/out Ljava/io/PrintStream;
    ldc "
"
    invokevirtual   java/io/PrintStream/print(Ljava/lang/String;)V
    iload   1
    sipush  1
    iadd
    istore  1
goto loop2
branch4:
    return
.end method
.method public static quicksort([III)V
    sipush  2
    newarray    int
    astore  6
    sipush  0
    istore  5
    sipush  1
    istore  5
    aload   6
    iload   5
    sipush  1
    iastore
    aload   6
    sipush  1
    iaload
    istore  10
    getstatic   java/lang/System/out Ljava/io/PrintStream;
    ldc "before quick sort: "
    invokevirtual   java/io/PrintStream/print(Ljava/lang/String;)V
    getstatic   java/lang/System/out Ljava/io/PrintStream;
    iload   10
    invokevirtual   java/io/PrintStream/print(I)V
    getstatic   java/lang/System/out Ljava/io/PrintStream;
    ldc "
"
    invokevirtual   java/io/PrintStream/print(Ljava/lang/String;)V
    sipush  0
    istore  9
    sipush  0
    istore  3
    iload   1
    sipush  1
    isub
    istore  3
    sipush  0
    istore  4
    sipush  0
    istore  7
    sipush  0
    istore  8
    iload   2
    sipush  1
    isub
    istore  8
    iload   1
    iload   2
if_icmpge branch1

    aload   0
    iload   2
    iaload
    istore  9
    iload   1
    istore  4

loop1:

    iload   4
    iload   8
if_icmpgt ibranch1

    aload   0
    iload   4
    iaload
    iload   9
if_icmpgt ibranch2

    iload   3
    sipush  1
    iadd
    istore  3
    aload   0
    iload   3
    iaload
    istore  7
    aload   0
    iload   3
    aload   0
    iload   4
    iaload
    iastore
    aload   0
    iload   4
    iload   7
    iastore
ibranch2:

    iload   4
    sipush  1
    iadd
    istore  4
goto loop1

ibranch1:

    iload   3
    sipush  1
    iadd
    istore  8
    aload   0
    iload   8
    iaload
    istore  7
    aload   0
    iload   8
    aload   0
    iload   2
    iaload
    iastore
    aload   0
    iload   2
    iload   7
    iastore
    iload   8
    sipush  1
    isub
    istore  7
    aload   0
    iload   1
    iload   7
    invokestatic    C2Bytecode/quicksort([III)V
    iload   8
    sipush  1
    iadd
    istore  7
    aload   0
    iload   7
    iload   2
    invokestatic    C2Bytecode/quicksort([III)V
branch1:

    return
.end method

.end class
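Every for loop in the output above follows one shape: the initializer, a loopN: label, the condition compiled as the inverted comparison (i < 10 becomes if_icmpge, which jumps past the loop when the condition is false), the body, the increment, goto loopN, and the exit label. A sketch of that template; ForLoop, its methods and the label naming are hypothetical, the body and increment are passed in as already-generated lines, and a single id is used for both labels (the real generator numbers them with its own counters, e.g. loop2/branch4 for the second loop in main).

```java
import java.util.ArrayList;
import java.util.List;

class ForLoop {
    // Inverted jump for each C relational operator: the branch is taken
    // when the loop condition is false.
    static String exitJump(String op) {
        switch (op) {
            case "<":  return "if_icmpge";
            case "<=": return "if_icmpgt";
            case ">":  return "if_icmple";
            case ">=": return "if_icmplt";
            case "==": return "if_icmpne";
            case "!=": return "if_icmpeq";
            default: throw new IllegalArgumentException("unsupported operator: " + op);
        }
    }

    // for (init; lhs op rhs; step) body
    static List<String> emit(int id, List<String> init, List<String> lhs, String op,
                             List<String> rhs, List<String> body, List<String> step) {
        List<String> code = new ArrayList<>(init);
        code.add("loop" + id + ":");
        code.addAll(lhs);
        code.addAll(rhs);
        code.add(exitJump(op) + " branch" + id);
        code.addAll(body);
        code.addAll(step);
        code.add("goto loop" + id);
        code.add("branch" + id + ":");
        return code;
    }
}
```

The same table explains the if_icmpgt in quicksort's loop, which comes from the condition j <= v.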

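Summary

Code generation in this article follows much the same approach as the earlier interpreter: both walk the AST and, based on each node's production, either execute code or generate it.

The main idea is quite clear; there are just so many details that it is easy to get bogged down in them. This series is essentially my own study notes, and it has now reached thirteen articles. The next one will probably be a wrap-up, after which the series will officially end.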
