LLVM
LLVM:
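A collection of modular, reusable compiler and toolchain technologies.
Founder: Chris Lattner
LLVM is not an abbreviation of Low Level Virtual Machine; "LLVM" itself is the project's full name. (The name began as that abbreviation, but the project has long outgrown it.)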
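Traditional compiler architecture:
- Frontend
lexical analysis, syntax analysis, semantic analysis, and intermediate code generation
- Optimizer
optimizes the intermediate code
- Backend
generates machine code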
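LLVM architecture
- Different frontends and backends share one intermediate code, the LLVM Intermediate Representation (LLVM IR).
- Supporting a new programming language only requires implementing a new frontend.
- Supporting a new hardware target only requires adding a new backend.
- Optimization is a common stage that works on the unified LLVM IR, so neither a new language nor a new hardware target requires any change to the optimizer.
- By contrast, GCC's frontend and backend are not cleanly separated; they are coupled together, which makes it hard for GCC to support a new language or a new hardware target.
- LLVM is now used as common infrastructure for implementing a wide range of statically compiled and runtime-compiled languages (the GCC family, Java, .NET, Python, and others).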
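The benefit of a shared intermediate representation can be sketched in a few lines of C. This is only a toy under stated assumptions: `ToyIR`, the two "frontends", the folding "optimizer" and the two "backends" are hypothetical names invented for this illustration and have nothing to do with LLVM's real API or real LLVM IR. The point is that either frontend can be paired with either backend, and the optimizer is written once.

```c
#include <stdio.h>

/* Hypothetical toy IR (not LLVM IR): one binary operation on two constants. */
typedef struct {
    char op;   /* '+', '-' or '*' */
    int  lhs;
    int  rhs;
} ToyIR;

/* Frontend 1: an infix "language" such as "3 + 4". Returns 1 on success. */
int frontend_infix(const char *src, ToyIR *out) {
    return sscanf(src, "%d %c %d", &out->lhs, &out->op, &out->rhs) == 3;
}

/* Frontend 2: a prefix "language" such as "+ 3 4". Returns 1 on success. */
int frontend_prefix(const char *src, ToyIR *out) {
    return sscanf(src, " %c %d %d", &out->op, &out->lhs, &out->rhs) == 3;
}

/* Shared optimizer: constant-folds the IR. Returns 1 on success. */
int optimize_fold(const ToyIR *ir, int *value) {
    switch (ir->op) {
    case '+': *value = ir->lhs + ir->rhs; return 1;
    case '-': *value = ir->lhs - ir->rhs; return 1;
    case '*': *value = ir->lhs * ir->rhs; return 1;
    default:  return 0;   /* unknown operator */
    }
}

/* Backend 1: emits x86-64-flavoured text (AT&T syntax). */
void backend_x86(int value, char *buf, size_t size) {
    snprintf(buf, size, "movl $%d, %%eax", value);
}

/* Backend 2: emits AArch64-flavoured text. */
void backend_arm64(int value, char *buf, size_t size) {
    snprintf(buf, size, "mov w0, #%d", value);
}
```

Adding a third "language" here means writing one more `frontend_*` function; adding a third "target" means one more `backend_*` function. Neither touches `optimize_fold`, which is the property the list above describes.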
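Clang
- A subproject of LLVM
- A C/C++/Objective-C compiler frontend built on the LLVM architecture
Advantages:
- Fast compilation: on some platforms Clang compiles significantly faster than GCC
- Low memory use: the AST Clang generates takes roughly one fifth of the memory GCC uses
- Modular design: a library-based modular design that is easy to integrate into IDEs and to reuse for other purposes
- Readable diagnostics: during compilation Clang creates and keeps a large amount of detailed metadata, which helps with debugging and with interpreting errors
- Clean, simple design that is easy to understand, extend, and enhance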
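Clang and LLVM
- LLVM in the broad sense
the entire LLVM architecture
- LLVM in the narrow sense
the LLVM backend (code optimization, target code generation, and so on)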
Compiling an Objective-C source file
Viewing the compilation phases from the command line
clang -ccc-print-phases main.m
    ➜ TestSwift clang -ccc-print-phases main.swift
    0: input, "main.swift", object
    1: linker, {0}, image
    2: bind-arch, "x86_64", {1}, image

    ➜ TestOC clang -ccc-print-phases main.m
    0: input, "main.m", objective-c
    1: preprocessor, {0}, objective-c-cpp-output
    2: compiler, {1}, ir
    3: backend, {2}, assembler
    4: assembler, {3}, object
    5: linker, {4}, image
    6: bind-arch, "x86_64", {5}, image
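The Swift listing is four phases shorter than the Objective-C one, but this is not a real comparison: clang does not compile Swift. It does not recognize the .swift extension, so it treats main.swift as an object file to pass to the linker, which is also why the later commands only print "'linker' input unused" warnings for it. Swift sources are compiled with swiftc.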
Viewing the preprocessor output
clang -E main.m
    // Source file
    print("Hello World")
    // Preprocessor output
    ➜ TestSwift clang -E main.swift
    clang: warning: main.swift: 'linker' input unused [-Wunused-command-line-argument]

    // Source file
    #define AGE 10
    int main(int argc, const char * argv[]) {
        int a = 10;
        int b = 20;
        int c = a + b + AGE;
        return 0;
    }
    // Preprocessor output
    ➜ TestOC clang -E main.m
    # 1 "main.m"
    # 1 "<built-in>" 1
    # 1 "<built-in>" 3
    # 373 "<built-in>" 3
    # 1 "<command line>" 1
    # 1 "<built-in>" 2
    # 1 "main.m" 2
    # 11 "main.m"
    int main(int argc, const char * argv[]) {
        int a = 10;
        int b = 20;
        int c = a + b + 10;
        return 0;
    }
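The replacement of AGE by 10 in the output above is plain textual substitution done before the compiler proper runs. A small way to observe this from inside a C program, as a sketch: the helper macros `STR` and `XSTR` and the two function names are introduced here for illustration and are not part of the original main.m. The two-level macro forces the argument to be expanded before the `#` operator turns it into a string.

```c
/* Same macro as in main.m. */
#define AGE 10

/* STR stringifies its argument as written.
   XSTR expands macros in its argument first, then stringifies the result. */
#define STR(x)  #x
#define XSTR(x) STR(x)

/* The expression text after macro expansion, as the compiler proper sees it. */
const char *expanded_expression(void) {
    return XSTR(a + b + AGE);
}

/* The expression text as typed, with AGE left unexpanded. */
const char *unexpanded_expression(void) {
    return STR(a + b + AGE);
}
```

Here `expanded_expression()` yields the string "a + b + 10", matching the line in the `clang -E` output, while `unexpanded_expression()` yields "a + b + AGE".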
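Lexical analysis
- Lexical analysis splits the source into tokens (loosely comparable to the parts of an English sentence: subject, predicate, object, object complement...)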
clang -fmodules -E -Xclang -dump-tokens main.m
    ➜ TestSwift clang -fmodules -E -Xclang -dump-tokens main.swift
    clang: warning: main.swift: 'linker' input unused [-Wunused-command-line-argument]
    clang: warning: argument unused during compilation: '-fmodules' [-Wunused-command-line-argument]
    clang: warning: argument unused during compilation: '-Xclang -dump-tokens' [-Wunused-command-line-argument]

    ➜ TestOC clang -fmodules -E -Xclang -dump-tokens main.m
    int 'int' [StartOfLine]Loc=<main.m:11:1>
    identifier 'main' [LeadingSpace]Loc=<main.m:11:5>
    l_paren '('Loc=<main.m:11:9>
    int 'int'Loc=<main.m:11:10>
    identifier 'argc' [LeadingSpace]Loc=<main.m:11:14>
    comma ','Loc=<main.m:11:18>
    const 'const' [LeadingSpace]Loc=<main.m:11:20>
    char 'char' [LeadingSpace]Loc=<main.m:11:26>
    star '*' [LeadingSpace]Loc=<main.m:11:31>
    identifier 'argv' [LeadingSpace]Loc=<main.m:11:33>
    l_square '['Loc=<main.m:11:37>
    r_square ']'Loc=<main.m:11:38>
    r_paren ')'Loc=<main.m:11:39>
    l_brace '{' [LeadingSpace]Loc=<main.m:11:41>
    int 'int' [StartOfLine] [LeadingSpace]Loc=<main.m:13:5>
    identifier 'a' [LeadingSpace]Loc=<main.m:13:9>
    equal '=' [LeadingSpace]Loc=<main.m:13:11>
    numeric_constant '10' [LeadingSpace]Loc=<main.m:13:13>
    semi ';'Loc=<main.m:13:15>
    int 'int' [StartOfLine] [LeadingSpace]Loc=<main.m:14:5>
    identifier 'b' [LeadingSpace]Loc=<main.m:14:9>
    equal '=' [LeadingSpace]Loc=<main.m:14:11>
    numeric_constant '20' [LeadingSpace]Loc=<main.m:14:13>
    semi ';'Loc=<main.m:14:15>
    int 'int' [StartOfLine] [LeadingSpace]Loc=<main.m:15:5>
    identifier 'c' [LeadingSpace]Loc=<main.m:15:9>
    equal '=' [LeadingSpace]Loc=<main.m:15:11>
    identifier 'a' [LeadingSpace]Loc=<main.m:15:13>
    plus '+' [LeadingSpace]Loc=<main.m:15:15>
    identifier 'b' [LeadingSpace]Loc=<main.m:15:17>
    plus '+' [LeadingSpace]Loc=<main.m:15:19>
    numeric_constant '10' [LeadingSpace]Loc=<main.m:15:21 <Spelling=main.m:9:13>>
    semi ';'Loc=<main.m:15:24>
    return 'return' [StartOfLine] [LeadingSpace]Loc=<main.m:17:5>
    numeric_constant '0' [LeadingSpace]Loc=<main.m:17:12>
    semi ';'Loc=<main.m:17:13>
    r_brace '}' [StartOfLine]Loc=<main.m:18:1>
    eof ''Loc=<main.m:18:2>
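To make the idea of "cutting source text into tokens" concrete, here is a minimal hand-written lexer sketch in C. It is an assumption-laden toy, not Clang's lexer: the names `Token`, `TokenKind`, `next_token` and `count_tokens` are invented for this example, every punctuator is a single character, and keywords such as `int` are simply reported as identifiers, whereas the Clang dump above gives them their own kinds.

```c
#include <ctype.h>
#include <stddef.h>

/* Token kinds, loosely named after some kinds in the clang dump. */
typedef enum {
    TOK_IDENTIFIER,
    TOK_NUMERIC_CONSTANT,
    TOK_PUNCTUATOR,
    TOK_EOF
} TokenKind;

typedef struct {
    TokenKind   kind;
    const char *start;   /* points into the source buffer */
    size_t      length;  /* number of characters in the token */
} Token;

/* Reads the token starting at *cursor and advances *cursor past it. */
Token next_token(const char **cursor) {
    const char *p = *cursor;
    while (isspace((unsigned char)*p)) p++;   /* skip leading whitespace */

    Token tok = { TOK_EOF, p, 0 };
    if (*p == '\0') {
        /* end of input: keep TOK_EOF with length 0 */
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        tok.kind = TOK_IDENTIFIER;
        while (isalnum((unsigned char)*p) || *p == '_') p++;
    } else if (isdigit((unsigned char)*p)) {
        tok.kind = TOK_NUMERIC_CONSTANT;
        while (isdigit((unsigned char)*p)) p++;
    } else {
        tok.kind = TOK_PUNCTUATOR;            /* one character, e.g. '=' or ';' */
        p++;
    }
    tok.length = (size_t)(p - tok.start);
    *cursor = p;
    return tok;
}

/* Counts the tokens in src, not counting the end-of-file marker. */
size_t count_tokens(const char *src) {
    size_t n = 0;
    while (next_token(&src).kind != TOK_EOF) n++;
    return n;
}
```

Calling `count_tokens("int c = a + b + 10;")` walks the same nine tokens that the dump lists for line 15 of main.m, from `int` through `semi`.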
Syntax analysis
- Syntax analysis builds the abstract syntax tree (AST, Abstract Syntax Tree)
clang -fmodules -fsyntax-only -Xclang -ast-dump main.m
    ➜ Test clang -fmodules -fsyntax-only -Xclang -ast-dump main.swift
    clang: warning: main.swift: 'linker' input unused [-Wunused-command-line-argument]
    clang: warning: argument unused during compilation: '-fmodules' [-Wunused-command-line-argument]
    clang: warning: argument unused during compilation: '-Xclang -ast-dump' [-Wunused-command-line-argument]

    ➜ TestOC clang -fmodules -fsyntax-only -Xclang -ast-dump main.m
    TranslationUnitDecl 0x7ff3730298e8 <<invalid sloc>> <invalid sloc>
    |-TypedefDecl 0x7ff373029e60 <<invalid sloc>> <invalid sloc> implicit __int128_t '__int128'
    | `-BuiltinType 0x7ff373029b80 '__int128'
    |-TypedefDecl 0x7ff373029ed0 <<invalid sloc>> <invalid sloc> implicit __uint128_t 'unsigned __int128'
    | `-BuiltinType 0x7ff373029ba0 'unsigned __int128'
    |-TypedefDecl 0x7ff373029f70 <<invalid sloc>> <invalid sloc> implicit SEL 'SEL *'
    | `-PointerType 0x7ff373029f30 'SEL *'
    |   `-BuiltinType 0x7ff373029dc0 'SEL'
    |-TypedefDecl 0x7ff37302a058 <<invalid sloc>> <invalid sloc> implicit id 'id'
    | `-ObjCObjectPointerType 0x7ff37302a000 'id'
    |   `-ObjCObjectType 0x7ff373029fd0 'id'
    |-TypedefDecl 0x7ff37302a138 <<invalid sloc>> <invalid sloc> implicit Class 'Class'
    | `-ObjCObjectPointerType 0x7ff37302a0e0 'Class'
    |   `-ObjCObjectType 0x7ff37302a0b0 'Class'
    |-ObjCInterfaceDecl 0x7ff37302a190 <<invalid sloc>> <invalid sloc> implicit Protocol
    |-TypedefDecl 0x7ff37302a4f8 <<invalid sloc>> <invalid sloc> implicit __NSConstantString 'struct __NSConstantString_tag'
    | `-RecordType 0x7ff37302a300 'struct __NSConstantString_tag'
    |   `-Record 0x7ff37302a260 '__NSConstantString_tag'
    |-TypedefDecl 0x7ff37302a590 <<invalid sloc>> <invalid sloc> implicit __builtin_ms_va_list 'char *'
    | `-PointerType 0x7ff37302a550 'char *'
    |   `-BuiltinType 0x7ff373029980 'char'
    |-TypedefDecl 0x7ff373062488 <<invalid sloc>> <invalid sloc> implicit __builtin_va_list 'struct __va_list_tag [1]'
    | `-ConstantArrayType 0x7ff373062430 'struct __va_list_tag [1]' 1
    |   `-RecordType 0x7ff3730622a0 'struct __va_list_tag'
    |     `-Record 0x7ff373062200 '__va_list_tag'
    `-FunctionDecl 0x7ff373062758 <main.m:11:1, line:18:1> line:11:5 main 'int (int, const char **)'
      |-ParmVarDecl 0x7ff3730624f8 <col:10, col:14> col:14 argc 'int'
      |-ParmVarDecl 0x7ff373062610 <col:20, col:38> col:33 argv 'const char **':'const char **'
      `-CompoundStmt 0x7ff373062bd8 <col:41, line:18:1>
        |-DeclStmt 0x7ff373062928 <line:13:5, col:15>
        | `-VarDecl 0x7ff3730628a8 <col:5, col:13> col:9 used a 'int' cinit
        |   `-IntegerLiteral 0x7ff373062908 <col:13> 'int' 10
        |-DeclStmt 0x7ff3730629d8 <line:14:5, col:15>
        | `-VarDecl 0x7ff373062958 <col:5, col:13> col:9 used b 'int' cinit
        |   `-IntegerLiteral 0x7ff3730629b8 <col:13> 'int' 20
        |-DeclStmt 0x7ff373062b88 <line:15:5, col:24>
        | `-VarDecl 0x7ff373062a08 <col:5, line:9:13> line:15:9 c 'int' cinit
        |   `-BinaryOperator 0x7ff373062b60 <col:13, line:9:13> 'int' '+'
        |     |-BinaryOperator 0x7ff373062b18 <line:15:13, col:17> 'int' '+'
        |     | |-ImplicitCastExpr 0x7ff373062ae8 <col:13> 'int' <LValueToRValue>
        |     | | `-DeclRefExpr 0x7ff373062a68 <col:13> 'int' lvalue Var 0x7ff3730628a8 'a' 'int'
        |     | `-ImplicitCastExpr 0x7ff373062b00 <col:17> 'int' <LValueToRValue>
        |     |   `-DeclRefExpr 0x7ff373062aa8 <col:17> 'int' lvalue Var 0x7ff373062958 'b' 'int'
        |     `-IntegerLiteral 0x7ff373062b40 <line:9:13> 'int' 10
        `-ReturnStmt 0x7ff373062bc0 <line:17:5, col:12>
          `-IntegerLiteral 0x7ff373062ba0 <col:12> 'int' 0
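The part of the dump that describes `a + b + 10` is a small tree: an outer BinaryOperator whose left child is another BinaryOperator (`a + b`) and whose right child is the IntegerLiteral 10. The following C sketch rebuilds that shape by hand and evaluates it recursively. The `Node` type, its field names and both functions are hypothetical, chosen only to echo the node names in the dump; Clang's real AST classes are C++ and far richer (the ImplicitCastExpr nodes, for instance, are omitted here).

```c
#include <stddef.h>

/* Node kinds, named after the corresponding entries in the AST dump. */
typedef enum { NODE_INTEGER_LITERAL, NODE_DECL_REF, NODE_BINARY_ADD } NodeKind;

typedef struct Node {
    NodeKind           kind;
    int                value;     /* used by NODE_INTEGER_LITERAL */
    const int         *variable;  /* used by NODE_DECL_REF: storage of the variable */
    const struct Node *lhs;       /* used by NODE_BINARY_ADD */
    const struct Node *rhs;       /* used by NODE_BINARY_ADD */
} Node;

/* Evaluates an expression tree bottom-up. */
int evaluate(const Node *n) {
    switch (n->kind) {
    case NODE_INTEGER_LITERAL: return n->value;
    case NODE_DECL_REF:        return *n->variable;
    case NODE_BINARY_ADD:      return evaluate(n->lhs) + evaluate(n->rhs);
    }
    return 0;  /* unreachable for well-formed trees */
}

/* Builds the tree for `a + b + 10` with a = 10 and b = 20, as in main.m. */
int evaluate_example(void) {
    int a = 10, b = 20;
    Node ref_a = { NODE_DECL_REF,        0,  &a,   NULL,   NULL   };
    Node ref_b = { NODE_DECL_REF,        0,  &b,   NULL,   NULL   };
    Node lit10 = { NODE_INTEGER_LITERAL, 10, NULL, NULL,   NULL   };
    Node inner = { NODE_BINARY_ADD,      0,  NULL, &ref_a, &ref_b };  /* a + b        */
    Node outer = { NODE_BINARY_ADD,      0,  NULL, &inner, &lit10 };  /* (a + b) + 10 */
    return evaluate(&outer);
}
```

The nesting of `inner` inside `outer` reflects the left-to-right grouping of `+` that the dump shows: `a + b` is computed first, then 10 is added.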
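LLVM IR
LLVM IR has three representations (equivalent in substance, much like water as vapor, liquid, and ice)
1. text: a human-readable textual format that resembles assembly language; file extension .ll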
> clang -S -emit-llvm main.m
    ; ModuleID = 'main.m'
    source_filename = "main.m"
    target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
    target triple = "x86_64-apple-macosx10.14.0"

    ; Function Attrs: noinline nounwind optnone ssp uwtable
    define i32 @main(i32, i8**) #0 {
      %3 = alloca i32, align 4
      %4 = alloca i32, align 4
      %5 = alloca i8**, align 8
      %6 = alloca i32, align 4
      %7 = alloca i32, align 4
      %8 = alloca i32, align 4
      store i32 0, i32* %3, align 4
      store i32 %0, i32* %4, align 4
      store i8** %1, i8*** %5, align 8
      store i32 10, i32* %6, align 4
      store i32 20, i32* %7, align 4
      %9 = load i32, i32* %6, align 4
      %10 = load i32, i32* %7, align 4
      %11 = add nsw i32 %9, %10
      %12 = add nsw i32 %11, 10
      store i32 %12, i32* %8, align 4
      ret i32 0
    }

    attributes #0 = { noinline nounwind optnone ssp uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="penryn" "target-features"="+cx16,+fxsr,+mmx,+sahf,+sse,+sse2,+sse3,+sse4.1,+ssse3,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

    !llvm.module.flags = !{!0, !1, !2, !3, !4, !5, !6}
    !llvm.ident = !{!7}

    !0 = !{i32 1, !"Objective-C Version", i32 2}
    !1 = !{i32 1, !"Objective-C Image Info Version", i32 0}
    !2 = !{i32 1, !"Objective-C Image Info Section", !"__DATA,__objc_imageinfo,regular,no_dead_strip"}
    !3 = !{i32 4, !"Objective-C Garbage Collection", i32 0}
    !4 = !{i32 1, !"Objective-C Class Properties", i32 64}
    !5 = !{i32 1, !"wchar_size", i32 4}
    !6 = !{i32 7, !"PIC Level", i32 2}
    !7 = !{!"Apple LLVM version 10.0.0 (clang-1000.11.45.2)"}

    // What on earth is all this?
2. memory: the in-memory format
3. bitcode: a binary format; file extension .bc
clang -c -emit-llvm main.m
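IR basic syntax
- Comments start with a semicolon ;
- Global identifiers start with @, local identifiers start with %
- alloca: allocates memory in the current function's stack frame
- i32: a 32-bit integer, that is, 4 bytes
- align: memory alignment
- store: writes data
- load: reads data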
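One way to read the IR for main.m is to line it up against C written in the same step-by-step style. The sketch below is such a reading aid, not compiler output: the function name `add_like_the_ir` and the temporaries `t9` to `t12` are made up here to mirror `%9` to `%12`, and the function returns `c` (where the original `main` returns 0) purely so the result can be checked. Each comment quotes the IR instruction the C statement corresponds to.

```c
#include <stdint.h>

/* A C transcription of the body of @main from the IR listing. */
int32_t add_like_the_ir(void) {
    int32_t a;                 /* %6 = alloca i32, align 4 */
    int32_t b;                 /* %7 = alloca i32, align 4 */
    int32_t c;                 /* %8 = alloca i32, align 4 */

    a = 10;                    /* store i32 10, i32* %6, align 4 */
    b = 20;                    /* store i32 20, i32* %7, align 4 */

    int32_t t9  = a;           /* %9  = load i32, i32* %6, align 4 */
    int32_t t10 = b;           /* %10 = load i32, i32* %7, align 4 */
    int32_t t11 = t9 + t10;    /* %11 = add nsw i32 %9, %10 */
    int32_t t12 = t11 + 10;    /* %12 = add nsw i32 %11, 10 (AGE already replaced) */

    c = t12;                   /* store i32 %12, i32* %8, align 4 */

    /* The IR ends with `ret i32 0`; returning c instead is a deliberate
       change in this sketch so that the computed value is observable. */
    return c;
}
```

Because the IR was produced without optimization (note `optnone` in the function attributes), every variable lives in a stack slot and each use goes through an explicit load, which is why the transcription looks so mechanical.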