LLVM Study Notes (8)

阿新 • Published: 2019-02-08

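2.3. Assembly Processing Description

To encapsulate the information needed for reading and writing instructions in assembly format, TableGen provides the class Target (Target.td) as the base class for every target machine.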

class Target {
  // InstructionSet - Instruction set description for this target.
  InstrInfo InstructionSet;

  // AssemblyParsers - The AsmParser instances available for this target.
  list<AsmParser> AssemblyParsers = [DefaultAsmParser];

  /// AssemblyParserVariants - The AsmParserVariant instances available for
  /// this target.
  list<AsmParserVariant> AssemblyParserVariants = [DefaultAsmParserVariant];

  // AssemblyWriters - The AsmWriter instances available for this target.
  list<AsmWriter> AssemblyWriters = [DefaultAsmWriter];
}

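AssemblyParsers and AssemblyParserVariants describe target-specific assembly parsing. If a subtarget needs its own assembly parsing description, those parser descriptions are stored in AssemblyParserVariants. The most important member of the Target definition is InstrInfo, which represents the target instruction information that matters when emitting assembly: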

class InstrInfo {
  // Target can specify its instructions in either big or little-endian formats.
  // For instance, while both Sparc and PowerPC are big-endian platforms, the
  // Sparc manual specifies its instructions in the format [31..0] (big), while
  // PowerPC specifies them using the format [0..31] (little).
  bit isLittleEndianEncoding = 0;

  // The instruction properties mayLoad, mayStore, and hasSideEffects are unset
  // by default, and TableGen will infer their value from the instruction
  // pattern when possible.
  //
  // Normally, TableGen will issue an error if it can't infer the value of a
  // property that hasn't been set explicitly. When guessInstructionProperties
  // is set, it will guess a safe value instead.
  //
  // This option is a temporary migration help. It will go away.
  bit guessInstructionProperties = 1;

  // TableGen's instruction encoder generator has support for matching operands
  // to bit-field variables both by name and by position. While matching by
  // name is preferred, this is currently not possible for complex operands,
  // and some targets still rely on the positional encoding rules. When
  // generating a decoder for such targets, the positional encoding rules must
  // be used by the decoder generator as well.
  //
  // This option is temporary; it will go away once the TableGen decoder
  // generator has better support for complex operands and targets have
  // migrated away from using positionally encoded operands.
  bit decodePositionallyEncodedOperands = 0;

  // When set, this indicates that there will be no overlap between those
  // operands that are matched by ordering (positional operands) and those
  // matched by name.
  //
  // This option is temporary; it will go away once the TableGen decoder
  // generator has better support for complex operands and targets have
  // migrated away from using positionally encoded operands.
  bit noNamedPositionallyEncodedOperands = 0;
}

The X86 target derives X86InstrInfo from InstrInfo without any changes. Its Target definition is:

def X86 : Target {
  // Information about the instructions...
  let InstructionSet = X86InstrInfo;
  let AssemblyParsers = [ATTAsmParser];
  let AssemblyParserVariants = [ATTAsmParserVariant, IntelAsmParserVariant];
  let AssemblyWriters = [ATTAsmWriter, IntelAsmWriter];
}

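As we know, X86 assembly comes in two formats, AT&T and Intel. A definition such as ATTAsmParserVariant supplies the comment delimiter, separators, register prefix and so on for the corresponding assembly dialect, while a definition such as ATTAsmWriter names the LLVM class that can emit code in that assembly format. ATTAsmParser indicates that LLVM uses AsmParser to drive assembly parsing.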
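The instruction definitions later in these notes carry assembly strings such as "{$src2, $dst|$dst, $src2}", where the text before the | belongs to the AT&T variant and the text after it to the Intel variant. The following is only a minimal Python sketch of that selection, not LLVM's implementation; the function name expand_asm_variant is hypothetical, and the sketch assumes variant 0 is AT&T and variant 1 is Intel, matching the order in AssemblyParserVariants.

```python
import re

# Matches a {...} group, but not the braces of a ${name} operand reference.
_VARIANT_GROUP = re.compile(r"(?<!\$)\{([^{}]*)\}")


def expand_asm_variant(asm_string, variant):
    """Keep only the text that belongs to one syntax variant.

    Hypothetical helper: variant 0 is assumed to be AT&T, variant 1 Intel.
    """
    def pick(match):
        alternatives = match.group(1).split("|")
        # A variant that has no alternative of its own contributes nothing.
        return alternatives[variant] if variant < len(alternatives) else ""

    return _VARIANT_GROUP.sub(pick, asm_string)


template = "add{l}\t{$src2, $dst|$dst, $src2}"
att = expand_asm_variant(template, 0)    # size suffix kept, source operand first
intel = expand_asm_variant(template, 1)  # no suffix, destination operand first
```

The operand order and the size suffix are the only things that differ between the two results, which is why a single instruction definition can serve both writers.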

2.4. Target Machine Description

The description of a target's processors is based on the Processor class in Target.td.

1119 class Processor<string n, ProcessorItineraries pi, list<SubtargetFeature> f> {
1120   // Name - Chipset name. Used by command line (-mcpu=) to determine the
1121   // appropriate target chip.
1122   //
1123   string Name = n;
1124
1125   // SchedModel - The machine model for scheduling and instruction cost.
1126   //
1127   SchedMachineModel SchedModel = NoSchedModel;
1128
1129   // ProcItin - The scheduling information for the target processor.
1130   //
1131   ProcessorItineraries ProcItin = pi;
1132
1133   // Features - list of
1134   list<SubtargetFeature> Features = f;
1135 }

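Both SchedModel (line 1127) and ProcItin (line 1131) can describe scheduling details. SchedModel is covered by the definition shown later, while ProcItin describes the execution steps of each kind of instruction. NoSchedModel on line 1127 means that a Processor definition does not use SchedModel by default. For some processors, however, ProcItin is inconvenient and SchedModel is used instead, so there is also a Processor definition that disables ProcItin by default: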

class ProcessorModel<string n, SchedMachineModel m, list<SubtargetFeature> f>
    : Processor<n, NoItineraries, f> {
  let SchedModel = m;
}

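As we will see, an in-order pipelined machine such as Atom is described with itineraries, whereas a machine that supports out-of-order execution, such as SandyBridge, is described with SchedModel. We discuss these details below.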

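2.4.1. Feature Description

Features on line 1134 describes the features the processor supports, for example whether it supports the SSE instruction set or the TSX instruction set. TableGen generates the required predicates from these descriptions. SubtargetFeature is defined entirely in terms of strings: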

1077 class SubtargetFeature<string n, string a, string v, string d,
1078                        list<SubtargetFeature> i = []> {
1079   // Name - Feature name. Used by command line (-mattr=) to determine the
1080   // appropriate target chip.
1081   //
1082   string Name = n;
1083
1084   // Attribute - Attribute to be set by feature.
1085   //
1086   string Attribute = a;
1087
1088   // Value - Value the attribute to be set to by feature.
1089   //
1090   string Value = v;
1091
1092   // Desc - Feature description. Used by command line (-mattr=) to display help
1093   // information.
1094   //
1095   string Desc = d;
1096
1097   // Implies - Features that this feature implies are present. If one of those
1098   // features isn't set, then this one shouldn't be set either.
1099   //
1100   list<SubtargetFeature> Implies = i;
1101 }

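Implies on line 1100 lists the other features that are implicitly present whenever this feature is present.

SubtargetFeature is flexible and expressive enough to describe processors that differ widely. Take the Atom processor as an example. It is Intel's processor for mobile devices and, compared with traditional Intel processors, it looks more like a RISC processor. TableGen defines it as follows: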

240 // Atom CPUs.
241 class BonnellProc<string Name> : ProcessorModel<Name, AtomModel, [
242   ProcIntelAtom,
243   FeatureSSSE3,
244   FeatureCMPXCHG16B,
245   FeatureMOVBE,
246   FeatureSlowBTMem,
247   FeatureLeaForSP,
248   FeatureSlowDivide32,
249   FeatureSlowDivide64,
250   FeatureCallRegIndirect,
251   FeatureLEAUsesAG,
252   FeaturePadShortFunctions
253 ]>;
254 def : BonnellProc<"bonnell">;
255 def : BonnellProc<"atom">; // Pin the generic name to the baseline.

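Lines 242-252 are the features the Atom processor supports. The first one, ProcIntelAtom, identifies the processor family (it sets X86ProcFamily to IntelAtom and carries a short textual description), which helps developers understand the definition. The next one, FeatureSSSE3, says that Atom supports the SSSE3 instruction set, which also means it supports the instruction sets below SSSE3, namely SSE3, SSE2 and SSE.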

def FeatureSSSE3 : SubtargetFeature<"ssse3", "X86SSELevel", "SSSE3",
                                    "Enable SSSE3 instructions",
                                    [FeatureSSE3]>;

This definition says that the instruction sets up to and including SSSE3 are supported (so SSE3, SSE2, SSE, MMX and so on are included as well).
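How one feature pulls in the ones below it can be pictured as a transitive closure over the Implies lists. The Python sketch below is an illustration only. Of the table entries, only FeatureSSSE3 implying FeatureSSE3 is taken from the definition above; the remaining entries are assumptions that follow the SSE hierarchy described in the text, and the name implied_closure is hypothetical.

```python
# Illustrative Implies table. Only the FeatureSSSE3 entry is copied from the
# notes; the other entries are assumed to follow the SSE hierarchy.
IMPLIES = {
    "FeatureSSSE3": ["FeatureSSE3"],
    "FeatureSSE3": ["FeatureSSE2"],
    "FeatureSSE2": ["FeatureSSE1"],
    "FeatureSSE1": ["FeatureMMX"],
    "FeatureMMX": [],
}


def implied_closure(feature, implies=IMPLIES):
    """Return the feature together with everything it transitively implies."""
    seen = set()
    stack = [feature]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already expanded, also protects against cycles
        seen.add(current)
        stack.extend(implies.get(current, []))
    return seen


atom_sse_features = implied_closure("FeatureSSSE3")
```

Under these assumptions, enabling FeatureSSSE3 alone is enough to make the whole chain down to FeatureMMX available.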

def FeatureCMPXCHG16B : SubtargetFeature<"cx16", "HasCmpxchg16b", "true",
                                         "64-bit with cmpxchg16b",
                                         [Feature64Bit]>;

This definition indicates support for the 64-bit cmpxchg16b instruction and for conditional move instructions.

def FeatureMOVBE : SubtargetFeature<"movbe", "HasMOVBE", "true",
                                    "Support MOVBE instruction">;

This definition indicates support for the MOVBE instruction.

def FeatureSlowBTMem : SubtargetFeature<"slow-bt-mem", "IsBTMemSlow", "true",
                                        "Bit testing of memory is slow">;

This definition indicates that bit test instructions with a memory operand are slow on this processor.

def FeatureLeaForSP : SubtargetFeature<"lea-sp", "UseLeaForSP", "true",
                                       "Use LEA for adjusting the stack pointer">;

This definition indicates that the LEA instruction is used to adjust the stack pointer.

def FeatureSlowDivide32 : SubtargetFeature<"idivl-to-divb",
                                           "HasSlowDivide32", "true",
                                           "Use 8-bit divide for positive values less than 256">;

This definition indicates that an 8-bit divide is used for positive values less than 256 (because the 32-bit divide is slow).

def FeatureSlowDivide64 : SubtargetFeature<"idivq-to-divw",
                                           "HasSlowDivide64", "true",
                                           "Use 16-bit divide for positive values less than 65536">;

This definition indicates that a 16-bit divide is used for positive values less than 65536 (because the 64-bit divide is slow).
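The idea behind the two slow-divide features can be shown with a small Python sketch. It is not the code LLVM emits; it only illustrates the run-time choice the feature descriptions imply: when both operands fit in the narrow width, the cheap narrow divide gives the same quotient as the wide one. The function name divide_with_bypass and the "narrow"/"wide" labels are hypothetical.

```python
def divide_with_bypass(dividend, divisor, narrow_bits):
    """Return (quotient, path) for non-negative operands.

    path is "narrow" when both operands fit in narrow_bits bits, so the
    cheaper narrow divide could be used, and "wide" otherwise.
    """
    if dividend < 0 or divisor <= 0:
        raise ValueError("sketch only covers a non-negative dividend and a positive divisor")
    limit = 1 << narrow_bits  # 256 for 8 bits, 65536 for 16 bits
    # OR-ing the operands tests both against the limit in one comparison.
    if (dividend | divisor) < limit:
        return dividend // divisor, "narrow"
    return dividend // divisor, "wide"


small_32 = divide_with_bypass(200, 7, narrow_bits=8)      # idivl-to-divb case
large_32 = divide_with_bypass(1000, 7, narrow_bits=8)
small_64 = divide_with_bypass(65535, 3, narrow_bits=16)   # idivq-to-divw case
large_64 = divide_with_bypass(65536, 3, narrow_bits=16)
```

The quotient is identical on both paths; only the cost of the divide differs, which is why the rewrite is safe whenever the range check passes.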

def FeatureCallRegIndirect : SubtargetFeature<"call-reg-indirect",
                                              "CallRegIndirect", "true",
                                              "Call register indirect">;

This definition indicates that indirect calls are made through a register.

def FeatureLEAUsesAG : SubtargetFeature<"lea-uses-ag", "LEAUsesAG", "true",
                                        "LEA instruction needs inputs at AG stage">;

This definition indicates that the LEA instruction needs its inputs as early as the AG (address generation) stage.

def FeaturePadShortFunctions : SubtargetFeature<"pad-short-functions",
                                                "PadShortFunctions", "true",
                                                "Pad short functions">;

This definition indicates that short functions should be padded.

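These SubtargetFeature definitions are processed by a dedicated TableGen option, "-gen-subtarget". When the compiler is built, llvm-tblgen is run with this option to generate a target-specific file, X86GenSubtargetInfo.inc in the case of X86. The code in this file helps construct an instance of the X86Subtarget class (X86Subtarget.h), and that instance provides the corresponding predicate methods.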
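To make the role of the Attribute and Value strings concrete, here is a simplified Python model of what applying a feature list to a subtarget object amounts to. It is a sketch under assumptions, not the generated code: the (attribute, value) pairs are copied from the definitions quoted in these notes, while the Subtarget class, its methods and the ordering in SSE_ORDER are hypothetical stand-ins.

```python
# (attribute, value) pairs copied from the SubtargetFeature definitions above.
FEATURES = {
    "mmx": ("X86SSELevel", "MMX"),
    "ssse3": ("X86SSELevel", "SSSE3"),
    "movbe": ("HasMOVBE", "true"),
    "slow-bt-mem": ("IsBTMemSlow", "true"),
}

# Assumed ordering of SSE levels, lowest first (illustrative subset).
SSE_ORDER = ["NoMMXSSE", "MMX", "SSE1", "SSE2", "SSE3", "SSSE3"]


class Subtarget:
    """Hypothetical stand-in for a subtarget object with predicate methods."""

    def __init__(self):
        self.X86SSELevel = "NoMMXSSE"
        self.HasMOVBE = False
        self.IsBTMemSlow = False

    def apply(self, feature_names):
        for name in feature_names:
            attribute, value = FEATURES[name]
            if value == "true":
                setattr(self, attribute, True)  # boolean attribute
            elif SSE_ORDER.index(value) > SSE_ORDER.index(getattr(self, attribute)):
                setattr(self, attribute, value)  # levels only ever move upward
        return self

    def has_ssse3(self):
        return SSE_ORDER.index(self.X86SSELevel) >= SSE_ORDER.index("SSSE3")


subtarget = Subtarget().apply(["ssse3", "mmx", "movbe"])
```

Because a level attribute only moves upward, the order in which "ssse3" and "mmx" are applied does not matter in this model.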

2.4.2. Scheduling Information

SchedMachineModel describes an out-of-order processor as follows:

class SchedMachineModel {
  int IssueWidth = -1; // Max micro-ops that may be scheduled per cycle.
  int MinLatency = -1; // Determines which instructions are allowed in a group.
                       // (-1) inorder (0) ooo, (1): inorder +var latencies.
  int MicroOpBufferSize = -1; // Max micro-ops that can be buffered.
  int LoopMicroOpBufferSize = -1; // Max micro-ops that can be buffered for
                                  // optimized loop dispatch/execution.
  int LoadLatency = -1; // Cycles for loads to access the cache.
  int HighLatency = -1; // Approximation of cycles for "high latency" ops.
  int MispredictPenalty = -1; // Extra cycles for a mispredicted branch.

  // Per-cycle resources tables.
  ProcessorItineraries Itineraries = NoItineraries;

  bit PostRAScheduler = 0; // Enable Post RegAlloc Scheduler pass.

  // Subtargets that define a model for only a subset of instructions
  // that have a scheduling class (itinerary class or SchedRW list)
  // and may actually be generated for that subtarget must clear this
  // bit. Otherwise, the scheduler considers an unmodelled opcode to
  // be an error. This should only be set during initial bringup,
  // or there will be no way to catch simple errors in the model
  // resulting from changes to the instruction definitions.
  bit CompleteModel = 1;

  bit NoModel = 0; // Special tag to indicate missing machine model.
}

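The default values of these properties are defined in the C++ class MCSchedModel (MCSchedule.h). A value of -1 in SchedMachineModel means that the target does not override that property.

IssueWidth defaults to 1. It is the maximum number of instructions that can be scheduled (issued) in one cycle.

MicroOpBufferSize defaults to 0. It is the number of micro-ops the processor buffers for out-of-order execution. A value of 0 means that operations that are not ready in the current cycle are not considered for scheduling (they go into the pending queue). Latency is paramount; this may be more efficient when many instructions are pending in a schedule. A value of 1 means that all instructions are considered for scheduling regardless of whether they are ready in the current cycle. Latency still causes issue stalls, but those stalls are balanced against other heuristics. A value greater than 1 means that the processor executes out of order. It is a machine-independent estimate of highly machine-specific characteristics such as the register renaming pool and the reorder buffer.

LoopMicroOpBufferSize defaults to 0. It is the number of micro-ops the processor may buffer in order to optimize loop execution. More generally, it represents the optimal number of micro-ops in a loop body. A loop may be partially unrolled to bring the number of micro-ops in its body closer to this value.

LoadLatency defaults to 4. It is the expected latency of load instructions. If MinLatency >= 0, it can be overridden for individual loads through the OperandCycles of InstrItinerary.

HighLatency defaults to 10. It is the expected latency of "very high latency" operations. Typically this is an arbitrarily high cycle count that may have some effect on the scheduling heuristics. If MinLatency >= 0, it can be overridden through the OperandCycles of InstrItinerary.

MispredictPenalty defaults to 10. It is the typical number of extra cycles the processor needs to recover from a branch misprediction.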
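The interplay between the -1 placeholders and the defaults listed above can be sketched in a few lines of Python. This is an illustration under assumptions: the default values are the ones stated in the text, while the helper names resolve_model and execution_style, the example override values and the returned labels are hypothetical.

```python
# Defaults as listed in the text for MCSchedModel.
DEFAULTS = {
    "IssueWidth": 1,
    "MicroOpBufferSize": 0,
    "LoopMicroOpBufferSize": 0,
    "LoadLatency": 4,
    "HighLatency": 10,
    "MispredictPenalty": 10,
}


def resolve_model(overrides):
    """Combine a target's settings with the defaults; -1 means 'not overridden'."""
    resolved = dict(DEFAULTS)
    for field, value in overrides.items():
        if field not in DEFAULTS:
            raise KeyError(field)
        if value != -1:
            resolved[field] = value
    return resolved


def execution_style(micro_op_buffer_size):
    """Classify a MicroOpBufferSize value the way the text describes it."""
    if micro_op_buffer_size < 0:
        raise ValueError("resolve the -1 placeholder first")
    if micro_op_buffer_size == 0:
        return "in-order, unready ops wait in the pending queue"
    if micro_op_buffer_size == 1:
        return "in-order, unready ops are still considered"
    return "out-of-order"


# Hypothetical target that overrides two fields and leaves the rest at -1.
model = resolve_model({"IssueWidth": 2, "LoadLatency": 3, "HighLatency": -1})
style = execution_style(model["MicroOpBufferSize"])
```

Only the fields a target actually sets differ from the defaults, so a model definition can stay short.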

If NoModel is 1, the SchedMachineModel definition carries no real information, for example:

def NoSchedModel : SchedMachineModel {
  let NoModel = 1;
}

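Notice that the SchedMachineModel definition also contains a member of type ProcessorItineraries; the comment describes it as the per-cycle resource tables (none are provided by default). ProcessorItineraries is also the type of ProcItin in Processor. It is defined in TargetItinerary.td: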

class ProcessorItineraries<list<FuncUnit> fu, list<Bypass> bp,
                           list<InstrItinData> iid> {
  list<FuncUnit> FU = fu;
  list<Bypass> BP = bp;
  list<InstrItinData> IID = iid;
}

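Its members are all lists: FU lists the functional units, BP lists the bypasses, and IID lists the itinerary data of the instruction classes. FuncUnit describes the units that make up the processor. Because processor units come in all shapes and sizes, the FuncUnit base class is just an empty class. Similarly, Bypass describes pipeline bypasses and is also an empty class.

Let us look at some examples.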

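2.4.2.1. Description of Atom

Atom is a family of ultra-low-voltage IA-32 and x86-64 microprocessors designed by Intel, based on the Bonnell microarchitecture. That is why, on line 255 of X86.td, the ProcessorModel definition for Atom is derived from BonnellProc.

The Bonnell microarchitecture can execute up to two instructions per cycle. Like many other x86 microprocessors, it translates x86 instructions (CISC instructions) into simpler internal operations (sometimes called micro-ops, essentially RISC-style instructions) before execution. In typical programs, most instructions produce a single micro-op when translated, and about 4% of instructions produce multiple micro-ops. The number of instructions that produce more than one micro-op is significantly lower than in the P6 and NetBurst microarchitectures. In Bonnell, an internal micro-op can contain both a memory load and a memory store in connection with an ALU operation, so it is closer to the x86 level and more powerful than the micro-ops used in earlier designs. This gives relatively good performance with only two integer ALUs and without any instruction reordering, speculative execution or register renaming. Bonnell therefore represents a partial revival of the principles used in earlier Intel designs such as the P5 and the i486, with the sole purpose of improving performance per watt. Hyper-Threading, however, is implemented in a simple (that is, low-power) way that uses the whole pipeline efficiently by avoiding the typical single-thread dependencies.

To describe this processor, X86.td first provides the following definitions (only a small part of the full set of X86 processor definitions):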

203 def ProcIntelAtom : SubtargetFeature<"atom", "X86ProcFamily", "IntelAtom",
204                     "Intel Atom processors">;
205 def ProcIntelSLM : SubtargetFeature<"slm", "X86ProcFamily", "IntelSLM",
206                     "Intel Silvermont processors">;
207
208 class Proc<string Name, list<SubtargetFeature> Features>
209   : ProcessorModel<Name, GenericModel, Features>;
210
211 def : Proc<"generic", []>;
212 def : Proc<"i386", []>;
213 def : Proc<"i486", []>;
214 def : Proc<"i586", []>;
215 def : Proc<"pentium", []>;
216 def : Proc<"pentium-mmx", [FeatureMMX]>;
217 def : Proc<"i686", []>;
218 def : Proc<"pentiumpro", [FeatureCMOV]>;
219 def : Proc<"pentium2", [FeatureMMX, FeatureCMOV]>;
220 def : Proc<"pentium3", [FeatureSSE1]>;
221 def : Proc<"pentium3m", [FeatureSSE1, FeatureSlowBTMem]>;
222 def : Proc<"pentium-m", [FeatureSSE2, FeatureSlowBTMem]>;
223 def : Proc<"pentium4", [FeatureSSE2]>;
224 def : Proc<"pentium4m", [FeatureSSE2, FeatureSlowBTMem]>;
225
226 // Intel Core Duo.
227 def : ProcessorModel<"yonah", SandyBridgeModel,
228                      [FeatureSSE3, FeatureSlowBTMem]>;
229
230 // NetBurst.
231 def : Proc<"prescott", [FeatureSSE3, FeatureSlowBTMem]>;
232 def : Proc<"nocona", [FeatureSSE3, FeatureCMPXCHG16B, FeatureSlowBTMem]>;
233
234 // Intel Core 2 Solo/Duo.
235 def : ProcessorModel<"core2", SandyBridgeModel,
236                      [FeatureSSSE3, FeatureCMPXCHG16B, FeatureSlowBTMem]>;
237 def : ProcessorModel<"penryn", SandyBridgeModel,
238                      [FeatureSSE41, FeatureCMPXCHG16B, FeatureSlowBTMem]>;
239
240 // Atom CPUs.
241 class BonnellProc<string Name> : ProcessorModel<Name, AtomModel, [
242   ProcIntelAtom,
243   FeatureSSSE3,
244   FeatureCMPXCHG16B,
245   FeatureMOVBE,
246   FeatureSlowBTMem,
247   FeatureLeaForSP,
248   FeatureSlowDivide32,
249   FeatureSlowDivide64,
250   FeatureCallRegIndirect,
251   FeatureLEAUsesAG,
252   FeaturePadShortFunctions
253 ]>;
254 def : BonnellProc<"bonnell">;
255 def : BonnellProc<"atom">; // Pin the generic name to the baseline.
256
257 class SilvermontProc<string Name> : ProcessorModel<Name, SLMModel, [
258   ProcIntelSLM,
259   FeatureSSE42,
260   FeatureCMPXCHG16B,
261   FeatureMOVBE,
262   FeaturePOPCNT,
263   FeaturePCLMUL,
264   FeatureAES,
265   FeatureSlowDivide64,
266   FeatureCallRegIndirect,
267   FeaturePRFCHW,
268   FeatureSlowLEA,
269   FeatureSlowIncDec,
270   FeatureSlowBTMem,
271   FeatureFastUAMem
272 ]>;
273 def : SilvermontProc<"silvermont">;
274 def : SilvermontProc<"slm">; // Legacy alias.

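First, lines 203 and 205 define two SubtargetFeature records that represent the Atom and Silvermont processor families respectively. The Proc class on line 208 uses GenericModel as its machine model. This model gives a rough description of a processor (for example when no complete documentation of the processor's instruction execution details is available):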

def GenericModel : SchedMachineModel {
  let IssueWidth = 4;
  let MicroOpBufferSize = 32;
  let LoadLatency = 4;
  let HighLatency = 10;
  let PostRAScheduler = 0;
}

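GenericModel contains no per-cycle resource tables. IssueWidth is analogous to the number of decode units. Core and its descendants, including Nehalem and SandyBridge, have 4 decoders. Resources beyond the decoders operate on micro-ops and are buffered, so adjacent micro-ops do not compete directly.

MicroOpBufferSize > 1 indicates that RAW (read-after-write) dependencies can be decoded in the same cycle. The value 32 is a reasonable but arbitrary number of in-flight instructions.

HighLatency = 10 is optimistic. X86InstrInfo::isHighLatencyDef flags the high-latency opcodes. Alternatively, InstrItinData entries can be included here to define specific operand latencies. Because these latencies are not used for pipeline hazards, they do not need to be exact.

As we can see, even GenericModel is tuned for Core and later processors, and these settings do not suit older Intel processors (they have neither this many decode units nor a micro-op buffer). For instruction scheduling this is not a serious problem; it merely makes the generated code less efficient on older processors.

The processor definitions above use many feature descriptions. Among them, FeatureMMX says that the processor supports the instructions and registers of the MMX standard: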

def FeatureMMX : SubtargetFeature<"mmx", "X86SSELevel", "MMX",
                                  "Enable MMX instructions">;

The SchedMachineModel definition for Atom looks like this:

537 def AtomModel : SchedMachineModel {
538   let IssueWidth = 2;  // Allows 2 instructions per scheduling group.
539   let MicroOpBufferSize = 0; // In-order execution, always hide latency.
540   let LoadLatency = 3; // Expected cycles, may be overriden by OperandCycles.
541   let HighLatency = 30;// Expected, may be overriden by OperandCycles.
542
543   // On the Atom, the throughput for taken branches is 2 cycles. For small
544   // simple loops, expand by a small factor to hide the backedge cost.
545   let LoopMicroOpBufferSize = 10;
546   let PostRAScheduler = 1;
547
548   let Itineraries = AtomItineraries;
549 }

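AtomItineraries on line 548 is a large set of definitions (X86ScheduleAtom.td). Atom relies on it to describe how each kind of instruction occupies resources (functional units). This works because Atom uses an in-order pipeline and has only two ports (Port is the name Intel's processor manual gives to these functional units).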

20 def Port0 : FuncUnit; // ALU: ALU0, shift/rotate, load/store
21                       // SIMD/FP: SIMD ALU, Shuffle, SIMD/FP multiply, divide
22 def Port1 : FuncUnit; // ALU: ALU1, bit processing, jump, and LEA
23                       // SIMD/FP: SIMD ALU, FP Adder
25 def AtomItineraries : ProcessorItineraries<
26   [ Port0, Port1 ],
27   [], [
28   // P0 only
29   // InstrItinData<class, [InstrStage<N, [P0]>] >,
30   // P0 or P1
31   // InstrItinData<class, [InstrStage<N, [P0, P1]>] >,
32   // P0 and P1
33   // InstrItinData<class, [InstrStage<N, [P0], 0>, InstrStage<N, [P1]>] >,
34   //
35   // Default is 1 cycle, port0 or port1
36   InstrItinData<IIC_ALU_MEM, [InstrStage<1, [Port0]>] >,
37   InstrItinData<IIC_ALU_NONMEM, [InstrStage<1, [Port0, Port1]>] >,
38   InstrItinData<IIC_LEA, [InstrStage<1, [Port1]>] >,
39   InstrItinData<IIC_LEA_16, [InstrStage<2, [Port0, Port1]>] >,
40   // mul
…
238   InstrItinData<IIC_SSE_MASKMOV, [InstrStage<2, [Port0, Port1]>] >,
239
240   InstrItinData<IIC_SSE_PEXTRW, [InstrStage<4, [Port0, Port1]>] >,
241   InstrItinData<IIC_SSE_PINSRW, [InstrStage<1, [Port0]>] >,
242
243   InstrItinData<IIC_SSE_PABS_RR, [InstrStage<1, [Port0, Port1]>] >,
244   InstrItinData<IIC_SSE_PABS_RM, [InstrStage<1, [Port0]>] >,
245
246   InstrItinData<IIC_SSE_MOV_S_RR, [InstrStage<1, [Port0, Port1]>] >,
247   InstrItinData<IIC_SSE_MOV_S_RM, [InstrStage<1, [Port0]>] >,
248   InstrItinData<IIC_SSE_MOV_S_MR, [InstrStage<1, [Port0]>] >,
249
250   InstrItinData<IIC_SSE_MOVA_P_RR, [InstrStage<1, [Port0, Port1]>] >,
251   InstrItinData<IIC_SSE_MOVA_P_RM, [InstrStage<1, [Port0]>] >,
252   InstrItinData<IIC_SSE_MOVA_P_MR, [InstrStage<1, [Port0]>] >,
253
254   InstrItinData<IIC_SSE_MOVU_P_RR, [InstrStage<1, [Port0, Port1]>] >,
255   InstrItinData<IIC_SSE_MOVU_P_RM, [InstrStage<3, [Port0, Port1]>] >,
256   InstrItinData<IIC_SSE_MOVU_P_MR, [InstrStage<2, [Port0, Port1]>] >,
257
258   InstrItinData<IIC_SSE_MOV_LH, [InstrStage<1, [Port0]>] >,
259
260   InstrItinData<IIC_SSE_LDDQU, [InstrStage<3, [Port0, Port1]>] >,
261
262   InstrItinData<IIC_SSE_MOVDQ, [InstrStage<1, [Port0]>] >,
263   InstrItinData<IIC_SSE_MOVD_ToGP, [InstrStage<3, [Port0]>] >,
264   InstrItinData<IIC_SSE_MOVQ_RR, [InstrStage<1, [Port0, Port1]>] >,
…
357   InstrItinData<IIC_FILD, [InstrStage<5, [Port0], 0>, InstrStage<5, [Port1]>] >,
358   InstrItinData<IIC_FLD, [InstrStage<1, [Port0]>] >,
359   InstrItinData<IIC_FLD80, [InstrStage<4, [Port0, Port1]>] >,
360
361   InstrItinData<IIC_FST, [InstrStage<2, [Port0, Port1]>] >,
362   InstrItinData<IIC_FST80, [InstrStage<5, [Port0, Port1]>] >,
363   InstrItinData<IIC_FIST, [InstrStage<6, [Port0, Port1]>] >,
…
533   InstrItinData<IIC_NOP, [InstrStage<1, [Port0, Port1]>] >
534 ]>;

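The comment at the top of X86ScheduleAtom.td notes that these definitions come from Chapter 13, Section 4 of the "Intel 64 and IA-32 Architectures Optimization Reference Manual" (Chapter 14, Section 4 in the 2016 edition). The document can be downloaded online. It names the two ALUs Port0 and Port1; LLVM follows this naming and defines them on lines 20 and 22. Chapter 13, Section 4 contains a table that lists, for each instruction, the ports it is bound to, its latency and related information, and AtomItineraries is built from that table. For example, the definitions on lines 36 and 37 come from the following two table entries: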

Instruction | Ports | Latency | Throughput
ADD/AND/CMP/OR/SUB/XOR/TEST mem, reg; ADD/AND/CMP/OR/SUB/XOR reg, mem | | |
ADD/AND/CMP/OR/SUB/XOR reg, Imm8; ADD/AND/CMP/OR/SUB/XOR reg, imm | (0, 1) | | 0.5
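What an InstrStage entry means for scheduling can be illustrated with a toy in-order scoreboard. The Python sketch below is a simplification under stated assumptions, not LLVM's scheduler: each itinerary class has a single stage that occupies one of its allowed ports for the given number of cycles, and data dependencies, issue width and multi-stage entries are ignored. The four table entries are copied from lines 36-39 of the listing; the function name issue_in_order is hypothetical.

```python
# (cycles, allowed ports) per itinerary class, copied from lines 36-39.
ITINERARY = {
    "IIC_ALU_MEM": (1, [0]),
    "IIC_ALU_NONMEM": (1, [0, 1]),
    "IIC_LEA": (1, [1]),
    "IIC_LEA_16": (2, [0, 1]),
}


def issue_in_order(classes, itinerary=ITINERARY, num_ports=2):
    """Assign each instruction a port and a start cycle, in program order."""
    free_at = [0] * num_ports  # first cycle in which each port is free again
    earliest = 0               # in-order: never start before the predecessor
    schedule = []
    for name in classes:
        cycles, ports = itinerary[name]
        # Pick the allowed port that becomes usable first (lowest index on ties).
        port = min(ports, key=lambda p: (max(free_at[p], earliest), p))
        start = max(free_at[port], earliest)
        free_at[port] = start + cycles
        earliest = start
        schedule.append((name, port, start))
    return schedule


schedule = issue_in_order(
    ["IIC_ALU_NONMEM", "IIC_ALU_NONMEM", "IIC_ALU_MEM", "IIC_ALU_MEM"]
)
```

In this toy model the two register-only ALU operations pair up in the same cycle on Port0 and Port1, while the two memory forms serialize on Port0, which mirrors the difference between lines 36 and 37.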

Now let us look at a concrete example, such as the following instruction definition (X86InstrCompiler.td):

multiclass LOCK_ArithBinOp<bits<8> RegOpc, bits<8> ImmOpc, bits<8> ImmOpc8,
                           Format ImmMod, string mnemonic> {
let Defs = [EFLAGS], mayLoad = 1, mayStore = 1, isCodeGenOnly = 1,
    SchedRW = [WriteALULd, WriteRMW] in {

def NAME#8mr : I<{RegOpc{7}, RegOpc{6}, RegOpc{5}, RegOpc{4},
                  RegOpc{3}, RegOpc{2}, RegOpc{1}, 0 },
                  MRMDestMem, (outs), (ins i8mem:$dst, GR8:$src2),
                  !strconcat(mnemonic, "{b}\t",
                             "{$src2, $dst|$dst, $src2}"),
                  [], IIC_ALU_NONMEM>, LOCK;
def NAME#16mr : I<{RegOpc{7}, RegOpc{6}, RegOpc{5}, RegOpc{4},
                   RegOpc{3}, RegOpc{2}, RegOpc{1}, 1 },
                   MRMDestMem, (outs), (ins i16mem:$dst, GR16:$src2),
                   !strconcat(mnemonic, "{w}\t",
                              "{$src2, $dst|$dst, $src2}"),
                   [], IIC_ALU_NONMEM>, OpSize16, LOCK;
def NAME#32mr : I<{RegOpc{7}, RegOpc{6}, RegOpc{5}, RegOpc{4},
                   RegOpc{3}, RegOpc{2}, RegOpc{1}, 1 },
                   MRMDestMem, (outs), (ins i32mem:$dst, GR32:$src2),
                   !strconcat(mnemonic, "{l}\t",
                              "{$src2, $dst|$dst, $src2}"),
                   [], IIC_ALU_NONMEM>, OpSize32, LOCK;
def NAME#64mr : RI<{RegOpc{7}, RegOpc{6}, RegOpc{5}, RegOpc{4},
                    RegOpc{3}, RegOpc{2}, RegOpc{1}, 1 },
                    MRMDestMem, (outs), (ins i64mem:$dst, GR64:$src2),
                    !strconcat(mnemonic, "{q}\t",
                               "{$src2, $dst|$dst, $src2}"),
                    [], IIC_ALU_NONMEM>, LOCK;

def NAME#8mi : Ii8<{ImmOpc{7}, ImmOpc{6}, ImmOpc{5}, ImmOpc{4},
                    ImmOpc{3}, ImmOpc{2}, ImmOpc{1}, 0 },
                    ImmMod, (outs), (ins i8mem:$dst, i8imm:$src2),
                    !strconcat(mnemonic, "{b}\t",
                               "{$src2, $dst|$dst, $src2}"),
                    [], IIC_ALU_MEM>, LOCK;

def NAME#16mi : Ii16<{ImmOpc{7}, ImmOpc{6}, ImmOpc{5}, ImmOpc{4},
                      ImmOpc{3}, ImmOpc{2}, ImmOpc{1}, 1 },
                      ImmMod, (outs), (ins i16mem:$dst, i16imm:$src2),
                      !strconcat(mnemonic, "{w}\t",
                                 "{$src2, $dst|$dst, $src2}"),
                      [], IIC_ALU_MEM>, OpSize16, LOCK;

def NAME#32mi : Ii32<{ImmOpc{7}, ImmOpc{6}, ImmOpc{5}, ImmOpc{4},
                      ImmOpc
