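LLVM Study Notes (8)
2.3. Assembly Handling Description
To encapsulate the information needed to read and write instructions in assembly format, TableGen provides the class Target (Target.td) as the base class of every target machine.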
class Target {
  // InstructionSet - Instruction set description for this target.
  InstrInfo InstructionSet;

  // AssemblyParsers - The AsmParser instances available for this target.
  list<AsmParser> AssemblyParsers = [DefaultAsmParser];

  /// AssemblyParserVariants - The AsmParserVariant instances available for
  /// this target.
  list<AsmParserVariant> AssemblyParserVariants = [DefaultAsmParserVariant];

  // AssemblyWriters - The AsmWriter instances available for this target.
  list<AsmWriter> AssemblyWriters = [DefaultAsmWriter];
}
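AssemblyParsers and AssemblyParserVariants are the target's descriptions of how assembly code is parsed. If the target needs more than one assembly-parsing description (for example, for different assembly syntaxes), those descriptions are stored in AssemblyParserVariants. The most important member of Target is InstructionSet. Its type, InstrInfo, holds the target instruction information that matters for emitting assembly: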
class InstrInfo {
  // Target can specify its instructions in either big- or little-endian formats.
  // For instance, while both Sparc and PowerPC are big-endian platforms, the
  // Sparc manual specifies its instructions in the format [31..0] (big), while
  // PowerPC specifies them using the format [0..31] (little).
  bit isLittleEndianEncoding = 0;

  // The instruction properties mayLoad, mayStore, and hasSideEffects are unset
  // by default, and TableGen will infer their value from the instruction
  // pattern when possible.
  //
  // Normally, TableGen will issue an error if it can't infer the value of a
  // property that hasn't been set explicitly. When guessInstructionProperties
  // is set, it will guess a safe value instead.
  //
  // This option is a temporary migration help. It will go away.
  bit guessInstructionProperties = 1;

  // TableGen's instruction encoder generator has support for matching operands
  // to bit-field variables both by name and by position. While matching by
  // name is preferred, this is currently not possible for complex operands,
  // and some targets still rely on the positional encoding rules. When
  // generating a decoder for such targets, the positional encoding rules must
  // be used by the decoder generator as well.
  //
  // This option is temporary; it will go away once the TableGen decoder
  // generator has better support for complex operands and targets have
  // migrated away from using positionally encoded operands.
  bit decodePositionallyEncodedOperands = 0;

  // When set, this indicates that there will be no overlap between those
  // operands that are matched by ordering (positional operands) and those
  // matched by name.
  //
  // This option is temporary; it will go away once the TableGen decoder
  // generator has better support for complex operands and targets have
  // migrated away from using positionally encoded operands.
  bit noNamedPositionallyEncodedOperands = 0;
}
The X86 target derives X86InstrInfo from InstrInfo unchanged. Its Target-derived definition is:
def X86 : Target {
  // Information about the instructions...
  let InstructionSet = X86InstrInfo;
  let AssemblyParsers = [ATTAsmParser];
  let AssemblyParserVariants = [ATTAsmParserVariant, IntelAsmParserVariant];
  let AssemblyWriters = [ATTAsmWriter, IntelAsmWriter];
}
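X86 assembly comes in two familiar syntaxes: AT&T and Intel. A definition such as ATTAsmParserVariant specifies the comment delimiter, separator characters, register prefix and similar properties of the corresponding assembly language. A definition such as ATTAsmWriter names the LLVM class that prints code in that assembly syntax. ATTAsmParser specifies that LLVM uses AsmParser to drive assembly parsing.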
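The two syntaxes also appear directly in the asm strings of X86 instruction definitions. One example is `"{$src2, $dst|$dst, $src2}"` in the X86InstrCompiler.td excerpt quoted later in this note. Inside the braces, the text before `|` is the AT&T form and the text after it is the Intel form. A one-alternative group such as `{b}` appears only in AT&T output. The Python sketch below models that selection under some assumptions:

- the function name `flatten_asm_variant` is hypothetical;
- variant 0 is taken to be AT&T and variant 1 to be Intel;
- escaped braces and nested groups, which TableGen itself handles, are ignored.

```python
def flatten_asm_variant(asm: str, variant: int) -> str:
    """Pick alternative `variant` out of every {alt0|alt1|...} group.

    Simplified sketch: no escaped braces, no nested groups. A group with
    fewer than `variant + 1` alternatives contributes nothing.
    """
    out = []
    i = 0
    while i < len(asm):
        if asm[i] == "{":
            end = asm.index("}", i)              # closing brace (no nesting)
            alternatives = asm[i + 1:end].split("|")
            if variant < len(alternatives):
                out.append(alternatives[variant])
            i = end + 1
        else:
            out.append(asm[i])
            i += 1
    return "".join(out)


# Shape of the string LOCK_ArithBinOp builds for an 8-bit form,
# assuming the mnemonic is "add": mnemonic + "{b}\t" + operand group.
asm = "add" + "{b}\t" + "{$src2, $dst|$dst, $src2}"
print(flatten_asm_variant(asm, 0))  # assumed AT&T variant
print(flatten_asm_variant(asm, 1))  # assumed Intel variant
```

In this model the AT&T result is `addb` followed by `$src2, $dst`. The Intel result is `add` followed by `$dst, $src2`. This is why the size suffix and the operand order differ between the two syntaxes.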
2.4. Target Machine Description
Target processors are described using the Processor class in Target.td as the base class.
1119 class Processor<string n, ProcessorItineraries pi, list<SubtargetFeature> f> {
1120   // Name - Chipset name. Used by command line (-mcpu=) to determine the
1121   // appropriate target chip.
1122   //
1123   string Name = n;
1124
1125   // SchedModel - The machine model for scheduling and instruction cost.
1126   //
1127   SchedMachineModel SchedModel = NoSchedModel;
1128
1129   // ProcItin - The scheduling information for the target processor.
1130   //
1131   ProcessorItineraries ProcItin = pi;
1132
1133   // Features - list of
1134   list<SubtargetFeature> Features = f;
1135 }
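SchedModel on line 1127 and ProcItin on line 1131 can both describe scheduling details. SchedModel is covered by the definitions below. ProcItin describes the execution stages of each kind of instruction. The NoSchedModel default on line 1127 means that a Processor definition does not use a SchedModel unless told otherwise. For some processors, however, ProcItin is inconvenient and SchedModel is used instead. So there is also a Processor-derived definition that disables ProcItin by default: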
class ProcessorModel<string n, SchedMachineModel m, list<SubtargetFeature> f>
  : Processor<n, NoItineraries, f> {
  let SchedModel = m;
}
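As we will see below, an in-order pipelined machine such as Atom describes its scheduling with itineraries. An out-of-order machine such as SandyBridge uses the SchedModel machine model instead.
2.4.1. Feature Description
Features on line 1134 describes the features a processor supports, such as whether it supports the SSE or TSX instruction sets. TableGen generates the necessary predicate checks from these descriptions. SubtargetFeature is defined entirely in terms of strings, apart from its Implies list: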
1077 class SubtargetFeature<string n, string a, string v, string d,
1078                        list<SubtargetFeature> i = []> {
1079   // Name - Feature name. Used by command line (-mattr=) to determine the
1080   // appropriate target chip.
1081   //
1082   string Name = n;
1083
1084   // Attribute - Attribute to be set by feature.
1085   //
1086   string Attribute = a;
1087
1088   // Value - Value the attribute to be set to by feature.
1089   //
1090   string Value = v;
1091
1092   // Desc - Feature description. Used by command line (-mattr=) to display help
1093   // information.
1094   //
1095   string Desc = d;
1096
1097   // Implies - Features that this feature implies are present. If one of those
1098   // features isn't set, then this one shouldn't be set either.
1099   //
1100   list<SubtargetFeature> Implies = i;
1101 }
Implies on line 1100 lists the features implied by this feature. If a processor has this feature, it implicitly has those features too.
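Below is a minimal Python sketch of how Implies can be applied when features are switched on or off. The feature table is a hypothetical excerpt. Two of its links come from definitions quoted in this note: FeatureSSSE3 implies FeatureSSE3, and FeatureCMPXCHG16B implies Feature64Bit. The chain SSE3, SSE2, SSE1, MMX is assumed from the text's description of the SSE hierarchy.

Disabling follows the rule in the Implies comment: if an implied feature is not set, the implying feature must not be set either. This mirrors the idea behind `-mattr=+feature` and `-mattr=-feature`. It is not LLVM's actual code.

```python
# Hypothetical excerpt of a feature table: feature -> its Implies list.
IMPLIES = {
    "FeatureSSSE3": ["FeatureSSE3"],
    "FeatureSSE3": ["FeatureSSE2"],        # assumed link
    "FeatureSSE2": ["FeatureSSE1"],        # assumed link
    "FeatureSSE1": ["FeatureMMX"],         # assumed link
    "FeatureMMX": [],
    "FeatureCMPXCHG16B": ["Feature64Bit"],
    "Feature64Bit": [],
    "FeatureMOVBE": [],
}


def enable(features: set, name: str) -> None:
    """Set `name` and, transitively, every feature it implies."""
    if name in features:
        return
    features.add(name)
    for implied in IMPLIES[name]:
        enable(features, implied)


def disable(features: set, name: str) -> None:
    """Clear `name` and, transitively, every enabled feature that implies it."""
    features.discard(name)
    for other, implied in IMPLIES.items():
        if name in implied and other in features:
            disable(features, other)


# Atom-like selection: SSSE3 pulls in the SSE chain, CMPXCHG16B pulls in 64-bit.
atom = set()
for feature in ["FeatureSSSE3", "FeatureCMPXCHG16B", "FeatureMOVBE"]:
    enable(atom, feature)
print(sorted(atom))

# Turning SSE2 off also removes SSE3 and SSSE3, which depend on it.
disable(atom, "FeatureSSE2")
print(sorted(atom))
```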
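SubtargetFeature is flexible and expressive enough to describe very different processors. Take Atom, Intel's processor for mobile devices: compared with traditional Intel processors it is more RISC-like. TableGen defines it like this: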
240 // Atom CPUs.
241 class BonnellProc<string Name> : ProcessorModel<Name, AtomModel, [
242 ProcIntelAtom,
243 FeatureSSSE3,
244 FeatureCMPXCHG16B,
245 FeatureMOVBE,
246 FeatureSlowBTMem,
247 FeatureLeaForSP,
248 FeatureSlowDivide32,
249 FeatureSlowDivide64,
250 FeatureCallRegIndirect,
251 FeatureLEAUsesAG,
252 FeaturePadShortFunctions
253 ]>;
254 def : BonnellProc<"bonnell">;
255 def : BonnellProc<"atom">; // Pin the generic name to the baseline.
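Lines 242-252 list the features Atom supports. The first, ProcIntelAtom, sets the subtarget's processor-family attribute X86ProcFamily to IntelAtom, identifying this processor family (see its definition on line 203 further below). The next, FeatureSSSE3, says that Atom supports the SSSE3 instruction set. This also implies support for the instruction sets below SSSE3, namely SSE3, SSE2 and SSE.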
def FeatureSSSE3 : SubtargetFeature<"ssse3", "X86SSELevel", "SSSE3",
                                    "Enable SSSE3 instructions",
                                    [FeatureSSE3]>;
This definition sets X86SSELevel to SSSE3. Through Implies, it also enables every instruction set below it (SSE3, SSE2, SSE, MMX, and so on).
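def FeatureCMPXCHG16B : SubtargetFeature<"cx16", "HasCmpxchg16b", "true",
                                         "64-bit with cmpxchg16b",
                                         [Feature64Bit]>;
This definition says the processor supports the cmpxchg16b instruction in 64-bit mode. Through the implied Feature64Bit, which in turn implies conditional-move support, conditional move instructions are available as well.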
def FeatureMOVBE : SubtargetFeature<"movbe", "HasMOVBE", "true",
                                    "Support MOVBE instruction">;
This definition says the processor supports the MOVBE instruction.
def FeatureSlowBTMem : SubtargetFeature<"slow-bt-mem", "IsBTMemSlow", "true",
                                        "Bit testing of memory is slow">;
This definition says that bit-test instructions with a memory operand are slow on this processor.
def FeatureLeaForSP : SubtargetFeature<"lea-sp", "UseLeaForSP", "true",
                                       "Use LEA for adjusting the stack pointer">;
This definition says the LEA instruction is used to adjust the stack pointer.
def FeatureSlowDivide32 : SubtargetFeature<"idivl-to-divb",
                                           "HasSlowDivide32", "true",
                                           "Use 8-bit divide for positive values less than 256">;
This definition says an 8-bit divide is used for positive values less than 256, because 32-bit division is slow.
def FeatureSlowDivide64 : SubtargetFeature<"idivq-to-divw",
                                           "HasSlowDivide64", "true",
                                           "Use 16-bit divide for positive values less than 65536">;
This definition says a 16-bit divide is used for positive values less than 65536, because 64-bit division is slow.
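def FeatureCallRegIndirect : SubtargetFeature<"call-reg-indirect",
                                              "CallRegIndirect", "true",
                                              "Call register indirect">;
This definition says indirect calls should go through a register: the call target is loaded into a register first, instead of calling through a memory operand.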
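def FeatureLEAUsesAG : SubtargetFeature<"lea-uses-ag", "LEAUsesAG", "true",
                                        "LEA instruction needs inputs at AG stage">;
This definition says the LEA instruction needs its inputs already at the AG (address generation) stage, the pipeline stage that computes addresses for data-cache accesses.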
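def FeaturePadShortFunctions : SubtargetFeature<"pad-short-functions",
                                                "PadShortFunctions", "true",
                                                "Pad short functions">;
This definition says short functions should be padded. The padding is with NOPs, to avoid a stall on some Atom processors when a function returns before its return address is ready.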
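These SubtargetFeature definitions are processed by a dedicated TableGen option, -gen-subtarget. When the compiler is built, llvm-tblgen runs with this option and generates a target-specific file: X86GenSubtargetInfo.inc, in the X86 case. The code in this file helps construct an instance of the X86Subtarget class (X86Subtarget.h), which provides the corresponding query methods.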
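The Python sketch below shows roughly what the generated code does with Attribute and Value. It reflects my understanding of the generated ParseSubtargetFeatures function:

- a "true" or "false" Value is assigned directly;
- any other Value only ever raises the attribute, roughly `if (feature enabled && Attr < Value) Attr = Value;`.

The second rule is what lets several features share X86SSELevel. The feature table is copied from definitions in this note. The ordering of SSE levels is an assumption about the X86SSELevel enumeration in X86Subtarget.h.

```python
# (Attribute, Value) pairs of a few SubtargetFeature definitions from this note.
FEATURES = {
    "FeatureMMX": ("X86SSELevel", "MMX"),
    "FeatureSSSE3": ("X86SSELevel", "SSSE3"),
    "FeatureMOVBE": ("HasMOVBE", "true"),
    "FeatureSlowBTMem": ("IsBTMemSlow", "true"),
}

# Assumed order of the X86SSELevel enumerators, lowest first.
SSE_LEVELS = ["NoMMXSSE", "MMX", "SSE1", "SSE2", "SSE3", "SSSE3", "SSE41", "SSE42"]


def parse_features(enabled):
    """Compute subtarget attributes from a list of enabled feature names."""
    state = {"X86SSELevel": "NoMMXSSE", "HasMOVBE": False, "IsBTMemSlow": False}
    for name in enabled:
        attribute, value = FEATURES[name]
        if value in ("true", "false"):
            state[attribute] = value == "true"     # boolean flag: plain assignment
        # The only non-boolean attribute in this table is X86SSELevel.
        elif SSE_LEVELS.index(state[attribute]) < SSE_LEVELS.index(value):
            state[attribute] = value               # level: only ever raised
    return state


# Order does not matter: MMX listed after SSSE3 does not lower the level.
print(parse_features(["FeatureSSSE3", "FeatureMMX", "FeatureMOVBE"]))
```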
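2.4.2. Scheduling Information
SchedMachineModel, which is used in particular to describe out-of-order processors, is defined as follows (TargetSchedule.td):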
class SchedMachineModel {
  int IssueWidth = -1; // Max micro-ops that may be scheduled per cycle.
  int MinLatency = -1; // Determines which instructions are allowed in a group.
                       // (-1) inorder (0) ooo, (1): inorder +var latencies.
  int MicroOpBufferSize = -1; // Max micro-ops that can be buffered.
  int LoopMicroOpBufferSize = -1; // Max micro-ops that can be buffered for
                                  // optimized loop dispatch/execution.
  int LoadLatency = -1; // Cycles for loads to access the cache.
  int HighLatency = -1; // Approximation of cycles for "high latency" ops.
  int MispredictPenalty = -1; // Extra cycles for a mispredicted branch.

  // Per-cycle resources tables.
  ProcessorItineraries Itineraries = NoItineraries;

  bit PostRAScheduler = 0; // Enable Post RegAlloc Scheduler pass.

  // Subtargets that define a model for only a subset of instructions
  // that have a scheduling class (itinerary class or SchedRW list)
  // and may actually be generated for that subtarget must clear this
  // bit. Otherwise, the scheduler considers an unmodelled opcode to
  // be an error. This should only be set during initial bringup,
  // or there will be no way to catch simple errors in the model
  // resulting from changes to the instruction definitions.
  bit CompleteModel = 1;

  bit NoModel = 0; // Special tag to indicate missing machine model.
}
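The default values of these properties are defined in the C++ class MCSchedModel (MCSchedule.h). A value of -1 in SchedMachineModel means the target does not override that property.
IssueWidth defaults to 1. It is the maximum number of instructions that can be scheduled (issued) in one cycle.
MicroOpBufferSize defaults to 0. It is the number of micro-ops the processor may buffer for out-of-order execution. The values mean:
- 0: operations that are not ready in the current cycle are not considered for scheduling; they go into the pending queue. Latency is paramount. This can be more efficient when many instructions are pending in a schedule.
- 1: all instructions are considered for scheduling, whether or not they are ready in the current cycle. Latency still causes issue stalls, but those stalls are balanced against other heuristics.
- greater than 1: the processor executes out of order. The value is a machine-independent estimate of highly machine-specific characteristics such as the register renaming pool and the reorder buffer.
LoopMicroOpBufferSize defaults to 0. It is the number of micro-ops the processor may buffer to optimize loop execution. More generally, it represents the optimal number of micro-ops in a loop body. A loop may be partially unrolled to bring the number of micro-ops in its body closer to this value.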
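As a rough illustration of how a loop buffer size could guide partial unrolling, the Python sketch below picks the largest unroll factor whose unrolled body still fits in LoopMicroOpBufferSize micro-ops. This is a deliberately simplified, hypothetical heuristic: the function name and the `max_factor` cap are made up, and LLVM's real unroller weighs more factors. With Atom's value of 10 (set in AtomModel, quoted later), a 3-micro-op loop body would be unrolled 3 times.

```python
def partial_unroll_factor(body_uops: int, loop_uop_buffer: int,
                          max_factor: int = 8) -> int:
    """Largest factor k with k * body_uops <= loop_uop_buffer, capped.

    A buffer size of 0 (the MCSchedModel default) means "no information",
    so the loop is left alone (factor 1).
    """
    if loop_uop_buffer <= 0 or body_uops <= 0:
        return 1
    return max(1, min(loop_uop_buffer // body_uops, max_factor))


for uops in (1, 3, 4, 12):
    print(uops, "uops ->", partial_unroll_factor(uops, loop_uop_buffer=10))
```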
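LoadLatency defaults to 4. It is the expected latency of load instructions. If MinLatency >= 0, it can be overridden for individual loads by the OperandCycles of InstrItinerary.
HighLatency defaults to 10. It is the expected latency of "very high latency" operations. In general this is an arbitrarily high cycle count, chosen to have some effect on the scheduling heuristics. If MinLatency >= 0, it can be overridden by the OperandCycles of InstrItinerary.
MispredictPenalty defaults to 10. It is the typical number of extra cycles the processor needs to recover from a branch misprediction.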
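The "-1 means not overridden" rule can be sketched in Python: start from the MCSchedModel defaults listed above, then apply only the non-negative values a model sets. As far as I can tell, the subtarget emitter does the equivalent at generation time, writing out MCSchedModel's default whenever a field is negative. The dictionary-based function here is only a hypothetical model of that. The example uses the fields AtomModel sets (quoted later in this note). MinLatency is omitted because its default is not listed here.

```python
# Defaults of MCSchedModel (MCSchedule.h), as listed above.
MCSCHED_DEFAULTS = {
    "IssueWidth": 1,
    "MicroOpBufferSize": 0,
    "LoopMicroOpBufferSize": 0,
    "LoadLatency": 4,
    "HighLatency": 10,
    "MispredictPenalty": 10,
}


def resolve_sched_model(fields: dict) -> dict:
    """Apply a SchedMachineModel's field values on top of the defaults.

    A negative value (the TableGen default -1) or a missing field means
    "not overridden", so the MCSchedModel default is kept.
    """
    resolved = dict(MCSCHED_DEFAULTS)
    for name, value in fields.items():
        if value >= 0:
            resolved[name] = value
    return resolved


# AtomModel sets these fields; MispredictPenalty keeps its -1 default.
atom = resolve_sched_model({
    "IssueWidth": 2,
    "MicroOpBufferSize": 0,
    "LoadLatency": 3,
    "HighLatency": 30,
    "LoopMicroOpBufferSize": 10,
    "MispredictPenalty": -1,
})
print(atom)
```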
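When NoModel is 1, the SchedMachineModel definition carries no real information, for example:
def NoSchedModel : SchedMachineModel {
  let NoModel = 1;
}
SchedMachineModel also contains a member of type ProcessorItineraries, which the comment calls the per-cycle resource tables (none are provided by default). ProcessorItineraries is also the type of ProcItin in Processor. Its definition (TargetItinerary.td) is: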
class ProcessorItineraries<list<FuncUnit> fu, list<Bypass> bp,
                           list<InstrItinData> iid> {
  list<FuncUnit> FU = fu;
  list<Bypass> BP = bp;
  list<InstrItinData> IID = iid;
}
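All of its members are lists:
- FU lists the processor's functional units;
- BP lists the pipeline bypasses;
- IID holds the itinerary data (InstrItinData) of each instruction itinerary class.
FuncUnit describes one component unit of the processor. Processor units come in so many varieties that FuncUnit, as a base class, is just an empty class. Similarly, Bypass, which describes a pipeline bypass (forwarding path), is also an empty class.
Let us look at some examples.
2.4.2.1. The Atom Description
Atom is Intel's family of ultra-low-voltage IA-32 and x86-64 microprocessors, based on the Bonnell microarchitecture. That is why, on line 255 of X86.td, the ProcessorModel definition for Atom is derived from BonnellProc.
The Bonnell microarchitecture can execute at most two instructions per cycle. Like many other x86 microprocessors, it translates x86 (CISC) instructions into simpler internal operations before execution. These are sometimes called micro-ops, and are essentially RISC-style instructions. In a typical program, most instructions produce one micro-op during translation, and about 4% produce several. The number of instructions that produce multiple micro-ops is significantly smaller than in the P6 and NetBurst microarchitectures.

In Bonnell, an internal micro-op can contain both a memory load and a memory store in connection with an ALU operation. Micro-ops are therefore closer to the x86 instruction level and more powerful than those of earlier designs. This yields relatively good performance with only two integer ALUs, and without instruction reordering, speculative execution or register renaming. The Bonnell microarchitecture thus represents a partial revival of the principles used in earlier Intel designs such as the P5 and the i486, with the sole purpose of improving performance per watt. Hyper-threading, however, is implemented in a simple (that is, low-power) way that keeps the whole pipeline busy by avoiding typical single-thread dependencies.
To describe this processor, X86.td first contains the following definitions (only a small part of the X86 processor definitions):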
203 def ProcIntelAtom : SubtargetFeature<"atom", "X86ProcFamily", "IntelAtom",
204                                      "Intel Atom processors">;
205 def ProcIntelSLM : SubtargetFeature<"slm", "X86ProcFamily", "IntelSLM",
206                                     "Intel Silvermont processors">;
207
208 class Proc<string Name, list<SubtargetFeature> Features>
209   : ProcessorModel<Name, GenericModel, Features>;
210
211 def : Proc<"generic", []>;
212 def : Proc<"i386", []>;
213 def : Proc<"i486", []>;
214 def : Proc<"i586", []>;
215 def : Proc<"pentium", []>;
216 def : Proc<"pentium-mmx", [FeatureMMX]>;
217 def : Proc<"i686", []>;
218 def : Proc<"pentiumpro", [FeatureCMOV]>;
219 def : Proc<"pentium2", [FeatureMMX, FeatureCMOV]>;
220 def : Proc<"pentium3", [FeatureSSE1]>;
221 def : Proc<"pentium3m", [FeatureSSE1, FeatureSlowBTMem]>;
222 def : Proc<"pentium-m", [FeatureSSE2, FeatureSlowBTMem]>;
223 def : Proc<"pentium4", [FeatureSSE2]>;
224 def : Proc<"pentium4m", [FeatureSSE2, FeatureSlowBTMem]>;
225
226 // Intel Core Duo.
227 def : ProcessorModel<"yonah", SandyBridgeModel,
228                      [FeatureSSE3, FeatureSlowBTMem]>;
229
230 // NetBurst.
231 def : Proc<"prescott", [FeatureSSE3, FeatureSlowBTMem]>;
232 def : Proc<"nocona", [FeatureSSE3, FeatureCMPXCHG16B, FeatureSlowBTMem]>;
233
234 // Intel Core 2 Solo/Duo.
235 def : ProcessorModel<"core2", SandyBridgeModel,
236                      [FeatureSSSE3, FeatureCMPXCHG16B, FeatureSlowBTMem]>;
237 def : ProcessorModel<"penryn", SandyBridgeModel,
238                      [FeatureSSE41, FeatureCMPXCHG16B, FeatureSlowBTMem]>;
239
240 // Atom CPUs.
241 class BonnellProc<string Name> : ProcessorModel<Name, AtomModel, [
242   ProcIntelAtom,
243   FeatureSSSE3,
244 FeatureCMPXCHG16B,
245 FeatureMOVBE,
246 FeatureSlowBTMem,
247 FeatureLeaForSP,
248 FeatureSlowDivide32,
249 FeatureSlowDivide64,
250 FeatureCallRegIndirect,
251 FeatureLEAUsesAG,
252 FeaturePadShortFunctions
253 ]>;
254 def : BonnellProc<"bonnell">;
255 def : BonnellProc<"atom">; // Pin the generic name to the baseline.
256
257 class SilvermontProc<string Name> : ProcessorModel<Name, SLMModel, [
258 ProcIntelSLM,
259 FeatureSSE42,
260 FeatureCMPXCHG16B,
261 FeatureMOVBE,
262 FeaturePOPCNT,
263 FeaturePCLMUL,
264 FeatureAES,
265 FeatureSlowDivide64,
266 FeatureCallRegIndirect,
267 FeaturePRFCHW,
268 FeatureSlowLEA,
269 FeatureSlowIncDec,
270 FeatureSlowBTMem,
271 FeatureFastUAMem
272 ]>;
273 def : SilvermontProc<"silvermont">;
274 def : SilvermontProc<"slm">; // Legacy alias.
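Lines 203 and 205 first define two SubtargetFeature derivatives that identify the Atom and Silvermont processor families respectively. The Proc class on line 208 uses GenericModel as its scheduling model. This model gives only a rough description of a processor, for example when no document with full instruction execution details is available for it. GenericModel is defined as follows: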
def GenericModel : SchedMachineModel {
  let IssueWidth = 4;
  let MicroOpBufferSize = 32;
  let LoadLatency = 4;
  let HighLatency = 10;
  let PostRAScheduler = 0;
}
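GenericModel contains no per-cycle resource tables (itineraries). IssueWidth is analogous to the number of decode units: Core and its descendants, including Nehalem and SandyBridge, have 4 decoders. Resources beyond the decoders operate on micro-ops and are buffered, so adjacent micro-ops do not compete directly.
MicroOpBufferSize > 1 indicates that RAW (read-after-write) dependencies can be decoded in the same cycle. The value 32 is a reasonable, somewhat arbitrary number of in-flight instructions.
HighLatency = 10 is optimistic. X86InstrInfo::isHighLatencyDef flags high-latency opcodes. Alternatively, InstrItinData entries may be included here to define specific operand latencies. Because these latencies are not used for pipeline hazards, they do not need to be exact.
So even GenericModel is tuned for Core and later processors. These settings do not really fit older Intel processors, which have fewer decode units and no micro-op buffer. For instruction scheduling this is not a big problem; it only makes the generated code somewhat less efficient on older processors.
The processor definitions above use many feature descriptions. For example, FeatureMMX says the processor supports instructions and registers conforming to the MMX standard: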
def FeatureMMX : SubtargetFeature<"mmx", "X86SSELevel", "MMX",
                                  "Enable MMX instructions">;
Atom's SchedMachineModel-derived definition, in turn, is:
537 def AtomModel : SchedMachineModel {
538   let IssueWidth = 2;  // Allows 2 instructions per scheduling group.
539   let MicroOpBufferSize = 0; // In-order execution, always hide latency.
540   let LoadLatency = 3; // Expected cycles, may be overridden by OperandCycles.
541   let HighLatency = 30;// Expected, may be overridden by OperandCycles.
542
543   // On the Atom, the throughput for taken branches is 2 cycles. For small
544   // simple loops, expand by a small factor to hide the backedge cost.
545   let LoopMicroOpBufferSize = 10;
546   let PostRAScheduler = 1;
547
548   let Itineraries = AtomItineraries;
549 }
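The AtomItineraries itinerary referenced on line 548 is a large set of definitions (X86ScheduleAtom.td). Atom relies on it to describe how each kind of instruction occupies resources (functional units). This works because Atom has an in-order pipeline and only two ports, which are the functional units defined in Intel's processor manual: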
20 def Port0 : FuncUnit; // ALU: ALU0, shift/rotate, load/store
21                       // SIMD/FP: SIMD ALU, Shuffle, SIMD/FP multiply, divide
22 def Port1 : FuncUnit; // ALU: ALU1, bit processing, jump, and LEA
23                       // SIMD/FP: SIMD ALU, FP Adder
24
25 def AtomItineraries : ProcessorItineraries<
26   [ Port0, Port1 ],
27   [], [
28   // P0 only
29   // InstrItinData<class, [InstrStage<N, [P0]>] >,
30   // P0 or P1
31   // InstrItinData<class, [InstrStage<N, [P0, P1]>] >,
32   // P0 and P1
33   // InstrItinData<class, [InstrStage<N, [P0], 0>, InstrStage<N, [P1]>] >,
34   //
35   // Default is 1 cycle, port0 or port1
36 InstrItinData<IIC_ALU_MEM,[InstrStage<1, [Port0]>] >,
37 InstrItinData<IIC_ALU_NONMEM, [InstrStage<1,[Port0, Port1]>] >,
38 InstrItinData<IIC_LEA, [InstrStage<1,[Port1]>] >,
39 InstrItinData<IIC_LEA_16,[InstrStage<2, [Port0, Port1]>] >,
40 // mul
…
238 InstrItinData<IIC_SSE_MASKMOV,[InstrStage<2, [Port0, Port1]>] >,
239
240 InstrItinData<IIC_SSE_PEXTRW,[InstrStage<4, [Port0, Port1]>] >,
241 InstrItinData<IIC_SSE_PINSRW,[InstrStage<1, [Port0]>] >,
242
243 InstrItinData<IIC_SSE_PABS_RR,[InstrStage<1, [Port0, Port1]>] >,
244 InstrItinData<IIC_SSE_PABS_RM,[InstrStage<1, [Port0]>] >,
245
246 InstrItinData<IIC_SSE_MOV_S_RR,[InstrStage<1, [Port0, Port1]>] >,
247 InstrItinData<IIC_SSE_MOV_S_RM,[InstrStage<1, [Port0]>] >,
248 InstrItinData<IIC_SSE_MOV_S_MR,[InstrStage<1, [Port0]>] >,
249
250 InstrItinData<IIC_SSE_MOVA_P_RR,[InstrStage<1, [Port0, Port1]>] >,
251 InstrItinData<IIC_SSE_MOVA_P_RM,[InstrStage<1, [Port0]>] >,
252 InstrItinData<IIC_SSE_MOVA_P_MR,[InstrStage<1, [Port0]>] >,
253
254 InstrItinData<IIC_SSE_MOVU_P_RR,[InstrStage<1, [Port0, Port1]>] >,
255 InstrItinData<IIC_SSE_MOVU_P_RM,[InstrStage<3, [Port0, Port1]>] >,
256 InstrItinData<IIC_SSE_MOVU_P_MR,[InstrStage<2, [Port0, Port1]>] >,
257
258 InstrItinData<IIC_SSE_MOV_LH,[InstrStage<1, [Port0]>] >,
259
260 InstrItinData<IIC_SSE_LDDQU,[InstrStage<3, [Port0, Port1]>] >,
261
262 InstrItinData<IIC_SSE_MOVDQ,[InstrStage<1, [Port0]>] >,
263 InstrItinData<IIC_SSE_MOVD_ToGP,[InstrStage<3, [Port0]>] >,
264 InstrItinData<IIC_SSE_MOVQ_RR,[InstrStage<1, [Port0, Port1]>] >,
…
357 InstrItinData<IIC_FILD, [InstrStage<5,[Port0], 0>, InstrStage<5, [Port1]>] >,
358 InstrItinData<IIC_FLD, [InstrStage<1, [Port0]>] >,
359 InstrItinData<IIC_FLD80, [InstrStage<4,[Port0, Port1]>] >,
360
361 InstrItinData<IIC_FST, [InstrStage<2, [Port0, Port1]>] >,
362 InstrItinData<IIC_FST80, [InstrStage<5,[Port0, Port1]>] >,
363 InstrItinData<IIC_FIST, [InstrStage<6, [Port0, Port1]>] >,
…
533 InstrItinData<IIC_NOP, [InstrStage<1,[Port0, Port1]>] >
534 ]>;
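The comment at the top of X86ScheduleAtom.td says these definitions come from Chapter 13, Section 4 of the "Intel 64 and IA-32 Architectures Optimization Reference Manual" (Chapter 14, Section 4 in the 2016 edition). The manual can be downloaded online. It names the two ALUs Port0 and Port1, and LLVM follows that naming in the definitions on lines 20 and 22. That section of the manual has a table giving, for each instruction, the ports it binds to, its execution latency and related data, and AtomItineraries is built from that table. For example, the definitions on lines 36 and 37 come from the following two table entries: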
Instruction | Ports | Latency | Throughput
ADD/AND/CMP/OR/SUB/XOR/TEST mem, reg; ADD/AND/CMP/OR/SUB/XOR reg, mem | 0 | 1 | 1
ADD/AND/CMP/OR/SUB/XOR reg, imm8; ADD/AND/CMP/OR/SUB/XOR reg, imm | (0, 1) | 1 | 0.5
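How such a table constrains scheduling can be sketched with a small reservation table in Python. The itineraries below transcribe a few AtomItineraries entries. The stage semantics follow the patterns in the comments above:

- a stage holds one of its listed units for N cycles;
- a third argument of 0 makes the next stage start in the same cycle;
- otherwise, by the assumed default, the next stage starts N cycles later.

The class and function names are hypothetical. Operand latencies and IssueWidth are ignored, so the sketch shows only functional-unit conflicts in an in-order pipeline. It is not how LLVM's hazard recognizer is actually written.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    cycles: int          # how long the chosen unit stays busy
    units: tuple         # any ONE of these units may be used
    time_inc: int = -1   # cycles before the next stage starts; -1 = `cycles`


# A few entries transcribed from AtomItineraries.
ITINERARIES = {
    "IIC_ALU_MEM": [Stage(1, ("Port0",))],
    "IIC_ALU_NONMEM": [Stage(1, ("Port0", "Port1"))],
    "IIC_LEA": [Stage(1, ("Port1",))],
    "IIC_FILD": [Stage(5, ("Port0",), 0), Stage(5, ("Port1",))],
}


def try_reserve(busy: set, start: int, stages: list) -> bool:
    """Reserve units for an instruction issued at `start`; False on conflict.

    `busy` holds (cycle, unit) pairs and is only updated on success.
    """
    booked = set()
    t = start
    for stage in stages:
        for unit in stage.units:
            slots = {(c, unit) for c in range(t, t + stage.cycles)}
            if not (slots & busy) and not (slots & booked):
                booked |= slots
                break
        else:
            return False            # no listed unit was free for this stage
        t += stage.cycles if stage.time_inc < 0 else stage.time_inc
    busy |= booked
    return True


def issue_in_order(classes: list) -> list:
    """Issue cycle of each instruction, in program order, never going back."""
    busy, cycle, result = set(), 0, []
    for cls in classes:
        while not try_reserve(busy, cycle, ITINERARIES[cls]):
            cycle += 1
        result.append(cycle)
    return result


print(issue_in_order(["IIC_ALU_NONMEM", "IIC_ALU_NONMEM"]))  # reg/imm forms pair up
print(issue_in_order(["IIC_ALU_MEM", "IIC_ALU_MEM"]))        # both need Port0
print(issue_in_order(["IIC_FILD", "IIC_ALU_NONMEM"]))        # FILD holds both ports
```

The three calls give issue cycles [0, 0], [0, 1] and [0, 5]. This matches the table: the register/immediate forms can use either port, while the memory forms are tied to port 0.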
Now let us look at a concrete example, such as the following instruction definitions (X86InstrCompiler.td):
551 multiclass LOCK_ArithBinOp<bits<8> RegOpc, bits<8> ImmOpc, bits<8> ImmOpc8,
552                            Format ImmMod, string mnemonic> {
553 let Defs = [EFLAGS], mayLoad = 1, mayStore = 1, isCodeGenOnly = 1,
554     SchedRW = [WriteALULd, WriteRMW] in {
555
556 def NAME#8mr : I<{RegOpc{7}, RegOpc{6}, RegOpc{5}, RegOpc{4},
557                   RegOpc{3}, RegOpc{2}, RegOpc{1}, 0 },
558                   MRMDestMem, (outs), (ins i8mem:$dst, GR8:$src2),
559                   !strconcat(mnemonic, "{b}\t",
560                              "{$src2, $dst|$dst, $src2}"),
561                   [], IIC_ALU_NONMEM>, LOCK;
562 def NAME#16mr : I<{RegOpc{7}, RegOpc{6}, RegOpc{5}, RegOpc{4},
563                    RegOpc{3}, RegOpc{2}, RegOpc{1}, 1 },
564                    MRMDestMem, (outs), (ins i16mem:$dst, GR16:$src2),
565                    !strconcat(mnemonic, "{w}\t",
566                               "{$src2, $dst|$dst, $src2}"),
567                    [], IIC_ALU_NONMEM>, OpSize16, LOCK;
568 def NAME#32mr : I<{RegOpc{7}, RegOpc{6}, RegOpc{5}, RegOpc{4},
569                    RegOpc{3}, RegOpc{2}, RegOpc{1}, 1 },
570                    MRMDestMem, (outs), (ins i32mem:$dst, GR32:$src2),
571                    !strconcat(mnemonic, "{l}\t",
572                               "{$src2, $dst|$dst, $src2}"),
573                    [], IIC_ALU_NONMEM>, OpSize32, LOCK;
574 def NAME#64mr : RI<{RegOpc{7}, RegOpc{6}, RegOpc{5}, RegOpc{4},
575                     RegOpc{3}, RegOpc{2}, RegOpc{1}, 1 },
576                     MRMDestMem, (outs), (ins i64mem:$dst, GR64:$src2),
577                     !strconcat(mnemonic, "{q}\t",
578                                "{$src2, $dst|$dst, $src2}"),
579                     [], IIC_ALU_NONMEM>, LOCK;
580
581 def NAME#8mi : Ii8<{ImmOpc{7}, ImmOpc{6}, ImmOpc{5}, ImmOpc{4},
582                     ImmOpc{3}, ImmOpc{2}, ImmOpc{1}, 0 },
583                     ImmMod, (outs), (ins i8mem :$dst, i8imm :$src2),
584                     !strconcat(mnemonic, "{b}\t",
585                                "{$src2, $dst|$dst, $src2}"),
586                     [], IIC_ALU_MEM>, LOCK;
587
588 def NAME#16mi : Ii16<{ImmOpc{7}, ImmOpc{6}, ImmOpc{5}, ImmOpc{4},
589                       ImmOpc{3}, ImmOpc{2}, ImmOpc{1}, 1 },
590                       ImmMod, (outs), (ins i16mem :$dst, i16imm :$src2),
591                       !strconcat(mnemonic, "{w}\t",
592                                  "{$src2, $dst|$dst, $src2}"),
593                       [], IIC_ALU_MEM>, OpSize16, LOCK;
594
595 def NAME#32mi : Ii32<{ImmOpc{7}, ImmOpc{6}, ImmOpc{5}, ImmOpc{4},
596 ImmOpc