
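LLVM Study Notes (8)

2.3. Assembly Handling Description

To encapsulate the information needed to read and write instructions in assembly format, TableGen provides the class Target (Target.td) as the base class for every target machine.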

class Target {
  // InstructionSet - Instruction set description for this target.
  InstrInfo InstructionSet;

  // AssemblyParsers - The AsmParser instances available for this target.
  list<AsmParser> AssemblyParsers = [DefaultAsmParser];

  /// AssemblyParserVariants - The AsmParserVariant instances available for
  /// this target.
  list<AsmParserVariant> AssemblyParserVariants = [DefaultAsmParserVariant];

  // AssemblyWriters - The AsmWriter instances available for this target.
  list<AsmWriter> AssemblyWriters = [DefaultAsmWriter];
}

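AssemblyParsers and AssemblyParserVariants are the target-specific descriptions for parsing assembly code. If a subtarget needs its own assembly parsing descriptions, they are stored in AssemblyParserVariants. The most important member of the Target definition is InstrInfo, which represents the target instruction information that matters when emitting assembly: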

class InstrInfo {
  // Target can specify its instructions in either big or little-endian formats.
  // For instance, while both Sparc and PowerPC are big-endian platforms, the
  // Sparc manual specifies its instructions in the format [31..0] (big), while
  // PowerPC specifies them using the format [0..31] (little).
  bit isLittleEndianEncoding = 0;

  // The instruction properties mayLoad, mayStore, and hasSideEffects are unset
  // by default, and TableGen will infer their value from the instruction
  // pattern when possible.
  //
  // Normally, TableGen will issue an error if it can't infer the value of a
  // property that hasn't been set explicitly. When guessInstructionProperties
  // is set, it will guess a safe value instead.
  //
  // This option is a temporary migration help. It will go away.
  bit guessInstructionProperties = 1;

  // TableGen's instruction encoder generator has support for matching operands
  // to bit-field variables both by name and by position. While matching by
  // name is preferred, this is currently not possible for complex operands,
  // and some targets still rely on the positional encoding rules. When
  // generating a decoder for such targets, the positional encoding rules must
  // be used by the decoder generator as well.
  //
  // This option is temporary; it will go away once the TableGen decoder
  // generator has better support for complex operands and targets have
  // migrated away from using positionally encoded operands.
  bit decodePositionallyEncodedOperands = 0;

  // When set, this indicates that there will be no overlap between those
  // operands that are matched by ordering (positional operands) and those
  // matched by name.
  //
  // This option is temporary; it will go away once the TableGen decoder
  // generator has better support for complex operands and targets have
  // migrated away from using positionally encoded operands.
  bit noNamedPositionallyEncodedOperands = 0;
}

The X86 target derives X86InstrInfo from InstrInfo without any changes. Its Target-derived definition is:

def X86 : Target {
  // Information about the instructions...
  let InstructionSet = X86InstrInfo;
  let AssemblyParsers = [ATTAsmParser];
  let AssemblyParserVariants = [ATTAsmParserVariant, IntelAsmParserVariant];
  let AssemblyWriters = [ATTAsmWriter, IntelAsmWriter];
}

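As is well known, X86 assembly code comes in AT&T and Intel formats. Definitions such as ATTAsmParserVariant specify the comment string, separators, register prefix and so on for the corresponding assembly syntax, while definitions such as ATTAsmWriter specify which LLVM class can emit code in the given assembly format. ATTAsmParser, in turn, specifies that LLVM should use AsmParser to drive assembly parsing.

2.4. Target Machine Description

The description of a target's processors uses the Processor definition in Target.td as its base class.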

1119   class Processor<string n, ProcessorItineraries pi, list<SubtargetFeature> f> {
1120     // Name - Chipset name.  Used by command line (-mcpu=) to determine the
1121     // appropriate target chip.
1122     //
1123     string Name = n;
1124
1125     // SchedModel - The machine model for scheduling and instruction cost.
1126     //
1127     SchedMachineModel SchedModel = NoSchedModel;
1128
1129     // ProcItin - The scheduling information for the target processor.
1130     //
1131     ProcessorItineraries ProcItin = pi;
1132
1133     // Features - list of
1134     list<SubtargetFeature> Features = f;
1135   }

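Both SchedModel at line 1127 and ProcItin at line 1131 can describe scheduling details. SchedModel is covered by the definition shown further below, while ProcItin describes the execution steps of each kind of instruction. The NoSchedModel at line 1127 means that a Processor definition uses no SchedModel by default. For some processors, however, ProcItin is inconvenient and SchedModel is used instead, so there is also a Processor definition that disables ProcItin by default: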

class ProcessorModel<string n, SchedMachineModel m, list<SubtargetFeature> f>
  : Processor<n, NoItineraries, f> {
  let SchedModel = m;
}

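As we will see below, an in-order pipelined machine such as Atom is described with itineraries, whereas a machine that supports out-of-order execution, such as SandyBridge, is described with a SchedModel. These details are discussed in the following sections.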
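The relationship between Processor and ProcessorModel can be mimicked in a few lines of Python. This is only an illustrative sketch: the dataclass, the string placeholders and the helper `processor_model` are hypothetical stand-ins for the TableGen records, not anything LLVM provides.

```python
from dataclasses import dataclass, field

# Placeholder names standing in for the TableGen records of the same name.
NO_SCHED_MODEL = "NoSchedModel"
NO_ITINERARIES = "NoItineraries"


@dataclass
class Processor:
    """Mirrors the fields of the TableGen Processor class."""
    name: str
    proc_itin: str
    features: list = field(default_factory=list)
    sched_model: str = NO_SCHED_MODEL  # default, as on line 1127


def processor_model(name, sched_model, features):
    """Mirrors ProcessorModel: no itineraries, explicit SchedModel."""
    proc = Processor(name, NO_ITINERARIES, list(features))
    proc.sched_model = sched_model  # corresponds to "let SchedModel = m;"
    return proc


atom = processor_model("atom", "AtomModel", ["FeatureSSSE3"])
```

A plain Processor keeps NoSchedModel unless overridden, while anything built through `processor_model` always carries NoItineraries in its ProcItin slot.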

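2.4.1. Feature Descriptions

Features at line 1134 describes the features that the processor supports, such as whether it supports the SSE instruction sets or the TSX instruction set. TableGen generates the required predicate checks from these descriptions. SubtargetFeature is defined entirely in terms of strings: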

1077   class SubtargetFeature<string n, string a, string v, string d,
1078                          list<SubtargetFeature> i = []> {
1079     // Name - Feature name.  Used by command line (-mattr=) to determine the
1080     // appropriate target chip.
1081     //
1082     string Name = n;
1083
1084     // Attribute - Attribute to be set by feature.
1085     //
1086     string Attribute = a;
1087
1088     // Value - Value the attribute to be set to by feature.
1089     //
1090     string Value = v;
1091
1092     // Desc - Feature description.  Used by command line (-mattr=) to display help
1093     // information.
1094     //
1095     string Desc = d;
1096
1097     // Implies - Features that this feature implies are present. If one of those
1098     // features isn't set, then this one shouldn't be set either.
1099     //
1100     list<SubtargetFeature> Implies = i;
1101   }

Implies at line 1100 lists the features that are implicitly included whenever this feature is present.
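The effect of Implies can be pictured as a transitive closure over a feature graph. The sketch below is a simplified model, not the code TableGen generates: `expand_features` and the `IMPLIES` table are hypothetical, and of the edges shown only FeatureSSSE3 implying FeatureSSE3 appears in the X86.td excerpts quoted in these notes; the rest of the SSE chain is assumed for illustration.

```python
def expand_features(requested, implies):
    """Return the requested features plus everything they transitively imply."""
    enabled = set()
    stack = list(requested)
    while stack:
        feature = stack.pop()
        if feature in enabled:
            continue  # already handled, avoids loops on cyclic input
        enabled.add(feature)
        stack.extend(implies.get(feature, []))
    return enabled


# Hypothetical implication table; only the first edge is quoted in the notes.
IMPLIES = {
    "FeatureSSSE3": ["FeatureSSE3"],
    "FeatureSSE3": ["FeatureSSE2"],   # assumed
    "FeatureSSE2": ["FeatureSSE1"],   # assumed
}

enabled = expand_features(["FeatureSSSE3"], IMPLIES)
```

Requesting FeatureSSSE3 alone therefore also enables every lower level reachable through the table.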

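SubtargetFeature is flexible and expressive enough to describe processors that differ widely. Take the Atom processor as an example. It is a processor that Intel introduced for mobile devices and, compared with traditional Intel processors, it is closer to a RISC processor. TableGen defines it as follows: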

240      // Atom CPUs.
241      class BonnellProc<string Name> : ProcessorModel<Name, AtomModel, [
242                                         ProcIntelAtom,
243                                         FeatureSSSE3,
244                                         FeatureCMPXCHG16B,
245                                         FeatureMOVBE,
246                                         FeatureSlowBTMem,
247                                         FeatureLeaForSP,
248                                         FeatureSlowDivide32,
249                                         FeatureSlowDivide64,
250                                         FeatureCallRegIndirect,
251                                         FeatureLEAUsesAG,
252                                         FeaturePadShortFunctions
253                                       ]>;
254      def : BonnellProc<"bonnell">;
255      def : BonnellProc<"atom">; // Pin the generic name to the baseline.

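Lines 242-252 list the features of the Atom processor. The first one, ProcIntelAtom, names the processor family, mainly as an aid to developers. The next one, FeatureSSSE3, indicates that Atom supports the SSSE3 instruction set, which also means that it supports the instruction sets below SSSE3, namely SSE3, SSE2 and SSE.

def FeatureSSSE3 : SubtargetFeature<"ssse3", "X86SSELevel", "SSSE3",
                                    "Enable SSSE3 instructions",
                                    [FeatureSSE3]>;

This definition indicates support for SSSE3 and the instruction sets below it only (and therefore also SSE3, SSE2, SSE, MMX and so on).

def FeatureCMPXCHG16B : SubtargetFeature<"cx16", "HasCmpxchg16b", "true",
                                         "64-bit with cmpxchg16b",
                                         [Feature64Bit]>;

This definition indicates support for the 64-bit cmpxchg16b instruction and, by implication, conditional-move instructions.

def FeatureMOVBE : SubtargetFeature<"movbe", "HasMOVBE", "true",
                                    "Support MOVBE instruction">;

This definition indicates support for the MOVBE instruction.

def FeatureSlowBTMem : SubtargetFeature<"slow-bt-mem", "IsBTMemSlow", "true",
                                        "Bit testing of memory is slow">;

This definition indicates that bit-test instructions on memory operands are slow on this processor.

def FeatureLeaForSP : SubtargetFeature<"lea-sp", "UseLeaForSP", "true",
                                       "Use LEA for adjusting the stack pointer">;

This definition indicates that the LEA instruction is used to adjust the stack pointer.

def FeatureSlowDivide32 : SubtargetFeature<"idivl-to-divb",
                                           "HasSlowDivide32", "true",
                                           "Use 8-bit divide for positive values less than 256">;

This definition indicates that 8-bit division is used for positive values less than 256 (because 32-bit division is slow).

def FeatureSlowDivide64 : SubtargetFeature<"idivq-to-divw",
                                           "HasSlowDivide64", "true",
                                           "Use 16-bit divide for positive values less than 65536">;

This definition indicates that 16-bit division is used for positive values less than 65536 (because 64-bit division is slow).

def FeatureCallRegIndirect : SubtargetFeature<"call-reg-indirect",
                                              "CallRegIndirect", "true",
                                              "Call register indirect">;

This definition indicates that indirect calls go through a register.

def FeatureLEAUsesAG : SubtargetFeature<"lea-uses-ag", "LEAUsesAG", "true",
                                        "LEA instruction needs inputs at AG stage">;

This definition indicates that the LEA instruction needs its inputs as early as the AG (address generation) stage.

def FeaturePadShortFunctions : SubtargetFeature<"pad-short-functions",
                                                "PadShortFunctions", "true",
                                                "Pad short functions">;

This definition indicates that short functions should be padded.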

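These SubtargetFeature definitions are processed by a dedicated TableGen option, "-gen-subtarget". When the compiler is built, llvm-tblgen is run with this option to generate a target-specific file, X86GenSubtargetInfo.inc (taking X86 as the example). The code in this file helps construct an instance of the X86Subtarget class (X86Subtarget.h), and this instance provides the corresponding query methods.

2.4.2. Scheduling Information

SchedMachineModel describes out-of-order processors as follows: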

class SchedMachineModel {
  int IssueWidth = -1; // Max micro-ops that may be scheduled per cycle.
  int MinLatency = -1; // Determines which instructions are allowed in a group.
                       // (-1) inorder (0) ooo, (1): inorder +var latencies.
  int MicroOpBufferSize = -1; // Max micro-ops that can be buffered.
  int LoopMicroOpBufferSize = -1; // Max micro-ops that can be buffered for
                                  // optimized loop dispatch/execution.
  int LoadLatency = -1; // Cycles for loads to access the cache.
  int HighLatency = -1; // Approximation of cycles for "high latency" ops.
  int MispredictPenalty = -1; // Extra cycles for a mispredicted branch.

  // Per-cycle resources tables.
  ProcessorItineraries Itineraries = NoItineraries;

  bit PostRAScheduler = 0; // Enable Post RegAlloc Scheduler pass.

  // Subtargets that define a model for only a subset of instructions
  // that have a scheduling class (itinerary class or SchedRW list)
  // and may actually be generated for that subtarget must clear this
  // bit. Otherwise, the scheduler considers an unmodelled opcode to
  // be an error. This should only be set during initial bringup,
  // or there will be no way to catch simple errors in the model
  // resulting from changes to the instruction definitions.
  bit CompleteModel = 1;

  bit NoModel = 0; // Special tag to indicate missing machine model.
}

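The default values of these properties are defined in the C++ class MCSchedModel (MCSchedule.h). A value of -1 in SchedMachineModel means that the target does not override that property.

IssueWidth defaults to 1. It is the maximum number of instructions that can be scheduled (issued) in one cycle.

MicroOpBufferSize defaults to 0. It is the number of micro-ops that the processor buffers for out-of-order execution. A value of 0 means that operations that are not ready in the current cycle are not considered for scheduling (they go into the pending queue). Instruction latency is paramount; this may be more efficient if many instructions are pending in a schedule. A value of 1 means that all instructions are considered for scheduling, whether or not they are ready in the current cycle. Instruction latencies still cause issue stalls, but those stalls are balanced against other heuristics. A value greater than 1 means that the processor executes out of order. It is a machine-independent estimate of highly machine-specific characteristics such as the register renaming pool and the reorder buffer.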
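The three cases for MicroOpBufferSize can be summarized as a small classifier. This is a reading aid only; the function `scheduling_mode` and its return labels are invented here and do not correspond to any LLVM API.

```python
def scheduling_mode(micro_op_buffer_size):
    """Classify a resolved MicroOpBufferSize value as described above."""
    if micro_op_buffer_size < 0:
        raise ValueError("expects a resolved value, not the -1 placeholder")
    if micro_op_buffer_size == 0:
        # Unready operations wait in the pending queue.
        return "in-order, ready ops only"
    if micro_op_buffer_size == 1:
        # All operations are candidates; stalls are weighed by heuristics.
        return "in-order, all ops considered"
    return "out-of-order"


atom_mode = scheduling_mode(0)      # value used by AtomModel
generic_mode = scheduling_mode(32)  # value used by GenericModel
```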

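LoopMicroOpBufferSize defaults to 0. It is the number of micro-ops that the processor may buffer to optimize loop execution. More generally, it represents the optimal number of micro-ops in a loop body. A loop may be partially unrolled to bring the number of micro-ops in its body closer to this value.

LoadLatency defaults to 4. It is the expected latency of load instructions. If MinLatency >= 0, it can be overridden for individual loads through the OperandCycles of InstrItinerary.

HighLatency defaults to 10. It is the expected latency of "very high latency" operations. Generally, this is an arbitrarily high cycle count that may have some effect on the scheduling heuristics. If MinLatency >= 0, it can be overridden through the OperandCycles of InstrItinerary.

MispredictPenalty defaults to 10. It is the typical number of extra cycles the processor needs to recover from a mispredicted branch.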
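The interplay between the -1 placeholders and the MCSchedModel defaults can be sketched as follows. The default values are the ones listed above; the dictionary layout and the helper `resolve_model` are assumptions made for illustration and are not how the TableGen backend is actually written. The sample overrides are the values that AtomModel sets in X86.td's scheduling description.

```python
# Defaults as listed above for MCSchedModel.
MC_SCHED_DEFAULTS = {
    "IssueWidth": 1,
    "MicroOpBufferSize": 0,
    "LoopMicroOpBufferSize": 0,
    "LoadLatency": 4,
    "HighLatency": 10,
    "MispredictPenalty": 10,
}


def resolve_model(td_values):
    """Replace missing or -1 entries with the MCSchedModel default."""
    resolved = {}
    for name, default in MC_SCHED_DEFAULTS.items():
        value = td_values.get(name, -1)
        resolved[name] = default if value == -1 else value
    return resolved


# Values that AtomModel overrides; MispredictPenalty is left at -1.
atom = resolve_model({
    "IssueWidth": 2,
    "MicroOpBufferSize": 0,
    "LoadLatency": 3,
    "HighLatency": 30,
    "LoopMicroOpBufferSize": 10,
})
```

In this model, Atom ends up with its own issue width and latencies while inheriting the default mispredict penalty.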

If NoModel is 1, the SchedMachineModel definition carries no real information, for example:

def NoSchedModel : SchedMachineModel {
  let NoModel = 1;
}

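Note that the SchedMachineModel definition also contains a member of type ProcessorItineraries; the comment describes it as the per-cycle resource tables (none are provided by default). ProcessorItineraries is also the type of ProcItin in Processor. It is defined as follows (TargetItinerary.td):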

class ProcessorItineraries<list<FuncUnit> fu, list<Bypass> bp,
                           list<InstrItinData> iid> {
  list<FuncUnit> FU = fu;
  list<Bypass> BP = bp;
  list<InstrItinData> IID = iid;
}

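All of its members are lists: FU holds the functional units, BP the bypasses, and IID the itinerary data for each instruction class. FuncUnit describes a component unit of the processor. Because processor units come in all shapes and sizes, FuncUnit as a base class is just an empty class. Similarly, Bypass describes a pipeline bypass and is also an empty class.

Let us look at some examples.

2.4.2.1. Description of Atom

Atom is a family of ultra-low-voltage IA-32 and x86-64 microprocessors designed by Intel, based on the Bonnell microarchitecture. That is why, at line 255 of X86.td, the ProcessorModel definition for Atom is derived from BonnellProc.

The Bonnell microarchitecture can execute up to two instructions per cycle. Like many other x86 microprocessors, it translates x86 instructions (CISC instructions) into simpler internal operations (sometimes called micro-ops, which are essentially RISC-style instructions) before execution. In typical programs, most instructions produce a single micro-op when translated, and about 4% of instructions produce multiple micro-ops. The number of instructions that produce more than one micro-op is significantly smaller than in the P6 and NetBurst microarchitectures. In Bonnell, an internal micro-op can contain both a memory load and a memory store in connection with one ALU operation, so it is closer to the x86 level and more powerful than the micro-ops used in earlier designs. This gives relatively good performance with only two integer ALUs and without instruction reordering, speculative execution or register renaming. Bonnell therefore represents a partial revival of the principles used in earlier Intel designs such as P5 and the i486, with the sole purpose of improving performance per watt. Hyper-threading, however, is implemented in a simple (that is, low-power) way, and improves pipeline efficiency by avoiding the typical single-thread dependencies.

To describe this processor, we first find the following definitions in X86.td (a very small part of the full set of X86 processor definitions):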

203      def ProcIntelAtom : SubtargetFeature<"atom", "X86ProcFamily", "IntelAtom",
204                          "Intel Atom processors">;
205      def ProcIntelSLM : SubtargetFeature<"slm", "X86ProcFamily", "IntelSLM",
206                          "Intel Silvermont processors">;
207
208      class Proc<string Name, list<SubtargetFeature> Features>
209      : ProcessorModel<Name, GenericModel, Features>;
210
211      def : Proc<"generic",         []>;
212      def : Proc<"i386",            []>;
213      def : Proc<"i486",            []>;
214      def : Proc<"i586",            []>;
215      def : Proc<"pentium",         []>;
216      def : Proc<"pentium-mmx",     [FeatureMMX]>;
217      def : Proc<"i686",            []>;
218      def : Proc<"pentiumpro",      [FeatureCMOV]>;
219      def : Proc<"pentium2",        [FeatureMMX, FeatureCMOV]>;
220      def : Proc<"pentium3",        [FeatureSSE1]>;
221      def : Proc<"pentium3m",       [FeatureSSE1, FeatureSlowBTMem]>;
222      def : Proc<"pentium-m",       [FeatureSSE2, FeatureSlowBTMem]>;
223      def : Proc<"pentium4",        [FeatureSSE2]>;
224      def : Proc<"pentium4m",       [FeatureSSE2, FeatureSlowBTMem]>;
225
226      // Intel Core Duo.
227      def : ProcessorModel<"yonah", SandyBridgeModel,
228                           [FeatureSSE3, FeatureSlowBTMem]>;
229
230      // NetBurst.
231      def : Proc<"prescott", [FeatureSSE3, FeatureSlowBTMem]>;
232      def : Proc<"nocona",   [FeatureSSE3, FeatureCMPXCHG16B, FeatureSlowBTMem]>;
233
234      // Intel Core 2 Solo/Duo.
235      def : ProcessorModel<"core2", SandyBridgeModel,
236                           [FeatureSSSE3, FeatureCMPXCHG16B, FeatureSlowBTMem]>;
237      def : ProcessorModel<"penryn", SandyBridgeModel,
238                           [FeatureSSE41, FeatureCMPXCHG16B, FeatureSlowBTMem]>;
239
240      // Atom CPUs.
241      class BonnellProc<string Name> : ProcessorModel<Name, AtomModel, [
242                                         ProcIntelAtom,
243                                         FeatureSSSE3,
244                                         FeatureCMPXCHG16B,
245                                         FeatureMOVBE,
246                                         FeatureSlowBTMem,
247                                         FeatureLeaForSP,
248                                         FeatureSlowDivide32,
249                                         FeatureSlowDivide64,
250                                         FeatureCallRegIndirect,
251                                         FeatureLEAUsesAG,
252                                         FeaturePadShortFunctions
253                                       ]>;
254      def : BonnellProc<"bonnell">;
255      def : BonnellProc<"atom">; // Pin the generic name to the baseline.
256
257      class SilvermontProc<string Name> : ProcessorModel<Name, SLMModel, [
258                                            ProcIntelSLM,
259                                            FeatureSSE42,
260                                            FeatureCMPXCHG16B,
261                                            FeatureMOVBE,
262                                            FeaturePOPCNT,
263                                            FeaturePCLMUL,
264                                            FeatureAES,
265                                            FeatureSlowDivide64,
266                                            FeatureCallRegIndirect,
267                                            FeaturePRFCHW,
268                                            FeatureSlowLEA,
269                                            FeatureSlowIncDec,
270                                            FeatureSlowBTMem,
271                                            FeatureFastUAMem
272                                          ]>;
273      def : SilvermontProc<"silvermont">;
274      def : SilvermontProc<"slm">; // Legacy alias.

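First, lines 203 and 205 define two SubtargetFeature records that denote the Atom and Silvermont processor families respectively. The Proc definition at line 208 uses GenericModel as its scheduling model. This model describes a processor only roughly (for example, when no complete documentation of the processor's instruction execution details is available).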

def GenericModel : SchedMachineModel {
  let IssueWidth = 4;
  let MicroOpBufferSize = 32;
  let LoadLatency = 4;
  let HighLatency = 10;
  let PostRAScheduler = 0;
}

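GenericModel contains no per-cycle resource tables. IssueWidth is analogous to the number of decode units. Core and its descendants, including Nehalem and SandyBridge, have 4 decoders. Resources beyond the decoders operate on micro-ops and are buffered, so adjacent micro-ops do not compete directly.

MicroOpBufferSize > 1 indicates that RAW (read-after-write) dependencies can be decoded in the same cycle. The value 32 is a reasonable, somewhat arbitrary number of in-flight instructions.

HighLatency = 10 is optimistic. X86InstrInfo::isHighLatencyDef flags the high-latency opcodes. Alternatively, InstrItinData entries can be included here to define specific operand latencies. Because these latencies are not used for pipeline hazards, they do not need to be exact.

As can be seen here, even GenericModel is tuned for Core and later processors. These settings do not suit older Intel processors (which have fewer decode units and no micro-op buffer). For instruction scheduling this is not a serious problem, though; it merely makes the generated code less efficient on older processors.

The processor definitions above use many feature descriptions. Among them, FeatureMMX indicates that the processor supports MMX instructions and registers: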

def FeatureMMX : SubtargetFeature<"mmx", "X86SSELevel", "MMX",
                                  "Enable MMX instructions">;

Atom's SchedMachineModel definition, in turn, looks like this:

537      def AtomModel : SchedMachineModel {
538        let IssueWidth = 2;  // Allows 2 instructions per scheduling group.
539        let MicroOpBufferSize = 0; // In-order execution, always hide latency.
540        let LoadLatency = 3; // Expected cycles, may be overridden by OperandCycles.
541        let HighLatency = 30;// Expected, may be overridden by OperandCycles.
542
543        // On the Atom, the throughput for taken branches is 2 cycles. For small
544        // simple loops, expand by a small factor to hide the backedge cost.
545        let LoopMicroOpBufferSize = 10;
546        let PostRAScheduler = 1;
547
548        let Itineraries = AtomItineraries;
549      }

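The AtomItineraries referenced at line 548 is a large set of definitions (X86ScheduleAtom.td). Atom relies on it to describe how each instruction occupies resources (functional units). This is feasible because Atom uses an in-order pipeline and has only two ports (these are the functional units as defined in Intel's processor manual).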

20        def Port0 : FuncUnit; // ALU: ALU0, shift/rotate, load/store
21                              // SIMD/FP: SIMD ALU, Shuffle,SIMD/FP multiply, divide
22        def Port1 : FuncUnit; // ALU: ALU1, bit processing, jump, and LEA
23                              // SIMD/FP: SIMD ALU, FP Adder
24
25        def AtomItineraries : ProcessorItineraries<
26          [ Port0, Port1 ],
27          [], [
28          // P0 only
29          // InstrItinData<class, [InstrStage<N, [P0]>] >,
30          // P0 or P1
31          // InstrItinData<class, [InstrStage<N, [P0, P1]>] >,
32          // P0 and P1
33          // InstrItinData<class, [InstrStage<N, [P0], 0>,  InstrStage<N, [P1]>] >,
34          //
35          // Default is 1 cycle, port0 or port1
36          InstrItinData<IIC_ALU_MEM, [InstrStage<1, [Port0]>] >,
37          InstrItinData<IIC_ALU_NONMEM, [InstrStage<1, [Port0, Port1]>] >,
38          InstrItinData<IIC_LEA, [InstrStage<1, [Port1]>] >,
39          InstrItinData<IIC_LEA_16, [InstrStage<2, [Port0, Port1]>] >,
40          // mul
            …
238         InstrItinData<IIC_SSE_MASKMOV, [InstrStage<2, [Port0, Port1]>] >,
239
240         InstrItinData<IIC_SSE_PEXTRW, [InstrStage<4, [Port0, Port1]>] >,
241         InstrItinData<IIC_SSE_PINSRW, [InstrStage<1, [Port0]>] >,
242
243         InstrItinData<IIC_SSE_PABS_RR, [InstrStage<1, [Port0, Port1]>] >,
244         InstrItinData<IIC_SSE_PABS_RM, [InstrStage<1, [Port0]>] >,
245
246         InstrItinData<IIC_SSE_MOV_S_RR, [InstrStage<1, [Port0, Port1]>] >,
247         InstrItinData<IIC_SSE_MOV_S_RM, [InstrStage<1, [Port0]>] >,
248         InstrItinData<IIC_SSE_MOV_S_MR, [InstrStage<1, [Port0]>] >,
249
250         InstrItinData<IIC_SSE_MOVA_P_RR, [InstrStage<1, [Port0, Port1]>] >,
251         InstrItinData<IIC_SSE_MOVA_P_RM, [InstrStage<1, [Port0]>] >,
252         InstrItinData<IIC_SSE_MOVA_P_MR, [InstrStage<1, [Port0]>] >,
253
254         InstrItinData<IIC_SSE_MOVU_P_RR, [InstrStage<1, [Port0, Port1]>] >,
255         InstrItinData<IIC_SSE_MOVU_P_RM, [InstrStage<3, [Port0, Port1]>] >,
256         InstrItinData<IIC_SSE_MOVU_P_MR, [InstrStage<2, [Port0, Port1]>] >,
257
258         InstrItinData<IIC_SSE_MOV_LH, [InstrStage<1, [Port0]>] >,
259
260         InstrItinData<IIC_SSE_LDDQU, [InstrStage<3, [Port0, Port1]>] >,
261
262         InstrItinData<IIC_SSE_MOVDQ, [InstrStage<1, [Port0]>] >,
263         InstrItinData<IIC_SSE_MOVD_ToGP, [InstrStage<3, [Port0]>] >,
264         InstrItinData<IIC_SSE_MOVQ_RR, [InstrStage<1, [Port0, Port1]>] >,
            …
357         InstrItinData<IIC_FILD, [InstrStage<5, [Port0], 0>, InstrStage<5, [Port1]>] >,
358         InstrItinData<IIC_FLD,  [InstrStage<1, [Port0]>] >,
359         InstrItinData<IIC_FLD80, [InstrStage<4, [Port0, Port1]>] >,
360
361         InstrItinData<IIC_FST,   [InstrStage<2, [Port0, Port1]>] >,
362         InstrItinData<IIC_FST80, [InstrStage<5, [Port0, Port1]>] >,
363         InstrItinData<IIC_FIST,  [InstrStage<6, [Port0, Port1]>] >,
            …
533         InstrItinData<IIC_NOP, [InstrStage<1, [Port0, Port1]>] >
534         ]>;

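The comment at the top of X86ScheduleAtom.td states that these definitions come from Chapter 13, Section 4 of the "Intel 64 and IA32 Architectures Optimization Reference Manual" (Chapter 14, Section 4 in the 2016 edition). The manual can be downloaded online. It names the two ALUs Port0 and Port1; LLVM follows this naming and defines them at lines 20 and 22. Chapter 13, Section 4 of the manual contains a table that lists, for each instruction, the ports it is bound to, its latency and related information, and AtomItineraries is built from that table. For example, the definitions at lines 36 and 37 come from the following two table entries: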

Instruction                                Ports     Latency    Throughput

ADD/AND/CMP/OR/SUB/XOR/TEST mem, reg;      0         1          1
ADD/AND/CMP/OR/SUB/XOR2 reg, mem;

ADD/AND/CMP/OR/SUB/XOR reg, Imm8           (0, 1)    1          0.5
ADD/AND/CMP/OR/SUB/XOR reg, imm
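One way to read these two rows against the itinerary entries is to model each itinerary class as a list of (cycles, candidate ports) stages. The sketch below is a deliberately simplified model: the dictionary, the function `can_issue_together` and the pairing rule (two single-stage instructions pair when they can be given different ports) are assumptions for illustration, not LLVM's actual hazard recognizer.

```python
# (cycles, candidate ports) per stage, copied from lines 36-39 of the listing.
ATOM_ITINERARIES = {
    "IIC_ALU_MEM": [(1, ("Port0",))],
    "IIC_ALU_NONMEM": [(1, ("Port0", "Port1"))],
    "IIC_LEA": [(1, ("Port1",))],
    "IIC_LEA_16": [(2, ("Port0", "Port1"))],
}


def can_issue_together(class_a, class_b):
    """True if the first stages of both classes can occupy distinct ports."""
    ports_a = ATOM_ITINERARIES[class_a][0][1]
    ports_b = ATOM_ITINERARIES[class_b][0][1]
    return any(pa != pb for pa in ports_a for pb in ports_b)


two_reg_ops = can_issue_together("IIC_ALU_NONMEM", "IIC_ALU_NONMEM")
two_mem_ops = can_issue_together("IIC_ALU_MEM", "IIC_ALU_MEM")
```

Under this model two register or immediate ALU operations can share a cycle, which matches the throughput of 0.5 in the second row, while two memory forms both need Port0 and must issue in separate cycles, matching the throughput of 1 in the first row.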

Now let us look at a concrete example, such as the following instruction definitions (X86InstrCompiler.td):

551      multiclass LOCK_ArithBinOp<bits<8> RegOpc, bits<8> ImmOpc, bits<8> ImmOpc8,
552                                 Format ImmMod, string mnemonic> {
553      let Defs = [EFLAGS], mayLoad = 1, mayStore = 1, isCodeGenOnly = 1,
554          SchedRW = [WriteALULd, WriteRMW] in {
555
556      def NAME#8mr : I<{RegOpc{7}, RegOpc{6}, RegOpc{5}, RegOpc{4},
557                        RegOpc{3}, RegOpc{2}, RegOpc{1}, 0 },
558                        MRMDestMem, (outs), (ins i8mem:$dst, GR8:$src2),
559                        !strconcat(mnemonic, "{b}\t",
560                                   "{$src2, $dst|$dst, $src2}"),
561                        [], IIC_ALU_NONMEM>, LOCK;
562      def NAME#16mr : I<{RegOpc{7}, RegOpc{6}, RegOpc{5}, RegOpc{4},
563                         RegOpc{3}, RegOpc{2}, RegOpc{1}, 1 },
564                         MRMDestMem, (outs), (ins i16mem:$dst, GR16:$src2),
565                         !strconcat(mnemonic, "{w}\t",
566                                    "{$src2, $dst|$dst, $src2}"),
567                         [], IIC_ALU_NONMEM>, OpSize16, LOCK;
568      def NAME#32mr : I<{RegOpc{7}, RegOpc{6}, RegOpc{5}, RegOpc{4},
569                         RegOpc{3}, RegOpc{2}, RegOpc{1}, 1 },
570                         MRMDestMem, (outs), (ins i32mem:$dst, GR32:$src2),
571                         !strconcat(mnemonic, "{l}\t",
572                                    "{$src2, $dst|$dst, $src2}"),
573                         [], IIC_ALU_NONMEM>, OpSize32, LOCK;
574      def NAME#64mr : RI<{RegOpc{7}, RegOpc{6}, RegOpc{5}, RegOpc{4},
575                          RegOpc{3}, RegOpc{2}, RegOpc{1}, 1 },
576                          MRMDestMem, (outs), (ins i64mem:$dst, GR64:$src2),
577                          !strconcat(mnemonic, "{q}\t",
578                                     "{$src2, $dst|$dst, $src2}"),
579                          [], IIC_ALU_NONMEM>, LOCK;
580
581      def NAME#8mi : Ii8<{ImmOpc{7}, ImmOpc{6}, ImmOpc{5}, ImmOpc{4},
582                          ImmOpc{3}, ImmOpc{2}, ImmOpc{1}, 0 },
583                          ImmMod, (outs), (ins i8mem:$dst, i8imm :$src2),
584                          !strconcat(mnemonic, "{b}\t",
585                                     "{$src2, $dst|$dst, $src2}"),
586                          [], IIC_ALU_MEM>, LOCK;
587
588      def NAME#16mi : Ii16<{ImmOpc{7}, ImmOpc{6}, ImmOpc{5}, ImmOpc{4},
589                            ImmOpc{3}, ImmOpc{2}, ImmOpc{1}, 1 },
590                            ImmMod, (outs), (ins i16mem:$dst, i16imm :$src2),
591                            !strconcat(mnemonic, "{w}\t",
592                                       "{$src2, $dst|$dst, $src2}"),
593                            [], IIC_ALU_MEM>, OpSize16, LOCK;
594
595      def NAME#32mi : Ii32<{ImmOpc{7}, ImmOpc{6}, ImmOpc{5}, ImmOpc{4},

596                            ImmOpc