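LLVM Study Notes (42)
3.6.2.2.3. Describing Resources and Their Usage
We already know there are two ways to describe how instructions execute. One is the itinerary, which consists of a series of InstrItinData definitions, each containing a group of InstrStage definitions; the InstrItinClass definitions that associate InstrItinData with instruction definitions; and a ProcessorItineraries definition that ties these definitions together. The other describes resource usage directly and is made up of a set of interrelated definitions derived from SchedReadWrite.
Both approaches model the processor as a set of resources and describe how instructions use those resources. It is now time to emit the corresponding data structures.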
1251 OS << "#ifdef DBGFIELD\n"
1252 << "#error \"<target>GenSubtargetInfo.inc requires a DBGFIELD macro\"\n"
1253 << "#endif\n"
1254 << "#ifndef NDEBUG\n"
1255 << "#define DBGFIELD(x) x,\n"
1256 << "#else\n"
1257 << "#define DBGFIELD(x)\n"
1258 << "#endif\n";
1259
1260 if (SchedModels.hasItineraries()) {
1261 std::vector<std::vector<InstrItinerary> > ProcItinLists;
1262 // Emit the stage data
1263 EmitStageAndOperandCycleData(OS, ProcItinLists);
1264 EmitItineraries(OS, ProcItinLists);
1265 }
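As in earlier sections, the ProcModels container of the SchedModels object holds the CodeGenProcModel objects of all processors in the same family. If any of these processors is described with itineraries, the condition on line 1260 holds and the stage data for those processors is emitted. Like the InstrStage definition used in TD files, LLVM has a C++ type of the same name that serves a similar purpose.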
59 struct InstrStage {
60 enum ReservationKinds {
61 Required = 0,
62 Reserved = 1
63 };
64
65 unsigned Cycles_; ///< Length of stage in machine cycles
66 unsigned Units_; ///< Choice of functional units
67 int NextCycles_; ///< Number of machine cycles to next stage
68 ReservationKinds Kind_; ///< Kind of the FU reservation
69
70 /// \brief Returns the number of cycles the stage is occupied.
71 unsigned getCycles() const {
72 return Cycles_;
73 }
74
75 /// \brief Returns the choice of FUs.
76 unsigned getUnits() const {
77 return Units_;
78 }
79
80 ReservationKinds getReservationKind() const {
81 return Kind_;
82 }
83
84 /// \brief Returns the number of cycles from the start of this stage to the
85 /// start of the next stage in the itinerary
86 unsigned getNextCycles() const {
87 return (NextCycles_ >= 0) ? (unsigned)NextCycles_ : Cycles_;
88 }
89 };
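InstrStage represents one non-pipelined step in the execution of an instruction. Cycles is the number of cycles needed to complete the step, and Units gives the functional units that may be chosen to carry it out, for example IntUnit1 or IntUnit2. NextCycles is the number of cycles that should elapse from the start of this step to the start of the next one. A value of -1 means the next step starts immediately after the current step finishes. For example:
{ 1, x, -1 }: the step occupies FU x for one cycle, and the next step starts immediately after it.
{ 2, x|y, 1 }: the step occupies FU x or FU y for two consecutive cycles, and the next step starts one cycle after this step starts. That is, the two steps overlap in time.
{ 1, x, 0 }: the step occupies FU x for one cycle, and the next step starts in the same cycle. This can express an instruction that needs several steps at the same time.
There are two kinds of FU reservation: FUs that the instruction actually requires, and FUs that it merely reserves. A reserved unit is unavailable to the execution of other instructions; however, several instructions may reserve the same unit multiple times. These two kinds of reservation are used to model stalls caused by instruction domain changes, FUs that share a resource (for example, the same register file), and so on.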
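The timing rule behind the three examples can be checked with a small sketch. The `Stage` struct below is a simplified stand-in for `llvm::InstrStage`, and `stageStartCycles` is a hypothetical helper that does not exist in LLVM; it only applies the `getNextCycles()` rule shown in the listing to work out when each stage starts.

```cpp
#include <vector>

// Simplified stand-in for llvm::InstrStage (reservation kind omitted).
struct Stage {
  unsigned Cycles;  // how long the stage occupies its functional unit
  unsigned Units;   // bitmask of candidate functional units
  int NextCycles;   // cycles until the next stage starts; -1 means "Cycles"

  // Same rule as InstrStage::getNextCycles() in the listing above.
  unsigned getNextCycles() const {
    return NextCycles >= 0 ? static_cast<unsigned>(NextCycles) : Cycles;
  }
};

// Hypothetical helper: start cycle of every stage, relative to the first one.
std::vector<unsigned> stageStartCycles(const std::vector<Stage> &Stages) {
  std::vector<unsigned> Starts;
  unsigned Cycle = 0;
  for (const Stage &S : Stages) {
    Starts.push_back(Cycle);
    Cycle += S.getNextCycles();
  }
  return Starts;
}
```

For the sequence { 1, x, -1 }, { 2, x|y, 1 }, { 1, x, 0 }, { 1, y, -1 } this yields start cycles 0, 1, 2, 2: the third stage overlaps the second, and the fourth starts together with the third.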
97 struct InstrItinerary {
98 int NumMicroOps; ///< # of micro-ops, -1 means it's variable
99 unsigned FirstStage; ///< Index of first stage in itinerary
100 unsigned LastStage; ///< Index of last + 1 stage in itinerary
101 unsigned FirstOperandCycle; ///< Index of first operand rd/wr
102 unsigned LastOperandCycle; ///< Index of last + 1 operand rd/wr
103 };
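InstrItinerary holds the scheduling information for an instruction: the range of stages the instruction occupies and the pipeline cycles in which its operands are read or written. It is the LLVM counterpart of the InstrItinData definition. One level up is InstrItineraryData, which packages this data for a subtarget; its data members and constructors are shown below.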
109 class InstrItineraryData {
110 public:
111 MCSchedModel SchedModel; ///< Basic machine properties.
112 const InstrStage *Stages; ///< Array of stages selected
113 const unsigned *OperandCycles; ///< Array of operand cycles selected
114 const unsigned *Forwardings; ///< Array of pipeline forwarding pathes
115 const InstrItinerary *Itineraries; ///< Array of itineraries selected
116
117 /// Ctors.
118 InstrItineraryData() : SchedModel(MCSchedModel::GetDefaultSchedModel()),
119 Stages(nullptr), OperandCycles(nullptr),
120 Forwardings(nullptr), Itineraries(nullptr) {}
121
122 InstrItineraryData(const MCSchedModel &SM, const InstrStage *S,
123 const unsigned *OS, const unsigned *F)
124 : SchedModel(SM), Stages(S), OperandCycles(OS), Forwardings(F),
125 Itineraries(SchedModel.InstrItineraries) {}
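3.6.2.2.3.1. Functional Unit and Bypass Definitions
We already know that the ItinsDef member of a processor's CodeGenProcModel object is the Record object of the ProcessorItineraries definition actually used by its Processor-derived definition (Processor→ProcItin or Processor→SchedModel→Itineraries).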
359 void SubtargetEmitter::
360 EmitStageAndOperandCycleData(raw_ostream &OS,
361 std::vector<std::vector<InstrItinerary> >
362 &ProcItinLists) {
363
364 // Multiple processor models may share an itinerary record. Emit it once.
365 SmallPtrSet<Record*, 8> ItinsDefSet;
366
367 // Emit functional units for all the itineraries.
368 for (CodeGenSchedModels::ProcIter PI = SchedModels.procModelBegin(),
369 PE = SchedModels.procModelEnd(); PI != PE; ++PI) {
370
371 if (!ItinsDefSet.insert(PI->ItinsDef).second)
372 continue;
373
374 std::vector<Record*> FUs = PI->ItinsDef->getValueAsListOfDefs("FU");
375 if (FUs.empty())
376 continue;
377
378 const std::string &Name = PI->ItinsDef->getName();
379 OS << "\n// Functional units for \"" << Name << "\"\n"
380 << "namespace " << Name << "FU {\n";
381
382 for (unsigned j = 0, FUN = FUs.size(); j < FUN; ++j)
383 OS << " const unsigned " << FUs[j]->getName()
384 << " = 1 << " << j << ";\n";
385
386 OS << "}\n";
387
388 std::vector<Record*> BPs = PI->ItinsDef->getValueAsListOfDefs("BP");
389 if (!BPs.empty()) {
390 OS << "\n// Pipeline forwarding pathes for itineraries \"" << Name
391 << "\"\n" << "namespace " << Name << "Bypass {\n";
392
393 OS << " const unsigned NoBypass = 0;\n";
394 for (unsigned j = 0, BPN = BPs.size(); j < BPN; ++j)
395 OS << " const unsigned " << BPs[j]->getName()
396 << " = 1 << " << j << ";\n";
397
398 OS << "}\n";
399 }
400 }
In the X86 family only Atom uses the itinerary mechanism. Atom's ProcessorItineraries definition defines no BP (bypass) and only two Port resources, so we get the following output:
#ifdef DBGFIELD
#error "<target>GenSubtargetInfo.inc requires a DBGFIELD macro"
#endif
#ifndef NDEBUG
#define DBGFIELD(x) x,
#else
#define DBGFIELD(x)
#endif
// Functional units for "AtomItineraries"
namespace AtomItinerariesFU {
const unsigned Port0 = 1 << 0;
const unsigned Port1 = 1 << 1;
}
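The one-bit-per-unit encoding is what lets a stage name several candidate units in a single `unsigned`. The following is a minimal sketch of that idea, not the emitter itself: `emitFunctionalUnits` mimics the loop at lines 382 to 384 on plain strings instead of TableGen records, and `stageMayUse` is a hypothetical helper added only to show how such a mask is tested.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Emits one namespace of single-bit constants, one per functional unit.
// ItinName and Units are plain illustrative inputs, not TableGen records.
std::string emitFunctionalUnits(const std::string &ItinName,
                                const std::vector<std::string> &Units) {
  std::ostringstream OS;
  OS << "namespace " << ItinName << "FU {\n";
  for (std::size_t J = 0; J < Units.size(); ++J)
    OS << "  const unsigned " << Units[J] << " = 1 << " << J << ";\n";
  OS << "}\n";
  return OS.str();
}

// A stage's Units field is the bitwise OR of the units it may use,
// so membership is a single AND.
bool stageMayUse(unsigned StageUnits, unsigned Unit) {
  return (StageUnits & Unit) != 0;
}
```

Because each unit takes one bit of an `unsigned`, this scheme presumably caps the number of functional units per itinerary at the width of that type.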
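Next, three tables are emitted. The first is the array of stages, of type InstrStage; the second is the array of operand cycles; the third is the array of bypasses. The emitter builds each of them up as a string before writing it out. The first entry of every array is reserved for the NoItineraries definition.
SubtargetEmitter::EmitStageAndOperandCycleData (continued)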
402 // Begin stages table
403 std::string StageTable = "\nextern const llvm::InstrStage " + Target +
404 "Stages[] = {\n";
405 StageTable += " { 0, 0, 0, llvm::InstrStage::Required }, // No itinerary\n";
406
407 // Begin operand cycle table
408 std::string OperandCycleTable = "extern const unsigned " + Target +
409 "OperandCycles[] = {\n";
410 OperandCycleTable += " 0, // No itinerary\n";
411
412 // Begin pipeline bypass table
413 std::string BypassTable = "extern const unsigned " + Target +
414 "ForwardingPaths[] = {\n";
415 BypassTable += " 0, // No itinerary\n";
416
417 // For each Itinerary across all processors, add a unique entry to the stages,
418 // operand cycles, and pipepine bypess tables. Then add the new Itinerary
419 // object with computed offsets to the ProcItinLists result.
420 unsigned StageCount = 1, OperandCycleCount = 1;
421 std::map<std::string, unsigned> ItinStageMap, ItinOperandMap;
422 for (CodeGenSchedModels::ProcIter PI = SchedModels.procModelBegin(),
423 PE = SchedModels.procModelEnd(); PI != PE; ++PI) {
424 const CodeGenProcModel &ProcModel = *PI;
425
426 // Add process itinerary to the list.
427 ProcItinLists.resize(ProcItinLists.size()+1);
428
429 // If this processor defines no itineraries, then leave the itinerary list
430 // empty.
431 std::vector<InstrItinerary> &ItinList = ProcItinLists.back();
432 if (!ProcModel.hasItineraries())
433 continue;
434
435 const std::string &Name = ProcModel.ItinsDef->getName();
436
437 ItinList.resize(SchedModels.numInstrSchedClasses());
438 assert(ProcModel.ItinDefList.size() == ItinList.size() && "bad Itins");
439
440 for (unsigned SchedClassIdx = 0, SchedClassEnd = ItinList.size();
441 SchedClassIdx < SchedClassEnd; ++SchedClassIdx) {
442
443 // Next itinerary data
444 Record *ItinData = ProcModel.ItinDefList[SchedClassIdx];
445
446 // Get string and stage count
447 std::string ItinStageString;
448 unsigned NStages = 0;
449 if (ItinData)
450 FormItineraryStageString(Name, ItinData, ItinStageString, NStages);
451
452 // Get string and operand cycle count
453 std::string ItinOperandCycleString;
454 unsigned NOperandCycles = 0;
455 std::string ItinBypassString;
456 if (ItinData) {
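457 FormItineraryOperandCycleString(ItinData, ItinOperandCycleString,
458 NOperandCycles);
459
460 FormItineraryBypassString(Name, ItinData, ItinBypassString,
461 NOperandCycles);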
462 }
463
464 // Check to see if stage already exists and create if it doesn't
465 unsigned FindStage = 0;
466 if (NStages > 0) {
467 FindStage = ItinStageMap[ItinStageString];
468 if (FindStage == 0) {
469 // Emit as { cycles, u1 | u2 | ... | un, timeinc }, // indices
470 StageTable += ItinStageString + ", // " + itostr(StageCount);
471 if (NStages > 1)
472 StageTable += "-" + itostr(StageCount + NStages - 1);
473 StageTable += "\n";
474 // Record Itin class number.
475 ItinStageMap[ItinStageString] = FindStage = StageCount;
476 StageCount += NStages;
477 }
478 }
479
480 // Check to see if operand cycle already exists and create if it doesn't
481 unsigned FindOperandCycle = 0;
482 if (NOperandCycles > 0) {
483 std::string ItinOperandString = ItinOperandCycleString+ItinBypassString;
484 FindOperandCycle = ItinOperandMap[ItinOperandString];
485 if (FindOperandCycle == 0) {
486 // Emit as cycle, // index
487 OperandCycleTable += ItinOperandCycleString + ", // ";
488 std::string OperandIdxComment = itostr(OperandCycleCount);
489 if (NOperandCycles > 1)
490 OperandIdxComment += "-"
491 + itostr(OperandCycleCount + NOperandCycles - 1);
492 OperandCycleTable += OperandIdxComment + "\n";
493 // Record Itin class number.
494 ItinOperandMap[ItinOperandCycleString] =
495 FindOperandCycle = OperandCycleCount;
496 // Emit as bypass, // index
497 BypassTable += ItinBypassString + ", // " + OperandIdxComment + "\n";
498 OperandCycleCount += NOperandCycles;
499 }
500 }
501
502 // Set up itinerary as location and location + stage count
503 int NumUOps = ItinData ? ItinData->getValueAsInt("NumMicroOps") : 0;
504 InstrItinerary Intinerary = { NumUOps, FindStage, FindStage + NStages,
505 FindOperandCycle,
506 FindOperandCycle + NOperandCycles};
507
508 // Inject - empty slots will be 0, 0
509 ItinList[SchedClassIdx] = Intinerary;
510 }
511 }
512
513 // Closing stage
514 StageTable += " { 0, 0, 0, llvm::InstrStage::Required } // End stages\n";
515 StageTable += "};\n";
516
517 // Closing operand cycles
518 OperandCycleTable += " 0 // End operand cycles\n";
519 OperandCycleTable += "};\n";
520
521 BypassTable += " 0 // End bypass tables\n";
522 BypassTable += "};\n";
523
524 // Emit tables.
525 OS << StageTable;
526 OS << OperandCycleTable;
527 OS << BypassTable;
528 }
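3.6.2.2.3.2. Itinerary Data
For each processor that uses itineraries to assist instruction scheduling, the ItinDefList container of its CodeGenProcModel instance holds entries from the IID list (of type list<InstrItinData>) of the relevant ProcessorItineraries definition. The container associates each scheduling class with the InstrItinData definition that refers to the same InstrItinClass definition. The assertion on line 438 above must hold because, on line 784 of collectProcItins, ProcModel.ItinDefList is resized to NumInstrSchedClasses.
For a given processor's CodeGenProcModel object, the loop on line 440 in effect iterates over all non-inferred CodeGenSchedClass objects. Line 444 therefore obtains the Record object of the InstrItinData definition that matches the given scheduling class, and this record is passed as the second argument to the FormItineraryStageString method called on line 450.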
274 void SubtargetEmitter::FormItineraryStageString(const std::string &Name,
275 Record *ItinData,
276 std::string &ItinString,
277 unsigned &NStages) {
278 // Get states list
279 const std::vector<Record*> &StageList =
280 ItinData->getValueAsListOfDefs("Stages");
281
282 // For each stage
283 unsigned N = NStages = StageList.size();
284 for (unsigned i = 0; i < N;) {
285 // Next stage
286 const Record *Stage = StageList[i];
287
288 // Form string as ,{ cycles, u1 | u2 | ... | un, timeinc, kind }
289 int Cycles = Stage->getValueAsInt("Cycles");
290 ItinString += " { " + itostr(Cycles) + ", ";
291
292 // Get unit list
293 const std::vector<Record*> &UnitList = Stage->getValueAsListOfDefs("Units");
294
295 // For each unit
296 for (unsigned j = 0, M = UnitList.size(); j < M;) {
297 // Add name and bitwise or
298 ItinString += Name + "FU::" + UnitList[j]->getName();
299 if (++j < M) ItinString += " | ";
300 }
301
302 int TimeInc = Stage->getValueAsInt("TimeInc");
303 ItinString += ", " + itostr(TimeInc);
304
305 int Kind = Stage->getValueAsInt("Kind");
306 ItinString += ", (llvm::InstrStage::ReservationKinds)" + itostr(Kind);
307
308 // Close off stage
309 ItinString += " }";
310 if (++i < N) ItinString += ", ";
311 }
312 }
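The emitted string follows the examples given above in the description of the InstrStage class. An InstrItinData definition also has an OperandCycles field, which gives the number of cycles after issue at which the read or write of each operand completes.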
319 void SubtargetEmitter::FormItineraryOperandCycleString(Record *ItinData,
320 std::string &ItinString, unsigned &NOperandCycles) {
321 // Get operand cycle list
322 const std::vector<int64_t> &OperandCycleList =
323 ItinData->getValueAsListOfInts("OperandCycles");
324
325 // For each operand cycle
326 unsigned N = NOperandCycles = OperandCycleList.size();
327 for (unsigned i = 0; i < N;) {
328 // Next operand cycle
329 const int OCycle = OperandCycleList[i];
330
331 ItinString += " " + itostr(OCycle);
332 if (++i < N) ItinString += ", ";
333 }
334 }
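Finally, an array describing the bypasses has to be emitted. Notice that the InstrItinData definitions of the .td files are split into these three arrays, because they are three fairly independent dimensions of an InstrItinData definition. Each dimension may also contain many repeated entries, so building three arrays and identifying an InstrItinData definition by array indices yields a more compact data structure.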
336 void SubtargetEmitter::FormItineraryBypassString(const std::string &Name,
337 Record *ItinData,
338 std::string &ItinString,
339 unsigned NOperandCycles) {
340 const std::vector<Record*> &BypassList =
341 ItinData->getValueAsListOfDefs("Bypasses");
342 unsigned N = BypassList.size();
343 unsigned i = 0;
344 for (; i < N;) {
345 ItinString += Name + "Bypass::" + BypassList[i]->getName();
346 if (++i < NOperandCycles) ItinString += ", ";
347 }
348 for (; i < NOperandCycles;) {
349 ItinString += " 0";
350 if (++i < NOperandCycles) ItinString += ", ";
351 }
352 }
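Note that in FormItineraryOperandCycleString the parameter NOperandCycles is a reference, set on line 326 to the size of OperandCycles in the InstrItinData definition. It is then passed to FormItineraryBypassString to control the size of the bypass array.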
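To see how NOperandCycles shapes the bypass row, the two loops of FormItineraryBypassString can be replayed on plain strings. This is only a sketch under assumptions: `formBypassString` is a hypothetical free function, and the itinerary name and bypass names used with it are made up for illustration.

```cpp
#include <string>
#include <vector>

// Replays the two loops of FormItineraryBypassString on plain strings.
// The row is padded with " 0" (NoBypass) up to NOperandCycles entries.
std::string formBypassString(const std::string &Name,
                             const std::vector<std::string> &Bypasses,
                             unsigned NOperandCycles) {
  std::string Out;
  const unsigned N = static_cast<unsigned>(Bypasses.size());
  unsigned I = 0;
  // First loop: one qualified name per declared bypass.
  for (; I < N;) {
    Out += Name + "Bypass::" + Bypasses[I];
    if (++I < NOperandCycles) Out += ", ";
  }
  // Second loop: fill the remaining operand slots with 0.
  for (; I < NOperandCycles;) {
    Out += " 0";
    if (++I < NOperandCycles) Out += ", ";
  }
  return Out;
}
```

With one bypass named `A` and three operand cycles, the row has three entries, the last two being 0, so the bypass array stays index-aligned with the operand cycle array.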
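On line 466 of EmitStageAndOperandCycleData, NStages is the number of entries in the Stages list of the InstrItinData definition, as set by FormItineraryStageString. The container ItinStageMap (std::map<std::string, unsigned>) guarantees that the generated InstrStage entries are unique: lines 468 to 477 emit a run of stages only the first time its string is seen. The container ItinOperandMap plays the same role for operand cycles.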
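The uniqueness scheme relies on index 0 being reserved, so that a map lookup returning 0 can mean "not seen yet". Here is a minimal sketch of that pattern, assuming a hypothetical `RunTable` class that is not part of LLVM; it keeps the same bookkeeping as StageCount and ItinStageMap.

```cpp
#include <map>
#include <string>
#include <vector>

// Interns runs of table entries keyed by their textual form.
// Entry 0 is the reserved "no itinerary" placeholder.
class RunTable {
public:
  // Returns the index of the first entry of the run; adds it on first sight.
  unsigned intern(const std::string &RunText, unsigned NumEntries) {
    if (NumEntries == 0)
      return 0;                      // empty runs map to the placeholder
    unsigned &Slot = Seen[RunText];  // value-initialised to 0 when absent
    if (Slot == 0) {
      Slot = Count;
      Rows.push_back(RunText);
      Count += NumEntries;           // a run may span several entries
    }
    return Slot;
  }

  unsigned size() const { return Count; }
  const std::vector<std::string> &rows() const { return Rows; }

private:
  std::map<std::string, unsigned> Seen;
  std::vector<std::string> Rows;
  unsigned Count = 1;  // index 0 is already taken by the placeholder
};
```

Interning the same one-stage string twice returns the same index both times, which is how two scheduling classes can end up sharing one stage entry.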
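Line 504 creates an InstrItinerary instance, which is stored at the corresponding position of the ProcItinLists container. From line 514 on, the three arrays are closed and emitted. For the X86 target, for example, the result is: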
extern const llvm::InstrStage X86Stages[] = {
{ 0, 0, 0, llvm::InstrStage::Required }, // No itinerary
{ 13, AtomItinerariesFU::Port0 | AtomItinerariesFU::Port1, -1, (llvm::InstrStage::ReservationKinds)0 }, // 1
{ 7, AtomItinerariesFU::Port0 | AtomItinerariesFU::Port1, -1, (llvm::InstrStage::ReservationKinds)0 }, // 2
{ 21, AtomItinerariesFU::Port0 | AtomItinerariesFU::Port1, -1, (llvm::InstrStage::ReservationKinds)0 }, // 3
{ 1, AtomItinerariesFU::Port0 | AtomItinerariesFU::Port1, -1, (llvm::InstrStage::ReservationKinds)0 }, // 4
…
{ 202, AtomItinerariesFU::Port0 | AtomItinerariesFU::Port1, -1, (llvm::InstrStage::ReservationKinds)0 }, // 92
{ 0, 0, 0, llvm::InstrStage::Required } // End stages
};
extern const unsigned X86OperandCycles[] = {
0, // No itinerary
0 // End operand cycles
};
extern const unsigned X86ForwardingPaths[] = {
0, // No itinerary
0 // End bypass tables
};
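These three arrays are tied together by the InstrItinerary arrays generated next. The ProcItinLists parameter of the EmitItineraries method is the one prepared by EmitStageAndOperandCycleData above. Note that the loop on line 546 traverses the ProcModels container of SchedModels in the same order in which EmitStageAndOperandCycleData traversed it while preparing the InstrItinerary data, and that ProcItinLists always has the same size as ProcModels (line 427 of EmitStageAndOperandCycleData). Line 432 also shows that the ProcItinLists entry of a processor that does not use itineraries is left empty, while line 509 shows that for a processor that does use itineraries a new InstrItinerary instance is always created in its ProcItinLists entry, whether or not an instance with identical content already exists. The processors traversed below therefore always correspond one-to-one with the entries of ProcItinLists (the condition on line 562 filters out processors that do not use itineraries).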
536 void SubtargetEmitter::
537 EmitItineraries(raw_ostream &OS,
538 std::vector<std::vector<InstrItinerary> > &ProcItinLists) {
539
540 // Multiple processor models may share an itinerary record. Emit it once.
541 SmallPtrSet<Record*, 8> ItinsDefSet;
542
543 // For each processor's machine model
544 std::vector<std::vector<InstrItinerary> >::iterator
545 ProcItinListsIter = ProcItinLists.begin();
546 for (CodeGenSchedModels::ProcIter PI = SchedModels.procModelBegin(),
547 PE = SchedModels.procModelEnd(); PI != PE; ++PI, ++ProcItinListsIter) {
548
549 Record *ItinsDef = PI->ItinsDef;
550 if (!ItinsDefSet.insert(ItinsDef).second)
551 continue;
552
553 // Get processor itinerary name
554 const std::string &Name = ItinsDef->getName();
555
556 // Get the itinerary list for the processor.
557 assert(ProcItinListsIter != ProcItinLists.end() && "bad iterator");
558 std::vector<InstrItinerary> &ItinList = *ProcItinListsIter;
559
560 // Empty itineraries aren't referenced anywhere in the tablegen output
561 // so don't emit them.
562 if (ItinList.empty())
563 continue;
564
565 OS << "\n";
566 OS << "static const llvm::InstrItinerary ";
567
568 // Begin processor itinerary table
569 OS << Name << "[] = {\n";
570
571 // For each itinerary class in CodeGenSchedClass::Index order.
572 for (unsigned j = 0, M = ItinList.size(); j < M; ++j) {
573 InstrItinerary &Intinerary = ItinList[j];
574
575 // Emit Itinerary in the form of
576 // { firstStage, lastStage, firstCycle, lastCycle } // index
577 OS << " { " <<
578 Intinerary.NumMicroOps << ", " <<
579 Intinerary.FirstStage << ", " <<
580 Intinerary.LastStage << ", " <<
581 Intinerary.FirstOperandCycle << ", " <<
582 Intinerary.LastOperandCycle << " }" <<
583 ", // " << j << " " << SchedModels.getSchedClass(j).Name << "\n";
584 }
585 // End processor itinerary table
586 OS << " { 0, ~0U, ~0U, ~0U, ~0U } // end marker\n";
587 OS << "};\n";
588 }
589 }
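The lockstep traversal in the listing above can be sketched as follows. Everything here is a simplified stand-in: `ProcModelStub` replaces CodeGenProcModel, a `std::set` replaces SmallPtrSet, and an index replaces the pair of iterators. The point is that the position advances even on skipped iterations, so the two containers stay aligned.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Stand-in for a processor model: a name plus the identity of the
// itinerary record it uses (several processors may share one record).
struct ProcModelStub {
  std::string Name;
  const void *ItinsDef;
};

// Returns the names of the processors whose itinerary table would be
// emitted. ItinLists[i] holds the itineraries prepared for Procs[i].
std::vector<std::string>
tablesToEmit(const std::vector<ProcModelStub> &Procs,
             const std::vector<std::vector<int>> &ItinLists) {
  std::vector<std::string> Emitted;
  std::set<const void *> SeenDefs;
  // I advances on every iteration, including the skipped ones.
  for (std::size_t I = 0; I < Procs.size() && I < ItinLists.size(); ++I) {
    if (!SeenDefs.insert(Procs[I].ItinsDef).second)
      continue;  // this record was handled for an earlier processor
    if (ItinLists[I].empty())
      continue;  // processor does not use itineraries
    Emitted.push_back(Procs[I].Name);
  }
  return Emitted;
}
```

With four hypothetical processors, two sharing an empty record and two sharing a populated one, only the first processor of the populated pair produces a table.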
For the X86 target only the Atom processor uses itineraries, so the following array is emitted (950 entries):
static const llvm::InstrItinerary AtomItineraries[] = {
{ 0, 0, 0, 0, 0 }, // 0 NoInstrModel
{ 1, 1, 2, 0, 0 }, // 1 IIC_AAA_WriteMicrocoded
{ 1, 2, 3, 0, 0 }, // 2 IIC_AAD_WriteMicrocoded
{ 1, 3, 4, 0, 0 }, // 3 IIC_AAM_WriteMicrocoded
{ 1, 1, 2, 0, 0 }, // 4 IIC_AAS_WriteMicrocoded
{ 1, 4, 5, 0, 0 }, // 5 IIC_BIN_CARRY_NONMEM_WriteALU
…
{ 1, 43, 44, 0, 0 }, // 948 LDMXCSR_VLDMXCSR
{ 1, 17, 18, 0, 0 }, // 949 STMXCSR_VSTMXCSR
{ 0, ~0U, ~0U, ~0U, ~0U } // end marker
};
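The comments give the name of the scheduling class. Note that the entries are emitted in exactly the same order as the enumeration constants for scheduling classes in the Sched namespace of X86GenInstrInfo.inc. Thanks to this consistency, an enumeration constant can be used directly as an index to obtain the parameters of the corresponding scheduling class.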
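As a rough illustration of that indexing, the sketch below copies the first few rows of the two excerpts into simplified tables and walks the half-open range [FirstStage, LastStage). The struct names, the `SchedStub` enum, and `occupiedCycles` are hypothetical stand-ins for the generated code; the functional-unit masks are reduced to the plain value 3 (Port0 | Port1) and the reservation kind is dropped.

```cpp
// Simplified stand-ins for llvm::InstrStage and llvm::InstrItinerary.
struct StageStub { unsigned Cycles; unsigned Units; int NextCycles; };
struct ItinStub {
  int NumMicroOps;
  unsigned FirstStage, LastStage;                // half-open stage range
  unsigned FirstOperandCycle, LastOperandCycle;  // half-open operand range
};

// Hypothetical enum standing in for the generated Sched constants.
namespace SchedStub {
enum { NoInstrModel = 0, IIC_AAA_WriteMicrocoded = 1,
       IIC_AAD_WriteMicrocoded = 2, IIC_AAM_WriteMicrocoded = 3,
       IIC_AAS_WriteMicrocoded = 4 };
}

const StageStub StageTable[] = {
  {0, 0, 0},    // 0: no itinerary
  {13, 3, -1},  // 1
  {7, 3, -1},   // 2
  {21, 3, -1},  // 3
};

const ItinStub ItinTable[] = {
  {0, 0, 0, 0, 0},  // 0 NoInstrModel
  {1, 1, 2, 0, 0},  // 1 IIC_AAA_WriteMicrocoded
  {1, 2, 3, 0, 0},  // 2 IIC_AAD_WriteMicrocoded
  {1, 3, 4, 0, 0},  // 3 IIC_AAM_WriteMicrocoded
  {1, 1, 2, 0, 0},  // 4 IIC_AAS_WriteMicrocoded
};

// Sums the Cycles field over the stages of one scheduling class.
unsigned occupiedCycles(unsigned SchedClass) {
  const ItinStub &I = ItinTable[SchedClass];
  unsigned Total = 0;
  for (unsigned S = I.FirstStage; S != I.LastStage; ++S)
    Total += StageTable[S].Cycles;
  return Total;
}
```

Entries 1 and 4 both point at stage 1, so the two scheduling classes report the same 13 cycles; the empty range of entry 0 yields 0.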