Introduction to 3D Game Programming with DirectX 12 Study Notes --- Chapter 6: Drawing in Direct3D

阿新 • Published: 2018-10-31

Code repository:

https://github.com/jiabaodan/Direct12BookReadingNotes

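Learning objectives

Become familiar with the Direct3D interfaces for defining, storing, and drawing geometry data;
Learn to write basic vertex and pixel shaders;
Learn to configure the rendering pipeline with pipeline state objects;
Understand how to create constant buffer data, and become familiar with the root signature;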

1 Vertices and Input Layouts

The code below defines two kinds of vertex:

struct Vertex1
{
	XMFLOAT3 Pos;
	XMFLOAT4 Color;
};

struct Vertex2
{
	XMFLOAT3 Pos;
	XMFLOAT3 Normal;
	XMFLOAT2 Tex0;
	XMFLOAT2 Tex1;
};

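Once we have defined a vertex structure, we need to give Direct3D a description of it so that it knows what each component is for. This description is supplied as an input layout description, represented by the D3D12_INPUT_LAYOUT_DESC structure: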

typedef struct D3D12_INPUT_LAYOUT_DESC
{
	const D3D12_INPUT_ELEMENT_DESC *pInputElementDescs;
	UINT NumElements;
} D3D12_INPUT_LAYOUT_DESC;

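An input layout description is simply an array of D3D12_INPUT_ELEMENT_DESC elements together with the number of elements in the array.
Each element of the array describes the corresponding component of the vertex structure. D3D12_INPUT_ELEMENT_DESC is defined as follows: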

typedef struct D3D12_INPUT_ELEMENT_DESC
{
	LPCSTR SemanticName;
	UINT SemanticIndex;
	DXGI_FORMAT Format;
	UINT InputSlot;
	UINT AlignedByteOffset;
	D3D12_INPUT_CLASSIFICATION InputSlotClass;
	UINT InstanceDataStepRate;
} D3D12_INPUT_ELEMENT_DESC;

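SemanticName: a string name associated with the element. It is used to map elements of the vertex structure to elements of the vertex shader input signature;
SemanticIndex: an index attached to the semantic, so that the same semantic can be used more than once and distinguished by index (for example, TEXCOORD 0 and TEXCOORD 1 in Vertex2);
Format: a member of the DXGI_FORMAT enumerated type that specifies the data type of the element. Some commonly used values: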

		DXGI_FORMAT_R32_FLOAT // 1D 32-bit float scalar
		DXGI_FORMAT_R32G32_FLOAT // 2D 32-bit float vector
		DXGI_FORMAT_R32G32B32_FLOAT // 3D 32-bit float vector
		DXGI_FORMAT_R32G32B32A32_FLOAT // 4D 32-bit float vector
		DXGI_FORMAT_R8_UINT // 1D 8-bit unsigned integer scalar
		DXGI_FORMAT_R16G16_SINT // 2D 16-bit signed integer vector
		DXGI_FORMAT_R32G32B32_UINT // 3D 32-bit unsigned integer vector
		DXGI_FORMAT_R8G8B8A8_SINT // 4D 8-bit signed integer vector
		DXGI_FORMAT_R8G8B8A8_UINT // 4D 8-bit unsigned integer vector

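InputSlot: the input slot the element comes from; Direct3D supports 16 input slots (0 to 15);
AlignedByteOffset: the offset of the element from the start of the vertex structure, in bytes: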

		struct Vertex2
		{
			XMFLOAT3 Pos; // 0-byte offset
			XMFLOAT3 Normal; // 12-byte offset
			XMFLOAT2 Tex0; // 24-byte offset
			XMFLOAT2 Tex1; // 32-byte offset
		};

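InputSlotClass: set to D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA for now; the other value is used for instancing;
InstanceDataStepRate: set to 0 for now; other values are used for instancing;

For the two vertex structures above, the corresponding input layout descriptions are: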

D3D12_INPUT_ELEMENT_DESC desc1[] =
{
	{"POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0},
	{"COLOR", 0, DXGI_FORMAT_R32G32B32A32_FLOAT, 0, 12, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0}
};

D3D12_INPUT_ELEMENT_DESC desc2[] =
{
	{"POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0},
	{"NORMAL", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 12, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0},
	{"TEXCOORD", 0, DXGI_FORMAT_R32G32_FLOAT, 0, 24, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0},
	{"TEXCOORD", 1, DXGI_FORMAT_R32G32_FLOAT, 0, 32, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0}
};
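The byte offsets in the layout above are easy to get wrong by hand. A minimal sketch of one way to derive them with offsetof follows. It does not use the Direct3D headers: Float2 and Float3 are stand-ins assumed to have the same tightly packed layout as XMFLOAT2 and XMFLOAT3, and ElementDesc is a hypothetical, reduced mirror of D3D12_INPUT_ELEMENT_DESC.

```cpp
#include <cassert>
#include <cstddef>

// Stand-ins for DirectX::XMFLOAT2 / XMFLOAT3 (assumed: packed floats).
struct Float2 { float x, y; };
struct Float3 { float x, y, z; };

struct Vertex2
{
	Float3 Pos;
	Float3 Normal;
	Float2 Tex0;
	Float2 Tex1;
};

// Hypothetical, reduced mirror of D3D12_INPUT_ELEMENT_DESC: only the
// fields needed to illustrate semantic name/index and byte offset.
struct ElementDesc
{
	const char* SemanticName;
	unsigned SemanticIndex;
	std::size_t AlignedByteOffset;
};

// Offsets are taken from the struct itself instead of being typed by hand.
const ElementDesc kVertex2Layout[] =
{
	{ "POSITION", 0, offsetof(Vertex2, Pos) },
	{ "NORMAL",   0, offsetof(Vertex2, Normal) },
	{ "TEXCOORD", 0, offsetof(Vertex2, Tex0) },
	{ "TEXCOORD", 1, offsetof(Vertex2, Tex1) },
};

// Compile-time check that the hand-written offsets in the text (0, 12, 24, 32)
// agree with the layout the compiler produced for this struct.
static_assert(offsetof(Vertex2, Normal) == 12, "unexpected padding");
static_assert(offsetof(Vertex2, Tex0) == 24, "unexpected padding");
static_assert(offsetof(Vertex2, Tex1) == 32, "unexpected padding");
```

If a member is later added or reordered, the static_assert lines fail at compile time instead of producing silently corrupted vertex data.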

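2 Vertex Buffers

For the GPU to access an array of vertices, the vertices must be stored in a GPU resource (ID3D12Resource) called a buffer. A buffer that stores vertices is called a vertex buffer.
As in section 4.3.8, we create an ID3D12Resource by filling out a D3D12_RESOURCE_DESC structure that describes the buffer resource and then calling ID3D12Device::CreateCommittedResource. Direct3D 12 provides a C++ wrapper class, CD3DX12_RESOURCE_DESC (derived from D3D12_RESOURCE_DESC), with a more convenient construction method: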

static inline CD3DX12_RESOURCE_DESC Buffer(
	UINT64 width,
	D3D12_RESOURCE_FLAGS flags = D3D12_RESOURCE_FLAG_NONE,
	UINT64 alignment = 0 )
{
	return CD3DX12_RESOURCE_DESC(
		D3D12_RESOURCE_DIMENSION_BUFFER,
		alignment, width, 1, 1, 1,
		DXGI_FORMAT_UNKNOWN, 1, 0,
		D3D12_TEXTURE_LAYOUT_ROW_MAJOR, flags );
}

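width is the number of bytes in the buffer.

For static geometry, we put the vertex buffer in the default heap (D3D12_HEAP_TYPE_DEFAULT) for best performance. Because the CPU cannot write to a default heap directly, in addition to the actual vertex buffer resource we also need to create an intermediate upload buffer resource with heap type D3D12_HEAP_TYPE_UPLOAD.
Since an intermediate upload buffer is always needed to initialize a default buffer, we write the following function in d3dUtil.h/.cpp to avoid repeating this code: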

Microsoft::WRL::ComPtr<ID3D12Resource> d3dUtil::CreateDefaultBuffer(
	ID3D12Device* device,
	ID3D12GraphicsCommandList* cmdList,
	const void* initData,
	UINT64 byteSize,
	Microsoft::WRL::ComPtr<ID3D12Resource>& uploadBuffer)
{
	ComPtr<ID3D12Resource> defaultBuffer;
	
	// Create the actual default buffer resource.
	ThrowIfFailed(device->CreateCommittedResource(
		&CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_DEFAULT),
		D3D12_HEAP_FLAG_NONE,
		&CD3DX12_RESOURCE_DESC::Buffer(byteSize),
		D3D12_RESOURCE_STATE_COMMON,
		nullptr,
		IID_PPV_ARGS(defaultBuffer.GetAddressOf())));
		
	// In order to copy CPU memory data into our default buffer, we need
	// to create an intermediate upload heap.
	ThrowIfFailed(device->CreateCommittedResource(
		&CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_UPLOAD),
		D3D12_HEAP_FLAG_NONE,
		&CD3DX12_RESOURCE_DESC::Buffer(byteSize),
		D3D12_RESOURCE_STATE_GENERIC_READ,
		nullptr,
		IID_PPV_ARGS(uploadBuffer.GetAddressOf())));
		
	// Describe the data we want to copy into the default buffer.
	D3D12_SUBRESOURCE_DATA subResourceData = {};
	subResourceData.pData = initData;
	subResourceData.RowPitch = byteSize;
	subResourceData.SlicePitch = subResourceData.RowPitch;
	
	// Schedule to copy the data to the default buffer resource.
	// At a high level, the helper function UpdateSubresources
	// will copy the CPU memory into the intermediate upload heap.
	// Then, using ID3D12CommandList::CopySubresourceRegion,
	// the intermediate upload heap data will be copied to mBuffer.
	cmdList->ResourceBarrier(1,
		&CD3DX12_RESOURCE_BARRIER::Transition(defaultBuffer.Get(),
		D3D12_RESOURCE_STATE_COMMON,
		D3D12_RESOURCE_STATE_COPY_DEST));
		
	UpdateSubresources<1>(cmdList,
		defaultBuffer.Get(), uploadBuffer.Get(),
		0, 0, 1, &subResourceData);
		
	cmdList->ResourceBarrier(1,
		&CD3DX12_RESOURCE_BARRIER::Transition(defaultBuffer.Get(),
		D3D12_RESOURCE_STATE_COPY_DEST,
		D3D12_RESOURCE_STATE_GENERIC_READ));
		
	// Note: uploadBuffer has to be kept alive after the above function
	// calls because the command list has not been executed yet that
	// performs the actual copy.
	// The caller can Release the uploadBuffer after it knows the copy
	// has been executed.
	
	return defaultBuffer;
}

D3D12_SUBRESOURCE_DATA is defined as follows:

typedef struct D3D12_SUBRESOURCE_DATA
{
	const void *pData;
	LONG_PTR RowPitch;
	LONG_PTR SlicePitch;
} D3D12_SUBRESOURCE_DATA;

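pData: a pointer to a system memory array containing the data to initialize the buffer with. If the buffer can store n vertices, the array must hold at least n vertices;
RowPitch: for buffers, the size in bytes of the data we are copying;
SlicePitch: for buffers, the size in bytes of the data we are copying;

The code below shows how this function is used to create a default buffer holding the 8 vertices of a cube: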

Vertex vertices[] =
{
	{ XMFLOAT3(-1.0f, -1.0f, -1.0f), XMFLOAT4(Colors::White) },
	{ XMFLOAT3(-1.0f, +1.0f, -1.0f), XMFLOAT4(Colors::Black) },
	{ XMFLOAT3(+1.0f, +1.0f, -1.0f), XMFLOAT4(Colors::Red) },
	{ XMFLOAT3(+1.0f, -1.0f, -1.0f), XMFLOAT4(Colors::Green) },
	{ XMFLOAT3(-1.0f, -1.0f, +1.0f), XMFLOAT4(Colors::Blue) },
	{ XMFLOAT3(-1.0f, +1.0f, +1.0f), XMFLOAT4(Colors::Yellow) },
	{ XMFLOAT3(+1.0f, +1.0f, +1.0f), XMFLOAT4(Colors::Cyan) },
	{ XMFLOAT3(+1.0f, -1.0f, +1.0f), XMFLOAT4(Colors::Magenta) }
};

const UINT64 vbByteSize = 8 * sizeof(Vertex);
ComPtr<ID3D12Resource> VertexBufferGPU = nullptr;
ComPtr<ID3D12Resource> VertexBufferUploader = nullptr;

VertexBufferGPU = d3dUtil::CreateDefaultBuffer(md3dDevice.Get(), 
	mCommandList.Get(), 
	vertices, 
	vbByteSize, 
	VertexBufferUploader);

The Vertex type is defined as follows:

struct Vertex
{
	XMFLOAT3 Pos;
	XMFLOAT4 Color;
};

To bind a vertex buffer to the rendering pipeline, we need to create a vertex buffer view for it. Unlike an RTV (render target view), a vertex buffer view does not need a descriptor heap. It is represented by the D3D12_VERTEX_BUFFER_VIEW structure:

typedef struct D3D12_VERTEX_BUFFER_VIEW
{
	D3D12_GPU_VIRTUAL_ADDRESS BufferLocation;
	UINT SizeInBytes;
	UINT StrideInBytes;
} D3D12_VERTEX_BUFFER_VIEW;

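BufferLocation: the virtual address of the vertex buffer resource we want to create a view to; it can be obtained with ID3D12Resource::GetGPUVirtualAddress;
SizeInBytes: the number of bytes to view, starting from BufferLocation;
StrideInBytes: the size of each vertex element, in bytes;

Once a vertex buffer and its view have been created, we can bind the view to an input slot of the pipeline to feed the vertices to the input assembler stage. This is done with the following method: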

void ID3D12GraphicsCommandList::IASetVertexBuffers(
	UINT StartSlot,
	UINT NumBuffers,
	const D3D12_VERTEX_BUFFER_VIEW *pViews);

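StartSlot: the index of the first input slot to bind to (0 to 15);
NumBuffers: the number of vertex buffers to bind; if the start slot is k and we bind n buffers, they are bound to slots k, k+1, ..., k+n-1;
pViews: a pointer to the first element of an array of vertex buffer views;

Here is an example call: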

D3D12_VERTEX_BUFFER_VIEW vbv;

vbv.BufferLocation = VertexBufferGPU->GetGPUVirtualAddress();
vbv.StrideInBytes = sizeof(Vertex);
vbv.SizeInBytes = 8 * sizeof(Vertex);

D3D12_VERTEX_BUFFER_VIEW vertexBuffers[1] = { vbv };
mCommandList->IASetVertexBuffers(0, 1, vertexBuffers);

A vertex buffer stays bound to its input slot until the binding is changed, so your code will typically be structured like this:

ID3D12Resource* mVB1; // stores vertices of type Vertex1
ID3D12Resource* mVB2; // stores vertices of type Vertex2

D3D12_VERTEX_BUFFER_VIEW mVBView1; // view to mVB1
D3D12_VERTEX_BUFFER_VIEW mVBView2; // view to mVB2
/*…Create the vertex buffers and views…*/

mCommandList->IASetVertexBuffers(0, 1, &mVBView1);
/* …draw objects using vertex buffer 1… */
mCommandList->IASetVertexBuffers(0, 1, &mVBView2);
/* …draw objects using vertex buffer 2… */

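Setting a vertex buffer to an input slot does not draw anything; it only makes the vertices ready to be fed into the pipeline. The actual drawing is done by ID3D12GraphicsCommandList::DrawInstanced:

void ID3D12GraphicsCommandList::DrawInstanced(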
	UINT VertexCountPerInstance,
	UINT InstanceCount,
	UINT StartVertexLocation,
	UINT StartInstanceLocation);

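VertexCountPerInstance: the number of vertices to draw (per instance);
InstanceCount: used for instancing; set to 1 for now;
StartVertexLocation: the index of the first vertex in the vertex buffer to begin drawing from;
StartInstanceLocation: used for instancing; set to 0 for now;

Together, VertexCountPerInstance and StartVertexLocation define a contiguous range of vertices in the vertex buffer to draw.

DrawInstanced does not specify the primitive topology; that is set with the following method: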

cmdList->IASetPrimitiveTopology(D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST);

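3 Indices and Index Buffers

As with vertices, for the GPU to access index data we need to create an index buffer along with an upload buffer. Because d3dUtil::CreateDefaultBuffer takes its data as a void*, we can use it for index buffers as well. To bind an index buffer to the pipeline we need to create an index buffer view, which, like a vertex buffer view, does not need a descriptor heap. An index buffer view is represented by the D3D12_INDEX_BUFFER_VIEW structure: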

typedef struct D3D12_INDEX_BUFFER_VIEW
{
	D3D12_GPU_VIRTUAL_ADDRESS BufferLocation;
	UINT SizeInBytes;
	DXGI_FORMAT Format;
} D3D12_INDEX_BUFFER_VIEW;

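BufferLocation: the virtual address of the index buffer resource, obtained with ID3D12Resource::GetGPUVirtualAddress;
SizeInBytes: the number of bytes to view, starting from BufferLocation;
Format: either DXGI_FORMAT_R16_UINT or DXGI_FORMAT_R32_UINT. Use 16-bit indices to save memory and bandwidth unless the index values really need the 32-bit range;

As with vertex buffers, an index buffer must be bound to the pipeline before use. It is bound to the input assembler stage with ID3D12GraphicsCommandList::IASetIndexBuffer. The code below defines the indices of a cube, creates the index buffer and its view, and binds it: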

std::uint16_t indices[] = {
	// front face
	0, 1, 2,
	0, 2, 3,
	// back face
	4, 6, 5,
	4, 7, 6,
	// left face
	4, 5, 1,
	4, 1, 0,
	// right face
	3, 2, 6,
	3, 6, 7,
	// top face
	1, 5, 6,
	1, 6, 2,
	// bottom face
	4, 0, 3,
	4, 3, 7
};

const UINT ibByteSize = 36 * sizeof(std::uint16_t);

ComPtr<ID3D12Resource> IndexBufferGPU = nullptr;
ComPtr<ID3D12Resource> IndexBufferUploader = nullptr;

IndexBufferGPU = d3dUtil::CreateDefaultBuffer(md3dDevice.Get(),
	mCommandList.Get(), indices, ibByteSize,
	IndexBufferUploader);
	
D3D12_INDEX_BUFFER_VIEW ibv;
ibv.BufferLocation = IndexBufferGPU->GetGPUVirtualAddress();
ibv.Format = DXGI_FORMAT_R16_UINT;
ibv.SizeInBytes = ibByteSize;
mCommandList->IASetIndexBuffer(&ibv);
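The choice between 16-bit and 32-bit indices described above can be checked mechanically. The following is a rough sketch, assuming the only criterion is whether every index value fits in 16 bits; FitsIn16Bits and IndexBufferByteSize are hypothetical helper names, not part of Direct3D or d3dUtil.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true when every index fits in 16 bits, i.e. when a 16-bit
// index format (DXGI_FORMAT_R16_UINT in the text) is large enough.
bool FitsIn16Bits(const std::vector<std::uint32_t>& indices)
{
	for (std::uint32_t i : indices)
	{
		if (i > 0xFFFFu)
			return false;
	}
	return true;
}

// Size in bytes of an index buffer holding indexCount indices of the
// chosen width; this is the value that would go into SizeInBytes.
std::size_t IndexBufferByteSize(std::size_t indexCount, bool use16Bit)
{
	const std::size_t indexSize =
		use16Bit ? sizeof(std::uint16_t) : sizeof(std::uint32_t);
	return indexCount * indexSize;
}
```

For the 36 cube indices above, the 16-bit format needs 72 bytes, half of what 32-bit indices would take.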

Finally, when using indices, we must draw with ID3D12GraphicsCommandList::DrawIndexedInstanced instead of DrawInstanced:

void ID3D12GraphicsCommandList::DrawIndexedInstanced(
	UINT IndexCountPerInstance,
	UINT InstanceCount,
	UINT StartIndexLocation,
	INT BaseVertexLocation,
	UINT StartInstanceLocation);

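IndexCountPerInstance: the number of indices to draw per instance;
InstanceCount: the number of instances; set to 1 for now;
StartIndexLocation: the position in the index buffer of the first index to read;
BaseVertexLocation: an integer value added to each index used in this draw call before the vertices are fetched;
StartInstanceLocation: used for instancing in a later demo; set to 0 for now;

These parameters can be illustrated with an example. Suppose we have three models, a sphere, a box, and a cylinder, each with its own vertex buffer and index buffer. As an optimization, we concatenate them into one global vertex buffer and one global index buffer. The indices of the box and the cylinder are now wrong, because they are still relative to each object's own vertex buffer; each index has to be offset by the position of that object's first vertex in the global vertex buffer. Instead of rewriting the indices, we pass that offset as BaseVertexLocation.
In this case the drawing code looks like this: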

mCmdList->DrawIndexedInstanced( numSphereIndices, 1, 0, 0, 0);
mCmdList->DrawIndexedInstanced( numBoxIndices, 1, firstBoxIndex, firstBoxVertexPos, 0);
mCmdList->DrawIndexedInstanced( numCylIndices, 1, firstCylIndex, firstCylVertexPos, 0);
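The values firstBoxIndex, firstBoxVertexPos and so on fall out of a running sum over the meshes being merged. The following is a minimal sketch of that bookkeeping, independent of Direct3D; Mesh, DrawArgs and MergeMeshes are hypothetical names introduced for illustration, and vertices are reduced to a count because only the offsets matter here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-mesh data; indices are relative to the mesh's own vertices.
struct Mesh
{
	std::uint32_t vertexCount;
	std::vector<std::uint16_t> indices;
};

// The three per-object arguments of one DrawIndexedInstanced call.
struct DrawArgs
{
	std::uint32_t indexCount;
	std::uint32_t startIndexLocation;
	std::int32_t baseVertexLocation;
};

// Concatenates the index lists unchanged and records, for each mesh,
// where its indices and its vertices start in the merged buffers.
std::vector<DrawArgs> MergeMeshes(const std::vector<Mesh>& meshes,
	std::vector<std::uint16_t>& mergedIndices)
{
	std::vector<DrawArgs> args;
	std::uint32_t firstIndex = 0;
	std::int32_t firstVertex = 0;

	for (const Mesh& m : meshes)
	{
		const std::uint32_t indexCount =
			static_cast<std::uint32_t>(m.indices.size());
		args.push_back({ indexCount, firstIndex, firstVertex });

		mergedIndices.insert(mergedIndices.end(),
			m.indices.begin(), m.indices.end());

		firstIndex += indexCount;
		firstVertex += static_cast<std::int32_t>(m.vertexCount);
	}
	return args;
}
```

Because the indices are copied unchanged, each draw call relies on baseVertexLocation to shift them to the right place in the merged vertex buffer.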

4 Example Vertex Shader

Here is the code of a simple vertex shader:

cbuffer cbPerObject : register(b0)
{
	float4x4 gWorldViewProj;
};

void VS(float3 iPosL : POSITION,
	float4 iColor : COLOR,
	out float4 oPosH : SV_POSITION,
	out float4 oColor : COLOR)
{
	// Transform to homogeneous clip space.
	oPosH = mul(float4(iPosL, 1.0f), gWorldViewProj);
	
	// Just pass vertex color into the pixel shader. 
	oColor = iColor;
}

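The function name VS is arbitrary. HLSL has no references or pointers, and functions are always inlined.
The input parameters map to the vertex structure we described with D3D12_INPUT_ELEMENT_DESC: each parameter's semantic (POSITION, COLOR) is matched against the SemanticName and SemanticIndex of an input layout element.
The output parameters map to the inputs of the pixel shader in the next section. The SV prefix in the SV_POSITION semantic stands for system value; it marks the output that holds the vertex position in homogeneous clip space, so it must be present. Outputs whose semantics are not system values are optional.

The first line of the shader body transforms the vertex from local space to homogeneous clip space by multiplying it by the gWorldViewProj matrix.
We can also rewrite the shader above using structures: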

cbuffer cbPerObject : register(b0)
{
	float4x4 gWorldViewProj;
};

struct VertexIn
{
	float3 PosL : POSITION;
	float4 Color : COLOR;
};

struct VertexOut
{
	float4 PosH : SV_POSITION;
	float4 Color : COLOR;
};

VertexOut VS(VertexIn vin)
{
	VertexOut vout;
	// Transform to homogeneous clip space.
	vout.PosH = mul(float4(vin.PosL, 1.0f), gWorldViewProj);
	
	// Just pass vertex color into the pixel shader. 
	vout.Color = vin.Color;
	
	return vout;
}

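If there is no geometry shader, the vertex shader must output SV_POSITION; if there is a geometry shader, computing SV_POSITION can be left to the geometry shader.
The vertex shader does not perform the perspective divide; it only does the projection matrix multiplication. The perspective divide is done later by the hardware.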
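For reference, the perspective divide that the hardware performs after the vertex shader amounts to dividing the clip-space position by its w component. The following is a small CPU-side sketch of that step; Float3, Float4 and PerspectiveDivide are illustrative names, and the sketch assumes w is nonzero.

```cpp
#include <cassert>

struct Float3 { float x, y, z; };
struct Float4 { float x, y, z, w; };

// Maps a homogeneous clip-space position (what the vertex shader writes
// to SV_POSITION) to normalized device coordinates by dividing by w.
Float3 PerspectiveDivide(const Float4& posH)
{
	assert(posH.w != 0.0f); // assumption of this sketch
	return { posH.x / posH.w, posH.y / posH.w, posH.z / posH.w };
}
```

A clip-space position of (2, -4, 1, 2), for instance, ends up at (1, -2, 0.5) after the divide.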

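4.1 Input Layout Descriptions and Input Signature Linking

If the vertex data fed into the pipeline does not supply everything the vertex shader's input signature expects, an error results. For example, the following vertex data and shader input signature are incompatible, because the shader expects a NORMAL element that the input layout does not provide: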

//--------------
// C++ app code
//--------------
struct Vertex
{
	XMFLOAT3 Pos;
	XMFLOAT4 Color;
};
D3D12_INPUT_ELEMENT_DESC desc[] = {
	{"POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0},
	{"COLOR", 0, DXGI_FORMAT_R32G32B32A32_FLOAT, 0, 12, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0}
};

//--------------
// Vertex shader
//--------------
struct VertexIn
{
	float3 PosL : POSITION;
	float4 Color : COLOR;
	float3 Normal : NORMAL;
};
struct VertexOut
{
	float4 PosH : SV_POSITION;
	float4 Color : COLOR;
};
VertexOut VS(VertexIn vin) { … }

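As we will see in section 9, when we create an ID3D12PipelineState object we must specify both the input layout description and the vertex shader, and Direct3D then checks that the two are compatible.
The vertex data and the input signature do not need to match exactly. What is required is that the vertex data provide everything the vertex shader expects; it is allowed for the vertex data to contain additional data that the shader does not use, as in the following code: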

//--------------
// C++ app code
//--------------
struct Vertex
{
	XMFLOAT3 Pos;
	XMFLOAT4 Color;
	XMFLOAT3 Normal;
};

D3D12_INPUT_ELEMENT_DESC desc[] =
{
	{"POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0},
	{"COLOR", 0, DXGI_FORMAT_R32G32B32A32_FLOAT, 0, 12, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0},
	{"NORMAL", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 28, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0}
};

//--------------
// Vertex shader
//--------------
struct VertexIn
{
	float3 PosL : POSITION;
	float4 Color : COLOR;
};

struct VertexOut
{
	float4 PosH : SV_POSITION;
	float4 Color : COLOR;
};

VertexOut VS(VertexIn vin) { … }

Now consider the case where the vertex structure and the input signature have matching elements, but the types differ:

//--------------
// C++ app code
//--------------
struct Vertex
{
	XMFLOAT3 Pos;
	XMFLOAT4 Color;
};

D3D12_INPUT_ELEMENT_DESC desc[] =
{
	{"POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0},
	{"COLOR", 0, DXGI_FORMAT_R32G32B32A32_FLOAT, 0, 12, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0}
};

//--------------
// Vertex shader
//--------------
struct VertexIn
{
	float3 PosL : POSITION;
	int4 Color : COLOR;
};

struct VertexOut
{
	float4 PosH : SV_POSITION;
	float4 Color : COLOR;
};

VertexOut VS(VertexIn vin) { … }

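This is legal, because Direct3D allows the bits in the input registers to be reinterpreted. However, the VC++ debug output window reports the following warning: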

	D3D12 WARNING: ID3D11Device::CreateInputLayout:
The provided input signature expects to read an
element with SemanticName/Index: ‘COLOR’/0 and
component(s) of the type ‘int32’. However, the
matching entry in the Input Layout declaration,
element[1], specifies mismatched format:
‘R32G32B32A32_FLOAT’. This is not an error, since
behavior is well defined: The element format
determines what data conversion algorithm gets
applied before it shows up in a shader register.
Independently, the shader input signature defines
how the shader will interpret the data that has
been placed in its input registers, with no change
in the bits stored. It is valid for the application
to reinterpret data as a different type once it is
in the vertex shader, so this warning is issued
just in case reinterpretation was not intended by
the author.

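5 Example Pixel Shader

The vertex attributes output by the vertex shader are interpolated across the pixels of each triangle, and the interpolated values are then fed into the pixel shader as input.
The job of the pixel shader is to compute a color for each pixel fragment. Note that not every pixel fragment is written to the back buffer: it may be clipped in the pixel shader (with the HLSL clip function), occluded by another pixel fragment with a smaller depth value, or discarded by a later pipeline test such as the stencil test. A pixel in the back buffer may therefore have several pixel fragment candidates; this is the distinction between a pixel fragment and a pixel in the usual sense.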
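To make the interpolation step concrete, here is a simplified sketch of how a per-vertex color could be blended inside a triangle using barycentric weights. It is only an illustration: the weights are assumed to sum to 1, perspective correction is deliberately left out, and Color and Interpolate are hypothetical names, not part of any Direct3D API.

```cpp
#include <cassert>

struct Color { float r, g, b, a; };

// Blends the attribute values of the three triangle vertices using the
// barycentric weights (b0, b1, b2) of the point being shaded.
// Simplification: no perspective correction is applied.
Color Interpolate(const Color& c0, const Color& c1, const Color& c2,
	float b0, float b1, float b2)
{
	return {
		b0 * c0.r + b1 * c1.r + b2 * c2.r,
		b0 * c0.g + b1 * c1.g + b2 * c2.g,
		b0 * c0.b + b1 * c1.b + b2 * c2.b,
		b0 * c0.a + b1 * c1.a + b2 * c2.a
	};
}
```

A fragment sitting exactly on a vertex gets that vertex's color unchanged; everywhere else it gets a weighted mix of the three.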

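As a hardware optimization, a pixel fragment may be rejected before it ever reaches the pixel shader (for example, by early-z rejection). In some cases this optimization cannot be used. For example, if the pixel shader modifies the depth value, the pixel shader has to run for every fragment before its final depth is known.

The code below is an example of a pixel shader: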

cbuffer cbPerObject : register(b0)
{
	float4x4 gWorldViewProj;
};

void VS(float3 iPos : POSITION, 
	float4 iColor : COLOR,
	out float4 oPosH : SV_POSITION,
	out float4 oColor : COLOR)
{
	// Transform to homogeneous clip space.
	oPosH = mul(float4(iPos, 1.0f), gWorldViewProj);
	// Just pass vertex color into the pixel shader.
	oColor = iColor;
}

float4 PS(float4 posH : SV_POSITION, float4 color : COLOR) : SV_Target
{
	return color;
}

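The pixel shader's input parameters must exactly match the vertex shader's output parameters; this is a requirement. The pixel shader returns a 4D color value, which must match the render target format. As before, the code above can be rewritten using structures: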

cbuffer cbPerObject : register(b0)
{
	float4x4 gWorldViewProj;
};

struct VertexIn
{
	float3 Pos : POSITION;
	float4 Color : COLOR;
};

struct VertexOut
{
	float4 PosH : SV_POSITION;
	float4 Color : COLOR;
};

VertexOut VS(VertexIn vin)
{
	VertexOut vout;
	// Transform to homogeneous clip space.
	vout.PosH = mul(float4(vin.Pos, 1.0f), gWorldViewProj);
	
	// Just pass vertex color into the pixel shader.
	vout.Color = vin.Color;
	
	return vout;
}

float4 PS(VertexOut pin) : SV_Target
{
	return pin.Color;
}

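6 Constant Buffers

6.1 Creating Constant Buffers

A constant buffer is a kind of GPU resource (ID3D12Resource) whose data contents can be referenced by shader programs. The example shader in section 4: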

cbuffer cbPerObject : register(b0)
{
	float4x4 gWorldViewProj;
};

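refers to a cbuffer object named cbPerObject. Unlike vertex and index buffers, constant buffers are usually updated by the CPU once per frame; for example, when the camera moves we need to update the transformation matrix. We therefore create constant buffers in an upload heap, so that the CPU can update their contents.
Constant buffers also have a hardware requirement: their size must be a multiple of the minimum hardware allocation size (256 bytes).
We often need several constant buffers of the same type. The code below shows how to create a buffer that stores NumElements constant buffers: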

struct ObjectConstants
{
	DirectX::XMFLOAT4X4 WorldViewProj = MathHelper::Identity4x4();
};

UINT mElementByteSize = d3dUtil::CalcConstantBufferByteSize(sizeof(ObjectConstants));
ComPtr<ID3D12Resource> mUploadCBuffer;

device->CreateCommittedResource(
	&CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_UPLOAD),
	D3D12_HEAP_FLAG_NONE,
	&CD3DX12_RESOURCE_DESC::Buffer(mElementByteSize* NumElements),
	D3D12_RESOURCE_STATE_GENERIC_READ, 
	nullptr,
	IID_PPV_ARGS(&mUploadCBuffer));

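We can think of mUploadCBuffer as storing an array of constant buffers of type ObjectConstants (each padded to a multiple of 256 bytes). When it is time to draw an object, we just bind a constant buffer view (CBV) to the subregion of the buffer that stores that object's constants. In this book we often refer to mUploadCBuffer itself as a constant buffer, even though it stores an array of constant buffers.
d3dUtil::CalcConstantBufferByteSize rounds a byte size up to the nearest multiple of 256: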

UINT d3dUtil::CalcConstantBufferByteSize(UINT byteSize)
{
	// Constant buffers must be a multiple of the minimum hardware
	// allocation size (usually 256 bytes). So round up to nearest
	// multiple of 256. We do this by adding 255 and then masking off
	// the lower 8 bits, which store all values < 256.
	
	// Example: Suppose byteSize = 300.
	// (300 + 255) & ~255
	// 555 & ~255
	// 0x022B & ~0x00ff
	// 0x022B & 0xff00
	// 0x0200
	// 512
	return (byteSize + 255) & ~255;
}
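The rounding can be checked without any Direct3D headers. The sketch below restates the same expression as a constexpr free function (an assumption made so that it compiles standalone; the version in the text is a d3dUtil member taking UINT) and verifies a few values at compile time.

```cpp
#include <cassert>
#include <cstdint>

// Same rounding as d3dUtil::CalcConstantBufferByteSize in the text:
// add 255, then clear the low 8 bits.
constexpr std::uint32_t CalcConstantBufferByteSize(std::uint32_t byteSize)
{
	return (byteSize + 255u) & ~255u;
}

// A 4x4 float matrix (64 bytes) occupies one 256-byte slot.
static_assert(CalcConstantBufferByteSize(64) == 256, "64 rounds up to 256");
// Exact multiples are left unchanged.
static_assert(CalcConstantBufferByteSize(256) == 256, "256 stays 256");
// The worked example from the comment above.
static_assert(CalcConstantBufferByteSize(300) == 512, "300 rounds up to 512");
```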

Although we pad to a multiple of 256 bytes on the C++ side, this is not required in HLSL, because HLSL pads the structure implicitly:

// Implicitly padded to 256 bytes.
cbuffer cbPerObject : register(b0)
{
	float4x4 gWorldViewProj;
};

// Explicitly padded to 256 bytes.
cbuffer cbPerObject : register(b0)
{
	float4x4 gWorldViewProj;
	float4x4 Pad0;
	float4x4 Pad1;
	float4x4 Pad2;
};

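To avoid relying on implicit padding, it is better to pad constant buffer structures explicitly.

Direct3D 12 introduced shader model 5.1, which adds an alternative HLSL syntax for defining a constant buffer: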

struct ObjectConstants
{
	float4x4 gWorldViewProj;
	uint matIndex;
};

ConstantBuffer<ObjectConstants> gObjConstants : register(b0);

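Here the data elements of the constant buffer are defined in a separate structure, and the constant buffer is then created from that structure. Fields of the constant buffer are accessed in the shader like this: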

uint index = gObjConstants.matIndex;

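6.2 Updating Constant Buffers

Because a constant buffer is created with the heap type D3D12_HEAP_TYPE_UPLOAD, we can update its contents from the CPU. To do so, we first obtain a pointer to the resource data with the Map method: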

ComPtr<ID3D12Resource> mUploadBuffer;
BYTE* mMappedData = nullptr;
mUploadBuffer->Map(0, nullptr, reinterpret_cast<void**>(&mMappedData));

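The first parameter is the index of the subresource to map; a buffer has only one subresource, so we pass 0. The second parameter is an optional pointer to a D3D12_RANGE structure that describes the range of memory to map; null maps the entire resource. The third parameter returns a pointer to the mapped data. To copy data into the mapped buffer, we can use memcpy: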

memcpy(mMappedData, &data, dataSizeInBytes);

When we are done with a constant buffer, we should call Unmap before releasing it:

if(mUploadBuffer != nullptr)
	mUploadBuffer->Unmap(0, nullptr);
	
mMappedData = nullptr;

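The first parameter is the index of the subresource, which is 0 for a buffer. The second parameter is an optional pointer to a D3D12_RANGE structure; null covers the entire resource.

6.3 Upload Buffer Helper

We define a class in UploadBuffer.h as a thin wrapper around an upload buffer. It handles construction and destruction of the resource, mapping and unmapping, and provides a CopyData method for updating a particular element of the buffer from the CPU. Note that the class is not limited to constant buffers; it can be used for any upload buffer. When it is used for a constant buffer, we must set isConstantBuffer to true so that each element is padded to a multiple of 256 bytes.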

template<typename T>
class UploadBuffer
{ 
public:
	UploadBuffer(ID3D12Device* device, UINT elementCount, bool isConstantBuffer) :
		mIsConstantBuffer(isConstantBuffer)
	{
		mElementByteSize = sizeof(T);
		// Constant buffer elements need to be multiples of 256 bytes.
		// This is because the hardware can only view constant data
		// at m*256 byte offsets and of n*256 byte lengths.
		// typedef struct D3D12_CONSTANT_BUFFER_VIEW_DESC {
		// 	D3D12_GPU_VIRTUAL_ADDRESS BufferLocation; // multiple of 256
		// 	UINT SizeInBytes; // multiple of 256
		// } D3D12_CONSTANT_BUFFER_VIEW_DESC;
			
		if(isConstantBuffer)
			mElementByteSize = d3dUtil::CalcConstantBufferByteSize(sizeof(T));
			
		ThrowIfFailed(device->CreateCommittedResource(
			&CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_UPLOAD),
			D3D12_HEAP_FLAG_NONE,
			&CD3DX12_RESOURCE_DESC::Buffer(mElementByteSize*elementCount),
			D3D12_RESOURCE_STATE_GENERIC_READ,
			nullptr,
			IID_PPV_ARGS(&mUploadBuffer)));
			
		ThrowIfFailed(mUploadBuffer->Map(0, nullptr, reinterpret_cast<void**>(&mMappedData)));
			
		// We do not need to unmap until we are done with the resource.
		// However, we must not write to the resource while it is in use by
		// the GPU (so we must use synchronization techniques).
	}
	
	UploadBuffer(const UploadBuffer& rhs) = delete;
	UploadBuffer& operator=(const UploadBuffer& rhs) = delete;
	~UploadBuffer()
	{
		if(mUploadBuffer != nullptr)
			mUploadBuffer->Unmap(0, nullptr);
			
		mMappedData = nullptr;
	}
	
	ID3D12Resource* Resource()const
	{
		return mUploadBuffer.Get();
	}
	
	void CopyData(int elementIndex, const T& data)
	{
		memcpy(&mMappedData[elementIndex*mElementByteSize], &data, sizeof(T));
	}
	
private:
	Microsoft::WRL::ComPtr<ID3D12Resource> mUploadBuffer;
	BYTE* mMappedData = nullptr;
	UINT mElementByteSize = 0;
	bool mIsConstantBuffer = false;
};
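The element stride logic of UploadBuffer can be exercised on the CPU alone. The following is a rough stand-in in which a std::vector plays the role of the mapped upload heap memory; CpuUploadBuffer, OffsetOf and Read are names invented for this sketch, and nothing here touches Direct3D.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Sample element type, mirroring ObjectConstants (a 4x4 float matrix).
struct ObjectConstants { float WorldViewProj[16]; };

// CPU-only stand-in for UploadBuffer<T>: a byte vector replaces the
// mapped GPU resource, but the stride and CopyData logic are the same.
template<typename T>
class CpuUploadBuffer
{
	static_assert(std::is_trivially_copyable<T>::value,
		"elements are copied with memcpy");
public:
	CpuUploadBuffer(std::size_t elementCount, bool isConstantBuffer)
		: mElementByteSize(isConstantBuffer
			? ((sizeof(T) + 255) & ~static_cast<std::size_t>(255))
			: sizeof(T)),
		  mData(mElementByteSize * elementCount)
	{
	}

	std::size_t ElementByteSize() const { return mElementByteSize; }

	// Byte offset of element i; a multiple of 256 for constant buffers.
	std::size_t OffsetOf(std::size_t elementIndex) const
	{
		return elementIndex * mElementByteSize;
	}

	void CopyData(std::size_t elementIndex, const T& data)
	{
		assert(OffsetOf(elementIndex) + sizeof(T) <= mData.size());
		std::memcpy(&mData[OffsetOf(elementIndex)], &data, sizeof(T));
	}

	T Read(std::size_t elementIndex) const
	{
		assert(OffsetOf(elementIndex) + sizeof(T) <= mData.size());
		T out;
		std::memcpy(&out, &mData[OffsetOf(elementIndex)], sizeof(T));
		return out;
	}

private:
	std::size_t mElementByteSize; // declared first: mData's size depends on it
	std::vector<unsigned char> mData;
};
```

With isConstantBuffer set, a 64-byte ObjectConstants occupies a 256-byte slot, so element 2 starts at byte 512; without it, elements are packed back to back.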

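For example, an object's world matrix changes when the object moves, rotates, or scales; the view matrix changes when the camera moves or rotates; and the projection matrix changes when the window is resized. In this chapter's demo, the mouse handler updates the camera angles and radius, and the Update function rebuilds the matrices and updates the constant buffer: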

void BoxApp::OnMouseMove(WPARAM btnState, int x, int y)
{
	if((btnState & MK_LBUTTON) != 0)
	{
		// Make each pixel correspond to a quarter of a degree.
		float dx = XMConvertToRadians(0.25f*static_cast<float> (x - mLastMousePos.x));
		float dy = XMConvertToRadians(0.25f*static_cast<float> (y - mLastMousePos.y));
		
		// Update angles based on input to orbit camera around box.
		mTheta += dx;
		mPhi += dy;
		
		// Restrict the angle mPhi.
		mPhi = MathHelper::Clamp(mPhi, 0.1f, MathHelper::Pi - 0.1f);
	}
	else if((btnState & MK_RBUTTON) != 0)
	{
		// Make each pixel correspond to 0.005 unit in the scene.
		float dx = 0.005f*static_cast<float>(x - mLastMousePos.x);
		float dy = 0.005f*static_cast<float>(y - mLastMousePos.y);
		
		// Update the camera radius based on input.
		mRadius += dx - dy;
		
		// Restrict the radius.
		mRadius = MathHelper::Clamp(mRadius, 3.0f, 15.0f);
	}
	
	mLastMousePos.x = x;
	mLastMousePos.y = y;
}

void BoxApp::Update(const GameTimer& gt)
{
	// Convert Spherical to Cartesian coordinates.
	float x = mRadius*sinf(mPhi)*cosf(mTheta);
	float z = mRadius*sinf(mPhi)*sinf(mTheta);
	float y = mRadius*cosf(mPhi);
	
	// Build the view matrix.
	XMVECTOR pos = XMVectorSet(x, y, z, 1.0f);
	XMVECTOR target = XMVectorZero();
	XMVECTOR up = XMVectorSet(0.0f, 1.0f, 0.0f, 0.0f);
	XMMATRIX view = XMMatrixLookAtLH(pos, target, up);
	XMStoreFloat4x4(&mView, view);
	XMMATRIX world = XMLoadFloat4x4(&mWorld);
	XMMATRIX proj = XMLoadFloat4x4(&mProj);
	XMMATRIX worldViewProj = world*view*proj;
	
	// Update the constant buffer with the latest worldViewProj matrix.
	ObjectConstants objConstants;
	XMStoreFloat4x4(&objConstants.WorldViewProj, XMMatrixTranspose(worldViewProj));
	mObjectCB->CopyData(0, objConstants);
}
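The first three lines of Update convert the camera's spherical coordinates to a Cartesian position. Here is a small standalone sketch of that conversion, using the same formulas with the y axis as the polar axis; Float3 and SphericalToCartesian are illustrative names rather than the book's helpers.

```cpp
#include <cassert>
#include <cmath>

struct Float3 { float x, y, z; };

// Same convention as BoxApp::Update: phi is measured from the +y axis,
// theta is the angle around the y axis in the xz-plane.
Float3 SphericalToCartesian(float radius, float theta, float phi)
{
	return {
		radius * std::sin(phi) * std::cos(theta),
		radius * std::cos(phi),
		radius * std::sin(phi) * std::sin(theta)
	};
}
```

Whatever the angles, the resulting point stays at distance radius from the origin, which is why the mouse handler only has to clamp mRadius and mPhi.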

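6.4 Constant Buffer Descriptors

To bind a constant buffer to the pipeline we need a descriptor. Constant buffer descriptors live in a descriptor heap of type D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV, which can store a mix of constant buffer (CBV), shader resource (SRV), and unordered access (UAV) descriptors. So we need to create a heap of this type: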

D3D12_DESCRIPTOR_HEAP_DESC cbvHeapDesc;
cbvHeapDesc.NumDescriptors = 1;
cbvHeapDesc.Type = D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV;
cbvHeapDesc.Flags = D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE;
cbvHeapDesc.NodeMask = 0;

ComPtr<ID3D12DescriptorHeap> mCbvHeap;

md3dDevice->CreateDescriptorHeap(&cbvHeapDesc,
	IID_PPV_ARGS(&mCbvHeap));

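This chapter's demo has no SRVs or UAVs and draws only one object, so we need only a single descriptor in the heap.
A CBV is created by filling out a D3D12_CONSTANT_BUFFER_VIEW_DESC instance and calling ID3D12Device::CreateConstantBufferView: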

// Constant data per-object.
struct ObjectConstants
{
	XMFLOAT4X4 WorldViewProj = MathHelper::Identity4x4();
};

// Constant buffer to store the constants of n object.
std::unique_ptr<UploadBuffer<ObjectConstants>> mObjectCB = nullptr;
mObjectCB = std::make_unique<UploadBuffer<ObjectConstants>>( md3dDevice.Get(), n, true);
UINT objCBByteSize = d3dUtil::CalcConstantBufferByteSize(sizeof(ObjectConstants));

// Address to start of the buffer (0th constant buffer).
D3D12_GPU_VIRTUAL_ADDRESS cbAddress = mObjectCB->Resource()->GetGPUVirtualAddress();

// Offset to the ith object constant buffer in the buffer.
int boxCBufIndex = i;
cbAddress += boxCBufIndex*objCBByteSize;

D3D12_CONSTANT_BUFFER_VIEW_DESC cbvDesc;
cbvDesc.BufferLocation = cbAddress;
cbvDesc.SizeInBytes = d3dUtil::CalcConstantBufferByteSize(sizeof(ObjectConstants));

md3dDevice->CreateConstantBufferView(
	&cbvDesc,
	mCbvHeap->GetCPUDescriptorHandleForHeapStart());

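The D3D12_CONSTANT_BUFFER_VIEW_DESC structure describes the subrange of the constant buffer resource that is bound to the HLSL constant buffer structure. Because of hardware requirements, BufferLocation and SizeInBytes must be multiples of 256 bytes. If you set them to 64, for example, you get the following errors: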

D3D12 ERROR:
ID3D12Device::CreateConstantBufferView: SizeInBytes
of 64 is invalid. Device requires SizeInBytes be a
multiple of 256.
D3D12 ERROR: ID3D12Device::
CreateConstantBufferView: OffsetInBytes of 64 is
invalid. Device requires OffsetInBytes be a
multiple of 256.
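The offset arithmetic and the 256-byte rule can be sketched without a device. In the snippet below, GpuVirtualAddress is a plain 64-bit stand-in for D3D12_GPU_VIRTUAL_ADDRESS, and ConstantBufferAddress and IsValidCbvRange are hypothetical helpers that only mirror the checks described in the text; the real validation is done by the Direct3D debug layer.

```cpp
#include <cassert>
#include <cstdint>

using GpuVirtualAddress = std::uint64_t; // stand-in for D3D12_GPU_VIRTUAL_ADDRESS

constexpr std::uint32_t kCbAlignment = 256;

// Address of the i-th constant buffer inside one large buffer, assuming
// each element has already been padded to a multiple of 256 bytes.
GpuVirtualAddress ConstantBufferAddress(GpuVirtualAddress bufferStart,
	std::uint32_t elementIndex, std::uint32_t elementByteSize)
{
	assert(elementByteSize % kCbAlignment == 0);
	return bufferStart + std::uint64_t(elementIndex) * elementByteSize;
}

// Mirrors the two conditions from the error messages above: both the
// location and the size must be multiples of 256.
bool IsValidCbvRange(GpuVirtualAddress location, std::uint32_t sizeInBytes)
{
	return location % kCbAlignment == 0 && sizeInBytes % kCbAlignment == 0;
}
```

As long as the buffer start is 256-byte aligned and every element is padded with CalcConstantBufferByteSize, each computed address satisfies the rule automatically.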

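6.5 Root Signature and Descriptor Tables

In general, different shader programs expect different resources to be bound to the pipeline before a draw call is executed. Resources are bound to particular register slots, where shader programs can access them: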

// Texture resource bound to texture register slot 0.
Texture2D gDiffuseMap : register(t0);

// Sampler resources bound to sampler register slots 0-5.
SamplerState gsamPointWrap : register(s0);
SamplerState gsamPointClamp : register(s1);
SamplerState gsamLinearWrap : register(s2);
SamplerState gsamLinearClamp : register(s3);
SamplerState gsamAnisotropicWrap : register(s4);
SamplerState gsamAnisotropicClamp : register(s5);

// cbuffer resource bound to cbuffer register slots 0-2
cbuffer cbPerObject : register(b0)
{
	float4x4 gWorld;
	float4x4 gTexTransform;
};

// Constant data that varies per material.
cbuffer cbPass : register(b1)
{
	float4x4 gView;
	float4x4 gProj;
	[…] // Other fields omitted for brevity.
};

cbuffer cbMaterial : register(b2)
{
	float4 gDiffuseAlbedo;
	float3 gFresnelR0;
	float gRoughness;
	float4x4 gMatTransform;
};

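The root signature defines which resources will be bound to the pipeline and which shader input registers they map to. Different draw calls may use different shader programs and therefore require different root signatures.
In Direct3D a root signature is represented by the ID3D12RootSignature interface. It is defined by an array of root parameters that describe the resources the shaders expect. A root parameter can be a root constant, a root descriptor, or a descriptor table. In this chapter we use only descriptor tables; a descriptor table specifies a contiguous range of descriptors in a descriptor heap.

The code below creates a root signature with a single root parameter, a descriptor table large enough to hold one CBV: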

// Root parameter can be a table, root descriptor or root constants.
CD3DX12_ROOT_PARAMETER slotRootParameter[1];

// Create a single descriptor table of CBVs.
CD3DX12_DESCRIPTOR_RANGE cbvTable;
cbvTable.Init(
	D3D12_DESCRIPTOR_RANGE_TYPE_CBV,
	1, // Number of descriptors in table
	0);// base shader register arguments are bound to for this root parameter
	
slotRootParameter[0].InitAsDescriptorTable(
	1, // Number of ranges
	&cbvTable); // Pointer to array of ranges
	
// A root signature is an array of root parameters.
CD3DX12_ROOT_SIGNATURE_DESC rootSigDesc(1,
	slotRootParameter, 0, nullptr,
	D3D12_ROOT_SIGNATURE_FLAG_ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT);

// create a root signature with a single slot which points to a
// descriptor range consisting of a single constant buffer.
ComPtr<ID3DBlob> serializedRootSig = nullptr;
ComPtr<ID3DBlob> errorBlob = nullptr;
HRESULT hr = D3D12SerializeRootSignature(&rootSigDesc,
	D3D_ROOT_SIGNATURE_VERSION_1,
	serializedRootSig.GetAddressOf(),
	errorBlob.GetAddressOf());
	
ThrowIfFailed(md3dDevice->CreateRootSignature(
	0,
	serializedRootSig->GetBufferPointer(),
	serializedRootSig->GetBufferSize(),
	IID_PPV_ARGS(&mRootSignature)));

We will describe CD3DX12_ROOT_PARAMETER and CD3DX12_DESCRIPTOR_RANGE in more detail later; for now it is enough to understand the following code:

CD3DX12_ROOT_PARAMETER slotRootParameter[1];
CD3DX12_DESCRIPTOR_RANGE cbvTable;

cbvTable.Init(
	D3D12_DESCRIPTOR_RANGE_TYPE_CBV, // table type
	1, // Number of descriptors in table
	0);// base shader register arguments are bound to for this root parameter

slotRootParameter[0].InitAsDescriptorTable(
	1, // Number of ranges
	&cbvTable); // Pointer to array of ranges

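A root signature only defines which resources will be bound to the pipeline; it does not do any binding itself. To bind a descriptor table, we call ID3D12GraphicsCommandList::SetGraphicsRootDescriptorTable: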

void ID3D12GraphicsCommandList::SetGraphicsRootDescriptorTable(
	UINT RootParameterIndex,
	D3D12_GPU_DESCRIPTOR_HANDLE BaseDescriptor);

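RootParameterIndex: the index of the root parameter we are setting;
BaseDescriptor: a handle to the first descriptor of the table;
The code below sets the root signature and the CBV heap on the command list, and then sets the descriptor table that identifies the resource we want to bind: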

mCommandList->SetGraphicsRootSignature(mRootSignature.Get());
ID3D12DescriptorHeap* descriptorHeaps[] = {
	mCbvHeap.Get() };
	
mCommandList->SetDescriptorHeaps(_countof(descriptorHeaps),
	descriptorHeaps);
	
// Offset the CBV we want to use for this draw call.
CD3DX12_GPU_DESCRIPTOR_HANDLE cbv(mCbvHeap ->GetGPUDescriptorHandleForHeapStart());
cbv.Offset(cbvIndex, mCbvSrvUavDescriptorSize);
mCommandList->SetGraphicsRootDescriptorTable(0, cbv);
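The call to cbv.Offset above is plain pointer arithmetic on the handle: the heap start plus the descriptor index times the descriptor size. A minimal sketch of that calculation follows; GpuDescriptorHandle is a stand-in for D3D12_GPU_DESCRIPTOR_HANDLE, OffsetHandle is a hypothetical free function, and the descriptor size would normally be queried from the device rather than hard-coded.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for D3D12_GPU_DESCRIPTOR_HANDLE, which wraps a 64-bit value.
struct GpuDescriptorHandle { std::uint64_t ptr; };

// Returns the handle of the descriptor at position `index`, counted from
// the start of the heap, where each descriptor occupies descriptorSize bytes.
GpuDescriptorHandle OffsetHandle(GpuDescriptorHandle heapStart,
	std::int32_t index, std::uint32_t descriptorSize)
{
	return { heapStart.ptr
		+ static_cast<std::uint64_t>(
			std::int64_t(index) * std::int64_t(descriptorSize)) };
}
```

With one CBV per object stored consecutively in the heap, object i's table starts at OffsetHandle(heapStart, i, descriptorSize).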

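For performance, keep the root signature as small as possible and minimize the number of root signature changes per frame.

7 Compiling Shaders

In Direct3D, shader programs are first compiled to a portable bytecode; the graphics driver then compiles this bytecode again into optimized native instructions for the system's GPU. At runtime, we can compile a shader with the following function: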

HRESULT D3DCompileFromFile(
	LPCWSTR pFileName,
	const D3D_SHADER_MACRO *pDefines,
	ID3DInclude *pInclude,
	LPCSTR pEntrypoint,
	LPCSTR pTarget,
	UINT Flags1,
	UINT Flags2,
	ID3DBlob **ppCode,
	ID3DBlob **ppErrorMsgs);

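pFileName: the name of the .hlsl file to compile;
pDefines: not used in this book, so we pass null; see the SDK documentation for details;
pInclude: not used in this book, so we pass null; see the SDK documentation for details;
pEntrypoint: the name of the shader's entry point function;
pTarget: a string specifying the shader program type and version; this book uses versions 5.0 and 5.1: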

			a) vs_5_0 and vs_5_1: Vertex shader 5.0 and 5.1, respectively.
			b) hs_5_0 and hs_5_1: Hull shader 5.0 and 5.1, respectively.
			c) ds_5_0 and ds_5_1: Domain shader 5.0 and 5.1, respectively.
			d) gs_5_0 and gs_5_1: Geometry shader 5.0 and 5.1, respectively.
			e) ps_5_0 and ps_5_1: Pixel shader 5.0 and 5.1, respectively.
			f) cs_5_0 and cs_5_1: Compute shader 5.0 and 5.1, respectively.

Flags1: flags that control how the shader is compiled; this book uses only the following two:

			a) D3DCOMPILE_DEBUG: Compiles the shaders in debug mode.
			b) D3DCOMPILE_SKIP_OPTIMIZATION: Instructs the compiler to skip optimizations (useful for debugging).

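Flags2: not used in this book; see the SDK documentation;
ppCode: returns a pointer to an ID3DBlob that stores the compiled shader bytecode;
ppErrorMsgs: returns a pointer to an ID3DBlob that stores the error messages, if there were any;

ID3DBlob is just a generic block of memory with two methods:

LPVOID GetBufferPointer: returns a void* to the data, which must be cast to the appropriate type before use;
SIZE_T GetBufferSize: returns the size of the data in bytes;

To support error output, we define the following helper function in d3dUtil.h/.cpp: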

ComPtr<ID3DBlob> d3dUtil::CompileShader(
	const std::wstring& filename,
	const D3D_SHADER_MACRO* defines,
	const std::string& entrypoint,
	const std::string& target)
{
	// Use debug flags in debug mode.
	UINT compileFlags = 0;
	
	#if defined(DEBUG) || defined(_DEBUG)
		compileFlags = D3DCOMPILE_DEBUG | D3DCOMPILE_SKIP_OPTIMIZATION;
	#endif
	
	HRESULT hr = S_OK;
	ComPtr<ID3DBlob> byteCode = nullptr;
	ComPtr<ID3DBlob> errors;
	
	hr = D3DCompileFromFile(filename.c_str(),
		defines, D3D_COMPILE_STANDARD_FILE_INCLUDE,
		entrypoint.c_str(), target.c_str(),
		compileFlags, 0, &byteCode, &errors);
		
	// Output errors to debug window.
	if(errors != nullptr)
		OutputDebugStringA((char*)errors->GetBufferPointer());
		
	ThrowIfFailed(hr);
	return byteCode;
} 

//Here is an example of calling this function:
ComPtr<ID3DBlob> mvsByteCode = nullptr;
ComPtr<ID3DBlob> mpsByteCode = nullptr;

mvsByteCode = d3dUtil::CompileShader(L"Shaders\\color.hlsl", nullptr, "VS", "vs_5_0");
mpsByteCode = d3dUtil::CompileShader(L"Shaders\\color.hlsl", nullptr, "PS", "ps_5_0");

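Compiling a shader does not bind it to the pipeline; binding is covered in section 9.

7.1 Offline Compilation

Benefits of offline compilation:

Complicated shaders can take a long time to compile, so compiling them offline shortens load times;
Compilation errors show up earlier and are easier to inspect;
Windows 8 apps must use offline compilation.

Compiled shader files conventionally use the .cso (compiled shader object) extension. To compile offline we use the FXC command line tool that ships with DirectX. For color.hlsl, which contains the entry points VS and PS, a debug build is compiled like this: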

fxc "color.hlsl" /Od /Zi /T vs_5_0 /E "VS" /Fo "color_vs.cso" /Fc "color_vs.asm"
fxc "color.hlsl" /Od /Zi /T ps_5_0 /E "PS" /Fo "color_ps.cso" /Fc "color_ps.asm"

A release build is compiled like this:

fxc "color.hlsl" /T vs_5_0 /E "VS" /Fo "color_vs.cso" /Fc "color_vs.asm"
fxc "color.hlsl" /T ps_5_0 /E "PS" /Fo "color_ps.cso" /Fc "color_ps.asm"

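After compiling offline, we still need to load the compiled bytecode into the application. This can be done with the standard C++ file input facilities: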

ComPtr<ID3DBlob> d3dUtil::LoadBinary(const std::wstring& filename)
{
	std::ifstream fin(filename, std::ios::binary);
	fin.seekg(0, std::ios_base::end);
	
	std::ifstream::pos_type size = (int)fin.tellg();
	fin.seekg(0, std::ios_base::beg);
	
	ComPtr<ID3DBlob> blob;
	ThrowIfFailed(D3DCreateBlob(size, blob.GetAddressOf()));
	fin.read((char*)blob->GetBufferPointer(), size);

	fin.close();
	return blob;
}
…

ComPtr<ID3DBlob> mvsByteCode = d3dUtil::LoadBinary(L"Shaders\\color_vs.cso");
ComPtr<ID3DBlob> mpsByteCode = d3dUtil::LoadBinary(L"Shaders\\color_ps.cso");
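
LoadBinary does not check whether the file could be opened. As a minimal, portable sketch of the same seek/tell/read pattern with that check added, the version below reads a file into a std::vector<char> instead of an ID3DBlob. The name ReadAllBytes is hypothetical and is not part of d3dUtil.

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of d3dUtil): reads an entire binary file
// into memory using the same seek/tell/read pattern as LoadBinary.
std::vector<char> ReadAllBytes(const std::string& filename)
{
	std::ifstream fin(filename, std::ios::binary);
	if (!fin)
		throw std::runtime_error("cannot open " + filename);

	// Seek to the end to learn the size, then rewind to the start.
	fin.seekg(0, std::ios_base::end);
	const std::streamoff size = fin.tellg();
	fin.seekg(0, std::ios_base::beg);

	std::vector<char> bytes(static_cast<std::size_t>(size));
	if (size > 0)
		fin.read(bytes.data(), size);
	if (!fin)
		throw std::runtime_error("cannot read " + filename);

	return bytes;
}
```

With a real ID3DBlob, the same check would sit just before the D3DCreateBlob call.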

7.2 Generating Assembly

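The /Fc option tells FXC to generate an assembly listing. Looking at a shader's assembly is useful for checking the instruction count and for seeing what kind of code was actually generated, which is sometimes not what you expect. For example, if your HLSL code contains a conditional statement, you might expect a branch instruction in the assembly. But on early programmable GPUs branching was very expensive, so the compiler will sometimes flatten a conditional: it evaluates both branches and then inserts an instruction that selects the correct result.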
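
To make the idea of flattening concrete, here is a minimal sketch written in C++ rather than HLSL (the arithmetic would look almost the same in a shader). The function names Branchy and Flattened are hypothetical, and the flag s is assumed to be exactly 1.0f (true) or 0.0f (false).

```cpp
#include <cmath>

// Straightforward version: the conditional may compile to a branch.
// s is expected to be 1.0f (true) or 0.0f (false).
float Branchy(float s, float y)
{
	float x;
	if (s != 0.0f)
		x = std::sqrt(y);
	else
		x = 2.0f * y;
	return x;
}

// Flattened version: both sides are always evaluated, and s selects
// between them arithmetically, so no branch is needed.
float Flattened(float s, float y)
{
	const float a = 2.0f * y;      // result of the "else" side
	const float b = std::sqrt(y);  // result of the "if" side
	return a + s * (b - a);        // s == 0 gives a; s == 1 gives b (up to rounding)
}
```

The flattened version always pays for both the multiplication and the square root, which is the trade-off the compiler makes to avoid the branch.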
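The flattened code gives the same result without any branching, but without looking at the assembly we cannot tell whether flattening took place. So it is sometimes worth inspecting the assembly to confirm what actually happened. Below is the assembly generated for the vertex shader in color.hlsl: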

//
// Generated by Microsoft (R) HLSL Shader Compiler 6.4.9844.0
//
//
// Buffer Definitions:
//
// cbuffer cbPerObject
// {
//
// float4x4 gWorldViewProj; // Offset: 0 Size: 64
//
// }
//
//
// Resource Bindings:
//
// Name Type Format Dim Slot Elements
// ------------------------------ ---------- ------- ----------- ---- --------
// cbPerObject cbuffer NA NA 0 1
//
//
//
// Input signature:
//
// Name Index Mask Register SysValue Format Used
// -------------------- ----- ------ -------- -------- ------- ------
// POSITION 0 xyz 0 NONE float xyz
// COLOR 0 xyzw 1 NONE float xyzw
//
//
// Output signature:
//
// Name Index Mask Register SysValue Format Used
// -------------------- ----- ------ -------- -------- ------- ------
// SV_POSITION 0 xyzw 0 POS float xyzw
// COLOR 0 xyzw 1 NONE float xyzw
//
vs_5_0
dcl_globalFlags refactoringAllowed | skipOptimization
dcl_constantbuffer cb0[4], immediateIndexed
dcl_input v0.xyz
dcl_input v1.xyzw
dcl_output_siv o0.xyzw, position
dcl_output o1.xyzw
dcl_temps 2
//
// Initial variable locations:
// v0.x <- vin.PosL.x; v0.y <- vin.PosL.y; v0.z <- vin.PosL.z;
// v1.x <- vin.Color.x; v1.y <- vin.Color.y; v1.z <- vin.Color.z; v1.w <- vin.Color.w;
// o1.x <- <VS return value>.Color.x;
// o1.y <- <VS return value>.Color.y;
// o1.z <- <VS return value>.Color.z;
// o1.w <- <VS return value>.Color.w;
// o0.x <- <VS return value>.PosH.x;
// o0.y <- <VS return value>.PosH.y;
// o0.z <- <VS return value>.PosH.z;
// o0.w <- <VS return value>.PosH.w
//
#line 29 "color.hlsl"
mov r0.xyz, v0.xyzx
mov r0.w, l(1.000000)
dp4 r1.x, r0.xyzw, cb0[0].xyzw // r1.x <- vout.PosH.x
dp4 r1.y, r0.xyzw, cb0[1].xyzw // r1.y <- vout.PosH.y
dp4 r1.z, r0.xyzw, cb0[2].xyzw // r1.z <- vout.PosH.z
dp4 r1.w, r0.xyzw, cb0[3].xyzw // r1.w <- vout.PosH.w
#line 32
mov r0.xyzw, v1.xyzw // r0.x <- vout.Color.x; r0.y <- vout.Color.y; r0.z <- vout.Color.z; r0.w <- vout.Color.w
mov o0.xyzw, r1.xyzw
mov o1.xyzw, r0.xyzw
ret
// Approximately 10 instruction slots used

7.3 Using Visual Studio to Compile Shaders Offline

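Visual Studio 2013 and later have integrated support for compiling shaders. You can add .hlsl files directly to a project, and Visual Studio will recognize them and provide compile options; these options let you set the FXC parameters through the UI.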
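However, when shaders are compiled through Visual Studio, each file can contain only one shader program, which means a vertex shader and a pixel shader cannot be kept in the same file (so we generally do not use this feature).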

8 The Rasterizer Stage

The rasterizer stage is configured through the D3D12_RASTERIZER_DESC structure. This stage is not programmable; it can only be configured:

typedef struct D3D12_RASTERIZER_DESC {
	D3D12_FILL_MODE FillMode; // Default: D3D12_FILL_SOLID
	D3D12_CULL_MODE CullMode; // Default: D3D12_CULL_BACK
	BOOL FrontCounterClockwise; // Default: false
	INT DepthBias; // Default: 0
	FLOAT DepthBiasClamp; // Default: 0.0f
	FLOAT SlopeScaledDepthBias; // Default: 0.0f
	BOOL DepthClipEnable; // Default: true
	BOOL ScissorEnable; // Default: false
	BOOL MultisampleEnable; // Default: false
	BOOL AntialiasedLineEnable; // Default: false
	UINT ForcedSampleCount; // Default: 0
	
	// Default: D3D12_CONSERVATIVE_RASTERIZATION_MODE_OFF
	D3D12_CONSERVATIVE_RASTERIZATION_MODE ConservativeRaster;
} D3D12_RASTERIZER_DESC;

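Most of these members are rarely used, and this chapter covers only four of them. For the others, read the official documentation carefully: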

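FillMode: D3D12_FILL_WIREFRAME for wireframe rendering, D3D12_FILL_SOLID for solid rendering.
CullMode: D3D12_CULL_NONE disables culling; D3D12_CULL_BACK culls back-facing triangles; D3D12_CULL_FRONT culls front-facing triangles. Back-face culling is the default.
FrontCounterClockwise: false means triangles with clockwise winding are treated as front-facing; true means triangles with counterclockwise winding are front-facing.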
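ScissorEnable: true enables the scissor test and false disables it; false is the default. (Note that the D3D12_RASTERIZER_DESC declared in the released d3d12.h has no ScissorEnable member; in Direct3D 12 the scissor test is always enabled.)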

The following code shows how to create a rasterizer description:

CD3DX12_RASTERIZER_DESC rsDesc(D3D12_DEFAULT);
rsDesc.FillMode = D3D12_FILL_WIREFRAME;
rsDesc.CullMode = D3D12_CULL_NONE;

CD3DX12_RASTERIZER_DESC is a convenience class that derives from D3D12_RASTERIZER_DESC and adds some helper constructors. CD3DX12_DEFAULT and D3D12_DEFAULT are defined like this:

struct CD3DX12_DEFAULT {};
extern const DECLSPEC_SELECTANY CD3DX12_DEFAULT D3D12_DEFAULT;
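
The empty struct is a tag type: it carries no data and exists only so that a constructor overload can be selected by passing D3D12_DEFAULT. The sketch below shows the pattern with hypothetical stand-in types (DefaultTag, kDefault, RasterizerDesc, CRasterizerDesc); these are not the real d3dx12.h definitions, and only three members are modelled.

```cpp
// Hypothetical stand-ins that mimic the pattern; these are not the real
// d3dx12.h definitions.
struct DefaultTag {};
const DefaultTag kDefault{};

enum class FillMode { Wireframe, Solid };
enum class CullMode { None, Front, Back };

// Plain description struct, playing the role of D3D12_RASTERIZER_DESC.
struct RasterizerDesc
{
	FillMode Fill;
	CullMode Cull;
	bool FrontCounterClockwise;
};

// Convenience wrapper that adds constructors to the plain struct.
struct CRasterizerDesc : RasterizerDesc
{
	CRasterizerDesc() = default;   // leaves the members uninitialized

	// The tag parameter carries no data; it only selects this overload,
	// which fills in the default values.
	explicit CRasterizerDesc(DefaultTag)
	{
		Fill = FillMode::Solid;
		Cull = CullMode::Back;
		FrontCounterClockwise = false;
	}
};
```

Because the wrapper derives from the plain struct, an instance can be assigned wherever the plain struct is expected, which is how the rasterizer description is later stored in the PSO description.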

9 Pipeline State Object (PSO)

A pipeline state object (PSO) is represented by the ID3D12PipelineState interface. To create one, we first fill out a D3D12_GRAPHICS_PIPELINE_STATE_DESC instance:

typedef struct D3D12_GRAPHICS_PIPELINE_STATE_DESC
{ 
	ID3D12RootSignature *pRootSignature;
	D3D12_SHADER_BYTECODE VS;
	D3D12_SHADER_BYTECODE PS;
	D3D12_SHADER_BYTECODE DS;
	D3D12_SHADER_BYTECODE HS;
	D3D12_SHADER_BYTECODE GS;
	D3D12_STREAM_OUTPUT_DESC StreamOutput;
	D3D12_BLEND_DESC BlendState;
	UINT SampleMask;
	D3D12_RASTERIZER_DESC RasterizerState;
	D3D12_DEPTH_STENCIL_DESC DepthStencilState;
	D3D12_INPUT_LAYOUT_DESC InputLayout;
	D3D12_PRIMITIVE_TOPOLOGY_TYPE PrimitiveTopologyType;
	UINT NumRenderTargets;
	DXGI_FORMAT RTVFormats[8];
	DXGI_FORMAT DSVFormat;
	DXGI_SAMPLE_DESC SampleDesc;
} D3D12_GRAPHICS_PIPELINE_STATE_DESC;

pRootSignature: Pointer to the root signature to bind with this PSO.
VS: The vertex shader to bind, specified by a D3D12_SHADER_BYTECODE structure:

		typedef struct D3D12_SHADER_BYTECODE {
		const BYTE *pShaderBytecode;
		SIZE_T BytecodeLength;
		} D3D12_SHADER_BYTECODE;

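PS: The pixel shader to bind.
DS: The domain shader to bind.
HS: The hull shader to bind.
GS: The geometry shader to bind.
StreamOutput: Used for the stream-out technique; for now we zero it out.
BlendState: Specifies the blend state. For now we use the default, CD3DX12_BLEND_DESC(D3D12_DEFAULT); later chapters cover it in more detail.
SampleMask: Multisampling can take up to 32 samples. This 32-bit integer enables or disables individual samples; for example, if the fifth bit is 0, the fifth sample is not taken. Clearing the fifth bit therefore only matters when multisampling with at least five samples, and if the application uses single sampling, only the first bit has any effect. The value is normally left at 0xffffffff, which does not disable any samples.
RasterizerState: Specifies the rasterizer state.
DepthStencilState: Specifies the depth/stencil test state. Later chapters discuss it in detail; for now we use the default, CD3DX12_DEPTH_STENCIL_DESC(D3D12_DEFAULT).
InputLayout: An input layout description (an array of D3D12_INPUT_ELEMENT_DESC elements and the number of elements in it):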

typedef struct D3D12_INPUT_LAYOUT_DESC
{
	const D3D12_INPUT_ELEMENT_DESC *pInputElementDescs;
	UINT NumElements;
} D3D12_INPUT_LAYOUT_DESC;

PrimitiveTopologyType: Specifies the primitive topology type:

typedef enum D3D12_PRIMITIVE_TOPOLOGY_TYPE
{
	D3D12_PRIMITIVE_TOPOLOGY_TYPE_UNDEFINED = 0,
	D3D12_PRIMITIVE_TOPOLOGY_TYPE_POINT = 1,
	D3D12_PRIMITIVE_TOPOLOGY_TYPE_LINE = 2,
	D3D12_PRIMITIVE_TOPOLOGY_TYPE_TRIANGLE = 3,
	D3D12_PRIMITIVE_TOPOLOGY_TYPE_PATCH = 4
} D3D12_PRIMITIVE_TOPOLOGY_TYPE;

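NumRenderTargets: The number of render targets used simultaneously.
RTVFormats: The formats of the render targets.
DSVFormat: The format of the depth/stencil buffer.
SampleDesc: Describes the multisample count and quality level.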
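
The bit arithmetic behind SampleMask can be sketched in a few lines. The helpers IsSampleEnabled and DisableSample below are hypothetical illustrations, not Direct3D functions, and sample indices are assumed to be 0-based (so the "fifth sample" is index 4).

```cpp
#include <cstdint>

// Hypothetical helper: returns true if sample number sampleIndex
// (0-based) is enabled in a 32-bit sample mask.
bool IsSampleEnabled(std::uint32_t sampleMask, unsigned sampleIndex)
{
	return sampleIndex < 32 && ((sampleMask >> sampleIndex) & 1u) != 0;
}

// Hypothetical helper: returns the mask with one sample switched off.
// Indices outside 0..31 leave the mask unchanged.
std::uint32_t DisableSample(std::uint32_t sampleMask, unsigned sampleIndex)
{
	return sampleIndex < 32 ? (sampleMask & ~(1u << sampleIndex)) : sampleMask;
}
```

Starting from 0xffffffff and disabling index 4 yields 0xffffffef, which only changes the result when at least five samples are in use.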

Once a D3D12_GRAPHICS_PIPELINE_STATE_DESC instance has been filled out, we create an ID3D12PipelineState object with the ID3D12Device::CreateGraphicsPipelineState method:

ComPtr<ID3D12RootSignature> mRootSignature;
std::vector<D3D12_INPUT_ELEMENT_DESC> mInputLayout;
ComPtr<ID3DBlob> mvsByteCode;
ComPtr<ID3DBlob> mpsByteCode;
… 
D3D12_GRAPHICS_PIPELINE_STATE_DESC psoDesc;
ZeroMemory(&psoDesc, sizeof(D3D12_GRAPHICS_PIPELINE_STATE_DESC));
psoDesc.InputLayout = { mInputLayout.data(), (UINT)mInputLayout.size() };
psoDesc.pRootSignature = mRootSignature.Get();
psoDesc.VS =
{
	reinterpret_cast<BYTE*>(mvsByteCode->GetBufferPointer()), mvsByteCode->GetBufferSize()
};
psoDesc.PS =
{
	reinterpret_cast<BYTE*>(mpsByteCode->GetBufferPointer()), mpsByteCode->GetBufferSize()
};
psoDesc.RasterizerState = CD3DX12_RASTERIZER_DESC(D3D12_DEFAULT);
psoDesc.BlendState = CD3DX12_BLEND_DESC(D3D12_DEFAULT);
psoDesc.DepthStencilState = CD3DX12_DEPTH_STENCIL_DESC(D3D12_DEFAULT);
psoDesc.SampleMask = UINT_MAX;
psoDesc.PrimitiveTopologyType = D3D12_PRIMITIVE_TOPOLOGY_TYPE_TRIANGLE;
psoDesc.NumRenderTargets = 1;
psoDesc.RTVFormats[0] = mBackBufferFormat;
psoDesc.SampleDesc.Count = m4xMsaaState ? 4 : 1;
psoDesc.SampleDesc.Quality = m4xMsaaState ? (m4xMsaaQuality - 1) : 0;
psoDesc.DSVFormat = mDepthStencilFormat;
ComPtr<ID3D12PipelineState> mPSO;
ThrowIfFailed(md3dDevice->CreateGraphicsPipelineState(&psoDesc, IID_PPV_ARGS(&mPSO)));

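An ID3D12PipelineState object aggregates a lot of state, and this is done for performance: Direct3D can validate that all the state is compatible, and the driver can generate the code to program the hardware state up front. In the Direct3D 11 state model, these pieces of state were set separately. If one state changed, the driver had to reprogram the other states that depended on it; if several states changed, a lot of redundant reprogramming could result. To avoid that, the driver had to defer reprogramming, which in turn meant tracking at run time whether all the state changes had been made. In the Direct3D 12 model, because all the state is specified together as one aggregate, the driver can generate all the code it needs in one go.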

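Because validating and creating a PSO is time-consuming, PSOs should be created at initialization time. One exception is to create a PSO at run time on demand, the first time it is referenced; it can then be stored in a collection such as a hash table so that later uses can fetch it quickly.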
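
The create-on-first-use idea can be sketched with a hash table keyed by name. Everything here is an assumption made for illustration: FakePso stands in for a real ID3D12PipelineState, and the PsoCache class and its string keys are hypothetical, not code from the book.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for an expensive-to-create pipeline state object.
struct FakePso { std::string Key; };

// Caches PSOs by key so that each one is created at most once.
class PsoCache
{
public:
	using Factory = std::function<std::shared_ptr<FakePso>(const std::string&)>;

	explicit PsoCache(Factory factory) : mFactory(std::move(factory)) {}

	std::shared_ptr<FakePso> Get(const std::string& key)
	{
		auto it = mCache.find(key);
		if (it != mCache.end())
			return it->second;        // fast path: already created

		auto pso = mFactory(key);     // slow path: first reference
		mCache.emplace(key, pso);
		return pso;
	}

	std::size_t Size() const { return mCache.size(); }

private:
	Factory mFactory;
	std::unordered_map<std::string, std::shared_ptr<FakePso>> mCache;
};
```

In a real renderer the key would more likely be derived from the contents of the PSO description than from a string, but the lookup logic would have the same shape.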

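Not all rendering state is encapsulated in a PSO. Some state, such as the viewport and the scissor rectangles, is specified independently. Such state can be set efficiently and independently of the rest of the pipeline state, so nothing would be gained by including it in a PSO.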

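Direct3D is basically a state machine: it stays in its current state until we change it. So if different objects are to be drawn with different PSOs, the code looks like this: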

// Reset specifies initial PSO.
mCommandList->Reset(mDirectCmdListAlloc.Get(), mPSO1.Get());
/* …draw objects using PSO 1… */
// Change PSO
mCommandList->SetPipelineState(mPSO2.Get());
/* …draw objects using PSO 2… */
// Change PSO
mCommandList->SetPipelineState(mPSO3.Get());
/* …draw objects using PSO 3… */

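For performance, PSO state changes should be kept to a minimum: draw all the objects that can share a PSO together, and do not change the PSO on every draw call.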
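
One simple way to group objects by PSO is to sort the draw list before submitting it. The sketch below assumes a hypothetical DrawItem record with an integer psoId; it counts how many PSO switches a given draw order would need, before and after sorting.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical draw item: psoId identifies which PSO the object needs.
struct DrawItem
{
	int psoId;
	int objectId;
};

// Counts how many times a PSO would have to be set if the items were
// drawn in the given order (the first item always sets one).
int CountPsoChanges(const std::vector<DrawItem>& items)
{
	int changes = 0;
	for (std::size_t i = 0; i < items.size(); ++i)
		if (i == 0 || items[i].psoId != items[i - 1].psoId)
			++changes;
	return changes;
}

// Groups items that share a PSO; stable_sort keeps the original order
// within each group.
void SortByPso(std::vector<DrawItem>& items)
{
	std::stable_sort(items.begin(), items.end(),
		[](const DrawItem& a, const DrawItem& b) { return a.psoId < b.psoId; });
}
```

Sorting purely by PSO ignores other ordering constraints, such as drawing transparent objects back to front, so a real renderer would usually sort within each render pass rather than across the whole frame.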

10 Geometry Helper Structure

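When defining a group of geometries, it is convenient to have a structure that keeps the vertex and index information together. The structure can also keep the vertex and index data in system memory so that the CPU can read it; the CPU needs this access for operations such as picking and collision detection. The structure also caches a number of important properties and provides helper methods. In this book we use the MeshGeometry structure (defined in d3dUtil.h) to define a chunk of geometry: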

// Defines a subrange of geometry in a MeshGeometry. This is for when
// multiple geometries are stored in one vertex and index buffer. It
// provides the offsets and data needed to draw a subset of geometry
// stored in the vertex and index buffers so that we can implement the
// technique described by Figure 6.3.
struct SubmeshGeometry
{
	UINT IndexCount = 0;
	UINT StartIndexLocation = 0;
	INT BaseVertexLocation = 0;
	// Bounding box of the geometry defined by this submesh.
	// This is used in later chapters of the book.
	DirectX::BoundingBox Bounds;
};

struct MeshGeometry
{
	// Give it a name so we can look it up by name.
	std::string Name;
	
	// System memory copies. Use Blobs because the vertex/index format can
	// be generic.
	// It is up to the client to cast appropriately.
	Microsoft::WRL::ComPtr<ID3DBlob> VertexBufferCPU = nullptr;
	Microsoft::WRL::ComPtr<ID3DBlob> IndexBufferCPU = nullptr;
	
	Microsoft::WRL::ComPtr<ID3D12Resource> VertexBufferGPU = nullptr;
	Microsoft::WRL::ComPtr<ID3D12Resource> IndexBufferGPU = nullptr;
	
	Microsoft::WRL::ComPtr<ID3D12Resource> VertexBufferUploader = nullptr;
	Microsoft::WRL::ComPtr<ID3D12Resource> IndexBufferUploader = nullptr;
	
	// Data about the buffers.
	UINT VertexByteStride = 0;
	UINT VertexBufferByteSize = 0;
	DXGI_FORMAT IndexFormat = DXGI_FORMAT_R16_UINT;
	UINT IndexBufferByteSize = 0;
	
	// A MeshGeometry may store multiple geometries in one vertex/index
	// buffer.
	// Use this container to define the Submesh geometries so we can draw
	// the Submeshes individually.
	std::unordered_map<std::string, SubmeshGeometry> DrawArgs;
	
	D3D12_VERTEX_BUFFER_VIEW VertexBufferView()const
	{
		D3D12_VERTEX_BUFFER_VIEW vbv;
		vbv.BufferLocation = VertexBufferGPU->GetGPUVirtualAddress();
		vbv.StrideInBytes = VertexByteStride;
		vbv.SizeInBytes = VertexBufferByteSize;
		return vbv;
	}
	D3D12_INDEX_BUFFER_VIEW IndexBufferView()const
	{
		D3D12_INDEX_BUFFER_VIEW ibv;
		ibv.BufferLocation = IndexBufferGPU->GetGPUVirtualAddress();
		ibv.Format = IndexFormat;
		ibv.SizeInBytes = IndexBufferByteSize;
		return ibv;
	}
	// We can free this memory after we finish upload to the GPU.
	void DisposeUploaders()
	{
		VertexBufferUploader = nullptr;
		IndexBufferUploader = nullptr;
	}
};
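
To show how the three SubmeshGeometry offsets arise when several meshes share one vertex buffer and one index buffer, here is a small CPU-only sketch. SubmeshRange and PackedMeshes are hypothetical simplifications (no bounding box, no GPU resources, and one float per vertex for brevity), not the book's types.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Simplified stand-in for SubmeshGeometry (no bounding box).
struct SubmeshRange
{
	std::uint32_t IndexCount = 0;
	std::uint32_t StartIndexLocation = 0;
	std::int32_t BaseVertexLocation = 0;
};

// Hypothetical packer: appends meshes to shared vertex/index arrays and
// records where each one starts.
struct PackedMeshes
{
	std::vector<float> Vertices;          // one float per vertex, for brevity
	std::vector<std::uint16_t> Indices;
	std::unordered_map<std::string, SubmeshRange> DrawArgs;

	void Append(const std::string& name,
		const std::vector<float>& vertices,
		const std::vector<std::uint16_t>& indices)
	{
		SubmeshRange range;
		range.IndexCount = static_cast<std::uint32_t>(indices.size());
		range.StartIndexLocation = static_cast<std::uint32_t>(Indices.size());
		range.BaseVertexLocation = static_cast<std::int32_t>(Vertices.size());

		// Indices stay relative to the mesh's own first vertex; the
		// base vertex location is added back at draw time.
		Vertices.insert(Vertices.end(), vertices.begin(), vertices.end());
		Indices.insert(Indices.end(), indices.begin(), indices.end());
		DrawArgs[name] = range;
	}
};
```

At draw time, the three recorded values for a submesh are what would be passed as the IndexCountPerInstance, StartIndexLocation, and BaseVertexLocation arguments of ID3D12GraphicsCommandList::DrawIndexedInstanced.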

11 Box Demo

Code project (or download the sample projects from the book's official website):
https://github.com/jiabaodan/Direct12BookReadingNotes

12 Chapter Summary

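In Direct3D, a vertex can carry data in addition to its position. To create a custom vertex format, we define a vertex structure holding the data we need. Once the structure is defined, we describe it to Direct3D with an input layout description (D3D12_INPUT_LAYOUT_DESC). The input layout description is a field of the D3D12_GRAPHICS_PIPELINE_STATE_DESC structure, so it becomes part of a PSO, and it is validated against the vertex shader's input signature. An input layout is bound to the IA stage when the PSO it belongs to is bound.
For the GPU to access an array of vertices or indices, the data must be placed in a buffer, which is represented by the ID3D12Resource interface. A buffer is created by filling out a D3D12_RESOURCE_DESC structure and calling ID3D12Device::CreateCommittedResource. A view to a vertex buffer is described by the D3D12_VERTEX_BUFFER_VIEW structure, and a view to an index buffer by the D3D12_INDEX_BUFFER_VIEW structure. A vertex buffer is bound to the IA stage with ID3D12GraphicsCommandList::IASetVertexBuffers, and an index buffer with ID3D12GraphicsCommandList::IASetIndexBuffer. Non-indexed geometry is drawn with ID3D12GraphicsCommandList::DrawInstanced, and indexed geometry with ID3D12GraphicsCommandList::DrawIndexedInstanced.
A vertex shader is a program written in HLSL that runs on the GPU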