
Introduction to 3D Game Programming with DirectX 12 Study Notes --- Chapter 16: Instancing and Frustum Culling

Code repository:

https://github.com/jiabaodan/Direct12BookReadingNotes



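Learning Objectives

  1. Learn how to implement hardware instancing;
  2. Become familiar with bounding volumes, and learn how to create and use them;
  3. Learn how to implement frustum culling.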


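1 Hardware Instancing

Duplicating the vertex and index data for every instance would be very wasteful, so we store a single copy of the object's geometry in local space and draw it multiple times, each time with a different world matrix and material.
This strategy saves memory, but it still incurs per-object API overhead: for each object we must set its world matrix and material and issue a draw command. Although Direct3D 12 was redesigned to minimize the API overhead that Direct3D 11 incurred on draw calls, some overhead remains. The Direct3D instancing API lets you draw an object many times with a single draw call; moreover, with dynamic indexing, instancing is more flexible in Direct3D 12 than it was in Direct3D 11.

The cost of draw calls is a CPU bottleneck, not a GPU one, because the CPU has to perform many state changes for each call. Graphics engines use batching ([Wloka03]) to reduce the number of draw calls, and hardware instancing is one way to batch.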


1.1 Drawing Instanced Data

We have in fact been using instanced drawing in the previous chapters, just with an instance count of 1:

cmdList->DrawIndexedInstanced(ri->IndexCount,
	1,
	ri->StartIndexLocation, ri->BaseVertexLocation, 0);

The second parameter, InstanceCount, is the number of instances to draw.


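1.2 Instance Data

Previous editions of the book supplied instance data through the input assembler (IA) stage. When creating an input layout, you can mark an element as streamed per instance with D3D12_INPUT_CLASSIFICATION_PER_INSTANCE_DATA instead of per vertex with D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA. Direct3D 12 still supports this approach, but here we use a more modern one.
The idea is to create a structured buffer that holds the per-instance data for all instances and bind it to the rendering pipeline. The shader then indexes into it using the system value SV_InstanceID: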

// Defaults for number of lights.
#ifndef NUM_DIR_LIGHTS
	#define NUM_DIR_LIGHTS 3
#endif
#ifndef NUM_POINT_LIGHTS
	#define NUM_POINT_LIGHTS 0
#endif
#ifndef NUM_SPOT_LIGHTS
	#define NUM_SPOT_LIGHTS 0
#endif

// Include structures and functions for lighting.
#include "LightingUtil.hlsl"

struct InstanceData
{
	float4x4 World;
	float4x4 TexTransform;
	uint MaterialIndex;
	uint InstPad0;
	uint InstPad1;
	uint InstPad2;
};

struct MaterialData
{
	float4 DiffuseAlbedo;
	float3 FresnelR0;
	float Roughness;
	float4x4 MatTransform;
	uint DiffuseMapIndex;
	uint MatPad0;
	uint MatPad1;
	uint MatPad2;
};

// An array of textures, which is only supported in shader model 5.1+.
// Unlike Texture2DArray, the textures in this array can be different
// sizes and formats, making it more flexible than texture arrays.
Texture2D gDiffuseMap[7] : register(t0);

// Put in space1, so the texture array does not overlap with these.
// The texture array above will occupy registers t0, t1, …, t6 in
// space0.
StructuredBuffer<InstanceData> gInstanceData : register(t0, space1);
StructuredBuffer<MaterialData> gMaterialData : register(t1, space1);

SamplerState gsamPointWrap : register(s0);
SamplerState gsamPointClamp : register(s1);
SamplerState gsamLinearWrap : register(s2);
SamplerState gsamLinearClamp : register(s3);
SamplerState gsamAnisotropicWrap : register(s4);
SamplerState gsamAnisotropicClamp : register(s5);

// Constant data that varies per pass.
cbuffer cbPass : register(b0)
{
	float4x4 gView;
	float4x4 gInvView;
	float4x4 gProj;
	float4x4 gInvProj;
	float4x4 gViewProj;
	float4x4 gInvViewProj;
	float3 gEyePosW;
	float cbPerObjectPad1;
	float2 gRenderTargetSize;
	float2 gInvRenderTargetSize;
	float gNearZ;
	float gFarZ;
	float gTotalTime;
	float gDeltaTime;
	float4 gAmbientLight;
	
	// Indices [0, NUM_DIR_LIGHTS) are directional lights;
	// indices [NUM_DIR_LIGHTS, NUM_DIR_LIGHTS+NUM_POINT_LIGHTS) are point lights;
	// indices [NUM_DIR_LIGHTS+NUM_POINT_LIGHTS,
	// NUM_DIR_LIGHTS+NUM_POINT_LIGHTS+NUM_SPOT_LIGHTS)
	// are spot lights for a maximum of MaxLights per object.
	Light gLights[MaxLights];
};

struct VertexIn
{
	float3 PosL : POSITION;
	float3 NormalL : NORMAL;
	float2 TexC : TEXCOORD;
};

struct VertexOut
{
	float4 PosH : SV_POSITION;
	float3 PosW : POSITION;
	float3 NormalW : NORMAL;
	float2 TexC : TEXCOORD;
	
	// nointerpolation is used so the index is not interpolated
	// across the triangle.
	nointerpolation uint MatIndex : MATINDEX;
};

VertexOut VS(VertexIn vin, uint instanceID : SV_InstanceID)
{
	VertexOut vout = (VertexOut)0.0f;
	
	// Fetch the instance data.
	InstanceData instData = gInstanceData[instanceID];
	float4x4 world = instData.World;
	float4x4 texTransform = instData.TexTransform;
	uint matIndex = instData.MaterialIndex;
	vout.MatIndex = matIndex;
	
	// Fetch the material data.
	MaterialData matData = gMaterialData[matIndex];
	
	// Transform to world space.
	float4 posW = mul(float4(vin.PosL, 1.0f), world);
	vout.PosW = posW.xyz;
	
	// Assumes uniform scaling; otherwise, need to use inverse-transpose
	// of world matrix.
	vout.NormalW = mul(vin.NormalL, (float3x3)world);
	
	// Transform to homogeneous clip space.
	vout.PosH = mul(posW, gViewProj);
	
	// Output vertex attributes for interpolation across triangle.
	float4 texC = mul(float4(vin.TexC, 0.0f, 1.0f), texTransform);
	vout.TexC = mul(texC, matData.MatTransform).xy;
	
	return vout;
}

float4 PS(VertexOut pin) : SV_Target
{
	// Fetch the material data.
	MaterialData matData = gMaterialData[pin.MatIndex];
	float4 diffuseAlbedo = matData.DiffuseAlbedo;
	float3 fresnelR0 = matData.FresnelR0;
	float roughness = matData.Roughness;
	uint diffuseTexIndex = matData.DiffuseMapIndex;
	
	// Dynamically look up the texture in the array.
	diffuseAlbedo *= gDiffuseMap[diffuseTexIndex].Sample(gsamLinearWrap, pin.TexC);
	
	// Interpolating normal can unnormalize it, so renormalize it.
	pin.NormalW = normalize(pin.NormalW);
	
	// Vector from point being lit to eye.
	float3 toEyeW = normalize(gEyePosW - pin.PosW);
	
	// Light terms.
	float4 ambient = gAmbientLight*diffuseAlbedo;
	Material mat = { diffuseAlbedo, fresnelR0, roughness };
	float4 directLight = ComputeDirectLighting(gLights, mat, pin.PosW, pin.NormalW, toEyeW);
	float4 litColor = ambient + directLight;
	
	// Common convention to take alpha from diffuse albedo.
	litColor.a = diffuseAlbedo.a;
	
	return litColor;
}
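The shader above resolves its texture through a chain of lookups: SV_InstanceID selects an InstanceData record, its MaterialIndex selects a MaterialData record, and that record's DiffuseMapIndex selects a texture. As a rough CPU-side illustration of the same indexing chain, here is a minimal sketch. The types InstanceRecord and MaterialRecord and the function ResolveTextureSlot are hypothetical names made up for this example and are not part of the book's code.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, reduced stand-ins for the shader's InstanceData and MaterialData.
struct InstanceRecord { std::uint32_t materialIndex; };
struct MaterialRecord { std::uint32_t diffuseMapIndex; };

// Follows the same chain as the shader:
// instanceID -> instance record -> material record -> texture slot.
// at() throws on an out-of-range index, which the GPU would not do.
std::uint32_t ResolveTextureSlot(const std::vector<InstanceRecord>& instances,
                                 const std::vector<MaterialRecord>& materials,
                                 std::uint32_t instanceID)
{
    const InstanceRecord& inst = instances.at(instanceID);
    const MaterialRecord& mat = materials.at(inst.materialIndex);
    return mat.diffuseMapIndex;
}
```

Because every step is just an index, two instances in the same draw call can end up with different materials and textures without any state change on the CPU.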

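We no longer need a per-object constant buffer; the per-object data now comes from the instance buffer. The root signature changes accordingly: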

CD3DX12_DESCRIPTOR_RANGE texTable;
texTable.Init(D3D12_DESCRIPTOR_RANGE_TYPE_SRV, 7, 0, 0);

// Root parameter can be a table, root descriptor or root constants.
CD3DX12_ROOT_PARAMETER slotRootParameter[4];

// Performance TIP: Order from most frequent to least frequent.
slotRootParameter[0].InitAsShaderResourceView(0, 1);
slotRootParameter[1].InitAsShaderResourceView(1, 1);
slotRootParameter[2].InitAsConstantBufferView(0);
slotRootParameter[3].InitAsDescriptorTable(1, &texTable, D3D12_SHADER_VISIBILITY_PIXEL);

auto staticSamplers = GetStaticSamplers();

// A root signature is an array of root parameters.
CD3DX12_ROOT_SIGNATURE_DESC rootSigDesc(4,
	slotRootParameter,
	(UINT)staticSamplers.size(),
	staticSamplers.data(),
	D3D12_ROOT_SIGNATURE_FLAG_ALLOW_INPUT_ASSEMBLER_INPUT_

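As in the previous chapter, we bind all of the scene's materials and textures once per frame; per draw call, we only set the structured buffer of instance data: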

void InstancingAndCullingApp::Draw(const GameTimer& gt)
{
	…
	// Bind all the materials used in this scene. For structured buffers, we
	// can bypass the heap and set as a root descriptor.
	auto matBuffer = mCurrFrameResource->MaterialBuffer->Resource();
	mCommandList->SetGraphicsRootShaderResourceView(1, matBuffer->GetGPUVirtualAddress());
	
	auto passCB = mCurrFrameResource->PassCB->Resource();
	mCommandList->SetGraphicsRootConstantBufferView(2, passCB->GetGPUVirtualAddress());
	
	// Bind all the textures used in this scene.
	mCommandList->SetGraphicsRootDescriptorTable(3,
		mSrvDescriptorHeap->GetGPUDescriptorHandleForHeapStart());
		
	DrawRenderItems(mCommandList.Get(), mOpaqueRitems);
	…
}

void InstancingAndCullingApp::DrawRenderItems(
	ID3D12GraphicsCommandList* cmdList,
	const std::vector<RenderItem*>& ritems)
{
	// For each render item…
	for(size_t i = 0; i < ritems.size(); ++i)
	{
		auto ri = ritems[i];
		cmdList->IASetVertexBuffers(0, 1, &ri->Geo->VertexBufferView());
		cmdList->IASetIndexBuffer(&ri->Geo->IndexBufferView());
		cmdList->IASetPrimitiveTopology(ri->PrimitiveType);
		
		// Set the instance buffer to use for this render-item.
		// For structured buffers, we can bypass
		// the heap and set as a root descriptor.
		auto instanceBuffer = mCurrFrameResource->InstanceBuffer->Resource();
		
		cmdList->SetGraphicsRootShaderResourceView(
			0, instanceBuffer->GetGPUVirtualAddress());
			
		cmdList->DrawIndexedInstanced(ri->IndexCount,
			ri->InstanceCount, ri->StartIndexLocation,
			ri->BaseVertexLocation, 0);
	}
}

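1.3 Creating the Instance Buffer

The instance buffer stores the data for every instance. It is very similar to the per-object constant buffers we created earlier. On the CPU side, the structure looks like this: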

struct InstanceData
{
	DirectX::XMFLOAT4X4 World = MathHelper::Identity4x4();
	DirectX::XMFLOAT4X4 TexTransform = MathHelper::Identity4x4();
	UINT MaterialIndex;
	UINT InstancePad0;
	UINT InstancePad1;
	UINT InstancePad2;
};

The instance data is stored in the render item:

struct RenderItem
{
	…
	std::vector<InstanceData> Instances;
	…
};

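On the GPU side, we need a structured buffer with element type InstanceData. The buffer must be dynamic (an upload buffer) so that we can update it every frame. In our demo we copy only the data of the visible instances into it (this is related to frustum culling), and the set of visible instances changes as the camera moves and rotates. Creating the dynamic buffer is straightforward with the UploadBuffer helper class: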

struct FrameResource
{ 
public:
	FrameResource(ID3D12Device* device, UINT passCount,
		UINT maxInstanceCount, UINT materialCount);
	FrameResource(const FrameResource& rhs) = delete;
	FrameResource& operator=(const FrameResource& rhs) = delete;
	~FrameResource();
	
	// We cannot reset the allocator until the GPU is done processing the commands.
	// So each frame needs their own allocator.
	Microsoft::WRL::ComPtr<ID3D12CommandAllocator> CmdListAlloc;
	
	// We cannot update a cbuffer until the GPU is done processing the commands
	// that reference it. So each frame needs their own cbuffers.
	//
	std::unique_ptr<UploadBuffer<FrameConstants>> FrameCB = nullptr;
	std::unique_ptr<UploadBuffer<PassConstants>> PassCB = nullptr;
	std::unique_ptr<UploadBuffer<MaterialData>>  MaterialBuffer = nullptr;
	
	// NOTE: In this demo, we instance only one render-item, so we only have
	// one structured buffer to store instancing data. To make this more
	// general (i.e., to support instancing multiple render-items), you
	// would need to have a structured buffer for each render-item, and
	// allocate each buffer with enough room for the maximum number of
	// instances you would ever draw. This sounds like a lot, but it is
	// actually no more than the amount of perobject constant data we
	// would need if we were not using instancing. For example, if we
	// were drawing 1000 objects without instancing, we would create a
	// constant buffer with enough room for a 1000 objects. With instancing,
	// we would just create a structured buffer large enough to store the
	// instance data for 1000 instances.
	std::unique_ptr<UploadBuffer<InstanceData>> InstanceBuffer = nullptr;
	
	// Fence value to mark commands up to this fence point. This lets us
	// check if these frame resources are still in use by the GPU.
	UINT64 Fence = 0;
};

FrameResource::FrameResource(ID3D12Device* device,
	UINT passCount, UINT maxInstanceCount, UINT materialCount)
{
	ThrowIfFailed(device->CreateCommandAllocator(
		D3D12_COMMAND_LIST_TYPE_DIRECT,
		IID_PPV_ARGS(CmdListAlloc.GetAddressOf())));
		
	PassCB = std::make_unique<UploadBuffer<PassConstants>>( device, passCount, true);
	MaterialBuffer = std::make_unique<UploadBuffer<MaterialData>>( device, materialCount, false);
	InstanceBuffer = std::make_unique<UploadBuffer<InstanceData>>( device, maxInstanceCount, false);
}

Note that InstanceBuffer is not a constant buffer, so we pass false for the third parameter.
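The reason the flag matters is element size: constant buffer views must be aligned to 256 bytes, whereas the elements of a structured buffer are tightly packed. The sketch below shows how a wrapper could pick its per-element stride from such a flag. AlignTo256, ElementByteSize and InstanceDataLayout are illustrative names chosen for this example; the sketch assumes the wrapper pads only when the flag is true and is not a copy of the actual UploadBuffer code.

```cpp
#include <cstddef>

// Rounds byteSize up to the next multiple of 256.
constexpr std::size_t AlignTo256(std::size_t byteSize)
{
    return (byteSize + 255) & ~static_cast<std::size_t>(255);
}

// Hypothetical helper: per-element stride of an upload buffer.
// Constant buffer elements are padded; structured buffer elements are not.
constexpr std::size_t ElementByteSize(std::size_t structSize, bool isConstantBuffer)
{
    return isConstantBuffer ? AlignTo256(structSize) : structSize;
}

// Stand-in with the same layout as InstanceData:
// two 4x4 float matrices plus four 32-bit unsigned integers.
struct InstanceDataLayout
{
    float world[16];
    float texTransform[16];
    unsigned materialIndex;
    unsigned pad[3];
};

static_assert(sizeof(InstanceDataLayout) == 144,
              "2 * 64 bytes of matrices + 16 bytes of indices and padding");
```

With these assumptions, an InstanceData element occupies 144 bytes in a structured buffer but would occupy 256 bytes if it were stored as a constant buffer element.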



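2 Bounding Volumes and Frustums

To implement frustum culling, we first need to be familiar with the mathematical representation of a frustum and of several bounding volumes. A bounding volume is a simple geometric primitive that approximates the volume of an object. It represents the object less precisely, but its simple mathematical form makes it much easier to work with, for example when culling.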


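2.1 DirectX Math Collision

We use the DirectXCollision.h library that is part of DirectX Math. It provides implementations of common geometric primitive intersection tests, such as ray/triangle, ray/box, box/box, box/plane, box/frustum, and sphere/frustum. Exercise 3 asks you to explore what the library contains.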


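2.2 Boxes

The axis-aligned bounding box (AABB) of a mesh is a box that tightly surrounds the mesh and whose faces are parallel to the major axes. An AABB can be described by a minimum point vmin and a maximum point vmax, found by taking the smallest and largest x-, y-, and z-coordinates over all vertices of the mesh.
Alternatively, an AABB can be represented by a center point c and an extents vector e, which stores the distance from the center to the sides of the box along each coordinate axis.
The DirectX collision library uses the center/extents representation: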

struct BoundingBox
{
	static const size_t CORNER_COUNT = 8;
	XMFLOAT3 Center; // Center of the box.
	XMFLOAT3 Extents; // Distance from the center to each side.
	…

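It is easy to convert from one representation to the other. For example, given the minimum point vmin and the maximum point vmax, the center and extents are:
	c = 0.5 (vmin + vmax),  e = 0.5 (vmax − vmin)
The following code shows how we compute the bounding box of the skull mesh: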

XMFLOAT3 vMinf3(+MathHelper::Infinity, +MathHelper::Infinity, +MathHelper::Infinity);
XMFLOAT3 vMaxf3(-MathHelper::Infinity, -MathHelper::Infinity, -MathHelper::Infinity);

XMVECTOR vMin = XMLoadFloat3(&vMinf3);
XMVECTOR vMax = XMLoadFloat3(&vMaxf3);

std::vector<Vertex> vertices(vcount);

for(UINT i = 0; i < vcount; ++i)
{
	fin >> vertices[i].Pos.x >> vertices[i].Pos.y >> vertices[i].Pos.z;
	fin >> vertices[i].Normal.x >> vertices[i].Normal.y >> vertices[i].Normal.z;
	XMVECTOR P = XMLoadFloat3(&vertices[i].Pos);
	
	// Project point onto unit sphere and generate spherical texture coordinates.
	XMFLOAT3 spherePos;
	XMStoreFloat3(&spherePos, XMVector3Normalize(P));
	float theta = atan2f(spherePos.z, spherePos.x);
	
	// Put in [0, 2pi].
	if(theta < 0.0f)
		theta += XM_2PI;
		
	float phi = acosf(spherePos.y);
	float u = theta / (2.0f*XM_PI);
	float v = phi / XM_PI;
	vertices[i].TexC = { u, v };
	vMin = XMVectorMin(vMin, P);
	vMax = XMVectorMax(vMax, P);
}

BoundingBox bounds;
XMStoreFloat3(&bounds.Center, 0.5f*(vMin + vMax));
XMStoreFloat3(&bounds.Extents, 0.5f*(vMax - vMin));

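XMVectorMin and XMVectorMax return the component-wise minimum and maximum of two vectors:
	XMVectorMin(u, v) = (min(ux, vx), min(uy, vy), min(uz, vz), min(uw, vw))
	XMVectorMax(u, v) = (max(ux, vx), max(uy, vy), max(uz, vz), max(uw, vw))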
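To make the min/max accumulation and the conversion to center/extents concrete without the DirectX Math types, here is a minimal sketch in plain C++. Float3, Aabb, Min3, Max3 and ComputeAabb are hypothetical stand-ins written for this example; they mirror the logic of the skull-loading code rather than reproduce any library API.

```cpp
#include <algorithm>
#include <limits>
#include <vector>

struct Float3 { float x, y, z; };
struct Aabb { Float3 center; Float3 extents; };

// Component-wise minimum and maximum, in the spirit of XMVectorMin/XMVectorMax.
Float3 Min3(Float3 a, Float3 b) { return { std::min(a.x, b.x), std::min(a.y, b.y), std::min(a.z, b.z) }; }
Float3 Max3(Float3 a, Float3 b) { return { std::max(a.x, b.x), std::max(a.y, b.y), std::max(a.z, b.z) }; }

// Precondition: points is not empty.
Aabb ComputeAabb(const std::vector<Float3>& points)
{
    const float inf = std::numeric_limits<float>::infinity();
    Float3 vMin{ inf, inf, inf };
    Float3 vMax{ -inf, -inf, -inf };

    // Grow the box so that it contains every point.
    for (const Float3& p : points)
    {
        vMin = Min3(vMin, p);
        vMax = Max3(vMax, p);
    }

    // Convert min/max to center/extents.
    return { { 0.5f * (vMin.x + vMax.x), 0.5f * (vMin.y + vMax.y), 0.5f * (vMin.z + vMax.z) },
             { 0.5f * (vMax.x - vMin.x), 0.5f * (vMax.y - vMin.y), 0.5f * (vMax.z - vMin.z) } };
}
```

For the three points (-1, 0, 2), (3, 4, 2) and (1, -2, 6), this gives the center (1, 1, 4) and the extents (2, 3, 2).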


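2.2.1 Rotations and Axis-Aligned Bounding Boxes

If we compute the AABB in the local space of an object, it may no longer be axis-aligned once the object is placed in world space; it becomes an oriented bounding box (OBB). However, we can always transform into the local space of the mesh and perform the intersection test there, where the box is axis-aligned.
Alternatively, we can recompute the AABB in world space, but the resulting box can be larger and a looser fit to the actual shape of the mesh.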
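The growth of a recomputed world-space AABB is easy to see numerically. The sketch below transforms the eight corners of a local AABB and rebuilds an axis-aligned box around them. It assumes a deliberately simple world transform (a rotation about the z-axis followed by a translation); Float3, Aabb, RotateZThenTranslate and RecomputeWorldAabb are names invented for this example.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>

struct Float3 { float x, y, z; };
struct Aabb { Float3 center; Float3 extents; };

// Hypothetical world transform: rotate about the z-axis, then translate.
Float3 RotateZThenTranslate(Float3 p, float angle, Float3 t)
{
    const float c = std::cos(angle);
    const float s = std::sin(angle);
    return { c * p.x - s * p.y + t.x, s * p.x + c * p.y + t.y, p.z + t.z };
}

// Transforms the 8 corners of the local box and fits a new AABB around them.
Aabb RecomputeWorldAabb(const Aabb& local, float angle, Float3 translation)
{
    const float inf = std::numeric_limits<float>::infinity();
    Float3 vMin{ inf, inf, inf };
    Float3 vMax{ -inf, -inf, -inf };

    for (int i = 0; i < 8; ++i)
    {
        // Bits 0..2 of i choose the sign of the x, y and z extents.
        const Float3 corner{
            local.center.x + ((i & 1) ? local.extents.x : -local.extents.x),
            local.center.y + ((i & 2) ? local.extents.y : -local.extents.y),
            local.center.z + ((i & 4) ? local.extents.z : -local.extents.z) };
        const Float3 w = RotateZThenTranslate(corner, angle, translation);
        vMin = { std::min(vMin.x, w.x), std::min(vMin.y, w.y), std::min(vMin.z, w.z) };
        vMax = { std::max(vMax.x, w.x), std::max(vMax.y, w.y), std::max(vMax.z, w.z) };
    }

    return { { 0.5f * (vMin.x + vMax.x), 0.5f * (vMin.y + vMax.y), 0.5f * (vMin.z + vMax.z) },
             { 0.5f * (vMax.x - vMin.x), 0.5f * (vMax.y - vMin.y), 0.5f * (vMax.z - vMin.z) } };
}
```

Rotating a box with extents (1, 1, 1) by 45 degrees about z yields world-space extents of roughly (1.414, 1.414, 1), so the recomputed box is noticeably looser in x and y than the original.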
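Yet another option is to give up on AABBs and work only with OBBs. The DirectX collision library provides the following structure to represent an OBB: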

struct BoundingOrientedBox
{
	static const size_t CORNER_COUNT = 8;
	
	XMFLOAT3 Center; // Center of the box.
	XMFLOAT3 Extents; // Distance from the center to each side.
	XMFLOAT4 Orientation; // Unit quaternion representing rotation (box -> world).
	…

An AABB or an OBB can be constructed from a set of points with the following static member functions of the DirectX collision library:

void BoundingBox::CreateFromPoints(
	_Out_ BoundingBox& Out,
	_In_ size_t Count,
	_In_reads_bytes_(sizeof(XMFLOAT3)+Stride* (Count-1)) const XMFLOAT3* pPoints,
	_In_ size_t Stride );
	
void BoundingOrientedBox::CreateFromPoints(
	_Out_ BoundingOrientedBox& Out,
	_In_ size_t Count,
	_In_reads_bytes_(sizeof(XMFLOAT3)+Stride*  (Count-1)) const XMFLOAT3* pPoints,
	_In_ size_t Stride );

If your vertex structure looks like this:

struct Basic32
{
	XMFLOAT3 Pos;
	XMFLOAT3 Normal;
	XMFLOAT2 TexC;
};

and you have an array of vertices forming your mesh:

std::vector<Vertex::Basic32> vertices;

then you can call the function like so:

BoundingBox box;

BoundingBox::CreateFromPoints(
	box,
	vertices.size(),
	&vertices[0].Pos,
	sizeof(Vertex::Basic32));

The stride is the size in bytes of one vertex, that is, how far to step from one position to reach the next.
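The stride parameter is what lets a function read positions out of an interleaved vertex array without knowing the full vertex type. The following sketch shows the pointer arithmetic involved. Float3, Float2, VertexBasic32 and PositionAt are hypothetical stand-ins for this example and say nothing about how the library implements CreateFromPoints internally.

```cpp
#include <cstddef>
#include <vector>

struct Float3 { float x, y, z; };
struct Float2 { float x, y; };

// Interleaved vertex with the same member order as Basic32 (32 bytes).
struct VertexBasic32
{
    Float3 pos;
    Float3 normal;
    Float2 texC;
};

// Returns the i-th position, given a pointer to the first position and the
// distance in bytes between consecutive positions.
const Float3& PositionAt(const Float3* first, std::size_t stride, std::size_t i)
{
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(first);
    return *reinterpret_cast<const Float3*>(bytes + i * stride);
}
```

Passing &vertices[0].pos together with sizeof(VertexBasic32) steps over the normal and texture coordinates of each vertex, which is the same pattern as the CreateFromPoints call above.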

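To compute the bounding volume of a mesh, you need a system-memory copy of the vertex list, for example stored in a std::vector, because the CPU cannot read from a vertex buffer created for rendering. It is therefore common for applications to keep a system-memory copy around for tasks like this, as well as for picking (covered in the next chapter).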


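2.3 Spheres

A bounding sphere is described by a center point c and a radius r. One way to compute the bounding sphere of a mesh is to first compute its AABB and take the center of the AABB as the center of the sphere:
	c = 0.5 (vmin + vmax)
The radius is then the maximum distance from the center to any vertex p of the mesh:
	r = max{ ||c − p|| : p ∈ mesh }
Suppose we compute the bounding sphere in local space. After the world transform, the sphere may no longer tightly surround the mesh if the transform includes scaling. One strategy is to scale the radius by the largest scaling component; another is to avoid scaling in the world transform altogether and instead bake the scale into the mesh when it is loaded.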
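The two steps described above, together with the radius rescaling, can be sketched in a few lines of plain C++. This is a minimal illustration under the assumption that the sphere center is taken from the AABB; Float3, Sphere, ComputeBoundingSphere and ScaleRadius are names invented here, and the result is not necessarily the sphere that BoundingSphere::CreateFromPoints would produce.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <vector>

struct Float3 { float x, y, z; };
struct Sphere { Float3 center; float radius; };

// Precondition: points is not empty.
Sphere ComputeBoundingSphere(const std::vector<Float3>& points)
{
    const float inf = std::numeric_limits<float>::infinity();
    Float3 vMin{ inf, inf, inf };
    Float3 vMax{ -inf, -inf, -inf };
    for (const Float3& p : points)
    {
        vMin = { std::min(vMin.x, p.x), std::min(vMin.y, p.y), std::min(vMin.z, p.z) };
        vMax = { std::max(vMax.x, p.x), std::max(vMax.y, p.y), std::max(vMax.z, p.z) };
    }

    // Center of the AABB.
    const Float3 c{ 0.5f * (vMin.x + vMax.x), 0.5f * (vMin.y + vMax.y), 0.5f * (vMin.z + vMax.z) };

    // Radius: largest distance from the center to any point.
    float maxDistSq = 0.0f;
    for (const Float3& p : points)
    {
        const float dx = p.x - c.x, dy = p.y - c.y, dz = p.z - c.z;
        maxDistSq = std::max(maxDistSq, dx * dx + dy * dy + dz * dz);
    }
    return { c, std::sqrt(maxDistSq) };
}

// Rescales a local-space radius by the largest scaling component so the
// sphere still encloses the mesh after a non-uniform scale.
float ScaleRadius(float radius, Float3 scale)
{
    return radius * std::max({ std::fabs(scale.x), std::fabs(scale.y), std::fabs(scale.z) });
}
```

Scaling by the largest component is conservative: the sphere always encloses the scaled mesh, but it may be looser than necessary along the axes with smaller scale factors.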
The DirectX collision library provides the following structure to represent a bounding sphere:

struct BoundingSphere
{
	XMFLOAT3 Center; // Center of the sphere.
	float Radius; // Radius of the sphere.
	…

and a static member function to create one from a set of points:

void BoundingSphere::CreateFromPoints(
	_Out_ BoundingSphere& Out,
	_In_ size_t Count,
	_In_reads_bytes_(sizeof(XMFLOAT3)+Stride* (Count-1)) const XMFLOAT3* pPoints,
	_In_ size_t Stride );

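2.4 Frustums

A frustum can be described by six inward-facing planes: left, right, top, bottom, near, and far.
This six-plane representation makes it easy to test a frustum against a bounding volume for intersection.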


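2.4.1 Constructing the Frustum Planes

One easy way to construct the frustum planes is to work in view space, where the frustum has its center of projection at the origin and looks down the positive z-axis.
The DirectX collision library provides the following structure to represent a frustum: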

struct BoundingFrustum
{
	static const size_t CORNER_COUNT = 8;
	
	XMFLOAT3 Origin; // Origin of the frustum (and projection).
	XMFLOAT4 Orientation; // Quaternion representing rotation.
	float RightSlope; // Positive X slope (X/Z).
	float LeftSlope; // Negative X slope.
	float TopSlope; // Positive Y slope (Y/Z).
	float BottomSlope; // Negative Y slope.
	float Near, Far; // Z of the near plane and far plane.
	…

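In the local space of the frustum (for example, the camera's view space), Origin is zero and Orientation is the identity quaternion, meaning no rotation. We can position and orient the frustum elsewhere by changing these two values.
If we cache the camera's vertical field of view, aspect ratio, near plane, and far plane, we can work out the frustum plane equations in view space with a little math. It is also possible to derive the view-space frustum plane equations from the perspective projection matrix (see [Lengyel02] or [Möller08] for two different ways). The XNA collision library takes the following approach: in NDC space, the frustum has been warped into the box [−1,1] × [−1,1] × [0,1], so points on its boundary are trivial to write down. Instead of all eight corners, the library uses six representative points, four on the far plane and the centers of the near and far planes: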

// Corners of the projection frustum in homogenous space.
static XMVECTORF32 HomogenousPoints[6] =
{
	{ 1.0f, 0.0f, 1.0f, 1.0f }, // right (at far plane)
	{ -1.0f, 0.0f, 1.0f, 1.0f }, // left
	{ 0.0f, 1.0f, 1.0f, 1.0f }, // top
	{ 0.0f, -1.0f, 1.0f, 1.0f }, // bottom
	{ 0.0f, 0.0f, 0.0f, 1.0f }, // near
	{ 0.0f, 0.0f, 1.0f, 1.0f } // far
};

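Multiplying by the inverse of the projection matrix transforms these points from NDC space back to view space. Once we have them in view space, the slopes of the side planes and the near and far distances are easy to read off. The following DirectX collision library code computes the view-space frustum from a perspective projection matrix: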

//----------------------------------------------------------------------------
// Build a frustum from a perspective projection matrix. The matrix may only
// contain a projection; any rotation, translation or scale will cause the
// constructed frustum to be incorrect.
//----------------------------------------------------------------------------
_Use_decl_annotations_ inline void XM_CALLCONV
BoundingFrustum::CreateFromMatrix(
	BoundingFrustum& Out,
	FXMMATRIX Projection )
{
	// Corners of the projection frustum in homogenous space.
	static XMVECTORF32 HomogenousPoints[6] =
	{
		{ 1.0f, 0.0f, 1.0f, 1.0f }, // right (at far plane)
		{ -1.0f, 0.0f, 1.0f, 1.0f }, // left
		{ 0.0f, 1.0f, 1.0f, 1.0f }, // top
		{ 0.0f, -1.0f, 1.0f, 1.0f }, // bottom
		{ 0.0f, 0.0f, 0.0f, 1.0f }, // near
		{ 0.0f, 0.0f, 1.0f, 1.0f } // far
	};
	
	XMVECTOR Determinant;
	XMMATRIX matInverse = XMMatrixInverse( &Determinant, Projection );
	
	// Compute the frustum corners in world space.
	XMVECTOR Points[6];
	for( size_t i = 0; i < 6; ++i )
	{
		// Transform point.
		Points[i] = XMVector4Transform( HomogenousPoints[i], matInverse );
	}
	
	Out.Origin = XMFLOAT3( 0.0f, 0.0f, 0.0f );
	Out.Orientation = XMFLOAT4( 0.0f, 0.0f, 0.0f, 1.0f );
	
	// Compute the slopes.
	Points[0] = Points[0] * XMVectorReciprocal( XMVectorSplatZ( Points[0] ) );
	Points[1] = Points[1] * XMVectorReciprocal( XMVectorSplatZ( Points[1] ) );
	Points[2] = Points[2] * XMVectorReciprocal( XMVectorSplatZ( Points[2] ) );
	Points[3] = Points[3] * XMVectorReciprocal( XMVectorSplatZ( Points[3] ) );
	
	Out.RightSlope = XMVectorGetX( Points[0] );
	Out.LeftSlope = XMVectorGetX( Points[1] );
	Out.TopSlope = XMVectorGetY( Points[2] );
	Out.BottomSlope = XMVectorGetY( Points[3] );
	
	// Compute near and far.
	Points[4] = Points[4] * XMVectorReciprocal( XMVectorSplatW( Points[4] ) );
	Points[5] = Points[5] * XMVectorReciprocal( XMVectorSplatW( Points[5] ) );
	Out.Near = XMVectorGetZ( Points[4] );
	Out.Far = XMVectorGetZ( Points[5] );
}
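For a standard left-handed perspective matrix, the result of the unprojection above can also be written in closed form, which is a useful sanity check. The sketch below is not the library's method: it swaps the general matrix inverse for the analytic inverse of a perspective matrix, and it assumes the row-vector, left-handed layout in which m00 = 1/(aspect·tan(fovY/2)), m11 = 1/tan(fovY/2), m22 = f/(f−n), m23 = 1 and m32 = −n·f/(f−n). PerspectiveTerms, FrustumParams and both functions are hypothetical names.

```cpp
#include <cmath>

// The four entries of the assumed perspective matrix that the sketch needs
// (m23 is 1 and all other entries are 0).
struct PerspectiveTerms { double m00, m11, m22, m32; };

struct FrustumParams
{
    double rightSlope, leftSlope, topSlope, bottomSlope, nearZ, farZ;
};

PerspectiveTerms MakePerspectiveFovLH(double fovY, double aspect, double nearZ, double farZ)
{
    const double yScale = 1.0 / std::tan(0.5 * fovY);
    const double range = farZ / (farZ - nearZ);
    return { yScale / aspect, yScale, range, -range * nearZ };
}

// Unprojecting the NDC point (x, y, z) gives the view-space point with
//   zView = m32 / (z - m22),  xView / zView = x / m00,  yView / zView = y / m11.
FrustumParams FrustumFromPerspective(const PerspectiveTerms& p)
{
    FrustumParams f;
    f.rightSlope = 1.0 / p.m00;         // NDC x = +1
    f.leftSlope = -1.0 / p.m00;         // NDC x = -1
    f.topSlope = 1.0 / p.m11;           // NDC y = +1
    f.bottomSlope = -1.0 / p.m11;       // NDC y = -1
    f.nearZ = p.m32 / (0.0 - p.m22);    // NDC z = 0
    f.farZ = p.m32 / (1.0 - p.m22);     // NDC z = 1
    return f;
}
```

With a 90 degree vertical field of view, an aspect ratio of 2, a near plane at 1 and a far plane at 100, the slopes come out as approximately ±2 horizontally and ±1 vertically, and the near and far distances are recovered as 1 and 100.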

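2.4.2 Frustum/Sphere Intersection

Because we represent the frustum by six inward-facing planes, the frustum/sphere test can be stated as follows: if there exists a frustum plane L such that the sphere lies entirely in the negative half-space of L, then the sphere is completely outside the frustum. If no such plane exists, we conclude that the sphere intersects the frustum or is contained in it.
The frustum/sphere test thus reduces to six sphere/plane tests. Let the sphere have center c and radius r, and let the plane have unit normal n and offset d. The signed distance from the center to the plane is k = n · c + d. If |k| ≤ r, the sphere intersects the plane; if k < −r, the sphere is behind the plane (in its negative half-space); if k > r, the sphere is in front of the plane.
The BoundingFrustum class provides the following member function to test a sphere against a frustum. Note that the sphere and the frustum must be in the same coordinate system for the test to make sense: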

enum ContainmentType
{
	// The object is completely outside the frustum.
	DISJOINT = 0,
	// The object intersects the frustum boundaries.
	INTERSECTS = 1,
	// The object lies completely inside the frustum volume.
	CONTAINS = 2,
};
ContainmentType BoundingFrustum::Contains( _In_ const BoundingSphere& sphere ) const;

BoundingSphere has the corresponding member function:

ContainmentType BoundingSphere::Contains( _In_ const BoundingFrustum& fr ) const;
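The plane-based sphere test from this section fits in a few lines. The sketch below classifies a sphere against six inward-facing planes using the signed distance k = n · c + d. To keep it self-contained, an axis-aligned box stands in for the frustum; Float3, Plane, Sphere, Containment, MakeBoxPlanes and ClassifySphere are all hypothetical names, and the library's own Contains implementation may differ in detail.

```cpp
#include <array>

struct Float3 { float x, y, z; };
struct Plane { Float3 n; float d; };   // n . p + d = 0, unit normal pointing inward
struct Sphere { Float3 center; float radius; };

enum class Containment { Disjoint, Intersects, Contains };

// Six inward-facing planes of the cube [-h, h]^3, used here as a simple
// stand-in for the six planes of a frustum.
std::array<Plane, 6> MakeBoxPlanes(float h)
{
    std::array<Plane, 6> planes = {{
        Plane{ Float3{  1.0f,  0.0f,  0.0f }, h }, Plane{ Float3{ -1.0f,  0.0f,  0.0f }, h },
        Plane{ Float3{  0.0f,  1.0f,  0.0f }, h }, Plane{ Float3{  0.0f, -1.0f,  0.0f }, h },
        Plane{ Float3{  0.0f,  0.0f,  1.0f }, h }, Plane{ Float3{  0.0f,  0.0f, -1.0f }, h } }};
    return planes;
}

Containment ClassifySphere(const std::array<Plane, 6>& planes, const Sphere& s)
{
    bool fullyInside = true;
    for (const Plane& pl : planes)
    {
        // Signed distance from the sphere center to the plane.
        const float k = pl.n.x * s.center.x + pl.n.y * s.center.y + pl.n.z * s.center.z + pl.d;
        if (k < -s.radius)
            return Containment::Disjoint;   // entirely behind one plane
        if (k < s.radius)
            fullyInside = false;            // straddles this plane
    }
    return fullyInside ? Containment::Contains : Containment::Intersects;
}
```

Note that this test is conservative: a sphere near a corner can be in front of none of the planes' negative half-spaces and still miss the volume, in which case it is reported as intersecting. For culling that only costs a few unnecessary draws.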

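2.4.3 Frustum/AABB Intersection

The frustum/AABB test follows the same strategy as the frustum/sphere test: if there exists a frustum plane L such that the box lies entirely in the negative half-space of L, then the box is completely outside the frustum; otherwise the box intersects the frustum or is contained in it.
For each plane, we find the diagonal of the box, passing through its center, that is most aligned with the plane normal n, and call the vector along it v = PQ. If P is in front of the plane, then Q must be in front as well; if Q is behind the plane, then P must be behind as well, so the whole box is in the negative half-space; if P is behind the plane and Q is in front of it, the box intersects the plane.
The following code finds the diagonal PQ most aligned with the plane normal: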

// For each coordinate axis x, y, z…
for(int j = 0; j < 3; ++j)
{
	// Make PQ point in the same direction as
	// the plane normal on this axis.
	if( planeNormal[j] >= 0.0f )
	{
		P[j] = box.minPt[j];
		Q[j] = box.maxPt[j];
	}
	else
	{
		P[j] = box.maxPt[j];
		Q[j] = box.minPt[j];
	}
}

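To see why this works, consider one dimension at a time: we pick Pi and Qi so that Qi − Pi points in the same direction as the plane normal coordinate ni.
The BoundingFrustum class provides the following member function to test an AABB against a frustum. Note that the two must be in the same coordinate system for the test to make sense: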

ContainmentType BoundingFrustum::Contains( _In_ const BoundingBox& box ) const;

BoundingBox has a similar member function:

ContainmentType BoundingBox::Contains( _In_ const BoundingFrustum& fr ) const;
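Putting the PQ construction to work, the rejection test only needs the corner Q that lies furthest along the plane normal: if even Q is behind the plane, the whole box is. The following is a minimal sketch under that reading of the section. Plane, Box, BoxBehindPlane and BoxOutsideFrustum are hypothetical names, and the box uses the min/max representation of the snippet above rather than the library's center/extents form.

```cpp
#include <array>
#include <vector>

struct Plane { std::array<float, 3> n; float d; };   // n . p + d = 0, normal pointing inward
struct Box { std::array<float, 3> minPt; std::array<float, 3> maxPt; };

// True if the box lies entirely in the negative half-space of the plane.
bool BoxBehindPlane(const Plane& plane, const Box& box)
{
    float k = plane.d;
    for (int j = 0; j < 3; ++j)
    {
        // Q is the corner furthest along the normal on this axis.
        const float q = (plane.n[j] >= 0.0f) ? box.maxPt[j] : box.minPt[j];
        k += plane.n[j] * q;
    }
    // If even Q is behind the plane, so is every other corner.
    return k < 0.0f;
}

// True if some plane has the whole box behind it, so the box can be culled.
bool BoxOutsideFrustum(const std::vector<Plane>& planes, const Box& box)
{
    for (const Plane& plane : planes)
    {
        if (BoxBehindPlane(plane, box))
            return true;
    }
    return false;
}
```

Only one dot product per plane is needed, which is why this form is popular for culling many boxes per frame.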

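3 Frustum Culling

In this demo we render a 5×5×5 grid of skull meshes and compute the AABB of the skull mesh in local space. In the UpdateInstanceData method we perform frustum culling on every instance. If an instance passes the test, we add it to the next free slot of the structured buffer and increment visibleInstanceCount, so the front of the structured buffer holds the data of all visible instances. Because the AABB is in the local space of the mesh, we must transform the view frustum into the local space of each instance before testing; alternatively, we could transform both the AABB and the frustum into some other common space, such as world space. The update code is: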

XMMATRIX view = mCamera.GetView();
XMVECTOR viewDet = XMMatrixDeterminant(view);
XMMATRIX invView = XMMatrixInverse(&viewDet, view);
auto currInstanceBuffer = mCurrFrameResource->InstanceBuffer.get();

for(auto& e : mAllRitems)
{
	const auto& instanceData = e->Instances;
	int visibleInstanceCount = 0;
	
	for(UINT i = 0; i < (UINT)instanceData.size(); ++i)
	{
		XMMATRIX world = XMLoadFloat4x4(&instanceData[i].World);
		XMMATRIX texTransform = XMLoadFloat4x4(&instanceData[i].TexTransform);
		XMVECTOR worldDet = XMMatrixDeterminant(world);
		XMMATRIX invWorld = XMMatrixInverse(&worldDet, world);
		
		// View space to the object’s local space.
		XMMATRIX viewToLocal = XMMatrixMultiply(invView, invWorld);
		
		// Transform the camera frustum from view space to the object’s local space.
		BoundingFrustum localSpaceFrustum;
		mCamFrustum.Transform(localSpaceFrustum, viewToLocal);
		
		// Perform the box/frustum intersection test in local space.
		if(localSpaceFrustum.Contains(e->Bounds) != DirectX::DISJOINT)
		{
			InstanceData data;
			XMStoreFloat4x4(&data.World, XMMatrixTranspose(world));
			XMStoreFloat4x4(&data.TexTransform, XMMatrixTranspose(texTransform));
			data.MaterialIndex = instanceData[i].MaterialIndex;
			
			// Write the instance data to structured buffer for the visible objects.
			currInstanceBuffer->CopyData(visibleInstanceCount++, data);
		}
	}
	
	e->InstanceCount = visibleInstanceCount;
	// For informational purposes, output the number of instances
	// visible over the total number of instances.
	std::wostringstream outs;
	outs.precision(6);
	outs << L"Instancing and Culling Demo" <<
		L" " << e->InstanceCount <<
		L" objects visible out of " << e->Instances.size();
		
	mMainWndCaption = outs.str();
}

Even though the instance buffer has room for every instance, we draw only the visible ones, which occupy slots 0 through visibleInstanceCount-1:

cmdList->DrawIndexedInstanced(ri->IndexCount,
	ri->InstanceCount,
	ri->StartIndexLocation,
	ri->BaseVertexLocation, 0);
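The important pattern in this section is the compaction: visible instances are written to consecutive slots at the front of the buffer, and the returned count becomes the InstanceCount of the draw call. Stripped of the DirectX types, it can be sketched as follows. Instance, CompactVisible and the visibility predicate are hypothetical stand-ins; in the demo the predicate is the frustum/AABB test and the destination is the upload buffer.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical, reduced instance record.
struct Instance { float x; unsigned materialIndex; };

// Copies every visible instance to the front of gpuBuffer and returns how
// many were written. Precondition: gpuBuffer.size() >= all.size().
std::size_t CompactVisible(const std::vector<Instance>& all,
                           const std::function<bool(const Instance&)>& isVisible,
                           std::vector<Instance>& gpuBuffer)
{
    std::size_t visibleCount = 0;
    for (const Instance& inst : all)
    {
        if (isVisible(inst))
            gpuBuffer[visibleCount++] = inst;   // next free slot at the front
    }
    return visibleCount;
}
```

Slots at or beyond the returned count may still hold stale data from an earlier frame, which is harmless because the draw call never reads past the visible count.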


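4 Summary

  1. Instancing refers to drawing the same object multiple times in a scene, each time with a different position, material, texture, and so on. We bind an SRV to a structured buffer holding the instance data and index into it in the shader with SV_InstanceID. The instances are drawn with a single draw call by setting the second parameter, InstanceCount, of ID3D12GraphicsCommandList::DrawIndexedInstanced;
  2. Bounding volumes are geometric primitives that approximate an object. They trade accuracy for simpler and more efficient computation (collision detection, frustum culling, and so on). The DirectXCollision.h library provides structures for AABBs and OBBs, as well as for bounding spheres and frustums;
  3. The GPU automatically discards triangles outside the view frustum in the clipping stage. However, those triangles still travel through the pipeline first: the vertex shader, possibly the tessellation stages, and possibly the geometry shader. To avoid this wasted work, we implement frustum culling ourselves. The idea is to approximate each object by a bounding volume, test that volume against the frustum, and submit only the visible objects to the rendering pipeline.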


5 Exercises
