Wednesday 12 August 2015

DirectX12, First impressions

So here we go, Windows 10 is out, and so is "officially" DirectX12 (first official samples are finally now available).

Since I was not part of the early access program, I could not see many samples, but setting up a pipeline for tests was reasonably easy, even though you have no idea about "best practices".
I helped a bit with fixing some of the remaining issues in SharpDX (and also integrated DirectX11.3 support in there along the way, welcome volume tiled resources and conservative rasterization).

So way before the official samples I managed to get most of the features I needed running, but the official samples helped to finally nail down the "last pieces of the puzzle" (mostly swap chain handling and descriptor heap best practices).

First thing we obviously do is to build a small utility library (to remove the boilerplate and be able to prototype fast), and then play :)

So new API means changes, new approaches, new possibilities, everything that I love.

As I'm generally not dealing with the "general case" (game engines with tons of assets) but rather with lots of procedural geometry, tons of different effect permutations and many different scenes, I will of course speak about what it changes for those scenarios.

So.... let's go

1/Resources

Finally (and really I mean it), a resource is just a resource, i.e. some place in memory; there are no more restrictions on binding.

Before in DirectX11, you could, for example, not create a resource usable both as Vertex/Index buffer and StructuredBuffer.

For a vertex buffer it's okay (you can just bind it as a StructuredBuffer and fetch in the Vertex Shader using the vertex ID), but an index buffer can only be bound within the pipeline, so you either had to copy it to a byte address buffer, or mimic the Append/Counter features with a ByteAddressBuffer.

Now I can create a resource, create an index buffer view for it, and an append or counter structured view as well. No problem, the way it should be.

(Note: from DirectX11.2 it was possible to do this in some ways using Tiled resources: you create 2 "empty resources" (one for index, one for structured) and assign the same tile mapping to each of them. It works, but it's a bit hacky.)

This is a huge step forward in my opinion, since now I can construct geometry once and seamlessly use it in pipeline or compute.

2/Heaps

As well as resources being just memory locations, we now have several ways to create them.

  • Committed : this is roughly equivalent to the DirectX11 style, you create a resource and the runtime allocates backing memory for it
  • Placed : now we can create a Heap (i.e. some memory) and place resources in there, so several resources can share the same heap (and even overlap, so for small temporary resources this is rather useful)
  • Reserved : reserved resources are the Tiled resources equivalent

Heaps can also be of 3 types:
  • Default : same as previously
  • Upload : Cpu accessible for writing
  • Readback : Cpu accessible for reading
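
To make the "Placed" case a bit more concrete, here is a minimal CPU-side sketch of how offsets inside one heap could be handed out. It does not call Direct3D at all; `HeapLayout`, `Place` and `Reset` are hypothetical names made up for illustration. The 64 KB value is an assumption that matches the default placement alignment for buffers and most textures (MSAA textures need more).

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 64 KB, the default placement alignment for buffers and most
// textures. MSAA textures require a larger alignment.
constexpr std::uint64_t kPlacementAlignment = 64 * 1024;

// Returned by Place() when the heap has no room left.
constexpr std::uint64_t kNoSpace = UINT64_MAX;

// Rounds value up to the next multiple of alignment (must be a power of two).
inline std::uint64_t AlignUp(std::uint64_t value, std::uint64_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}

// Hypothetical helper: hands out offsets inside a single heap, front to back.
class HeapLayout
{
public:
    explicit HeapLayout(std::uint64_t heapSize) : size_(heapSize) {}

    // Returns the aligned offset at which a resource of resourceSize bytes
    // can be placed, or kNoSpace if it does not fit.
    std::uint64_t Place(std::uint64_t resourceSize)
    {
        const std::uint64_t offset = AlignUp(cursor_, kPlacementAlignment);
        if (offset > size_ || resourceSize > size_ - offset)
            return kNoSpace;
        cursor_ = offset + resourceSize;
        return offset;
    }

    // Rewinds the cursor, so later placements overlap (alias) earlier ones.
    // Useful for short-lived temporary resources.
    void Reset() { cursor_ = 0; }

private:
    std::uint64_t size_;
    std::uint64_t cursor_ = 0;
};
```

The offset returned by `Place` is the kind of value you would pass as the heap offset when creating a placed resource. Calling `Reset` between passes is one simple way to get the overlapping behaviour mentioned above, at the price of having to make sure yourself that the aliased resources are never in use at the same time.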

As a side note, the way to upload resources is now totally up to you. The recommended way is to allocate an upload resource, write data in there, and use a copy command to copy the data to a default resource, since it's mentioned that gpu access to Upload "resources" is slower (which I actually confirmed with some benchmarks).

So you have pretty much full control over how you organize/load your memory (especially as you can have dedicated copy Command Queues, but that is likely for another post).

3/Queries

Queries in DirectX11 were an utter pain (and often some form of disaster waiting to happen).

You had to wait for some point later in the frame, loop until the data was available, then get the data. In brief: stall party.

Now query handling is much simpler: you just create a query heap (per query type), which can contain backing memory for several queries, then use begin/end (except for timestamp queries, which only use end since they just read the gpu timer).

Then when you need to access the data, you resolve the query into a resource (which can be of any type), so you can either wait for the end of the frame and read back, or even use that data right away (stream output to indirect draw, I was waiting for this for so long... bye bye DrawAuto, will not miss you anytime soon ;)
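
As a small illustration of the readback side for timestamps: once the queries are resolved into a buffer, what you get is a list of 64-bit tick values, and turning them into milliseconds only needs the timer frequency. The sketch below is plain C++ with no Direct3D calls; `ElapsedMilliseconds` and `PassTimesMs` are hypothetical helper names, and it assumes `ticksPerSecond` is the value reported by the command queue (`GetTimestampFrequency`) and that timestamps were written in begin/end pairs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Converts two resolved timestamp values into milliseconds.
// Assumption: ticksPerSecond comes from the command queue's timestamp frequency.
inline double ElapsedMilliseconds(std::uint64_t beginTicks,
                                  std::uint64_t endTicks,
                                  std::uint64_t ticksPerSecond)
{
    assert(ticksPerSecond != 0);
    assert(endTicks >= beginTicks);
    return static_cast<double>(endTicks - beginTicks) * 1000.0
         / static_cast<double>(ticksPerSecond);
}

// Assumption: the resolved buffer holds consecutive (begin, end) pairs,
// one pair per measured pass. A trailing unpaired value is ignored.
inline std::vector<double> PassTimesMs(const std::vector<std::uint64_t>& resolved,
                                       std::uint64_t ticksPerSecond)
{
    std::vector<double> result;
    for (std::size_t i = 0; i + 1 < resolved.size(); i += 2)
        result.push_back(ElapsedMilliseconds(resolved[i], resolved[i + 1], ticksPerSecond));
    return result;
}
```

Since the resolve writes into a regular resource, the same values could just as well stay on the gpu; the cpu conversion above is only the simplest thing to do with them.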

4/Uav Counters

In the same fashion, uav counters have simply.... (drum rolls) ... disappeared.

Now uav counters are simply backed by a resource, which of course means that you have full read/write control over them from both cpu and gpu (you can even share a counter between different resources/uavs, I'm pretty sure I'll find a weird use case for it at some point).

Previously in DirectX11, you could only set the initial count from the Cpu before a dispatch, which was sometimes quite limiting (quite often I just ended up using a small buffer and interlocked functions to mimic append/counter). Now it can all be used seamlessly, and a Stream Output query result can even be set as the initial count for an append buffer, which is something I already see myself abusing.
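
To picture what "the counter is just a resource" buys you, here is a small CPU-side model, not Direct3D code. `AppendView` is a hypothetical stand-in for an append uav: it holds a reference to some element storage and a reference to a counter that lives elsewhere, so two views can share one counter and the counter can be seeded with any value you like (for example one coming from a query result).

```cpp
#include <atomic>
#include <cstdint>
#include <vector>

// CPU-side stand-in for an append view (hypothetical, for illustration only):
// element storage plus a counter that is owned by something else.
struct AppendView
{
    std::vector<std::uint32_t>& storage;
    std::atomic<std::uint32_t>& counter;

    // Atomically reserves the next slot, then writes the value into it.
    // Returns false when the slot falls outside the storage; in this sketch
    // the counter stays incremented in that case.
    bool Append(std::uint32_t value)
    {
        const std::uint32_t slot = counter.fetch_add(1);
        if (slot >= storage.size())
            return false;
        storage[slot] = value;
        return true;
    }
};
```

With two views built on the same counter, an append through either of them moves the shared count forward, which is the kind of setup that was simply not expressible with the hidden DirectX11 counters.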


5/Pipeline State Objects

PSOs are an obvious huge difference in how you set your pipeline.

Before, in DirectX11, you had to set several states individually (blend/rasterizer, shaders...), which of course provided a reasonable level of flexibility (really easy to switch from solid to wireframe for example), but at the expense of a performance hit (as well as having to implement state tracking).

Now the whole pipeline (shaders/states/input layout, all except resource binding and render targets) is created up front and sent in a single call.

This offers the obvious advantage of a very big performance gain, since the card now knows up front what to do and doesn't need to "reconstruct" the pipeline every draw. But pipeline states are expensive to create, so don't expect to tweak small states and get immediate feedback (PSO creation costs some tens of milliseconds; please note they can of course be created async).

PSOs can also be cached (and hence serialized to disk), which can improve loading times drastically for applications in runtime mode.
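
Since creation is the expensive part, one simple application-side trick is to never create the same state twice. The sketch below is a plain in-memory lookup keyed by a description, which is a different thing from the driver-level cached blob that can be serialized to disk. `PipelineDesc`, `PipelineDescHash` and `PipelineCache` are hypothetical names, and the description is trimmed down to three fields for illustration.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical, heavily trimmed state description. A real one would carry
// shader bytecode, blend/rasterizer/depth states, input layout, and so on.
struct PipelineDesc
{
    std::string vertexShader;
    std::string pixelShader;
    bool wireframe;

    bool operator==(const PipelineDesc& other) const
    {
        return vertexShader == other.vertexShader
            && pixelShader == other.pixelShader
            && wireframe == other.wireframe;
    }
};

// Combines the hashes of all fields into one value.
struct PipelineDescHash
{
    std::size_t operator()(const PipelineDesc& d) const
    {
        std::size_t h = std::hash<std::string>()(d.vertexShader);
        h ^= std::hash<std::string>()(d.pixelShader) + 0x9e3779b9 + (h << 6) + (h >> 2);
        h ^= std::hash<bool>()(d.wireframe) + 0x9e3779b9 + (h << 6) + (h >> 2);
        return h;
    }
};

// Creates each distinct pipeline once and hands back the same object afterwards.
// Pipeline is whatever type wraps the real state object in your own library.
template <typename Pipeline>
class PipelineCache
{
public:
    using Factory = std::function<std::shared_ptr<Pipeline>(const PipelineDesc&)>;

    explicit PipelineCache(Factory create) : create_(std::move(create)) {}

    std::shared_ptr<Pipeline> Get(const PipelineDesc& desc)
    {
        auto it = cache_.find(desc);
        if (it != cache_.end())
            return it->second;          // already built, no creation cost
        std::shared_ptr<Pipeline> pipeline = create_(desc);
        cache_.emplace(desc, pipeline);
        return pipeline;
    }

    std::size_t Size() const { return cache_.size(); }

private:
    Factory create_;
    std::unordered_map<PipelineDesc, std::shared_ptr<Pipeline>, PipelineDescHash> cache_;
};
```

Switching between solid and wireframe then becomes two entries in the cache, each paid for once, instead of a state tweak per draw. This version is not thread safe, so async creation would need a lock or a per-thread cache around it.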



So that's it for the first overview part. There are of course many more new features and changes, but let's keep those for another post.