Merge pull request #742 from w23/stream-E379-staging
Implements new, fully automatic barrier placement and refactors staging.

- [x] image staging
  - [x] some images are corrupted
  - ~~[ ] #745~~ -- postponed until the next time we need to touch images; the current code works well enough for now.
  - [x] use combuf auto barriers everywhere where it makes sense
- [x] corrupted geometry in playdemo ...
- [x] buffer staging
  - [x] #743 
  - [x] track copied staging regions: i.e. staging must know that it has been drained fully
- [x] RT-trad dynamic toggle
  - [x] push-pull staging boundary
- [x] frame dependency tracking: automatically free/flip buffers when frame using them is done
- [x] replace ALL barriers with combuf ones
  - [x] buffers in rtx/resources
  - [x] images
    - [x] track images sync state inline where possible
  - [x] find other uses
- [x] improve staging
  - [x] track staging users explicitly
    - [x] per-user stats: sizes, allocations, etc
    - [x] push remaining data for stale users
  - [x] use ring buffer directly, track frame boundaries externally in fctl
- [x] crash in `buildBlases()`:
  1. load map with rt disabled
  2. change to another map
  3. enable rt
  4. 💥
- [x] suboptimal barrier, see the review comment in #742
- [x] simplify creating and building TLAS
- [x] Run rendering tests
  - [x] missing emissive toxic waters
    - Leave as a known problem: it's due to inadvertently skipping some water surfaces when looking for emissive ones, see:
      - #56
      - #752
  - [x] slightly different indirect blur
    - Assuming that this is due to À-Trous filtering, which could've sneaked through before the gold images were set. Not going to investigate, as we're about to submit a big change to the denoiser.
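
The staging items in the checklist above (use the ring buffer directly, track frame boundaries externally, free buffers once the frame using them is done) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: `ring_t`, `staging_t`, `ringAlloc()`, `stagingFrameSubmit()` and `stagingFrameDone()` are hypothetical names, and the real `alo_ring_t` and framectl logic may well differ.

```c
#include <stdint.h>

#define RING_FULL UINT32_MAX
#define FRAMES_IN_FLIGHT 2

// Hypothetical ring allocator: head and tail are ever-growing byte counters,
// the actual buffer offset is (counter % size).
typedef struct {
	uint32_t size;
	uint64_t head; // bytes handed out so far, including skipped padding
	uint64_t tail; // bytes released so far
} ring_t;

// Hypothetical staging state: the ring knows nothing about frames,
// frame boundaries are recorded next to it.
typedef struct {
	ring_t ring;
	uint64_t frame_end[FRAMES_IN_FLIGHT];
} staging_t;

// Returns the buffer offset of a contiguous region, or RING_FULL.
static uint32_t ringAlloc(ring_t *r, uint32_t bytes) {
	uint64_t start = r->head;
	const uint32_t offset = (uint32_t)(start % r->size);

	// A region must not straddle the end of the buffer: skip to offset 0 instead
	if ((uint64_t)offset + bytes > r->size)
		start += r->size - offset;

	// Everything between tail and the new head must fit into the buffer
	if (start + bytes - r->tail > r->size)
		return RING_FULL;

	r->head = start + bytes;
	return (uint32_t)(start % r->size);
}

// Remember where the ring head was when this frame slot was submitted
static void stagingFrameSubmit(staging_t *s, int frame) {
	s->frame_end[frame] = s->ring.head;
}

// Call once the fence for this frame slot has signaled:
// everything allocated up to that frame's boundary can be reused
static void stagingFrameDone(staging_t *s, int frame) {
	s->ring.tail = s->frame_end[frame];
}
```

In this model an allocation that fails with `RING_FULL` succeeds again after `stagingFrameDone()` is called for the oldest in-flight frame, which is the "staging must know that it has been drained" idea in miniature. It assumes frames complete in submission order.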
w23 authored Dec 19, 2024
2 parents d3171d9 + 9ad0888 commit 300658c
Showing 55 changed files with 2,217 additions and 1,671 deletions.
2 changes: 1 addition & 1 deletion .editorconfig
@@ -2,7 +2,7 @@
root = true

[*]
charset = latin1
charset = utf-8
end_of_line = lf
indent_style = tab
insert_final_newline = true
8 changes: 4 additions & 4 deletions .github/workflows/c-cpp.yml
@@ -1,5 +1,5 @@
name: Build & Deploy Engine
on:
on:
push:
paths-ignore:
- '**.md'
@@ -21,10 +21,10 @@ jobs:
fail-fast: false
matrix:
include:
- os: ubuntu-20.04
- os: ubuntu-22.04
targetos: linux
targetarch: amd64
- os: ubuntu-20.04
- os: ubuntu-22.04
targetos: linux
targetarch: i386
# TODO enable and test ref_vk for it too
@@ -57,7 +57,7 @@ jobs:
targetarch: i386
env:
SDL_VERSION: 2.26.2
VULKAN_SDK_VERSION: 1.3.239
VULKAN_SDK_VERSION: 1.3.296
GH_CPU_ARCH: ${{ matrix.targetarch }}
ANDROID_SDK_TOOLS_VER: 4333796
steps:
1 change: 1 addition & 0 deletions .gitignore
@@ -6,6 +6,7 @@

# Other
*.save
prefix/

# Qt Creator for some reason creates *.user.$version files, so exclude it too
*.user*
6 changes: 5 additions & 1 deletion engine/platform/sdl/vid_sdl.c
@@ -861,7 +861,11 @@ qboolean VID_CreateWindow( int width, int height, window_mode_t window_mode )

if( !GL_UpdateContext( ))
return false;

}
else if( glw_state.context_type == REF_VULKAN )
{
// FIXME this is probably not correct place or way to do it, just copypasting GL stuff
VID_StartupGamma();
}

#else // SDL_VERSION_ATLEAST( 2, 0, 0 )
23 changes: 23 additions & 0 deletions ref/vk/NOTES.md
@@ -1177,3 +1177,26 @@ Cons: ridiculous texture explosion
- `performTracing()` write resource desc values passed from outside on each call
- new resources are added in `reloadMainpipe()`
- resource with zero refcount are destroyed in `cleanupResources()`


# 2024-11-26
`./waf clangdb` produces `compile_commands.json` file inside of the build directory. All the paths in the file are relative to that directory.
If the build directory is nested one level deeper, like `build/amd64-debug`, and the file is then symlinked into the root (nvim/lsp/clangd only looks for it in the root and in the `./build` dir), the relative paths no longer resolve and nvim/lsp/clangd gets confused.
Solution: make build dir literally just `./build`.


# 2024-11-27 E381
## Removing staging flush

### vk_scene.c/reloadPatches()
- Can ignore for now

### Staging full
- (I) Just allocate another buffer for staging
- (II) Figure out why the hell we need so much staging memory
- PBR/remastered textures
- possible solution: lazy/ondemand loading

### vk_brush.c / collect emissive surfaces
- (I) try to merge emissive collection with surface loading
- (II) convert from pushing material data to pulling. Not really clear how to do easily.
52 changes: 52 additions & 0 deletions ref/vk/TODO.md
@@ -1,7 +1,59 @@
## Next

## Upcoming
- [ ] framectl frame tracking, e.g.:
- [ ] wait for frame fence only really before actually starting to build combuf in R_BeginFrame()
- why: there should be nothing to synchronize with
- why: more straightforward dependency tracking
- why not: waiting on frame fence allows freeing up staging and other temp memory
- [ ] Remove second semaphore from submit, replace it with explicit barriers for e.g. geom buffer
- [x] why: best practice validation complains about too wide ALL_COMMANDS semaphore
- why: explicit barriers are more clear, better perf possible too
- [ ] Do not lose barrier-tracking state between frames
- [ ] Render graph
- [ ] performance profiling and comparison

## 2024-12-17 E385
- [x] fix rendering on amdgpu+radv
### After stream
- [x] cleanup TLAS creation and building code

## 2024-12-12 E384
- [x] track image sync state with the image object itself (and not with vk_resource)

### After stream
- [x] Proper staging-vs-frame tracking, replace tag with something sensitive
- currently assert fails because there's 1 frame latency, not one.
- [x] comment for future: full staging might want to wait for previous frame to finish
- [x] zero vkCmdPipelineBarriers calls
- [x] grep for anything else

## 2024-12-10 E383
- [x] Add transfer stage to submit semaphore separating command buffer: fixes sync for rt
- [x] Issue staging commit for a bunch of RT buffers (likely not all of them)
- [x] move destination buffer tracking to outside of staging:
- [x] vk_geometry
- [x] vk_light: grid, metadata
- [x] vk_ray_accel: TLAS geometries
- [x] vk_ray_model: kusochki
- [x] staging should not be aware of cmdbuf either
- [x] `R_VkStagingCommit()`: -- removed
- [x] `R_VkStagingGetCommandBuffer()` -- removed
- [x] Go through all staged buffers and make sure that they are committed
- [x] Commit staging in right places for right buffers
- [x] Add more staging debug tracking/logs

### After stream
- [x] Fix glitch geometry
- [x] Which specific models produce it? Use nsight btw

## 2024-05-24 E379
- [ ] refactor staging:
- [ ] move destination image tracking to outside of staging
- [x] vk_image ← vk_texture (E380)
- [x] implement generic staging regions (E380)
- [ ] implement stricter staging regions tracking

## 2024-05-07 E376
- [ ] resource manager
- [x] extract all resource mgmt from vk_rtx into a designated file
2 changes: 1 addition & 1 deletion ref/vk/alolcator.c
@@ -287,7 +287,7 @@ uint32_t aloRingAlloc(alo_ring_t* ring, uint32_t size, uint32_t alignment) {

// 1. Check if we have enough space immediately in front of head
if (pos + size <= ring->size) {
ring->head = (pos + size) % ring->size;
ring->head = pos + size;
return pos;
}
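
The hunk does not say why the modulo was dropped. One possible reading, which is an assumption and not confirmed by the commit, is that wrapping `head` to 0 makes a ring that was filled exactly to its end look the same as an empty one whenever `tail` is also 0. The sketch below only illustrates that aliasing. `headWrapped()`, `headPlain()` and `looksEmpty()` are hypothetical helpers, and the real `alo_ring_t` may tell empty from full in a different way.

```c
#include <stdint.h>

// Old rule from the hunk above: head wraps to 0 when the region ends exactly at ring_size
static uint32_t headWrapped(uint32_t pos, uint32_t size, uint32_t ring_size) {
	return (pos + size) % ring_size;
}

// New rule from the hunk above: head is allowed to be equal to ring_size
static uint32_t headPlain(uint32_t pos, uint32_t size) {
	return pos + size;
}

// Hypothetical emptiness check (assumption): head == tail means nothing is allocated
static int looksEmpty(uint32_t head, uint32_t tail) {
	return head == tail;
}
```

With a 64-byte ring, `tail == 0` and a single 64-byte allocation at position 0, the wrapped rule yields `head == 0`, so `looksEmpty()` reports an empty ring although it is full; the plain rule yields `head == 64` and the two states stay distinguishable.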

52 changes: 52 additions & 0 deletions ref/vk/arrays.c
@@ -0,0 +1,52 @@
#include "arrays.h"

#include "vk_core.h" // Mem_Malloc

#include <stddef.h> // NULL


void arrayDynamicInit(array_dynamic_t *array, int item_size) {
array->items = NULL;
array->count = 0;
array->capacity = 0;
array->item_size = item_size;
}

void arrayDynamicDestroy(array_dynamic_t *array) {
if (array->items)
Mem_Free(array->items);
}

static void arrayDynamicEnsureCapacity(array_dynamic_t *array, int min_capacity) {
if (array->capacity >= min_capacity)
return;

if (array->capacity == 0)
array->capacity = 2;

while (array->capacity < min_capacity)
array->capacity = array->capacity * 3 / 2;

void *new_buffer = Mem_Malloc(vk_core.pool, array->capacity * array->item_size);
if (array->items) {
memcpy(new_buffer, array->items, array->count * array->item_size);
Mem_Free(array->items);
}
array->items = new_buffer;
}

void arrayDynamicResize(array_dynamic_t *array, int count) {
arrayDynamicEnsureCapacity(array, count);
array->count = count;
}

void arrayDynamicAppend(array_dynamic_t *array, void *item) {
const int new_count = array->count + 1;
arrayDynamicEnsureCapacity(array, new_count);

if (item)
memcpy((char*)array->items + array->count * array->item_size, item, array->item_size);

array->count = new_count;
}
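
The growth policy in `arrayDynamicEnsureCapacity()` above starts at 2 items and then grows by a factor of 1.5 until the request fits. A standalone sketch of just that rule follows. `growCapacity()` is a hypothetical helper written for illustration; it is not part of `arrays.c`.

```c
// Same growth rule as arrayDynamicEnsureCapacity() above: start at 2, then grow by 1.5x.
// Deviation from the original: the guard is `< 2` rather than `== 0`,
// because with integer math 1 * 3 / 2 == 1 and the loop would never advance.
static int growCapacity(int capacity, int min_capacity) {
	if (capacity >= min_capacity)
		return capacity;

	if (capacity < 2)
		capacity = 2;

	while (capacity < min_capacity)
		capacity = capacity * 3 / 2;

	return capacity;
}
```

Starting from an empty array, a request for 5 items steps through 2, 3, 4 and lands on 6. Growing geometrically like this keeps the number of reallocations logarithmic in the final item count.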

87 changes: 87 additions & 0 deletions ref/vk/arrays.h
@@ -0,0 +1,87 @@
#pragma once

#include <stddef.h> // size_t

#define VIEW_DECLARE_CONST(TYPE, NAME) \
struct { \
const TYPE *items; \
int count; \
} NAME

// Array with compile-time maximum size
#define BOUNDED_ARRAY_DECLARE(TYPE, NAME, MAX_SIZE) \
struct { \
TYPE items[MAX_SIZE]; \
int count; \
} NAME

#define BOUNDED_ARRAY(TYPE, NAME, MAX_SIZE) \
BOUNDED_ARRAY_DECLARE(TYPE, NAME, MAX_SIZE) = {0}

#define BOUNDED_ARRAY_HAS_SPACE(array_, space_) \
((COUNTOF((array_).items) - (array_).count) >= space_)

#define BOUNDED_ARRAY_APPEND_UNSAFE(array_) \
((array_).items[(array_).count++])

#define BOUNDED_ARRAY_APPEND_ITEM(var, item) \
do { \
ASSERT(BOUNDED_ARRAY_HAS_SPACE(var, 1)); \
var.items[var.count++] = item; \
} while(0)


// Dynamically-sized array
// I. Type-agnostic

typedef struct array_dynamic_s {
void *items;
size_t count, capacity;
size_t item_size;
} array_dynamic_t;

void arrayDynamicInit(array_dynamic_t *array, int item_size);
void arrayDynamicDestroy(array_dynamic_t *array);

void arrayDynamicReserve(array_dynamic_t *array, int capacity);
void arrayDynamicAppend(array_dynamic_t *array, void *item);
#define arrayDynamicAppendItem(array, item) \
do { \
ASSERT((array)->item_size == sizeof(&(item))); \
arrayDynamicAppend(array, item); \
} while (0)
/* void *arrayDynamicGet(array_dynamic_t *array, int index); */
/* #define arrayDynamicAt(array, type, index) \ */
/* (ASSERT((array)->item_size == sizeof(type)), \ */
/* ASSERT((array)->count > (index)), \ */
/* arrayDynamicGet(array, index)) */
void arrayDynamicResize(array_dynamic_t *array, int count);
//void arrayDynamicErase(array_dynamic_t *array, int begin, int end);

//void arrayDynamicInsert(array_dynamic_t *array, int before, int count, void *items);

// II. Type-specific
#define ARRAY_DYNAMIC_DECLARE(TYPE, NAME) \
struct { \
TYPE *items; \
size_t count, capacity; \
size_t item_size; \
} NAME

#define arrayDynamicInitT(array) \
arrayDynamicInit((array_dynamic_t*)array, sizeof((array)->items[0]))

#define arrayDynamicDestroyT(array) \
arrayDynamicDestroy((array_dynamic_t*)array)

#define arrayDynamicResizeT(array, size) \
arrayDynamicResize((array_dynamic_t*)(array), (size))

#define arrayDynamicAppendT(array, item) \
arrayDynamicAppend((array_dynamic_t*)(array), (item))

#define arrayDynamicInsertT(array, before, count, items) \
arrayDynamicInsert((array_dynamic_t*)(array), before, count, items)

#define arrayDynamicAppendManyT(array, items_count, items) \
arrayDynamicInsert((array_dynamic_t*)(array), (array)->count, items_count, items)
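
A short usage sketch for the bounded-array macros declared in this header. The four `BOUNDED_ARRAY*` macros are copied from the diff above so that the block compiles on its own. `ASSERT` and `COUNTOF` are stand-in definitions (assumptions; the engine provides its own), and `collectEvens()` is a hypothetical example function.

```c
#include <assert.h>

// Stand-ins for engine macros (assumptions)
#define ASSERT(x) assert(x)
#define COUNTOF(a) (sizeof(a) / sizeof((a)[0]))

// Copied from arrays.h above
#define BOUNDED_ARRAY_DECLARE(TYPE, NAME, MAX_SIZE) \
	struct { \
		TYPE items[MAX_SIZE]; \
		int count; \
	} NAME

#define BOUNDED_ARRAY(TYPE, NAME, MAX_SIZE) \
	BOUNDED_ARRAY_DECLARE(TYPE, NAME, MAX_SIZE) = {0}

#define BOUNDED_ARRAY_HAS_SPACE(array_, space_) \
	((COUNTOF((array_).items) - (array_).count) >= space_)

#define BOUNDED_ARRAY_APPEND_ITEM(var, item) \
	do { \
		ASSERT(BOUNDED_ARRAY_HAS_SPACE(var, 1)); \
		var.items[var.count++] = item; \
	} while(0)

// Collects at most 4 even values; returns how many were kept and writes their sum
static int collectEvens(const int *values, int count, int *out_sum) {
	BOUNDED_ARRAY(int, evens, 4);

	for (int i = 0; i < count; ++i) {
		if (values[i] % 2 != 0)
			continue;

		// Check explicitly instead of relying on the ASSERT inside APPEND_ITEM
		if (!BOUNDED_ARRAY_HAS_SPACE(evens, 1))
			break;

		BOUNDED_ARRAY_APPEND_ITEM(evens, values[i]);
	}

	int sum = 0;
	for (int i = 0; i < evens.count; ++i)
		sum += evens.items[i];

	*out_sum = sum;
	return evens.count;
}
```

The capacity is part of the type, so `BOUNDED_ARRAY_HAS_SPACE` needs no stored capacity field; the trade-off is that the maximum size has to be known at compile time.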
2 changes: 1 addition & 1 deletion ref/vk/r_block.c
@@ -14,7 +14,7 @@ typedef struct r_blocks_block_s {
// <--- pool --><-- ring --->
// offset ? --->

int allocMetablock(r_blocks_t *blocks) {
static int allocMetablock(r_blocks_t *blocks) {
return aloIntPoolAlloc(&blocks->blocks.freelist);
// TODO grow if needed
}
1 change: 1 addition & 0 deletions ref/vk/r_speeds.c
@@ -946,6 +946,7 @@ void R_SpeedsDisplayMore(uint32_t prev_frame_index, const struct vk_combuf_scope
speedsPrintf( "Driver: %u.%u.%u, Vulkan: %u.%u.%u\n",
XVK_PARSE_VERSION(vk_core.physical_device.properties.driverVersion),
XVK_PARSE_VERSION(vk_core.physical_device.properties.apiVersion));
speedsPrintf( "Resolution: %ux%u\n", vk_frame.width, vk_frame.height);
}

const uint32_t events = g_aprof.events_last_frame - prev_frame_index;
11 changes: 6 additions & 5 deletions ref/vk/r_textures.c
@@ -177,14 +177,15 @@ static void createDefaultTextures( void )

// emo-texture from quake1
pic = Common_FakeImage( 16, 16, 1, IMAGE_HAS_COLOR );
uint *const buffer = PTR_CAST(uint, pic->buffer);

for( y = 0; y < 16; y++ )
{
for( x = 0; x < 16; x++ )
{
if(( y < 8 ) ^ ( x < 8 ))
((uint *)pic->buffer)[y*16+x] = 0xFFFF00FF;
else ((uint *)pic->buffer)[y*16+x] = 0xFF000000;
buffer[y*16+x] = 0xFFFF00FF;
else buffer[y*16+x] = 0xFF000000;
}
}

Expand All @@ -211,19 +212,19 @@ static void createDefaultTextures( void )
// white texture
pic = Common_FakeImage( 4, 4, 1, IMAGE_HAS_COLOR );
for( x = 0; x < 16; x++ )
((uint *)pic->buffer)[x] = 0xFFFFFFFF;
buffer[x] = 0xFFFFFFFF;
tglob.whiteTexture = R_TextureUploadFromBufferNew( REF_WHITE_TEXTURE, pic, TF_COLORMAP );

// gray texture
pic = Common_FakeImage( 4, 4, 1, IMAGE_HAS_COLOR );
for( x = 0; x < 16; x++ )
((uint *)pic->buffer)[x] = 0xFF7F7F7F;
buffer[x] = 0xFF7F7F7F;
tglob.grayTexture = R_TextureUploadFromBufferNew( REF_GRAY_TEXTURE, pic, TF_COLORMAP );

// black texture
pic = Common_FakeImage( 4, 4, 1, IMAGE_HAS_COLOR );
for( x = 0; x < 16; x++ )
((uint *)pic->buffer)[x] = 0xFF000000;
buffer[x] = 0xFF000000;
tglob.blackTexture = R_TextureUploadFromBufferNew( REF_BLACK_TEXTURE, pic, TF_COLORMAP );

// cinematic dummy
17 changes: 12 additions & 5 deletions ref/vk/ray_pass.c
@@ -270,14 +270,24 @@ void RayPassPerform(struct ray_pass_s *pass, vk_combuf_t *combuf, ray_pass_perfo

const qboolean write = i >= pass->desc.write_from;
R_VkResourceAddToBarrier(res, write, pass->pipeline_type, &barrier);
}

DEBUG_BEGIN(combuf->cmdbuf, pass->debug_name);
R_VkBarrierCommit(combuf, &barrier, pass->pipeline_type);

for (int i = 0; i < num_bindings; ++i) {
const int index = args.resources_map ? args.resources_map[i] : i;
vk_resource_t* const res = args.resources[index];

const vk_descriptor_value_t *const src_value = &res->value;
vk_descriptor_value_t *const dst_value = pass->desc.riptors.values + i;

// layout is only known after barrier
// FIXME this is not true, it can be known earlier
if (res->type == VK_DESCRIPTOR_TYPE_STORAGE_IMAGE) {
dst_value->image = (VkDescriptorImageInfo) {
.imageLayout = write ? res->write.image_layout : res->read.image_layout,
.imageView = src_value->image_object->view,
.imageLayout = res->ref.image->sync.layout,
.imageView = res->ref.image->view,
.sampler = VK_NULL_HANDLE,
};
} else {
@@ -287,9 +297,6 @@ void RayPassPerform(struct ray_pass_s *pass, vk_combuf_t *combuf, ray_pass_perfo

VK_DescriptorsWrite(&pass->desc.riptors, args.frame_set_slot);

DEBUG_BEGIN(combuf->cmdbuf, pass->debug_name);
R_VkBarrierCommit(combuf->cmdbuf, &barrier, pass->pipeline_type);

switch (pass->type) {
case RayPassType_Tracing:
{
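
The reordering in the `ray_pass.c` hunks moves `R_VkBarrierCommit()` ahead of the loop that fills descriptor values, and the image layout is now read from `res->ref.image->sync.layout`. The toy model below shows why that order matters when the sync state is tracked inline in the image object. Every type and function in it is a hypothetical stand-in, not Vulkan or ref_vk API.

```c
#include <assert.h>

// Stand-ins for VkImageLayout values
typedef enum { LAYOUT_UNDEFINED, LAYOUT_GENERAL } layout_t;

// Sync state lives inline in the image object
typedef struct {
	struct { layout_t layout; } sync;
} image_t;

#define MAX_BARRIER_IMAGES 8

typedef struct {
	image_t *images[MAX_BARRIER_IMAGES];
	layout_t new_layouts[MAX_BARRIER_IMAGES];
	int count;
} barrier_t;

// Queue a layout transition; the tracked layout is not touched yet
static void barrierAddImage(barrier_t *b, image_t *image, layout_t new_layout) {
	assert(b->count < MAX_BARRIER_IMAGES);
	b->images[b->count] = image;
	b->new_layouts[b->count] = new_layout;
	b->count++;
}

// Committing the barrier is the point where the tracked layout changes
static void barrierCommit(barrier_t *b) {
	for (int i = 0; i < b->count; ++i)
		b->images[i]->sync.layout = b->new_layouts[i];
	b->count = 0;
}

// A descriptor value picks up whatever layout is tracked at call time
static layout_t describeImage(const image_t *image) {
	return image->sync.layout;
}
```

In this model, calling `describeImage()` between `barrierAddImage()` and `barrierCommit()` returns the stale layout, so descriptor values have to be written after the commit. The `FIXME` in the diff notes that the layout could be known earlier; this sketch does not attempt that.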