Compare commits

42 Commits

Author SHA1 Message Date
Narr the Reg
4bda2b475f opengl: Sanitize antialiasing config 2023-01-06 13:42:20 -06:00
Fernando S
7ef897a277 Merge pull request #9566 from Wollnashorn/vulkan-cache-header-fix
video_core/vulkan: Fixed loading of Vulkan driver pipeline cache
2023-01-06 11:58:36 -05:00
Wollnashorn
457826a83b video_core/vulkan: Fixed loading of Vulkan driver pipeline cache
The header size of the Vulkan driver pipeline cache files was incorrect in PipelineCache::LoadVulkanPipelineCache, so the pipeline cache wasn't read correctly and was invalidated on each load.
2023-01-06 16:52:41 +01:00
Fernando S
8b251fc3f6 Merge pull request #9535 from bylaws/master
Port over several shader-compiler fixes from skyline
2023-01-06 10:06:45 -05:00
liamwhite
3c05988df2 Merge pull request #9561 from liamwhite/update-dynarmic
externals: update dynarmic, xbyak
2023-01-06 10:00:18 -05:00
liamwhite
6d74490139 Merge pull request #9558 from MonsterDruide1/network-timeout-noerror
net: Silently translate ETIMEDOUT network error
2023-01-06 10:00:09 -05:00
liamwhite
020dbcdbc7 Merge pull request #9552 from liamwhite/turbo
vulkan: implement 'turbo mode' clock booster
2023-01-06 09:59:59 -05:00
Fernando S
5bcbb8de45 Merge pull request #9559 from FernandoS27/cached-writes
VideoCore: Implement Cached Writes, use fastmem for reading GPU memory and eliminate old stuffs
2023-01-06 07:31:39 -05:00
liamwhite
990fe2b3fc Merge pull request #9564 from FernandoS27/oops-i-did-it-again
MacroHLE: eliminate 2 rushed macros.
2023-01-05 22:14:27 -05:00
Fernando Sahmkow
f6245dc40a MacroHLE: eliminate 2 rushed macros. 2023-01-05 20:53:31 -05:00
liamwhite
eaca61e073 Merge pull request #9528 from liamwhite/mvk-nulldesc
renderer_vulkan: implement fallback path for null buffer descriptors
2023-01-05 18:31:55 -05:00
liamwhite
3e33a878dc Merge pull request #9536 from liamwhite/debug-utils
vulkan_common: unify VK_EXT_debug_utils and selection of validation layer
2023-01-05 18:31:45 -05:00
Liam
1ee0540f82 externals: update dynarmic, xbyak 2023-01-05 18:06:06 -05:00
Billy Laws
58fec43768 Run clang-format 2023-01-05 22:18:10 +00:00
Billy Laws
12b4c9c04c externals: Update sirit 2023-01-05 22:13:07 +00:00
Billy Laws
68ed60cee4 shader_recompiler: Fix shuffle partitioning for >64 invoc-per-subgroup GPUs
The existing implementation only supports 64 invoc-per-subgroup GPUs, and misbehaves on Adreno when invocations need to be split into 4 emulated subgroups.
2023-01-05 22:13:07 +00:00
Billy Laws
6c812a0c84 Vulkan, OpenGL: Hook up geometry shader passthrough emulation 2023-01-05 22:13:07 +00:00
Billy Laws
625a4af73a shader_recompiler: Add support for lowering geometry passthrough
Reuses most of the existing code for generating the gl_Layer passthrough. Fixes geometry in Nier: Automata on GPUs without HW passthrough support.
2023-01-05 22:13:07 +00:00
Billy Laws
9e2997c4b6 Vulkan, OpenGL: Hook up storage buffer alignment code 2023-01-05 22:13:07 +00:00
Billy Laws
8804a4eb23 shader_recompiler: Align SSBO offsets to meet host requirements
We can take advantage of SSBO addresses being passed in a constant buffer to account for the extra alignment requirements in the shader itself.
2023-01-05 22:13:07 +00:00
Billy Laws
3f0985c7b0 shader_recompiler: SPIRV: Only enable int64 feature when supported 2023-01-05 22:13:07 +00:00
Billy Laws
c1cc99584c shader_recompiler: Add comparison operators to descriptor types 2023-01-05 22:13:07 +00:00
Billy Laws
bbfad79c89 Vulkan: Add a workaround for input_position on Adreno drivers
Adreno drivers will crash compiling geometry shaders if the input position is not wrapped in a gl_in struct.
2023-01-05 22:13:07 +00:00
Fernando S
1428451722 Merge pull request #9527 from Wollnashorn/amd-cache-fix
video_core/vulkan: Implemented `VkPipelineCache` to store Vulkan pipelines
2023-01-05 16:38:07 -05:00
Wollnashorn
e07976a22b video_core/vulkan: Vulkan driver pipelines now contain cache version
So that the old cache can be deleted when the cache version changes and does not grow indefinitely
2023-01-05 21:03:01 +01:00
Wollnashorn
9c9008ac81 video_core/vulkan: Driver pipeline cache will now be deleted with the shader cache 2023-01-05 21:03:01 +01:00
Wollnashorn
8945fafcc0 config: Set the Vulkan driver pipeline cache option to be global 2023-01-05 21:03:01 +01:00
Wollnashorn
f2aa816679 video_core/vulkan: Added check if Vulkan pipeline path has been set 2023-01-05 21:03:01 +01:00
Wollnashorn
f4626512ff config: Better wording for VK pipeline cache option and enable by default 2023-01-05 21:03:01 +01:00
Wollnashorn
67d4f190f7 yuzu-cmd: Removed use_vulkan_driver_pipeline_cache from default_ini.h
The addition of the use_vulkan_driver_pipeline_cache option into the default ini string literal caused the 16,384-byte limit of the MSVC compiler to be exceeded.
2023-01-05 21:03:01 +01:00
Wollnashorn
16809c1fa7 video_core/vulkan: Added VkPipelineCache to store Vulkan pipelines
As an optional feature which can be enabled in the advanced graphics configuration, all pipelines that get built at the initial shader loading are stored in a VkPipelineCache object and are dumped to the disk.

These vendor specific pipeline cache files are located at `/shader/GAME_ID/vulkan_pipelines.bin`. This feature was mainly added because of an issue with the AMD driver (see yuzu-emu#8507) causing invalidation of the cache files the driver builds automatically.
2023-01-05 21:02:44 +01:00
Fernando Sahmkow
b56ad93bbc BufferBase: Don't ignore GPU pages. 2023-01-05 14:00:10 -05:00
Fernando Sahmkow
2d0c4f2b1d Fermi2D: sync cache flushes 2023-01-05 06:43:28 -05:00
Fernando Sahmkow
af5ecb0b15 MemoryManager: use fastmem directly. 2023-01-05 06:06:33 -05:00
MonsterDruide1
688a9fbfa6 net: Silently translate ETIMEDOUT network error 2023-01-05 11:54:36 +01:00
Fernando Sahmkow
6c7eb81f7d video_core: Cache GPU internal writes. 2023-01-05 05:23:39 -05:00
liamwhite
e82e3e06be Merge pull request #9557 from FernandoS27/ooops-i-killed-the-shitty-drivers
Vulkan: Fix drivers that don't support dynamic_state_2 up
2023-01-05 00:14:01 -05:00
Fernando Sahmkow
4d9af4a9d2 Vulkan: Fix drivers that don't support dynamic_state_2 up 2023-01-05 00:11:16 -05:00
Liam
a4269c285a common: add setting for renderer clock workaround 2023-01-04 22:22:01 -05:00
Liam
301e9bbc03 vulkan: implement 'turbo mode' clock booster 2023-01-04 22:22:01 -05:00
Liam
66ae79de13 renderer_vulkan: implement fallback path for null descriptors 2023-01-04 22:14:01 -05:00
Liam
aa13ee5c4a vulkan_common: unify VK_EXT_debug_utils and selection of validation layer 2023-01-01 11:59:47 -05:00
85 changed files with 1198 additions and 775 deletions
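
The pipeline cache commits listed above (16809c1fa7, e07976a22b, 457826a83b) describe a cache file whose header carries a cache version, and a bug where a wrong header size made every load fail. A minimal sketch of that idea follows. The header layout (`CacheHeader`, `kMagic`, `kCacheVersion`) and the function names are assumptions made for illustration; the actual file format is not shown in this diff.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical on-disk header; the real layout is not part of this diff.
struct CacheHeader {
    std::uint32_t magic;
    std::uint32_t version;
};

constexpr std::uint32_t kMagic = 0x43504B56;  // arbitrary tag for this sketch
constexpr std::uint32_t kCacheVersion = 2;    // bump to invalidate old files

// Copies the driver blob that follows the header into payload. Returns false
// when the file is too short, has the wrong tag, or has another cache version.
bool ReadCachePayload(const std::vector<std::uint8_t>& file,
                      std::vector<std::uint8_t>& payload) {
    if (file.size() < sizeof(CacheHeader)) {
        return false;
    }
    CacheHeader header{};
    std::memcpy(&header, file.data(), sizeof(header));
    if (header.magic != kMagic || header.version != kCacheVersion) {
        return false;
    }
    // Skip exactly sizeof(CacheHeader) bytes. Skipping a different amount
    // would pass a shifted blob on to the driver.
    payload.assign(file.begin() + sizeof(CacheHeader), file.end());
    return true;
}

// Helper for the sketch: builds a file image with the given version.
std::vector<std::uint8_t> MakeCacheFile(std::uint32_t version,
                                        const std::vector<std::uint8_t>& blob) {
    const CacheHeader header{kMagic, version};
    std::vector<std::uint8_t> file(sizeof(header));
    std::memcpy(file.data(), &header, sizeof(header));
    file.insert(file.end(), blob.begin(), blob.end());
    return file;
}
```

A caller in this sketch would delete the file and rebuild the cache whenever `ReadCachePayload` returns false, which matches the intent stated in the commit messages.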

@@ -185,6 +185,7 @@ void RestoreGlobalState(bool is_powered_on) {
// Renderer
values.fsr_sharpening_slider.SetGlobal(true);
values.renderer_backend.SetGlobal(true);
+values.renderer_force_max_clock.SetGlobal(true);
values.vulkan_device.SetGlobal(true);
values.aspect_ratio.SetGlobal(true);
values.max_anisotropy.SetGlobal(true);
@@ -200,6 +201,7 @@ void RestoreGlobalState(bool is_powered_on) {
values.use_asynchronous_shaders.SetGlobal(true);
values.use_fast_gpu_time.SetGlobal(true);
values.use_pessimistic_flushes.SetGlobal(true);
+values.use_vulkan_driver_pipeline_cache.SetGlobal(true);
values.bg_red.SetGlobal(true);
values.bg_green.SetGlobal(true);
values.bg_blue.SetGlobal(true);

@@ -415,6 +415,7 @@ struct Values {
// Renderer
SwitchableSetting<RendererBackend, true> renderer_backend{
RendererBackend::Vulkan, RendererBackend::OpenGL, RendererBackend::Null, "backend"};
+SwitchableSetting<bool> renderer_force_max_clock{true, "force_max_clock"};
Setting<bool> renderer_debug{false, "debug"};
Setting<bool> renderer_shader_feedback{false, "shader_feedback"};
Setting<bool> enable_nsight_aftermath{false, "nsight_aftermath"};
@@ -451,6 +452,8 @@ struct Values {
SwitchableSetting<bool> use_asynchronous_shaders{false, "use_asynchronous_shaders"};
SwitchableSetting<bool> use_fast_gpu_time{true, "use_fast_gpu_time"};
SwitchableSetting<bool> use_pessimistic_flushes{false, "use_pessimistic_flushes"};
+SwitchableSetting<bool> use_vulkan_driver_pipeline_cache{true,
+"use_vulkan_driver_pipeline_cache"};
SwitchableSetting<u8> bg_red{0, "bg_red"};
SwitchableSetting<u8> bg_green{0, "bg_green"};

@@ -229,7 +229,11 @@ std::shared_ptr<Dynarmic::A32::Jit> ARM_Dynarmic_32::MakeJit(Common::PageTable*
config.enable_cycle_counting = true;
// Code cache size
+#ifdef ARCHITECTURE_arm64
+config.code_cache_size = 128_MiB;
+#else
config.code_cache_size = 512_MiB;
+#endif
// Allow memory fault handling to work
if (system.DebuggerEnabled()) {

@@ -288,7 +288,11 @@ std::shared_ptr<Dynarmic::A64::Jit> ARM_Dynarmic_64::MakeJit(Common::PageTable*
config.enable_cycle_counting = true;
// Code cache size
+#ifdef ARCHITECTURE_arm64
+config.code_cache_size = 128_MiB;
+#else
config.code_cache_size = 512_MiB;
+#endif
// Allow memory fault handling to work
if (system.DebuggerEnabled()) {
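
The two hunks above pick a smaller JIT code cache on arm64 hosts. As a rough sketch of the arithmetic, the snippet below defines a stand-in `_MiB` literal and a selection function; both `operator""_MiB` as written here and `SelectCodeCacheSize` are assumptions for illustration, and the runtime flag stands in for the `ARCHITECTURE_arm64` preprocessor check.

```cpp
#include <cstddef>

// Stand-in user-defined literal: converts mebibytes to bytes.
constexpr std::size_t operator""_MiB(unsigned long long mib) {
    return static_cast<std::size_t>(mib) * 1024 * 1024;
}

// Mirrors the #ifdef in the hunks: arm64 hosts get the smaller cache.
constexpr std::size_t SelectCodeCacheSize(bool is_arm64_host) {
    return is_arm64_host ? 128_MiB : 512_MiB;
}

static_assert(SelectCodeCacheSize(true) == 134217728, "128 MiB in bytes");
static_assert(SelectCodeCacheSize(false) == 536870912, "512 MiB in bytes");
```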

@@ -117,6 +117,8 @@ Errno TranslateNativeError(int e) {
return Errno::NETUNREACH;
case WSAEMSGSIZE:
return Errno::MSGSIZE;
+case WSAETIMEDOUT:
+return Errno::TIMEDOUT;
default:
UNIMPLEMENTED_MSG("Unimplemented errno={}", e);
return Errno::OTHER;
@@ -211,6 +213,8 @@ Errno TranslateNativeError(int e) {
return Errno::NETUNREACH;
case EMSGSIZE:
return Errno::MSGSIZE;
+case ETIMEDOUT:
+return Errno::TIMEDOUT;
default:
UNIMPLEMENTED_MSG("Unimplemented errno={}", e);
return Errno::OTHER;
@@ -226,7 +230,7 @@ Errno GetAndLogLastError() {
int e = errno;
#endif
const Errno err = TranslateNativeError(e);
-if (err == Errno::AGAIN) {
+if (err == Errno::AGAIN || err == Errno::TIMEDOUT) {
return err;
}
LOG_ERROR(Network, "Socket operation error: {}", Common::NativeErrorToString(e));
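
The network hunks above add a timeout case to the error translation and stop logging it as an error. The sketch below reproduces that pattern for the POSIX branch only. The reduced `Errno` enum and the helper `ShouldLogError` are hypothetical names for illustration, not the project's definitions.

```cpp
#include <cerrno>

// Reduced, hypothetical version of the guest-side error enum.
enum class Errno { AGAIN, MSGSIZE, TIMEDOUT, OTHER };

// Maps a POSIX errno value to the guest enum.
Errno TranslateNativeError(int e) {
    switch (e) {
    case EAGAIN:
        return Errno::AGAIN;
    case EMSGSIZE:
        return Errno::MSGSIZE;
    case ETIMEDOUT:
        return Errno::TIMEDOUT;
    default:
        return Errno::OTHER;
    }
}

// Expected, recoverable conditions are handed back to the caller silently;
// everything else would be reported through the logger.
bool ShouldLogError(Errno err) {
    return err != Errno::AGAIN && err != Errno::TIMEDOUT;
}
```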

@@ -436,7 +436,7 @@ struct Memory::Impl {
}
if (Settings::IsFastmemEnabled()) {
-const bool is_read_enable = Settings::IsGPULevelHigh() || !cached;
+const bool is_read_enable = !Settings::IsGPULevelExtreme() || !cached;
system.DeviceMemory().buffer.Protect(vaddr, size, is_read_enable, !cached);
}

@@ -321,8 +321,12 @@ Id EmitGetAttribute(EmitContext& ctx, IR::Attribute attr, Id vertex) {
case IR::Attribute::PositionY:
case IR::Attribute::PositionZ:
case IR::Attribute::PositionW:
-return ctx.OpLoad(ctx.F32[1], AttrPointer(ctx, ctx.input_f32, vertex, ctx.input_position,
-ctx.Const(element)));
+return ctx.OpLoad(
+ctx.F32[1],
+ctx.need_input_position_indirect
+? AttrPointer(ctx, ctx.input_f32, vertex, ctx.input_position, ctx.u32_zero_value,
+ctx.Const(element))
+: AttrPointer(ctx, ctx.input_f32, vertex, ctx.input_position, ctx.Const(element)));
case IR::Attribute::InstanceId:
if (ctx.profile.support_vertex_instance_id) {
return ctx.OpBitcast(ctx.F32[1], ctx.OpLoad(ctx.U32[1], ctx.instance_id));

@@ -58,11 +58,10 @@ Id SelectValue(EmitContext& ctx, Id in_range, Id value, Id src_thread_id) {
ctx.OpGroupNonUniformShuffle(ctx.U32[1], SubgroupScope(ctx), value, src_thread_id), value);
}
-Id GetUpperClamp(EmitContext& ctx, Id invocation_id, Id clamp) {
-const Id thirty_two{ctx.Const(32u)};
-const Id is_upper_partition{ctx.OpSGreaterThanEqual(ctx.U1, invocation_id, thirty_two)};
-const Id upper_clamp{ctx.OpIAdd(ctx.U32[1], thirty_two, clamp)};
-return ctx.OpSelect(ctx.U32[1], is_upper_partition, upper_clamp, clamp);
+Id AddPartitionBase(EmitContext& ctx, Id thread_id) {
+const Id partition_idx{ctx.OpShiftRightLogical(ctx.U32[1], GetThreadId(ctx), ctx.Const(5u))};
+const Id partition_base{ctx.OpShiftLeftLogical(ctx.U32[1], partition_idx, ctx.Const(5u))};
+return ctx.OpIAdd(ctx.U32[1], thread_id, partition_base);
}
} // Anonymous namespace
@@ -145,64 +144,63 @@ Id EmitSubgroupGeMask(EmitContext& ctx) {
Id EmitShuffleIndex(EmitContext& ctx, IR::Inst* inst, Id value, Id index, Id clamp,
Id segmentation_mask) {
const Id not_seg_mask{ctx.OpNot(ctx.U32[1], segmentation_mask)};
-const Id thread_id{GetThreadId(ctx)};
-if (ctx.profile.warp_size_potentially_larger_than_guest) {
-const Id thirty_two{ctx.Const(32u)};
-const Id is_upper_partition{ctx.OpSGreaterThanEqual(ctx.U1, thread_id, thirty_two)};
-const Id upper_index{ctx.OpIAdd(ctx.U32[1], thirty_two, index)};
-const Id upper_clamp{ctx.OpIAdd(ctx.U32[1], thirty_two, clamp)};
-index = ctx.OpSelect(ctx.U32[1], is_upper_partition, upper_index, index);
-clamp = ctx.OpSelect(ctx.U32[1], is_upper_partition, upper_clamp, clamp);
-}
+const Id thread_id{EmitLaneId(ctx)};
const Id min_thread_id{ComputeMinThreadId(ctx, thread_id, segmentation_mask)};
const Id max_thread_id{ComputeMaxThreadId(ctx, min_thread_id, clamp, not_seg_mask)};
const Id lhs{ctx.OpBitwiseAnd(ctx.U32[1], index, not_seg_mask)};
-const Id src_thread_id{ctx.OpBitwiseOr(ctx.U32[1], lhs, min_thread_id)};
+Id src_thread_id{ctx.OpBitwiseOr(ctx.U32[1], lhs, min_thread_id)};
const Id in_range{ctx.OpSLessThanEqual(ctx.U1, src_thread_id, max_thread_id)};
+if (ctx.profile.warp_size_potentially_larger_than_guest) {
+src_thread_id = AddPartitionBase(ctx, src_thread_id);
+}
SetInBoundsFlag(inst, in_range);
return SelectValue(ctx, in_range, value, src_thread_id);
}
Id EmitShuffleUp(EmitContext& ctx, IR::Inst* inst, Id value, Id index, Id clamp,
Id segmentation_mask) {
-const Id thread_id{GetThreadId(ctx)};
-if (ctx.profile.warp_size_potentially_larger_than_guest) {
-clamp = GetUpperClamp(ctx, thread_id, clamp);
-}
+const Id thread_id{EmitLaneId(ctx)};
const Id max_thread_id{GetMaxThreadId(ctx, thread_id, clamp, segmentation_mask)};
-const Id src_thread_id{ctx.OpISub(ctx.U32[1], thread_id, index)};
+Id src_thread_id{ctx.OpISub(ctx.U32[1], thread_id, index)};
const Id in_range{ctx.OpSGreaterThanEqual(ctx.U1, src_thread_id, max_thread_id)};
+if (ctx.profile.warp_size_potentially_larger_than_guest) {
+src_thread_id = AddPartitionBase(ctx, src_thread_id);
+}
SetInBoundsFlag(inst, in_range);
return SelectValue(ctx, in_range, value, src_thread_id);
}
Id EmitShuffleDown(EmitContext& ctx, IR::Inst* inst, Id value, Id index, Id clamp,
Id segmentation_mask) {
-const Id thread_id{GetThreadId(ctx)};
-if (ctx.profile.warp_size_potentially_larger_than_guest) {
-clamp = GetUpperClamp(ctx, thread_id, clamp);
-}
+const Id thread_id{EmitLaneId(ctx)};
const Id max_thread_id{GetMaxThreadId(ctx, thread_id, clamp, segmentation_mask)};
-const Id src_thread_id{ctx.OpIAdd(ctx.U32[1], thread_id, index)};
+Id src_thread_id{ctx.OpIAdd(ctx.U32[1], thread_id, index)};
const Id in_range{ctx.OpSLessThanEqual(ctx.U1, src_thread_id, max_thread_id)};
+if (ctx.profile.warp_size_potentially_larger_than_guest) {
+src_thread_id = AddPartitionBase(ctx, src_thread_id);
+}
SetInBoundsFlag(inst, in_range);
return SelectValue(ctx, in_range, value, src_thread_id);
}
Id EmitShuffleButterfly(EmitContext& ctx, IR::Inst* inst, Id value, Id index, Id clamp,
Id segmentation_mask) {
-const Id thread_id{GetThreadId(ctx)};
-if (ctx.profile.warp_size_potentially_larger_than_guest) {
-clamp = GetUpperClamp(ctx, thread_id, clamp);
-}
+const Id thread_id{EmitLaneId(ctx)};
const Id max_thread_id{GetMaxThreadId(ctx, thread_id, clamp, segmentation_mask)};
-const Id src_thread_id{ctx.OpBitwiseXor(ctx.U32[1], thread_id, index)};
+Id src_thread_id{ctx.OpBitwiseXor(ctx.U32[1], thread_id, index)};
const Id in_range{ctx.OpSLessThanEqual(ctx.U1, src_thread_id, max_thread_id)};
+if (ctx.profile.warp_size_potentially_larger_than_guest) {
+src_thread_id = AddPartitionBase(ctx, src_thread_id);
+}
SetInBoundsFlag(inst, in_range);
return SelectValue(ctx, in_range, value, src_thread_id);
}
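
The shuffle hunk above computes the source lane in guest terms first and only then adds the base of the 32-lane partition the invocation lives in. The integer arithmetic behind `AddPartitionBase` can be sketched on plain values as below. The function names are illustrative, and the masking in `LaneId` is an assumption about what `EmitLaneId` produces, since that helper is not shown in this diff.

```cpp
#include <cstdint>

// Guest warps have 32 lanes, hence the shifts by 5 in the hunk.
constexpr std::uint32_t kGuestWarpShift = 5;

// Lane index inside the 32-lane partition (assumed EmitLaneId behaviour).
constexpr std::uint32_t LaneId(std::uint32_t host_invocation) {
    return host_invocation & ((1u << kGuestWarpShift) - 1u);
}

// First host invocation of the partition that contains host_invocation.
constexpr std::uint32_t PartitionBase(std::uint32_t host_invocation) {
    return (host_invocation >> kGuestWarpShift) << kGuestWarpShift;
}

// Converts a guest-relative source lane into a host invocation index.
constexpr std::uint32_t ToHostLane(std::uint32_t host_invocation,
                                   std::uint32_t guest_src_lane) {
    return guest_src_lane + PartitionBase(host_invocation);
}

static_assert(LaneId(70) == 6, "invocation 70 is lane 6 of partition 2");
static_assert(ToHostLane(70, 3) == 67, "lane 3 of partition 2");
static_assert(ToHostLane(5, 3) == 3, "partition 0 is unchanged");
```

Because the base is derived with shifts rather than a single comparison against 32, the same code covers host subgroups of 64, 128, or more invocations.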

@@ -544,7 +544,7 @@ void EmitContext::DefineCommonTypes(const Info& info) {
U16 = Name(TypeInt(16, false), "u16");
S16 = Name(TypeInt(16, true), "s16");
}
-if (info.uses_int64) {
+if (info.uses_int64 && profile.support_int64) {
AddCapability(spv::Capability::Int64);
U64 = Name(TypeInt(64, false), "u64");
}
@@ -721,9 +721,21 @@ void EmitContext::DefineAttributeMemAccess(const Info& info) {
size_t label_index{0};
if (info.loads.AnyComponent(IR::Attribute::PositionX)) {
AddLabel(labels[label_index]);
-const Id pointer{is_array
-? OpAccessChain(input_f32, input_position, vertex, masked_index)
-: OpAccessChain(input_f32, input_position, masked_index)};
+const Id pointer{[&]() {
+if (need_input_position_indirect) {
+if (is_array)
+return OpAccessChain(input_f32, input_position, vertex, u32_zero_value,
+masked_index);
+else
+return OpAccessChain(input_f32, input_position, u32_zero_value,
+masked_index);
+} else {
+if (is_array)
+return OpAccessChain(input_f32, input_position, vertex, masked_index);
+else
+return OpAccessChain(input_f32, input_position, masked_index);
+}
+}()};
const Id result{OpLoad(F32[1], pointer)};
OpReturnValue(result);
++label_index;
@@ -1367,12 +1379,25 @@ void EmitContext::DefineInputs(const IR::Program& program) {
Decorate(layer, spv::Decoration::Flat);
}
if (loads.AnyComponent(IR::Attribute::PositionX)) {
-const bool is_fragment{stage != Stage::Fragment};
-const spv::BuiltIn built_in{is_fragment ? spv::BuiltIn::Position : spv::BuiltIn::FragCoord};
-input_position = DefineInput(*this, F32[4], true, built_in);
-if (profile.support_geometry_shader_passthrough) {
-if (info.passthrough.AnyComponent(IR::Attribute::PositionX)) {
-Decorate(input_position, spv::Decoration::PassthroughNV);
+const bool is_fragment{stage == Stage::Fragment};
+if (!is_fragment && profile.has_broken_spirv_position_input) {
+need_input_position_indirect = true;
+const Id input_position_struct = TypeStruct(F32[4]);
+input_position = DefineInput(*this, input_position_struct, true);
+MemberDecorate(input_position_struct, 0, spv::Decoration::BuiltIn,
+static_cast<unsigned>(spv::BuiltIn::Position));
+Decorate(input_position_struct, spv::Decoration::Block);
+} else {
+const spv::BuiltIn built_in{is_fragment ? spv::BuiltIn::FragCoord
+: spv::BuiltIn::Position};
+input_position = DefineInput(*this, F32[4], true, built_in);
+if (profile.support_geometry_shader_passthrough) {
+if (info.passthrough.AnyComponent(IR::Attribute::PositionX)) {
+Decorate(input_position, spv::Decoration::PassthroughNV);
+}
}
}
}

@@ -280,6 +280,7 @@ public:
Id write_global_func_u32x2{};
Id write_global_func_u32x4{};
+bool need_input_position_indirect{};
Id input_position{};
std::array<Id, 32> input_generics{};

@@ -171,6 +171,70 @@ std::map<IR::Attribute, IR::Attribute> GenerateLegacyToGenericMappings(
}
return mapping;
}
+void EmitGeometryPassthrough(IR::IREmitter& ir, const IR::Program& program,
+const Shader::VaryingState& passthrough_mask,
+bool passthrough_position,
+std::optional<IR::Attribute> passthrough_layer_attr) {
+for (u32 i = 0; i < program.output_vertices; i++) {
+// Assign generics from input
+for (u32 j = 0; j < 32; j++) {
+if (!passthrough_mask.Generic(j)) {
+continue;
+}
+const IR::Attribute attr = IR::Attribute::Generic0X + (j * 4);
+ir.SetAttribute(attr + 0, ir.GetAttribute(attr + 0, ir.Imm32(i)), ir.Imm32(0));
+ir.SetAttribute(attr + 1, ir.GetAttribute(attr + 1, ir.Imm32(i)), ir.Imm32(0));
+ir.SetAttribute(attr + 2, ir.GetAttribute(attr + 2, ir.Imm32(i)), ir.Imm32(0));
+ir.SetAttribute(attr + 3, ir.GetAttribute(attr + 3, ir.Imm32(i)), ir.Imm32(0));
+}
+if (passthrough_position) {
+// Assign position from input
+const IR::Attribute attr = IR::Attribute::PositionX;
+ir.SetAttribute(attr + 0, ir.GetAttribute(attr + 0, ir.Imm32(i)), ir.Imm32(0));
+ir.SetAttribute(attr + 1, ir.GetAttribute(attr + 1, ir.Imm32(i)), ir.Imm32(0));
+ir.SetAttribute(attr + 2, ir.GetAttribute(attr + 2, ir.Imm32(i)), ir.Imm32(0));
+ir.SetAttribute(attr + 3, ir.GetAttribute(attr + 3, ir.Imm32(i)), ir.Imm32(0));
+}
+if (passthrough_layer_attr) {
+// Assign layer
+ir.SetAttribute(IR::Attribute::Layer, ir.GetAttribute(*passthrough_layer_attr),
+ir.Imm32(0));
+}
+// Emit vertex
+ir.EmitVertex(ir.Imm32(0));
+}
+ir.EndPrimitive(ir.Imm32(0));
+}
+u32 GetOutputTopologyVertices(OutputTopology output_topology) {
+switch (output_topology) {
+case OutputTopology::PointList:
+return 1;
+case OutputTopology::LineStrip:
+return 2;
+default:
+return 3;
+}
+}
+void LowerGeometryPassthrough(const IR::Program& program, const HostTranslateInfo& host_info) {
+for (IR::Block* const block : program.blocks) {
+for (IR::Inst& inst : block->Instructions()) {
+if (inst.GetOpcode() == IR::Opcode::Epilogue) {
+IR::IREmitter ir{*block, IR::Block::InstructionList::s_iterator_to(inst)};
+EmitGeometryPassthrough(
+ir, program, program.info.passthrough,
+program.info.passthrough.AnyComponent(IR::Attribute::PositionX), {});
+}
+}
+}
+}
} // Anonymous namespace
IR::Program TranslateProgram(ObjectPool<IR::Inst>& inst_pool, ObjectPool<IR::Block>& block_pool,
@@ -198,6 +262,11 @@ IR::Program TranslateProgram(ObjectPool<IR::Inst>& inst_pool, ObjectPool<IR::Blo
for (size_t i = 0; i < program.info.passthrough.mask.size(); ++i) {
program.info.passthrough.mask[i] = ((mask[i / 32] >> (i % 32)) & 1) == 0;
}
+if (!host_info.support_geometry_shader_passthrough) {
+program.output_vertices = GetOutputTopologyVertices(program.output_topology);
+LowerGeometryPassthrough(program, host_info);
+}
}
break;
}
@@ -223,7 +292,7 @@ IR::Program TranslateProgram(ObjectPool<IR::Inst>& inst_pool, ObjectPool<IR::Blo
Optimization::PositionPass(env, program);
-Optimization::GlobalMemoryToStorageBufferPass(program);
+Optimization::GlobalMemoryToStorageBufferPass(program, host_info);
Optimization::TexturePass(env, program, host_info);
if (Settings::values.resolution_info.active) {
@@ -342,17 +411,7 @@ IR::Program GenerateGeometryPassthrough(ObjectPool<IR::Inst>& inst_pool,
IR::Program program;
program.stage = Stage::Geometry;
program.output_topology = output_topology;
-switch (output_topology) {
-case OutputTopology::PointList:
-program.output_vertices = 1;
-break;
-case OutputTopology::LineStrip:
-program.output_vertices = 2;
-break;
-default:
-program.output_vertices = 3;
-break;
-}
+program.output_vertices = GetOutputTopologyVertices(output_topology);
program.is_geometry_passthrough = false;
program.info.loads.mask = source_program.info.stores.mask;
@@ -366,35 +425,8 @@ IR::Program GenerateGeometryPassthrough(ObjectPool<IR::Inst>& inst_pool,
node.data.block = current_block;
IR::IREmitter ir{*current_block};
-for (u32 i = 0; i < program.output_vertices; i++) {
-// Assign generics from input
-for (u32 j = 0; j < 32; j++) {
-if (!program.info.stores.Generic(j)) {
-continue;
-}
-const IR::Attribute attr = IR::Attribute::Generic0X + (j * 4);
-ir.SetAttribute(attr + 0, ir.GetAttribute(attr + 0, ir.Imm32(i)), ir.Imm32(0));
-ir.SetAttribute(attr + 1, ir.GetAttribute(attr + 1, ir.Imm32(i)), ir.Imm32(0));
-ir.SetAttribute(attr + 2, ir.GetAttribute(attr + 2, ir.Imm32(i)), ir.Imm32(0));
-ir.SetAttribute(attr + 3, ir.GetAttribute(attr + 3, ir.Imm32(i)), ir.Imm32(0));
-}
-// Assign position from input
-const IR::Attribute attr = IR::Attribute::PositionX;
-ir.SetAttribute(attr + 0, ir.GetAttribute(attr + 0, ir.Imm32(i)), ir.Imm32(0));
-ir.SetAttribute(attr + 1, ir.GetAttribute(attr + 1, ir.Imm32(i)), ir.Imm32(0));
-ir.SetAttribute(attr + 2, ir.GetAttribute(attr + 2, ir.Imm32(i)), ir.Imm32(0));
-ir.SetAttribute(attr + 3, ir.GetAttribute(attr + 3, ir.Imm32(i)), ir.Imm32(0));
-// Assign layer
-ir.SetAttribute(IR::Attribute::Layer, ir.GetAttribute(source_program.info.emulated_layer),
-ir.Imm32(0));
-// Emit vertex
-ir.EmitVertex(ir.Imm32(0));
-}
-ir.EndPrimitive(ir.Imm32(0));
+EmitGeometryPassthrough(ir, program, program.info.stores, true,
+source_program.info.emulated_layer);
IR::Block* return_block{block_pool.Create(inst_pool)};
IR::IREmitter{*return_block}.Epilogue();
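
The `TranslateProgram` hunk in this file fills the passthrough mask from packed 32-bit words, where a cleared bit marks an attribute as passed through (the `== 0` comparison). A self-contained sketch of that unpacking follows; `UnpackPassthroughMask` is a hypothetical helper name, and `std::array` stands in for whatever containers the project uses.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Expands packed 32-bit words into one flag per attribute. A cleared bit
// means "passed through", matching the "== 0" test in the hunk.
template <std::size_t NumWords>
std::array<bool, NumWords * 32> UnpackPassthroughMask(
    const std::array<std::uint32_t, NumWords>& words) {
    std::array<bool, NumWords * 32> flags{};
    for (std::size_t i = 0; i < flags.size(); ++i) {
        flags[i] = ((words[i / 32] >> (i % 32)) & 1u) == 0;
    }
    return flags;
}
```

With words `0xFFFFFFFE` and `0x7FFFFFFF`, only attribute 0 and attribute 63 come out as passed through, since those are the only cleared bits.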

@@ -15,6 +15,9 @@ struct HostTranslateInfo {
bool needs_demote_reorder{}; ///< True when the device needs DemoteToHelperInvocation reordered
bool support_snorm_render_buffer{}; ///< True when the device supports SNORM render buffers
bool support_viewport_index_layer{}; ///< True when the device supports gl_Layer in VS
+u32 min_ssbo_alignment{}; ///< Minimum alignment supported by the device for SSBOs
+bool support_geometry_shader_passthrough{}; ///< True when the device supports geometry
+///< passthrough shaders
};
} // namespace Shader

@@ -11,6 +11,7 @@
#include "shader_recompiler/frontend/ir/breadth_first_search.h"
#include "shader_recompiler/frontend/ir/ir_emitter.h"
#include "shader_recompiler/frontend/ir/value.h"
+#include "shader_recompiler/host_translate_info.h"
#include "shader_recompiler/ir_opt/passes.h"
namespace Shader::Optimization {
@@ -402,7 +403,7 @@ void CollectStorageBuffers(IR::Block& block, IR::Inst& inst, StorageInfo& info)
}
/// Returns the offset in indices (not bytes) for an equivalent storage instruction
-IR::U32 StorageOffset(IR::Block& block, IR::Inst& inst, StorageBufferAddr buffer) {
+IR::U32 StorageOffset(IR::Block& block, IR::Inst& inst, StorageBufferAddr buffer, u32 alignment) {
IR::IREmitter ir{block, IR::Block::InstructionList::s_iterator_to(inst)};
IR::U32 offset;
if (const std::optional<LowAddrInfo> low_addr{TrackLowAddress(&inst)}) {
@@ -415,7 +416,10 @@ IR::U32 StorageOffset(IR::Block& block, IR::Inst& inst, StorageBufferAddr buffer
}
// Subtract the least significant 32 bits from the guest offset. The result is the storage
// buffer offset in bytes.
-const IR::U32 low_cbuf{ir.GetCbuf(ir.Imm32(buffer.index), ir.Imm32(buffer.offset))};
+IR::U32 low_cbuf{ir.GetCbuf(ir.Imm32(buffer.index), ir.Imm32(buffer.offset))};
+// Align the offset base to match the host alignment requirements
+low_cbuf = ir.BitwiseAnd(low_cbuf, ir.Imm32(~(alignment - 1U)));
return ir.ISub(offset, low_cbuf);
}
@@ -510,7 +514,7 @@ void Replace(IR::Block& block, IR::Inst& inst, const IR::U32& storage_index,
}
} // Anonymous namespace
-void GlobalMemoryToStorageBufferPass(IR::Program& program) {
+void GlobalMemoryToStorageBufferPass(IR::Program& program, const HostTranslateInfo& host_info) {
StorageInfo info;
for (IR::Block* const block : program.post_order_blocks) {
for (IR::Inst& inst : block->Instructions()) {
@@ -534,7 +538,8 @@ void GlobalMemoryToStorageBufferPass(IR::Program& program) {
const IR::U32 index{IR::Value{static_cast<u32>(info.set.index_of(it))}};
IR::Block* const block{storage_inst.block};
IR::Inst* const inst{storage_inst.inst};
-const IR::U32 offset{StorageOffset(*block, *inst, storage_buffer)};
+const IR::U32 offset{
+StorageOffset(*block, *inst, storage_buffer, host_info.min_ssbo_alignment)};
Replace(*block, *inst, index, offset);
}
}
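
The `StorageOffset` change above masks the buffer base down to the host alignment before subtracting it, so offsets become relative to the aligned base. The same arithmetic on plain integers might look like the sketch below; the function names are illustrative and the alignment is assumed to be a nonzero power of two.

```cpp
#include <cstdint>

// Rounds the buffer base down to the host's minimum SSBO alignment.
// Assumes alignment is a nonzero power of two.
constexpr std::uint32_t AlignBaseDown(std::uint32_t low_cbuf, std::uint32_t alignment) {
    return low_cbuf & ~(alignment - 1u);
}

// Byte offset of an access relative to the aligned base.
constexpr std::uint32_t StorageOffsetBytes(std::uint32_t access_addr, std::uint32_t low_cbuf,
                                           std::uint32_t alignment) {
    return access_addr - AlignBaseDown(low_cbuf, alignment);
}

static_assert(AlignBaseDown(0x1234, 0x100) == 0x1200, "base rounded down");
static_assert(StorageOffsetBytes(0x1240, 0x1234, 0x100) == 0x40, "offset from aligned base");
static_assert(StorageOffsetBytes(0x1240, 0x1234, 1) == 0x0C, "alignment 1 keeps the old result");
```

The offset grows by the amount the base moved down, which only works if the host binds the buffer at the same aligned address.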

@@ -15,7 +15,7 @@ namespace Shader::Optimization {
void CollectShaderInfoPass(Environment& env, IR::Program& program);
void ConstantPropagationPass(Environment& env, IR::Program& program);
void DeadCodeEliminationPass(IR::Program& program);
-void GlobalMemoryToStorageBufferPass(IR::Program& program);
+void GlobalMemoryToStorageBufferPass(IR::Program& program, const HostTranslateInfo& host_info);
void IdentityRemovalPass(IR::Program& program);
void LowerFp16ToFp32(IR::Program& program);
void LowerInt64ToInt32(IR::Program& program);

@@ -55,6 +55,8 @@ struct Profile {
/// OpFClamp is broken and OpFMax + OpFMin should be used instead
bool has_broken_spirv_clamp{};
+/// The Position builtin needs to be wrapped in a struct when used as an input
+bool has_broken_spirv_position_input{};
/// Offset image operands with an unsigned type do not work
bool has_broken_unsigned_image_offsets{};
/// Signed instructions with unsigned data types are misinterpreted

@@ -65,6 +65,8 @@ enum class Interpolation {
struct ConstantBufferDescriptor {
u32 index;
u32 count;
+auto operator<=>(const ConstantBufferDescriptor&) const = default;
};
struct StorageBufferDescriptor {
@@ -72,6 +74,8 @@ struct StorageBufferDescriptor {
u32 cbuf_offset;
u32 count;
bool is_written;
+auto operator<=>(const StorageBufferDescriptor&) const = default;
};
struct TextureBufferDescriptor {
@@ -84,6 +88,8 @@ struct TextureBufferDescriptor {
u32 secondary_shift_left;
u32 count;
u32 size_shift;
+auto operator<=>(const TextureBufferDescriptor&) const = default;
};
using TextureBufferDescriptors = boost::container::small_vector<TextureBufferDescriptor, 6>;
@@ -95,6 +101,8 @@ struct ImageBufferDescriptor {
u32 cbuf_offset;
u32 count;
u32 size_shift;
+auto operator<=>(const ImageBufferDescriptor&) const = default;
};
using ImageBufferDescriptors = boost::container::small_vector<ImageBufferDescriptor, 2>;
@@ -110,6 +118,8 @@ struct TextureDescriptor {
u32 secondary_shift_left;
u32 count;
u32 size_shift;
+auto operator<=>(const TextureDescriptor&) const = default;
};
using TextureDescriptors = boost::container::small_vector<TextureDescriptor, 12>;
@@ -122,6 +132,8 @@ struct ImageDescriptor {
u32 cbuf_offset;
u32 count;
u32 size_shift;
+auto operator<=>(const ImageDescriptor&) const = default;
};
using ImageDescriptors = boost::container::small_vector<ImageDescriptor, 4>;

@@ -538,7 +538,7 @@ TEST_CASE("BufferBase: Cached write downloads") {
int num = 0;
buffer.ForEachDownloadRangeAndClear(c, WORD, [&](u64 offset, u64 size) { ++num; });
buffer.ForEachUploadRange(c, WORD, [&](u64 offset, u64 size) { ++num; });
-REQUIRE(num == 0);
+REQUIRE(num == 1);
REQUIRE(!buffer.IsRegionCpuModified(c + PAGE, PAGE));
REQUIRE(!buffer.IsRegionGpuModified(c + PAGE, PAGE));
buffer.FlushCachedWrites();

@@ -85,6 +85,7 @@ add_library(video_core STATIC
gpu.h
gpu_thread.cpp
gpu_thread.h
+invalidation_accumulator.h
memory_manager.cpp
memory_manager.h
precompiled_headers.h
@@ -99,8 +100,6 @@ add_library(video_core STATIC
renderer_null/null_rasterizer.h
renderer_null/renderer_null.cpp
renderer_null/renderer_null.h
-renderer_opengl/blit_image.cpp
-renderer_opengl/blit_image.h
renderer_opengl/gl_buffer_cache.cpp
renderer_opengl/gl_buffer_cache.h
renderer_opengl/gl_compute_pipeline.cpp
@@ -192,6 +191,8 @@ add_library(video_core STATIC
renderer_vulkan/vk_texture_cache.cpp
renderer_vulkan/vk_texture_cache.h
renderer_vulkan/vk_texture_cache_base.cpp
+renderer_vulkan/vk_turbo_mode.cpp
+renderer_vulkan/vk_turbo_mode.h
renderer_vulkan/vk_update_descriptor.cpp
renderer_vulkan/vk_update_descriptor.h
shader_cache.cpp

@@ -430,7 +430,7 @@ private:
if (query_begin >= SizeBytes() || size < 0) {
return;
}
-u64* const untracked_words = Array<Type::Untracked>();
+[[maybe_unused]] u64* const untracked_words = Array<Type::Untracked>();
u64* const state_words = Array<type>();
const u64 query_end = query_begin + std::min(static_cast<u64>(size), SizeBytes());
u64* const words_begin = state_words + query_begin / BYTES_PER_WORD;
@@ -483,7 +483,7 @@ private:
NotifyRasterizer<true>(word_index, current_bits, ~u64{0});
}
// Exclude CPU modified pages when visiting GPU pages
-const u64 word = current_word & ~(type == Type::GPU ? untracked_words[word_index] : 0);
+const u64 word = current_word;
u64 page = page_begin;
page_begin = 0;
@@ -531,7 +531,7 @@ private:
[[nodiscard]] bool IsRegionModified(u64 offset, u64 size) const noexcept {
static_assert(type != Type::Untracked);
-const u64* const untracked_words = Array<Type::Untracked>();
+[[maybe_unused]] const u64* const untracked_words = Array<Type::Untracked>();
const u64* const state_words = Array<type>();
const u64 num_query_words = size / BYTES_PER_WORD + 1;
const u64 word_begin = offset / BYTES_PER_WORD;
@@ -539,8 +539,7 @@ private:
const u64 page_limit = Common::DivCeil(offset + size, BYTES_PER_PAGE);
u64 page_index = (offset / BYTES_PER_PAGE) % PAGES_PER_WORD;
for (u64 word_index = word_begin; word_index < word_end; ++word_index, page_index = 0) {
-const u64 off_word = type == Type::GPU ? untracked_words[word_index] : 0;
-const u64 word = state_words[word_index] & ~off_word;
+const u64 word = state_words[word_index];
if (word == 0) {
continue;
}
@@ -564,7 +563,7 @@ private:
[[nodiscard]] std::pair<u64, u64> ModifiedRegion(u64 offset, u64 size) const noexcept {
static_assert(type != Type::Untracked);
-const u64* const untracked_words = Array<Type::Untracked>();
+[[maybe_unused]] const u64* const untracked_words = Array<Type::Untracked>();
const u64* const state_words = Array<type>();
const u64 num_query_words = size / BYTES_PER_WORD + 1;
const u64 word_begin = offset / BYTES_PER_WORD;
@@ -574,8 +573,7 @@ private:
u64 begin = std::numeric_limits<u64>::max();
u64 end = 0;
for (u64 word_index = word_begin; word_index < word_end; ++word_index) {
-const u64 off_word = type == Type::GPU ? untracked_words[word_index] : 0;
-const u64 word = state_words[word_index] & ~off_word;
+const u64 word = state_words[word_index];
if (word == 0) {
continue;
}

@@ -1938,14 +1938,21 @@ typename BufferCache<P>::Binding BufferCache<P>::StorageBufferBinding(GPUVAddr s
bool is_written) const {
const GPUVAddr gpu_addr = gpu_memory->Read<u64>(ssbo_addr);
const u32 size = gpu_memory->Read<u32>(ssbo_addr + 8);
const std::optional<VAddr> cpu_addr = gpu_memory->GpuToCpuAddress(gpu_addr);
const u32 alignment = runtime.GetStorageBufferAlignment();
const GPUVAddr aligned_gpu_addr = Common::AlignDown(gpu_addr, alignment);
const u32 aligned_size =
Common::AlignUp(static_cast<u32>(gpu_addr - aligned_gpu_addr) + size, alignment);
const std::optional<VAddr> cpu_addr = gpu_memory->GpuToCpuAddress(aligned_gpu_addr);
if (!cpu_addr || size == 0) {
return NULL_BINDING;
}
const VAddr cpu_end = Common::AlignUp(*cpu_addr + size, Core::Memory::YUZU_PAGESIZE);
const VAddr cpu_end = Common::AlignUp(*cpu_addr + aligned_size, Core::Memory::YUZU_PAGESIZE);
const Binding binding{
.cpu_addr = *cpu_addr,
.size = is_written ? size : static_cast<u32>(cpu_end - *cpu_addr),
.size = is_written ? aligned_size : static_cast<u32>(cpu_end - *cpu_addr),
.buffer_id = BufferId{},
};
return binding;
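
Review note: the hunk above widens the storage buffer binding so that both its base and its size respect the alignment reported by the runtime. The sketch below reproduces only that arithmetic in isolation. `AlignDown`, `AlignUp`, `AlignedRange` and `WidenToAlignment` are hypothetical stand-ins written for this note; they are not the `Common::` helpers themselves, and the page-size rounding applied to `cpu_end` is left out.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the alignment helpers used in the hunk above.
constexpr std::uint64_t AlignDown(std::uint64_t value, std::uint64_t alignment) {
    return value - value % alignment;
}

constexpr std::uint64_t AlignUp(std::uint64_t value, std::uint64_t alignment) {
    return AlignDown(value + alignment - 1, alignment);
}

// Illustrative result type: the widened base address and size.
struct AlignedRange {
    std::uint64_t base;
    std::uint32_t size;
};

// Moves the base down to the alignment boundary, then grows the size so the
// range still covers the original [gpu_addr, gpu_addr + size) bytes.
constexpr AlignedRange WidenToAlignment(std::uint64_t gpu_addr, std::uint32_t size,
                                        std::uint32_t alignment) {
    const std::uint64_t base = AlignDown(gpu_addr, alignment);
    const auto head = static_cast<std::uint32_t>(gpu_addr - base);
    return AlignedRange{base, static_cast<std::uint32_t>(AlignUp(head + size, alignment))};
}
```

With an assumed alignment of 0x100, an address of 0x1234 and a size of 0x100 widen to base 0x1200 and size 0x200, so the bytes skipped by moving the base down are added back to the size before it is rounded up.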

@@ -51,10 +51,6 @@ void DrawManager::ProcessMethodCall(u32 method, u32 argument) {
LOG_WARNING(HW_GPU, "(STUBBED) called");
break;
}
case MAXWELL3D_REG_INDEX(draw_texture.src_y0): {
DrawTexture();
break;
}
default:
break;
}
@@ -183,33 +179,6 @@ void DrawManager::DrawIndexSmall(u32 argument) {
ProcessDraw(true, 1);
}
void DrawManager::DrawTexture() {
const auto& regs{maxwell3d->regs};
draw_texture_state.dst_x0 = static_cast<float>(regs.draw_texture.dst_x0) / 4096.f;
draw_texture_state.dst_y0 = static_cast<float>(regs.draw_texture.dst_y0) / 4096.f;
const auto dst_width = static_cast<float>(regs.draw_texture.dst_width) / 4096.f;
const auto dst_height = static_cast<float>(regs.draw_texture.dst_height) / 4096.f;
const bool lower_left{regs.window_origin.mode !=
Maxwell3D::Regs::WindowOrigin::Mode::UpperLeft};
if (lower_left) {
draw_texture_state.dst_y0 -= dst_height;
}
draw_texture_state.dst_x1 = draw_texture_state.dst_x0 + dst_width;
draw_texture_state.dst_y1 = draw_texture_state.dst_y0 + dst_height;
draw_texture_state.src_x0 = static_cast<float>(regs.draw_texture.src_x0) / 4096.f;
draw_texture_state.src_y0 = static_cast<float>(regs.draw_texture.src_y0) / 4096.f;
draw_texture_state.src_x1 =
(static_cast<float>(regs.draw_texture.dx_du) / 4294967296.f) * dst_width +
draw_texture_state.src_x0;
draw_texture_state.src_y1 =
(static_cast<float>(regs.draw_texture.dy_dv) / 4294967296.f) * dst_height +
draw_texture_state.src_y0;
draw_texture_state.src_sampler = regs.draw_texture.src_sampler;
draw_texture_state.src_texture = regs.draw_texture.src_texture;
maxwell3d->rasterizer->DrawTexture();
}
void DrawManager::UpdateTopology() {
const auto& regs{maxwell3d->regs};
switch (regs.primitive_topology_control) {

@@ -32,19 +32,6 @@ public:
std::vector<u8> inline_index_draw_indexes;
};
struct DrawTextureState {
f32 dst_x0;
f32 dst_y0;
f32 dst_x1;
f32 dst_y1;
f32 src_x0;
f32 src_y0;
f32 src_x1;
f32 src_y1;
u32 src_sampler;
u32 src_texture;
};
struct IndirectParams {
bool is_indexed;
bool include_count;
@@ -77,10 +64,6 @@ public:
return draw_state;
}
const DrawTextureState& GetDrawTextureState() const {
return draw_texture_state;
}
IndirectParams& GetIndirectParams() {
return indirect_state;
}
@@ -98,8 +81,6 @@ private:
void DrawIndexSmall(u32 argument);
void DrawTexture();
void UpdateTopology();
void ProcessDraw(bool draw_indexed, u32 instance_count);
@@ -108,7 +89,6 @@ private:
Maxwell3D* maxwell3d{};
State draw_state{};
DrawTextureState draw_texture_state{};
IndirectParams indirect_state{};
};
} // namespace Tegra::Engines

@@ -76,7 +76,7 @@ void State::ProcessData(std::span<const u8> read_buffer) {
regs.dest.height, regs.dest.depth, x_offset, regs.dest.y,
x_elements, regs.line_count, regs.dest.BlockHeight(),
regs.dest.BlockDepth(), regs.line_length_in);
memory_manager.WriteBlock(address, tmp_buffer.data(), dst_size);
memory_manager.WriteBlockCached(address, tmp_buffer.data(), dst_size);
}
}

@@ -6,6 +6,7 @@
#include "common/microprofile.h"
#include "video_core/engines/fermi_2d.h"
#include "video_core/engines/sw_blitter/blitter.h"
#include "video_core/memory_manager.h"
#include "video_core/rasterizer_interface.h"
#include "video_core/surface.h"
#include "video_core/textures/decoders.h"
@@ -20,8 +21,8 @@ namespace Tegra::Engines {
using namespace Texture;
Fermi2D::Fermi2D(MemoryManager& memory_manager_) {
sw_blitter = std::make_unique<Blitter::SoftwareBlitEngine>(memory_manager_);
Fermi2D::Fermi2D(MemoryManager& memory_manager_) : memory_manager{memory_manager_} {
sw_blitter = std::make_unique<Blitter::SoftwareBlitEngine>(memory_manager);
// Nvidia's OpenGL driver seems to assume these values
regs.src.depth = 1;
regs.dst.depth = 1;
@@ -104,6 +105,7 @@ void Fermi2D::Blit() {
config.src_x0 = 0;
}
memory_manager.FlushCaching();
if (!rasterizer->AccelerateSurfaceCopy(src, regs.dst, config)) {
sw_blitter->Blit(src, regs.dst, config);
}

@@ -305,6 +305,7 @@ public:
private:
VideoCore::RasterizerInterface* rasterizer = nullptr;
std::unique_ptr<Blitter::SoftwareBlitEngine> sw_blitter;
MemoryManager& memory_manager;
/// Performs the copy from the source surface to the destination surface as configured in the
/// registers.

@@ -149,7 +149,6 @@ bool Maxwell3D::IsMethodExecutable(u32 method) {
case MAXWELL3D_REG_INDEX(inline_index_4x8.index0):
case MAXWELL3D_REG_INDEX(vertex_array_instance_first):
case MAXWELL3D_REG_INDEX(vertex_array_instance_subsequent):
case MAXWELL3D_REG_INDEX(draw_texture.src_y0):
case MAXWELL3D_REG_INDEX(wait_for_idle):
case MAXWELL3D_REG_INDEX(shadow_ram_control):
case MAXWELL3D_REG_INDEX(load_mme.instruction_ptr):
@@ -486,11 +485,6 @@ void Maxwell3D::StampQueryResult(u64 payload, bool long_query) {
}
void Maxwell3D::ProcessQueryGet() {
// TODO(Subv): Support the other query units.
if (regs.report_semaphore.query.location != Regs::ReportSemaphore::Location::All) {
LOG_DEBUG(HW_GPU, "Locations other than ALL are unimplemented");
}
switch (regs.report_semaphore.query.operation) {
case Regs::ReportSemaphore::Operation::Release:
if (regs.report_semaphore.query.short_query != 0) {
@@ -650,7 +644,7 @@ void Maxwell3D::ProcessCBMultiData(const u32* start_base, u32 amount) {
const GPUVAddr address{buffer_address + regs.const_buffer.offset};
const size_t copy_size = amount * sizeof(u32);
memory_manager.WriteBlock(address, start_base, copy_size);
memory_manager.WriteBlockCached(address, start_base, copy_size);
// Increment the current buffer position.
regs.const_buffer.offset += static_cast<u32>(copy_size);

@@ -1599,20 +1599,6 @@ public:
};
static_assert(sizeof(TIRModulationCoeff) == 0x4);
struct DrawTexture {
s32 dst_x0;
s32 dst_y0;
s32 dst_width;
s32 dst_height;
s64 dx_du;
s64 dy_dv;
u32 src_sampler;
u32 src_texture;
s32 src_x0;
s32 src_y0;
};
static_assert(sizeof(DrawTexture) == 0x30);
struct ReduceColorThreshold {
union {
BitField<0, 8, u32> all_hit_once;
@@ -2765,7 +2751,7 @@ public:
u32 reserved_sw_method2; ///< 0x102C
std::array<TIRModulationCoeff, 5> tir_modulation_coeff; ///< 0x1030
std::array<u32, 15> spare_nop; ///< 0x1044
DrawTexture draw_texture; ///< 0x1080
INSERT_PADDING_BYTES_NOINIT(0x30);
std::array<u32, 7> reserved_sw_method3_to_7; ///< 0x10B0
ReduceColorThreshold reduce_color_thresholds_unorm8; ///< 0x10CC
std::array<u32, 4> reserved_sw_method10_to_13; ///< 0x10D0

@@ -69,7 +69,7 @@ void MaxwellDMA::Launch() {
if (launch.multi_line_enable) {
const bool is_src_pitch = launch.src_memory_layout == LaunchDMA::MemoryLayout::PITCH;
const bool is_dst_pitch = launch.dst_memory_layout == LaunchDMA::MemoryLayout::PITCH;
memory_manager.FlushCaching();
if (!is_src_pitch && !is_dst_pitch) {
// If both the source and the destination are in block layout, assert.
CopyBlockLinearToBlockLinear();
@@ -104,6 +104,7 @@ void MaxwellDMA::Launch() {
reinterpret_cast<u8*>(tmp_buffer.data()),
regs.line_length_in * sizeof(u32));
} else {
memory_manager.FlushCaching();
const auto convert_linear_2_blocklinear_addr = [](u64 address) {
return (address & ~0x1f0ULL) | ((address & 0x40) >> 2) | ((address & 0x10) << 1) |
((address & 0x180) >> 1) | ((address & 0x20) << 3);
@@ -121,8 +122,8 @@ void MaxwellDMA::Launch() {
memory_manager.ReadBlockUnsafe(
convert_linear_2_blocklinear_addr(regs.offset_in + offset),
tmp_buffer.data(), tmp_buffer.size());
memory_manager.WriteBlock(regs.offset_out + offset, tmp_buffer.data(),
tmp_buffer.size());
memory_manager.WriteBlockCached(regs.offset_out + offset, tmp_buffer.data(),
tmp_buffer.size());
}
} else if (is_src_pitch && !is_dst_pitch) {
UNIMPLEMENTED_IF(regs.line_length_in % 16 != 0);
@@ -132,7 +133,7 @@ void MaxwellDMA::Launch() {
for (u32 offset = 0; offset < regs.line_length_in; offset += 16) {
memory_manager.ReadBlockUnsafe(regs.offset_in + offset, tmp_buffer.data(),
tmp_buffer.size());
memory_manager.WriteBlock(
memory_manager.WriteBlockCached(
convert_linear_2_blocklinear_addr(regs.offset_out + offset),
tmp_buffer.data(), tmp_buffer.size());
}
@@ -141,8 +142,8 @@ void MaxwellDMA::Launch() {
std::vector<u8> tmp_buffer(regs.line_length_in);
memory_manager.ReadBlockUnsafe(regs.offset_in, tmp_buffer.data(),
regs.line_length_in);
memory_manager.WriteBlock(regs.offset_out, tmp_buffer.data(),
regs.line_length_in);
memory_manager.WriteBlockCached(regs.offset_out, tmp_buffer.data(),
regs.line_length_in);
}
}
}
@@ -204,7 +205,7 @@ void MaxwellDMA::CopyBlockLinearToPitch() {
src_params.origin.y, x_elements, regs.line_count, block_height, block_depth,
regs.pitch_out);
memory_manager.WriteBlock(regs.offset_out, write_buffer.data(), dst_size);
memory_manager.WriteBlockCached(regs.offset_out, write_buffer.data(), dst_size);
}
void MaxwellDMA::CopyPitchToBlockLinear() {
@@ -256,7 +257,7 @@ void MaxwellDMA::CopyPitchToBlockLinear() {
dst_params.origin.y, x_elements, regs.line_count, block_height, block_depth,
regs.pitch_in);
memory_manager.WriteBlock(regs.offset_out, write_buffer.data(), dst_size);
memory_manager.WriteBlockCached(regs.offset_out, write_buffer.data(), dst_size);
}
void MaxwellDMA::FastCopyBlockLinearToPitch() {
@@ -287,7 +288,7 @@ void MaxwellDMA::FastCopyBlockLinearToPitch() {
regs.src_params.block_size.height, regs.src_params.block_size.depth,
regs.pitch_out);
memory_manager.WriteBlock(regs.offset_out, write_buffer.data(), dst_size);
memory_manager.WriteBlockCached(regs.offset_out, write_buffer.data(), dst_size);
}
void MaxwellDMA::CopyBlockLinearToBlockLinear() {
@@ -347,7 +348,7 @@ void MaxwellDMA::CopyBlockLinearToBlockLinear() {
dst.depth, dst_x_offset, dst.origin.y, x_elements, regs.line_count,
dst.block_size.height, dst.block_size.depth, pitch);
memory_manager.WriteBlock(regs.offset_out, write_buffer.data(), dst_size);
memory_manager.WriteBlockCached(regs.offset_out, write_buffer.data(), dst_size);
}
void MaxwellDMA::ReleaseSemaphore() {

@@ -11,7 +11,6 @@ set(GLSL_INCLUDES
set(SHADER_FILES
astc_decoder.comp
blit_color_float.frag
block_linear_unswizzle_2d.comp
block_linear_unswizzle_3d.comp
convert_abgr8_to_d24s8.frag
@@ -37,6 +36,7 @@ set(SHADER_FILES
smaa_blending_weight_calculation.frag
smaa_neighborhood_blending.vert
smaa_neighborhood_blending.frag
vulkan_blit_color_float.frag
vulkan_blit_depth_stencil.frag
vulkan_fidelityfx_fsr_easu_fp16.comp
vulkan_fidelityfx_fsr_easu_fp32.comp
@@ -47,6 +47,7 @@ set(SHADER_FILES
vulkan_present_scaleforce_fp16.frag
vulkan_present_scaleforce_fp32.frag
vulkan_quad_indexed.comp
vulkan_turbo_mode.comp
vulkan_uint8.comp
)

@@ -4,20 +4,13 @@
#version 450
#ifdef VULKAN
#define VERTEX_ID gl_VertexIndex
#define BEGIN_PUSH_CONSTANTS layout(push_constant) uniform PushConstants {
#define END_PUSH_CONSTANTS };
#define UNIFORM(n)
#define FLIPY 1
#else // ^^^ Vulkan ^^^ // vvv OpenGL vvv
#define VERTEX_ID gl_VertexID
#define BEGIN_PUSH_CONSTANTS
#define END_PUSH_CONSTANTS
#define FLIPY -1
#define UNIFORM(n) layout (location = n) uniform
out gl_PerVertex {
vec4 gl_Position;
};
#endif
BEGIN_PUSH_CONSTANTS
@@ -28,8 +21,8 @@ END_PUSH_CONSTANTS
layout(location = 0) out vec2 texcoord;
void main() {
float x = float((VERTEX_ID & 1) << 2);
float y = float((VERTEX_ID & 2) << 1);
gl_Position = vec4(x - 1.0, FLIPY * (y - 1.0), 0.0, 1.0);
float x = float((gl_VertexIndex & 1) << 2);
float y = float((gl_VertexIndex & 2) << 1);
gl_Position = vec4(x - 1.0, y - 1.0, 0.0, 1.0);
texcoord = fma(vec2(x, y) / 2.0, tex_scale, tex_offset);
}
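
Review note: the vertex shader above derives a single oversized triangle from the vertex index alone, with no vertex buffer. The sketch below evaluates the same bit arithmetic on the CPU for the Vulkan variant (no Y flip). `Vec2` and `FullScreenTrianglePosition` are hypothetical names used only for this illustration.

```cpp
// Illustrative 2D vector type for this sketch.
struct Vec2 {
    float x;
    float y;
};

// Mirrors the position arithmetic of the shader's main() for one vertex index.
constexpr Vec2 FullScreenTrianglePosition(int vertex_index) {
    const float x = static_cast<float>((vertex_index & 1) << 2); // 0 or 4
    const float y = static_cast<float>((vertex_index & 2) << 1); // 0 or 4
    return Vec2{x - 1.0f, y - 1.0f};
}
```

Indices 0, 1 and 2 give (-1, -1), (3, -1) and (-1, 3). That triangle covers the whole [-1, 1] clip-space square, and the parts outside the square are clipped away.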

@@ -0,0 +1,29 @@
// SPDX-FileCopyrightText: Copyright 2022 yuzu Emulator Project
// SPDX-License-Identifier: GPL-2.0-or-later
#version 460 core
layout (local_size_x = 16, local_size_y = 8, local_size_z = 1) in;
layout (binding = 0) buffer ThreadData {
uint data[];
};
uint xorshift32(uint x) {
x ^= x << 13;
x ^= x >> 17;
x ^= x << 5;
return x;
}
uint getGlobalIndex() {
return gl_GlobalInvocationID.x + gl_GlobalInvocationID.y * gl_WorkGroupSize.y * gl_NumWorkGroups.y;
}
void main() {
uint myIndex = xorshift32(getGlobalIndex());
uint otherIndex = xorshift32(myIndex);
uint otherValue = atomicAdd(data[otherIndex % data.length()], 0) + 1;
atomicAdd(data[myIndex % data.length()], otherValue);
}
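
Review note: the compute shader above uses a 32-bit xorshift step to scatter each invocation's accesses across the buffer. The sketch below is a CPU transcription of that one function, useful for checking which slots a given invocation index would touch. `Xorshift32` is a hypothetical name for this note.

```cpp
#include <cstdint>

// CPU transcription of the shader's xorshift32(): shifts of 13, 17 and 5.
constexpr std::uint32_t Xorshift32(std::uint32_t x) {
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return x;
}
```

Zero is a fixed point of this function, so an invocation whose global index is 0 reads and writes slot 0 for both `myIndex` and `otherIndex`.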

@@ -0,0 +1,79 @@
// SPDX-FileCopyrightText: Copyright 2018 yuzu Emulator Project
// SPDX-License-Identifier: GPL-2.0-or-later
#pragma once
#include <utility>
#include <vector>
#include "common/common_types.h"
namespace VideoCommon {
class InvalidationAccumulator {
public:
InvalidationAccumulator() = default;
~InvalidationAccumulator() = default;
void Add(GPUVAddr address, size_t size) {
const auto reset_values = [&]() {
if (has_collected) {
buffer.emplace_back(start_address, accumulated_size);
}
start_address = address;
accumulated_size = size;
last_collection = start_address + size;
};
if (address >= start_address && address + size <= last_collection) [[likely]] {
return;
}
size = ((address + size + atomicity_size_mask) & atomicity_mask) - address;
address = address & atomicity_mask;
if (!has_collected) [[unlikely]] {
reset_values();
has_collected = true;
return;
}
if (address != last_collection) [[unlikely]] {
reset_values();
return;
}
accumulated_size += size;
last_collection += size;
}
void Clear() {
buffer.clear();
start_address = 0;
last_collection = 0;
has_collected = false;
}
bool AnyAccumulated() const {
return has_collected;
}
template <typename Func>
void Callback(Func&& func) {
if (!has_collected) {
return;
}
buffer.emplace_back(start_address, accumulated_size);
for (auto& [address, size] : buffer) {
func(address, size);
}
}
private:
static constexpr size_t atomicity_bits = 5;
static constexpr size_t atomicity_size = 1ULL << atomicity_bits;
static constexpr size_t atomicity_size_mask = atomicity_size - 1;
static constexpr size_t atomicity_mask = ~atomicity_size_mask;
GPUVAddr start_address{};
GPUVAddr last_collection{};
size_t accumulated_size{};
bool has_collected{};
std::vector<std::pair<VAddr, size_t>> buffer;
};
} // namespace VideoCommon
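
Review note: the accumulator above batches cached writes so that back-to-back writes produce one invalidation instead of many. The sketch below is a simplified stand-alone model of that coalescing idea, not the class from the diff. `RangeAccumulator`, `Flush` and `Example` are hypothetical names, and one detail is deliberately different: the sketch aligns the base first and then the end, whereas the header above computes the rounded size from the unaligned address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Simplified model: grows one open range while writes stay contiguous and
// stores finished ranges in a list. All names are illustrative.
class RangeAccumulator {
public:
    void Add(std::uint64_t address, std::size_t size) {
        // A write fully inside the open range needs no new bookkeeping.
        if (has_open && address >= start && address + size <= end) {
            return;
        }
        // Widen to 32-byte granularity (base first, then end).
        const std::uint64_t aligned_begin = address & ~kMask;
        const std::uint64_t aligned_end = (address + size + kMask) & ~kMask;
        if (has_open && aligned_begin == end) {
            end = aligned_end; // contiguous with the open range: extend it
            return;
        }
        if (has_open) {
            closed.emplace_back(start, end - start); // close the previous range
        }
        start = aligned_begin;
        end = aligned_end;
        has_open = true;
    }

    // Returns every accumulated (address, size) pair and resets the state.
    std::vector<std::pair<std::uint64_t, std::size_t>> Flush() {
        if (has_open) {
            closed.emplace_back(start, end - start);
        }
        auto out = std::move(closed);
        closed.clear();
        has_open = false;
        return out;
    }

private:
    static constexpr std::uint64_t kMask = 31; // 32-byte atomicity, as in the header above
    std::uint64_t start{};
    std::uint64_t end{};
    bool has_open{};
    std::vector<std::pair<std::uint64_t, std::size_t>> closed;
};

// Usage: two adjacent writes and one overlapping write collapse into one range.
inline void Example() {
    RangeAccumulator acc;
    acc.Add(0x1000, 0x20);
    acc.Add(0x1020, 0x20); // contiguous, extends the first range
    acc.Add(0x1008, 0x08); // already covered, ignored
    acc.Add(0x4000, 0x10); // disjoint, starts a second range
    const auto ranges = acc.Flush();
    assert(ranges.size() == 2);
}
```

In the diff, the equivalent of `Flush` is split between `Callback` and `Clear`, and `MemoryManager::FlushCaching` drives both before handing the collected ranges to the rasterizer.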

@@ -50,38 +50,6 @@ protected:
Maxwell3D& maxwell3d;
};
class HLE_DrawArrays final : public HLEMacroImpl {
public:
explicit HLE_DrawArrays(Maxwell3D& maxwell3d_) : HLEMacroImpl(maxwell3d_) {}
void Execute(const std::vector<u32>& parameters, [[maybe_unused]] u32 method) override {
maxwell3d.RefreshParameters();
auto topology = static_cast<Maxwell3D::Regs::PrimitiveTopology>(parameters[0]);
maxwell3d.draw_manager->DrawArray(topology, parameters[1], parameters[2],
maxwell3d.regs.global_base_instance_index, 1);
}
};
class HLE_DrawIndexed final : public HLEMacroImpl {
public:
explicit HLE_DrawIndexed(Maxwell3D& maxwell3d_) : HLEMacroImpl(maxwell3d_) {}
void Execute(const std::vector<u32>& parameters, [[maybe_unused]] u32 method) override {
maxwell3d.RefreshParameters();
maxwell3d.regs.index_buffer.start_addr_high = parameters[1];
maxwell3d.regs.index_buffer.start_addr_low = parameters[2];
maxwell3d.regs.index_buffer.format =
static_cast<Engines::Maxwell3D::Regs::IndexFormat>(parameters[3]);
maxwell3d.dirty.flags[VideoCommon::Dirty::IndexBuffer] = true;
auto topology = static_cast<Maxwell3D::Regs::PrimitiveTopology>(parameters[0]);
maxwell3d.draw_manager->DrawIndex(topology, 0, parameters[4],
maxwell3d.regs.global_base_vertex_index,
maxwell3d.regs.global_base_instance_index, 1);
}
};
/*
* @note: these macros have two versions, a normal and extended version, with the extended version
* also assigning the base vertex/instance.
@@ -497,11 +465,6 @@ public:
} // Anonymous namespace
HLEMacro::HLEMacro(Maxwell3D& maxwell3d_) : maxwell3d{maxwell3d_} {
builders.emplace(0xDD6A7FA92A7D2674ULL,
std::function<std::unique_ptr<CachedMacro>(Maxwell3D&)>(
[](Maxwell3D& maxwell3d__) -> std::unique_ptr<CachedMacro> {
return std::make_unique<HLE_DrawArrays>(maxwell3d__);
}));
builders.emplace(0x0D61FC9FAAC9FCADULL,
std::function<std::unique_ptr<CachedMacro>(Maxwell3D&)>(
[](Maxwell3D& maxwell3d__) -> std::unique_ptr<CachedMacro> {
@@ -512,11 +475,6 @@ HLEMacro::HLEMacro(Maxwell3D& maxwell3d_) : maxwell3d{maxwell3d_} {
[](Maxwell3D& maxwell3d__) -> std::unique_ptr<CachedMacro> {
return std::make_unique<HLE_DrawArraysIndirect<true>>(maxwell3d__);
}));
builders.emplace(0x2DB33AADB741839CULL,
std::function<std::unique_ptr<CachedMacro>(Maxwell3D&)>(
[](Maxwell3D& maxwell3d__) -> std::unique_ptr<CachedMacro> {
return std::make_unique<HLE_DrawIndexed>(maxwell3d__);
}));
builders.emplace(0x771BB18C62444DA0ULL,
std::function<std::unique_ptr<CachedMacro>(Maxwell3D&)>(
[](Maxwell3D& maxwell3d__) -> std::unique_ptr<CachedMacro> {

@@ -6,11 +6,13 @@
#include "common/alignment.h"
#include "common/assert.h"
#include "common/logging/log.h"
#include "common/settings.h"
#include "core/core.h"
#include "core/device_memory.h"
#include "core/hle/kernel/k_page_table.h"
#include "core/hle/kernel/k_process.h"
#include "core/memory.h"
#include "video_core/invalidation_accumulator.h"
#include "video_core/memory_manager.h"
#include "video_core/rasterizer_interface.h"
#include "video_core/renderer_base.h"
@@ -26,7 +28,8 @@ MemoryManager::MemoryManager(Core::System& system_, u64 address_space_bits_, u64
entries{}, big_entries{}, page_table{address_space_bits, address_space_bits + page_bits - 38,
page_bits != big_page_bits ? page_bits : 0},
kind_map{PTEKind::INVALID}, unique_identifier{unique_identifier_generator.fetch_add(
1, std::memory_order_acq_rel)} {
1, std::memory_order_acq_rel)},
accumulator{std::make_unique<VideoCommon::InvalidationAccumulator>()} {
address_space_size = 1ULL << address_space_bits;
page_size = 1ULL << page_bits;
page_mask = page_size - 1ULL;
@@ -43,6 +46,11 @@ MemoryManager::MemoryManager(Core::System& system_, u64 address_space_bits_, u64
big_page_table_cpu.resize(big_page_table_size);
big_page_continous.resize(big_page_table_size / continous_bits, 0);
entries.resize(page_table_size / 32, 0);
if (!Settings::IsGPULevelExtreme() && Settings::IsFastmemEnabled()) {
fastmem_arena = system.DeviceMemory().buffer.VirtualBasePointer();
} else {
fastmem_arena = nullptr;
}
}
MemoryManager::~MemoryManager() = default;
@@ -185,15 +193,12 @@ void MemoryManager::Unmap(GPUVAddr gpu_addr, std::size_t size) {
if (size == 0) {
return;
}
const auto submapped_ranges = GetSubmappedRange(gpu_addr, size);
GetSubmappedRangeImpl<false>(gpu_addr, size, page_stash);
for (const auto& [map_addr, map_size] : submapped_ranges) {
// Flush and invalidate through the GPU interface, to be asynchronous if possible.
const std::optional<VAddr> cpu_addr = GpuToCpuAddress(map_addr);
ASSERT(cpu_addr);
rasterizer->UnmapMemory(*cpu_addr, map_size);
for (const auto& [map_addr, map_size] : page_stash) {
rasterizer->UnmapMemory(map_addr, map_size);
}
page_stash.clear();
BigPageTableOp<EntryType::Free>(gpu_addr, 0, size, PTEKind::INVALID);
PageTableOp<EntryType::Free>(gpu_addr, 0, size, PTEKind::INVALID);
@@ -355,7 +360,7 @@ inline void MemoryManager::MemoryOperation(GPUVAddr gpu_src_addr, std::size_t si
}
}
template <bool is_safe>
template <bool is_safe, bool use_fastmem>
void MemoryManager::ReadBlockImpl(GPUVAddr gpu_src_addr, void* dest_buffer, std::size_t size,
[[maybe_unused]] VideoCommon::CacheType which) const {
auto set_to_zero = [&]([[maybe_unused]] std::size_t page_index,
@@ -369,8 +374,12 @@ void MemoryManager::ReadBlockImpl(GPUVAddr gpu_src_addr, void* dest_buffer, std:
if constexpr (is_safe) {
rasterizer->FlushRegion(cpu_addr_base, copy_amount, which);
}
u8* physical = memory.GetPointer(cpu_addr_base);
std::memcpy(dest_buffer, physical, copy_amount);
if constexpr (use_fastmem) {
std::memcpy(dest_buffer, &fastmem_arena[cpu_addr_base], copy_amount);
} else {
u8* physical = memory.GetPointer(cpu_addr_base);
std::memcpy(dest_buffer, physical, copy_amount);
}
dest_buffer = static_cast<u8*>(dest_buffer) + copy_amount;
};
auto mapped_big = [&](std::size_t page_index, std::size_t offset, std::size_t copy_amount) {
@@ -379,11 +388,15 @@ void MemoryManager::ReadBlockImpl(GPUVAddr gpu_src_addr, void* dest_buffer, std:
if constexpr (is_safe) {
rasterizer->FlushRegion(cpu_addr_base, copy_amount, which);
}
if (!IsBigPageContinous(page_index)) [[unlikely]] {
memory.ReadBlockUnsafe(cpu_addr_base, dest_buffer, copy_amount);
if constexpr (use_fastmem) {
std::memcpy(dest_buffer, &fastmem_arena[cpu_addr_base], copy_amount);
} else {
u8* physical = memory.GetPointer(cpu_addr_base);
std::memcpy(dest_buffer, physical, copy_amount);
if (!IsBigPageContinous(page_index)) [[unlikely]] {
memory.ReadBlockUnsafe(cpu_addr_base, dest_buffer, copy_amount);
} else {
u8* physical = memory.GetPointer(cpu_addr_base);
std::memcpy(dest_buffer, physical, copy_amount);
}
}
dest_buffer = static_cast<u8*>(dest_buffer) + copy_amount;
};
@@ -397,12 +410,20 @@ void MemoryManager::ReadBlockImpl(GPUVAddr gpu_src_addr, void* dest_buffer, std:
void MemoryManager::ReadBlock(GPUVAddr gpu_src_addr, void* dest_buffer, std::size_t size,
VideoCommon::CacheType which) const {
ReadBlockImpl<true>(gpu_src_addr, dest_buffer, size, which);
if (fastmem_arena) [[likely]] {
ReadBlockImpl<true, true>(gpu_src_addr, dest_buffer, size, which);
return;
}
ReadBlockImpl<true, false>(gpu_src_addr, dest_buffer, size, which);
}
void MemoryManager::ReadBlockUnsafe(GPUVAddr gpu_src_addr, void* dest_buffer,
const std::size_t size) const {
ReadBlockImpl<false>(gpu_src_addr, dest_buffer, size, VideoCommon::CacheType::None);
if (fastmem_arena) [[likely]] {
ReadBlockImpl<false, true>(gpu_src_addr, dest_buffer, size, VideoCommon::CacheType::None);
return;
}
ReadBlockImpl<false, false>(gpu_src_addr, dest_buffer, size, VideoCommon::CacheType::None);
}
template <bool is_safe>
@@ -454,6 +475,12 @@ void MemoryManager::WriteBlockUnsafe(GPUVAddr gpu_dest_addr, const void* src_buf
WriteBlockImpl<false>(gpu_dest_addr, src_buffer, size, VideoCommon::CacheType::None);
}
void MemoryManager::WriteBlockCached(GPUVAddr gpu_dest_addr, const void* src_buffer,
std::size_t size) {
WriteBlockImpl<false>(gpu_dest_addr, src_buffer, size, VideoCommon::CacheType::None);
accumulator->Add(gpu_dest_addr, size);
}
void MemoryManager::FlushRegion(GPUVAddr gpu_addr, size_t size,
VideoCommon::CacheType which) const {
auto do_nothing = [&]([[maybe_unused]] std::size_t page_index,
@@ -663,7 +690,17 @@ bool MemoryManager::IsFullyMappedRange(GPUVAddr gpu_addr, std::size_t size) cons
std::vector<std::pair<GPUVAddr, std::size_t>> MemoryManager::GetSubmappedRange(
GPUVAddr gpu_addr, std::size_t size) const {
std::vector<std::pair<GPUVAddr, std::size_t>> result{};
std::optional<std::pair<GPUVAddr, std::size_t>> last_segment{};
GetSubmappedRangeImpl<true>(gpu_addr, size, result);
return result;
}
template <bool is_gpu_address>
void MemoryManager::GetSubmappedRangeImpl(
GPUVAddr gpu_addr, std::size_t size,
std::vector<std::pair<std::conditional_t<is_gpu_address, GPUVAddr, VAddr>, std::size_t>>&
result) const {
std::optional<std::pair<std::conditional_t<is_gpu_address, GPUVAddr, VAddr>, std::size_t>>
last_segment{};
std::optional<VAddr> old_page_addr{};
const auto split = [&last_segment, &result]([[maybe_unused]] std::size_t page_index,
[[maybe_unused]] std::size_t offset,
@@ -685,8 +722,12 @@ std::vector<std::pair<GPUVAddr, std::size_t>> MemoryManager::GetSubmappedRange(
}
old_page_addr = {cpu_addr_base + copy_amount};
if (!last_segment) {
const GPUVAddr new_base_addr = (page_index << big_page_bits) + offset;
last_segment = {new_base_addr, copy_amount};
if constexpr (is_gpu_address) {
const GPUVAddr new_base_addr = (page_index << big_page_bits) + offset;
last_segment = {new_base_addr, copy_amount};
} else {
last_segment = {cpu_addr_base, copy_amount};
}
} else {
last_segment->second += copy_amount;
}
@@ -703,8 +744,12 @@ std::vector<std::pair<GPUVAddr, std::size_t>> MemoryManager::GetSubmappedRange(
}
old_page_addr = {cpu_addr_base + copy_amount};
if (!last_segment) {
const GPUVAddr new_base_addr = (page_index << page_bits) + offset;
last_segment = {new_base_addr, copy_amount};
if constexpr (is_gpu_address) {
const GPUVAddr new_base_addr = (page_index << page_bits) + offset;
last_segment = {new_base_addr, copy_amount};
} else {
last_segment = {cpu_addr_base, copy_amount};
}
} else {
last_segment->second += copy_amount;
}
@@ -715,7 +760,18 @@ std::vector<std::pair<GPUVAddr, std::size_t>> MemoryManager::GetSubmappedRange(
};
MemoryOperation<true>(gpu_addr, size, extend_size_big, split, do_short_pages);
split(0, 0, 0);
return result;
}
void MemoryManager::FlushCaching() {
if (!accumulator->AnyAccumulated()) {
return;
}
accumulator->Callback([this](GPUVAddr addr, size_t size) {
GetSubmappedRangeImpl<false>(addr, size, page_stash);
});
rasterizer->InnerInvalidation(page_stash);
page_stash.clear();
accumulator->Clear();
}
} // namespace Tegra

@@ -19,6 +19,10 @@ namespace VideoCore {
class RasterizerInterface;
}
namespace VideoCommon {
class InvalidationAccumulator;
}
namespace Core {
class DeviceMemory;
namespace Memory {
@@ -80,6 +84,7 @@ public:
*/
void ReadBlockUnsafe(GPUVAddr gpu_src_addr, void* dest_buffer, std::size_t size) const;
void WriteBlockUnsafe(GPUVAddr gpu_dest_addr, const void* src_buffer, std::size_t size);
void WriteBlockCached(GPUVAddr gpu_dest_addr, const void* src_buffer, std::size_t size);
/**
* Checks if a gpu region can be simply read with a pointer.
@@ -129,12 +134,14 @@ public:
size_t GetMemoryLayoutSize(GPUVAddr gpu_addr,
size_t max_size = std::numeric_limits<size_t>::max()) const;
void FlushCaching();
private:
template <bool is_big_pages, typename FuncMapped, typename FuncReserved, typename FuncUnmapped>
inline void MemoryOperation(GPUVAddr gpu_src_addr, std::size_t size, FuncMapped&& func_mapped,
FuncReserved&& func_reserved, FuncUnmapped&& func_unmapped) const;
template <bool is_safe>
template <bool is_safe, bool use_fastmem>
void ReadBlockImpl(GPUVAddr gpu_src_addr, void* dest_buffer, std::size_t size,
VideoCommon::CacheType which) const;
@@ -154,6 +161,12 @@ private:
inline bool IsBigPageContinous(size_t big_page_index) const;
inline void SetBigPageContinous(size_t big_page_index, bool value);
template <bool is_gpu_address>
void GetSubmappedRangeImpl(
GPUVAddr gpu_addr, std::size_t size,
std::vector<std::pair<std::conditional_t<is_gpu_address, GPUVAddr, VAddr>, std::size_t>>&
result) const;
Core::System& system;
Core::Memory::Memory& memory;
Core::DeviceMemory& device_memory;
@@ -201,10 +214,13 @@ private:
Common::VirtualBuffer<u32> big_page_table_cpu;
std::vector<u64> big_page_continous;
std::vector<std::pair<VAddr, std::size_t>> page_stash{};
u8* fastmem_arena{};
constexpr static size_t continous_bits = 64;
const size_t unique_identifier;
std::unique_ptr<VideoCommon::InvalidationAccumulator> accumulator;
static std::atomic<size_t> unique_identifier_generator;
};

@@ -6,6 +6,7 @@
#include <functional>
#include <optional>
#include <span>
#include <utility>
#include "common/common_types.h"
#include "common/polyfill_thread.h"
#include "video_core/cache_types.h"
@@ -46,9 +47,6 @@ public:
/// Dispatches an indirect draw invocation
virtual void DrawIndirect() {}
/// Dispatches an draw texture invocation
virtual void DrawTexture() = 0;
/// Clear the current framebuffer
virtual void Clear(u32 layer_count) = 0;
@@ -98,6 +96,12 @@ public:
virtual void InvalidateRegion(VAddr addr, u64 size,
VideoCommon::CacheType which = VideoCommon::CacheType::All) = 0;
virtual void InnerInvalidation(std::span<const std::pair<VAddr, std::size_t>> sequences) {
for (const auto& [cpu_addr, size] : sequences) {
InvalidateRegion(cpu_addr, size);
}
}
/// Notify rasterizer that any caches of the specified region are desync with guest
virtual void OnCPUWrite(VAddr addr, u64 size) = 0;

@@ -21,7 +21,6 @@ RasterizerNull::RasterizerNull(Core::Memory::Memory& cpu_memory_, Tegra::GPU& gp
RasterizerNull::~RasterizerNull() = default;
void RasterizerNull::Draw(bool is_indexed, u32 instance_count) {}
void RasterizerNull::DrawTexture() {}
void RasterizerNull::Clear(u32 layer_count) {}
void RasterizerNull::DispatchCompute() {}
void RasterizerNull::ResetCounter(VideoCore::QueryType type) {}

@@ -31,7 +31,6 @@ public:
~RasterizerNull() override;
void Draw(bool is_indexed, u32 instance_count) override;
void DrawTexture() override;
void Clear(u32 layer_count) override;
void DispatchCompute() override;
void ResetCounter(VideoCore::QueryType type) override;

@@ -1,59 +0,0 @@
// SPDX-FileCopyrightText: Copyright 2023 yuzu Emulator Project
// SPDX-License-Identifier: GPL-2.0-or-later
#include <algorithm>
#include "video_core/host_shaders/blit_color_float_frag.h"
#include "video_core/host_shaders/full_screen_triangle_vert.h"
#include "video_core/renderer_opengl/blit_image.h"
#include "video_core/renderer_opengl/gl_shader_manager.h"
#include "video_core/renderer_opengl/gl_shader_util.h"
namespace OpenGL {
BlitImageHelper::BlitImageHelper(ProgramManager& program_manager_)
: program_manager(program_manager_),
full_screen_vert(CreateProgram(HostShaders::FULL_SCREEN_TRIANGLE_VERT, GL_VERTEX_SHADER)),
blit_color_to_color_frag(
CreateProgram(HostShaders::BLIT_COLOR_FLOAT_FRAG, GL_FRAGMENT_SHADER)) {}
BlitImageHelper::~BlitImageHelper() = default;
void BlitImageHelper::BlitColor(GLuint dst_framebuffer, GLuint src_image_view, GLuint src_sampler,
const Region2D& dst_region, const Region2D& src_region,
const Extent3D& src_size) {
glEnable(GL_CULL_FACE);
glDisable(GL_COLOR_LOGIC_OP);
glDisable(GL_DEPTH_TEST);
glDisable(GL_STENCIL_TEST);
glDisable(GL_POLYGON_OFFSET_FILL);
glDisable(GL_RASTERIZER_DISCARD);
glDisable(GL_ALPHA_TEST);
glDisablei(GL_BLEND, 0);
glPolygonMode(GL_FRONT_AND_BACK, GL_FILL);
glCullFace(GL_BACK);
glFrontFace(GL_CW);
glColorMaski(0, GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
glDepthRangeIndexed(0, 0.0, 0.0);
program_manager.BindPresentPrograms(full_screen_vert.handle, blit_color_to_color_frag.handle);
glProgramUniform2f(full_screen_vert.handle, 0,
static_cast<float>(src_region.end.x - src_region.start.x) /
static_cast<float>(src_size.width),
static_cast<float>(src_region.end.y - src_region.start.y) /
static_cast<float>(src_size.height));
glProgramUniform2f(full_screen_vert.handle, 1,
static_cast<float>(src_region.start.x) / static_cast<float>(src_size.width),
static_cast<float>(src_region.start.y) /
static_cast<float>(src_size.height));
glViewport(std::min(dst_region.start.x, dst_region.end.x),
std::min(dst_region.start.y, dst_region.end.y),
std::abs(dst_region.end.x - dst_region.start.x),
std::abs(dst_region.end.y - dst_region.start.y));
glBindFramebuffer(GL_DRAW_FRAMEBUFFER, dst_framebuffer);
glBindSampler(0, src_sampler);
glBindTextureUnit(0, src_image_view);
glClear(GL_COLOR_BUFFER_BIT);
glDrawArrays(GL_TRIANGLES, 0, 3);
}
} // namespace OpenGL

@@ -1,38 +0,0 @@
// SPDX-FileCopyrightText: Copyright 2023 yuzu Emulator Project
// SPDX-License-Identifier: GPL-2.0-or-later
#pragma once
#include <glad/glad.h>
#include "video_core/engines/fermi_2d.h"
#include "video_core/renderer_opengl/gl_resource_manager.h"
#include "video_core/texture_cache/types.h"
namespace OpenGL {
using VideoCommon::Extent3D;
using VideoCommon::Offset2D;
using VideoCommon::Region2D;
class ProgramManager;
class Framebuffer;
class ImageView;
class BlitImageHelper {
public:
explicit BlitImageHelper(ProgramManager& program_manager);
~BlitImageHelper();
void BlitColor(GLuint dst_framebuffer, GLuint src_image_view, GLuint src_sampler,
const Region2D& dst_region, const Region2D& src_region,
const Extent3D& src_size);
private:
ProgramManager& program_manager;
OGLProgram full_screen_vert;
OGLProgram blit_color_to_color_frag;
};
} // namespace OpenGL

@@ -160,6 +160,10 @@ public:
return device.CanReportMemoryUsage();
}
u32 GetStorageBufferAlignment() const {
return static_cast<u32>(device.GetShaderStorageBufferAlignment());
}
private:
static constexpr std::array PABO_LUT{
GL_VERTEX_PROGRAM_PARAMETER_BUFFER_NV, GL_TESS_CONTROL_PROGRAM_PARAMETER_BUFFER_NV,

@@ -166,7 +166,6 @@ Device::Device(Core::Frontend::EmuWindow& emu_window) {
has_shader_int64 = HasExtension(extensions, "GL_ARB_gpu_shader_int64");
has_amd_shader_half_float = GLAD_GL_AMD_gpu_shader_half_float;
has_sparse_texture_2 = GLAD_GL_ARB_sparse_texture2;
has_draw_texture = GLAD_GL_NV_draw_texture;
warp_size_potentially_larger_than_guest = !is_nvidia && !is_intel;
need_fastmath_off = is_nvidia;
can_report_memory = GLAD_GL_NVX_gpu_memory_info;

@@ -4,8 +4,6 @@
#pragma once
#include <cstddef>
#include <string>
#include "common/common_types.h"
#include "core/frontend/emu_window.h"
#include "shader_recompiler/stage.h"
@@ -148,10 +146,6 @@ public:
return has_sparse_texture_2;
}
bool HasDrawTexture() const {
return has_draw_texture;
}
bool IsWarpSizePotentiallyLargerThanGuest() const {
return warp_size_potentially_larger_than_guest;
}
@@ -222,7 +216,6 @@ private:
bool has_shader_int64{};
bool has_amd_shader_half_float{};
bool has_sparse_texture_2{};
bool has_draw_texture{};
bool warp_size_potentially_larger_than_guest{};
bool need_fastmath_off{};
bool has_cbuf_ftou_bug{};

@@ -64,8 +64,7 @@ RasterizerOpenGL::RasterizerOpenGL(Core::Frontend::EmuWindow& emu_window_, Tegra
shader_cache(*this, emu_window_, device, texture_cache, buffer_cache, program_manager,
state_tracker, gpu.ShaderNotify()),
query_cache(*this), accelerate_dma(buffer_cache),
fence_manager(*this, gpu, texture_cache, buffer_cache, query_cache),
blit_image(program_manager_) {}
fence_manager(*this, gpu, texture_cache, buffer_cache, query_cache) {}
RasterizerOpenGL::~RasterizerOpenGL() = default;
@@ -319,47 +318,6 @@ void RasterizerOpenGL::DrawIndirect() {
buffer_cache.SetDrawIndirect(nullptr);
}
void RasterizerOpenGL::DrawTexture() {
MICROPROFILE_SCOPE(OpenGL_Drawing);
SCOPE_EXIT({ gpu.TickWork(); });
query_cache.UpdateCounters();
texture_cache.SynchronizeGraphicsDescriptors();
texture_cache.UpdateRenderTargets(false);
SyncState();
const auto& draw_texture_state = maxwell3d->draw_manager->GetDrawTextureState();
const auto& sampler = texture_cache.GetGraphicsSampler(draw_texture_state.src_sampler);
const auto& texture = texture_cache.GetImageView(draw_texture_state.src_texture);
if (device.HasDrawTexture()) {
state_tracker.BindFramebuffer(texture_cache.GetFramebuffer()->Handle());
glDrawTextureNV(texture.DefaultHandle(), sampler->Handle(), draw_texture_state.dst_x0,
draw_texture_state.dst_y0, draw_texture_state.dst_x1,
draw_texture_state.dst_y1, 0,
draw_texture_state.src_x0 / static_cast<float>(texture.size.width),
draw_texture_state.src_y0 / static_cast<float>(texture.size.height),
draw_texture_state.src_x1 / static_cast<float>(texture.size.width),
draw_texture_state.src_y1 / static_cast<float>(texture.size.height));
} else {
Region2D dst_region = {Offset2D{.x = static_cast<s32>(draw_texture_state.dst_x0),
.y = static_cast<s32>(draw_texture_state.dst_y0)},
Offset2D{.x = static_cast<s32>(draw_texture_state.dst_x1),
.y = static_cast<s32>(draw_texture_state.dst_y1)}};
Region2D src_region = {Offset2D{.x = static_cast<s32>(draw_texture_state.src_x0),
.y = static_cast<s32>(draw_texture_state.src_y0)},
Offset2D{.x = static_cast<s32>(draw_texture_state.src_x1),
.y = static_cast<s32>(draw_texture_state.src_y1)}};
blit_image.BlitColor(texture_cache.GetFramebuffer()->Handle(), texture.DefaultHandle(),
sampler->Handle(), dst_region, src_region, texture.size);
}
++num_queued_commands;
}
void RasterizerOpenGL::DispatchCompute() {
ComputePipeline* const pipeline{shader_cache.CurrentComputePipeline()};
if (!pipeline) {

@@ -16,7 +16,6 @@
#include "video_core/engines/maxwell_dma.h"
#include "video_core/rasterizer_accelerated.h"
#include "video_core/rasterizer_interface.h"
#include "video_core/renderer_opengl/blit_image.h"
#include "video_core/renderer_opengl/gl_buffer_cache.h"
#include "video_core/renderer_opengl/gl_device.h"
#include "video_core/renderer_opengl/gl_fence_manager.h"
@@ -71,7 +70,6 @@ public:
void Draw(bool is_indexed, u32 instance_count) override;
void DrawIndirect() override;
void DrawTexture() override;
void Clear(u32 layer_count) override;
void DispatchCompute() override;
void ResetCounter(VideoCore::QueryType type) override;
@@ -226,8 +224,6 @@ private:
AccelerateDMA accelerate_dma;
FenceManagerOpenGL fence_manager;
BlitImageHelper blit_image;
boost::container::static_vector<u32, MAX_IMAGE_VIEWS> image_view_indices;
std::array<ImageViewId, MAX_IMAGE_VIEWS> image_view_ids;
boost::container::static_vector<GLuint, MAX_TEXTURES> sampler_handles;

@@ -236,6 +236,8 @@ ShaderCache::ShaderCache(RasterizerOpenGL& rasterizer_, Core::Frontend::EmuWindo
.needs_demote_reorder = device.IsAmd(),
.support_snorm_render_buffer = false,
.support_viewport_index_layer = device.HasVertexViewportLayer(),
.min_ssbo_alignment = static_cast<u32>(device.GetShaderStorageBufferAlignment()),
.support_geometry_shader_passthrough = device.HasGeometryShaderPassthrough(),
} {
if (use_asynchronous_shaders) {
workers = CreateWorkers();

@@ -1,123 +1,2 @@
// SPDX-FileCopyrightText: Copyright 2018 yuzu Emulator Project
// SPDX-License-Identifier: GPL-2.0-or-later
#include <glad/glad.h>
#include "video_core/renderer_opengl/gl_shader_manager.h"
namespace OpenGL {
static constexpr std::array ASSEMBLY_PROGRAM_ENUMS{
GL_VERTEX_PROGRAM_NV, GL_TESS_CONTROL_PROGRAM_NV, GL_TESS_EVALUATION_PROGRAM_NV,
GL_GEOMETRY_PROGRAM_NV, GL_FRAGMENT_PROGRAM_NV,
};
ProgramManager::ProgramManager(const Device& device) {
glCreateProgramPipelines(1, &pipeline.handle);
if (device.UseAssemblyShaders()) {
glEnable(GL_COMPUTE_PROGRAM_NV);
}
}
void ProgramManager::BindComputeProgram(GLuint program) {
glUseProgram(program);
is_compute_bound = true;
}
void ProgramManager::BindComputeAssemblyProgram(GLuint program) {
if (current_assembly_compute_program != program) {
current_assembly_compute_program = program;
glBindProgramARB(GL_COMPUTE_PROGRAM_NV, program);
}
UnbindPipeline();
}
void ProgramManager::BindSourcePrograms(std::span<const OGLProgram, NUM_STAGES> programs) {
static constexpr std::array<GLenum, 5> stage_enums{
GL_VERTEX_SHADER_BIT, GL_TESS_CONTROL_SHADER_BIT, GL_TESS_EVALUATION_SHADER_BIT,
GL_GEOMETRY_SHADER_BIT, GL_FRAGMENT_SHADER_BIT,
};
for (size_t stage = 0; stage < NUM_STAGES; ++stage) {
if (current_programs[stage] != programs[stage].handle) {
current_programs[stage] = programs[stage].handle;
glUseProgramStages(pipeline.handle, stage_enums[stage], programs[stage].handle);
}
}
BindPipeline();
}
void ProgramManager::BindPresentPrograms(GLuint vertex, GLuint fragment) {
if (current_programs[0] != vertex) {
current_programs[0] = vertex;
glUseProgramStages(pipeline.handle, GL_VERTEX_SHADER_BIT, vertex);
}
if (current_programs[4] != fragment) {
current_programs[4] = fragment;
glUseProgramStages(pipeline.handle, GL_FRAGMENT_SHADER_BIT, fragment);
}
glUseProgramStages(
pipeline.handle,
GL_TESS_CONTROL_SHADER_BIT | GL_TESS_EVALUATION_SHADER_BIT | GL_GEOMETRY_SHADER_BIT, 0);
current_programs[1] = 0;
current_programs[2] = 0;
current_programs[3] = 0;
if (current_stage_mask != 0) {
current_stage_mask = 0;
for (const GLenum program_type : ASSEMBLY_PROGRAM_ENUMS) {
glDisable(program_type);
}
}
BindPipeline();
}
void ProgramManager::BindAssemblyPrograms(std::span<const OGLAssemblyProgram, NUM_STAGES> programs,
u32 stage_mask) {
const u32 changed_mask = current_stage_mask ^ stage_mask;
current_stage_mask = stage_mask;
if (changed_mask != 0) {
for (size_t stage = 0; stage < NUM_STAGES; ++stage) {
if (((changed_mask >> stage) & 1) != 0) {
if (((stage_mask >> stage) & 1) != 0) {
glEnable(ASSEMBLY_PROGRAM_ENUMS[stage]);
} else {
glDisable(ASSEMBLY_PROGRAM_ENUMS[stage]);
}
}
}
}
for (size_t stage = 0; stage < NUM_STAGES; ++stage) {
if (current_programs[stage] != programs[stage].handle) {
current_programs[stage] = programs[stage].handle;
glBindProgramARB(ASSEMBLY_PROGRAM_ENUMS[stage], programs[stage].handle);
}
}
UnbindPipeline();
}
void ProgramManager::RestoreGuestCompute() {}
void ProgramManager::BindPipeline() {
if (!is_pipeline_bound) {
is_pipeline_bound = true;
glBindProgramPipeline(pipeline.handle);
}
UnbindCompute();
}
void ProgramManager::UnbindPipeline() {
if (is_pipeline_bound) {
is_pipeline_bound = false;
glBindProgramPipeline(0);
}
UnbindCompute();
}
void ProgramManager::UnbindCompute() {
if (is_compute_bound) {
is_compute_bound = false;
glUseProgram(0);
}
}
} // namespace OpenGL
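
The `BindAssemblyPrograms` routine above XORs the cached stage mask with the requested one, so only stages whose enabled state actually changed are toggled. A minimal sketch of that idea follows, assuming a hypothetical `StageToggler` class that records the toggles instead of calling `glEnable`/`glDisable`; the names `ToggleCall`, `StageToggler`, and `kNumStages` are illustrative and not part of the source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record of one enable/disable request (stands in for a GL call).
struct ToggleCall {
    std::size_t stage;
    bool enable;
};

constexpr std::size_t kNumStages = 5; // assumption: five graphics stages, as in the diff

class StageToggler {
public:
    // Returns the toggles needed to move from the cached mask to new_mask.
    std::vector<ToggleCall> Update(std::uint32_t new_mask) {
        // Bits set in `changed` are the stages whose state differs from the cache.
        const std::uint32_t changed = current_mask ^ new_mask;
        current_mask = new_mask;

        std::vector<ToggleCall> calls;
        for (std::size_t stage = 0; stage < kNumStages; ++stage) {
            if (((changed >> stage) & 1u) == 0) {
                continue; // state unchanged, no call needed
            }
            const bool enable = ((new_mask >> stage) & 1u) != 0;
            calls.push_back(ToggleCall{stage, enable});
        }
        return calls;
    }

private:
    std::uint32_t current_mask = 0;
};
```

Calling `Update` twice with the same mask yields an empty list the second time, which is the redundant-state filtering the cached mask is there to provide.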

@@ -6,6 +6,8 @@
#include <array>
#include <span>
#include <glad/glad.h>
#include "video_core/renderer_opengl/gl_device.h"
#include "video_core/renderer_opengl/gl_resource_manager.h"
@@ -14,28 +16,121 @@ namespace OpenGL {
class ProgramManager {
static constexpr size_t NUM_STAGES = 5;
static constexpr std::array ASSEMBLY_PROGRAM_ENUMS{
GL_VERTEX_PROGRAM_NV, GL_TESS_CONTROL_PROGRAM_NV, GL_TESS_EVALUATION_PROGRAM_NV,
GL_GEOMETRY_PROGRAM_NV, GL_FRAGMENT_PROGRAM_NV,
};
public:
explicit ProgramManager(const Device& device);
explicit ProgramManager(const Device& device) {
glCreateProgramPipelines(1, &pipeline.handle);
if (device.UseAssemblyShaders()) {
glEnable(GL_COMPUTE_PROGRAM_NV);
}
}
void BindComputeProgram(GLuint program);
void BindComputeProgram(GLuint program) {
glUseProgram(program);
is_compute_bound = true;
}
void BindComputeAssemblyProgram(GLuint program);
void BindComputeAssemblyProgram(GLuint program) {
if (current_assembly_compute_program != program) {
current_assembly_compute_program = program;
glBindProgramARB(GL_COMPUTE_PROGRAM_NV, program);
}
UnbindPipeline();
}
void BindSourcePrograms(std::span<const OGLProgram, NUM_STAGES> programs);
void BindSourcePrograms(std::span<const OGLProgram, NUM_STAGES> programs) {
static constexpr std::array<GLenum, 5> stage_enums{
GL_VERTEX_SHADER_BIT, GL_TESS_CONTROL_SHADER_BIT, GL_TESS_EVALUATION_SHADER_BIT,
GL_GEOMETRY_SHADER_BIT, GL_FRAGMENT_SHADER_BIT,
};
for (size_t stage = 0; stage < NUM_STAGES; ++stage) {
if (current_programs[stage] != programs[stage].handle) {
current_programs[stage] = programs[stage].handle;
glUseProgramStages(pipeline.handle, stage_enums[stage], programs[stage].handle);
}
}
BindPipeline();
}
void BindPresentPrograms(GLuint vertex, GLuint fragment);
void BindPresentPrograms(GLuint vertex, GLuint fragment) {
if (current_programs[0] != vertex) {
current_programs[0] = vertex;
glUseProgramStages(pipeline.handle, GL_VERTEX_SHADER_BIT, vertex);
}
if (current_programs[4] != fragment) {
current_programs[4] = fragment;
glUseProgramStages(pipeline.handle, GL_FRAGMENT_SHADER_BIT, fragment);
}
glUseProgramStages(
pipeline.handle,
GL_TESS_CONTROL_SHADER_BIT | GL_TESS_EVALUATION_SHADER_BIT | GL_GEOMETRY_SHADER_BIT, 0);
current_programs[1] = 0;
current_programs[2] = 0;
current_programs[3] = 0;
if (current_stage_mask != 0) {
current_stage_mask = 0;
for (const GLenum program_type : ASSEMBLY_PROGRAM_ENUMS) {
glDisable(program_type);
}
}
BindPipeline();
}
void BindAssemblyPrograms(std::span<const OGLAssemblyProgram, NUM_STAGES> programs,
u32 stage_mask);
u32 stage_mask) {
const u32 changed_mask = current_stage_mask ^ stage_mask;
current_stage_mask = stage_mask;
void RestoreGuestCompute();
if (changed_mask != 0) {
for (size_t stage = 0; stage < NUM_STAGES; ++stage) {
if (((changed_mask >> stage) & 1) != 0) {
if (((stage_mask >> stage) & 1) != 0) {
glEnable(ASSEMBLY_PROGRAM_ENUMS[stage]);
} else {
glDisable(ASSEMBLY_PROGRAM_ENUMS[stage]);
}
}
}
}
for (size_t stage = 0; stage < NUM_STAGES; ++stage) {
if (current_programs[stage] != programs[stage].handle) {
current_programs[stage] = programs[stage].handle;
glBindProgramARB(ASSEMBLY_PROGRAM_ENUMS[stage], programs[stage].handle);
}
}
UnbindPipeline();
}
void RestoreGuestCompute() {}
private:
void BindPipeline();
void BindPipeline() {
if (!is_pipeline_bound) {
is_pipeline_bound = true;
glBindProgramPipeline(pipeline.handle);
}
UnbindCompute();
}
void UnbindPipeline();
void UnbindPipeline() {
if (is_pipeline_bound) {
is_pipeline_bound = false;
glBindProgramPipeline(0);
}
UnbindCompute();
}
void UnbindCompute();
void UnbindCompute() {
if (is_compute_bound) {
is_compute_bound = false;
glUseProgram(0);
}
}
OGLPipeline pipeline;
bool is_pipeline_bound{};

@@ -442,7 +442,13 @@ void RendererOpenGL::DrawScreen(const Layout::FramebufferLayout& layout) {
glBindTextureUnit(0, screen_info.display_texture);
const auto anti_aliasing = Settings::values.anti_aliasing.GetValue();
auto anti_aliasing = Settings::values.anti_aliasing.GetValue();
if (anti_aliasing > Settings::AntiAliasing::LastAA) {
LOG_ERROR(Render_OpenGL, "Invalid antialiasing option selected {}", anti_aliasing);
anti_aliasing = Settings::AntiAliasing::None;
Settings::values.anti_aliasing.SetValue(anti_aliasing);
}
if (anti_aliasing != Settings::AntiAliasing::None) {
glEnablei(GL_SCISSOR_TEST, 0);
auto viewport_width = screen_info.texture.width;
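
The hunk above checks the configured antialiasing value against `Settings::AntiAliasing::LastAA` and falls back to `None` when it is out of range. A minimal sketch of that clamp in isolation follows, assuming a hypothetical enum whose enumerators only mirror the names visible in the diff; the real definition and the helper name `Sanitize` are assumptions.

```cpp
#include <cassert>

// Hypothetical stand-in for the settings enum; the real enumerators may differ.
enum class AntiAliasing : unsigned {
    None,
    Fxaa,
    Smaa,
    LastAA = Smaa, // sentinel naming the highest valid value
};

// Returns the value unchanged when it is in range, otherwise falls back to None.
constexpr AntiAliasing Sanitize(AntiAliasing value) {
    return value > AntiAliasing::LastAA ? AntiAliasing::None : value;
}
```

A value loaded from a stale or hand-edited config file, such as `static_cast<AntiAliasing>(7)`, maps to `None` here instead of selecting an unhandled filter.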

@@ -4,13 +4,13 @@
#include <algorithm>
#include "common/settings.h"
#include "video_core/host_shaders/blit_color_float_frag_spv.h"
#include "video_core/host_shaders/convert_abgr8_to_d24s8_frag_spv.h"
#include "video_core/host_shaders/convert_d24s8_to_abgr8_frag_spv.h"
#include "video_core/host_shaders/convert_depth_to_float_frag_spv.h"
#include "video_core/host_shaders/convert_float_to_depth_frag_spv.h"
#include "video_core/host_shaders/convert_s8d24_to_abgr8_frag_spv.h"
#include "video_core/host_shaders/full_screen_triangle_vert_spv.h"
#include "video_core/host_shaders/vulkan_blit_color_float_frag_spv.h"
#include "video_core/host_shaders/vulkan_blit_depth_stencil_frag_spv.h"
#include "video_core/renderer_vulkan/blit_image.h"
#include "video_core/renderer_vulkan/maxwell_to_vk.h"
@@ -303,7 +303,7 @@ void UpdateTwoTexturesDescriptorSet(const Device& device, VkDescriptorSet descri
}
void BindBlitState(vk::CommandBuffer cmdbuf, VkPipelineLayout layout, const Region2D& dst_region,
const Region2D& src_region, const Extent3D& src_size = {1, 1, 1}) {
const Region2D& src_region) {
const VkOffset2D offset{
.x = std::min(dst_region.start.x, dst_region.end.x),
.y = std::min(dst_region.start.y, dst_region.end.y),
@@ -325,15 +325,12 @@ void BindBlitState(vk::CommandBuffer cmdbuf, VkPipelineLayout layout, const Regi
.offset = offset,
.extent = extent,
};
const float scale_x = static_cast<float>(src_region.end.x - src_region.start.x) /
static_cast<float>(src_size.width);
const float scale_y = static_cast<float>(src_region.end.y - src_region.start.y) /
static_cast<float>(src_size.height);
const float scale_x = static_cast<float>(src_region.end.x - src_region.start.x);
const float scale_y = static_cast<float>(src_region.end.y - src_region.start.y);
const PushConstants push_constants{
.tex_scale = {scale_x, scale_y},
.tex_offset = {static_cast<float>(src_region.start.x) / static_cast<float>(src_size.width),
static_cast<float>(src_region.start.y) /
static_cast<float>(src_size.height)},
.tex_offset = {static_cast<float>(src_region.start.x),
static_cast<float>(src_region.start.y)},
};
cmdbuf.SetViewport(0, viewport);
cmdbuf.SetScissor(0, scissor);
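
The hunk above shows two ways of building the blit push constants: one divides the source region by the source image size, the other passes the raw differences. A minimal sketch of the divided variant follows, assuming hypothetical plain structs in place of the `VideoCommon` types; `TexTransform` and `NormalizedTransform` are illustrative names, not part of the source.

```cpp
#include <cassert>

// Hypothetical stand-ins for the region and extent types used in the diff.
struct Offset2D { int x; int y; };
struct Region2D { Offset2D start; Offset2D end; };
struct Extent2D { int width; int height; };

// Scale and offset applied to the full-screen triangle's texture coordinates.
struct TexTransform {
    float scale_x;
    float scale_y;
    float offset_x;
    float offset_y;
};

// Maps a source region given in texels to coordinates relative to the image size.
inline TexTransform NormalizedTransform(const Region2D& src, const Extent2D& size) {
    assert(size.width > 0 && size.height > 0); // avoid division by zero
    const float w = static_cast<float>(size.width);
    const float h = static_cast<float>(size.height);
    return TexTransform{
        static_cast<float>(src.end.x - src.start.x) / w,
        static_cast<float>(src.end.y - src.start.y) / h,
        static_cast<float>(src.start.x) / w,
        static_cast<float>(src.start.y) / h,
    };
}
```

Passing an extent of `{1, 1}` reproduces the undivided variant, which matches the `src_size = {1, 1, 1}` default argument visible in one of the two `BindBlitState` signatures.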
@@ -350,51 +347,6 @@ VkExtent2D GetConversionExtent(const ImageView& src_image_view) {
.height = is_rescaled ? resolution.ScaleUp(height) : height,
};
}
void TransitionImageLayout(vk::CommandBuffer& cmdbuf, VkImage image, VkImageLayout target_layout,
VkImageLayout source_layout = VK_IMAGE_LAYOUT_GENERAL) {
constexpr VkFlags flags{VK_ACCESS_COLOR_ATTACHMENT_READ_BIT |
VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT | VK_ACCESS_SHADER_READ_BIT};
const VkImageMemoryBarrier barrier{
.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,
.pNext = nullptr,
.srcAccessMask = flags,
.dstAccessMask = flags,
.oldLayout = source_layout,
.newLayout = target_layout,
.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
.image = image,
.subresourceRange{
.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT,
.baseMipLevel = 0,
.levelCount = 1,
.baseArrayLayer = 0,
.layerCount = 1,
},
};
cmdbuf.PipelineBarrier(VK_PIPELINE_STAGE_ALL_COMMANDS_BIT, VK_PIPELINE_STAGE_ALL_COMMANDS_BIT,
0, barrier);
}
void BeginRenderPass(vk::CommandBuffer& cmdbuf, const Framebuffer* framebuffer) {
const VkRenderPass render_pass = framebuffer->RenderPass();
const VkFramebuffer framebuffer_handle = framebuffer->Handle();
const VkExtent2D render_area = framebuffer->RenderArea();
const VkRenderPassBeginInfo renderpass_bi{
.sType = VK_STRUCTURE_TYPE_RENDER_PASS_BEGIN_INFO,
.pNext = nullptr,
.renderPass = render_pass,
.framebuffer = framebuffer_handle,
.renderArea{
.offset{},
.extent = render_area,
},
.clearValueCount = 0,
.pClearValues = nullptr,
};
cmdbuf.BeginRenderPass(renderpass_bi, VK_SUBPASS_CONTENTS_INLINE);
}
} // Anonymous namespace
BlitImageHelper::BlitImageHelper(const Device& device_, Scheduler& scheduler_,
@@ -413,7 +365,7 @@ BlitImageHelper::BlitImageHelper(const Device& device_, Scheduler& scheduler_,
two_textures_pipeline_layout(device.GetLogical().CreatePipelineLayout(
PipelineLayoutCreateInfo(two_textures_set_layout.address()))),
full_screen_vert(BuildShader(device, FULL_SCREEN_TRIANGLE_VERT_SPV)),
blit_color_to_color_frag(BuildShader(device, BLIT_COLOR_FLOAT_FRAG_SPV)),
blit_color_to_color_frag(BuildShader(device, VULKAN_BLIT_COLOR_FLOAT_FRAG_SPV)),
blit_depth_stencil_frag(BuildShader(device, VULKAN_BLIT_DEPTH_STENCIL_FRAG_SPV)),
convert_depth_to_float_frag(BuildShader(device, CONVERT_DEPTH_TO_FLOAT_FRAG_SPV)),
convert_float_to_depth_frag(BuildShader(device, CONVERT_FLOAT_TO_DEPTH_FRAG_SPV)),
@@ -452,32 +404,6 @@ void BlitImageHelper::BlitColor(const Framebuffer* dst_framebuffer, VkImageView
scheduler.InvalidateState();
}
void BlitImageHelper::BlitColor(const Framebuffer* dst_framebuffer, VkImageView src_image_view,
VkImage src_image, VkSampler src_sampler,
const Region2D& dst_region, const Region2D& src_region,
const Extent3D& src_size) {
const BlitImagePipelineKey key{
.renderpass = dst_framebuffer->RenderPass(),
.operation = Tegra::Engines::Fermi2D::Operation::SrcCopy,
};
const VkPipelineLayout layout = *one_texture_pipeline_layout;
const VkPipeline pipeline = FindOrEmplaceColorPipeline(key);
scheduler.RequestOutsideRenderPassOperationContext();
scheduler.Record([this, dst_framebuffer, src_image_view, src_image, src_sampler, dst_region,
src_region, src_size, pipeline, layout](vk::CommandBuffer cmdbuf) {
TransitionImageLayout(cmdbuf, src_image, VK_IMAGE_LAYOUT_READ_ONLY_OPTIMAL);
BeginRenderPass(cmdbuf, dst_framebuffer);
const VkDescriptorSet descriptor_set = one_texture_descriptor_allocator.Commit();
UpdateOneTextureDescriptorSet(device, descriptor_set, src_sampler, src_image_view);
cmdbuf.BindPipeline(VK_PIPELINE_BIND_POINT_GRAPHICS, pipeline);
cmdbuf.BindDescriptorSets(VK_PIPELINE_BIND_POINT_GRAPHICS, layout, 0, descriptor_set,
nullptr);
BindBlitState(cmdbuf, layout, dst_region, src_region, src_size);
cmdbuf.Draw(3, 1, 0, 0);
cmdbuf.EndRenderPass();
});
}
void BlitImageHelper::BlitDepthStencil(const Framebuffer* dst_framebuffer,
VkImageView src_depth_view, VkImageView src_stencil_view,
const Region2D& dst_region, const Region2D& src_region,

@@ -10,8 +10,6 @@
namespace Vulkan {
using VideoCommon::Extent3D;
using VideoCommon::Offset2D;
using VideoCommon::Region2D;
class Device;
@@ -38,10 +36,6 @@ public:
Tegra::Engines::Fermi2D::Filter filter,
Tegra::Engines::Fermi2D::Operation operation);
void BlitColor(const Framebuffer* dst_framebuffer, VkImageView src_image_view,
VkImage src_image, VkSampler src_sampler, const Region2D& dst_region,
const Region2D& src_region, const Extent3D& src_size);
void BlitDepthStencil(const Framebuffer* dst_framebuffer, VkImageView src_depth_view,
VkImageView src_stencil_view, const Region2D& dst_region,
const Region2D& src_region, Tegra::Engines::Fermi2D::Filter filter,

@@ -148,7 +148,7 @@ void FixedPipelineState::Refresh(Tegra::Engines::Maxwell3D& maxwell3d, DynamicFe
});
}
if (!extended_dynamic_state_2_extra) {
dynamic_state.Refresh2(regs, topology, extended_dynamic_state_2);
dynamic_state.Refresh2(regs, topology_, extended_dynamic_state_2);
}
if (!extended_dynamic_state_3_blend) {
if (maxwell3d.dirty.flags[Dirty::Blending]) {

@@ -78,6 +78,8 @@ std::string BuildCommaSeparatedExtensions(std::vector<std::string> available_ext
return separated_extensions;
}
} // Anonymous namespace
Device CreateDevice(const vk::Instance& instance, const vk::InstanceDispatch& dld,
VkSurfaceKHR surface) {
const std::vector<VkPhysicalDevice> devices = instance.EnumeratePhysicalDevices();
@@ -89,7 +91,6 @@ Device CreateDevice(const vk::Instance& instance, const vk::InstanceDispatch& dl
const vk::PhysicalDevice physical_device(devices[device_index], dld);
return Device(*instance, physical_device, surface, dld);
}
} // Anonymous namespace
RendererVulkan::RendererVulkan(Core::TelemetrySession& telemetry_session_,
Core::Frontend::EmuWindow& emu_window,
@@ -98,7 +99,7 @@ RendererVulkan::RendererVulkan(Core::TelemetrySession& telemetry_session_,
: RendererBase(emu_window, std::move(context_)), telemetry_session(telemetry_session_),
cpu_memory(cpu_memory_), gpu(gpu_), library(OpenLibrary()),
instance(CreateInstance(library, dld, VK_API_VERSION_1_1, render_window.GetWindowInfo().type,
true, Settings::values.renderer_debug.GetValue())),
Settings::values.renderer_debug.GetValue())),
debug_callback(Settings::values.renderer_debug ? CreateDebugCallback(instance) : nullptr),
surface(CreateSurface(instance, render_window)),
device(CreateDevice(instance, dld, *surface)), memory_allocator(device, false),
@@ -109,6 +110,9 @@ RendererVulkan::RendererVulkan(Core::TelemetrySession& telemetry_session_,
screen_info),
rasterizer(render_window, gpu, cpu_memory, screen_info, device, memory_allocator,
state_tracker, scheduler) {
if (Settings::values.renderer_force_max_clock.GetValue()) {
turbo_mode.emplace(instance, dld);
}
Report();
} catch (const vk::Exception& exception) {
LOG_ERROR(Render_Vulkan, "Vulkan initialization failed with error: {}", exception.what());

@@ -13,6 +13,7 @@
#include "video_core/renderer_vulkan/vk_scheduler.h"
#include "video_core/renderer_vulkan/vk_state_tracker.h"
#include "video_core/renderer_vulkan/vk_swapchain.h"
#include "video_core/renderer_vulkan/vk_turbo_mode.h"
#include "video_core/vulkan_common/vulkan_device.h"
#include "video_core/vulkan_common/vulkan_memory_allocator.h"
#include "video_core/vulkan_common/vulkan_wrapper.h"
@@ -31,6 +32,9 @@ class GPU;
namespace Vulkan {
Device CreateDevice(const vk::Instance& instance, const vk::InstanceDispatch& dld,
VkSurfaceKHR surface);
class RendererVulkan final : public VideoCore::RendererBase {
public:
explicit RendererVulkan(Core::TelemetrySession& telemtry_session,
@@ -74,6 +78,7 @@ private:
Swapchain swapchain;
BlitScreen blit_screen;
RasterizerVulkan rasterizer;
std::optional<TurboMode> turbo_mode;
};
} // namespace Vulkan

@@ -330,12 +330,19 @@ bool BufferCacheRuntime::CanReportMemoryUsage() const {
return device.CanReportMemoryUsage();
}
u32 BufferCacheRuntime::GetStorageBufferAlignment() const {
return static_cast<u32>(device.GetStorageBufferAlignment());
}
void BufferCacheRuntime::Finish() {
scheduler.Finish();
}
void BufferCacheRuntime::CopyBuffer(VkBuffer dst_buffer, VkBuffer src_buffer,
std::span<const VideoCommon::BufferCopy> copies, bool barrier) {
if (dst_buffer == VK_NULL_HANDLE || src_buffer == VK_NULL_HANDLE) {
return;
}
static constexpr VkMemoryBarrier READ_BARRIER{
.sType = VK_STRUCTURE_TYPE_MEMORY_BARRIER,
.pNext = nullptr,
@@ -394,6 +401,9 @@ void BufferCacheRuntime::PostCopyBarrier() {
}
void BufferCacheRuntime::ClearBuffer(VkBuffer dest_buffer, u32 offset, size_t size, u32 value) {
if (dest_buffer == VK_NULL_HANDLE) {
return;
}
static constexpr VkMemoryBarrier READ_BARRIER{
.sType = VK_STRUCTURE_TYPE_MEMORY_BARRIER,
.pNext = nullptr,
@@ -473,6 +483,11 @@ void BufferCacheRuntime::BindVertexBuffer(u32 index, VkBuffer buffer, u32 offset
cmdbuf.BindVertexBuffers2EXT(index, 1, &buffer, &vk_offset, &vk_size, &vk_stride);
});
} else {
if (!device.HasNullDescriptor() && buffer == VK_NULL_HANDLE) {
ReserveNullBuffer();
buffer = *null_buffer;
offset = 0;
}
scheduler.Record([index, buffer, offset](vk::CommandBuffer cmdbuf) {
cmdbuf.BindVertexBuffer(index, buffer, offset);
});

@@ -73,6 +73,8 @@ public:
bool CanReportMemoryUsage() const;
u32 GetStorageBufferAlignment() const;
[[nodiscard]] StagingBufferRef UploadStagingBuffer(size_t size);
[[nodiscard]] StagingBufferRef DownloadStagingBuffer(size_t size);

@@ -24,13 +24,15 @@ using Shader::ImageBufferDescriptor;
using Shader::Backend::SPIRV::RESCALING_LAYOUT_WORDS_OFFSET;
using Tegra::Texture::TexturePair;
ComputePipeline::ComputePipeline(const Device& device_, DescriptorPool& descriptor_pool,
ComputePipeline::ComputePipeline(const Device& device_, vk::PipelineCache& pipeline_cache_,
DescriptorPool& descriptor_pool,
UpdateDescriptorQueue& update_descriptor_queue_,
Common::ThreadWorker* thread_worker,
PipelineStatistics* pipeline_statistics,
VideoCore::ShaderNotify* shader_notify, const Shader::Info& info_,
vk::ShaderModule spv_module_)
: device{device_}, update_descriptor_queue{update_descriptor_queue_}, info{info_},
: device{device_}, pipeline_cache(pipeline_cache_),
update_descriptor_queue{update_descriptor_queue_}, info{info_},
spv_module(std::move(spv_module_)) {
if (shader_notify) {
shader_notify->MarkShaderBuilding();
@@ -56,23 +58,27 @@ ComputePipeline::ComputePipeline(const Device& device_, DescriptorPool& descript
if (device.IsKhrPipelineExecutablePropertiesEnabled()) {
flags |= VK_PIPELINE_CREATE_CAPTURE_STATISTICS_BIT_KHR;
}
pipeline = device.GetLogical().CreateComputePipeline({
.sType = VK_STRUCTURE_TYPE_COMPUTE_PIPELINE_CREATE_INFO,
.pNext = nullptr,
.flags = flags,
.stage{
.sType = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO,
.pNext = device.IsExtSubgroupSizeControlSupported() ? &subgroup_size_ci : nullptr,
.flags = 0,
.stage = VK_SHADER_STAGE_COMPUTE_BIT,
.module = *spv_module,
.pName = "main",
.pSpecializationInfo = nullptr,
pipeline = device.GetLogical().CreateComputePipeline(
{
.sType = VK_STRUCTURE_TYPE_COMPUTE_PIPELINE_CREATE_INFO,
.pNext = nullptr,
.flags = flags,
.stage{
.sType = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO,
.pNext =
device.IsExtSubgroupSizeControlSupported() ? &subgroup_size_ci : nullptr,
.flags = 0,
.stage = VK_SHADER_STAGE_COMPUTE_BIT,
.module = *spv_module,
.pName = "main",
.pSpecializationInfo = nullptr,
},
.layout = *pipeline_layout,
.basePipelineHandle = 0,
.basePipelineIndex = 0,
},
.layout = *pipeline_layout,
.basePipelineHandle = 0,
.basePipelineIndex = 0,
});
*pipeline_cache);
if (pipeline_statistics) {
pipeline_statistics->Collect(*pipeline);
}

@@ -28,7 +28,8 @@ class Scheduler;
class ComputePipeline {
public:
explicit ComputePipeline(const Device& device, DescriptorPool& descriptor_pool,
explicit ComputePipeline(const Device& device, vk::PipelineCache& pipeline_cache,
DescriptorPool& descriptor_pool,
UpdateDescriptorQueue& update_descriptor_queue,
Common::ThreadWorker* thread_worker,
PipelineStatistics* pipeline_statistics,
@@ -46,6 +47,7 @@ public:
private:
const Device& device;
vk::PipelineCache& pipeline_cache;
UpdateDescriptorQueue& update_descriptor_queue;
Shader::Info info;

@@ -234,13 +234,14 @@ ConfigureFuncPtr ConfigureFunc(const std::array<vk::ShaderModule, NUM_STAGES>& m
GraphicsPipeline::GraphicsPipeline(
Scheduler& scheduler_, BufferCache& buffer_cache_, TextureCache& texture_cache_,
VideoCore::ShaderNotify* shader_notify, const Device& device_, DescriptorPool& descriptor_pool,
vk::PipelineCache& pipeline_cache_, VideoCore::ShaderNotify* shader_notify,
const Device& device_, DescriptorPool& descriptor_pool,
UpdateDescriptorQueue& update_descriptor_queue_, Common::ThreadWorker* worker_thread,
PipelineStatistics* pipeline_statistics, RenderPassCache& render_pass_cache,
const GraphicsPipelineCacheKey& key_, std::array<vk::ShaderModule, NUM_STAGES> stages,
const std::array<const Shader::Info*, NUM_STAGES>& infos)
: key{key_}, device{device_}, texture_cache{texture_cache_},
buffer_cache{buffer_cache_}, scheduler{scheduler_},
: key{key_}, device{device_}, texture_cache{texture_cache_}, buffer_cache{buffer_cache_},
pipeline_cache(pipeline_cache_), scheduler{scheduler_},
update_descriptor_queue{update_descriptor_queue_}, spv_modules{std::move(stages)} {
if (shader_notify) {
shader_notify->MarkShaderBuilding();
@@ -644,12 +645,15 @@ void GraphicsPipeline::MakePipeline(VkRenderPass render_pass) {
.pNext = nullptr,
.flags = 0,
.topology = input_assembly_topology,
.primitiveRestartEnable = dynamic.primitive_restart_enable != 0 &&
((input_assembly_topology != VK_PRIMITIVE_TOPOLOGY_PATCH_LIST &&
device.IsTopologyListPrimitiveRestartSupported()) ||
SupportsPrimitiveRestart(input_assembly_topology) ||
(input_assembly_topology == VK_PRIMITIVE_TOPOLOGY_PATCH_LIST &&
device.IsPatchListPrimitiveRestartSupported())),
.primitiveRestartEnable =
dynamic.primitive_restart_enable != 0 &&
((input_assembly_topology != VK_PRIMITIVE_TOPOLOGY_PATCH_LIST &&
device.IsTopologyListPrimitiveRestartSupported()) ||
SupportsPrimitiveRestart(input_assembly_topology) ||
(input_assembly_topology == VK_PRIMITIVE_TOPOLOGY_PATCH_LIST &&
device.IsPatchListPrimitiveRestartSupported()))
? VK_TRUE
: VK_FALSE,
};
const VkPipelineTessellationStateCreateInfo tessellation_ci{
.sType = VK_STRUCTURE_TYPE_PIPELINE_TESSELLATION_STATE_CREATE_INFO,
@@ -699,7 +703,7 @@ void GraphicsPipeline::MakePipeline(VkRenderPass render_pass) {
.cullMode = static_cast<VkCullModeFlags>(
dynamic.cull_enable ? MaxwellToVK::CullFace(dynamic.CullFace()) : VK_CULL_MODE_NONE),
.frontFace = MaxwellToVK::FrontFace(dynamic.FrontFace()),
.depthBiasEnable = (dynamic.depth_bias_enable == 0 ? VK_TRUE : VK_FALSE),
.depthBiasEnable = (dynamic.depth_bias_enable != 0 ? VK_TRUE : VK_FALSE),
.depthBiasConstantFactor = 0.0f,
.depthBiasClamp = 0.0f,
.depthBiasSlopeFactor = 0.0f,
@@ -894,27 +898,29 @@ void GraphicsPipeline::MakePipeline(VkRenderPass render_pass) {
if (device.IsKhrPipelineExecutablePropertiesEnabled()) {
flags |= VK_PIPELINE_CREATE_CAPTURE_STATISTICS_BIT_KHR;
}
pipeline = device.GetLogical().CreateGraphicsPipeline({
.sType = VK_STRUCTURE_TYPE_GRAPHICS_PIPELINE_CREATE_INFO,
.pNext = nullptr,
.flags = flags,
.stageCount = static_cast<u32>(shader_stages.size()),
.pStages = shader_stages.data(),
.pVertexInputState = &vertex_input_ci,
.pInputAssemblyState = &input_assembly_ci,
.pTessellationState = &tessellation_ci,
.pViewportState = &viewport_ci,
.pRasterizationState = &rasterization_ci,
.pMultisampleState = &multisample_ci,
.pDepthStencilState = &depth_stencil_ci,
.pColorBlendState = &color_blend_ci,
.pDynamicState = &dynamic_state_ci,
.layout = *pipeline_layout,
.renderPass = render_pass,
.subpass = 0,
.basePipelineHandle = nullptr,
.basePipelineIndex = 0,
});
pipeline = device.GetLogical().CreateGraphicsPipeline(
{
.sType = VK_STRUCTURE_TYPE_GRAPHICS_PIPELINE_CREATE_INFO,
.pNext = nullptr,
.flags = flags,
.stageCount = static_cast<u32>(shader_stages.size()),
.pStages = shader_stages.data(),
.pVertexInputState = &vertex_input_ci,
.pInputAssemblyState = &input_assembly_ci,
.pTessellationState = &tessellation_ci,
.pViewportState = &viewport_ci,
.pRasterizationState = &rasterization_ci,
.pMultisampleState = &multisample_ci,
.pDepthStencilState = &depth_stencil_ci,
.pColorBlendState = &color_blend_ci,
.pDynamicState = &dynamic_state_ci,
.layout = *pipeline_layout,
.renderPass = render_pass,
.subpass = 0,
.basePipelineHandle = nullptr,
.basePipelineIndex = 0,
},
*pipeline_cache);
}
void GraphicsPipeline::Validate() {

@@ -70,16 +70,14 @@ class GraphicsPipeline {
static constexpr size_t NUM_STAGES = Tegra::Engines::Maxwell3D::Regs::MaxShaderStage;
public:
explicit GraphicsPipeline(Scheduler& scheduler, BufferCache& buffer_cache,
TextureCache& texture_cache, VideoCore::ShaderNotify* shader_notify,
const Device& device, DescriptorPool& descriptor_pool,
UpdateDescriptorQueue& update_descriptor_queue,
Common::ThreadWorker* worker_thread,
PipelineStatistics* pipeline_statistics,
RenderPassCache& render_pass_cache,
const GraphicsPipelineCacheKey& key,
std::array<vk::ShaderModule, NUM_STAGES> stages,
const std::array<const Shader::Info*, NUM_STAGES>& infos);
explicit GraphicsPipeline(
Scheduler& scheduler, BufferCache& buffer_cache, TextureCache& texture_cache,
vk::PipelineCache& pipeline_cache, VideoCore::ShaderNotify* shader_notify,
const Device& device, DescriptorPool& descriptor_pool,
UpdateDescriptorQueue& update_descriptor_queue, Common::ThreadWorker* worker_thread,
PipelineStatistics* pipeline_statistics, RenderPassCache& render_pass_cache,
const GraphicsPipelineCacheKey& key, std::array<vk::ShaderModule, NUM_STAGES> stages,
const std::array<const Shader::Info*, NUM_STAGES>& infos);
GraphicsPipeline& operator=(GraphicsPipeline&&) noexcept = delete;
GraphicsPipeline(GraphicsPipeline&&) noexcept = delete;
@@ -133,6 +131,7 @@ private:
const Device& device;
TextureCache& texture_cache;
BufferCache& buffer_cache;
vk::PipelineCache& pipeline_cache;
Scheduler& scheduler;
UpdateDescriptorQueue& update_descriptor_queue;

@@ -55,6 +55,7 @@ using VideoCommon::GenericEnvironment;
using VideoCommon::GraphicsEnvironment;
constexpr u32 CACHE_VERSION = 10;
constexpr std::array<char, 8> VULKAN_CACHE_MAGIC_NUMBER{'y', 'u', 'z', 'u', 'v', 'k', 'c', 'h'};
template <typename Container>
auto MakeSpan(Container& container) {
@@ -284,6 +285,7 @@ PipelineCache::PipelineCache(RasterizerVulkan& rasterizer_, const Device& device
render_pass_cache{render_pass_cache_}, buffer_cache{buffer_cache_},
texture_cache{texture_cache_}, shader_notify{shader_notify_},
use_asynchronous_shaders{Settings::values.use_asynchronous_shaders.GetValue()},
use_vulkan_pipeline_cache{Settings::values.use_vulkan_driver_pipeline_cache.GetValue()},
workers(std::max(std::thread::hardware_concurrency(), 2U) - 1, "VkPipelineBuilder"),
serialization_thread(1, "VkPipelineSerialization") {
const auto& float_control{device.FloatControlProperties()};
@@ -329,6 +331,7 @@ PipelineCache::PipelineCache(RasterizerVulkan& rasterizer_, const Device& device
.need_declared_frag_colors = false,
.has_broken_spirv_clamp = driver_id == VK_DRIVER_ID_INTEL_PROPRIETARY_WINDOWS,
.has_broken_spirv_position_input = driver_id == VK_DRIVER_ID_QUALCOMM_PROPRIETARY,
.has_broken_unsigned_image_offsets = false,
.has_broken_signed_operations = false,
.has_broken_fp16_float_controls = driver_id == VK_DRIVER_ID_NVIDIA_PROPRIETARY,
@@ -341,6 +344,8 @@ PipelineCache::PipelineCache(RasterizerVulkan& rasterizer_, const Device& device
driver_id == VK_DRIVER_ID_AMD_PROPRIETARY || driver_id == VK_DRIVER_ID_AMD_OPEN_SOURCE,
.support_snorm_render_buffer = true,
.support_viewport_index_layer = device.IsExtShaderViewportIndexLayerSupported(),
.min_ssbo_alignment = static_cast<u32>(device.GetStorageBufferAlignment()),
.support_geometry_shader_passthrough = device.IsNvGeometryShaderPassthroughSupported(),
};
if (device.GetMaxVertexInputAttributes() < Maxwell::NumVertexAttributes) {
@@ -362,7 +367,12 @@ PipelineCache::PipelineCache(RasterizerVulkan& rasterizer_, const Device& device
};
}
PipelineCache::~PipelineCache() = default;
PipelineCache::~PipelineCache() {
if (use_vulkan_pipeline_cache && !vulkan_pipeline_cache_filename.empty()) {
SerializeVulkanPipelineCache(vulkan_pipeline_cache_filename, vulkan_pipeline_cache,
CACHE_VERSION);
}
}
GraphicsPipeline* PipelineCache::CurrentGraphicsPipeline() {
MICROPROFILE_SCOPE(Vulkan_PipelineCache);
@@ -418,6 +428,12 @@ void PipelineCache::LoadDiskResources(u64 title_id, std::stop_token stop_loading
}
pipeline_cache_filename = base_dir / "vulkan.bin";
if (use_vulkan_pipeline_cache) {
vulkan_pipeline_cache_filename = base_dir / "vulkan_pipelines.bin";
vulkan_pipeline_cache =
LoadVulkanPipelineCache(vulkan_pipeline_cache_filename, CACHE_VERSION);
}
struct {
std::mutex mutex;
size_t total{};
@@ -496,6 +512,11 @@ void PipelineCache::LoadDiskResources(u64 title_id, std::stop_token stop_loading
workers.WaitForRequests(stop_loading);
if (use_vulkan_pipeline_cache) {
SerializeVulkanPipelineCache(vulkan_pipeline_cache_filename, vulkan_pipeline_cache,
CACHE_VERSION);
}
if (state.statistics) {
state.statistics->Report();
}
@@ -616,10 +637,10 @@ std::unique_ptr<GraphicsPipeline> PipelineCache::CreateGraphicsPipeline(
previous_stage = &program;
}
Common::ThreadWorker* const thread_worker{build_in_parallel ? &workers : nullptr};
return std::make_unique<GraphicsPipeline>(scheduler, buffer_cache, texture_cache,
&shader_notify, device, descriptor_pool,
update_descriptor_queue, thread_worker, statistics,
render_pass_cache, key, std::move(modules), infos);
return std::make_unique<GraphicsPipeline>(
scheduler, buffer_cache, texture_cache, vulkan_pipeline_cache, &shader_notify, device,
descriptor_pool, update_descriptor_queue, thread_worker, statistics, render_pass_cache, key,
std::move(modules), infos);
} catch (const Shader::Exception& exception) {
LOG_ERROR(Render_Vulkan, "{}", exception.what());
@@ -689,13 +710,108 @@ std::unique_ptr<ComputePipeline> PipelineCache::CreateComputePipeline(
spv_module.SetObjectNameEXT(name.c_str());
}
Common::ThreadWorker* const thread_worker{build_in_parallel ? &workers : nullptr};
return std::make_unique<ComputePipeline>(device, descriptor_pool, update_descriptor_queue,
thread_worker, statistics, &shader_notify,
program.info, std::move(spv_module));
return std::make_unique<ComputePipeline>(device, vulkan_pipeline_cache, descriptor_pool,
update_descriptor_queue, thread_worker, statistics,
&shader_notify, program.info, std::move(spv_module));
} catch (const Shader::Exception& exception) {
LOG_ERROR(Render_Vulkan, "{}", exception.what());
return nullptr;
}
void PipelineCache::SerializeVulkanPipelineCache(const std::filesystem::path& filename,
const vk::PipelineCache& pipeline_cache,
u32 cache_version) try {
std::ofstream file(filename, std::ios::binary);
file.exceptions(std::ofstream::failbit);
if (!file.is_open()) {
LOG_ERROR(Common_Filesystem, "Failed to open Vulkan driver pipeline cache file {}",
Common::FS::PathToUTF8String(filename));
return;
}
file.write(VULKAN_CACHE_MAGIC_NUMBER.data(), VULKAN_CACHE_MAGIC_NUMBER.size())
.write(reinterpret_cast<const char*>(&cache_version), sizeof(cache_version));
size_t cache_size = 0;
std::vector<char> cache_data;
if (pipeline_cache) {
pipeline_cache.Read(&cache_size, nullptr);
cache_data.resize(cache_size);
pipeline_cache.Read(&cache_size, cache_data.data());
}
file.write(cache_data.data(), cache_size);
LOG_INFO(Render_Vulkan, "Vulkan driver pipelines cached at: {}",
Common::FS::PathToUTF8String(filename));
} catch (const std::ios_base::failure& e) {
LOG_ERROR(Common_Filesystem, "{}", e.what());
if (!Common::FS::RemoveFile(filename)) {
LOG_ERROR(Common_Filesystem, "Failed to delete Vulkan driver pipeline cache file {}",
Common::FS::PathToUTF8String(filename));
}
}
vk::PipelineCache PipelineCache::LoadVulkanPipelineCache(const std::filesystem::path& filename,
u32 expected_cache_version) {
const auto create_pipeline_cache = [this](size_t data_size, const void* data) {
VkPipelineCacheCreateInfo pipeline_cache_ci = {
.sType = VK_STRUCTURE_TYPE_PIPELINE_CACHE_CREATE_INFO,
.pNext = nullptr,
.flags = 0,
.initialDataSize = data_size,
.pInitialData = data};
return device.GetLogical().CreatePipelineCache(pipeline_cache_ci);
};
try {
std::ifstream file(filename, std::ios::binary | std::ios::ate);
if (!file.is_open()) {
return create_pipeline_cache(0, nullptr);
}
file.exceptions(std::ifstream::failbit);
const auto end{file.tellg()};
file.seekg(0, std::ios::beg);
std::array<char, 8> magic_number;
u32 cache_version;
file.read(magic_number.data(), magic_number.size())
.read(reinterpret_cast<char*>(&cache_version), sizeof(cache_version));
if (magic_number != VULKAN_CACHE_MAGIC_NUMBER || cache_version != expected_cache_version) {
file.close();
if (Common::FS::RemoveFile(filename)) {
if (magic_number != VULKAN_CACHE_MAGIC_NUMBER) {
LOG_ERROR(Common_Filesystem, "Invalid Vulkan driver pipeline cache file");
}
if (cache_version != expected_cache_version) {
LOG_INFO(Common_Filesystem, "Deleting old Vulkan driver pipeline cache");
}
} else {
LOG_ERROR(Common_Filesystem,
"Invalid Vulkan pipeline cache file and failed to delete it in \"{}\"",
Common::FS::PathToUTF8String(filename));
}
return create_pipeline_cache(0, nullptr);
}
static constexpr size_t header_size = magic_number.size() + sizeof(cache_version);
const size_t cache_size = static_cast<size_t>(end) - header_size;
std::vector<char> cache_data(cache_size);
file.read(cache_data.data(), cache_size);
LOG_INFO(Render_Vulkan,
"Loaded Vulkan driver pipeline cache: {}", Common::FS::PathToUTF8String(filename));
return create_pipeline_cache(cache_size, cache_data.data());
} catch (const std::ios_base::failure& e) {
LOG_ERROR(Common_Filesystem, "{}", e.what());
if (!Common::FS::RemoveFile(filename)) {
LOG_ERROR(Common_Filesystem, "Failed to delete Vulkan driver pipeline cache file {}",
Common::FS::PathToUTF8String(filename));
}
return create_pipeline_cache(0, nullptr);
}
}
} // namespace Vulkan
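
The on-disk layout used by SerializeVulkanPipelineCache and LoadVulkanPipelineCache above is an 8-byte magic number, a u32 cache version, and then the raw driver blob. The following is a minimal standalone sketch of that header round trip. It assumes in-memory string streams instead of files so that it runs without Vulkan, and `kMagic`, `kHeaderSize`, `WriteCacheBlob`, and `ReadCacheBlob` are hypothetical names chosen for illustration; they are not part of the codebase.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical constants; the magic value mirrors VULKAN_CACHE_MAGIC_NUMBER in the diff.
constexpr std::array<char, 8> kMagic{'y', 'u', 'z', 'u', 'v', 'k', 'c', 'h'};
constexpr std::size_t kHeaderSize = kMagic.size() + sizeof(std::uint32_t);

// Serializes magic + version + payload into one byte string.
// The version is written in native byte order, as in the diff.
std::string WriteCacheBlob(std::uint32_t version, const std::string& payload) {
    std::ostringstream out(std::ios::binary);
    out.write(kMagic.data(), static_cast<std::streamsize>(kMagic.size()));
    out.write(reinterpret_cast<const char*>(&version), sizeof(version));
    out.write(payload.data(), static_cast<std::streamsize>(payload.size()));
    return out.str();
}

// Returns the payload when the header matches, or std::nullopt when the blob
// is too short, has the wrong magic number, or has an unexpected version.
std::optional<std::string> ReadCacheBlob(const std::string& blob,
                                         std::uint32_t expected_version) {
    if (blob.size() < kHeaderSize) {
        return std::nullopt;
    }
    std::istringstream in(blob, std::ios::binary);
    std::array<char, 8> magic{};
    std::uint32_t version{};
    in.read(magic.data(), static_cast<std::streamsize>(magic.size()));
    in.read(reinterpret_cast<char*>(&version), sizeof(version));
    if (!in || magic != kMagic || version != expected_version) {
        return std::nullopt;
    }
    // The payload is everything after the header: total size minus header size.
    std::string payload(blob.size() - kHeaderSize, '\0');
    in.read(payload.data(), static_cast<std::streamsize>(payload.size()));
    assert(in.gcount() == static_cast<std::streamsize>(payload.size()));
    return payload;
}
```

For example, `ReadCacheBlob(WriteCacheBlob(10, "abc"), 10)` yields `"abc"`, while passing a different expected version yields an empty optional, which mirrors how the loader discards a stale cache and starts from an empty one.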

@@ -135,6 +135,12 @@ private:
PipelineStatistics* statistics,
bool build_in_parallel);
void SerializeVulkanPipelineCache(const std::filesystem::path& filename,
const vk::PipelineCache& pipeline_cache, u32 cache_version);
vk::PipelineCache LoadVulkanPipelineCache(const std::filesystem::path& filename,
u32 expected_cache_version);
const Device& device;
Scheduler& scheduler;
DescriptorPool& descriptor_pool;
@@ -144,6 +150,7 @@ private:
TextureCache& texture_cache;
VideoCore::ShaderNotify& shader_notify;
bool use_asynchronous_shaders{};
bool use_vulkan_pipeline_cache{};
GraphicsPipelineCacheKey graphics_key{};
GraphicsPipeline* current_pipeline{};
@@ -158,6 +165,9 @@ private:
std::filesystem::path pipeline_cache_filename;
std::filesystem::path vulkan_pipeline_cache_filename;
vk::PipelineCache vulkan_pipeline_cache;
Common::ThreadWorker workers;
Common::ThreadWorker serialization_thread;
DynamicFeatures dynamic_features;

@@ -186,6 +186,7 @@ void RasterizerVulkan::PrepareDraw(bool is_indexed, Func&& draw_func) {
SCOPE_EXIT({ gpu.TickWork(); });
FlushWork();
gpu_memory->FlushCaching();
query_cache.UpdateCounters();
@@ -265,35 +266,6 @@ void RasterizerVulkan::DrawIndirect() {
buffer_cache.SetDrawIndirect(nullptr);
}
void RasterizerVulkan::DrawTexture() {
MICROPROFILE_SCOPE(Vulkan_Drawing);
SCOPE_EXIT({ gpu.TickWork(); });
FlushWork();
query_cache.UpdateCounters();
texture_cache.SynchronizeGraphicsDescriptors();
texture_cache.UpdateRenderTargets(false);
UpdateDynamicStates();
const auto& draw_texture_state = maxwell3d->draw_manager->GetDrawTextureState();
const auto& sampler = texture_cache.GetGraphicsSampler(draw_texture_state.src_sampler);
const auto& texture = texture_cache.GetImageView(draw_texture_state.src_texture);
Region2D dst_region = {Offset2D{.x = static_cast<s32>(draw_texture_state.dst_x0),
.y = static_cast<s32>(draw_texture_state.dst_y0)},
Offset2D{.x = static_cast<s32>(draw_texture_state.dst_x1),
.y = static_cast<s32>(draw_texture_state.dst_y1)}};
Region2D src_region = {Offset2D{.x = static_cast<s32>(draw_texture_state.src_x0),
.y = static_cast<s32>(draw_texture_state.src_y0)},
Offset2D{.x = static_cast<s32>(draw_texture_state.src_x1),
.y = static_cast<s32>(draw_texture_state.src_y1)}};
blit_image.BlitColor(texture_cache.GetFramebuffer(), texture.RenderTarget(),
texture.ImageHandle(), sampler->Handle(), dst_region, src_region,
texture.size);
}
void RasterizerVulkan::Clear(u32 layer_count) {
MICROPROFILE_SCOPE(Vulkan_Clearing);
@@ -422,6 +394,7 @@ void RasterizerVulkan::Clear(u32 layer_count) {
void RasterizerVulkan::DispatchCompute() {
FlushWork();
gpu_memory->FlushCaching();
ComputePipeline* const pipeline{pipeline_cache.CurrentComputePipeline()};
if (!pipeline) {
@@ -510,6 +483,27 @@ void RasterizerVulkan::InvalidateRegion(VAddr addr, u64 size, VideoCommon::Cache
}
}
void RasterizerVulkan::InnerInvalidation(std::span<const std::pair<VAddr, std::size_t>> sequences) {
{
std::scoped_lock lock{texture_cache.mutex};
for (const auto& [addr, size] : sequences) {
texture_cache.WriteMemory(addr, size);
}
}
{
std::scoped_lock lock{buffer_cache.mutex};
for (const auto& [addr, size] : sequences) {
buffer_cache.WriteMemory(addr, size);
}
}
{
for (const auto& [addr, size] : sequences) {
query_cache.InvalidateRegion(addr, size);
pipeline_cache.InvalidateRegion(addr, size);
}
}
}
void RasterizerVulkan::OnCPUWrite(VAddr addr, u64 size) {
if (addr == 0 || size == 0) {
return;

@@ -66,7 +66,6 @@ public:
void Draw(bool is_indexed, u32 instance_count) override;
void DrawIndirect() override;
void DrawTexture() override;
void Clear(u32 layer_count) override;
void DispatchCompute() override;
void ResetCounter(VideoCore::QueryType type) override;
@@ -80,6 +79,7 @@ public:
VideoCommon::CacheType which = VideoCommon::CacheType::All) override;
void InvalidateRegion(VAddr addr, u64 size,
VideoCommon::CacheType which = VideoCommon::CacheType::All) override;
void InnerInvalidation(std::span<const std::pair<VAddr, std::size_t>> sequences) override;
void OnCPUWrite(VAddr addr, u64 size) override;
void InvalidateGPUCache() override;
void UnmapMemory(VAddr addr, u64 size) override;

@@ -0,0 +1,205 @@
// SPDX-FileCopyrightText: Copyright 2022 yuzu Emulator Project
// SPDX-License-Identifier: GPL-2.0-or-later
#include "common/literals.h"
#include "video_core/host_shaders/vulkan_turbo_mode_comp_spv.h"
#include "video_core/renderer_vulkan/renderer_vulkan.h"
#include "video_core/renderer_vulkan/vk_shader_util.h"
#include "video_core/renderer_vulkan/vk_turbo_mode.h"
#include "video_core/vulkan_common/vulkan_device.h"
namespace Vulkan {
using namespace Common::Literals;
TurboMode::TurboMode(const vk::Instance& instance, const vk::InstanceDispatch& dld)
: m_device{CreateDevice(instance, dld, VK_NULL_HANDLE)}, m_allocator{m_device, false} {
m_thread = std::jthread([&](auto stop_token) { Run(stop_token); });
}
TurboMode::~TurboMode() = default;
void TurboMode::Run(std::stop_token stop_token) {
auto& dld = m_device.GetLogical();
// Allocate buffer. 2MiB should be sufficient.
auto buffer = dld.CreateBuffer(VkBufferCreateInfo{
.sType = VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO,
.pNext = nullptr,
.flags = 0,
.size = 2_MiB,
.usage = VK_BUFFER_USAGE_STORAGE_BUFFER_BIT | VK_BUFFER_USAGE_TRANSFER_DST_BIT,
.sharingMode = VK_SHARING_MODE_EXCLUSIVE,
.queueFamilyIndexCount = 0,
.pQueueFamilyIndices = nullptr,
});
// Commit some device local memory for the buffer.
auto commit = m_allocator.Commit(buffer, MemoryUsage::DeviceLocal);
// Create the descriptor pool to contain our descriptor.
constexpr VkDescriptorPoolSize pool_size{
.type = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER,
.descriptorCount = 1,
};
auto descriptor_pool = dld.CreateDescriptorPool(VkDescriptorPoolCreateInfo{
.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_POOL_CREATE_INFO,
.pNext = nullptr,
.flags = VK_DESCRIPTOR_POOL_CREATE_FREE_DESCRIPTOR_SET_BIT,
.maxSets = 1,
.poolSizeCount = 1,
.pPoolSizes = &pool_size,
});
// Create the descriptor set layout.
constexpr VkDescriptorSetLayoutBinding layout_binding{
.binding = 0,
.descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER,
.descriptorCount = 1,
.stageFlags = VK_SHADER_STAGE_COMPUTE_BIT,
.pImmutableSamplers = nullptr,
};
auto descriptor_set_layout = dld.CreateDescriptorSetLayout(VkDescriptorSetLayoutCreateInfo{
.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO,
.pNext = nullptr,
.flags = 0,
.bindingCount = 1,
.pBindings = &layout_binding,
});
// Allocate the descriptor set from the pool.
auto descriptor_set = descriptor_pool.Allocate(VkDescriptorSetAllocateInfo{
.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_ALLOCATE_INFO,
.pNext = nullptr,
.descriptorPool = *descriptor_pool,
.descriptorSetCount = 1,
.pSetLayouts = descriptor_set_layout.address(),
});
// Create the shader.
auto shader = BuildShader(m_device, VULKAN_TURBO_MODE_COMP_SPV);
// Create the pipeline layout.
auto pipeline_layout = dld.CreatePipelineLayout(VkPipelineLayoutCreateInfo{
.sType = VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO,
.pNext = nullptr,
.flags = 0,
.setLayoutCount = 1,
.pSetLayouts = descriptor_set_layout.address(),
.pushConstantRangeCount = 0,
.pPushConstantRanges = nullptr,
});
// Actually create the pipeline.
const VkPipelineShaderStageCreateInfo shader_stage{
.sType = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO,
.pNext = nullptr,
.flags = 0,
.stage = VK_SHADER_STAGE_COMPUTE_BIT,
.module = *shader,
.pName = "main",
.pSpecializationInfo = nullptr,
};
auto pipeline = dld.CreateComputePipeline(VkComputePipelineCreateInfo{
.sType = VK_STRUCTURE_TYPE_COMPUTE_PIPELINE_CREATE_INFO,
.pNext = nullptr,
.flags = 0,
.stage = shader_stage,
.layout = *pipeline_layout,
.basePipelineHandle = VK_NULL_HANDLE,
.basePipelineIndex = 0,
});
// Create a fence to wait on.
auto fence = dld.CreateFence(VkFenceCreateInfo{
.sType = VK_STRUCTURE_TYPE_FENCE_CREATE_INFO,
.pNext = nullptr,
.flags = 0,
});
// Create a command pool to allocate a command buffer from.
auto command_pool = dld.CreateCommandPool(VkCommandPoolCreateInfo{
.sType = VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO,
.pNext = nullptr,
.flags =
VK_COMMAND_POOL_CREATE_TRANSIENT_BIT | VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT,
.queueFamilyIndex = m_device.GetGraphicsFamily(),
});
// Create a single command buffer.
auto cmdbufs = command_pool.Allocate(1, VK_COMMAND_BUFFER_LEVEL_PRIMARY);
auto cmdbuf = vk::CommandBuffer{cmdbufs[0], m_device.GetDispatchLoader()};
while (!stop_token.stop_requested()) {
// Reset the fence.
fence.Reset();
// Update descriptor set.
const VkDescriptorBufferInfo buffer_info{
.buffer = *buffer,
.offset = 0,
.range = VK_WHOLE_SIZE,
};
const VkWriteDescriptorSet buffer_write{
.sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET,
.pNext = nullptr,
.dstSet = descriptor_set[0],
.dstBinding = 0,
.dstArrayElement = 0,
.descriptorCount = 1,
.descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER,
.pImageInfo = nullptr,
.pBufferInfo = &buffer_info,
.pTexelBufferView = nullptr,
};
dld.UpdateDescriptorSets(std::array{buffer_write}, {});
// Set up the command buffer.
cmdbuf.Begin(VkCommandBufferBeginInfo{
.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO,
.pNext = nullptr,
.flags = VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT,
.pInheritanceInfo = nullptr,
});
// Clear the buffer.
cmdbuf.FillBuffer(*buffer, 0, VK_WHOLE_SIZE, 0);
// Bind descriptor set.
cmdbuf.BindDescriptorSets(VK_PIPELINE_BIND_POINT_COMPUTE, *pipeline_layout, 0,
descriptor_set, {});
// Bind the pipeline.
cmdbuf.BindPipeline(VK_PIPELINE_BIND_POINT_COMPUTE, *pipeline);
// Dispatch.
cmdbuf.Dispatch(64, 64, 1);
// Finish.
cmdbuf.End();
const VkSubmitInfo submit_info{
.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
.pNext = nullptr,
.waitSemaphoreCount = 0,
.pWaitSemaphores = nullptr,
.pWaitDstStageMask = nullptr,
.commandBufferCount = 1,
.pCommandBuffers = cmdbuf.address(),
.signalSemaphoreCount = 0,
.pSignalSemaphores = nullptr,
};
m_device.GetGraphicsQueue().Submit(std::array{submit_info}, *fence);
// Wait for completion.
fence.Wait();
}
}
} // namespace Vulkan

@@ -0,0 +1,26 @@
// SPDX-FileCopyrightText: Copyright 2022 yuzu Emulator Project
// SPDX-License-Identifier: GPL-2.0-or-later
#pragma once
#include "common/polyfill_thread.h"
#include "video_core/vulkan_common/vulkan_device.h"
#include "video_core/vulkan_common/vulkan_memory_allocator.h"
#include "video_core/vulkan_common/vulkan_wrapper.h"
namespace Vulkan {
class TurboMode {
public:
explicit TurboMode(const vk::Instance& instance, const vk::InstanceDispatch& dld);
~TurboMode();
private:
void Run(std::stop_token stop_token);
Device m_device;
MemoryAllocator m_allocator;
std::jthread m_thread;
};
} // namespace Vulkan

@@ -148,13 +148,6 @@ typename P::ImageView& TextureCache<P>::GetImageView(ImageViewId id) noexcept {
return slot_image_views[id];
}
template <class P>
typename P::ImageView& TextureCache<P>::GetImageView(u32 index) noexcept {
const auto image_view_id = VisitImageView(channel_state->graphics_image_table,
channel_state->graphics_image_view_ids, index);
return slot_image_views[image_view_id];
}
template <class P>
void TextureCache<P>::MarkModification(ImageId id) noexcept {
MarkModification(slot_images[id]);

@@ -129,9 +129,6 @@ public:
/// Return a reference to the given image view id
[[nodiscard]] ImageView& GetImageView(ImageViewId id) noexcept;
/// Get the imageview from the graphics descriptor table in the specified index
[[nodiscard]] ImageView& GetImageView(u32 index) noexcept;
/// Mark an image as modified from the GPU
void MarkModification(ImageId id) noexcept;

@@ -1472,7 +1472,7 @@ std::vector<const char*> Device::LoadExtensions(bool requires_surface) {
is_patch_list_restart_supported =
primitive_topology_list_restart.primitiveTopologyPatchListRestart;
}
if (has_khr_image_format_list && has_khr_swapchain_mutable_format) {
if (requires_surface && has_khr_image_format_list && has_khr_swapchain_mutable_format) {
extensions.push_back(VK_KHR_IMAGE_FORMAT_LIST_EXTENSION_NAME);
extensions.push_back(VK_KHR_SWAPCHAIN_MUTABLE_FORMAT_EXTENSION_NAME);
khr_swapchain_mutable_format = true;
@@ -1487,6 +1487,9 @@ std::vector<const char*> Device::LoadExtensions(bool requires_surface) {
max_push_descriptors = push_descriptor.maxPushDescriptors;
}
has_null_descriptor = true;
return extensions;
}

@@ -397,6 +397,10 @@ public:
return must_emulate_bgr565;
}
bool HasNullDescriptor() const {
return has_null_descriptor;
}
u32 GetMaxVertexInputAttributes() const {
return max_vertex_input_attributes;
}
@@ -511,6 +515,7 @@ private:
bool supports_d24_depth{}; ///< Supports D24 depth buffers.
bool cant_blit_msaa{}; ///< Does not support MSAA<->MSAA blitting.
bool must_emulate_bgr565{}; ///< Emulates BGR565 by swizzling RGB565 format.
bool has_null_descriptor{}; ///< Has support for null descriptors.
u32 max_vertex_input_attributes{}; ///< Max vertex input attributes in pipeline
u32 max_vertex_input_bindings{}; ///< Max vertex input buffers in pipeline

@@ -32,7 +32,7 @@
namespace Vulkan {
namespace {
[[nodiscard]] std::vector<const char*> RequiredExtensions(
Core::Frontend::WindowSystemType window_type, bool enable_debug_utils) {
Core::Frontend::WindowSystemType window_type, bool enable_validation) {
std::vector<const char*> extensions;
extensions.reserve(6);
switch (window_type) {
@@ -65,7 +65,7 @@ namespace {
if (window_type != Core::Frontend::WindowSystemType::Headless) {
extensions.push_back(VK_KHR_SURFACE_EXTENSION_NAME);
}
if (enable_debug_utils) {
if (enable_validation) {
extensions.push_back(VK_EXT_DEBUG_UTILS_EXTENSION_NAME);
}
extensions.push_back(VK_KHR_GET_PHYSICAL_DEVICE_PROPERTIES_2_EXTENSION_NAME);
@@ -95,9 +95,9 @@ namespace {
return true;
}
[[nodiscard]] std::vector<const char*> Layers(bool enable_layers) {
[[nodiscard]] std::vector<const char*> Layers(bool enable_validation) {
std::vector<const char*> layers;
if (enable_layers) {
if (enable_validation) {
layers.push_back("VK_LAYER_KHRONOS_validation");
}
return layers;
@@ -125,7 +125,7 @@ void RemoveUnavailableLayers(const vk::InstanceDispatch& dld, std::vector<const
vk::Instance CreateInstance(const Common::DynamicLibrary& library, vk::InstanceDispatch& dld,
u32 required_version, Core::Frontend::WindowSystemType window_type,
bool enable_debug_utils, bool enable_layers) {
bool enable_validation) {
if (!library.IsOpen()) {
LOG_ERROR(Render_Vulkan, "Vulkan library not available");
throw vk::Exception(VK_ERROR_INITIALIZATION_FAILED);
@@ -138,11 +138,11 @@ vk::Instance CreateInstance(const Common::DynamicLibrary& library, vk::InstanceD
LOG_ERROR(Render_Vulkan, "Failed to load Vulkan function pointers");
throw vk::Exception(VK_ERROR_INITIALIZATION_FAILED);
}
const std::vector<const char*> extensions = RequiredExtensions(window_type, enable_debug_utils);
const std::vector<const char*> extensions = RequiredExtensions(window_type, enable_validation);
if (!AreExtensionsSupported(dld, extensions)) {
throw vk::Exception(VK_ERROR_EXTENSION_NOT_PRESENT);
}
std::vector<const char*> layers = Layers(enable_layers);
std::vector<const char*> layers = Layers(enable_validation);
RemoveUnavailableLayers(dld, layers);
const u32 available_version = vk::AvailableVersion(dld);

@@ -17,8 +17,7 @@ namespace Vulkan {
* @param dld Dispatch table to load function pointers into
* @param required_version Required Vulkan version (for example, VK_API_VERSION_1_1)
* @param window_type Window system type, used to select the surface extensions to enable
* @param enable_debug_utils Whether to enable VK_EXT_debug_utils_extension_name or not
* @param enable_layers Whether to enable Vulkan validation layers or not
* @param enable_validation Whether to enable Vulkan validation layers or not
*
* @return A new Vulkan instance
* @throw vk::Exception on failure
@@ -26,6 +25,6 @@ namespace Vulkan {
[[nodiscard]] vk::Instance CreateInstance(
const Common::DynamicLibrary& library, vk::InstanceDispatch& dld, u32 required_version,
Core::Frontend::WindowSystemType window_type = Core::Frontend::WindowSystemType::Headless,
bool enable_debug_utils = false, bool enable_layers = false);
bool enable_validation = false);
} // namespace Vulkan

@@ -152,6 +152,7 @@ void Load(VkDevice device, DeviceDispatch& dld) noexcept {
X(vkCreateGraphicsPipelines);
X(vkCreateImage);
X(vkCreateImageView);
X(vkCreatePipelineCache);
X(vkCreatePipelineLayout);
X(vkCreateQueryPool);
X(vkCreateRenderPass);
@@ -171,6 +172,7 @@ void Load(VkDevice device, DeviceDispatch& dld) noexcept {
X(vkDestroyImage);
X(vkDestroyImageView);
X(vkDestroyPipeline);
X(vkDestroyPipelineCache);
X(vkDestroyPipelineLayout);
X(vkDestroyQueryPool);
X(vkDestroyRenderPass);
@@ -188,6 +190,7 @@ void Load(VkDevice device, DeviceDispatch& dld) noexcept {
X(vkGetEventStatus);
X(vkGetFenceStatus);
X(vkGetImageMemoryRequirements);
X(vkGetPipelineCacheData);
X(vkGetMemoryFdKHR);
#ifdef _WIN32
X(vkGetMemoryWin32HandleKHR);
@@ -431,6 +434,10 @@ void Destroy(VkDevice device, VkPipeline handle, const DeviceDispatch& dld) noex
dld.vkDestroyPipeline(device, handle, nullptr);
}
void Destroy(VkDevice device, VkPipelineCache handle, const DeviceDispatch& dld) noexcept {
dld.vkDestroyPipelineCache(device, handle, nullptr);
}
void Destroy(VkDevice device, VkPipelineLayout handle, const DeviceDispatch& dld) noexcept {
dld.vkDestroyPipelineLayout(device, handle, nullptr);
}
@@ -651,6 +658,10 @@ void ShaderModule::SetObjectNameEXT(const char* name) const {
SetObjectName(dld, owner, handle, VK_OBJECT_TYPE_SHADER_MODULE, name);
}
void PipelineCache::SetObjectNameEXT(const char* name) const {
SetObjectName(dld, owner, handle, VK_OBJECT_TYPE_PIPELINE_CACHE, name);
}
void Semaphore::SetObjectNameEXT(const char* name) const {
SetObjectName(dld, owner, handle, VK_OBJECT_TYPE_SEMAPHORE, name);
}
@@ -746,21 +757,29 @@ DescriptorSetLayout Device::CreateDescriptorSetLayout(
return DescriptorSetLayout(object, handle, *dld);
}
PipelineCache Device::CreatePipelineCache(const VkPipelineCacheCreateInfo& ci) const {
VkPipelineCache cache;
Check(dld->vkCreatePipelineCache(handle, &ci, nullptr, &cache));
return PipelineCache(cache, handle, *dld);
}
PipelineLayout Device::CreatePipelineLayout(const VkPipelineLayoutCreateInfo& ci) const {
VkPipelineLayout object;
Check(dld->vkCreatePipelineLayout(handle, &ci, nullptr, &object));
return PipelineLayout(object, handle, *dld);
}
Pipeline Device::CreateGraphicsPipeline(const VkGraphicsPipelineCreateInfo& ci) const {
Pipeline Device::CreateGraphicsPipeline(const VkGraphicsPipelineCreateInfo& ci,
VkPipelineCache cache) const {
VkPipeline object;
Check(dld->vkCreateGraphicsPipelines(handle, nullptr, 1, &ci, nullptr, &object));
Check(dld->vkCreateGraphicsPipelines(handle, cache, 1, &ci, nullptr, &object));
return Pipeline(object, handle, *dld);
}
Pipeline Device::CreateComputePipeline(const VkComputePipelineCreateInfo& ci) const {
Pipeline Device::CreateComputePipeline(const VkComputePipelineCreateInfo& ci,
VkPipelineCache cache) const {
VkPipeline object;
Check(dld->vkCreateComputePipelines(handle, nullptr, 1, &ci, nullptr, &object));
Check(dld->vkCreateComputePipelines(handle, cache, 1, &ci, nullptr, &object));
return Pipeline(object, handle, *dld);
}

@@ -270,6 +270,7 @@ struct DeviceDispatch : InstanceDispatch {
PFN_vkCreateGraphicsPipelines vkCreateGraphicsPipelines{};
PFN_vkCreateImage vkCreateImage{};
PFN_vkCreateImageView vkCreateImageView{};
PFN_vkCreatePipelineCache vkCreatePipelineCache{};
PFN_vkCreatePipelineLayout vkCreatePipelineLayout{};
PFN_vkCreateQueryPool vkCreateQueryPool{};
PFN_vkCreateRenderPass vkCreateRenderPass{};
@@ -289,6 +290,7 @@ struct DeviceDispatch : InstanceDispatch {
PFN_vkDestroyImage vkDestroyImage{};
PFN_vkDestroyImageView vkDestroyImageView{};
PFN_vkDestroyPipeline vkDestroyPipeline{};
PFN_vkDestroyPipelineCache vkDestroyPipelineCache{};
PFN_vkDestroyPipelineLayout vkDestroyPipelineLayout{};
PFN_vkDestroyQueryPool vkDestroyQueryPool{};
PFN_vkDestroyRenderPass vkDestroyRenderPass{};
@@ -306,6 +308,7 @@ struct DeviceDispatch : InstanceDispatch {
PFN_vkGetEventStatus vkGetEventStatus{};
PFN_vkGetFenceStatus vkGetFenceStatus{};
PFN_vkGetImageMemoryRequirements vkGetImageMemoryRequirements{};
PFN_vkGetPipelineCacheData vkGetPipelineCacheData{};
PFN_vkGetMemoryFdKHR vkGetMemoryFdKHR{};
#ifdef _WIN32
PFN_vkGetMemoryWin32HandleKHR vkGetMemoryWin32HandleKHR{};
@@ -351,6 +354,7 @@ void Destroy(VkDevice, VkFramebuffer, const DeviceDispatch&) noexcept;
void Destroy(VkDevice, VkImage, const DeviceDispatch&) noexcept;
void Destroy(VkDevice, VkImageView, const DeviceDispatch&) noexcept;
void Destroy(VkDevice, VkPipeline, const DeviceDispatch&) noexcept;
void Destroy(VkDevice, VkPipelineCache, const DeviceDispatch&) noexcept;
void Destroy(VkDevice, VkPipelineLayout, const DeviceDispatch&) noexcept;
void Destroy(VkDevice, VkQueryPool, const DeviceDispatch&) noexcept;
void Destroy(VkDevice, VkRenderPass, const DeviceDispatch&) noexcept;
@@ -773,6 +777,18 @@ public:
void SetObjectNameEXT(const char* name) const;
};
class PipelineCache : public Handle<VkPipelineCache, VkDevice, DeviceDispatch> {
using Handle<VkPipelineCache, VkDevice, DeviceDispatch>::Handle;
public:
/// Set object name.
void SetObjectNameEXT(const char* name) const;
VkResult Read(size_t* size, void* data) const noexcept {
return dld->vkGetPipelineCacheData(owner, handle, size, data);
}
};
class Semaphore : public Handle<VkSemaphore, VkDevice, DeviceDispatch> {
using Handle<VkSemaphore, VkDevice, DeviceDispatch>::Handle;
@@ -844,11 +860,15 @@ public:
DescriptorSetLayout CreateDescriptorSetLayout(const VkDescriptorSetLayoutCreateInfo& ci) const;
PipelineCache CreatePipelineCache(const VkPipelineCacheCreateInfo& ci) const;
PipelineLayout CreatePipelineLayout(const VkPipelineLayoutCreateInfo& ci) const;
Pipeline CreateGraphicsPipeline(const VkGraphicsPipelineCreateInfo& ci) const;
Pipeline CreateGraphicsPipeline(const VkGraphicsPipelineCreateInfo& ci,
VkPipelineCache cache = nullptr) const;
Pipeline CreateComputePipeline(const VkComputePipelineCreateInfo& ci) const;
Pipeline CreateComputePipeline(const VkComputePipelineCreateInfo& ci,
VkPipelineCache cache = nullptr) const;
Sampler CreateSampler(const VkSamplerCreateInfo& ci) const;
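
Note on the `PipelineCache::Read` wrapper added above: `vkGetPipelineCacheData` follows the usual Vulkan two-call idiom (query the size with a null data pointer, then fetch the bytes). The sketch below illustrates that idiom only. `FakePipelineCache` and `ReadAll` are hypothetical names that do not appear in this diff, and the fake stands in for a real driver so the example compiles without the Vulkan headers; integer results are used in place of `VkResult`, with 0 for success and 5 mirroring `VK_INCOMPLETE`.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the wrapper in the diff. It mimics the contract of
// vkGetPipelineCacheData: with data == nullptr it only reports the size needed.
struct FakePipelineCache {
    std::vector<char> blob;

    int Read(std::size_t* size, void* data) const noexcept {
        if (data == nullptr) {
            *size = blob.size();
            return 0; // success
        }
        const std::size_t n = *size < blob.size() ? *size : blob.size();
        std::memcpy(data, blob.data(), n);
        *size = n;
        return n == blob.size() ? 0 : 5; // 5 mirrors VK_INCOMPLETE
    }
};

// Two-call pattern: first ask for the size, then read into a buffer of that size.
template <typename Cache>
std::vector<char> ReadAll(const Cache& cache) {
    std::size_t size = 0;
    if (cache.Read(&size, nullptr) != 0) {
        return {};
    }
    std::vector<char> data(size);
    if (cache.Read(&size, data.data()) != 0) {
        return {};
    }
    data.resize(size); // the second call may report fewer bytes than requested
    return data;
}
```

A caller would pass the real wrapper object in place of the fake and write the returned bytes to disk; how this PR actually serializes the data is not shown in this part of the diff.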


@@ -690,6 +690,7 @@ void Config::ReadRendererValues() {
qt_config->beginGroup(QStringLiteral("Renderer"));
ReadGlobalSetting(Settings::values.renderer_backend);
ReadGlobalSetting(Settings::values.renderer_force_max_clock);
ReadGlobalSetting(Settings::values.vulkan_device);
ReadGlobalSetting(Settings::values.fullscreen_mode);
ReadGlobalSetting(Settings::values.aspect_ratio);
@@ -709,6 +710,7 @@ void Config::ReadRendererValues() {
ReadGlobalSetting(Settings::values.use_asynchronous_shaders);
ReadGlobalSetting(Settings::values.use_fast_gpu_time);
ReadGlobalSetting(Settings::values.use_pessimistic_flushes);
ReadGlobalSetting(Settings::values.use_vulkan_driver_pipeline_cache);
ReadGlobalSetting(Settings::values.bg_red);
ReadGlobalSetting(Settings::values.bg_green);
ReadGlobalSetting(Settings::values.bg_blue);
@@ -1305,6 +1307,9 @@ void Config::SaveRendererValues() {
static_cast<u32>(Settings::values.renderer_backend.GetValue(global)),
static_cast<u32>(Settings::values.renderer_backend.GetDefault()),
Settings::values.renderer_backend.UsingGlobal());
WriteSetting(QString::fromStdString(Settings::values.renderer_force_max_clock.GetLabel()),
static_cast<u32>(Settings::values.renderer_force_max_clock.GetValue(global)),
static_cast<u32>(Settings::values.renderer_force_max_clock.GetDefault()));
WriteGlobalSetting(Settings::values.vulkan_device);
WriteSetting(QString::fromStdString(Settings::values.fullscreen_mode.GetLabel()),
static_cast<u32>(Settings::values.fullscreen_mode.GetValue(global)),
@@ -1348,6 +1353,7 @@ void Config::SaveRendererValues() {
WriteGlobalSetting(Settings::values.use_asynchronous_shaders);
WriteGlobalSetting(Settings::values.use_fast_gpu_time);
WriteGlobalSetting(Settings::values.use_pessimistic_flushes);
WriteGlobalSetting(Settings::values.use_vulkan_driver_pipeline_cache);
WriteGlobalSetting(Settings::values.bg_red);
WriteGlobalSetting(Settings::values.bg_green);
WriteGlobalSetting(Settings::values.bg_blue);


@@ -25,10 +25,13 @@ void ConfigureGraphicsAdvanced::SetConfiguration() {
ui->use_asynchronous_shaders->setEnabled(runtime_lock);
ui->anisotropic_filtering_combobox->setEnabled(runtime_lock);
ui->renderer_force_max_clock->setChecked(Settings::values.renderer_force_max_clock.GetValue());
ui->use_vsync->setChecked(Settings::values.use_vsync.GetValue());
ui->use_asynchronous_shaders->setChecked(Settings::values.use_asynchronous_shaders.GetValue());
ui->use_fast_gpu_time->setChecked(Settings::values.use_fast_gpu_time.GetValue());
ui->use_pessimistic_flushes->setChecked(Settings::values.use_pessimistic_flushes.GetValue());
ui->use_vulkan_driver_pipeline_cache->setChecked(
Settings::values.use_vulkan_driver_pipeline_cache.GetValue());
if (Settings::IsConfiguringGlobal()) {
ui->gpu_accuracy->setCurrentIndex(
@@ -37,6 +40,8 @@ void ConfigureGraphicsAdvanced::SetConfiguration() {
Settings::values.max_anisotropy.GetValue());
} else {
ConfigurationShared::SetPerGameSetting(ui->gpu_accuracy, &Settings::values.gpu_accuracy);
ConfigurationShared::SetPerGameSetting(ui->renderer_force_max_clock,
&Settings::values.renderer_force_max_clock);
ConfigurationShared::SetPerGameSetting(ui->anisotropic_filtering_combobox,
&Settings::values.max_anisotropy);
ConfigurationShared::SetHighlight(ui->label_gpu_accuracy,
@@ -48,6 +53,9 @@ void ConfigureGraphicsAdvanced::SetConfiguration() {
void ConfigureGraphicsAdvanced::ApplyConfiguration() {
ConfigurationShared::ApplyPerGameSetting(&Settings::values.gpu_accuracy, ui->gpu_accuracy);
ConfigurationShared::ApplyPerGameSetting(&Settings::values.renderer_force_max_clock,
ui->renderer_force_max_clock,
renderer_force_max_clock);
ConfigurationShared::ApplyPerGameSetting(&Settings::values.max_anisotropy,
ui->anisotropic_filtering_combobox);
ConfigurationShared::ApplyPerGameSetting(&Settings::values.use_vsync, ui->use_vsync, use_vsync);
@@ -58,6 +66,9 @@ void ConfigureGraphicsAdvanced::ApplyConfiguration() {
ui->use_fast_gpu_time, use_fast_gpu_time);
ConfigurationShared::ApplyPerGameSetting(&Settings::values.use_pessimistic_flushes,
ui->use_pessimistic_flushes, use_pessimistic_flushes);
ConfigurationShared::ApplyPerGameSetting(&Settings::values.use_vulkan_driver_pipeline_cache,
ui->use_vulkan_driver_pipeline_cache,
use_vulkan_driver_pipeline_cache);
}
void ConfigureGraphicsAdvanced::changeEvent(QEvent* event) {
@@ -76,18 +87,25 @@ void ConfigureGraphicsAdvanced::SetupPerGameUI() {
// Disable if not global (only happens during game)
if (Settings::IsConfiguringGlobal()) {
ui->gpu_accuracy->setEnabled(Settings::values.gpu_accuracy.UsingGlobal());
ui->renderer_force_max_clock->setEnabled(
Settings::values.renderer_force_max_clock.UsingGlobal());
ui->use_vsync->setEnabled(Settings::values.use_vsync.UsingGlobal());
ui->use_asynchronous_shaders->setEnabled(
Settings::values.use_asynchronous_shaders.UsingGlobal());
ui->use_fast_gpu_time->setEnabled(Settings::values.use_fast_gpu_time.UsingGlobal());
ui->use_pessimistic_flushes->setEnabled(
Settings::values.use_pessimistic_flushes.UsingGlobal());
ui->use_vulkan_driver_pipeline_cache->setEnabled(
Settings::values.use_vulkan_driver_pipeline_cache.UsingGlobal());
ui->anisotropic_filtering_combobox->setEnabled(
Settings::values.max_anisotropy.UsingGlobal());
return;
}
ConfigurationShared::SetColoredTristate(ui->renderer_force_max_clock,
Settings::values.renderer_force_max_clock,
renderer_force_max_clock);
ConfigurationShared::SetColoredTristate(ui->use_vsync, Settings::values.use_vsync, use_vsync);
ConfigurationShared::SetColoredTristate(ui->use_asynchronous_shaders,
Settings::values.use_asynchronous_shaders,
@@ -97,6 +115,9 @@ void ConfigureGraphicsAdvanced::SetupPerGameUI() {
ConfigurationShared::SetColoredTristate(ui->use_pessimistic_flushes,
Settings::values.use_pessimistic_flushes,
use_pessimistic_flushes);
ConfigurationShared::SetColoredTristate(ui->use_vulkan_driver_pipeline_cache,
Settings::values.use_vulkan_driver_pipeline_cache,
use_vulkan_driver_pipeline_cache);
ConfigurationShared::SetColoredComboBox(
ui->gpu_accuracy, ui->label_gpu_accuracy,
static_cast<int>(Settings::values.gpu_accuracy.GetValue(true)));


@@ -36,10 +36,12 @@ private:
std::unique_ptr<Ui::ConfigureGraphicsAdvanced> ui;
ConfigurationShared::CheckState renderer_force_max_clock;
ConfigurationShared::CheckState use_vsync;
ConfigurationShared::CheckState use_asynchronous_shaders;
ConfigurationShared::CheckState use_fast_gpu_time;
ConfigurationShared::CheckState use_pessimistic_flushes;
ConfigurationShared::CheckState use_vulkan_driver_pipeline_cache;
const Core::System& system;
};


@@ -69,6 +69,16 @@
</layout>
</widget>
</item>
<item>
<widget class="QCheckBox" name="renderer_force_max_clock">
<property name="toolTip">
<string>Runs work in the background while waiting for graphics commands to keep the GPU from lowering its clock speed.</string>
</property>
<property name="text">
<string>Force maximum clocks (Vulkan only)</string>
</property>
</widget>
</item>
<item>
<widget class="QCheckBox" name="use_vsync">
<property name="toolTip">
@@ -109,6 +119,16 @@
</property>
</widget>
</item>
<item>
<widget class="QCheckBox" name="use_vulkan_driver_pipeline_cache">
<property name="toolTip">
<string>Enables GPU vendor-specific pipeline cache. This option can improve shader loading time significantly in cases where the Vulkan driver does not store pipeline cache files internally.</string>
</property>
<property name="text">
<string>Use Vulkan pipeline cache</string>
</property>
</widget>
</item>
<item>
<widget class="QWidget" name="af_layout" native="true">
<layout class="QHBoxLayout" name="horizontalLayout_1">


@@ -2229,8 +2229,10 @@ void GMainWindow::OnGameListRemoveFile(u64 program_id, GameListRemoveTarget targ
}
switch (target) {
case GameListRemoveTarget::GlShaderCache:
case GameListRemoveTarget::VkShaderCache:
RemoveVulkanDriverPipelineCache(program_id);
[[fallthrough]];
case GameListRemoveTarget::GlShaderCache:
RemoveTransferableShaderCache(program_id, target);
break;
case GameListRemoveTarget::AllShaderCache:
@@ -2271,6 +2273,22 @@ void GMainWindow::RemoveTransferableShaderCache(u64 program_id, GameListRemoveTa
}
}
void GMainWindow::RemoveVulkanDriverPipelineCache(u64 program_id) {
static constexpr std::string_view target_file_name = "vulkan_pipelines.bin";
const auto shader_cache_dir = Common::FS::GetYuzuPath(Common::FS::YuzuPath::ShaderDir);
const auto shader_cache_folder_path = shader_cache_dir / fmt::format("{:016x}", program_id);
const auto target_file = shader_cache_folder_path / target_file_name;
if (!Common::FS::Exists(target_file)) {
return;
}
if (!Common::FS::RemoveFile(target_file)) {
QMessageBox::warning(this, tr("Error Removing Vulkan Driver Pipeline Cache"),
tr("Failed to remove the driver pipeline cache."));
}
}
void GMainWindow::RemoveAllTransferableShaderCaches(u64 program_id) {
const auto shader_cache_dir = Common::FS::GetYuzuPath(Common::FS::YuzuPath::ShaderDir);
const auto program_shader_cache_dir = shader_cache_dir / fmt::format("{:016x}", program_id);
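
Note on `RemoveVulkanDriverPipelineCache` above: the target file is located by joining the shader directory, the program ID formatted as 16 lowercase hex digits, and the fixed name `vulkan_pipelines.bin`. A minimal standalone sketch of that path construction follows, assuming plain `std::filesystem` and `std::snprintf` in place of yuzu's `Common::FS` helpers and `fmt::format`. `DriverPipelineCachePath` is a hypothetical helper name, not part of the diff.

```cpp
#include <cstdint>
#include <cstdio>
#include <filesystem>

// Builds <shader_dir>/<program_id as 16 hex digits>/vulkan_pipelines.bin,
// mirroring the "{:016x}" formatting used in the diff.
std::filesystem::path DriverPipelineCachePath(const std::filesystem::path& shader_dir,
                                              std::uint64_t program_id) {
    char folder[17]; // 16 hex digits plus the terminating NUL
    std::snprintf(folder, sizeof(folder), "%016llx",
                  static_cast<unsigned long long>(program_id));
    return shader_dir / folder / "vulkan_pipelines.bin";
}
```

The zero padding keeps every per-title folder name the same length, so a short program ID still maps to a 16-character directory.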


@@ -347,6 +347,7 @@ private:
void RemoveUpdateContent(u64 program_id, InstalledEntryType type);
void RemoveAddOnContent(u64 program_id, InstalledEntryType type);
void RemoveTransferableShaderCache(u64 program_id, GameListRemoveTarget target);
void RemoveVulkanDriverPipelineCache(u64 program_id);
void RemoveAllTransferableShaderCaches(u64 program_id);
void RemoveCustomConfiguration(u64 program_id, const std::string& game_path);
std::optional<u64> SelectRomFSDumpTarget(const FileSys::ContentProvider&, u64 program_id);


@@ -296,6 +296,7 @@ void Config::ReadValues() {
// Renderer
ReadSetting("Renderer", Settings::values.renderer_backend);
ReadSetting("Renderer", Settings::values.renderer_force_max_clock);
ReadSetting("Renderer", Settings::values.renderer_debug);
ReadSetting("Renderer", Settings::values.renderer_shader_feedback);
ReadSetting("Renderer", Settings::values.enable_nsight_aftermath);
@@ -321,6 +322,7 @@ void Config::ReadValues() {
ReadSetting("Renderer", Settings::values.accelerate_astc);
ReadSetting("Renderer", Settings::values.use_fast_gpu_time);
ReadSetting("Renderer", Settings::values.use_pessimistic_flushes);
ReadSetting("Renderer", Settings::values.use_vulkan_driver_pipeline_cache);
ReadSetting("Renderer", Settings::values.bg_red);
ReadSetting("Renderer", Settings::values.bg_green);