Compare commits

...

133 Commits

Author SHA1 Message Date
Morph1984
ec95c73a12 remove <f32>
We can remove this since it's already an f32 value
2019-09-03 23:20:19 -04:00
Morph1984
58783b8a46 explicitly represent 1 as a float (1.0f instead of 1) 2019-09-03 23:06:32 -04:00
Morph1984
b1ca56bed2 Change u32 -> f32
Volume is an f32 value. (SwIPC describes it as a u32, but it is actually an f32, as corroborated by the Switchbrew docs and SetAudioDeviceOutputVolume.)

```cpp
const f32 volume = rp.Pop<f32>();
```
2019-09-03 22:30:20 -04:00
Morph1984
ba661c8d9a service/audio/audren_u: Stub IAudioDevice::GetAudioDeviceOutputVolume 2019-09-03 16:05:33 -04:00
bunnei
50b5bb44a0 Merge pull request #2765 from FernandoS27/dma-fix
MaxwellDMA: Fixes, corrections and relaxations.
2019-09-01 13:13:05 -04:00
Rodrigo Locatti
4d4f9cc104 video_core: Silence miscellaneous warnings (#2820)
* texture_cache/surface_params: Remove unused local variable

* rasterizer_interface: Add missing documentation commentary

* maxwell_dma: Remove unused rasterizer reference

* video_core/gpu: Sort member declaration order to silence a -Wreorder warning

* fermi_2d: Remove unused MemoryManager reference

* video_core: Silence unused variable warnings

* buffer_cache: Silence -Wreorder warnings

* kepler_memory: Remove unused MemoryManager reference

* gl_texture_cache: Add missing override

* buffer_cache: Add missing include

* shader/decode: Remove unused variables
2019-08-30 14:08:00 -04:00
Fernando Sahmkow
67cc2d5046 Merge pull request #2819 from ReinUsesLisp/fixup-clang
gl_buffer_cache: Add missing include
2019-08-29 18:13:35 -04:00
ReinUsesLisp
878adee0a3 gl_buffer_cache: Add missing include
RasterizerInterface was considered an incomplete type by clang.
2019-08-29 22:02:52 +00:00
bunnei
a67c4e6e02 Merge pull request #2742 from ReinUsesLisp/fix-texture-buffers
gl_texture_cache: Miscellaneous texture buffer fixes
2019-08-29 15:59:17 -04:00
James Rowe
f0c75573b1 Revert "externals: Update FMT to 6.0.0"
This reverts commit ca4ca8a6dc.
2019-08-29 12:23:34 -06:00
Ethan
ca4ca8a6dc externals: Update FMT to 6.0.0 2019-08-29 19:37:46 +02:00
bunnei
e424615839 Merge pull request #2783 from FernandoS27/new-buffer-cache
Implement a New LLE Buffer Cache
2019-08-29 13:07:01 -04:00
bunnei
f8cc5668f8 Merge pull request #2758 from ReinUsesLisp/packed-tid
shader/decode: Implement S2R Tic
2019-08-29 12:58:43 -04:00
bunnei
680ab61327 Merge pull request #2786 from ReinUsesLisp/vote
shader_ir: Implement VOTE on Nvidia drivers
2019-08-29 12:58:10 -04:00
ReinUsesLisp
4e35177e23 shader_ir: Implement VOTE
Implement VOTE using Nvidia's intrinsics. Documentation about these can
be found here:
https://developer.nvidia.com/reading-between-threads-shader-intrinsics

Instead of using portable ARB instructions I opted to use Nvidia
intrinsics because these are the closest we have to how Tegra X1
hardware renders.

To stub VOTE on non-Nvidia drivers (including nouveau) this commit
simulates a GPU with a warp size of one, returning what is meaningful
for the instruction being emulated:

* anyThreadNV(value) -> value
* allThreadsNV(value) -> value
* allThreadsEqualNV(value) -> true

ballotARB, also known as "uint64_t(activeThreadsNV())", emits

VOTE.ANY Rd, PT, PT;

on nouveau's compiler. This doesn't exactly match Nvidia's code

VOTE.ALL Rd, PT, PT;

which is emulated with activeThreadsNV() by this commit. In theory this
shouldn't really matter, since .ANY, .ALL and .EQ affect the predicates
(set to PT in those cases) and not the registers.
2019-08-21 14:50:38 -03:00
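
A minimal sketch of the fallback described above, assuming a hypothetical EmitVote helper in the GLSL decompiler (illustrative, not yuzu's actual function):

```cpp
#include <string>

// Hypothetical helper: emit GLSL for a VOTE instruction. On Nvidia we use
// the shader-thread-group intrinsics; elsewhere we simulate a warp size of
// one, where any/all collapse to the value and all threads trivially agree.
std::string EmitVote(const std::string& op, const std::string& value,
                     bool has_nv_intrinsics) {
    if (has_nv_intrinsics) {
        if (op == "ANY") {
            return "anyThreadNV(" + value + ')';
        }
        if (op == "ALL") {
            return "allThreadsNV(" + value + ')';
        }
        return "allThreadsEqualNV(" + value + ')'; // VOTE.EQ
    }
    if (op == "ANY" || op == "ALL") {
        return value; // a one-thread warp's any/all is the thread's own value
    }
    return "true"; // every thread in a one-thread warp agrees
}
```
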
Fernando Sahmkow
83ec2091c1 Buffer Cache: Address Feedback. 2019-08-21 12:14:27 -04:00
Fernando Sahmkow
6ce2c85047 Buffer_Cache: Implement flushing. 2019-08-21 12:14:26 -04:00
Fernando Sahmkow
de8ff8a1c6 Buffer_Cache: Implement barriers. 2019-08-21 12:14:25 -04:00
Fernando Sahmkow
286f4c446a Buffer_Cache: Optimize and track written areas. 2019-08-21 12:14:25 -04:00
Fernando Sahmkow
5f4b746a1e BufferCache: Rework mapping caching. 2019-08-21 12:14:24 -04:00
Fernando Sahmkow
86d8563314 Buffer_Cache: Fixes and optimizations. 2019-08-21 12:14:23 -04:00
Fernando Sahmkow
862bec001b Video_Core: Implement a new Buffer Cache 2019-08-21 12:14:22 -04:00
bunnei
b4a8cfbd00 Merge pull request #2748 from FernandoS27/align-memory
VM_Manager: Align allocated host physical memory to 256 bytes
2019-08-21 12:10:10 -04:00
bunnei
d654b3d82e Merge pull request #2769 from FernandoS27/commands-flush
GPU: Flush commands on every dma pusher step.
2019-08-21 10:29:56 -04:00
bunnei
dfdd20142e Merge pull request #2777 from ReinUsesLisp/hsetp2-fe3h-fix
half_set_predicate: Fix HSETP2_C constant buffer offset
2019-08-21 10:29:17 -04:00
bunnei
cedc1aab4a Merge pull request #2753 from FernandoS27/float-convert
Shader_Ir: Implement F16 Variants of F2F, F2I, I2F.
2019-08-21 10:27:57 -04:00
bunnei
74a7ce1df7 Merge pull request #2773 from lioncash/test-unused
yuzu-tester/yuzu: Remove unused variable
2019-08-21 10:27:29 -04:00
bunnei
ef584f1a3a Merge pull request #2747 from lioncash/audio
service/audren_u: Unstub ListAudioDeviceName
2019-08-18 09:08:25 -04:00
bunnei
ca61e298b3 Merge pull request #2778 from ReinUsesLisp/nop
shader_ir: Implement NOP
2019-08-18 08:51:34 -04:00
bunnei
87bbefe55f Merge pull request #2768 from ReinUsesLisp/hsetp2-fix
decode/half_set_predicate: Fix predicates
2019-08-18 08:50:54 -04:00
James Rowe
93abe1ccf3 Merge pull request #2789 from jroweboy/quickfix
Fixup! #2772 missed this one file
2019-08-16 21:47:20 -06:00
James Rowe
509734d818 Fixup! #2772 missed this one file 2019-08-16 21:24:17 -06:00
James Rowe
e2392fe46f Merge pull request #2766 from FearlessTobi/port-4849
Port citra-emu/citra#4849: "Qt: Fixed behaviour of buttons by connecting functors to correct signals"
2019-08-16 19:39:05 -06:00
James Rowe
0e9e166d85 Merge pull request #2772 from lioncash/ui
yuzu/CMakeLists: Remove qt5_wrap_ui macro usage
2019-08-16 19:37:35 -06:00
Lioncash
5980aa1e51 yuzu/CMakeLists: Remove qt5_wrap_ui macro usage
We can simply enable CMAKE_AUTOUIC and let CMake take care of handling
the UI code generation for targets.

As part of letting CMake automatically handle the header file parsing,
we must not name includes with "ui_*" unless they're related to the
output of the Qt UIC compiler. Because of this, we need to rename
ui_settings, given it would conflict with this restriction.
2019-08-09 17:54:08 -04:00
ReinUsesLisp
2ff8044806 shader_ir: Implement NOP 2019-08-04 03:02:55 -03:00
ReinUsesLisp
ec0da3ef64 half_set_predicate: Fix HSETP2_C constant buffer offset 2019-08-04 02:50:55 -03:00
Silent
221250d922 Qt: Fixed behaviour of buttons by connecting functors to correct signals
Following screens got fixes:
- Configure/Debug
- Configure/Input
2019-08-02 04:09:38 +02:00
Flame Sage
978f7067ee Merge pull request #2770 from DarkLordZach/azure-pr-fix
ci: Fix Azure PR Builds
2019-08-01 22:04:21 -04:00
Zach Hilman
9aef7e5e22 Correct apt permissions 2019-08-01 21:33:53 -04:00
Zach Hilman
6b2937bf76 Upgrade PIP version with APT 2019-08-01 21:29:27 -04:00
Zach Hilman
a2d2a6b6dd Upgrade pip version 2019-08-01 21:23:17 -04:00
Zach Hilman
d3ea2df06d Add missing dot 2019-08-01 21:09:11 -04:00
Lioncash
6e11cfcdf0 yuzu-tester/yuzu: Correct format string
Prevents an invalid formatting exception from being thrown.
2019-07-29 20:55:48 -04:00
Lioncash
a0ee10b114 yuzu-tester/yuzu: Remove unused variable
Gets rid of a compilation warning.
2019-07-29 20:50:33 -04:00
Zach Hilman
bcbec6f37c ci: Fix Azure PR Builds 2019-07-28 14:21:18 -04:00
Fernando Sahmkow
e52c895559 GPU: Flush commands on every dma pusher step.
This commit ensures that the host GPU is constantly fed with commands to
work with, while the guest GPU keeps producing the rest of the commands.
This reduces syncing time between the host and guest GPU.
2019-07-26 16:54:22 -04:00
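
A rough sketch of the scheme, with illustrative stand-in types (the real loop lives in video_core's dma pusher):

```cpp
// Illustrative stand-ins; the real types are the dma pusher and rasterizer.
struct DmaPusher {
    int pending = 3;
    // Decode and execute one pushbuffer entry; false once drained.
    bool Step() { return pending-- > 0; }
};
struct HostGpu {
    // Submit any accumulated work to the host GPU right away.
    void FlushCommands() {}
};

// Before this change, flushing happened once per batch; now the host GPU
// is fed after every step while the guest keeps producing commands.
void DispatchCalls(DmaPusher& pusher, HostGpu& gpu) {
    while (pusher.Step()) {
        gpu.FlushCommands();
    }
}
```
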
bunnei
52f54c728d Merge pull request #2592 from FernandoS27/sync1
Implement GPU Synchronization Mechanisms & Correct NVFlinger
2019-07-26 14:26:44 -04:00
ReinUsesLisp
77f1a676a1 decode/half_set_predicate: Fix predicates 2019-07-26 00:12:38 -03:00
Fernando Sahmkow
a452ff983d MaxwellDMA: Fixes, corrections and relaxations.
This commit fixes offsets on linear -> tiled copies, corrects the z
position for tiled -> linear copies, corrects the bytes_per_pixel
calculation in tiled -> linear copies, and relaxes some limitations set
by the latest DMA fix refactors.
2019-07-25 20:41:42 -04:00
bunnei
b0ff3179ef Merge pull request #2739 from lioncash/cflow
video_core/control_flow: Minor changes/warning cleanup
2019-07-25 13:04:56 -04:00
bunnei
4d26550f5f Merge pull request #2737 from FernandoS27/track-fix
Shader_Ir: Correct tracking to track from right to left
2019-07-25 12:41:52 -04:00
bunnei
ccbc554949 Merge pull request #2689 from lioncash/tl
yuzu/main: Make error messages within OnCoreError more localization-friendly
2019-07-25 12:35:07 -04:00
bunnei
31e8a61527 Merge pull request #2743 from FernandoS27/surpress-assert
Downgrade and suppress a series of GPU asserts and debug messages.
2019-07-25 12:34:36 -04:00
bunnei
9be9600bdc Merge pull request #2704 from FernandoS27/conditional
maxwell3d: Implement Conditional Rendering
2019-07-24 17:07:57 -04:00
Zach Hilman
12514ccd35 Fix README change mistake (#2754)
Fix README change mistake
2019-07-24 16:42:33 -04:00
ReinUsesLisp
104641db07 shader/decode: Implement S2R Tic 2019-07-22 16:16:10 -03:00
bunnei
f601f25bcc Merge pull request #2734 from ReinUsesLisp/compute-shaders
gl_rasterizer: Implement compute shaders
2019-07-22 11:12:55 -04:00
bunnei
27e10e0442 Merge pull request #2735 from FernandoS27/pipeline-rework
Rework Dirty Flags in GPU Pipeline, Optimize CBData and Redo Clearing mechanism
2019-07-21 00:59:52 -04:00
Zach Hilman
6738fb5fef Update README.md 2019-07-20 21:03:30 -04:00
Fernando Sahmkow
7a35178ee2 Maxwell3D: Reorganize and address feedback 2019-07-20 10:18:35 -04:00
Fernando Sahmkow
1158777737 Shader_Ir: Change Debug Asserts for Log Warnings 2019-07-19 22:15:34 -04:00
Fernando Sahmkow
febb88efc4 Common/Alignment: Add noexcept where required. 2019-07-19 21:49:54 -04:00
Fernando Sahmkow
024b5fe91a Kernel: Address Feedback 2019-07-19 11:28:57 -04:00
Fernando Sahmkow
0901c33753 Common: Correct alignment allocator to work on C++14 or higher. 2019-07-19 11:11:42 -04:00
Fernando Sahmkow
9bede4eeed VM_Manager: Align allocated memory to 256 bytes
This commit ensures that all backing memory allocated for the guest CPU
is aligned to 256 bytes. This is due to how GPU memory works and the
heavy constraints it places on the alignment of physical memory.
2019-07-19 10:06:08 -04:00
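
The supporting allocator lands in the common/alignment.h diff further down; a short usage sketch assuming that header, with std::uint8_t standing in for yuzu's u8 typedef:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

#include "common/alignment.h" // AlignmentAllocator, from the diff below

// Mirrors the PhysicalMemory alias added in this change: every allocation
// made through this vector is aligned to 256 bytes, matching the GPU's
// physical-memory constraints.
using PhysicalMemory =
    std::vector<std::uint8_t, Common::AlignmentAllocator<std::uint8_t, 256>>;

int main() {
    PhysicalMemory block(0x1000);
    assert(reinterpret_cast<std::uintptr_t>(block.data()) % 256 == 0);
}
```
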
Lioncash
16730c4c43 service/audren_u: Handle audio USB output revision queries in ListAudioDeviceName()
Audio devices use the supplied revision information in order to
determine whether USB audio output can be supported. In this case, we
can only really make use of this revision information in
ListAudioDeviceName(), where it checks whether USB audio output is
supported before supplying it as a device name.

A few other scenarios exist where the revision info is checked, such as:

- Early exiting from SetAudioDeviceOutputVolume if USB audio is
  attempted to be set when that device is unsupported.

- Early exiting and returning 0.0f in GetAudioDeviceOutputVolume when
  USB output volume is queried and it's an unsupported device.

- Falling back to AHUB headphones in GetActiveAudioDeviceName when the
  device type is USB output, but is unsupported based off the revision
  info.

In order for these changes to also be implemented, a few other changes
to the interface need to be made.

Given we now properly handle everything about ListAudioDeviceName(), we
no longer need to describe it as a stubbed function.
2019-07-19 07:55:27 -04:00
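
The gate described above boils down to comparing the "REVn" magic the client supplies against the revision that introduced USB audio output (REV4); a standalone sketch with plain std::uint32_t in place of yuzu's u32_le/u32_be:

```cpp
#include <cstdint>

// Pack "REVn" the way the client sends it (bytes in memory order, read as
// a little-endian u32).
constexpr std::uint32_t MakeMagic(char a, char b, char c, char d) {
    return std::uint32_t(a) | (std::uint32_t(b) << 8) |
           (std::uint32_t(c) << 16) | (std::uint32_t(d) << 24);
}

// USB audio output only exists from revision 4 onward, so a REV1 client
// never sees "AudioUsbDeviceOutput" listed. The real code compares through
// a byte-swapped u32_be; extracting the top byte here does the same job.
bool SupportsUsbAudioOutput(std::uint32_t revision) {
    const std::uint32_t version_num = revision - MakeMagic('R', 'E', 'V', '0');
    return (version_num >> 24) >= 4;
}
```
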
Lioncash
b9ebab71be service/audren_u: Move revision testing code out of AudRenU
The revision querying facilities are used by more than just audren; e.g.
audio devices can use this to test whether or not USB audio output is
supported.

This will be used within the following change.
2019-07-19 07:55:23 -04:00
Lioncash
ed0485c599 service/audio: Remove global system accessors
Trims out the lingering reliance on global state in the audio code.
2019-07-19 07:29:36 -04:00
Lioncash
7653e4babc service/audren_u: Remove unnecessary return value from GetActiveAudioDeviceName()
This service function only ever returns a result and nothing more.
2019-07-19 06:57:31 -04:00
Lioncash
6ecbc6c557 service/audren_u: Report proper device names
AudioDevice and AudioInterface aren't valid device names on the Switch.
We should also be returning consistent names in
GetActiveAudioDeviceName().

While we're at it, we can also handle proper name output in
ListAudioDeviceName, by returning all the available devices on the
Switch.
2019-07-19 06:57:30 -04:00
Lioncash
c1c89411da video_core/control_flow: Provide operator!= for types with operator==
Provides operational symmetry for the respective structures.
2019-07-18 21:03:31 -04:00
Lioncash
1780e0e3d0 video_core/control_flow: Prevent sign conversion in TryGetBlock()
The return value is a u32, not an s32, so this would result in an
implicit signedness conversion.
2019-07-18 21:03:31 -04:00
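
An illustration of this class of fix (hypothetical snippet, not the actual TryGetBlock body):

```cpp
#include <cstdint>

// Returning a signed value from a function declared to return u32 converts
// implicitly and trips -Wsign-conversion; casting makes the intent explicit.
std::uint32_t BlockEndAddress(std::int32_t signed_offset) {
    return static_cast<std::uint32_t>(signed_offset); // explicit, warning-free
}
```
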
Lioncash
a162a844d2 video_core/control_flow: Remove unnecessary BlockStack copy constructor
This is the default behavior of the copy constructor, so it doesn't need
to be specified.

While we're at it we can make the other non-default constructor
explicit.
2019-07-18 21:03:30 -04:00
Lioncash
56bc11d952 video_core/control_flow: Use std::move where applicable
Results in less work being done where avoidable.
2019-07-18 21:03:30 -04:00
Lioncash
e7b39f47f8 video_core/control_flow: Use the prefix variant of operator++ for iterators
Same thing, but potentially allows a standard library implementation to
pick a more efficient codepath.
2019-07-18 21:03:30 -04:00
Lioncash
6885e7e7ec video_core/control_flow: Use empty() member function for checking emptiness
It's what it's there for.
2019-07-18 21:03:30 -04:00
Lioncash
45fa12a05c video_core: Resolve -Wreorder warnings
Ensures that the constructor members are always initialized in the order
that they're declared in.
2019-07-18 21:03:30 -04:00
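
For reference, the warning fires when a constructor's initializer list disagrees with declaration order, since members are always constructed in declaration order regardless (generic example, not yuzu code):

```cpp
struct Example {
    // Was ": second{x + 1}, first{x}" -- GCC/Clang warn with -Wreorder
    // because 'first' is constructed before 'second' no matter what the
    // initializer list says. Listing them in declaration order fixes it.
    explicit Example(int x) : first{x}, second{x + 1} {}

    int first;
    int second;
};
```
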
Lioncash
47df844338 video_core/control_flow: Make program_size for ScanFlow() a std::size_t
Prevents a truncation warning from occurring with MSVC. Also the
internal data structures already treat it as a size_t, so this is just a
discrepancy in the interface.
2019-07-18 21:03:29 -04:00
Lioncash
3df9558593 video_core/control_flow: Place all internally linked types/functions within an anonymous namespace
Previously, quite a few functions were being linked with external
linkage.
2019-07-18 21:03:29 -04:00
Lioncash
1109db86b7 video_core/shader/decode: Prevent sign-conversion warnings
Makes it explicit that the conversions here are intentional.
2019-07-18 21:03:29 -04:00
Fernando Sahmkow
5a06e33859 Shader_Ir: correct clang format 2019-07-18 10:09:26 -04:00
Fernando Sahmkow
43f57d668c GPU: Add missing puller methods.
This adds some missing puller methods. We don't assert on them, as these
are no-op operations for us.
2019-07-18 08:54:42 -04:00
Fernando Sahmkow
3a3fee5abf MaxwellDMA/KeplerCopy: Downgrade DMA log message to Trace.
This log was just to know which games used DMA. It's no longer 
important.
2019-07-18 08:31:38 -04:00
Fernando Sahmkow
d3b71ff80d Gl_Texture_Cache: Remove assert on component type in GetFormatTuple
Textures can have different component types in different orders. This
assert was completely imprecise, and such checks are better handled case
by case within the texture cache.
2019-07-18 08:20:31 -04:00
Fernando Sahmkow
0b65e9335e Shader_Ir: Downgrade precision and rounding asserts to debug asserts.
This commit reduces the severity of asserts for FP precision and
rounding, as these are well known and have little to no consequence on
GPU accuracy.
2019-07-18 08:17:19 -04:00
ReinUsesLisp
74632c76ce gl_shader_decompiler: Rename bufferImage to imageBuffer
The online OpenGL documentation is wrong. The type definition is
imageBuffer.
2019-07-18 01:16:44 -03:00
ReinUsesLisp
87909d327f gl_shader_cache: Fix newline on buffer preprocessor definitions 2019-07-18 01:16:15 -03:00
ReinUsesLisp
e7bdf8b22a textures: Fix texture buffer size calculation 2019-07-18 01:07:08 -03:00
ReinUsesLisp
84027f4808 gl_texture_cache: Do not set texture parameters to buffers 2019-07-18 01:06:26 -03:00
ReinUsesLisp
73b2dc6d4f gl_texture_cache: Add missing break in CreateTexture 2019-07-18 01:04:18 -03:00
Fernando Sahmkow
4be61013a1 GL_State: Feedback and fixes 2019-07-17 17:29:56 -04:00
Fernando Sahmkow
5ad889f6fd Maxwell3D: Address Feedback 2019-07-17 17:29:55 -04:00
Fernando Sahmkow
7826f0afd9 Texture_Cache: Rebase Fixes 2019-07-17 17:29:54 -04:00
Fernando Sahmkow
8cdbfe69b1 GL_Rasterizer: Corrections to Clearing. 2019-07-17 17:29:54 -04:00
Fernando Sahmkow
0ff4a5fa39 Maxwell3D: Correct marking dirtiness on CB upload 2019-07-17 17:29:53 -04:00
Fernando Sahmkow
fec32fed18 GL_Rasterizer: Rework RenderTarget/DepthBuffer clearing 2019-07-17 17:29:52 -04:00
Fernando Sahmkow
a081dea8ab Maxwell3D: Implement State Dirty Flags. 2019-07-17 17:29:51 -04:00
Fernando Sahmkow
0d3db58657 Maxwell3D: Rework CBData Upload 2019-07-17 17:29:50 -04:00
Fernando Sahmkow
f2e7b29c14 Maxwell3D: Rework the dirty system to be more consistent and scalable 2019-07-17 17:29:49 -04:00
Fernando Sahmkow
e42bcf2314 maxwell3d: Implement Conditional Rendering
Conditional Rendering takes care of conditionally clearing or drawing
depending on a set of queries. This PR implements the query checks to
establish whether things can be rendered or not.
2019-07-17 17:13:19 -04:00
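
A minimal sketch of the gating idea (mode names mirror Maxwell's condition modes; the evaluation here is illustrative):

```cpp
#include <cstdint>

// Condition modes as exposed by the Maxwell 3D condition registers.
enum class ConditionMode : std::uint32_t {
    Never, Always, ResNonZero, Equal, NotEqual,
};

// Illustrative check: a clear or draw only proceeds when the query result
// stored at the condition address satisfies the selected mode.
bool ShouldRender(ConditionMode mode, std::uint64_t query_value,
                  std::uint64_t reference_value) {
    switch (mode) {
    case ConditionMode::Never:      return false;
    case ConditionMode::Always:     return true;
    case ConditionMode::ResNonZero: return query_value != 0;
    case ConditionMode::Equal:      return query_value == reference_value;
    case ConditionMode::NotEqual:   return query_value != reference_value;
    }
    return true;
}
```
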
Fernando Sahmkow
d614193e49 Shader_Ir: Correct tracking to track from right to left 2019-07-16 15:06:59 -04:00
ReinUsesLisp
2a4044a858 gl_shader_cache: Fix clang-format issues 2019-07-15 20:33:51 -03:00
ReinUsesLisp
6b0d017675 gl_shader_decompiler: Stub local memory size 2019-07-15 17:38:25 -03:00
ReinUsesLisp
56bca83bde gl_shader_cache: Address review commentaries 2019-07-15 17:38:25 -03:00
ReinUsesLisp
bbecd13697 gl_shader_cache: Address CI issues 2019-07-15 17:38:25 -03:00
ReinUsesLisp
725ba6cf63 gl_rasterizer: Implement compute shaders 2019-07-15 17:38:25 -03:00
Lioncash
5085a16d78 yuzu/main: Make error messages within OnCoreError more localization-friendly
Previously, a translated string was being appended onto another string
in a manner that doesn't allow the translator to control where the
appended text is placed. This can be a nuisance for languages where
grammar and text ordering differ from English.

We now append the strings via the format strings themselves, which
allows translators to reorder where the text will be placed.
2019-07-07 11:02:05 -04:00
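
A hedged sketch of the pattern (illustrative strings, not the exact OnCoreError messages):

```cpp
#include <QObject>
#include <QString>

QString BuildErrorMessage(const QString& details) {
    // Before: appending pins the dynamic text to the end in every language.
    //   return QObject::tr("An error occurred. ") + details;
    // After: the placeholder travels inside the translated string, so a
    // translator can move it wherever the target grammar requires.
    return QObject::tr("An error occurred. %1").arg(details);
}
```
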
Fernando Sahmkow
0fc98958a3 NVServices: Correct delayed responses. 2019-07-05 15:49:35 -04:00
Fernando Sahmkow
8c91d5c166 Nv_Host_Ctrl: Correct difference calculation 2019-07-05 15:49:34 -04:00
Fernando Sahmkow
f3a39e0c9c NVServices: Address Feedback 2019-07-05 15:49:33 -04:00
Fernando Sahmkow
d20ede40b1 NVServices: Styling, define constructors as explicit and corrections 2019-07-05 15:49:32 -04:00
Fernando Sahmkow
b391e5f638 NVFlinger: Correct GCC compile error 2019-07-05 15:49:31 -04:00
Fernando Sahmkow
0335a25d1f NVServices: Make NVEvents Automatic according to documentation. 2019-07-05 15:49:29 -04:00
Fernando Sahmkow
b6844bec60 NVServices: Correct CtrlEventWaitSync to block the ipc until timeout. 2019-07-05 15:49:28 -04:00
Fernando Sahmkow
7d1b974bca GPU: Correct Interrupts to interrupt on syncpt/value instead of event, mirroring hardware 2019-07-05 15:49:26 -04:00
Fernando Sahmkow
61697864c3 nvflinger: Make the force 30 fps still force 30 fps 2019-07-05 15:49:25 -04:00
Fernando Sahmkow
efdeab3a1d nv_services: Fixes to event liberation. 2019-07-05 15:49:24 -04:00
Fernando Sahmkow
ea97589624 nvflinger: Acquire buffers in the same order as they were queued. 2019-07-05 15:49:23 -04:00
Fernando Sahmkow
24408cce9b nv_services: Deglobalize NvServices 2019-07-05 15:49:22 -04:00
Fernando Sahmkow
f2e026a1d8 gpu_asynch: Simplify synchronization to a consumer->producer scheme. 2019-07-05 15:49:20 -04:00
Fernando Sahmkow
0706d633bf nv_host_ctrl: Make Sync GPU variant always return synced result. 2019-07-05 15:49:20 -04:00
Fernando Sahmkow
600dddf88d Async GPU: do invalidate as synced operation
Async GPU: Always invalidate synced.
2019-07-05 15:49:19 -04:00
Fernando Sahmkow
c13433aee4 Gpu: use a std::mutex instead of a spin_lock to guard syncpoints 2019-07-05 15:49:18 -04:00
Fernando Sahmkow
78add28aab nvhost_ctrl: Corrections to event handling 2019-07-05 15:49:17 -04:00
Fernando Sahmkow
eef55f493b Gpu: Mark areas as protected. 2019-07-05 15:49:16 -04:00
Fernando Sahmkow
a45643cb3b nv_services: Stub CtrlEventSignal 2019-07-05 15:49:15 -04:00
Fernando Sahmkow
8942047d41 Gpu: Implement Hardware Interrupt Manager and manage GPU interrupts 2019-07-05 15:49:14 -04:00
Fernando Sahmkow
e0027eba85 nv_services: Implement NvQueryEvent, NvCtrlEventWait, NvEventRegister, NvEventUnregister 2019-07-05 15:49:13 -04:00
Fernando Sahmkow
7039ece0a0 nv_services: Create GPU channels correctly 2019-07-05 15:49:12 -04:00
Fernando Sahmkow
82b829625b video_core: Implement GPU side Syncpoints 2019-07-05 15:49:11 -04:00
Fernando Sahmkow
737e978f5b nv_services: Correct buffer queue fencing and GPFifo fencing 2019-07-05 15:49:10 -04:00
Fernando Sahmkow
ceb5f5079c nvflinger: Implement swap intervals 2019-07-05 15:49:08 -04:00
157 changed files with 3170 additions and 1186 deletions

View File

@@ -1,20 +1,22 @@
parameters:
artifactSource: 'true'
cache: 'false'
steps:
- task: DockerInstaller@0
displayName: 'Prepare Environment'
inputs:
dockerVersion: '17.09.0-ce'
- task: CacheBeta@0
displayName: 'Cache Build System'
inputs:
key: yuzu-v1-$(BuildName)-$(BuildSuffix)-$(CacheSuffix)
path: $(System.DefaultWorkingDirectory)/ccache
cacheHitVar: CACHE_RESTORED
- ${{ if eq(parameters.cache, 'true') }}:
- task: CacheBeta@0
displayName: 'Cache Build System'
inputs:
key: yuzu-v1-$(BuildName)-$(BuildSuffix)-$(CacheSuffix)
path: $(System.DefaultWorkingDirectory)/ccache
cacheHitVar: CACHE_RESTORED
- script: chmod a+x ./.ci/scripts/$(ScriptFolder)/exec.sh && ./.ci/scripts/$(ScriptFolder)/exec.sh
displayName: 'Build'
- script: chmod a+x ./.ci/scripts/$(ScriptFolder)/upload.sh && ./.ci/scripts/$(ScriptFolder)/upload.sh
- script: chmod a+x ./.ci/scripts/$(ScriptFolder)/upload.sh && RELEASE_NAME=$(BuildName) ./.ci/scripts/$(ScriptFolder)/upload.sh
displayName: 'Package Artifacts'
- publish: artifacts
artifact: 'yuzu-$(BuildName)-$(BuildSuffix)'

View File

@@ -19,4 +19,5 @@ jobs:
needSubmodules: 'true'
- template: ./build-single.yml
parameters:
artifactSource: 'false'
artifactSource: 'false'
cache: $(parameters.cache)

View File

@@ -4,18 +4,20 @@ jobs:
pool:
vmImage: ubuntu-latest
strategy:
maxParallel: 10
maxParallel: 5
matrix:
windows:
BuildSuffix: 'windows-testing'
ScriptFolder: 'windows'
steps:
- script: sudo apt upgrade python3-pip && pip install requests urllib3
displayName: 'Prepare Environment'
- task: PythonScript@0
condition: eq(variables['Build.Reason'], 'PullRequest')
displayName: 'Determine Testing Status'
inputs:
scriptSource: 'filePath'
scriptPath: '../scripts/merge/check-label-presence.py'
scriptPath: '.ci/scripts/merge/check-label-presence.py'
arguments: '$(System.PullRequest.PullRequestNumber) create-testing-build'
- ${{ if eq(variables.enabletesting, 'true') }}:
- template: ./sync-source.yml
@@ -27,4 +29,5 @@ jobs:
matchLabel: 'testing-merge'
- template: ./build-single.yml
parameters:
artifactSource: 'false'
artifactSource: 'false'
cache: 'false'

View File

@@ -21,3 +21,5 @@ stages:
dependsOn: format
jobs:
- template: ./templates/build-standard.yml
parameters:
cache: 'true'

View File

@@ -15,4 +15,6 @@ stages:
dependsOn: format
jobs:
- template: ./templates/build-standard.yml
- template: ./templates/build-testing.yml
parameters:
cache: 'false'
- template: ./templates/build-testing.yml

View File

@@ -81,6 +81,7 @@ set(HASH_FILES
"${VIDEO_CORE}/shader/decode/register_set_predicate.cpp"
"${VIDEO_CORE}/shader/decode/shift.cpp"
"${VIDEO_CORE}/shader/decode/video.cpp"
"${VIDEO_CORE}/shader/decode/warp.cpp"
"${VIDEO_CORE}/shader/decode/xmad.cpp"
"${VIDEO_CORE}/shader/control_flow.cpp"
"${VIDEO_CORE}/shader/control_flow.h"

View File

@@ -55,6 +55,7 @@ add_custom_command(OUTPUT scm_rev.cpp
"${VIDEO_CORE}/shader/decode/register_set_predicate.cpp"
"${VIDEO_CORE}/shader/decode/shift.cpp"
"${VIDEO_CORE}/shader/decode/video.cpp"
"${VIDEO_CORE}/shader/decode/warp.cpp"
"${VIDEO_CORE}/shader/decode/xmad.cpp"
"${VIDEO_CORE}/shader/control_flow.cpp"
"${VIDEO_CORE}/shader/control_flow.h"

View File

@@ -3,6 +3,7 @@
#pragma once
#include <cstddef>
#include <memory>
#include <type_traits>
namespace Common {
@@ -37,4 +38,63 @@ constexpr bool IsWordAligned(T value) {
return (value & 0b11) == 0;
}
template <typename T, std::size_t Align = 16>
class AlignmentAllocator {
public:
using value_type = T;
using size_type = std::size_t;
using difference_type = std::ptrdiff_t;
using pointer = T*;
using const_pointer = const T*;
using reference = T&;
using const_reference = const T&;
public:
pointer address(reference r) noexcept {
return std::addressof(r);
}
const_pointer address(const_reference r) const noexcept {
return std::addressof(r);
}
pointer allocate(size_type n) {
return static_cast<pointer>(::operator new (n, std::align_val_t{Align}));
}
void deallocate(pointer p, size_type) {
::operator delete (p, std::align_val_t{Align});
}
void construct(pointer p, const value_type& wert) {
new (p) value_type(wert);
}
void destroy(pointer p) {
p->~value_type();
}
size_type max_size() const noexcept {
return size_type(-1) / sizeof(value_type);
}
template <typename T2>
struct rebind {
using other = AlignmentAllocator<T2, Align>;
};
bool operator!=(const AlignmentAllocator<T, Align>& other) const noexcept {
return !(*this == other);
}
// Returns true if and only if storage allocated from *this
// can be deallocated from other, and vice versa.
// Always returns true for stateless allocators.
bool operator==(const AlignmentAllocator<T, Align>& other) const noexcept {
return true;
}
};
} // namespace Common

View File

@@ -111,6 +111,8 @@ add_library(core STATIC
frontend/scope_acquire_window_context.h
gdbstub/gdbstub.cpp
gdbstub/gdbstub.h
hardware_interrupt_manager.cpp
hardware_interrupt_manager.h
hle/ipc.h
hle/ipc_helpers.h
hle/kernel/address_arbiter.cpp
@@ -372,6 +374,7 @@ add_library(core STATIC
hle/service/nvdrv/devices/nvmap.h
hle/service/nvdrv/interface.cpp
hle/service/nvdrv/interface.h
hle/service/nvdrv/nvdata.h
hle/service/nvdrv/nvdrv.cpp
hle/service/nvdrv/nvdrv.h
hle/service/nvdrv/nvmemp.cpp

View File

@@ -19,6 +19,7 @@
#include "core/file_sys/vfs_concat.h"
#include "core/file_sys/vfs_real.h"
#include "core/gdbstub/gdbstub.h"
#include "core/hardware_interrupt_manager.h"
#include "core/hle/kernel/client_port.h"
#include "core/hle/kernel/kernel.h"
#include "core/hle/kernel/process.h"
@@ -151,7 +152,7 @@ struct System::Impl {
if (!renderer->Init()) {
return ResultStatus::ErrorVideoCore;
}
interrupt_manager = std::make_unique<Core::Hardware::InterruptManager>(system);
gpu_core = VideoCore::CreateGPU(system);
is_powered_on = true;
@@ -298,6 +299,7 @@ struct System::Impl {
std::unique_ptr<VideoCore::RendererBase> renderer;
std::unique_ptr<Tegra::GPU> gpu_core;
std::shared_ptr<Tegra::DebugContext> debug_context;
std::unique_ptr<Core::Hardware::InterruptManager> interrupt_manager;
CpuCoreManager cpu_core_manager;
bool is_powered_on = false;
@@ -444,6 +446,14 @@ const Tegra::GPU& System::GPU() const {
return *impl->gpu_core;
}
Core::Hardware::InterruptManager& System::InterruptManager() {
return *impl->interrupt_manager;
}
const Core::Hardware::InterruptManager& System::InterruptManager() const {
return *impl->interrupt_manager;
}
VideoCore::RendererBase& System::Renderer() {
return *impl->renderer;
}

View File

@@ -70,6 +70,10 @@ namespace Core::Timing {
class CoreTiming;
}
namespace Core::Hardware {
class InterruptManager;
}
namespace Core {
class ARM_Interface;
@@ -234,6 +238,12 @@ public:
/// Provides a constant reference to the core timing instance.
const Timing::CoreTiming& CoreTiming() const;
/// Provides a reference to the interrupt manager instance.
Core::Hardware::InterruptManager& InterruptManager();
/// Provides a constant reference to the interrupt manager instance.
const Core::Hardware::InterruptManager& InterruptManager() const;
/// Provides a reference to the kernel instance.
Kernel::KernelCore& Kernel();

View File

@@ -0,0 +1,30 @@
// Copyright 2019 Yuzu Emulator Project
// Licensed under GPLv2 or any later version
// Refer to the license.txt file included.
#include "core/core.h"
#include "core/core_timing.h"
#include "core/hardware_interrupt_manager.h"
#include "core/hle/service/nvdrv/interface.h"
#include "core/hle/service/sm/sm.h"
namespace Core::Hardware {
InterruptManager::InterruptManager(Core::System& system_in) : system(system_in) {
gpu_interrupt_event =
system.CoreTiming().RegisterEvent("GPUInterrupt", [this](u64 message, s64) {
auto nvdrv = system.ServiceManager().GetService<Service::Nvidia::NVDRV>("nvdrv");
const u32 syncpt = static_cast<u32>(message >> 32);
const u32 value = static_cast<u32>(message);
nvdrv->SignalGPUInterruptSyncpt(syncpt, value);
});
}
InterruptManager::~InterruptManager() = default;
void InterruptManager::GPUInterruptSyncpt(const u32 syncpoint_id, const u32 value) {
const u64 msg = (static_cast<u64>(syncpoint_id) << 32ULL) | value;
system.CoreTiming().ScheduleEvent(10, gpu_interrupt_event, msg);
}
} // namespace Core::Hardware

View File

@@ -0,0 +1,31 @@
// Copyright 2019 Yuzu Emulator Project
// Licensed under GPLv2 or any later version
// Refer to the license.txt file included.
#pragma once
#include "common/common_types.h"
namespace Core {
class System;
}
namespace Core::Timing {
struct EventType;
}
namespace Core::Hardware {
class InterruptManager {
public:
explicit InterruptManager(Core::System& system);
~InterruptManager();
void GPUInterruptSyncpt(u32 syncpoint_id, u32 value);
private:
Core::System& system;
Core::Timing::EventType* gpu_interrupt_event{};
};
} // namespace Core::Hardware

View File

@@ -8,6 +8,7 @@
#include <vector>
#include "common/common_types.h"
#include "core/hle/kernel/physical_memory.h"
namespace Kernel {
@@ -77,7 +78,7 @@ struct CodeSet final {
}
/// The overall data that backs this code set.
std::vector<u8> memory;
Kernel::PhysicalMemory memory;
/// The segments that comprise this code set.
std::array<Segment, 3> segments;

View File

@@ -0,0 +1,19 @@
// Copyright 2019 yuzu Emulator Project
// Licensed under GPLv2 or any later version
// Refer to the license.txt file included.
#pragma once
#include "common/alignment.h"
namespace Kernel {
// This encapsulation serves 2 purposes:
// - First, to encapsulate host physical memory under a single type and set an
// standard for managing it.
// - Second to ensure all host backing memory used is aligned to 256 bytes due
// to strict alignment restrictions on GPU memory.
using PhysicalMemory = std::vector<u8, Common::AlignmentAllocator<u8, 256>>;
} // namespace Kernel

View File

@@ -247,7 +247,7 @@ VAddr Process::CreateTLSRegion() {
ASSERT(region_address.Succeeded());
const auto map_result = vm_manager.MapMemoryBlock(
*region_address, std::make_shared<std::vector<u8>>(Memory::PAGE_SIZE), 0,
*region_address, std::make_shared<PhysicalMemory>(Memory::PAGE_SIZE), 0,
Memory::PAGE_SIZE, MemoryState::ThreadLocal);
ASSERT(map_result.Succeeded());
@@ -277,7 +277,7 @@ void Process::FreeTLSRegion(VAddr tls_address) {
}
void Process::LoadModule(CodeSet module_, VAddr base_addr) {
const auto memory = std::make_shared<std::vector<u8>>(std::move(module_.memory));
const auto memory = std::make_shared<PhysicalMemory>(std::move(module_.memory));
const auto MapSegment = [&](const CodeSet::Segment& segment, VMAPermission permissions,
MemoryState memory_state) {
@@ -327,7 +327,7 @@ void Process::AllocateMainThreadStack(u64 stack_size) {
// Allocate and map the main thread stack
const VAddr mapping_address = vm_manager.GetTLSIORegionEndAddress() - main_thread_stack_size;
vm_manager
.MapMemoryBlock(mapping_address, std::make_shared<std::vector<u8>>(main_thread_stack_size),
.MapMemoryBlock(mapping_address, std::make_shared<PhysicalMemory>(main_thread_stack_size),
0, main_thread_stack_size, MemoryState::Stack)
.Unwrap();
}

View File

@@ -28,7 +28,7 @@ SharedPtr<SharedMemory> SharedMemory::Create(KernelCore& kernel, Process* owner_
shared_memory->other_permissions = other_permissions;
if (address == 0) {
shared_memory->backing_block = std::make_shared<std::vector<u8>>(size);
shared_memory->backing_block = std::make_shared<Kernel::PhysicalMemory>(size);
shared_memory->backing_block_offset = 0;
// Refresh the address mappings for the current process.
@@ -59,8 +59,8 @@ SharedPtr<SharedMemory> SharedMemory::Create(KernelCore& kernel, Process* owner_
}
SharedPtr<SharedMemory> SharedMemory::CreateForApplet(
KernelCore& kernel, std::shared_ptr<std::vector<u8>> heap_block, std::size_t offset, u64 size,
MemoryPermission permissions, MemoryPermission other_permissions, std::string name) {
KernelCore& kernel, std::shared_ptr<Kernel::PhysicalMemory> heap_block, std::size_t offset,
u64 size, MemoryPermission permissions, MemoryPermission other_permissions, std::string name) {
SharedPtr<SharedMemory> shared_memory(new SharedMemory(kernel));
shared_memory->owner_process = nullptr;

View File

@@ -10,6 +10,7 @@
#include "common/common_types.h"
#include "core/hle/kernel/object.h"
#include "core/hle/kernel/physical_memory.h"
#include "core/hle/kernel/process.h"
#include "core/hle/result.h"
@@ -62,12 +63,10 @@ public:
* block.
* @param name Optional object name, used for debugging purposes.
*/
static SharedPtr<SharedMemory> CreateForApplet(KernelCore& kernel,
std::shared_ptr<std::vector<u8>> heap_block,
std::size_t offset, u64 size,
MemoryPermission permissions,
MemoryPermission other_permissions,
std::string name = "Unknown Applet");
static SharedPtr<SharedMemory> CreateForApplet(
KernelCore& kernel, std::shared_ptr<Kernel::PhysicalMemory> heap_block, std::size_t offset,
u64 size, MemoryPermission permissions, MemoryPermission other_permissions,
std::string name = "Unknown Applet");
std::string GetTypeName() const override {
return "SharedMemory";
@@ -135,7 +134,7 @@ private:
~SharedMemory() override;
/// Backing memory for this shared memory block.
std::shared_ptr<std::vector<u8>> backing_block;
std::shared_ptr<PhysicalMemory> backing_block;
/// Offset into the backing block for this shared memory.
std::size_t backing_block_offset = 0;
/// Size of the memory block. Page-aligned.

View File

@@ -47,7 +47,7 @@ ResultCode TransferMemory::MapMemory(VAddr address, u64 size, MemoryPermission p
return ERR_INVALID_STATE;
}
backing_block = std::make_shared<std::vector<u8>>(size);
backing_block = std::make_shared<PhysicalMemory>(size);
const auto map_state = owner_permissions == MemoryPermission::None
? MemoryState::TransferMemoryIsolated

View File

@@ -8,6 +8,7 @@
#include <vector>
#include "core/hle/kernel/object.h"
#include "core/hle/kernel/physical_memory.h"
union ResultCode;
@@ -82,7 +83,7 @@ private:
~TransferMemory() override;
/// Memory block backing this instance.
std::shared_ptr<std::vector<u8>> backing_block;
std::shared_ptr<PhysicalMemory> backing_block;
/// The base address for the memory managed by this instance.
VAddr base_address = 0;

View File

@@ -5,6 +5,7 @@
#include <algorithm>
#include <iterator>
#include <utility>
#include "common/alignment.h"
#include "common/assert.h"
#include "common/logging/log.h"
#include "common/memory_hook.h"
@@ -103,7 +104,7 @@ bool VMManager::IsValidHandle(VMAHandle handle) const {
}
ResultVal<VMManager::VMAHandle> VMManager::MapMemoryBlock(VAddr target,
std::shared_ptr<std::vector<u8>> block,
std::shared_ptr<PhysicalMemory> block,
std::size_t offset, u64 size,
MemoryState state, VMAPermission perm) {
ASSERT(block != nullptr);
@@ -260,7 +261,7 @@ ResultVal<VAddr> VMManager::SetHeapSize(u64 size) {
if (heap_memory == nullptr) {
// Initialize heap
heap_memory = std::make_shared<std::vector<u8>>(size);
heap_memory = std::make_shared<PhysicalMemory>(size);
heap_end = heap_region_base + size;
} else {
UnmapRange(heap_region_base, GetCurrentHeapSize());
@@ -341,7 +342,7 @@ ResultCode VMManager::MapPhysicalMemory(VAddr target, u64 size) {
const auto map_size = std::min(end_addr - cur_addr, vma_end - cur_addr);
if (vma.state == MemoryState::Unmapped) {
const auto map_res =
MapMemoryBlock(cur_addr, std::make_shared<std::vector<u8>>(map_size, 0), 0,
MapMemoryBlock(cur_addr, std::make_shared<PhysicalMemory>(map_size, 0), 0,
map_size, MemoryState::Heap, VMAPermission::ReadWrite);
result = map_res.Code();
if (result.IsError()) {
@@ -442,7 +443,7 @@ ResultCode VMManager::UnmapPhysicalMemory(VAddr target, u64 size) {
if (result.IsError()) {
for (const auto [map_address, map_size] : unmapped_regions) {
const auto remap_res =
MapMemoryBlock(map_address, std::make_shared<std::vector<u8>>(map_size, 0), 0,
MapMemoryBlock(map_address, std::make_shared<PhysicalMemory>(map_size, 0), 0,
map_size, MemoryState::Heap, VMAPermission::None);
ASSERT_MSG(remap_res.Succeeded(), "UnmapPhysicalMemory re-map on error");
}
@@ -593,7 +594,7 @@ ResultCode VMManager::MirrorMemory(VAddr dst_addr, VAddr src_addr, u64 size, Mem
ASSERT_MSG(vma_offset + size <= vma->second.size,
"Shared memory exceeds bounds of mapped block");
const std::shared_ptr<std::vector<u8>>& backing_block = vma->second.backing_block;
const std::shared_ptr<PhysicalMemory>& backing_block = vma->second.backing_block;
const std::size_t backing_block_offset = vma->second.offset + vma_offset;
CASCADE_RESULT(auto new_vma,
@@ -606,7 +607,7 @@ ResultCode VMManager::MirrorMemory(VAddr dst_addr, VAddr src_addr, u64 size, Mem
return RESULT_SUCCESS;
}
void VMManager::RefreshMemoryBlockMappings(const std::vector<u8>* block) {
void VMManager::RefreshMemoryBlockMappings(const PhysicalMemory* block) {
// If this ever proves to have a noticeable performance impact, allow users of the function to
// specify a specific range of addresses to limit the scan to.
for (const auto& p : vma_map) {
@@ -764,7 +765,7 @@ void VMManager::MergeAdjacentVMA(VirtualMemoryArea& left, const VirtualMemoryAre
right.backing_block->begin() + right.offset + right.size);
} else {
// Slow case: make a new memory block for left and right.
auto new_memory = std::make_shared<std::vector<u8>>();
auto new_memory = std::make_shared<PhysicalMemory>();
new_memory->insert(new_memory->end(), left.backing_block->begin() + left.offset,
left.backing_block->begin() + left.offset + left.size);
new_memory->insert(new_memory->end(), right.backing_block->begin() + right.offset,

View File

@@ -11,6 +11,7 @@
#include "common/common_types.h"
#include "common/memory_hook.h"
#include "common/page_table.h"
#include "core/hle/kernel/physical_memory.h"
#include "core/hle/result.h"
#include "core/memory.h"
@@ -290,7 +291,7 @@ struct VirtualMemoryArea {
// Settings for type = AllocatedMemoryBlock
/// Memory block backing this VMA.
std::shared_ptr<std::vector<u8>> backing_block = nullptr;
std::shared_ptr<PhysicalMemory> backing_block = nullptr;
/// Offset into the backing_memory the mapping starts from.
std::size_t offset = 0;
@@ -348,7 +349,7 @@ public:
* @param size Size of the mapping.
* @param state MemoryState tag to attach to the VMA.
*/
ResultVal<VMAHandle> MapMemoryBlock(VAddr target, std::shared_ptr<std::vector<u8>> block,
ResultVal<VMAHandle> MapMemoryBlock(VAddr target, std::shared_ptr<PhysicalMemory> block,
std::size_t offset, u64 size, MemoryState state,
VMAPermission perm = VMAPermission::ReadWrite);
@@ -547,7 +548,7 @@ public:
* Scans all VMAs and updates the page table range of any that use the given vector as backing
* memory. This should be called after any operation that causes reallocation of the vector.
*/
void RefreshMemoryBlockMappings(const std::vector<u8>* block);
void RefreshMemoryBlockMappings(const PhysicalMemory* block);
/// Dumps the address space layout to the log, for debugging
void LogLayout() const;
@@ -777,7 +778,7 @@ private:
// the entire virtual address space extents that bound the allocations, including any holes.
// This makes deallocation and reallocation of holes fast and keeps process memory contiguous
// in the emulator address space, allowing Memory::GetPointer to be reasonably safe.
std::shared_ptr<std::vector<u8>> heap_memory;
std::shared_ptr<PhysicalMemory> heap_memory;
// The end of the currently allocated heap. This is not an inclusive
// end of the range. This is essentially 'base_address + current_size'.

View File

@@ -19,16 +19,16 @@
namespace Service::Audio {
void InstallInterfaces(SM::ServiceManager& service_manager) {
void InstallInterfaces(SM::ServiceManager& service_manager, Core::System& system) {
std::make_shared<AudCtl>()->InstallAsService(service_manager);
std::make_shared<AudOutA>()->InstallAsService(service_manager);
std::make_shared<AudOutU>()->InstallAsService(service_manager);
std::make_shared<AudOutU>(system)->InstallAsService(service_manager);
std::make_shared<AudInA>()->InstallAsService(service_manager);
std::make_shared<AudInU>()->InstallAsService(service_manager);
std::make_shared<AudRecA>()->InstallAsService(service_manager);
std::make_shared<AudRecU>()->InstallAsService(service_manager);
std::make_shared<AudRenA>()->InstallAsService(service_manager);
std::make_shared<AudRenU>()->InstallAsService(service_manager);
std::make_shared<AudRenU>(system)->InstallAsService(service_manager);
std::make_shared<CodecCtl>()->InstallAsService(service_manager);
std::make_shared<HwOpus>()->InstallAsService(service_manager);

View File

@@ -4,6 +4,10 @@
#pragma once
namespace Core {
class System;
}
namespace Service::SM {
class ServiceManager;
}
@@ -11,6 +15,6 @@ class ServiceManager;
namespace Service::Audio {
/// Registers all Audio services with the specified service manager.
void InstallInterfaces(SM::ServiceManager& service_manager);
void InstallInterfaces(SM::ServiceManager& service_manager, Core::System& system);
} // namespace Service::Audio

View File

@@ -40,8 +40,8 @@ enum class AudioState : u32 {
class IAudioOut final : public ServiceFramework<IAudioOut> {
public:
IAudioOut(AudoutParams audio_params, AudioCore::AudioOut& audio_core, std::string&& device_name,
std::string&& unique_name)
IAudioOut(Core::System& system, AudoutParams audio_params, AudioCore::AudioOut& audio_core,
std::string&& device_name, std::string&& unique_name)
: ServiceFramework("IAudioOut"), audio_core(audio_core),
device_name(std::move(device_name)), audio_params(audio_params) {
// clang-format off
@@ -65,7 +65,6 @@ public:
RegisterHandlers(functions);
// This is the event handle used to check if the audio buffer was released
auto& system = Core::System::GetInstance();
buffer_event = Kernel::WritableEvent::CreateEventPair(
system.Kernel(), Kernel::ResetType::Manual, "IAudioOutBufferReleased");
@@ -212,6 +211,22 @@ private:
Kernel::EventPair buffer_event;
};
AudOutU::AudOutU(Core::System& system_) : ServiceFramework("audout:u"), system{system_} {
// clang-format off
static const FunctionInfo functions[] = {
{0, &AudOutU::ListAudioOutsImpl, "ListAudioOuts"},
{1, &AudOutU::OpenAudioOutImpl, "OpenAudioOut"},
{2, &AudOutU::ListAudioOutsImpl, "ListAudioOutsAuto"},
{3, &AudOutU::OpenAudioOutImpl, "OpenAudioOutAuto"},
};
// clang-format on
RegisterHandlers(functions);
audio_core = std::make_unique<AudioCore::AudioOut>();
}
AudOutU::~AudOutU() = default;
void AudOutU::ListAudioOutsImpl(Kernel::HLERequestContext& ctx) {
LOG_DEBUG(Service_Audio, "called");
@@ -248,7 +263,7 @@ void AudOutU::OpenAudioOutImpl(Kernel::HLERequestContext& ctx) {
std::string unique_name{fmt::format("{}-{}", device_name, audio_out_interfaces.size())};
auto audio_out_interface = std::make_shared<IAudioOut>(
params, *audio_core, std::move(device_name), std::move(unique_name));
system, params, *audio_core, std::move(device_name), std::move(unique_name));
IPC::ResponseBuilder rb{ctx, 6, 0, 1};
rb.Push(RESULT_SUCCESS);
@@ -256,20 +271,9 @@ void AudOutU::OpenAudioOutImpl(Kernel::HLERequestContext& ctx) {
rb.Push<u32>(params.channel_count);
rb.Push<u32>(static_cast<u32>(AudioCore::Codec::PcmFormat::Int16));
rb.Push<u32>(static_cast<u32>(AudioState::Stopped));
rb.PushIpcInterface<Audio::IAudioOut>(audio_out_interface);
rb.PushIpcInterface<IAudioOut>(audio_out_interface);
audio_out_interfaces.push_back(std::move(audio_out_interface));
}
AudOutU::AudOutU() : ServiceFramework("audout:u") {
static const FunctionInfo functions[] = {{0, &AudOutU::ListAudioOutsImpl, "ListAudioOuts"},
{1, &AudOutU::OpenAudioOutImpl, "OpenAudioOut"},
{2, &AudOutU::ListAudioOutsImpl, "ListAudioOutsAuto"},
{3, &AudOutU::OpenAudioOutImpl, "OpenAudioOutAuto"}};
RegisterHandlers(functions);
audio_core = std::make_unique<AudioCore::AudioOut>();
}
AudOutU::~AudOutU() = default;
} // namespace Service::Audio

View File

@@ -11,6 +11,10 @@ namespace AudioCore {
class AudioOut;
}
namespace Core {
class System;
}
namespace Kernel {
class HLERequestContext;
}
@@ -21,15 +25,17 @@ class IAudioOut;
class AudOutU final : public ServiceFramework<AudOutU> {
public:
AudOutU();
explicit AudOutU(Core::System& system_);
~AudOutU() override;
private:
void ListAudioOutsImpl(Kernel::HLERequestContext& ctx);
void OpenAudioOutImpl(Kernel::HLERequestContext& ctx);
std::vector<std::shared_ptr<IAudioOut>> audio_out_interfaces;
std::unique_ptr<AudioCore::AudioOut> audio_core;
void ListAudioOutsImpl(Kernel::HLERequestContext& ctx);
void OpenAudioOutImpl(Kernel::HLERequestContext& ctx);
Core::System& system;
};
} // namespace Service::Audio

View File

@@ -5,6 +5,7 @@
#include <algorithm>
#include <array>
#include <memory>
#include <string_view>
#include "audio_core/audio_renderer.h"
#include "common/alignment.h"
@@ -25,7 +26,7 @@ namespace Service::Audio {
class IAudioRenderer final : public ServiceFramework<IAudioRenderer> {
public:
explicit IAudioRenderer(AudioCore::AudioRendererParameter audren_params,
explicit IAudioRenderer(Core::System& system, AudioCore::AudioRendererParameter audren_params,
const std::size_t instance_number)
: ServiceFramework("IAudioRenderer") {
// clang-format off
@@ -46,7 +47,6 @@ public:
// clang-format on
RegisterHandlers(functions);
auto& system = Core::System::GetInstance();
system_event = Kernel::WritableEvent::CreateEventPair(
system.Kernel(), Kernel::ResetType::Manual, "IAudioRenderer:SystemEvent");
renderer = std::make_unique<AudioCore::AudioRenderer>(
@@ -160,17 +160,18 @@ private:
class IAudioDevice final : public ServiceFramework<IAudioDevice> {
public:
IAudioDevice() : ServiceFramework("IAudioDevice") {
explicit IAudioDevice(Core::System& system, u32_le revision_num)
: ServiceFramework("IAudioDevice"), revision{revision_num} {
static const FunctionInfo functions[] = {
{0, &IAudioDevice::ListAudioDeviceName, "ListAudioDeviceName"},
{1, &IAudioDevice::SetAudioDeviceOutputVolume, "SetAudioDeviceOutputVolume"},
{2, nullptr, "GetAudioDeviceOutputVolume"},
{2, &IAudioDevice::GetAudioDeviceOutputVolume, "GetAudioDeviceOutputVolume"},
{3, &IAudioDevice::GetActiveAudioDeviceName, "GetActiveAudioDeviceName"},
{4, &IAudioDevice::QueryAudioDeviceSystemEvent, "QueryAudioDeviceSystemEvent"},
{5, &IAudioDevice::GetActiveChannelCount, "GetActiveChannelCount"},
{6, &IAudioDevice::ListAudioDeviceName, "ListAudioDeviceNameAuto"},
{7, &IAudioDevice::SetAudioDeviceOutputVolume, "SetAudioDeviceOutputVolumeAuto"},
{8, nullptr, "GetAudioDeviceOutputVolumeAuto"},
{8, &IAudioDevice::GetAudioDeviceOutputVolume, "GetAudioDeviceOutputVolumeAuto"},
{10, &IAudioDevice::GetActiveAudioDeviceName, "GetActiveAudioDeviceNameAuto"},
{11, nullptr, "QueryAudioDeviceInputEvent"},
{12, &IAudioDevice::QueryAudioDeviceOutputEvent, "QueryAudioDeviceOutputEvent"},
@@ -178,7 +179,7 @@ public:
};
RegisterHandlers(functions);
auto& kernel = Core::System::GetInstance().Kernel();
auto& kernel = system.Kernel();
buffer_event = Kernel::WritableEvent::CreateEventPair(kernel, Kernel::ResetType::Automatic,
"IAudioOutBufferReleasedEvent");
@@ -189,15 +190,47 @@ public:
}
private:
void ListAudioDeviceName(Kernel::HLERequestContext& ctx) {
LOG_WARNING(Service_Audio, "(STUBBED) called");
using AudioDeviceName = std::array<char, 256>;
static constexpr std::array<std::string_view, 4> audio_device_names{{
"AudioStereoJackOutput",
"AudioBuiltInSpeakerOutput",
"AudioTvOutput",
"AudioUsbDeviceOutput",
}};
enum class DeviceType {
AHUBHeadphones,
AHUBSpeakers,
HDA,
USBOutput,
};
constexpr std::array<char, 15> audio_interface{{"AudioInterface"}};
ctx.WriteBuffer(audio_interface);
void ListAudioDeviceName(Kernel::HLERequestContext& ctx) {
LOG_DEBUG(Service_Audio, "called");
const bool usb_output_supported =
IsFeatureSupported(AudioFeatures::AudioUSBDeviceOutput, revision);
const std::size_t count = ctx.GetWriteBufferSize() / sizeof(AudioDeviceName);
std::vector<AudioDeviceName> name_buffer;
name_buffer.reserve(audio_device_names.size());
for (std::size_t i = 0; i < count && i < audio_device_names.size(); i++) {
const auto type = static_cast<DeviceType>(i);
if (!usb_output_supported && type == DeviceType::USBOutput) {
continue;
}
const auto& device_name = audio_device_names[i];
auto& entry = name_buffer.emplace_back();
device_name.copy(entry.data(), device_name.size());
}
ctx.WriteBuffer(name_buffer);
IPC::ResponseBuilder rb{ctx, 3};
rb.Push(RESULT_SUCCESS);
rb.Push<u32>(1);
rb.Push(static_cast<u32>(name_buffer.size()));
}
void SetAudioDeviceOutputVolume(Kernel::HLERequestContext& ctx) {
@@ -213,15 +246,32 @@ private:
rb.Push(RESULT_SUCCESS);
}
void GetActiveAudioDeviceName(Kernel::HLERequestContext& ctx) {
LOG_WARNING(Service_Audio, "(STUBBED) called");
void GetAudioDeviceOutputVolume(Kernel::HLERequestContext& ctx) {
IPC::RequestParser rp{ctx};
constexpr std::array<char, 12> audio_interface{{"AudioDevice"}};
ctx.WriteBuffer(audio_interface);
const auto device_name_buffer = ctx.ReadBuffer();
const std::string name = Common::StringFromBuffer(device_name_buffer);
LOG_WARNING(Service_Audio, "(STUBBED) called. name={}", name);
IPC::ResponseBuilder rb{ctx, 3};
rb.Push(RESULT_SUCCESS);
rb.Push<u32>(1);
rb.Push(1.0f);
}
void GetActiveAudioDeviceName(Kernel::HLERequestContext& ctx) {
LOG_WARNING(Service_Audio, "(STUBBED) called");
// Currently set to always be TV audio output.
const auto& device_name = audio_device_names[2];
AudioDeviceName out_device_name{};
device_name.copy(out_device_name.data(), device_name.size());
ctx.WriteBuffer(out_device_name);
IPC::ResponseBuilder rb{ctx, 2};
rb.Push(RESULT_SUCCESS);
}
void QueryAudioDeviceSystemEvent(Kernel::HLERequestContext& ctx) {
@@ -250,12 +300,13 @@ private:
rb.PushCopyObjects(audio_output_device_switch_event.readable);
}
u32_le revision = 0;
Kernel::EventPair buffer_event;
Kernel::EventPair audio_output_device_switch_event;
}; // namespace Audio
AudRenU::AudRenU() : ServiceFramework("audren:u") {
AudRenU::AudRenU(Core::System& system_) : ServiceFramework("audren:u"), system{system_} {
// clang-format off
static const FunctionInfo functions[] = {
{0, &AudRenU::OpenAudioRenderer, "OpenAudioRenderer"},
@@ -328,7 +379,7 @@ void AudRenU::GetAudioRendererWorkBufferSize(Kernel::HLERequestContext& ctx) {
};
// Calculates the portion of the size related to the mix data (and the sorting thereof).
const auto calculate_mix_info_size = [this](const AudioCore::AudioRendererParameter& params) {
const auto calculate_mix_info_size = [](const AudioCore::AudioRendererParameter& params) {
// The size of the mixing info data structure.
constexpr u64 mix_info_size = 0x940;
@@ -400,7 +451,7 @@ void AudRenU::GetAudioRendererWorkBufferSize(Kernel::HLERequestContext& ctx) {
// Calculates the part of the size related to the splitter context.
const auto calculate_splitter_context_size =
[this](const AudioCore::AudioRendererParameter& params) -> u64 {
[](const AudioCore::AudioRendererParameter& params) -> u64 {
if (!IsFeatureSupported(AudioFeatures::Splitter, params.revision)) {
return 0;
}
@@ -447,7 +498,7 @@ void AudRenU::GetAudioRendererWorkBufferSize(Kernel::HLERequestContext& ctx) {
};
// Calculates the part of the size related to performance statistics.
const auto calculate_perf_size = [this](const AudioCore::AudioRendererParameter& params) {
const auto calculate_perf_size = [](const AudioCore::AudioRendererParameter& params) {
// Extra size value appended to the end of the calculation.
constexpr u64 appended = 128;
@@ -474,78 +525,76 @@ void AudRenU::GetAudioRendererWorkBufferSize(Kernel::HLERequestContext& ctx) {
};
// Calculates the part of the size that relates to the audio command buffer.
const auto calculate_command_buffer_size =
[this](const AudioCore::AudioRendererParameter& params) {
constexpr u64 alignment = (buffer_alignment_size - 1) * 2;
const auto calculate_command_buffer_size = [](const AudioCore::AudioRendererParameter& params) {
constexpr u64 alignment = (buffer_alignment_size - 1) * 2;
if (!IsFeatureSupported(AudioFeatures::VariadicCommandBuffer, params.revision)) {
constexpr u64 command_buffer_size = 0x18000;
if (!IsFeatureSupported(AudioFeatures::VariadicCommandBuffer, params.revision)) {
constexpr u64 command_buffer_size = 0x18000;
return command_buffer_size + alignment;
}
return command_buffer_size + alignment;
}
// When the variadic command buffer is supported, this means
// When the variadic command buffer is supported, this means
// the command generator for the audio renderer can issue commands
// that are (as one would expect), variable in size. So what we need to do
// is determine the maximum possible size for a few command data structures
// then multiply them by the amount of present commands indicated by the given
// respective audio parameters.
constexpr u64 max_biquad_filters = 2;
constexpr u64 max_mix_buffers = 24;
constexpr u64 biquad_filter_command_size = 0x2C;
constexpr u64 depop_mix_command_size = 0x24;
constexpr u64 depop_setup_command_size = 0x50;
constexpr u64 effect_command_max_size = 0x540;
constexpr u64 mix_command_size = 0x1C;
constexpr u64 mix_ramp_command_size = 0x24;
constexpr u64 mix_ramp_grouped_command_size = 0x13C;
constexpr u64 perf_command_size = 0x28;
constexpr u64 sink_command_size = 0x130;
constexpr u64 submix_command_max_size =
    depop_mix_command_size + (mix_command_size * max_mix_buffers) * max_mix_buffers;
constexpr u64 volume_command_size = 0x1C;
constexpr u64 volume_ramp_command_size = 0x20;
constexpr u64 voice_biquad_filter_command_size =
    biquad_filter_command_size * max_biquad_filters;
constexpr u64 voice_data_command_size = 0x9C;
const u64 voice_command_max_size =
    (params.splitter_count * depop_setup_command_size) +
    (voice_data_command_size + voice_biquad_filter_command_size + volume_ramp_command_size +
    mix_ramp_grouped_command_size);
// Now calculate the individual elements that comprise the size and add them together.
const u64 effect_commands_size = params.effect_count * effect_command_max_size;
const u64 final_mix_commands_size =
    depop_mix_command_size + volume_command_size * max_mix_buffers;
const u64 perf_commands_size =
    perf_command_size * (CalculateNumPerformanceEntries(params) + max_perf_detail_entries);
const u64 sink_commands_size = params.sink_count * sink_command_size;
const u64 splitter_commands_size =
    params.num_splitter_send_channels * max_mix_buffers * mix_ramp_command_size;
const u64 submix_commands_size = params.submix_count * submix_command_max_size;
const u64 voice_commands_size = params.voice_count * voice_command_max_size;
return effect_commands_size + final_mix_commands_size + perf_commands_size +
       sink_commands_size + splitter_commands_size + submix_commands_size +
       voice_commands_size + alignment;
};
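As a sanity check on the worst-case figures above, the submix bound can be re-derived in isolation. A minimal, hypothetical sketch; only the three constants are taken from the code, the rest is scaffolding:

```cpp
// Re-derivation of submix_command_max_size: one depop-mix command plus one
// mix command per (source, destination) mix-buffer pair, sized for the worst case.
#include <cstdint>

constexpr std::uint64_t max_mix_buffers = 24;
constexpr std::uint64_t mix_command_size = 0x1C;
constexpr std::uint64_t depop_mix_command_size = 0x24;

constexpr std::uint64_t submix_command_max_size =
    depop_mix_command_size + (mix_command_size * max_mix_buffers) * max_mix_buffers;

// 0x24 + (0x1C * 24) * 24 = 16164 bytes per submix in the worst case.
static_assert(submix_command_max_size == 16164);
```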
IPC::RequestParser rp{ctx};
const auto params = rp.PopRaw<AudioCore::AudioRendererParameter>();
@@ -578,12 +627,16 @@ void AudRenU::GetAudioRendererWorkBufferSize(Kernel::HLERequestContext& ctx) {
}
void AudRenU::GetAudioDeviceService(Kernel::HLERequestContext& ctx) {
LOG_DEBUG(Service_Audio, "called");
IPC::RequestParser rp{ctx};
const u64 aruid = rp.Pop<u64>();
LOG_DEBUG(Service_Audio, "called. aruid={:016X}", aruid);
// Revisionless variant of GetAudioDeviceServiceWithRevisionInfo that
// always assumes the initial release revision (REV1).
IPC::ResponseBuilder rb{ctx, 2, 0, 1};
rb.Push(RESULT_SUCCESS);
rb.PushIpcInterface<Audio::IAudioDevice>();
rb.PushIpcInterface<IAudioDevice>(system, Common::MakeMagic('R', 'E', 'V', '1'));
}
void AudRenU::OpenAudioRendererAuto(Kernel::HLERequestContext& ctx) {
@@ -593,13 +646,19 @@ void AudRenU::OpenAudioRendererAuto(Kernel::HLERequestContext& ctx) {
}
void AudRenU::GetAudioDeviceServiceWithRevisionInfo(Kernel::HLERequestContext& ctx) {
LOG_WARNING(Service_Audio, "(STUBBED) called");
struct Parameters {
u32 revision;
u64 aruid;
};
IPC::RequestParser rp{ctx};
const auto [revision, aruid] = rp.PopRaw<Parameters>();
LOG_DEBUG(Service_Audio, "called. revision={:08X}, aruid={:016X}", revision, aruid);
IPC::ResponseBuilder rb{ctx, 2, 0, 1};
rb.Push(RESULT_SUCCESS);
rb.PushIpcInterface<Audio::IAudioDevice>(); // TODO(ogniK): Figure out what is different
// based on the current revision
rb.PushIpcInterface<IAudioDevice>(system, revision);
}
void AudRenU::OpenAudioRendererImpl(Kernel::HLERequestContext& ctx) {
@@ -608,14 +667,16 @@ void AudRenU::OpenAudioRendererImpl(Kernel::HLERequestContext& ctx) {
IPC::ResponseBuilder rb{ctx, 2, 0, 1};
rb.Push(RESULT_SUCCESS);
rb.PushIpcInterface<IAudioRenderer>(params, audren_instance_count++);
rb.PushIpcInterface<IAudioRenderer>(system, params, audren_instance_count++);
}
bool AudRenU::IsFeatureSupported(AudioFeatures feature, u32_le revision) const {
bool IsFeatureSupported(AudioFeatures feature, u32_le revision) {
// Byte swap
const u32_be version_num = revision - Common::MakeMagic('R', 'E', 'V', '0');
switch (feature) {
case AudioFeatures::AudioUSBDeviceOutput:
return version_num >= 4U;
case AudioFeatures::Splitter:
return version_num >= 2U;
case AudioFeatures::PerformanceMetricsVersion2:
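The byte-swap trick above can be verified standalone. A minimal sketch, assuming MakeMagic packs its characters little-endian; the local MakeMagic and ByteSwap below are stand-ins for Common::MakeMagic and the u32_be conversion, not the real implementations:

```cpp
// 'REVn' - 'REV0' leaves the digit difference in the top byte; byte-swapping
// (which storing into a u32_be effectively does) brings it down to a small integer.
#include <cstdint>

constexpr std::uint32_t MakeMagic(char a, char b, char c, char d) {
    return static_cast<std::uint32_t>(a) | static_cast<std::uint32_t>(b) << 8 |
           static_cast<std::uint32_t>(c) << 16 | static_cast<std::uint32_t>(d) << 24;
}

constexpr std::uint32_t ByteSwap(std::uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0xFF00) | ((v << 8) & 0xFF0000) | (v << 24);
}

// REV5 is revision 5, which clears every threshold gated above.
static_assert(ByteSwap(MakeMagic('R', 'E', 'V', '5') - MakeMagic('R', 'E', 'V', '0')) == 5);
```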

View File

@@ -6,6 +6,10 @@
#include "core/hle/service/service.h"
namespace Core {
class System;
}
namespace Kernel {
class HLERequestContext;
}
@@ -14,7 +18,7 @@ namespace Service::Audio {
class AudRenU final : public ServiceFramework<AudRenU> {
public:
explicit AudRenU();
explicit AudRenU(Core::System& system_);
~AudRenU() override;
private:
@@ -26,14 +30,19 @@ private:
void OpenAudioRendererImpl(Kernel::HLERequestContext& ctx);
enum class AudioFeatures : u32 {
Splitter,
PerformanceMetricsVersion2,
VariadicCommandBuffer,
};
bool IsFeatureSupported(AudioFeatures feature, u32_le revision) const;
std::size_t audren_instance_count = 0;
Core::System& system;
};
// Describes a particular audio feature that may be supported in a particular revision.
enum class AudioFeatures : u32 {
AudioUSBDeviceOutput,
Splitter,
PerformanceMetricsVersion2,
VariadicCommandBuffer,
};
// Tests if a particular audio feature is supported with a given audio revision.
bool IsFeatureSupported(AudioFeatures feature, u32_le revision);
} // namespace Service::Audio

View File

@@ -77,7 +77,7 @@ enum class LoadState : u32 {
Done = 1,
};
static void DecryptSharedFont(const std::vector<u32>& input, std::vector<u8>& output,
static void DecryptSharedFont(const std::vector<u32>& input, Kernel::PhysicalMemory& output,
std::size_t& offset) {
ASSERT_MSG(offset + (input.size() * sizeof(u32)) < SHARED_FONT_MEM_SIZE,
"Shared fonts exceeds 17mb!");
@@ -94,7 +94,7 @@ static void DecryptSharedFont(const std::vector<u32>& input, std::vector<u8>& ou
offset += transformed_font.size() * sizeof(u32);
}
static void EncryptSharedFont(const std::vector<u8>& input, std::vector<u8>& output,
static void EncryptSharedFont(const std::vector<u8>& input, Kernel::PhysicalMemory& output,
std::size_t& offset) {
ASSERT_MSG(offset + input.size() + 8 < SHARED_FONT_MEM_SIZE, "Shared fonts exceeds 17mb!");
const u32 KEY = EXPECTED_MAGIC ^ EXPECTED_RESULT;
@@ -121,7 +121,7 @@ struct PL_U::Impl {
return shared_font_regions.at(index);
}
void BuildSharedFontsRawRegions(const std::vector<u8>& input) {
void BuildSharedFontsRawRegions(const Kernel::PhysicalMemory& input) {
// As we can derive the xor key we can just populate the offsets
// based on the shared memory dump
unsigned cur_offset = 0;
@@ -144,7 +144,7 @@ struct PL_U::Impl {
Kernel::SharedPtr<Kernel::SharedMemory> shared_font_mem;
/// Backing memory for the shared font data
std::shared_ptr<std::vector<u8>> shared_font;
std::shared_ptr<Kernel::PhysicalMemory> shared_font;
// Automatically populated based on shared_fonts dump or system archives.
std::vector<FontRegion> shared_font_regions;
@@ -166,7 +166,7 @@ PL_U::PL_U() : ServiceFramework("pl:u"), impl{std::make_unique<Impl>()} {
// Rebuild shared fonts from data ncas
if (nand->HasEntry(static_cast<u64>(FontArchives::Standard),
FileSys::ContentRecordType::Data)) {
impl->shared_font = std::make_shared<std::vector<u8>>(SHARED_FONT_MEM_SIZE);
impl->shared_font = std::make_shared<Kernel::PhysicalMemory>(SHARED_FONT_MEM_SIZE);
for (auto font : SHARED_FONTS) {
const auto nca =
nand->GetEntry(static_cast<u64>(font.first), FileSys::ContentRecordType::Data);
@@ -207,7 +207,7 @@ PL_U::PL_U() : ServiceFramework("pl:u"), impl{std::make_unique<Impl>()} {
}
} else {
impl->shared_font = std::make_shared<std::vector<u8>>(
impl->shared_font = std::make_shared<Kernel::PhysicalMemory>(
SHARED_FONT_MEM_SIZE); // Shared memory needs to always be allocated and a fixed size
const std::string user_path = FileUtil::GetUserPath(FileUtil::UserPath::SysDataDir);

View File

@@ -8,6 +8,11 @@
#include "common/bit_field.h"
#include "common/common_types.h"
#include "common/swap.h"
#include "core/hle/service/nvdrv/nvdata.h"
namespace Core {
class System;
}
namespace Service::Nvidia::Devices {
@@ -15,7 +20,7 @@ namespace Service::Nvidia::Devices {
/// implement the ioctl interface.
class nvdevice {
public:
nvdevice() = default;
explicit nvdevice(Core::System& system) : system{system} {};
virtual ~nvdevice() = default;
union Ioctl {
u32_le raw;
@@ -33,7 +38,11 @@ public:
* @param output A buffer where the output data will be written to.
* @returns The result code of the ioctl.
*/
virtual u32 ioctl(Ioctl command, const std::vector<u8>& input, std::vector<u8>& output) = 0;
virtual u32 ioctl(Ioctl command, const std::vector<u8>& input, std::vector<u8>& output,
IoctlCtrl& ctrl) = 0;
protected:
Core::System& system;
};
} // namespace Service::Nvidia::Devices

View File

@@ -13,10 +13,12 @@
namespace Service::Nvidia::Devices {
nvdisp_disp0::nvdisp_disp0(std::shared_ptr<nvmap> nvmap_dev) : nvmap_dev(std::move(nvmap_dev)) {}
nvdisp_disp0::nvdisp_disp0(Core::System& system, std::shared_ptr<nvmap> nvmap_dev)
: nvdevice(system), nvmap_dev(std::move(nvmap_dev)) {}
nvdisp_disp0::~nvdisp_disp0() = default;
u32 nvdisp_disp0::ioctl(Ioctl command, const std::vector<u8>& input, std::vector<u8>& output) {
u32 nvdisp_disp0::ioctl(Ioctl command, const std::vector<u8>& input, std::vector<u8>& output,
IoctlCtrl& ctrl) {
UNIMPLEMENTED_MSG("Unimplemented ioctl");
return 0;
}
@@ -34,9 +36,8 @@ void nvdisp_disp0::flip(u32 buffer_handle, u32 offset, u32 format, u32 width, u3
addr, offset, width, height, stride, static_cast<PixelFormat>(format),
transform, crop_rect};
auto& instance = Core::System::GetInstance();
instance.GetPerfStats().EndGameFrame();
instance.GPU().SwapBuffers(framebuffer);
system.GetPerfStats().EndGameFrame();
system.GPU().SwapBuffers(framebuffer);
}
} // namespace Service::Nvidia::Devices

View File

@@ -17,10 +17,11 @@ class nvmap;
class nvdisp_disp0 final : public nvdevice {
public:
explicit nvdisp_disp0(std::shared_ptr<nvmap> nvmap_dev);
explicit nvdisp_disp0(Core::System& system, std::shared_ptr<nvmap> nvmap_dev);
~nvdisp_disp0() override;
u32 ioctl(Ioctl command, const std::vector<u8>& input, std::vector<u8>& output) override;
u32 ioctl(Ioctl command, const std::vector<u8>& input, std::vector<u8>& output,
IoctlCtrl& ctrl) override;
/// Performs a screen flip, drawing the buffer pointed to by the handle.
void flip(u32 buffer_handle, u32 offset, u32 format, u32 width, u32 height, u32 stride,

View File

@@ -22,10 +22,12 @@ enum {
};
}
nvhost_as_gpu::nvhost_as_gpu(std::shared_ptr<nvmap> nvmap_dev) : nvmap_dev(std::move(nvmap_dev)) {}
nvhost_as_gpu::nvhost_as_gpu(Core::System& system, std::shared_ptr<nvmap> nvmap_dev)
: nvdevice(system), nvmap_dev(std::move(nvmap_dev)) {}
nvhost_as_gpu::~nvhost_as_gpu() = default;
u32 nvhost_as_gpu::ioctl(Ioctl command, const std::vector<u8>& input, std::vector<u8>& output) {
u32 nvhost_as_gpu::ioctl(Ioctl command, const std::vector<u8>& input, std::vector<u8>& output,
IoctlCtrl& ctrl) {
LOG_DEBUG(Service_NVDRV, "called, command=0x{:08X}, input_size=0x{:X}, output_size=0x{:X}",
command.raw, input.size(), output.size());
@@ -65,7 +67,7 @@ u32 nvhost_as_gpu::AllocateSpace(const std::vector<u8>& input, std::vector<u8>&
LOG_DEBUG(Service_NVDRV, "called, pages={:X}, page_size={:X}, flags={:X}", params.pages,
params.page_size, params.flags);
auto& gpu = Core::System::GetInstance().GPU();
auto& gpu = system.GPU();
const u64 size{static_cast<u64>(params.pages) * static_cast<u64>(params.page_size)};
if (params.flags & 1) {
params.offset = gpu.MemoryManager().AllocateSpace(params.offset, size, 1);
@@ -85,7 +87,7 @@ u32 nvhost_as_gpu::Remap(const std::vector<u8>& input, std::vector<u8>& output)
std::vector<IoctlRemapEntry> entries(num_entries);
std::memcpy(entries.data(), input.data(), input.size());
auto& gpu = Core::System::GetInstance().GPU();
auto& gpu = system.GPU();
for (const auto& entry : entries) {
LOG_WARNING(Service_NVDRV, "remap entry, offset=0x{:X} handle=0x{:X} pages=0x{:X}",
entry.offset, entry.nvmap_handle, entry.pages);
@@ -136,7 +138,7 @@ u32 nvhost_as_gpu::MapBufferEx(const std::vector<u8>& input, std::vector<u8>& ou
// case to prevent unexpected behavior.
ASSERT(object->id == params.nvmap_handle);
auto& gpu = Core::System::GetInstance().GPU();
auto& gpu = system.GPU();
if (params.flags & 1) {
params.offset = gpu.MemoryManager().MapBufferEx(object->addr, params.offset, object->size);
@@ -173,8 +175,7 @@ u32 nvhost_as_gpu::UnmapBuffer(const std::vector<u8>& input, std::vector<u8>& ou
return 0;
}
params.offset = Core::System::GetInstance().GPU().MemoryManager().UnmapBuffer(params.offset,
itr->second.size);
params.offset = system.GPU().MemoryManager().UnmapBuffer(params.offset, itr->second.size);
buffer_mappings.erase(itr->second.offset);
std::memcpy(output.data(), &params, output.size());

View File

@@ -17,10 +17,11 @@ class nvmap;
class nvhost_as_gpu final : public nvdevice {
public:
explicit nvhost_as_gpu(std::shared_ptr<nvmap> nvmap_dev);
explicit nvhost_as_gpu(Core::System& system, std::shared_ptr<nvmap> nvmap_dev);
~nvhost_as_gpu() override;
u32 ioctl(Ioctl command, const std::vector<u8>& input, std::vector<u8>& output) override;
u32 ioctl(Ioctl command, const std::vector<u8>& input, std::vector<u8>& output,
IoctlCtrl& ctrl) override;
private:
enum class IoctlCommand : u32_le {

View File

@@ -7,14 +7,20 @@
#include "common/assert.h"
#include "common/logging/log.h"
#include "core/core.h"
#include "core/hle/kernel/readable_event.h"
#include "core/hle/kernel/writable_event.h"
#include "core/hle/service/nvdrv/devices/nvhost_ctrl.h"
#include "video_core/gpu.h"
namespace Service::Nvidia::Devices {
nvhost_ctrl::nvhost_ctrl() = default;
nvhost_ctrl::nvhost_ctrl(Core::System& system, EventInterface& events_interface)
: nvdevice(system), events_interface{events_interface} {}
nvhost_ctrl::~nvhost_ctrl() = default;
u32 nvhost_ctrl::ioctl(Ioctl command, const std::vector<u8>& input, std::vector<u8>& output) {
u32 nvhost_ctrl::ioctl(Ioctl command, const std::vector<u8>& input, std::vector<u8>& output,
IoctlCtrl& ctrl) {
LOG_DEBUG(Service_NVDRV, "called, command=0x{:08X}, input_size=0x{:X}, output_size=0x{:X}",
command.raw, input.size(), output.size());
@@ -22,11 +28,15 @@ u32 nvhost_ctrl::ioctl(Ioctl command, const std::vector<u8>& input, std::vector<
case IoctlCommand::IocGetConfigCommand:
return NvOsGetConfigU32(input, output);
case IoctlCommand::IocCtrlEventWaitCommand:
return IocCtrlEventWait(input, output, false);
return IocCtrlEventWait(input, output, false, ctrl);
case IoctlCommand::IocCtrlEventWaitAsyncCommand:
return IocCtrlEventWait(input, output, true);
return IocCtrlEventWait(input, output, true, ctrl);
case IoctlCommand::IocCtrlEventRegisterCommand:
return IocCtrlEventRegister(input, output);
case IoctlCommand::IocCtrlEventUnregisterCommand:
return IocCtrlEventUnregister(input, output);
case IoctlCommand::IocCtrlEventSignalCommand:
return IocCtrlEventSignal(input, output);
}
UNIMPLEMENTED_MSG("Unimplemented ioctl");
return 0;
@@ -41,23 +51,137 @@ u32 nvhost_ctrl::NvOsGetConfigU32(const std::vector<u8>& input, std::vector<u8>&
}
u32 nvhost_ctrl::IocCtrlEventWait(const std::vector<u8>& input, std::vector<u8>& output,
bool is_async) {
bool is_async, IoctlCtrl& ctrl) {
IocCtrlEventWaitParams params{};
std::memcpy(&params, input.data(), sizeof(params));
LOG_WARNING(Service_NVDRV,
"(STUBBED) called, syncpt_id={}, threshold={}, timeout={}, is_async={}",
params.syncpt_id, params.threshold, params.timeout, is_async);
LOG_DEBUG(Service_NVDRV, "syncpt_id={}, threshold={}, timeout={}, is_async={}",
params.syncpt_id, params.threshold, params.timeout, is_async);
// TODO(Subv): Implement actual syncpt waiting.
params.value = 0;
if (params.syncpt_id >= MaxSyncPoints) {
return NvResult::BadParameter;
}
auto& gpu = system.GPU();
// This is mostly to take unimplemented features into account; a synced
// GPU is always synced.
if (!gpu.IsAsync()) {
return NvResult::Success;
}
auto lock = gpu.LockSync();
const u32 current_syncpoint_value = gpu.GetSyncpointValue(params.syncpt_id);
const s32 diff = current_syncpoint_value - params.threshold;
if (diff >= 0) {
params.value = current_syncpoint_value;
std::memcpy(output.data(), &params, sizeof(params));
return NvResult::Success;
}
const u32 target_value = current_syncpoint_value - diff;
if (!is_async) {
params.value = 0;
}
if (params.timeout == 0) {
std::memcpy(output.data(), &params, sizeof(params));
return NvResult::Timeout;
}
u32 event_id;
if (is_async) {
event_id = params.value & 0x00FF;
if (event_id >= MaxNvEvents) {
std::memcpy(output.data(), &params, sizeof(params));
return NvResult::BadParameter;
}
} else {
if (ctrl.fresh_call) {
const auto result = events_interface.GetFreeEvent();
if (result) {
event_id = *result;
} else {
LOG_CRITICAL(Service_NVDRV, "No Free Events available!");
event_id = params.value & 0x00FF;
}
} else {
event_id = ctrl.event_id;
}
}
EventState status = events_interface.status[event_id];
if (event_id < MaxNvEvents || status == EventState::Free || status == EventState::Registered) {
events_interface.SetEventStatus(event_id, EventState::Waiting);
events_interface.assigned_syncpt[event_id] = params.syncpt_id;
events_interface.assigned_value[event_id] = target_value;
if (is_async) {
params.value = params.syncpt_id << 4;
} else {
params.value = ((params.syncpt_id & 0xfff) << 16) | 0x10000000;
}
params.value |= event_id;
events_interface.events[event_id].writable->Clear();
gpu.RegisterSyncptInterrupt(params.syncpt_id, target_value);
if (!is_async && ctrl.fresh_call) {
ctrl.must_delay = true;
ctrl.timeout = params.timeout;
ctrl.event_id = event_id;
return NvResult::Timeout;
}
std::memcpy(output.data(), &params, sizeof(params));
return NvResult::Timeout;
}
std::memcpy(output.data(), &params, sizeof(params));
return 0;
return NvResult::BadParameter;
}
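The `diff >= 0` test above is a wraparound-safe syncpoint comparison. A standalone sketch of the same arithmetic (SyncpointReached is a hypothetical helper name):

```cpp
// The signed difference compares correctly even after the u32 counter wraps,
// as long as the two values are less than half the range apart.
#include <cstdint>

constexpr bool SyncpointReached(std::uint32_t current, std::uint32_t threshold) {
    return static_cast<std::int32_t>(current - threshold) >= 0;
}

static_assert(SyncpointReached(0x0000'0001, 0xFFFF'FFFF));  // current wrapped past threshold
static_assert(!SyncpointReached(0xFFFF'FFFF, 0x0000'0001)); // still behind
```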
u32 nvhost_ctrl::IocCtrlEventRegister(const std::vector<u8>& input, std::vector<u8>& output) {
LOG_WARNING(Service_NVDRV, "(STUBBED) called");
// TODO(bunnei): Implement this.
return 0;
IocCtrlEventRegisterParams params{};
std::memcpy(&params, input.data(), sizeof(params));
const u32 event_id = params.user_event_id & 0x00FF;
LOG_DEBUG(Service_NVDRV, " called, user_event_id: {:X}", event_id);
if (event_id >= MaxNvEvents) {
return NvResult::BadParameter;
}
if (events_interface.registered[event_id]) {
return NvResult::BadParameter;
}
events_interface.RegisterEvent(event_id);
return NvResult::Success;
}
u32 nvhost_ctrl::IocCtrlEventUnregister(const std::vector<u8>& input, std::vector<u8>& output) {
IocCtrlEventUnregisterParams params{};
std::memcpy(&params, input.data(), sizeof(params));
const u32 event_id = params.user_event_id & 0x00FF;
LOG_DEBUG(Service_NVDRV, " called, user_event_id: {:X}", event_id);
if (event_id >= MaxNvEvents) {
return NvResult::BadParameter;
}
if (!events_interface.registered[event_id]) {
return NvResult::BadParameter;
}
events_interface.UnregisterEvent(event_id);
return NvResult::Success;
}
u32 nvhost_ctrl::IocCtrlEventSignal(const std::vector<u8>& input, std::vector<u8>& output) {
IocCtrlEventSignalParams params{};
std::memcpy(&params, input.data(), sizeof(params));
// TODO(Blinkhawk): This is normally called when an NvEvent times out on WaitSynchronization.
// It is believed (from reverse engineering) to cancel the GPU event; however, further research is required.
u32 event_id = params.user_event_id & 0x00FF;
LOG_WARNING(Service_NVDRV, "(STUBBED) called, user_event_id: {:X}", event_id);
if (event_id >= MaxNvEvents) {
return NvResult::BadParameter;
}
if (events_interface.status[event_id] == EventState::Waiting) {
auto& gpu = system.GPU();
if (gpu.CancelSyncptInterrupt(events_interface.assigned_syncpt[event_id],
events_interface.assigned_value[event_id])) {
events_interface.LiberateEvent(event_id);
events_interface.events[event_id].writable->Signal();
}
}
return NvResult::Success;
}
} // namespace Service::Nvidia::Devices

View File

@@ -8,15 +8,17 @@
#include <vector>
#include "common/common_types.h"
#include "core/hle/service/nvdrv/devices/nvdevice.h"
#include "core/hle/service/nvdrv/nvdrv.h"
namespace Service::Nvidia::Devices {
class nvhost_ctrl final : public nvdevice {
public:
nvhost_ctrl();
explicit nvhost_ctrl(Core::System& system, EventInterface& events_interface);
~nvhost_ctrl() override;
u32 ioctl(Ioctl command, const std::vector<u8>& input, std::vector<u8>& output) override;
u32 ioctl(Ioctl command, const std::vector<u8>& input, std::vector<u8>& output,
IoctlCtrl& ctrl) override;
private:
enum class IoctlCommand : u32_le {
@@ -132,9 +134,16 @@ private:
u32 NvOsGetConfigU32(const std::vector<u8>& input, std::vector<u8>& output);
u32 IocCtrlEventWait(const std::vector<u8>& input, std::vector<u8>& output, bool is_async);
u32 IocCtrlEventWait(const std::vector<u8>& input, std::vector<u8>& output, bool is_async,
IoctlCtrl& ctrl);
u32 IocCtrlEventRegister(const std::vector<u8>& input, std::vector<u8>& output);
u32 IocCtrlEventUnregister(const std::vector<u8>& input, std::vector<u8>& output);
u32 IocCtrlEventSignal(const std::vector<u8>& input, std::vector<u8>& output);
EventInterface& events_interface;
};
} // namespace Service::Nvidia::Devices

View File

@@ -12,10 +12,11 @@
namespace Service::Nvidia::Devices {
nvhost_ctrl_gpu::nvhost_ctrl_gpu() = default;
nvhost_ctrl_gpu::nvhost_ctrl_gpu(Core::System& system) : nvdevice(system) {}
nvhost_ctrl_gpu::~nvhost_ctrl_gpu() = default;
u32 nvhost_ctrl_gpu::ioctl(Ioctl command, const std::vector<u8>& input, std::vector<u8>& output) {
u32 nvhost_ctrl_gpu::ioctl(Ioctl command, const std::vector<u8>& input, std::vector<u8>& output,
IoctlCtrl& ctrl) {
LOG_DEBUG(Service_NVDRV, "called, command=0x{:08X}, input_size=0x{:X}, output_size=0x{:X}",
command.raw, input.size(), output.size());
@@ -185,7 +186,7 @@ u32 nvhost_ctrl_gpu::GetGpuTime(const std::vector<u8>& input, std::vector<u8>& o
IoctlGetGpuTime params{};
std::memcpy(&params, input.data(), input.size());
const auto ns = Core::Timing::CyclesToNs(Core::System::GetInstance().CoreTiming().GetTicks());
const auto ns = Core::Timing::CyclesToNs(system.CoreTiming().GetTicks());
params.gpu_time = static_cast<u64_le>(ns.count());
std::memcpy(output.data(), &params, output.size());
return 0;

View File

@@ -13,10 +13,11 @@ namespace Service::Nvidia::Devices {
class nvhost_ctrl_gpu final : public nvdevice {
public:
nvhost_ctrl_gpu();
explicit nvhost_ctrl_gpu(Core::System& system);
~nvhost_ctrl_gpu() override;
u32 ioctl(Ioctl command, const std::vector<u8>& input, std::vector<u8>& output) override;
u32 ioctl(Ioctl command, const std::vector<u8>& input, std::vector<u8>& output,
IoctlCtrl& ctrl) override;
private:
enum class IoctlCommand : u32_le {

View File

@@ -13,10 +13,12 @@
namespace Service::Nvidia::Devices {
nvhost_gpu::nvhost_gpu(std::shared_ptr<nvmap> nvmap_dev) : nvmap_dev(std::move(nvmap_dev)) {}
nvhost_gpu::nvhost_gpu(Core::System& system, std::shared_ptr<nvmap> nvmap_dev)
: nvdevice(system), nvmap_dev(std::move(nvmap_dev)) {}
nvhost_gpu::~nvhost_gpu() = default;
u32 nvhost_gpu::ioctl(Ioctl command, const std::vector<u8>& input, std::vector<u8>& output) {
u32 nvhost_gpu::ioctl(Ioctl command, const std::vector<u8>& input, std::vector<u8>& output,
IoctlCtrl& ctrl) {
LOG_DEBUG(Service_NVDRV, "called, command=0x{:08X}, input_size=0x{:X}, output_size=0x{:X}",
command.raw, input.size(), output.size());
@@ -119,8 +121,10 @@ u32 nvhost_gpu::AllocGPFIFOEx2(const std::vector<u8>& input, std::vector<u8>& ou
params.num_entries, params.flags, params.unk0, params.unk1, params.unk2,
params.unk3);
params.fence_out.id = 0;
params.fence_out.value = 0;
auto& gpu = system.GPU();
params.fence_out.id = assigned_syncpoints;
params.fence_out.value = gpu.GetSyncpointValue(assigned_syncpoints);
assigned_syncpoints++;
std::memcpy(output.data(), &params, output.size());
return 0;
}
@@ -143,7 +147,7 @@ u32 nvhost_gpu::SubmitGPFIFO(const std::vector<u8>& input, std::vector<u8>& outp
IoctlSubmitGpfifo params{};
std::memcpy(&params, input.data(), sizeof(IoctlSubmitGpfifo));
LOG_WARNING(Service_NVDRV, "(STUBBED) called, gpfifo={:X}, num_entries={:X}, flags={:X}",
params.address, params.num_entries, params.flags);
params.address, params.num_entries, params.flags.raw);
ASSERT_MSG(input.size() == sizeof(IoctlSubmitGpfifo) +
params.num_entries * sizeof(Tegra::CommandListHeader),
@@ -153,10 +157,18 @@ u32 nvhost_gpu::SubmitGPFIFO(const std::vector<u8>& input, std::vector<u8>& outp
std::memcpy(entries.data(), &input[sizeof(IoctlSubmitGpfifo)],
params.num_entries * sizeof(Tegra::CommandListHeader));
Core::System::GetInstance().GPU().PushGPUEntries(std::move(entries));
UNIMPLEMENTED_IF(params.flags.add_wait.Value() != 0);
UNIMPLEMENTED_IF(params.flags.add_increment.Value() != 0);
auto& gpu = system.GPU();
u32 current_syncpoint_value = gpu.GetSyncpointValue(params.fence_out.id);
if (params.flags.increment.Value()) {
params.fence_out.value += current_syncpoint_value;
} else {
params.fence_out.value = current_syncpoint_value;
}
gpu.PushGPUEntries(std::move(entries));
params.fence_out.id = 0;
params.fence_out.value = 0;
std::memcpy(output.data(), &params, sizeof(IoctlSubmitGpfifo));
return 0;
}
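A small, hypothetical distillation of the fence_out policy introduced above (FenceOutValue is an illustrative helper, not driver code):

```cpp
// With the increment flag set, the caller's fence value acts as a delta on top
// of the current syncpoint value; otherwise the current value is returned as-is.
#include <cstdint>

constexpr std::uint32_t FenceOutValue(std::uint32_t requested, std::uint32_t current,
                                      bool increment) {
    return increment ? requested + current : current;
}

static_assert(FenceOutValue(2, 100, true) == 102);
static_assert(FenceOutValue(2, 100, false) == 100);
```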
@@ -168,16 +180,24 @@ u32 nvhost_gpu::KickoffPB(const std::vector<u8>& input, std::vector<u8>& output)
IoctlSubmitGpfifo params{};
std::memcpy(&params, input.data(), sizeof(IoctlSubmitGpfifo));
LOG_WARNING(Service_NVDRV, "(STUBBED) called, gpfifo={:X}, num_entries={:X}, flags={:X}",
params.address, params.num_entries, params.flags);
params.address, params.num_entries, params.flags.raw);
Tegra::CommandList entries(params.num_entries);
Memory::ReadBlock(params.address, entries.data(),
params.num_entries * sizeof(Tegra::CommandListHeader));
Core::System::GetInstance().GPU().PushGPUEntries(std::move(entries));
UNIMPLEMENTED_IF(params.flags.add_wait.Value() != 0);
UNIMPLEMENTED_IF(params.flags.add_increment.Value() != 0);
auto& gpu = system.GPU();
u32 current_syncpoint_value = gpu.GetSyncpointValue(params.fence_out.id);
if (params.flags.increment.Value()) {
params.fence_out.value += current_syncpoint_value;
} else {
params.fence_out.value = current_syncpoint_value;
}
gpu.PushGPUEntries(std::move(entries));
params.fence_out.id = 0;
params.fence_out.value = 0;
std::memcpy(output.data(), &params, output.size());
return 0;
}

View File

@@ -10,6 +10,7 @@
#include "common/common_types.h"
#include "common/swap.h"
#include "core/hle/service/nvdrv/devices/nvdevice.h"
#include "core/hle/service/nvdrv/nvdata.h"
namespace Service::Nvidia::Devices {
@@ -20,10 +21,11 @@ constexpr u32 NVGPU_IOCTL_CHANNEL_KICKOFF_PB(0x1b);
class nvhost_gpu final : public nvdevice {
public:
explicit nvhost_gpu(std::shared_ptr<nvmap> nvmap_dev);
explicit nvhost_gpu(Core::System& system, std::shared_ptr<nvmap> nvmap_dev);
~nvhost_gpu() override;
u32 ioctl(Ioctl command, const std::vector<u8>& input, std::vector<u8>& output) override;
u32 ioctl(Ioctl command, const std::vector<u8>& input, std::vector<u8>& output,
IoctlCtrl& ctrl) override;
private:
enum class IoctlCommand : u32_le {
@@ -113,11 +115,7 @@ private:
static_assert(sizeof(IoctlGetErrorNotification) == 16,
"IoctlGetErrorNotification is incorrect size");
struct IoctlFence {
u32_le id;
u32_le value;
};
static_assert(sizeof(IoctlFence) == 8, "IoctlFence is incorrect size");
static_assert(sizeof(Fence) == 8, "Fence is incorrect size");
struct IoctlAllocGpfifoEx {
u32_le num_entries;
@@ -132,13 +130,13 @@ private:
static_assert(sizeof(IoctlAllocGpfifoEx) == 32, "IoctlAllocGpfifoEx is incorrect size");
struct IoctlAllocGpfifoEx2 {
u32_le num_entries; // in
u32_le flags; // in
u32_le unk0; // in (1 works)
IoctlFence fence_out; // out
u32_le unk1; // in
u32_le unk2; // in
u32_le unk3; // in
u32_le num_entries; // in
u32_le flags; // in
u32_le unk0; // in (1 works)
Fence fence_out; // out
u32_le unk1; // in
u32_le unk2; // in
u32_le unk3; // in
};
static_assert(sizeof(IoctlAllocGpfifoEx2) == 32, "IoctlAllocGpfifoEx2 is incorrect size");
@@ -153,10 +151,16 @@ private:
struct IoctlSubmitGpfifo {
u64_le address; // pointer to gpfifo entry structs
u32_le num_entries; // number of fence objects being submitted
u32_le flags;
IoctlFence fence_out; // returned new fence object for others to wait on
union {
u32_le raw;
BitField<0, 1, u32_le> add_wait; // append a wait sync_point to the list
BitField<1, 1, u32_le> add_increment; // append an increment to the list
BitField<2, 1, u32_le> new_hw_format; // Mostly ignored
BitField<8, 1, u32_le> increment; // increment the returned fence
} flags;
Fence fence_out; // returned new fence object for others to wait on
};
static_assert(sizeof(IoctlSubmitGpfifo) == 16 + sizeof(IoctlFence),
static_assert(sizeof(IoctlSubmitGpfifo) == 16 + sizeof(Fence),
"IoctlSubmitGpfifo is incorrect size");
struct IoctlGetWaitbase {
@@ -184,6 +188,7 @@ private:
u32 ChannelSetTimeout(const std::vector<u8>& input, std::vector<u8>& output);
std::shared_ptr<nvmap> nvmap_dev;
u32 assigned_syncpoints{};
};
} // namespace Service::Nvidia::Devices

View File

@@ -10,10 +10,11 @@
namespace Service::Nvidia::Devices {
nvhost_nvdec::nvhost_nvdec() = default;
nvhost_nvdec::nvhost_nvdec(Core::System& system) : nvdevice(system) {}
nvhost_nvdec::~nvhost_nvdec() = default;
u32 nvhost_nvdec::ioctl(Ioctl command, const std::vector<u8>& input, std::vector<u8>& output) {
u32 nvhost_nvdec::ioctl(Ioctl command, const std::vector<u8>& input, std::vector<u8>& output,
IoctlCtrl& ctrl) {
LOG_DEBUG(Service_NVDRV, "called, command=0x{:08X}, input_size=0x{:X}, output_size=0x{:X}",
command.raw, input.size(), output.size());

View File

@@ -13,10 +13,11 @@ namespace Service::Nvidia::Devices {
class nvhost_nvdec final : public nvdevice {
public:
nvhost_nvdec();
explicit nvhost_nvdec(Core::System& system);
~nvhost_nvdec() override;
u32 ioctl(Ioctl command, const std::vector<u8>& input, std::vector<u8>& output) override;
u32 ioctl(Ioctl command, const std::vector<u8>& input, std::vector<u8>& output,
IoctlCtrl& ctrl) override;
private:
enum class IoctlCommand : u32_le {

View File

@@ -10,10 +10,11 @@
namespace Service::Nvidia::Devices {
nvhost_nvjpg::nvhost_nvjpg() = default;
nvhost_nvjpg::nvhost_nvjpg(Core::System& system) : nvdevice(system) {}
nvhost_nvjpg::~nvhost_nvjpg() = default;
u32 nvhost_nvjpg::ioctl(Ioctl command, const std::vector<u8>& input, std::vector<u8>& output) {
u32 nvhost_nvjpg::ioctl(Ioctl command, const std::vector<u8>& input, std::vector<u8>& output,
IoctlCtrl& ctrl) {
LOG_DEBUG(Service_NVDRV, "called, command=0x{:08X}, input_size=0x{:X}, output_size=0x{:X}",
command.raw, input.size(), output.size());

View File

@@ -13,10 +13,11 @@ namespace Service::Nvidia::Devices {
class nvhost_nvjpg final : public nvdevice {
public:
nvhost_nvjpg();
explicit nvhost_nvjpg(Core::System& system);
~nvhost_nvjpg() override;
u32 ioctl(Ioctl command, const std::vector<u8>& input, std::vector<u8>& output) override;
u32 ioctl(Ioctl command, const std::vector<u8>& input, std::vector<u8>& output,
IoctlCtrl& ctrl) override;
private:
enum class IoctlCommand : u32_le {

View File

@@ -10,10 +10,11 @@
namespace Service::Nvidia::Devices {
nvhost_vic::nvhost_vic() = default;
nvhost_vic::nvhost_vic(Core::System& system) : nvdevice(system) {}
nvhost_vic::~nvhost_vic() = default;
u32 nvhost_vic::ioctl(Ioctl command, const std::vector<u8>& input, std::vector<u8>& output) {
u32 nvhost_vic::ioctl(Ioctl command, const std::vector<u8>& input, std::vector<u8>& output,
IoctlCtrl& ctrl) {
LOG_DEBUG(Service_NVDRV, "called, command=0x{:08X}, input_size=0x{:X}, output_size=0x{:X}",
command.raw, input.size(), output.size());

View File

@@ -13,10 +13,11 @@ namespace Service::Nvidia::Devices {
class nvhost_vic final : public nvdevice {
public:
nvhost_vic();
explicit nvhost_vic(Core::System& system);
~nvhost_vic() override;
u32 ioctl(Ioctl command, const std::vector<u8>& input, std::vector<u8>& output) override;
u32 ioctl(Ioctl command, const std::vector<u8>& input, std::vector<u8>& output,
IoctlCtrl& ctrl) override;
private:
enum class IoctlCommand : u32_le {

View File

@@ -18,7 +18,7 @@ enum {
};
}
nvmap::nvmap() = default;
nvmap::nvmap(Core::System& system) : nvdevice(system) {}
nvmap::~nvmap() = default;
VAddr nvmap::GetObjectAddress(u32 handle) const {
@@ -28,7 +28,8 @@ VAddr nvmap::GetObjectAddress(u32 handle) const {
return object->addr;
}
u32 nvmap::ioctl(Ioctl command, const std::vector<u8>& input, std::vector<u8>& output) {
u32 nvmap::ioctl(Ioctl command, const std::vector<u8>& input, std::vector<u8>& output,
IoctlCtrl& ctrl) {
switch (static_cast<IoctlCommand>(command.raw)) {
case IoctlCommand::Create:
return IocCreate(input, output);

View File

@@ -16,13 +16,14 @@ namespace Service::Nvidia::Devices {
class nvmap final : public nvdevice {
public:
nvmap();
explicit nvmap(Core::System& system);
~nvmap() override;
/// Returns the allocated address of an nvmap object given its handle.
VAddr GetObjectAddress(u32 handle) const;
u32 ioctl(Ioctl command, const std::vector<u8>& input, std::vector<u8>& output) override;
u32 ioctl(Ioctl command, const std::vector<u8>& input, std::vector<u8>& output,
IoctlCtrl& ctrl) override;
/// Represents an nvmap object.
struct Object {

View File

@@ -8,12 +8,18 @@
#include "core/hle/ipc_helpers.h"
#include "core/hle/kernel/kernel.h"
#include "core/hle/kernel/readable_event.h"
#include "core/hle/kernel/thread.h"
#include "core/hle/kernel/writable_event.h"
#include "core/hle/service/nvdrv/interface.h"
#include "core/hle/service/nvdrv/nvdata.h"
#include "core/hle/service/nvdrv/nvdrv.h"
namespace Service::Nvidia {
void NVDRV::SignalGPUInterruptSyncpt(const u32 syncpoint_id, const u32 value) {
nvdrv->SignalSyncpt(syncpoint_id, value);
}
void NVDRV::Open(Kernel::HLERequestContext& ctx) {
LOG_DEBUG(Service_NVDRV, "called");
@@ -36,11 +42,31 @@ void NVDRV::Ioctl(Kernel::HLERequestContext& ctx) {
std::vector<u8> output(ctx.GetWriteBufferSize());
IoctlCtrl ctrl{};
u32 result = nvdrv->Ioctl(fd, command, ctx.ReadBuffer(), output, ctrl);
if (ctrl.must_delay) {
ctrl.fresh_call = false;
ctx.SleepClientThread(
"NVServices::DelayedResponse", ctrl.timeout,
[=](Kernel::SharedPtr<Kernel::Thread> thread, Kernel::HLERequestContext& ctx,
Kernel::ThreadWakeupReason reason) {
IoctlCtrl ctrl2{ctrl};
std::vector<u8> output2 = output;
u32 result = nvdrv->Ioctl(fd, command, ctx.ReadBuffer(), output2, ctrl2);
ctx.WriteBuffer(output2);
IPC::ResponseBuilder rb{ctx, 3};
rb.Push(RESULT_SUCCESS);
rb.Push(result);
},
nvdrv->GetEventWriteable(ctrl.event_id));
} else {
ctx.WriteBuffer(output);
}
IPC::ResponseBuilder rb{ctx, 3};
rb.Push(RESULT_SUCCESS);
rb.Push(nvdrv->Ioctl(fd, command, ctx.ReadBuffer(), output));
ctx.WriteBuffer(output);
rb.Push(result);
}
void NVDRV::Close(Kernel::HLERequestContext& ctx) {
@@ -66,13 +92,19 @@ void NVDRV::Initialize(Kernel::HLERequestContext& ctx) {
void NVDRV::QueryEvent(Kernel::HLERequestContext& ctx) {
IPC::RequestParser rp{ctx};
u32 fd = rp.Pop<u32>();
u32 event_id = rp.Pop<u32>();
// TODO(Blinkhawk): Figure out the meaning of the flag at bit 16
u32 event_id = rp.Pop<u32>() & 0x000000FF;
LOG_WARNING(Service_NVDRV, "(STUBBED) called, fd={:X}, event_id={:X}", fd, event_id);
IPC::ResponseBuilder rb{ctx, 3, 1};
rb.Push(RESULT_SUCCESS);
rb.PushCopyObjects(query_event.readable);
rb.Push<u32>(0);
if (event_id < MaxNvEvents) {
rb.PushCopyObjects(nvdrv->GetEvent(event_id));
rb.Push<u32>(NvResult::Success);
} else {
rb.Push<u32>(0);
rb.Push<u32>(NvResult::BadParameter);
}
}
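The masking in QueryEvent can be shown in isolation; EventIndex below is a hypothetical helper mirroring the `& 0x000000FF` above:

```cpp
// Only the low byte selects the event slot; the upper bits (including the
// still-unexplained flag at bit 16) are ignored for indexing.
#include <cstdint>

constexpr std::uint32_t EventIndex(std::uint32_t raw) {
    return raw & 0x000000FF;
}

static_assert(EventIndex(0x0001'0005) == 5); // bit 16 set, same slot
```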
void NVDRV::SetClientPID(Kernel::HLERequestContext& ctx) {
@@ -127,10 +159,6 @@ NVDRV::NVDRV(std::shared_ptr<Module> nvdrv, const char* name)
{13, &NVDRV::FinishInitialize, "FinishInitialize"},
};
RegisterHandlers(functions);
auto& kernel = Core::System::GetInstance().Kernel();
query_event = Kernel::WritableEvent::CreateEventPair(kernel, Kernel::ResetType::Automatic,
"NVDRV::query_event");
}
NVDRV::~NVDRV() = default;

View File

@@ -19,6 +19,8 @@ public:
NVDRV(std::shared_ptr<Module> nvdrv, const char* name);
~NVDRV() override;
void SignalGPUInterruptSyncpt(const u32 syncpoint_id, const u32 value);
private:
void Open(Kernel::HLERequestContext& ctx);
void Ioctl(Kernel::HLERequestContext& ctx);
@@ -33,8 +35,6 @@ private:
std::shared_ptr<Module> nvdrv;
u64 pid{};
Kernel::EventPair query_event;
};
} // namespace Service::Nvidia

View File

@@ -0,0 +1,48 @@
#pragma once
#include <array>
#include "common/common_types.h"
namespace Service::Nvidia {
constexpr u32 MaxSyncPoints = 192;
constexpr u32 MaxNvEvents = 64;
struct Fence {
s32 id;
u32 value;
};
static_assert(sizeof(Fence) == 8, "Fence has wrong size");
struct MultiFence {
u32 num_fences;
std::array<Fence, 4> fences;
};
enum NvResult : u32 {
Success = 0,
BadParameter = 4,
Timeout = 5,
ResourceError = 15,
};
enum class EventState {
Free = 0,
Registered = 1,
Waiting = 2,
Busy = 3,
};
struct IoctlCtrl {
// First call made to the service, for services that call themselves again after a call.
bool fresh_call{true};
// Tells the Ioctl Wrapper that it must delay the IPC response and send the thread to sleep
bool must_delay{};
// Timeout for the delay
s64 timeout{};
// NV Event Id
s32 event_id{-1};
};
} // namespace Service::Nvidia
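To make the IoctlCtrl fields concrete, here is a hypothetical, self-contained sketch of the two-phase protocol they enable. DummyIoctl stands in for a device ioctl that needs one delayed retry; the real wrapper sleeps the guest thread between the two calls:

```cpp
#include <cstdint>
#include <cstdio>

struct IoctlCtrl {
    bool fresh_call{true}; // first pass through the wrapper
    bool must_delay{};     // device asks the wrapper to delay the IPC response
    std::int64_t timeout{};
    std::int32_t event_id{-1};
};

std::uint32_t DummyIoctl(IoctlCtrl& ctrl) {
    if (ctrl.fresh_call) {
        ctrl.must_delay = true; // request a delayed retry
        return 5;               // NvResult::Timeout
    }
    return 0; // NvResult::Success on the second pass
}

int main() {
    IoctlCtrl ctrl{};
    std::uint32_t result = DummyIoctl(ctrl);
    if (ctrl.must_delay) {
        ctrl.fresh_call = false; // the real wrapper sleeps the thread here
        result = DummyIoctl(ctrl);
    }
    std::printf("result=%u\n", result); // prints result=0
}
```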

View File

@@ -4,7 +4,10 @@
#include <utility>
#include <fmt/format.h>
#include "core/hle/ipc_helpers.h"
#include "core/hle/kernel/readable_event.h"
#include "core/hle/kernel/writable_event.h"
#include "core/hle/service/nvdrv/devices/nvdevice.h"
#include "core/hle/service/nvdrv/devices/nvdisp_disp0.h"
#include "core/hle/service/nvdrv/devices/nvhost_as_gpu.h"
@@ -22,8 +25,9 @@
namespace Service::Nvidia {
void InstallInterfaces(SM::ServiceManager& service_manager, NVFlinger::NVFlinger& nvflinger) {
auto module_ = std::make_shared<Module>();
void InstallInterfaces(SM::ServiceManager& service_manager, NVFlinger::NVFlinger& nvflinger,
Core::System& system) {
auto module_ = std::make_shared<Module>(system);
std::make_shared<NVDRV>(module_, "nvdrv")->InstallAsService(service_manager);
std::make_shared<NVDRV>(module_, "nvdrv:a")->InstallAsService(service_manager);
std::make_shared<NVDRV>(module_, "nvdrv:s")->InstallAsService(service_manager);
@@ -32,17 +36,25 @@ void InstallInterfaces(SM::ServiceManager& service_manager, NVFlinger::NVFlinger
nvflinger.SetNVDrvInstance(module_);
}
Module::Module() {
auto nvmap_dev = std::make_shared<Devices::nvmap>();
devices["/dev/nvhost-as-gpu"] = std::make_shared<Devices::nvhost_as_gpu>(nvmap_dev);
devices["/dev/nvhost-gpu"] = std::make_shared<Devices::nvhost_gpu>(nvmap_dev);
devices["/dev/nvhost-ctrl-gpu"] = std::make_shared<Devices::nvhost_ctrl_gpu>();
Module::Module(Core::System& system) {
auto& kernel = system.Kernel();
for (u32 i = 0; i < MaxNvEvents; i++) {
std::string event_label = fmt::format("NVDRV::NvEvent_{}", i);
events_interface.events[i] = Kernel::WritableEvent::CreateEventPair(
kernel, Kernel::ResetType::Automatic, event_label);
events_interface.status[i] = EventState::Free;
events_interface.registered[i] = false;
}
auto nvmap_dev = std::make_shared<Devices::nvmap>(system);
devices["/dev/nvhost-as-gpu"] = std::make_shared<Devices::nvhost_as_gpu>(system, nvmap_dev);
devices["/dev/nvhost-gpu"] = std::make_shared<Devices::nvhost_gpu>(system, nvmap_dev);
devices["/dev/nvhost-ctrl-gpu"] = std::make_shared<Devices::nvhost_ctrl_gpu>(system);
devices["/dev/nvmap"] = nvmap_dev;
devices["/dev/nvdisp_disp0"] = std::make_shared<Devices::nvdisp_disp0>(nvmap_dev);
devices["/dev/nvhost-ctrl"] = std::make_shared<Devices::nvhost_ctrl>();
devices["/dev/nvhost-nvdec"] = std::make_shared<Devices::nvhost_nvdec>();
devices["/dev/nvhost-nvjpg"] = std::make_shared<Devices::nvhost_nvjpg>();
devices["/dev/nvhost-vic"] = std::make_shared<Devices::nvhost_vic>();
devices["/dev/nvdisp_disp0"] = std::make_shared<Devices::nvdisp_disp0>(system, nvmap_dev);
devices["/dev/nvhost-ctrl"] = std::make_shared<Devices::nvhost_ctrl>(system, events_interface);
devices["/dev/nvhost-nvdec"] = std::make_shared<Devices::nvhost_nvdec>(system);
devices["/dev/nvhost-nvjpg"] = std::make_shared<Devices::nvhost_nvjpg>(system);
devices["/dev/nvhost-vic"] = std::make_shared<Devices::nvhost_vic>(system);
}
Module::~Module() = default;
@@ -59,12 +71,13 @@ u32 Module::Open(const std::string& device_name) {
return fd;
}
u32 Module::Ioctl(u32 fd, u32 command, const std::vector<u8>& input, std::vector<u8>& output) {
u32 Module::Ioctl(u32 fd, u32 command, const std::vector<u8>& input, std::vector<u8>& output,
IoctlCtrl& ctrl) {
auto itr = open_files.find(fd);
ASSERT_MSG(itr != open_files.end(), "Tried to talk to an invalid device");
auto& device = itr->second;
return device->ioctl({command}, input, output);
return device->ioctl({command}, input, output, ctrl);
}
ResultCode Module::Close(u32 fd) {
@@ -77,4 +90,22 @@ ResultCode Module::Close(u32 fd) {
return RESULT_SUCCESS;
}
void Module::SignalSyncpt(const u32 syncpoint_id, const u32 value) {
for (u32 i = 0; i < MaxNvEvents; i++) {
if (events_interface.assigned_syncpt[i] == syncpoint_id &&
events_interface.assigned_value[i] == value) {
events_interface.LiberateEvent(i);
events_interface.events[i].writable->Signal();
}
}
}
Kernel::SharedPtr<Kernel::ReadableEvent> Module::GetEvent(const u32 event_id) const {
return events_interface.events[event_id].readable;
}
Kernel::SharedPtr<Kernel::WritableEvent> Module::GetEventWriteable(const u32 event_id) const {
return events_interface.events[event_id].writable;
}
} // namespace Service::Nvidia

View File

@@ -8,8 +8,14 @@
#include <unordered_map>
#include <vector>
#include "common/common_types.h"
#include "core/hle/kernel/writable_event.h"
#include "core/hle/service/nvdrv/nvdata.h"
#include "core/hle/service/service.h"
namespace Core {
class System;
}
namespace Service::NVFlinger {
class NVFlinger;
}
@@ -20,16 +26,72 @@ namespace Devices {
class nvdevice;
}
struct IoctlFence {
u32 id;
u32 value;
struct EventInterface {
// Mask representing currently busy events
u64 events_mask{};
// Each kernel event associated with an NV event
std::array<Kernel::EventPair, MaxNvEvents> events;
// The status of the current NVEvent
std::array<EventState, MaxNvEvents> status{};
// Tells if an NVEvent is registered or not
std::array<bool, MaxNvEvents> registered{};
// When an NVEvent is waiting on a GPU interrupt, this is the sync_point
// associated with it.
std::array<u32, MaxNvEvents> assigned_syncpt{};
// This is the value of the GPU interrupt for which the NVEvent is
// waiting.
std::array<u32, MaxNvEvents> assigned_value{};
// Constant to denote an unassigned syncpoint.
static constexpr u32 unassigned_syncpt = 0xFFFFFFFF;
std::optional<u32> GetFreeEvent() const {
u64 mask = events_mask;
for (u32 i = 0; i < MaxNvEvents; i++) {
const bool is_free = (mask & 0x1) == 0;
if (is_free) {
if (status[i] == EventState::Registered || status[i] == EventState::Free) {
return {i};
}
}
mask = mask >> 1;
}
return {};
}
void SetEventStatus(const u32 event_id, EventState new_status) {
EventState old_status = status[event_id];
if (old_status == new_status) {
return;
}
status[event_id] = new_status;
if (new_status == EventState::Registered) {
registered[event_id] = true;
}
if (new_status == EventState::Waiting || new_status == EventState::Busy) {
events_mask |= (1ULL << event_id);
}
}
void RegisterEvent(const u32 event_id) {
registered[event_id] = true;
if (status[event_id] == EventState::Free) {
status[event_id] = EventState::Registered;
}
}
void UnregisterEvent(const u32 event_id) {
registered[event_id] = false;
if (status[event_id] == EventState::Registered) {
status[event_id] = EventState::Free;
}
}
void LiberateEvent(const u32 event_id) {
status[event_id] = registered[event_id] ? EventState::Registered : EventState::Free;
events_mask &= ~(1ULL << event_id);
assigned_syncpt[event_id] = unassigned_syncpt;
assigned_value[event_id] = 0;
}
};
static_assert(sizeof(IoctlFence) == 8, "IoctlFence has wrong size");
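The free-slot scan in GetFreeEvent walks the busy mask bit by bit. A standalone re-derivation of just the mask part (the real scan additionally requires the slot's status to be Free or Registered):

```cpp
// Return the index of the first clear bit in a 64-bit busy mask, if any.
#include <cstdint>
#include <optional>

std::optional<std::uint32_t> FirstFreeSlot(std::uint64_t busy_mask) {
    for (std::uint32_t i = 0; i < 64; ++i) {
        if (((busy_mask >> i) & 1ULL) == 0) {
            return i;
        }
    }
    return std::nullopt;
}
```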
class Module final {
public:
Module();
Module(Core::System& system);
~Module();
/// Returns a pointer to one of the available devices, identified by its name.
@@ -44,10 +106,17 @@ public:
/// Opens a device node and returns a file descriptor to it.
u32 Open(const std::string& device_name);
/// Sends an ioctl command to the specified file descriptor.
u32 Ioctl(u32 fd, u32 command, const std::vector<u8>& input, std::vector<u8>& output);
u32 Ioctl(u32 fd, u32 command, const std::vector<u8>& input, std::vector<u8>& output,
IoctlCtrl& ctrl);
/// Closes a device file descriptor and returns operation success.
ResultCode Close(u32 fd);
void SignalSyncpt(const u32 syncpoint_id, const u32 value);
Kernel::SharedPtr<Kernel::ReadableEvent> GetEvent(u32 event_id) const;
Kernel::SharedPtr<Kernel::WritableEvent> GetEventWriteable(u32 event_id) const;
private:
/// Id to use for the next open file descriptor.
u32 next_fd = 1;
@@ -57,9 +126,12 @@ private:
/// Mapping of device node names to their implementation.
std::unordered_map<std::string, std::shared_ptr<Devices::nvdevice>> devices;
EventInterface events_interface;
};
/// Registers all NVDRV services with the specified service manager.
void InstallInterfaces(SM::ServiceManager& service_manager, NVFlinger::NVFlinger& nvflinger);
void InstallInterfaces(SM::ServiceManager& service_manager, NVFlinger::NVFlinger& nvflinger,
Core::System& system);
} // namespace Service::Nvidia

View File

@@ -34,7 +34,8 @@ void BufferQueue::SetPreallocatedBuffer(u32 slot, const IGBPBuffer& igbp_buffer)
buffer_wait_event.writable->Signal();
}
std::optional<u32> BufferQueue::DequeueBuffer(u32 width, u32 height) {
std::optional<std::pair<u32, Service::Nvidia::MultiFence*>> BufferQueue::DequeueBuffer(u32 width,
u32 height) {
auto itr = std::find_if(queue.begin(), queue.end(), [&](const Buffer& buffer) {
// Only consider free buffers. Buffers become free once again after they've been Acquired
// and Released by the compositor, see the NVFlinger::Compose method.
@@ -51,7 +52,7 @@ std::optional<u32> BufferQueue::DequeueBuffer(u32 width, u32 height) {
}
itr->status = Buffer::Status::Dequeued;
return itr->slot;
return {{itr->slot, &itr->multi_fence}};
}
const IGBPBuffer& BufferQueue::RequestBuffer(u32 slot) const {
@@ -63,7 +64,8 @@ const IGBPBuffer& BufferQueue::RequestBuffer(u32 slot) const {
}
void BufferQueue::QueueBuffer(u32 slot, BufferTransformFlags transform,
const Common::Rectangle<int>& crop_rect) {
const Common::Rectangle<int>& crop_rect, u32 swap_interval,
Service::Nvidia::MultiFence& multi_fence) {
auto itr = std::find_if(queue.begin(), queue.end(),
[&](const Buffer& buffer) { return buffer.slot == slot; });
ASSERT(itr != queue.end());
@@ -71,12 +73,21 @@ void BufferQueue::QueueBuffer(u32 slot, BufferTransformFlags transform,
itr->status = Buffer::Status::Queued;
itr->transform = transform;
itr->crop_rect = crop_rect;
itr->swap_interval = swap_interval;
itr->multi_fence = multi_fence;
queue_sequence.push_back(slot);
}
std::optional<std::reference_wrapper<const BufferQueue::Buffer>> BufferQueue::AcquireBuffer() {
auto itr = std::find_if(queue.begin(), queue.end(), [](const Buffer& buffer) {
return buffer.status == Buffer::Status::Queued;
});
auto itr = queue.end();
// Iterate to find a queued buffer matching the requested slot.
while (itr == queue.end() && !queue_sequence.empty()) {
u32 slot = queue_sequence.front();
itr = std::find_if(queue.begin(), queue.end(), [&slot](const Buffer& buffer) {
return buffer.status == Buffer::Status::Queued && buffer.slot == slot;
});
queue_sequence.pop_front();
}
if (itr == queue.end())
return {};
itr->status = Buffer::Status::Acquired;
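The new AcquireBuffer honors submission order instead of taking the first queued buffer it finds. A minimal sketch of the same FIFO walk, assuming a simplified queued-state map in place of the Buffer vector:

```cpp
// Pop slots from the submission sequence until one is still queued; buffers
// that were freed in the meantime are simply skipped.
#include <list>
#include <optional>
#include <unordered_map>

std::optional<unsigned> AcquireInOrder(std::list<unsigned>& sequence,
                                       std::unordered_map<unsigned, bool>& queued) {
    while (!sequence.empty()) {
        const unsigned slot = sequence.front();
        sequence.pop_front();
        if (queued[slot]) {
            queued[slot] = false; // acquired
            return slot;
        }
    }
    return std::nullopt;
}
```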

View File

@@ -4,6 +4,7 @@
#pragma once
#include <list>
#include <optional>
#include <vector>
@@ -12,6 +13,7 @@
#include "common/swap.h"
#include "core/hle/kernel/object.h"
#include "core/hle/kernel/writable_event.h"
#include "core/hle/service/nvdrv/nvdata.h"
namespace Service::NVFlinger {
@@ -68,13 +70,17 @@ public:
IGBPBuffer igbp_buffer;
BufferTransformFlags transform;
Common::Rectangle<int> crop_rect;
u32 swap_interval;
Service::Nvidia::MultiFence multi_fence;
};
void SetPreallocatedBuffer(u32 slot, const IGBPBuffer& igbp_buffer);
std::optional<u32> DequeueBuffer(u32 width, u32 height);
std::optional<std::pair<u32, Service::Nvidia::MultiFence*>> DequeueBuffer(u32 width,
u32 height);
const IGBPBuffer& RequestBuffer(u32 slot) const;
void QueueBuffer(u32 slot, BufferTransformFlags transform,
const Common::Rectangle<int>& crop_rect);
const Common::Rectangle<int>& crop_rect, u32 swap_interval,
Service::Nvidia::MultiFence& multi_fence);
std::optional<std::reference_wrapper<const Buffer>> AcquireBuffer();
void ReleaseBuffer(u32 slot);
u32 Query(QueryType type);
@@ -92,6 +98,7 @@ private:
u64 layer_id;
std::vector<Buffer> queue;
std::list<u32> queue_sequence;
Kernel::EventPair buffer_wait_event;
};

View File

@@ -37,15 +37,14 @@ NVFlinger::NVFlinger(Core::Timing::CoreTiming& core_timing) : core_timing{core_t
displays.emplace_back(4, "Null");
// Schedule the screen composition events
const auto ticks = Settings::values.force_30fps_mode ? frame_ticks_30fps : frame_ticks;
composition_event = core_timing.RegisterEvent("ScreenComposition", [this](u64 userdata,
s64 cycles_late) {
Compose();
const auto ticks = Settings::values.force_30fps_mode ? frame_ticks_30fps : GetNextTicks();
this->core_timing.ScheduleEvent(std::max<s64>(0LL, ticks - cycles_late), composition_event);
});
composition_event = core_timing.RegisterEvent(
"ScreenComposition", [this, ticks](u64 userdata, s64 cycles_late) {
Compose();
this->core_timing.ScheduleEvent(ticks - cycles_late, composition_event);
});
core_timing.ScheduleEvent(ticks, composition_event);
core_timing.ScheduleEvent(frame_ticks, composition_event);
}
NVFlinger::~NVFlinger() {
@@ -206,8 +205,14 @@ void NVFlinger::Compose() {
igbp_buffer.width, igbp_buffer.height, igbp_buffer.stride,
buffer->get().transform, buffer->get().crop_rect);
swap_interval = buffer->get().swap_interval;
buffer_queue.ReleaseBuffer(buffer->get().slot);
}
}
s64 NVFlinger::GetNextTicks() const {
constexpr s64 max_hertz = 120LL;
return (Core::Timing::BASE_CLOCK_RATE * (1LL << swap_interval)) / max_hertz;
}
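The cadence formula works out to the expected frame rates. A hypothetical check with an illustrative clock rate (the real constant lives in Core::Timing and may differ):

```cpp
#include <cstdint>

constexpr std::int64_t kBaseClockRate = 19'200'000; // assumed ticks per second
constexpr std::int64_t kMaxHertz = 120;

constexpr std::int64_t NextTicks(std::int64_t swap_interval) {
    return (kBaseClockRate * (std::int64_t{1} << swap_interval)) / kMaxHertz;
}

static_assert(NextTicks(0) == kBaseClockRate / 120); // 120 FPS
static_assert(NextTicks(1) == kBaseClockRate / 60);  // 60 FPS
static_assert(NextTicks(2) == kBaseClockRate / 30);  // 30 FPS
```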
} // namespace Service::NVFlinger

View File

@@ -74,6 +74,8 @@ public:
/// finished.
void Compose();
s64 GetNextTicks() const;
private:
/// Finds the display identified by the specified ID.
VI::Display* FindDisplay(u64 display_id);
@@ -98,6 +100,8 @@ private:
/// layers.
u32 next_buffer_queue_id = 1;
u32 swap_interval = 1;
/// Event that handles screen composition.
Core::Timing::EventType* composition_event;

View File

@@ -206,7 +206,7 @@ void Init(std::shared_ptr<SM::ServiceManager>& sm, Core::System& system) {
AM::InstallInterfaces(*sm, nv_flinger, system);
AOC::InstallInterfaces(*sm);
APM::InstallInterfaces(system);
Audio::InstallInterfaces(*sm);
Audio::InstallInterfaces(*sm, system);
BCAT::InstallInterfaces(*sm);
BPC::InstallInterfaces(*sm);
BtDrv::InstallInterfaces(*sm);
@@ -236,7 +236,7 @@ void Init(std::shared_ptr<SM::ServiceManager>& sm, Core::System& system) {
NIM::InstallInterfaces(*sm);
NPNS::InstallInterfaces(*sm);
NS::InstallInterfaces(*sm);
Nvidia::InstallInterfaces(*sm, *nv_flinger);
Nvidia::InstallInterfaces(*sm, *nv_flinger, system);
PCIe::InstallInterfaces(*sm);
PCTL::InstallInterfaces(*sm);
PCV::InstallInterfaces(*sm);

View File

@@ -21,6 +21,7 @@
#include "core/hle/kernel/readable_event.h"
#include "core/hle/kernel/thread.h"
#include "core/hle/kernel/writable_event.h"
#include "core/hle/service/nvdrv/nvdata.h"
#include "core/hle/service/nvdrv/nvdrv.h"
#include "core/hle/service/nvflinger/buffer_queue.h"
#include "core/hle/service/nvflinger/nvflinger.h"
@@ -328,32 +329,22 @@ public:
Data data;
};
struct BufferProducerFence {
u32 is_valid;
std::array<Nvidia::IoctlFence, 4> fences;
};
static_assert(sizeof(BufferProducerFence) == 36, "BufferProducerFence has wrong size");
class IGBPDequeueBufferResponseParcel : public Parcel {
public:
explicit IGBPDequeueBufferResponseParcel(u32 slot) : slot(slot) {}
explicit IGBPDequeueBufferResponseParcel(u32 slot, Service::Nvidia::MultiFence& multi_fence)
: slot(slot), multi_fence(multi_fence) {}
~IGBPDequeueBufferResponseParcel() override = default;
protected:
void SerializeData() override {
// TODO(Subv): Find out how this Fence is used.
BufferProducerFence fence = {};
fence.is_valid = 1;
for (auto& fence_ : fence.fences)
fence_.id = -1;
Write(slot);
Write<u32_le>(1);
WriteObject(fence);
WriteObject(multi_fence);
Write<u32_le>(0);
}
u32_le slot;
Service::Nvidia::MultiFence multi_fence;
};
class IGBPRequestBufferRequestParcel : public Parcel {
@@ -400,12 +391,6 @@ public:
data = Read<Data>();
}
struct Fence {
u32_le id;
u32_le value;
};
static_assert(sizeof(Fence) == 8, "Fence has wrong size");
struct Data {
u32_le slot;
INSERT_PADDING_WORDS(3);
@@ -418,15 +403,15 @@ public:
s32_le scaling_mode;
NVFlinger::BufferQueue::BufferTransformFlags transform;
u32_le sticky_transform;
INSERT_PADDING_WORDS(2);
u32_le fence_is_valid;
std::array<Fence, 2> fences;
INSERT_PADDING_WORDS(1);
u32_le swap_interval;
Service::Nvidia::MultiFence multi_fence;
Common::Rectangle<int> GetCropRect() const {
return {crop_left, crop_top, crop_right, crop_bottom};
}
};
static_assert(sizeof(Data) == 80, "ParcelData has wrong size");
static_assert(sizeof(Data) == 96, "ParcelData has wrong size");
Data data;
};
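The jump from 80 to 96 bytes matches the field changes: the old tail (two padding words, fence_is_valid, and two 8-byte fences) made up 28 bytes, while the new tail (one padding word, swap_interval, and a 36-byte MultiFence) makes up 44. A trivial arithmetic check:

```cpp
// Old tail: 2 padding words (8) + fence_is_valid (4) + 2 * Fence (16) = 28 bytes.
// New tail: 1 padding word (4) + swap_interval (4) + MultiFence (36)  = 44 bytes.
static_assert(80 - 28 + 44 == 96, "ParcelData growth matches the field changes");
```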
@@ -547,11 +532,11 @@ private:
IGBPDequeueBufferRequestParcel request{ctx.ReadBuffer()};
const u32 width{request.data.width};
const u32 height{request.data.height};
std::optional<u32> slot = buffer_queue.DequeueBuffer(width, height);
auto result = buffer_queue.DequeueBuffer(width, height);
if (slot) {
if (result) {
// Buffer is available
IGBPDequeueBufferResponseParcel response{*slot};
IGBPDequeueBufferResponseParcel response{result->first, *result->second};
ctx.WriteBuffer(response.Serialize());
} else {
// Wait the current thread until a buffer becomes available
@@ -561,10 +546,10 @@ private:
Kernel::ThreadWakeupReason reason) {
// Repeat TransactParcel DequeueBuffer when a buffer is available
auto& buffer_queue = nv_flinger->FindBufferQueue(id);
std::optional<u32> slot = buffer_queue.DequeueBuffer(width, height);
ASSERT_MSG(slot != std::nullopt, "Could not dequeue buffer.");
auto result = buffer_queue.DequeueBuffer(width, height);
ASSERT_MSG(result != std::nullopt, "Could not dequeue buffer.");
IGBPDequeueBufferResponseParcel response{*slot};
IGBPDequeueBufferResponseParcel response{result->first, *result->second};
ctx.WriteBuffer(response.Serialize());
IPC::ResponseBuilder rb{ctx, 2};
rb.Push(RESULT_SUCCESS);
@@ -582,7 +567,8 @@ private:
IGBPQueueBufferRequestParcel request{ctx.ReadBuffer()};
buffer_queue.QueueBuffer(request.data.slot, request.data.transform,
request.data.GetCropRect());
request.data.GetCropRect(), request.data.swap_interval,
request.data.multi_fence);
IGBPQueueBufferResponseParcel response{1280, 720};
ctx.WriteBuffer(response.Serialize());

View File

@@ -295,7 +295,7 @@ Kernel::CodeSet ElfReader::LoadInto(VAddr vaddr) {
}
}
std::vector<u8> program_image(total_image_size);
Kernel::PhysicalMemory program_image(total_image_size);
std::size_t current_image_position = 0;
Kernel::CodeSet codeset;

View File

@@ -69,7 +69,7 @@ AppLoader::LoadResult AppLoader_KIP::Load(Kernel::Process& process) {
const VAddr base_address = process.VMManager().GetCodeRegionBaseAddress();
Kernel::CodeSet codeset;
std::vector<u8> program_image;
Kernel::PhysicalMemory program_image;
const auto load_segment = [&program_image](Kernel::CodeSet::Segment& segment,
const std::vector<u8>& data, u32 offset) {

View File

@@ -143,7 +143,7 @@ static bool LoadNroImpl(Kernel::Process& process, const std::vector<u8>& data,
}
// Build program image
std::vector<u8> program_image(PageAlignSize(nro_header.file_size));
Kernel::PhysicalMemory program_image(PageAlignSize(nro_header.file_size));
std::memcpy(program_image.data(), data.data(), program_image.size());
if (program_image.size() != PageAlignSize(nro_header.file_size)) {
return {};

View File

@@ -89,7 +89,7 @@ std::optional<VAddr> AppLoader_NSO::LoadModule(Kernel::Process& process,
// Build program image
Kernel::CodeSet codeset;
std::vector<u8> program_image;
Kernel::PhysicalMemory program_image;
for (std::size_t i = 0; i < nso_header.segments.size(); ++i) {
std::vector<u8> data =
file.ReadBytes(nso_header.segments_compressed_size[i], nso_header.segments[i].offset);

View File

@@ -1,5 +1,7 @@
add_library(video_core STATIC
buffer_cache.h
buffer_cache/buffer_block.h
buffer_cache/buffer_cache.h
buffer_cache/map_interval.h
dma_pusher.cpp
dma_pusher.h
debug_utils/debug_utils.cpp
@@ -100,6 +102,7 @@ add_library(video_core STATIC
shader/decode/integer_set.cpp
shader/decode/half_set.cpp
shader/decode/video.cpp
shader/decode/warp.cpp
shader/decode/xmad.cpp
shader/decode/other.cpp
shader/control_flow.cpp

View File

@@ -1,299 +0,0 @@
// Copyright 2019 yuzu Emulator Project
// Licensed under GPLv2 or any later version
// Refer to the license.txt file included.
#pragma once
#include <array>
#include <memory>
#include <mutex>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>
#include "common/alignment.h"
#include "common/common_types.h"
#include "core/core.h"
#include "video_core/memory_manager.h"
#include "video_core/rasterizer_cache.h"
namespace VideoCore {
class RasterizerInterface;
}
namespace VideoCommon {
template <typename BufferStorageType>
class CachedBuffer final : public RasterizerCacheObject {
public:
explicit CachedBuffer(VAddr cpu_addr, u8* host_ptr)
: RasterizerCacheObject{host_ptr}, host_ptr{host_ptr}, cpu_addr{cpu_addr} {}
~CachedBuffer() override = default;
VAddr GetCpuAddr() const override {
return cpu_addr;
}
std::size_t GetSizeInBytes() const override {
return size;
}
u8* GetWritableHostPtr() const {
return host_ptr;
}
std::size_t GetSize() const {
return size;
}
std::size_t GetCapacity() const {
return capacity;
}
bool IsInternalized() const {
return is_internal;
}
const BufferStorageType& GetBuffer() const {
return buffer;
}
void SetSize(std::size_t new_size) {
size = new_size;
}
void SetInternalState(bool is_internal_) {
is_internal = is_internal_;
}
BufferStorageType ExchangeBuffer(BufferStorageType buffer_, std::size_t new_capacity) {
capacity = new_capacity;
std::swap(buffer, buffer_);
return buffer_;
}
private:
u8* host_ptr{};
VAddr cpu_addr{};
std::size_t size{};
std::size_t capacity{};
bool is_internal{};
BufferStorageType buffer;
};
template <typename BufferStorageType, typename BufferType, typename StreamBuffer>
class BufferCache : public RasterizerCache<std::shared_ptr<CachedBuffer<BufferStorageType>>> {
public:
using Buffer = std::shared_ptr<CachedBuffer<BufferStorageType>>;
using BufferInfo = std::pair<const BufferType*, u64>;
explicit BufferCache(VideoCore::RasterizerInterface& rasterizer, Core::System& system,
std::unique_ptr<StreamBuffer> stream_buffer)
: RasterizerCache<Buffer>{rasterizer}, system{system},
stream_buffer{std::move(stream_buffer)}, stream_buffer_handle{
this->stream_buffer->GetHandle()} {}
~BufferCache() = default;
void Unregister(const Buffer& entry) override {
std::lock_guard lock{RasterizerCache<Buffer>::mutex};
if (entry->IsInternalized()) {
internalized_entries.erase(entry->GetCacheAddr());
}
ReserveBuffer(entry);
RasterizerCache<Buffer>::Unregister(entry);
}
void TickFrame() {
marked_for_destruction_index =
(marked_for_destruction_index + 1) % marked_for_destruction_ring_buffer.size();
MarkedForDestruction().clear();
}
BufferInfo UploadMemory(GPUVAddr gpu_addr, std::size_t size, std::size_t alignment = 4,
bool internalize = false, bool is_written = false) {
std::lock_guard lock{RasterizerCache<Buffer>::mutex};
auto& memory_manager = system.GPU().MemoryManager();
const auto host_ptr = memory_manager.GetPointer(gpu_addr);
if (!host_ptr) {
return {GetEmptyBuffer(size), 0};
}
const auto cache_addr = ToCacheAddr(host_ptr);
// Cache management carries a large overhead, so only cache entries above a given size.
// TODO: Figure out which threshold works best across games.
constexpr std::size_t max_stream_size = 0x800;
if (!internalize && size < max_stream_size &&
internalized_entries.find(cache_addr) == internalized_entries.end()) {
return StreamBufferUpload(host_ptr, size, alignment);
}
auto entry = RasterizerCache<Buffer>::TryGet(cache_addr);
if (!entry) {
return FixedBufferUpload(gpu_addr, host_ptr, size, internalize, is_written);
}
if (entry->GetSize() < size) {
IncreaseBufferSize(entry, size);
}
if (is_written) {
entry->MarkAsModified(true, *this);
}
return {ToHandle(entry->GetBuffer()), 0};
}
/// Uploads data from host memory. Returns the OpenGL buffer it was placed in and its offset.
BufferInfo UploadHostMemory(const void* raw_pointer, std::size_t size,
std::size_t alignment = 4) {
std::lock_guard lock{RasterizerCache<Buffer>::mutex};
return StreamBufferUpload(raw_pointer, size, alignment);
}
void Map(std::size_t max_size) {
std::tie(buffer_ptr, buffer_offset_base, invalidated) = stream_buffer->Map(max_size, 4);
buffer_offset = buffer_offset_base;
}
/// Finishes the upload stream; returns true when the bindings were invalidated.
bool Unmap() {
stream_buffer->Unmap(buffer_offset - buffer_offset_base);
return std::exchange(invalidated, false);
}
virtual const BufferType* GetEmptyBuffer(std::size_t size) = 0;
protected:
void FlushObjectInner(const Buffer& entry) override {
DownloadBufferData(entry->GetBuffer(), 0, entry->GetSize(), entry->GetWritableHostPtr());
}
virtual BufferStorageType CreateBuffer(std::size_t size) = 0;
virtual const BufferType* ToHandle(const BufferStorageType& storage) = 0;
virtual void UploadBufferData(const BufferStorageType& buffer, std::size_t offset,
std::size_t size, const u8* data) = 0;
virtual void DownloadBufferData(const BufferStorageType& buffer, std::size_t offset,
std::size_t size, u8* data) = 0;
virtual void CopyBufferData(const BufferStorageType& src, const BufferStorageType& dst,
std::size_t src_offset, std::size_t dst_offset,
std::size_t size) = 0;
private:
BufferInfo StreamBufferUpload(const void* raw_pointer, std::size_t size,
std::size_t alignment) {
AlignBuffer(alignment);
const std::size_t uploaded_offset = buffer_offset;
std::memcpy(buffer_ptr, raw_pointer, size);
buffer_ptr += size;
buffer_offset += size;
return {&stream_buffer_handle, uploaded_offset};
}
BufferInfo FixedBufferUpload(GPUVAddr gpu_addr, u8* host_ptr, std::size_t size,
bool internalize, bool is_written) {
auto& memory_manager = Core::System::GetInstance().GPU().MemoryManager();
const auto cpu_addr = memory_manager.GpuToCpuAddress(gpu_addr);
ASSERT(cpu_addr);
auto entry = GetUncachedBuffer(*cpu_addr, host_ptr);
entry->SetSize(size);
entry->SetInternalState(internalize);
RasterizerCache<Buffer>::Register(entry);
if (internalize) {
internalized_entries.emplace(ToCacheAddr(host_ptr));
}
if (is_written) {
entry->MarkAsModified(true, *this);
}
if (entry->GetCapacity() < size) {
MarkedForDestruction().push_back(entry->ExchangeBuffer(CreateBuffer(size), size));
}
UploadBufferData(entry->GetBuffer(), 0, size, host_ptr);
return {ToHandle(entry->GetBuffer()), 0};
}
void IncreaseBufferSize(Buffer& entry, std::size_t new_size) {
const std::size_t old_size = entry->GetSize();
if (entry->GetCapacity() < new_size) {
const auto& old_buffer = entry->GetBuffer();
auto new_buffer = CreateBuffer(new_size);
// Copy bits from the old buffer to the new buffer.
CopyBufferData(old_buffer, new_buffer, 0, 0, old_size);
MarkedForDestruction().push_back(
entry->ExchangeBuffer(std::move(new_buffer), new_size));
// The old buffer may still be bound, so mark the bindings as invalidated.
invalidated = true;
}
// Upload the new bits.
const std::size_t size_diff = new_size - old_size;
UploadBufferData(entry->GetBuffer(), old_size, size_diff, entry->GetHostPtr() + old_size);
// Update entry's size in the object and in the cache.
Unregister(entry);
entry->SetSize(new_size);
RasterizerCache<Buffer>::Register(entry);
}
Buffer GetUncachedBuffer(VAddr cpu_addr, u8* host_ptr) {
if (auto entry = TryGetReservedBuffer(host_ptr)) {
return entry;
}
return std::make_shared<CachedBuffer<BufferStorageType>>(cpu_addr, host_ptr);
}
Buffer TryGetReservedBuffer(u8* host_ptr) {
const auto it = buffer_reserve.find(ToCacheAddr(host_ptr));
if (it == buffer_reserve.end()) {
return {};
}
auto& reserve = it->second;
auto entry = reserve.back();
reserve.pop_back();
return entry;
}
void ReserveBuffer(Buffer entry) {
buffer_reserve[entry->GetCacheAddr()].push_back(std::move(entry));
}
void AlignBuffer(std::size_t alignment) {
// Align the offset, not the mapped pointer
const std::size_t offset_aligned = Common::AlignUp(buffer_offset, alignment);
buffer_ptr += offset_aligned - buffer_offset;
buffer_offset = offset_aligned;
}
std::vector<BufferStorageType>& MarkedForDestruction() {
return marked_for_destruction_ring_buffer[marked_for_destruction_index];
}
Core::System& system;
std::unique_ptr<StreamBuffer> stream_buffer;
BufferType stream_buffer_handle{};
bool invalidated = false;
u8* buffer_ptr = nullptr;
u64 buffer_offset = 0;
u64 buffer_offset_base = 0;
std::size_t marked_for_destruction_index = 0;
std::array<std::vector<BufferStorageType>, 4> marked_for_destruction_ring_buffer;
std::unordered_set<CacheAddr> internalized_entries;
std::unordered_map<CacheAddr, std::vector<Buffer>> buffer_reserve;
};
} // namespace VideoCommon
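The deleted cache above defers buffer destruction with a small ring of vectors: `TickFrame()` advances the ring index and clears the slot it lands on, so a buffer swapped out via `ExchangeBuffer` survives a few frames before being freed. A standalone sketch of that pattern (names are illustrative):

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Any movable resource type works; int stands in for a GPU buffer handle.
using Resource = int;

class DeferredDestruction {
public:
    // Resources retired this frame go into the current slot.
    void Retire(Resource r) {
        ring[index].push_back(r);
    }

    // Called once per frame: step to the next slot and destroy whatever was
    // retired a full ring-length ago, when the GPU should be done with it.
    void TickFrame() {
        index = (index + 1) % ring.size();
        ring[index].clear();
    }

private:
    std::size_t index = 0;
    std::array<std::vector<Resource>, 4> ring{};
};

int main() {
    DeferredDestruction pool;
    pool.Retire(42);
    for (int i = 0; i < 4; ++i) {
        pool.TickFrame(); // the slot holding 42 is cleared on the wrap-around
    }
    return 0;
}
```

The four-slot depth mirrors `marked_for_destruction_ring_buffer` above; it trades a few frames of extra lifetime for never destroying a buffer the GPU may still be reading.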

View File

@@ -0,0 +1,76 @@
// Copyright 2019 yuzu Emulator Project
// Licensed under GPLv2 or any later version
// Refer to the license.txt file included.
#pragma once
#include <unordered_set>
#include <utility>
#include "common/alignment.h"
#include "common/common_types.h"
#include "video_core/gpu.h"
namespace VideoCommon {
class BufferBlock {
public:
bool Overlaps(const CacheAddr start, const CacheAddr end) const {
return (cache_addr < end) && (cache_addr_end > start);
}
bool IsInside(const CacheAddr other_start, const CacheAddr other_end) const {
return cache_addr <= other_start && other_end <= cache_addr_end;
}
u8* GetWritableHostPtr() const {
return FromCacheAddr(cache_addr);
}
u8* GetWritableHostPtr(std::size_t offset) const {
return FromCacheAddr(cache_addr + offset);
}
std::size_t GetOffset(const CacheAddr in_addr) {
return static_cast<std::size_t>(in_addr - cache_addr);
}
CacheAddr GetCacheAddr() const {
return cache_addr;
}
CacheAddr GetCacheAddrEnd() const {
return cache_addr_end;
}
void SetCacheAddr(const CacheAddr new_addr) {
cache_addr = new_addr;
cache_addr_end = new_addr + size;
}
std::size_t GetSize() const {
return size;
}
void SetEpoch(u64 new_epoch) {
epoch = new_epoch;
}
u64 GetEpoch() {
return epoch;
}
protected:
explicit BufferBlock(CacheAddr cache_addr, const std::size_t size) : size{size} {
SetCacheAddr(cache_addr);
}
~BufferBlock() = default;
private:
CacheAddr cache_addr{};
CacheAddr cache_addr_end{};
std::size_t size{};
u64 epoch{};
};
} // namespace VideoCommon
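`Overlaps` and `IsInside` are the usual half-open-interval tests. A quick check of the boundary cases they encode, assuming the same `[start, end)` convention:

```cpp
#include <cassert>
#include <cstdint>

using CacheAddr = std::uint64_t;

// Mirrors BufferBlock::Overlaps: two intervals intersect iff each one starts
// before the other ends.
bool Overlaps(CacheAddr a_start, CacheAddr a_end, CacheAddr b_start, CacheAddr b_end) {
    return a_start < b_end && a_end > b_start;
}

int main() {
    assert(Overlaps(0x100, 0x200, 0x1F0, 0x300));  // partial overlap
    assert(!Overlaps(0x100, 0x200, 0x200, 0x300)); // adjacent, half-open: no overlap
    return 0;
}
```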

View File

@@ -0,0 +1,447 @@
// Copyright 2019 yuzu Emulator Project
// Licensed under GPLv2 or any later version
// Refer to the license.txt file included.
#pragma once
#include <array>
#include <memory>
#include <mutex>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>
#include "common/alignment.h"
#include "common/common_types.h"
#include "core/core.h"
#include "video_core/buffer_cache/buffer_block.h"
#include "video_core/buffer_cache/map_interval.h"
#include "video_core/memory_manager.h"
#include "video_core/rasterizer_interface.h"
namespace VideoCommon {
using MapInterval = std::shared_ptr<MapIntervalBase>;
template <typename TBuffer, typename TBufferType, typename StreamBuffer>
class BufferCache {
public:
using BufferInfo = std::pair<const TBufferType*, u64>;
BufferInfo UploadMemory(GPUVAddr gpu_addr, std::size_t size, std::size_t alignment = 4,
bool is_written = false) {
std::lock_guard lock{mutex};
auto& memory_manager = system.GPU().MemoryManager();
const auto host_ptr = memory_manager.GetPointer(gpu_addr);
if (!host_ptr) {
return {GetEmptyBuffer(size), 0};
}
const auto cache_addr = ToCacheAddr(host_ptr);
// Cache management carries a large overhead, so only cache entries above a given size.
// TODO: Figure out which threshold works best across games.
constexpr std::size_t max_stream_size = 0x800;
if (size < max_stream_size) {
if (!is_written && !IsRegionWritten(cache_addr, cache_addr + size - 1)) {
return StreamBufferUpload(host_ptr, size, alignment);
}
}
auto block = GetBlock(cache_addr, size);
auto map = MapAddress(block, gpu_addr, cache_addr, size);
if (is_written) {
map->MarkAsModified(true, GetModifiedTicks());
if (!map->IsWritten()) {
map->MarkAsWritten(true);
MarkRegionAsWritten(map->GetStart(), map->GetEnd() - 1);
}
} else {
if (map->IsWritten()) {
WriteBarrier();
}
}
const u64 offset = static_cast<u64>(block->GetOffset(cache_addr));
return {ToHandle(block), offset};
}
/// Uploads data from host memory. Returns the OpenGL buffer it was placed in and its offset.
BufferInfo UploadHostMemory(const void* raw_pointer, std::size_t size,
std::size_t alignment = 4) {
std::lock_guard lock{mutex};
return StreamBufferUpload(raw_pointer, size, alignment);
}
void Map(std::size_t max_size) {
std::lock_guard lock{mutex};
std::tie(buffer_ptr, buffer_offset_base, invalidated) = stream_buffer->Map(max_size, 4);
buffer_offset = buffer_offset_base;
}
/// Finishes the upload stream; returns true when the bindings were invalidated.
bool Unmap() {
std::lock_guard lock{mutex};
stream_buffer->Unmap(buffer_offset - buffer_offset_base);
return std::exchange(invalidated, false);
}
void TickFrame() {
++epoch;
while (!pending_destruction.empty()) {
if (pending_destruction.front()->GetEpoch() + 1 > epoch) {
break;
}
pending_destruction.pop_front();
}
}
/// Write any cached resources overlapping the specified region back to memory
void FlushRegion(CacheAddr addr, std::size_t size) {
std::lock_guard lock{mutex};
std::vector<MapInterval> objects = GetMapsInRange(addr, size);
std::sort(objects.begin(), objects.end(), [](const MapInterval& a, const MapInterval& b) {
return a->GetModificationTick() < b->GetModificationTick();
});
for (auto& object : objects) {
if (object->IsModified() && object->IsRegistered()) {
FlushMap(object);
}
}
}
/// Mark the specified region as being invalidated
void InvalidateRegion(CacheAddr addr, u64 size) {
std::lock_guard lock{mutex};
std::vector<MapInterval> objects = GetMapsInRange(addr, size);
for (auto& object : objects) {
if (object->IsRegistered()) {
Unregister(object);
}
}
}
virtual const TBufferType* GetEmptyBuffer(std::size_t size) = 0;
protected:
explicit BufferCache(VideoCore::RasterizerInterface& rasterizer, Core::System& system,
std::unique_ptr<StreamBuffer> stream_buffer)
: rasterizer{rasterizer}, system{system}, stream_buffer{std::move(stream_buffer)},
stream_buffer_handle{this->stream_buffer->GetHandle()} {}
~BufferCache() = default;
virtual const TBufferType* ToHandle(const TBuffer& storage) = 0;
virtual void WriteBarrier() = 0;
virtual TBuffer CreateBlock(CacheAddr cache_addr, std::size_t size) = 0;
virtual void UploadBlockData(const TBuffer& buffer, std::size_t offset, std::size_t size,
const u8* data) = 0;
virtual void DownloadBlockData(const TBuffer& buffer, std::size_t offset, std::size_t size,
u8* data) = 0;
virtual void CopyBlock(const TBuffer& src, const TBuffer& dst, std::size_t src_offset,
std::size_t dst_offset, std::size_t size) = 0;
/// Register an object into the cache
void Register(const MapInterval& new_map, bool inherit_written = false) {
const CacheAddr cache_ptr = new_map->GetStart();
const std::optional<VAddr> cpu_addr =
system.GPU().MemoryManager().GpuToCpuAddress(new_map->GetGpuAddress());
if (!cache_ptr || !cpu_addr) {
LOG_CRITICAL(HW_GPU, "Failed to register buffer with unmapped gpu_address 0x{:016x}",
new_map->GetGpuAddress());
return;
}
const std::size_t size = new_map->GetEnd() - new_map->GetStart();
new_map->SetCpuAddress(*cpu_addr);
new_map->MarkAsRegistered(true);
const IntervalType interval{new_map->GetStart(), new_map->GetEnd()};
mapped_addresses.insert({interval, new_map});
rasterizer.UpdatePagesCachedCount(*cpu_addr, size, 1);
if (inherit_written) {
MarkRegionAsWritten(new_map->GetStart(), new_map->GetEnd() - 1);
new_map->MarkAsWritten(true);
}
}
/// Unregisters an object from the cache
void Unregister(MapInterval& map) {
const std::size_t size = map->GetEnd() - map->GetStart();
rasterizer.UpdatePagesCachedCount(map->GetCpuAddress(), size, -1);
map->MarkAsRegistered(false);
if (map->IsWritten()) {
UnmarkRegionAsWritten(map->GetStart(), map->GetEnd() - 1);
}
const IntervalType delete_interval{map->GetStart(), map->GetEnd()};
mapped_addresses.erase(delete_interval);
}
private:
MapInterval CreateMap(const CacheAddr start, const CacheAddr end, const GPUVAddr gpu_addr) {
return std::make_shared<MapIntervalBase>(start, end, gpu_addr);
}
MapInterval MapAddress(const TBuffer& block, const GPUVAddr gpu_addr,
const CacheAddr cache_addr, const std::size_t size) {
std::vector<MapInterval> overlaps = GetMapsInRange(cache_addr, size);
if (overlaps.empty()) {
const CacheAddr cache_addr_end = cache_addr + size;
MapInterval new_map = CreateMap(cache_addr, cache_addr_end, gpu_addr);
u8* host_ptr = FromCacheAddr(cache_addr);
UploadBlockData(block, block->GetOffset(cache_addr), size, host_ptr);
Register(new_map);
return new_map;
}
const CacheAddr cache_addr_end = cache_addr + size;
if (overlaps.size() == 1) {
MapInterval& current_map = overlaps[0];
if (current_map->IsInside(cache_addr, cache_addr_end)) {
return current_map;
}
}
CacheAddr new_start = cache_addr;
CacheAddr new_end = cache_addr_end;
bool write_inheritance = false;
bool modified_inheritance = false;
// Calculate new buffer parameters
for (auto& overlap : overlaps) {
new_start = std::min(overlap->GetStart(), new_start);
new_end = std::max(overlap->GetEnd(), new_end);
write_inheritance |= overlap->IsWritten();
modified_inheritance |= overlap->IsModified();
}
GPUVAddr new_gpu_addr = gpu_addr + new_start - cache_addr;
for (auto& overlap : overlaps) {
Unregister(overlap);
}
UpdateBlock(block, new_start, new_end, overlaps);
MapInterval new_map = CreateMap(new_start, new_end, new_gpu_addr);
if (modified_inheritance) {
new_map->MarkAsModified(true, GetModifiedTicks());
}
Register(new_map, write_inheritance);
return new_map;
}
void UpdateBlock(const TBuffer& block, CacheAddr start, CacheAddr end,
std::vector<MapInterval>& overlaps) {
const IntervalType base_interval{start, end};
IntervalSet interval_set{};
interval_set.add(base_interval);
for (auto& overlap : overlaps) {
const IntervalType subtract{overlap->GetStart(), overlap->GetEnd()};
interval_set.subtract(subtract);
}
for (auto& interval : interval_set) {
std::size_t size = interval.upper() - interval.lower();
if (size > 0) {
u8* host_ptr = FromCacheAddr(interval.lower());
UploadBlockData(block, block->GetOffset(interval.lower()), size, host_ptr);
}
}
}
std::vector<MapInterval> GetMapsInRange(CacheAddr addr, std::size_t size) {
if (size == 0) {
return {};
}
std::vector<MapInterval> objects{};
const IntervalType interval{addr, addr + size};
for (auto& pair : boost::make_iterator_range(mapped_addresses.equal_range(interval))) {
objects.push_back(pair.second);
}
return objects;
}
/// Returns a monotonically increasing tick used to track when cached objects were last modified
u64 GetModifiedTicks() {
return ++modified_ticks;
}
void FlushMap(MapInterval map) {
std::size_t size = map->GetEnd() - map->GetStart();
TBuffer block = blocks[map->GetStart() >> block_page_bits];
u8* host_ptr = FromCacheAddr(map->GetStart());
DownloadBlockData(block, block->GetOffset(map->GetStart()), size, host_ptr);
map->MarkAsModified(false, 0);
}
BufferInfo StreamBufferUpload(const void* raw_pointer, std::size_t size,
std::size_t alignment) {
AlignBuffer(alignment);
const std::size_t uploaded_offset = buffer_offset;
std::memcpy(buffer_ptr, raw_pointer, size);
buffer_ptr += size;
buffer_offset += size;
return {&stream_buffer_handle, uploaded_offset};
}
void AlignBuffer(std::size_t alignment) {
// Align the offset, not the mapped pointer
const std::size_t offset_aligned = Common::AlignUp(buffer_offset, alignment);
buffer_ptr += offset_aligned - buffer_offset;
buffer_offset = offset_aligned;
}
TBuffer EnlargeBlock(TBuffer buffer) {
const std::size_t old_size = buffer->GetSize();
const std::size_t new_size = old_size + block_page_size;
const CacheAddr cache_addr = buffer->GetCacheAddr();
TBuffer new_buffer = CreateBlock(cache_addr, new_size);
CopyBlock(buffer, new_buffer, 0, 0, old_size);
buffer->SetEpoch(epoch);
pending_destruction.push_back(buffer);
const CacheAddr cache_addr_end = cache_addr + new_size - 1;
u64 page_start = cache_addr >> block_page_bits;
const u64 page_end = cache_addr_end >> block_page_bits;
while (page_start <= page_end) {
blocks[page_start] = new_buffer;
++page_start;
}
return new_buffer;
}
TBuffer MergeBlocks(TBuffer first, TBuffer second) {
const std::size_t size_1 = first->GetSize();
const std::size_t size_2 = second->GetSize();
const CacheAddr first_addr = first->GetCacheAddr();
const CacheAddr second_addr = second->GetCacheAddr();
const CacheAddr new_addr = std::min(first_addr, second_addr);
const std::size_t new_size = size_1 + size_2;
TBuffer new_buffer = CreateBlock(new_addr, new_size);
CopyBlock(first, new_buffer, 0, new_buffer->GetOffset(first_addr), size_1);
CopyBlock(second, new_buffer, 0, new_buffer->GetOffset(second_addr), size_2);
first->SetEpoch(epoch);
second->SetEpoch(epoch);
pending_destruction.push_back(first);
pending_destruction.push_back(second);
const CacheAddr cache_addr_end = new_addr + new_size - 1;
u64 page_start = new_addr >> block_page_bits;
const u64 page_end = cache_addr_end >> block_page_bits;
while (page_start <= page_end) {
blocks[page_start] = new_buffer;
++page_start;
}
return new_buffer;
}
TBuffer GetBlock(const CacheAddr cache_addr, const std::size_t size) {
TBuffer found{};
const CacheAddr cache_addr_end = cache_addr + size - 1;
u64 page_start = cache_addr >> block_page_bits;
const u64 page_end = cache_addr_end >> block_page_bits;
while (page_start <= page_end) {
auto it = blocks.find(page_start);
if (it == blocks.end()) {
if (found) {
found = EnlargeBlock(found);
} else {
const CacheAddr start_addr = (page_start << block_page_bits);
found = CreateBlock(start_addr, block_page_size);
blocks[page_start] = found;
}
} else {
if (found) {
if (found == it->second) {
++page_start;
continue;
}
found = MergeBlocks(found, it->second);
} else {
found = it->second;
}
}
++page_start;
}
return found;
}
void MarkRegionAsWritten(const CacheAddr start, const CacheAddr end) {
u64 page_start = start >> write_page_bit;
const u64 page_end = end >> write_page_bit;
while (page_start <= page_end) {
auto it = written_pages.find(page_start);
if (it != written_pages.end()) {
it->second = it->second + 1;
} else {
written_pages[page_start] = 1;
}
page_start++;
}
}
void UnmarkRegionAsWritten(const CacheAddr start, const CacheAddr end) {
u64 page_start = start >> write_page_bit;
const u64 page_end = end >> write_page_bit;
while (page_start <= page_end) {
auto it = written_pages.find(page_start);
if (it != written_pages.end()) {
if (it->second > 1) {
it->second = it->second - 1;
} else {
written_pages.erase(it);
}
}
page_start++;
}
}
bool IsRegionWritten(const CacheAddr start, const CacheAddr end) const {
u64 page_start = start >> write_page_bit;
const u64 page_end = end >> write_page_bit;
while (page_start <= page_end) {
if (written_pages.count(page_start) > 0) {
return true;
}
page_start++;
}
return false;
}
VideoCore::RasterizerInterface& rasterizer;
Core::System& system;
std::unique_ptr<StreamBuffer> stream_buffer;
TBufferType stream_buffer_handle{};
bool invalidated = false;
u8* buffer_ptr = nullptr;
u64 buffer_offset = 0;
u64 buffer_offset_base = 0;
using IntervalSet = boost::icl::interval_set<CacheAddr>;
using IntervalCache = boost::icl::interval_map<CacheAddr, MapInterval>;
using IntervalType = typename IntervalCache::interval_type;
IntervalCache mapped_addresses{};
static constexpr u64 write_page_bit{11};
std::unordered_map<u64, u32> written_pages{};
static constexpr u64 block_page_bits{21};
static constexpr u64 block_page_size{1 << block_page_bits};
std::unordered_map<u64, TBuffer> blocks{};
std::list<TBuffer> pending_destruction{};
u64 epoch{};
u64 modified_ticks{};
std::recursive_mutex mutex;
};
} // namespace VideoCommon
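The new cache keys its maps by interval: `GetMapsInRange` builds an interval for the query and walks `equal_range` over a Boost.ICL container. A minimal sketch of that overlap query with a plain payload type (Boost.ICL assumed available; the `int` payload stands in for `MapInterval`):

```cpp
#include <cstdint>
#include <iostream>

#include <boost/icl/interval_map.hpp>

int main() {
    using Addr = std::uint64_t;
    using Interval = boost::icl::interval<Addr>;
    boost::icl::interval_map<Addr, int> mapped;

    // Two registered maps, analogous to mapped_addresses entries.
    mapped.add({Interval::right_open(0x000, 0x100), 1});
    mapped.add({Interval::right_open(0x200, 0x300), 2});

    // Query [0x0F0, 0x210): it overlaps both entries.
    const auto query = Interval::right_open(0x0F0, 0x210);
    const auto range = mapped.equal_range(query);
    for (auto it = range.first; it != range.second; ++it) {
        std::cout << "overlaps map " << it->second << '\n';
    }
    return 0;
}
```

On overlap, `MapAddress` above then widens the map to the union of all hits, inheriting the written/modified state so flush ordering stays correct.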

View File

@@ -0,0 +1,89 @@
// Copyright 2019 yuzu Emulator Project
// Licensed under GPLv2 or any later version
// Refer to the license.txt file included.
#pragma once
#include "common/common_types.h"
#include "video_core/gpu.h"
namespace VideoCommon {
class MapIntervalBase {
public:
MapIntervalBase(const CacheAddr start, const CacheAddr end, const GPUVAddr gpu_addr)
: start{start}, end{end}, gpu_addr{gpu_addr} {}
void SetCpuAddress(VAddr new_cpu_addr) {
cpu_addr = new_cpu_addr;
}
VAddr GetCpuAddress() const {
return cpu_addr;
}
GPUVAddr GetGpuAddress() const {
return gpu_addr;
}
bool IsInside(const CacheAddr other_start, const CacheAddr other_end) const {
return (start <= other_start && other_end <= end);
}
bool operator==(const MapIntervalBase& rhs) const {
return std::tie(start, end) == std::tie(rhs.start, rhs.end);
}
bool operator!=(const MapIntervalBase& rhs) const {
return !operator==(rhs);
}
void MarkAsRegistered(const bool registered) {
is_registered = registered;
}
bool IsRegistered() const {
return is_registered;
}
CacheAddr GetStart() const {
return start;
}
CacheAddr GetEnd() const {
return end;
}
void MarkAsModified(const bool is_modified_, const u64 tick) {
is_modified = is_modified_;
ticks = tick;
}
bool IsModified() const {
return is_modified;
}
u64 GetModificationTick() const {
return ticks;
}
void MarkAsWritten(const bool is_written_) {
is_written = is_written_;
}
bool IsWritten() const {
return is_written;
}
private:
CacheAddr start;
CacheAddr end;
GPUVAddr gpu_addr;
VAddr cpu_addr{};
bool is_written{};
bool is_modified{};
bool is_registered{};
u64 ticks{};
};
} // namespace VideoCommon

View File

@@ -22,7 +22,7 @@ void DmaPusher::DispatchCalls() {
MICROPROFILE_SCOPE(DispatchCalls);
// On entering GPU code, assume all memory may be touched by the ARM core.
gpu.Maxwell3D().dirty_flags.OnMemoryWrite();
gpu.Maxwell3D().dirty.OnMemoryWrite();
dma_pushbuffer_subindex = 0;
@@ -31,6 +31,7 @@ void DmaPusher::DispatchCalls() {
break;
}
}
gpu.FlushCommands();
}
bool DmaPusher::Step() {

View File

@@ -10,8 +10,7 @@
namespace Tegra::Engines {
Fermi2D::Fermi2D(VideoCore::RasterizerInterface& rasterizer, MemoryManager& memory_manager)
: rasterizer{rasterizer}, memory_manager{memory_manager} {}
Fermi2D::Fermi2D(VideoCore::RasterizerInterface& rasterizer) : rasterizer{rasterizer} {}
void Fermi2D::CallMethod(const GPU::MethodCall& method_call) {
ASSERT_MSG(method_call.method < Regs::NUM_REGS,

View File

@@ -33,7 +33,7 @@ namespace Tegra::Engines {
class Fermi2D final {
public:
explicit Fermi2D(VideoCore::RasterizerInterface& rasterizer, MemoryManager& memory_manager);
explicit Fermi2D(VideoCore::RasterizerInterface& rasterizer);
~Fermi2D() = default;
/// Write the value to the register identified by method.
@@ -145,7 +145,6 @@ public:
private:
VideoCore::RasterizerInterface& rasterizer;
MemoryManager& memory_manager;
/// Performs the copy from the source surface to the destination surface as configured in the
/// registers.

View File

@@ -37,7 +37,7 @@ void KeplerCompute::CallMethod(const GPU::MethodCall& method_call) {
const bool is_last_call = method_call.IsLastCall();
upload_state.ProcessData(method_call.argument, is_last_call);
if (is_last_call) {
system.GPU().Maxwell3D().dirty_flags.OnMemoryWrite();
system.GPU().Maxwell3D().dirty.OnMemoryWrite();
}
break;
}
@@ -50,13 +50,14 @@ void KeplerCompute::CallMethod(const GPU::MethodCall& method_call) {
}
void KeplerCompute::ProcessLaunch() {
const GPUVAddr launch_desc_loc = regs.launch_desc_loc.Address();
memory_manager.ReadBlockUnsafe(launch_desc_loc, &launch_description,
LaunchParams::NUM_LAUNCH_PARAMETERS * sizeof(u32));
const GPUVAddr code_loc = regs.code_loc.Address() + launch_description.program_start;
LOG_WARNING(HW_GPU, "Compute Kernel Execute at Address 0x{:016x}, STUBBED", code_loc);
const GPUVAddr code_addr = regs.code_loc.Address() + launch_description.program_start;
LOG_TRACE(HW_GPU, "Compute invocation launched at address 0x{:016x}", code_addr);
rasterizer.DispatchCompute(code_addr);
}
} // namespace Tegra::Engines

View File

@@ -15,7 +15,7 @@
namespace Tegra::Engines {
KeplerMemory::KeplerMemory(Core::System& system, MemoryManager& memory_manager)
: system{system}, memory_manager{memory_manager}, upload_state{memory_manager, regs.upload} {}
: system{system}, upload_state{memory_manager, regs.upload} {}
KeplerMemory::~KeplerMemory() = default;
@@ -34,7 +34,7 @@ void KeplerMemory::CallMethod(const GPU::MethodCall& method_call) {
const bool is_last_call = method_call.IsLastCall();
upload_state.ProcessData(method_call.argument, is_last_call);
if (is_last_call) {
system.GPU().Maxwell3D().dirty_flags.OnMemoryWrite();
system.GPU().Maxwell3D().dirty.OnMemoryWrite();
}
break;
}

View File

@@ -65,7 +65,6 @@ public:
private:
Core::System& system;
MemoryManager& memory_manager;
Upload::State upload_state;
};

View File

@@ -22,6 +22,7 @@ Maxwell3D::Maxwell3D(Core::System& system, VideoCore::RasterizerInterface& raste
MemoryManager& memory_manager)
: system{system}, rasterizer{rasterizer}, memory_manager{memory_manager},
macro_interpreter{*this}, upload_state{memory_manager, regs.upload} {
InitDirtySettings();
InitializeRegisterDefaults();
}
@@ -69,6 +70,10 @@ void Maxwell3D::InitializeRegisterDefaults() {
regs.stencil_back_func_mask = 0xFFFFFFFF;
regs.stencil_back_mask = 0xFFFFFFFF;
regs.depth_test_func = Regs::ComparisonOp::Always;
regs.cull.front_face = Regs::Cull::FrontFace::CounterClockWise;
regs.cull.cull_face = Regs::Cull::CullFace::Back;
// TODO(Rodrigo): Most games do not set a point size. I think this is a case of a
// register carrying a default value. Assume it's OpenGL's default (1).
regs.point_size = 1.0f;
@@ -86,6 +91,159 @@ void Maxwell3D::InitializeRegisterDefaults() {
regs.rt_separate_frag_data = 1;
}
#define DIRTY_REGS_POS(field_name) (offsetof(Maxwell3D::DirtyRegs, field_name))
void Maxwell3D::InitDirtySettings() {
const auto set_block = [this](const u32 start, const u32 range, const u8 position) {
const auto start_itr = dirty_pointers.begin() + start;
const auto end_itr = start_itr + range;
std::fill(start_itr, end_itr, position);
};
dirty.regs.fill(true);
// Init Render Targets
constexpr u32 registers_per_rt = sizeof(regs.rt[0]) / sizeof(u32);
constexpr u32 rt_start_reg = MAXWELL3D_REG_INDEX(rt);
constexpr u32 rt_end_reg = rt_start_reg + registers_per_rt * 8;
u32 rt_dirty_reg = DIRTY_REGS_POS(render_target);
for (u32 rt_reg = rt_start_reg; rt_reg < rt_end_reg; rt_reg += registers_per_rt) {
set_block(rt_reg, registers_per_rt, rt_dirty_reg);
rt_dirty_reg++;
}
constexpr u32 depth_buffer_flag = DIRTY_REGS_POS(depth_buffer);
dirty_pointers[MAXWELL3D_REG_INDEX(zeta_enable)] = depth_buffer_flag;
dirty_pointers[MAXWELL3D_REG_INDEX(zeta_width)] = depth_buffer_flag;
dirty_pointers[MAXWELL3D_REG_INDEX(zeta_height)] = depth_buffer_flag;
constexpr u32 registers_in_zeta = sizeof(regs.zeta) / sizeof(u32);
constexpr u32 zeta_reg = MAXWELL3D_REG_INDEX(zeta);
set_block(zeta_reg, registers_in_zeta, depth_buffer_flag);
// Init Vertex Arrays
constexpr u32 vertex_array_start = MAXWELL3D_REG_INDEX(vertex_array);
constexpr u32 vertex_array_size = sizeof(regs.vertex_array[0]) / sizeof(u32);
constexpr u32 vertex_array_end = vertex_array_start + vertex_array_size * Regs::NumVertexArrays;
u32 va_reg = DIRTY_REGS_POS(vertex_array);
u32 vi_reg = DIRTY_REGS_POS(vertex_instance);
for (u32 vertex_reg = vertex_array_start; vertex_reg < vertex_array_end;
vertex_reg += vertex_array_size) {
set_block(vertex_reg, 3, va_reg);
// The fourth register (the divisor) concerns vertex array instancing
dirty_pointers[vertex_reg + 3] = vi_reg;
va_reg++;
vi_reg++;
}
constexpr u32 vertex_limit_start = MAXWELL3D_REG_INDEX(vertex_array_limit);
constexpr u32 vertex_limit_size = sizeof(regs.vertex_array_limit[0]) / sizeof(u32);
constexpr u32 vertex_limit_end = vertex_limit_start + vertex_limit_size * Regs::NumVertexArrays;
va_reg = DIRTY_REGS_POS(vertex_array);
for (u32 vertex_reg = vertex_limit_start; vertex_reg < vertex_limit_end;
vertex_reg += vertex_limit_size) {
set_block(vertex_reg, vertex_limit_size, va_reg);
va_reg++;
}
constexpr u32 vertex_instance_start = MAXWELL3D_REG_INDEX(instanced_arrays);
constexpr u32 vertex_instance_size =
sizeof(regs.instanced_arrays.is_instanced[0]) / sizeof(u32);
constexpr u32 vertex_instance_end =
vertex_instance_start + vertex_instance_size * Regs::NumVertexArrays;
vi_reg = DIRTY_REGS_POS(vertex_instance);
for (u32 vertex_reg = vertex_instance_start; vertex_reg < vertex_instance_end;
vertex_reg += vertex_instance_size) {
set_block(vertex_reg, vertex_instance_size, vi_reg);
vi_reg++;
}
set_block(MAXWELL3D_REG_INDEX(vertex_attrib_format), regs.vertex_attrib_format.size(),
DIRTY_REGS_POS(vertex_attrib_format));
// Init Shaders
constexpr u32 shader_registers_count =
sizeof(regs.shader_config[0]) * Regs::MaxShaderProgram / sizeof(u32);
set_block(MAXWELL3D_REG_INDEX(shader_config[0]), shader_registers_count,
DIRTY_REGS_POS(shaders));
// State
// Viewport
constexpr u32 viewport_dirty_reg = DIRTY_REGS_POS(viewport);
constexpr u32 viewport_start = MAXWELL3D_REG_INDEX(viewports);
constexpr u32 viewport_size = sizeof(regs.viewports) / sizeof(u32);
set_block(viewport_start, viewport_size, viewport_dirty_reg);
constexpr u32 view_volume_start = MAXWELL3D_REG_INDEX(view_volume_clip_control);
constexpr u32 view_volume_size = sizeof(regs.view_volume_clip_control) / sizeof(u32);
set_block(view_volume_start, view_volume_size, viewport_dirty_reg);
// Viewport transformation
constexpr u32 viewport_trans_start = MAXWELL3D_REG_INDEX(viewport_transform);
constexpr u32 viewport_trans_size = sizeof(regs.viewport_transform) / sizeof(u32);
set_block(viewport_trans_start, viewport_trans_size, DIRTY_REGS_POS(viewport_transform));
// Cullmode
constexpr u32 cull_mode_start = MAXWELL3D_REG_INDEX(cull);
constexpr u32 cull_mode_size = sizeof(regs.cull) / sizeof(u32);
set_block(cull_mode_start, cull_mode_size, DIRTY_REGS_POS(cull_mode));
// Screen y control
dirty_pointers[MAXWELL3D_REG_INDEX(screen_y_control)] = DIRTY_REGS_POS(screen_y_control);
// Primitive Restart
constexpr u32 primitive_restart_start = MAXWELL3D_REG_INDEX(primitive_restart);
constexpr u32 primitive_restart_size = sizeof(regs.primitive_restart) / sizeof(u32);
set_block(primitive_restart_start, primitive_restart_size, DIRTY_REGS_POS(primitive_restart));
// Depth Test
constexpr u32 depth_test_dirty_reg = DIRTY_REGS_POS(depth_test);
dirty_pointers[MAXWELL3D_REG_INDEX(depth_test_enable)] = depth_test_dirty_reg;
dirty_pointers[MAXWELL3D_REG_INDEX(depth_write_enabled)] = depth_test_dirty_reg;
dirty_pointers[MAXWELL3D_REG_INDEX(depth_test_func)] = depth_test_dirty_reg;
// Stencil Test
constexpr u32 stencil_test_dirty_reg = DIRTY_REGS_POS(stencil_test);
dirty_pointers[MAXWELL3D_REG_INDEX(stencil_enable)] = stencil_test_dirty_reg;
dirty_pointers[MAXWELL3D_REG_INDEX(stencil_front_func_func)] = stencil_test_dirty_reg;
dirty_pointers[MAXWELL3D_REG_INDEX(stencil_front_func_ref)] = stencil_test_dirty_reg;
dirty_pointers[MAXWELL3D_REG_INDEX(stencil_front_func_mask)] = stencil_test_dirty_reg;
dirty_pointers[MAXWELL3D_REG_INDEX(stencil_front_op_fail)] = stencil_test_dirty_reg;
dirty_pointers[MAXWELL3D_REG_INDEX(stencil_front_op_zfail)] = stencil_test_dirty_reg;
dirty_pointers[MAXWELL3D_REG_INDEX(stencil_front_op_zpass)] = stencil_test_dirty_reg;
dirty_pointers[MAXWELL3D_REG_INDEX(stencil_front_mask)] = stencil_test_dirty_reg;
dirty_pointers[MAXWELL3D_REG_INDEX(stencil_two_side_enable)] = stencil_test_dirty_reg;
dirty_pointers[MAXWELL3D_REG_INDEX(stencil_back_func_func)] = stencil_test_dirty_reg;
dirty_pointers[MAXWELL3D_REG_INDEX(stencil_back_func_ref)] = stencil_test_dirty_reg;
dirty_pointers[MAXWELL3D_REG_INDEX(stencil_back_func_mask)] = stencil_test_dirty_reg;
dirty_pointers[MAXWELL3D_REG_INDEX(stencil_back_op_fail)] = stencil_test_dirty_reg;
dirty_pointers[MAXWELL3D_REG_INDEX(stencil_back_op_zfail)] = stencil_test_dirty_reg;
dirty_pointers[MAXWELL3D_REG_INDEX(stencil_back_op_zpass)] = stencil_test_dirty_reg;
dirty_pointers[MAXWELL3D_REG_INDEX(stencil_back_mask)] = stencil_test_dirty_reg;
// Color Mask
constexpr u32 color_mask_dirty_reg = DIRTY_REGS_POS(color_mask);
dirty_pointers[MAXWELL3D_REG_INDEX(color_mask_common)] = color_mask_dirty_reg;
set_block(MAXWELL3D_REG_INDEX(color_mask), sizeof(regs.color_mask) / sizeof(u32),
color_mask_dirty_reg);
// Blend State
constexpr u32 blend_state_dirty_reg = DIRTY_REGS_POS(blend_state);
set_block(MAXWELL3D_REG_INDEX(blend_color), sizeof(regs.blend_color) / sizeof(u32),
blend_state_dirty_reg);
dirty_pointers[MAXWELL3D_REG_INDEX(independent_blend_enable)] = blend_state_dirty_reg;
set_block(MAXWELL3D_REG_INDEX(blend), sizeof(regs.blend) / sizeof(u32), blend_state_dirty_reg);
set_block(MAXWELL3D_REG_INDEX(independent_blend), sizeof(regs.independent_blend) / sizeof(u32),
blend_state_dirty_reg);
// Scissor State
constexpr u32 scissor_test_dirty_reg = DIRTY_REGS_POS(scissor_test);
set_block(MAXWELL3D_REG_INDEX(scissor_test), sizeof(regs.scissor_test) / sizeof(u32),
scissor_test_dirty_reg);
// Polygon Offset
constexpr u32 polygon_offset_dirty_reg = DIRTY_REGS_POS(polygon_offset);
dirty_pointers[MAXWELL3D_REG_INDEX(polygon_offset_fill_enable)] = polygon_offset_dirty_reg;
dirty_pointers[MAXWELL3D_REG_INDEX(polygon_offset_line_enable)] = polygon_offset_dirty_reg;
dirty_pointers[MAXWELL3D_REG_INDEX(polygon_offset_point_enable)] = polygon_offset_dirty_reg;
dirty_pointers[MAXWELL3D_REG_INDEX(polygon_offset_units)] = polygon_offset_dirty_reg;
dirty_pointers[MAXWELL3D_REG_INDEX(polygon_offset_factor)] = polygon_offset_dirty_reg;
dirty_pointers[MAXWELL3D_REG_INDEX(polygon_offset_clamp)] = polygon_offset_dirty_reg;
}
void Maxwell3D::CallMacroMethod(u32 method, std::vector<u32> parameters) {
// Reset the current macro.
executing_macro = 0;
@@ -108,6 +266,14 @@ void Maxwell3D::CallMethod(const GPU::MethodCall& method_call) {
const u32 method = method_call.method;
if (method == cb_data_state.current) {
regs.reg_array[method] = method_call.argument;
ProcessCBData(method_call.argument);
return;
} else if (cb_data_state.current != null_cb_data) {
FinishCBData();
}
// It is an error to write to a register other than the current macro's ARG register before it
// has finished execution.
if (executing_macro != 0) {
@@ -143,49 +309,19 @@ void Maxwell3D::CallMethod(const GPU::MethodCall& method_call) {
if (regs.reg_array[method] != method_call.argument) {
regs.reg_array[method] = method_call.argument;
// Color buffers
constexpr u32 first_rt_reg = MAXWELL3D_REG_INDEX(rt);
constexpr u32 registers_per_rt = sizeof(regs.rt[0]) / sizeof(u32);
if (method >= first_rt_reg &&
method < first_rt_reg + registers_per_rt * Regs::NumRenderTargets) {
const std::size_t rt_index = (method - first_rt_reg) / registers_per_rt;
dirty_flags.color_buffer.set(rt_index);
}
// Zeta buffer
constexpr u32 registers_in_zeta = sizeof(regs.zeta) / sizeof(u32);
if (method == MAXWELL3D_REG_INDEX(zeta_enable) ||
method == MAXWELL3D_REG_INDEX(zeta_width) ||
method == MAXWELL3D_REG_INDEX(zeta_height) ||
(method >= MAXWELL3D_REG_INDEX(zeta) &&
method < MAXWELL3D_REG_INDEX(zeta) + registers_in_zeta)) {
dirty_flags.zeta_buffer = true;
}
// Shader
constexpr u32 shader_registers_count =
sizeof(regs.shader_config[0]) * Regs::MaxShaderProgram / sizeof(u32);
if (method >= MAXWELL3D_REG_INDEX(shader_config[0]) &&
method < MAXWELL3D_REG_INDEX(shader_config[0]) + shader_registers_count) {
dirty_flags.shaders = true;
}
// Vertex format
if (method >= MAXWELL3D_REG_INDEX(vertex_attrib_format) &&
method < MAXWELL3D_REG_INDEX(vertex_attrib_format) + regs.vertex_attrib_format.size()) {
dirty_flags.vertex_attrib_format = true;
}
// Vertex buffer
if (method >= MAXWELL3D_REG_INDEX(vertex_array) &&
method < MAXWELL3D_REG_INDEX(vertex_array) + 4 * Regs::NumVertexArrays) {
dirty_flags.vertex_array.set((method - MAXWELL3D_REG_INDEX(vertex_array)) >> 2);
} else if (method >= MAXWELL3D_REG_INDEX(vertex_array_limit) &&
method < MAXWELL3D_REG_INDEX(vertex_array_limit) + 2 * Regs::NumVertexArrays) {
dirty_flags.vertex_array.set((method - MAXWELL3D_REG_INDEX(vertex_array_limit)) >> 1);
} else if (method >= MAXWELL3D_REG_INDEX(instanced_arrays) &&
method < MAXWELL3D_REG_INDEX(instanced_arrays) + Regs::NumVertexArrays) {
dirty_flags.vertex_array.set(method - MAXWELL3D_REG_INDEX(instanced_arrays));
const std::size_t dirty_reg = dirty_pointers[method];
if (dirty_reg) {
dirty.regs[dirty_reg] = true;
if (dirty_reg >= DIRTY_REGS_POS(vertex_array) &&
dirty_reg < DIRTY_REGS_POS(vertex_array_buffers)) {
dirty.vertex_array_buffers = true;
} else if (dirty_reg >= DIRTY_REGS_POS(vertex_instance) &&
dirty_reg < DIRTY_REGS_POS(vertex_instances)) {
dirty.vertex_instances = true;
} else if (dirty_reg >= DIRTY_REGS_POS(render_target) &&
dirty_reg < DIRTY_REGS_POS(render_settings)) {
dirty.render_settings = true;
}
}
}
@@ -214,7 +350,7 @@ void Maxwell3D::CallMethod(const GPU::MethodCall& method_call) {
case MAXWELL3D_REG_INDEX(const_buffer.cb_data[13]):
case MAXWELL3D_REG_INDEX(const_buffer.cb_data[14]):
case MAXWELL3D_REG_INDEX(const_buffer.cb_data[15]): {
ProcessCBData(method_call.argument);
StartCBData(method);
break;
}
case MAXWELL3D_REG_INDEX(cb_bind[0].raw_config): {
@@ -249,6 +385,10 @@ void Maxwell3D::CallMethod(const GPU::MethodCall& method_call) {
ProcessQueryGet();
break;
}
case MAXWELL3D_REG_INDEX(condition.mode): {
ProcessQueryCondition();
break;
}
case MAXWELL3D_REG_INDEX(sync_info): {
ProcessSyncPoint();
break;
@@ -261,7 +401,7 @@ void Maxwell3D::CallMethod(const GPU::MethodCall& method_call) {
const bool is_last_call = method_call.IsLastCall();
upload_state.ProcessData(method_call.argument, is_last_call);
if (is_last_call) {
dirty_flags.OnMemoryWrite();
dirty.OnMemoryWrite();
}
break;
}
@@ -302,6 +442,7 @@ void Maxwell3D::ProcessQueryGet() {
result = regs.query.query_sequence;
break;
default:
result = 1;
UNIMPLEMENTED_MSG("Unimplemented query select type {}",
static_cast<u32>(regs.query.query_get.select.Value()));
}
@@ -333,7 +474,6 @@ void Maxwell3D::ProcessQueryGet() {
query_result.timestamp = system.CoreTiming().GetTicks();
memory_manager.WriteBlock(sequence_address, &query_result, sizeof(query_result));
}
dirty_flags.OnMemoryWrite();
break;
}
default:
@@ -342,12 +482,52 @@ void Maxwell3D::ProcessQueryGet() {
}
}
void Maxwell3D::ProcessQueryCondition() {
const GPUVAddr condition_address{regs.condition.Address()};
switch (regs.condition.mode) {
case Regs::ConditionMode::Always: {
execute_on = true;
break;
}
case Regs::ConditionMode::Never: {
execute_on = false;
break;
}
case Regs::ConditionMode::ResNonZero: {
Regs::QueryCompare cmp;
memory_manager.ReadBlockUnsafe(condition_address, &cmp, sizeof(cmp));
execute_on = cmp.initial_sequence != 0U && cmp.initial_mode != 0U;
break;
}
case Regs::ConditionMode::Equal: {
Regs::QueryCompare cmp;
memory_manager.ReadBlockUnsafe(condition_address, &cmp, sizeof(cmp));
execute_on =
cmp.initial_sequence == cmp.current_sequence && cmp.initial_mode == cmp.current_mode;
break;
}
case Regs::ConditionMode::NotEqual: {
Regs::QueryCompare cmp;
memory_manager.ReadBlockUnsafe(condition_address, &cmp, sizeof(cmp));
execute_on =
cmp.initial_sequence != cmp.current_sequence || cmp.initial_mode != cmp.current_mode;
break;
}
default: {
UNIMPLEMENTED_MSG("Uninplemented Condition Mode!");
execute_on = true;
break;
}
}
}
void Maxwell3D::ProcessSyncPoint() {
const u32 sync_point = regs.sync_info.sync_point.Value();
const u32 increment = regs.sync_info.increment.Value();
const u32 cache_flush = regs.sync_info.unknown.Value();
LOG_DEBUG(HW_GPU, "Syncpoint set {}, increment: {}, unk: {}", sync_point, increment,
cache_flush);
[[maybe_unused]] const u32 cache_flush = regs.sync_info.unknown.Value();
if (increment) {
system.GPU().IncrementSyncPoint(sync_point);
}
}
void Maxwell3D::DrawArrays() {
@@ -405,23 +585,39 @@ void Maxwell3D::ProcessCBBind(Regs::ShaderStage stage) {
}
void Maxwell3D::ProcessCBData(u32 value) {
const u32 id = cb_data_state.id;
cb_data_state.buffer[id][cb_data_state.counter] = value;
// Increment the current buffer position.
regs.const_buffer.cb_pos = regs.const_buffer.cb_pos + 4;
cb_data_state.counter++;
}
void Maxwell3D::StartCBData(u32 method) {
constexpr u32 first_cb_data = MAXWELL3D_REG_INDEX(const_buffer.cb_data[0]);
cb_data_state.start_pos = regs.const_buffer.cb_pos;
cb_data_state.id = method - first_cb_data;
cb_data_state.current = method;
cb_data_state.counter = 0;
ProcessCBData(regs.const_buffer.cb_data[cb_data_state.id]);
}
void Maxwell3D::FinishCBData() {
// Write the input value to the current const buffer at the current position.
const GPUVAddr buffer_address = regs.const_buffer.BufferAddress();
ASSERT(buffer_address != 0);
// Don't allow writing past the end of the buffer.
ASSERT(regs.const_buffer.cb_pos + sizeof(u32) <= regs.const_buffer.cb_size);
ASSERT(regs.const_buffer.cb_pos <= regs.const_buffer.cb_size);
const GPUVAddr address{buffer_address + regs.const_buffer.cb_pos};
const GPUVAddr address{buffer_address + cb_data_state.start_pos};
const std::size_t size = regs.const_buffer.cb_pos - cb_data_state.start_pos;
u8* ptr{memory_manager.GetPointer(address)};
rasterizer.InvalidateRegion(ToCacheAddr(ptr), sizeof(u32));
memory_manager.Write<u32>(address, value);
const u32 id = cb_data_state.id;
memory_manager.WriteBlock(address, cb_data_state.buffer[id].data(), size);
dirty.OnMemoryWrite();
dirty_flags.OnMemoryWrite();
// Increment the current buffer position.
regs.const_buffer.cb_pos = regs.const_buffer.cb_pos + 4;
cb_data_state.id = null_cb_data;
cb_data_state.current = null_cb_data;
}
Texture::TICEntry Maxwell3D::GetTICEntry(u32 tic_index) const {
@@ -430,10 +626,10 @@ Texture::TICEntry Maxwell3D::GetTICEntry(u32 tic_index) const {
Texture::TICEntry tic_entry;
memory_manager.ReadBlockUnsafe(tic_address_gpu, &tic_entry, sizeof(Texture::TICEntry));
const auto r_type{tic_entry.r_type.Value()};
const auto g_type{tic_entry.g_type.Value()};
const auto b_type{tic_entry.b_type.Value()};
const auto a_type{tic_entry.a_type.Value()};
[[maybe_unused]] const auto r_type{tic_entry.r_type.Value()};
[[maybe_unused]] const auto g_type{tic_entry.g_type.Value()};
[[maybe_unused]] const auto b_type{tic_entry.b_type.Value()};
[[maybe_unused]] const auto a_type{tic_entry.a_type.Value()};
// TODO(Subv): Different data types for separate components are not supported
DEBUG_ASSERT(r_type == g_type && r_type == b_type && r_type == a_type);
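The rewritten `CallMethod` replaces the chain of range comparisons with one table lookup: `InitDirtySettings` fills `dirty_pointers` so every register index names at most one dirty flag. A reduced sketch of the mechanism (sizes and names are illustrative):

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

constexpr std::size_t NUM_METHODS = 0xE00; // assumption: size of the register file
constexpr std::size_t NUM_FLAGS = 256;

std::array<std::uint8_t, NUM_METHODS> dirty_table{}; // 0 means "no flag"
std::array<bool, NUM_FLAGS> flags{};

// Analogous to the set_block lambda in InitDirtySettings.
void SetBlock(std::size_t first, std::size_t count, std::uint8_t flag) {
    std::fill_n(dirty_table.begin() + first, count, flag);
}

// Analogous to the new fast path in CallMethod: O(1) per register write.
void OnRegisterWrite(std::size_t method) {
    if (const std::uint8_t flag = dirty_table[method]) {
        flags[flag] = true;
    }
}

int main() {
    SetBlock(0x200, 16, 3); // e.g. a render-target register block -> flag 3
    OnRegisterWrite(0x208);
    return flags[3] ? 0 : 1;
}
```

Reserving slot 0 as `null_dirty` is what lets a zero table entry mean "this register dirties nothing".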

View File

@@ -90,6 +90,20 @@ public:
enum class QuerySelect : u32 {
Zero = 0,
TimeElapsed = 2,
TransformFeedbackPrimitivesGenerated = 11,
PrimitivesGenerated = 18,
SamplesPassed = 21,
TransformFeedbackUnknown = 26,
};
struct QueryCompare {
u32 initial_sequence;
u32 initial_mode;
u32 unknown1;
u32 unknown2;
u32 current_sequence;
u32 current_mode;
};
enum class QuerySyncCondition : u32 {
@@ -97,6 +111,14 @@ public:
GreaterThan = 1,
};
enum class ConditionMode : u32 {
Never = 0,
Always = 1,
ResNonZero = 2,
Equal = 3,
NotEqual = 4,
};
enum class ShaderProgram : u32 {
VertexA = 0,
VertexB = 1,
@@ -815,7 +837,18 @@ public:
BitField<4, 1, u32> alpha_to_one;
} multisample_control;
INSERT_PADDING_WORDS(0x7);
INSERT_PADDING_WORDS(0x4);
struct {
u32 address_high;
u32 address_low;
ConditionMode mode;
GPUVAddr Address() const {
return static_cast<GPUVAddr>((static_cast<GPUVAddr>(address_high) << 32) |
address_low);
}
} condition;
struct {
u32 tsc_address_high;
@@ -1124,23 +1157,77 @@ public:
State state{};
struct DirtyFlags {
std::bitset<8> color_buffer{0xFF};
std::bitset<32> vertex_array{0xFFFFFFFF};
struct DirtyRegs {
static constexpr std::size_t NUM_REGS = 256;
union {
struct {
bool null_dirty;
bool vertex_attrib_format = true;
bool zeta_buffer = true;
bool shaders = true;
// Vertex Attributes
bool vertex_attrib_format;
// Vertex Arrays
std::array<bool, 32> vertex_array;
bool vertex_array_buffers;
// Vertex Instances
std::array<bool, 32> vertex_instance;
bool vertex_instances;
// Render Targets
std::array<bool, 8> render_target;
bool depth_buffer;
bool render_settings;
// Shaders
bool shaders;
// Rasterizer State
bool viewport;
bool clip_coefficient;
bool cull_mode;
bool primitive_restart;
bool depth_test;
bool stencil_test;
bool blend_state;
bool scissor_test;
bool transform_feedback;
bool color_mask;
bool polygon_offset;
// Complementary
bool viewport_transform;
bool screen_y_control;
bool memory_general;
};
std::array<bool, NUM_REGS> regs;
};
void ResetVertexArrays() {
vertex_array.fill(true);
vertex_array_buffers = true;
}
void ResetRenderTargets() {
depth_buffer = true;
render_target.fill(true);
render_settings = true;
}
void OnMemoryWrite() {
zeta_buffer = true;
shaders = true;
color_buffer.set();
vertex_array.set();
memory_general = true;
ResetRenderTargets();
ResetVertexArrays();
}
};
DirtyFlags dirty_flags;
} dirty{};
std::array<u8, Regs::NUM_REGS> dirty_pointers{};
/// Reads a register value located at the input method address
u32 GetRegisterValue(u32 method) const;
@@ -1169,6 +1256,10 @@ public:
return macro_memory;
}
bool ShouldExecute() const {
return execute_on;
}
private:
void InitializeRegisterDefaults();
@@ -1192,14 +1283,27 @@ private:
/// Interpreter for the macro codes uploaded to the GPU.
MacroInterpreter macro_interpreter;
static constexpr u32 null_cb_data = 0xFFFFFFFF;
struct {
std::array<std::array<u32, 0x4000>, 16> buffer;
u32 current{null_cb_data};
u32 id{null_cb_data};
u32 start_pos{};
u32 counter{};
} cb_data_state;
Upload::State upload_state;
bool execute_on{true};
/// Retrieves information about a specific TIC entry from the TIC buffer.
Texture::TICEntry GetTICEntry(u32 tic_index) const;
/// Retrieves information about a specific TSC entry from the TSC buffer.
Texture::TSCEntry GetTSCEntry(u32 tsc_index) const;
void InitDirtySettings();
/**
* Call a macro on this engine.
* @param method Method to call
@@ -1219,11 +1323,16 @@ private:
/// Handles a write to the QUERY_GET register.
void ProcessQueryGet();
// Handles Conditional Rendering
void ProcessQueryCondition();
/// Handles writes to syncing register.
void ProcessSyncPoint();
/// Handles a write to the CB_DATA[i] register.
void StartCBData(u32 method);
void ProcessCBData(u32 value);
void FinishCBData();
/// Handles a write to the CB_BIND register.
void ProcessCBBind(Regs::ShaderStage stage);
@@ -1290,6 +1399,7 @@ ASSERT_REG_POSITION(clip_distance_enabled, 0x544);
ASSERT_REG_POSITION(point_size, 0x546);
ASSERT_REG_POSITION(zeta_enable, 0x54E);
ASSERT_REG_POSITION(multisample_control, 0x54F);
ASSERT_REG_POSITION(condition, 0x554);
ASSERT_REG_POSITION(tsc, 0x557);
ASSERT_REG_POSITION(polygon_offset_factor, 0x55b);
ASSERT_REG_POSITION(tic, 0x55D);
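`DirtyRegs` overlays the named flags with a flat `regs` array through an anonymous union, so the lookup table can set flags by index while consumers test them by name. A compilable sketch of the layout (anonymous struct members in unions are a common compiler extension this codebase already relies on):

```cpp
#include <array>
#include <cstddef>

struct DirtyRegs {
    static constexpr std::size_t NUM_REGS = 8; // reduced for the sketch
    union {
        struct {
            bool null_dirty; // index 0 is reserved: "no flag"
            bool depth_buffer;
            bool shaders;
        };
        std::array<bool, NUM_REGS> regs;
    };
};

int main() {
    DirtyRegs dirty{};
    dirty.regs[2] = true;         // written by table index...
    return dirty.shaders ? 0 : 1; // ...read back by name
}
```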

View File

@@ -5,18 +5,17 @@
#include "common/assert.h"
#include "common/logging/log.h"
#include "core/core.h"
#include "core/settings.h"
#include "video_core/engines/maxwell_3d.h"
#include "video_core/engines/maxwell_dma.h"
#include "video_core/memory_manager.h"
#include "video_core/rasterizer_interface.h"
#include "video_core/renderer_base.h"
#include "video_core/textures/decoders.h"
namespace Tegra::Engines {
MaxwellDMA::MaxwellDMA(Core::System& system, VideoCore::RasterizerInterface& rasterizer,
MemoryManager& memory_manager)
: system{system}, rasterizer{rasterizer}, memory_manager{memory_manager} {}
MaxwellDMA::MaxwellDMA(Core::System& system, MemoryManager& memory_manager)
: system{system}, memory_manager{memory_manager} {}
void MaxwellDMA::CallMethod(const GPU::MethodCall& method_call) {
ASSERT_MSG(method_call.method < Regs::NUM_REGS,
@@ -38,7 +37,7 @@ void MaxwellDMA::CallMethod(const GPU::MethodCall& method_call) {
}
void MaxwellDMA::HandleCopy() {
LOG_WARNING(HW_GPU, "Requested a DMA copy");
LOG_TRACE(HW_GPU, "Requested a DMA copy");
const GPUVAddr source = regs.src_address.Address();
const GPUVAddr dest = regs.dst_address.Address();
@@ -58,7 +57,7 @@ void MaxwellDMA::HandleCopy() {
}
// All copies here update the main memory, so mark all rasterizer states as invalid.
system.GPU().Maxwell3D().dirty_flags.OnMemoryWrite();
system.GPU().Maxwell3D().dirty.OnMemoryWrite();
if (regs.exec.is_dst_linear && regs.exec.is_src_linear) {
// When the enable_2d bit is disabled, the copy is performed as if we were copying a 1D
@@ -84,13 +83,17 @@ void MaxwellDMA::HandleCopy() {
ASSERT(regs.exec.enable_2d == 1);
if (regs.exec.is_dst_linear && !regs.exec.is_src_linear) {
ASSERT(regs.src_params.size_z == 1);
ASSERT(regs.src_params.BlockDepth() == 0);
// If the input is tiled and the output is linear, deswizzle the input and copy it over.
const u32 src_bytes_per_pixel = regs.src_pitch / regs.src_params.size_x;
const u32 bytes_per_pixel = regs.dst_pitch / regs.x_count;
const std::size_t src_size = Texture::CalculateSize(
true, src_bytes_per_pixel, regs.src_params.size_x, regs.src_params.size_y,
true, bytes_per_pixel, regs.src_params.size_x, regs.src_params.size_y,
regs.src_params.size_z, regs.src_params.BlockHeight(), regs.src_params.BlockDepth());
const std::size_t src_layer_size = Texture::CalculateSize(
true, bytes_per_pixel, regs.src_params.size_x, regs.src_params.size_y, 1,
regs.src_params.BlockHeight(), regs.src_params.BlockDepth());
const std::size_t dst_size = regs.dst_pitch * regs.y_count;
if (read_buffer.size() < src_size) {
@@ -104,23 +107,23 @@ void MaxwellDMA::HandleCopy() {
memory_manager.ReadBlock(source, read_buffer.data(), src_size);
memory_manager.ReadBlock(dest, write_buffer.data(), dst_size);
Texture::UnswizzleSubrect(regs.x_count, regs.y_count, regs.dst_pitch,
regs.src_params.size_x, src_bytes_per_pixel, read_buffer.data(),
write_buffer.data(), regs.src_params.BlockHeight(),
regs.src_params.pos_x, regs.src_params.pos_y);
Texture::UnswizzleSubrect(
regs.x_count, regs.y_count, regs.dst_pitch, regs.src_params.size_x, bytes_per_pixel,
read_buffer.data() + src_layer_size * regs.src_params.pos_z, write_buffer.data(),
regs.src_params.BlockHeight(), regs.src_params.pos_x, regs.src_params.pos_y);
memory_manager.WriteBlock(dest, write_buffer.data(), dst_size);
} else {
ASSERT(regs.dst_params.BlockDepth() == 0);
const u32 src_bytes_per_pixel = regs.src_pitch / regs.x_count;
const u32 bytes_per_pixel = regs.src_pitch / regs.x_count;
const std::size_t dst_size = Texture::CalculateSize(
true, src_bytes_per_pixel, regs.dst_params.size_x, regs.dst_params.size_y,
true, bytes_per_pixel, regs.dst_params.size_x, regs.dst_params.size_y,
regs.dst_params.size_z, regs.dst_params.BlockHeight(), regs.dst_params.BlockDepth());
const std::size_t dst_layer_size = Texture::CalculateSize(
true, src_bytes_per_pixel, regs.dst_params.size_x, regs.dst_params.size_y, 1,
true, bytes_per_pixel, regs.dst_params.size_x, regs.dst_params.size_y, 1,
regs.dst_params.BlockHeight(), regs.dst_params.BlockDepth());
const std::size_t src_size = regs.src_pitch * regs.y_count;
@@ -133,14 +136,19 @@ void MaxwellDMA::HandleCopy() {
write_buffer.resize(dst_size);
}
memory_manager.ReadBlock(source, read_buffer.data(), src_size);
memory_manager.ReadBlock(dest, write_buffer.data(), dst_size);
if (Settings::values.use_accurate_gpu_emulation) {
memory_manager.ReadBlock(source, read_buffer.data(), src_size);
memory_manager.ReadBlock(dest, write_buffer.data(), dst_size);
} else {
memory_manager.ReadBlockUnsafe(source, read_buffer.data(), src_size);
memory_manager.ReadBlockUnsafe(dest, write_buffer.data(), dst_size);
}
// If the input is linear and the output is tiled, swizzle the input and copy it over.
Texture::SwizzleSubrect(regs.x_count, regs.y_count, regs.src_pitch, regs.dst_params.size_x,
src_bytes_per_pixel,
write_buffer.data() + dst_layer_size * regs.dst_params.pos_z,
read_buffer.data(), regs.dst_params.BlockHeight());
Texture::SwizzleSubrect(
regs.x_count, regs.y_count, regs.src_pitch, regs.dst_params.size_x, bytes_per_pixel,
write_buffer.data() + dst_layer_size * regs.dst_params.pos_z, read_buffer.data(),
regs.dst_params.BlockHeight(), regs.dst_params.pos_x, regs.dst_params.pos_y);
memory_manager.WriteBlock(dest, write_buffer.data(), dst_size);
}
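The corrected copy path derives the pixel size from the linear side of the transfer, since the DMA engine is given pitches and copy extents rather than a pixel format. A sketch of that derivation and of the layer offset applied before (de)swizzling (names are illustrative):

```cpp
#include <cstddef>
#include <cstdint>

struct CopyRegs {
    std::uint32_t dst_pitch; // bytes per row on the linear side
    std::uint32_t x_count;   // pixels per row to copy
    std::uint32_t pos_z;     // starting layer in the tiled surface
};

// bytes_per_pixel as computed in the tiled->linear branch above.
std::uint32_t BytesPerPixel(const CopyRegs& regs) {
    return regs.dst_pitch / regs.x_count;
}

// Offset of the source layer inside the tiled allocation, matching
// read_buffer.data() + src_layer_size * regs.src_params.pos_z.
std::size_t LayerOffset(std::size_t layer_size, const CopyRegs& regs) {
    return layer_size * regs.pos_z;
}

int main() {
    const CopyRegs regs{1280 * 4, 1280, 2};
    return BytesPerPixel(regs) == 4 && LayerOffset(0x10000, regs) == 0x20000 ? 0 : 1;
}
```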

View File

@@ -20,10 +20,6 @@ namespace Tegra {
class MemoryManager;
}
namespace VideoCore {
class RasterizerInterface;
}
namespace Tegra::Engines {
/**
@@ -33,8 +29,7 @@ namespace Tegra::Engines {
class MaxwellDMA final {
public:
explicit MaxwellDMA(Core::System& system, VideoCore::RasterizerInterface& rasterizer,
MemoryManager& memory_manager);
explicit MaxwellDMA(Core::System& system, MemoryManager& memory_manager);
~MaxwellDMA() = default;
/// Write the value to the register identified by method.
@@ -180,8 +175,6 @@ public:
private:
Core::System& system;
VideoCore::RasterizerInterface& rasterizer;
MemoryManager& memory_manager;
std::vector<u8> read_buffer;

View File

@@ -538,6 +538,12 @@ enum class PhysicalAttributeDirection : u64 {
Output = 1,
};
enum class VoteOperation : u64 {
All = 0, // allThreadsNV
Any = 1, // anyThreadNV
Eq = 2, // allThreadsEqualNV
};
union Instruction {
Instruction& operator=(const Instruction& instr) {
value = instr.value;
@@ -559,6 +565,18 @@ union Instruction {
BitField<39, 8, Register> gpr39;
BitField<48, 16, u64> opcode;
union {
BitField<8, 5, ConditionCode> cc;
BitField<13, 1, u64> trigger;
} nop;
union {
BitField<48, 2, VoteOperation> operation;
BitField<45, 3, u64> dest_pred;
BitField<39, 3, u64> value;
BitField<42, 1, u64> negate_value;
} vote;
union {
BitField<8, 8, Register> gpr;
BitField<20, 24, s64> offset;
@@ -1482,6 +1500,7 @@ public:
SYNC,
BRK,
DEPBAR,
VOTE,
BFE_C,
BFE_R,
BFE_IMM,
@@ -1514,6 +1533,7 @@ public:
TMML, // Texture Mip Map Level
SUST, // Surface Store
EXIT,
NOP,
IPA,
OUT_R, // Emit vertex/primitive
ISBERD,
@@ -1643,6 +1663,7 @@ public:
Hfma2,
Flow,
Synch,
Warp,
Memory,
Texture,
Image,
@@ -1769,6 +1790,7 @@ private:
INST("111000110100---", Id::BRK, Type::Flow, "BRK"),
INST("111000110000----", Id::EXIT, Type::Flow, "EXIT"),
INST("1111000011110---", Id::DEPBAR, Type::Synch, "DEPBAR"),
INST("0101000011011---", Id::VOTE, Type::Warp, "VOTE"),
INST("1110111111011---", Id::LD_A, Type::Memory, "LD_A"),
INST("1110111101001---", Id::LD_S, Type::Memory, "LD_S"),
INST("1110111101000---", Id::LD_L, Type::Memory, "LD_L"),
@@ -1793,6 +1815,7 @@ private:
INST("110111110110----", Id::TMML_B, Type::Texture, "TMML_B"),
INST("1101111101011---", Id::TMML, Type::Texture, "TMML"),
INST("11101011001-----", Id::SUST, Type::Image, "SUST"),
INST("0101000010110---", Id::NOP, Type::Trivial, "NOP"),
INST("11100000--------", Id::IPA, Type::Trivial, "IPA"),
INST("1111101111100---", Id::OUT_R, Type::Trivial, "OUT_R"),
INST("1110111111010---", Id::ISBERD, Type::Trivial, "ISBERD"),

View File

@@ -29,14 +29,15 @@ u32 FramebufferConfig::BytesPerPixel(PixelFormat format) {
UNREACHABLE();
}
GPU::GPU(Core::System& system, VideoCore::RendererBase& renderer) : renderer{renderer} {
GPU::GPU(Core::System& system, VideoCore::RendererBase& renderer, bool is_async)
: system{system}, renderer{renderer}, is_async{is_async} {
auto& rasterizer{renderer.Rasterizer()};
memory_manager = std::make_unique<Tegra::MemoryManager>(system, rasterizer);
dma_pusher = std::make_unique<Tegra::DmaPusher>(*this);
maxwell_3d = std::make_unique<Engines::Maxwell3D>(system, rasterizer, *memory_manager);
fermi_2d = std::make_unique<Engines::Fermi2D>(rasterizer, *memory_manager);
fermi_2d = std::make_unique<Engines::Fermi2D>(rasterizer);
kepler_compute = std::make_unique<Engines::KeplerCompute>(system, rasterizer, *memory_manager);
maxwell_dma = std::make_unique<Engines::MaxwellDMA>(system, rasterizer, *memory_manager);
maxwell_dma = std::make_unique<Engines::MaxwellDMA>(system, *memory_manager);
kepler_memory = std::make_unique<Engines::KeplerMemory>(system, *memory_manager);
}
@@ -50,6 +51,14 @@ const Engines::Maxwell3D& GPU::Maxwell3D() const {
return *maxwell_3d;
}
Engines::KeplerCompute& GPU::KeplerCompute() {
return *kepler_compute;
}
const Engines::KeplerCompute& GPU::KeplerCompute() const {
return *kepler_compute;
}
MemoryManager& GPU::MemoryManager() {
return *memory_manager;
}
@@ -66,6 +75,55 @@ const DmaPusher& GPU::DmaPusher() const {
return *dma_pusher;
}
void GPU::IncrementSyncPoint(const u32 syncpoint_id) {
syncpoints[syncpoint_id]++;
std::lock_guard lock{sync_mutex};
if (!syncpt_interrupts[syncpoint_id].empty()) {
u32 value = syncpoints[syncpoint_id].load();
auto it = syncpt_interrupts[syncpoint_id].begin();
while (it != syncpt_interrupts[syncpoint_id].end()) {
if (value >= *it) {
TriggerCpuInterrupt(syncpoint_id, *it);
it = syncpt_interrupts[syncpoint_id].erase(it);
continue;
}
it++;
}
}
}
u32 GPU::GetSyncpointValue(const u32 syncpoint_id) const {
return syncpoints[syncpoint_id].load();
}
void GPU::RegisterSyncptInterrupt(const u32 syncpoint_id, const u32 value) {
auto& interrupt = syncpt_interrupts[syncpoint_id];
bool contains = std::any_of(interrupt.begin(), interrupt.end(),
[value](u32 in_value) { return in_value == value; });
if (contains) {
return;
}
syncpt_interrupts[syncpoint_id].emplace_back(value);
}
bool GPU::CancelSyncptInterrupt(const u32 syncpoint_id, const u32 value) {
std::lock_guard lock{sync_mutex};
auto& interrupt = syncpt_interrupts[syncpoint_id];
const auto iter =
std::find_if(interrupt.begin(), interrupt.end(),
[value](u32 interrupt_value) { return value == interrupt_value; });
if (iter == interrupt.end()) {
return false;
}
interrupt.erase(iter);
return true;
}
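
A hypothetical caller-side sketch of how these primitives compose; `WaitOrRegister` is not a function in the codebase, and the lock is taken through the `LockSync()` accessor so the check cannot race `IncrementSyncPoint` on the GPU thread.

```cpp
// Returns true if the fence already passed; otherwise registers the
// threshold so IncrementSyncPoint will call TriggerCpuInterrupt once the
// syncpoint counter reaches it. Names mirror the methods above.
bool WaitOrRegister(Tegra::GPU& gpu, u32 syncpoint_id, u32 threshold) {
    const auto lock = gpu.LockSync(); // guard against the GPU thread
    if (gpu.GetSyncpointValue(syncpoint_id) >= threshold) {
        return true; // already signalled, nothing to register
    }
    gpu.RegisterSyncptInterrupt(syncpoint_id, threshold);
    return false; // caller sleeps until the CPU interrupt arrives
}
```
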
void GPU::FlushCommands() {
renderer.Rasterizer().FlushCommands();
}
u32 RenderTargetBytesPerPixel(RenderTargetFormat format) {
ASSERT(format != RenderTargetFormat::NONE);
@@ -143,12 +201,12 @@ enum class BufferMethods {
NotifyIntr = 0x8,
WrcacheFlush = 0x9,
Unk28 = 0xA,
Unk2c = 0xB,
UnkCacheFlush = 0xB,
RefCnt = 0x14,
SemaphoreAcquire = 0x1A,
SemaphoreRelease = 0x1B,
Unk70 = 0x1C,
Unk74 = 0x1D,
FenceValue = 0x1C,
FenceAction = 0x1D,
Unk78 = 0x1E,
Unk7c = 0x1F,
Yield = 0x20,
@@ -194,6 +252,10 @@ void GPU::CallPullerMethod(const MethodCall& method_call) {
case BufferMethods::SemaphoreAddressLow:
case BufferMethods::SemaphoreSequence:
case BufferMethods::RefCnt:
case BufferMethods::UnkCacheFlush:
case BufferMethods::WrcacheFlush:
case BufferMethods::FenceValue:
case BufferMethods::FenceAction:
break;
case BufferMethods::SemaphoreTrigger: {
ProcessSemaphoreTriggerMethod();
@@ -204,21 +266,11 @@ void GPU::CallPullerMethod(const MethodCall& method_call) {
LOG_ERROR(HW_GPU, "Special puller engine method NotifyIntr not implemented");
break;
}
case BufferMethods::WrcacheFlush: {
// TODO(Kmather73): Research and implement this method.
LOG_ERROR(HW_GPU, "Special puller engine method WrcacheFlush not implemented");
break;
}
case BufferMethods::Unk28: {
// TODO(Kmather73): Research and implement this method.
LOG_ERROR(HW_GPU, "Special puller engine method Unk28 not implemented");
break;
}
case BufferMethods::Unk2c: {
// TODO(Kmather73): Research and implement this method.
LOG_ERROR(HW_GPU, "Special puller engine method Unk2c not implemented");
break;
}
case BufferMethods::SemaphoreAcquire: {
ProcessSemaphoreAcquire();
break;

@@ -5,8 +5,12 @@
#pragma once
#include <array>
#include <atomic>
#include <list>
#include <memory>
#include <mutex>
#include "common/common_types.h"
#include "core/hle/service/nvdrv/nvdata.h"
#include "core/hle/service/nvflinger/buffer_queue.h"
#include "video_core/dma_pusher.h"
@@ -15,6 +19,10 @@ inline CacheAddr ToCacheAddr(const void* host_ptr) {
return reinterpret_cast<CacheAddr>(host_ptr);
}
inline u8* FromCacheAddr(CacheAddr cache_addr) {
return reinterpret_cast<u8*>(cache_addr);
}
namespace Core {
class System;
}
@@ -127,7 +135,7 @@ class MemoryManager;
class GPU {
public:
explicit GPU(Core::System& system, VideoCore::RendererBase& renderer);
explicit GPU(Core::System& system, VideoCore::RendererBase& renderer, bool is_async);
virtual ~GPU();
@@ -149,12 +157,20 @@ public:
/// Calls a GPU method.
void CallMethod(const MethodCall& method_call);
void FlushCommands();
/// Returns a reference to the Maxwell3D GPU engine.
Engines::Maxwell3D& Maxwell3D();
/// Returns a const reference to the Maxwell3D GPU engine.
const Engines::Maxwell3D& Maxwell3D() const;
/// Returns a reference to the KeplerCompute GPU engine.
Engines::KeplerCompute& KeplerCompute();
/// Returns a reference to the KeplerCompute GPU engine.
const Engines::KeplerCompute& KeplerCompute() const;
/// Returns a reference to the GPU memory manager.
Tegra::MemoryManager& MemoryManager();
@@ -164,6 +180,22 @@ public:
/// Returns a reference to the GPU DMA pusher.
Tegra::DmaPusher& DmaPusher();
void IncrementSyncPoint(u32 syncpoint_id);
u32 GetSyncpointValue(u32 syncpoint_id) const;
void RegisterSyncptInterrupt(u32 syncpoint_id, u32 value);
bool CancelSyncptInterrupt(u32 syncpoint_id, u32 value);
std::unique_lock<std::mutex> LockSync() {
return std::unique_lock{sync_mutex};
}
bool IsAsync() const {
return is_async;
}
/// Returns a const reference to the GPU DMA pusher.
const Tegra::DmaPusher& DmaPusher() const;
@@ -194,7 +226,12 @@ public:
u32 semaphore_acquire;
u32 semaphore_release;
INSERT_PADDING_WORDS(0xE4);
u32 fence_value;
union {
BitField<4, 4, u32> operation;
BitField<8, 8, u32> id;
} fence_action;
INSERT_PADDING_WORDS(0xE2);
// Puller state
u32 acquire_mode;
@@ -228,6 +265,9 @@ public:
/// Notify rasterizer that any caches of the specified region should be flushed and invalidated
virtual void FlushAndInvalidateRegion(CacheAddr addr, u64 size) = 0;
protected:
virtual void TriggerCpuInterrupt(u32 syncpoint_id, u32 value) const = 0;
private:
void ProcessBindMethod(const MethodCall& method_call);
void ProcessSemaphoreTriggerMethod();
@@ -245,6 +285,7 @@ private:
protected:
std::unique_ptr<Tegra::DmaPusher> dma_pusher;
Core::System& system;
VideoCore::RendererBase& renderer;
private:
@@ -262,6 +303,14 @@ private:
std::unique_ptr<Engines::MaxwellDMA> maxwell_dma;
/// Inline memory engine
std::unique_ptr<Engines::KeplerMemory> kepler_memory;
std::array<std::atomic<u32>, Service::Nvidia::MaxSyncPoints> syncpoints{};
std::array<std::list<u32>, Service::Nvidia::MaxSyncPoints> syncpt_interrupts;
std::mutex sync_mutex;
const bool is_async;
};
#define ASSERT_REG_POSITION(field_name, position) \
@@ -274,6 +323,8 @@ ASSERT_REG_POSITION(semaphore_trigger, 0x7);
ASSERT_REG_POSITION(reference_count, 0x14);
ASSERT_REG_POSITION(semaphore_acquire, 0x1A);
ASSERT_REG_POSITION(semaphore_release, 0x1B);
ASSERT_REG_POSITION(fence_value, 0x1C);
ASSERT_REG_POSITION(fence_action, 0x1D);
ASSERT_REG_POSITION(acquire_mode, 0x100);
ASSERT_REG_POSITION(acquire_source, 0x101);

@@ -2,6 +2,8 @@
// Licensed under GPLv2 or any later version
// Refer to the license.txt file included.
#include "core/core.h"
#include "core/hardware_interrupt_manager.h"
#include "video_core/gpu_asynch.h"
#include "video_core/gpu_thread.h"
#include "video_core/renderer_base.h"
@@ -9,7 +11,7 @@
namespace VideoCommon {
GPUAsynch::GPUAsynch(Core::System& system, VideoCore::RendererBase& renderer)
: GPU(system, renderer), gpu_thread{system} {}
: GPU(system, renderer, true), gpu_thread{system} {}
GPUAsynch::~GPUAsynch() = default;
@@ -38,4 +40,9 @@ void GPUAsynch::FlushAndInvalidateRegion(CacheAddr addr, u64 size) {
gpu_thread.FlushAndInvalidateRegion(addr, size);
}
void GPUAsynch::TriggerCpuInterrupt(const u32 syncpoint_id, const u32 value) const {
auto& interrupt_manager = system.InterruptManager();
interrupt_manager.GPUInterruptSyncpt(syncpoint_id, value);
}
} // namespace VideoCommon

@@ -27,6 +27,9 @@ public:
void InvalidateRegion(CacheAddr addr, u64 size) override;
void FlushAndInvalidateRegion(CacheAddr addr, u64 size) override;
protected:
void TriggerCpuInterrupt(u32 syncpoint_id, u32 value) const override;
private:
GPUThread::ThreadManager gpu_thread;
};

@@ -8,7 +8,7 @@
namespace VideoCommon {
GPUSynch::GPUSynch(Core::System& system, VideoCore::RendererBase& renderer)
: GPU(system, renderer) {}
: GPU(system, renderer, false) {}
GPUSynch::~GPUSynch() = default;

@@ -25,6 +25,10 @@ public:
void FlushRegion(CacheAddr addr, u64 size) override;
void InvalidateRegion(CacheAddr addr, u64 size) override;
void FlushAndInvalidateRegion(CacheAddr addr, u64 size) override;
protected:
void TriggerCpuInterrupt([[maybe_unused]] u32 syncpoint_id,
[[maybe_unused]] u32 value) const override {}
};
} // namespace VideoCommon

@@ -21,7 +21,8 @@ static void RunThread(VideoCore::RendererBase& renderer, Tegra::DmaPusher& dma_p
MicroProfileOnThreadCreate("GpuThread");
// Wait for first GPU command before acquiring the window context
state.WaitForCommands();
while (state.queue.Empty())
;
// If emulation was stopped during disk shader loading, abort before trying to acquire context
if (!state.is_running) {
@@ -32,7 +33,6 @@ static void RunThread(VideoCore::RendererBase& renderer, Tegra::DmaPusher& dma_p
CommandDataContainer next;
while (state.is_running) {
state.WaitForCommands();
while (!state.queue.Empty()) {
state.queue.Pop(next);
if (const auto submit_list = std::get_if<SubmitListCommand>(&next.data)) {
@@ -49,8 +49,7 @@ static void RunThread(VideoCore::RendererBase& renderer, Tegra::DmaPusher& dma_p
} else {
UNREACHABLE();
}
state.signaled_fence = next.fence;
state.TrySynchronize();
state.signaled_fence.store(next.fence);
}
}
}
@@ -89,12 +88,7 @@ void ThreadManager::FlushRegion(CacheAddr addr, u64 size) {
}
void ThreadManager::InvalidateRegion(CacheAddr addr, u64 size) {
if (state.queue.Empty()) {
// It's quicker to invalidate a single region on the CPU if the queue is already empty
system.Renderer().Rasterizer().InvalidateRegion(addr, size);
} else {
PushCommand(InvalidateRegionCommand(addr, size));
}
system.Renderer().Rasterizer().InvalidateRegion(addr, size);
}
void ThreadManager::FlushAndInvalidateRegion(CacheAddr addr, u64 size) {
@@ -105,22 +99,13 @@ void ThreadManager::FlushAndInvalidateRegion(CacheAddr addr, u64 size) {
u64 ThreadManager::PushCommand(CommandData&& command_data) {
const u64 fence{++state.last_fence};
state.queue.Push(CommandDataContainer(std::move(command_data), fence));
state.SignalCommands();
return fence;
}
MICROPROFILE_DEFINE(GPU_wait, "GPU", "Wait for the GPU", MP_RGB(128, 128, 192));
void SynchState::WaitForSynchronization(u64 fence) {
if (signaled_fence >= fence) {
return;
}
// Wait for the GPU to be idle (all commands to be executed)
{
MICROPROFILE_SCOPE(GPU_wait);
std::unique_lock lock{synchronization_mutex};
synchronization_condition.wait(lock, [this, fence] { return signaled_fence >= fence; });
}
while (signaled_fence.load() < fence)
;
}
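
In isolation, the fence protocol that replaces the condition variables looks roughly like the following self-contained sketch (the names are illustrative, not the exact types used here):

```cpp
#include <atomic>
#include <cstdint>

struct FenceState {
    std::atomic<uint64_t> signaled_fence{0};
    uint64_t last_fence = 0;

    // CPU side: stamp a submitted command with a monotonically rising fence.
    uint64_t Submit() {
        return ++last_fence;
    }
    // GPU thread: publish the fence of the last executed command.
    void Signal(uint64_t fence) {
        signaled_fence.store(fence);
    }
    // CPU side: spin until the GPU catches up; trades CPU time for latency.
    void Wait(uint64_t fence) const {
        while (signaled_fence.load() < fence) {
        }
    }
};
```
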
} // namespace VideoCommon::GPUThread

@@ -88,41 +88,9 @@ struct CommandDataContainer {
/// Struct used to synchronize the GPU thread
struct SynchState final {
std::atomic_bool is_running{true};
std::atomic_int queued_frame_count{};
std::mutex synchronization_mutex;
std::mutex commands_mutex;
std::condition_variable commands_condition;
std::condition_variable synchronization_condition;
/// Returns true if the gap in GPU commands is small enough that we can consider the CPU and GPU
/// synchronized. This is entirely empirical.
bool IsSynchronized() const {
constexpr std::size_t max_queue_gap{5};
return queue.Size() <= max_queue_gap;
}
void TrySynchronize() {
if (IsSynchronized()) {
std::lock_guard lock{synchronization_mutex};
synchronization_condition.notify_one();
}
}
void WaitForSynchronization(u64 fence);
void SignalCommands() {
if (queue.Empty()) {
return;
}
commands_condition.notify_one();
}
void WaitForCommands() {
std::unique_lock lock{commands_mutex};
commands_condition.wait(lock, [this] { return !queue.Empty(); });
}
using CommandQueue = Common::SPSCQueue<CommandDataContainer>;
CommandQueue queue;
u64 last_fence{};

@@ -34,6 +34,9 @@ public:
/// Clear the current framebuffer
virtual void Clear() = 0;
/// Dispatches a compute shader invocation
virtual void DispatchCompute(GPUVAddr code_addr) = 0;
/// Notify rasterizer that all caches should be flushed to Switch memory
virtual void FlushAll() = 0;
@@ -47,6 +50,9 @@ public:
/// and invalidated
virtual void FlushAndInvalidateRegion(CacheAddr addr, u64 size) = 0;
/// Notify the rasterizer to send all written commands to the host GPU.
virtual void FlushCommands() = 0;
/// Notify rasterizer that a frame is about to finish
virtual void TickFrame() = 0;

@@ -7,28 +7,41 @@
#include <glad/glad.h>
#include "common/assert.h"
#include "common/microprofile.h"
#include "video_core/rasterizer_interface.h"
#include "video_core/renderer_opengl/gl_buffer_cache.h"
#include "video_core/renderer_opengl/gl_rasterizer.h"
#include "video_core/renderer_opengl/gl_resource_manager.h"
namespace OpenGL {
MICROPROFILE_DEFINE(OpenGL_Buffer_Download, "OpenGL", "Buffer Download", MP_RGB(192, 192, 128));
CachedBufferBlock::CachedBufferBlock(CacheAddr cache_addr, const std::size_t size)
: VideoCommon::BufferBlock{cache_addr, size} {
gl_buffer.Create();
glNamedBufferData(gl_buffer.handle, static_cast<GLsizeiptr>(size), nullptr, GL_DYNAMIC_DRAW);
}
CachedBufferBlock::~CachedBufferBlock() = default;
OGLBufferCache::OGLBufferCache(RasterizerOpenGL& rasterizer, Core::System& system,
std::size_t stream_size)
: VideoCommon::BufferCache<OGLBuffer, GLuint, OGLStreamBuffer>{
: VideoCommon::BufferCache<Buffer, GLuint, OGLStreamBuffer>{
rasterizer, system, std::make_unique<OGLStreamBuffer>(stream_size, true)} {}
OGLBufferCache::~OGLBufferCache() = default;
OGLBuffer OGLBufferCache::CreateBuffer(std::size_t size) {
OGLBuffer buffer;
buffer.Create();
glNamedBufferData(buffer.handle, static_cast<GLsizeiptr>(size), nullptr, GL_DYNAMIC_DRAW);
return buffer;
Buffer OGLBufferCache::CreateBlock(CacheAddr cache_addr, std::size_t size) {
return std::make_shared<CachedBufferBlock>(cache_addr, size);
}
const GLuint* OGLBufferCache::ToHandle(const OGLBuffer& buffer) {
return &buffer.handle;
void OGLBufferCache::WriteBarrier() {
glMemoryBarrier(GL_ALL_BARRIER_BITS);
}
const GLuint* OGLBufferCache::ToHandle(const Buffer& buffer) {
return buffer->GetHandle();
}
const GLuint* OGLBufferCache::GetEmptyBuffer(std::size_t) {
@@ -36,23 +49,24 @@ const GLuint* OGLBufferCache::GetEmptyBuffer(std::size_t) {
return &null_buffer;
}
void OGLBufferCache::UploadBufferData(const OGLBuffer& buffer, std::size_t offset, std::size_t size,
const u8* data) {
glNamedBufferSubData(buffer.handle, static_cast<GLintptr>(offset),
void OGLBufferCache::UploadBlockData(const Buffer& buffer, std::size_t offset, std::size_t size,
const u8* data) {
glNamedBufferSubData(*buffer->GetHandle(), static_cast<GLintptr>(offset),
static_cast<GLsizeiptr>(size), data);
}
void OGLBufferCache::DownloadBufferData(const OGLBuffer& buffer, std::size_t offset,
std::size_t size, u8* data) {
glGetNamedBufferSubData(buffer.handle, static_cast<GLintptr>(offset),
void OGLBufferCache::DownloadBlockData(const Buffer& buffer, std::size_t offset, std::size_t size,
u8* data) {
MICROPROFILE_SCOPE(OpenGL_Buffer_Download);
glGetNamedBufferSubData(*buffer->GetHandle(), static_cast<GLintptr>(offset),
static_cast<GLsizeiptr>(size), data);
}
void OGLBufferCache::CopyBufferData(const OGLBuffer& src, const OGLBuffer& dst,
std::size_t src_offset, std::size_t dst_offset,
std::size_t size) {
glCopyNamedBufferSubData(src.handle, dst.handle, static_cast<GLintptr>(src_offset),
static_cast<GLintptr>(dst_offset), static_cast<GLsizeiptr>(size));
void OGLBufferCache::CopyBlock(const Buffer& src, const Buffer& dst, std::size_t src_offset,
std::size_t dst_offset, std::size_t size) {
glCopyNamedBufferSubData(*src->GetHandle(), *dst->GetHandle(),
static_cast<GLintptr>(src_offset), static_cast<GLintptr>(dst_offset),
static_cast<GLsizeiptr>(size));
}
} // namespace OpenGL

@@ -7,7 +7,7 @@
#include <memory>
#include "common/common_types.h"
#include "video_core/buffer_cache.h"
#include "video_core/buffer_cache/buffer_cache.h"
#include "video_core/rasterizer_cache.h"
#include "video_core/renderer_opengl/gl_resource_manager.h"
#include "video_core/renderer_opengl/gl_stream_buffer.h"
@@ -21,7 +21,24 @@ namespace OpenGL {
class OGLStreamBuffer;
class RasterizerOpenGL;
class OGLBufferCache final : public VideoCommon::BufferCache<OGLBuffer, GLuint, OGLStreamBuffer> {
class CachedBufferBlock;
using Buffer = std::shared_ptr<CachedBufferBlock>;
class CachedBufferBlock : public VideoCommon::BufferBlock {
public:
explicit CachedBufferBlock(CacheAddr cache_addr, const std::size_t size);
~CachedBufferBlock();
const GLuint* GetHandle() const {
return &gl_buffer.handle;
}
private:
OGLBuffer gl_buffer{};
};
class OGLBufferCache final : public VideoCommon::BufferCache<Buffer, GLuint, OGLStreamBuffer> {
public:
explicit OGLBufferCache(RasterizerOpenGL& rasterizer, Core::System& system,
std::size_t stream_size);
@@ -30,18 +47,20 @@ public:
const GLuint* GetEmptyBuffer(std::size_t) override;
protected:
OGLBuffer CreateBuffer(std::size_t size) override;
Buffer CreateBlock(CacheAddr cache_addr, std::size_t size) override;
const GLuint* ToHandle(const OGLBuffer& buffer) override;
void WriteBarrier() override;
void UploadBufferData(const OGLBuffer& buffer, std::size_t offset, std::size_t size,
const u8* data) override;
const GLuint* ToHandle(const Buffer& buffer) override;
void DownloadBufferData(const OGLBuffer& buffer, std::size_t offset, std::size_t size,
u8* data) override;
void UploadBlockData(const Buffer& buffer, std::size_t offset, std::size_t size,
const u8* data) override;
void CopyBufferData(const OGLBuffer& src, const OGLBuffer& dst, std::size_t src_offset,
std::size_t dst_offset, std::size_t size) override;
void DownloadBlockData(const Buffer& buffer, std::size_t offset, std::size_t size,
u8* data) override;
void CopyBlock(const Buffer& src, const Buffer& dst, std::size_t src_offset,
std::size_t dst_offset, std::size_t size) override;
};
} // namespace OpenGL

@@ -27,6 +27,8 @@ Device::Device() {
shader_storage_alignment = GetInteger<std::size_t>(GL_SHADER_STORAGE_BUFFER_OFFSET_ALIGNMENT);
max_vertex_attributes = GetInteger<u32>(GL_MAX_VERTEX_ATTRIBS);
max_varyings = GetInteger<u32>(GL_MAX_VARYING_VECTORS);
has_warp_intrinsics = GLAD_GL_NV_gpu_shader5 && GLAD_GL_NV_shader_thread_group &&
GLAD_GL_NV_shader_thread_shuffle;
has_vertex_viewport_layer = GLAD_GL_ARB_shader_viewport_layer_array;
has_variable_aoffi = TestVariableAoffi();
has_component_indexing_bug = TestComponentIndexingBug();
@@ -36,6 +38,7 @@ Device::Device(std::nullptr_t) {
uniform_buffer_alignment = 0;
max_vertex_attributes = 16;
max_varyings = 15;
has_warp_intrinsics = true;
has_vertex_viewport_layer = true;
has_variable_aoffi = true;
has_component_indexing_bug = false;

@@ -30,6 +30,10 @@ public:
return max_varyings;
}
bool HasWarpIntrinsics() const {
return has_warp_intrinsics;
}
bool HasVertexViewportLayer() const {
return has_vertex_viewport_layer;
}
@@ -50,6 +54,7 @@ private:
std::size_t shader_storage_alignment{};
u32 max_vertex_attributes{};
u32 max_varyings{};
bool has_warp_intrinsics{};
bool has_vertex_viewport_layer{};
bool has_variable_aoffi{};
bool has_component_indexing_bug{};

@@ -4,6 +4,7 @@
#include <algorithm>
#include <array>
#include <bitset>
#include <memory>
#include <string>
#include <string_view>
@@ -19,6 +20,7 @@
#include "core/core.h"
#include "core/hle/kernel/process.h"
#include "core/settings.h"
#include "video_core/engines/kepler_compute.h"
#include "video_core/engines/maxwell_3d.h"
#include "video_core/memory_manager.h"
#include "video_core/renderer_opengl/gl_rasterizer.h"
@@ -105,6 +107,7 @@ RasterizerOpenGL::RasterizerOpenGL(Core::System& system, Core::Frontend::EmuWind
shader_program_manager = std::make_unique<GLShader::ProgramManager>();
state.draw.shader_program = 0;
state.Apply();
clear_framebuffer.Create();
LOG_DEBUG(Render_OpenGL, "Sync fixed function OpenGL state here");
CheckExtensions();
@@ -124,10 +127,10 @@ GLuint RasterizerOpenGL::SetupVertexFormat() {
auto& gpu = system.GPU().Maxwell3D();
const auto& regs = gpu.regs;
if (!gpu.dirty_flags.vertex_attrib_format) {
if (!gpu.dirty.vertex_attrib_format) {
return state.draw.vertex_array;
}
gpu.dirty_flags.vertex_attrib_format = false;
gpu.dirty.vertex_attrib_format = false;
MICROPROFILE_SCOPE(OpenGL_VAO);
@@ -181,7 +184,7 @@ GLuint RasterizerOpenGL::SetupVertexFormat() {
}
// Rebinding the VAO invalidates the vertex buffer bindings.
gpu.dirty_flags.vertex_array.set();
gpu.dirty.ResetVertexArrays();
state.draw.vertex_array = vao_entry.handle;
return vao_entry.handle;
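
The `dirty` checks these helpers rely on all follow one guarded-sync pattern, reduced here to a sketch:

```cpp
// Guest register writes set a flag; each Setup*/Sync* helper early-outs
// when its flag is clear and clears it after re-emitting host state.
struct DirtyFlags {
    bool vertex_attrib_format = true; // start dirty so the first draw uploads everything
};

void SetupVertexFormatSketch(DirtyFlags& dirty) {
    if (!dirty.vertex_attrib_format) {
        return; // host VAO still matches guest state
    }
    dirty.vertex_attrib_format = false;
    // ... re-translate guest attribute formats into glVertexArrayAttribFormat calls
}
```
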
@@ -189,17 +192,20 @@ GLuint RasterizerOpenGL::SetupVertexFormat() {
void RasterizerOpenGL::SetupVertexBuffer(GLuint vao) {
auto& gpu = system.GPU().Maxwell3D();
const auto& regs = gpu.regs;
if (gpu.dirty_flags.vertex_array.none())
if (!gpu.dirty.vertex_array_buffers)
return;
gpu.dirty.vertex_array_buffers = false;
const auto& regs = gpu.regs;
MICROPROFILE_SCOPE(OpenGL_VB);
// Upload all guest vertex arrays sequentially to our buffer
for (u32 index = 0; index < Maxwell::NumVertexArrays; ++index) {
if (!gpu.dirty_flags.vertex_array[index])
if (!gpu.dirty.vertex_array[index])
continue;
gpu.dirty.vertex_array[index] = false;
gpu.dirty.vertex_instance[index] = false;
const auto& vertex_array = regs.vertex_array[index];
if (!vertex_array.IsEnabled())
@@ -224,8 +230,32 @@ void RasterizerOpenGL::SetupVertexBuffer(GLuint vao) {
glVertexArrayBindingDivisor(vao, index, 0);
}
}
}
gpu.dirty_flags.vertex_array.reset();
void RasterizerOpenGL::SetupVertexInstances(GLuint vao) {
auto& gpu = system.GPU().Maxwell3D();
if (!gpu.dirty.vertex_instances)
return;
gpu.dirty.vertex_instances = false;
const auto& regs = gpu.regs;
// Configure the instancing divisors for the guest vertex arrays
for (u32 index = 0; index < Maxwell::NumVertexArrays; ++index) {
if (!gpu.dirty.vertex_instance[index])
continue;
gpu.dirty.vertex_instance[index] = false;
if (regs.instanced_arrays.IsInstancingEnabled(index) &&
regs.vertex_array[index].divisor != 0) {
// Enable vertex buffer instancing with the specified divisor.
glVertexArrayBindingDivisor(vao, index, regs.vertex_array[index].divisor);
} else {
// Disable the vertex buffer instancing.
glVertexArrayBindingDivisor(vao, index, 0);
}
}
}
GLintptr RasterizerOpenGL::SetupIndexBuffer() {
@@ -298,9 +328,9 @@ void RasterizerOpenGL::SetupShaders(GLenum primitive_mode) {
Shader shader{shader_cache.GetStageProgram(program)};
const auto stage_enum{static_cast<Maxwell::ShaderStage>(stage)};
const auto stage_enum = static_cast<Maxwell::ShaderStage>(stage);
SetupDrawConstBuffers(stage_enum, shader);
SetupGlobalRegions(stage_enum, shader);
SetupDrawGlobalMemory(stage_enum, shader);
const auto texture_buffer_usage{SetupTextures(stage_enum, shader, base_bindings)};
const ProgramVariant variant{base_bindings, primitive_mode, texture_buffer_usage};
@@ -341,7 +371,7 @@ void RasterizerOpenGL::SetupShaders(GLenum primitive_mode) {
SyncClipEnabled(clip_distances);
gpu.dirty_flags.shaders = false;
gpu.dirty.shaders = false;
}
std::size_t RasterizerOpenGL::CalculateVertexArraysSize() const {
@@ -424,13 +454,13 @@ std::pair<bool, bool> RasterizerOpenGL::ConfigureFramebuffers(
const FramebufferConfigState fb_config_state{using_color_fb, using_depth_fb, preserve_contents,
single_color_target};
if (fb_config_state == current_framebuffer_config_state &&
gpu.dirty_flags.color_buffer.none() && !gpu.dirty_flags.zeta_buffer) {
if (fb_config_state == current_framebuffer_config_state && !gpu.dirty.render_settings) {
// Only skip if the previous ConfigureFramebuffers call was from the same kind (multiple or
// single color targets). This is done because the guest registers may not change but the
// host framebuffer may contain different attachments.
return current_depth_stencil_usage;
}
gpu.dirty.render_settings = false;
current_framebuffer_config_state = fb_config_state;
texture_cache.GuardRenderTargets(true);
@@ -519,13 +549,71 @@ std::pair<bool, bool> RasterizerOpenGL::ConfigureFramebuffers(
return current_depth_stencil_usage = {static_cast<bool>(depth_surface), fbkey.stencil_enable};
}
void RasterizerOpenGL::ConfigureClearFramebuffer(OpenGLState& current_state, bool using_color_fb,
bool using_depth_fb, bool using_stencil_fb) {
auto& gpu = system.GPU().Maxwell3D();
const auto& regs = gpu.regs;
texture_cache.GuardRenderTargets(true);
View color_surface{};
if (using_color_fb) {
color_surface = texture_cache.GetColorBufferSurface(regs.clear_buffers.RT, false);
}
View depth_surface{};
if (using_depth_fb || using_stencil_fb) {
depth_surface = texture_cache.GetDepthBufferSurface(false);
}
texture_cache.GuardRenderTargets(false);
current_state.draw.draw_framebuffer = clear_framebuffer.handle;
current_state.ApplyFramebufferState();
if (color_surface) {
color_surface->Attach(GL_COLOR_ATTACHMENT0, GL_DRAW_FRAMEBUFFER);
} else {
glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, 0, 0);
}
if (depth_surface) {
const auto& params = depth_surface->GetSurfaceParams();
switch (params.type) {
case VideoCore::Surface::SurfaceType::Depth: {
depth_surface->Attach(GL_DEPTH_ATTACHMENT, GL_DRAW_FRAMEBUFFER);
glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_STENCIL_ATTACHMENT, GL_TEXTURE_2D, 0, 0);
break;
}
case VideoCore::Surface::SurfaceType::DepthStencil: {
depth_surface->Attach(GL_DEPTH_ATTACHMENT, GL_DRAW_FRAMEBUFFER);
break;
}
default: { UNIMPLEMENTED(); }
}
} else {
glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_DEPTH_STENCIL_ATTACHMENT, GL_TEXTURE_2D, 0,
0);
}
}
void RasterizerOpenGL::Clear() {
const auto& regs = system.GPU().Maxwell3D().regs;
const auto& maxwell3d = system.GPU().Maxwell3D();
if (!maxwell3d.ShouldExecute()) {
return;
}
const auto& regs = maxwell3d.regs;
bool use_color{};
bool use_depth{};
bool use_stencil{};
OpenGLState clear_state;
OpenGLState prev_state{OpenGLState::GetCurState()};
SCOPE_EXIT({
prev_state.AllDirty();
prev_state.Apply();
});
OpenGLState clear_state{OpenGLState::GetCurState()};
clear_state.SetDefaultViewports();
if (regs.clear_buffers.R || regs.clear_buffers.G || regs.clear_buffers.B ||
regs.clear_buffers.A) {
use_color = true;
@@ -545,6 +633,7 @@ void RasterizerOpenGL::Clear() {
// true.
clear_state.depth.test_enabled = true;
clear_state.depth.test_func = GL_ALWAYS;
clear_state.depth.write_mask = GL_TRUE;
}
if (regs.clear_buffers.S) {
ASSERT_MSG(regs.zeta_enable != 0, "Tried to clear stencil but buffer is not enabled!");
@@ -581,8 +670,9 @@ void RasterizerOpenGL::Clear() {
return;
}
const auto [clear_depth, clear_stencil] = ConfigureFramebuffers(
clear_state, use_color, use_depth || use_stencil, false, regs.clear_buffers.RT.Value());
ConfigureClearFramebuffer(clear_state, use_color, use_depth, use_stencil);
SyncViewport(clear_state);
if (regs.clear_flags.scissor) {
SyncScissorTest(clear_state);
}
@@ -591,21 +681,18 @@ void RasterizerOpenGL::Clear() {
clear_state.EmulateViewportWithScissor();
}
clear_state.ApplyColorMask();
clear_state.ApplyDepth();
clear_state.ApplyStencilTest();
clear_state.ApplyViewport();
clear_state.ApplyFramebufferState();
clear_state.AllDirty();
clear_state.Apply();
if (use_color) {
glClearBufferfv(GL_COLOR, regs.clear_buffers.RT, regs.clear_color);
glClearBufferfv(GL_COLOR, 0, regs.clear_color);
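        // Note: ConfigureClearFramebuffer attaches the selected render target at
        // GL_COLOR_ATTACHMENT0, so draw buffer 0 is cleared here instead of the
        // guest's regs.clear_buffers.RT index.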
}
if (clear_depth && clear_stencil) {
if (use_depth && use_stencil) {
glClearBufferfi(GL_DEPTH_STENCIL, 0, regs.clear_depth, regs.clear_stencil);
} else if (clear_depth) {
} else if (use_depth) {
glClearBufferfv(GL_DEPTH, 0, &regs.clear_depth);
} else if (clear_stencil) {
} else if (use_stencil) {
glClearBufferiv(GL_STENCIL, 0, &regs.clear_stencil);
}
}
@@ -616,7 +703,10 @@ void RasterizerOpenGL::DrawArrays() {
MICROPROFILE_SCOPE(OpenGL_Drawing);
auto& gpu = system.GPU().Maxwell3D();
const auto& regs = gpu.regs;
if (!gpu.ShouldExecute()) {
return;
}
SyncColorMask();
SyncFragmentColorClampState();
@@ -661,6 +751,7 @@ void RasterizerOpenGL::DrawArrays() {
// Upload vertex and index data.
SetupVertexBuffer(vao);
SetupVertexInstances(vao);
const GLintptr index_buffer_offset = SetupIndexBuffer();
// Setup draw parameters. It will automatically choose what glDraw* method to use.
@@ -687,7 +778,7 @@ void RasterizerOpenGL::DrawArrays() {
if (invalidate) {
// As all cached buffers are invalidated, we need to recheck their state.
gpu.dirty_flags.vertex_array.set();
gpu.dirty.ResetVertexArrays();
}
shader_program_manager->ApplyTo(state);
@@ -700,6 +791,46 @@ void RasterizerOpenGL::DrawArrays() {
params.DispatchDraw();
accelerate_draw = AccelDraw::Disabled;
gpu.dirty.memory_general = false;
}
void RasterizerOpenGL::DispatchCompute(GPUVAddr code_addr) {
if (!GLAD_GL_ARB_compute_variable_group_size) {
LOG_ERROR(Render_OpenGL, "Compute is currently not supported on this device due to the "
"lack of GL_ARB_compute_variable_group_size");
return;
}
auto kernel = shader_cache.GetComputeKernel(code_addr);
const auto [program, next_bindings] = kernel->GetProgramHandle({});
state.draw.shader_program = program;
state.draw.program_pipeline = 0;
const std::size_t buffer_size =
Tegra::Engines::KeplerCompute::NumConstBuffers *
(Maxwell::MaxConstBufferSize + device.GetUniformBufferAlignment());
buffer_cache.Map(buffer_size);
bind_ubo_pushbuffer.Setup(0);
bind_ssbo_pushbuffer.Setup(0);
SetupComputeConstBuffers(kernel);
SetupComputeGlobalMemory(kernel);
// TODO(Rodrigo): Bind images and samplers
buffer_cache.Unmap();
bind_ubo_pushbuffer.Bind();
bind_ssbo_pushbuffer.Bind();
state.ApplyShaderProgram();
state.ApplyProgramPipeline();
const auto& launch_desc = system.GPU().KeplerCompute().launch_description;
glDispatchComputeGroupSizeARB(launch_desc.grid_dim_x, launch_desc.grid_dim_y,
launch_desc.grid_dim_z, launch_desc.block_dim_x,
launch_desc.block_dim_y, launch_desc.block_dim_z);
}
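
Stripped of the cache and binding plumbing, the variable-group-size path reduces to the sketch below; `LaunchParams` is a hypothetical stand-in for the guest's launch descriptor.

```cpp
#include <glad/glad.h>

struct LaunchParams { // hypothetical mirror of launch_description
    GLuint grid_x, grid_y, grid_z;    // workgroup counts
    GLuint block_x, block_y, block_z; // threads per group, chosen at dispatch time
};

// Requires GL_ARB_compute_variable_group_size and a kernel compiled with
// "layout (local_size_variable) in;".
void Dispatch(GLuint program, const LaunchParams& launch) {
    glUseProgram(program);
    glDispatchComputeGroupSizeARB(launch.grid_x, launch.grid_y, launch.grid_z,
                                  launch.block_x, launch.block_y, launch.block_z);
}
```
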
void RasterizerOpenGL::FlushAll() {}
@@ -730,6 +861,10 @@ void RasterizerOpenGL::FlushAndInvalidateRegion(CacheAddr addr, u64 size) {
InvalidateRegion(addr, size);
}
void RasterizerOpenGL::FlushCommands() {
glFlush();
}
void RasterizerOpenGL::TickFrame() {
buffer_cache.TickFrame();
}
@@ -775,12 +910,25 @@ bool RasterizerOpenGL::AccelerateDisplay(const Tegra::FramebufferConfig& config,
void RasterizerOpenGL::SetupDrawConstBuffers(Tegra::Engines::Maxwell3D::Regs::ShaderStage stage,
const Shader& shader) {
MICROPROFILE_SCOPE(OpenGL_UBO);
const auto stage_index = static_cast<std::size_t>(stage);
const auto& shader_stage = system.GPU().Maxwell3D().state.shader_stages[stage_index];
// Upload only the enabled buffers from the 16 constbuffers of each shader stage
const auto& stages = system.GPU().Maxwell3D().state.shader_stages;
const auto& shader_stage = stages[static_cast<std::size_t>(stage)];
for (const auto& entry : shader->GetShaderEntries().const_buffers) {
SetupConstBuffer(shader_stage.const_buffers[entry.GetIndex()], entry);
const auto& buffer = shader_stage.const_buffers[entry.GetIndex()];
SetupConstBuffer(buffer, entry);
}
}
void RasterizerOpenGL::SetupComputeConstBuffers(const Shader& kernel) {
MICROPROFILE_SCOPE(OpenGL_UBO);
const auto& launch_desc = system.GPU().KeplerCompute().launch_description;
for (const auto& entry : kernel->GetShaderEntries().const_buffers) {
const auto& config = launch_desc.const_buffer_config[entry.GetIndex()];
const std::bitset<8> mask = launch_desc.memory_config.const_buffer_enable_mask.Value();
Tegra::Engines::ConstBufferInfo buffer;
buffer.address = config.Address();
buffer.size = config.size;
buffer.enabled = mask[entry.GetIndex()];
SetupConstBuffer(buffer, entry);
}
}
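
The enable-mask test above, isolated into a one-liner for clarity (an illustrative helper, not codebase API):

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>

// A kernel const buffer is only live when its bit is set in the launch
// descriptor's 8-bit const_buffer_enable_mask.
bool IsConstBufferEnabled(uint8_t enable_mask, std::size_t index) {
    return std::bitset<8>(enable_mask).test(index);
}
```
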
@@ -801,24 +949,39 @@ void RasterizerOpenGL::SetupConstBuffer(const Tegra::Engines::ConstBufferInfo& b
bind_ubo_pushbuffer.Push(cbuf, offset, size);
}
void RasterizerOpenGL::SetupGlobalRegions(Tegra::Engines::Maxwell3D::Regs::ShaderStage stage,
const Shader& shader) {
void RasterizerOpenGL::SetupDrawGlobalMemory(Tegra::Engines::Maxwell3D::Regs::ShaderStage stage,
const Shader& shader) {
auto& gpu{system.GPU()};
auto& memory_manager{gpu.MemoryManager()};
const auto cbufs{gpu.Maxwell3D().state.shader_stages[static_cast<std::size_t>(stage)]};
const auto alignment{device.GetShaderStorageBufferAlignment()};
for (const auto& entry : shader->GetShaderEntries().global_memory_entries) {
const auto addr{cbufs.const_buffers[entry.GetCbufIndex()].address + entry.GetCbufOffset()};
const auto actual_addr{memory_manager.Read<u64>(addr)};
const auto gpu_addr{memory_manager.Read<u64>(addr)};
const auto size{memory_manager.Read<u32>(addr + 8)};
const auto [ssbo, buffer_offset] =
buffer_cache.UploadMemory(actual_addr, size, alignment, true, entry.IsWritten());
bind_ssbo_pushbuffer.Push(ssbo, buffer_offset, static_cast<GLsizeiptr>(size));
SetupGlobalMemory(entry, gpu_addr, size);
}
}
void RasterizerOpenGL::SetupComputeGlobalMemory(const Shader& kernel) {
auto& gpu{system.GPU()};
auto& memory_manager{gpu.MemoryManager()};
const auto cbufs{gpu.KeplerCompute().launch_description.const_buffer_config};
for (const auto& entry : kernel->GetShaderEntries().global_memory_entries) {
const auto addr{cbufs[entry.GetCbufIndex()].Address() + entry.GetCbufOffset()};
const auto gpu_addr{memory_manager.Read<u64>(addr)};
const auto size{memory_manager.Read<u32>(addr + 8)};
SetupGlobalMemory(entry, gpu_addr, size);
}
}
void RasterizerOpenGL::SetupGlobalMemory(const GLShader::GlobalMemoryEntry& entry,
GPUVAddr gpu_addr, std::size_t size) {
const auto alignment{device.GetShaderStorageBufferAlignment()};
const auto [ssbo, buffer_offset] =
buffer_cache.UploadMemory(gpu_addr, size, alignment, entry.IsWritten());
bind_ssbo_pushbuffer.Push(ssbo, buffer_offset, static_cast<GLsizeiptr>(size));
}
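
The two reads in the callers decode a small descriptor the guest stores inside a const buffer; spelled out as an illustrative struct (not a type from the codebase):

```cpp
#include <cstdint>

// Why the callers issue Read<u64>(addr) followed by Read<u32>(addr + 8):
// the guest packs a GPU virtual address and a byte size back to back.
struct GlobalMemoryDescriptor {
    uint64_t gpu_addr; // bytes 0..7: buffer base in GPU address space
    uint32_t size;     // bytes 8..11: buffer size in bytes
};
```
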
TextureBufferUsage RasterizerOpenGL::SetupTextures(Maxwell::ShaderStage stage, const Shader& shader,
BaseBindings base_bindings) {
MICROPROFILE_SCOPE(OpenGL_Texture);
@@ -907,10 +1070,11 @@ void RasterizerOpenGL::SyncClipCoef() {
}
void RasterizerOpenGL::SyncCullMode() {
const auto& regs = system.GPU().Maxwell3D().regs;
auto& maxwell3d = system.GPU().Maxwell3D();
const auto& regs = maxwell3d.regs;
state.cull.enabled = regs.cull.enabled != 0;
if (state.cull.enabled) {
state.cull.front_face = MaxwellToGL::FrontFace(regs.cull.front_face);
state.cull.mode = MaxwellToGL::CullFace(regs.cull.cull_face);
@@ -943,16 +1107,21 @@ void RasterizerOpenGL::SyncDepthTestState() {
state.depth.test_enabled = regs.depth_test_enable != 0;
state.depth.write_mask = regs.depth_write_enabled ? GL_TRUE : GL_FALSE;
if (!state.depth.test_enabled)
if (!state.depth.test_enabled) {
return;
}
state.depth.test_func = MaxwellToGL::ComparisonOp(regs.depth_test_func);
}
void RasterizerOpenGL::SyncStencilTestState() {
const auto& regs = system.GPU().Maxwell3D().regs;
state.stencil.test_enabled = regs.stencil_enable != 0;
auto& maxwell3d = system.GPU().Maxwell3D();
if (!maxwell3d.dirty.stencil_test) {
return;
}
const auto& regs = maxwell3d.regs;
state.stencil.test_enabled = regs.stencil_enable != 0;
if (!regs.stencil_enable) {
return;
}
@@ -981,10 +1150,17 @@ void RasterizerOpenGL::SyncStencilTestState() {
state.stencil.back.action_depth_fail = GL_KEEP;
state.stencil.back.action_depth_pass = GL_KEEP;
}
state.MarkDirtyStencilState();
maxwell3d.dirty.stencil_test = false;
}
void RasterizerOpenGL::SyncColorMask() {
const auto& regs = system.GPU().Maxwell3D().regs;
auto& maxwell3d = system.GPU().Maxwell3D();
if (!maxwell3d.dirty.color_mask) {
return;
}
const auto& regs = maxwell3d.regs;
const std::size_t count =
regs.independent_blend_enable ? Tegra::Engines::Maxwell3D::Regs::NumRenderTargets : 1;
for (std::size_t i = 0; i < count; i++) {
@@ -995,6 +1171,9 @@ void RasterizerOpenGL::SyncColorMask() {
dest.blue_enabled = (source.B == 0) ? GL_FALSE : GL_TRUE;
dest.alpha_enabled = (source.A == 0) ? GL_FALSE : GL_TRUE;
}
state.MarkDirtyColorMask();
maxwell3d.dirty.color_mask = false;
}
void RasterizerOpenGL::SyncMultiSampleState() {
@@ -1009,7 +1188,11 @@ void RasterizerOpenGL::SyncFragmentColorClampState() {
}
void RasterizerOpenGL::SyncBlendState() {
const auto& regs = system.GPU().Maxwell3D().regs;
auto& maxwell3d = system.GPU().Maxwell3D();
if (!maxwell3d.dirty.blend_state) {
return;
}
const auto& regs = maxwell3d.regs;
state.blend_color.red = regs.blend_color.r;
state.blend_color.green = regs.blend_color.g;
@@ -1032,6 +1215,8 @@ void RasterizerOpenGL::SyncBlendState() {
for (std::size_t i = 1; i < Tegra::Engines::Maxwell3D::Regs::NumRenderTargets; i++) {
state.blend[i].enabled = false;
}
maxwell3d.dirty.blend_state = false;
state.MarkDirtyBlendState();
return;
}
@@ -1048,6 +1233,9 @@ void RasterizerOpenGL::SyncBlendState() {
blend.src_a_func = MaxwellToGL::BlendFunc(src.factor_source_a);
blend.dst_a_func = MaxwellToGL::BlendFunc(src.factor_dest_a);
}
state.MarkDirtyBlendState();
maxwell3d.dirty.blend_state = false;
}
void RasterizerOpenGL::SyncLogicOpState() {
@@ -1099,13 +1287,21 @@ void RasterizerOpenGL::SyncPointState() {
}
void RasterizerOpenGL::SyncPolygonOffset() {
const auto& regs = system.GPU().Maxwell3D().regs;
auto& maxwell3d = system.GPU().Maxwell3D();
if (!maxwell3d.dirty.polygon_offset) {
return;
}
const auto& regs = maxwell3d.regs;
state.polygon_offset.fill_enable = regs.polygon_offset_fill_enable != 0;
state.polygon_offset.line_enable = regs.polygon_offset_line_enable != 0;
state.polygon_offset.point_enable = regs.polygon_offset_point_enable != 0;
state.polygon_offset.units = regs.polygon_offset_units;
state.polygon_offset.factor = regs.polygon_offset_factor;
state.polygon_offset.clamp = regs.polygon_offset_clamp;
state.MarkDirtyPolygonOffset();
maxwell3d.dirty.polygon_offset = false;
}
void RasterizerOpenGL::SyncAlphaTest() {

@@ -58,10 +58,12 @@ public:
void DrawArrays() override;
void Clear() override;
void DispatchCompute(GPUVAddr code_addr) override;
void FlushAll() override;
void FlushRegion(CacheAddr addr, u64 size) override;
void InvalidateRegion(CacheAddr addr, u64 size) override;
void FlushAndInvalidateRegion(CacheAddr addr, u64 size) override;
void FlushCommands() override;
void TickFrame() override;
bool AccelerateSurfaceCopy(const Tegra::Engines::Fermi2D::Regs::Surface& src,
const Tegra::Engines::Fermi2D::Regs::Surface& dst,
@@ -108,17 +110,30 @@ private:
OpenGLState& current_state, bool using_color_fb = true, bool using_depth_fb = true,
bool preserve_contents = true, std::optional<std::size_t> single_color_target = {});
void ConfigureClearFramebuffer(OpenGLState& current_state, bool using_color_fb,
bool using_depth_fb, bool using_stencil_fb);
/// Configures the current constbuffers to use for the draw command.
void SetupDrawConstBuffers(Tegra::Engines::Maxwell3D::Regs::ShaderStage stage,
const Shader& shader);
/// Configures the current constbuffers to use for the kernel invocation.
void SetupComputeConstBuffers(const Shader& kernel);
/// Configures a constant buffer.
void SetupConstBuffer(const Tegra::Engines::ConstBufferInfo& buffer,
const GLShader::ConstBufferEntry& entry);
/// Configures the current global memory entries to use for the draw command.
void SetupGlobalRegions(Tegra::Engines::Maxwell3D::Regs::ShaderStage stage,
const Shader& shader);
void SetupDrawGlobalMemory(Tegra::Engines::Maxwell3D::Regs::ShaderStage stage,
const Shader& shader);
/// Configures the current global memory entries to use for the kernel invocation.
void SetupComputeGlobalMemory(const Shader& kernel);
/// Configures a global memory buffer.
void SetupGlobalMemory(const GLShader::GlobalMemoryEntry& entry, GPUVAddr gpu_addr,
std::size_t size);
/// Configures the current textures to use for the draw command. Returns the shader's texture
/// buffer usage.
@@ -216,6 +231,7 @@ private:
GLuint SetupVertexFormat();
void SetupVertexBuffer(GLuint vao);
void SetupVertexInstances(GLuint vao);
GLintptr SetupIndexBuffer();
@@ -226,6 +242,8 @@ private:
enum class AccelDraw { Disabled, Arrays, Indexed };
AccelDraw accelerate_draw = AccelDraw::Disabled;
OGLFramebuffer clear_framebuffer;
using CachedPageMap = boost::icl::interval_map<u64, int>;
CachedPageMap cached_pages;
};

@@ -23,13 +23,13 @@ namespace OpenGL {
using VideoCommon::Shader::ProgramCode;
// One UBO is always reserved for emulation values
constexpr u32 RESERVED_UBOS = 1;
// One UBO is always reserved for emulation values on staged shaders
constexpr u32 STAGE_RESERVED_UBOS = 1;
struct UnspecializedShader {
std::string code;
GLShader::ShaderEntries entries;
Maxwell::ShaderProgram program_type;
ProgramType program_type;
};
namespace {
@@ -55,15 +55,17 @@ ProgramCode GetShaderCode(Tegra::MemoryManager& memory_manager, const GPUVAddr g
}
/// Gets the shader type from a Maxwell program type
constexpr GLenum GetShaderType(Maxwell::ShaderProgram program_type) {
constexpr GLenum GetShaderType(ProgramType program_type) {
switch (program_type) {
case Maxwell::ShaderProgram::VertexA:
case Maxwell::ShaderProgram::VertexB:
case ProgramType::VertexA:
case ProgramType::VertexB:
return GL_VERTEX_SHADER;
case Maxwell::ShaderProgram::Geometry:
case ProgramType::Geometry:
return GL_GEOMETRY_SHADER;
case Maxwell::ShaderProgram::Fragment:
case ProgramType::Fragment:
return GL_FRAGMENT_SHADER;
case ProgramType::Compute:
return GL_COMPUTE_SHADER;
default:
return GL_NONE;
}
@@ -100,6 +102,25 @@ constexpr std::tuple<const char*, const char*, u32> GetPrimitiveDescription(GLen
}
}
ProgramType GetProgramType(Maxwell::ShaderProgram program) {
switch (program) {
case Maxwell::ShaderProgram::VertexA:
return ProgramType::VertexA;
case Maxwell::ShaderProgram::VertexB:
return ProgramType::VertexB;
case Maxwell::ShaderProgram::TesselationControl:
return ProgramType::TessellationControl;
case Maxwell::ShaderProgram::TesselationEval:
return ProgramType::TessellationEval;
case Maxwell::ShaderProgram::Geometry:
return ProgramType::Geometry;
case Maxwell::ShaderProgram::Fragment:
return ProgramType::Fragment;
}
UNREACHABLE();
return {};
}
/// Calculates the size of a program stream
std::size_t CalculateProgramSize(const GLShader::ProgramCode& program) {
constexpr std::size_t start_offset = 10;
@@ -128,13 +149,13 @@ std::size_t CalculateProgramSize(const GLShader::ProgramCode& program) {
}
/// Hashes one (or two) program streams
u64 GetUniqueIdentifier(Maxwell::ShaderProgram program_type, const ProgramCode& code,
u64 GetUniqueIdentifier(ProgramType program_type, const ProgramCode& code,
const ProgramCode& code_b, std::size_t size_a = 0, std::size_t size_b = 0) {
if (size_a == 0) {
size_a = CalculateProgramSize(code);
}
u64 unique_identifier = Common::CityHash64(reinterpret_cast<const char*>(code.data()), size_a);
if (program_type != Maxwell::ShaderProgram::VertexA) {
if (program_type != ProgramType::VertexA) {
return unique_identifier;
}
// VertexA programs include two programs
@@ -152,12 +173,12 @@ u64 GetUniqueIdentifier(Maxwell::ShaderProgram program_type, const ProgramCode&
}
/// Creates an unspecialized program from code streams
GLShader::ProgramResult CreateProgram(const Device& device, Maxwell::ShaderProgram program_type,
GLShader::ProgramResult CreateProgram(const Device& device, ProgramType program_type,
ProgramCode program_code, ProgramCode program_code_b) {
GLShader::ShaderSetup setup(program_code);
setup.program.size_a = CalculateProgramSize(program_code);
setup.program.size_b = 0;
if (program_type == Maxwell::ShaderProgram::VertexA) {
if (program_type == ProgramType::VertexA) {
// VertexB is always enabled, so when VertexA is enabled, we have two vertex shaders.
// Conventional HW does not support this, so we combine VertexA and VertexB into one
// stage here.
@@ -168,33 +189,43 @@ GLShader::ProgramResult CreateProgram(const Device& device, Maxwell::ShaderProgr
program_type, program_code, program_code_b, setup.program.size_a, setup.program.size_b);
switch (program_type) {
case Maxwell::ShaderProgram::VertexA:
case Maxwell::ShaderProgram::VertexB:
case ProgramType::VertexA:
case ProgramType::VertexB:
return GLShader::GenerateVertexShader(device, setup);
case Maxwell::ShaderProgram::Geometry:
case ProgramType::Geometry:
return GLShader::GenerateGeometryShader(device, setup);
case Maxwell::ShaderProgram::Fragment:
case ProgramType::Fragment:
return GLShader::GenerateFragmentShader(device, setup);
case ProgramType::Compute:
return GLShader::GenerateComputeShader(device, setup);
default:
LOG_CRITICAL(HW_GPU, "Unimplemented program_type={}", static_cast<u32>(program_type));
UNREACHABLE();
UNIMPLEMENTED_MSG("Unimplemented program_type={}", static_cast<u32>(program_type));
return {};
}
}
CachedProgram SpecializeShader(const std::string& code, const GLShader::ShaderEntries& entries,
Maxwell::ShaderProgram program_type, const ProgramVariant& variant,
ProgramType program_type, const ProgramVariant& variant,
bool hint_retrievable = false) {
auto base_bindings{variant.base_bindings};
const auto primitive_mode{variant.primitive_mode};
const auto texture_buffer_usage{variant.texture_buffer_usage};
std::string source = "#version 430 core\n"
"#extension GL_ARB_separate_shader_objects : enable\n";
"#extension GL_ARB_separate_shader_objects : enable\n"
"#extension GL_NV_gpu_shader5 : enable\n"
"#extension GL_NV_shader_thread_group : enable\n";
if (entries.shader_viewport_layer_array) {
source += "#extension GL_ARB_shader_viewport_layer_array : enable\n";
}
source += fmt::format("\n#define EMULATION_UBO_BINDING {}\n", base_bindings.cbuf++);
if (program_type == ProgramType::Compute) {
source += "#extension GL_ARB_compute_variable_group_size : require\n";
}
source += '\n';
if (program_type != ProgramType::Compute) {
source += fmt::format("#define EMULATION_UBO_BINDING {}\n", base_bindings.cbuf++);
}
for (const auto& cbuf : entries.const_buffers) {
source +=
@@ -218,17 +249,24 @@ CachedProgram SpecializeShader(const std::string& code, const GLShader::ShaderEn
if (!texture_buffer_usage.test(i)) {
continue;
}
source += fmt::format("#define SAMPLER_{}_IS_BUFFER", i);
source += fmt::format("#define SAMPLER_{}_IS_BUFFER\n", i);
}
if (texture_buffer_usage.any()) {
source += '\n';
}
if (program_type == Maxwell::ShaderProgram::Geometry) {
if (program_type == ProgramType::Geometry) {
const auto [glsl_topology, debug_name, max_vertices] =
GetPrimitiveDescription(primitive_mode);
source += "layout (" + std::string(glsl_topology) + ") in;\n";
source += "layout (" + std::string(glsl_topology) + ") in;\n\n";
source += "#define MAX_VERTEX_INPUT " + std::to_string(max_vertices) + '\n';
}
if (program_type == ProgramType::Compute) {
source += "layout (local_size_variable) in;\n";
}
source += '\n';
source += code;
OGLShader shader;
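
Assembled, the preamble for a `ProgramType::Compute` kernel comes out roughly as below (illustrative; the per-buffer binding defines are omitted):

```cpp
// What `source` holds before the decompiled body is appended.
constexpr const char* kComputePreambleSketch =
    "#version 430 core\n"
    "#extension GL_ARB_separate_shader_objects : enable\n"
    "#extension GL_NV_gpu_shader5 : enable\n"
    "#extension GL_NV_shader_thread_group : enable\n"
    "#extension GL_ARB_compute_variable_group_size : require\n"
    "\n"
    "layout (local_size_variable) in;\n";
```
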
@@ -255,9 +293,9 @@ std::set<GLenum> GetSupportedFormats() {
} // Anonymous namespace
CachedShader::CachedShader(const ShaderParameters& params, Maxwell::ShaderProgram program_type,
CachedShader::CachedShader(const ShaderParameters& params, ProgramType program_type,
GLShader::ProgramResult result)
: RasterizerCacheObject{params.host_ptr}, host_ptr{params.host_ptr}, cpu_addr{params.cpu_addr},
: RasterizerCacheObject{params.host_ptr}, cpu_addr{params.cpu_addr},
unique_identifier{params.unique_identifier}, program_type{program_type},
disk_cache{params.disk_cache}, precompiled_programs{params.precompiled_programs},
entries{result.second}, code{std::move(result.first)}, shader_length{entries.shader_length} {}
@@ -268,29 +306,50 @@ Shader CachedShader::CreateStageFromMemory(const ShaderParameters& params,
ProgramCode&& program_code_b) {
const auto code_size{CalculateProgramSize(program_code)};
const auto code_size_b{CalculateProgramSize(program_code_b)};
auto result{CreateProgram(params.device, program_type, program_code, program_code_b)};
auto result{
CreateProgram(params.device, GetProgramType(program_type), program_code, program_code_b)};
if (result.first.empty()) {
// TODO(Rodrigo): Unimplemented shader stages hit here, avoid using these for now
return {};
}
params.disk_cache.SaveRaw(ShaderDiskCacheRaw(
params.unique_identifier, program_type, static_cast<u32>(code_size / sizeof(u64)),
static_cast<u32>(code_size_b / sizeof(u64)), std::move(program_code),
std::move(program_code_b)));
params.unique_identifier, GetProgramType(program_type),
static_cast<u32>(code_size / sizeof(u64)), static_cast<u32>(code_size_b / sizeof(u64)),
std::move(program_code), std::move(program_code_b)));
return std::shared_ptr<CachedShader>(new CachedShader(params, program_type, std::move(result)));
return std::shared_ptr<CachedShader>(
new CachedShader(params, GetProgramType(program_type), std::move(result)));
}
Shader CachedShader::CreateStageFromCache(const ShaderParameters& params,
Maxwell::ShaderProgram program_type,
GLShader::ProgramResult result) {
return std::shared_ptr<CachedShader>(new CachedShader(params, program_type, std::move(result)));
return std::shared_ptr<CachedShader>(
new CachedShader(params, GetProgramType(program_type), std::move(result)));
}
Shader CachedShader::CreateKernelFromMemory(const ShaderParameters& params, ProgramCode&& code) {
auto result{CreateProgram(params.device, ProgramType::Compute, code, {})};
const auto code_size{CalculateProgramSize(code)};
params.disk_cache.SaveRaw(ShaderDiskCacheRaw(params.unique_identifier, ProgramType::Compute,
static_cast<u32>(code_size / sizeof(u64)), 0,
std::move(code), {}));
return std::shared_ptr<CachedShader>(
new CachedShader(params, ProgramType::Compute, std::move(result)));
}
Shader CachedShader::CreateKernelFromCache(const ShaderParameters& params,
GLShader::ProgramResult result) {
return std::shared_ptr<CachedShader>(
new CachedShader(params, ProgramType::Compute, std::move(result)));
}
std::tuple<GLuint, BaseBindings> CachedShader::GetProgramHandle(const ProgramVariant& variant) {
GLuint handle{};
if (program_type == Maxwell::ShaderProgram::Geometry) {
if (program_type == ProgramType::Geometry) {
handle = GetGeometryShader(variant);
} else {
const auto [entry, is_cache_miss] = programs.try_emplace(variant);
@@ -308,8 +367,11 @@ std::tuple<GLuint, BaseBindings> CachedShader::GetProgramHandle(const ProgramVar
handle = program->handle;
}
auto base_bindings{variant.base_bindings};
base_bindings.cbuf += static_cast<u32>(entries.const_buffers.size()) + RESERVED_UBOS;
auto base_bindings = variant.base_bindings;
base_bindings.cbuf += static_cast<u32>(entries.const_buffers.size());
if (program_type != ProgramType::Compute) {
base_bindings.cbuf += STAGE_RESERVED_UBOS;
}
base_bindings.gmem += static_cast<u32>(entries.global_memory_entries.size());
base_bindings.sampler += static_cast<u32>(entries.samplers.size());
@@ -572,7 +634,7 @@ std::unordered_map<u64, UnspecializedShader> ShaderCacheOpenGL::GenerateUnspecia
}
Shader ShaderCacheOpenGL::GetStageProgram(Maxwell::ShaderProgram program) {
if (!system.GPU().Maxwell3D().dirty_flags.shaders) {
if (!system.GPU().Maxwell3D().dirty.shaders) {
return last_shaders[static_cast<std::size_t>(program)];
}
@@ -589,13 +651,15 @@ Shader ShaderCacheOpenGL::GetStageProgram(Maxwell::ShaderProgram program) {
// No shader found - create a new one
ProgramCode program_code{GetShaderCode(memory_manager, program_addr, host_ptr)};
ProgramCode program_code_b;
if (program == Maxwell::ShaderProgram::VertexA) {
const bool is_program_a{program == Maxwell::ShaderProgram::VertexA};
if (is_program_a) {
const GPUVAddr program_addr_b{GetShaderAddress(system, Maxwell::ShaderProgram::VertexB)};
program_code_b = GetShaderCode(memory_manager, program_addr_b,
memory_manager.GetPointer(program_addr_b));
}
const auto unique_identifier = GetUniqueIdentifier(program, program_code, program_code_b);
const auto unique_identifier =
GetUniqueIdentifier(GetProgramType(program), program_code, program_code_b);
const auto cpu_addr{*memory_manager.GpuToCpuAddress(program_addr)};
const ShaderParameters params{disk_cache, precompiled_programs, device, cpu_addr,
host_ptr, unique_identifier};
@@ -612,4 +676,30 @@ Shader ShaderCacheOpenGL::GetStageProgram(Maxwell::ShaderProgram program) {
return last_shaders[static_cast<std::size_t>(program)] = shader;
}
Shader ShaderCacheOpenGL::GetComputeKernel(GPUVAddr code_addr) {
auto& memory_manager{system.GPU().MemoryManager()};
const auto host_ptr{memory_manager.GetPointer(code_addr)};
auto kernel = TryGet(host_ptr);
if (kernel) {
return kernel;
}
// No kernel found - create a new one
auto code{GetShaderCode(memory_manager, code_addr, host_ptr)};
const auto unique_identifier{GetUniqueIdentifier(ProgramType::Compute, code, {})};
const auto cpu_addr{*memory_manager.GpuToCpuAddress(code_addr)};
const ShaderParameters params{disk_cache, precompiled_programs, device, cpu_addr,
host_ptr, unique_identifier};
const auto found = precompiled_shaders.find(unique_identifier);
if (found == precompiled_shaders.end()) {
kernel = CachedShader::CreateKernelFromMemory(params, std::move(code));
} else {
kernel = CachedShader::CreateKernelFromCache(params, found->second);
}
Register(kernel);
return kernel;
}
} // namespace OpenGL

@@ -61,6 +61,11 @@ public:
Maxwell::ShaderProgram program_type,
GLShader::ProgramResult result);
static Shader CreateKernelFromMemory(const ShaderParameters& params, ProgramCode&& code);
static Shader CreateKernelFromCache(const ShaderParameters& params,
GLShader::ProgramResult result);
VAddr GetCpuAddr() const override {
return cpu_addr;
}
@@ -78,7 +83,7 @@ public:
std::tuple<GLuint, BaseBindings> GetProgramHandle(const ProgramVariant& variant);
private:
explicit CachedShader(const ShaderParameters& params, Maxwell::ShaderProgram program_type,
explicit CachedShader(const ShaderParameters& params, ProgramType program_type,
GLShader::ProgramResult result);
// Geometry programs. These are needed because GLSL needs an input topology but it's not
@@ -101,10 +106,9 @@ private:
ShaderDiskCacheUsage GetUsage(const ProgramVariant& variant) const;
u8* host_ptr{};
VAddr cpu_addr{};
u64 unique_identifier{};
Maxwell::ShaderProgram program_type{};
ProgramType program_type{};
ShaderDiskCacheOpenGL& disk_cache;
const PrecompiledPrograms& precompiled_programs;
@@ -132,6 +136,9 @@ public:
/// Gets the current specified shader stage program
Shader GetStageProgram(Maxwell::ShaderProgram program);
/// Gets a compute kernel in the passed address
Shader GetComputeKernel(GPUVAddr code_addr);
protected:
// We do not have to flush this cache as things in it are never modified by us.
void FlushObjectInner(const Shader& object) override {}

@@ -37,7 +37,6 @@ using namespace std::string_literals;
using namespace VideoCommon::Shader;
using Maxwell = Tegra::Engines::Maxwell3D::Regs;
using ShaderStage = Tegra::Engines::Maxwell3D::Regs::ShaderStage;
using Operation = const OperationNode&;
enum class Type { Bool, Bool2, Float, Int, Uint, HalfFloat };
@@ -162,9 +161,13 @@ std::string FlowStackTopName(MetaStackClass stack) {
return fmt::format("{}_flow_stack_top", GetFlowStackPrefix(stack));
}
constexpr bool IsVertexShader(ProgramType stage) {
return stage == ProgramType::VertexA || stage == ProgramType::VertexB;
}
class GLSLDecompiler final {
public:
explicit GLSLDecompiler(const Device& device, const ShaderIR& ir, ShaderStage stage,
explicit GLSLDecompiler(const Device& device, const ShaderIR& ir, ProgramType stage,
std::string suffix)
: device{device}, ir{ir}, stage{stage}, suffix{suffix}, header{ir.GetHeader()} {}
@@ -248,21 +251,21 @@ public:
}
entries.clip_distances = ir.GetClipDistances();
entries.shader_viewport_layer_array =
stage == ShaderStage::Vertex && (ir.UsesLayer() || ir.UsesViewportIndex());
IsVertexShader(stage) && (ir.UsesLayer() || ir.UsesViewportIndex());
entries.shader_length = ir.GetLength();
return entries;
}
private:
void DeclareVertex() {
if (stage != ShaderStage::Vertex)
if (!IsVertexShader(stage))
return;
DeclareVertexRedeclarations();
}
void DeclareGeometry() {
if (stage != ShaderStage::Geometry) {
if (stage != ProgramType::Geometry) {
return;
}
@@ -293,14 +296,14 @@ private:
break;
}
}
if (stage != ShaderStage::Vertex || device.HasVertexViewportLayer()) {
if (!IsVertexShader(stage) || device.HasVertexViewportLayer()) {
if (ir.UsesLayer()) {
code.AddLine("int gl_Layer;");
}
if (ir.UsesViewportIndex()) {
code.AddLine("int gl_ViewportIndex;");
}
} else if ((ir.UsesLayer() || ir.UsesViewportIndex()) && stage == ShaderStage::Vertex &&
} else if ((ir.UsesLayer() || ir.UsesViewportIndex()) && IsVertexShader(stage) &&
!device.HasVertexViewportLayer()) {
LOG_ERROR(
Render_OpenGL,
@@ -337,11 +340,16 @@ private:
}
void DeclareLocalMemory() {
if (const u64 local_memory_size = header.GetLocalMemorySize(); local_memory_size > 0) {
const auto element_count = Common::AlignUp(local_memory_size, 4) / 4;
code.AddLine("float {}[{}];", GetLocalMemory(), element_count);
code.AddNewLine();
// TODO(Rodrigo): Unstub kernel local memory size and pass it from a register at
// specialization time.
const u64 local_memory_size =
stage == ProgramType::Compute ? 0x400 : header.GetLocalMemorySize();
if (local_memory_size == 0) {
return;
}
const auto element_count = Common::AlignUp(local_memory_size, 4) / 4;
code.AddLine("float {}[{}];", GetLocalMemory(), element_count);
code.AddNewLine();
}
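The reworked DeclareLocalMemory above stubs compute local memory at 0x400 bytes and keeps the same float-array emission: the byte size is rounded up to a multiple of 4 and divided by 4 to get the element count, matching the `lmem[ftou(addr) / 4]` indexing used later in the decompiler. A quick self-contained check of that arithmetic, assuming `Common::AlignUp` is the usual round-up helper:

```cpp
#include <cstdint>

// Assumed to match Common::AlignUp for power-of-two alignments.
constexpr std::uint64_t AlignUp(std::uint64_t value, std::uint64_t align) {
    return (value + align - 1) / align * align;
}

static_assert(AlignUp(0x400, 4) / 4 == 256); // stubbed compute size -> 256 floats
static_assert(AlignUp(10, 4) / 4 == 3);      // 10 bytes still need 3 float slots
```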
void DeclareInternalFlags() {
@@ -395,12 +403,12 @@ private:
const u32 location{GetGenericAttributeIndex(index)};
std::string name{GetInputAttribute(index)};
if (stage == ShaderStage::Geometry) {
if (stage == ProgramType::Geometry) {
name = "gs_" + name + "[]";
}
std::string suffix;
if (stage == ShaderStage::Fragment) {
if (stage == ProgramType::Fragment) {
const auto input_mode{header.ps.GetAttributeUse(location)};
if (skip_unused && input_mode == AttributeUse::Unused) {
return;
@@ -412,7 +420,7 @@ private:
}
void DeclareOutputAttributes() {
if (ir.HasPhysicalAttributes() && stage != ShaderStage::Fragment) {
if (ir.HasPhysicalAttributes() && stage != ProgramType::Fragment) {
for (u32 i = 0; i < GetNumPhysicalVaryings(); ++i) {
DeclareOutputAttribute(ToGenericAttribute(i));
}
@@ -534,7 +542,7 @@ private:
constexpr u32 element_stride{4};
const u32 address{generic_base + index * generic_stride + element * element_stride};
const bool declared{stage != ShaderStage::Fragment ||
const bool declared{stage != ProgramType::Fragment ||
header.ps.GetAttributeUse(index) != AttributeUse::Unused};
const std::string value{declared ? ReadAttribute(attribute, element) : "0"};
code.AddLine("case 0x{:x}: return {};", address, value);
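The generated readPhysicalAttribute switch above maps a byte address to an attribute component: each generic attribute occupies `generic_stride` bytes starting at `generic_base`, and each of its four components sits `element_stride` (4) bytes apart. The base and stride values are not visible in this excerpt; 0x80 and 0x10 below are assumptions about Maxwell's attribute layout used purely to make the arithmetic concrete:

```cpp
#include <cstdint>

constexpr std::uint32_t generic_base = 0x80;   // assumed start of generic attributes
constexpr std::uint32_t generic_stride = 0x10; // assumed bytes per attribute
constexpr std::uint32_t element_stride = 4;    // visible in the excerpt above

constexpr std::uint32_t AttributeAddress(std::uint32_t index, std::uint32_t element) {
    return generic_base + index * generic_stride + element * element_stride;
}

static_assert(AttributeAddress(0, 0) == 0x80); // attribute 0, component x
static_assert(AttributeAddress(1, 2) == 0x98); // attribute 1, component z
```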
@@ -557,7 +565,7 @@ private:
case Tegra::Shader::ImageType::Texture1D:
return "image1D";
case Tegra::Shader::ImageType::TextureBuffer:
return "bufferImage";
return "imageBuffer";
case Tegra::Shader::ImageType::Texture1DArray:
return "image1DArray";
case Tegra::Shader::ImageType::Texture2D:
@@ -638,7 +646,7 @@ private:
}
if (const auto abuf = std::get_if<AbufNode>(&*node)) {
UNIMPLEMENTED_IF_MSG(abuf->IsPhysicalBuffer() && stage == ShaderStage::Geometry,
UNIMPLEMENTED_IF_MSG(abuf->IsPhysicalBuffer() && stage == ProgramType::Geometry,
"Physical attributes in geometry shaders are not implemented");
if (abuf->IsPhysicalBuffer()) {
return fmt::format("readPhysicalAttribute(ftou({}))",
@@ -693,6 +701,9 @@ private:
}
if (const auto lmem = std::get_if<LmemNode>(&*node)) {
if (stage == ProgramType::Compute) {
LOG_WARNING(Render_OpenGL, "Local memory is stubbed on compute shaders");
}
return fmt::format("{}[ftou({}) / 4]", GetLocalMemory(), Visit(lmem->GetAddress()));
}
@@ -722,7 +733,7 @@ private:
std::string ReadAttribute(Attribute::Index attribute, u32 element, const Node& buffer = {}) {
const auto GeometryPass = [&](std::string_view name) {
if (stage == ShaderStage::Geometry && buffer) {
if (stage == ProgramType::Geometry && buffer) {
// TODO(Rodrigo): Guard geometry inputs against out of bound reads. Some games
// set a 0x80000000 index for those and the shader fails to build. Find out why
// this happens and what its intent is.
@@ -734,10 +745,10 @@ private:
switch (attribute) {
case Attribute::Index::Position:
switch (stage) {
case ShaderStage::Geometry:
case ProgramType::Geometry:
return fmt::format("gl_in[ftou({})].gl_Position{}", Visit(buffer),
GetSwizzle(element));
case ShaderStage::Fragment:
case ProgramType::Fragment:
return element == 3 ? "1.0f" : ("gl_FragCoord"s + GetSwizzle(element));
default:
UNREACHABLE();
@@ -758,7 +769,7 @@ private:
// TODO(Subv): Find out what the values are for the first two elements when inside a
// vertex shader, and what's the value of the fourth element when inside a Tess Eval
// shader.
ASSERT(stage == ShaderStage::Vertex);
ASSERT(IsVertexShader(stage));
switch (element) {
case 2:
// Config pack's first value is instance_id.
@@ -770,7 +781,7 @@ private:
return "0";
case Attribute::Index::FrontFacing:
// TODO(Subv): Find out what the values are for the other elements.
ASSERT(stage == ShaderStage::Fragment);
ASSERT(stage == ProgramType::Fragment);
switch (element) {
case 3:
return "itof(gl_FrontFacing ? -1 : 0)";
@@ -792,7 +803,7 @@ private:
return value;
}
// There's a bug in NVidia's proprietary drivers that makes precise fail on fragment shaders
const std::string precise = stage != ShaderStage::Fragment ? "precise " : "";
const std::string precise = stage != ProgramType::Fragment ? "precise " : "";
const std::string temporary = code.GenerateTemporary();
code.AddLine("{}float {} = {};", precise, temporary, value);
@@ -827,12 +838,12 @@ private:
UNIMPLEMENTED();
return {};
case 1:
if (stage == ShaderStage::Vertex && !device.HasVertexViewportLayer()) {
if (IsVertexShader(stage) && !device.HasVertexViewportLayer()) {
return {};
}
return std::make_pair("gl_Layer", true);
case 2:
if (stage == ShaderStage::Vertex && !device.HasVertexViewportLayer()) {
if (IsVertexShader(stage) && !device.HasVertexViewportLayer()) {
return {};
}
return std::make_pair("gl_ViewportIndex", true);
@@ -1069,6 +1080,9 @@ private:
target = result->first;
is_integer = result->second;
} else if (const auto lmem = std::get_if<LmemNode>(&*dest)) {
if (stage == ProgramType::Compute) {
LOG_WARNING(Render_OpenGL, "Local memory is stubbed on compute shaders");
}
target = fmt::format("{}[ftou({}) / 4]", GetLocalMemory(), Visit(lmem->GetAddress()));
} else if (const auto gmem = std::get_if<GmemNode>(&*dest)) {
const std::string real = Visit(gmem->GetRealAddress());
@@ -1637,7 +1651,7 @@ private:
}
std::string Exit(Operation operation) {
if (stage != ShaderStage::Fragment) {
if (stage != ProgramType::Fragment) {
code.AddLine("return;");
return {};
}
@@ -1688,7 +1702,7 @@ private:
}
std::string EmitVertex(Operation operation) {
ASSERT_MSG(stage == ShaderStage::Geometry,
ASSERT_MSG(stage == ProgramType::Geometry,
"EmitVertex is expected to be used in a geometry shader.");
// If a geometry shader is attached, it will always flip (it's the last stage before
@@ -1699,7 +1713,7 @@ private:
}
std::string EndPrimitive(Operation operation) {
ASSERT_MSG(stage == ShaderStage::Geometry,
ASSERT_MSG(stage == ProgramType::Geometry,
"EndPrimitive is expected to be used in a geometry shader.");
code.AddLine("EndPrimitive();");
@@ -1721,6 +1735,48 @@ private:
return "utof(gl_WorkGroupID"s + GetSwizzle(element) + ')';
}
std::string BallotThread(Operation operation) {
const std::string value = VisitOperand(operation, 0, Type::Bool);
if (!device.HasWarpIntrinsics()) {
LOG_ERROR(Render_OpenGL,
"Nvidia warp intrinsics are not available but they are required by a shader");
// Stub on non-Nvidia devices by simulating all threads voting the same as the active
// one.
return fmt::format("utof({} ? 0xFFFFFFFFU : 0U)", value);
}
return fmt::format("utof(ballotThreadNV({}))", value);
}
std::string Vote(Operation operation, const char* func) {
const std::string value = VisitOperand(operation, 0, Type::Bool);
if (!device.HasWarpIntrinsics()) {
LOG_ERROR(Render_OpenGL,
"Nvidia vote intrinsics are not available but they are required by a shader");
// Stub with a warp size of one.
return value;
}
return fmt::format("{}({})", func, value);
}
std::string VoteAll(Operation operation) {
return Vote(operation, "allThreadsNV");
}
std::string VoteAny(Operation operation) {
return Vote(operation, "anyThreadNV");
}
std::string VoteEqual(Operation operation) {
if (!device.HasWarpIntrinsics()) {
LOG_ERROR(Render_OpenGL,
"Nvidia vote intrinsics are not available but they are required by a shader");
// We must return true here since a stub for a theoretical warp size of 1 will always
// return an equal result for all its votes.
return "true";
}
return Vote(operation, "allThreadsEqualNV");
}
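Taken together, the fallbacks above model a warp of exactly one thread on hardware without Nvidia's warp intrinsics: a lone lane's ballot mask is all-ones or zero depending on its own vote, all/any collapse to the vote itself, and equality trivially holds. The same semantics written as plain C++ for reference; this is a model of the stubs, not the GLSL the decompiler emits:

```cpp
#include <cstdint>

constexpr std::uint32_t BallotThread(bool value) {
    return value ? 0xFFFFFFFFu : 0u; // the only lane's vote fills the whole mask
}
constexpr bool VoteAll(bool value) { return value; }  // "all one threads" == the thread
constexpr bool VoteAny(bool value) { return value; }  // "any of one thread" == the thread
constexpr bool VoteEqual(bool) { return true; }       // one thread always agrees with itself

static_assert(BallotThread(true) == 0xFFFFFFFFu);
static_assert(VoteAll(true) && !VoteAny(false) && VoteEqual(false));
```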
static constexpr std::array operation_decompilers = {
&GLSLDecompiler::Assign,
@@ -1871,6 +1927,11 @@ private:
&GLSLDecompiler::WorkGroupId<0>,
&GLSLDecompiler::WorkGroupId<1>,
&GLSLDecompiler::WorkGroupId<2>,
&GLSLDecompiler::BallotThread,
&GLSLDecompiler::VoteAll,
&GLSLDecompiler::VoteAny,
&GLSLDecompiler::VoteEqual,
};
static_assert(operation_decompilers.size() == static_cast<std::size_t>(OperationCode::Amount));
@@ -1937,7 +1998,7 @@ private:
}
u32 GetNumPhysicalInputAttributes() const {
return stage == ShaderStage::Vertex ? GetNumPhysicalAttributes() : GetNumPhysicalVaryings();
return IsVertexShader(stage) ? GetNumPhysicalAttributes() : GetNumPhysicalVaryings();
}
u32 GetNumPhysicalAttributes() const {
@@ -1950,7 +2011,7 @@ private:
const Device& device;
const ShaderIR& ir;
const ShaderStage stage;
const ProgramType stage;
const std::string suffix;
const Header header;
@@ -1981,7 +2042,7 @@ std::string GetCommonDeclarations() {
MAX_CONSTBUFFER_ELEMENTS);
}
ProgramResult Decompile(const Device& device, const ShaderIR& ir, Maxwell::ShaderStage stage,
ProgramResult Decompile(const Device& device, const ShaderIR& ir, ProgramType stage,
const std::string& suffix) {
GLSLDecompiler decompiler(device, ir, stage, suffix);
decompiler.Decompile();

View File

@@ -12,14 +12,26 @@
#include "video_core/engines/maxwell_3d.h"
#include "video_core/shader/shader_ir.h"
namespace OpenGL {
class Device;
}
namespace VideoCommon::Shader {
class ShaderIR;
}
namespace OpenGL {
class Device;
enum class ProgramType : u32 {
VertexA = 0,
VertexB = 1,
TessellationControl = 2,
TessellationEval = 3,
Geometry = 4,
Fragment = 5,
Compute = 6
};
} // namespace OpenGL
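ProgramType supersedes Maxwell::ShaderProgram throughout the cache so that compute, which Maxwell3D's graphics-only enum cannot express, gets a first-class value. The first six values keep the same order as the Maxwell enum, so a conversion for the graphics stages can be a plain cast; the sketch below uses a stand-in `MaxwellProgram` enum rather than the real header:

```cpp
// MaxwellProgram stands in for Maxwell::ShaderProgram (same values, 0 through 5).
enum class MaxwellProgram : unsigned {
    VertexA, VertexB, TessellationControl, TessellationEval, Geometry, Fragment
};
enum class ProgramType : unsigned {
    VertexA, VertexB, TessellationControl, TessellationEval, Geometry, Fragment, Compute
};

constexpr ProgramType FromMaxwell(MaxwellProgram program) {
    return static_cast<ProgramType>(program); // graphics values line up one-to-one
}
static_assert(FromMaxwell(MaxwellProgram::Fragment) == ProgramType::Fragment);
// Compute has no Maxwell3D counterpart; it is reached through the kernel path instead.
```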
namespace OpenGL::GLShader {
struct ShaderEntries;
@@ -85,6 +97,6 @@ struct ShaderEntries {
std::string GetCommonDeclarations();
ProgramResult Decompile(const Device& device, const VideoCommon::Shader::ShaderIR& ir,
Maxwell::ShaderStage stage, const std::string& suffix);
ProgramType stage, const std::string& suffix);
} // namespace OpenGL::GLShader

View File

@@ -51,7 +51,7 @@ ShaderCacheVersionHash GetShaderCacheVersionHash() {
} // namespace
ShaderDiskCacheRaw::ShaderDiskCacheRaw(u64 unique_identifier, Maxwell::ShaderProgram program_type,
ShaderDiskCacheRaw::ShaderDiskCacheRaw(u64 unique_identifier, ProgramType program_type,
u32 program_code_size, u32 program_code_size_b,
ProgramCode program_code, ProgramCode program_code_b)
: unique_identifier{unique_identifier}, program_type{program_type},

View File

@@ -18,7 +18,6 @@
#include "common/assert.h"
#include "common/common_types.h"
#include "core/file_sys/vfs_vector.h"
#include "video_core/engines/maxwell_3d.h"
#include "video_core/renderer_opengl/gl_shader_gen.h"
namespace Core {
@@ -34,14 +33,11 @@ namespace OpenGL {
struct ShaderDiskCacheUsage;
struct ShaderDiskCacheDump;
using ShaderDumpsMap = std::unordered_map<ShaderDiskCacheUsage, ShaderDiskCacheDump>;
using ProgramCode = std::vector<u64>;
using Maxwell = Tegra::Engines::Maxwell3D::Regs;
using ShaderDumpsMap = std::unordered_map<ShaderDiskCacheUsage, ShaderDiskCacheDump>;
using TextureBufferUsage = std::bitset<64>;
/// Allocated bindings used by an OpenGL shader program.
/// Allocated bindings used by an OpenGL shader program
struct BaseBindings {
u32 cbuf{};
u32 gmem{};
@@ -126,7 +122,7 @@ namespace OpenGL {
/// Describes how a shader is used by the guest GPU
class ShaderDiskCacheRaw {
public:
explicit ShaderDiskCacheRaw(u64 unique_identifier, Maxwell::ShaderProgram program_type,
explicit ShaderDiskCacheRaw(u64 unique_identifier, ProgramType program_type,
u32 program_code_size, u32 program_code_size_b,
ProgramCode program_code, ProgramCode program_code_b);
ShaderDiskCacheRaw();
@@ -141,30 +137,13 @@ public:
}
bool HasProgramA() const {
return program_type == Maxwell::ShaderProgram::VertexA;
return program_type == ProgramType::VertexA;
}
Maxwell::ShaderProgram GetProgramType() const {
ProgramType GetProgramType() const {
return program_type;
}
Maxwell::ShaderStage GetProgramStage() const {
switch (program_type) {
case Maxwell::ShaderProgram::VertexA:
case Maxwell::ShaderProgram::VertexB:
return Maxwell::ShaderStage::Vertex;
case Maxwell::ShaderProgram::TesselationControl:
return Maxwell::ShaderStage::TesselationControl;
case Maxwell::ShaderProgram::TesselationEval:
return Maxwell::ShaderStage::TesselationEval;
case Maxwell::ShaderProgram::Geometry:
return Maxwell::ShaderStage::Geometry;
case Maxwell::ShaderProgram::Fragment:
return Maxwell::ShaderStage::Fragment;
}
UNREACHABLE();
}
const ProgramCode& GetProgramCode() const {
return program_code;
}
@@ -175,7 +154,7 @@ public:
private:
u64 unique_identifier{};
Maxwell::ShaderProgram program_type{};
ProgramType program_type{};
u32 program_code_size{};
u32 program_code_size_b{};

Some files were not shown because too many files have changed in this diff.