"Merge Tagged PR 1012"

"Merge Tagged PR 1340"
"Merge Tagged PR 1703"
2020-03-09 12:01:13 +00:00 · 2020-03-09 12:01:13 +00:00 · 2020-03-09 12:01:12 +00:00 · 2020-03-08 16:28:07 -04:00 · 2020-03-08 15:59:38 -03:00 · 2020-03-07 22:28:35 -05:00
134 changed files with 5224 additions and 1865 deletions
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
 yuzu emulator
 =============
-[![Travis CI Build Status](https://travis-ci.org/yuzu-emu/yuzu.svg?branch=master)](https://travis-ci.org/yuzu-emu/yuzu)
+[![Travis CI Build Status](https://travis-ci.com/yuzu-emu/yuzu.svg?branch=master)](https://travis-ci.com/yuzu-emu/yuzu)
 [![Azure Mainline CI Build Status](https://dev.azure.com/yuzu-emu/yuzu/_apis/build/status/yuzu%20mainline?branchName=master)](https://dev.azure.com/yuzu-emu/yuzu/)

 yuzu is an experimental open-source emulator for the Nintendo Switch from the creators of [Citra](https://citra-emu.org/).
@@ -21,7 +21,7 @@ For development discussion, please join us on [Discord](https://discord.gg/XQV6d

 Most of the development happens on GitHub. It's also where [our central repository](https://github.com/yuzu-emu/yuzu) is hosted.

-If you want to contribute please take a look at the [Contributor's Guide](CONTRIBUTING.md) and [Developer Information](https://github.com/yuzu-emu/yuzu/wiki/Developer-Information). You should as well contact any of the developers on Discord in order to know about the current state of the emulator.
+If you want to contribute please take a look at the [Contributor's Guide](CONTRIBUTING.md) and [Developer Information](https://github.com/yuzu-emu/yuzu/wiki/Developer-Information). You should also contact any of the developers on Discord in order to know about the current state of the emulator.

 ### Building

--- a/externals/httplib/README.md
+++ b/externals/httplib/README.md
@@ -1,4 +1,4 @@
-From https://github.com/yhirose/cpp-httplib/commit/d9479bc0b12e8a1e8bce2d34da4feeef488581f3
+From https://github.com/yhirose/cpp-httplib/tree/fce8e6fefdab4ad48bc5b25c98e5ebfda4f3cf53

 MIT License

--- a/externals/httplib/httplib.h
+++ b/externals/httplib/httplib.h
--- a/src/audio_core/algorithm/interpolate.cpp
+++ b/src/audio_core/algorithm/interpolate.cpp
@@ -5,21 +5,141 @@
 #define _USE_MATH_DEFINES

 #include <algorithm>
+#include <climits>
 #include <cmath>
 #include <vector>
+
 #include "audio_core/algorithm/interpolate.h"
 #include "common/common_types.h"
 #include "common/logging/log.h"

 namespace AudioCore {

-/// The Lanczos kernel
-static double Lanczos(std::size_t a, double x) {
-    if (x == 0.0)
-        return 1.0;
-    const double px = M_PI * x;
-    return a * std::sin(px) * std::sin(px / a) / (px * px);
-}
+constexpr std::array<s16, 512> curve_lut0{
+    6600,  19426, 6722,  3,     6479,  19424, 6845,  9,     6359,  19419, 6968,  15,    6239,
+    19412, 7093,  22,    6121,  19403, 7219,  28,    6004,  19391, 7345,  34,    5888,  19377,
+    7472,  41,    5773,  19361, 7600,  48,    5659,  19342, 7728,  55,    5546,  19321, 7857,
+    62,    5434,  19298, 7987,  69,    5323,  19273, 8118,  77,    5213,  19245, 8249,  84,
+    5104,  19215, 8381,  92,    4997,  19183, 8513,  101,   4890,  19148, 8646,  109,   4785,
+    19112, 8780,  118,   4681,  19073, 8914,  127,   4579,  19031, 9048,  137,   4477,  18988,
+    9183,  147,   4377,  18942, 9318,  157,   4277,  18895, 9454,  168,   4179,  18845, 9590,
+    179,   4083,  18793, 9726,  190,   3987,  18738, 9863,  202,   3893,  18682, 10000, 215,
+    3800,  18624, 10137, 228,   3709,  18563, 10274, 241,   3618,  18500, 10411, 255,   3529,
+    18436, 10549, 270,   3441,  18369, 10687, 285,   3355,  18300, 10824, 300,   3269,  18230,
+    10962, 317,   3186,  18157, 11100, 334,   3103,  18082, 11238, 351,   3022,  18006, 11375,
+    369,   2942,  17927, 11513, 388,   2863,  17847, 11650, 408,   2785,  17765, 11788, 428,
+    2709,  17681, 11925, 449,   2635,  17595, 12062, 471,   2561,  17507, 12198, 494,   2489,
+    17418, 12334, 517,   2418,  17327, 12470, 541,   2348,  17234, 12606, 566,   2280,  17140,
+    12741, 592,   2213,  17044, 12876, 619,   2147,  16946, 13010, 647,   2083,  16846, 13144,
+    675,   2020,  16745, 13277, 704,   1958,  16643, 13409, 735,   1897,  16539, 13541, 766,
+    1838,  16434, 13673, 798,   1780,  16327, 13803, 832,   1723,  16218, 13933, 866,   1667,
+    16109, 14062, 901,   1613,  15998, 14191, 937,   1560,  15885, 14318, 975,   1508,  15772,
+    14445, 1013,  1457,  15657, 14571, 1052,  1407,  15540, 14695, 1093,  1359,  15423, 14819,
+    1134,  1312,  15304, 14942, 1177,  1266,  15185, 15064, 1221,  1221,  15064, 15185, 1266,
+    1177,  14942, 15304, 1312,  1134,  14819, 15423, 1359,  1093,  14695, 15540, 1407,  1052,
+    14571, 15657, 1457,  1013,  14445, 15772, 1508,  975,   14318, 15885, 1560,  937,   14191,
+    15998, 1613,  901,   14062, 16109, 1667,  866,   13933, 16218, 1723,  832,   13803, 16327,
+    1780,  798,   13673, 16434, 1838,  766,   13541, 16539, 1897,  735,   13409, 16643, 1958,
+    704,   13277, 16745, 2020,  675,   13144, 16846, 2083,  647,   13010, 16946, 2147,  619,
+    12876, 17044, 2213,  592,   12741, 17140, 2280,  566,   12606, 17234, 2348,  541,   12470,
+    17327, 2418,  517,   12334, 17418, 2489,  494,   12198, 17507, 2561,  471,   12062, 17595,
+    2635,  449,   11925, 17681, 2709,  428,   11788, 17765, 2785,  408,   11650, 17847, 2863,
+    388,   11513, 17927, 2942,  369,   11375, 18006, 3022,  351,   11238, 18082, 3103,  334,
+    11100, 18157, 3186,  317,   10962, 18230, 3269,  300,   10824, 18300, 3355,  285,   10687,
+    18369, 3441,  270,   10549, 18436, 3529,  255,   10411, 18500, 3618,  241,   10274, 18563,
+    3709,  228,   10137, 18624, 3800,  215,   10000, 18682, 3893,  202,   9863,  18738, 3987,
+    190,   9726,  18793, 4083,  179,   9590,  18845, 4179,  168,   9454,  18895, 4277,  157,
+    9318,  18942, 4377,  147,   9183,  18988, 4477,  137,   9048,  19031, 4579,  127,   8914,
+    19073, 4681,  118,   8780,  19112, 4785,  109,   8646,  19148, 4890,  101,   8513,  19183,
+    4997,  92,    8381,  19215, 5104,  84,    8249,  19245, 5213,  77,    8118,  19273, 5323,
+    69,    7987,  19298, 5434,  62,    7857,  19321, 5546,  55,    7728,  19342, 5659,  48,
+    7600,  19361, 5773,  41,    7472,  19377, 5888,  34,    7345,  19391, 6004,  28,    7219,
+    19403, 6121,  22,    7093,  19412, 6239,  15,    6968,  19419, 6359,  9,     6845,  19424,
+    6479,  3,     6722,  19426, 6600};
+
+constexpr std::array<s16, 512> curve_lut1{
+    -68,   32639, 69,    -5,    -200,  32630, 212,   -15,   -328,  32613, 359,   -26,   -450,
+    32586, 512,   -36,   -568,  32551, 669,   -47,   -680,  32507, 832,   -58,   -788,  32454,
+    1000,  -69,   -891,  32393, 1174,  -80,   -990,  32323, 1352,  -92,   -1084, 32244, 1536,
+    -103,  -1173, 32157, 1724,  -115,  -1258, 32061, 1919,  -128,  -1338, 31956, 2118,  -140,
+    -1414, 31844, 2322,  -153,  -1486, 31723, 2532,  -167,  -1554, 31593, 2747,  -180,  -1617,
+    31456, 2967,  -194,  -1676, 31310, 3192,  -209,  -1732, 31157, 3422,  -224,  -1783, 30995,
+    3657,  -240,  -1830, 30826, 3897,  -256,  -1874, 30649, 4143,  -272,  -1914, 30464, 4393,
+    -289,  -1951, 30272, 4648,  -307,  -1984, 30072, 4908,  -325,  -2014, 29866, 5172,  -343,
+    -2040, 29652, 5442,  -362,  -2063, 29431, 5716,  -382,  -2083, 29203, 5994,  -403,  -2100,
+    28968, 6277,  -424,  -2114, 28727, 6565,  -445,  -2125, 28480, 6857,  -468,  -2133, 28226,
+    7153,  -490,  -2139, 27966, 7453,  -514,  -2142, 27700, 7758,  -538,  -2142, 27428, 8066,
+    -563,  -2141, 27151, 8378,  -588,  -2136, 26867, 8694,  -614,  -2130, 26579, 9013,  -641,
+    -2121, 26285, 9336,  -668,  -2111, 25987, 9663,  -696,  -2098, 25683, 9993,  -724,  -2084,
+    25375, 10326, -753,  -2067, 25063, 10662, -783,  -2049, 24746, 11000, -813,  -2030, 24425,
+    11342, -844,  -2009, 24100, 11686, -875,  -1986, 23771, 12033, -907,  -1962, 23438, 12382,
+    -939,  -1937, 23103, 12733, -972,  -1911, 22764, 13086, -1005, -1883, 22422, 13441, -1039,
+    -1855, 22077, 13798, -1072, -1825, 21729, 14156, -1107, -1795, 21380, 14516, -1141, -1764,
+    21027, 14877, -1176, -1732, 20673, 15239, -1211, -1700, 20317, 15602, -1246, -1667, 19959,
+    15965, -1282, -1633, 19600, 16329, -1317, -1599, 19239, 16694, -1353, -1564, 18878, 17058,
+    -1388, -1530, 18515, 17423, -1424, -1495, 18151, 17787, -1459, -1459, 17787, 18151, -1495,
+    -1424, 17423, 18515, -1530, -1388, 17058, 18878, -1564, -1353, 16694, 19239, -1599, -1317,
+    16329, 19600, -1633, -1282, 15965, 19959, -1667, -1246, 15602, 20317, -1700, -1211, 15239,
+    20673, -1732, -1176, 14877, 21027, -1764, -1141, 14516, 21380, -1795, -1107, 14156, 21729,
+    -1825, -1072, 13798, 22077, -1855, -1039, 13441, 22422, -1883, -1005, 13086, 22764, -1911,
+    -972,  12733, 23103, -1937, -939,  12382, 23438, -1962, -907,  12033, 23771, -1986, -875,
+    11686, 24100, -2009, -844,  11342, 24425, -2030, -813,  11000, 24746, -2049, -783,  10662,
+    25063, -2067, -753,  10326, 25375, -2084, -724,  9993,  25683, -2098, -696,  9663,  25987,
+    -2111, -668,  9336,  26285, -2121, -641,  9013,  26579, -2130, -614,  8694,  26867, -2136,
+    -588,  8378,  27151, -2141, -563,  8066,  27428, -2142, -538,  7758,  27700, -2142, -514,
+    7453,  27966, -2139, -490,  7153,  28226, -2133, -468,  6857,  28480, -2125, -445,  6565,
+    28727, -2114, -424,  6277,  28968, -2100, -403,  5994,  29203, -2083, -382,  5716,  29431,
+    -2063, -362,  5442,  29652, -2040, -343,  5172,  29866, -2014, -325,  4908,  30072, -1984,
+    -307,  4648,  30272, -1951, -289,  4393,  30464, -1914, -272,  4143,  30649, -1874, -256,
+    3897,  30826, -1830, -240,  3657,  30995, -1783, -224,  3422,  31157, -1732, -209,  3192,
+    31310, -1676, -194,  2967,  31456, -1617, -180,  2747,  31593, -1554, -167,  2532,  31723,
+    -1486, -153,  2322,  31844, -1414, -140,  2118,  31956, -1338, -128,  1919,  32061, -1258,
+    -115,  1724,  32157, -1173, -103,  1536,  32244, -1084, -92,   1352,  32323, -990,  -80,
+    1174,  32393, -891,  -69,   1000,  32454, -788,  -58,   832,   32507, -680,  -47,   669,
+    32551, -568,  -36,   512,   32586, -450,  -26,   359,   32613, -328,  -15,   212,   32630,
+    -200,  -5,    69,    32639, -68};
+
+constexpr std::array<s16, 512> curve_lut2{
+    3195,  26287, 3329,  -32,   3064,  26281, 3467,  -34,   2936,  26270, 3608,  -38,   2811,
+    26253, 3751,  -42,   2688,  26230, 3897,  -46,   2568,  26202, 4046,  -50,   2451,  26169,
+    4199,  -54,   2338,  26130, 4354,  -58,   2227,  26085, 4512,  -63,   2120,  26035, 4673,
+    -67,   2015,  25980, 4837,  -72,   1912,  25919, 5004,  -76,   1813,  25852, 5174,  -81,
+    1716,  25780, 5347,  -87,   1622,  25704, 5522,  -92,   1531,  25621, 5701,  -98,   1442,
+    25533, 5882,  -103,  1357,  25440, 6066,  -109,  1274,  25342, 6253,  -115,  1193,  25239,
+    6442,  -121,  1115,  25131, 6635,  -127,  1040,  25018, 6830,  -133,  967,   24899, 7027,
+    -140,  897,   24776, 7227,  -146,  829,   24648, 7430,  -153,  764,   24516, 7635,  -159,
+    701,   24379, 7842,  -166,  641,   24237, 8052,  -174,  583,   24091, 8264,  -181,  526,
+    23940, 8478,  -187,  472,   23785, 8695,  -194,  420,   23626, 8914,  -202,  371,   23462,
+    9135,  -209,  324,   23295, 9358,  -215,  279,   23123, 9583,  -222,  236,   22948, 9809,
+    -230,  194,   22769, 10038, -237,  154,   22586, 10269, -243,  117,   22399, 10501, -250,
+    81,    22208, 10735, -258,  47,    22015, 10970, -265,  15,    21818, 11206, -271,  -16,
+    21618, 11444, -277,  -44,   21415, 11684, -283,  -71,   21208, 11924, -290,  -97,   20999,
+    12166, -296,  -121,  20786, 12409, -302,  -143,  20571, 12653, -306,  -163,  20354, 12898,
+    -311,  -183,  20134, 13143, -316,  -201,  19911, 13389, -321,  -218,  19686, 13635, -325,
+    -234,  19459, 13882, -328,  -248,  19230, 14130, -332,  -261,  18998, 14377, -335,  -273,
+    18765, 14625, -337,  -284,  18531, 14873, -339,  -294,  18295, 15121, -341,  -302,  18057,
+    15369, -341,  -310,  17817, 15617, -341,  -317,  17577, 15864, -340,  -323,  17335, 16111,
+    -340,  -328,  17092, 16357, -338,  -332,  16848, 16603, -336,  -336,  16603, 16848, -332,
+    -338,  16357, 17092, -328,  -340,  16111, 17335, -323,  -340,  15864, 17577, -317,  -341,
+    15617, 17817, -310,  -341,  15369, 18057, -302,  -341,  15121, 18295, -294,  -339,  14873,
+    18531, -284,  -337,  14625, 18765, -273,  -335,  14377, 18998, -261,  -332,  14130, 19230,
+    -248,  -328,  13882, 19459, -234,  -325,  13635, 19686, -218,  -321,  13389, 19911, -201,
+    -316,  13143, 20134, -183,  -311,  12898, 20354, -163,  -306,  12653, 20571, -143,  -302,
+    12409, 20786, -121,  -296,  12166, 20999, -97,   -290,  11924, 21208, -71,   -283,  11684,
+    21415, -44,   -277,  11444, 21618, -16,   -271,  11206, 21818, 15,    -265,  10970, 22015,
+    47,    -258,  10735, 22208, 81,    -250,  10501, 22399, 117,   -243,  10269, 22586, 154,
+    -237,  10038, 22769, 194,   -230,  9809,  22948, 236,   -222,  9583,  23123, 279,   -215,
+    9358,  23295, 324,   -209,  9135,  23462, 371,   -202,  8914,  23626, 420,   -194,  8695,
+    23785, 472,   -187,  8478,  23940, 526,   -181,  8264,  24091, 583,   -174,  8052,  24237,
+    641,   -166,  7842,  24379, 701,   -159,  7635,  24516, 764,   -153,  7430,  24648, 829,
+    -146,  7227,  24776, 897,   -140,  7027,  24899, 967,   -133,  6830,  25018, 1040,  -127,
+    6635,  25131, 1115,  -121,  6442,  25239, 1193,  -115,  6253,  25342, 1274,  -109,  6066,
+    25440, 1357,  -103,  5882,  25533, 1442,  -98,   5701,  25621, 1531,  -92,   5522,  25704,
+    1622,  -87,   5347,  25780, 1716,  -81,   5174,  25852, 1813,  -76,   5004,  25919, 1912,
+    -72,   4837,  25980, 2015,  -67,   4673,  26035, 2120,  -63,   4512,  26085, 2227,  -58,
+    4354,  26130, 2338,  -54,   4199,  26169, 2451,  -50,   4046,  26202, 2568,  -46,   3897,
+    26230, 2688,  -42,   3751,  26253, 2811,  -38,   3608,  26270, 2936,  -34,   3467,  26281,
+    3064,  -32,   3329,  26287, 3195};

 std::vector<s16> Interpolate(InterpolationState& state, std::vector<s16> input, double ratio) {
    if (input.size() < 2)
@@ -27,43 +147,51 @@ std::vector<s16> Interpolate(InterpolationState& state, std::vector<s16> input,

    if (ratio <= 0) {
        LOG_CRITICAL(Audio, "Nonsensical interpolation ratio {}", ratio);
-        ratio = 1.0;
+        return input;
    }

-    if (ratio != state.current_ratio) {
-        const double cutoff_frequency = std::min(0.5 / ratio, 0.5 * ratio);
-        state.nyquist = CascadingFilter::LowPass(std::clamp(cutoff_frequency, 0.0, 0.4), 3);
-        state.current_ratio = ratio;
-    }
-    state.nyquist.Process(input);
+    const s32 step{static_cast<s32>(ratio * 0x8000)};
+    const std::array<s16, 512>& lut = [step] {
+        if (step > 0xaaaa) {
+            return curve_lut0;
+        }
+        if (step <= 0x8000) {
+            return curve_lut1;
+        }
+        return curve_lut2;
+    }();

-    constexpr std::size_t taps = InterpolationState::lanczos_taps;
-    const std::size_t num_frames = input.size() / 2;
+    const std::size_t num_frames{input.size() / 2};

    std::vector<s16> output;
-    output.reserve(static_cast<std::size_t>(input.size() / ratio + 4));
+    output.reserve(static_cast<std::size_t>(input.size() / ratio + InterpolationState::taps));

-    double& pos = state.position;
-    auto& h = state.history;
-    for (std::size_t i = 0; i < num_frames; ++i) {
-        std::rotate(h.begin(), h.end() - 1, h.end());
-        h[0][0] = input[i * 2 + 0];
-        h[0][1] = input[i * 2 + 1];
+    for (std::size_t frame{}; frame < num_frames; ++frame) {
+        const std::size_t lut_index{(state.fraction >> 8) * InterpolationState::taps};

-        while (pos <= 1.0) {
-            double l = 0.0;
-            double r = 0.0;
-            for (std::size_t j = 0; j < h.size(); j++) {
-                const double lanczos_calc = Lanczos(taps, pos + j - taps + 1);
-                l += lanczos_calc * h[j][0];
-                r += lanczos_calc * h[j][1];
-            }
-            output.emplace_back(static_cast<s16>(std::clamp(l, -32768.0, 32767.0)));
-            output.emplace_back(static_cast<s16>(std::clamp(r, -32768.0, 32767.0)));
+        std::rotate(state.history.begin(), state.history.end() - 1, state.history.end());
+        state.history[0][0] = input[frame * 2 + 0];
+        state.history[0][1] = input[frame * 2 + 1];

-            pos += ratio;
+        while (state.position <= 1.0) {
+            const s32 left{state.history[0][0] * lut[lut_index + 0] +
+                           state.history[1][0] * lut[lut_index + 1] +
+                           state.history[2][0] * lut[lut_index + 2] +
+                           state.history[3][0] * lut[lut_index + 3]};
+            const s32 right{state.history[0][1] * lut[lut_index + 0] +
+                            state.history[1][1] * lut[lut_index + 1] +
+                            state.history[2][1] * lut[lut_index + 2] +
+                            state.history[3][1] * lut[lut_index + 3]};
+            const s32 new_offset{state.fraction + step};
+
+            state.fraction = new_offset & 0x7fff;
+
+            output.emplace_back(static_cast<s16>(std::clamp(left >> 15, SHRT_MIN, SHRT_MAX)));
+            output.emplace_back(static_cast<s16>(std::clamp(right >> 15, SHRT_MIN, SHRT_MAX)));
+
+            state.position += ratio;
        }
-        pos -= 1.0;
+        state.position -= 1.0;
    }

    return output;
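
Note on the hunk above: the new loop replaces the floating-point Lanczos kernel with a table lookup in Q15 fixed point. The sketch below isolates the three arithmetic steps (row selection from the 15-bit fraction, the 4-tap multiply-accumulate with a 15-bit shift, and the fraction update). `ApplyTaps`, `LutIndex` and `AdvanceFraction` are hypothetical helper names that do not appear in the patch, and the accumulator is widened to 64 bits here so that arbitrary test tables cannot overflow, whereas the patch accumulates in `s32`.

```cpp
#include <algorithm>
#include <array>
#include <climits>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not in the patch): maps a 15-bit fraction to the first
// coefficient of its 4-entry row. The top 7 bits select one of 128 rows.
constexpr std::size_t LutIndex(std::int32_t fraction) {
    return static_cast<std::size_t>(fraction >> 8) * 4;
}

// Hypothetical helper: advances the fractional position by `step` (Q15) and
// keeps only the low 15 bits, as the patch does with `& 0x7fff`.
constexpr std::int32_t AdvanceFraction(std::int32_t fraction, std::int32_t step) {
    return (fraction + step) & 0x7fff;
}

// Hypothetical helper: one output sample from four history samples and four
// Q15 coefficients starting at `lut_index`.
template <std::size_t N>
constexpr std::int16_t ApplyTaps(const std::array<std::int16_t, 4>& history,
                                 const std::array<std::int16_t, N>& lut,
                                 std::size_t lut_index) {
    std::int64_t acc = 0;
    for (std::size_t tap = 0; tap < 4; ++tap) {
        acc += static_cast<std::int64_t>(history[tap]) * lut[lut_index + tap];
    }
    // Drop the 15 fractional bits, then saturate to the s16 range.
    return static_cast<std::int16_t>(
        std::clamp<std::int64_t>(acc >> 15, SHRT_MIN, SHRT_MAX));
}
```

With the first row of `curve_lut1` (`-68, 32639, 69, -5`) the weights sum to roughly 32768, so the result stays close to the sample under the dominant tap.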
--- a/src/audio_core/algorithm/interpolate.h
+++ b/src/audio_core/algorithm/interpolate.h
@@ -6,19 +6,17 @@

 #include <array>
 #include <vector>
-#include "audio_core/algorithm/filter.h"
+
 #include "common/common_types.h"

 namespace AudioCore {

 struct InterpolationState {
-    static constexpr std::size_t lanczos_taps = 4;
-    static constexpr std::size_t history_size = lanczos_taps * 2 - 1;
-
-    double current_ratio = 0.0;
-    CascadingFilter nyquist;
-    std::array<std::array<s16, 2>, history_size> history = {};
-    double position = 0;
+    static constexpr std::size_t taps{4};
+    static constexpr std::size_t history_size{taps * 2 - 1};
+    std::array<std::array<s16, 2>, history_size> history{};
+    double position{};
+    s32 fraction{};
 };

 /// Interpolates input signal to produce output signal.
--- a/src/common/assert.h
+++ b/src/common/assert.h
@@ -28,22 +28,19 @@ __declspec(noinline, noreturn)
 }

 #define ASSERT(_a_)                                                                                \
-    do                                                                                             \
-        if (!(_a_)) {                                                                              \
-            assert_noinline_call([] { LOG_CRITICAL(Debug, "Assertion Failed!"); });                \
-        }                                                                                          \
-    while (0)
+    if (!(_a_)) {                                                                                  \
+        LOG_CRITICAL(Debug, "Assertion Failed!");                                                  \
+    }

 #define ASSERT_MSG(_a_, ...)                                                                       \
-    do                                                                                             \
-        if (!(_a_)) {                                                                              \
-            assert_noinline_call([&] { LOG_CRITICAL(Debug, "Assertion Failed!\n" __VA_ARGS__); }); \
-        }                                                                                          \
-    while (0)
+    if (!(_a_)) {                                                                                  \
+        LOG_CRITICAL(Debug, "Assertion Failed! " __VA_ARGS__);                                     \
+    }

-#define UNREACHABLE() assert_noinline_call([] { LOG_CRITICAL(Debug, "Unreachable code!"); })
+#define UNREACHABLE()                                                                              \
+    { LOG_CRITICAL(Debug, "Unreachable code!"); }
 #define UNREACHABLE_MSG(...)                                                                       \
-    assert_noinline_call([&] { LOG_CRITICAL(Debug, "Unreachable code!\n" __VA_ARGS__); })
+    { LOG_CRITICAL(Debug, "Unreachable code!\n" __VA_ARGS__); }

 #ifdef _DEBUG
 #define DEBUG_ASSERT(_a_) ASSERT(_a_)
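
Note on the hunk above: the removed macros were wrapped in `do ... while (0)`, while the new ones expand to a bare `if` block or a bare brace block. The sketch below illustrates, under stated assumptions, why the wrapper is commonly used. `CHECK_WRAPPED` and `CountFailures` are hypothetical names, and an integer counter stands in for `LOG_CRITICAL`.

```cpp
// Hypothetical stand-in for an assertion macro; `failures` replaces logging.
#define CHECK_WRAPPED(cond, failures)                                                              \
    do {                                                                                           \
        if (!(cond)) {                                                                             \
            ++(failures);                                                                          \
        }                                                                                          \
    } while (0)

// With a bare `if (...) { ... }` macro body, the `;` after the macro call
// below would terminate the outer `if`, and the `else` that follows would
// fail to compile. The do/while form behaves like a single statement.
constexpr int CountFailures(bool take_branch, bool cond) {
    int failures = 0;
    if (take_branch)
        CHECK_WRAPPED(cond, failures);
    else
        failures = -1; // marks the else path
    return failures;
}
```

Call sites that always use braces around `if`/`else` bodies are not affected by the difference.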
--- a/src/core/CMakeLists.txt
+++ b/src/core/CMakeLists.txt
@@ -131,8 +131,8 @@ add_library(core STATIC
    frontend/framebuffer_layout.cpp
    frontend/framebuffer_layout.h
    frontend/input.h
-    frontend/scope_acquire_window_context.cpp
-    frontend/scope_acquire_window_context.h
+    frontend/scope_acquire_context.cpp
+    frontend/scope_acquire_context.h
    gdbstub/gdbstub.cpp
    gdbstub/gdbstub.h
    hardware_interrupt_manager.cpp
@@ -187,6 +187,8 @@ add_library(core STATIC
    hle/kernel/synchronization.h
    hle/kernel/thread.cpp
    hle/kernel/thread.h
+    hle/kernel/time_manager.cpp
+    hle/kernel/time_manager.h
    hle/kernel/transfer_memory.cpp
    hle/kernel/transfer_memory.h
    hle/kernel/vm_manager.cpp
@@ -593,8 +595,12 @@ endif()

 if (ARCHITECTURE_x86_64)
    target_sources(core PRIVATE
-        arm/dynarmic/arm_dynarmic.cpp
-        arm/dynarmic/arm_dynarmic.h
+        arm/dynarmic/arm_dynarmic_32.cpp
+        arm/dynarmic/arm_dynarmic_32.h
+        arm/dynarmic/arm_dynarmic_64.cpp
+        arm/dynarmic/arm_dynarmic_64.h
+        arm/dynarmic/arm_dynarmic_cp15.cpp
+        arm/dynarmic/arm_dynarmic_cp15.h
    )
    target_link_libraries(core PRIVATE dynarmic)
 endif()
--- a/src/core/arm/arm_interface.h
+++ b/src/core/arm/arm_interface.h
@@ -25,7 +25,20 @@ public:
    explicit ARM_Interface(System& system_) : system{system_} {}
    virtual ~ARM_Interface() = default;

-    struct ThreadContext {
+    struct ThreadContext32 {
+        std::array<u32, 16> cpu_registers;
+        u32 cpsr;
+        std::array<u8, 4> padding;
+        std::array<u64, 32> fprs;
+        u32 fpscr;
+        u32 fpexc;
+        u32 tpidr;
+    };
+    // Internally within the kernel, it expects the AArch32 version of the
+    // thread context to be 344 bytes in size.
+    static_assert(sizeof(ThreadContext32) == 0x158);
+
+    struct ThreadContext64 {
        std::array<u64, 31> cpu_registers;
        u64 sp;
        u64 pc;
@@ -38,7 +51,7 @@ public:
    };
    // Internally within the kernel, it expects the AArch64 version of the
    // thread context to be 800 bytes in size.
-    static_assert(sizeof(ThreadContext) == 0x320);
+    static_assert(sizeof(ThreadContext64) == 0x320);

    /// Runs the CPU until an event happens
    virtual void Run() = 0;
@@ -130,17 +143,10 @@ public:
     */
    virtual void SetTPIDR_EL0(u64 value) = 0;

-    /**
-     * Saves the current CPU context
-     * @param ctx Thread context to save
-     */
-    virtual void SaveContext(ThreadContext& ctx) = 0;
-
-    /**
-     * Loads a CPU context
-     * @param ctx Thread context to load
-     */
-    virtual void LoadContext(const ThreadContext& ctx) = 0;
+    virtual void SaveContext(ThreadContext32& ctx) = 0;
+    virtual void SaveContext(ThreadContext64& ctx) = 0;
+    virtual void LoadContext(const ThreadContext32& ctx) = 0;
+    virtual void LoadContext(const ThreadContext64& ctx) = 0;

    /// Clears the exclusive monitor's state.
    virtual void ClearExclusiveState() = 0;
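
Note on the hunk above: the 344-byte (`0x158`) size asserted for `ThreadContext32` can be checked by hand. The registers take 16 × 4 = 64 bytes, `cpsr` and `padding` add 8, `fprs` adds 32 × 8 = 256, and the three trailing `u32` fields add 12, for 340 bytes, which is rounded up to the next multiple of 8. The sketch below reproduces the layout with standard fixed-width types. `ThreadContext32Sketch` is a hypothetical name, and the offsets assume a target where `std::uint64_t` is 8-byte aligned (as on x86-64).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of the struct in the diff, using <cstdint> types.
struct ThreadContext32Sketch {
    std::array<std::uint32_t, 16> cpu_registers; // offset 0x000, 64 bytes
    std::uint32_t cpsr;                          // offset 0x040
    std::array<std::uint8_t, 4> padding;         // offset 0x044, aligns fprs to 8
    std::array<std::uint64_t, 32> fprs;          // offset 0x048, 256 bytes
    std::uint32_t fpscr;                         // offset 0x148
    std::uint32_t fpexc;                         // offset 0x14C
    std::uint32_t tpidr;                         // offset 0x150
};                                               // 4 bytes of tail padding

static_assert(offsetof(ThreadContext32Sketch, fprs) == 0x48);
static_assert(sizeof(ThreadContext32Sketch) == 0x158);
```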
--- a/src/core/arm/dynarmic/arm_dynarmic_32.cpp
+++ b/src/core/arm/dynarmic/arm_dynarmic_32.cpp
@@ -0,0 +1,208 @@
+// Copyright 2020 yuzu emulator team
+// Licensed under GPLv2 or any later version
+// Refer to the license.txt file included.
+
+#include <cinttypes>
+#include <memory>
+#include <dynarmic/A32/a32.h>
+#include <dynarmic/A32/config.h>
+#include <dynarmic/A32/context.h>
+#include "common/microprofile.h"
+#include "core/arm/dynarmic/arm_dynarmic_32.h"
+#include "core/arm/dynarmic/arm_dynarmic_64.h"
+#include "core/arm/dynarmic/arm_dynarmic_cp15.h"
+#include "core/core.h"
+#include "core/core_manager.h"
+#include "core/core_timing.h"
+#include "core/hle/kernel/svc.h"
+#include "core/memory.h"
+
+namespace Core {
+
+class DynarmicCallbacks32 : public Dynarmic::A32::UserCallbacks {
+public:
+    explicit DynarmicCallbacks32(ARM_Dynarmic_32& parent) : parent(parent) {}
+
+    u8 MemoryRead8(u32 vaddr) override {
+        return parent.system.Memory().Read8(vaddr);
+    }
+    u16 MemoryRead16(u32 vaddr) override {
+        return parent.system.Memory().Read16(vaddr);
+    }
+    u32 MemoryRead32(u32 vaddr) override {
+        return parent.system.Memory().Read32(vaddr);
+    }
+    u64 MemoryRead64(u32 vaddr) override {
+        return parent.system.Memory().Read64(vaddr);
+    }
+
+    void MemoryWrite8(u32 vaddr, u8 value) override {
+        parent.system.Memory().Write8(vaddr, value);
+    }
+    void MemoryWrite16(u32 vaddr, u16 value) override {
+        parent.system.Memory().Write16(vaddr, value);
+    }
+    void MemoryWrite32(u32 vaddr, u32 value) override {
+        parent.system.Memory().Write32(vaddr, value);
+    }
+    void MemoryWrite64(u32 vaddr, u64 value) override {
+        parent.system.Memory().Write64(vaddr, value);
+    }
+
+    void InterpreterFallback(u32 pc, std::size_t num_instructions) override {
+        UNIMPLEMENTED();
+    }
+
+    void ExceptionRaised(u32 pc, Dynarmic::A32::Exception exception) override {
+        switch (exception) {
+        case Dynarmic::A32::Exception::UndefinedInstruction:
+        case Dynarmic::A32::Exception::UnpredictableInstruction:
+            break;
+        case Dynarmic::A32::Exception::Breakpoint:
+            break;
+        }
+        LOG_CRITICAL(Core_ARM, "ExceptionRaised(exception = {}, pc = {:08X}, code = {:08X})",
+                     static_cast<std::size_t>(exception), pc, MemoryReadCode(pc));
+        UNIMPLEMENTED();
+    }
+
+    void CallSVC(u32 swi) override {
+        Kernel::CallSVC(parent.system, swi);
+    }
+
+    void AddTicks(u64 ticks) override {
+        // Divide the number of ticks by the amount of CPU cores. TODO(Subv): This yields only a
+        // rough approximation of the amount of executed ticks in the system, it may be thrown off
+        // if not all cores are doing a similar amount of work. Instead of doing this, we should
+        // devise a way so that timing is consistent across all cores without increasing the ticks 4
+        // times.
+        u64 amortized_ticks = (ticks - num_interpreted_instructions) / Core::NUM_CPU_CORES;
+        // Always execute at least one tick.
+        amortized_ticks = std::max<u64>(amortized_ticks, 1);
+
+        parent.system.CoreTiming().AddTicks(amortized_ticks);
+        num_interpreted_instructions = 0;
+    }
+    u64 GetTicksRemaining() override {
+        return std::max(parent.system.CoreTiming().GetDowncount(), {});
+    }
+
+    ARM_Dynarmic_32& parent;
+    std::size_t num_interpreted_instructions{};
+    u64 tpidrro_el0{};
+    u64 tpidr_el0{};
+};
+
+std::shared_ptr<Dynarmic::A32::Jit> ARM_Dynarmic_32::MakeJit(Common::PageTable& page_table,
+                                                             std::size_t address_space_bits) const {
+    Dynarmic::A32::UserConfig config;
+    config.callbacks = cb.get();
+    // TODO(bunnei): Implement page table for 32-bit
+    // config.page_table = &page_table.pointers;
+    config.coprocessors[15] = std::make_shared<DynarmicCP15>((u32*)&CP15_regs[0]);
+    config.define_unpredictable_behaviour = true;
+    return std::make_unique<Dynarmic::A32::Jit>(config);
+}
+
+MICROPROFILE_DEFINE(ARM_Jit_Dynarmic_32, "ARM JIT", "Dynarmic", MP_RGB(255, 64, 64));
+
+void ARM_Dynarmic_32::Run() {
+    MICROPROFILE_SCOPE(ARM_Jit_Dynarmic_32);
+    jit->Run();
+}
+
+void ARM_Dynarmic_32::Step() {
+    cb->InterpreterFallback(jit->Regs()[15], 1);
+}
+
+ARM_Dynarmic_32::ARM_Dynarmic_32(System& system, ExclusiveMonitor& exclusive_monitor,
+                                 std::size_t core_index)
+    : ARM_Interface{system},
+      cb(std::make_unique<DynarmicCallbacks32>(*this)), core_index{core_index},
+      exclusive_monitor{dynamic_cast<DynarmicExclusiveMonitor&>(exclusive_monitor)} {}
+
+ARM_Dynarmic_32::~ARM_Dynarmic_32() = default;
+
+void ARM_Dynarmic_32::SetPC(u64 pc) {
+    jit->Regs()[15] = static_cast<u32>(pc);
+}
+
+u64 ARM_Dynarmic_32::GetPC() const {
+    return jit->Regs()[15];
+}
+
+u64 ARM_Dynarmic_32::GetReg(int index) const {
+    return jit->Regs()[index];
+}
+
+void ARM_Dynarmic_32::SetReg(int index, u64 value) {
+    jit->Regs()[index] = static_cast<u32>(value);
+}
+
+u128 ARM_Dynarmic_32::GetVectorReg(int index) const {
+    return {};
+}
+
+void ARM_Dynarmic_32::SetVectorReg(int index, u128 value) {}
+
+u32 ARM_Dynarmic_32::GetPSTATE() const {
+    return jit->Cpsr();
+}
+
+void ARM_Dynarmic_32::SetPSTATE(u32 cpsr) {
+    jit->SetCpsr(cpsr);
+}
+
+u64 ARM_Dynarmic_32::GetTlsAddress() const {
+    return CP15_regs[static_cast<std::size_t>(CP15Register::CP15_THREAD_URO)];
+}
+
+void ARM_Dynarmic_32::SetTlsAddress(VAddr address) {
+    CP15_regs[static_cast<std::size_t>(CP15Register::CP15_THREAD_URO)] = static_cast<u32>(address);
+}
+
+u64 ARM_Dynarmic_32::GetTPIDR_EL0() const {
+    return cb->tpidr_el0;
+}
+
+void ARM_Dynarmic_32::SetTPIDR_EL0(u64 value) {
+    cb->tpidr_el0 = value;
+}
+
+void ARM_Dynarmic_32::SaveContext(ThreadContext32& ctx) {
+    Dynarmic::A32::Context context;
+    jit->SaveContext(context);
+    ctx.cpu_registers = context.Regs();
+    ctx.cpsr = context.Cpsr();
+}
+
+void ARM_Dynarmic_32::LoadContext(const ThreadContext32& ctx) {
+    Dynarmic::A32::Context context;
+    context.Regs() = ctx.cpu_registers;
+    context.SetCpsr(ctx.cpsr);
+    jit->LoadContext(context);
+}
+
+void ARM_Dynarmic_32::PrepareReschedule() {
+    jit->HaltExecution();
+}
+
+void ARM_Dynarmic_32::ClearInstructionCache() {
+    jit->ClearCache();
+}
+
+void ARM_Dynarmic_32::ClearExclusiveState() {}
+
+void ARM_Dynarmic_32::PageTableChanged(Common::PageTable& page_table,
+                                       std::size_t new_address_space_size_in_bits) {
+    auto key = std::make_pair(&page_table, new_address_space_size_in_bits);
+    auto iter = jit_cache.find(key);
+    if (iter != jit_cache.end()) {
+        jit = iter->second;
+        return;
+    }
+    jit = MakeJit(page_table, new_address_space_size_in_bits);
+    jit_cache.emplace(key, jit);
+}
+
+} // namespace Core
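
Note on the file above: `PageTableChanged` reuses a JIT instance when the same (page table, address-space width) pair has been seen before, and creates one otherwise. The sketch below shows that lookup-or-create pattern in isolation. `FakePageTable`, `FakeJit`, `PairHashSketch` and `JitCacheSketch` are hypothetical stand-ins for illustration only. In particular, the hash combiner is an assumption and is not taken from `Common::PairHash`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <unordered_map>
#include <utility>

struct FakePageTable {};                            // hypothetical stand-in
struct FakeJit { std::size_t address_space_bits; }; // hypothetical stand-in

// Hypothetical pair hash; the mixing constant is a common choice, not the
// project's implementation.
struct PairHashSketch {
    template <typename A, typename B>
    std::size_t operator()(const std::pair<A, B>& p) const noexcept {
        const std::size_t h1 = std::hash<A>{}(p.first);
        const std::size_t h2 = std::hash<B>{}(p.second);
        return h1 ^ (h2 + 0x9e3779b9 + (h1 << 6) + (h1 >> 2));
    }
};

class JitCacheSketch {
public:
    // Returns the cached instance for this key, creating it on first use.
    std::shared_ptr<FakeJit> Select(FakePageTable* page_table, std::size_t bits) {
        assert(page_table != nullptr);
        const auto key = std::make_pair(page_table, bits);
        if (const auto iter = cache.find(key); iter != cache.end()) {
            return iter->second;
        }
        auto jit = std::make_shared<FakeJit>(FakeJit{bits});
        cache.emplace(key, jit);
        ++created;
        return jit;
    }

    std::size_t created{}; // number of instances built so far

private:
    using Key = std::pair<FakePageTable*, std::size_t>;
    std::unordered_map<Key, std::shared_ptr<FakeJit>, PairHashSketch> cache;
};
```

Because the key holds a raw pointer, a cache like this relies on each page table outliving its entry or on the entry never being looked up again after the table is destroyed.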
--- a/src/core/arm/dynarmic/arm_dynarmic_32.h
+++ b/src/core/arm/dynarmic/arm_dynarmic_32.h
@@ -0,0 +1,77 @@
+// Copyright 2020 yuzu emulator team
+// Licensed under GPLv2 or any later version
+// Refer to the license.txt file included.
+
+#pragma once
+
+#include <memory>
+#include <unordered_map>
+
+#include <dynarmic/A32/a32.h>
+#include <dynarmic/A64/a64.h>
+#include <dynarmic/A64/exclusive_monitor.h>
+#include "common/common_types.h"
+#include "common/hash.h"
+#include "core/arm/arm_interface.h"
+#include "core/arm/exclusive_monitor.h"
+
+namespace Memory {
+class Memory;
+}
+
+namespace Core {
+
+class DynarmicCallbacks32;
+class DynarmicExclusiveMonitor;
+class System;
+
+class ARM_Dynarmic_32 final : public ARM_Interface {
+public:
+    ARM_Dynarmic_32(System& system, ExclusiveMonitor& exclusive_monitor, std::size_t core_index);
+    ~ARM_Dynarmic_32() override;
+
+    void SetPC(u64 pc) override;
+    u64 GetPC() const override;
+    u64 GetReg(int index) const override;
+    void SetReg(int index, u64 value) override;
+    u128 GetVectorReg(int index) const override;
+    void SetVectorReg(int index, u128 value) override;
+    u32 GetPSTATE() const override;
+    void SetPSTATE(u32 pstate) override;
+    void Run() override;
+    void Step() override;
+    VAddr GetTlsAddress() const override;
+    void SetTlsAddress(VAddr address) override;
+    void SetTPIDR_EL0(u64 value) override;
+    u64 GetTPIDR_EL0() const override;
+
+    void SaveContext(ThreadContext32& ctx) override;
+    void SaveContext(ThreadContext64& ctx) override {}
+    void LoadContext(const ThreadContext32& ctx) override;
+    void LoadContext(const ThreadContext64& ctx) override {}
+
+    void PrepareReschedule() override;
+    void ClearExclusiveState() override;
+
+    void ClearInstructionCache() override;
+    void PageTableChanged(Common::PageTable& new_page_table,
+                          std::size_t new_address_space_size_in_bits) override;
+
+private:
+    std::shared_ptr<Dynarmic::A32::Jit> MakeJit(Common::PageTable& page_table,
+                                                std::size_t address_space_bits) const;
+
+    using JitCacheKey = std::pair<Common::PageTable*, std::size_t>;
+    using JitCacheType =
+        std::unordered_map<JitCacheKey, std::shared_ptr<Dynarmic::A32::Jit>, Common::PairHash>;
+
+    friend class DynarmicCallbacks32;
+    std::unique_ptr<DynarmicCallbacks32> cb;
+    JitCacheType jit_cache;
+    std::shared_ptr<Dynarmic::A32::Jit> jit;
+    std::size_t core_index;
+    DynarmicExclusiveMonitor& exclusive_monitor;
+    std::array<u32, 84> CP15_regs{};
+};
+
+} // namespace Core
--- a/src/core/arm/dynarmic/arm_dynarmic_64.cpp
+++ b/src/core/arm/dynarmic/arm_dynarmic_64.cpp
@@ -8,7 +8,7 @@
 #include <dynarmic/A64/config.h>
 #include "common/logging/log.h"
 #include "common/microprofile.h"
-#include "core/arm/dynarmic/arm_dynarmic.h"
+#include "core/arm/dynarmic/arm_dynarmic_64.h"
 #include "core/core.h"
 #include "core/core_manager.h"
 #include "core/core_timing.h"
@@ -25,9 +25,9 @@ namespace Core {

 using Vector = Dynarmic::A64::Vector;

-class ARM_Dynarmic_Callbacks : public Dynarmic::A64::UserCallbacks {
+class DynarmicCallbacks64 : public Dynarmic::A64::UserCallbacks {
 public:
-    explicit ARM_Dynarmic_Callbacks(ARM_Dynarmic& parent) : parent(parent) {}
+    explicit DynarmicCallbacks64(ARM_Dynarmic_64& parent) : parent(parent) {}

    u8 MemoryRead8(u64 vaddr) override {
        return parent.system.Memory().Read8(vaddr);
@@ -68,7 +68,7 @@ public:
        LOG_INFO(Core_ARM, "Unicorn fallback @ 0x{:X} for {} instructions (instr = {:08X})", pc,
                 num_instructions, MemoryReadCode(pc));

-        ARM_Interface::ThreadContext ctx;
+        ARM_Interface::ThreadContext64 ctx;
        parent.SaveContext(ctx);
        parent.inner_unicorn.LoadContext(ctx);
        parent.inner_unicorn.ExecuteInstructions(num_instructions);
@@ -90,7 +90,7 @@ public:
                parent.jit->HaltExecution();
                parent.SetPC(pc);
                Kernel::Thread* const thread = parent.system.CurrentScheduler().GetCurrentThread();
-                parent.SaveContext(thread->GetContext());
+                parent.SaveContext(thread->GetContext64());
                GDBStub::Break();
                GDBStub::SendTrap(thread, 5);
                return;
@@ -126,14 +126,14 @@ public:
        return Timing::CpuCyclesToClockCycles(parent.system.CoreTiming().GetTicks());
    }

-    ARM_Dynarmic& parent;
+    ARM_Dynarmic_64& parent;
    std::size_t num_interpreted_instructions = 0;
    u64 tpidrro_el0 = 0;
    u64 tpidr_el0 = 0;
 };

-std::unique_ptr<Dynarmic::A64::Jit> ARM_Dynarmic::MakeJit(Common::PageTable& page_table,
-                                                          std::size_t address_space_bits) const {
+std::shared_ptr<Dynarmic::A64::Jit> ARM_Dynarmic_64::MakeJit(Common::PageTable& page_table,
+                                                             std::size_t address_space_bits) const {
    Dynarmic::A64::UserConfig config;

    // Callbacks
@@ -159,79 +159,79 @@ std::unique_ptr<Dynarmic::A64::Jit> ARM_Dynarmic::MakeJit(Common::PageTable& pag
    // Unpredictable instructions
    config.define_unpredictable_behaviour = true;

-    return std::make_unique<Dynarmic::A64::Jit>(config);
+    return std::make_shared<Dynarmic::A64::Jit>(config);
 }

-MICROPROFILE_DEFINE(ARM_Jit_Dynarmic, "ARM JIT", "Dynarmic", MP_RGB(255, 64, 64));
+MICROPROFILE_DEFINE(ARM_Jit_Dynarmic_64, "ARM JIT", "Dynarmic", MP_RGB(255, 64, 64));

-void ARM_Dynarmic::Run() {
-    MICROPROFILE_SCOPE(ARM_Jit_Dynarmic);
+void ARM_Dynarmic_64::Run() {
+    MICROPROFILE_SCOPE(ARM_Jit_Dynarmic_64);

    jit->Run();
 }

-void ARM_Dynarmic::Step() {
+void ARM_Dynarmic_64::Step() {
    cb->InterpreterFallback(jit->GetPC(), 1);
 }

-ARM_Dynarmic::ARM_Dynarmic(System& system, ExclusiveMonitor& exclusive_monitor,
-                           std::size_t core_index)
+ARM_Dynarmic_64::ARM_Dynarmic_64(System& system, ExclusiveMonitor& exclusive_monitor,
+                                 std::size_t core_index)
    : ARM_Interface{system},
-      cb(std::make_unique<ARM_Dynarmic_Callbacks>(*this)), inner_unicorn{system},
+      cb(std::make_unique<DynarmicCallbacks64>(*this)), inner_unicorn{system},
      core_index{core_index}, exclusive_monitor{
                                  dynamic_cast<DynarmicExclusiveMonitor&>(exclusive_monitor)} {}

-ARM_Dynarmic::~ARM_Dynarmic() = default;
+ARM_Dynarmic_64::~ARM_Dynarmic_64() = default;

-void ARM_Dynarmic::SetPC(u64 pc) {
+void ARM_Dynarmic_64::SetPC(u64 pc) {
    jit->SetPC(pc);
 }

-u64 ARM_Dynarmic::GetPC() const {
+u64 ARM_Dynarmic_64::GetPC() const {
    return jit->GetPC();
 }

-u64 ARM_Dynarmic::GetReg(int index) const {
+u64 ARM_Dynarmic_64::GetReg(int index) const {
    return jit->GetRegister(index);
 }

-void ARM_Dynarmic::SetReg(int index, u64 value) {
+void ARM_Dynarmic_64::SetReg(int index, u64 value) {
    jit->SetRegister(index, value);
 }

-u128 ARM_Dynarmic::GetVectorReg(int index) const {
+u128 ARM_Dynarmic_64::GetVectorReg(int index) const {
    return jit->GetVector(index);
 }

-void ARM_Dynarmic::SetVectorReg(int index, u128 value) {
+void ARM_Dynarmic_64::SetVectorReg(int index, u128 value) {
    jit->SetVector(index, value);
 }

-u32 ARM_Dynarmic::GetPSTATE() const {
+u32 ARM_Dynarmic_64::GetPSTATE() const {
    return jit->GetPstate();
 }

-void ARM_Dynarmic::SetPSTATE(u32 pstate) {
+void ARM_Dynarmic_64::SetPSTATE(u32 pstate) {
    jit->SetPstate(pstate);
 }

-u64 ARM_Dynarmic::GetTlsAddress() const {
+u64 ARM_Dynarmic_64::GetTlsAddress() const {
    return cb->tpidrro_el0;
 }

-void ARM_Dynarmic::SetTlsAddress(VAddr address) {
+void ARM_Dynarmic_64::SetTlsAddress(VAddr address) {
    cb->tpidrro_el0 = address;
 }

-u64 ARM_Dynarmic::GetTPIDR_EL0() const {
+u64 ARM_Dynarmic_64::GetTPIDR_EL0() const {
    return cb->tpidr_el0;
 }

-void ARM_Dynarmic::SetTPIDR_EL0(u64 value) {
+void ARM_Dynarmic_64::SetTPIDR_EL0(u64 value) {
    cb->tpidr_el0 = value;
 }

-void ARM_Dynarmic::SaveContext(ThreadContext& ctx) {
+void ARM_Dynarmic_64::SaveContext(ThreadContext64& ctx) {
    ctx.cpu_registers = jit->GetRegisters();
    ctx.sp = jit->GetSP();
    ctx.pc = jit->GetPC();
@@ -242,7 +242,7 @@ void ARM_Dynarmic::SaveContext(ThreadContext& ctx) {
    ctx.tpidr = cb->tpidr_el0;
 }

-void ARM_Dynarmic::LoadContext(const ThreadContext& ctx) {
+void ARM_Dynarmic_64::LoadContext(const ThreadContext64& ctx) {
    jit->SetRegisters(ctx.cpu_registers);
    jit->SetSP(ctx.sp);
    jit->SetPC(ctx.pc);
@@ -253,25 +253,32 @@ void ARM_Dynarmic::LoadContext(const ThreadContext& ctx) {
    SetTPIDR_EL0(ctx.tpidr);
 }

-void ARM_Dynarmic::PrepareReschedule() {
+void ARM_Dynarmic_64::PrepareReschedule() {
    jit->HaltExecution();
 }

-void ARM_Dynarmic::ClearInstructionCache() {
+void ARM_Dynarmic_64::ClearInstructionCache() {
    jit->ClearCache();
 }

-void ARM_Dynarmic::ClearExclusiveState() {
+void ARM_Dynarmic_64::ClearExclusiveState() {
    jit->ClearExclusiveState();
 }

-void ARM_Dynarmic::PageTableChanged(Common::PageTable& page_table,
-                                    std::size_t new_address_space_size_in_bits) {
+void ARM_Dynarmic_64::PageTableChanged(Common::PageTable& page_table,
+                                       std::size_t new_address_space_size_in_bits) {
+    auto key = std::make_pair(&page_table, new_address_space_size_in_bits);
+    auto iter = jit_cache.find(key);
+    if (iter != jit_cache.end()) {
+        jit = iter->second;
+        return;
+    }
    jit = MakeJit(page_table, new_address_space_size_in_bits);
+    jit_cache.emplace(key, jit);
 }

-DynarmicExclusiveMonitor::DynarmicExclusiveMonitor(Memory::Memory& memory_, std::size_t core_count)
-    : monitor(core_count), memory{memory_} {}
+DynarmicExclusiveMonitor::DynarmicExclusiveMonitor(Memory::Memory& memory, std::size_t core_count)
+    : monitor(core_count), memory{memory} {}

 DynarmicExclusiveMonitor::~DynarmicExclusiveMonitor() = default;

--- a/src/core/arm/dynarmic/arm_dynarmic_64.h
+++ b/src/core/arm/dynarmic/arm_dynarmic_64.h
@@ -5,9 +5,12 @@
 #pragma once

 #include <memory>
+#include <unordered_map>
+
 #include <dynarmic/A64/a64.h>
 #include <dynarmic/A64/exclusive_monitor.h>
 #include "common/common_types.h"
+#include "common/hash.h"
 #include "core/arm/arm_interface.h"
 #include "core/arm/exclusive_monitor.h"
 #include "core/arm/unicorn/arm_unicorn.h"
@@ -18,14 +21,14 @@ class Memory;

 namespace Core {

-class ARM_Dynarmic_Callbacks;
+class DynarmicCallbacks64;
 class DynarmicExclusiveMonitor;
 class System;

-class ARM_Dynarmic final : public ARM_Interface {
+class ARM_Dynarmic_64 final : public ARM_Interface {
 public:
-    ARM_Dynarmic(System& system, ExclusiveMonitor& exclusive_monitor, std::size_t core_index);
-    ~ARM_Dynarmic() override;
+    ARM_Dynarmic_64(System& system, ExclusiveMonitor& exclusive_monitor, std::size_t core_index);
+    ~ARM_Dynarmic_64() override;

    void SetPC(u64 pc) override;
    u64 GetPC() const override;
@@ -42,8 +45,10 @@ public:
    void SetTPIDR_EL0(u64 value) override;
    u64 GetTPIDR_EL0() const override;

-    void SaveContext(ThreadContext& ctx) override;
-    void LoadContext(const ThreadContext& ctx) override;
+    void SaveContext(ThreadContext32& ctx) override {}
+    void SaveContext(ThreadContext64& ctx) override;
+    void LoadContext(const ThreadContext32& ctx) override {}
+    void LoadContext(const ThreadContext64& ctx) override;

    void PrepareReschedule() override;
    void ClearExclusiveState() override;
@@ -53,12 +58,17 @@ public:
                          std::size_t new_address_space_size_in_bits) override;

 private:
-    std::unique_ptr<Dynarmic::A64::Jit> MakeJit(Common::PageTable& page_table,
+    std::shared_ptr<Dynarmic::A64::Jit> MakeJit(Common::PageTable& page_table,
                                                std::size_t address_space_bits) const;

-    friend class ARM_Dynarmic_Callbacks;
-    std::unique_ptr<ARM_Dynarmic_Callbacks> cb;
-    std::unique_ptr<Dynarmic::A64::Jit> jit;
+    using JitCacheKey = std::pair<Common::PageTable*, std::size_t>;
+    using JitCacheType =
+        std::unordered_map<JitCacheKey, std::shared_ptr<Dynarmic::A64::Jit>, Common::PairHash>;
+
+    friend class DynarmicCallbacks64;
+    std::unique_ptr<DynarmicCallbacks64> cb;
+    JitCacheType jit_cache;
+    std::shared_ptr<Dynarmic::A64::Jit> jit;
    ARM_Unicorn inner_unicorn;

    std::size_t core_index;
@@ -67,7 +77,7 @@ private:

 class DynarmicExclusiveMonitor final : public ExclusiveMonitor {
 public:
-    explicit DynarmicExclusiveMonitor(Memory::Memory& memory_, std::size_t core_count);
+    explicit DynarmicExclusiveMonitor(Memory::Memory& memory, std::size_t core_count);
    ~DynarmicExclusiveMonitor() override;

    void SetExclusive(std::size_t core_index, VAddr addr) override;
@@ -80,7 +90,7 @@ public:
    bool ExclusiveWrite128(std::size_t core_index, VAddr vaddr, u128 value) override;

 private:
-    friend class ARM_Dynarmic;
+    friend class ARM_Dynarmic_64;
    Dynarmic::A64::ExclusiveMonitor monitor;
    Memory::Memory& memory;
 };
--- a/src/core/arm/dynarmic/arm_dynarmic_cp15.cpp
+++ b/src/core/arm/dynarmic/arm_dynarmic_cp15.cpp
@@ -0,0 +1,80 @@
+// Copyright 2017 Citra Emulator Project
+// Licensed under GPLv2 or any later version
+// Refer to the license.txt file included.
+
+#include "core/arm/dynarmic/arm_dynarmic_cp15.h"
+
+using Callback = Dynarmic::A32::Coprocessor::Callback;
+using CallbackOrAccessOneWord = Dynarmic::A32::Coprocessor::CallbackOrAccessOneWord;
+using CallbackOrAccessTwoWords = Dynarmic::A32::Coprocessor::CallbackOrAccessTwoWords;
+
+std::optional<Callback> DynarmicCP15::CompileInternalOperation(bool two, unsigned opc1,
+                                                               CoprocReg CRd, CoprocReg CRn,
+                                                               CoprocReg CRm, unsigned opc2) {
+    return {};
+}
+
+CallbackOrAccessOneWord DynarmicCP15::CompileSendOneWord(bool two, unsigned opc1, CoprocReg CRn,
+                                                         CoprocReg CRm, unsigned opc2) {
+    // TODO(merry): Privileged CP15 registers
+
+    if (!two && CRn == CoprocReg::C7 && opc1 == 0 && CRm == CoprocReg::C5 && opc2 == 4) {
+        // This is a dummy write, we ignore the value written here.
+        return &CP15[static_cast<std::size_t>(CP15Register::CP15_FLUSH_PREFETCH_BUFFER)];
+    }
+
+    if (!two && CRn == CoprocReg::C7 && opc1 == 0 && CRm == CoprocReg::C10) {
+        switch (opc2) {
+        case 4:
+            // This is a dummy write, we ignore the value written here.
+            return &CP15[static_cast<std::size_t>(CP15Register::CP15_DATA_SYNC_BARRIER)];
+        case 5:
+            // This is a dummy write, we ignore the value written here.
+            return &CP15[static_cast<std::size_t>(CP15Register::CP15_DATA_MEMORY_BARRIER)];
+        default:
+            return {};
+        }
+    }
+
+    if (!two && CRn == CoprocReg::C13 && opc1 == 0 && CRm == CoprocReg::C0 && opc2 == 2) {
+        return &CP15[static_cast<std::size_t>(CP15Register::CP15_THREAD_UPRW)];
+    }
+
+    return {};
+}
+
+CallbackOrAccessTwoWords DynarmicCP15::CompileSendTwoWords(bool two, unsigned opc, CoprocReg CRm) {
+    return {};
+}
+
+CallbackOrAccessOneWord DynarmicCP15::CompileGetOneWord(bool two, unsigned opc1, CoprocReg CRn,
+                                                        CoprocReg CRm, unsigned opc2) {
+    // TODO(merry): Privileged CP15 registers
+
+    if (!two && CRn == CoprocReg::C13 && opc1 == 0 && CRm == CoprocReg::C0) {
+        switch (opc2) {
+        case 2:
+            return &CP15[static_cast<std::size_t>(CP15Register::CP15_THREAD_UPRW)];
+        case 3:
+            return &CP15[static_cast<std::size_t>(CP15Register::CP15_THREAD_URO)];
+        default:
+            return {};
+        }
+    }
+
+    return {};
+}
+
+CallbackOrAccessTwoWords DynarmicCP15::CompileGetTwoWords(bool two, unsigned opc, CoprocReg CRm) {
+    return {};
+}
+
+std::optional<Callback> DynarmicCP15::CompileLoadWords(bool two, bool long_transfer, CoprocReg CRd,
+                                                       std::optional<u8> option) {
+    return {};
+}
+
+std::optional<Callback> DynarmicCP15::CompileStoreWords(bool two, bool long_transfer, CoprocReg CRd,
+                                                        std::optional<u8> option) {
+    return {};
+}
--- a/src/core/arm/dynarmic/arm_dynarmic_cp15.h
+++ b/src/core/arm/dynarmic/arm_dynarmic_cp15.h
@@ -0,0 +1,152 @@
+// Copyright 2017 Citra Emulator Project
+// Licensed under GPLv2 or any later version
+// Refer to the license.txt file included.
+
+#pragma once
+
+#include <memory>
+#include <optional>
+
+#include <dynarmic/A32/coprocessor.h>
+#include "common/common_types.h"
+
+enum class CP15Register {
+    // c0 - Information registers
+    CP15_MAIN_ID,
+    CP15_CACHE_TYPE,
+    CP15_TCM_STATUS,
+    CP15_TLB_TYPE,
+    CP15_CPU_ID,
+    CP15_PROCESSOR_FEATURE_0,
+    CP15_PROCESSOR_FEATURE_1,
+    CP15_DEBUG_FEATURE_0,
+    CP15_AUXILIARY_FEATURE_0,
+    CP15_MEMORY_MODEL_FEATURE_0,
+    CP15_MEMORY_MODEL_FEATURE_1,
+    CP15_MEMORY_MODEL_FEATURE_2,
+    CP15_MEMORY_MODEL_FEATURE_3,
+    CP15_ISA_FEATURE_0,
+    CP15_ISA_FEATURE_1,
+    CP15_ISA_FEATURE_2,
+    CP15_ISA_FEATURE_3,
+    CP15_ISA_FEATURE_4,
+
+    // c1 - Control registers
+    CP15_CONTROL,
+    CP15_AUXILIARY_CONTROL,
+    CP15_COPROCESSOR_ACCESS_CONTROL,
+
+    // c2 - Translation table registers
+    CP15_TRANSLATION_BASE_TABLE_0,
+    CP15_TRANSLATION_BASE_TABLE_1,
+    CP15_TRANSLATION_BASE_CONTROL,
+    CP15_DOMAIN_ACCESS_CONTROL,
+    CP15_RESERVED,
+
+    // c5 - Fault status registers
+    CP15_FAULT_STATUS,
+    CP15_INSTR_FAULT_STATUS,
+    CP15_COMBINED_DATA_FSR = CP15_FAULT_STATUS,
+    CP15_INST_FSR,
+
+    // c6 - Fault Address registers
+    CP15_FAULT_ADDRESS,
+    CP15_COMBINED_DATA_FAR = CP15_FAULT_ADDRESS,
+    CP15_WFAR,
+    CP15_IFAR,
+
+    // c7 - Cache operation registers
+    CP15_WAIT_FOR_INTERRUPT,
+    CP15_PHYS_ADDRESS,
+    CP15_INVALIDATE_INSTR_CACHE,
+    CP15_INVALIDATE_INSTR_CACHE_USING_MVA,
+    CP15_INVALIDATE_INSTR_CACHE_USING_INDEX,
+    CP15_FLUSH_PREFETCH_BUFFER,
+    CP15_FLUSH_BRANCH_TARGET_CACHE,
+    CP15_FLUSH_BRANCH_TARGET_CACHE_ENTRY,
+    CP15_INVALIDATE_DATA_CACHE,
+    CP15_INVALIDATE_DATA_CACHE_LINE_USING_MVA,
+    CP15_INVALIDATE_DATA_CACHE_LINE_USING_INDEX,
+    CP15_INVALIDATE_DATA_AND_INSTR_CACHE,
+    CP15_CLEAN_DATA_CACHE,
+    CP15_CLEAN_DATA_CACHE_LINE_USING_MVA,
+    CP15_CLEAN_DATA_CACHE_LINE_USING_INDEX,
+    CP15_DATA_SYNC_BARRIER,
+    CP15_DATA_MEMORY_BARRIER,
+    CP15_CLEAN_AND_INVALIDATE_DATA_CACHE,
+    CP15_CLEAN_AND_INVALIDATE_DATA_CACHE_LINE_USING_MVA,
+    CP15_CLEAN_AND_INVALIDATE_DATA_CACHE_LINE_USING_INDEX,
+
+    // c8 - TLB operations
+    CP15_INVALIDATE_ITLB,
+    CP15_INVALIDATE_ITLB_SINGLE_ENTRY,
+    CP15_INVALIDATE_ITLB_ENTRY_ON_ASID_MATCH,
+    CP15_INVALIDATE_ITLB_ENTRY_ON_MVA,
+    CP15_INVALIDATE_DTLB,
+    CP15_INVALIDATE_DTLB_SINGLE_ENTRY,
+    CP15_INVALIDATE_DTLB_ENTRY_ON_ASID_MATCH,
+    CP15_INVALIDATE_DTLB_ENTRY_ON_MVA,
+    CP15_INVALIDATE_UTLB,
+    CP15_INVALIDATE_UTLB_SINGLE_ENTRY,
+    CP15_INVALIDATE_UTLB_ENTRY_ON_ASID_MATCH,
+    CP15_INVALIDATE_UTLB_ENTRY_ON_MVA,
+
+    // c9 - Data cache lockdown register
+    CP15_DATA_CACHE_LOCKDOWN,
+
+    // c10 - TLB/Memory map registers
+    CP15_TLB_LOCKDOWN,
+    CP15_PRIMARY_REGION_REMAP,
+    CP15_NORMAL_REGION_REMAP,
+
+    // c13 - Thread related registers
+    CP15_PID,
+    CP15_CONTEXT_ID,
+    CP15_THREAD_UPRW, // Thread ID register - User/Privileged Read/Write
+    CP15_THREAD_URO,  // Thread ID register - User Read Only (Privileged R/W)
+    CP15_THREAD_PRW,  // Thread ID register - Privileged R/W only.
+
+    // c15 - Performance and TLB lockdown registers
+    CP15_PERFORMANCE_MONITOR_CONTROL,
+    CP15_CYCLE_COUNTER,
+    CP15_COUNT_0,
+    CP15_COUNT_1,
+    CP15_READ_MAIN_TLB_LOCKDOWN_ENTRY,
+    CP15_WRITE_MAIN_TLB_LOCKDOWN_ENTRY,
+    CP15_MAIN_TLB_LOCKDOWN_VIRT_ADDRESS,
+    CP15_MAIN_TLB_LOCKDOWN_PHYS_ADDRESS,
+    CP15_MAIN_TLB_LOCKDOWN_ATTRIBUTE,
+    CP15_TLB_DEBUG_CONTROL,
+
+    // Skyeye defined
+    CP15_TLB_FAULT_ADDR,
+    CP15_TLB_FAULT_STATUS,
+
+    // Not an actual register.
+    // All registers should be defined above this.
+    CP15_REGISTER_COUNT,
+};
+
+class DynarmicCP15 final : public Dynarmic::A32::Coprocessor {
+public:
+    using CoprocReg = Dynarmic::A32::CoprocReg;
+
+    explicit DynarmicCP15(u32* cp15) : CP15(cp15) {}
+
+    std::optional<Callback> CompileInternalOperation(bool two, unsigned opc1, CoprocReg CRd,
+                                                     CoprocReg CRn, CoprocReg CRm,
+                                                     unsigned opc2) override;
+    CallbackOrAccessOneWord CompileSendOneWord(bool two, unsigned opc1, CoprocReg CRn,
+                                               CoprocReg CRm, unsigned opc2) override;
+    CallbackOrAccessTwoWords CompileSendTwoWords(bool two, unsigned opc, CoprocReg CRm) override;
+    CallbackOrAccessOneWord CompileGetOneWord(bool two, unsigned opc1, CoprocReg CRn, CoprocReg CRm,
+                                              unsigned opc2) override;
+    CallbackOrAccessTwoWords CompileGetTwoWords(bool two, unsigned opc, CoprocReg CRm) override;
+    std::optional<Callback> CompileLoadWords(bool two, bool long_transfer, CoprocReg CRd,
+                                             std::optional<u8> option) override;
+    std::optional<Callback> CompileStoreWords(bool two, bool long_transfer, CoprocReg CRd,
+                                              std::optional<u8> option) override;
+
+private:
+    u32* CP15{};
+};
--- a/src/core/arm/exclusive_monitor.cpp
+++ b/src/core/arm/exclusive_monitor.cpp
@@ -3,7 +3,7 @@
 // Refer to the license.txt file included.

 #ifdef ARCHITECTURE_x86_64
-#include "core/arm/dynarmic/arm_dynarmic.h"
+#include "core/arm/dynarmic/arm_dynarmic_64.h"
 #endif
 #include "core/arm/exclusive_monitor.h"
 #include "core/memory.h"
--- a/src/core/arm/unicorn/arm_unicorn.cpp
+++ b/src/core/arm/unicorn/arm_unicorn.cpp
@@ -53,7 +53,7 @@ static bool UnmappedMemoryHook(uc_engine* uc, uc_mem_type type, u64 addr, int si
                               void* user_data) {
    auto* const system = static_cast<System*>(user_data);

-    ARM_Interface::ThreadContext ctx{};
+    ARM_Interface::ThreadContext64 ctx{};
    system->CurrentArmInterface().SaveContext(ctx);
    ASSERT_MSG(false, "Attempted to read from unmapped memory: 0x{:X}, pc=0x{:X}, lr=0x{:X}", addr,
               ctx.pc, ctx.cpu_registers[30]);
@@ -179,7 +179,7 @@ void ARM_Unicorn::ExecuteInstructions(std::size_t num_instructions) {
        }

        Kernel::Thread* const thread = system.CurrentScheduler().GetCurrentThread();
-        SaveContext(thread->GetContext());
+        SaveContext(thread->GetContext64());
        if (last_bkpt_hit || GDBStub::IsMemoryBreak() || GDBStub::GetCpuStepFlag()) {
            last_bkpt_hit = false;
            GDBStub::Break();
@@ -188,7 +188,7 @@ void ARM_Unicorn::ExecuteInstructions(std::size_t num_instructions) {
    }
 }

-void ARM_Unicorn::SaveContext(ThreadContext& ctx) {
+void ARM_Unicorn::SaveContext(ThreadContext64& ctx) {
    int uregs[32];
    void* tregs[32];

@@ -215,7 +215,7 @@ void ARM_Unicorn::SaveContext(ThreadContext& ctx) {
    CHECKED(uc_reg_read_batch(uc, uregs, tregs, 32));
 }

-void ARM_Unicorn::LoadContext(const ThreadContext& ctx) {
+void ARM_Unicorn::LoadContext(const ThreadContext64& ctx) {
    int uregs[32];
    void* tregs[32];

--- a/src/core/arm/unicorn/arm_unicorn.h
+++ b/src/core/arm/unicorn/arm_unicorn.h
@@ -30,8 +30,6 @@ public:
    void SetTlsAddress(VAddr address) override;
    void SetTPIDR_EL0(u64 value) override;
    u64 GetTPIDR_EL0() const override;
-    void SaveContext(ThreadContext& ctx) override;
-    void LoadContext(const ThreadContext& ctx) override;
    void PrepareReschedule() override;
    void ClearExclusiveState() override;
    void ExecuteInstructions(std::size_t num_instructions);
@@ -41,6 +39,11 @@ public:
    void PageTableChanged(Common::PageTable&, std::size_t) override {}
    void RecordBreak(GDBStub::BreakpointAddress bkpt);

+    void SaveContext(ThreadContext32& ctx) override {}
+    void SaveContext(ThreadContext64& ctx) override;
+    void LoadContext(const ThreadContext32& ctx) override {}
+    void LoadContext(const ThreadContext64& ctx) override;
+
 private:
    static void InterruptHook(uc_engine* uc, u32 int_no, void* user_data);

--- a/src/core/core.cpp
+++ b/src/core/core.cpp
@@ -24,6 +24,7 @@
 #include "core/file_sys/sdmc_factory.h"
 #include "core/file_sys/vfs_concat.h"
 #include "core/file_sys/vfs_real.h"
+#include "core/frontend/scope_acquire_context.h"
 #include "core/gdbstub/gdbstub.h"
 #include "core/hardware_interrupt_manager.h"
 #include "core/hle/kernel/client_port.h"
@@ -184,6 +185,8 @@ struct System::Impl {

    ResultStatus Load(System& system, Frontend::EmuWindow& emu_window,
                      const std::string& filepath) {
+        Core::Frontend::ScopeAcquireContext acquire_context{emu_window};
+
        app_loader = Loader::GetLoader(GetGameFileFromPath(virtual_filesystem, filepath));
        if (!app_loader) {
            LOG_CRITICAL(Core, "Failed to obtain loader for {}!", filepath);
@@ -707,4 +710,12 @@ const Service::SM::ServiceManager& System::ServiceManager() const {
    return *impl->service_manager;
 }

+void System::RegisterCoreThread(std::size_t id) {
+    impl->kernel.RegisterCoreThread(id);
+}
+
+void System::RegisterHostThread() {
+    impl->kernel.RegisterHostThread();
+}
+
 } // namespace Core
--- a/src/core/core.h
+++ b/src/core/core.h
@@ -360,6 +360,12 @@ public:

    const CurrentBuildProcessID& GetCurrentProcessBuildID() const;

+    /// Register a host thread as an emulated CPU Core.
+    void RegisterCoreThread(std::size_t id);
+
+    /// Register a host thread as an auxiliary thread.
+    void RegisterHostThread();
+
 private:
    System();

--- a/src/core/core_manager.cpp
+++ b/src/core/core_manager.cpp
@@ -6,9 +6,6 @@
 #include <mutex>

 #include "common/logging/log.h"
-#ifdef ARCHITECTURE_x86_64
-#include "core/arm/dynarmic/arm_dynarmic.h"
-#endif
 #include "core/arm/exclusive_monitor.h"
 #include "core/arm/unicorn/arm_unicorn.h"
 #include "core/core.h"
--- a/src/core/frontend/emu_window.h
+++ b/src/core/frontend/emu_window.h
@@ -26,9 +26,6 @@ public:

    /// Releases (dunno if this is the "right" word) the context from the caller thread
    virtual void DoneCurrent() = 0;
-
-    /// Swap buffers to display the next frame
-    virtual void SwapBuffers() = 0;
 };

 /**
--- a/src/core/frontend/framebuffer_layout.h
+++ b/src/core/frontend/framebuffer_layout.h
@@ -29,6 +29,7 @@ enum class AspectRatio {
 struct FramebufferLayout {
    u32 width{ScreenUndocked::Width};
    u32 height{ScreenUndocked::Height};
+    bool is_srgb{};

    Common::Rectangle<u32> screen;

--- a/src/core/frontend/scope_acquire_context.cpp
+++ b/src/core/frontend/scope_acquire_context.cpp
@@ -0,0 +1,18 @@
+// Copyright 2019 yuzu Emulator Project
+// Licensed under GPLv2 or any later version
+// Refer to the license.txt file included.
+
+#include "core/frontend/emu_window.h"
+#include "core/frontend/scope_acquire_context.h"
+
+namespace Core::Frontend {
+
+ScopeAcquireContext::ScopeAcquireContext(Core::Frontend::GraphicsContext& context)
+    : context{context} {
+    context.MakeCurrent();
+}
+ScopeAcquireContext::~ScopeAcquireContext() {
+    context.DoneCurrent();
+}
+
+} // namespace Core::Frontend
--- a/src/core/frontend/scope_acquire_window_context.h
+++ b/src/core/frontend/scope_acquire_context.h
@@ -8,16 +8,16 @@

 namespace Core::Frontend {

-class EmuWindow;
+class GraphicsContext;

 /// Helper class to acquire/release window context within a given scope
-class ScopeAcquireWindowContext : NonCopyable {
+class ScopeAcquireContext : NonCopyable {
 public:
-    explicit ScopeAcquireWindowContext(Core::Frontend::EmuWindow& window);
-    ~ScopeAcquireWindowContext();
+    explicit ScopeAcquireContext(Core::Frontend::GraphicsContext& context);
+    ~ScopeAcquireContext();

 private:
-    Core::Frontend::EmuWindow& emu_window;
+    Core::Frontend::GraphicsContext& context;
 };

 } // namespace Core::Frontend
--- a/src/core/frontend/scope_acquire_window_context.cpp
+++ b/src/core/frontend/scope_acquire_window_context.cpp
@@ -1,18 +0,0 @@
-// Copyright 2019 yuzu Emulator Project
-// Licensed under GPLv2 or any later version
-// Refer to the license.txt file included.
-
-#include "core/frontend/emu_window.h"
-#include "core/frontend/scope_acquire_window_context.h"
-
-namespace Core::Frontend {
-
-ScopeAcquireWindowContext::ScopeAcquireWindowContext(Core::Frontend::EmuWindow& emu_window_)
-    : emu_window{emu_window_} {
-    emu_window.MakeCurrent();
-}
-ScopeAcquireWindowContext::~ScopeAcquireWindowContext() {
-    emu_window.DoneCurrent();
-}
-
-} // namespace Core::Frontend
--- a/src/core/gdbstub/gdbstub.cpp
+++ b/src/core/gdbstub/gdbstub.cpp
@@ -217,7 +217,7 @@ static u64 RegRead(std::size_t id, Kernel::Thread* thread = nullptr) {
        return 0;
    }

-    const auto& thread_context = thread->GetContext();
+    const auto& thread_context = thread->GetContext64();

    if (id < SP_REGISTER) {
        return thread_context.cpu_registers[id];
@@ -239,7 +239,7 @@ static void RegWrite(std::size_t id, u64 val, Kernel::Thread* thread = nullptr)
        return;
    }

-    auto& thread_context = thread->GetContext();
+    auto& thread_context = thread->GetContext64();

    if (id < SP_REGISTER) {
        thread_context.cpu_registers[id] = val;
@@ -259,7 +259,7 @@ static u128 FpuRead(std::size_t id, Kernel::Thread* thread = nullptr) {
        return u128{0};
    }

-    auto& thread_context = thread->GetContext();
+    auto& thread_context = thread->GetContext64();

    if (id >= UC_ARM64_REG_Q0 && id < FPCR_REGISTER) {
        return thread_context.vector_registers[id - UC_ARM64_REG_Q0];
@@ -275,7 +275,7 @@ static void FpuWrite(std::size_t id, u128 val, Kernel::Thread* thread = nullptr)
        return;
    }

-    auto& thread_context = thread->GetContext();
+    auto& thread_context = thread->GetContext64();

    if (id >= UC_ARM64_REG_Q0 && id < FPCR_REGISTER) {
        thread_context.vector_registers[id - UC_ARM64_REG_Q0] = val;
@@ -916,7 +916,7 @@ static void WriteRegister() {
    // Update ARM context, skipping scheduler - no running threads at this point
    Core::System::GetInstance()
        .ArmInterface(current_core)
-        .LoadContext(current_thread->GetContext());
+        .LoadContext(current_thread->GetContext64());

    SendReply("OK");
 }
@@ -947,7 +947,7 @@ static void WriteRegisters() {
    // Update ARM context, skipping scheduler - no running threads at this point
    Core::System::GetInstance()
        .ArmInterface(current_core)
-        .LoadContext(current_thread->GetContext());
+        .LoadContext(current_thread->GetContext64());

    SendReply("OK");
 }
@@ -1019,7 +1019,7 @@ static void Step() {
        // Update ARM context, skipping scheduler - no running threads at this point
        Core::System::GetInstance()
            .ArmInterface(current_core)
-            .LoadContext(current_thread->GetContext());
+            .LoadContext(current_thread->GetContext64());
    }
    step_loop = true;
    halt_loop = true;
--- a/src/core/hardware_properties.h
+++ b/src/core/hardware_properties.h
@@ -20,6 +20,8 @@ constexpr u32 NUM_CPU_CORES = 4;            // Number of CPU Cores

 } // namespace Hardware

+constexpr u32 INVALID_HOST_THREAD_ID = 0xFFFFFFFF;
+
 struct EmuThreadHandle {
    u32 host_handle;
    u32 guest_handle;
--- a/src/core/hle/kernel/kernel.cpp
+++ b/src/core/hle/kernel/kernel.cpp
@@ -3,9 +3,12 @@
 // Refer to the license.txt file included.

 #include <atomic>
+#include <bitset>
 #include <functional>
 #include <memory>
 #include <mutex>
+#include <thread>
+#include <unordered_map>
 #include <utility>

 #include "common/assert.h"
@@ -15,6 +18,7 @@
 #include "core/core.h"
 #include "core/core_timing.h"
 #include "core/core_timing_util.h"
+#include "core/hardware_properties.h"
 #include "core/hle/kernel/client_port.h"
 #include "core/hle/kernel/errors.h"
 #include "core/hle/kernel/handle_table.h"
@@ -25,6 +29,7 @@
 #include "core/hle/kernel/scheduler.h"
 #include "core/hle/kernel/synchronization.h"
 #include "core/hle/kernel/thread.h"
+#include "core/hle/kernel/time_manager.h"
 #include "core/hle/lock.h"
 #include "core/hle/result.h"
 #include "core/memory.h"
@@ -44,7 +49,7 @@ static void ThreadWakeupCallback(u64 thread_handle, [[maybe_unused]] s64 cycles_
    std::lock_guard lock{HLE::g_hle_lock};

    std::shared_ptr<Thread> thread =
-        system.Kernel().RetrieveThreadFromWakeupCallbackHandleTable(proper_handle);
+        system.Kernel().RetrieveThreadFromGlobalHandleTable(proper_handle);
    if (thread == nullptr) {
        LOG_CRITICAL(Kernel, "Callback fired for invalid thread {:08X}", proper_handle);
        return;
@@ -97,8 +102,8 @@ static void ThreadWakeupCallback(u64 thread_handle, [[maybe_unused]] s64 cycles_
 }

 struct KernelCore::Impl {
-    explicit Impl(Core::System& system)
-        : system{system}, global_scheduler{system}, synchronization{system} {}
+    explicit Impl(Core::System& system, KernelCore& kernel)
+        : system{system}, global_scheduler{kernel}, synchronization{system}, time_manager{system} {}

    void Initialize(KernelCore& kernel) {
        Shutdown();
@@ -120,7 +125,7 @@ struct KernelCore::Impl {

        system_resource_limit = nullptr;

-        thread_wakeup_callback_handle_table.Clear();
+        global_handle_table.Clear();
        thread_wakeup_event_type = nullptr;
        preemption_event = nullptr;

@@ -138,8 +143,8 @@ struct KernelCore::Impl {

    void InitializePhysicalCores() {
        exclusive_monitor =
-            Core::MakeExclusiveMonitor(system.Memory(), global_scheduler.CpuCoresCount());
-        for (std::size_t i = 0; i < global_scheduler.CpuCoresCount(); i++) {
+            Core::MakeExclusiveMonitor(system.Memory(), Core::Hardware::NUM_CPU_CORES);
+        for (std::size_t i = 0; i < Core::Hardware::NUM_CPU_CORES; i++) {
            cores.emplace_back(system, i, *exclusive_monitor);
        }
    }
@@ -181,9 +186,57 @@ struct KernelCore::Impl {
            return;
        }

+        for (auto& core : cores) {
+            core.SetIs64Bit(process->Is64BitProcess());
+        }
+
        system.Memory().SetCurrentPageTable(*process);
    }

+    void RegisterCoreThread(std::size_t core_id) {
+        std::unique_lock lock{register_thread_mutex};
+        const std::thread::id this_id = std::this_thread::get_id();
+        const auto it = host_thread_ids.find(this_id);
+        ASSERT(core_id < Core::Hardware::NUM_CPU_CORES);
+        ASSERT(it == host_thread_ids.end());
+        ASSERT(!registered_core_threads[core_id]);
+        host_thread_ids[this_id] = static_cast<u32>(core_id);
+        registered_core_threads.set(core_id);
+    }
+
+    void RegisterHostThread() {
+        std::unique_lock lock{register_thread_mutex};
+        const std::thread::id this_id = std::this_thread::get_id();
+        const auto it = host_thread_ids.find(this_id);
+        ASSERT(it == host_thread_ids.end());
+        host_thread_ids[this_id] = registered_thread_ids++;
+    }
+
+    u32 GetCurrentHostThreadID() const {
+        const std::thread::id this_id = std::this_thread::get_id();
+        const auto it = host_thread_ids.find(this_id);
+        if (it == host_thread_ids.end()) {
+            return Core::INVALID_HOST_THREAD_ID;
+        }
+        return it->second;
+    }
+
+    Core::EmuThreadHandle GetCurrentEmuThreadID() const {
+        Core::EmuThreadHandle result = Core::EmuThreadHandle::InvalidHandle();
+        result.host_handle = GetCurrentHostThreadID();
+        if (result.host_handle >= Core::Hardware::NUM_CPU_CORES) {
+            return result;
+        }
+        const Kernel::Scheduler& sched = cores[result.host_handle].Scheduler();
+        const Kernel::Thread* current = sched.GetCurrentThread();
+        if (current != nullptr) {
+            result.guest_handle = current->GetGlobalHandle();
+        } else {
+            result.guest_handle = InvalidHandle;
+        }
+        return result;
+    }
+
    std::atomic<u32> next_object_id{0};
    std::atomic<u64> next_kernel_process_id{Process::InitialKIPIDMin};
    std::atomic<u64> next_user_process_id{Process::ProcessIDMin};
@@ -194,15 +247,16 @@ struct KernelCore::Impl {
    Process* current_process = nullptr;
    Kernel::GlobalScheduler global_scheduler;
    Kernel::Synchronization synchronization;
+    Kernel::TimeManager time_manager;

    std::shared_ptr<ResourceLimit> system_resource_limit;

    std::shared_ptr<Core::Timing::EventType> thread_wakeup_event_type;
    std::shared_ptr<Core::Timing::EventType> preemption_event;

-    // TODO(yuriks): This can be removed if Thread objects are explicitly pooled in the future,
-    // allowing us to simply use a pool index or similar.
-    Kernel::HandleTable thread_wakeup_callback_handle_table;
+    // This is the kernel's handle table or supervisor handle table which
+    // stores all the objects in place.
+    Kernel::HandleTable global_handle_table;

    /// Map of named ports managed by the kernel, which can be retrieved using
    /// the ConnectToPort SVC.
@@ -211,11 +265,17 @@ struct KernelCore::Impl {
    std::unique_ptr<Core::ExclusiveMonitor> exclusive_monitor;
    std::vector<Kernel::PhysicalCore> cores;

+    // IDs 0-3 represent core threads; IDs above 3 represent other host threads
+    std::unordered_map<std::thread::id, u32> host_thread_ids;
+    u32 registered_thread_ids{Core::Hardware::NUM_CPU_CORES};
+    std::bitset<Core::Hardware::NUM_CPU_CORES> registered_core_threads;
+    std::mutex register_thread_mutex;
+
    // System context
    Core::System& system;
 };

-KernelCore::KernelCore(Core::System& system) : impl{std::make_unique<Impl>(system)} {}
+KernelCore::KernelCore(Core::System& system) : impl{std::make_unique<Impl>(system, *this)} {}
 KernelCore::~KernelCore() {
    Shutdown();
 }
@@ -232,9 +292,8 @@ std::shared_ptr<ResourceLimit> KernelCore::GetSystemResourceLimit() const {
    return impl->system_resource_limit;
 }

-std::shared_ptr<Thread> KernelCore::RetrieveThreadFromWakeupCallbackHandleTable(
-    Handle handle) const {
-    return impl->thread_wakeup_callback_handle_table.Get<Thread>(handle);
+std::shared_ptr<Thread> KernelCore::RetrieveThreadFromGlobalHandleTable(Handle handle) const {
+    return impl->global_handle_table.Get<Thread>(handle);
 }

 void KernelCore::AppendNewProcess(std::shared_ptr<Process> process) {
@@ -265,6 +324,14 @@ const Kernel::GlobalScheduler& KernelCore::GlobalScheduler() const {
    return impl->global_scheduler;
 }

+Kernel::Scheduler& KernelCore::Scheduler(std::size_t id) {
+    return impl->cores[id].Scheduler();
+}
+
+const Kernel::Scheduler& KernelCore::Scheduler(std::size_t id) const {
+    return impl->cores[id].Scheduler();
+}
+
 Kernel::PhysicalCore& KernelCore::PhysicalCore(std::size_t id) {
    return impl->cores[id];
 }
@@ -281,6 +348,14 @@ const Kernel::Synchronization& KernelCore::Synchronization() const {
    return impl->synchronization;
 }

+Kernel::TimeManager& KernelCore::TimeManager() {
+    return impl->time_manager;
+}
+
+const Kernel::TimeManager& KernelCore::TimeManager() const {
+    return impl->time_manager;
+}
+
 Core::ExclusiveMonitor& KernelCore::GetExclusiveMonitor() {
    return *impl->exclusive_monitor;
 }
@@ -338,12 +413,28 @@ const std::shared_ptr<Core::Timing::EventType>& KernelCore::ThreadWakeupCallback
    return impl->thread_wakeup_event_type;
 }

-Kernel::HandleTable& KernelCore::ThreadWakeupCallbackHandleTable() {
-    return impl->thread_wakeup_callback_handle_table;
+Kernel::HandleTable& KernelCore::GlobalHandleTable() {
+    return impl->global_handle_table;
 }

-const Kernel::HandleTable& KernelCore::ThreadWakeupCallbackHandleTable() const {
-    return impl->thread_wakeup_callback_handle_table;
+const Kernel::HandleTable& KernelCore::GlobalHandleTable() const {
+    return impl->global_handle_table;
+}
+
+void KernelCore::RegisterCoreThread(std::size_t core_id) {
+    impl->RegisterCoreThread(core_id);
+}
+
+void KernelCore::RegisterHostThread() {
+    impl->RegisterHostThread();
+}
+
+u32 KernelCore::GetCurrentHostThreadID() const {
+    return impl->GetCurrentHostThreadID();
+}
+
+Core::EmuThreadHandle KernelCore::GetCurrentEmuThreadID() const {
+    return impl->GetCurrentEmuThreadID();
 }

 } // namespace Kernel
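
The host-thread bookkeeping added to `KernelCore::Impl` above can be sketched in isolation. This is a minimal illustration, not the emulator's implementation: `HostThreadRegistry`, `kNumCpuCores`, `kInvalidHostThreadId` and `CurrentId` are hypothetical names standing in for the patch's `Impl` members, `Core::Hardware::NUM_CPU_CORES`, `Core::INVALID_HOST_THREAD_ID` and `GetCurrentHostThreadID`. The sketch also assumes the lookup takes the mutex, which the patch's lookup does not do.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>
#include <unordered_map>

// Hypothetical stand-ins for Core::Hardware::NUM_CPU_CORES and
// Core::INVALID_HOST_THREAD_ID from the patch.
constexpr std::uint32_t kNumCpuCores = 4;
constexpr std::uint32_t kInvalidHostThreadId = 0xFFFFFFFF;

// Illustrative registry only; not the actual KernelCore::Impl.
class HostThreadRegistry {
public:
    // Binds the calling host thread to emulated core `core_id` (IDs 0-3).
    void RegisterCoreThread(std::size_t core_id) {
        std::lock_guard<std::mutex> lock{mutex};
        assert(core_id < kNumCpuCores);
        assert(ids.find(std::this_thread::get_id()) == ids.end());
        assert(!core_registered[core_id]);
        ids[std::this_thread::get_id()] = static_cast<std::uint32_t>(core_id);
        core_registered.set(core_id);
    }

    // Gives the calling host thread the next free non-core ID (4 and up).
    void RegisterHostThread() {
        std::lock_guard<std::mutex> lock{mutex};
        assert(ids.find(std::this_thread::get_id()) == ids.end());
        ids[std::this_thread::get_id()] = next_id++;
    }

    // Returns the caller's ID, or kInvalidHostThreadId if it never registered.
    // Assumption of this sketch: the lookup is locked as well.
    std::uint32_t CurrentId() const {
        std::lock_guard<std::mutex> lock{mutex};
        const auto it = ids.find(std::this_thread::get_id());
        return it == ids.end() ? kInvalidHostThreadId : it->second;
    }

private:
    mutable std::mutex mutex;
    std::unordered_map<std::thread::id, std::uint32_t> ids;
    std::bitset<kNumCpuCores> core_registered;
    std::uint32_t next_id{kNumCpuCores};
};
```

A thread that never registered gets the invalid ID back, which is what lets `GetCurrentEmuThreadID` in the patch return early for non-core threads.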
--- a/src/core/hle/kernel/kernel.h
+++ b/src/core/hle/kernel/kernel.h
@@ -11,6 +11,7 @@
 #include "core/hle/kernel/object.h"

 namespace Core {
+struct EmuThreadHandle;
 class ExclusiveMonitor;
 class System;
 } // namespace Core
@@ -29,8 +30,10 @@ class HandleTable;
 class PhysicalCore;
 class Process;
 class ResourceLimit;
+class Scheduler;
 class Synchronization;
 class Thread;
+class TimeManager;

 /// Represents a single instance of the kernel.
 class KernelCore {
@@ -64,7 +67,7 @@ public:
    std::shared_ptr<ResourceLimit> GetSystemResourceLimit() const;

    /// Retrieves a shared pointer to a Thread instance within the thread wakeup handle table.
-    std::shared_ptr<Thread> RetrieveThreadFromWakeupCallbackHandleTable(Handle handle) const;
+    std::shared_ptr<Thread> RetrieveThreadFromGlobalHandleTable(Handle handle) const;

    /// Adds the given shared pointer to an internal list of active processes.
    void AppendNewProcess(std::shared_ptr<Process> process);
@@ -87,6 +90,12 @@ public:
    /// Gets the sole instance of the global scheduler
    const Kernel::GlobalScheduler& GlobalScheduler() const;

+    /// Gets the sole instance of the Scheduler associated with CPU core 'id'
+    Kernel::Scheduler& Scheduler(std::size_t id);
+
+    /// Gets the sole instance of the Scheduler associated with CPU core 'id'
+    const Kernel::Scheduler& Scheduler(std::size_t id) const;
+
    /// Gets the an instance of the respective physical CPU core.
    Kernel::PhysicalCore& PhysicalCore(std::size_t id);

@@ -99,6 +108,12 @@ public:
    /// Gets the an instance of the Synchronization Interface.
    const Kernel::Synchronization& Synchronization() const;

+    /// Gets an instance of the TimeManager interface.
+    Kernel::TimeManager& TimeManager();
+
+    /// Gets an instance of the TimeManager interface.
+    const Kernel::TimeManager& TimeManager() const;
+
    /// Stops execution of 'id' core, in order to reschedule a new thread.
    void PrepareReschedule(std::size_t id);

@@ -120,6 +135,18 @@ public:
    /// Determines whether or not the given port is a valid named port.
    bool IsValidNamedPort(NamedPortTable::const_iterator port) const;

+    /// Gets the current host_thread/guest_thread handle.
+    Core::EmuThreadHandle GetCurrentEmuThreadID() const;
+
+    /// Gets the current host_thread handle.
+    u32 GetCurrentHostThreadID() const;
+
+    /// Register the current thread as a CPU Core Thread.
+    void RegisterCoreThread(std::size_t core_id);
+
+    /// Register the current thread as a non-CPU-core thread.
+    void RegisterHostThread();
+
 private:
    friend class Object;
    friend class Process;
@@ -140,11 +167,11 @@ private:
    /// Retrieves the event type used for thread wakeup callbacks.
    const std::shared_ptr<Core::Timing::EventType>& ThreadWakeupCallbackEventType() const;

-    /// Provides a reference to the thread wakeup callback handle table.
-    Kernel::HandleTable& ThreadWakeupCallbackHandleTable();
+    /// Provides a reference to the global handle table.
+    Kernel::HandleTable& GlobalHandleTable();

-    /// Provides a const reference to the thread wakeup callback handle table.
-    const Kernel::HandleTable& ThreadWakeupCallbackHandleTable() const;
+    /// Provides a const reference to the global handle table.
+    const Kernel::HandleTable& GlobalHandleTable() const;

    struct Impl;
    std::unique_ptr<Impl> impl;
--- a/src/core/hle/kernel/physical_core.cpp
+++ b/src/core/hle/kernel/physical_core.cpp
@@ -5,7 +5,8 @@
 #include "common/logging/log.h"
 #include "core/arm/arm_interface.h"
 #ifdef ARCHITECTURE_x86_64
-#include "core/arm/dynarmic/arm_dynarmic.h"
+#include "core/arm/dynarmic/arm_dynarmic_32.h"
+#include "core/arm/dynarmic/arm_dynarmic_64.h"
 #endif
 #include "core/arm/exclusive_monitor.h"
 #include "core/arm/unicorn/arm_unicorn.h"
@@ -20,13 +21,17 @@ PhysicalCore::PhysicalCore(Core::System& system, std::size_t id,
                           Core::ExclusiveMonitor& exclusive_monitor)
    : core_index{id} {
 #ifdef ARCHITECTURE_x86_64
-    arm_interface = std::make_unique<Core::ARM_Dynarmic>(system, exclusive_monitor, core_index);
+    arm_interface_32 =
+        std::make_unique<Core::ARM_Dynarmic_32>(system, exclusive_monitor, core_index);
+    arm_interface_64 =
+        std::make_unique<Core::ARM_Dynarmic_64>(system, exclusive_monitor, core_index);
+
 #else
    arm_interface = std::make_shared<Core::ARM_Unicorn>(system);
    LOG_WARNING(Core, "CPU JIT requested, but Dynarmic not available");
 #endif

-    scheduler = std::make_unique<Kernel::Scheduler>(system, *arm_interface, core_index);
+    scheduler = std::make_unique<Kernel::Scheduler>(system, core_index);
 }

 PhysicalCore::~PhysicalCore() = default;
@@ -48,4 +53,12 @@ void PhysicalCore::Shutdown() {
    scheduler->Shutdown();
 }

+void PhysicalCore::SetIs64Bit(bool is_64_bit) {
+    if (is_64_bit) {
+        arm_interface = arm_interface_64.get();
+    } else {
+        arm_interface = arm_interface_32.get();
+    }
+}
+
 } // namespace Kernel
--- a/src/core/hle/kernel/physical_core.h
+++ b/src/core/hle/kernel/physical_core.h
@@ -68,10 +68,14 @@ public:
        return *scheduler;
    }

+    void SetIs64Bit(bool is_64_bit);
+
 private:
    std::size_t core_index;
-    std::unique_ptr<Core::ARM_Interface> arm_interface;
+    std::unique_ptr<Core::ARM_Interface> arm_interface_32;
+    std::unique_ptr<Core::ARM_Interface> arm_interface_64;
    std::unique_ptr<Kernel::Scheduler> scheduler;
+    Core::ARM_Interface* arm_interface{};
 };

 } // namespace Kernel
--- a/src/core/hle/kernel/process.cpp
+++ b/src/core/hle/kernel/process.cpp
@@ -42,7 +42,8 @@ void SetupMainThread(Process& owner_process, KernelCore& kernel, u32 priority) {

    // Register 1 must be a handle to the main thread
    const Handle thread_handle = owner_process.GetHandleTable().Create(thread).Unwrap();
-    thread->GetContext().cpu_registers[1] = thread_handle;
+    thread->GetContext32().cpu_registers[1] = thread_handle;
+    thread->GetContext64().cpu_registers[1] = thread_handle;

    // Threads by default are dormant, wake up the main thread so it runs when the scheduler fires
    thread->ResumeFromWait();
--- a/src/core/hle/kernel/scheduler.cpp
+++ b/src/core/hle/kernel/scheduler.cpp
@@ -18,10 +18,11 @@
 #include "core/hle/kernel/kernel.h"
 #include "core/hle/kernel/process.h"
 #include "core/hle/kernel/scheduler.h"
+#include "core/hle/kernel/time_manager.h"

 namespace Kernel {

-GlobalScheduler::GlobalScheduler(Core::System& system) : system{system} {}
+GlobalScheduler::GlobalScheduler(KernelCore& kernel) : kernel{kernel} {}

 GlobalScheduler::~GlobalScheduler() = default;

@@ -35,7 +36,7 @@ void GlobalScheduler::RemoveThread(std::shared_ptr<Thread> thread) {
 }

 void GlobalScheduler::UnloadThread(std::size_t core) {
-    Scheduler& sched = system.Scheduler(core);
+    Scheduler& sched = kernel.Scheduler(core);
    sched.UnloadThread();
 }

@@ -50,7 +51,7 @@ void GlobalScheduler::SelectThread(std::size_t core) {
        sched.is_context_switch_pending = sched.selected_thread != sched.current_thread;
        std::atomic_thread_fence(std::memory_order_seq_cst);
    };
-    Scheduler& sched = system.Scheduler(core);
+    Scheduler& sched = kernel.Scheduler(core);
    Thread* current_thread = nullptr;
    // Step 1: Get top thread in schedule queue.
    current_thread = scheduled_queue[core].empty() ? nullptr : scheduled_queue[core].front();
@@ -356,8 +357,34 @@ void GlobalScheduler::Shutdown() {
    thread_list.clear();
 }

-Scheduler::Scheduler(Core::System& system, Core::ARM_Interface& cpu_core, std::size_t core_id)
-    : system(system), cpu_core(cpu_core), core_id(core_id) {}
+void GlobalScheduler::Lock() {
+    Core::EmuThreadHandle current_thread = kernel.GetCurrentEmuThreadID();
+    if (current_thread == current_owner) {
+        ++scope_lock;
+    } else {
+        inner_lock.lock();
+        current_owner = current_thread;
+        ASSERT(current_owner != Core::EmuThreadHandle::InvalidHandle());
+        scope_lock = 1;
+    }
+}
+
+void GlobalScheduler::Unlock() {
+    if (--scope_lock != 0) {
+        ASSERT(scope_lock > 0);
+        return;
+    }
+    for (std::size_t i = 0; i < Core::Hardware::NUM_CPU_CORES; i++) {
+        SelectThread(i);
+    }
+    current_owner = Core::EmuThreadHandle::InvalidHandle();
+    scope_lock = 1;
+    inner_lock.unlock();
+    // TODO(Blinkhawk): Setup the interrupts and change context on current core.
+}
+
+Scheduler::Scheduler(Core::System& system, std::size_t core_id)
+    : system{system}, core_id{core_id} {}

 Scheduler::~Scheduler() = default;

@@ -395,9 +422,10 @@ void Scheduler::UnloadThread() {

    // Save context for previous thread
    if (previous_thread) {
-        cpu_core.SaveContext(previous_thread->GetContext());
+        system.ArmInterface(core_id).SaveContext(previous_thread->GetContext32());
+        system.ArmInterface(core_id).SaveContext(previous_thread->GetContext64());
        // Save the TPIDR_EL0 system register in case it was modified.
-        previous_thread->SetTPIDR_EL0(cpu_core.GetTPIDR_EL0());
+        previous_thread->SetTPIDR_EL0(system.ArmInterface(core_id).GetTPIDR_EL0());

        if (previous_thread->GetStatus() == ThreadStatus::Running) {
            // This is only the case when a reschedule is triggered without the current thread
@@ -424,9 +452,10 @@ void Scheduler::SwitchContext() {

    // Save context for previous thread
    if (previous_thread) {
-        cpu_core.SaveContext(previous_thread->GetContext());
+        system.ArmInterface(core_id).SaveContext(previous_thread->GetContext32());
+        system.ArmInterface(core_id).SaveContext(previous_thread->GetContext64());
        // Save the TPIDR_EL0 system register in case it was modified.
-        previous_thread->SetTPIDR_EL0(cpu_core.GetTPIDR_EL0());
+        previous_thread->SetTPIDR_EL0(system.ArmInterface(core_id).GetTPIDR_EL0());

        if (previous_thread->GetStatus() == ThreadStatus::Running) {
            // This is only the case when a reschedule is triggered without the current thread
@@ -454,9 +483,10 @@ void Scheduler::SwitchContext() {
            system.Kernel().MakeCurrentProcess(thread_owner_process);
        }

-        cpu_core.LoadContext(new_thread->GetContext());
-        cpu_core.SetTlsAddress(new_thread->GetTLSAddress());
-        cpu_core.SetTPIDR_EL0(new_thread->GetTPIDR_EL0());
+        system.ArmInterface(core_id).LoadContext(new_thread->GetContext32());
+        system.ArmInterface(core_id).LoadContext(new_thread->GetContext64());
+        system.ArmInterface(core_id).SetTlsAddress(new_thread->GetTLSAddress());
+        system.ArmInterface(core_id).SetTPIDR_EL0(new_thread->GetTPIDR_EL0());
    } else {
        current_thread = nullptr;
        // Note: We do not reset the current process and current page table when idling because
@@ -485,4 +515,27 @@ void Scheduler::Shutdown() {
    selected_thread = nullptr;
 }

+SchedulerLock::SchedulerLock(KernelCore& kernel) : kernel{kernel} {
+    kernel.GlobalScheduler().Lock();
+}
+
+SchedulerLock::~SchedulerLock() {
+    kernel.GlobalScheduler().Unlock();
+}
+
+SchedulerLockAndSleep::SchedulerLockAndSleep(KernelCore& kernel, Handle& event_handle,
+                                             Thread* time_task, s64 nanoseconds)
+    : SchedulerLock{kernel}, event_handle{event_handle}, time_task{time_task}, nanoseconds{
+                                                                                   nanoseconds} {
+    event_handle = InvalidHandle;
+}
+
+SchedulerLockAndSleep::~SchedulerLockAndSleep() {
+    if (sleep_cancelled) {
+        return;
+    }
+    auto& time_manager = kernel.TimeManager();
+    time_manager.ScheduleTimeEvent(event_handle, time_task, nanoseconds);
+}
+
 } // namespace Kernel
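
The owner-tracked, re-entrant locking that `GlobalScheduler::Lock`/`Unlock` and `SchedulerLock` introduce above can be sketched on its own. This is a minimal illustration under stated assumptions: `ReentrantSchedulerLock` and `ScopedSchedulerLock` are hypothetical names, the owner is identified by `std::thread::id` rather than the patch's `Core::EmuThreadHandle`, a counter stands in for the `SelectThread` loop, and the depth returns to 0 on release.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Illustrative only; owner tracking uses std::thread::id as an assumption.
class ReentrantSchedulerLock {
public:
    void Lock() {
        const std::thread::id self = std::this_thread::get_id();
        if (owner.load() == self) {
            ++depth; // Nested acquisition by the current owner.
            return;
        }
        inner.lock();
        owner.store(self);
        depth = 1;
    }

    void Unlock() {
        assert(depth > 0);
        if (--depth != 0) {
            return; // Still inside an outer scope.
        }
        ++reselect_count; // Stands in for reselecting a thread on every core.
        owner.store(std::thread::id{});
        inner.unlock();
    }

    int Depth() const { return depth; }
    int ReselectCount() const { return reselect_count; }

private:
    std::mutex inner;
    std::atomic<std::thread::id> owner{std::thread::id{}};
    int depth{0};
    int reselect_count{0};
};

// RAII guard in the spirit of SchedulerLock: lock on entry, unlock on exit.
class ScopedSchedulerLock {
public:
    explicit ScopedSchedulerLock(ReentrantSchedulerLock& target_) : target{target_} {
        target.Lock();
    }
    ~ScopedSchedulerLock() {
        target.Unlock();
    }
    ScopedSchedulerLock(const ScopedSchedulerLock&) = delete;
    ScopedSchedulerLock& operator=(const ScopedSchedulerLock&) = delete;

private:
    ReentrantSchedulerLock& target;
};
```

Only the outermost guard releases the mutex, so thread reselection runs once per outermost scope rather than once per nested call.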
--- a/src/core/hle/kernel/scheduler.h
+++ b/src/core/hle/kernel/scheduler.h
@@ -6,6 +6,7 @@

 #include <atomic>
 #include <memory>
+#include <mutex>
 #include <vector>

 #include "common/common_types.h"
@@ -20,11 +21,13 @@ class System;

 namespace Kernel {

+class KernelCore;
 class Process;
+class SchedulerLock;

 class GlobalScheduler final {
 public:
-    explicit GlobalScheduler(Core::System& system);
+    explicit GlobalScheduler(KernelCore& kernel);
    ~GlobalScheduler();

    /// Adds a new thread to the scheduler
@@ -138,6 +141,14 @@ public:
    void Shutdown();

 private:
+    friend class SchedulerLock;
+
+    /// Lock the scheduler to the current thread.
+    void Lock();
+
+    /// Unlocks the scheduler, reselects threads, interrupts cores for rescheduling
+    /// and reschedules current core if needed.
+    void Unlock();
    /**
     * Transfers a thread into an specific core. If the destination_core is -1
     * it will be unscheduled from its source code and added into its suggested
@@ -158,14 +169,19 @@ private:
    // ordered from Core 0 to Core 3.
    std::array<u32, Core::Hardware::NUM_CPU_CORES> preemption_priorities = {59, 59, 59, 62};

+    /// Scheduler lock mechanisms.
+    std::mutex inner_lock{}; // TODO(Blinkhawk): Replace with a SpinLock
+    std::atomic<s64> scope_lock{};
+    Core::EmuThreadHandle current_owner{Core::EmuThreadHandle::InvalidHandle()};
+
    /// Lists all thread ids that aren't deleted/etc.
    std::vector<std::shared_ptr<Thread>> thread_list;
-    Core::System& system;
+    KernelCore& kernel;
 };

 class Scheduler final {
 public:
-    explicit Scheduler(Core::System& system, Core::ARM_Interface& cpu_core, std::size_t core_id);
+    explicit Scheduler(Core::System& system, std::size_t core_id);
    ~Scheduler();

    /// Returns whether there are any threads that are ready to run.
@@ -219,7 +235,6 @@ private:
    std::shared_ptr<Thread> selected_thread = nullptr;

    Core::System& system;
-    Core::ARM_Interface& cpu_core;
    u64 last_context_switch_time = 0;
    u64 idle_selection_count = 0;
    const std::size_t core_id;
@@ -227,4 +242,30 @@ private:
    bool is_context_switch_pending = false;
 };

+class SchedulerLock {
+public:
+    explicit SchedulerLock(KernelCore& kernel);
+    ~SchedulerLock();
+
+protected:
+    KernelCore& kernel;
+};
+
+class SchedulerLockAndSleep : public SchedulerLock {
+public:
+    explicit SchedulerLockAndSleep(KernelCore& kernel, Handle& event_handle, Thread* time_task,
+                                   s64 nanoseconds);
+    ~SchedulerLockAndSleep();
+
+    void CancelSleep() {
+        sleep_cancelled = true;
+    }
+
+private:
+    Handle& event_handle;
+    Thread* time_task;
+    s64 nanoseconds;
+    bool sleep_cancelled{};
+};
+
 } // namespace Kernel
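
The cancellable scope-exit behaviour of `SchedulerLockAndSleep` declared above can be sketched without the kernel types. This is a minimal illustration: `FakeTimeManager` and `SleepOnScopeExit` are hypothetical names, the fake manager only records requests, and the event handle and thread arguments of the patch's `ScheduleTimeEvent` call are omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for Kernel::TimeManager; it only records requests.
struct FakeTimeManager {
    std::vector<std::int64_t> scheduled_ns;

    void ScheduleTimeEvent(std::int64_t nanoseconds) {
        scheduled_ns.push_back(nanoseconds);
    }
};

// Schedules a timed event when the scope ends, unless CancelSleep() was called.
class SleepOnScopeExit {
public:
    SleepOnScopeExit(FakeTimeManager& time_manager_, std::int64_t nanoseconds_)
        : time_manager{time_manager_}, nanoseconds{nanoseconds_} {}

    ~SleepOnScopeExit() {
        if (sleep_cancelled) {
            return;
        }
        time_manager.ScheduleTimeEvent(nanoseconds);
    }

    SleepOnScopeExit(const SleepOnScopeExit&) = delete;
    SleepOnScopeExit& operator=(const SleepOnScopeExit&) = delete;

    void CancelSleep() {
        sleep_cancelled = true;
    }

private:
    FakeTimeManager& time_manager;
    std::int64_t nanoseconds;
    bool sleep_cancelled{};
};
```

In the patch the class derives from `SchedulerLock`. C++ runs a derived destructor before its base destructor, so the time event is scheduled while the scheduler lock is still held.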
--- a/src/core/hle/kernel/svc.cpp
+++ b/src/core/hle/kernel/svc.cpp
@@ -187,6 +187,13 @@ static ResultCode SetHeapSize(Core::System& system, VAddr* heap_addr, u64 heap_s
    return RESULT_SUCCESS;
 }

+static ResultCode SetHeapSize32(Core::System& system, u32* heap_addr, u32 heap_size) {
+    VAddr temp_heap_addr{};
+    const ResultCode result{SetHeapSize(system, &temp_heap_addr, heap_size)};
+    *heap_addr = static_cast<u32>(temp_heap_addr);
+    return result;
+}
+
 static ResultCode SetMemoryPermission(Core::System& system, VAddr addr, u64 size, u32 prot) {
    LOG_TRACE(Kernel_SVC, "called, addr=0x{:X}, size=0x{:X}, prot=0x{:X}", addr, size, prot);

@@ -371,6 +378,12 @@ static ResultCode ConnectToNamedPort(Core::System& system, Handle* out_handle,
    return RESULT_SUCCESS;
 }

+static ResultCode ConnectToNamedPort32(Core::System& system, Handle* out_handle,
+                                       u32 port_name_address) {
+
+    return ConnectToNamedPort(system, out_handle, port_name_address);
+}
+
 /// Makes a blocking IPC call to an OS service.
 static ResultCode SendSyncRequest(Core::System& system, Handle handle) {
    const auto& handle_table = system.Kernel().CurrentProcess()->GetHandleTable();
@@ -390,6 +403,10 @@ static ResultCode SendSyncRequest(Core::System& system, Handle handle) {
    return session->SendSyncRequest(SharedFrom(thread), system.Memory());
 }

+static ResultCode SendSyncRequest32(Core::System& system, Handle handle) {
+    return SendSyncRequest(system, handle);
+}
+
 /// Get the ID for the specified thread.
 static ResultCode GetThreadId(Core::System& system, u64* thread_id, Handle thread_handle) {
    LOG_TRACE(Kernel_SVC, "called thread=0x{:08X}", thread_handle);
@@ -405,6 +422,17 @@ static ResultCode GetThreadId(Core::System& system, u64* thread_id, Handle threa
    return RESULT_SUCCESS;
 }

+static ResultCode GetThreadId32(Core::System& system, u32* thread_id_low, u32* thread_id_high,
+                                Handle thread_handle) {
+    u64 thread_id{};
+    const ResultCode result{GetThreadId(system, &thread_id, thread_handle)};
+
+    *thread_id_low = static_cast<u32>(thread_id & std::numeric_limits<u32>::max());
+    *thread_id_high = static_cast<u32>(thread_id >> 32);
+
+    return result;
+}
+
 /// Gets the ID of the specified process or a specified thread's owning process.
 static ResultCode GetProcessId(Core::System& system, u64* process_id, Handle handle) {
    LOG_DEBUG(Kernel_SVC, "called handle=0x{:08X}", handle);
@@ -479,6 +507,12 @@ static ResultCode WaitSynchronization(Core::System& system, Handle* index, VAddr
    return result;
 }

+static ResultCode WaitSynchronization32(Core::System& system, u32 timeout_low, u32 handles_address,
+                                        s32 handle_count, u32 timeout_high, Handle* index) {
+    const s64 nano_seconds{(static_cast<s64>(timeout_high) << 32) | static_cast<s64>(timeout_low)};
+    return WaitSynchronization(system, index, handles_address, handle_count, nano_seconds);
+}
+
 /// Resumes a thread waiting on WaitSynchronization
 static ResultCode CancelSynchronization(Core::System& system, Handle thread_handle) {
    LOG_TRACE(Kernel_SVC, "called thread=0x{:X}", thread_handle);
@@ -917,6 +951,18 @@ static ResultCode GetInfo(Core::System& system, u64* result, u64 info_id, u64 ha
    }
 }

+static ResultCode GetInfo32(Core::System& system, u32* result_low, u32* result_high, u32 sub_id_low,
+                            u32 info_id, u32 handle, u32 sub_id_high) {
+    const u64 sub_id{static_cast<u64>(sub_id_low | (static_cast<u64>(sub_id_high) << 32))};
+    u64 res_value{};
+
+    const ResultCode result{GetInfo(system, &res_value, info_id, handle, sub_id)};
+    *result_high = static_cast<u32>(res_value >> 32);
+    *result_low = static_cast<u32>(res_value & std::numeric_limits<u32>::max());
+
+    return result;
+}
+
 /// Maps memory at a desired address
 static ResultCode MapPhysicalMemory(Core::System& system, VAddr addr, u64 size) {
    LOG_DEBUG(Kernel_SVC, "called, addr=0x{:016X}, size=0x{:X}", addr, size);
@@ -1058,7 +1104,7 @@ static ResultCode GetThreadContext(Core::System& system, VAddr thread_context, H
        return ERR_BUSY;
    }

-    Core::ARM_Interface::ThreadContext ctx = thread->GetContext();
+    Core::ARM_Interface::ThreadContext64 ctx = thread->GetContext64();
    // Mask away mode bits, interrupt bits, IL bit, and other reserved bits.
    ctx.pstate &= 0xFF0FFE20;

@@ -1088,6 +1134,10 @@ static ResultCode GetThreadPriority(Core::System& system, u32* priority, Handle
    return RESULT_SUCCESS;
 }

+static ResultCode GetThreadPriority32(Core::System& system, u32* priority, Handle handle) {
+    return GetThreadPriority(system, priority, handle);
+}
+
 /// Sets the priority for the specified thread
 static ResultCode SetThreadPriority(Core::System& system, Handle handle, u32 priority) {
    LOG_TRACE(Kernel_SVC, "called");
@@ -1259,6 +1309,11 @@ static ResultCode QueryMemory(Core::System& system, VAddr memory_info_address,
                              query_address);
 }

+static ResultCode QueryMemory32(Core::System& system, u32 memory_info_address,
+                                u32 page_info_address, u32 query_address) {
+    return QueryMemory(system, memory_info_address, page_info_address, query_address);
+}
+
 static ResultCode MapProcessCodeMemory(Core::System& system, Handle process_handle, u64 dst_address,
                                       u64 src_address, u64 size) {
    LOG_DEBUG(Kernel_SVC,
@@ -1675,6 +1730,10 @@ static void SignalProcessWideKey(Core::System& system, VAddr condition_variable_
    }
 }

+static void SignalProcessWideKey32(Core::System& system, u32 condition_variable_addr, s32 target) {
+    SignalProcessWideKey(system, condition_variable_addr, target);
+}
+
 // Wait for an address (via Address Arbiter)
 static ResultCode WaitForAddress(Core::System& system, VAddr address, u32 type, s32 value,
                                 s64 timeout) {
@@ -1760,6 +1819,10 @@ static ResultCode CloseHandle(Core::System& system, Handle handle) {
    return handle_table.Close(handle);
 }

+static ResultCode CloseHandle32(Core::System& system, Handle handle) {
+    return CloseHandle(system, handle);
+}
+
 /// Clears the signaled state of an event or process.
 static ResultCode ResetSignal(Core::System& system, Handle handle) {
    LOG_DEBUG(Kernel_SVC, "called handle 0x{:08X}", handle);
@@ -2317,69 +2380,196 @@ struct FunctionDef {
 };
 } // namespace

-static const FunctionDef SVC_Table[] = {
+static const FunctionDef SVC_Table_32[] = {
    {0x00, nullptr, "Unknown"},
-    {0x01, SvcWrap<SetHeapSize>, "SetHeapSize"},
-    {0x02, SvcWrap<SetMemoryPermission>, "SetMemoryPermission"},
-    {0x03, SvcWrap<SetMemoryAttribute>, "SetMemoryAttribute"},
-    {0x04, SvcWrap<MapMemory>, "MapMemory"},
-    {0x05, SvcWrap<UnmapMemory>, "UnmapMemory"},
-    {0x06, SvcWrap<QueryMemory>, "QueryMemory"},
-    {0x07, SvcWrap<ExitProcess>, "ExitProcess"},
-    {0x08, SvcWrap<CreateThread>, "CreateThread"},
-    {0x09, SvcWrap<StartThread>, "StartThread"},
-    {0x0A, SvcWrap<ExitThread>, "ExitThread"},
-    {0x0B, SvcWrap<SleepThread>, "SleepThread"},
-    {0x0C, SvcWrap<GetThreadPriority>, "GetThreadPriority"},
-    {0x0D, SvcWrap<SetThreadPriority>, "SetThreadPriority"},
-    {0x0E, SvcWrap<GetThreadCoreMask>, "GetThreadCoreMask"},
-    {0x0F, SvcWrap<SetThreadCoreMask>, "SetThreadCoreMask"},
-    {0x10, SvcWrap<GetCurrentProcessorNumber>, "GetCurrentProcessorNumber"},
-    {0x11, SvcWrap<SignalEvent>, "SignalEvent"},
-    {0x12, SvcWrap<ClearEvent>, "ClearEvent"},
-    {0x13, SvcWrap<MapSharedMemory>, "MapSharedMemory"},
-    {0x14, SvcWrap<UnmapSharedMemory>, "UnmapSharedMemory"},
-    {0x15, SvcWrap<CreateTransferMemory>, "CreateTransferMemory"},
-    {0x16, SvcWrap<CloseHandle>, "CloseHandle"},
-    {0x17, SvcWrap<ResetSignal>, "ResetSignal"},
-    {0x18, SvcWrap<WaitSynchronization>, "WaitSynchronization"},
-    {0x19, SvcWrap<CancelSynchronization>, "CancelSynchronization"},
-    {0x1A, SvcWrap<ArbitrateLock>, "ArbitrateLock"},
-    {0x1B, SvcWrap<ArbitrateUnlock>, "ArbitrateUnlock"},
-    {0x1C, SvcWrap<WaitProcessWideKeyAtomic>, "WaitProcessWideKeyAtomic"},
-    {0x1D, SvcWrap<SignalProcessWideKey>, "SignalProcessWideKey"},
-    {0x1E, SvcWrap<GetSystemTick>, "GetSystemTick"},
-    {0x1F, SvcWrap<ConnectToNamedPort>, "ConnectToNamedPort"},
+    {0x01, SvcWrap32<SetHeapSize32>, "SetHeapSize32"},
+    {0x02, nullptr, "Unknown"},
+    {0x03, nullptr, "SetMemoryAttribute32"},
+    {0x04, nullptr, "MapMemory32"},
+    {0x05, nullptr, "UnmapMemory32"},
+    {0x06, SvcWrap32<QueryMemory32>, "QueryMemory32"},
+    {0x07, nullptr, "ExitProcess32"},
+    {0x08, nullptr, "CreateThread32"},
+    {0x09, nullptr, "StartThread32"},
+    {0x0a, nullptr, "ExitThread32"},
+    {0x0b, nullptr, "SleepThread32"},
+    {0x0c, SvcWrap32<GetThreadPriority32>, "GetThreadPriority32"},
+    {0x0d, nullptr, "SetThreadPriority32"},
+    {0x0e, nullptr, "GetThreadCoreMask32"},
+    {0x0f, nullptr, "SetThreadCoreMask32"},
+    {0x10, nullptr, "GetCurrentProcessorNumber32"},
+    {0x11, nullptr, "SignalEvent32"},
+    {0x12, nullptr, "ClearEvent32"},
+    {0x13, nullptr, "MapSharedMemory32"},
+    {0x14, nullptr, "UnmapSharedMemory32"},
+    {0x15, nullptr, "CreateTransferMemory32"},
+    {0x16, SvcWrap32<CloseHandle32>, "CloseHandle32"},
+    {0x17, nullptr, "ResetSignal32"},
+    {0x18, SvcWrap32<WaitSynchronization32>, "WaitSynchronization32"},
+    {0x19, nullptr, "CancelSynchronization32"},
+    {0x1a, nullptr, "ArbitrateLock32"},
+    {0x1b, nullptr, "ArbitrateUnlock32"},
+    {0x1c, nullptr, "WaitProcessWideKeyAtomic32"},
+    {0x1d, SvcWrap32<SignalProcessWideKey32>, "SignalProcessWideKey32"},
+    {0x1e, nullptr, "GetSystemTick32"},
+    {0x1f, SvcWrap32<ConnectToNamedPort32>, "ConnectToNamedPort32"},
+    {0x20, nullptr, "Unknown"},
+    {0x21, SvcWrap32<SendSyncRequest32>, "SendSyncRequest32"},
+    {0x22, nullptr, "SendSyncRequestWithUserBuffer32"},
+    {0x23, nullptr, "Unknown"},
+    {0x24, nullptr, "GetProcessId32"},
+    {0x25, SvcWrap32<GetThreadId32>, "GetThreadId32"},
+    {0x26, nullptr, "Break32"},
+    {0x27, nullptr, "OutputDebugString32"},
+    {0x28, nullptr, "Unknown"},
+    {0x29, SvcWrap32<GetInfo32>, "GetInfo32"},
+    {0x2a, nullptr, "Unknown"},
+    {0x2b, nullptr, "Unknown"},
+    {0x2c, nullptr, "MapPhysicalMemory32"},
+    {0x2d, nullptr, "UnmapPhysicalMemory32"},
+    {0x2e, nullptr, "Unknown"},
+    {0x2f, nullptr, "Unknown"},
+    {0x30, nullptr, "Unknown"},
+    {0x31, nullptr, "Unknown"},
+    {0x32, nullptr, "SetThreadActivity32"},
+    {0x33, nullptr, "GetThreadContext32"},
+    {0x34, nullptr, "WaitForAddress32"},
+    {0x35, nullptr, "SignalToAddress32"},
+    {0x36, nullptr, "Unknown"},
+    {0x37, nullptr, "Unknown"},
+    {0x38, nullptr, "Unknown"},
+    {0x39, nullptr, "Unknown"},
+    {0x3a, nullptr, "Unknown"},
+    {0x3b, nullptr, "Unknown"},
+    {0x3c, nullptr, "Unknown"},
+    {0x3d, nullptr, "Unknown"},
+    {0x3e, nullptr, "Unknown"},
+    {0x3f, nullptr, "Unknown"},
+    {0x40, nullptr, "CreateSession32"},
+    {0x41, nullptr, "AcceptSession32"},
+    {0x42, nullptr, "Unknown"},
+    {0x43, nullptr, "ReplyAndReceive32"},
+    {0x44, nullptr, "Unknown"},
+    {0x45, nullptr, "CreateEvent32"},
+    {0x46, nullptr, "Unknown"},
+    {0x47, nullptr, "Unknown"},
+    {0x48, nullptr, "Unknown"},
+    {0x49, nullptr, "Unknown"},
+    {0x4a, nullptr, "Unknown"},
+    {0x4b, nullptr, "Unknown"},
+    {0x4c, nullptr, "Unknown"},
+    {0x4d, nullptr, "Unknown"},
+    {0x4e, nullptr, "Unknown"},
+    {0x4f, nullptr, "Unknown"},
+    {0x50, nullptr, "Unknown"},
+    {0x51, nullptr, "Unknown"},
+    {0x52, nullptr, "Unknown"},
+    {0x53, nullptr, "Unknown"},
+    {0x54, nullptr, "Unknown"},
+    {0x55, nullptr, "Unknown"},
+    {0x56, nullptr, "Unknown"},
+    {0x57, nullptr, "Unknown"},
+    {0x58, nullptr, "Unknown"},
+    {0x59, nullptr, "Unknown"},
+    {0x5a, nullptr, "Unknown"},
+    {0x5b, nullptr, "Unknown"},
+    {0x5c, nullptr, "Unknown"},
+    {0x5d, nullptr, "Unknown"},
+    {0x5e, nullptr, "Unknown"},
+    {0x5f, nullptr, "FlushProcessDataCache32"},
+    {0x60, nullptr, "Unknown"},
+    {0x61, nullptr, "Unknown"},
+    {0x62, nullptr, "Unknown"},
+    {0x63, nullptr, "Unknown"},
+    {0x64, nullptr, "Unknown"},
+    {0x65, nullptr, "GetProcessList32"},
+    {0x66, nullptr, "Unknown"},
+    {0x67, nullptr, "Unknown"},
+    {0x68, nullptr, "Unknown"},
+    {0x69, nullptr, "Unknown"},
+    {0x6a, nullptr, "Unknown"},
+    {0x6b, nullptr, "Unknown"},
+    {0x6c, nullptr, "Unknown"},
+    {0x6d, nullptr, "Unknown"},
+    {0x6e, nullptr, "Unknown"},
+    {0x6f, nullptr, "GetSystemInfo32"},
+    {0x70, nullptr, "CreatePort32"},
+    {0x71, nullptr, "ManageNamedPort32"},
+    {0x72, nullptr, "ConnectToPort32"},
+    {0x73, nullptr, "SetProcessMemoryPermission32"},
+    {0x74, nullptr, "Unknown"},
+    {0x75, nullptr, "Unknown"},
+    {0x76, nullptr, "Unknown"},
+    {0x77, nullptr, "MapProcessCodeMemory32"},
+    {0x78, nullptr, "UnmapProcessCodeMemory32"},
+    {0x79, nullptr, "Unknown"},
+    {0x7a, nullptr, "Unknown"},
+    {0x7b, nullptr, "TerminateProcess32"},
+};
+
+static const FunctionDef SVC_Table_64[] = {
+    {0x00, nullptr, "Unknown"},
+    {0x01, SvcWrap64<SetHeapSize>, "SetHeapSize"},
+    {0x02, SvcWrap64<SetMemoryPermission>, "SetMemoryPermission"},
+    {0x03, SvcWrap64<SetMemoryAttribute>, "SetMemoryAttribute"},
+    {0x04, SvcWrap64<MapMemory>, "MapMemory"},
+    {0x05, SvcWrap64<UnmapMemory>, "UnmapMemory"},
+    {0x06, SvcWrap64<QueryMemory>, "QueryMemory"},
+    {0x07, SvcWrap64<ExitProcess>, "ExitProcess"},
+    {0x08, SvcWrap64<CreateThread>, "CreateThread"},
+    {0x09, SvcWrap64<StartThread>, "StartThread"},
+    {0x0A, SvcWrap64<ExitThread>, "ExitThread"},
+    {0x0B, SvcWrap64<SleepThread>, "SleepThread"},
+    {0x0C, SvcWrap64<GetThreadPriority>, "GetThreadPriority"},
+    {0x0D, SvcWrap64<SetThreadPriority>, "SetThreadPriority"},
+    {0x0E, SvcWrap64<GetThreadCoreMask>, "GetThreadCoreMask"},
+    {0x0F, SvcWrap64<SetThreadCoreMask>, "SetThreadCoreMask"},
+    {0x10, SvcWrap64<GetCurrentProcessorNumber>, "GetCurrentProcessorNumber"},
+    {0x11, SvcWrap64<SignalEvent>, "SignalEvent"},
+    {0x12, SvcWrap64<ClearEvent>, "ClearEvent"},
+    {0x13, SvcWrap64<MapSharedMemory>, "MapSharedMemory"},
+    {0x14, SvcWrap64<UnmapSharedMemory>, "UnmapSharedMemory"},
+    {0x15, SvcWrap64<CreateTransferMemory>, "CreateTransferMemory"},
+    {0x16, SvcWrap64<CloseHandle>, "CloseHandle"},
+    {0x17, SvcWrap64<ResetSignal>, "ResetSignal"},
+    {0x18, SvcWrap64<WaitSynchronization>, "WaitSynchronization"},
+    {0x19, SvcWrap64<CancelSynchronization>, "CancelSynchronization"},
+    {0x1A, SvcWrap64<ArbitrateLock>, "ArbitrateLock"},
+    {0x1B, SvcWrap64<ArbitrateUnlock>, "ArbitrateUnlock"},
+    {0x1C, SvcWrap64<WaitProcessWideKeyAtomic>, "WaitProcessWideKeyAtomic"},
+    {0x1D, SvcWrap64<SignalProcessWideKey>, "SignalProcessWideKey"},
+    {0x1E, SvcWrap64<GetSystemTick>, "GetSystemTick"},
+    {0x1F, SvcWrap64<ConnectToNamedPort>, "ConnectToNamedPort"},
    {0x20, nullptr, "SendSyncRequestLight"},
-    {0x21, SvcWrap<SendSyncRequest>, "SendSyncRequest"},
+    {0x21, SvcWrap64<SendSyncRequest>, "SendSyncRequest"},
    {0x22, nullptr, "SendSyncRequestWithUserBuffer"},
    {0x23, nullptr, "SendAsyncRequestWithUserBuffer"},
-    {0x24, SvcWrap<GetProcessId>, "GetProcessId"},
-    {0x25, SvcWrap<GetThreadId>, "GetThreadId"},
-    {0x26, SvcWrap<Break>, "Break"},
-    {0x27, SvcWrap<OutputDebugString>, "OutputDebugString"},
+    {0x24, SvcWrap64<GetProcessId>, "GetProcessId"},
+    {0x25, SvcWrap64<GetThreadId>, "GetThreadId"},
+    {0x26, SvcWrap64<Break>, "Break"},
+    {0x27, SvcWrap64<OutputDebugString>, "OutputDebugString"},
    {0x28, nullptr, "ReturnFromException"},
-    {0x29, SvcWrap<GetInfo>, "GetInfo"},
+    {0x29, SvcWrap64<GetInfo>, "GetInfo"},
    {0x2A, nullptr, "FlushEntireDataCache"},
    {0x2B, nullptr, "FlushDataCache"},
-    {0x2C, SvcWrap<MapPhysicalMemory>, "MapPhysicalMemory"},
-    {0x2D, SvcWrap<UnmapPhysicalMemory>, "UnmapPhysicalMemory"},
+    {0x2C, SvcWrap64<MapPhysicalMemory>, "MapPhysicalMemory"},
+    {0x2D, SvcWrap64<UnmapPhysicalMemory>, "UnmapPhysicalMemory"},
    {0x2E, nullptr, "GetFutureThreadInfo"},
    {0x2F, nullptr, "GetLastThreadInfo"},
-    {0x30, SvcWrap<GetResourceLimitLimitValue>, "GetResourceLimitLimitValue"},
-    {0x31, SvcWrap<GetResourceLimitCurrentValue>, "GetResourceLimitCurrentValue"},
-    {0x32, SvcWrap<SetThreadActivity>, "SetThreadActivity"},
-    {0x33, SvcWrap<GetThreadContext>, "GetThreadContext"},
-    {0x34, SvcWrap<WaitForAddress>, "WaitForAddress"},
-    {0x35, SvcWrap<SignalToAddress>, "SignalToAddress"},
+    {0x30, SvcWrap64<GetResourceLimitLimitValue>, "GetResourceLimitLimitValue"},
+    {0x31, SvcWrap64<GetResourceLimitCurrentValue>, "GetResourceLimitCurrentValue"},
+    {0x32, SvcWrap64<SetThreadActivity>, "SetThreadActivity"},
+    {0x33, SvcWrap64<GetThreadContext>, "GetThreadContext"},
+    {0x34, SvcWrap64<WaitForAddress>, "WaitForAddress"},
+    {0x35, SvcWrap64<SignalToAddress>, "SignalToAddress"},
    {0x36, nullptr, "SynchronizePreemptionState"},
    {0x37, nullptr, "Unknown"},
    {0x38, nullptr, "Unknown"},
    {0x39, nullptr, "Unknown"},
    {0x3A, nullptr, "Unknown"},
    {0x3B, nullptr, "Unknown"},
-    {0x3C, SvcWrap<KernelDebug>, "KernelDebug"},
-    {0x3D, SvcWrap<ChangeKernelTraceState>, "ChangeKernelTraceState"},
+    {0x3C, SvcWrap64<KernelDebug>, "KernelDebug"},
+    {0x3D, SvcWrap64<ChangeKernelTraceState>, "ChangeKernelTraceState"},
    {0x3E, nullptr, "Unknown"},
    {0x3F, nullptr, "Unknown"},
    {0x40, nullptr, "CreateSession"},
@@ -2387,7 +2577,7 @@ static const FunctionDef SVC_Table[] = {
    {0x42, nullptr, "ReplyAndReceiveLight"},
    {0x43, nullptr, "ReplyAndReceive"},
    {0x44, nullptr, "ReplyAndReceiveWithUserBuffer"},
-    {0x45, SvcWrap<CreateEvent>, "CreateEvent"},
+    {0x45, SvcWrap64<CreateEvent>, "CreateEvent"},
    {0x46, nullptr, "Unknown"},
    {0x47, nullptr, "Unknown"},
    {0x48, nullptr, "MapPhysicalMemoryUnsafe"},
@@ -2398,9 +2588,9 @@ static const FunctionDef SVC_Table[] = {
    {0x4D, nullptr, "SleepSystem"},
    {0x4E, nullptr, "ReadWriteRegister"},
    {0x4F, nullptr, "SetProcessActivity"},
-    {0x50, SvcWrap<CreateSharedMemory>, "CreateSharedMemory"},
-    {0x51, SvcWrap<MapTransferMemory>, "MapTransferMemory"},
-    {0x52, SvcWrap<UnmapTransferMemory>, "UnmapTransferMemory"},
+    {0x50, SvcWrap64<CreateSharedMemory>, "CreateSharedMemory"},
+    {0x51, SvcWrap64<MapTransferMemory>, "MapTransferMemory"},
+    {0x52, SvcWrap64<UnmapTransferMemory>, "UnmapTransferMemory"},
    {0x53, nullptr, "CreateInterruptEvent"},
    {0x54, nullptr, "QueryPhysicalAddress"},
    {0x55, nullptr, "QueryIoMapping"},
@@ -2419,8 +2609,8 @@ static const FunctionDef SVC_Table[] = {
    {0x62, nullptr, "TerminateDebugProcess"},
    {0x63, nullptr, "GetDebugEvent"},
    {0x64, nullptr, "ContinueDebugEvent"},
-    {0x65, SvcWrap<GetProcessList>, "GetProcessList"},
-    {0x66, SvcWrap<GetThreadList>, "GetThreadList"},
+    {0x65, SvcWrap64<GetProcessList>, "GetProcessList"},
+    {0x66, SvcWrap64<GetThreadList>, "GetThreadList"},
    {0x67, nullptr, "GetDebugThreadContext"},
    {0x68, nullptr, "SetDebugThreadContext"},
    {0x69, nullptr, "QueryDebugProcessMemory"},
@@ -2436,24 +2626,32 @@ static const FunctionDef SVC_Table[] = {
    {0x73, nullptr, "SetProcessMemoryPermission"},
    {0x74, nullptr, "MapProcessMemory"},
    {0x75, nullptr, "UnmapProcessMemory"},
-    {0x76, SvcWrap<QueryProcessMemory>, "QueryProcessMemory"},
-    {0x77, SvcWrap<MapProcessCodeMemory>, "MapProcessCodeMemory"},
-    {0x78, SvcWrap<UnmapProcessCodeMemory>, "UnmapProcessCodeMemory"},
+    {0x76, SvcWrap64<QueryProcessMemory>, "QueryProcessMemory"},
+    {0x77, SvcWrap64<MapProcessCodeMemory>, "MapProcessCodeMemory"},
+    {0x78, SvcWrap64<UnmapProcessCodeMemory>, "UnmapProcessCodeMemory"},
    {0x79, nullptr, "CreateProcess"},
    {0x7A, nullptr, "StartProcess"},
    {0x7B, nullptr, "TerminateProcess"},
-    {0x7C, SvcWrap<GetProcessInfo>, "GetProcessInfo"},
-    {0x7D, SvcWrap<CreateResourceLimit>, "CreateResourceLimit"},
-    {0x7E, SvcWrap<SetResourceLimitLimitValue>, "SetResourceLimitLimitValue"},
+    {0x7C, SvcWrap64<GetProcessInfo>, "GetProcessInfo"},
+    {0x7D, SvcWrap64<CreateResourceLimit>, "CreateResourceLimit"},
+    {0x7E, SvcWrap64<SetResourceLimitLimitValue>, "SetResourceLimitLimitValue"},
    {0x7F, nullptr, "CallSecureMonitor"},
 };

-static const FunctionDef* GetSVCInfo(u32 func_num) {
-    if (func_num >= std::size(SVC_Table)) {
+static const FunctionDef* GetSVCInfo32(u32 func_num) {
+    if (func_num >= std::size(SVC_Table_32)) {
        LOG_ERROR(Kernel_SVC, "Unknown svc=0x{:02X}", func_num);
        return nullptr;
    }
-    return &SVC_Table[func_num];
+    return &SVC_Table_32[func_num];
+}
+
+static const FunctionDef* GetSVCInfo64(u32 func_num) {
+    if (func_num >= std::size(SVC_Table_64)) {
+        LOG_ERROR(Kernel_SVC, "Unknown svc=0x{:02X}", func_num);
+        return nullptr;
+    }
+    return &SVC_Table_64[func_num];
 }

 MICROPROFILE_DEFINE(Kernel_SVC, "Kernel", "SVC", MP_RGB(70, 200, 70));
@@ -2464,7 +2662,8 @@ void CallSVC(Core::System& system, u32 immediate) {
    // Lock the global kernel mutex when we enter the kernel HLE.
    std::lock_guard lock{HLE::g_hle_lock};

-    const FunctionDef* info = GetSVCInfo(immediate);
+    const FunctionDef* info = system.CurrentProcess()->Is64BitProcess() ? GetSVCInfo64(immediate)
+                                                                        : GetSVCInfo32(immediate);
    if (info) {
        if (info->func) {
            info->func(system);
--- a/src/core/hle/kernel/svc_wrap.h
+++ b/src/core/hle/kernel/svc_wrap.h
@@ -15,6 +15,10 @@ static inline u64 Param(const Core::System& system, int n) {
    return system.CurrentArmInterface().GetReg(n);
 }

+static inline u32 Param32(const Core::System& system, int n) {
+    return static_cast<u32>(system.CurrentArmInterface().GetReg(n));
+}
+
 /**
 * HLE a function return from the current ARM userland process
 * @param system System context
@@ -24,40 +28,44 @@ static inline void FuncReturn(Core::System& system, u64 result) {
    system.CurrentArmInterface().SetReg(0, result);
 }

+static inline void FuncReturn32(Core::System& system, u32 result) {
+    system.CurrentArmInterface().SetReg(0, static_cast<u64>(result));
+}
+
 ////////////////////////////////////////////////////////////////////////////////////////////////////
 // Function wrappers that return type ResultCode

 template <ResultCode func(Core::System&, u64)>
-void SvcWrap(Core::System& system) {
+void SvcWrap64(Core::System& system) {
    FuncReturn(system, func(system, Param(system, 0)).raw);
 }

 template <ResultCode func(Core::System&, u64, u64)>
-void SvcWrap(Core::System& system) {
+void SvcWrap64(Core::System& system) {
    FuncReturn(system, func(system, Param(system, 0), Param(system, 1)).raw);
 }

 template <ResultCode func(Core::System&, u32)>
-void SvcWrap(Core::System& system) {
+void SvcWrap64(Core::System& system) {
    FuncReturn(system, func(system, static_cast<u32>(Param(system, 0))).raw);
 }

 template <ResultCode func(Core::System&, u32, u32)>
-void SvcWrap(Core::System& system) {
+void SvcWrap64(Core::System& system) {
    FuncReturn(
        system,
        func(system, static_cast<u32>(Param(system, 0)), static_cast<u32>(Param(system, 1))).raw);
 }

 template <ResultCode func(Core::System&, u32, u64, u64, u64)>
-void SvcWrap(Core::System& system) {
+void SvcWrap64(Core::System& system) {
    FuncReturn(system, func(system, static_cast<u32>(Param(system, 0)), Param(system, 1),
                            Param(system, 2), Param(system, 3))
                           .raw);
 }

 template <ResultCode func(Core::System&, u32*)>
-void SvcWrap(Core::System& system) {
+void SvcWrap64(Core::System& system) {
    u32 param = 0;
    const u32 retval = func(system, &param).raw;
    system.CurrentArmInterface().SetReg(1, param);
@@ -65,7 +73,7 @@ void SvcWrap(Core::System& system) {
 }

 template <ResultCode func(Core::System&, u32*, u32)>
-void SvcWrap(Core::System& system) {
+void SvcWrap64(Core::System& system) {
    u32 param_1 = 0;
    const u32 retval = func(system, &param_1, static_cast<u32>(Param(system, 1))).raw;
    system.CurrentArmInterface().SetReg(1, param_1);
@@ -73,7 +81,7 @@ void SvcWrap(Core::System& system) {
 }

 template <ResultCode func(Core::System&, u32*, u32*)>
-void SvcWrap(Core::System& system) {
+void SvcWrap64(Core::System& system) {
    u32 param_1 = 0;
    u32 param_2 = 0;
    const u32 retval = func(system, &param_1, &param_2).raw;
@@ -86,7 +94,7 @@ void SvcWrap(Core::System& system) {
 }

 template <ResultCode func(Core::System&, u32*, u64)>
-void SvcWrap(Core::System& system) {
+void SvcWrap64(Core::System& system) {
    u32 param_1 = 0;
    const u32 retval = func(system, &param_1, Param(system, 1)).raw;
    system.CurrentArmInterface().SetReg(1, param_1);
@@ -94,7 +102,7 @@ void SvcWrap(Core::System& system) {
 }

 template <ResultCode func(Core::System&, u32*, u64, u32)>
-void SvcWrap(Core::System& system) {
+void SvcWrap64(Core::System& system) {
    u32 param_1 = 0;
    const u32 retval =
        func(system, &param_1, Param(system, 1), static_cast<u32>(Param(system, 2))).raw;
@@ -104,7 +112,7 @@ void SvcWrap(Core::System& system) {
 }

 template <ResultCode func(Core::System&, u64*, u32)>
-void SvcWrap(Core::System& system) {
+void SvcWrap64(Core::System& system) {
    u64 param_1 = 0;
    const u32 retval = func(system, &param_1, static_cast<u32>(Param(system, 1))).raw;

@@ -113,12 +121,12 @@ void SvcWrap(Core::System& system) {
 }

 template <ResultCode func(Core::System&, u64, u32)>
-void SvcWrap(Core::System& system) {
+void SvcWrap64(Core::System& system) {
    FuncReturn(system, func(system, Param(system, 0), static_cast<u32>(Param(system, 1))).raw);
 }

 template <ResultCode func(Core::System&, u64*, u64)>
-void SvcWrap(Core::System& system) {
+void SvcWrap64(Core::System& system) {
    u64 param_1 = 0;
    const u32 retval = func(system, &param_1, Param(system, 1)).raw;

@@ -127,7 +135,7 @@ void SvcWrap(Core::System& system) {
 }

 template <ResultCode func(Core::System&, u64*, u32, u32)>
-void SvcWrap(Core::System& system) {
+void SvcWrap64(Core::System& system) {
    u64 param_1 = 0;
    const u32 retval = func(system, &param_1, static_cast<u32>(Param(system, 1)),
                            static_cast<u32>(Param(system, 2)))
@@ -138,19 +146,19 @@ void SvcWrap(Core::System& system) {
 }

 template <ResultCode func(Core::System&, u32, u64)>
-void SvcWrap(Core::System& system) {
+void SvcWrap64(Core::System& system) {
    FuncReturn(system, func(system, static_cast<u32>(Param(system, 0)), Param(system, 1)).raw);
 }

 template <ResultCode func(Core::System&, u32, u32, u64)>
-void SvcWrap(Core::System& system) {
+void SvcWrap64(Core::System& system) {
    FuncReturn(system, func(system, static_cast<u32>(Param(system, 0)),
                            static_cast<u32>(Param(system, 1)), Param(system, 2))
                           .raw);
 }

 template <ResultCode func(Core::System&, u32, u32*, u64*)>
-void SvcWrap(Core::System& system) {
+void SvcWrap64(Core::System& system) {
    u32 param_1 = 0;
    u64 param_2 = 0;
    const ResultCode retval = func(system, static_cast<u32>(Param(system, 2)), &param_1, &param_2);
@@ -161,54 +169,54 @@ void SvcWrap(Core::System& system) {
 }

 template <ResultCode func(Core::System&, u64, u64, u32, u32)>
-void SvcWrap(Core::System& system) {
+void SvcWrap64(Core::System& system) {
    FuncReturn(system, func(system, Param(system, 0), Param(system, 1),
                            static_cast<u32>(Param(system, 2)), static_cast<u32>(Param(system, 3)))
                           .raw);
 }

 template <ResultCode func(Core::System&, u64, u64, u32, u64)>
-void SvcWrap(Core::System& system) {
+void SvcWrap64(Core::System& system) {
    FuncReturn(system, func(system, Param(system, 0), Param(system, 1),
                            static_cast<u32>(Param(system, 2)), Param(system, 3))
                           .raw);
 }

 template <ResultCode func(Core::System&, u32, u64, u32)>
-void SvcWrap(Core::System& system) {
+void SvcWrap64(Core::System& system) {
    FuncReturn(system, func(system, static_cast<u32>(Param(system, 0)), Param(system, 1),
                            static_cast<u32>(Param(system, 2)))
                           .raw);
 }

 template <ResultCode func(Core::System&, u64, u64, u64)>
-void SvcWrap(Core::System& system) {
+void SvcWrap64(Core::System& system) {
    FuncReturn(system, func(system, Param(system, 0), Param(system, 1), Param(system, 2)).raw);
 }

 template <ResultCode func(Core::System&, u64, u64, u32)>
-void SvcWrap(Core::System& system) {
+void SvcWrap64(Core::System& system) {
    FuncReturn(
        system,
        func(system, Param(system, 0), Param(system, 1), static_cast<u32>(Param(system, 2))).raw);
 }

 template <ResultCode func(Core::System&, u32, u64, u64, u32)>
-void SvcWrap(Core::System& system) {
+void SvcWrap64(Core::System& system) {
    FuncReturn(system, func(system, static_cast<u32>(Param(system, 0)), Param(system, 1),
                            Param(system, 2), static_cast<u32>(Param(system, 3)))
                           .raw);
 }

 template <ResultCode func(Core::System&, u32, u64, u64)>
-void SvcWrap(Core::System& system) {
+void SvcWrap64(Core::System& system) {
    FuncReturn(
        system,
        func(system, static_cast<u32>(Param(system, 0)), Param(system, 1), Param(system, 2)).raw);
 }

 template <ResultCode func(Core::System&, u32*, u64, u64, s64)>
-void SvcWrap(Core::System& system) {
+void SvcWrap64(Core::System& system) {
    u32 param_1 = 0;
    const u32 retval = func(system, &param_1, Param(system, 1), static_cast<u32>(Param(system, 2)),
                            static_cast<s64>(Param(system, 3)))
@@ -219,14 +227,14 @@ void SvcWrap(Core::System& system) {
 }

 template <ResultCode func(Core::System&, u64, u64, u32, s64)>
-void SvcWrap(Core::System& system) {
+void SvcWrap64(Core::System& system) {
    FuncReturn(system, func(system, Param(system, 0), Param(system, 1),
                            static_cast<u32>(Param(system, 2)), static_cast<s64>(Param(system, 3)))
                           .raw);
 }

 template <ResultCode func(Core::System&, u64*, u64, u64, u64)>
-void SvcWrap(Core::System& system) {
+void SvcWrap64(Core::System& system) {
    u64 param_1 = 0;
    const u32 retval =
        func(system, &param_1, Param(system, 1), Param(system, 2), Param(system, 3)).raw;
@@ -236,7 +244,7 @@ void SvcWrap(Core::System& system) {
 }

 template <ResultCode func(Core::System&, u32*, u64, u64, u64, u32, s32)>
-void SvcWrap(Core::System& system) {
+void SvcWrap64(Core::System& system) {
    u32 param_1 = 0;
    const u32 retval = func(system, &param_1, Param(system, 1), Param(system, 2), Param(system, 3),
                            static_cast<u32>(Param(system, 4)), static_cast<s32>(Param(system, 5)))
@@ -247,7 +255,7 @@ void SvcWrap(Core::System& system) {
 }

 template <ResultCode func(Core::System&, u32*, u64, u64, u32)>
-void SvcWrap(Core::System& system) {
+void SvcWrap64(Core::System& system) {
    u32 param_1 = 0;
    const u32 retval = func(system, &param_1, Param(system, 1), Param(system, 2),
                            static_cast<u32>(Param(system, 3)))
@@ -258,7 +266,7 @@ void SvcWrap(Core::System& system) {
 }

 template <ResultCode func(Core::System&, Handle*, u64, u32, u32)>
-void SvcWrap(Core::System& system) {
+void SvcWrap64(Core::System& system) {
    u32 param_1 = 0;
    const u32 retval = func(system, &param_1, Param(system, 1), static_cast<u32>(Param(system, 2)),
                            static_cast<u32>(Param(system, 3)))
@@ -269,14 +277,14 @@ void SvcWrap(Core::System& system) {
 }

 template <ResultCode func(Core::System&, u64, u32, s32, s64)>
-void SvcWrap(Core::System& system) {
+void SvcWrap64(Core::System& system) {
    FuncReturn(system, func(system, Param(system, 0), static_cast<u32>(Param(system, 1)),
                            static_cast<s32>(Param(system, 2)), static_cast<s64>(Param(system, 3)))
                           .raw);
 }

 template <ResultCode func(Core::System&, u64, u32, s32, s32)>
-void SvcWrap(Core::System& system) {
+void SvcWrap64(Core::System& system) {
    FuncReturn(system, func(system, Param(system, 0), static_cast<u32>(Param(system, 1)),
                            static_cast<s32>(Param(system, 2)), static_cast<s32>(Param(system, 3)))
                           .raw);
@@ -286,7 +294,7 @@ void SvcWrap(Core::System& system) {
 // Function wrappers that return type u32

 template <u32 func(Core::System&)>
-void SvcWrap(Core::System& system) {
+void SvcWrap64(Core::System& system) {
    FuncReturn(system, func(system));
 }

@@ -294,7 +302,7 @@ void SvcWrap(Core::System& system) {
 // Function wrappers that return type u64

 template <u64 func(Core::System&)>
-void SvcWrap(Core::System& system) {
+void SvcWrap64(Core::System& system) {
    FuncReturn(system, func(system));
 }

@@ -302,44 +310,110 @@ void SvcWrap(Core::System& system) {
 /// Function wrappers that return type void

 template <void func(Core::System&)>
-void SvcWrap(Core::System& system) {
+void SvcWrap64(Core::System& system) {
    func(system);
 }

 template <void func(Core::System&, u32)>
-void SvcWrap(Core::System& system) {
+void SvcWrap64(Core::System& system) {
    func(system, static_cast<u32>(Param(system, 0)));
 }

 template <void func(Core::System&, u32, u64, u64, u64)>
-void SvcWrap(Core::System& system) {
+void SvcWrap64(Core::System& system) {
    func(system, static_cast<u32>(Param(system, 0)), Param(system, 1), Param(system, 2),
         Param(system, 3));
 }

 template <void func(Core::System&, s64)>
-void SvcWrap(Core::System& system) {
+void SvcWrap64(Core::System& system) {
    func(system, static_cast<s64>(Param(system, 0)));
 }

 template <void func(Core::System&, u64, s32)>
-void SvcWrap(Core::System& system) {
+void SvcWrap64(Core::System& system) {
    func(system, Param(system, 0), static_cast<s32>(Param(system, 1)));
 }

 template <void func(Core::System&, u64, u64)>
-void SvcWrap(Core::System& system) {
+void SvcWrap64(Core::System& system) {
    func(system, Param(system, 0), Param(system, 1));
 }

 template <void func(Core::System&, u64, u64, u64)>
-void SvcWrap(Core::System& system) {
+void SvcWrap64(Core::System& system) {
    func(system, Param(system, 0), Param(system, 1), Param(system, 2));
 }

 template <void func(Core::System&, u32, u64, u64)>
-void SvcWrap(Core::System& system) {
+void SvcWrap64(Core::System& system) {
    func(system, static_cast<u32>(Param(system, 0)), Param(system, 1), Param(system, 2));
 }

+// Used by QueryMemory32
+template <ResultCode func(Core::System&, u32, u32, u32)>
+void SvcWrap32(Core::System& system) {
+    FuncReturn32(system,
+                 func(system, Param32(system, 0), Param32(system, 1), Param32(system, 2)).raw);
+}
+
+// Used by GetInfo32
+template <ResultCode func(Core::System&, u32*, u32*, u32, u32, u32, u32)>
+void SvcWrap32(Core::System& system) {
+    u32 param_1 = 0;
+    u32 param_2 = 0;
+
+    const u32 retval = func(system, &param_1, &param_2, Param32(system, 0), Param32(system, 1),
+                            Param32(system, 2), Param32(system, 3))
+                           .raw;
+
+    system.CurrentArmInterface().SetReg(1, param_1);
+    system.CurrentArmInterface().SetReg(2, param_2);
+    FuncReturn32(system, retval);
+}
+
+// Used by GetThreadPriority32, ConnectToNamedPort32
+template <ResultCode func(Core::System&, u32*, u32)>
+void SvcWrap32(Core::System& system) {
+    u32 param_1 = 0;
+    const u32 retval = func(system, &param_1, Param32(system, 1)).raw;
+    system.CurrentArmInterface().SetReg(1, param_1);
+    FuncReturn32(system, retval);
+}
+
+// Used by GetThreadId32
+template <ResultCode func(Core::System&, u32*, u32*, u32)>
+void SvcWrap32(Core::System& system) {
+    u32 param_1 = 0;
+    u32 param_2 = 0;
+
+    const u32 retval = func(system, &param_1, &param_2, Param32(system, 1)).raw;
+    system.CurrentArmInterface().SetReg(1, param_1);
+    system.CurrentArmInterface().SetReg(2, param_2);
+    FuncReturn32(system, retval);
+}
+
+// Used by SignalProcessWideKey32
+template <void func(Core::System&, u32, s32)>
+void SvcWrap32(Core::System& system) {
+    func(system, Param32(system, 0), static_cast<s32>(Param32(system, 1)));
+}
+
+// Used by SendSyncRequest32
+template <ResultCode func(Core::System&, u32)>
+void SvcWrap32(Core::System& system) {
+    FuncReturn32(system, func(system, Param32(system, 0)).raw);
+}
+
+// Used by WaitSynchronization32
+template <ResultCode func(Core::System&, u32, u32, s32, u32, Handle*)>
+void SvcWrap32(Core::System& system) {
+    u32 param_1 = 0;
+    const u32 retval = func(system, Param32(system, 0), Param32(system, 1), Param32(system, 2),
+                            Param32(system, 3), &param_1)
+                           .raw;
+    system.CurrentArmInterface().SetReg(1, param_1);
+    FuncReturn32(system, retval);
+}
+
 } // namespace Kernel
--- a/src/core/hle/kernel/thread.cpp
+++ b/src/core/hle/kernel/thread.cpp
@@ -46,9 +46,9 @@ Thread::~Thread() = default;
 void Thread::Stop() {
    // Cancel any outstanding wakeup events for this thread
    Core::System::GetInstance().CoreTiming().UnscheduleEvent(kernel.ThreadWakeupCallbackEventType(),
-                                                             callback_handle);
-    kernel.ThreadWakeupCallbackHandleTable().Close(callback_handle);
-    callback_handle = 0;
+                                                             global_handle);
+    kernel.GlobalHandleTable().Close(global_handle);
+    global_handle = 0;
    SetStatus(ThreadStatus::Dead);
    Signal();

@@ -73,12 +73,12 @@ void Thread::WakeAfterDelay(s64 nanoseconds) {
    // thread-safe version of ScheduleEvent.
    const s64 cycles = Core::Timing::nsToCycles(std::chrono::nanoseconds{nanoseconds});
    Core::System::GetInstance().CoreTiming().ScheduleEvent(
-        cycles, kernel.ThreadWakeupCallbackEventType(), callback_handle);
+        cycles, kernel.ThreadWakeupCallbackEventType(), global_handle);
 }

 void Thread::CancelWakeupTimer() {
    Core::System::GetInstance().CoreTiming().UnscheduleEvent(kernel.ThreadWakeupCallbackEventType(),
-                                                             callback_handle);
+                                                             global_handle);
 }

 void Thread::ResumeFromWait() {
@@ -133,15 +133,16 @@ void Thread::CancelWait() {
    ResumeFromWait();
 }

-/**
- * Resets a thread context, making it ready to be scheduled and run by the CPU
- * @param context Thread context to reset
- * @param stack_top Address of the top of the stack
- * @param entry_point Address of entry point for execution
- * @param arg User argument for thread
- */
-static void ResetThreadContext(Core::ARM_Interface::ThreadContext& context, VAddr stack_top,
-                               VAddr entry_point, u64 arg) {
+static void ResetThreadContext32(Core::ARM_Interface::ThreadContext32& context, u32 stack_top,
+                                 u32 entry_point, u32 arg) {
+    context = {};
+    context.cpu_registers[0] = arg;
+    context.cpu_registers[15] = entry_point;
+    context.cpu_registers[13] = stack_top;
+}
+
+static void ResetThreadContext64(Core::ARM_Interface::ThreadContext64& context, VAddr stack_top,
+                                 VAddr entry_point, u64 arg) {
    context = {};
    context.cpu_registers[0] = arg;
    context.pc = entry_point;
@@ -190,7 +191,7 @@ ResultVal<std::shared_ptr<Thread>> Thread::Create(KernelCore& kernel, std::strin
    thread->condvar_wait_address = 0;
    thread->wait_handle = 0;
    thread->name = std::move(name);
-    thread->callback_handle = kernel.ThreadWakeupCallbackHandleTable().Create(thread).Unwrap();
+    thread->global_handle = kernel.GlobalHandleTable().Create(thread).Unwrap();
    thread->owner_process = &owner_process;
    auto& scheduler = kernel.GlobalScheduler();
    scheduler.AddThread(thread);
@@ -198,9 +199,9 @@ ResultVal<std::shared_ptr<Thread>> Thread::Create(KernelCore& kernel, std::strin

    thread->owner_process->RegisterThread(thread.get());

-    // TODO(peachum): move to ScheduleThread() when scheduler is added so selected core is used
-    // to initialize the context
-    ResetThreadContext(thread->context, stack_top, entry_point, arg);
+    ResetThreadContext32(thread->context_32, static_cast<u32>(stack_top),
+                         static_cast<u32>(entry_point), static_cast<u32>(arg));
+    ResetThreadContext64(thread->context_64, stack_top, entry_point, arg);

    return MakeResult<std::shared_ptr<Thread>>(std::move(thread));
 }
@@ -213,11 +214,13 @@ void Thread::SetPriority(u32 priority) {
 }

 void Thread::SetWaitSynchronizationResult(ResultCode result) {
-    context.cpu_registers[0] = result.raw;
+    context_32.cpu_registers[0] = result.raw;
+    context_64.cpu_registers[0] = result.raw;
 }

 void Thread::SetWaitSynchronizationOutput(s32 output) {
-    context.cpu_registers[1] = output;
+    context_32.cpu_registers[1] = output;
+    context_64.cpu_registers[1] = output;
 }

 s32 Thread::GetSynchronizationObjectIndex(std::shared_ptr<SynchronizationObject> object) const {
--- a/src/core/hle/kernel/thread.h
+++ b/src/core/hle/kernel/thread.h
@@ -102,7 +102,8 @@ public:

    using MutexWaitingThreads = std::vector<std::shared_ptr<Thread>>;

-    using ThreadContext = Core::ARM_Interface::ThreadContext;
+    using ThreadContext32 = Core::ARM_Interface::ThreadContext32;
+    using ThreadContext64 = Core::ARM_Interface::ThreadContext64;

    using ThreadSynchronizationObjects = std::vector<std::shared_ptr<SynchronizationObject>>;

@@ -273,12 +274,20 @@ public:
        return status == ThreadStatus::WaitSynch;
    }

-    ThreadContext& GetContext() {
-        return context;
+    ThreadContext32& GetContext32() {
+        return context_32;
    }

-    const ThreadContext& GetContext() const {
-        return context;
+    const ThreadContext32& GetContext32() const {
+        return context_32;
+    }
+
+    ThreadContext64& GetContext64() {
+        return context_64;
+    }
+
+    const ThreadContext64& GetContext64() const {
+        return context_64;
    }

    ThreadStatus GetStatus() const {
@@ -453,6 +462,10 @@ public:
        is_sync_cancelled = value;
    }

+    Handle GetGlobalHandle() const {
+        return global_handle;
+    }
+
 private:
    void SetSchedulingStatus(ThreadSchedStatus new_status);
    void SetCurrentPriority(u32 new_priority);
@@ -462,7 +475,8 @@ private:
    void AdjustSchedulingOnPriority(u32 old_priority);
    void AdjustSchedulingOnAffinity(u64 old_affinity_mask, s32 old_core);

-    Core::ARM_Interface::ThreadContext context{};
+    ThreadContext32 context_32{};
+    ThreadContext64 context_64{};

    u64 thread_id = 0;

@@ -514,7 +528,7 @@ private:
    VAddr arb_wait_address{0};

    /// Handle used as userdata to reference this object when inserting into the CoreTiming queue.
-    Handle callback_handle = 0;
+    Handle global_handle = 0;

    /// Callback that will be invoked when the thread is resumed from a waiting state. If the thread
    /// was waiting via WaitSynchronization then the object will be the last object that became
--- a/src/core/hle/kernel/time_manager.cpp
+++ b/src/core/hle/kernel/time_manager.cpp
@@ -0,0 +1,44 @@
+// Copyright 2020 yuzu Emulator Project
+// Licensed under GPLv2 or any later version
+// Refer to the license.txt file included.
+
+#include "common/assert.h"
+#include "core/core.h"
+#include "core/core_timing.h"
+#include "core/core_timing_util.h"
+#include "core/hle/kernel/handle_table.h"
+#include "core/hle/kernel/kernel.h"
+#include "core/hle/kernel/thread.h"
+#include "core/hle/kernel/time_manager.h"
+
+namespace Kernel {
+
+TimeManager::TimeManager(Core::System& system) : system{system} {
+    time_manager_event_type = Core::Timing::CreateEvent(
+        "Kernel::TimeManagerCallback", [this](u64 thread_handle, [[maybe_unused]] s64 cycles_late) {
+            Handle proper_handle = static_cast<Handle>(thread_handle);
+            std::shared_ptr<Thread> thread =
+                this->system.Kernel().RetrieveThreadFromGlobalHandleTable(proper_handle);
+            thread->ResumeFromWait();
+        });
+}
+
+void TimeManager::ScheduleTimeEvent(Handle& event_handle, Thread* timetask, s64 nanoseconds) {
+    if (nanoseconds > 0) {
+        ASSERT(timetask);
+        event_handle = timetask->GetGlobalHandle();
+        const s64 cycles = Core::Timing::nsToCycles(std::chrono::nanoseconds{nanoseconds});
+        system.CoreTiming().ScheduleEvent(cycles, time_manager_event_type, event_handle);
+    } else {
+        event_handle = InvalidHandle;
+    }
+}
+
+void TimeManager::UnscheduleTimeEvent(Handle event_handle) {
+    if (event_handle == InvalidHandle) {
+        return;
+    }
+    system.CoreTiming().UnscheduleEvent(time_manager_event_type, event_handle);
+}
+
+} // namespace Kernel
--- a/src/core/hle/kernel/time_manager.h
+++ b/src/core/hle/kernel/time_manager.h
@@ -0,0 +1,43 @@
+// Copyright 2020 yuzu Emulator Project
+// Licensed under GPLv2 or any later version
+// Refer to the license.txt file included.
+
+#pragma once
+
+#include <memory>
+
+#include "core/hle/kernel/object.h"
+
+namespace Core {
+class System;
+} // namespace Core
+
+namespace Core::Timing {
+struct EventType;
+} // namespace Core::Timing
+
+namespace Kernel {
+
+class Thread;
+
+/**
+ * The `TimeManager` schedules time events on threads and resumes a thread from its wait state
+ * (via `ResumeFromWait`) when its event is triggered.
+ */
+class TimeManager {
+public:
+    explicit TimeManager(Core::System& system);
+
+    /// Schedules a time event on the `timetask` thread that expires after `nanoseconds`.
+    /// Sets `event_handle` to a valid handle if `nanoseconds > 0`, otherwise to `InvalidHandle`.
+    void ScheduleTimeEvent(Handle& event_handle, Thread* timetask, s64 nanoseconds);
+
+    /// Unschedule an existing time event
+    void UnscheduleTimeEvent(Handle event_handle);
+
+private:
+    Core::System& system;
+    std::shared_ptr<Core::Timing::EventType> time_manager_event_type;
+};
+
+} // namespace Kernel
--- a/src/core/hle/service/am/am.cpp
+++ b/src/core/hle/service/am/am.cpp
@@ -607,7 +607,7 @@ ICommonStateGetter::ICommonStateGetter(Core::System& system,
        {40, nullptr, "GetCradleFwVersion"},
        {50, nullptr, "IsVrModeEnabled"},
        {51, nullptr, "SetVrModeEnabled"},
-        {52, nullptr, "SwitchLcdBacklight"},
+        {52, &ICommonStateGetter::SetLcdBacklighOffEnabled, "SetLcdBacklighOffEnabled"},
        {53, nullptr, "BeginVrModeEx"},
        {54, nullptr, "EndVrModeEx"},
        {55, nullptr, "IsInControllerFirmwareUpdateSection"},
@@ -636,7 +636,6 @@ void ICommonStateGetter::GetBootMode(Kernel::HLERequestContext& ctx) {

    IPC::ResponseBuilder rb{ctx, 3};
    rb.Push(RESULT_SUCCESS);
-
    rb.Push<u8>(static_cast<u8>(Service::PM::SystemBootMode::Normal)); // Normal boot mode
 }

@@ -660,6 +659,7 @@ void ICommonStateGetter::ReceiveMessage(Kernel::HLERequestContext& ctx) {
        rb.PushEnum<AppletMessageQueue::AppletMessage>(message);
        return;
    }
+
    rb.Push(RESULT_SUCCESS);
    rb.PushEnum<AppletMessageQueue::AppletMessage>(message);
 }
@@ -672,6 +672,17 @@ void ICommonStateGetter::GetCurrentFocusState(Kernel::HLERequestContext& ctx) {
    rb.Push(static_cast<u8>(FocusState::InFocus));
 }

+void ICommonStateGetter::SetLcdBacklighOffEnabled(Kernel::HLERequestContext& ctx) {
+    IPC::RequestParser rp{ctx};
+    const auto is_lcd_backlight_off_enabled = rp.Pop<bool>();
+
+    LOG_WARNING(Service_AM, "(STUBBED) called. is_lcd_backlight_off_enabled={}",
+                is_lcd_backlight_off_enabled);
+
+    IPC::ResponseBuilder rb{ctx, 2};
+    rb.Push(RESULT_SUCCESS);
+}
+
 void ICommonStateGetter::GetDefaultDisplayResolutionChangeEvent(Kernel::HLERequestContext& ctx) {
    LOG_DEBUG(Service_AM, "called");

--- a/src/core/hle/service/am/am.h
+++ b/src/core/hle/service/am/am.h
@@ -182,6 +182,7 @@ private:
    void GetOperationMode(Kernel::HLERequestContext& ctx);
    void GetPerformanceMode(Kernel::HLERequestContext& ctx);
    void GetBootMode(Kernel::HLERequestContext& ctx);
+    void SetLcdBacklighOffEnabled(Kernel::HLERequestContext& ctx);
    void GetDefaultDisplayResolution(Kernel::HLERequestContext& ctx);
    void SetCpuBoostMode(Kernel::HLERequestContext& ctx);

--- a/src/core/hle/service/bcat/backend/boxcat.cpp
+++ b/src/core/hle/service/bcat/backend/boxcat.cpp
@@ -200,7 +200,8 @@ private:
    DownloadResult DownloadInternal(const std::string& resolved_path, u32 timeout_seconds,
                                    const std::string& content_type_name) {
        if (client == nullptr) {
-            client = std::make_unique<httplib::SSLClient>(BOXCAT_HOSTNAME, PORT, timeout_seconds);
+            client = std::make_unique<httplib::SSLClient>(BOXCAT_HOSTNAME, PORT);
+            client->set_timeout_sec(timeout_seconds);
        }

        httplib::Headers headers{
@@ -448,8 +449,8 @@ std::optional<std::vector<u8>> Boxcat::GetLaunchParameter(TitleIDVersion title)

 Boxcat::StatusResult Boxcat::GetStatus(std::optional<std::string>& global,
                                       std::map<std::string, EventStatus>& games) {
-    httplib::SSLClient client{BOXCAT_HOSTNAME, static_cast<int>(PORT),
-                              static_cast<int>(TIMEOUT_SECONDS)};
+    httplib::SSLClient client{BOXCAT_HOSTNAME, static_cast<int>(PORT)};
+    client.set_timeout_sec(static_cast<int>(TIMEOUT_SECONDS));

    httplib::Headers headers{
        {std::string("Game-Assets-API-Version"), std::string(BOXCAT_API_VERSION)},
--- a/src/core/hle/service/filesystem/filesystem.cpp
+++ b/src/core/hle/service/filesystem/filesystem.cpp
@@ -40,7 +40,10 @@ static FileSys::VirtualDir GetDirectoryRelativeWrapped(FileSys::VirtualDir base,
    if (dir_name.empty() || dir_name == "." || dir_name == "/" || dir_name == "\\")
        return base;

-    return base->GetDirectoryRelative(dir_name);
+    const auto res = base->GetDirectoryRelative(dir_name);
+    if (res == nullptr)
+        return base->CreateDirectoryRelative(dir_name);
+    return res;
 }

 VfsDirectoryServiceWrapper::VfsDirectoryServiceWrapper(FileSys::VirtualDir backing_)
--- a/src/core/hle/service/hid/controllers/npad.cpp
+++ b/src/core/hle/service/hid/controllers/npad.cpp
@@ -287,13 +287,13 @@ void Controller_NPad::RequestPadStateUpdate(u32 npad_id) {
        analog_state[static_cast<std::size_t>(JoystickId::Joystick_Left)]->GetAnalogDirectionStatus(
            Input::AnalogDirection::DOWN));

-    pad_state.r_stick_up.Assign(analog_state[static_cast<std::size_t>(JoystickId::Joystick_Right)]
-                                    ->GetAnalogDirectionStatus(Input::AnalogDirection::RIGHT));
-    pad_state.r_stick_left.Assign(analog_state[static_cast<std::size_t>(JoystickId::Joystick_Right)]
-                                      ->GetAnalogDirectionStatus(Input::AnalogDirection::LEFT));
    pad_state.r_stick_right.Assign(
        analog_state[static_cast<std::size_t>(JoystickId::Joystick_Right)]
-            ->GetAnalogDirectionStatus(Input::AnalogDirection::UP));
+            ->GetAnalogDirectionStatus(Input::AnalogDirection::RIGHT));
+    pad_state.r_stick_left.Assign(analog_state[static_cast<std::size_t>(JoystickId::Joystick_Right)]
+                                      ->GetAnalogDirectionStatus(Input::AnalogDirection::LEFT));
+    pad_state.r_stick_up.Assign(analog_state[static_cast<std::size_t>(JoystickId::Joystick_Right)]
+                                    ->GetAnalogDirectionStatus(Input::AnalogDirection::UP));
    pad_state.r_stick_down.Assign(analog_state[static_cast<std::size_t>(JoystickId::Joystick_Right)]
                                      ->GetAnalogDirectionStatus(Input::AnalogDirection::DOWN));

--- a/src/core/hle/service/nvdrv/devices/nvhost_nvdec.cpp
+++ b/src/core/hle/service/nvdrv/devices/nvhost_nvdec.cpp
@@ -22,6 +22,18 @@ u32 nvhost_nvdec::ioctl(Ioctl command, const std::vector<u8>& input, const std::
    switch (static_cast<IoctlCommand>(command.raw)) {
    case IoctlCommand::IocSetNVMAPfdCommand:
        return SetNVMAPfd(input, output);
+    case IoctlCommand::IocSubmit:
+        return Submit(input, output);
+    case IoctlCommand::IocGetSyncpoint:
+        return GetSyncpoint(input, output);
+    case IoctlCommand::IocGetWaitbase:
+        return GetWaitbase(input, output);
+    case IoctlCommand::IocMapBuffer:
+        return MapBuffer(input, output);
+    case IoctlCommand::IocMapBufferEx:
+        return MapBufferEx(input, output);
+    case IoctlCommand::IocUnmapBufferEx:
+        return UnmapBufferEx(input, output);
    }

    UNIMPLEMENTED_MSG("Unimplemented ioctl");
@@ -30,11 +42,67 @@ u32 nvhost_nvdec::ioctl(Ioctl command, const std::vector<u8>& input, const std::

 u32 nvhost_nvdec::SetNVMAPfd(const std::vector<u8>& input, std::vector<u8>& output) {
    IoctlSetNvmapFD params{};
-    std::memcpy(&params, input.data(), input.size());
+    std::memcpy(&params, input.data(), sizeof(IoctlSetNvmapFD));
    LOG_DEBUG(Service_NVDRV, "called, fd={}", params.nvmap_fd);

    nvmap_fd = params.nvmap_fd;
    return 0;
 }

+u32 nvhost_nvdec::Submit(const std::vector<u8>& input, std::vector<u8>& output) {
+    IoctlSubmit params{};
+    std::memcpy(&params, input.data(), sizeof(IoctlSubmit));
+    LOG_WARNING(Service_NVDRV, "(STUBBED) called");
+    std::memcpy(output.data(), &params, sizeof(IoctlSubmit));
+    return 0;
+}
+
+u32 nvhost_nvdec::GetSyncpoint(const std::vector<u8>& input, std::vector<u8>& output) {
+    IoctlGetSyncpoint params{};
+    std::memcpy(&params, input.data(), sizeof(IoctlGetSyncpoint));
+    LOG_INFO(Service_NVDRV, "called, unknown=0x{:X}", params.unknown);
+    params.value = 0; // Seems to be hard coded at 0
+    std::memcpy(output.data(), &params, sizeof(IoctlGetSyncpoint));
+    return 0;
+}
+
+u32 nvhost_nvdec::GetWaitbase(const std::vector<u8>& input, std::vector<u8>& output) {
+    IoctlGetWaitbase params{};
+    std::memcpy(&params, input.data(), sizeof(IoctlGetWaitbase));
+    LOG_INFO(Service_NVDRV, "called, unknown=0x{:X}", params.unknown);
+    params.value = 0; // Seems to be hard coded at 0
+    std::memcpy(output.data(), &params, sizeof(IoctlGetWaitbase));
+    return 0;
+}
+
+u32 nvhost_nvdec::MapBuffer(const std::vector<u8>& input, std::vector<u8>& output) {
+    IoctlMapBuffer params{};
+    std::memcpy(&params, input.data(), sizeof(IoctlMapBuffer));
+    LOG_WARNING(Service_NVDRV, "(STUBBED) called with address={:08X}{:08X}", params.address_2,
+                params.address_1);
+    params.address_1 = 0;
+    params.address_2 = 0;
+    std::memcpy(output.data(), &params, sizeof(IoctlMapBuffer));
+    return 0;
+}
+
+u32 nvhost_nvdec::MapBufferEx(const std::vector<u8>& input, std::vector<u8>& output) {
+    IoctlMapBufferEx params{};
+    std::memcpy(&params, input.data(), sizeof(IoctlMapBufferEx));
+    LOG_WARNING(Service_NVDRV, "(STUBBED) called with address={:08X}{:08X}", params.address_2,
+                params.address_1);
+    params.address_1 = 0;
+    params.address_2 = 0;
+    std::memcpy(output.data(), &params, sizeof(IoctlMapBufferEx));
+    return 0;
+}
+
+u32 nvhost_nvdec::UnmapBufferEx(const std::vector<u8>& input, std::vector<u8>& output) {
+    IoctlUnmapBufferEx params{};
+    std::memcpy(&params, input.data(), sizeof(IoctlUnmapBufferEx));
+    LOG_WARNING(Service_NVDRV, "(STUBBED) called");
+    std::memcpy(output.data(), &params, sizeof(IoctlUnmapBufferEx));
+    return 0;
+}
+
 } // namespace Service::Nvidia::Devices
--- a/src/core/hle/service/nvdrv/devices/nvhost_nvdec.h
+++ b/src/core/hle/service/nvdrv/devices/nvhost_nvdec.h
@@ -23,16 +23,66 @@ public:
 private:
    enum class IoctlCommand : u32_le {
        IocSetNVMAPfdCommand = 0x40044801,
+        IocSubmit = 0xC0400001,
+        IocGetSyncpoint = 0xC0080002,
+        IocGetWaitbase = 0xC0080003,
+        IocMapBuffer = 0xC01C0009,
+        IocMapBufferEx = 0xC0A40009,
+        IocUnmapBufferEx = 0xC0A4000A,
    };

    struct IoctlSetNvmapFD {
        u32_le nvmap_fd;
    };
-    static_assert(sizeof(IoctlSetNvmapFD) == 4, "IoctlSetNvmapFD is incorrect size");
+    static_assert(sizeof(IoctlSetNvmapFD) == 0x4, "IoctlSetNvmapFD is incorrect size");
+
+    struct IoctlSubmit {
+        INSERT_PADDING_BYTES(0x40); // TODO(DarkLordZach): RE this structure
+    };
+    static_assert(sizeof(IoctlSubmit) == 0x40, "IoctlSubmit has incorrect size");
+
+    struct IoctlGetSyncpoint {
+        u32 unknown; // seems to be ignored? Nintendo added this
+        u32 value;
+    };
+    static_assert(sizeof(IoctlGetSyncpoint) == 0x08, "IoctlGetSyncpoint has incorrect size");
+
+    struct IoctlGetWaitbase {
+        u32 unknown; // seems to be ignored? Nintendo added this
+        u32 value;
+    };
+    static_assert(sizeof(IoctlGetWaitbase) == 0x08, "IoctlGetWaitbase has incorrect size");
+
+    struct IoctlMapBuffer {
+        u32 unknown;
+        u32 address_1;
+        u32 address_2;
+        INSERT_PADDING_BYTES(0x10); // TODO(DarkLordZach): RE this structure
+    };
+    static_assert(sizeof(IoctlMapBuffer) == 0x1C, "IoctlMapBuffer is incorrect size");
+
+    struct IoctlMapBufferEx {
+        u32 unknown;
+        u32 address_1;
+        u32 address_2;
+        INSERT_PADDING_BYTES(0x98); // TODO(DarkLordZach): RE this structure
+    };
+    static_assert(sizeof(IoctlMapBufferEx) == 0xA4, "IoctlMapBufferEx has incorrect size");
+
+    struct IoctlUnmapBufferEx {
+        INSERT_PADDING_BYTES(0xA4); // TODO(DarkLordZach): RE this structure
+    };
+    static_assert(sizeof(IoctlUnmapBufferEx) == 0xA4, "IoctlUnmapBufferEx has incorrect size");

    u32_le nvmap_fd{};

    u32 SetNVMAPfd(const std::vector<u8>& input, std::vector<u8>& output);
+    u32 Submit(const std::vector<u8>& input, std::vector<u8>& output);
+    u32 GetSyncpoint(const std::vector<u8>& input, std::vector<u8>& output);
+    u32 GetWaitbase(const std::vector<u8>& input, std::vector<u8>& output);
+    u32 MapBuffer(const std::vector<u8>& input, std::vector<u8>& output);
+    u32 MapBufferEx(const std::vector<u8>& input, std::vector<u8>& output);
+    u32 UnmapBufferEx(const std::vector<u8>& input, std::vector<u8>& output);
 };

 } // namespace Service::Nvidia::Devices
--- a/src/core/hle/service/nvdrv/devices/nvhost_vic.cpp
+++ b/src/core/hle/service/nvdrv/devices/nvhost_vic.cpp
@@ -22,6 +22,18 @@ u32 nvhost_vic::ioctl(Ioctl command, const std::vector<u8>& input, const std::ve
    switch (static_cast<IoctlCommand>(command.raw)) {
    case IoctlCommand::IocSetNVMAPfdCommand:
        return SetNVMAPfd(input, output);
+    case IoctlCommand::IocSubmit:
+        return Submit(input, output);
+    case IoctlCommand::IocGetSyncpoint:
+        return GetSyncpoint(input, output);
+    case IoctlCommand::IocGetWaitbase:
+        return GetWaitbase(input, output);
+    case IoctlCommand::IocMapBuffer:
+        return MapBuffer(input, output);
+    case IoctlCommand::IocMapBufferEx:
+        return MapBufferEx(input, output);
+    case IoctlCommand::IocUnmapBufferEx:
+        return UnmapBufferEx(input, output);
    }

    UNIMPLEMENTED_MSG("Unimplemented ioctl");
@@ -30,11 +42,67 @@ u32 nvhost_vic::ioctl(Ioctl command, const std::vector<u8>& input, const std::ve

 u32 nvhost_vic::SetNVMAPfd(const std::vector<u8>& input, std::vector<u8>& output) {
    IoctlSetNvmapFD params{};
-    std::memcpy(&params, input.data(), input.size());
+    std::memcpy(&params, input.data(), sizeof(IoctlSetNvmapFD));
    LOG_DEBUG(Service_NVDRV, "called, fd={}", params.nvmap_fd);

    nvmap_fd = params.nvmap_fd;
    return 0;
 }

+u32 nvhost_vic::Submit(const std::vector<u8>& input, std::vector<u8>& output) {
+    IoctlSubmit params{};
+    std::memcpy(&params, input.data(), sizeof(IoctlSubmit));
+    LOG_WARNING(Service_NVDRV, "(STUBBED) called");
+    std::memcpy(output.data(), &params, sizeof(IoctlSubmit));
+    return 0;
+}
+
+u32 nvhost_vic::GetSyncpoint(const std::vector<u8>& input, std::vector<u8>& output) {
+    IoctlGetSyncpoint params{};
+    std::memcpy(&params, input.data(), sizeof(IoctlGetSyncpoint));
+    LOG_INFO(Service_NVDRV, "called, unknown=0x{:X}", params.unknown);
+    params.value = 0; // Seems to be hard coded at 0
+    std::memcpy(output.data(), &params, sizeof(IoctlGetSyncpoint));
+    return 0;
+}
+
+u32 nvhost_vic::GetWaitbase(const std::vector<u8>& input, std::vector<u8>& output) {
+    IoctlGetWaitbase params{};
+    std::memcpy(&params, input.data(), sizeof(IoctlGetWaitbase));
+    LOG_INFO(Service_NVDRV, "called, unknown=0x{:X}", params.unknown);
+    params.value = 0; // Seems to be hard coded at 0
+    std::memcpy(output.data(), &params, sizeof(IoctlGetWaitbase));
+    return 0;
+}
+
+u32 nvhost_vic::MapBuffer(const std::vector<u8>& input, std::vector<u8>& output) {
+    IoctlMapBuffer params{};
+    std::memcpy(&params, input.data(), sizeof(IoctlMapBuffer));
+    LOG_WARNING(Service_NVDRV, "(STUBBED) called with address={:08X}{:08X}", params.address_2,
+                params.address_1);
+    params.address_1 = 0;
+    params.address_2 = 0;
+    std::memcpy(output.data(), &params, sizeof(IoctlMapBuffer));
+    return 0;
+}
+
+u32 nvhost_vic::MapBufferEx(const std::vector<u8>& input, std::vector<u8>& output) {
+    IoctlMapBufferEx params{};
+    std::memcpy(&params, input.data(), sizeof(IoctlMapBufferEx));
+    LOG_WARNING(Service_NVDRV, "(STUBBED) called with address={:08X}{:08X}", params.address_2,
+                params.address_1);
+    params.address_1 = 0;
+    params.address_2 = 0;
+    std::memcpy(output.data(), &params, sizeof(IoctlMapBufferEx));
+    return 0;
+}
+
+u32 nvhost_vic::UnmapBufferEx(const std::vector<u8>& input, std::vector<u8>& output) {
+    IoctlUnmapBufferEx params{};
+    std::memcpy(&params, input.data(), sizeof(IoctlUnmapBufferEx));
+    LOG_WARNING(Service_NVDRV, "(STUBBED) called");
+    std::memcpy(output.data(), &params, sizeof(IoctlUnmapBufferEx));
+    return 0;
+}
+
 } // namespace Service::Nvidia::Devices
--- a/src/core/hle/service/nvdrv/devices/nvhost_vic.h
+++ b/src/core/hle/service/nvdrv/devices/nvhost_vic.h
@@ -23,6 +23,12 @@ public:
 private:
    enum class IoctlCommand : u32_le {
        IocSetNVMAPfdCommand = 0x40044801,
+        IocSubmit = 0xC0400001,
+        IocGetSyncpoint = 0xC0080002,
+        IocGetWaitbase = 0xC0080003,
+        IocMapBuffer = 0xC01C0009,
+        IocMapBufferEx = 0xC03C0009,
+        IocUnmapBufferEx = 0xC03C000A,
    };

    struct IoctlSetNvmapFD {
@@ -30,9 +36,53 @@ private:
    };
    static_assert(sizeof(IoctlSetNvmapFD) == 4, "IoctlSetNvmapFD is incorrect size");

+    struct IoctlSubmit {
+        INSERT_PADDING_BYTES(0x40); // TODO(DarkLordZach): RE this structure
+    };
+    static_assert(sizeof(IoctlSubmit) == 0x40, "IoctlSubmit is incorrect size");
+
+    struct IoctlGetSyncpoint {
+        u32 unknown; // seems to be ignored? Nintendo added this
+        u32 value;
+    };
+    static_assert(sizeof(IoctlGetSyncpoint) == 0x8, "IoctlGetSyncpoint is incorrect size");
+
+    struct IoctlGetWaitbase {
+        u32 unknown; // seems to be ignored? Nintendo added this
+        u32 value;
+    };
+    static_assert(sizeof(IoctlGetWaitbase) == 0x8, "IoctlGetWaitbase is incorrect size");
+
+    struct IoctlMapBuffer {
+        u32 unknown;
+        u32 address_1;
+        u32 address_2;
+        INSERT_PADDING_BYTES(0x10); // TODO(DarkLordZach): RE this structure
+    };
+    static_assert(sizeof(IoctlMapBuffer) == 0x1C, "IoctlMapBuffer is incorrect size");
+
+    struct IoctlMapBufferEx {
+        u32 unknown;
+        u32 address_1;
+        u32 address_2;
+        INSERT_PADDING_BYTES(0x30); // TODO(DarkLordZach): RE this structure
+    };
+    static_assert(sizeof(IoctlMapBufferEx) == 0x3C, "IoctlMapBufferEx is incorrect size");
+
+    struct IoctlUnmapBufferEx {
+        INSERT_PADDING_BYTES(0x3C); // TODO(DarkLordZach): RE this structure
+    };
+    static_assert(sizeof(IoctlUnmapBufferEx) == 0x3C, "IoctlUnmapBufferEx is incorrect size");
+
    u32_le nvmap_fd{};

    u32 SetNVMAPfd(const std::vector<u8>& input, std::vector<u8>& output);
+    u32 Submit(const std::vector<u8>& input, std::vector<u8>& output);
+    u32 GetSyncpoint(const std::vector<u8>& input, std::vector<u8>& output);
+    u32 GetWaitbase(const std::vector<u8>& input, std::vector<u8>& output);
+    u32 MapBuffer(const std::vector<u8>& input, std::vector<u8>& output);
+    u32 MapBufferEx(const std::vector<u8>& input, std::vector<u8>& output);
+    u32 UnmapBufferEx(const std::vector<u8>& input, std::vector<u8>& output);
 };

 } // namespace Service::Nvidia::Devices
--- a/src/core/loader/deconstructed_rom_directory.cpp
+++ b/src/core/loader/deconstructed_rom_directory.cpp
@@ -129,12 +129,6 @@ AppLoader_DeconstructedRomDirectory::LoadResult AppLoader_DeconstructedRomDirect
    }
    metadata.Print();

-    const FileSys::ProgramAddressSpaceType arch_bits{metadata.GetAddressSpaceType()};
-    if (arch_bits == FileSys::ProgramAddressSpaceType::Is32Bit ||
-        arch_bits == FileSys::ProgramAddressSpaceType::Is32BitNoMap) {
-        return {ResultStatus::Error32BitISA, {}};
-    }
-
    if (process.LoadFromMetadata(metadata).IsError()) {
        return {ResultStatus::ErrorUnableToParseKernelMetadata, {}};
    }
--- a/src/core/reporter.cpp
+++ b/src/core/reporter.cpp
@@ -111,7 +111,7 @@ json GetProcessorStateDataAuto(Core::System& system) {
    const auto& vm_manager{process->VMManager()};
    auto& arm{system.CurrentArmInterface()};

-    Core::ARM_Interface::ThreadContext context{};
+    Core::ARM_Interface::ThreadContext64 context{};
    arm.SaveContext(context);

    return GetProcessorStateData(process->Is64BitProcess() ? "AArch64" : "AArch32",
--- a/src/core/settings.cpp
+++ b/src/core/settings.cpp
@@ -94,6 +94,7 @@ void LogSettings() {
    LogSetting("Renderer_UseAccurateGpuEmulation", Settings::values.use_accurate_gpu_emulation);
    LogSetting("Renderer_UseAsynchronousGpuEmulation",
               Settings::values.use_asynchronous_gpu_emulation);
+    LogSetting("Renderer_UseVsync", Settings::values.use_vsync);
    LogSetting("Audio_OutputEngine", Settings::values.sink_id);
    LogSetting("Audio_EnableAudioStretching", Settings::values.enable_audio_stretching);
    LogSetting("Audio_OutputDevice", Settings::values.audio_device_id);
--- a/src/core/settings.h
+++ b/src/core/settings.h
@@ -430,11 +430,13 @@ struct Values {

    float resolution_factor;
    int aspect_ratio;
+    int max_anisotropy;
    bool use_frame_limit;
    u16 frame_limit;
    bool use_disk_shader_cache;
    bool use_accurate_gpu_emulation;
    bool use_asynchronous_gpu_emulation;
+    bool use_vsync;
    bool force_30fps_mode;

    float bg_red;
--- a/src/core/telemetry_session.cpp
+++ b/src/core/telemetry_session.cpp
@@ -188,6 +188,7 @@ void TelemetrySession::AddInitialInfo(Loader::AppLoader& app_loader) {
             Settings::values.use_accurate_gpu_emulation);
    AddField(field_type, "Renderer_UseAsynchronousGpuEmulation",
             Settings::values.use_asynchronous_gpu_emulation);
+    AddField(field_type, "Renderer_UseVsync", Settings::values.use_vsync);
    AddField(field_type, "System_UseDockedMode", Settings::values.use_docked_mode);
 }

--- a/src/input_common/analog_from_button.cpp
+++ b/src/input_common/analog_from_button.cpp
@@ -34,6 +34,20 @@ public:
                               y * coef * (x == 0 ? 1.0f : SQRT_HALF));
    }

+    bool GetAnalogDirectionStatus(Input::AnalogDirection direction) const override {
+        switch (direction) {
+        case Input::AnalogDirection::RIGHT:
+            return right->GetStatus();
+        case Input::AnalogDirection::LEFT:
+            return left->GetStatus();
+        case Input::AnalogDirection::UP:
+            return up->GetStatus();
+        case Input::AnalogDirection::DOWN:
+            return down->GetStatus();
+        }
+        return false;
+    }
+
 private:
    Button up;
    Button down;
--- a/src/input_common/udp/client.cpp
+++ b/src/input_common/udp/client.cpp
@@ -32,8 +32,16 @@ public:
                    SocketCallback callback)
        : callback(std::move(callback)), timer(io_service),
          socket(io_service, udp::endpoint(udp::v4(), 0)), client_id(client_id),
-          pad_index(pad_index),
-          send_endpoint(udp::endpoint(boost::asio::ip::make_address_v4(host), port)) {}
+          pad_index(pad_index) {
+        boost::system::error_code ec{};
+        auto ipv4 = boost::asio::ip::make_address_v4(host, ec);
+        if (ec.failed()) {
+            LOG_ERROR(Input, "Invalid IPv4 address \"{}\" provided to socket", host);
+            ipv4 = boost::asio::ip::address_v4{};
+        }
+
+        send_endpoint = {udp::endpoint(ipv4, port)};
+    }

    void Stop() {
        io_service.stop();
@@ -85,17 +93,18 @@ private:
    }

    void HandleSend(const boost::system::error_code& error) {
+        boost::system::error_code _ignored{};
        // Send a request for getting port info for the pad
        Request::PortInfo port_info{1, {pad_index, 0, 0, 0}};
        const auto port_message = Request::Create(port_info, client_id);
        std::memcpy(&send_buffer1, &port_message, PORT_INFO_SIZE);
-        socket.send_to(boost::asio::buffer(send_buffer1), send_endpoint);
+        socket.send_to(boost::asio::buffer(send_buffer1), send_endpoint, {}, _ignored);

        // Send a request for getting pad data for the pad
        Request::PadData pad_data{Request::PadData::Flags::Id, pad_index, EMPTY_MAC_ADDRESS};
        const auto pad_message = Request::Create(pad_data, client_id);
        std::memcpy(send_buffer2.data(), &pad_message, PAD_DATA_SIZE);
-        socket.send_to(boost::asio::buffer(send_buffer2), send_endpoint);
+        socket.send_to(boost::asio::buffer(send_buffer2), send_endpoint, {}, _ignored);
        StartSend(timer.expiry());
    }

--- a/src/input_common/udp/protocol.cpp
+++ b/src/input_common/udp/protocol.cpp
@@ -31,7 +31,6 @@ namespace Response {
 */
 std::optional<Type> Validate(u8* data, std::size_t size) {
    if (size < sizeof(Header)) {
-        LOG_DEBUG(Input, "Invalid UDP packet received");
        return std::nullopt;
    }
    Header header{};
--- a/src/video_core/engines/maxwell_3d.cpp
+++ b/src/video_core/engines/maxwell_3d.cpp
@@ -489,7 +489,7 @@ void Maxwell3D::FlushMMEInlineDraw() {

    const bool is_indexed = mme_draw.current_mode == MMEDrawMode::Indexed;
    if (ShouldExecute()) {
-        rasterizer.DrawMultiBatch(is_indexed);
+        rasterizer.Draw(is_indexed, true);
    }

    // TODO(bunnei): Below, we reset vertex count so that we can use these registers to determine if
@@ -654,7 +654,7 @@ void Maxwell3D::DrawArrays() {

    const bool is_indexed{regs.index_array.count && !regs.vertex_buffer.count};
    if (ShouldExecute()) {
-        rasterizer.DrawBatch(is_indexed);
+        rasterizer.Draw(is_indexed, false);
    }

    // TODO(bunnei): Below, we reset vertex count so that we can use these registers to determine if
--- a/src/video_core/engines/maxwell_3d.h
+++ b/src/video_core/engines/maxwell_3d.h
@@ -542,7 +542,7 @@ public:
                BitField<12, 1, InvMemoryLayout> type;
            } memory_layout;
            union {
-                BitField<0, 16, u32> array_mode;
+                BitField<0, 16, u32> layers;
                BitField<16, 1, u32> volume;
            };
            u32 layer_stride;
@@ -800,8 +800,12 @@ public:

                u32 zeta_width;
                u32 zeta_height;
+                union {
+                    BitField<0, 16, u32> zeta_layers;
+                    BitField<16, 1, u32> zeta_volume;
+                };

-                INSERT_UNION_PADDING_WORDS(0x27);
+                INSERT_UNION_PADDING_WORDS(0x26);

                u32 depth_test_enable;

@@ -1507,6 +1511,7 @@ ASSERT_REG_POSITION(vertex_attrib_format, 0x458);
 ASSERT_REG_POSITION(rt_control, 0x487);
 ASSERT_REG_POSITION(zeta_width, 0x48a);
 ASSERT_REG_POSITION(zeta_height, 0x48b);
+ASSERT_REG_POSITION(zeta_layers, 0x48c);
 ASSERT_REG_POSITION(depth_test_enable, 0x4B3);
 ASSERT_REG_POSITION(independent_blend_enable, 0x4B9);
 ASSERT_REG_POSITION(depth_write_enabled, 0x4BA);
--- a/src/video_core/gpu.cpp
+++ b/src/video_core/gpu.cpp
@@ -140,71 +140,6 @@ void GPU::FlushCommands() {
    renderer.Rasterizer().FlushCommands();
 }

-u32 RenderTargetBytesPerPixel(RenderTargetFormat format) {
-    ASSERT(format != RenderTargetFormat::NONE);
-
-    switch (format) {
-    case RenderTargetFormat::RGBA32_FLOAT:
-    case RenderTargetFormat::RGBA32_UINT:
-        return 16;
-    case RenderTargetFormat::RGBA16_UINT:
-    case RenderTargetFormat::RGBA16_UNORM:
-    case RenderTargetFormat::RGBA16_FLOAT:
-    case RenderTargetFormat::RGBX16_FLOAT:
-    case RenderTargetFormat::RG32_FLOAT:
-    case RenderTargetFormat::RG32_UINT:
-        return 8;
-    case RenderTargetFormat::RGBA8_UNORM:
-    case RenderTargetFormat::RGBA8_SNORM:
-    case RenderTargetFormat::RGBA8_SRGB:
-    case RenderTargetFormat::RGBA8_UINT:
-    case RenderTargetFormat::RGB10_A2_UNORM:
-    case RenderTargetFormat::BGRA8_UNORM:
-    case RenderTargetFormat::BGRA8_SRGB:
-    case RenderTargetFormat::RG16_UNORM:
-    case RenderTargetFormat::RG16_SNORM:
-    case RenderTargetFormat::RG16_UINT:
-    case RenderTargetFormat::RG16_SINT:
-    case RenderTargetFormat::RG16_FLOAT:
-    case RenderTargetFormat::R32_FLOAT:
-    case RenderTargetFormat::R11G11B10_FLOAT:
-    case RenderTargetFormat::R32_UINT:
-        return 4;
-    case RenderTargetFormat::R16_UNORM:
-    case RenderTargetFormat::R16_SNORM:
-    case RenderTargetFormat::R16_UINT:
-    case RenderTargetFormat::R16_SINT:
-    case RenderTargetFormat::R16_FLOAT:
-    case RenderTargetFormat::RG8_UNORM:
-    case RenderTargetFormat::RG8_SNORM:
-        return 2;
-    case RenderTargetFormat::R8_UNORM:
-    case RenderTargetFormat::R8_UINT:
-        return 1;
-    default:
-        UNIMPLEMENTED_MSG("Unimplemented render target format {}", static_cast<u32>(format));
-        return 1;
-    }
-}
-
-u32 DepthFormatBytesPerPixel(DepthFormat format) {
-    switch (format) {
-    case DepthFormat::Z32_S8_X24_FLOAT:
-        return 8;
-    case DepthFormat::Z32_FLOAT:
-    case DepthFormat::S8_Z24_UNORM:
-    case DepthFormat::Z24_X8_UNORM:
-    case DepthFormat::Z24_S8_UNORM:
-    case DepthFormat::Z24_C8_UNORM:
-        return 4;
-    case DepthFormat::Z16_UNORM:
-        return 2;
-    default:
-        UNIMPLEMENTED_MSG("Unimplemented Depth format {}", static_cast<u32>(format));
-        return 1;
-    }
-}
-
 // Note that, traditionally, methods are treated as 4-byte addressable locations, and hence
 // their numbers are written down multiplied by 4 in Docs. Here we do not multiply by 4.
 // So the values you see in docs might be multiplied by 4.
--- a/src/video_core/gpu.h
+++ b/src/video_core/gpu.h
@@ -57,6 +57,7 @@ enum class RenderTargetFormat : u32 {
    RG16_UINT = 0xDD,
    RG16_FLOAT = 0xDE,
    R11G11B10_FLOAT = 0xE0,
+    R32_SINT = 0xE3,
    R32_UINT = 0xE4,
    R32_FLOAT = 0xE5,
    B5G6R5_UNORM = 0xE8,
@@ -82,12 +83,6 @@ enum class DepthFormat : u32 {
    Z32_S8_X24_FLOAT = 0x19,
 };

-/// Returns the number of bytes per pixel of each rendertarget format.
-u32 RenderTargetBytesPerPixel(RenderTargetFormat format);
-
-/// Returns the number of bytes per pixel of each depth format.
-u32 DepthFormatBytesPerPixel(DepthFormat format);
-
 struct CommandListHeader;
 class DebugContext;

--- a/src/video_core/gpu_thread.cpp
+++ b/src/video_core/gpu_thread.cpp
@@ -5,7 +5,7 @@
 #include "common/assert.h"
 #include "common/microprofile.h"
 #include "core/core.h"
-#include "core/frontend/scope_acquire_window_context.h"
+#include "core/frontend/scope_acquire_context.h"
 #include "video_core/dma_pusher.h"
 #include "video_core/gpu.h"
 #include "video_core/gpu_thread.h"
@@ -27,7 +27,7 @@ static void RunThread(VideoCore::RendererBase& renderer, Tegra::DmaPusher& dma_p
        return;
    }

-    Core::Frontend::ScopeAcquireWindowContext acquire_context{renderer.GetRenderWindow()};
+    Core::Frontend::ScopeAcquireContext acquire_context{renderer.GetRenderWindow()};

    CommandDataContainer next;
    while (state.is_running) {
--- a/src/video_core/memory_manager.cpp
+++ b/src/video_core/memory_manager.cpp
@@ -9,6 +9,7 @@
 #include "core/hle/kernel/process.h"
 #include "core/hle/kernel/vm_manager.h"
 #include "core/memory.h"
+#include "video_core/gpu.h"
 #include "video_core/memory_manager.h"
 #include "video_core/rasterizer_interface.h"

@@ -84,7 +85,9 @@ GPUVAddr MemoryManager::UnmapBuffer(GPUVAddr gpu_addr, u64 size) {
    const auto cpu_addr = GpuToCpuAddress(gpu_addr);
    ASSERT(cpu_addr);

-    rasterizer.FlushAndInvalidateRegion(cache_addr, aligned_size);
+    // Flush and invalidate through the GPU interface, to be asynchronous if possible.
+    system.GPU().FlushAndInvalidateRegion(cache_addr, aligned_size);
+
    UnmapRange(gpu_addr, aligned_size);
    ASSERT(system.CurrentProcess()
               ->VMManager()
@@ -242,6 +245,8 @@ void MemoryManager::ReadBlock(GPUVAddr src_addr, void* dest_buffer, const std::s
        switch (page_table.attributes[page_index]) {
        case Common::PageType::Memory: {
            const u8* src_ptr{page_table.pointers[page_index] + page_offset};
+            // Flush must happen on the rasterizer interface, such that memory is always synchronous
+            // when it is read (even when in asynchronous GPU mode). Fixes Dead Cells title menu.
            rasterizer.FlushRegion(ToCacheAddr(src_ptr), copy_amount);
            std::memcpy(dest_buffer, src_ptr, copy_amount);
            break;
@@ -292,6 +297,8 @@ void MemoryManager::WriteBlock(GPUVAddr dest_addr, const void* src_buffer, const
        switch (page_table.attributes[page_index]) {
        case Common::PageType::Memory: {
            u8* dest_ptr{page_table.pointers[page_index] + page_offset};
+            // Invalidate must happen on the rasterizer interface, such that memory is always
+            // synchronous when it is written (even when in asynchronous GPU mode).
            rasterizer.InvalidateRegion(ToCacheAddr(dest_ptr), copy_amount);
            std::memcpy(dest_ptr, src_buffer, copy_amount);
            break;
@@ -339,6 +346,8 @@ void MemoryManager::CopyBlock(GPUVAddr dest_addr, GPUVAddr src_addr, const std::

        switch (page_table.attributes[page_index]) {
        case Common::PageType::Memory: {
+            // Flush must happen on the rasterizer interface, such that memory is always synchronous
+            // when it is copied (even when in asynchronous GPU mode).
            const u8* src_ptr{page_table.pointers[page_index] + page_offset};
            rasterizer.FlushRegion(ToCacheAddr(src_ptr), copy_amount);
            WriteBlock(dest_addr, src_ptr, copy_amount);
--- a/src/video_core/morton.cpp
+++ b/src/video_core/morton.cpp
@@ -85,6 +85,7 @@ static constexpr ConversionArray morton_to_linear_fns = {
    MortonCopy<true, PixelFormat::RG32UI>,
    MortonCopy<true, PixelFormat::RGBX16F>,
    MortonCopy<true, PixelFormat::R32UI>,
+    MortonCopy<true, PixelFormat::R32I>,
    MortonCopy<true, PixelFormat::ASTC_2D_8X8>,
    MortonCopy<true, PixelFormat::ASTC_2D_8X5>,
    MortonCopy<true, PixelFormat::ASTC_2D_5X4>,
@@ -166,6 +167,7 @@ static constexpr ConversionArray linear_to_morton_fns = {
    MortonCopy<false, PixelFormat::RG32UI>,
    MortonCopy<false, PixelFormat::RGBX16F>,
    MortonCopy<false, PixelFormat::R32UI>,
+    MortonCopy<false, PixelFormat::R32I>,
    nullptr,
    nullptr,
    nullptr,
--- a/src/video_core/rasterizer_interface.h
+++ b/src/video_core/rasterizer_interface.h
@@ -35,11 +35,8 @@ class RasterizerInterface {
 public:
    virtual ~RasterizerInterface() {}

-    /// Draw the current batch of vertex arrays
-    virtual bool DrawBatch(bool is_indexed) = 0;
-
-    /// Draw the current batch of multiple instances of vertex arrays
-    virtual bool DrawMultiBatch(bool is_indexed) = 0;
+    /// Dispatches a draw invocation
+    virtual void Draw(bool is_indexed, bool is_instanced) = 0;

    /// Clear the current framebuffer
    virtual void Clear() = 0;
--- a/src/video_core/renderer_base.h
+++ b/src/video_core/renderer_base.h
@@ -35,15 +35,19 @@ public:
    explicit RendererBase(Core::Frontend::EmuWindow& window);
    virtual ~RendererBase();

-    /// Swap buffers (render frame)
-    virtual void SwapBuffers(const Tegra::FramebufferConfig* framebuffer) = 0;
-
    /// Initialize the renderer
    virtual bool Init() = 0;

    /// Shutdown the renderer
    virtual void ShutDown() = 0;

+    /// Finalize rendering the guest frame and draw into the presentation texture
+    virtual void SwapBuffers(const Tegra::FramebufferConfig* framebuffer) = 0;
+
+    /// Draws the latest frame to the window, waiting up to timeout_ms for a frame to arrive
+    /// (renderer-specific implementation)
+    virtual void TryPresent(int timeout_ms) = 0;
+
    // Getter/setter functions:
    // ------------------------

--- a/src/video_core/renderer_opengl/gl_rasterizer.cpp
+++ b/src/video_core/renderer_opengl/gl_rasterizer.cpp
@@ -36,6 +36,7 @@ namespace OpenGL {

 using Maxwell = Tegra::Engines::Maxwell3D::Regs;

+using Tegra::Engines::ShaderType;
 using VideoCore::Surface::PixelFormat;
 using VideoCore::Surface::SurfaceTarget;
 using VideoCore::Surface::SurfaceType;
@@ -56,8 +57,7 @@ namespace {

 template <typename Engine, typename Entry>
 Tegra::Texture::FullTextureInfo GetTextureInfo(const Engine& engine, const Entry& entry,
-                                               Tegra::Engines::ShaderType shader_type,
-                                               std::size_t index = 0) {
+                                               ShaderType shader_type, std::size_t index = 0) {
    if (entry.IsBindless()) {
        const Tegra::Texture::TextureHandle tex_handle =
            engine.AccessConstBuffer32(shader_type, entry.GetBuffer(), entry.GetOffset());
@@ -617,7 +617,7 @@ void RasterizerOpenGL::Draw(bool is_indexed, bool is_instanced) {

    // Setup shaders and their used resources.
    texture_cache.GuardSamplers(true);
-    const auto primitive_mode = MaxwellToGL::PrimitiveTopology(gpu.regs.draw.topology);
+    const GLenum primitive_mode = MaxwellToGL::PrimitiveTopology(gpu.regs.draw.topology);
    SetupShaders(primitive_mode);
    texture_cache.GuardSamplers(false);

@@ -650,31 +650,41 @@ void RasterizerOpenGL::Draw(bool is_indexed, bool is_instanced) {
    const GLsizei num_instances =
        static_cast<GLsizei>(is_instanced ? gpu.mme_draw.instance_count : 1);
    if (is_indexed) {
-        const GLenum index_format = MaxwellToGL::IndexFormat(gpu.regs.index_array.format);
        const GLint base_vertex = static_cast<GLint>(gpu.regs.vb_element_base);
        const GLsizei num_vertices = static_cast<GLsizei>(gpu.regs.index_array.count);
-        glDrawElementsInstancedBaseVertexBaseInstance(
-            primitive_mode, num_vertices, index_format,
-            reinterpret_cast<const void*>(index_buffer_offset), num_instances, base_vertex,
-            base_instance);
+        const GLvoid* offset = reinterpret_cast<const GLvoid*>(index_buffer_offset);
+        const GLenum format = MaxwellToGL::IndexFormat(gpu.regs.index_array.format);
+        if (num_instances == 1 && base_instance == 0 && base_vertex == 0) {
+            glDrawElements(primitive_mode, num_vertices, format, offset);
+        } else if (num_instances == 1 && base_instance == 0) {
+            glDrawElementsBaseVertex(primitive_mode, num_vertices, format, offset, base_vertex);
+        } else if (base_vertex == 0 && base_instance == 0) {
+            glDrawElementsInstanced(primitive_mode, num_vertices, format, offset, num_instances);
+        } else if (base_vertex == 0) {
+            glDrawElementsInstancedBaseInstance(primitive_mode, num_vertices, format, offset,
+                                                num_instances, base_instance);
+        } else if (base_instance == 0) {
+            glDrawElementsInstancedBaseVertex(primitive_mode, num_vertices, format, offset,
+                                              num_instances, base_vertex);
+        } else {
+            glDrawElementsInstancedBaseVertexBaseInstance(primitive_mode, num_vertices, format,
+                                                          offset, num_instances, base_vertex,
+                                                          base_instance);
+        }
    } else {
        const GLint base_vertex = static_cast<GLint>(gpu.regs.vertex_buffer.first);
        const GLsizei num_vertices = static_cast<GLsizei>(gpu.regs.vertex_buffer.count);
-        glDrawArraysInstancedBaseInstance(primitive_mode, base_vertex, num_vertices, num_instances,
-                                          base_instance);
+        if (num_instances == 1 && base_instance == 0) {
+            glDrawArrays(primitive_mode, base_vertex, num_vertices);
+        } else if (base_instance == 0) {
+            glDrawArraysInstanced(primitive_mode, base_vertex, num_vertices, num_instances);
+        } else {
+            glDrawArraysInstancedBaseInstance(primitive_mode, base_vertex, num_vertices,
+                                              num_instances, base_instance);
+        }
    }
 }

-bool RasterizerOpenGL::DrawBatch(bool is_indexed) {
-    Draw(is_indexed, false);
-    return true;
-}
-
-bool RasterizerOpenGL::DrawMultiBatch(bool is_indexed) {
-    Draw(is_indexed, true);
-    return true;
-}
-
 void RasterizerOpenGL::DispatchCompute(GPUVAddr code_addr) {
    if (device.HasBrokenCompute()) {
        return;
@@ -900,15 +910,10 @@ void RasterizerOpenGL::SetupDrawTextures(std::size_t stage_index, const Shader&
    const auto& maxwell3d = system.GPU().Maxwell3D();
    u32 binding = device.GetBaseBindings(stage_index).sampler;
    for (const auto& entry : shader->GetShaderEntries().samplers) {
-        const auto shader_type = static_cast<Tegra::Engines::ShaderType>(stage_index);
-        if (!entry.IsIndexed()) {
-            const auto texture = GetTextureInfo(maxwell3d, entry, shader_type);
+        const auto shader_type = static_cast<ShaderType>(stage_index);
+        for (std::size_t i = 0; i < entry.Size(); ++i) {
+            const auto texture = GetTextureInfo(maxwell3d, entry, shader_type, i);
            SetupTexture(binding++, texture, entry);
-        } else {
-            for (std::size_t i = 0; i < entry.Size(); ++i) {
-                const auto texture = GetTextureInfo(maxwell3d, entry, shader_type, i);
-                SetupTexture(binding++, texture, entry);
-            }
        }
    }
 }
@@ -918,16 +923,9 @@ void RasterizerOpenGL::SetupComputeTextures(const Shader& kernel) {
    const auto& compute = system.GPU().KeplerCompute();
    u32 binding = 0;
    for (const auto& entry : kernel->GetShaderEntries().samplers) {
-        if (!entry.IsIndexed()) {
-            const auto texture =
-                GetTextureInfo(compute, entry, Tegra::Engines::ShaderType::Compute);
+        for (std::size_t i = 0; i < entry.Size(); ++i) {
+            const auto texture = GetTextureInfo(compute, entry, ShaderType::Compute, i);
            SetupTexture(binding++, texture, entry);
-        } else {
-            for (std::size_t i = 0; i < entry.Size(); ++i) {
-                const auto texture =
-                    GetTextureInfo(compute, entry, Tegra::Engines::ShaderType::Compute, i);
-                SetupTexture(binding++, texture, entry);
-            }
        }
    }
 }
--- a/src/video_core/renderer_opengl/gl_rasterizer.h
+++ b/src/video_core/renderer_opengl/gl_rasterizer.h
@@ -58,8 +58,7 @@ public:
                              ScreenInfo& info);
    ~RasterizerOpenGL() override;

-    bool DrawBatch(bool is_indexed) override;
-    bool DrawMultiBatch(bool is_indexed) override;
+    void Draw(bool is_indexed, bool is_instanced) override;
    void Clear() override;
    void DispatchCompute(GPUVAddr code_addr) override;
    void ResetCounter(VideoCore::QueryType type) override;
@@ -110,9 +109,6 @@ private:
    void SetupGlobalMemory(u32 binding, const GLShader::GlobalMemoryEntry& entry, GPUVAddr gpu_addr,
                           std::size_t size);

-    /// Syncs all the state, shaders, render targets and textures setting before a draw call.
-    void Draw(bool is_indexed, bool is_instanced);
-
    /// Configures the current textures to use for the draw command.
    void SetupDrawTextures(std::size_t stage_index, const Shader& shader);

--- a/src/video_core/renderer_opengl/gl_resource_manager.cpp
+++ b/src/video_core/renderer_opengl/gl_resource_manager.cpp
@@ -15,6 +15,24 @@ MICROPROFILE_DEFINE(OpenGL_ResourceDeletion, "OpenGL", "Resource Deletion", MP_R

 namespace OpenGL {

+void OGLRenderbuffer::Create() {
+    if (handle != 0)
+        return;
+
+    MICROPROFILE_SCOPE(OpenGL_ResourceCreation);
+    glGenRenderbuffers(1, &handle);
+}
+
+void OGLRenderbuffer::Release() {
+    if (handle == 0)
+        return;
+
+    MICROPROFILE_SCOPE(OpenGL_ResourceDeletion);
+    glDeleteRenderbuffers(1, &handle);
+    OpenGLState::GetCurState().ResetRenderbuffer(handle).Apply();
+    handle = 0;
+}
+
 void OGLTexture::Create(GLenum target) {
    if (handle != 0)
        return;
--- a/src/video_core/renderer_opengl/gl_resource_manager.h
+++ b/src/video_core/renderer_opengl/gl_resource_manager.h
@@ -11,6 +11,31 @@

 namespace OpenGL {

+class OGLRenderbuffer : private NonCopyable {
+public:
+    OGLRenderbuffer() = default;
+
+    OGLRenderbuffer(OGLRenderbuffer&& o) noexcept : handle(std::exchange(o.handle, 0)) {}
+
+    ~OGLRenderbuffer() {
+        Release();
+    }
+
+    OGLRenderbuffer& operator=(OGLRenderbuffer&& o) noexcept {
+        Release();
+        handle = std::exchange(o.handle, 0);
+        return *this;
+    }
+
+    /// Creates a new internal OpenGL resource and stores the handle
+    void Create();
+
+    /// Deletes the internal OpenGL resource
+    void Release();
+
+    GLuint handle = 0;
+};
+
 class OGLTexture : private NonCopyable {
 public:
    OGLTexture() = default;
--- a/src/video_core/renderer_opengl/gl_sampler_cache.cpp
+++ b/src/video_core/renderer_opengl/gl_sampler_cache.cpp
@@ -38,7 +38,7 @@ OGLSampler SamplerCacheOpenGL::CreateSampler(const Tegra::Texture::TSCEntry& tsc
        glSamplerParameterf(sampler_id, GL_TEXTURE_MAX_ANISOTROPY, tsc.GetMaxAnisotropy());
    } else if (GLAD_GL_EXT_texture_filter_anisotropic) {
        glSamplerParameterf(sampler_id, GL_TEXTURE_MAX_ANISOTROPY_EXT, tsc.GetMaxAnisotropy());
-    } else if (tsc.GetMaxAnisotropy() != 1) {
+    } else {
        LOG_WARNING(Render_OpenGL, "Anisotropy not supported by host GPU driver");
    }

--- a/src/video_core/renderer_opengl/gl_state.cpp
+++ b/src/video_core/renderer_opengl/gl_state.cpp
@@ -423,6 +423,13 @@ void OpenGLState::ApplyClipControl() {
    }
 }

+void OpenGLState::ApplyRenderBuffer() {
+    if (cur_state.renderbuffer != renderbuffer) {
+        cur_state.renderbuffer = renderbuffer;
+        glBindRenderbuffer(GL_RENDERBUFFER, renderbuffer);
+    }
+}
+
 void OpenGLState::ApplyTextures() {
    const std::size_t size = std::size(textures);
    for (std::size_t i = 0; i < size; ++i) {
@@ -478,6 +485,7 @@ void OpenGLState::Apply() {
    ApplyPolygonOffset();
    ApplyAlphaTest();
    ApplyClipControl();
+    ApplyRenderBuffer();
 }

 void OpenGLState::EmulateViewportWithScissor() {
@@ -551,4 +559,11 @@ OpenGLState& OpenGLState::ResetFramebuffer(GLuint handle) {
    return *this;
 }

+OpenGLState& OpenGLState::ResetRenderbuffer(GLuint handle) {
+    if (renderbuffer == handle) {
+        renderbuffer = 0;
+    }
+    return *this;
+}
+
 } // namespace OpenGL
--- a/src/video_core/renderer_opengl/gl_state.h
+++ b/src/video_core/renderer_opengl/gl_state.h
@@ -158,6 +158,8 @@ public:
        GLenum depth_mode = GL_NEGATIVE_ONE_TO_ONE;
    } clip_control;

+    GLuint renderbuffer{}; // GL_RENDERBUFFER_BINDING
+
    OpenGLState();

    /// Get the currently active OpenGL state
@@ -196,6 +198,7 @@ public:
    void ApplyPolygonOffset();
    void ApplyAlphaTest();
    void ApplyClipControl();
+    void ApplyRenderBuffer();

    /// Resets any references to the given resource
    OpenGLState& UnbindTexture(GLuint handle);
@@ -204,6 +207,7 @@ public:
    OpenGLState& ResetPipeline(GLuint handle);
    OpenGLState& ResetVertexArray(GLuint handle);
    OpenGLState& ResetFramebuffer(GLuint handle);
+    OpenGLState& ResetRenderbuffer(GLuint handle);

    /// Viewport does not affect glClearBuffer so emulate viewport using scissor test
    void EmulateViewportWithScissor();
--- a/src/video_core/renderer_opengl/gl_texture_cache.cpp
+++ b/src/video_core/renderer_opengl/gl_texture_cache.cpp
@@ -87,6 +87,7 @@ constexpr std::array<FormatTuple, VideoCore::Surface::MaxPixelFormat> tex_format
    {GL_RG32UI, GL_RG_INTEGER, GL_UNSIGNED_INT, false},                             // RG32UI
    {GL_RGB16F, GL_RGBA, GL_HALF_FLOAT, false},                                     // RGBX16F
    {GL_R32UI, GL_RED_INTEGER, GL_UNSIGNED_INT, false},                             // R32UI
+    {GL_R32I, GL_RED_INTEGER, GL_INT, false},                                       // R32I
    {GL_RGBA8, GL_RGBA, GL_UNSIGNED_BYTE, false},                                   // ASTC_2D_8X8
    {GL_RGBA8, GL_RGBA, GL_UNSIGNED_BYTE, false},                                   // ASTC_2D_8X5
    {GL_RGBA8, GL_RGBA, GL_UNSIGNED_BYTE, false},                                   // ASTC_2D_5X4
@@ -260,6 +261,13 @@ CachedSurface::~CachedSurface() = default;
 void CachedSurface::DownloadTexture(std::vector<u8>& staging_buffer) {
    MICROPROFILE_SCOPE(OpenGL_Texture_Download);

+    if (params.IsBuffer()) {
+        glGetNamedBufferSubData(texture_buffer.handle, 0,
+                                static_cast<GLsizeiptr>(params.GetHostSizeInBytes()),
+                                staging_buffer.data());
+        return;
+    }
+
    SCOPE_EXIT({ glPixelStorei(GL_PACK_ROW_LENGTH, 0); });

    for (u32 level = 0; level < params.emulated_levels; ++level) {
@@ -398,24 +406,36 @@ CachedSurfaceView::CachedSurfaceView(CachedSurface& surface, const ViewParams& p
 CachedSurfaceView::~CachedSurfaceView() = default;

 void CachedSurfaceView::Attach(GLenum attachment, GLenum target) const {
-    ASSERT(params.num_layers == 1 && params.num_levels == 1);
+    ASSERT(params.num_levels == 1);

-    const auto& owner_params = surface.GetSurfaceParams();
+    const GLuint texture = surface.GetTexture();
+    if (params.num_layers > 1) {
+        // Layered framebuffer attachments
+        UNIMPLEMENTED_IF(params.base_layer != 0);

-    switch (owner_params.target) {
+        switch (params.target) {
+        case SurfaceTarget::Texture2DArray:
+            glFramebufferTexture(target, attachment, texture, params.base_level);
+            break;
+        default:
+            UNIMPLEMENTED();
+        }
+        return;
+    }
+
+    const GLenum view_target = surface.GetTarget();
+    switch (surface.GetSurfaceParams().target) {
    case SurfaceTarget::Texture1D:
-        glFramebufferTexture1D(target, attachment, surface.GetTarget(), surface.GetTexture(),
-                               params.base_level);
+        glFramebufferTexture1D(target, attachment, view_target, texture, params.base_level);
        break;
    case SurfaceTarget::Texture2D:
-        glFramebufferTexture2D(target, attachment, surface.GetTarget(), surface.GetTexture(),
-                               params.base_level);
+        glFramebufferTexture2D(target, attachment, view_target, texture, params.base_level);
        break;
    case SurfaceTarget::Texture1DArray:
    case SurfaceTarget::Texture2DArray:
    case SurfaceTarget::TextureCubemap:
    case SurfaceTarget::TextureCubeArray:
-        glFramebufferTextureLayer(target, attachment, surface.GetTexture(), params.base_level,
+        glFramebufferTextureLayer(target, attachment, texture, params.base_level,
                                  params.base_layer);
        break;
    default:
--- a/src/video_core/renderer_opengl/maxwell_to_gl.h
+++ b/src/video_core/renderer_opengl/maxwell_to_gl.h
@@ -92,8 +92,32 @@ inline GLenum VertexType(Maxwell::VertexAttribute attrib) {
        }
    case Maxwell::VertexAttribute::Type::UnsignedScaled:
        switch (attrib.size) {
+        case Maxwell::VertexAttribute::Size::Size_8:
        case Maxwell::VertexAttribute::Size::Size_8_8:
+        case Maxwell::VertexAttribute::Size::Size_8_8_8:
+        case Maxwell::VertexAttribute::Size::Size_8_8_8_8:
            return GL_UNSIGNED_BYTE;
+        case Maxwell::VertexAttribute::Size::Size_16:
+        case Maxwell::VertexAttribute::Size::Size_16_16:
+        case Maxwell::VertexAttribute::Size::Size_16_16_16:
+        case Maxwell::VertexAttribute::Size::Size_16_16_16_16:
+            return GL_UNSIGNED_SHORT;
+        default:
+            LOG_ERROR(Render_OpenGL, "Unimplemented vertex size={}", attrib.SizeString());
+            return {};
+        }
+    case Maxwell::VertexAttribute::Type::SignedScaled:
+        switch (attrib.size) {
+        case Maxwell::VertexAttribute::Size::Size_8:
+        case Maxwell::VertexAttribute::Size::Size_8_8:
+        case Maxwell::VertexAttribute::Size::Size_8_8_8:
+        case Maxwell::VertexAttribute::Size::Size_8_8_8_8:
+            return GL_BYTE;
+        case Maxwell::VertexAttribute::Size::Size_16:
+        case Maxwell::VertexAttribute::Size::Size_16_16:
+        case Maxwell::VertexAttribute::Size::Size_16_16_16:
+        case Maxwell::VertexAttribute::Size::Size_16_16_16_16:
+            return GL_SHORT;
        default:
            LOG_ERROR(Render_OpenGL, "Unimplemented vertex size={}", attrib.SizeString());
            return {};
--- a/src/video_core/renderer_opengl/renderer_opengl.cpp
+++ b/src/video_core/renderer_opengl/renderer_opengl.cpp
@@ -9,11 +9,11 @@
 #include <glad/glad.h>
 #include "common/assert.h"
 #include "common/logging/log.h"
+#include "common/microprofile.h"
 #include "common/telemetry.h"
 #include "core/core.h"
 #include "core/core_timing.h"
 #include "core/frontend/emu_window.h"
-#include "core/frontend/scope_acquire_window_context.h"
 #include "core/memory.h"
 #include "core/perf_stats.h"
 #include "core/settings.h"
@@ -24,6 +24,144 @@

 namespace OpenGL {

+// If the size of this is too small, it ends up creating a soft cap on FPS as the renderer will have
+// to wait on available presentation frames.
+constexpr std::size_t SWAP_CHAIN_SIZE = 3;
+
+struct Frame {
+    u32 width{};                      /// Width of the frame (to detect resize)
+    u32 height{};                     /// Height of the frame
+    bool color_reloaded{};            /// Texture attachment was recreated (i.e. resized)
+    OpenGL::OGLRenderbuffer color{};  /// Buffer shared between the render/present FBO
+    OpenGL::OGLFramebuffer render{};  /// FBO created on the render thread
+    OpenGL::OGLFramebuffer present{}; /// FBO created on the present thread
+    GLsync render_fence{};            /// Fence created on the render thread
+    GLsync present_fence{};           /// Fence created on the presentation thread
+    bool is_srgb{};                   /// Framebuffer is sRGB or RGB
+};
+
+/**
+ * For smooth Vsync rendering, we want to always present the latest frame that the core generates,
+ * but also make sure that rendering happens at the pace that the frontend dictates. This is a
+ * helper class that the renderer uses to sync frames between the render thread and the presentation
+ * thread.
+ */
+class FrameMailbox {
+public:
+    std::mutex swap_chain_lock;
+    std::condition_variable present_cv;
+    std::array<Frame, SWAP_CHAIN_SIZE> swap_chain{};
+    std::queue<Frame*> free_queue;
+    std::deque<Frame*> present_queue;
+    Frame* previous_frame{};
+
+    FrameMailbox() {
+        for (auto& frame : swap_chain) {
+            free_queue.push(&frame);
+        }
+    }
+
+    ~FrameMailbox() {
+        // Lock the mutex, clear out the present and free queues, and notify any waiting threads
+        // to prevent deadlock on shutdown
+        std::scoped_lock lock{swap_chain_lock};
+        std::queue<Frame*>().swap(free_queue);
+        present_queue.clear();
+        present_cv.notify_all();
+    }
+
+    void ReloadPresentFrame(Frame* frame, u32 width, u32 height) {
+        frame->present.Release();
+        frame->present.Create();
+        GLint previous_draw_fbo{};
+        glGetIntegerv(GL_DRAW_FRAMEBUFFER_BINDING, &previous_draw_fbo);
+        glBindFramebuffer(GL_FRAMEBUFFER, frame->present.handle);
+        glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_RENDERBUFFER,
+                                  frame->color.handle);
+        if (glCheckFramebufferStatus(GL_FRAMEBUFFER) != GL_FRAMEBUFFER_COMPLETE) {
+            LOG_CRITICAL(Render_OpenGL, "Failed to recreate present FBO!");
+        }
+        glBindFramebuffer(GL_DRAW_FRAMEBUFFER, previous_draw_fbo);
+        frame->color_reloaded = false;
+    }
+
+    void ReloadRenderFrame(Frame* frame, u32 width, u32 height) {
+        OpenGLState prev_state = OpenGLState::GetCurState();
+        OpenGLState state = OpenGLState::GetCurState();
+
+        // Recreate the color renderbuffer attachment
+        frame->color.Release();
+        frame->color.Create();
+        state.renderbuffer = frame->color.handle;
+        state.Apply();
+        glRenderbufferStorage(GL_RENDERBUFFER, frame->is_srgb ? GL_SRGB8 : GL_RGB8, width, height);
+
+        // Recreate the FBO for the render target
+        frame->render.Release();
+        frame->render.Create();
+        state.draw.read_framebuffer = frame->render.handle;
+        state.draw.draw_framebuffer = frame->render.handle;
+        state.Apply();
+        glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_RENDERBUFFER,
+                                  frame->color.handle);
+        if (glCheckFramebufferStatus(GL_FRAMEBUFFER) != GL_FRAMEBUFFER_COMPLETE) {
+            LOG_CRITICAL(Render_OpenGL, "Failed to recreate render FBO!");
+        }
+        prev_state.Apply();
+        frame->width = width;
+        frame->height = height;
+        frame->color_reloaded = true;
+    }
+
+    Frame* GetRenderFrame() {
+        std::unique_lock lock{swap_chain_lock};
+
+        // If there are no free frames, we will reuse the oldest render frame
+        if (free_queue.empty()) {
+            auto frame = present_queue.back();
+            present_queue.pop_back();
+            return frame;
+        }
+
+        Frame* frame = free_queue.front();
+        free_queue.pop();
+        return frame;
+    }
+
+    void ReleaseRenderFrame(Frame* frame) {
+        std::unique_lock lock{swap_chain_lock};
+        present_queue.push_front(frame);
+        present_cv.notify_one();
+    }
+
+    Frame* TryGetPresentFrame(int timeout_ms) {
+        std::unique_lock lock{swap_chain_lock};
+        // wait for new entries in the present_queue
+        present_cv.wait_for(lock, std::chrono::milliseconds(timeout_ms),
+                            [&] { return !present_queue.empty(); });
+        if (present_queue.empty()) {
+            // timed out waiting for a frame to draw so return the previous frame
+            return previous_frame;
+        }
+
+        // free the previous frame and add it back to the free queue
+        if (previous_frame) {
+            free_queue.push(previous_frame);
+        }
+
+        // the newest entries are pushed to the front of the queue
+        Frame* frame = present_queue.front();
+        present_queue.pop_front();
+        // remove all old entries from the present queue and move them back to the free_queue
+        for (auto f : present_queue) {
+            free_queue.push(f);
+        }
+        present_queue.clear();
+        previous_frame = frame;
+        return frame;
+    }
+};
+
 namespace {

 constexpr char vertex_shader[] = R"(
@@ -158,21 +296,91 @@ void APIENTRY DebugHandler(GLenum source, GLenum type, GLuint id, GLenum severit
 } // Anonymous namespace

 RendererOpenGL::RendererOpenGL(Core::Frontend::EmuWindow& emu_window, Core::System& system)
-    : VideoCore::RendererBase{emu_window}, emu_window{emu_window}, system{system} {}
+    : VideoCore::RendererBase{emu_window}, emu_window{emu_window}, system{system},
+      frame_mailbox{std::make_unique<FrameMailbox>()} {}

 RendererOpenGL::~RendererOpenGL() = default;

+MICROPROFILE_DEFINE(OpenGL_RenderFrame, "OpenGL", "Render Frame", MP_RGB(128, 128, 64));
+MICROPROFILE_DEFINE(OpenGL_WaitPresent, "OpenGL", "Wait For Present", MP_RGB(128, 128, 128));
+
 void RendererOpenGL::SwapBuffers(const Tegra::FramebufferConfig* framebuffer) {
+    render_window.PollEvents();
+
+    if (!framebuffer) {
+        return;
+    }
+
    // Maintain the rasterizer's state as a priority
    OpenGLState prev_state = OpenGLState::GetCurState();
    state.AllDirty();
    state.Apply();

+    PrepareRendertarget(framebuffer);
+    RenderScreenshot();
+
+    Frame* frame;
+    {
+        MICROPROFILE_SCOPE(OpenGL_WaitPresent);
+
+        frame = frame_mailbox->GetRenderFrame();
+
+        // Clean up sync objects before drawing
+
+        // INTEL driver workaround. We can't delete the previous render sync object until we are
+        // sure that the presentation is done
+        if (frame->present_fence) {
+            glClientWaitSync(frame->present_fence, 0, GL_TIMEOUT_IGNORED);
+        }
+
+        // delete the draw fence if the frame wasn't presented
+        if (frame->render_fence) {
+            glDeleteSync(frame->render_fence);
+            frame->render_fence = 0;
+        }
+
+        // wait for the presentation to be done
+        if (frame->present_fence) {
+            glWaitSync(frame->present_fence, 0, GL_TIMEOUT_IGNORED);
+            glDeleteSync(frame->present_fence);
+            frame->present_fence = 0;
+        }
+    }
+
+    {
+        MICROPROFILE_SCOPE(OpenGL_RenderFrame);
+        const auto& layout = render_window.GetFramebufferLayout();
+
+        // Recreate the frame if the size of the window has changed
+        if (layout.width != frame->width || layout.height != frame->height ||
+            screen_info.display_srgb != frame->is_srgb) {
+            LOG_DEBUG(Render_OpenGL, "Reloading render frame");
+            frame->is_srgb = screen_info.display_srgb;
+            frame_mailbox->ReloadRenderFrame(frame, layout.width, layout.height);
+        }
+        state.draw.draw_framebuffer = frame->render.handle;
+        state.Apply();
+        DrawScreen(layout);
+        // Create a fence for the frontend to wait on and hand this frame to the mailbox
+        frame->render_fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
+        glFlush();
+        frame_mailbox->ReleaseRenderFrame(frame);
+        m_current_frame++;
+        rasterizer->TickFrame();
+    }
+
+    // Restore the rasterizer state
+    prev_state.AllDirty();
+    prev_state.Apply();
+}
+
+void RendererOpenGL::PrepareRendertarget(const Tegra::FramebufferConfig* framebuffer) {
    if (framebuffer) {
        // If framebuffer is provided, reload it from memory to a texture
        if (screen_info.texture.width != static_cast<GLsizei>(framebuffer->width) ||
            screen_info.texture.height != static_cast<GLsizei>(framebuffer->height) ||
-            screen_info.texture.pixel_format != framebuffer->pixel_format) {
+            screen_info.texture.pixel_format != framebuffer->pixel_format ||
+            gl_framebuffer_data.empty()) {
            // Reallocate texture if the framebuffer size has changed.
            // This is expected to not happen very often and hence should not be a
            // performance problem.
@@ -181,22 +389,7 @@ void RendererOpenGL::SwapBuffers(const Tegra::FramebufferConfig* framebuffer) {

        // Load the framebuffer from memory, draw it to the screen, and swap buffers
        LoadFBToScreenInfo(*framebuffer);
-
-        if (renderer_settings.screenshot_requested)
-            CaptureScreenshot();
-
-        DrawScreen(render_window.GetFramebufferLayout());
-
-        rasterizer->TickFrame();
-
-        render_window.SwapBuffers();
    }
-
-    render_window.PollEvents();
-
-    // Restore the rasterizer state
-    prev_state.AllDirty();
-    prev_state.Apply();
 }

 void RendererOpenGL::LoadFBToScreenInfo(const Tegra::FramebufferConfig& framebuffer) {
@@ -418,13 +611,48 @@ void RendererOpenGL::DrawScreen(const Layout::FramebufferLayout& layout) {
    DrawScreenTriangles(screen_info, static_cast<float>(screen.left),
                        static_cast<float>(screen.top), static_cast<float>(screen.GetWidth()),
                        static_cast<float>(screen.GetHeight()));
-
-    m_current_frame++;
 }

-void RendererOpenGL::UpdateFramerate() {}
+void RendererOpenGL::TryPresent(int timeout_ms) {
+    const auto& layout = render_window.GetFramebufferLayout();
+    auto frame = frame_mailbox->TryGetPresentFrame(timeout_ms);
+    if (!frame) {
+        LOG_DEBUG(Render_OpenGL, "TryGetPresentFrame returned no frame to present");
+        return;
+    }
+
+    // Clearing before a full overwrite of a fbo can signal to drivers that they can avoid a
+    // readback since we won't be doing any blending
+    glClear(GL_COLOR_BUFFER_BIT);
+
+    // Recreate the presentation FBO if the color attachment was changed
+    if (frame->color_reloaded) {
+        LOG_DEBUG(Render_OpenGL, "Reloading present frame");
+        frame_mailbox->ReloadPresentFrame(frame, layout.width, layout.height);
+    }
+    glWaitSync(frame->render_fence, 0, GL_TIMEOUT_IGNORED);
+    // INTEL workaround.
+    // Normally the draw fence would be deleted here, but due to driver bugs it is instead deleted
+    // on the emulation thread, which carries little penalty
+    // glDeleteSync(frame->render_fence);
+    // frame->render_fence = 0;
+
+    glBindFramebuffer(GL_READ_FRAMEBUFFER, frame->present.handle);
+    glBlitFramebuffer(0, 0, frame->width, frame->height, 0, 0, layout.width, layout.height,
+                      GL_COLOR_BUFFER_BIT, GL_LINEAR);
+
+    // Insert fence for the main thread to block on
+    frame->present_fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
+    glFlush();
+
+    glBindFramebuffer(GL_READ_FRAMEBUFFER, 0);
+}
+
+void RendererOpenGL::RenderScreenshot() {
+    if (!renderer_settings.screenshot_requested) {
+        return;
+    }

-void RendererOpenGL::CaptureScreenshot() {
    // Draw the current frame to the screenshot framebuffer
    screenshot_framebuffer.Create();
    GLuint old_read_fb = state.draw.read_framebuffer;
@@ -459,8 +687,6 @@ void RendererOpenGL::CaptureScreenshot() {
 }

 bool RendererOpenGL::Init() {
-    Core::Frontend::ScopeAcquireWindowContext acquire_context{render_window};
-
    if (GLAD_GL_KHR_debug) {
        glEnable(GL_DEBUG_OUTPUT);
        glDebugMessageCallback(DebugHandler, nullptr);
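
Review note on the mailbox hand-off in `renderer_opengl.cpp`: the present side takes the newest frame from the front of `present_queue` and moves every older frame back to `free_queue`. The following is a minimal, single-threaded sketch of that bookkeeping only. `MailboxSketch`, `Release` and `TakeNewest` are hypothetical names, the `push_front` in `Release` is an assumption drawn from the "newest entries are pushed to the front" comment, and the locking, fences and FBO handling of the real class are left out.

```cpp
#include <cassert>
#include <deque>
#include <queue>

// Hypothetical stand-in for the renderer's Frame; only an id is kept here.
struct Frame {
    int id = 0;
};

// Simplified, single-threaded model of the present-side bookkeeping.
// Assumption: the real class guards these queues with synchronization primitives.
struct MailboxSketch {
    std::queue<Frame*> free_queue;
    std::deque<Frame*> present_queue;

    // Render side: publish a finished frame; newest entries go to the front.
    void Release(Frame* frame) {
        present_queue.push_front(frame);
    }

    // Present side: take the newest frame and recycle every older one.
    Frame* TakeNewest() {
        if (present_queue.empty()) {
            return nullptr;
        }
        Frame* frame = present_queue.front();
        present_queue.pop_front();
        for (Frame* stale : present_queue) {
            free_queue.push(stale);
        }
        present_queue.clear();
        return frame;
    }
};
```

With three frames released in order, `TakeNewest` returns the third and leaves the first two in `free_queue`, so the presenter never shows a stale frame and the renderer gets its buffers back.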
--- a/src/video_core/renderer_opengl/renderer_opengl.h
+++ b/src/video_core/renderer_opengl/renderer_opengl.h
@@ -44,19 +44,23 @@ struct ScreenInfo {
    TextureInfo texture;
 };

+struct PresentationTexture {
+    u32 width = 0;
+    u32 height = 0;
+    OGLTexture texture;
+};
+
+class FrameMailbox;
+
 class RendererOpenGL final : public VideoCore::RendererBase {
 public:
    explicit RendererOpenGL(Core::Frontend::EmuWindow& emu_window, Core::System& system);
    ~RendererOpenGL() override;

-    /// Swap buffers (render frame)
-    void SwapBuffers(const Tegra::FramebufferConfig* framebuffer) override;
-
-    /// Initialize the renderer
    bool Init() override;
-
-    /// Shutdown the renderer
    void ShutDown() override;
+    void SwapBuffers(const Tegra::FramebufferConfig* framebuffer) override;
+    void TryPresent(int timeout_ms) override;

 private:
    /// Initializes the OpenGL state and creates persistent objects.
@@ -74,10 +78,7 @@ private:

    void DrawScreenTriangles(const ScreenInfo& screen_info, float x, float y, float w, float h);

-    /// Updates the framerate.
-    void UpdateFramerate();
-
-    void CaptureScreenshot();
+    void RenderScreenshot();

    /// Loads framebuffer from emulated memory into the active OpenGL texture.
    void LoadFBToScreenInfo(const Tegra::FramebufferConfig& framebuffer);
@@ -87,6 +88,8 @@ private:
    void LoadColorToActiveGLTexture(u8 color_r, u8 color_g, u8 color_b, u8 color_a,
                                    const TextureInfo& texture);

+    void PrepareRendertarget(const Tegra::FramebufferConfig* framebuffer);
+
    Core::Frontend::EmuWindow& emu_window;
    Core::System& system;

@@ -107,6 +110,9 @@ private:
    /// Used for transforming the framebuffer orientation
    Tegra::FramebufferConfig::TransformFlags framebuffer_transform_flags;
    Common::Rectangle<int> framebuffer_crop_rect;
+
+    /// Frame presentation mailbox
+    std::unique_ptr<FrameMailbox> frame_mailbox;
 };

 } // namespace OpenGL
--- a/src/video_core/renderer_vulkan/maxwell_to_vk.cpp
+++ b/src/video_core/renderer_vulkan/maxwell_to_vk.cpp
@@ -120,7 +120,7 @@ struct FormatTuple {
    {vk::Format::eA8B8G8R8UintPack32, Attachable | Storage},     // ABGR8UI
    {vk::Format::eB5G6R5UnormPack16, {}},                        // B5G6R5U
    {vk::Format::eA2B10G10R10UnormPack32, Attachable | Storage}, // A2B10G10R10U
-    {vk::Format::eA1R5G5B5UnormPack16, Attachable | Storage},    // A1B5G5R5U (flipped with swizzle)
+    {vk::Format::eA1R5G5B5UnormPack16, Attachable},              // A1B5G5R5U (flipped with swizzle)
    {vk::Format::eR8Unorm, Attachable | Storage},                // R8U
    {vk::Format::eR8Uint, Attachable | Storage},                 // R8UI
    {vk::Format::eR16G16B16A16Sfloat, Attachable | Storage},     // RGBA16F
@@ -159,12 +159,13 @@ struct FormatTuple {
    {vk::Format::eR32G32Uint, Attachable | Storage},             // RG32UI
    {vk::Format::eUndefined, {}},                                // RGBX16F
    {vk::Format::eR32Uint, Attachable | Storage},                // R32UI
+    {vk::Format::eR32Sint, Attachable | Storage},                // R32I
    {vk::Format::eAstc8x8UnormBlock, {}},                        // ASTC_2D_8X8
    {vk::Format::eUndefined, {}},                                // ASTC_2D_8X5
    {vk::Format::eUndefined, {}},                                // ASTC_2D_5X4
    {vk::Format::eUndefined, {}},                                // BGRA8_SRGB
    {vk::Format::eBc1RgbaSrgbBlock, {}},                         // DXT1_SRGB
-    {vk::Format::eUndefined, {}},                                // DXT23_SRGB
+    {vk::Format::eBc2SrgbBlock, {}},                             // DXT23_SRGB
    {vk::Format::eBc3SrgbBlock, {}},                             // DXT45_SRGB
    {vk::Format::eBc7SrgbBlock, {}},                             // BC7U_SRGB
    {vk::Format::eR4G4B4A4UnormPack16, Attachable},              // R4G4B4A4U
@@ -363,13 +364,29 @@ vk::Format VertexFormat(Maxwell::VertexAttribute::Type type, Maxwell::VertexAttr
            return vk::Format::eR8G8B8A8Uint;
        case Maxwell::VertexAttribute::Size::Size_32:
            return vk::Format::eR32Uint;
+        case Maxwell::VertexAttribute::Size::Size_32_32_32_32:
+            return vk::Format::eR32G32B32A32Uint;
        default:
            break;
        }
    case Maxwell::VertexAttribute::Type::UnsignedScaled:
        switch (size) {
+        case Maxwell::VertexAttribute::Size::Size_8:
+            return vk::Format::eR8Uscaled;
        case Maxwell::VertexAttribute::Size::Size_8_8:
            return vk::Format::eR8G8Uscaled;
+        case Maxwell::VertexAttribute::Size::Size_8_8_8:
+            return vk::Format::eR8G8B8Uscaled;
+        case Maxwell::VertexAttribute::Size::Size_8_8_8_8:
+            return vk::Format::eR8G8B8A8Uscaled;
+        case Maxwell::VertexAttribute::Size::Size_16:
+            return vk::Format::eR16Uscaled;
+        case Maxwell::VertexAttribute::Size::Size_16_16:
+            return vk::Format::eR16G16Uscaled;
+        case Maxwell::VertexAttribute::Size::Size_16_16_16:
+            return vk::Format::eR16G16B16Uscaled;
+        case Maxwell::VertexAttribute::Size::Size_16_16_16_16:
+            return vk::Format::eR16G16B16A16Uscaled;
        default:
            break;
        }
--- a/src/video_core/renderer_vulkan/renderer_vulkan.cpp
+++ b/src/video_core/renderer_vulkan/renderer_vulkan.cpp
@@ -106,8 +106,14 @@ RendererVulkan::~RendererVulkan() {
 }

 void RendererVulkan::SwapBuffers(const Tegra::FramebufferConfig* framebuffer) {
+    render_window.PollEvents();
+
+    if (!framebuffer) {
+        return;
+    }
+
    const auto& layout = render_window.GetFramebufferLayout();
-    if (framebuffer && layout.width > 0 && layout.height > 0 && render_window.IsShown()) {
+    if (layout.width > 0 && layout.height > 0 && render_window.IsShown()) {
        const VAddr framebuffer_addr = framebuffer->address + framebuffer->offset;
        const bool use_accelerated =
            rasterizer->AccelerateDisplay(*framebuffer, framebuffer_addr, framebuffer->stride);
@@ -128,13 +134,16 @@ void RendererVulkan::SwapBuffers(const Tegra::FramebufferConfig* framebuffer) {
            blit_screen->Recreate();
        }

-        render_window.SwapBuffers();
        rasterizer->TickFrame();
    }

    render_window.PollEvents();
 }

+void RendererVulkan::TryPresent(int /*timeout_ms*/) {
+    // TODO (bunnei): ImplementMe
+}
+
 bool RendererVulkan::Init() {
    PFN_vkGetInstanceProcAddr vkGetInstanceProcAddr{};
    render_window.RetrieveVulkanHandlers(&vkGetInstanceProcAddr, &instance, &surface);
@@ -262,4 +271,4 @@ void RendererVulkan::Report() const {
    telemetry_session.AddField(field, "GPU_Vulkan_Extensions", extensions);
 }

-} // namespace Vulkan
+} // namespace Vulkan
--- a/src/video_core/renderer_vulkan/renderer_vulkan.h
+++ b/src/video_core/renderer_vulkan/renderer_vulkan.h
@@ -36,14 +36,10 @@ public:
    explicit RendererVulkan(Core::Frontend::EmuWindow& window, Core::System& system);
    ~RendererVulkan() override;

-    /// Swap buffers (render frame)
-    void SwapBuffers(const Tegra::FramebufferConfig* framebuffer) override;
-
-    /// Initialize the renderer
    bool Init() override;
-
-    /// Shutdown the renderer
    void ShutDown() override;
+    void SwapBuffers(const Tegra::FramebufferConfig* framebuffer) override;
+    void TryPresent(int timeout_ms) override;

 private:
    std::optional<vk::DebugUtilsMessengerEXT> CreateDebugCallback(
--- a/src/video_core/renderer_vulkan/vk_compute_pipeline.cpp
+++ b/src/video_core/renderer_vulkan/vk_compute_pipeline.cpp
@@ -73,7 +73,7 @@ UniqueDescriptorUpdateTemplate VKComputePipeline::CreateDescriptorUpdateTemplate
    std::vector<vk::DescriptorUpdateTemplateEntry> template_entries;
    u32 binding = 0;
    u32 offset = 0;
-    FillDescriptorUpdateTemplateEntries(device, entries, binding, offset, template_entries);
+    FillDescriptorUpdateTemplateEntries(entries, binding, offset, template_entries);
    if (template_entries.empty()) {
        // If the shader doesn't use descriptor sets, skip template creation.
        return UniqueDescriptorUpdateTemplate{};
--- a/src/video_core/renderer_vulkan/vk_device.cpp
+++ b/src/video_core/renderer_vulkan/vk_device.cpp
@@ -107,6 +107,8 @@ bool VKDevice::Create(const vk::DispatchLoaderDynamic& dldi, vk::Instance instan
    features.occlusionQueryPrecise = true;
    features.fragmentStoresAndAtomics = true;
    features.shaderImageGatherExtended = true;
+    features.shaderStorageImageReadWithoutFormat =
+        is_shader_storage_img_read_without_format_supported;
    features.shaderStorageImageWriteWithoutFormat = true;
    features.textureCompressionASTC_LDR = is_optimal_astc_supported;

@@ -465,6 +467,8 @@ void VKDevice::SetupFamilies(const vk::DispatchLoaderDynamic& dldi, vk::SurfaceK

 void VKDevice::SetupFeatures(const vk::DispatchLoaderDynamic& dldi) {
    const auto supported_features{physical.getFeatures(dldi)};
+    is_shader_storage_img_read_without_format_supported =
+        supported_features.shaderStorageImageReadWithoutFormat;
    is_optimal_astc_supported = IsOptimalAstcSupported(supported_features, dldi);
 }

@@ -519,6 +523,7 @@ std::unordered_map<vk::Format, vk::FormatProperties> VKDevice::GetFormatProperti
                                        vk::Format::eB10G11R11UfloatPack32,
                                        vk::Format::eR32Sfloat,
                                        vk::Format::eR32Uint,
+                                        vk::Format::eR32Sint,
                                        vk::Format::eR16Sfloat,
                                        vk::Format::eR16G16B16A16Sfloat,
                                        vk::Format::eB8G8R8A8Unorm,
@@ -538,6 +543,7 @@ std::unordered_map<vk::Format, vk::FormatProperties> VKDevice::GetFormatProperti
                                        vk::Format::eBc6HUfloatBlock,
                                        vk::Format::eBc6HSfloatBlock,
                                        vk::Format::eBc1RgbaSrgbBlock,
+                                        vk::Format::eBc2SrgbBlock,
                                        vk::Format::eBc3SrgbBlock,
                                        vk::Format::eBc7SrgbBlock,
                                        vk::Format::eAstc4x4SrgbBlock,
--- a/src/video_core/renderer_vulkan/vk_device.h
+++ b/src/video_core/renderer_vulkan/vk_device.h
@@ -122,6 +122,11 @@ public:
        return properties.limits.maxPushConstantsSize;
    }

+    /// Returns true if shader storage image reads without format are supported.
+    bool IsShaderStorageImageReadWithoutFormatSupported() const {
+        return is_shader_storage_img_read_without_format_supported;
+    }
+
    /// Returns true if ASTC is natively supported.
    bool IsOptimalAstcSupported() const {
        return is_optimal_astc_supported;
@@ -227,6 +232,8 @@ private:
    bool ext_depth_range_unrestricted{};       ///< Support for VK_EXT_depth_range_unrestricted.
    bool ext_shader_viewport_index_layer{};    ///< Support for VK_EXT_shader_viewport_index_layer.
    bool nv_device_diagnostic_checkpoints{};   ///< Support for VK_NV_device_diagnostic_checkpoints.
+    bool is_shader_storage_img_read_without_format_supported{}; ///< Support for shader storage
+                                                                ///< image read without format

    // Telemetry parameters
    std::string vendor_name;                      ///< Device's driver name.
--- a/src/video_core/renderer_vulkan/vk_graphics_pipeline.cpp
+++ b/src/video_core/renderer_vulkan/vk_graphics_pipeline.cpp
@@ -97,8 +97,7 @@ UniqueDescriptorUpdateTemplate VKGraphicsPipeline::CreateDescriptorUpdateTemplat
    u32 offset = 0;
    for (const auto& stage : program) {
        if (stage) {
-            FillDescriptorUpdateTemplateEntries(device, stage->entries, binding, offset,
-                                                template_entries);
+            FillDescriptorUpdateTemplateEntries(stage->entries, binding, offset, template_entries);
        }
    }
    if (template_entries.empty()) {
--- a/src/video_core/renderer_vulkan/vk_pipeline_cache.cpp
+++ b/src/video_core/renderer_vulkan/vk_pipeline_cache.cpp
@@ -36,6 +36,13 @@ using Tegra::Engines::ShaderType;

 namespace {

+// Local aliases standing in for C++20's `using enum vk::DescriptorType`
+constexpr auto eUniformBuffer = vk::DescriptorType::eUniformBuffer;
+constexpr auto eStorageBuffer = vk::DescriptorType::eStorageBuffer;
+constexpr auto eUniformTexelBuffer = vk::DescriptorType::eUniformTexelBuffer;
+constexpr auto eCombinedImageSampler = vk::DescriptorType::eCombinedImageSampler;
+constexpr auto eStorageImage = vk::DescriptorType::eStorageImage;
+
 constexpr VideoCommon::Shader::CompilerSettings compiler_settings{
    VideoCommon::Shader::CompileDepth::FullDecompile};

@@ -119,23 +126,32 @@ ShaderType GetShaderType(Maxwell::ShaderProgram program) {
    }
 }

+template <vk::DescriptorType descriptor_type, class Container>
+void AddBindings(std::vector<vk::DescriptorSetLayoutBinding>& bindings, u32& binding,
+                 vk::ShaderStageFlags stage_flags, const Container& container) {
+    const u32 num_entries = static_cast<u32>(std::size(container));
+    for (std::size_t i = 0; i < num_entries; ++i) {
+        u32 count = 1;
+        if constexpr (descriptor_type == eCombinedImageSampler) {
+            // Combined image samplers can be arrayed.
+            count = container[i].Size();
+        }
+        bindings.emplace_back(binding++, descriptor_type, count, stage_flags, nullptr);
+    }
+}
+
 u32 FillDescriptorLayout(const ShaderEntries& entries,
                         std::vector<vk::DescriptorSetLayoutBinding>& bindings,
                         Maxwell::ShaderProgram program_type, u32 base_binding) {
    const ShaderType stage = GetStageFromProgram(program_type);
-    const vk::ShaderStageFlags stage_flags = MaxwellToVK::ShaderStage(stage);
+    const vk::ShaderStageFlags flags = MaxwellToVK::ShaderStage(stage);

    u32 binding = base_binding;
-    const auto AddBindings = [&](vk::DescriptorType descriptor_type, std::size_t num_entries) {
-        for (std::size_t i = 0; i < num_entries; ++i) {
-            bindings.emplace_back(binding++, descriptor_type, 1, stage_flags, nullptr);
-        }
-    };
-    AddBindings(vk::DescriptorType::eUniformBuffer, entries.const_buffers.size());
-    AddBindings(vk::DescriptorType::eStorageBuffer, entries.global_buffers.size());
-    AddBindings(vk::DescriptorType::eUniformTexelBuffer, entries.texel_buffers.size());
-    AddBindings(vk::DescriptorType::eCombinedImageSampler, entries.samplers.size());
-    AddBindings(vk::DescriptorType::eStorageImage, entries.images.size());
+    AddBindings<eUniformBuffer>(bindings, binding, flags, entries.const_buffers);
+    AddBindings<eStorageBuffer>(bindings, binding, flags, entries.global_buffers);
+    AddBindings<eUniformTexelBuffer>(bindings, binding, flags, entries.texel_buffers);
+    AddBindings<eCombinedImageSampler>(bindings, binding, flags, entries.samplers);
+    AddBindings<eStorageImage>(bindings, binding, flags, entries.images);
    return binding;
 }

@@ -361,32 +377,45 @@ VKPipelineCache::DecompileShaders(const GraphicsPipelineCacheKey& key) {
    return {std::move(program), std::move(bindings)};
 }

-void FillDescriptorUpdateTemplateEntries(
-    const VKDevice& device, const ShaderEntries& entries, u32& binding, u32& offset,
-    std::vector<vk::DescriptorUpdateTemplateEntry>& template_entries) {
-    static constexpr auto entry_size = static_cast<u32>(sizeof(DescriptorUpdateEntry));
-    const auto AddEntry = [&](vk::DescriptorType descriptor_type, std::size_t count_) {
-        const u32 count = static_cast<u32>(count_);
-        if (descriptor_type == vk::DescriptorType::eUniformTexelBuffer &&
-            device.GetDriverID() == vk::DriverIdKHR::eNvidiaProprietary) {
-            // Nvidia has a bug where updating multiple uniform texels at once causes the driver to
-            // crash.
-            for (u32 i = 0; i < count; ++i) {
-                template_entries.emplace_back(binding + i, 0, 1, descriptor_type,
-                                              offset + i * entry_size, entry_size);
-            }
-        } else if (count != 0) {
-            template_entries.emplace_back(binding, 0, count, descriptor_type, offset, entry_size);
-        }
-        offset += count * entry_size;
-        binding += count;
-    };
+template <vk::DescriptorType descriptor_type, class Container>
+void AddEntry(std::vector<vk::DescriptorUpdateTemplateEntry>& template_entries, u32& binding,
+              u32& offset, const Container& container) {
+    static constexpr u32 entry_size = static_cast<u32>(sizeof(DescriptorUpdateEntry));
+    const u32 count = static_cast<u32>(std::size(container));

-    AddEntry(vk::DescriptorType::eUniformBuffer, entries.const_buffers.size());
-    AddEntry(vk::DescriptorType::eStorageBuffer, entries.global_buffers.size());
-    AddEntry(vk::DescriptorType::eUniformTexelBuffer, entries.texel_buffers.size());
-    AddEntry(vk::DescriptorType::eCombinedImageSampler, entries.samplers.size());
-    AddEntry(vk::DescriptorType::eStorageImage, entries.images.size());
+    if constexpr (descriptor_type == eCombinedImageSampler) {
+        for (u32 i = 0; i < count; ++i) {
+            const u32 num_samplers = container[i].Size();
+            template_entries.emplace_back(binding, 0, num_samplers, descriptor_type, offset,
+                                          entry_size);
+            ++binding;
+            offset += num_samplers * entry_size;
+        }
+        return;
+    }
+
+    if constexpr (descriptor_type == eUniformTexelBuffer) {
+        // Nvidia has a bug where updating multiple uniform texels at once causes the driver to
+        // crash.
+        for (u32 i = 0; i < count; ++i) {
+            template_entries.emplace_back(binding + i, 0, 1, descriptor_type,
+                                          offset + i * entry_size, entry_size);
+        }
+    } else if (count > 0) {
+        template_entries.emplace_back(binding, 0, count, descriptor_type, offset, entry_size);
+    }
+    offset += count * entry_size;
+    binding += count;
+}
+
+void FillDescriptorUpdateTemplateEntries(
+    const ShaderEntries& entries, u32& binding, u32& offset,
+    std::vector<vk::DescriptorUpdateTemplateEntry>& template_entries) {
+    AddEntry<eUniformBuffer>(template_entries, binding, offset, entries.const_buffers);
+    AddEntry<eStorageBuffer>(template_entries, binding, offset, entries.global_buffers);
+    AddEntry<eUniformTexelBuffer>(template_entries, binding, offset, entries.texel_buffers);
+    AddEntry<eCombinedImageSampler>(template_entries, binding, offset, entries.samplers);
+    AddEntry<eStorageImage>(template_entries, binding, offset, entries.images);
 }

 } // namespace Vulkan
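
Review note on `AddEntry` in `vk_pipeline_cache.cpp`: arrayed combined image samplers advance `binding` by one per sampler entry but advance `offset` by the array length, while the other descriptor types advance both by `count`. A Vulkan-free sketch of that accounting, under stated assumptions: `TemplateEntrySketch`, `AddArrayedSamplers`, `AddContiguous` and the value of `kEntrySize` are hypothetical and only illustrate the arithmetic, not the actual `vk::DescriptorUpdateTemplateEntry` layout.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, Vulkan-free stand-in for vk::DescriptorUpdateTemplateEntry.
struct TemplateEntrySketch {
    std::uint32_t binding;
    std::uint32_t count;
    std::uint32_t offset;
};

// Assumed size in bytes of one packed descriptor payload (illustrative value).
constexpr std::uint32_t kEntrySize = 16;

// One template entry per sampler; sizes[i] is the array length of sampler i.
inline void AddArrayedSamplers(std::vector<TemplateEntrySketch>& out, std::uint32_t& binding,
                               std::uint32_t& offset, const std::vector<std::uint32_t>& sizes) {
    for (const std::uint32_t num_samplers : sizes) {
        out.push_back({binding, num_samplers, offset});
        ++binding;                           // one binding per sampler entry
        offset += num_samplers * kEntrySize; // payload advances by the array length
    }
}

// Non-arrayed descriptors: a single entry covering `count` consecutive bindings.
inline void AddContiguous(std::vector<TemplateEntrySketch>& out, std::uint32_t& binding,
                          std::uint32_t& offset, std::uint32_t count) {
    if (count > 0) {
        out.push_back({binding, count, offset});
    }
    binding += count;
    offset += count * kEntrySize;
}
```

Because both running counters are passed as `u32&`, transposing them at a call site still compiles, so the argument order is worth checking against the function signature.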
--- a/src/video_core/renderer_vulkan/vk_pipeline_cache.h
+++ b/src/video_core/renderer_vulkan/vk_pipeline_cache.h
@@ -194,7 +194,7 @@ private:
 };

 void FillDescriptorUpdateTemplateEntries(
-    const VKDevice& device, const ShaderEntries& entries, u32& binding, u32& offset,
+    const ShaderEntries& entries, u32& binding, u32& offset,
    std::vector<vk::DescriptorUpdateTemplateEntry>& template_entries);

 } // namespace Vulkan
--- a/src/video_core/renderer_vulkan/vk_rasterizer.cpp
+++ b/src/video_core/renderer_vulkan/vk_rasterizer.cpp
@@ -105,17 +105,20 @@ void TransitionImages(const std::vector<ImageView>& views, vk::PipelineStageFlag

 template <typename Engine, typename Entry>
 Tegra::Texture::FullTextureInfo GetTextureInfo(const Engine& engine, const Entry& entry,
-                                               std::size_t stage) {
+                                               std::size_t stage, std::size_t index = 0) {
    const auto stage_type = static_cast<Tegra::Engines::ShaderType>(stage);
    if (entry.IsBindless()) {
        const Tegra::Texture::TextureHandle tex_handle =
            engine.AccessConstBuffer32(stage_type, entry.GetBuffer(), entry.GetOffset());
        return engine.GetTextureInfo(tex_handle);
    }
+    const auto& gpu_profile = engine.AccessGuestDriverProfile();
+    const u32 entry_offset = static_cast<u32>(index * gpu_profile.GetTextureHandlerSize());
+    const u32 offset = entry.GetOffset() + entry_offset;
    if constexpr (std::is_same_v<Engine, Tegra::Engines::Maxwell3D>) {
-        return engine.GetStageTexture(stage_type, entry.GetOffset());
+        return engine.GetStageTexture(stage_type, offset);
    } else {
-        return engine.GetTexture(entry.GetOffset());
+        return engine.GetTexture(offset);
    }
 }

@@ -295,16 +298,6 @@ RasterizerVulkan::RasterizerVulkan(Core::System& system, Core::Frontend::EmuWind

 RasterizerVulkan::~RasterizerVulkan() = default;

-bool RasterizerVulkan::DrawBatch(bool is_indexed) {
-    Draw(is_indexed, false);
-    return true;
-}
-
-bool RasterizerVulkan::DrawMultiBatch(bool is_indexed) {
-    Draw(is_indexed, true);
-    return true;
-}
-
 void RasterizerVulkan::Draw(bool is_indexed, bool is_instanced) {
    MICROPROFILE_SCOPE(Vulkan_Drawing);

@@ -621,33 +614,34 @@ bool RasterizerVulkan::WalkAttachmentOverlaps(const CachedSurfaceView& attachmen
 std::tuple<vk::Framebuffer, vk::Extent2D> RasterizerVulkan::ConfigureFramebuffers(
    vk::RenderPass renderpass) {
    FramebufferCacheKey key{renderpass, std::numeric_limits<u32>::max(),
-                            std::numeric_limits<u32>::max()};
+                            std::numeric_limits<u32>::max(), std::numeric_limits<u32>::max()};

-    const auto MarkAsModifiedAndPush = [&](const View& view) {
-        if (view == nullptr) {
+    const auto try_push = [&](const View& view) {
+        if (!view) {
            return false;
        }
        key.views.push_back(view->GetHandle());
        key.width = std::min(key.width, view->GetWidth());
        key.height = std::min(key.height, view->GetHeight());
+        key.layers = std::min(key.layers, view->GetNumLayers());
        return true;
    };

    for (std::size_t index = 0; index < std::size(color_attachments); ++index) {
-        if (MarkAsModifiedAndPush(color_attachments[index])) {
+        if (try_push(color_attachments[index])) {
            texture_cache.MarkColorBufferInUse(index);
        }
    }
-    if (MarkAsModifiedAndPush(zeta_attachment)) {
+    if (try_push(zeta_attachment)) {
        texture_cache.MarkDepthBufferInUse();
    }

    const auto [fbentry, is_cache_miss] = framebuffer_cache.try_emplace(key);
    auto& framebuffer = fbentry->second;
    if (is_cache_miss) {
-        const vk::FramebufferCreateInfo framebuffer_ci({}, key.renderpass,
-                                                       static_cast<u32>(key.views.size()),
-                                                       key.views.data(), key.width, key.height, 1);
+        const vk::FramebufferCreateInfo framebuffer_ci(
+            {}, key.renderpass, static_cast<u32>(key.views.size()), key.views.data(), key.width,
+            key.height, key.layers);
        const auto dev = device.GetLogical();
        const auto& dld = device.GetDispatchLoader();
        framebuffer = dev.createFramebufferUnique(framebuffer_ci, nullptr, dld);
@@ -845,8 +839,10 @@ void RasterizerVulkan::SetupGraphicsTextures(const ShaderEntries& entries, std::
    MICROPROFILE_SCOPE(Vulkan_Textures);
    const auto& gpu = system.GPU().Maxwell3D();
    for (const auto& entry : entries.samplers) {
-        const auto texture = GetTextureInfo(gpu, entry, stage);
-        SetupTexture(texture, entry);
+        for (std::size_t i = 0; i < entry.Size(); ++i) {
+            const auto texture = GetTextureInfo(gpu, entry, stage, i);
+            SetupTexture(texture, entry);
+        }
    }
 }

@@ -895,8 +891,10 @@ void RasterizerVulkan::SetupComputeTextures(const ShaderEntries& entries) {
    MICROPROFILE_SCOPE(Vulkan_Textures);
    const auto& gpu = system.GPU().KeplerCompute();
    for (const auto& entry : entries.samplers) {
-        const auto texture = GetTextureInfo(gpu, entry, ComputeShaderIndex);
-        SetupTexture(texture, entry);
+        for (std::size_t i = 0; i < entry.Size(); ++i) {
+            const auto texture = GetTextureInfo(gpu, entry, ComputeShaderIndex, i);
+            SetupTexture(texture, entry);
+        }
    }
 }

--- a/src/video_core/renderer_vulkan/vk_rasterizer.h
+++ b/src/video_core/renderer_vulkan/vk_rasterizer.h
@@ -56,6 +56,7 @@ struct FramebufferCacheKey {
    vk::RenderPass renderpass{};
    u32 width = 0;
    u32 height = 0;
+    u32 layers = 0;
    ImageViewsPack views;

    std::size_t Hash() const noexcept {
@@ -66,12 +67,17 @@ struct FramebufferCacheKey {
        }
        boost::hash_combine(hash, width);
        boost::hash_combine(hash, height);
+        boost::hash_combine(hash, layers);
        return hash;
    }

    bool operator==(const FramebufferCacheKey& rhs) const noexcept {
-        return std::tie(renderpass, views, width, height) ==
-               std::tie(rhs.renderpass, rhs.views, rhs.width, rhs.height);
+        return std::tie(renderpass, views, width, height, layers) ==
+               std::tie(rhs.renderpass, rhs.views, rhs.width, rhs.height, rhs.layers);
+    }
+
+    bool operator!=(const FramebufferCacheKey& rhs) const noexcept {
+        return !operator==(rhs);
    }
 };

@@ -105,8 +111,7 @@ public:
                              VKScheduler& scheduler);
    ~RasterizerVulkan() override;

-    bool DrawBatch(bool is_indexed) override;
-    bool DrawMultiBatch(bool is_indexed) override;
+    void Draw(bool is_indexed, bool is_instanced) override;
    void Clear() override;
    void DispatchCompute(GPUVAddr code_addr) override;
    void ResetCounter(VideoCore::QueryType type) override;
@@ -143,8 +148,6 @@ private:

    static constexpr std::size_t ZETA_TEXCEPTION_INDEX = 8;

-    void Draw(bool is_indexed, bool is_instanced);
-
    void FlushWork();

    Texceptions UpdateAttachments();
--- a/src/video_core/renderer_vulkan/vk_sampler_cache.cpp
+++ b/src/video_core/renderer_vulkan/vk_sampler_cache.cpp
@@ -23,7 +23,14 @@ static std::optional<vk::BorderColor> TryConvertBorderColor(std::array<float, 4>
    } else if (color == std::array<float, 4>{1, 1, 1, 1}) {
        return vk::BorderColor::eFloatOpaqueWhite;
    } else {
-        return {};
+        if (color[0] + color[1] + color[2] > 1.35f) {
+            // If the RGB components average above 0.45, approximate with a white border
+            return vk::BorderColor::eFloatOpaqueWhite;
+        }
+        if (color[3] > 0.5f) {
+            return vk::BorderColor::eFloatOpaqueBlack;
+        }
+        return vk::BorderColor::eFloatTransparentBlack;
    }
 }

@@ -37,8 +44,6 @@ UniqueSampler VKSamplerCache::CreateSampler(const Tegra::Texture::TSCEntry& tsc)

    const auto border_color{tsc.GetBorderColor()};
    const auto vk_border_color{TryConvertBorderColor(border_color)};
-    UNIMPLEMENTED_IF_MSG(!vk_border_color, "Unimplemented border color {} {} {} {}",
-                         border_color[0], border_color[1], border_color[2], border_color[3]);

    constexpr bool unnormalized_coords{false};
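
Review note on the `TryConvertBorderColor` hunk above: the fallback no longer returns an empty optional. It approximates any arbitrary border color with one of the three fixed float border colors. The sketch below restates that heuristic outside Vulkan so it can be checked in isolation. `BorderColor` and `ApproximateBorderColor` are hypothetical stand-ins for `vk::BorderColor` and the fallback branch, not names from the codebase. Note that the 1.35 threshold corresponds to an RGB average of 0.45, which the source comment describes as "roughly 0.5".

```cpp
#include <array>
#include <cassert>

// Hypothetical stand-in for vk::BorderColor, limited to the values the diff uses.
enum class BorderColor { FloatTransparentBlack, FloatOpaqueBlack, FloatOpaqueWhite };

// Mirrors the fallback branch of the diff: bright colors map to opaque white
// (alpha is ignored), otherwise alpha decides between opaque and transparent black.
constexpr BorderColor ApproximateBorderColor(const std::array<float, 4>& color) {
    if (color[0] + color[1] + color[2] > 1.35f) {
        return BorderColor::FloatOpaqueWhite;
    }
    if (color[3] > 0.5f) {
        return BorderColor::FloatOpaqueBlack;
    }
    return BorderColor::FloatTransparentBlack;
}
```

Because the function now always yields a value, the removed `UNIMPLEMENTED_IF_MSG` check in `CreateSampler` could no longer fire, which is presumably why the second hunk drops it.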

--- a/src/video_core/renderer_vulkan/vk_shader_decompiler.cpp
+++ b/src/video_core/renderer_vulkan/vk_shader_decompiler.cpp
@@ -69,8 +69,9 @@ struct TexelBuffer {

 struct SampledImage {
    Id image_type{};
-    Id sampled_image_type{};
-    Id sampler{};
+    Id sampler_type{};
+    Id sampler_pointer_type{};
+    Id variable{};
 };

 struct StorageImage {
@@ -86,6 +87,7 @@ struct AttributeType {

 struct VertexIndices {
    std::optional<u32> position;
+    std::optional<u32> layer;
    std::optional<u32> viewport;
    std::optional<u32> point_size;
    std::optional<u32> clip_distances;
@@ -284,14 +286,20 @@ public:
        AddExtension("SPV_KHR_variable_pointers");
        AddExtension("SPV_KHR_shader_draw_parameters");

-        if (ir.UsesViewportIndex()) {
-            AddCapability(spv::Capability::MultiViewport);
-            if (device.IsExtShaderViewportIndexLayerSupported()) {
+        if (ir.UsesLayer() || ir.UsesViewportIndex()) {
+            if (ir.UsesViewportIndex()) {
+                AddCapability(spv::Capability::MultiViewport);
+            }
+            if (stage != ShaderType::Geometry && device.IsExtShaderViewportIndexLayerSupported()) {
                AddExtension("SPV_EXT_shader_viewport_index_layer");
                AddCapability(spv::Capability::ShaderViewportIndexLayerEXT);
            }
        }

+        if (device.IsShaderStorageImageReadWithoutFormatSupported()) {
+            AddCapability(spv::Capability::StorageImageReadWithoutFormat);
+        }
+
        if (device.IsFloat16Supported()) {
            AddCapability(spv::Capability::Float16);
        }
@@ -826,16 +834,20 @@ private:
            constexpr int sampled = 1;
            constexpr auto format = spv::ImageFormat::Unknown;
            const Id image_type = TypeImage(t_float, dim, depth, arrayed, ms, sampled, format);
-            const Id sampled_image_type = TypeSampledImage(image_type);
-            const Id pointer_type =
-                TypePointer(spv::StorageClass::UniformConstant, sampled_image_type);
+            const Id sampler_type = TypeSampledImage(image_type);
+            const Id sampler_pointer_type =
+                TypePointer(spv::StorageClass::UniformConstant, sampler_type);
+            const Id type = sampler.IsIndexed()
+                                ? TypeArray(sampler_type, Constant(t_uint, sampler.Size()))
+                                : sampler_type;
+            const Id pointer_type = TypePointer(spv::StorageClass::UniformConstant, type);
            const Id id = OpVariable(pointer_type, spv::StorageClass::UniformConstant);
            AddGlobalVariable(Name(id, fmt::format("sampler_{}", sampler.GetIndex())));
            Decorate(id, spv::Decoration::Binding, binding++);
            Decorate(id, spv::Decoration::DescriptorSet, DESCRIPTOR_SET);

-            sampled_images.emplace(sampler.GetIndex(),
-                                   SampledImage{image_type, sampled_image_type, id});
+            sampled_images.emplace(sampler.GetIndex(), SampledImage{image_type, sampler_type,
+                                                                    sampler_pointer_type, id});
        }
        return binding;
    }
@@ -924,13 +936,22 @@ private:
        VertexIndices indices;
        indices.position = AddBuiltIn(t_float4, spv::BuiltIn::Position, "position");

+        if (ir.UsesLayer()) {
+            if (stage != ShaderType::Vertex || device.IsExtShaderViewportIndexLayerSupported()) {
+                indices.layer = AddBuiltIn(t_int, spv::BuiltIn::Layer, "layer");
+            } else {
+                LOG_ERROR(
+                    Render_Vulkan,
+                    "Shader requires Layer but it's not supported on this stage with this device.");
+            }
+        }
+
        if (ir.UsesViewportIndex()) {
            if (stage != ShaderType::Vertex || device.IsExtShaderViewportIndexLayerSupported()) {
                indices.viewport = AddBuiltIn(t_int, spv::BuiltIn::ViewportIndex, "viewport_index");
            } else {
-                LOG_ERROR(Render_Vulkan,
-                          "Shader requires ViewportIndex but it's not supported on this "
-                          "stage with this device.");
+                LOG_ERROR(Render_Vulkan, "Shader requires ViewportIndex but it's not supported on "
+                                         "this stage with this device.");
            }
        }

@@ -1292,6 +1313,13 @@ private:
                }
                case Attribute::Index::LayerViewportPointSize:
                    switch (element) {
+                    case 1: {
+                        if (!out_indices.layer) {
+                            return {};
+                        }
+                        const u32 index = out_indices.layer.value();
+                        return {AccessElement(t_out_int, out_vertex, index), Type::Int};
+                    }
                    case 2: {
                        if (!out_indices.viewport) {
                            return {};
@@ -1362,6 +1390,11 @@ private:
            UNIMPLEMENTED();
        }

+        if (!target.id) {
+            // On failure we return a nullptr target.id, skip these stores.
+            return {};
+        }
+
        OpStore(target.id, As(Visit(src), target.type));
        return {};
    }
@@ -1497,7 +1530,12 @@ private:
        ASSERT(!meta.sampler.IsBuffer());

        const auto& entry = sampled_images.at(meta.sampler.GetIndex());
-        return OpLoad(entry.sampled_image_type, entry.sampler);
+        Id sampler = entry.variable;
+        if (meta.sampler.IsIndexed()) {
+            const Id index = AsInt(Visit(meta.index));
+            sampler = OpAccessChain(entry.sampler_pointer_type, sampler, index);
+        }
+        return OpLoad(entry.sampler_type, sampler);
    }

    Id GetTextureImage(Operation operation) {
@@ -1755,8 +1793,16 @@ private:
    }

    Expression ImageLoad(Operation operation) {
-        UNIMPLEMENTED();
-        return {};
+        if (!device.IsShaderStorageImageReadWithoutFormatSupported()) {
+            return {v_float_zero, Type::Float};
+        }
+
+        const auto& meta{std::get<MetaImage>(operation.GetMeta())};
+
+        const Id coords = GetCoordinates(operation, Type::Int);
+        const Id texel = OpImageRead(t_uint4, GetImage(operation), coords);
+
+        return {OpCompositeExtract(t_uint, texel, meta.element), Type::Uint};
    }

    Expression ImageStore(Operation operation) {
@@ -2175,16 +2221,14 @@ private:
        switch (specialization.attribute_types.at(location)) {
        case Maxwell::VertexAttribute::Type::SignedNorm:
        case Maxwell::VertexAttribute::Type::UnsignedNorm:
+        case Maxwell::VertexAttribute::Type::UnsignedScaled:
+        case Maxwell::VertexAttribute::Type::SignedScaled:
        case Maxwell::VertexAttribute::Type::Float:
            return {Type::Float, t_in_float, t_in_float4};
        case Maxwell::VertexAttribute::Type::SignedInt:
            return {Type::Int, t_in_int, t_in_int4};
        case Maxwell::VertexAttribute::Type::UnsignedInt:
            return {Type::Uint, t_in_uint, t_in_uint4};
-        case Maxwell::VertexAttribute::Type::UnsignedScaled:
-        case Maxwell::VertexAttribute::Type::SignedScaled:
-            UNIMPLEMENTED();
-            return {Type::Float, t_in_float, t_in_float4};
        default:
            UNREACHABLE();
            return {Type::Float, t_in_float, t_in_float4};
--- a/src/video_core/renderer_vulkan/vk_swapchain.cpp
+++ b/src/video_core/renderer_vulkan/vk_swapchain.cpp
@@ -141,11 +141,6 @@ void VKSwapchain::CreateSwapchain(const vk::SurfaceCapabilitiesKHR& capabilities

    const vk::SurfaceFormatKHR surface_format{ChooseSwapSurfaceFormat(formats, srgb)};
    const vk::PresentModeKHR present_mode{ChooseSwapPresentMode(present_modes)};
-    extent = ChooseSwapExtent(capabilities, width, height);
-
-    current_width = extent.width;
-    current_height = extent.height;
-    current_srgb = srgb;

    u32 requested_image_count{capabilities.minImageCount + 1};
    if (capabilities.maxImageCount > 0 && requested_image_count > capabilities.maxImageCount) {
@@ -153,10 +148,9 @@ void VKSwapchain::CreateSwapchain(const vk::SurfaceCapabilitiesKHR& capabilities
    }

    vk::SwapchainCreateInfoKHR swapchain_ci(
-        {}, surface, requested_image_count, surface_format.format, surface_format.colorSpace,
-        extent, 1, vk::ImageUsageFlagBits::eColorAttachment, {}, {}, {},
-        capabilities.currentTransform, vk::CompositeAlphaFlagBitsKHR::eOpaque, present_mode, false,
-        {});
+        {}, surface, requested_image_count, surface_format.format, surface_format.colorSpace, {}, 1,
+        vk::ImageUsageFlagBits::eColorAttachment, {}, {}, {}, capabilities.currentTransform,
+        vk::CompositeAlphaFlagBitsKHR::eOpaque, present_mode, false, {});

    const u32 graphics_family{device.GetGraphicsFamily()};
    const u32 present_family{device.GetPresentFamily()};
@@ -169,9 +163,18 @@ void VKSwapchain::CreateSwapchain(const vk::SurfaceCapabilitiesKHR& capabilities
        swapchain_ci.imageSharingMode = vk::SharingMode::eExclusive;
    }

+    // Request the size again to reduce the possibility of a TOCTOU race condition.
+    const auto updated_capabilities = physical_device.getSurfaceCapabilitiesKHR(surface, dld);
+    swapchain_ci.imageExtent = ChooseSwapExtent(updated_capabilities, width, height);
+    // Don't add code within this and the swapchain creation.
    const auto dev{device.GetLogical()};
    swapchain = dev.createSwapchainKHRUnique(swapchain_ci, nullptr, dld);

+    extent = swapchain_ci.imageExtent;
+    current_width = extent.width;
+    current_height = extent.height;
+    current_srgb = srgb;
+
    images = dev.getSwapchainImagesKHR(*swapchain, dld);
    image_count = static_cast<u32>(images.size());
    image_format = surface_format.format;
--- a/src/video_core/renderer_vulkan/vk_texture_cache.h
+++ b/src/video_core/renderer_vulkan/vk_texture_cache.h
@@ -151,6 +151,10 @@ public:
        return params.GetMipHeight(base_level);
    }

+    u32 GetNumLayers() const {
+        return num_layers;
+    }
+
    bool IsBufferView() const {
        return buffer_view;
    }
--- a/src/video_core/shader/decode/arithmetic.cpp
+++ b/src/video_core/shader/decode/arithmetic.cpp
@@ -53,29 +53,24 @@ u32 ShaderIR::DecodeArithmetic(NodeBlock& bb, u32 pc) {

        op_b = GetOperandAbsNegFloat(op_b, false, instr.fmul.negate_b);

-        // TODO(Rodrigo): Should precise be used when there's a postfactor?
-        Node value = Operation(OperationCode::FMul, PRECISE, op_a, op_b);
+        static constexpr std::array FmulPostFactor = {
+            1.000f, // None
+            0.500f, // Divide 2
+            0.250f, // Divide 4
+            0.125f, // Divide 8
+            8.000f, // Mul 8
+            4.000f, // Mul 4
+            2.000f, // Mul 2
+        };

        if (instr.fmul.postfactor != 0) {
-            auto postfactor = static_cast<s32>(instr.fmul.postfactor);
-
-            // Postfactor encoded as 3-bit 1's complement in instruction, interpreted with below
-            // logic.
-            if (postfactor >= 4) {
-                postfactor = 7 - postfactor;
-            } else {
-                postfactor = 0 - postfactor;
-            }
-
-            if (postfactor > 0) {
-                value = Operation(OperationCode::FMul, NO_PRECISE, value,
-                                  Immediate(static_cast<f32>(1 << postfactor)));
-            } else {
-                value = Operation(OperationCode::FDiv, NO_PRECISE, value,
-                                  Immediate(static_cast<f32>(1 << -postfactor)));
-            }
+            op_a = Operation(OperationCode::FMul, NO_PRECISE, op_a,
+                             Immediate(FmulPostFactor[instr.fmul.postfactor]));
        }

+        // TODO(Rodrigo): Should precise be used when there's a postfactor?
+        Node value = Operation(OperationCode::FMul, PRECISE, op_a, op_b);
+
        value = GetSaturatedFloat(value, instr.alu.saturate_d);

        SetInternalFlagsFromFloat(bb, value, instr.generates_cc);
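
Review note on the FMUL postfactor hunk above: the branchy decoding is replaced by a lookup table indexed with the raw field value. The sketch below compares the two decodings for every 3-bit value. `OldFactor` and `NewFactor` are hypothetical helper names; the table values are copied from the diff. The bounds guard in `NewFactor` is an assumption of this sketch, since the table in the diff has seven entries and is indexed directly. Returning 1.0 for raw value 7 matches what the removed code computed for it (a division by `1 << 0`).

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Same values as FmulPostFactor in the diff, indexed by the raw 3-bit field.
constexpr std::array<float, 7> kPostFactor = {1.0f, 0.5f, 0.25f, 0.125f, 8.0f, 4.0f, 2.0f};

// Decoding removed by the diff: signed power-of-two exponent of the scale factor.
constexpr int OldExponent(int raw) {
    return raw >= 4 ? 7 - raw : -raw;
}

// Scale factor implied by the removed code (multiply for positive exponents,
// divide otherwise).
constexpr float OldFactor(int raw) {
    return OldExponent(raw) > 0 ? static_cast<float>(1 << OldExponent(raw))
                                : 1.0f / static_cast<float>(1 << -OldExponent(raw));
}

// Table lookup. The out-of-range guard is an assumption added for this sketch.
constexpr float NewFactor(std::size_t raw) {
    return raw < kPostFactor.size() ? kPostFactor[raw] : 1.0f;
}
```

One behavioral difference the sketch does not cover: the new code scales `op_a` before the multiplication, while the removed code scaled the product. For power-of-two factors the two orders should agree except near overflow or underflow.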
--- a/src/video_core/shader/decode/arithmetic_integer.cpp
+++ b/src/video_core/shader/decode/arithmetic_integer.cpp
@@ -293,44 +293,66 @@ u32 ShaderIR::DecodeArithmeticInteger(NodeBlock& bb, u32 pc) {

 void ShaderIR::WriteLop3Instruction(NodeBlock& bb, Register dest, Node op_a, Node op_b, Node op_c,
                                    Node imm_lut, bool sets_cc) {
-    constexpr u32 lop_iterations = 32;
-    const Node one = Immediate(1);
-    const Node two = Immediate(2);
-
-    Node value;
-    for (u32 i = 0; i < lop_iterations; ++i) {
-        const Node shift_amount = Immediate(i);
-
-        const Node a = Operation(OperationCode::ILogicalShiftRight, NO_PRECISE, op_c, shift_amount);
-        const Node pack_0 = Operation(OperationCode::IBitwiseAnd, NO_PRECISE, a, one);
-
-        const Node b = Operation(OperationCode::ILogicalShiftRight, NO_PRECISE, op_b, shift_amount);
-        const Node c = Operation(OperationCode::IBitwiseAnd, NO_PRECISE, b, one);
-        const Node pack_1 = Operation(OperationCode::ILogicalShiftLeft, NO_PRECISE, c, one);
-
-        const Node d = Operation(OperationCode::ILogicalShiftRight, NO_PRECISE, op_a, shift_amount);
-        const Node e = Operation(OperationCode::IBitwiseAnd, NO_PRECISE, d, one);
-        const Node pack_2 = Operation(OperationCode::ILogicalShiftLeft, NO_PRECISE, e, two);
-
-        const Node pack_01 = Operation(OperationCode::IBitwiseAnd, NO_PRECISE, pack_0, pack_1);
-        const Node pack_012 = Operation(OperationCode::IBitwiseAnd, NO_PRECISE, pack_01, pack_2);
-
-        const Node shifted_bit =
-            Operation(OperationCode::ILogicalShiftRight, NO_PRECISE, imm_lut, pack_012);
-        const Node bit = Operation(OperationCode::IBitwiseAnd, NO_PRECISE, shifted_bit, one);
-
-        const Node right =
-            Operation(OperationCode::ILogicalShiftLeft, NO_PRECISE, bit, shift_amount);
-
-        if (i > 0) {
-            value = Operation(OperationCode::IBitwiseOr, NO_PRECISE, value, right);
-        } else {
-            value = right;
+    const Node lop3_fast = [&](const Node na, const Node nb, const Node nc, const Node ttbl) {
+        Node value = Immediate(0);
+        const ImmediateNode imm = std::get<ImmediateNode>(*ttbl);
+        if (imm.GetValue() & 0x01) {
+            const Node a = Operation(OperationCode::IBitwiseNot, na);
+            const Node b = Operation(OperationCode::IBitwiseNot, nb);
+            const Node c = Operation(OperationCode::IBitwiseNot, nc);
+            Node r = Operation(OperationCode::IBitwiseAnd, NO_PRECISE, a, b);
+            r = Operation(OperationCode::IBitwiseAnd, NO_PRECISE, r, c);
+            value = Operation(OperationCode::IBitwiseOr, value, r);
        }
-    }
+        if (imm.GetValue() & 0x02) {
+            const Node a = Operation(OperationCode::IBitwiseNot, na);
+            const Node b = Operation(OperationCode::IBitwiseNot, nb);
+            Node r = Operation(OperationCode::IBitwiseAnd, NO_PRECISE, a, b);
+            r = Operation(OperationCode::IBitwiseAnd, NO_PRECISE, r, nc);
+            value = Operation(OperationCode::IBitwiseOr, value, r);
+        }
+        if (imm.GetValue() & 0x04) {
+            const Node a = Operation(OperationCode::IBitwiseNot, na);
+            const Node c = Operation(OperationCode::IBitwiseNot, nc);
+            Node r = Operation(OperationCode::IBitwiseAnd, NO_PRECISE, a, nb);
+            r = Operation(OperationCode::IBitwiseAnd, NO_PRECISE, r, c);
+            value = Operation(OperationCode::IBitwiseOr, value, r);
+        }
+        if (imm.GetValue() & 0x08) {
+            const Node a = Operation(OperationCode::IBitwiseNot, na);
+            Node r = Operation(OperationCode::IBitwiseAnd, NO_PRECISE, a, nb);
+            r = Operation(OperationCode::IBitwiseAnd, NO_PRECISE, r, nc);
+            value = Operation(OperationCode::IBitwiseOr, value, r);
+        }
+        if (imm.GetValue() & 0x10) {
+            const Node b = Operation(OperationCode::IBitwiseNot, nb);
+            const Node c = Operation(OperationCode::IBitwiseNot, nc);
+            Node r = Operation(OperationCode::IBitwiseAnd, NO_PRECISE, na, b);
+            r = Operation(OperationCode::IBitwiseAnd, NO_PRECISE, r, c);
+            value = Operation(OperationCode::IBitwiseOr, value, r);
+        }
+        if (imm.GetValue() & 0x20) {
+            const Node b = Operation(OperationCode::IBitwiseNot, nb);
+            Node r = Operation(OperationCode::IBitwiseAnd, NO_PRECISE, na, b);
+            r = Operation(OperationCode::IBitwiseAnd, NO_PRECISE, r, nc);
+            value = Operation(OperationCode::IBitwiseOr, value, r);
+        }
+        if (imm.GetValue() & 0x40) {
+            const Node c = Operation(OperationCode::IBitwiseNot, nc);
+            Node r = Operation(OperationCode::IBitwiseAnd, NO_PRECISE, na, nb);
+            r = Operation(OperationCode::IBitwiseAnd, NO_PRECISE, r, c);
+            value = Operation(OperationCode::IBitwiseOr, value, r);
+        }
+        if (imm.GetValue() & 0x80) {
+            Node r = Operation(OperationCode::IBitwiseAnd, NO_PRECISE, na, nb);
+            r = Operation(OperationCode::IBitwiseAnd, NO_PRECISE, r, nc);
+            value = Operation(OperationCode::IBitwiseOr, value, r);
+        }
+        return value;
+    }(op_a, op_b, op_c, imm_lut);

-    SetInternalFlagsFromInteger(bb, value, sets_cc);
-    SetRegister(bb, dest, value);
+    SetInternalFlagsFromInteger(bb, lop3_fast, sets_cc);
+    SetRegister(bb, dest, lop3_fast);
 }

 } // namespace VideoCommon::Shader
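
Review note on the `WriteLop3Instruction` hunk above: the new lambda expands the 8-bit truth table into a sum of minterms. Bit `k` of the table enables the term in which `a`, `b` and `c` match the bits of `k`, with `a` as the most significant input. The sketch below shows this on plain integers and compares it with a per-bit table lookup, which is what the removed 32-iteration loop was apparently aiming for. Both function names are hypothetical. Unlike the diff, the sketch takes the table as a runtime value, whereas the lambda requires `imm_lut` to be an `ImmediateNode`.

```cpp
#include <cassert>
#include <cstdint>

// Minterm expansion: table bit k (k = a*4 + b*2 + c) contributes the AND of the
// three inputs, each complemented where the corresponding bit of k is zero.
constexpr std::uint32_t Lop3Minterms(std::uint32_t a, std::uint32_t b, std::uint32_t c,
                                     std::uint32_t lut) {
    std::uint32_t value = 0;
    for (std::uint32_t k = 0; k < 8; ++k) {
        if ((lut >> k) & 1u) {
            const std::uint32_t ta = (k & 4u) ? a : ~a;
            const std::uint32_t tb = (k & 2u) ? b : ~b;
            const std::uint32_t tc = (k & 1u) ? c : ~c;
            value |= ta & tb & tc;
        }
    }
    return value;
}

// Reference: look up the table once per bit position.
constexpr std::uint32_t Lop3PerBit(std::uint32_t a, std::uint32_t b, std::uint32_t c,
                                   std::uint32_t lut) {
    std::uint32_t value = 0;
    for (std::uint32_t i = 0; i < 32; ++i) {
        const std::uint32_t index =
            (((a >> i) & 1u) << 2) | (((b >> i) & 1u) << 1) | ((c >> i) & 1u);
        value |= ((lut >> index) & 1u) << i;
    }
    return value;
}
```

With the inputs `0xF0F0F0F0`, `0xCCCCCCCC` and `0xAAAAAAAA`, each result byte equals the table byte, which gives a quick check over all 256 tables. As an aside, the removed loop merged `pack_0`, `pack_1` and `pack_2` with `IBitwiseAnd`; those values occupy distinct bit positions, so the computed table index would always have been zero.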
--- a/src/video_core/shader/decode/conversion.cpp
+++ b/src/video_core/shader/decode/conversion.cpp
@@ -83,14 +83,14 @@ u32 ShaderIR::DecodeConversion(NodeBlock& bb, u32 pc) {

        const bool input_signed = instr.conversion.is_input_signed;

-        if (instr.conversion.src_size == Register::Size::Byte) {
-            const u32 offset = static_cast<u32>(instr.conversion.int_src.selector) * 8;
-            if (offset > 0) {
-                value = SignedOperation(OperationCode::ILogicalShiftRight, input_signed,
-                                        std::move(value), Immediate(offset));
+        if (const u32 offset = static_cast<u32>(instr.conversion.int_src.selector); offset > 0) {
+            ASSERT(instr.conversion.src_size == Register::Size::Byte ||
+                   instr.conversion.src_size == Register::Size::Short);
+            if (instr.conversion.src_size == Register::Size::Short) {
+                ASSERT(offset == 0 || offset == 2);
            }
-        } else {
-            UNIMPLEMENTED_IF(instr.conversion.int_src.selector != 0);
+            value = SignedOperation(OperationCode::ILogicalShiftRight, input_signed,
+                                    std::move(value), Immediate(offset * 8));
        }

        value = ConvertIntegerSize(value, instr.conversion.src_size, input_signed);
--- a/src/video_core/shader/decode/texture.cpp
+++ b/src/video_core/shader/decode/texture.cpp
@@ -522,68 +522,53 @@ Node4 ShaderIR::GetTextureCode(Instruction instr, TextureType texture_type,
                               Node array, Node depth_compare, u32 bias_offset,
                               std::vector<Node> aoffi,
                               std::optional<Tegra::Shader::Register> bindless_reg) {
-    const auto is_array = static_cast<bool>(array);
-    const auto is_shadow = static_cast<bool>(depth_compare);
+    const bool is_array = array != nullptr;
+    const bool is_shadow = depth_compare != nullptr;
    const bool is_bindless = bindless_reg.has_value();

-    UNIMPLEMENTED_IF_MSG((texture_type == TextureType::Texture3D && (is_array || is_shadow)) ||
-                             (texture_type == TextureType::TextureCube && is_array && is_shadow),
-                         "This method is not supported.");
+    UNIMPLEMENTED_IF(texture_type == TextureType::TextureCube && is_array && is_shadow);
+    ASSERT_MSG(texture_type != TextureType::Texture3D || !is_array || !is_shadow,
+               "Illegal texture type");

    const SamplerInfo info{texture_type, is_array, is_shadow, false};
-    Node index_var{};
+    Node index_var;
    const Sampler* sampler = is_bindless ? GetBindlessSampler(*bindless_reg, index_var, info)
                                         : GetSampler(instr.sampler, info);
-    Node4 values;
-    if (sampler == nullptr) {
-        for (u32 element = 0; element < values.size(); ++element) {
-            values[element] = Immediate(0);
-        }
-        return values;
+    if (!sampler) {
+        return {Immediate(0), Immediate(0), Immediate(0), Immediate(0)};
    }

    const bool lod_needed = process_mode == TextureProcessMode::LZ ||
                            process_mode == TextureProcessMode::LL ||
                            process_mode == TextureProcessMode::LLA;
-
-    // LOD selection (either via bias or explicit textureLod) not supported in GL for
-    // sampler2DArrayShadow and samplerCubeArrayShadow.
-    const bool gl_lod_supported =
-        !((texture_type == Tegra::Shader::TextureType::Texture2D && is_array && is_shadow) ||
-          (texture_type == Tegra::Shader::TextureType::TextureCube && is_array && is_shadow));
-
-    const OperationCode read_method =
-        (lod_needed && gl_lod_supported) ? OperationCode::TextureLod : OperationCode::Texture;
-
-    UNIMPLEMENTED_IF(process_mode != TextureProcessMode::None && !gl_lod_supported);
+    const OperationCode opcode = lod_needed ? OperationCode::TextureLod : OperationCode::Texture;

    Node bias;
    Node lod;
-    if (process_mode != TextureProcessMode::None && gl_lod_supported) {
-        switch (process_mode) {
-        case TextureProcessMode::LZ:
-            lod = Immediate(0.0f);
-            break;
-        case TextureProcessMode::LB:
-            // If present, lod or bias are always stored in the register
-            // indexed by the gpr20 field with an offset depending on the
-            // usage of the other registers
-            bias = GetRegister(instr.gpr20.Value() + bias_offset);
-            break;
-        case TextureProcessMode::LL:
-            lod = GetRegister(instr.gpr20.Value() + bias_offset);
-            break;
-        default:
-            UNIMPLEMENTED_MSG("Unimplemented process mode={}", static_cast<u32>(process_mode));
-            break;
-        }
+    switch (process_mode) {
+    case TextureProcessMode::None:
+        break;
+    case TextureProcessMode::LZ:
+        lod = Immediate(0.0f);
+        break;
+    case TextureProcessMode::LB:
+        // If present, lod or bias are always stored in the register indexed by the gpr20 field with
+        // an offset depending on the usage of the other registers.
+        bias = GetRegister(instr.gpr20.Value() + bias_offset);
+        break;
+    case TextureProcessMode::LL:
+        lod = GetRegister(instr.gpr20.Value() + bias_offset);
+        break;
+    default:
+        UNIMPLEMENTED_MSG("Unimplemented process mode={}", static_cast<u32>(process_mode));
+        break;
    }

+    Node4 values;
    for (u32 element = 0; element < values.size(); ++element) {
-        auto copy_coords = coords;
        MetaTexture meta{*sampler, array, depth_compare, aoffi,    {}, {}, bias,
                         lod,      {},    element,       index_var};
-        values[element] = Operation(read_method, meta, std::move(copy_coords));
+        values[element] = Operation(opcode, meta, coords);
    }

    return values;
--- a/src/video_core/shader/node.h
+++ b/src/video_core/shader/node.h
@@ -299,7 +299,7 @@ private:
    u32 index{};  ///< Emulated index given for the this sampler.
    u32 offset{}; ///< Offset in the const buffer from where the sampler is being read.
    u32 buffer{}; ///< Buffer where the bindless sampler is being read (unused on bound samplers).
-    u32 size{};   ///< Size of the sampler if indexed.
+    u32 size{1};  ///< Size of the sampler.

    Tegra::Shader::TextureType type{}; ///< The type used to sample this texture (Texture2D, etc)
    bool is_array{};    ///< Whether the texture is being sampled as an array texture or not.
--- a/src/video_core/shader/track.cpp
+++ b/src/video_core/shader/track.cpp
@@ -157,13 +157,21 @@ std::tuple<Node, u32, u32> ShaderIR::TrackCbuf(Node tracked, const NodeBlock& co
        if (gpr->GetIndex() == Tegra::Shader::Register::ZeroIndex) {
            return {};
        }
-        // Reduce the cursor in one to avoid infinite loops when the instruction sets the same
-        // register that it uses as operand
-        const auto [source, new_cursor] = TrackRegister(gpr, code, cursor - 1);
-        if (!source) {
-            return {};
+        s64 current_cursor = cursor;
+        while (current_cursor > 0) {
+            // Reduce the cursor in one to avoid infinite loops when the instruction sets the same
+            // register that it uses as operand
+            const auto [source, new_cursor] = TrackRegister(gpr, code, current_cursor - 1);
+            current_cursor = new_cursor;
+            if (!source) {
+                continue;
+            }
+            const auto [base_address, index, offset] = TrackCbuf(source, code, current_cursor);
+            if (base_address != nullptr) {
+                return {base_address, index, offset};
+            }
        }
-        return TrackCbuf(source, code, new_cursor);
+        return {};
    }
    if (const auto operation = std::get_if<OperationNode>(&*tracked)) {
        for (std::size_t i = operation->GetOperandsCount(); i > 0; --i) {
--- a/src/video_core/surface.cpp
+++ b/src/video_core/surface.cpp
@@ -155,6 +155,8 @@ PixelFormat PixelFormatFromRenderTargetFormat(Tegra::RenderTargetFormat format)
        return PixelFormat::R16I;
    case Tegra::RenderTargetFormat::R32_FLOAT:
        return PixelFormat::R32F;
+    case Tegra::RenderTargetFormat::R32_SINT:
+        return PixelFormat::R32I;
    case Tegra::RenderTargetFormat::R32_UINT:
        return PixelFormat::R32UI;
    case Tegra::RenderTargetFormat::RG32_UINT:
--- a/src/video_core/surface.h
+++ b/src/video_core/surface.h
@@ -59,47 +59,48 @@ enum class PixelFormat {
    RG32UI = 41,
    RGBX16F = 42,
    R32UI = 43,
-    ASTC_2D_8X8 = 44,
-    ASTC_2D_8X5 = 45,
-    ASTC_2D_5X4 = 46,
-    BGRA8_SRGB = 47,
-    DXT1_SRGB = 48,
-    DXT23_SRGB = 49,
-    DXT45_SRGB = 50,
-    BC7U_SRGB = 51,
-    R4G4B4A4U = 52,
-    ASTC_2D_4X4_SRGB = 53,
-    ASTC_2D_8X8_SRGB = 54,
-    ASTC_2D_8X5_SRGB = 55,
-    ASTC_2D_5X4_SRGB = 56,
-    ASTC_2D_5X5 = 57,
-    ASTC_2D_5X5_SRGB = 58,
-    ASTC_2D_10X8 = 59,
-    ASTC_2D_10X8_SRGB = 60,
-    ASTC_2D_6X6 = 61,
-    ASTC_2D_6X6_SRGB = 62,
-    ASTC_2D_10X10 = 63,
-    ASTC_2D_10X10_SRGB = 64,
-    ASTC_2D_12X12 = 65,
-    ASTC_2D_12X12_SRGB = 66,
-    ASTC_2D_8X6 = 67,
-    ASTC_2D_8X6_SRGB = 68,
-    ASTC_2D_6X5 = 69,
-    ASTC_2D_6X5_SRGB = 70,
-    E5B9G9R9F = 71,
+    R32I = 44,
+    ASTC_2D_8X8 = 45,
+    ASTC_2D_8X5 = 46,
+    ASTC_2D_5X4 = 47,
+    BGRA8_SRGB = 48,
+    DXT1_SRGB = 49,
+    DXT23_SRGB = 50,
+    DXT45_SRGB = 51,
+    BC7U_SRGB = 52,
+    R4G4B4A4U = 53,
+    ASTC_2D_4X4_SRGB = 54,
+    ASTC_2D_8X8_SRGB = 55,
+    ASTC_2D_8X5_SRGB = 56,
+    ASTC_2D_5X4_SRGB = 57,
+    ASTC_2D_5X5 = 58,
+    ASTC_2D_5X5_SRGB = 59,
+    ASTC_2D_10X8 = 60,
+    ASTC_2D_10X8_SRGB = 61,
+    ASTC_2D_6X6 = 62,
+    ASTC_2D_6X6_SRGB = 63,
+    ASTC_2D_10X10 = 64,
+    ASTC_2D_10X10_SRGB = 65,
+    ASTC_2D_12X12 = 66,
+    ASTC_2D_12X12_SRGB = 67,
+    ASTC_2D_8X6 = 68,
+    ASTC_2D_8X6_SRGB = 69,
+    ASTC_2D_6X5 = 70,
+    ASTC_2D_6X5_SRGB = 71,
+    E5B9G9R9F = 72,

    MaxColorFormat,

    // Depth formats
-    Z32F = 72,
-    Z16 = 73,
+    Z32F = 73,
+    Z16 = 74,

    MaxDepthFormat,

    // DepthStencil formats
-    Z24S8 = 74,
-    S8Z24 = 75,
-    Z32FS8 = 76,
+    Z24S8 = 75,
+    S8Z24 = 76,
+    Z32FS8 = 77,

    MaxDepthStencilFormat,

@@ -171,6 +172,7 @@ constexpr std::array<u32, MaxPixelFormat> compression_factor_shift_table = {{
    0, // RG32UI
    0, // RGBX16F
    0, // R32UI
+    0, // R32I
    2, // ASTC_2D_8X8
    2, // ASTC_2D_8X5
    2, // ASTC_2D_5X4
@@ -267,6 +269,7 @@ constexpr std::array<u32, MaxPixelFormat> block_width_table = {{
    1,  // RG32UI
    1,  // RGBX16F
    1,  // R32UI
+    1,  // R32I
    8,  // ASTC_2D_8X8
    8,  // ASTC_2D_8X5
    5,  // ASTC_2D_5X4
@@ -355,6 +358,7 @@ constexpr std::array<u32, MaxPixelFormat> block_height_table = {{
    1,  // RG32UI
    1,  // RGBX16F
    1,  // R32UI
+    1,  // R32I
    8,  // ASTC_2D_8X8
    5,  // ASTC_2D_8X5
    4,  // ASTC_2D_5X4
@@ -443,6 +447,7 @@ constexpr std::array<u32, MaxPixelFormat> bpp_table = {{
    64,  // RG32UI
    64,  // RGBX16F
    32,  // R32UI
+    32,  // R32I
    128, // ASTC_2D_8X8
    128, // ASTC_2D_8X5
    128, // ASTC_2D_5X4
@@ -546,6 +551,7 @@ constexpr std::array<SurfaceCompression, MaxPixelFormat> compression_type_table
    SurfaceCompression::None,       // RG32UI
    SurfaceCompression::None,       // RGBX16F
    SurfaceCompression::None,       // R32UI
+    SurfaceCompression::None,       // R32I
    SurfaceCompression::Converted,  // ASTC_2D_8X8
    SurfaceCompression::Converted,  // ASTC_2D_8X5
    SurfaceCompression::Converted,  // ASTC_2D_5X4
Author	SHA1	Message	Date
yuzubot	8e554e7c7b	"Merge Tagged PR 1012"	2020-03-09 12:01:13 +00:00
yuzubot	32ed85ce73	"Merge Tagged PR 1340"	2020-03-09 12:01:13 +00:00
yuzubot	a88753e0c4	"Merge Tagged PR 1703"	2020-03-09 12:01:12 +00:00
bunnei	c281173df6	Merge pull request #3486 from ReinUsesLisp/fix-anisotropy-hack textures: Fix anisotropy hack	2020-03-08 16:28:07 -04:00
ReinUsesLisp	1aa75b1081	textures: Fix anisotropy hack Previous code could generate an anisotropy value way higher than x16.	2020-03-08 15:59:38 -03:00
bunnei	84e9f9f395	Merge pull request #3452 from Morph1984/anisotropic-filtering frontend/Graphics: Add "Advanced" graphics tab and experimental Anisotropic Filtering support	2020-03-07 22:28:35 -05:00
bunnei	662feb8c1c	Merge pull request #3481 from ReinUsesLisp/abgr5-storage maxwell_to_vk: Remove Storage capability for A1B5G5R5U	2020-03-07 19:51:33 -05:00
ReinUsesLisp	aa6fe3f1aa	maxwell_to_vk: Remove Storage capability for A1B5G5R5U	2020-03-06 18:47:27 -03:00
bunnei	49eff536d0	Merge pull request #3463 from ReinUsesLisp/vk-toctou vk_swapchain: Silence TOCTOU race condition	2020-03-05 19:38:42 -05:00
bunnei	4a8fe67964	Merge pull request #3479 from jroweboy/dont-log-on-no-input Minor fixes for udp input	2020-03-05 15:09:48 -05:00
bunnei	0361aa1915	Merge pull request #3451 from ReinUsesLisp/indexed-textures vk_shader_decompiler: Implement indexed textures	2020-03-05 11:42:46 -05:00
bunnei	fa1d625eed	Merge pull request #3469 from namkazt/patch-1 shader_decode: Fix LD, LDG when track constant buffer	2020-03-04 23:10:01 -05:00
bunnei	1e84d22275	Merge pull request #3478 from bunnei/a32 Refactoring to boot A32 games	2020-03-04 20:37:51 -05:00
James Rowe	002d9508a0	input/udp - Add minor error handling to prevent bad input from crashing	2020-03-03 23:46:05 -07:00
bunnei	67e7186d79	Merge pull request #3455 from ReinUsesLisp/attr-scaled video_core: Implement more scaled attribute formats	2020-03-03 22:46:20 -05:00
James Rowe	fc205a1bc5	Frontend/SDL - Provide proper default for UDP input When the default file is read in, the settings default value is only used when the key is missing. As it was, the key existed, but the value was empty string causing it to accept that as a value to pass into the core	2020-03-03 20:05:42 -07:00
James Rowe	2cdda8c564	input/udp - Dont log on invalid packet received	2020-03-03 19:52:16 -07:00
bunnei	dba112e510	core: hle: Implement separate A32/A64 SVC interfaces.	2020-03-02 21:52:03 -05:00
bunnei	c083ea7d78	core: Implement separate A32/A64 ARM interfaces.	2020-03-02 21:51:57 -05:00
bunnei	6fc485a607	core: loader: Remove check for 32-bit.	2020-03-02 21:43:15 -05:00
bunnei	64facb403e	core: dynarmic: Add CP15 from Citra.	2020-03-02 21:43:15 -05:00
bunnei	08c638f249	Merge pull request #3464 from FernandoS27/jit-fix ARM_Interface: Cache the JITs instead of deleting/recreating.	2020-03-02 21:41:43 -05:00
bunnei	dfa2e336ba	Merge pull request #3475 from yuzu-emu/FearlessTobi-readme Port citra-emu/citra#5097: "Update README.md"	2020-03-01 22:41:41 -05:00
Tobias	6af8ff24c9	Update README.md	2020-03-01 18:03:32 +01:00
Nguyen Dac Nam	85a4222a8c	nit: move comment to right place.	2020-02-29 13:50:10 +07:00
bunnei	ca7618684c	Merge pull request #3448 from bunnei/fix-audio-interp-2 audio_core: interpolate: Improvements to fix audio crackling.	2020-02-28 16:07:10 -05:00
bunnei	c7db1ef565	Merge pull request #3470 from bunnei/fix-smash-srgb renderer_opengl: Fix SRGB presentation frame tracking.	2020-02-28 01:22:00 -05:00
namkazy	1326e326f5	Merge branch 'patch-1' of https://github.com/namkazt/yuzu into patch-2	2020-02-28 13:14:49 +07:00
bunnei	5056d23d0d	renderer_opengl: Fix SRGB presentation frame tracking. - Fixes SRGB in Super Smash Bros. Ultimate.	2020-02-28 01:13:38 -05:00
Nguyen Dac Nam	6c0c2dfabc	shader_decode: Fix LD, LDG when track constant buffer	2020-02-28 13:11:19 +07:00
Nguyen Dac Nam	1c385362f5	shader_decode: keep it search on all code It fixed opcode LD, LDG on Pokemon Sword that can't find the constant buffer. Not sure if it helps any on visual.	2020-02-28 11:59:05 +07:00
Morph	7ee6065178	Create an "Advanced" tab in the graphics configuration tab and add anisotropic filtering levels.	2020-02-27 21:34:00 -05:00
bunnei	969357af1a	Merge pull request #3430 from bunnei/split-presenter Port citra-emu/citra#4940: "Split Presentation thread from Render thread"	2020-02-27 19:51:55 -05:00
bunnei	ebbfe73557	renderer_opengl: Reduce swap chain size to 3.	2020-02-27 19:50:17 -05:00
Morph	e1efab1f51	AM/ICommonStateGetter: Stub SetLcdBacklighOffEnabled (#3454) * Stub SetLcdBacklighOffEnabled Used by Super Smash Bros. Ultimate We require backlight services to be implemented to turn on/off the backlight. * Address feedback	2020-02-27 17:49:23 +01:00
Nguyen Dac Nam	db2f547434	shader: FMUL switch to using LUT (#3441) * shader: add FmulPostFactor LUT table * shader: FMUL apply LUT * Update src/video_core/engines/shader_bytecode.h Co-Authored-By: Mat M. <mathew1800@gmail.com> * nit: mistype * clang-format & add missing import * shader: remove post factor LUT. * shader: move post factor LUT to function and fix incorrect order. * clang-format * shader: FMUL: add static to post factor LUT * nit: typo Co-authored-by: Mat M. <mathew1800@gmail.com>	2020-02-27 11:14:25 -05:00
bunnei	a17214baea	renderer_opengl: Use more concise lock syntax.	2020-02-26 18:35:35 -05:00
bunnei	aef159354c	renderer_opengl: Move Frame/FrameMailbox to OpenGL namespace.	2020-02-26 18:28:50 -05:00
ReinUsesLisp	0aaa69e4d7	vk_swapchain: Silence TOCTOU race condition It's possible that the window is resized from the moment we ask for its size to the moment a swapchain is created, causing validation issues. To workaround this Vulkan issue request the capabilities again just before creating the swapchain, making the race condition less likely.	2020-02-26 17:07:18 -03:00
Fernando Sahmkow	f3d4d4eaa8	ARM_Interface: Cache the JITs instead of deleting/recreating. This was a bug inherited from citra which was fixed by then at some time. This commit corrects such bug and ensures JITs are correctly recycled.	2020-02-26 15:53:47 -04:00
bunnei	1f57f679a4	Merge pull request #3440 from namkazt/patch-6 shader: implement LOP3 fast replace for old function	2020-02-26 10:24:35 -05:00
bunnei	01a05b48b7	Merge pull request #3431 from CJBok/npad-fix InputCommon: analog_from_button get direction implementation	2020-02-25 21:39:26 -05:00
bunnei	795893a9a5	renderer_opengl: Create gl_framebuffer_data if empty.	2020-02-25 21:23:02 -05:00
bunnei	c6f78a4a6d	frontend: qt: bootmanager: Acquire a shared context in main emu thread.	2020-02-25 21:23:02 -05:00
bunnei	e25297536f	frontend: qt: bootmanager: Vulkan: Restore support for VK backend.	2020-02-25 21:23:01 -05:00
bunnei	14877b8f35	frontend: qt: bootmanager: OpenGL: Implement separate presentation thread.	2020-02-25 21:23:01 -05:00
bunnei	b2a38cce4e	frontend: qt: main: Various updates/refactoring for separate presentation thread.	2020-02-25 21:23:00 -05:00
bunnei	667f026c95	core: frontend: Refactor scope_acquire_window_context to scope_acquire_context.	2020-02-25 21:23:00 -05:00
bunnei	2e16c23784	frontend: sdl2: emu_window: Implement separate presentation thread.	2020-02-25 21:23:00 -05:00
bunnei	dc672ca4b3	renderer_opengl: Add texture mailbox support for presenter thread.	2020-02-25 21:22:59 -05:00
bunnei	add2c38b73	renderer_opengl: Add OGLRenderbuffer to resource/state management.	2020-02-25 21:22:58 -05:00
bunnei	0c82b00dfd	core: frontend: emu_window: Add TextureMailbox class.	2020-02-25 21:22:57 -05:00
bunnei	571451bdfe	core: settings: Add setting to enable vsync, which is on by default.	2020-02-25 20:57:02 -05:00
Mat M	45ac1c62c6	Merge pull request #3461 from ReinUsesLisp/r32i-rt video_core/surface: Add R32_SINT render target format	2020-02-25 17:47:14 -05:00
Mat M	00e3eab9c1	Merge pull request #3460 from ReinUsesLisp/unused-format-getter video_core/gpu: Remove unused functions	2020-02-25 17:46:07 -05:00
ReinUsesLisp	466ce715e4	video_core/surface: Add R32_SINT render target format	2020-02-25 17:19:34 -03:00
ReinUsesLisp	3c648e3e2d	video_core/gpu: Remove unused functions	2020-02-25 16:53:47 -03:00
bunnei	78ab2e0474	Merge pull request #3417 from ReinUsesLisp/r32i texture: Implement R32I	2020-02-25 14:08:45 -05:00
bunnei	e22ad52cdb	Merge pull request #3425 from ReinUsesLisp/layered-framebuffer texture_cache: Implement layered framebuffer attachments	2020-02-24 10:14:50 -05:00
ReinUsesLisp	1e9213632a	vk_shader_decompiler: Implement indexed textures Implement accessing textures through an index. It uses the same interface as OpenGL, the main difference is that Vulkan bindings are forced to be arrayed (the binding index doesn't change for stacked textures in SPIR-V).	2020-02-24 01:26:07 -03:00
ReinUsesLisp	1dda77d392	shader: Simplify indexed sampler usages	2020-02-24 01:26:07 -03:00
ReinUsesLisp	e2dd59e341	video_core: Implement more scaled attribute formats While changing this, fix assert in vk_shader_decompiler. We now know scaled formats are expected to be float in shader attributes.	2020-02-24 00:27:37 -03:00
bunnei	2b4cdb73b6	Merge pull request #3424 from ReinUsesLisp/spirv-layer vk_shader_decompiler: Implement Layer output attribute	2020-02-22 23:45:16 -05:00
bunnei	754aac331f	Merge pull request #3422 from ReinUsesLisp/buffer-flush surface_base: Implement texture buffer flushes	2020-02-22 23:09:50 -05:00
bunnei	3ef5f2017d	Merge pull request #3416 from FernandoS27/schedule Kernel: Refactors and Implement a TimeManager and SchedulerLocks	2020-02-22 22:32:21 -05:00
bunnei	1989e1b9ac	audio_core: interpolate: Improvements to fix audio crackling. - Fixes audio crackling in Crash Team Racing Nitro-Fueled, Super Mario Odyssey, and others. - Addresses followup issues from #3310.	2020-02-22 22:26:16 -05:00
Fernando Sahmkow	3d0a2375ca	Scheduler: Inline global scheduler in Scheduler Lock.	2020-02-22 12:39:17 -04:00
Fernando Sahmkow	a1bf353780	Kernel: Correct pending feedback.	2020-02-22 11:51:03 -04:00
Fernando Sahmkow	b9472eae44	System: Expose Host thread registering routines from kernel.	2020-02-22 11:18:07 -04:00
Fernando Sahmkow	d219a96cc8	Kernel: Address Feedback.	2020-02-22 11:18:07 -04:00
Fernando Sahmkow	ea956c823e	Kernel: Implement Scheduler locks	2020-02-22 11:18:07 -04:00
Fernando Sahmkow	5c90d22f3d	Kernel: Implement Time Manager.	2020-02-22 11:18:07 -04:00
Fernando Sahmkow	179bafa7cb	Kernel: Rename ThreadCallbackHandleTable and Setup Thread Ids on Kernel.	2020-02-22 11:18:06 -04:00
Fernando Sahmkow	0728dfef84	Kernel: Make global scheduler depend on KernelCore	2020-02-22 11:18:06 -04:00
bunnei	d4da52bbd9	Merge pull request #3444 from bunnei/linux-audio-fix audio_core: interpolate: Fix include for climits (Linux build break).	2020-02-22 03:08:05 -05:00
bunnei	f5cf67140b	audio_core: interpolate: Fix include for climits (Linux build break).	2020-02-22 02:29:41 -05:00
bunnei	19bce3685a	Merge pull request #3310 from FearlessTobi/fast-resample audio_core: Switch to a faster interpolation technique	2020-02-22 01:54:40 -05:00
bunnei	27d57e0c4a	Merge pull request #3442 from ReinUsesLisp/fix-3d-assert shader/texture: Fix illegal 3D texture assert	2020-02-21 22:08:57 -05:00
ReinUsesLisp	7dc488a375	shader/texture: Fix illegal 3D texture assert Fix typo in the illegal 3D texture assert logic. We care about catching arrayed 3D textures or 3D shadow textures, not regular 3D textures.	2020-02-21 15:57:27 -03:00
Rodrigo Locatti	4a6a1aeab4	Merge pull request #3433 from namkazt/patch-1 renderer_vulkan: Add the rest of case for TryConvertBorderColor	2020-02-21 15:56:09 -03:00
Rodrigo Locatti	ef27b4b7b5	Merge pull request #3434 from namkazt/patch-2 vk_shader: Implement ImageLoad	2020-02-21 15:55:05 -03:00
Rodrigo Locatti	6b2719c0bb	Merge pull request #3435 from namkazt/patch-3 vulkan: add DXT23_SRGB	2020-02-21 15:48:19 -03:00
bunnei	dc7ebc2d01	Merge pull request #3423 from ReinUsesLisp/no-match-3d texture_cache: Avoid matches in 3D textures	2020-02-21 12:16:51 -05:00
Nguyen Dac Nam	10d8afb302	nit: add const to where it need.	2020-02-21 21:16:45 +07:00
Nguyen Dac Nam	1956a34ee5	shader: implement LOP3 fast replace for old function ref: https://devtalk.nvidia.com/default/topic/1070081/cuda-programming-and-performance/reverse-lut-for-lop3-lut/	2020-02-21 19:08:07 +07:00
Nguyen Dac Nam	c0c4da27d9	vk_device: remove left over from other branch	2020-02-21 08:56:18 +07:00
bunnei	fe8e5d8ae4	Merge pull request #3438 from bunnei/gpu-mem-manager-fix video_core: memory_manager: Flush/invalidate asynchronously when possible.	2020-02-20 20:04:05 -05:00
Nguyen Dac Nam	ecf275887b	clang-format	2020-02-20 09:39:30 +07:00
Nguyen Dac Nam	fbbad95845	shader_decompiler: only add StorageImageReadWithoutFormat when available	2020-02-20 09:28:13 +07:00
bunnei	2342c0d50e	Merge pull request #3432 from brianclinkenbeard/update-httplib Update httplib to 0.5.5	2020-02-19 21:15:06 -05:00
bunnei	bf0c929d4c	Merge pull request #3415 from ReinUsesLisp/texture-code shader/texture: Allow 2D shadow arrays and simplify code	2020-02-19 20:06:14 -05:00
bunnei	d65fa7d65c	video_core: memory_manager: Flush/invalidate asynchronously on Unmap. - Minor perf improvement.	2020-02-19 20:03:52 -05:00
Brian Clinkenbeard	d31156931d	fix issue with windows getnameinfo()	2020-02-19 16:16:49 -08:00
bunnei	b2bc7682b4	Merge pull request #3414 from ReinUsesLisp/maxwell-3d-draw maxwell_3d: Unify draw methods	2020-02-19 16:13:50 -05:00
bunnei	c8261a1a57	Merge pull request #3411 from ReinUsesLisp/specific-funcs gl_rasterizer: Use the least generic OpenGL draw function possible	2020-02-19 15:37:41 -05:00
bunnei	fd4c5463e8	Merge pull request #3437 from namkazt/patch-5 shader_conversion: add conversion I2F for Short	2020-02-19 11:27:28 -05:00
Nguyen Dac Nam	88cb05e6e7	shader_decompiler: add check in case of device not support ShaderStorageImageReadWithoutFormat	2020-02-19 12:57:22 +07:00
Nguyen Dac Nam	e61c7e9310	vk_device: setup shaderStorageImageReadWithoutFormat	2020-02-19 12:56:36 +07:00
Nguyen Dac Nam	47106ab152	vk_device: add check for shaderStorageImageReadWithoutFormat	2020-02-19 12:55:56 +07:00
Nguyen Dac Nam	1b6308727c	shader_conversion: I2F : add Assert for case src_size is Short	2020-02-19 11:40:35 +07:00
Nguyen Dac Nam	a2c2c5768f	fix warning	2020-02-19 11:10:26 +07:00
Nguyen Dac Nam	a8508f2bc0	clang-format fix	2020-02-19 11:02:59 +07:00
Nguyen Dac Nam	556f3a6e9a	shader_conversion: add conversion I2F for Short	2020-02-19 10:54:37 +07:00
Nguyen Dac Nam	2ef8af93aa	vk_shader: add Capability StorageImageReadWithoutFormat	2020-02-19 10:16:51 +07:00
Brian Clinkenbeard	ad4e5c15fb	httplib compatibility	2020-02-18 18:04:33 -08:00
Nguyen Dac Nam	f6f0762e81	vk_shader: Implement function ImageLoad (Used by Kirby Star Allies)	2020-02-19 08:39:01 +07:00
Brian Clinkenbeard	7f6c686d55	update httplib to latest commit	2020-02-18 17:11:40 -08:00
Nguyen Dac Nam	ec206f7f95	fixups mistake auto commit.	2020-02-19 01:24:32 +07:00
Nguyen Dac Nam	eaf60ca5d8	Update code structure Co-Authored-By: Mat M. <mathew1800@gmail.com>	2020-02-19 01:23:08 +07:00
Nguyen Dac Nam	9295966d26	add vertex UnsignedInt size RGBA	2020-02-18 21:52:51 +07:00
Nguyen Dac Nam	9fc42fffd9	add eBc2SrgbBlock to formats	2020-02-18 21:44:09 +07:00
Nguyen Dac Nam	493f0ad904	vulkan: add DXT23_SRGB	2020-02-18 21:39:50 +07:00
Nguyen Dac Nam	ba84f0988f	renderer_vulkan: Add the rest of case for TryConvertBorderColor	2020-02-18 16:52:54 +07:00
Brian Clinkenbeard	9e42025e5b	update httplib README	2020-02-17 22:54:09 -08:00
Brian Clinkenbeard	76b55c3624	0.4.2 works too	2020-02-17 22:53:25 -08:00
CJBok	23c4cc80e2	analog_from_button get direction implementation	2020-02-18 06:45:37 +01:00
Brian Clinkenbeard	293d4d553a	update httplib to 0.2.6	2020-02-17 20:13:24 -08:00
ReinUsesLisp	6a0220b2e1	texture_cache: Implement layered framebuffer attachments Layered framebuffer attachments is a feature that allows applications to attach layered textures to a single attachment. Which layer the fragments are written to is decided from the shader using gl_Layer.	2020-02-16 04:19:32 -03:00
ReinUsesLisp	1caf3f11c8	vk_shader_decompiler: Implement Layer output attribute SPIR-V's Layer is GLSL's gl_Layer. It lets the application choose from a shader stage (vertex, tessellation or geometry) which framebuffer layer to write the output fragments to.	2020-02-16 04:17:37 -03:00
ReinUsesLisp	bfda5ff3f6	texture_cache: Avoid matches in 3D textures Code before this commit was trying to match 3D textures with another target. Fix that.	2020-02-16 04:15:42 -03:00
ReinUsesLisp	fd62bdf377	surface_base: Implement texture buffer flushes Implement downloads to guest memory from texture buffers on the generic cache and OpenGL.	2020-02-16 04:13:27 -03:00
ReinUsesLisp	14c2a4a2ec	texture: Implement R32I	2020-02-15 16:26:50 -03:00
ReinUsesLisp	6910ade146	shader/texture: Allow 2D shadow arrays and simplify code Shadow sampler 2D arrays are supported on OpenGL, so there's no reason to forbid these. Enable textureLod usage on these. Minor style changes.	2020-02-15 02:36:28 -03:00
ReinUsesLisp	91aa58e410	maxwell_3d: Unify draw methods Pass instanced state of a draw invocation as an argument instead of having two separate virtual methods.	2020-02-14 18:09:40 -03:00
ReinUsesLisp	336a4f8e99	gl_rasterizer: Use the least generic OpenGL draw function possible This may help some implementations.	2020-02-13 21:55:21 -03:00
FearlessTobi	e3cad7d49e	audio_core: Switch to a faster interpolation technique	2020-01-24 00:38:22 +01:00
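
Commit 1956a34ee5 in the log above mentions LOP3 and its lookup table. As a rough illustration only (this is not the code from that commit), a LOP3-style operation takes three 32-bit inputs and an 8-bit table: for every bit position, the three input bits form a 3-bit index (`a` as the most significant bit) that selects one bit of the table. The helper name `Lop3` and its signature are hypothetical.

```cpp
#include <cstdint>

// Hypothetical reference helper, not taken from yuzu: evaluates a three-input
// bitwise logic operation described by an 8-bit lookup table.
constexpr std::uint32_t Lop3(std::uint32_t a, std::uint32_t b, std::uint32_t c,
                             std::uint8_t lut) {
    std::uint32_t result = 0;
    for (unsigned bit = 0; bit < 32; ++bit) {
        // Build the table index from the bits of a, b and c at this position.
        const unsigned index = (((a >> bit) & 1u) << 2) |
                               (((b >> bit) & 1u) << 1) |
                               ((c >> bit) & 1u);
        // Copy the selected table bit into the result.
        result |= ((static_cast<std::uint32_t>(lut) >> index) & 1u) << bit;
    }
    return result;
}
```

Under this convention the table for an operation can be found by applying that operation to the constants `0xF0`, `0xCC` and `0xAA`; for example, `0x80` corresponds to `a & b & c` and `0xFE` to `a | b | c`.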