Currently only supports Wayland, and it's quite..... bad
This is a DRAFT! Just a small proof-of-concept.
I'll submit a draft PR to the SDL project to see if I should continue
development.
If this is greenlit, I might making consider a WGPU SDL_GPU backend if
wgpu-native is in a good enough state (maybe Dawn if it isn't?)
No promises though! Don't get angry at me if I don't!
Current known issues:
1: Error handling is sparse.
2: It only supports the shared library version of wgpu-native, and it
does that by just loading it whenever it needs to do something
3: wgpu-native is bad and doesn't implement wgpuGetProcAddress so we're
just using SDL_LoadFunction instead. I'd much prefer using the native
wgpuGetProcAddress but alas....
* SVE2 was actually disabled in fdfbbce, this issue is fixed
- The macro __ARM_FEATURE_SVE is only defined when the compilation target is set as -march=armv8-m+sve2
* Improves 8888 alpha-blending performance
- Now, in In-Order AArch64 processors, e.g. A520, SVE2 is better than NEON with the 128bit vector width
- For Out-of-order processors, NEON is still better than SVE2 (We could improve this in the future), the performance is improved from 3.0 to 3.6.
* The 8888 -> RGB565 performance is also improved (from 7.4 to 9.3)
Previously default audio on Apple platforms would duck other audio streams. This is unexpected, so by default we won't do that and you can use the hint SDL_HINT_AUDIO_DUCK_OTHERS to re-enable that behavior.
SVE/SVE2 is a new SIMD extension for AArch64. Compared to NEON, SVE/SVE2 brings the following benefits that are good for SDL projects:
- Lane prediction: we don't have to treat the tail part of a stride separately when the width is n times the hardware vector size
- Although the performance is almost no difference from NEON when the hardware vector size is 128bits, when the hardware provides a longer vector size, e.g. 256, 512, ... 2048, we can enjoy the large performance gain without modifying the source code or recompiling a library.
The functional correctness is validated in a dedicated [qemu project](https://github.com/GorgonMeducer/aarch64_qemu_mac_template/tree/SDL-SVE2-Acceleration-Validation).
The performance is tested on [Radxa Orion 6 N](https://radxa.com/products/orion/o6n/), which provides 4x A720 and 4x A520 processors. Since the vector size is 128 bits, which is the same as NEON, the performance is almost the same (or no worse than) the NEON acceleration.
Under the right conditions, this extension can result is smoother resizing when rendering with OpenGL, however, it is known to cause problems in certain cases, such as when handling presentation externally.
Gate it behind a hint, and disable it by default. Developers can selectively enable it when they verify that they meet the criteria for using it, and that it behaves correctly in their apps/games.
On some platforms, SDL_MemoryBarrierRelease() is defined to
SDL_CompilerBarrier(). If SDL_CompilerBarrier() is also defined to
the fallback spinlock acquire/release, then we will infinitely
recurse in SDL_UnlockSpinlock(). Avoid this by not unlocking the
temporary spinlock we create.