The biggest advantage that consoles of any kind have over gaming PCs is that they are fixed platforms with everything integrated, which makes it much easier for developers to optimize games for them. One of the major innovations of the next-gen consoles is the SSD that both the PS5 and the Xbox Series X incorporate, which promises a huge advance in performance, so it is inevitable to ask: does this mean a new era for gaming?
Until now, PC gaming has shown that faster SSD storage has little or no impact on gaming performance beyond loading times. NVMe SSDs are several times faster than SATA SSDs on paper, but even in the most demanding PC games there is no noticeable difference between the two. In part, this is due to bottlenecks elsewhere in the system, which are exposed once storage is fast enough to no longer be the limiting factor.
The upcoming consoles will include a number of hardware features that make it easier for games to take advantage of much faster storage than in the past, alleviating bottlenecks that would be problematic on a standard PC. This is where the consoles’ storage technology becomes really interesting, as the SSDs of the PS5 and Xbox Series X are themselves relatively unremarkable.
Next, we describe the aspects of the SSD, and of the upcoming PS5 and Xbox Series X in general, that could usher in a new era for gaming if developers carried these advantages over to the PC gaming market as well.
Compression on the SSD of the PS5 and Xbox Series X
The most important specialized hardware feature that the new generation of consoles will incorporate to complement the performance of the storage hardware is dedicated data decompression hardware. Game assets must be stored on disk in compressed form to keep storage requirements reasonable, and decompressing them when they are loaded has a cost in performance.
Games generally rely on multiple compression methods: some lossy ones specialized for certain types of data (for example audio or still images) and some general-purpose lossless algorithms. What is certain is that almost all data goes through at least one compression method that is fairly expensive from a computational point of view.
GPU architectures have long included hardware to decode video streams and to support fast but simple lossy texture compression methods like S3TC and its successors, but this still leaves the CPU to decompress much of the data. Desktop CPUs do not have dedicated decompression engines or instructions, although many instructions in the various SIMD extensions are intended to assist with this kind of task. Still, decompressing a data stream of several GB per second isn’t trivial, and dedicated hardware can do it much more efficiently while relieving the load on the CPU.
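To get a feel for why multi-GB/s decompression is a real burden for a CPU, the following minimal sketch measures how fast a single thread can decompress data in software. It uses zlib from the Python standard library purely as a stand-in for a general-purpose lossless codec; the sample data and the function name `measure_decompress_throughput` are illustrative assumptions, and the numbers it prints will vary by machine and say nothing about Kraken or BCPack.

```python
import time
import zlib


def measure_decompress_throughput(raw: bytes, level: int = 6) -> tuple[float, float]:
    """Return (compression ratio, decompressed bytes per second) for one thread."""
    packed = zlib.compress(raw, level)
    start = time.perf_counter()
    unpacked = zlib.decompress(packed)
    elapsed = time.perf_counter() - start
    if unpacked != raw:
        raise RuntimeError("lossless round trip failed")
    # Guard against a zero reading from the timer on very small inputs.
    return len(raw) / len(packed), len(raw) / max(elapsed, 1e-9)


if __name__ == "__main__":
    # Hypothetical stand-in for a game asset: 8 MiB of highly repetitive bytes.
    sample = bytes(range(256)) * (8 * 1024 * 4)
    ratio, rate = measure_decompress_throughput(sample)
    print(f"compression ratio: {ratio:.1f}x")
    print(f"single-thread output rate: {rate / 1e9:.2f} GB/s")
```

Comparing the printed rate with the several GB per second an NVMe SSD can deliver gives a rough sense of how many cores a purely software approach would tie up.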
The decompression offload hardware on the upcoming consoles is implemented in the main SoC, so that it can unpack data after it has traversed the PCIe link from the SSD and before it lands in the main RAM pool shared by the GPU and CPU cores.
Decompression offload hardware like this is not found on a desktop PC, but it is not a novel idea: it exists in server environments, and previous-generation consoles also include dedicated decompression hardware, albeit not hardware fast enough to keep up with an NVMe SSD like that of the PS5 or Xbox Series X.
Server platforms often include compression accelerators, usually paired with crypto accelerators: Intel has offered such accelerators both as discrete peripherals and integrated into server chipsets, and IBM POWER9 and later CPUs feature similar acceleration units. These server accelerators are more comparable to what the new consoles need, as they do achieve a throughput of several GB per second.
Both Microsoft and SONY have tuned their decompression units to match the expected performance of their respective SSDs. They have chosen different proprietary compression algorithms: SONY uses RAD’s Kraken, a general-purpose algorithm originally designed for the current consoles, which have relatively weak CPUs but much lower throughput requirements, while Microsoft has focused specifically on texture compression, reasoning that textures account for the largest volume of data in a game. To do this, it developed a new compression algorithm called BCPack.
Decompression hardware | Xbox Series X | PS5 |
---|---|---|
Algorithm | BCPack | Kraken |
Maximum output bandwidth | 6 GB/s | 22 GB/s |
Typical output bandwidth | 4.8 GB/s | 8-9 GB/s |
Equivalent in Zen 2 CPU cores | 5 | 9 |
SONY claims its Kraken-based decompression hardware can unpack the 5.5 GB/s stream from the SSD into a typical 8-9 GB/s of uncompressed data, and theoretically up to 22 GB/s if the data is redundant enough. For its part, Microsoft says that its BCPack decompressor can typically output 4.8 GB/s from the 2.4 GB/s that its SSD delivers, and potentially up to 6 GB/s. Thus, Microsoft claims slightly higher typical compression ratios, but still lower data throughput, because the Xbox Series X’s SSD is noticeably slower than that of the PS5. Remember also that Microsoft’s BCPack decompression applies only to texture data, while SONY’s hardware handles all data.
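The compression ratios implied by these figures can be worked out with a few lines of arithmetic. The sketch below divides each quoted output rate by the raw SSD rate. The PS5 input of 5.5 GB/s comes from the text; the Xbox Series X input of 2.4 GB/s is Microsoft’s publicly quoted raw SSD figure and should be treated as an assumption here. The function name is illustrative.

```python
def compression_ratio(output_gb_s: float, input_gb_s: float) -> float:
    """Ratio of decompressed output rate to compressed input rate."""
    if input_gb_s <= 0:
        raise ValueError("input rate must be positive")
    return output_gb_s / input_gb_s


PS5_RAW_GB_S = 5.5  # from the text
XSX_RAW_GB_S = 2.4  # assumption: Microsoft's quoted raw SSD throughput

ps5_typical_low = compression_ratio(8.0, PS5_RAW_GB_S)   # about 1.45
ps5_typical_high = compression_ratio(9.0, PS5_RAW_GB_S)  # about 1.64
ps5_peak = compression_ratio(22.0, PS5_RAW_GB_S)          # 4.0
xsx_typical = compression_ratio(4.8, XSX_RAW_GB_S)        # 2.0
xsx_peak = compression_ratio(6.0, XSX_RAW_GB_S)           # 2.5

if __name__ == "__main__":
    print(f"PS5 typical: {ps5_typical_low:.2f}x to {ps5_typical_high:.2f}x, peak {ps5_peak:.1f}x")
    print(f"Xbox Series X typical: {xsx_typical:.1f}x, peak {xsx_peak:.1f}x")
```

Under these assumptions the typical ratio claimed for BCPack (about 2x) is higher than for Kraken (roughly 1.5x), yet the PS5 still delivers more data per second because its SSD feeds the decompressor more than twice as fast.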
The CPU time saved by these dedicated decompression units is substantial: the equivalent of about 9 Zen 2 CPU cores for the PS5 and about 5 cores for the Xbox Series X. Keep in mind that these are peak figures that assume the SSD bandwidth is fully utilized. In practice games will rarely keep the SSD 100% loaded, so even without the dedicated hardware they would not have needed that much CPU power for decompression all the time.
Storage acceleration features in the console SoCs are not limited to decompression offload, and SONY in particular has outlined several others, albeit so vaguely that we dare not make firm claims about them because they are too open to interpretation. The takeaway is that this dedicated hardware should, in theory, take routine data handling off the CPU and so improve performance on other tasks as well.
DMA engines
Direct memory access (DMA) refers to the ability of a peripheral to read and write RAM without involving the system CPU. All modern high-speed peripherals use DMA for most of their communication with the CPU, but that is not its only use. A DMA engine is a peripheral device that exists solely to move data around, without doing anything to it. It is like a highway.
The CPU can command the DMA engine to copy one region of RAM to another, and the engine does the work of copying potentially gigabytes of data without the CPU having to issue a mov instruction (or its SIMD equivalent) for each piece, and without polluting the CPU cache. DMA engines can often do more than offload simple copy operations: they support scatter/gather operations that rearrange data in the process of moving it. NVMe already has features like scatter/gather lists that can remove the need for a separate DMA engine to provide that capability, but the NVMe commands issued to the Xbox and PS5 SSDs mostly operate on compressed data only.
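The scatter/gather idea can be illustrated with a small software model. In the sketch below, a list of `(offset, length)` descriptors tells the copy routine which pieces to move, which is roughly the information a DMA engine is given. This is only a conceptual model in Python (real DMA runs in hardware without CPU copies), and the function names and descriptor layout are assumptions made for illustration.

```python
def gather(source: bytes, segments: list[tuple[int, int]]) -> bytearray:
    """Collect scattered (offset, length) pieces of source into one contiguous buffer."""
    view = memoryview(source)
    dest = bytearray(sum(length for _, length in segments))
    cursor = 0
    for offset, length in segments:
        if offset < 0 or length < 0 or offset + length > len(source):
            raise ValueError(f"segment ({offset}, {length}) is out of range")
        dest[cursor:cursor + length] = view[offset:offset + length]
        cursor += length
    return dest


def scatter(source: bytes, dest: bytearray, segments: list[tuple[int, int]]) -> None:
    """Spread a contiguous source across (offset, length) pieces of dest, in order."""
    view = memoryview(source)
    cursor = 0
    for offset, length in segments:
        if offset < 0 or length < 0 or offset + length > len(dest):
            raise ValueError(f"segment ({offset}, {length}) is out of range")
        dest[offset:offset + length] = view[cursor:cursor + length]
        cursor += length


if __name__ == "__main__":
    print(gather(b"0123456789", [(6, 2), (0, 3)]))  # pieces "67" then "012"
    target = bytearray(b"----------")
    scatter(b"ABCDE", target, [(7, 2), (1, 3)])
    print(target)
```

The point of the descriptor list is that the requester describes the whole transfer once, and the mover (here a loop, on the console a hardware engine) handles every piece without further involvement.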
Although DMA engines are peripheral devices, we generally won’t find them on a PCIe expansion card, for example. It makes more sense for them to sit as close to the memory controller as possible for lower-latency access, which means we’ll find them on the chipset or on the CPU itself.
The PS5 SoC includes a DMA engine to handle the copying of data that comes out of the decompression unit. As with the decompression engines, this is not a new invention: it is a feature that already exists on servers but is missing from standard desktop PCs. SONY has simply incorporated it into its console to further fine-tune performance and relieve the load on the CPU.
The IO coprocessor
The IO (input/output) complex on the PS5 SoC also includes a dual-core processor with its own pool of SRAM. SONY has said next to nothing about its internals, but has described one core as dedicated to SSD IO, allowing games to “bypass the traditional I/O process”, while the other core is simply described as helping with memory mapping. For more details we must resort to a patent that SONY filed years ago, which we hope reflects what we will actually find on the PS5.
The coprocessor described in the SONY patent offloads portions of what would normally be the operating system’s storage drivers. One of its most important tasks is to translate between various address spaces: when a game requests a certain range of bytes from one of its files, it is asking for the uncompressed data. The IO coprocessor determines which pieces of compressed data are needed and sends NVMe read commands to the SSD. Once the data has been returned, the IO coprocessor configures the decompression unit to process it and the DMA engine to deliver it to the memory locations requested by the game, all without CPU intervention.
Since these two coprocessor cores are much less powerful than the SoC’s Zen 2 cores, they cannot be in charge of all interaction with the SSD. The coprocessor will handle the most common cases of reading data, and the system falls back to the Zen 2 cores for the rest. The coprocessor’s SRAM is not used to buffer the large amounts of game data flowing through the IO complex; instead, this memory holds the various lookup tables (acting as an index) used by the coprocessor. In this respect it is similar to an SSD controller with a pool of RAM for its mapping tables, but the job of the IO coprocessor is completely different from that of an SSD controller. This also means it should remain useful with third-party SSDs, if the console ends up supporting them.
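The translation step described above, from an uncompressed byte range to the compressed chunks that contain it, can be sketched in software. The example below is a toy model, not SONY’s design: the chunk size, the `ChunkEntry` table layout and the use of zlib are all assumptions chosen to keep it self-contained. Comments mark which stage of the pipeline each line stands in for.

```python
import zlib
from dataclasses import dataclass

CHUNK = 64 * 1024  # hypothetical uncompressed chunk size


@dataclass
class ChunkEntry:
    disk_offset: int  # where the compressed chunk starts on the "SSD"
    packed_len: int   # its compressed length in bytes


def pack_file(raw: bytes, chunk: int = CHUNK) -> tuple[bytes, list[ChunkEntry]]:
    """Compress raw in fixed-size chunks; return (disk image, lookup table)."""
    disk = bytearray()
    table: list[ChunkEntry] = []
    for start in range(0, len(raw), chunk):
        packed = zlib.compress(raw[start:start + chunk])
        table.append(ChunkEntry(len(disk), len(packed)))
        disk += packed
    return bytes(disk), table


def read_range(disk: bytes, table: list[ChunkEntry], offset: int,
               length: int, chunk: int = CHUNK) -> bytes:
    """Serve an uncompressed byte range, as a game would request it."""
    if length <= 0:
        return b""
    first = offset // chunk
    last = (offset + length - 1) // chunk
    if offset < 0 or last >= len(table):
        raise ValueError("requested range lies outside the file")
    out = bytearray()
    for index in range(first, last + 1):
        entry = table[index]  # lookup table, kept in SRAM in the description above
        packed = disk[entry.disk_offset:entry.disk_offset + entry.packed_len]  # "NVMe read"
        out += zlib.decompress(packed)  # "decompression unit"
    skip = offset - first * chunk
    return bytes(out[skip:skip + length])  # "DMA" delivers only the requested bytes


if __name__ == "__main__":
    data = bytes(i % 251 for i in range(300_000))
    image, index_table = pack_file(data)
    print(read_range(image, index_table, 70_000, 100_000) == data[70_000:170_000])
```

Note that the caller never sees chunk boundaries or compressed offsets; hiding that bookkeeping from the game (and from the main CPU cores) is the role the text attributes to the coprocessor.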
Cache coherency
The last storage-related hardware feature is the set of cache coherency engines that SONY has revealed. The CPU and GPU in the console SoC share the same 16 GB of GDDR6 RAM, eliminating the step of copying assets from main RAM to VRAM after loading them from the SSD and decompressing them.
But to get the most benefit from the shared memory pool, the hardware needs to keep caches coherent, not just across multiple CPU cores but across the various GPU caches as well. All of that is normal for an APU; the novelty with the PS5 is that the IO complex also participates. When new graphics assets are loaded into memory through the IO complex and overwrite older assets, it sends invalidation signals to the caches at the same time, so that only the outdated data is discarded and what is still valid is kept (instead of flushing the entire cache every time).
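The difference between invalidating only the overwritten range and flushing everything can be shown with a toy cache model. The sketch below is purely illustrative: the 64-byte line size, the `ToyCache` class and its methods are assumptions, and real hardware coherency protocols are far more involved.

```python
LINE = 64  # hypothetical cache line size in bytes


class ToyCache:
    """Maps cache line index -> cached bytes; a stand-in for a CPU or GPU cache."""

    def __init__(self) -> None:
        self.lines: dict[int, bytes] = {}

    def fill(self, address: int, data: bytes) -> None:
        self.lines[address // LINE] = data

    def invalidate_range(self, start: int, length: int) -> int:
        """Drop only the lines overlapping [start, start + length); return how many."""
        if length <= 0:
            return 0
        first, last = start // LINE, (start + length - 1) // LINE
        stale = [index for index in self.lines if first <= index <= last]
        for index in stale:
            del self.lines[index]
        return len(stale)

    def flush_all(self) -> int:
        """Drop everything, valid or not; return how many lines were lost."""
        dropped = len(self.lines)
        self.lines.clear()
        return dropped


if __name__ == "__main__":
    cache = ToyCache()
    for address in (0, 64, 128, 4096):
        cache.fill(address, b"asset")
    # New data overwrites bytes 64..163 in RAM, so only two lines become stale.
    print(cache.invalidate_range(64, 100), "lines invalidated,", len(cache.lines), "kept")
```

With targeted invalidation the two untouched lines stay cached and usable, whereas `flush_all` would have forced all four to be fetched from RAM again.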
Is the SSD of the PS5 the same as that of the Xbox Series X?
There is a ton of information on the PS5’s custom storage (as covered in this article, though many gaps remain), so it’s natural to wonder whether users who buy an Xbox Series X will get the same “perks” or whether it is limited to the decompression hardware. Microsoft has bundled all of its storage-related technologies into what it calls the “Xbox Velocity Architecture”.
Microsoft says that it has four components: the SSD itself, the compression engine, a new software API for accessing storage, and a hardware feature called Sampler Feedback Streaming, which we have already talked about. The latter is only loosely related to storage: it is a GPU feature that makes partially resident textures more useful by allowing shader programs to keep track of which parts of a texture are actually being used. This information can be used to decide what data to evict from RAM and what to load next, such as a higher-resolution version of the texture regions that are actually visible at that moment.
Since Microsoft doesn’t mention anything like the PS5’s other complex features, it’s reasonable to assume that the Xbox Series X lacks them and that its IO is largely CPU-managed. However, it would not be surprising to discover later that the console does have a comparable DMA engine, because we have seen this feature in previous Xbox consoles.
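The feedback loop described here (record which texture regions are sampled, then decide what to load and what to evict) can be modeled in a few lines. The sketch below is a CPU-side toy and not the actual GPU feature or its API: the tile grid, the function names and the representation of samples as `(u, v)` coordinates are all assumptions for illustration.

```python
Tile = tuple[int, int]


def record_feedback(samples: list[tuple[float, float]], tiles_per_side: int) -> set[Tile]:
    """Map (u, v) texture coordinates in [0, 1] to the set of tiles they touch."""
    touched: set[Tile] = set()
    for u, v in samples:
        # Clamp so that a coordinate of exactly 1.0 falls in the last tile.
        tile_x = min(int(u * tiles_per_side), tiles_per_side - 1)
        tile_y = min(int(v * tiles_per_side), tiles_per_side - 1)
        touched.add((tile_x, tile_y))
    return touched


def plan_streaming(resident: set[Tile], sampled: set[Tile]) -> tuple[set[Tile], set[Tile]]:
    """Return (tiles to load from storage, tiles that can be evicted from RAM)."""
    return sampled - resident, resident - sampled


if __name__ == "__main__":
    used = record_feedback([(0.10, 0.10), (0.90, 0.90), (0.12, 0.05)], tiles_per_side=4)
    to_load, to_evict = plan_streaming(resident={(0, 0), (1, 1)}, sampled=used)
    print("load:", sorted(to_load), "evict:", sorted(to_evict))
```

A real implementation would also track mip levels and avoid evicting tiles that were merely unused for a single frame; the sketch leaves both out to keep the core decision visible.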