I spent some time recently determining the effect of Oodle on UE4 Load Time for various theoretical disk speeds.
The Oodle compressors can speed up load time in two different ways: first, they decompress faster, taking less CPU time for decompression; second, they make the data smaller, which saves IO time. When disk speeds are slow, the IO time saved by smaller files is the primary benefit. When disk speeds are very fast, using less CPU time for decompression is the main factor.
First, I patched the UE4 source to limit the number of cores used to fewer than the system has, to something more reasonable. I then patched the source to artificially limit the disk IO speed to a specific value. The data itself was loaded from a PCIe 4.0 SSD - so very, very fast - and needed to be artificially limited to reflect the typical performance of, say, a Blu-ray or PS4/XB1 HDD.
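To make the idea of an artificial disk speed limit concrete, here is a minimal Python sketch. It is not the UE4 patch described above (that code isn't shown here); `ThrottledReader` and its parameters are hypothetical names for illustration. It wraps any file-like object and sleeps after each read so the average throughput never exceeds the target rate.

```python
import io
import time


class ThrottledReader:
    """Wrap a file-like object and cap its average read throughput.

    Hypothetical helper for illustration only; not the UE4 patch.
    """

    def __init__(self, raw, bytes_per_second, clock=time.monotonic, sleep=time.sleep):
        self._raw = raw
        self._rate = float(bytes_per_second)
        self._clock = clock      # injectable so the limiter can be tested without waiting
        self._sleep = sleep
        self._start = None       # time of the first read
        self._total = 0          # bytes delivered so far

    def read(self, size=-1):
        if self._start is None:
            self._start = self._clock()
        data = self._raw.read(size)
        self._total += len(data)
        # Earliest moment a device running at the target rate could
        # have delivered this many bytes.
        earliest = self._start + self._total / self._rate
        delay = earliest - self._clock()
        if delay > 0:
            self._sleep(delay)
        return data


if __name__ == "__main__":
    # Read 64 KB of in-memory data at a simulated 1 MB/s.
    reader = ThrottledReader(io.BytesIO(b"\0" * 65536), 1_000_000)
    began = time.monotonic()
    while reader.read(16384):
        pass
    print(f"elapsed: {time.monotonic() - began:.3f} s")
```

Like the tests in this post, the sketch only models bandwidth; it does nothing to emulate seek time.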
Of note, I did not emulate seek time, so seeks are assumed to be basically instant - so YMMV. Real world load times are also affected by things like disk cache, so we get more useful measurements by simulating disk speed.
Loading in UE4 is the sum of the time taken to load from disk, the time to decompress that data, and overhead time for level loading that's not directly in IO or decompression. Depending on how many cores are available, loading from disk and decompressing the data can sometimes be done in parallel - for the purposes of these tests that was minimized through core affinity settings and mutexes.
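The breakdown above can be written down as a tiny model. This is a sketch under stated assumptions, not a description of how UE4 schedules its work: the function name and the `overlap` parameter are made up for illustration, with `overlap=0.0` meaning IO and decompression run strictly one after the other (roughly what these tests aimed for) and `overlap=1.0` meaning the shorter of the two is completely hidden behind the longer one.

```python
def estimate_load_time(compressed_mb, disk_mb_per_s, decode_s,
                       overhead_s=0.0, overlap=0.0):
    """Estimate load time in seconds from its three components.

    overlap is a hypothetical knob in [0, 1]: the fraction of the shorter
    of (IO, decode) that is hidden by running the two in parallel.
    """
    if not 0.0 <= overlap <= 1.0:
        raise ValueError("overlap must be between 0 and 1")
    io_s = compressed_mb / disk_mb_per_s      # time spent reading from disk
    hidden_s = overlap * min(io_s, decode_s)  # work hidden by parallelism
    return io_s + decode_s - hidden_s + overhead_s


# Made-up numbers, purely to show the shape of the model:
# 100 MB at 50 MB/s, 3 s of decoding, 1 s of other level-loading work.
serial = estimate_load_time(100, 50, 3.0, overhead_s=1.0)
parallel = estimate_load_time(100, 50, 3.0, overhead_s=1.0, overlap=1.0)
print(f"serial: {serial:.1f} s, fully overlapped: {parallel:.1f} s")
```

The model ignores seek time and disk cache, the same simplifications the measurements here make.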
What are we comparing?
ZLib and Oodle. If you enable compression for pak files in Unreal, software zlib is used by default. Oodle provides a plugin that drops in and changes the pak file compression. Mostly we care about Oodle's Kraken encoder, as it has very desirable performance for its compression ratio, but I included the others (Selkie, Mermaid, Leviathan, Hydra) in my testing as well.
We are measuring three things here.
1) We want to know time to first frame.
2) We want to know how much time total was spent decoding.
3) We want to know how much time total was spent loading from disk.
#1 is the most important overall score, but #2 and #3 tell us how much we can gain from the different Oodle compressors and which one we should use.
How fast is the PS4/XB1 HDD?
About 65-80 MB/s typical.
How fast is a Blu-ray?
About 10-20 MB/s (though seek times are horrendous)
How did I measure time to first frame?
With RAD Telemetry of course! :) (Seriously invaluable tool if you aren't familiar)
How much data are we loading to get to first frame?
ZLib: ~105 MB
Kraken: ~86 MB
Kraken has less data to load because of higher compression ratio.
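To see what the smaller payload is worth in pure IO time, the sizes above can be divided by the disk speeds quoted earlier. This is back-of-the-envelope arithmetic on approximate figures (the "~" sizes and the typical MB/s ranges), with seek time ignored; the helper names are just for this sketch.

```python
# Approximate figures quoted in this post.
ZLIB_MB = 105.0
KRAKEN_MB = 86.0
DISK_SPEEDS_MB_PER_S = {
    "Blu-ray (low)": 10.0,
    "Blu-ray (high)": 20.0,
    "PS4/XB1 HDD (low)": 65.0,
    "PS4/XB1 HDD (high)": 80.0,
}


def io_seconds(size_mb, speed_mb_per_s):
    """Pure transfer time, ignoring seeks and caching."""
    return size_mb / speed_mb_per_s


def io_seconds_saved(speed_mb_per_s):
    """IO time saved by loading the Kraken data instead of the zlib data."""
    return io_seconds(ZLIB_MB, speed_mb_per_s) - io_seconds(KRAKEN_MB, speed_mb_per_s)


for name, speed in DISK_SPEEDS_MB_PER_S.items():
    print(f"{name:20s} zlib {io_seconds(ZLIB_MB, speed):6.2f} s   "
          f"Kraken {io_seconds(KRAKEN_MB, speed):6.2f} s   "
          f"saved {io_seconds_saved(speed):5.2f} s")
```

The 19 MB difference is worth about 1.9 seconds at 10 MB/s but only about a quarter of a second at 80 MB/s, which is why the size savings matter most on slow disks.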
First up, just Zlib and Oodle time to first frame...
The time it takes to do just the decompression part (not counting disk speed - just decompression time) is also pretty interesting.
ZLib: 3.88 seconds
Kraken: 1.39 seconds
The other Oodle formats here are as follows with regards to decompression time...
Selkie: 0.24 seconds
Mermaid: 0.64 seconds
Leviathan: 1.82 seconds
Hydra: 1 second
^^ you heard that right. Even Leviathan, Oodle's compressor with an LZMA-like compression ratio, is over twice as fast as Zlib here...
In isolation Leviathan can decode 3X faster than Zlib. Here we're timing not an ideal benchmark but actual usage in Unreal, where the compressed buffers are sometimes small and the overhead means we don't reach the full speeds Oodle is capable of.
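For reference, the speedups implied by the decompression times listed above can be computed directly. This only divides the measured numbers from this post; nothing new is being measured.

```python
# Decompression times in seconds, as listed above.
DECODE_SECONDS = {
    "ZLib": 3.88,
    "Kraken": 1.39,
    "Selkie": 0.24,
    "Mermaid": 0.64,
    "Leviathan": 1.82,
    "Hydra": 1.0,
}

# Speedup of each Oodle compressor relative to zlib in this test.
speedups = {
    name: DECODE_SECONDS["ZLib"] / seconds
    for name, seconds in DECODE_SECONDS.items()
    if name != "ZLib"
}

# Print fastest first.
for name, factor in sorted(speedups.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:10s} {factor:5.2f}x faster than zlib")
```

Leviathan comes out a little over 2x and Kraken a little under 2.8x in this in-engine test.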
The disk io time is (when measured) basically equivalent to the time to first frame - the decompression time + a second or two depending on how many cores you have working.
In conclusion, Oodle does make a meaningful impact on load times. This is especially so for lower end devices, which have fewer cores, and for systems with HDDs, which are typical for PC and current gen console games. Presumably the Nintendo Switch will also benefit greatly from Oodle, since the game data is loaded from an SD card and those come in various speeds (sometimes really, really slow).
For more information on Oodle visit http://www.radgametools.com/oodle.htm
The full data if you want to dig in...
The first data set is "Time to First Frame". The other time varies depending on how well the work can be hidden with multiple threads, plus measurement noise.