In celebration of another year of Dolphin, we're taking a look back at some of our favorite features, fixes, and large scale changes of the year!
In every project there are revolutionary changes that alter the landscape and set the tone for future changes. In 2014, the Dolphin team made a commitment to take the very best GameCube and Wii emulator to the next level in accuracy, speed, and feature set. It was infectious: help arrived from veterans and newcomers alike in one of the most active years in Dolphin history.
Among all the new features and bug fixes were core changes to the emulator that helped bring everything together. Some of these changes affect every game played, while others only affect a handful, but all of them make parts of the emulator work better and more accurately. Without further ado, let's recap 2014's best core changes!
Pixel Processing Pipeline Improvements by neobrain with an addition from magumagu
Changing Dolphin's pixel processing pipeline from floating point to integer math is not a new idea. The GameCube and Wii rely upon integer math in their GPU pipeline, and approximating it with floating point math isn't exactly ideal. Hundreds of games suffered from problems ranging from slight graphical glitches all the way up to not rendering anything at all! To show how important these changes were to the emulator, we can demonstrate almost every kind of issue that integer math fixed without leaving The Legend of Zelda series.
The sleeping armos statues used to have odd texturing defects. Color issues such as these were a problem in Wind Waker.
Not everyone may remember all of these issues, and that's because of a very scary aspect of the problem: the precise floating point results depended on the GPU, driver, or graphics backend in use. Tracking down and fixing one floating point error often meant creating a new one on another hardware configuration. While these examples come from just one series, this was a global problem that affected almost the entirety of the GameCube and Wii game libraries. The solution to this massive problem was a branch called tev_fixes_new, the fourth attempt at rewriting Dolphin's pixel processing pipeline to be more accurate. What set it apart from the prior attempts is that tev_fixes_new used integers directly.
If the Dolphin team knew that the GameCube and Wii used integer math, why did so much effort over so many years go into fixing the pipeline through other means? Several things held back what ended up being the ultimate solution to the pipeline woes.
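To make the rounding problem concrete, here is a simplified sketch of a TEV-style color blend, assuming 8-bit channels and an interpolation factor that is widened so 255 behaves like 1.0. The function names are hypothetical, and this is neither Dolphin's shader code nor the exact hardware formula; it only shows why an exact integer formula gives the same answer everywhere, while a float approximation depends on how each host converts back to 8 bits.

```python
def lerp_int(a: int, b: int, c: int) -> int:
    """Blend 8-bit values a and b by factor c using only integer math.

    Assumption: c is widened so that 255 acts like 256 (i.e. 1.0),
    and the result is truncated with a right shift.
    """
    c_wide = c + (c >> 7)  # maps 0..255 onto 0..256
    return (a * (256 - c_wide) + b * c_wide) >> 8


def lerp_float(a: int, b: int, c: int, round_to_nearest: bool) -> int:
    """Approximate the same blend with normalized floats.

    The final conversion back to 8 bits is where hosts can disagree:
    one GPU/driver combination may truncate while another rounds.
    """
    t = c / 255.0
    value = a * (1.0 - t) + b * t
    return int(value + 0.5) if round_to_nearest else int(value)


if __name__ == "__main__":
    a, b, c = 10, 200, 100
    print(lerp_int(a, b, c))           # 84 on every host
    print(lerp_float(a, b, c, False))  # 84 on a truncating host
    print(lerp_float(a, b, c, True))   # 85 on a rounding host
```

The two float variants disagree by one step on the same input, which is the kind of host-dependent difference that made floating point fixes so fragile.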
Hardware Issues - When Dolphin was a young emulator, using integer math on PC GPUs simply wasn't possible. It wasn't until the release of DirectX 10 in 2006 that top-of-the-line GPUs started supporting integer math. When the switch to integer math happened earlier this year, it also mandated GLES 3 on Android at a time when no GLES 3 devices existed.
Backend Issues - Prior to 2013, Dolphin had DX9, DX11, and OpenGL backends, and only the DX11 backend actually supported integers. It wasn't until degasus and Sonicadvance1's GLSL rewrite that the OpenGL backend was brought up to a spec that could use integer math. That still left DirectX 9, which couldn't do integers at all. Although it was the most popular backend in Dolphin, we opted to drop it, vowing to make our remaining backends better and faster so it wouldn't be missed.
Performance Issues - Getting support for something that was going to slow down the emulator was a hard sell. Let it be known: the switch to integer math definitely slowed down emulation, especially on NVIDIA GPUs. tev_fixes_new was the first time people found they couldn't run most games at 4x internal resolution (IR) on pretty much any decent GPU. But the accuracy was worth the trade-off, and speed improvements since then have won back the lost speed and then some.
The switch to integer math was a huge risk, and one that required a lot of adjustments in the run-up to its eventual merge. Thanks to careful review, planning, testing, and making sure users were well informed, the initial merge went well.
But that was far from the end of the story. A short time later, a second, similarly named merge, tev_combiner_fixes, was created to lessen the performance impact on GPUs while bringing in a few new features. This change ended up fixing a long-standing problem in one of the GameCube's launch titles.
Integer math also made a difference in many other long-standing issues. NES games on the GameCube and the Wii's Virtual Console previously drew nothing but a few lines here and there. The tev_fixes_new merge changed that.
Accuracy begets more accuracy, and eventually the pipeline received another fix, this time to the indirect texture coordinate computation. Combined with a fix from magumagu for the External Framebuffer, this made NES games on both the Wii and GameCube play perfectly in Dolphin.
Most enemies and non-player sprites in NES games did not draw properly without the indirect texture coordinates working correctly.
Being able to emulate NES games on a GameCube/Wii emulator may not matter to most users, but it's a testament to Dolphin doing things correctly.
While this isn't the big rewrite that Zelda HLE audio absolutely needs, the Synchronous Zelda HLE audio merge greatly increased the stability of games that use the Zelda HLE microcode. This in-house Nintendo microcode is far different from the AX variant that most games (including some other Zelda and Mario games!) use, so it wasn't covered by the brilliant AX-HLE Rewrite in 2013. Many of the audio improvements from the past two years, including synchronous HLE audio, were featured in the Rise of HLE Audio feature article.
The Zelda microcode makes up for the small number of games using it (fewer than two dozen) by being featured in some of the most popular titles, such as The Legend of Zelda: The Wind Waker, Twilight Princess, and Super Mario Galaxy 1 and 2, among others. Without synchronous Zelda HLE, the Galaxy titles were impossible to complete without LLE audio due to hangs after collecting the Grand Stars.
In general, this is more or less a maintenance and stability commit rather than a full-on rewrite. Many of the bugs and problems of Zelda HLE remain, but most of the blatant hangs and crashes are gone.
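The practical difference between asynchronous and synchronous audio can be pictured with a toy scheduler. In this sketch (all names hypothetical; it is not Dolphin's actual scheduler), the HLE DSP only ever advances from inside the emulated timeline, at fixed emulated-cycle intervals. The amount of audio work done per emulated second therefore no longer depends on how fast a separate host thread happens to run.

```python
import heapq
from typing import Callable, List, Tuple


class Scheduler:
    """Toy event scheduler keyed on emulated CPU cycles (illustrative only)."""

    def __init__(self) -> None:
        self.now = 0
        self._events: List[Tuple[int, int, Callable[[int], None]]] = []
        self._seq = 0  # tie-breaker so callbacks are never compared

    def schedule(self, cycles_from_now: int, callback: Callable[[int], None]) -> None:
        heapq.heappush(self._events, (self.now + cycles_from_now, self._seq, callback))
        self._seq += 1

    def run_until(self, target: int) -> None:
        # Fire every event due at or before `target`, in emulated-time order.
        while self._events and self._events[0][0] <= target:
            when, _, callback = heapq.heappop(self._events)
            self.now = when
            callback(when)
        self.now = target


class SyncDsp:
    """Hypothetical HLE DSP that processes one audio frame per interval."""

    def __init__(self, sched: Scheduler, interval: int) -> None:
        self.sched = sched
        self.interval = interval
        self.frames_processed = 0
        sched.schedule(interval, self.update)

    def update(self, _when: int) -> None:
        self.frames_processed += 1
        self.sched.schedule(self.interval, self.update)  # re-arm for the next slice
```

With a 1,000-cycle interval, running the emulated timeline to cycle 10,000 always yields exactly 10 DSP updates, no matter how fast or slow the host is. That determinism is what a separate, free-running audio thread cannot guarantee.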
Actually emulating disc seek times, constant angular velocity, read speeds, and other disc drive behavior may seem pointless in titles that already work. Most users would rather have games load as fast as possible, treating Dolphin's ability to bypass the speed limits of the GameCube/Wii disc drive as an enhancement. Still, the case for accurate disc timings has grown over the years, culminating in these changes.
- Preservation - With Nintendo moving on from the GameCube and Wii, it's important that an emulator be able to reproduce all behaviors of the console.
- Speedrunning - Being able to accurately time a game when testing on an emulator with all kinds of tools and enhancements is paramount to having that work pay off when doing runs on console.
- Game Functionality - In the Metroid Prime series, many of the fun tricks and glitches done on console aren't possible unless the game takes time to load areas. Other games, like Star Fox Adventures, can have trouble timing subtitles without properly emulated disc speeds.
Left: Accurate disc timings. Right: "Speed up Disc Transfer Rate" enabled
More data can be read per rotation on sectors near the outside of the disc.
Based upon available information, homebrew programs, and game testing, several changes to the way Dolphin handles disc timings have been merged. For most of the year, however, these disc speeds were the same regardless of where the data was stored on the disc. This was assumed to be a minor problem that wouldn't show up in any actual game. That assumption was quickly proven wrong.
While some doors in Metroid Prime opened about as quickly as they would on console, other doors opened much faster even with the new accurate disc timings. Adjusting the speeds further wouldn't work: lowering them would break existing timings and videos in several games, and raising them would make the areas that already loaded correctly load too fast. To be accurate in this case, Dolphin had to emulate the fact that the GameCube and Wii disc drives' read speeds vary depending on where the data is located on the disc.
If that seems crazy, it is! That didn't stop JosJuice from taking up the task. After a lot of tweaking (some of it still ongoing!), constant angular velocity (CAV) calculations have been successfully implemented in Dolphin.
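Under constant angular velocity the disc spins at a fixed rate, so the linear speed under the laser, and with it the read rate, grows with the radius of the track being read. Below is a minimal sketch of that relationship. It assumes data is laid out on a uniform spiral, so the disc area used grows linearly with byte offset. The radii, rotation speed, and density are placeholder values, not the figures Dolphin uses.

```python
import math

# Placeholder geometry and speed; NOT measured GameCube/Wii values.
INNER_RADIUS_MM = 24.0
OUTER_RADIUS_MM = 40.0
DISC_SIZE_BYTES = 1_000_000_000
RPM = 6000.0
BYTES_PER_MM = 150.0  # hypothetical linear data density along the track


def radius_at(offset: int) -> float:
    """Radius (mm) of the track holding `offset`, assuming a uniform spiral.

    With a constant track pitch, the disc area used so far is proportional
    to the number of bytes before `offset`, so r^2 grows linearly with offset.
    """
    fraction = offset / DISC_SIZE_BYTES
    r2 = INNER_RADIUS_MM ** 2 + fraction * (OUTER_RADIUS_MM ** 2 - INNER_RADIUS_MM ** 2)
    return math.sqrt(r2)


def read_rate(offset: int) -> float:
    """Bytes per second at `offset` for a drive spinning at constant RPM."""
    circumference = 2.0 * math.pi * radius_at(offset)
    bytes_per_rotation = circumference * BYTES_PER_MM
    return bytes_per_rotation * RPM / 60.0


if __name__ == "__main__":
    inner = read_rate(0)
    outer = read_rate(DISC_SIZE_BYTES)
    print(f"inner: {inner / 1e6:.2f} MB/s, outer: {outer / 1e6:.2f} MB/s")
    print(f"outer/inner ratio: {outer / inner:.3f}")  # equals 40/24 here
```

Whatever the real constants are, the ratio between outer and inner read rates equals the ratio of their radii. That is why a single fixed read speed can be right for some areas of a game and noticeably wrong for others.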
HLE, LLE, DTK and Audio Interface Upgrades and Fixes with commits from booto, magumagu, skidau, degasus, konpie, Phire and others
If any of the big upgrades over the past year were a community effort, it would have to be this one. So many people have dedicated their time to fixing the remaining issues in Dolphin's audio system that it's hard to keep track!
Possibly the biggest single change in this bunch is the DiscTracK (DTK) audio rewrite, which completely changed how streaming audio was handled. Prior to this merge, streaming audio was handled asynchronously, and by now everyone knows the kinds of issues asynchronous audio can bring to the emulator!
Even with the DTK audio rewrite, streaming audio was still a pain. A ton of problems showed up and caused headaches, including many games not loading music tracks after one playthrough, garbled audio, and other nasty issues. booto took on the role of "I'm fixing anything I possibly can about this," and began digging through, reverse engineering, and testing the audio in problematic games. His work drastically improved the audio in several problem titles and got audio working for the first time in others, such as the Virtual Console title Pokémon Snap.
All of these changes are great, but one of the smaller yet most noticeable fixes comes from degasus. He fixed a long-standing issue: severe audio latency in Dolphin's mixer. This latency (roughly 200ms) affected all games under DSP-LLE, as well as AX-HLE titles, in Dolphin 4.0.2, and would now affect all titles, since asynchronous audio has been eliminated. Thanks to degasus, it was fixed before it became an epidemic. Anyone curious about this behavior should try playing a game in Dolphin 4.0.2 and compare it to the latest master; the difference will be obvious.
skidau is one of our most versatile developers. While he has a history with FIFO work, some of his most notable changes of 2014 were actually in the audio department. Mario Kart Wii is a very popular game, so there's no way it could have an audio crash sitting right in front of our faces, right?
To be fair, who doesn't skip those pesky intros? This bug somehow went unnoticed for thousands of builds before a user finally caught on and reported it. With a change to audio loop points, the issue was quelled and everything works. Speaking of audio loop points, we have this gem of an issue.
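Buffered audio turns directly into latency: every sample already sitting in the mixer's queue has to play before a newly mixed one can be heard. This back-of-the-envelope helper shows how much queued audio roughly 200ms represents at the 32 kHz and 48 kHz rates the GameCube/Wii audio hardware uses. The helper names are ours, not part of Dolphin's mixer.

```python
def latency_ms(queued_frames: int, sample_rate_hz: int) -> float:
    """Delay (ms) before a newly queued stereo frame is heard."""
    return queued_frames * 1000.0 / sample_rate_hz


def frames_for_latency(ms: float, sample_rate_hz: int) -> int:
    """Number of queued stereo frames that corresponds to `ms` of delay."""
    return round(ms * sample_rate_hz / 1000.0)


if __name__ == "__main__":
    for rate in (32_000, 48_000):
        n = frames_for_latency(200, rate)
        print(rate, n, latency_ms(n, rate))
```

At 32 kHz, 200ms corresponds to 6,400 queued frames, and at 48 kHz to 9,600. Cutting latency means keeping far fewer frames in flight while still avoiding underruns.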
At 20 seconds, notice the music slowly become more and more garbled.
skidau tackled another audio loop-point issue to fix this in several games.
This fix was actually a much bigger one than the crash bug; problematic audio loop-points were causing all kinds of mayhem in Dolphin!
- FMVs could become desynced and have garbled audio - Mega Man X Collection and Pac-Man World 2.
- Instruments could sound detuned - Skies of Arcadia Legends, Tales of Symphonia, and Pokémon Colosseum.
- Audio could completely desynchronize - Taiko no Tatsujin series and Rhythm Heaven Fever.
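Loop points tell the mixer where to jump back to once a voice reaches the end of its sample data. The sketch below shows a looping voice with a fractional playback position, assuming a simple hypothetical voice structure rather than AX's real parameter block layout. The key detail is that any overshoot past the end is carried into the loop instead of snapping back to the loop start. Getting such a wrap wrong shifts timing slightly on every pass. That is one way looping music or FMV audio can drift, though not necessarily the exact bug skidau fixed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Voice:
    """Hypothetical looping voice; not the real AX parameter block layout."""

    samples: List[int]
    loop_start: int        # index playback jumps back to after the end
    position: float = 0.0  # fractional read position into `samples`
    step: float = 1.0      # source samples consumed per output sample (pitch)

    def __post_init__(self) -> None:
        if not 0 <= self.loop_start < len(self.samples):
            raise ValueError("loop_start must lie inside the sample data")

    def next_sample(self) -> int:
        value = self.samples[int(self.position)]
        self.position += self.step
        end = len(self.samples)
        loop_len = end - self.loop_start
        while self.position >= end:
            # Carry the overshoot past `end` into the loop instead of
            # snapping to loop_start, so every pass has the exact same length.
            self.position -= loop_len
        return value
```

For example, `Voice([0, 1, 2, 3, 4, 5], loop_start=2, step=1.5)` produces `[0, 1, 3, 4, 2, 3, 5, 2]` over its first eight calls. It never replays the non-looping intro samples once the loop has begun.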
On top of all of those fixes, skidau also introduced an exciting new feature to the emulator. For years, Wiimote audio had been limited to real Wiimotes, with no option for people using emulated Wiimotes. Building on changes by degasus and magumagu, he hooked the emulated Wiimote speaker data up to Dolphin's audio mixer to output it through the system speakers. Unlike real Wiimotes, this doesn't suffer from the latency and bandwidth issues that garble the audio, so the sounds can be heard as they were meant to be.
While phire and konpie may not have many commits toward audio, their contributions were very important. After the DTK rewrite, a subtle static crept into the emulator that a lot of audio-sensitive users noticed. phire tackled the issue very quickly, before it could become one of those problems that linger in the emulator for years.
On the subject of static, konpie also handled another severe static issue, though this one only affected HLE audio. They identified a bug in our volume ramping that caused games to play audio too loud, producing very obvious static.
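Mixing an extra source such as the emulated Wiimote speaker into the main output comes down to summing signed 16-bit samples and saturating the result, so that loud passages clip instead of wrapping around into noise. Below is a minimal sketch, assuming both streams have already been decoded and resampled to the same rate. The real speaker data arrives in its own format and rate, which this skips. The function name and `volume` parameter are hypothetical.

```python
from typing import List, Sequence

INT16_MIN, INT16_MAX = -32768, 32767


def mix_into(main: List[int], extra: Sequence[int], volume: float = 1.0) -> None:
    """Add `extra` into `main` in place, clamping to the signed 16-bit range."""
    for i, sample in enumerate(extra[: len(main)]):
        mixed = main[i] + int(sample * volume)
        main[i] = max(INT16_MIN, min(INT16_MAX, mixed))
```

Clamping matters here: without it, 30000 + 5000 would overflow a 16-bit sample and flip to a large negative value, which is heard as a harsh click.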
Crackling in games can be really annoying, especially when the only work-around is LLE.
With volume ramping adjusted properly, many games that exhibited static only under HLE audio were fixed.
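Volume ramping smooths a volume change over a block of samples instead of jumping straight to the new value, which avoids clicks. Below is a minimal sketch, assuming a linear ramp and a hypothetical fixed-point volume where 0x8000 means 1.0. It is not the HLE code konpie fixed. It shows that each sample must be scaled back down by the fixed-point unit and clamped. If that scaling is off, samples come out too loud and saturate, which is heard as static.

```python
from typing import List

UNITY = 0x8000  # hypothetical fixed-point volume meaning 1.0


def apply_volume_ramp(samples: List[int], start_vol: int, end_vol: int) -> List[int]:
    """Linearly ramp volume from start_vol toward end_vol across the block.

    The last sample uses a volume just short of end_vol; the next block
    is expected to start exactly at end_vol.
    """
    n = len(samples)
    out = []
    for i, s in enumerate(samples):
        # Interpolate the volume for this sample using integer math only.
        vol = start_vol + (end_vol - start_vol) * i // n
        scaled = (s * vol) // UNITY  # divide back down by the fixed-point unit
        out.append(max(-32768, min(32767, scaled)))
    return out
```

For instance, ramping a constant 10000 signal from 0 to `UNITY` over four samples gives `[0, 2500, 5000, 7500]`. A gain above unity, on the other hand, quickly pushes samples into the clamp.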
All in all, these commits and many others allowed Dolphin to take huge steps forward in audio emulation accuracy this year.
Special thanks also go to phire for maintaining and improving JITIL when everyone else kept breaking it with their speedups, and for catching a critical problem in PPC_FP that only affected certain computers.
One of the most absolutely ridiculous things about 2014 is that, despite Dolphin focusing on becoming a more accurate emulator, it has also taken huge steps forward in performance. Dolphin is now faster than it has been at any point in its history. How did that happen? Outside of some very nifty optimizations to the rest of the emulator, most of the credit goes to all of the hard work done on the CPU side of Dolphin. It all started way back in May.
magumagu began writing what would be considered one of the great cheat sheets in Dolphin history in his Software Floating Point implementation. By doing this, he was able to unearth tons of problems in Dolphin's CPU emulation, some of them causing severe crashes, bugs, hangs, and really, really silly physics problems.
By June, a more complete Software Floating Point showed how important it would be for Dolphin when it was pitted against Dolphin's existing CPU cores.
As you might expect, Software Floating Point wiped the floor with the existing solutions in the emulator. Unfortunately, it was far too slow to seriously consider merging.
Dolphin had more or less entered a phase of development where the focus was on accuracy rather than speed; hacks were being removed left and right as a greater understanding of the GameCube and Wii allowed for better emulation. All in all, this led to Dolphin 4.0 being the slowest release of the emulator yet.
Fiora brought so much brilliance and dedication to Dolphin's CPU emulation that she outright halted this trend. Using Software Floating Point as her guideline, she implemented accurate fmul, fmadd, fres, frsqrte, and their paired-single versions in the JIT recompiler. The result was a huge increase in accuracy and performance in one fell swoop.
If she had just stopped there, it would have already been a tremendous service to the emulator, but instead she's kept finding more and more ways to make the JIT more accurate and faster at the same time. PowerPC_FloatingPoint was a late 2013/early 2014 change that helped make Dolphin handle a lot of floating point situations more like the Wii/GC CPU would. Unfortunately, these accuracy changes came with a pretty big hit to speed. Fiora came up with a way to bring back most of the performance lost by those merges without sacrificing the accuracy. Everything she touched got more accurate and faster (or temporarily broke spectacularly and forced her to rush around looking for what she broke!) as she moved through the emulator.
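One concrete example of the detail Software Floating Point exposed: PowerPC's fmadd computes a*b+c with a single rounding at the end, while a naive emulation that multiplies and then adds rounds twice. The sketch below treats Python floats as IEEE-754 doubles. It computes the fused result exactly with fractions and compares it to the two-step version. It only illustrates the rounding difference, not how the JIT emits FMA3 code.

```python
from fractions import Fraction


def fmadd_fused(a: float, b: float, c: float) -> float:
    """a*b + c computed exactly, then rounded once to the nearest double."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def fmadd_unfused(a: float, b: float, c: float) -> float:
    """a*b rounded to a double, then + c rounded again (two roundings)."""
    return a * b + c


if __name__ == "__main__":
    a = b = 1.0 + 2.0 ** -30
    c = -(1.0 + 2.0 ** -29)
    print(fmadd_unfused(a, b, c))  # 0.0: the 2**-60 term was rounded away
    print(fmadd_fused(a, b, c))    # 2**-60, about 8.67e-19
```

The exact product here is 1 + 2^-29 + 2^-60. The 2^-60 term is too small to survive rounding the product to a double, so only the fused version keeps it. Differences like this can snowball into the physics bugs mentioned earlier.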
- Register allocation: improving the way the JIT maps registers from the GC/Wii CPU onto registers on the host x86 CPU. Better register allocation means shorter code and fewer memory accesses.
  - Increase the number of free registers for the JIT to use for allocation.
  - Heuristically preload needed values into registers, and flush unneeded ones.
  - Add a heuristic for which registers to flush if the JIT runs out.
- Optimize existing instructions: find ways to improve PPC instructions already implemented in the JIT by emulating them in fewer x86 instructions than before.
  - Float conversion on load/store ("PPC_FP")
  - Condition register instructions: cr operations, mtcrf, mcrxr, mfcr
  - Integer instructions: rlwinm, srawi, addi, extsb/h, cmp, mulli, mull
  - Memory instructions: load/store address calculation, paired load/store
  - Avoid saving PC on each store
  - Immediate handling: optimize stores of known values and to known addresses.
- Instruction selection: improve the generated JIT code by supporting new x86 instruction sets, or by optimizing the selection of existing instructions in the emitter.
  - Use shorter opcodes when available for immediates and EAX-forms of x86 instructions
  - Support FMA3 instructions and use them for PPC fmadd instructions
  - Support AVX and use it in all relevant float instructions
  - Support lzcnt and use it in cntlzw
- Implement new instructions: implement new PPC instructions in the JIT that previously had to slowly and painfully fall back to the interpreter.
  - Missing integer instructions: mulhw
  - Missing float instructions: ps_cmp, ps_res, ps_rsqrte, fsel, ps_sel, fres, frsqrte
  - Missing memory instructions: indexed paired load/store, indexed float load/store
  - Missing system instructions: mftb
  - Missing flags calculation: support FPRF (for games like F-Zero GX)
- Inter-instruction optimization: implement optimizations that work across multiple PPC instructions, e.g. by merging related instructions.
  - Peephole merging of fcmp+cror, lbz+extsb, and rc instructions + cmp
  - Optimize floating point operations based on knowledge of their inputs
  - Carry optimizations: reorganize XER and keep the carry flag in the host carry flag when possible
- MMU-mode optimizations: specific optimizations that primarily help games that require the MMU, like Rogue Squadron 2.
  - Disable BATs by default
  - Far code cache to reduce the impact of MMU exception-handling code on the instruction cache
  - Support paired loads/stores with MMU
  - In MMU mode, don't flush register state at every single load/store
- GPU thread optimizations: changes that speed up the GPU thread, as opposed to the CPU thread.
  - Fully-SIMD vertex loader
  - Pipelined CRC texture hashing implementation
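As an example of what "emulating an instruction in fewer x86 instructions" can mean, consider rlwinm, which rotates a register left and then ANDs it with a mask. The reference semantics below follow the PowerPC definition: bits are numbered 0 to 31 from the most significant end, and the mask wraps around when MB > ME. The `classify_rlwinm` helper is a hypothetical sketch of special cases a JIT could recognize, such as the slwi and srwi forms that reduce to a single shift. It is not Dolphin's actual emitter logic.

```python
MASK32 = 0xFFFFFFFF


def ppc_mask(mb: int, me: int) -> int:
    """PowerPC MASK(mb, me): ones from bit mb to bit me (bit 0 = MSB), wrapping."""
    begin = MASK32 >> mb
    end = (MASK32 << (31 - me)) & MASK32
    return (begin & end) if mb <= me else (begin | end)


def rotl32(value: int, sh: int) -> int:
    """Rotate a 32-bit value left by sh bits."""
    sh &= 31
    return ((value << sh) | (value >> (32 - sh))) & MASK32


def rlwinm(rs: int, sh: int, mb: int, me: int) -> int:
    """Reference semantics: rotate left by sh, then AND with MASK(mb, me)."""
    return rotl32(rs, sh) & ppc_mask(mb, me)


def classify_rlwinm(sh: int, mb: int, me: int) -> str:
    """Pick a cheaper x86 lowering when the operands allow one (hypothetical)."""
    if sh == 0:
        return "mov" if (mb, me) == (0, 31) else "and"  # no rotation needed
    if mb == 0 and me == 31 - sh:
        return "shl"      # slwi: a plain left shift
    if me == 31 and sh == 32 - mb:
        return "shr"      # srwi: a plain logical right shift
    if mb == 0 and me == 31:
        return "rol"      # rotlwi: rotate with no mask
    return "rol+and"      # general case: rotate, then AND with the mask
```

For instance, `rlwinm(0x12345678, 8, 0, 23)` is just a left shift by 8 (0x34567800), so a JIT can emit one shift instead of a rotate followed by an AND.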
- Optimistically Predict BLR with RET - This merge from comex abuses the x86 RET instruction to emulate its PowerPC counterpart, blr, for a speedup of up to 12%.
- PPC State Register - Another comex merge with another great speedup. By sacrificing one of our x86 registers to point to the middle of our PowerPC state, Dolphin gained efficiency and got faster despite the loss of a register.
- MMIO Rewrite - While the performance numbers didn't show huge gains, by better emulating the Memory-Mapped I/O, we have uncovered a lot of bugs in Dolphin.
- PowerPC Flags Optimization - This brainchild of delroth and calc84maniac abuses the x86 architecture to efficiently calculate PowerPC flags. The change required every single JIT and the interpreter to be adjusted, and resulted in the death of JITILARM. delroth handled the JIT, Sonicadvance1 handled JIT32ARM, and magumagu handled JITIL and the interpreter. We highly recommend that interested users check out the July Progress Report for a detailed look at how this optimization works.
These optimizations and improvements weren't purely accidents or luck. The Dolphin Team made the very hard decision to drop 32-bit support, and the 32-bit JIT along with it. That made room for newer optimizations that would have broken the 32-bit JIT, and changing the JIT became easier and faster than ever. Without that decision, these improvements might never have happened.
And since then? Well, optimizations have been coming in on all sides: GPU, CPU, MMU, and more! With every generation, CPU and GPU manufacturers add new functionality that previously only existed on consoles. It seems like there is still a lot more performance to squeeze out of Dolphin.
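Pointing a register at the middle of the PowerPC state, rather than its start, most likely comes down to x86 addressing. A memory operand of the form [reg + disp] can encode the displacement in a single signed byte when it falls between -128 and 127, and needs four bytes otherwise. The sketch below counts how many fields of a made-up state layout get the short encoding for each choice of base. The layout and field sizes are hypothetical and do not match Dolphin's actual ppcState.

```python
from typing import Dict


def disp_size(disp: int) -> int:
    """Bytes needed for an x86 [base + disp] displacement (ignoring disp-less forms)."""
    return 1 if -128 <= disp <= 127 else 4


def layout(fields: Dict[str, int]) -> Dict[str, int]:
    """Assign consecutive byte offsets to fields given their sizes."""
    offsets, pos = {}, 0
    for name, size in fields.items():
        offsets[name] = pos
        pos += size
    return offsets


def short_encodings(offsets: Dict[str, int], base_bias: int) -> int:
    """Count fields reachable with a 1-byte displacement from state + base_bias."""
    return sum(disp_size(off - base_bias) == 1 for off in offsets.values())


if __name__ == "__main__":
    # Hypothetical layout: 32 GPRs (4 bytes each) followed by 8 more 4-byte fields.
    fields = {f"gpr{i}": 4 for i in range(32)}
    fields.update({f"spr{i}": 4 for i in range(8)})
    offs = layout(fields)
    print("base at start:", short_encodings(offs, 0))     # 32
    print("base at +0x80:", short_encodings(offs, 0x80))  # 40
```

Centering the base doubles the window reachable with one-byte displacements from 128 to 256 bytes. Saving three bytes on many memory operands shrinks the generated code, which helps explain how giving up a register can still come out ahead.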
Bounding box has always been a bit of a pain in Dolphin, resulting in many annoying glitches and crashes in the very popular Paper Mario series.
crudelios has been the primary person working on bounding box emulation for quite some time. Bounding box registers are used when a game wants to know where an object was drawn on screen. These registers contain coordinates of a rectangle surrounding the object. The game can then do crazy things with this information, allowing for the creative effects that the Paper Mario games are famous for.
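Conceptually, the bounding box registers record the extremes of where pixels landed. Below is a minimal sketch, assuming a list of already-rasterized pixel coordinates and a hypothetical `BoundingBox` class. Real hardware tracks this during rasterization with its own register format and rounding, which the sketch ignores.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class BoundingBox:
    left: int = 2**31 - 1
    right: int = -(2**31)
    top: int = 2**31 - 1
    bottom: int = -(2**31)

    def update(self, x: int, y: int) -> None:
        # Grow the rectangle so it still encloses every pixel seen so far.
        self.left = min(self.left, x)
        self.right = max(self.right, x)
        self.top = min(self.top, y)
        self.bottom = max(self.bottom, y)


def bbox_of(pixels: Iterable[Tuple[int, int]]) -> BoundingBox:
    """Bounding rectangle of a sequence of drawn pixel coordinates."""
    box = BoundingBox()
    for x, y in pixels:
        box.update(x, y)
    return box
```

Because min and max give the same result in any order, many GPU threads can perform the same update at once using atomic min/max operations. That is roughly the idea behind moving bounding box work from software onto the GPU.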
Late in the year, crudelios dealt what appeared to be the final blow to this finicky feature with a software implementation. It brought a huge increase in bounding box accuracy, at the expense of being quite a bit slower.
While bounding box emulation had been steadily getting better, some issues couldn't be handled so easily.
Finally, it seemed as though Dolphin's battle against bounding box had come to a close. That's when degasus saw an opportunity to implement an even more accurate method of bounding box emulation in OpenGL. Somehow, it was both more accurate and, at lower internal resolutions, even faster than the old inaccurate bounding box implementation! TinoB effortlessly implemented it in D3D, so Dolphin no longer needs to fall back to software on any backend.