About half an hour after making it, I fixed almost all the missing screens/skies and the clipped Mario by adjusting the Z-buffer range (the previous range was 0..1; it is now -1..1).
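For readers unfamiliar with the two depth conventions, here is a minimal sketch of how a 0..1 depth range and a -1..1 depth range relate. It is only an illustration of the linear remap between them; the post does not say where or how the range was actually changed in the emulator, and the function names are made up for this example.

```c
/* Map a depth value from the signed [-1, 1] convention to the [0, 1] one. */
static float depth_signed_to_unit(float z)
{
    return 0.5f * z + 0.5f;
}

/* Inverse mapping: from [0, 1] back to [-1, 1]. */
static float depth_unit_to_signed(float z)
{
    return 2.0f * z - 1.0f;
}
```

If geometry produced for one convention is drawn with the other, part of the depth range falls outside what the GPU keeps, which is consistent with the missing skies and clipped models described above.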
I then improved the emulation of the N64 gfx combiners (combiners are sort of early, primitive pixel shaders). I use 360 pixel shaders to emulate them. At first it was really slow because I used many switches and loops in the shader, and it seems that doing that in a pixel shader isn't such a good idea. I got everything back to playable speeds mainly by using 3 techniques:
- a color lookup table to emulate the combiner 'source' (e.g. vertex color/texture color/constant/...)
- a math formula that handles all the possible cases for the combiner operation (e.g. mul/add/sub/...)
- having different pixel shaders (one fast that can only do simple things, one intermediate, and one slow that emulates everything) and switching between them when needed.
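The three techniques above can be sketched on the CPU side in plain C. This is not the actual shader code, which the post does not show. It assumes the single formula is the N64 combiner's own (A - B) * C + D form, and the selector names, the `Color` type and the rules in `pick_tier` are hypothetical choices made for the example.

```c
#include <stdint.h>

/* Hypothetical source selectors; the real combiner has more inputs. */
enum { SRC_ZERO, SRC_ONE, SRC_SHADE, SRC_TEXEL, SRC_CONSTANT, SRC_COUNT };

typedef struct { float r, g, b; } Color;

/* One combiner pass: four indices into the source lookup table. */
typedef struct { uint8_t a, b, c, d; } CombinerMode;

/* Branch-free combine: out = (a - b) * c + d, per channel.
 * mul is (x - 0) * y + 0, add is (x - 0) * 1 + y, sub is (x - y) * 1 + 0. */
static Color combine(const Color table[SRC_COUNT], CombinerMode m)
{
    Color a = table[m.a], b = table[m.b], c = table[m.c], d = table[m.d];
    Color out = {
        (a.r - b.r) * c.r + d.r,
        (a.g - b.g) * c.g + d.g,
        (a.b - b.b) * c.b + d.b
    };
    return out;
}

typedef enum { SHADER_FAST, SHADER_INTERMEDIATE, SHADER_FULL } ShaderTier;

/* Assumed selection rule: use the cheapest shader that can express the mode. */
static ShaderTier pick_tier(CombinerMode m)
{
    if (m.b == SRC_ZERO && m.c == SRC_ONE && m.d == SRC_ZERO)
        return SHADER_FAST;          /* plain pass-through of one source */
    if (m.b == SRC_ZERO && m.d == SRC_ZERO)
        return SHADER_INTERMEDIATE;  /* simple multiply of two sources */
    return SHADER_FULL;              /* everything else */
}
```

The point of the table plus formula is that the per-pixel work contains no switch or loop: picking a source is an indexed read, and every operation is the same arithmetic with different operands.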
A few #defines later, the emu was up to 50% faster.
Next I had an idea: why not try to get the X360 GPU to render my current frame in the background, instead of actively waiting for it to finish rendering?
Usually you do this:
    /* resolve (and clear) */
    Xe_Resolve(xe);
    /* wait for render finish */
    Xe_Sync(xe);
Now I do this:
    /* resolve (and clear) */
    Xe_Resolve(xe);
    /* begin rendering in background */
    Xe_Execute(xe);
and then I call Xe_Sync() at the last possible moment, right before beginning my next frame.
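To make the ordering explicit, here is a small self-contained sketch of a frame loop with the deferred sync. The `gpu_*` functions are hypothetical stand-ins for Xe_Resolve, Xe_Execute and Xe_Sync that only record the order in which they are called, and the split between "CPU emulation" and "queue draw commands" is an assumption about how the frame is structured, not something stated above.

```c
#include <stddef.h>

/* Records one letter per call so the ordering can be inspected. */
static char call_log[64];
static size_t call_len;

static void log_call(char tag)
{
    if (call_len + 1 < sizeof call_log) {
        call_log[call_len++] = tag;
        call_log[call_len] = '\0';
    }
}

/* Hypothetical stand-ins; the real calls take the Xenos device pointer. */
static void gpu_resolve(void)         { log_call('R'); } /* Xe_Resolve */
static void gpu_execute(void)         { log_call('E'); } /* Xe_Execute */
static void gpu_sync(void)            { log_call('S'); } /* Xe_Sync */
static void run_cpu_emulation(void)   { log_call('C'); } /* CPU-only work */
static void queue_draw_commands(void) { log_call('D'); } /* build the frame */

static void run_frames(int count)
{
    int gpu_busy = 0;

    call_len = 0;
    call_log[0] = '\0';

    for (int i = 0; i < count; i++) {
        run_cpu_emulation();   /* overlaps with the GPU drawing frame i-1 */
        if (gpu_busy)
            gpu_sync();        /* wait as late as possible */
        queue_draw_commands();
        gpu_resolve();         /* resolve (and clear) */
        gpu_execute();         /* begin rendering in background */
        gpu_busy = 1;
    }
    if (gpu_busy)
        gpu_sync();            /* drain the last frame before exiting */
}
```

After `run_frames(2)` the log reads `CDRECSDRES`: the second frame's CPU work (`C`) happens between the execute (`E`) and the sync (`S`), which is where the time previously spent waiting is recovered.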
I got a huge speed boost from this: Super Mario 64 now runs at around 100 fps in-game!