Optimizing noise generation in Terrablox

Posted on 22/09/2016

Part of the challenge I set myself with Terrablox is to create it in pure GML, which, being a high-level language, is not known for its speed; so no external files that could do the job easier or faster, or anything like that. The point of this is that I believe that if some piece of code is slow, it's your fault, and finding a way around it by using another tool that is inherently faster isn't really solving the problem, I think. So it's largely a practice challenge, I guess!

On that note, there are two main things in Terrablox that are "slow", meaning in this case that they take a lot of processing time: world generation and geometry modelling. These are also the two most important parts of the game, so finding ways to make them faster is always worthwhile.

Today I implemented a solution to make the world generation faster. The world generation is done using a noise function, passed different parameters for different levels of detail on the terrain (oceans/continents, mountains/rivers, hills/holes), and then the results are combined to create the final terrain at any given point. While noise functions are generally slow when used a lot, the function itself was not the issue; the fact that I was calling it so often was. For each 16x16 chunk it was running the 3 noise functions for each and every cell, which adds up to 768 noise function calls per chunk and took about 12000 microseconds on average to complete. That's 0.012 of a second, and at 60fps a frame is only about 16,667 microseconds, so a single chunk was eating roughly 72% of a frame's budget, which is a lot for one aspect of the game! What I have been doing to alleviate that demand is to split the chunk generation over n game steps, which divides the cost n ways and leaves more spare time per frame, helping maintain a good framerate (at the cost of chunks taking longer to appear). It worked ok-ish, and I'll still keep that optimization.
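To illustrate the step-splitting idea, here's a minimal sketch in Python rather than GML (so it's easy to run and check); the names `make_chunk_builder` and `build_cell` are hypothetical stand-ins for the real chunk-generation code, and the point is only the structure of resuming work each game step:

```python
def build_cell(cell):
    # Placeholder for the real per-cell work (noise sampling, block choice).
    return cell

def make_chunk_builder(cells, n_steps):
    """Return a generator that builds `cells` over `n_steps` resumable slices."""
    per_step = -(-len(cells) // n_steps)  # ceiling division
    def builder():
        done = []
        for i in range(0, len(cells), per_step):
            for cell in cells[i:i + per_step]:
                done.append(build_cell(cell))
            yield done  # hand control back to the game loop until next step
    return builder()

# Each game step, advance the builder by one slice:
cells = [(x, y) for x in range(16) for y in range(16)]
builder = make_chunk_builder(cells, n_steps=4)
for step, partial in enumerate(builder, 1):
    pass  # in a real game loop this body would run once per frame
# After 4 steps, all 256 cells are built.
```

Each frame pays for only a quarter of the chunk, at the price of the chunk taking 4 frames to finish.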

But my new solution is better and simpler and, sadly, blindingly obvious. These noise functions work on quite a large scale, to the point that there's very little height variation within a single chunk. So I wondered: why the hell am I calling noise functions for every single cell? Surely calling them only at each corner of the chunk, and interpolating those values per cell, would be better? And yeah, it is: there's no major visible difference to the terrain in-game, and it now costs only about 8000 microseconds per chunk. That's good, a 33% reduction. It's now doing just 12 noise function calls per chunk (3 for each of the 4 corners). There are still some noise functions in there dealing with biomes, and other things taking up time, which I shall work on replacing in the same way. I'll talk more about that when I do an article on my approach to biomes.

So in code, this is what was done per cell (repeated 256 times):
var noise1 = SimplexGetHeight( xPos, yPos, 0.5, 0.004, scale );
var noise2 = SimplexGetHeight( xPos, yPos, 1.0, 0.002, scale );
var noise3 = SimplexGetHeight( xPos, yPos, 1.0, 0.00025, scale );

And here's what happens now, done only once before we get into the cells:
var height1a = SimplexGetHeight( chunk[X], chunk[Y], 0.5, 0.004, scale );
var height1b = SimplexGetHeight( chunk[X]+16, chunk[Y], 0.5, 0.004, scale );
var height1c = SimplexGetHeight( chunk[X], chunk[Y]+16, 0.5, 0.004, scale );
var height1d = SimplexGetHeight( chunk[X]+16, chunk[Y]+16, 0.5, 0.004, scale );
var height2a = SimplexGetHeight( chunk[X], chunk[Y], 1.0, 0.002, scale );
var height2b = SimplexGetHeight( chunk[X]+16, chunk[Y], 1.0, 0.002, scale );
var height2c = SimplexGetHeight( chunk[X], chunk[Y]+16, 1.0, 0.002, scale );
var height2d = SimplexGetHeight( chunk[X]+16, chunk[Y]+16, 1.0, 0.002, scale );
var height3a = SimplexGetHeight( chunk[X], chunk[Y], 1.0, 0.00025, scale );
var height3b = SimplexGetHeight( chunk[X]+16, chunk[Y], 1.0, 0.00025, scale );
var height3c = SimplexGetHeight( chunk[X], chunk[Y]+16, 1.0, 0.00025, scale );
var height3d = SimplexGetHeight( chunk[X]+16, chunk[Y]+16, 1.0, 0.00025, scale );

And now per cell (repeated 256 times, no noise functions, just simple lerps):
var r1 = lerp( height1a, height1b, _x/16 );
var r2 = lerp( height1c, height1d, _x/16 );
var noise1 = lerp( r1, r2, _y/16 );
r1 = lerp( height2a, height2b, _x/16 );
r2 = lerp( height2c, height2d, _x/16 );
var noise2 = lerp( r1, r2, _y/16 );
r1 = lerp( height3a, height3b, _x/16 );
r2 = lerp( height3c, height3d, _x/16 );
var noise3 = lerp( r1, r2, _y/16 );
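To sanity-check the call-count arithmetic, here's the same scheme sketched in Python (Terrablox itself is pure GML; `fake_noise` is a hypothetical smooth stand-in for SimplexGetHeight, and only the call counts and interpolation structure mirror the real code):

```python
import math

calls = 0
def fake_noise(x, y, amp, freq):
    """Counting stand-in for a smooth 2D noise function."""
    global calls
    calls += 1
    return amp * math.sin(x * freq + 1.3) * math.cos(y * freq + 0.7)

def lerp(a, b, t):
    return a + (b - a) * t

PARAMS = [(0.5, 0.004), (1.0, 0.002), (1.0, 0.00025)]  # the 3 noise layers

def old_heights(cx, cy):
    # Old way: 3 noise calls per cell, 16x16 cells -> 768 calls per chunk.
    return [[sum(fake_noise(cx + x, cy + y, a, f) for a, f in PARAMS)
             for y in range(16)] for x in range(16)]

def new_heights(cx, cy):
    # New way: sample each layer only at the 4 chunk corners (12 calls),
    # then bilinearly interpolate per cell.
    corners = [[fake_noise(cx + dx, cy + dy, a, f)
                for dx, dy in ((0, 0), (16, 0), (0, 16), (16, 16))]
               for a, f in PARAMS]
    out = [[0.0] * 16 for _ in range(16)]
    for x in range(16):
        for y in range(16):
            for a, b, c, d in corners:
                r1 = lerp(a, b, x / 16)
                r2 = lerp(c, d, x / 16)
                out[x][y] += lerp(r1, r2, y / 16)
    return out

old_heights(0, 0); old_calls = calls
calls = 0
new_heights(0, 0); new_calls = calls
```

Running it confirms 768 noise calls per chunk the old way against 12 the new way; the per-cell cost becomes three cheap lerps per layer instead of a full noise evaluation.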

The lerp method can actually be optimized further. As the cells are scanned using 2 for loops, one for x and one for y, we can compute the r1 and r2 values in the x loop, before entering the y loop. That reduces the lerp function calls from 2304 (16x16x9) down to 864 (16x6 + 16x16x3).
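The hoisting can be checked with another small Python sketch (the corner values are arbitrary placeholders; only the lerp-call counts matter):

```python
lerp_calls = 0
def lerp(a, b, t):
    """Counting linear interpolation."""
    global lerp_calls
    lerp_calls += 1
    return a + (b - a) * t

# One (a, b, c, d) corner tuple per noise layer; values are arbitrary.
corners = [(1.0, 2.0, 3.0, 4.0), (0.5, 1.5, 2.5, 3.5), (0.1, 0.2, 0.3, 0.4)]

# Naive: 9 lerps per cell -> 16*16*9 = 2304 calls per chunk.
for x in range(16):
    for y in range(16):
        for a, b, c, d in corners:
            r1 = lerp(a, b, x / 16)
            r2 = lerp(c, d, x / 16)
            h = lerp(r1, r2, y / 16)
naive = lerp_calls

# Hoisted: r1/r2 computed once per column -> 16*6 + 16*16*3 = 864 calls.
lerp_calls = 0
for x in range(16):
    rows = [(lerp(a, b, x / 16), lerp(c, d, x / 16)) for a, b, c, d in corners]
    for y in range(16):
        for r1, r2 in rows:
            h = lerp(r1, r2, y / 16)
hoisted = lerp_calls
```

The x-dependent lerps run 6 times per column (2 per layer) instead of 6 times per cell, which is where the saving comes from.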

Anyway, I think it goes to show that "it's always your fault" stands true. I'd left this code unchanged for so long, and clearly that was a mistake, seeing as such an easy optimization was staring at me the whole time. But, it happens. :)