Optimization gone weird

The Instructable dealing with Arduino multiplexing is almost complete, after about a year of writing. In fact, this delay did some good, as I caught a bunch of bugs that were hiding pretty well. Today I’ll tell you a story about some of them in the hope that it may prove to be rather useful.

I’m writing about two multiplexing projects that do not use any LED drivers, just an Arduino (an ATmega328P, to be precise). Obviously I have to use all the available PWM outputs, which means I’m also using all the timers. Here’s an excerpt from the old draft:

…But we’ll lose a lot of Arduino timer-related functions, most notably millis(), delay(), delayMicroseconds() and anything else of that persuasion. Yes, the delay() will still work, but it will interfere with the PWM output, turning some of the LEDs on. The microseconds variation will turn different ones on. In any case, we need an alternative delay, the one that won’t rely on timers.

There are three ways to do a new delay function.

Firstly, use the ASM NOP instruction that does nothing for a single tick:

// alternative Delay (approx. 1.3 million ticks per second)
void justWait (uint32_t period)
{
  for (uint32_t z = 0; z<period; z++) __asm__("nop\n\t");
}

Secondly, you can use empty loops like these:

// alternative Delay (approx. 450,000 ticks per second)
void justWait1 (uint32_t period)
{
  for (volatile uint32_t z = 0; z<period; z++);
}

// alternative Delay (approx. 600,000 ticks per second)
void justWait2 (volatile uint32_t period)
{
  while (period--);
}

Note the volatile keyword in there. It is needed to stop the compiler from optimizing the code away: the loop does nothing, so the compiler will simply ditch it if left unchecked.

Thirdly, you may simply use the avr-libc delay function, but where’s the fun in that? Also, it has some drawbacks.
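The "ticks per second" figures in the comments above can be turned into a loop count for a given delay with a little arithmetic. The sketch below is only an illustration: msToPeriod() and the three constants are hypothetical names, and the rates are the rough figures quoted in the text, which will shift with compiler version and flags.

```cpp
#include <stdint.h>

// Approximate loop rates quoted in the comments above (iterations per second).
// These are the rough figures from the text, not measured constants.
const uint32_t NOP_LOOP_RATE   = 1300000UL; // justWait()
const uint32_t FOR_LOOP_RATE   = 450000UL;  // justWait1()
const uint32_t WHILE_LOOP_RATE = 600000UL;  // justWait2()

// Hypothetical helper: convert milliseconds into a loop count for a
// busy-wait that runs loopsPerSecond iterations per second.
uint32_t msToPeriod(uint32_t ms, uint32_t loopsPerSecond)
{
  // 64-bit intermediate so that ms * loopsPerSecond cannot overflow
  return (uint32_t)(((uint64_t)ms * loopsPerSecond) / 1000U);
}
```

With these numbers, justWait2(msToPeriod(10, WHILE_LOOP_RATE)) would spin for roughly 10 ms, give or take whatever the compiler does to the loop.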

In fact, I started the article with this part, just after I wrote the sketches and tested them with all possible delay functions. I still had to check some other stuff, but at least I was absolutely sure these were ok.

And then, 8 months later, when I was finishing the article and powered up my multiplexing contraption, weirdness happened.

The demonstration sketch has two functions to show off the multiplex workings. One is more complex, the other is very simple – it just turns on the LEDs one by one (red, then green, then blue on the first LED, red, green and blue on the second and so on) with fade-ins and fade-outs. Here it is:

void testFor(uint32_t dperiod)
{
 for (uint8_t y = 0; y<18; y++){
  for (uint8_t k = 255; k>5; k--)
   {colors[y/3][y%3] = k;
    justWait2(dperiod); //1294
   }
  for (uint16_t k = 5; k<256; k++)
   {colors[y/3][y%3] = k;
    justWait2(dperiod);
   }
 }
}

The colors[][] array is the ‘video memory’; writing there is essentially the same as calling analogWrite() with inverted logic.
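The y/3 and y%3 indexing is what produces the "red, green, blue on the first LED, then the second" order. A small sketch of that mapping, assuming 6 RGB LEDs with channels stored in red, green, blue order (ledOf(), channelOf() and the Channel enum are hypothetical names, not part of the original sketch):

```cpp
#include <stdint.h>

// Assumed channel order, matching the red-green-blue sequence described above
enum Channel { RED = 0, GREEN = 1, BLUE = 2 };

// Hypothetical helpers mirroring the indexing used in testFor():
// y runs from 0 to 17, three steps per LED.
uint8_t ledOf(uint8_t y)     { return y / 3; }  // which LED (0..5)
uint8_t channelOf(uint8_t y) { return y % 3; }  // which color channel (0..2)
```

So y = 4 lands on the green channel of the second LED, and y = 17 on the blue channel of the sixth.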

While the first, more complex function still worked flawlessly, this one suddenly decided to go berserk and refused to work. The LEDs simply stayed off. And the problem was with the justWait() functions, as everything worked fine when I replaced them with _delay_ms().

I spent some time trying to figure this out, adding some weird stuff to justWait() like __volatile__ after __asm__, or changing __asm__ to just asm, and some other stuff I don’t even remember anymore. I managed to get some results, like the first LED turning on and staying on, but they were still wrong.

Finally I did something that made that first LED blink red. There were no fades and no switching of colors or LEDs, but at least it was blinking, and the delay between the blinks looked like a failed fade-in and fade-out routine. It was as if the fade for() loops had been truncated to the first colors[0][0] = 0 or 255, and the delays had then been thrown together in a heap.

That really stank of optimizing done wrong!

So, I went and turned optimization off right at the start of the sketch:

#pragma GCC optimize ("-O0")

and naturally everything started working as intended.

Ok, the culprit was found, but the sketch grew in size and that’s not good. I googled around and found function attributes that change the way a particular function is optimized. Like this:

void __attribute__((optimize("O0"))) justWait (uint32_t period)
{
  for (uint32_t z = 0; z<period; z++) __asm__("nop\n\t");
}

And this worked, too!

Admittedly, the GCC manual warns that this attribute should be used only for debugging and should never see production, but who cares? It works, right?

Well, right, it worked.

And then I remembered to declare the ‘video memory’ array volatile. Yes, it wasn’t. And it was used in the ISR.
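A plausible reading of the earlier symptoms is that, without volatile, the compiler saw a run of stores to the same array element with nothing in the main code reading them, and felt free to keep only some of them. The sketch below shows the shape of the fix. The array dimensions and element type are assumptions (6 RGB LEDs, one byte per channel), and refreshRead() and setChannel() are hypothetical stand-ins, with an ordinary function playing the part of the ISR so the snippet compiles off the board.

```cpp
#include <stdint.h>

// Assumed shape: 6 RGB LEDs x 3 channels, one byte per channel.
// volatile tells the compiler that every read and write must really happen,
// because the refresh ISR uses the array behind the main code's back.
volatile uint8_t colors[6][3];

// Stand-in for the refresh ISR body. On the real board this read would
// sit inside the timer overflow interrupt handler.
uint8_t refreshRead(uint8_t led, uint8_t channel)
{
  return colors[led][channel];
}

// What the fade loops in testFor() do on every step.
void setChannel(uint8_t y, uint8_t k)
{
  colors[y / 3][y % 3] = k;
}
```

The rule of thumb: anything written in the main code and read in an ISR (or the other way round) should be volatile.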


And this should have been the end of this enlightening tale of resourceful stupidity, but then the compiler decided it was not done with me!

I had the delay with _delay_ms() and was just playing around with timers when BOOM:


Err, ok, I just switched my multiplex refresh ISR (overflow) from timer2 to timer1, so that must be the problem; maybe _delay_ms() is not that timer-independent after all. Ok, switching back to timer2…

WHAT? This exact code worked five minutes ago! Google, help! Here, here, do not worry, here’s the link.

Ok, right, do not use variables with _delay_ms(). Ah, didn’t know that. Problem solv–

Hey, wait a second. I didn’t use any variables!

To add insult to injury, _delay_ms() worked ok in the other window of the same Arduino IDE. With a variable!

(to be fair, this variable is known at compile time)

What the heck is happening here?



You guessed it! Optimization!

For some reason _delay_ms() doesn’t want to work with optimization turned off.
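This matches the avr-libc documentation for <util/delay.h>, which says the delay functions need compiler optimizations enabled and a delay value that is a compile-time constant. For the "no variables" half of that rule, a common workaround is to call _delay_ms() with a constant inside a loop. The sketch below is a minimal version of that idea: delayVariableMs() and waitOneMs() are hypothetical names, and the non-AVR branch is only a stand-in that counts calls so the code can be compiled on a PC. It does not help with the optimization half: on the board, optimization still has to stay on.

```cpp
#include <stdint.h>

#ifdef __AVR__
#include <util/delay.h>  // needs F_CPU; the Arduino build normally defines it
// Constant argument, so the compiler can work out the cycle count itself
static inline void waitOneMs() { _delay_ms(1); }
#else
// Host stand-in: counts calls instead of waiting, for off-board checks only
static uint32_t hostMsCount = 0;
static inline void waitOneMs() { ++hostMsCount; }
#endif

// Hypothetical helper: delay for a number of milliseconds known only at run time
void delayVariableMs(uint16_t ms)
{
  while (ms--) waitOneMs();
}
```

The loop overhead makes this slightly longer than the requested time, which rarely matters for LED fades.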

And thus the optimization adventures finally end.
