I vote for C... I don't believe in assembler.
Or at least I wouldn't suggest going there unless it's completely necessary.
For the expander you could just use the range shifted and call zero something other than 100% duty cycle, but as you said it would take part of the scale (exactly 1 bit if you use 50% duty cycle as the reference). An option would be to have 2 independent sections, one for boost and the other for attenuation, so you switch 2 sides of the divider. You should be careful with the timing between both, depending on what you are doing. You could use 2 resistors and 2 switches and short one or the other, but in that case you should only use one switch at a time. Using 4 resistors and shorting 2 of them gives you 4 usable steps, from which you could potentially get a better response over the whole range.
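To make the trade-off concrete, here is a minimal sketch in C of the two mappings. It assumes an 8-bit PWM and a signed gain code where positive means boost and negative means attenuation; the names (PWM_BITS, duty_from_signed_gain, split_from_signed_gain, split_duty_t) are made up for illustration and are not from any library.

```c
#include <stdint.h>

/* Assumed for illustration: an 8-bit PWM counter. */
#define PWM_BITS 8u
#define PWM_TOP  ((1u << PWM_BITS) - 1u)  /* 255 = 100% duty */
#define PWM_MID  (1u << (PWM_BITS - 1u))  /* 128 = 50% duty, used as "zero" */

/* Option 1: one PWM, range shifted so that 50% duty means no gain change.
   Each direction only gets half of the counter range, i.e. one bit is lost.
   Assumption: positive gain_code = boost = higher duty. */
static uint16_t duty_from_signed_gain(int16_t gain_code)
{
    int32_t duty = (int32_t)PWM_MID + (int32_t)gain_code;
    if (duty < 0) {
        duty = 0;
    }
    if (duty > (int32_t)PWM_TOP) {
        duty = (int32_t)PWM_TOP;
    }
    return (uint16_t)duty;
}

/* Option 2: two independent sections, one for boost and one for attenuation.
   Each section keeps the full counter range, and only one is ever non-zero. */
typedef struct {
    uint16_t boost_duty;
    uint16_t atten_duty;
} split_duty_t;

static split_duty_t split_from_signed_gain(int16_t gain_code)
{
    split_duty_t out = {0u, 0u};
    int32_t mag = (gain_code < 0) ? -(int32_t)gain_code : (int32_t)gain_code;

    if (mag > (int32_t)PWM_TOP) {
        mag = (int32_t)PWM_TOP;
    }
    if (gain_code > 0) {
        out.boost_duty = (uint16_t)mag;
    } else if (gain_code < 0) {
        out.atten_duty = (uint16_t)mag;
    }
    return out;
}
```

With the shifted range, boost gets 127 steps and attenuation 128. With the split version each side gets 255 steps, at the cost of a second switch and the timing care mentioned above.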
Getting back to the firmware, I guess I'd go for C with nothing fancy to do all the math (processing, curves, shapes, feedback, etc.) and interrupts to manage the PWM, without using libraries, as you may not always know when they are going to bite you. Also be careful about how you drive the PWM output: the µC may have dedicated pins for the timers, and using them could be much better than relying on an ISR to update the output.
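One way to split the work, sketched in plain C so it can be tried on a PC: the main loop does the math and posts a new duty value, and the timer ISR does nothing except latch it into the compare register, leaving the pin toggling to the timer hardware. Everything here is hypothetical: fake_compare_reg stands in for whatever output-compare register your part has, and the function names are made up.

```c
#include <stdint.h>

/* Stand-in for the timer's output-compare register (hypothetical).
   On real hardware this would be the register named in the datasheet. */
static volatile uint16_t fake_compare_reg;

/* Mailbox between the main loop and the ISR. */
static volatile uint16_t pending_duty;
static volatile uint8_t  duty_ready;

/* Called from the main loop once the gain math for this block is done. */
static void pwm_request_duty(uint16_t duty)
{
    pending_duty = duty;
    duty_ready = 1u;  /* set the flag last, after the value is written */
}

/* Body of the timer-overflow ISR: it only copies the pending value, so the
   new duty takes effect on a period boundary and the ISR stays very short. */
static void pwm_overflow_isr_body(void)
{
    if (duty_ready) {
        fake_compare_reg = pending_duty;
        duty_ready = 0u;
    }
}
```

On an 8-bit part a 16-bit write is not atomic, so a real version would likely need to disable interrupts briefly around the update in pwm_request_duty, or use an 8-bit duty value.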
Fast attack times (for a brick-wall function) could be done in some clever way. I don't remember about the M3, but the 328 has a comparator which can trigger an ISR. You could set the comparator to a level (from a slow, well-filtered PWM output if needed), automatically activate the attenuation, and then you have something like 200 clock cycles to do the math and switch it again when needed. (The 200 cycles are based on an 80 MHz clock, 100 kHz PWM, and 2.5 dB minimum attenuation for 1 PWM cycle once over the comparator level.) With this approach you would be responding instantly to the transient, as you have the settling time of your filter to start reacting (from detection in the comparator to action in the ISR takes a few clock cycles, still faster than the filter) and then the partial PWM cycle to resolve how bad it was.
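The cycle budget can be checked with a few lines of C. This is only one reading of the numbers: it assumes the attenuating state fully cuts the signal, so spending 25% of a PWM period in that state gives an average gain of 0.75, which is about 2.5 dB of attenuation (20*log10(0.75) is roughly -2.5). The function names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* CPU clocks in one PWM period. */
static uint32_t clocks_per_pwm_period(uint32_t cpu_hz, uint32_t pwm_hz)
{
    assert(pwm_hz > 0u);
    return cpu_hz / pwm_hz;
}

/* CPU clocks in the part of a period spent in the attenuating state.
   Assumption: attenuation is held for duty_percent of the period. */
static uint32_t clocks_for_duty(uint32_t period_clocks, uint32_t duty_percent)
{
    assert(duty_percent <= 100u);
    return (uint32_t)(((uint64_t)period_clocks * duty_percent) / 100u);
}
```

With an 80 MHz clock and 100 kHz PWM, clocks_per_pwm_period(80000000, 100000) gives 800 clocks per period, and clocks_for_duty(800, 25) gives the 200 clocks available before the switch has to change again.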
JS