Signal Processor

dale116dot7 · Feb 3, 2008

I'm thinking about designing a multi-effects box, and I'm trying to decide what to use for signal processing. Here's my short list:

AT91 microcontroller for the host - not the signal processor but to run the VFD (I like those better than LCD) and the user interface. I'm pretty familiar with it. You don't need a special programmer - just use the USB port.

TMS320VC33 - reasonably simple to hook up, pretty high speed. A bit shy on I/O but I could make it work. Can access a lot of RAM. I would probably give it 1Meg by 32 bit fast SRAM. Maybe leave a spot for another meg. Over 100 MIPS. Free assembler available. That's worth a lot, considering that most other DSPs do not have development tools for less than $4000.

DSP56366 - nice audio DSP, a bit low on addressable memory, however. I'd prefer not to need paging but I'd do it if I had to. Free development tools available, and better I/O.

Wavefront AL3201 + AL3101 set. I've written some code for the AL3201 (reverb engine) and it works. The reverb engine has about the same performance and memory as a PCM70. Actually it is slightly better because the multiplier is seven bit plus sign and not three bit plus sign so you don't need double-precision very often. The downside is that the memory is fixed at 32k samples, and if you want to use two or more cores (like the 480L) you can't communicate between cores easily - there's only a stereo PCM port in and out. Well, there's a way but it's so much hassle.
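
To see why the wider coefficient matters, here is a minimal sketch that rounds a gain to a signed fixed-point value. It assumes that "N bit plus sign" means a sign bit plus N fractional bits covering [-1, 1); the helper name quantize_coeff is hypothetical and is not part of any Wavefront tool.

```c
/* Hypothetical helper: round a gain to the nearest signed fixed-point value
 * with `frac_bits` fractional bits (sign + frac_bits), saturating at the
 * ends of the range [-1, 1 - 2^-frac_bits]. Assumes this is what
 * "N bit plus sign" means for the coefficient path. */
double quantize_coeff(double c, int frac_bits)
{
    double scale = (double)(1L << frac_bits);              /* steps per unit */
    long   q     = (long)(c * scale + (c >= 0.0 ? 0.5 : -0.5)); /* round half away from zero */
    long   max_q = (1L << frac_bits) - 1;                   /* largest positive code */
    long   min_q = -(1L << frac_bits);                      /* code for -1.0 */

    if (q > max_q) q = max_q;
    if (q < min_q) q = min_q;
    return (double)q / scale;
}
```

For a gain of 0.564, three fraction bits give 0.625 (an error of about 0.06), while seven give 0.5625 (about 0.0015). With only 1/8 steps, landing close to an arbitrary gain means splitting it across extra instructions, which is presumably where the double-precision tricks come in.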

FPGA - Xilinx or Atmel or something like that. A way to get the I/O and processing the way I want it, but a lot more work. With a big enough FPGA, the host processor and the DSP can both be soft cores - or the DSP could be a 'custom' one designed specifically for effects. I'm a bit scared of VHDL, however. The last time I worked with any kind of programmable logic I was just doing GAL20V8s and 22V10s and used CUPL.

Ideas?

Rochey · Feb 3, 2008

replace that VC33 with a TMS320C6727

C programmable, floating point - and now being designed in by some of the major big boys in Pro Audio. The Eval board comes with a 100dB codec. Should be enough to get started.

/R

dale116dot7 · Feb 3, 2008

I wanted to use it. $4500 for the C compiler (other than the 30-day evaluation version) was a bit steep. Also, the BGA package is pretty much a non-starter for DIY builds.

-Dale

playboss · Feb 4, 2008

There is also the AVR32; the devkit is $70 USD, and that looks good for portable use. For rackmount I'd go with an FPGA. You could also do the decimation/oversampling inside it, so it's guaranteed to sound genuine. And there is DSD:
http://www.emmlabs.com/pdf/papers/DerkSigmaDelta.pdf

keefaz · Feb 4, 2008

There is an AVR DIY USB devkit for $22 here. I use it with Linux, but it should work fine with Windows.

dale116dot7 · Feb 4, 2008

I looked at the various TI DSPs of the C672x family, and the smaller C6722 or C6726 would be manageable in a 144-QFP, with the limitation of a 16-bit data bus to memory. For most effects I doubt this would be a bottleneck. I'm comfortable with QFPs, but prototyping with a 1mm BGA is a bit scary to me. I've seen the toaster oven method of doing it, or maybe I can find a place with an oven that I can just run one or two prototype boards through for a case of Guinness.

jdbakker · Feb 4, 2008

More options:

- The Analog Devices Blackfin. Available in QFP, very simple booting (can use an AVR to load code over SPI), and has a free gcc C-compiler port. Hardware is optimized for 16x16 ops (video), but the compiler generates the proper instructions for greater word lengths. Has a ~$150 dev kit available with glueless interfacing to most I2S codecs, and I believe I have even seen working 2-layer board designs floating around.

- Atmel AT91RM9200/AT91SAM9260. ARM9s, so lots of free tools available, and pretty fast 32-bit MUL/MACs. 208-QFP. Easy interfacing to most codecs.

Disclaimer: these chips work best with fixed-point math (this is what I use pretty much exclusively).

JDB.

dale116dot7 · Feb 4, 2008

That Atmel part looks nice! In terms of MAC throughput it is quite a bit slower than the TI DSP (about 4 or 5 times), but probably OK for anything I want to do for now. It looks like doing maybe a few hundred MACs per sample (at 48k) would be OK. I've used the AT91SAM7XC256 parts and they were quite good. I got GCC going and the USB SAM-BA loader working within a day of getting everything together. Another advantage is that I would not need a separate host processor. I'd need a fast reverb code chunk (probably triggered by an interrupt) that would process each sample, with the host management and USB and/or MIDI port handling running in the background. Both the TI DSP and the ARM can interface to SDRAM, which is nice. In terms of raw MAC power, the TI part is going to have 4 or 5 times as much, which would be necessary for convolution. I'm going more for chorus, multitaps, general sound warping - that kind of thing. Maybe some verbs.

What about designing the board for the Atmel part and an optional spot for the 144-pin version of the TI C6722 / C6726 part? If the Atmel part does everything I want, then great. If I want (or need) more 'guts' then solder down the TI part? Another option would be to design for two of the Atmel parts, which could keep the code manageable. That would be kind of like the Lex PCM90 that used two cores - or the 480L that used four or six cores - each core is the equivalent of a PCM70 and can run 128 instructions (MAC or data movement) per sample.

I like having the USB port, MMC card, and CODEC interfaces available.

If this works out, maybe I could offer a kit or PC board to people, or offer up the gerbers and source code to some other member to take care of manufacturing? I'm building it for myself, mostly, but I don't mind sharing. BTW it will be a four-layer board. I don't want the EMI hassles of designing a board with low-level analogue and high-speed digital without access to a good ground plane structure. Also, SDRAM signals are pretty quick.

I do need to look at the addressing modes available on the ARM. I would like to be able to define a large round-robin (circular) array in RAM. One way of doing that is to embrace memory 'mirroring' - an inconvenience sometimes, but a godsend other times. This is one of the latter. With that you just need to do this:

yout  = 0.050f * predlyl[427 + a];
yout += 0.223f * rvba[441 + a];
yout += 0.564f * rvbc[851 + a];
...
a++;
if (a >= DLYMEMSIZE) a = 0;   /* wrap only the base offset, once per sample */

With memory mirroring - using a 4 Meg RAM in an 8 Meg space - the address wraps around automagically, so you don't need to wrap each sample's address individually or spend any extra addressing time. TI DSPs have an auto-wrap (circular addressing) mode, but I was trying to figure out how to do this on any processor without that feature.

-Dale
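
On a processor with neither circular addressing nor a mirrored memory map, the usual software substitute is to make the delay memory a power of two in size and mask each computed address, which costs one AND per access. Below is a minimal sketch assuming a single shared sample memory with each delay line at a fixed offset from a moving base, in the spirit of the snippet in this post. DLY_BITS, the offsets, and the gains are illustrative only, and the base is decremented (rather than incremented) so that a larger offset means an older sample.

```c
#include <stdint.h>

#define DLY_BITS 16u                    /* assumed: 64k-sample delay memory */
#define DLY_SIZE (1u << DLY_BITS)
#define DLY_MASK (DLY_SIZE - 1u)        /* wrap by masking instead of comparing */

static float    dly_mem[DLY_SIZE];      /* one shared circular sample memory */
static uint32_t base;                   /* moves by one sample per call */

/* Read the sample written `offset` calls ago (offset 0 = current input). */
static inline float dly_read(uint32_t offset)
{
    return dly_mem[(base + offset) & DLY_MASK];
}

/* Write at `offset` from the current base. */
static inline void dly_write(uint32_t offset, float x)
{
    dly_mem[(base + offset) & DLY_MASK] = x;
}

/* Hypothetical one-sample tap sum using masked addressing. */
float process_sample(float in)
{
    dly_write(0, in);
    float yout  = 0.050f * dly_read(427);
    yout       += 0.223f * dly_read(441);
    yout       += 0.564f * dly_read(851);
    base = (base - 1u) & DLY_MASK;      /* step back so older samples sit at larger offsets */
    return yout;
}
```

The trade-off is that the delay memory has to be a power of two; a 4 Meg memory just uses a wider mask.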

JohnRoberts · Feb 4, 2008

It seems most DSP manufacturers should offer a development board, just to help junior engineers get their feet wet with a given processor family. I bought a low-end Microchip DSP board for a non-audio project I'm working on.

If your favorite processor doesn't already offer such a board, maybe you can approach the manufacturer about partnering on one? They could subsidize some of your low-volume run cost in return for development boards they can offer to promote their processor to others.

JR

jdbakker · Feb 4, 2008

[quote author="dale116dot7"]I do need to look at the addressing modes available on the ARM. I would like something that I can define a large round-robin array of RAM. <snip>[/quote]
How many coefficients do you have?
How many previous samples do you need to refer to?
How sequential (vs random) is your addressing of these samples/coefficients? Any tricks possible to make the addressing more sequential?

If you need to work on a data set that's much larger than cache/internal SRAM, MACs/sec don't matter -- you'll be bound by the access time of your memory subsystem. On SDRAM, random access latency for back-to-back single word reads (word width == SDRAM bus width) is six cycles on a 100MHz memory bus. So no matter how fast your processor is, on random access you can never do better than ~16 MMACs/sec if you need to get one parameter from external RAM, or ~8 MMACs/sec if both parameters need to be brought in.
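
The arithmetic in this paragraph reduces to one line: if every MAC stalls on `reads_per_mac` random reads of `latency_cycles` memory clocks each, the memory clock divided by the total stall is the ceiling. A minimal sketch (the function name is hypothetical; the 100 MHz and six-cycle figures are the ones given in this post):

```c
/* Upper bound on MACs per second when every MAC waits on external reads.
 * mem_clock_hz:   memory bus clock (e.g. 100e6)
 * latency_cycles: bus cycles per random single-word read (e.g. 6)
 * reads_per_mac:  operands fetched from external RAM per MAC (1 or 2) */
double max_macs_per_sec(double mem_clock_hz, int latency_cycles, int reads_per_mac)
{
    return mem_clock_hz / (double)(latency_cycles * reads_per_mac);
}
```

With 100 MHz and six cycles this gives about 16.7 MMACs/s for one external operand and about 8.3 MMACs/s for two, matching the figures above.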

So, in order from cheap to expensive:

- get an algorithm that fits inside the processor's zero-latency memory
- rewrite your algo for sequential access. Sequential access can be much faster, down to 2 mem clocks for the first word in a burst / 1 mem clock for subsequent words. For bonus points some processors allow you to preload lines, hiding part of this latency.
- use faster external memory, such as SyncSRAM. Most of these will still limit you to a 2 mem clock minimum latency, though.
- get a part with large enough internal memory. Can get quite expensive.
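
As a rough illustration of the "rewrite for sequential access" option, the sketch below processes a simple feedback comb filter in blocks: each block fetches its delayed samples as one contiguous run (at most two, at the buffer wrap) and writes its results back the same way. It assumes the delay is at least one block long; the memcpy calls merely stand in for SDRAM burst transfers, and all names and sizes are hypothetical.

```c
#include <string.h>

#define EXT_SIZE 8192u   /* assumed external delay memory size, in samples */
#define BLOCK    32u     /* samples processed per burst */

static float    ext_mem[EXT_SIZE];  /* stands in for slow external SDRAM */
static unsigned wr;                 /* where the next output sample is stored */

/* Fetch n samples starting at circular index `start` as at most two
 * contiguous runs (split only at the end of the buffer). */
static void burst_read(float *dst, unsigned start, unsigned n)
{
    unsigned first = EXT_SIZE - start;
    if (first > n)
        first = n;
    memcpy(dst, &ext_mem[start], first * sizeof *dst);
    memcpy(dst + first, &ext_mem[0], (n - first) * sizeof *dst);
}

/* Store n samples starting at circular index `start`, same split rule. */
static void burst_write(const float *src, unsigned start, unsigned n)
{
    unsigned first = EXT_SIZE - start;
    if (first > n)
        first = n;
    memcpy(&ext_mem[start], src, first * sizeof *src);
    memcpy(&ext_mem[0], src + first, (n - first) * sizeof *src);
}

/* Feedback comb y[n] = x[n] + g*y[n-delay], computed BLOCK samples at a time.
 * Requires BLOCK <= delay <= EXT_SIZE so every y[n-delay] needed here was
 * stored by an earlier block. */
void comb_block(const float in[BLOCK], float out[BLOCK], unsigned delay, float g)
{
    float past[BLOCK];                                  /* fast local buffer */
    unsigned rd = (wr + EXT_SIZE - delay) % EXT_SIZE;   /* y[n-delay] for i = 0 */

    burst_read(past, rd, BLOCK);
    for (unsigned i = 0; i < BLOCK; i++)
        out[i] = in[i] + g * past[i];
    burst_write(out, wr, BLOCK);
    wr = (wr + BLOCK) % EXT_SIZE;
}
```

This only pays off when the loop delays are at least one block long; feedback loops shorter than a block still need per-sample processing.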

Don't worry about the ARM's addressing modes. They are quite versatile; what hurts is that you need to load all parameters into a register before you can do operations on them. Fortunately instruction timing is very predictable and it's doable to hide load/store times during the execution of a MAC.

JDB.
[missing the StrongARM, still not understanding how Intel could screw that chip line up so badly]

dale116dot7 · Feb 4, 2008

Multitap delays are actually horrible that way. They will always be non-cached accesses randomly throughout memory. At best you can do a back-to-back read and write - read from the tail of one delay line then write to the head of the next. Fast SRAM is much nicer in terms of verbs and multitaps. Coefficients are generally ok to put in internal RAM. Actually, the addresses in both a 'verb and a multitap, or chorus, or any other of those effects are pretty close to random and spread throughout sample memory.

Looking at a 'typical' verb, you're probably looking at 200 or so MACs per sample. If I need some additional processing for other stuff, that's not going to be easy - but if I used fast SRAM instead (15 ns or so), that might work. A bit expensive, but faster. For reference, a PCM91 can do about 12 MMACs per second.

-Dale
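
A quick budget check helps when comparing figures like these. The sketch below (hypothetical helper names) converts between MACs per sample and MACs per second. At 48 kHz, a 200-MAC verb needs 9.6 MMACs/s, and a 12 MMACs/s machine has about 250 MACs per sample to spend; the PCM91's actual sample rate is not stated in the thread, so 48 kHz is an assumption here.

```c
/* MACs per second needed to run `macs_per_sample` operations at sample rate fs. */
double required_macs_per_sec(double macs_per_sample, double fs)
{
    return macs_per_sample * fs;
}

/* MACs available per sample for a machine rated at `macs_per_sec`. */
double macs_per_sample_budget(double macs_per_sec, double fs)
{
    return macs_per_sec / fs;
}
```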

dale116dot7 · Feb 27, 2008

I think I've figured out what I am going to do. After talking to a DSP guy in town, he convinced me to stay away from the latest TI architecture for a hobby sort of project. Anything with more than ten stages in its pipeline is a bit crazy to debug. He's worked with the various C5x and C6x parts and thought that something a lot simpler would be better for me to use.

The first of two effects boxes I have decided on is to use the Wavefront AL3201 reverb chip, coupled with a Freescale 9S08AW60 host processor. The end result should be something PCM-70-ish, and will be quite quick to implement. It can also be done on a double-sided PC board. I just finished an engine management system using the 9S08AW60 so I am very familiar with it. I expect to have it running at least one reverb algorithm within a few weeks. It should have no trouble doing multivoice chorus, odd resonators, MIDI control, and a room/plate/chamber/inverse reverb.

If that box goes well, I have pretty much decided on a pair of Freescale DSP56366's with a 256k SRAM each, and a bus transfer register in between them for sharing data back and forth. I think synchronizing the code between the two processors would be pretty easy. I was thinking of bus transfer registers but perhaps it would be better to just plop down a dual-port RAM there.

Quite possibly I might add a shared DRAM or SDRAM section that either can access. I would want to pipeline the controller so the DSP would give an address and data to read/write, then get the result some time later. Latency on SDRAM access is pretty bad for sample-based algorithms. I saw the Lexicon Vortex do something like this. This is obviously a longer-term, more ambitious project.

For host duties, I need to look at that a bit closer. I would prefer a 3.3V part, since the DSPs are all 3.3V. Should be pretty easy. I want a processor with an external memory port for reasonably efficient access to the DSPs.

- The DSP56366 looks to be a lot easier to program using low-cost development tools (gcc), though obviously with less performance than the TI chips.
- I am already very familiar with Freescale parts so the learning curve for other parts would be much less, I think.

ruairioflaherty · Feb 27, 2008

Hey Dale,

I'm fascinated by this thread and the previous 480 repair thread. I'm a total beginner but I've put all my energy into understanding regular sandstate and IC based stuff. I have zero understanding of tube or dsp based stuff but find it very interesting to watch/learn/listen from the sidelines.

At the risk of derailing your discussion can I ask how you approach writing the reverb algorithm? When you've settled on an architecture and you sit down to code - what is your approach? Can you "borrow" algorithms from commercially available boxes? Do you even want to?

Any thoughts appreciated. I recently used a 480 for the first time and had a life changing audio experience. Beautifully restored Steinway grand + pair of Neumann TLM 171 in omni + GML pre + 480L + fantastic jazz pianist = pure heaven. The single best sound I've ever recorded in my 15 year career. Sorry I'm rambling...

All the best,
Ruairi

dale116dot7 · Feb 27, 2008

If you start with Dattorro's (in)famous article on effect design, you can see a Lexicon plate structure. There are a lot of published algorithms out there, but in terms of sound for the processing spent, that structure is efficient and sounds good, so it's a good start. Then you can play with the algorithm, adding taps and diffusors, and that kind of thing. If you build the structure so that the taps, delay line lengths, and coefficients are parameters, you can adjust them easily from the host processor and front panel.
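
For readers new to this, the diffusers in that kind of plate structure are built from Schroeder allpass sections. The sketch below is one generic single-delay-line form of the allpass, not code taken from Dattorro's article or from any Lexicon product; the struct and function names are hypothetical.

```c
/* Hypothetical Schroeder allpass: H(z) = (-g + z^-D) / (1 - g z^-D).
 * buf must hold D = len samples, zero-initialised. */
typedef struct {
    float   *buf;   /* circular delay memory holding w[n-D] .. w[n-1] */
    unsigned len;   /* delay length D in samples */
    unsigned pos;   /* oldest entry, i.e. w[n-D] */
    float    g;     /* allpass coefficient, |g| < 1 */
} allpass_t;

float allpass_process(allpass_t *ap, float x)
{
    float delayed = ap->buf[ap->pos];      /* w[n-D] */
    float w = x + ap->g * delayed;         /* w[n] = x[n] + g*w[n-D] */
    float y = delayed - ap->g * w;         /* y[n] = w[n-D] - g*w[n] */
    ap->buf[ap->pos] = w;
    if (++ap->pos == ap->len)
        ap->pos = 0;
    return y;
}
```

Chaining several of these with different, mutually prime delay lengths is the usual way to smear an input into a dense diffuse signal before it enters the reverb tank.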

For other effects, you can often read the settings and figure out how it works. That is certainly the case for, say, the PCM70 'Chorus Echo' algorithm, or 'Resonant Chords'. You can take the PCM80 manual, and they give very detailed signal flows for everything except their reverbs.
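
Chorus-type effects generally rest on a delay line whose read point is swept by a low-frequency oscillator. The sketch below shows one voice with a triangle LFO and linear interpolation between adjacent samples; it is a generic textbook technique, not a reconstruction of the PCM70 'Chorus Echo' algorithm, and all names and parameters are hypothetical.

```c
/* Hypothetical single chorus voice: an LFO-swept, linearly interpolated delay tap.
 * Caller must keep depth <= center and center + depth + 1 < len. */
typedef struct {
    float   *buf;     /* circular delay memory, len samples, zero-initialised */
    unsigned len;
    unsigned wr;      /* index where the current input sample is written */
    float    phase;   /* triangle LFO phase in [0, 1) */
    float    rate;    /* LFO phase increment per sample (LFO Hz / fs) */
    float    center;  /* mean delay in samples */
    float    depth;   /* peak deviation from center, in samples */
} chorus_voice_t;

float chorus_process(chorus_voice_t *v, float x)
{
    v->buf[v->wr] = x;

    /* Triangle LFO swinging between -1 and +1. */
    float tri = (v->phase < 0.5f) ? 4.0f * v->phase - 1.0f
                                  : 3.0f - 4.0f * v->phase;
    v->phase += v->rate;
    if (v->phase >= 1.0f)
        v->phase -= 1.0f;

    /* Split the fractional delay into integer and fractional parts. */
    float    d    = v->center + v->depth * tri;
    unsigned di   = (unsigned)d;
    float    frac = d - (float)di;

    unsigned i0 = (v->wr + v->len - di) % v->len;   /* x[n - di]     */
    unsigned i1 = (i0 + v->len - 1u) % v->len;      /* x[n - di - 1] */

    /* Linear interpolation between the two neighbouring samples. */
    float y = v->buf[i0] + frac * (v->buf[i1] - v->buf[i0]);

    v->wr = (v->wr + 1u) % v->len;
    return y;
}
```

Several voices with different LFO phases or rates, summed together, give a basic multivoice chorus.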

But my feeling is that if I am taking this approach, that's fine but don't sell it, and don't show the code to anyone.

mhelin · Feb 29, 2008

Freescale Soundbite (uses Symphony™ DSP56371) is a nice kit if you want to play with some DSP stuff:
http://www.freescale.com/webapp/sps/site/prod_summary.jsp?code=SYMP_SOUNDBITE&fsrch=1

It's a 180 MIPS device, but with the built-in EFCOP (filter co-processor) it's almost like a 360 MIPS device.

dale116dot7 · Feb 29, 2008

The part looks pretty nice but it has no external memory interface. That's why I like the 56366 - it has an SRAM interface. Right now I'm doing a preliminary design of the memory interface for the 56366. The local SRAM is no problem, but shared memory and processor interlocking are always a bit of a hassle if you want the capability of multiple processors.

-Dale

Guest · Feb 29, 2008

[quote author="mhelin"]Freescale Soundbite (uses Symphony™ DSP56371) is a nice kit if you want to play with some DSP stuff:
http://www.freescale.com/webapp/sps/site/prod_summary.jsp?code=SYMP_SOUNDBITE&fsrch=1

It's a 180 MIPS device but with the built-in EFCOP (filter co-processor) its almost like a 360 MIPS device.[/quote]

It is $150 for dialers (hardware only).
How much for DIY with all needed tools to program and debug?
