Yes, the A/D and D/A chips will each have roughly 2 ms of delay in the decimation/sigma-delta process (it might be different for DSD in a continuous converter design?), so if one were to monitor by piping the I2S digital data on the PCB directly from the A/D chip to the D/A chip, there could still be a round trip delay of about 4 ms....
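To put numbers on that, here is a minimal sketch of the arithmetic. The 2 ms figures are just my rough guesses from above, not datasheet values, and the function names and the 48 kHz rate are made up for illustration.

```python
# Rough figures from the post, NOT datasheet values.
ADC_DELAY_MS = 2.0  # assumed A/D decimation filter delay
DAC_DELAY_MS = 2.0  # assumed D/A interpolation filter delay


def round_trip_ms(adc_ms: float, dac_ms: float) -> float:
    """Delay of a direct A/D -> I2S -> D/A monitoring path, in milliseconds."""
    return adc_ms + dac_ms


def ms_to_samples(delay_ms: float, sample_rate_hz: int) -> int:
    """Convert a delay in milliseconds to a whole number of samples."""
    return round(delay_ms * sample_rate_hz / 1000)


total_ms = round_trip_ms(ADC_DELAY_MS, DAC_DELAY_MS)
total_samples = ms_to_samples(total_ms, 48_000)  # 48 kHz is an assumed rate
print(f"round trip: {total_ms} ms = {total_samples} samples at 48 kHz")
```

Swap in the group delay figures from the actual converter datasheets if you have them.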
This is why I just use an external mixer and kibosh the entire idea of digital monitoring through the DAW (whether FW, USB, or other).... I allow the DAW to line stuff up for me... And the DAW is like a mushroom, just accepting whatever is streaming into it for recording, while dumbly replaying output for playback...
I have found that changing the buffer settings can lead to overflow or underflow, as you are probably aware.
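The trade-off is easy to see with a bit of arithmetic: one buffer of N frames holds N / sample rate seconds of audio, which is both the latency it adds and the time the driver has to refill it before an underflow. A minimal sketch, with the buffer sizes and the 48 kHz rate picked as assumptions for illustration:

```python
def buffer_latency_ms(frames: int, sample_rate_hz: int) -> float:
    """Time covered by one buffer of `frames` sample frames, in milliseconds."""
    return 1000.0 * frames / sample_rate_hz


SAMPLE_RATE_HZ = 48_000  # assumed session sample rate

# Typical power-of-two buffer sizes offered by driver control panels (assumed).
for frames in (64, 128, 256, 512, 1024):
    ms = buffer_latency_ms(frames, SAMPLE_RATE_HZ)
    print(f"{frames:5d} frames -> {ms:6.2f} ms per buffer")
```

Smaller buffers mean less delay but less slack for the OS to service the driver in time; bigger buffers are the reverse.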
You may be on a Mac / Unix, but what I know from Windows is that Microsoft has stated that their OS products are not for real-time use (though they can do well). Windows uses Deferred Procedure Calls to handle interrupts, perhaps with 55 ms latency (in the old days, perhaps better now), and sometimes goes through a hardware abstraction layer (HAL)... DirectX was supposed to bypass this, and now there is WDM etc. running with ASIO (in the Steinberg case)...
I am not sure about the Mac/Unix side, although knowing enough to be dangerous (or stupid?) about kernel block versus character drivers in /dev for embedded uClinux makes me suspect that spending dollars on some new hardware may help, but may not be the answer... I dunno if "nice'ing" processes is the answer either...
I also remember Pro Tools having issues with the USB ports on one physical side of the MacBook Pro but not the other... (one side had a single USB port -- perhaps a different chipset, whilst the other side had two ports)... I think they fixed that, but it does not give me the warm fuzzies that they are using the OS-sanctioned drivers... Perhaps this was due to their hardware locking of the MBox to the software and may not apply in this case?
As far as USB choking goes... one could try USB sniffers (like usbmon, or Wireshark?) or check whether there are any system error logs etc.?
Delay compensation is part of the overall monitoring latency... By the time all the delay compensation gets down to the mix bus, the monitoring latency is set by the largest delay in any path... This assumes that incoming audio is going through the mixer for monitoring?
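As a rough illustration of why the slowest path wins: to keep everything lined up at the mix bus, each path gets padded up to the longest one, so the whole mix comes out that late. This is only a sketch of the general idea, not how any particular DAW does it, and the track names and sample counts are hypothetical.

```python
def compensate(path_delays: dict) -> tuple:
    """Return (mix bus latency, padding per path), all in samples.

    Each path is padded so that every path arrives at the mix bus
    with the same total delay as the slowest one.
    """
    longest = max(path_delays.values())
    padding = {name: longest - delay for name, delay in path_delays.items()}
    return longest, padding


# Hypothetical plugin delays per track, in samples.
paths = {"vocal": 0, "guitar": 64, "drums": 1024}

bus_latency, padding = compensate(paths)
print(f"mix bus latency: {bus_latency} samples")
for name, pad in padding.items():
    print(f"  {name}: pad by {pad} samples")
```

So one heavy look-ahead plugin on a single track is enough to delay everything you hear through the mixer.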
The slapback echo sounds like there is something else "mixed" in, just like wet versus dry on a delay/reverb unit?
The CueMix stuff, if I can imagine it, is probably done locally on the PCB as a digital mixer in an FPGA or ASIC (e.g., VIA chipsets like the one in the M-Audio Delta 1010; see the block diagram at http://www.via.com.tw/en/products/audio/usb/vt1730/index.jsp)... It still has some latency, but not as bad as getting an OS in the way with a round trip...