> I can't say I've gone off to study the Early effect, probably should.
Nah. Fancy name for something you already know: even with current-source loads, the voltage gain of a transistor is never infinite. Or: a BJT has an "Amplification Factor" and a "Plate resistance", just like a triode.
In a 12AU7, if changing the grid-cathode voltage does a certain thing to the current, then changing the plate-cathode voltage 20 times more does the same thing to the current. Or if you force a constant current through it and change the G-K voltage 1V, the P-K voltage has to change 20 times more in the other direction. It has a Mu (amplification factor) of 20, so the maximum voltage gain is 20. The plate resistance turns out to be 20 times the cathode impedance, which is 1/Gm.
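A minimal Python sketch of that triode arithmetic, assuming the textbook small-signal relations r_p = Mu/Gm and gain = Mu·R_L/(r_p + R_L) for a grounded-cathode stage. The Gm value is a hypothetical round number, not a datasheet figure.

```python
def plate_resistance(mu: float, gm: float) -> float:
    """r_p = mu / gm: the plate resistance is mu times the 1/gm cathode impedance."""
    return mu / gm


def stage_gain(mu: float, gm: float, r_load: float) -> float:
    """Small-signal voltage gain of a grounded-cathode stage with plate load r_load."""
    rp = plate_resistance(mu, gm)
    return mu * r_load / (rp + r_load)


if __name__ == "__main__":
    MU = 20.0    # 12AU7-style amplification factor, as in the text
    GM = 2.2e-3  # HYPOTHETICAL transconductance in A/V, for illustration only
    print(f"r_p = {plate_resistance(MU, GM):.0f} ohms")
    # The gain creeps toward Mu as the load grows, but never exceeds it.
    for r_load in (10e3, 100e3, 1e6, 1e9):
        print(f"R_L = {r_load:>13,.0f} ohms -> gain {stage_gain(MU, GM, r_load):.2f}")
```

Pushing R_L from 10K to 1G only moves the gain from about 10.5 to just under 20; that ceiling is Mu.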
A 12AX7 is the same except the factor is 100.
A pentode also has this, but the factor is 200-1,000, so in practice you cannot find a load high enough to reach the theoretical maximum gain (though AM radio IF tanks come close).
Also, in a 12AU7 the factor is pretty constant, 17 to 20, over nearly the full range of current it will work at. Down at a few microamps it falls off, but in audio we can't go there because of capacitive loading. In a pentode it varies considerably more, with current and also with minor variations of manufacture. That hardly matters, because we can't "see" the amplification factor except with precision tests; it never affects practical work.
Same thing in BJTs and JFETs, at least above the "triode range" (about 0.2V collector-emitter on a BJT, 2-3V drain-source on a JFET). If you do precision tests at constant current, the gain is finite.
A man named Early developed the theoretical explanation of why this happens. The collector voltage widens or narrows the depletion layer at the collector-base junction, which changes the effective width of the base and so the way the holes and electrons lie around. As far as I've ever been able to figure out, all this hole/electron BS is voodoo magic, of no use to us who stuff pre-made transistors together in various ways.
However, in a BJT circuit it is possible to get a good-enough current source and load buffer to show the "amplification factor", as in your design: "the diff pair is working into current mirrors, which makes the collectors essentially infinitely high impedance". To a first approximation, and assuming lots of buffering, the stage voltage gain "should" be infinite. With real devices, Mu is 1000-2000, and this type of scheme will show a voltage gain no higher than 300-1000 (because there are three collector-base junctions on the node). The wild voltage swing on the collector makes the base junction effectively bigger and smaller, counteracting the base drive voltage. It is like having a 10 ohm resistor in series with the base and a 10K resistor from collector to base: the real base sees 1:1000 feedback from its collector.
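The "Mu of 1000-2000" figure follows from a standard small-signal identity, sketched below in Python: Gm = I_C/V_T and r_o = V_A/I_C, so Gm·r_o = V_A/V_T regardless of current. The Early voltages (25V and 50V) and the 1mA bias are assumed values, and modelling the node as nothing but the r_o of each collector tied to it is a simplification.

```python
V_T = 0.025  # thermal voltage kT/q near room temperature, volts (approximate)


def intrinsic_gain(v_early: float) -> float:
    """BJT 'Mu' = gm * ro = (Ic / V_T) * (V_A / Ic) = V_A / V_T."""
    return v_early / V_T


def parallel(*resistances: float) -> float:
    """Equivalent resistance of resistors in parallel."""
    return 1.0 / sum(1.0 / r for r in resistances)


def node_gain(ic: float, early_voltages: list[float]) -> float:
    """Gain of one device's gm into a node loaded only by the ro of each collector on it."""
    gm = ic / V_T
    ros = [va / ic for va in early_voltages]
    return gm * parallel(*ros)


def feedback_fraction(r_series: float, r_cb: float) -> float:
    """Divider seen by the internal base: series base R against collector-to-base R."""
    return r_series / (r_series + r_cb)


if __name__ == "__main__":
    for va in (25.0, 50.0):  # ASSUMED Early voltages
        print(f"V_A = {va:.0f} V: Mu = {intrinsic_gain(va):.0f}, "
              f"three-collector node gain = {node_gain(1e-3, [va] * 3):.0f}")
    # The 10 ohm / 10K analogy from the text: about 1/1000 of the collector swing.
    print(f"feedback fraction = 1/{1 / feedback_fraction(10.0, 10e3):.0f}")
```

With equal Early voltages, three collectors on the node cut the gain to Mu/3, roughly 330-670, which lands inside the 300-1000 range quoted above.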
In voltage-amplifier stages, it helps to keep the base drive impedance very low (don't make the series base resistance any bigger than you can avoid). Q108 keeps Q107's base drive impedance low, so Q107 can maybe make a voltage gain of 500 (modified by the compensation cap). Q108 sits at a nearly constant voltage, so its Early effect has little influence on its base current or on the input pair. (A Darlington for Q107 would be worse than your split-Darlington.)
In current sources, Early effect means you may get 1.000mA at 1V, 1.010mA at 10V, and 1.1mA at 100V. That isn't constant! It can be even worse with some topologies.
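Those three readings fit a straight line surprisingly well. The Python sketch below assumes a purely linear (constant output resistance) model, which the paragraphs below suggest is only approximate, backs the source's output resistance out of the first two points, and extrapolates to 100V.

```python
def output_resistance(v1: float, i1: float, v2: float, i2: float) -> float:
    """Small-signal output resistance from two (voltage, current) readings."""
    return (v2 - v1) / (i2 - i1)


def predict_current(v: float, v_ref: float, i_ref: float, r_out: float) -> float:
    """Current at voltage v, assuming the current rises linearly with voltage."""
    return i_ref + (v - v_ref) / r_out


if __name__ == "__main__":
    r_out = output_resistance(1.0, 1.000e-3, 10.0, 1.010e-3)
    print(f"R_out = {r_out / 1e3:.0f} kohm")
    i_100 = predict_current(100.0, 1.0, 1.000e-3, r_out)
    print(f"I at 100 V = {i_100 * 1e3:.3f} mA")
```

900K sounds like a lot, yet it still lets the current drift about 11% over a 100V swing.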
Cascoding isolates the working base from the working collector. The bottom base sees a collector stuck at a few volts; the top base sees the wildly swinging collector above it but is powerless to change the current it gets from the bottom device (actually its control is β times weaker, so it has little effect). If you need to get the MAX voltage gain from "a single stage", cascoding is needed all around the node. (Obvious way to go from an 8-tranny to a 12-tranny design and "feature"...)
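To put a number on "β times weaker", here is a Python sketch using the standard small-signal output-resistance formula for a BJT cascode: R_out = r_o2 + (1 + Gm2·r_o2)·(r_o1 || r_pi2), with r_pi2 = β/Gm2. The bias current, Early voltage and β below are assumed round numbers.

```python
V_T = 0.025  # thermal voltage, volts (approximate)


def parallel(a: float, b: float) -> float:
    return a * b / (a + b)


def cascode_rout(ic: float, v_early: float, beta: float) -> float:
    """Output resistance of a BJT cascode, both devices at the same ic and Early voltage."""
    gm = ic / V_T
    ro = v_early / ic
    r_pi = beta / gm
    # The top device's r_pi caps the emitter impedance that gets multiplied,
    # so R_out tops out near beta * ro.
    return ro + (1 + gm * ro) * parallel(ro, r_pi)


if __name__ == "__main__":
    IC, VA, BETA = 1e-3, 50.0, 100.0  # ASSUMED: 1 mA, 50 V Early voltage, beta 100
    ro = VA / IC
    print(f"single device r_o: {ro / 1e3:.0f} kohm")
    print(f"cascode R_out:     {cascode_rout(IC, VA, BETA) / 1e6:.2f} Mohm")
    print(f"beta * r_o:        {BETA * ro / 1e6:.2f} Mohm")
```

The cascode comes out at roughly β times the single device's r_o, which is the "β times weaker" control described above.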
Early effect seems to be non-linear: the factor isn't exactly 1000 but varies with voltage. If you cancel Gm variation with constant current, I think Early effect is one of the main causes of residual THD (junction capacitance also causes residual THD).
β times the Early factor (Mu) gives the maximum power gain of a BJT. If β is 100 and Mu is 1000, then we have 100*1000 = 100,000, or 50dB of power gain, absolute max for a single transistor.
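The decibel arithmetic as a tiny Python check, taking the β·Mu ceiling at face value (power ratios use 10·log10):

```python
import math


def max_power_gain_db(beta: float, mu: float) -> float:
    """Rough single-BJT power-gain ceiling: beta times the Early factor, in dB."""
    return 10 * math.log10(beta * mu)


if __name__ == "__main__":
    print(f"{max_power_gain_db(100, 1000):.1f} dB")  # beta 100, Mu 1000, from the text
    print(f"{max_power_gain_db(100, 2000):.1f} dB")  # upper end of the Mu range above
```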
Classic broadcast amps have power gains of 40dB or 50dB, but we like to mismatch the inputs and outputs by 10dB (a 2K load on a 200Ω source, etc.), and a hard-worked BJT has THD around 26%, so we want to throw a lot of gain into feedback to control THD.
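A Python sketch of the two pieces of arithmetic in that paragraph. It assumes the 10dB mismatch is simply the load/source impedance ratio taken as a power ratio, and uses the usual first-order rule that negative feedback divides distortion by the feedback factor (1 + loop gain). The open-loop THD figure is hypothetical.

```python
import math


def mismatch_db(z_load: float, z_source: float) -> float:
    """Impedance ratio expressed in dB, treated as a power ratio (10*log10)."""
    return 10 * math.log10(z_load / z_source)


def thd_with_feedback(thd_open: float, feedback_db: float) -> float:
    """First-order estimate: distortion divided by the feedback factor (1 + loop gain)."""
    return thd_open / 10 ** (feedback_db / 20)


if __name__ == "__main__":
    print(f"2K on 200 ohms: {mismatch_db(2000, 200):.0f} dB")
    THD_OPEN = 0.10  # HYPOTHETICAL open-loop THD (10%), for illustration only
    for fb in (10, 20, 30):
        print(f"{fb} dB feedback: THD about {100 * thd_with_feedback(THD_OPEN, fb):.2f} %")
```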
So while you can make a 1-tranny buffer, or get a little gain for small signals, a phono preamp or broadcast-spec mike amp needs two transistors to "work", three to be good, and maybe four to be great. Some of these stages may want to be doubled (diff pair in, push-pull out), so we wind up with around 4 to 10 transistors in any gain block that does non-trivial work.
The µA741/LM741 is 6 transistors in 4 current-gain stages, plus more than that in support circuitry. (The 741's main audio flaws are slow PNPs and poor class-AB bias, not tranny count.) The classic small Langevin and the Neve did mike amps in 3 or 4 transistors, but without diff or push-pull pairs. A big Langevin used 8 transistors in 4 stages, all push-pull. The Altec copy, with 3 stages and 6 transistors, is not as highly regarded.
So if you really want to do your sums, you can take the gross box-level specs and estimate the number of transistors needed. (Note that an emitter follower has a power gain more like 20dB, common-base about 30dB.) A line amp is rated at 50dB gain but is voltage-matched in and out, so its gross power gain is 70dB. You could do this with one CE and one CC stage (a popular pair), but THD will be high unless the signal is very small compared to the supply power. Taking 20dB of feedback in each transistor, you need 3 or 4 transistors to reach the 70dB spec for the whole box. THD is lower with push-pull, but the transistors are doubled: when transistors cost money, the BBC and later Sir Rupert didn't do push-pull.
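The transistor-count estimate can be sketched in Python as a simple gain budget. Assumptions, loudly: each CE transistor is credited with roughly the 50dB β·Mu ceiling (or 40dB for a harder-worked stage), and each gives 20dB of that to local feedback. The function name is made up for illustration.

```python
import math


def stages_needed(total_db: float, per_stage_db: float, feedback_db: float) -> int:
    """Transistors needed when each one nets (per_stage_db - feedback_db) of power gain."""
    net = per_stage_db - feedback_db
    if net <= 0:
        raise ValueError("feedback eats all of the stage gain")
    return math.ceil(total_db / net)


if __name__ == "__main__":
    BOX_GAIN_DB = 70  # gross power gain of the line amp, from the text
    FEEDBACK_DB = 20  # local feedback per transistor, from the text
    for per_stage in (50, 40):  # ASSUMED per-transistor power-gain ceilings
        n = stages_needed(BOX_GAIN_DB, per_stage, FEEDBACK_DB)
        print(f"{per_stage} dB per transistor -> {n} transistors")
```

The spread between 3 and 4 comes entirely from how much raw gain you credit each device with; going push-pull doubles the count without changing the budget.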