> why is it 200:50k and not 100:25k or any other numbers?
It could be either. You'd have to squint at a VTVM to know the difference.
It "could be" 2K:500K.... except the bass response would be poor and the treble response would be poor.
"Nominal impedance" must include broadband loss, frequency response and power considerations.
It isn't 2:500, because a "200 ohm: 50K" transformer probably has about 10 ohms of copper resistance in the primary and 2K5 in the secondary. Worked as 2:500, your input power would mostly heat the copper, not the load. In fact this is a way to guesstimate the ballpark impedance of an unknown winding. If the DCR is 10r, then it is more likely intended for few-K use than for 8 ohm or 50K ohm use. Use nominal impedances 10 to 20 times the DCR, it won't be utterly wrong.
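To put rough numbers on the copper-loss argument, here is a minimal sketch. It assumes the only loss is winding resistance (no core loss, leakage or capacitance), and the function names are made up for illustration.

```python
def load_power_fraction(pri_dcr, sec_dcr, z_ratio, load_ohms):
    """Fraction of the power entering the primary that reaches the load.

    Simplified model (an assumption): copper resistance is the only loss.
    z_ratio is the secondary:primary impedance ratio (turns ratio squared).
    """
    series_r = pri_dcr + sec_dcr / z_ratio   # all copper, referred to the primary
    reflected_load = load_ohms / z_ratio     # load as seen from the primary
    return reflected_load / (series_r + reflected_load)


def nominal_range_from_dcr(dcr_ohms, low_mult=10, high_mult=20):
    """Rule-of-thumb nominal impedance range: 10 to 20 times the DCR."""
    return dcr_ohms * low_mult, dcr_ohms * high_mult


if __name__ == "__main__":
    z_ratio = 50_000 / 200  # impedance ratio of the "200:50K" example
    print(load_power_fraction(10, 2500, z_ratio, 50_000))  # used as 200:50K
    print(load_power_fraction(10, 2500, z_ratio, 500))     # used as 2:500
    print(nominal_range_from_dcr(10))
```

In this simple model, about 91% of the power reaches the load when the transformer is worked as 200:50K, and only about 9% when it is worked as 2:500. The rest heats the copper.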
Formulas for power transformer saturation, or audio tranny distortion, include voltage not power. You can hold voltage constant and reduce impedance to get a higher power rating... ah, but as you reduce impedance your DCR losses increase and eventually most of your power isn't getting through.
Let's arbitrarily say that your transformer is flat down to 50Hz. The primary inductance should have over 200 ohms reactance at 50Hz.
Now try to use it as 2K impedance. Inductive reactance rises in proportion to frequency, so if the primary is 200 ohms at 50Hz, it won't get to 2K until 500Hz. If the source is 2K, the lower frequency limit is 500Hz. That may be acceptable in a cheap telephone coupler, not an all-round studio preamp.
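The bass-limit arithmetic can be sketched as below. It follows the same simplification as the text, taking the limit as the frequency where the primary reactance equals the source impedance; the function names are hypothetical.

```python
import math


def primary_inductance(reactance_ohms, freq_hz):
    """Inductance (henries) that has the given reactance at freq_hz."""
    return reactance_ohms / (2 * math.pi * freq_hz)


def freq_where_reactance_equals(impedance_ohms, inductance_h):
    """Frequency (Hz) at which the inductive reactance equals impedance_ohms."""
    return impedance_ohms / (2 * math.pi * inductance_h)


if __name__ == "__main__":
    L = primary_inductance(200, 50)              # about 0.64 H
    print(L)
    print(freq_where_reactance_equals(200, L))   # bass limit used at 200 ohms
    print(freq_where_reactance_equals(2000, L))  # bass limit used at 2K
```

Same iron, same turns: ten times the impedance means ten times the bass limit.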
The top end is more complicated. You have series leakage inductance and you have capacitance. There is capacitance in the winding, and your first tube has input capacitance. The leakage inductance is a little about how souped-up your iron is (hot iron needs fewer turns to make bass response, which means less leakage inductance), and also about how cleverly you interleave. This is between the designer and the accountant. The capacitance is less affected by number of turns, mostly by physical size. Hot iron helps somewhat. Clever winding helps some. But ultimately you cannot reach high impedance at wide audio bandwidth, because capacitance bites.
12AX7 grid is infinite resistance (>100Meg) and ~100pFd capacitance. One reason to use lower-Mu 12AY7 is the lower Miller effect and input capacitance. It could even allow a bit higher step-up before capacitance bites, minimizing the loss of tube gain. But a practical size transformer has ~100pFd self-capacitance, so mucking with the tube does not help a lot. A pentode has low input capacitance, but high hiss. You could use a very high step-up to swamp the hiss... ah, but that winding capacitance does not go away. Anyway a pentode has high gain, and coupled from a high step-up transformer, you will likely overload on large sounds.
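A rough sketch of why capacitance bites at high impedance. It treats the secondary as a plain RC low-pass and ignores leakage inductance and any resonance, which is a big simplification. The Miller figures (grid-cathode and grid-plate capacitance, stage gain) are illustrative assumptions, not datasheet values; check the datasheet for a real design.

```python
import math


def miller_input_pf(c_gk_pf, c_gp_pf, stage_gain):
    """Triode input capacitance (pF) including the Miller effect."""
    return c_gk_pf + c_gp_pf * (1 + stage_gain)


def rc_corner_hz(impedance_ohms, capacitance_pf):
    """-3 dB corner (Hz) of impedance_ohms driving capacitance_pf."""
    return 1.0 / (2 * math.pi * impedance_ohms * capacitance_pf * 1e-12)


if __name__ == "__main__":
    # Illustrative triode numbers (assumed): 1.6 pF, 1.7 pF, gain of 60.
    print(miller_input_pf(1.6, 1.7, 60))

    total_pf = 100 + 100  # ~100 pF tube input plus ~100 pF winding, per the text
    print(rc_corner_hz(50_000, total_pf))   # secondary worked at 50K
    print(rc_corner_hz(500_000, total_pf))  # secondary worked at 500K
```

With 200pF total, the corner is near 16kHz at 50K and near 1.6kHz at 500K, which is the poor treble you would get from a 2K:500K design.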
Low-impedance is mostly "easy". You can design a decent 150:2K on the back of your phone bill. While you can use RDH4 to estimate leakage and capacitance, you can just go for a reasonable bass limit, wind it, and try it, probably with good result.
Transistor inputs are "usually" worked with high NFB so their input impedance is "high". (Could be low but you have to stand on your head to think that way.) But you don't design for high step-up, because BJTs have significant input noise current. There's an "optimum" source impedance (OSI) for low noise. Early Ge had narrow low optimums and you find some strange ratios, even step-down. The first clean Si devices were small, and optimum might be 2K-5K. Large clean Si can be adjusted (with emitter current) for OSI from below 500 to over 20K. Jensen's insight was that with ample silicon he could wind the transformer 150:600 quadrifilar, get very low noise, and top-end far beyond the audio band.
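How emitter current moves the optimum can be sketched with the usual textbook BJT noise model. This is an assumption, not something from the post above: voltage noise from base spreading resistance plus half of re, and current noise from base-current shot noise. The beta, rbb and current values are hypothetical.

```python
import math

K_BOLTZMANN = 1.380649e-23   # J/K
Q_ELECTRON = 1.602176634e-19  # C


def optimum_source_impedance(ic_amps, beta, rbb_ohms=0.0, temp_k=300.0):
    """OSI = en / in for a simplified BJT model (no 1/f noise).

    en^2 = 4kT * (rbb + re/2), in^2 = 2q * Ic / beta, re = kT / (q * Ic).
    """
    re = K_BOLTZMANN * temp_k / (Q_ELECTRON * ic_amps)
    en_sq = 4 * K_BOLTZMANN * temp_k * (rbb_ohms + re / 2)
    in_sq = 2 * Q_ELECTRON * ic_amps / beta
    return math.sqrt(en_sq / in_sq)


def turns_ratio(z_primary, z_secondary):
    """Secondary:primary turns ratio for a given impedance transformation."""
    return math.sqrt(z_secondary / z_primary)


if __name__ == "__main__":
    beta = 400  # hypothetical device
    for ic in (1e-3, 100e-6, 25e-6):
        print(ic, optimum_source_impedance(ic, beta))
    print(turns_ratio(150, 600))  # a 150:600 winding is only 1:2 in turns
```

With rbb ignored this reduces to re times the square root of beta: roughly 500 ohms at 1mA and roughly 20K at 25uA for this hypothetical beta of 400. That is the same ballpark as the range quoted above, though a real device with real rbb will differ.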