I have understood that "transients" are waveforms, typically at the beginning of a tone or word, without any particular shape in terms of frequency.
Well... yes and no.
Any signal can be analyzed as a sum of frequency components, even transients. But when looked at in the time domain, a transient is, as you say, the onset of a tone, or a very short sound. A typical example in music is a snare hit.
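A small sketch of the first point. It writes out a plain DFT in Python and applies it to a hypothetical 32-sample "click" (one nonzero sample). The click turns out to contain every frequency in equal measure: every bin has magnitude 1, and summing the components back up rebuilds the click (up to floating-point rounding).

```python
import cmath

def dft(x):
    """Plain O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * m * k / n) for k in range(n))
            for m in range(n)]

def idft(X):
    """Inverse DFT: sums the frequency components back into a time signal."""
    n = len(X)
    return [sum(X[m] * cmath.exp(2j * cmath.pi * m * k / n) for m in range(n)) / n
            for k in range(n)]

N = 32
click = [0.0] * N
click[5] = 1.0  # hypothetical transient: a single-sample click

spectrum = dft(click)
magnitudes = [abs(c) for c in spectrum]          # flat: the click has all frequencies
rebuilt = [v.real for v in idft(spectrum)]       # sum of components gives the click back
print([round(m, 6) for m in magnitudes[:4]])
```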
More generally, we can think of many situations in music when the sound waves' amplitude and phase are not constant between two sample points. Intuitively, hi-res would increase the accuracy, since the samples cover shorter time intervals.
Intuition is not always correct when it comes to complicated matters.
As you say, the first question asked is "What about the things that happen between the samples?" Luckily, it turns out that this is not a problem, either in theory (Shannon/Nyquist) or in practice: the signal is low-pass filtered so that it is band-limited before sampling, and on playback the reconstruction filter in the D/A converter interpolates between the samples, ideally with the sinc function. This means that every little nook and cranny of the original (band-limited) signal ends up on the CD and is reproduced by the D/A converter.
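A minimal sketch of sinc (Whittaker-Shannon) interpolation, showing that the value of a band-limited signal at a point between two samples can be recovered from the samples alone. It assumes a steady 10 kHz test tone at the CD rate and an ideal sinc truncated to a hypothetical 5000 samples on either side; real D/A converters use finite, oversampling filters, so this illustrates the principle rather than any particular converter.

```python
import math

def sinc(x: float) -> float:
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)

def reconstruct(samples, fs, t, n0):
    """Whittaker-Shannon interpolation at time t (seconds), from samples
    taken at times (n0 + k) / fs for k = 0 .. len(samples) - 1."""
    return sum(s * sinc(fs * t - (n0 + k)) for k, s in enumerate(samples))

FS = 44_100.0   # CD sample rate (Hz)
F = 10_000.0    # test tone, below Nyquist (22.05 kHz)
PHASE = 0.3     # arbitrary phase (radians)
HALF = 5000     # hypothetical truncation: samples on each side of t

def tone(t):
    return math.sin(2 * math.pi * F * t + PHASE)

t = 0.37 / FS   # a point 37% of the way from sample 0 to sample 1
samples = [tone(n / FS) for n in range(-HALF, HALF + 1)]
estimate = reconstruct(samples, FS, t, -HALF)
error = abs(estimate - tone(t))   # small, and shrinks as HALF grows
print(f"true={tone(t):.6f} reconstructed={estimate:.6f} error={error:.2e}")
```

The remaining error comes from truncating the infinite sinc sum, not from missing information between the samples.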
Moreover, correctly applied dither makes it possible to resolve amplitude detail finer than the nominal step size implied by the "bit depth".
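A small illustration of the dither point, assuming a rounding quantizer with a step of 1 LSB and TPDF dither of ±1 LSB peak (the sum of two independent uniform values of ±0.5 LSB). A constant level of 0.3 LSB, smaller than one step, vanishes entirely without dither, but with dither it survives in the average of the quantized output (equivalently, after low-pass filtering).

```python
import math
import random

def quantize(x: float) -> int:
    """Round to the nearest quantization step (1 LSB = 1.0)."""
    return math.floor(x + 0.5)

def tpdf_dither(rng: random.Random) -> float:
    """Triangular-PDF dither spanning +/-1 LSB."""
    return rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)

LEVEL = 0.3      # true signal level in LSB, below one quantization step
N = 200_000      # number of samples to average
rng = random.Random(1)

undithered = [quantize(LEVEL) for _ in range(N)]
dithered = [quantize(LEVEL + tpdf_dither(rng)) for _ in range(N)]

mean_undithered = sum(undithered) / N   # the 0.3 LSB level is lost completely
mean_dithered = sum(dithered) / N       # close to 0.3: the level is preserved
print(mean_undithered, mean_dithered)
```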
I found some information about this in the following AES paper (although the author makes some unsubstantiated claims regarding hi-res that are refuted in the paper I referred to previously): J. Robert Stuart, "Coding for High-Resolution Audio Systems," J. Audio Eng. Soc., vol. 52, No. 3, 2004 March:
J. Robert Stuart said:
Provided that both the correct level of TPDF dither is used in the quantizer, and the signal has no content above the Nyquist frequency (half the sampling rate), then the system has infinite resolution of both time and amplitude (see the worked examples in [15]).
[15] S. P. Lipshitz and J. Vanderkooy, “Pulse-Code Modulation—An Overview,” J. Audio Eng. Soc., vol. 52, No. 3, 2004 March, pp. 200–215.
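To give the time-resolution claim a concrete form, here is a sketch of my own (the estimator is not taken from Stuart or from Lipshitz and Vanderkooy). It assumes a steady 1 kHz tone rather than a transient, delays it by 1 µs (about 1/23 of the 22.7 µs sample period at 44.1 kHz), quantizes it to 16 bits with TPDF dither, and then recovers the delay from the phase of a single DFT bin. In practice the precision is limited by the noise floor rather than by the sample spacing; this run recovers the delay to far better than the sample period.

```python
import math
import random

FS = 44_100           # CD sample rate (Hz)
F = 1_000.0           # test tone (Hz); one second holds exactly 1000 periods
N = FS                # one second of samples
DELAY = 1e-6          # 1 microsecond, well below the sample period
BITS = 16
LSB = 1.0 / 2 ** (BITS - 1)   # full scale is +/-1.0

rng = random.Random(42)

def tpdf() -> float:
    """TPDF dither of +/-1 LSB peak."""
    return (rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)) * LSB

def quantize(x: float) -> float:
    """Round to the nearest 16-bit step."""
    return math.floor(x / LSB + 0.5) * LSB

# Delayed tone at half full scale, dithered and quantized to 16 bits
samples = [quantize(0.5 * math.sin(2 * math.pi * F * (n / FS - DELAY)) + tpdf())
           for n in range(N)]

# Project onto sine and cosine at F (one DFT bin) to measure the phase
s = sum(x * math.sin(2 * math.pi * F * n / FS) for n, x in enumerate(samples))
c = sum(x * math.cos(2 * math.pi * F * n / FS) for n, x in enumerate(samples))
phase = math.atan2(c, s)                      # equals -2*pi*F*DELAY for this signal
estimated_delay = -phase / (2 * math.pi * F)
print(f"estimated delay: {estimated_delay * 1e6:.4f} us")
```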
I repeat:
the system has infinite resolution of both time and amplitude. Couldn't get much better, eh?