If that were the case, then why do all the proprietary-format AEI audio CDs (sampled, I think, at 12 bits and 37-something kHz to fit four hours of stereo music on one CD) sound crappy?
Music (supposedly) only goes to 15 kHz, like on FM stereo radio, so a 37-whatever kHz sample rate and a 12-bit word length should have been more than enough for commercially produced CDs as well, according to your math.
And if THAT were the case, then why are there SACDs, DVD-Audio, Blu-ray multichannel high-resolution audio, and on and on and on, if twice the rate of the highest frequency was "good enough"?
A number of good questions. First, the range considered audio is 20 Hz to 20,000 Hz. Most of us can't hear all the way to 20,000, but a few can; I can still hear to at least 16,000. Second, 37 kHz isn't that bad a sampling rate. FM stereo broadcasting has a subcarrier at 38 kHz, and we've enjoyed listening to that most of our lives.
The big "if" in your question is the 12 bits. Twelve bits is enough to capture about 72 dB of dynamic range. That may sound like a lot, but it isn't. First, unless a lot of compression and limiting is done, there needs to be at least 10 dB of headroom for peaks. That means your normal 0 VU recording level needs to sit at least 10 dB below the peak capability of your medium (in this case the digital word). That reduces your dynamic range to 62 dB (theoretical; real-world issues will take at least 2 dB away from that, and more likely 5 or 6). Some of the early digital music synthesizers used 12 bits. Normal sounds weren't too bad, but decays (particularly things like a bell fading out) sounded awful. Twelve bits gives very little room for manipulation, and the human ear can hear a range of at least 90 dB.
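To put rough numbers on that, here is a quick sketch using the standard rule of thumb that an ideal N-bit quantizer gives about 6.02*N + 1.76 dB of SNR, plus the 10 dB of peak headroom discussed above. (The +1.76 dB term assumes a full-scale sine wave, which is why these come out a couple of dB above the plain 6-dB-per-bit figures in the text.)

```python
# Rough dynamic-range figures for a few word lengths, using the common
# rule of thumb: ideal SNR of an N-bit quantizer ~ 6.02*N + 1.76 dB.
HEADROOM_DB = 10.0  # peak headroom reserved above 0 VU, as discussed above

for bits in (12, 16, 24):
    ideal = 6.02 * bits + 1.76      # theoretical full-scale SNR
    usable = ideal - HEADROOM_DB    # what's left after peak headroom
    print(f"{bits:2d} bits: ~{ideal:5.1f} dB ideal, ~{usable:5.1f} dB usable")

# 12 bits: ~ 74.0 dB ideal, ~ 64.0 dB usable
# 16 bits: ~ 98.1 dB ideal, ~ 88.1 dB usable
# 24 bits: ~146.2 dB ideal, ~136.2 dB usable
```

Notice how 12 bits, after headroom, lands well short of the 90 dB or so the ear can span.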
Things get more interesting when you look at the details of sound, though. 60 dB SNR was the FCC limit for FM broadcasting, and a lot of analog consumer equipment only came in around 55 dB (some closer to 45 dB). But that is total noise energy spread across the entire audio spectrum. The human ear is able to hear discrete frequencies well below the level of broadband noise, so just because the analog SNR is only 60 dB doesn't mean it's OK to limit the digital dynamic range to 72 dB. There is also another aspect that a lot of people ignore. The finite number of bits in digital leads to a condition called quantization error, with an artifact known as quantization noise. The bottom end of the dynamic range is a hard cutoff at the least significant bit. Not only that, but the handling of the LSB is generally not perfect. The result is a low level of intermodulation distortion. IM has a very harsh, metallic sound that is quite noticeable, so it needs to be kept well below both the level of the signal being recorded and the wideband noise floor, or it will detract noticeably from the content.
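You can see the quantization noise directly with a few lines of NumPy. This is only an illustrative sketch (undithered rounding of a near-full-scale sine; real converters add dither and have their own analog error mechanisms), but the measured figures land right on the rule-of-thumb numbers:

```python
import numpy as np

def quantize(x, bits):
    """Round a signal in [-1, 1] to an N-bit grid (no dither)."""
    levels = 2 ** (bits - 1)
    return np.round(x * levels) / levels

fs = 48_000                                 # sample rate, arbitrary here
t = np.arange(fs) / fs                      # one second of samples
x = 0.99 * np.sin(2 * np.pi * 997 * t)      # near-full-scale 997 Hz tone

for bits in (12, 16):
    err = quantize(x, bits) - x             # the quantization error signal
    snr = 20 * np.log10(np.sqrt(np.mean(x**2)) / np.sqrt(np.mean(err**2)))
    print(f"{bits} bits: measured SNR ~ {snr:.1f} dB")
# Comes out near the theoretical 6.02*N + 1.76 dB in each case.
```

With no dither, that error signal is correlated with the music, which is where the harsh-sounding distortion products come from rather than benign hiss.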
So the number one problem with the bad-quality sources you cited has nothing to do with sampling frequency, and everything to do with the number of bits. I do not consider anything that uses less than 24 bits per sample (per channel) to be a high-quality digital system. While I generally enjoy listening to CDs, which are recorded using 16 bits, I know people who can't stand to listen to them, because they can hear the IM and it sounds harsh, metallic, and grating. Of course, pragmatically, manufacturers are trying to design systems that use as little data as possible (both in word length and in sample rate), because they can store more information and transfer it faster over a given medium (such as the internet). But this leads to compromises. We had compromises in the analog world, too. Records varied quite a bit in noise performance, for instance, depending on the exact vinyl formulation. Reel-to-reel tape varied depending on the tape used, the speed, the number of tracks, and the width of the tape (the last two together determine the width of each track).
Another factor that impacts a lot of digital recording is compression. MP3 (the most common format) is not a lossless compression technique. It creates a mathematical model of tones (rather than sounds), because such a model can convey most of the information with far less data than raw samples; on playback it then tries to reproduce the original sound from that model. When done well it can sound fairly good, but I've heard some low-bitrate MP3 signals that were pretty harsh (part of a DVB signal). Any high-quality music recording process should either use no compression (straight PCM) or use only lossless compression.
As far as the sampling rate is concerned, the theory remains as stated: if you want accurate response to 15 kHz, you have to sample at 30 kHz or above; if you want accurate response to 20 kHz, you have to sample at 40 kHz or above. Note that standard CDs do that (barely). The CD-4 standard effectively does the same, as its 30 kHz subcarrier is mathematically equivalent to sampling at 30 kHz, so the theoretical maximum audio frequency that can be reproduced from a CD-4 record is 15 kHz.
But there are real-world concerns here as well. Suppose I am sampling at 30 kHz and I encounter an audio tone, probably a harmonic of something, or possibly part of a wideband signal such as a cymbal crash or even a snare drum brush stroke. Let's say it has a frequency of 15.4 kHz. It will be digitized as if it were 14.6 kHz. Besides being wrong, that alias has no harmonic relationship to the original, so it will sound rather harsh and foreign. So, if anything can possibly be above that frequency (and this has nothing to do with whether the human ear could have heard it or whether it would be desirable to reproduce it correctly), it must be filtered out before the digitization is done. The problem is that you can't make a filter that passes 14,999 Hz with no loss and reduces 15,001 Hz to an inaudible level. We have technology today that can do fairly well, but not that well. The reason CDs use 44.1 kHz is that the filter has to cut off at 22.05 kHz, and it isn't that difficult to pass 20,000 Hz and reject 22,050 Hz. The point is that the sampling rate has to be sufficiently above what is needed, to allow a filter to be designed with room for its transition band.
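That 15.4 kHz-to-14.6 kHz fold-over is easy to demonstrate numerically. The sketch below (my own demo, using the same example values as above) shows that, sampled at 30 kHz, a 15.4 kHz cosine produces exactly the same sample values as a 14.6 kHz cosine, so once digitized the two are indistinguishable:

```python
import numpy as np

fs = 30_000                      # sampling rate from the example above
n = np.arange(300)               # 10 ms worth of sample indices
t = n / fs

tone_high = np.cos(2 * np.pi * 15_400 * t)   # 15.4 kHz, above fs/2
tone_alias = np.cos(2 * np.pi * 14_600 * t)  # 14.6 kHz, its alias

# The sample sequences are identical: 15.4 kHz "folds" around fs/2 = 15 kHz
# to 30.0 - 15.4 = 14.6 kHz. No later processing can tell them apart.
print(np.allclose(tone_high, tone_alias))    # True
```

That is why the anti-aliasing filter has to come before the converter: once the samples are taken, the damage is permanent.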
There is a second issue that has to be dealt with as well. If you want the mathematical background, do some research on the term sin(x)/x. It turns out that the sampling process doesn't capture the amplitude completely accurately: there is a frequency-dependent rolloff that gets worse the closer you get to the sampling frequency. There are ways of compensating for this with filters, but that has side effects too (like changing the phase). The variation is fairly small, and since it is worst near the top edge of the frequency range, where it is probably least noticeable, it is often ignored. However, these are the two reasons why most professional audio recording is done at 96 kHz. It gives a lot more room for the filter transition band, resulting in a simpler filter and fewer filter artifacts. It also means that the actual audio sits in a portion of the spectrum where the sin(x)/x error is very small (like 0.02 dB). Even if the end product is going to be put on CD (at 44.1 kHz) or converted to MP3, it is advantageous to sample at 96 kHz, do whatever filtering and other processing is necessary in the digital domain, and then do a sample-rate conversion to the lower rate, with everything prepared as carefully as possible digitally, so the end product is as close as possible to the theoretical ideal.
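For the curious, here is a sketch of how that sin(x)/x (sinc) rolloff can be estimated. This assumes the simple zero-order-hold converter model; real converters use oversampling and digital compensation and do considerably better, so treat these as back-of-the-envelope, worst-case figures rather than what any particular piece of hardware delivers:

```python
import numpy as np

def zoh_droop_db(f, fs):
    """Amplitude rolloff of an ideal zero-order-hold DAC at frequency f:
    droop = sin(pi*f/fs) / (pi*f/fs), expressed in dB."""
    x = np.pi * f / fs
    return 20 * np.log10(np.sin(x) / x)

for fs in (44_100, 96_000, 192_000):
    print(f"fs = {fs:6d} Hz: droop at 20 kHz ~ {zoh_droop_db(20_000, fs):6.2f} dB")
# Under this model the droop at 20 kHz shrinks dramatically as the
# sampling rate goes up, which is the point being made above.
```

Whatever the exact numbers for a given converter, the trend is the same: at 96 kHz the audible band sits far enough below the sampling frequency that the rolloff becomes negligible.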
Side note: that is also why high-end CD players use more than 16-bit DACs and sometimes sample rates higher than 44.1 kHz. It allows them to take the original (which hopefully was generated under close-to-ideal conditions) and convert it back to analog under conditions closer to ideal. It will never sound better than what is theoretically possible for 16-bit PCM at 44.1 kHz, but it will approach that ideal more closely. How much? Maybe only 1 or 2 dB, but it will make a small difference.
Now for CD-4 decoding. If the sample rate is 192 kHz, the theoretical bandwidth is 96 kHz. The record has nothing on it above about 50 kHz, so there is plenty of margin to do the filtering, etc. right. If it is done with 24 bits, the theoretical dynamic range is 144 dB, which has plenty of margin for headroom and quantization error and is still well beyond the dynamic range of human hearing. (There's even enough room to do the RIAA de-emphasis digitally.) 24-bit ADCs don't actually give 144 dB of dynamic range; the requirements on the analog hardware doing the digitizing are just too unreasonable. The best do about 110 to 120 dB, but that is still 18 to 20 bits' worth (theoretical), so it is worth the effort.
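To sanity-check those last figures, you can run the 6.02-dB-per-bit rule backwards to get the "effective number of bits" (ENOB) from a measured dynamic range. A quick sketch (standard formula, my own example inputs):

```python
# Effective number of bits from a measured full-scale SNR, using the
# standard inversion of SNR = 6.02*N + 1.76 dB.
def enob(snr_db):
    return (snr_db - 1.76) / 6.02

for snr in (110, 120, 144):
    print(f"{snr} dB dynamic range ~ {enob(snr):.1f} effective bits")
# 110 dB ~ 18.0 bits, 120 dB ~ 19.6 bits, 144 dB ~ 23.6 bits,
# consistent with the 18-to-20-bit estimate above.
```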
I hope that helps. A lot of listeners come from more of a music background than an engineering background. While we all pick up numbers and "facts" and develop some sense of what is better and worse, most lack the technical background to attribute what they don't like in what they hear to the correct cause. For instance, digital is not BAD. It is just that digital is often done badLY. Companies exist to make money, and they make money by making compromises--they always have. Some compromises are more acceptable than others. Fortunately, I am a practicing engineer and software developer who happens to have an interest in music (I earned a minor in music in college), so I'm in the unique position of having "one leg on each side of the fence".