Apologies for the long post, but I've discovered some new software called Spleeter (deezer/spleeter) which has been released by Deezer and I thought I'd share it here.
For years, I have made my upmixes 4.1 rather than 5.1 as I am not a fan of having too much music content going into the centre channel. However, when done properly (e.g. on the many excellent Steven Wilson surround mixes), having the main vocal solo in the centre speaker can really focus the mix. Unfortunately, the upmix programs I have used (mainly SpecWeb and DTS Neural - although I have also played with Nugen a bit) always put way too much in the centre for my liking, hence I have always stuck with Quad upmixes.
I recently came across this post by Jon Urban (Check out this snippet from George Harrison's "Cockamamie Business") where he had used iZotope RX7 Music Rebalance to create isolated vocals in the centre channel from a Penteo Quad upmix. It sounded great, and I have since been (rather obsessively) refining my processes to see if I can get good, clean, discrete vocals into the centre channel. I got hold of a copy of RX7 for testing purposes, but also came across a relatively new bit of open-source software called Spleeter which, to my ears, is as good as, if not better than, iZotope at extracting vocals (although it has some limitations) - and obviously has the benefit of being free vs $279 for iZotope RX Standard.
For me, it was important to make sure that only vocals that should be in the centre (typically the lead singer) go there and the rest (harmonies/backing vocals etc) remain in the other speakers. Obviously, this is not easy to achieve, as tools such as iZotope and Spleeter just get the vocals, without pulling them apart any further and there tend to be audible artefacts when you listen to the extracted vocals in isolation.
In order to start from as 'central' a source as I could, I made use of another free program which some of you may be familiar with, called CenterCutGui. CenterCutGui itself was, I believe, one of the original inspirations for the slice methods used in Spec, so it has a good pedigree... The other great thing about both Spleeter and CenterCutGui is that they can be used from a command line, so they can be called from a batch script. Together with a few other really useful command-line tools (SoX, MrsWatson and Foobar) I can pretty much automate the conversion from stereo to 5.1 FLAC and have got some really great results so far.
My revised workflow is as follows:
1) Listen in realtime to the 'raw' upmix from DTS Neural
For this I use Foobar with the VST adapter component (foobar2000:Components/VST 2.4 adapter (foo_vst) - Hydrogenaudio Knowledgebase). To get a quad upmix, my default is to set width to 100%, which gives nothing to the centre speaker. I then solo the rear channels and adjust the depth so that I can hear as little of the lead vocal as possible, usually somewhere between -20 and -50, although this varies, of course, depending on the source material. For most albums, I find the same settings work for all tracks. Occasionally, I need different settings per track - and this would generally be the case with compilations or best-ofs. The settings can then be saved as a preset in an .fxp file, which is basically just a standard ASCII text file and can be viewed in Notepad.
2) Convert to Multichannel Flac
Using the preset from step one, I have written a couple of batch files that do the following:
a) Check if any tracks segue - if they do, I join the source files together to avoid any glitches at the beginning/end of tracks
b) The source files need a little bit of preparation before they go through MrsWatson. To get 6-channel output, you need to feed in 6-channel input, so I create 5.1 WAVs with the stereo as front L & R and the rest of the channels silent (using SoX). Also, the combination of DTS Neural and MrsWatson does not handle sample rates above 48kHz, so to trick them into working with high-res files, I change the sample rate without resampling (i.e. double the duration) so that 96kHz becomes 48kHz and 88.2kHz becomes 44.1kHz.
c) MrsWatson (a command-line based VST host) then processes the files and gives a 5.1 wav output
d) SoX then splits the MrsWatson output into LR, LFE and sLrR files and reverses any changes to the sample rate
e) The 'Final' task required to get to 4.1 files is to balance the channels keeping the overall dynamics of the album intact. By default, I set the RMS levels of the rear channels to be 6dB lower than the front channels. To do this, I use Foobar (controlled via a batch file using the Run_Command component) to run a Dynamic Range scan and create a foo_dr.txt file for the source files and each of the separated files created in step (d). The Foo_dr files contain all the information required (peak and RMS levels) to calculate the correct gain to add to each channel before recombining into a 5.1 Flac file (with a silent Centre Channel). All calculations are carried out by the batch file.
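For anyone who wants to see the arithmetic behind step 2(e), here is a rough Python sketch. It is not the batch file itself: the function names are made up for illustration, the levels in the example are hypothetical, and the clip check against the peak level is my assumption about how the peak figures might be used.

```python
def rear_gain_db(front_rms_db, rear_rms_db, offset_db=6.0):
    """Gain (in dB) that puts the rear RMS level offset_db below the fronts."""
    target_rms_db = front_rms_db - offset_db
    return target_rms_db - rear_rms_db


def clip_safe_gain_db(gain_db, peak_db):
    """Cap the gain so the channel peak cannot rise above 0 dBFS (assumed rule)."""
    headroom_db = -peak_db
    return min(gain_db, headroom_db)


def db_to_linear(gain_db):
    """Convert a dB gain into a linear amplitude factor."""
    return 10 ** (gain_db / 20)


if __name__ == "__main__":
    # Hypothetical levels, as might be read from two foo_dr.txt reports.
    gain = rear_gain_db(front_rms_db=-12.0, rear_rms_db=-25.0)
    gain = clip_safe_gain_db(gain, peak_db=-3.0)
    print(f"rear gain: {gain:+.2f} dB (x{db_to_linear(gain):.3f})")
```

With these made-up numbers the rears want +7 dB to sit 6 dB under the fronts, but only 3 dB of headroom is available, so the capped gain is +3 dB.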
This is where I always stopped - a bit of re-tagging later and my Quad upmixes were complete... until now. You could of course use any tool of your preference (SpecWeb, Penteo etc) to get to this point.
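The sample-rate trick from step 2(b) is easy to get wrong, so here is a minimal sketch of the idea. It uses Python's standard wave module rather than SoX (a swap made purely for illustration), and both function names are hypothetical. The audio data is copied untouched and only the rate in the header changes, so a 96kHz file labelled as 48kHz plays at half speed for twice as long.

```python
import wave


def host_safe_rate(rate, limit=48000):
    """Halve rates above the limit (96000 -> 48000, 88200 -> 44100).

    Only the two high-res rates mentioned in the post are considered here.
    """
    return rate // 2 if rate > limit else rate


def relabel_sample_rate(src_path, dst_path, new_rate):
    """Copy a PCM WAV file, changing only the sample rate in its header."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames = src.readframes(params.nframes)
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(params.nchannels)
        dst.setsampwidth(params.sampwidth)
        dst.setframerate(new_rate)  # no resampling: the samples are untouched
        dst.writeframes(frames)
```

Reversing the change afterwards, as in step 2(d), is the same call with the original rate passed as new_rate.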
3) Create a centre channel containing (almost) just vocals
I have written a new batch file which does the following:
a) Extract a centre channel from the LR channels created in step 2(e) using CenterCutGui. This in itself does a great job, but still keeps any centre mixed instrumentation, and I want to have only vocals with as little 'other stuff' left behind as possible
b) My new discovery, Spleeter, then creates 2 stems from this centre channel: vocals + accompaniment. There are other pre-trained models that can create 4 and 5 stems, which you can read more about here (deezer/spleeter). One limitation is that the pre-trained models only work up to 16kHz, so you won't get the full benefit of a high-res source on the centre channel. Also, it suffers from an annoying stack overflow on files over a certain size, so I have added a section in the batch file that splits each file into chunks before processing, then stitches them back together (with an overlap that is trimmed off to ensure there are no spikes). It also only seems to output 44.1kHz/16-bit stems irrespective of what you feed in. Despite these limitations, the resultant vocal stems are pretty good - but still need a bit of work... As Spleeter is relatively new and still very much under development, these limitations will hopefully be overcome in time.
c) I use SoX again to blend 20% of the centre channel extracted by CenterCutGui in step 3(a) with the vocal stem output from Spleeter (after resampling the Spleeter stem if it no longer matches the source). This helps mask artefacts from the Spleeter process and also makes the 'final' centre channel match the source bit depth & sample rate. The end result is quite a clean-sounding centre channel based on the true phantom centre of the original stereo source with the vocals at full volume and just a bit of the music content left behind - enough that your brain hears a focused isolated vocal from the centre speaker.
d) To create the 'final' front L+R channels, I use SoX again to subtract the final mixed centre channel from step 3(c) from the level-adjusted LR pair from step 2(e)
e) The last step is for SoX to recombine all the channels back into a multichannel FLAC file and copy any tags over from the source file (using another free command-line tool called tag.exe, which is no longer developed but can still be found online - e.g. here https://www.softpedia.com/get/Multimedia/Audio/Tag-Editors/Tag.shtml)
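The split-and-stitch workaround from step 3(b) comes down to some bookkeeping, sketched below in Python. This is only an illustration of the idea, not the batch file: the function names and the tuple layout are hypothetical, and positions are counted in sample frames. Each chunk is read with some extra audio on either side, and after processing that padding is trimmed so the kept parts tile the file exactly.

```python
def plan_chunks(total, chunk, overlap):
    """Return (read_start, read_end, keep_start, keep_end) for each chunk.

    The read range is the keep range padded by `overlap` frames on each
    side, clamped to the file boundaries.
    """
    if chunk <= 0 or overlap < 0:
        raise ValueError("chunk must be positive and overlap non-negative")
    plan = []
    keep_start = 0
    while keep_start < total:
        keep_end = min(keep_start + chunk, total)
        read_start = max(keep_start - overlap, 0)
        read_end = min(keep_end + overlap, total)
        plan.append((read_start, read_end, keep_start, keep_end))
        keep_start = keep_end
    return plan


def stitch(processed_chunks, plan):
    """Trim the padding from each processed chunk and join the results."""
    out = []
    for samples, (read_start, _read_end, keep_start, keep_end) in zip(processed_chunks, plan):
        out.extend(samples[keep_start - read_start:keep_end - read_start])
    return out


if __name__ == "__main__":
    signal = list(range(10))                     # stand-in for audio frames
    plan = plan_chunks(len(signal), chunk=4, overlap=1)
    chunks = [signal[rs:re_] for rs, re_, _, _ in plan]  # "processing" is a no-op here
    assert stitch(chunks, plan) == signal
```

Because every seam falls inside audio that the separator has seen on both sides, any edge effects land in the padding that is thrown away, which is presumably why the overlap keeps spikes out of the joins.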
Some of the upmixes I have created using this new technique have been truly stunning. I can't take any credit for the various bits of software used - or the genius that must go into the stereo mixes in the first place to allow them to be pulled apart so well, but I am very much enjoying listening to some of my favourite music in a whole new way.
Attached are a few highlights for you to sample. These are DTS files, but have been zipped to allow upload. (DTS and FLAC versions are also available here: Spleeter Samples)
Billie Eilish - Wish You Were Gay (From When We All Fall Asleep, Where Do We Go)
Chemical Brothers - It Doesn't Matter (From Dig Your Own Hole)
George Harrison - Cockamamie Business (From Best Of Dark Horse 1976-1989)
Jeff Wayne - The Spirit Of Man (From The War Of The Worlds - The New Generation)
Madonna - Music (From Music)
Muse - Drones (From Drones)
Any comments gratefully received - and I would be very happy to share my batch files with anyone who is interested.