New Opensource tool "Spleeter" for extracting stems

QuadraphonicQuad

zcftr29

Well-known Member
Joined
Jun 18, 2012
Messages
102
Apologies for the long post, but I've discovered some new software called Spleeter (deezer/spleeter) which has been released by Deezer and I thought I'd share it here.

For years, I have made my upmixes 4.1 rather than 5.1 as I am not a fan of having too much music content going into the centre channel. However, when done properly (e.g. on the many excellent Steven Wilson surround mixes) having the main vocal solo in the centre speaker can really focus the mix. Unfortunately, the upmix programs I have used (mainly SpecWeb and DTS Neural - although I have also played with Nugen a bit) always put way too much in the centre for my liking, hence I have always stuck with Quad upmixes.

I recently came across this post by Jon Urban (Check out this snippet from George Harrison's "Cockamamie Business") where he had used iZotope RX7 Music Rebalance to create isolated vocals in the centre channel from a Penteo Quad upmix. It sounded great, and I have since been (rather obsessively) refining my processes to see if I can get good, clean, discrete vocals into the centre channel. I got hold of a copy of RX7 for testing purposes, but also came across a relatively new bit of open-source software called Spleeter which, to my ears, is as good as, if not better than, iZotope at extracting vocals (although it has some limitations) - and obviously has the benefit of being free vs $279 for iZotope RX Standard.

For me, it was important to make sure that only vocals that should be in the centre (typically the lead singer) go there and the rest (harmonies/backing vocals etc) remain in the other speakers. Obviously, this is not easy to achieve, as tools such as iZotope and Spleeter just get the vocals, without pulling them apart any further and there tend to be audible artefacts when you listen to the extracted vocals in isolation.

In order to start from as 'central' a source as I could, I made use of another free program which some of you may be familiar with called CenterCutGui. CenterCutGui itself was, I believe, one of the original inspirations for the slice methods used in Spec, so it has a good pedigree... The other great thing about both Spleeter and CenterCutGui is that they can be used from a command line, and can therefore be called from a batch script. Together with a few other really useful command-line tools (SoX, MrsWatson and Foobar) I can pretty much automate the conversion from stereo to 5.1 Flac and have got some really great results so far.

My revised workflow is as follows:

1) Listen in realtime to the 'raw' upmix from DTS Neural

For this I use Foobar with the VST adapter component (foobar2000:Components/VST 2.4 adapter (foo_vst) - Hydrogenaudio Knowledgebase). To get a quad upmix, my default settings are to set width to 100%, which gives nothing to the centre speaker. I then solo the rear channels and adjust the depth so that I can hear as little of the lead vocal as possible, usually somewhere between -20 and -50, although this varies, of course, depending on the source material. For most albums, I find the same settings work for all tracks. Occasionally, I need different settings per track - and this would generally be the case with compilations or best-ofs. The settings can then be saved as a preset in an .fxp file which is basically just a standard ASCII text file and can be viewed in Notepad.

2) Convert to Multichannel Flac

Using the preset from step one, I have written a couple of batch files that do the following:

a) Check if any tracks segue - If they do I join the source files together to avoid any glitches at the beginning/end of tracks
b) The source files need a little bit of preparation before they go through MrsWatson. To get 6 channel output, you need to feed in 6 channel input, so I create 5.1 wavs with the stereo as front L & R and the rest of the channels silent (using SoX). Also, the combination of DTS Neural and MrsWatson does not handle sample rates above 48kHz, so to trick them into working with high-res files, I change the sample rate without resampling (i.e. the samples are untouched and the duration doubles) so that 96kHz becomes 48kHz and 88.2kHz becomes 44.1kHz.
c) MrsWatson (a command-line based VST host) then processes the files and gives a 5.1 wav output
d) SoX then splits the MrsWatson output into LR, LFE and SL/SR files and reverses any changes to the sample rate
e) The 'Final' task required to get to 4.1 files is to balance the channels keeping the overall dynamics of the album intact. By default, I set the RMS levels of the rear channels to be 6dB lower than the front channels. To do this, I use Foobar (controlled via a batch file using the Run_Command component) to run a Dynamic Range scan and create a foo_dr.txt file for the source files and each of the separated files created in step (d). The Foo_dr files contain all the information required (peak and RMS levels) to calculate the correct gain to add to each channel before recombining into a 5.1 Flac file (with a silent Centre Channel). All calculations are carried out by the batch file.
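
For anyone who wants to see the arithmetic behind step 2(e), here is a minimal Python sketch of the gain calculation. It is not the batch file itself: the function names and example levels are made up for illustration, and it assumes the RMS and peak figures have already been read out of the foo_dr.txt reports as dB values.

```python
# Sketch of the level-matching arithmetic in step 2(e).
# Assumption: RMS and peak levels (in dB) have already been parsed from the
# DR reports; the names and example figures below are hypothetical.


def rear_gain_db(front_rms_db, rear_rms_db, offset_db=-6.0):
    """Gain in dB to add to the rear pair so that its RMS level ends up
    `offset_db` relative to the front pair (-6 dB by default)."""
    target_rear_rms_db = front_rms_db + offset_db
    return target_rear_rms_db - rear_rms_db


def db_to_linear(gain_db):
    """Convert a gain in dB to a linear amplitude multiplier."""
    return 10.0 ** (gain_db / 20.0)


def peak_after_gain_db(peak_db, gain_db):
    """Peak level after the gain is applied; above 0 dBFS means clipping."""
    return peak_db + gain_db


if __name__ == "__main__":
    gain = rear_gain_db(front_rms_db=-14.0, rear_rms_db=-11.5)
    print(f"rear gain: {gain:+.1f} dB (x{db_to_linear(gain):.3f})")
    print(f"rear peak after gain: {peak_after_gain_db(-1.0, gain):+.1f} dBFS")
```

Feeding album-wide rather than per-track RMS figures into the same calculation is one way to end up with a single gain per channel pair for the whole album, which leaves the track-to-track dynamics alone.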

This is where I always stopped - a bit of re-tagging later and my Quad upmixes were complete... until now. You could of course use any tool of your preference (SpecWeb, Penteo etc) to get to this point.
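
The sample-rate trick from step 2(b) only changes the rate written in the file header, so it can be illustrated without any audio processing at all. Below is a minimal sketch using Python's standard-library wave module rather than the SoX commands in the batch files. The function name is hypothetical, and it assumes plain PCM WAV input.

```python
import wave


def relabel_sample_rate(src_path, dst_path, new_rate):
    """Copy a PCM WAV file, writing `new_rate` into the header while leaving
    every sample untouched. No resampling takes place, so halving the rate
    doubles the playing time. Returns the original rate so the change can be
    reversed later."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames = src.readframes(params.nframes)

    with wave.open(dst_path, "wb") as dst:
        # Same channels, sample width and frame count; only the rate differs.
        dst.setparams(params._replace(framerate=new_rate))
        dst.writeframes(frames)
    return params.framerate
```

Calling `relabel_sample_rate("in_96k.wav", "in_as_48k.wav", 48000)` before the VST host and then relabelling its output back to 96000 afterwards mirrors the 'trick and reverse' described in steps 2(b) and 2(d). The file names here are placeholders.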

3) Create a centre channel containing (almost) just vocals

I have written a new batch file which does the following:

a) Extract a centre channel from the LR channels created in step 2(e) using CenterCutGui. This in itself does a great job, but still keeps any centre mixed instrumentation, and I want to have only vocals with as little 'other stuff' left behind as possible
b) My new discovery, Spleeter, then creates 2 stems from this centre channel: vocals + accompaniment. There are other pre-trained models that can create 4 and 5 stems which you can read more about here (deezer/spleeter). One limitation is that the pre-trained models only work up to 16kHz and so you won't get the full benefit of a high res source on the centre channel. Also, it suffers from an annoying stack overflow on files over a certain size, so I have added a section in the batch file that splits each file into chunks before processing then stitches them back together (with an overlap that is trimmed off to ensure there are no spikes). It also only seems to output 44.1kHz/16bit stems irrespective of what you feed in. Despite these limitations, the resultant vocal stems are pretty good - but still need a bit of work... As Spleeter is relatively new and still very much under development, these limitations will hopefully be overcome in time.
c) I use SoX again to blend 20% of the centre channel extracted by CenterCutGui in step 3(a) with the vocal stem output from Spleeter (after resampling the Spleeter stem if it no longer matches the source). This helps mask artefacts from the Spleeter process and also makes the 'final' centre channel match the source bit depth & sample rate. The end result is quite a clean sounding centre channel based on the true phantom centre of the original stereo source with the vocals at full volume and just a bit of the music content left behind - enough that your brain hears a focused isolated vocal from the centre speaker.
d) To create the 'final' front L+R channels, I use SoX again to subtract the final mixed centre channel from step 3(c) from the level-adjusted LR pair from step 2(e)
e) The last step is for SoX to recombine all the channels back into a multichannel Flac file and copy any tags over from the source file (using another free command-line tool called tag.exe which is no longer developed but can still be found online - e.g. here https://www.softpedia.com/get/Multimedia/Audio/Tag-Editors/Tag.shtml)
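
The chunking workaround in step 3(b) is easier to follow in code than in words. The sketch below is a hedged illustration in plain Python working on a list of samples, not the batch file. `plan_chunks` and `process_in_chunks` are hypothetical names, `process` stands in for one Spleeter run on one chunk, and it assumes the processing returns exactly as many samples as it was given.

```python
def plan_chunks(total_frames, chunk_frames, overlap_frames):
    """Return (read_start, read_end, keep_start, keep_end) for each chunk.
    Each chunk is read with extra overlap on both sides, but only the
    keep_start..keep_end part survives into the stitched result."""
    plans = []
    for keep_start in range(0, total_frames, chunk_frames):
        keep_end = min(keep_start + chunk_frames, total_frames)
        read_start = max(0, keep_start - overlap_frames)
        read_end = min(total_frames, keep_end + overlap_frames)
        plans.append((read_start, read_end, keep_start, keep_end))
    return plans


def process_in_chunks(samples, chunk_frames, overlap_frames, process):
    """Run `process` on overlapping chunks and stitch the trimmed results."""
    stitched = []
    for read_start, read_end, keep_start, keep_end in plan_chunks(
        len(samples), chunk_frames, overlap_frames
    ):
        processed = process(samples[read_start:read_end])
        if len(processed) != read_end - read_start:
            raise ValueError("process() must return as many samples as it was given")
        # Trim the overlap so any edge glitches from processing are discarded.
        skip = keep_start - read_start
        stitched.extend(processed[skip:skip + (keep_end - keep_start)])
    return stitched
```

The overlap only has to be long enough to cover whatever start-up or tail effects the processing introduces at chunk edges. How long that is for Spleeter is something to find by ear rather than something this sketch can tell you.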

Some of the upmixes I have created using this new technique have been truly stunning. I can't take any credit for the various bits of software used - or the genius that must go into the stereo mixes in the first place to allow them to be pulled apart so well, but I am very much enjoying listening to some of my favourite music in a whole new way.

Attached are a few highlights for you to sample. These are DTS files, but have been zipped to allow upload. DTS and FLAC versions are also available here: Spleeter Samples

Billie Eilish - Wish You Were Gay (From When We All Fall Asleep, Where Do We Go)
Chemical Brothers - It Doesn't Matter (From Dig Your Own Hole)
George Harrison - Cockamamie Business (From Best Of Dark Horse 1976-1989)
Jeff Wayne - The Spirit Of Man (From The War Of The Worlds - The New Generation)
Madonna - Music (From Music)
Muse - Drones (From Drones)

Any comments gratefully received - and I would be very happy to share my batch files with anyone who is interested.
 

Attachments

  • Billie Eilish - Wish You Were Gay.zip
    5.1 MB
  • Chemical Brothers - It Doesn't Matter.zip
    5.1 MB
  • George Harrison - Cockamamie Business.zip
    5.1 MB
  • Jeff Wayne - The Spirit Of Man.zip
    5.1 MB
  • Madonna - Music.zip
    5.1 MB
  • Muse - Drones.zip
    4.9 MB
Very nice!! Actually, QQ Member @staygroovy PM'd me a while back about Spleeter and such, but since I already had RX-7, it was way beyond my technical expertise to get into all of the command line stuff and all - or basically I was just too lazy to investigate it.

Anyway, your results are spectacular. I went back and compared your Harrison to the one I did, and I can tell right off the bat that my center was too low, where yours is crisp and clean. My eventual finished track did have the vocals up, but your results are most impressive.

It's great that we can all do decent upmixes these days because the labels sure are not helping out by releasing REAL stuff, so it's up to us to do this stuff.

Great job, and thanks a ton for writing it all up for everyone. Very nice.

Here is your version and my version in Sound Forge, very hard to see a difference, and yours was done by hand and with all freeware!

CB.jpg
 
Hi Jon - Thanks for your kind words. It's almost all opensource software with the exception of DTS Neural which does the initial upmix to quad. I did an ABX comparison between your rears and mine and they were almost identical, with perhaps a little more bass in the Penteo vs DTS Neural. I am also in occasional contact with Staygroovy - I will get in touch and see what he has managed to get out of Spleeter.

In contrast to my comments in my original post RE putting vocals only in the centre channel, I have just finished doing Starfleet Project by Brian May + Friends and, as this is all about the guitars, I thought it would be interesting to see if I could get them to take centre stage using a similar process. To this end, I used CenterCutGui to give me the 'phantom' centre, then ran through Spleeter - but this time with the 5 stem model, which gives Vocals, Bass, Drums, Piano and 'Other'. The resultant piano and vocal stems had very little in them, therefore to improve quality, rather than using the 'other' stem directly, which in this case was basically the guitars, I subtracted the bass and drums from the CenterCutGui output and then blended back 20% to mask any remaining artefacts.
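
As a rough illustration of the subtract-and-blend idea above (and of the similar blend in step 3(c)), here is a small Python sketch working on plain lists of samples. The function names are hypothetical. It assumes all signals are the same length, time-aligned and at the same sample rate, and it treats the 20% blend as a 80/20 crossfade, which is only one possible reading of the description.

```python
def subtract_stems(centre, *stems):
    """Subtract each stem, sample by sample, from the extracted centre."""
    residual = list(centre)
    for stem in stems:
        if len(stem) != len(residual):
            raise ValueError("stems must be the same length as the centre")
        residual = [r - s for r, s in zip(residual, stem)]
    return residual


def blend(main, bed, amount=0.2):
    """Mix `amount` of `bed` back into `main` (assumed to be a crossfade:
    (1 - amount) of main plus amount of bed)."""
    if len(main) != len(bed):
        raise ValueError("signals must be the same length")
    return [(1.0 - amount) * m + amount * b for m, b in zip(main, bed)]


def guitar_centre(centre, bass, drums, amount=0.2):
    """Centre minus bass and drums, with some of the raw centre blended back
    in to mask separation artefacts."""
    return blend(subtract_stems(centre, bass, drums), centre, amount)
```

If the stems come back from Spleeter at a different sample rate or with a small time offset, the subtraction will not cancel cleanly, so they would need resampling and aligning first.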

The clip attached (DTS as before, FLAC available here: Spleeter Samples) has Eddie Van Halen very clearly in the centre channel, with Brian May strongly in the rears and the remainder (Alan Gratzer on drums, Phil Chen on bass and Fred Mandel on piano) mostly in front L+R.

I think I might adapt my vocal separation process in a similar way and subtract the 'other stuff' which I think should give better quality results - I will report back in due course...
 

Attachments

  • Brian May + Friends - Blues Breaker.zip
    5.1 MB
As a purely academic exercise, I thought I'd see how close I could get to a 'proper' 5.1 mix using the above techniques. As a source, I took Shout by Tears For Fears from the 2014 Steven Wilson remix and upmixed from his stereo mix to compare with his 5.1 mix. The images below are from the 30-second clip attached (also in FLAC here: Spleeter Samples). Obviously, the SW mix (Top) is much cleaner, but the sound of the upmix (Bottom) and separation of the channels is very close...

Tears For Fears - Shout (SW Mix).JPG

Tears For Fears - Shout (Upmix).JPG
 

Attachments

  • Tears For Fears - Shout (SW Mix).zip
    4.9 MB
  • Tears For Fears - Shout (Upmix).zip
    4.9 MB
If you wish to test Spleeter without getting into command line stuff, you can download the trial version of Acoustica 7.2.1 (Acoustica | Digital Audio Editor). There's a separate download for Spleeter which also allows you to listen to the separate stems in real time. Of course, using this software is more time-consuming as you'll have to save the individual stems one by one for remixing to surround.
 
Damn @HaflerSQQS, your post above made it look too tempting. I went for it and bought the software to check it out. I had been waffling on the $199 upgrade to Sound Forge 14, waiting to see if they'd get down to $149 at some point, but this was much more interesting to me, especially after your clear and concise post above.

So, I bought it, tried it, and wow! It worked like a breeze. I ended up with the stem files to a Badfinger tune that I had been working on with RX-7 and Penteo. I wasn't really happy with the results I was getting so I just tried it with Acoustica Premium and your step by step directions above.

Poof! So easy. Once I got my file folders created the stem creation was SO MUCH FASTER than doing it with RX7.

I will do some comparison testing now between the two methods. Of course, the resulting stems are not crystal clear tracks as if they were studio multitracks, but they are probably as good as anyone will ever get as those multi's are probably long gone.

So THANKS! Very cool. Now I have yet ANOTHER program to figure out. Good grief, I now have Sound Forge, AA (somewhere), Reaper, and now Acoustica Premium. And the only one I really know how to use is SF.

Here's a crude look at the resulting "Baby Blue" stems in Sound Forge.

BB stems.jpg
 
Jon,
since Spleeter works in a totally different way from classic phase-based decoding (SQ/QS and so on) or panorama slicers (Penteo etc.), combining Spleeter with one of those could give you many more stems to work with for a multichannel mix.
For example, first use Spleeter to get 5 stereo stems, then process each stem with something else, and then remix everything. It might even be a way of fixing some of the very tame SQ mixes we all know but don't love at all.
 
Here's a sample of my first try at this program in creating a 5.1 from stereo. The track is "Have Mercy On the Criminal" from Elton John's "Don't Shoot Me I'm Only the Piano Player", an album that we should have had on SACD but were denied by Universal for some bizarre reason.

Anyway, here's a sample, less than a minute so it should be OK. If they complain I will remove the link but I doubt they'd even care. The purpose of this sample is to show what this program can do, not to pass around a minute of Elton

It's only 10Mb, 16/45 5.1 flac


https://mega.nz/file/yQIAUS6K#UDP4R3zrhgTktroWjCOg7p-k3eu8ftAOEzCJk_-ANv8
 
Since Spleeter is based on the TensorFlow libraries for Machine Learning, there are still huge margins for improvement in the years to come. Should the developers, or someone else, decide to train the network once again with an extended or augmented dataset of multitracks, perhaps we'll be able to get more stems out of the process, or better-quality stems (with a higher frequency limit than 11kHz). I have been using the TensorFlow libraries for sketch recognition and basic structural design (with Generative Adversarial Networks - GANs). It's a pain to train these things: you need to feed in thousands of examples before the network learns to recognise certain features and is able to make clusters across multiple knowledge domains. After that, however, they work beautifully. In my field it can take months to construct a decent dataset, as one is required to do it by hand, literally... With music it must be much easier, as long as one has access to a copious amount of multitracks. This is definitely the future in many fields, and Spleeter has persuaded me it will also be the future for upmixes.
 
Spleeter does have optional libraries to go above 11kHz - up to 16kHz, I think it is. But you need to install it as a command-line tool.
I am 68 years old... the weakest link in my music-listening chain is my own ears... my ears/brain can only hear up to 8kHz... but that doesn't prevent me from enjoying multi-channel music.
 
The 16k libraries would always be the best thing to use, and before complaining that we're used to 96/24 stuff today, let's not forget that the CD-4 system was designed with a frequency response band of about 16kHz, so we're not too far off.
 
So (anyone) what's the cost involved in this method of upmixing? I mean, I'm cool with anything that works having upmixed hundreds of albums, but forget that, how much?
 
Basically free. The only cost involved is if you want to generate and use the stems inside Acoustica Premium - Spleeter by itself is free.
So, you can generate the stems with Spleeter and then import them into your preferred DAW for free.
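
For those wondering what the free command-line route looks like, here is a small Python sketch that builds and runs a Spleeter command. The helper names are hypothetical. The argument layout follows the `spleeter separate -p <model> -o <output dir> <audio file>` form from the Spleeter README for recent releases; older releases took the input via `-i` instead, so treat the exact flags as an assumption to check against your installed version.

```python
import subprocess


def spleeter_command(audio_path, out_dir, stems=2, wide_band=False):
    """Build the argument list for one Spleeter separation run.
    `wide_band=True` selects the 16kHz variant of the pre-trained model."""
    if stems not in (2, 4, 5):
        raise ValueError("the pre-trained models come in 2, 4 and 5 stem versions")
    model = f"spleeter:{stems}stems" + ("-16kHz" if wide_band else "")
    # Assumption: recent CLI layout with the input file as a positional argument.
    return ["spleeter", "separate", "-p", model, "-o", str(out_dir), str(audio_path)]


def separate(audio_path, out_dir, stems=2, wide_band=False):
    """Run Spleeter and raise if it exits with an error."""
    subprocess.run(spleeter_command(audio_path, out_dir, stems, wide_band), check=True)
```

Something like `separate("song.wav", "stems", stems=5, wide_band=True)` should leave one WAV per stem in a sub-folder of `stems`, ready to drag into a DAW. The file and folder names here are placeholders.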
 
So do you get the nice GUI version through purchasing the Acoustica Premium Edition; and a different command line version is the free one?

Somebody's gotta be developing an open-source GUI for Spleeter, right? (Or figuring out ways to connect it seamlessly to an open-source DAW.)
 
Spleeter is only one of the "new" music separation tools, several of which are free, but yeah most of the free ones are python command line.

I have an email thread going with a friend who is a "remixer" vs. an "upmixer". A lot of mono to stereo, etc.

Anyway at one point, at least for his test song, he said he preferred:

Drums -- Stems was clearly the best
Vocals -- RX7 first, Spleeter second
Bass -- Demucs first since an upright bass being used, then RX
Other -- RX7 was clearly the best with much less artifacts, a distant second is Spleeter
...
Spleeter: deezer/spleeter
Open Unmix: Introduction | SigSep
Xtrax Stems: XTRAX STEMS - Audionamix

To name the main players. Then before the AI approach we had spectral editors, but that was VERY labor intensive.

So I'm watching this space, to see if I want to include one or more of the opensource tools in my upmixing tools, but frankly for stereo to 5.1 I'm not sure having stereo stems (4 or 5 only, vs. every instrument/voice) is all that helpful. Questions like "Do you really want all drum sounds in the rears?" etc., so I'm interested in what you all come up with.

Oh, for myself I've made drag and drop icons that abstract away the command line stuff. But trying to share them with my friend, who is less versed in command line stuff, has not always been successful, especially as the tool distributions change.
 