Those of you reading along with my posts to this forum know that, for immersive upmixes, I've been developing tools and techniques for combining upmixing and re-mixing, using stems separated from the original stereo with machine-learning source separation.
I still think this has the potential to create the "best" immersive upmixes, but it is labor/time intensive and some would prefer more of a drag and drop solution like SpecWeb.
Then in the last month I went down a rabbit hole looking for music source separation techniques that might work well when there are a lot of synth sounds, as in electronica, vs. guitars and pianos.
I thought there might be something "timbral based", maybe from before all the tools started using the same MUSDB18 dataset, but didn't really find anything useful. (Update: LALAL.ai has a "synth" extraction in beta, but in a one-song test it failed pretty badly.)
Then I thought about what else I could do with the tools at hand, and came up with a bit of an out-of-the-box idea. What if I used "arctan" panning to spread the stereo sound up and over your head, instead of just across the horizontal speakers? From C to Fronts to Front Heights to Rear Heights and finally to Rears. I prototyped something in Plogue Bidule, and have been improving/refining it over the last month.
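To make the idea concrete, here's a minimal sketch of the arctan part (my own simplification, not the actual Bidule layout): the arctan of the L/R magnitude ratio gives a pan angle, and that angle is mapped onto the arc C → Fronts → Front Heights → Rear Heights → Rears with a constant-power crossfade between neighbouring zones. The five-zone layout and the symmetric treatment of left vs. right pans are assumptions for illustration; a real upmixer would feed L/R speaker pairs at each zone rather than a single gain.

```python
import math

# Arc positions from front-center, up over the head, to the rear.
ARC = ["C", "Front", "FrontHeight", "RearHeight", "Rear"]

def pan_angle(left, right):
    """Pan position via arctan of the instantaneous L/R magnitudes:
    0 = hard left, pi/4 = center, pi/2 = hard right."""
    return math.atan2(abs(right), abs(left))

def arc_gains(theta):
    """Map a pan angle onto the over-the-head arc.
    Center content stays in C; the harder the pan, the further
    along the arc (and eventually behind you) the signal lands.
    Constant-power crossfade between the two active zones."""
    d = abs(theta - math.pi / 4) / (math.pi / 4)  # 0 = center .. 1 = edge
    pos = d * (len(ARC) - 1)
    idx = min(int(pos), len(ARC) - 2)
    frac = pos - idx
    gains = [0.0] * len(ARC)
    gains[idx] = math.cos(frac * math.pi / 2)      # zone fading out
    gains[idx + 1] = math.sin(frac * math.pi / 2)  # zone fading in
    return gains
```

A mono (centered) signal produces a pan angle of pi/4, so all its energy stays in C, while hard-panned material ends up at the rear of the arc; everything in between crossfades smoothly through the heights.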
One of the refinements is a control to fold it down into 7.1: adding a little of that fills in the side speakers, and setting it all the way gives you a straight 7.1 output (vs. 7.1.4).
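As a sketch of how such a fold-down control might work (the channel names and the height-to-bed routing here are my assumptions, not the actual Bidule wiring): front heights fold into the sides and rear heights into the surround backs, with constant-power scaling so the blend doesn't change overall level.

```python
import math

def fold_to_bed(ch, amount):
    """Blend a 7.1.4 frame toward 7.1.
    amount = 0 leaves the heights untouched; amount = 1 folds them
    fully into the horizontal bed (plain 7.1 output).
    'ch' maps channel names to sample values. Constant-power split:
    keep^2 + g^2 == 1, so height energy is conserved."""
    g = math.sqrt(amount)           # gain of the folded-down portion
    keep = math.sqrt(1.0 - amount)  # gain of what stays in the heights
    out = dict(ch)
    for ht, bed in (("Ltf", "Ls"), ("Rtf", "Rs"),
                    ("Ltr", "Lsr"), ("Rtr", "Rsr")):
        out[bed] = ch[bed] + g * ch[ht]
        out[ht] = keep * ch[ht]
    return out
```

At intermediate settings this is exactly the "fills in the side speakers" behavior: the sides pick up an attenuated copy of the front-height content while the heights are correspondingly reduced.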
Yes I'm thinking about porting it to SpecWeb, at least for 7.1, if not all the way to 7.1.4, but that will probably take a while.
Anyway, the results to date are promising: it gives a very "room filling", "in the band" sound compared to other upmix methods.
For going beyond 7.1, you may be wondering how playback would be done. Unless you have an audio device with 12 or more outputs AND a way to get those signals into your AVR/surround system, it's going to require Dolby Atmos, DTS:X, Auro-3D, or MPEG-H/ambisonic encoding. Short of buying very expensive software encoders (most of which require a Mac rather than a Windows PC to run), until recently there wasn't really a viable option for hobbyist up/remixers, but now you can use cloud-based services to Dolby Atmos encode your 12-channel WAV files.
AWS Elemental MediaConvert is one such service that is inexpensive and fairly easy to use (though not push-button without some automation being built).
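For orientation, the part of a MediaConvert job that matters here is the audio description that selects the Dolby Digital Plus with Atmos codec. The helper below builds just that fragment of the job's Settings dict; the key names follow the MediaConvert API as I recall them, but the default bitrate and coding mode are assumptions — start from a job template exported from the AWS console for a known-good full job.

```python
def atmos_audio_description(bitrate=768000,
                            coding_mode="CODING_MODE_9_1_6"):
    """Build the AudioDescriptions entry of a MediaConvert job's
    Settings for Dolby Digital Plus with Atmos ("EAC3_ATMOS") output.
    Default bitrate/coding mode are illustrative assumptions; the
    rest of the job (inputs, output group, channel mapping) still
    has to come from the MediaConvert docs or a console template."""
    return {
        "AudioSourceName": "Audio Selector 1",
        "CodecSettings": {
            "Codec": "EAC3_ATMOS",
            "Eac3AtmosSettings": {
                "CodingMode": coding_mode,
                "Bitrate": bitrate,
            },
        },
    }
```

This dict would slot into the job JSON submitted via the console, the AWS CLI, or boto3's `create_job` call.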
Building that automation into an upmixer is doable, but again will take some time.
That, my day job, and other hobbies will keep me off the streets I guess ;0)
In the meantime, if there are Plogue Bidule users who want to dive in early, let me know and I can share the prototype layout and instructions, as well as help you get going with Dolby Atmos encoding via AWS.
Cheers,
Z