The "recording of that element" in the metadata would be a difference audio signal: the result of subtracting one audio signal from another. That's precisely the kind of relationship and manipulation used in past systems, yes! It's still object audio in the metadata, though.
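Here's a rough sketch of what a difference signal means, assuming integer PCM samples in plain Python lists. The names `mix`, `bed`, `difference_signal` and `reconstruct` are hypothetical and not taken from any Dolby spec. You subtract one signal from another sample by sample, and adding the result back recovers the original exactly because integer arithmetic never rounds.

```python
def difference_signal(a, b):
    """Return a - b, sample by sample: the 'difference audio signal'."""
    if len(a) != len(b):
        raise ValueError("signals must have the same length")
    return [x - y for x, y in zip(a, b)]


def reconstruct(b, diff):
    """Recover a from b and the difference signal, since a = b + (a - b)."""
    return [y + d for y, d in zip(b, diff)]


# Hypothetical 16-bit PCM fragments: a full mix, and the same mix without one element.
mix = [1200, -340, 5000, 32767, -32768]
bed = [1000, -300, 4000, 30000, -30000]

# The isolated element. Note that in general a difference can exceed the
# original 16-bit range, so a real system would need extra headroom.
element = difference_signal(mix, bed)
print(element)
print(reconstruct(bed, element) == mix)
```

Nothing is lost. You could store the element only as a difference and still get the exact original samples back.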
@steelydave was much more accurate than I was, but we're not in disagreement.
My point was meant to be that the system keeps actual discrete audio channels throughout the process, lossless phase inversions and MLP mysteries involved as they may be. When I read some tech papers a few years ago, I thought they described it the other way around. Maybe I read a 'simplified' version? But alright... difference signals from all the object channels in the metadata, then. Same end result.
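The "lossless phase inversion" part can be shown with a generic integer matrixing trick. This is a reversible mid/side transform, and its `mid` works out to floor((L + R) / 2), the same as the stereo decorrelation FLAC uses. I'm assuming this for illustration only. It is not Dolby's actual MLP matrix, and `encode_pair`/`decode_pair` are made-up names. To form `side`, you add a polarity-inverted copy of one channel to the other. Since everything stays in integers, both discrete channels come back bit-exact.

```python
def encode_pair(left, right):
    """Lossless integer mid/side matrix (lifting form).

    side = L + (-R), i.e. L plus a polarity-inverted R.
    mid  = R + floor(side / 2), which equals floor((L + R) / 2).
    """
    side = [l + (-r) for l, r in zip(left, right)]
    mid = [r + (s >> 1) for r, s in zip(right, side)]  # >> floors, also for negatives
    return mid, side


def decode_pair(mid, side):
    """Undo encode_pair exactly, recovering the original discrete channels."""
    right = [m - (s >> 1) for m, s in zip(mid, side)]
    left = [s + r for s, r in zip(side, right)]
    return left, right


# Hypothetical PCM samples for two discrete channels.
left = [100, -7, 3, 0]
right = [90, 5, -4, 0]

mid, side = encode_pair(left, right)
print(mid, side)
print(decode_pair(mid, side) == (left, right))
```

The decoder subtracts the same `s >> 1` term that the encoder added, so the rounding cancels exactly. That's why a matrixed stream can still carry truly discrete channels.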