While I’m expecting this blog post series to cover a number of topics, the primary purpose is as a vehicle for discussing abstraction and what it can look like in real-world projects instead of the “toy” examples that are often shown in books and articles. While the DigiMixer project itself is still in some senses a toy project, I do intend to eventually include it within At Your Service (my church A/V system) and my aim is to examine the real problems that come with introducing abstraction.
In this post, I’ll cover the very basics of what we’re trying to achieve with DigiMixer: the most fundamental requirements of the project, along with the highest-level description of what a digital audio mixer can do (and some terminology around control surfaces). Each of the aspects described here will probably end up with a separate post going into far more detail, particularly highlighting the differences between different physical mixers.
Brief interlude: Mixing Station
When I wrote the introductory DigiMixer blog post I was unaware of any other projects attempting to provide a unified software user interface to control multiple digital mixers. I then learned of Mixing Station – which does exactly that, in a cross-platform way.
I’ve been in touch with the author, who has been very helpful in terms of some of the protocol details, but is restricted in terms of what he can reveal due to NDAs. I haven’t yet explored the app in much depth, but it certainly seems comprehensive.
DigiMixer is in no way an attempt to compete with Mixing Station. The goal of DigiMixer is primarily education, with integration into At Your Service as a bonus. Mixing Station doesn’t really fit into either of those goals – and DigiMixer is unlikely to ever be polished enough to be a viable alternative for potential Mixing Station customers. If this blog post series whets your appetite for digital audio mixers, please look into Mixing Station as a control option.
What is a digital audio mixer?
I need to emphasize at this stage that I’m very much not an audio engineer. While I’ll try to use the right terminology as best I can, I may well make mistakes. Corrections in comments are welcome, and I’ll fix things where I can.
A digital audio mixer (or digital mixer for short from here onwards – if I ever need to refer to any kind of mixer other than an audio mixer, I’ll do so explicitly) is a hardware device which accepts a number of audio inputs, provides some processing capabilities, and then produces a number of audio outputs.
The “digital” aspect is about the audio processing side of things. There are digital mixers where every aspect of human/mixer interaction is still analogue via a physical control surface (described in more detail below). Many other digital mixers support a mixture of physical interaction and remote digital control (typically connected via USB or a network, with applications on a computer, tablet or phone). Some have almost no physical controls at all, relying on remote control for pretty much everything. This latter category is the one I’m most familiar with: my mixers are all installed in a rack, as shown below.
My shed mixer rack, December 2022 – the gap in the middle is awaiting an Allen and Heath Qu-SB, on back-order.
The only mixer in the rack that provides significant physical control is the Behringer X32 Rack, just below the network switch in the bottom rack. It has a central screen with buttons and knobs round the side – but even in this case, you wouldn’t want to use those controls much in a live situation. They’re more for set-up activities, in my view.
Most of the other mixers just have knobs for adjusting headphone output and potentially main output. Everything else is controlled via the network or USB.
Even though DigiMixer doesn’t have any physical controls (yet), the vocabulary I’ll use when describing it is intended to be consistent with that of physical control surfaces. Aside from the normal benefits of consistency and familiarity, this will help if and when I allow DigiMixer to integrate with dedicated control surfaces such as the X-Touch Mini, Monogram or Icon Platform M+.
Before getting into mixers, I wasn’t even aware of the term control surface but it appears to be ubiquitous – and useful to know when researching and shopping. I believe it’s also used for aircraft controls (presumably including flight simulators) and submarines.
While mixers often have control surfaces as part of the hardware, dedicated control surfaces (such as the ones listed above) are also available, primarily for integration with Digital Audio Workstations (DAWs) used for music recording and production. Personally I’ve always found DAWs to be utterly baffling, but I’m certainly not the target audience. (If I’d understood them well in 2020, they could potentially have saved me a lot of time when editing multiple tracks for the Tilehurst Methodist Church virtual choir items, but I used Audacity instead.)
Faders
Faders are the physical equivalent of slider controls in software: linear controls which move along a fixed track. These are typically used to control volume/gain.
When you get past budget products, many control surfaces have motorised faders. These are effectively two-way controls: you can move them with your fingers to change the logical value, or if the logical value is changed in some other way, e.g. via a DAW, the fader will physically move to reflect that.
Faders generally do exactly what they say on the tin – and are surprisingly satisfying to use.
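The two-way behaviour of a motorised fader can be sketched in code. This is a minimal Python illustration (DigiMixer itself is C#), and all the names here are hypothetical rather than part of any real API: the key idea is that every change notification carries its source, so a motorised fader can move to match a remote change without echoing its own movements back.

```python
class Fader:
    """Illustrative two-way fader; hypothetical names, not a real API."""

    def __init__(self):
        self._level = 0.0      # logical value in the range 0.0-1.0
        self._listeners = []   # called whenever the level changes

    def subscribe(self, listener):
        self._listeners.append(listener)

    def set_level(self, level, source):
        """Update the level, passing the originating source to listeners.

        'source' might be 'touch' (a finger on the physical fader) or
        'remote' (a DAW or control app); the physical fader motor listens
        for 'remote' changes and moves to match, ignoring 'touch'.
        """
        self._level = max(0.0, min(1.0, level))
        for listener in self._listeners:
            listener(self._level, source)
```

Note that `set_level` clamps out-of-range values rather than rejecting them, which matches how a physical fader simply stops at the end of its track.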
Buttons
For what sounds like an utterly trivial aspect of control, there are a few things to consider when it comes to physical buttons.
The first is whether they’re designed for state or for transition. The controls around the screen of the X32 Rack mixer demonstrate this well:
There’s a set of four buttons (up/down/left/right) used to navigate within the user interface.
There are buttons to the side of the screen which control and indicate which “page” of the user interface is active.
There are on/off buttons such as for toggling muting, solo, and talkback. (I’ll talk more about those features later on… hopefully muting is at least reasonably straightforward.)
Secondly, a state-oriented button may act in a latching or momentary manner. A latching button toggles each time you press it: press it once to turn it on (whatever that means for the particular button), press it again to turn it off. A momentary button is only “on” while you’re pressing it. (This is also known as “push-to-talk” in some scenarios.) In some cases the same button can be configured to be “sometimes latching, sometimes momentary” – which can cause confusion if you’re not careful.
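The latching/momentary distinction can be captured in a few lines. This Python sketch uses hypothetical names purely to illustrate the behaviour described above, not any real control surface API:

```python
class Button:
    """Sketch of latching vs momentary button behaviour."""

    def __init__(self, latching):
        self.latching = latching
        self.on = False

    def press(self):
        if self.latching:
            self.on = not self.on   # latching: toggle on each press
        else:
            self.on = True          # momentary: on while held down

    def release(self):
        if not self.latching:
            self.on = False         # momentary: off as soon as released
```

A “sometimes latching, sometimes momentary” button would effectively flip the `latching` flag based on configuration, which is exactly where the confusion mentioned above creeps in.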
The most common use case for buttons on a mixer is for muting. On purely-physical mixers, mute buttons are usually toggle buttons where the state is indicated by whether the button is physically depressed or not (“in” or “out”). On the digital mixers I’ve used, most buttons (definitely including mutes) are semi-transparent rubberised buttons which are backlit – using light to represent state is much clearer at-a-glance than physical position. Where multiple buttons are placed close together, some control surfaces use different light colours to differentiate between them. I’ve seen just a few cases where a single physical button uses different light colours to give even more information.
Rotary encoders, aka knobs
While I’ve been trying to modify my informal use of terminology to be consistent with industry standards, I do find it hard to use “rotary encoder” for what everyone else I know would just call a knob. I suspect the reasons for the more convoluted term are a) to avoid sexual connotations; b) to sound more fancy.
Like faders, knobs are effectively continuous controls (as opposed to the usually-binary nature of buttons) – it’s just that the movement is rotational instead of linear.
On older mixers, knobs are often limited in terms of the minimum and maximum rotation, with a line on the knob to indicate the position. This style is still used for some knobs on modern control surfaces, but others can be turned infinitely in either direction, reporting changes to the relevant software incrementally rather than in terms of absolute position. Lighting either inside the knob itself or around it is often used to provide information about the logical “position” of the knob in this case.
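An endless encoder reports relative movement, leaving the software to maintain the logical position. Here’s a minimal Python sketch of that idea (hypothetical names, and the step size is an arbitrary choice for illustration):

```python
class RotaryEncoder:
    """Endless encoder: receives increments, tracks a clamped logical value."""

    def __init__(self, value=0.0, step=0.01):
        self.value = value   # logical position in the range 0.0-1.0
        self.step = step     # change per detent of rotation

    def turn(self, detents):
        # detents > 0 for clockwise, < 0 for anticlockwise; the physical
        # knob never hits an end stop, but the logical value is clamped.
        self.value = max(0.0, min(1.0, self.value + detents * self.step))
```

The lighting around the knob would then be driven from `value`, which is why the lights (rather than the knob itself) indicate position.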
Some knobs also act as buttons, although I personally find pushing-and-twisting to be quite awkward, physically.
Jog wheel / shuttle dial
I haven’t actually seen jog wheels on physical mixers, but they’re frequently present on separate control surfaces, typically for use with DAWs. They’re large rotational wheels (significantly larger than knobs); some spring back to a central position after being released, whereas others are more passive. In DAWs they’re often used for time control, scrolling backward and forward through pieces of audio.
I mention jog wheels only as a matter of completeness; they’re not part of the abstraction I need to represent in DigiMixer.
Meters
Meters aren’t really controls as such, but they’re a crucial part of the human/machine interface on mixers. They’re used to represent amounts of signal at some stage of processing (e.g. the input for a microphone channel, or the output going to a speaker). In older mixers a meter might consist of several small lights in a vertical line, where a higher level of signal leads to a larger number of lights being lit (starting at the bottom). Sometimes meters are a single colour (and if so, it’s usually green); other meters go from mostly green to yellow near the top to red at the very top to warn the user when the signal is clipping.
Meters sometimes have a peak indicator, showing the maximum signal level over some short-ish period of time (a second or so).
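The peak indicator behaviour can be modelled simply: remember the highest recent level, and let it decay after a hold period. This Python sketch is illustrative only (the hold time and names are assumptions, not taken from any particular mixer):

```python
import time

class Meter:
    """Meter with a peak indicator held for roughly a second."""

    HOLD_SECONDS = 1.0   # assumed hold period for illustration

    def __init__(self):
        self.level = 0.0
        self.peak = 0.0
        self._peak_time = 0.0

    def update(self, level, now=None):
        now = time.monotonic() if now is None else now
        self.level = level
        # Keep the highest recent level visible so brief spikes aren't missed;
        # replace it once a new high arrives or the hold period expires.
        if level >= self.peak or now - self._peak_time > self.HOLD_SECONDS:
            self.peak = level
            self._peak_time = now
```

A real meter would also convert the raw level to decibels for display, which I’ve skipped here.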
How are digital mixers used?
This is where I’m on particularly shaky ground. My primary use case for a mixer is in church, and that sort of “live” setup can probably be lumped in with bands doing live gigs (using their own mixers), along with pubs and bars with occasional live sound requirements (where the pub/bar owns and operates the equipment, with guest talent or maybe just someone announcing quiz questions etc). Here, the audio output is heard live, so the mixing needs to be “right” in the moment.
Separately, mixers are used in studio setups for recording music, whether that’s a professional recording studio for bands etc or home use. This use case is much more likely to use a DAW afterwards for polishing – so a lot of the task is simply to get each audio track recorded separately with as little interference as possible. A mixer can be used as a way of then doing the post-processing (equalizing, compression, filters, effects etc); I don’t know enough about the field to know whether that’s common or whether it’s usually just done in software on a regular computer.
Focusing on the first scenario, there are two distinct phases:
- Configuring the mixer as far as possible beforehand
- Making adjustments on-the-fly in response to what’s happening in the room
The on-the-fly adjustments (at least for a rank amateur such as myself) are:
- Muting and unmuting individual input channels
- Adjusting the volume of individual input/output combinations (e.g. turning up one microphone’s output for the portion of our church congregation on Zoom, while leaving it alone for the in-building congregation)
- Adjusting the overall output volumes separately
What is DigiMixer going to support?
Selfishly, DigiMixer is going to support my use case, and very little else. Even within “stuff I do”, I’m not aiming to support the first phase where the mixer is configured. This doesn’t need any integration into At Your Service – if multiple churches each have a different mixer model, that’s fine… the relevant tech person at each church can set the mixer up with the app that comes with it. If they want to add some reverb, or a “stereo to mono” effect (which we have at Tilehurst Methodist Church), or whatever, that doesn’t need to be part of what’s controlled in the “live” second phase.
This vastly reduces the level of detail in the abstraction. I’ve gone into a bit more detail in the section below to give more of an idea of the amount of work I’m avoiding, but what we do need in DigiMixer is:
- Whether the mixer is currently connected
- Input and output channel configuration (how many, names, mono vs stereo)
- Muting for inputs and outputs
- Meters for inputs and outputs
- Faders for input/output combinations
- Faders for overall outputs
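The list above can be sketched as a data model. This is a minimal Python illustration with hypothetical names – the real DigiMixer is C#, and its actual abstraction may look quite different – but it shows how small the surface area is compared with everything a mixer can do:

```python
from dataclasses import dataclass, field

@dataclass
class Channel:
    """An input or output channel (hypothetical model, for illustration)."""
    id: int
    name: str
    stereo: bool
    muted: bool = False
    meter_level: float = 0.0   # current meter reading

@dataclass
class Mixer:
    connected: bool = False
    inputs: list = field(default_factory=list)    # input Channels
    outputs: list = field(default_factory=list)   # output Channels
    # Fader level for each (input id, output id) combination, 0.0-1.0.
    input_output_faders: dict = field(default_factory=dict)
    # Overall fader level for each output id.
    output_faders: dict = field(default_factory=dict)
```

Each concrete mixer protocol would then be responsible for populating and updating one of these models over the network.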
What is DigiMixer not going to support?
I have a little experience in trying to do “full fidelity” (or close-to full fidelity) companion apps – my V-Drum Explorer app attempts to enable every aspect of the drum kit to be configured, which requires knowledge of every aspect of the data model. In the case of Roland V-Drums, there’s often quite a lot of documentation which really helps… I haven’t seen any digital mixers with that level of official documentation. (The X32 has some great unofficial documentation thanks to Patrick-Gilles Maillot, but it’s still not quite the same.)
Digital mixers have a lot of settings to consider beyond what DigiMixer represents. It’s worth running through them briefly just to get more of an idea of the functionality that digital mixers provide.
Channel input settings
Each input channel has multiple settings, which can depend on the input source (analog, USB, network etc). Common settings for analog channels are:
- Gain: the amount of pre-amp gain to apply to the input before any other signal processing. This is entirely separate from the input channel’s fader. (As a side-note, the number of places you effectively control the volume of a signal as it makes its way through the system can get a little silly.)
- Phantom power: whether the mixer should provide 48V phantom power to the physical input. This is usually used to power condenser microphones.
- Polarity: whether to invert the phase of the signal
- Delay: a customizable delay to the input, used to synchronize sound from sources with different natural delays
“Standard” signal processing
Most mixers allow very common signal processing to apply to each input channel individually:
- A gate reduces noise by effectively muting a channel completely when the signal is below a certain threshold – but with significantly more subtlety. A gate typically has threshold, attack, release and hold parameters.
- A compressor reduces the dynamic range of sound, boosting quiet sounds and taming loud ones. (I find it interesting that this is in direct contrast to high dynamic range features in video processing, where you want to maximize the range.)
- An equalizer adjusts the volume of different frequency bands.
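To make the gate’s threshold and hold parameters concrete, here’s a deliberately simplified Python sketch – attack and release (which ramp the gain smoothly rather than switching it abruptly) are omitted, and no real mixer implements it this crudely:

```python
def apply_gate(samples, threshold, hold_samples):
    """Simplified noise gate over a list of samples.

    Passes the signal through whenever its magnitude exceeds the
    threshold, and keeps the gate open for 'hold_samples' further
    samples before closing again.
    """
    output = []
    open_for = 0
    for s in samples:
        if abs(s) >= threshold:
            open_for = hold_samples   # (re)open the gate
        if open_for > 0:
            output.append(s)          # gate open: signal passes
            open_for -= 1
        else:
            output.append(0.0)        # gate closed: silence
    return output
```

The hold parameter is what stops the gate from “chattering” open and closed during brief dips in level, such as between words.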
Effects (FX) processing
Digital mixers generally provide a fixed set of FX “slots”, allowing the user to choose effects such as reverb, chorus, flanger, de-esser, additional equalization and others. A single mixer can offer many, many effects (multiple reverbs, multiple choruses etc).
Not only does each effect option have its own parameters, but there are multiple ways of applying the effect, via side-chaining or as an insert. Frankly, it gets complicated really quickly – multiple input channels can send varying amounts of signal to an FX channel, which processes the combination and then contributes to regular outputs (again, by potentially varying amounts).
I’m sure it all makes sense really, but as a novice audio user it makes my head hurt. Fortunately I haven’t had to do much with effects so far.
Routing
Routing refers to how different signals are routed through the mixer. In a very simple mixer without any routing options, you might have (say) 4 input sockets and 2 output sockets. Adjusting “input 1” (e.g. with the first fader) would always adjust how the sound coming through the first input socket is processed. In digital mixers, things tend to get much more complicated, really quickly.
Let’s take my X32 Rack for example. It has:
- 16 XLR input sockets for the 16 regular “local” inputs
- 6 aux inputs (1/4″ jack and RCA)
- A talkback input socket
- A USB socket used for media files (both to play and record)
- 8 XLR main output sockets
- 6 aux outputs (1/4″ jack and RCA)
- A headphone socket
- Two AES50 ethernet sockets for audio-over-ethernet, each of which can have up to 48 inputs and 48 outputs. (The X32 can’t handle quite that many inputs and outputs, but it can work with AES50 devices which do, and address channels 1-48 on them.)
- An ultranet monitoring ethernet socket (proprietary audio-over-ethernet to Behringer monitors)
- A “card” which supports different options – I have the USB audio interface card, but other options are available.
(These are just the sockets for audio; there are additional ethernet and MIDI sockets for control.)
How should this vast set of inputs be mapped to the 32 (+8 FX) usable input channels? How should 16 output channels be mapped to the vast set of outputs? It’s worth noting that there’s an asymmetry here: it doesn’t make sense to have multiple configured sources for a single input channel, but it does make sense to send the same output (e.g. “output channel 1”) to multiple physical devices.
As an example, in my setup:
- Input channels 1-16 map to the 16 local XLR input sockets on the rack
- Input channels 17-24 map to input channels 1-8 on the first AES50 port, which is connected to a Behringer SD8 stage box (8 inputs, 8 outputs)
- Input channels 25-32 map to channels 1-8 via the USB port
- Output channels 1-8 map to the local output XLR sockets and to the first AES50 port’s outputs 1-8 and to channels 9-16 via the USB port
- Output channels 9-16 map to channels 1-8 via the USB port (yes, that sounds a little backwards, but it happens to simplify using the microphones)
- The input channels 1-8 from the first AES50 port are also mapped to output channels 17-24 on the USB port
- The output channels 1-8 on the USB port are also mapped to input channels 25-32 on the USB port.
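The first five bullets of that mapping can be expressed as plain data, which also makes the asymmetry visible: each input channel has exactly one source, while an output channel can fan out to several destinations. This Python sketch uses made-up device names, and omits the extra USB monitoring sends in the last two bullets for brevity:

```python
# One source per usable input channel: channel -> (device, channel-on-device)
input_sources = {
    **{ch: ("local-xlr", ch) for ch in range(1, 17)},       # inputs 1-16
    **{ch: ("aes50-a", ch - 16) for ch in range(17, 25)},   # inputs 17-24
    **{ch: ("usb", ch - 24) for ch in range(25, 33)},       # inputs 25-32
}

# One-to-many for outputs: channel -> list of (device, channel-on-device)
output_destinations = {
    **{ch: [("local-xlr", ch), ("aes50-a", ch), ("usb", ch + 8)]
       for ch in range(1, 9)},                              # outputs 1-8
    **{ch: [("usb", ch - 8)] for ch in range(9, 17)},       # outputs 9-16
}
```

Even this stripped-down version shows why routing configuration belongs in the mixer’s own app rather than in DigiMixer’s abstraction.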
Oh, and there are other options like having an oscillator temporarily take over an output port. This is usually used for testing hardware connections, although I’ve used this for reverse engineering protocols – a steady, adjustable output is really useful. Then there are options for where talkback should go, how the aux inputs and outputs are used, and a whole section for “user in” and “user out” which I don’t understand at all.
All of this is tremendously powerful and flexible – but somewhat overwhelming to start with, and the details are different for every mixer.
Mixer settings
Each digital mixer has its own range of settings, such as:
- The name of the mixer (so you can tell which is which if you have multiple mixers)
- Network settings
- Sample rates
- MIDI settings
- Link preferences (for stereo linked channels)
- User interface preferences
That’s just a small sample of what’s available in the X32 – there are hundreds of settings, many cryptically described (at least to a newcomer), and radically different across mixers.
When I started writing this blog post, I intended it to mostly focus on the abstraction I’ll be implementing in DigiMixer… but it sort of took on a life of its own as I started describing different aspects of digital mixers.
In some ways, that’s a good example of why abstractions are required. If I tried to describe everything about even one of the mixers I’ve got, that would be a very long post indeed. An abstraction aims to move away from the detail, to focus on the fundamental aspects that all the mixers have in common.
This series of blog posts won’t be entirely about abstractions, even though that’s the primary aim. I’ll go into some comparisons of the network protocols supported by the various mixers, and particular coding patterns too.
There’s already quite a bit of DigiMixer code in my democode repository – although it’s in varying states of production readiness, let’s say. I expect to tidy it up significantly over time.
I’m not sure what I’ll write about next in terms of DigiMixer, but I hope the project will be as interesting to read about as it’s proving to explore and write about.