Brief Summary
Rather than replacing us, as some fear it threatens to, technology using AI should be making it easier for us to do what we’re already doing. If our tools knew more about what music should sound like, they would require less input from us, leaving us free to do the creative stuff. Here are three tasks we’d delegate today!
Going Deeper
We hear about AI taking the role of humans out of the production process. But are there tasks where AI might make the jobs of producers and engineers easier, and the music better, rather than threatening to make us unnecessary and devalue both the craft and the product? We think there might be.
The kind of thing I’m talking about is tools like the Sonnox Oxford Drum Gate. It takes a laborious task - setting the threshold and side-chain filters of a gate to exclude everything apart from the wanted signal using ‘dumb’ parameters like level and frequency content - and instead differentiates between a kick, a snare and a tom by how they sound, just as we do. It’s important not to confer abilities too readily, but the result is that you don’t have to micro-manage the process: it figures out what you want and just does it. What other tasks would benefit if they were more like this? A bit more (artificially) intelligent?
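To make the idea concrete, here’s a toy sketch of that kind of sound-based classification in Python. To be clear, this is not how the Sonnox plug-in actually works - its algorithm isn’t public as far as I know - and the features, thresholds and file name are invented purely for illustration, using the librosa library.

```python
# Toy stand-in for sound-based drum classification: decide what a hit is
# by its spectral character rather than by raw level alone.
# The thresholds below are illustrative guesses, not any product's algorithm.
import librosa
import numpy as np

def classify_hit(path: str) -> str:
    y, sr = librosa.load(path, sr=None, mono=True)
    # Spectral centroid: roughly, where the energy sits in the spectrum.
    centroid = float(np.mean(librosa.feature.spectral_centroid(y=y, sr=sr)))
    # Spectral flatness: how noise-like (vs. tonal) the signal is.
    flatness = float(np.mean(librosa.feature.spectral_flatness(y=y)))
    if centroid < 500:      # kicks live low in the spectrum
        return "kick"
    if flatness > 0.2:      # a noisy spectrum suggests the snare's rattle
        return "snare"
    return "tom"            # tonal, but sitting higher than a kick

print(classify_hit("hit.wav"))  # hypothetical recording of a single drum hit
```

A real tool would use a trained model rather than two hand-set thresholds, but the principle - classify by what it sounds like, then gate accordingly - is the same.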
What made me think about this was a use for ChatGPT which I actually find valuable. I’ve played with ChatGPT in the past and always found the results superficially impressive, but when you actually read the text it generates it feels fundamentally uninteresting. Like a really bad first-year undergraduate essay!
The use which actually saved me time involves dictation. I’ve tried dictation technologies in the past and found that, apart from the danger of unlimited waffling (a serious danger in my case…), I spent too long checking and correcting the text to really gain any time over manual typing (in case you’re wondering, I’m typing this in the time-served fashion). What ChatGPT helped me do was quickly clean up dictated text, using its huge data set and LLM to catch those misheard or misinterpreted words and sort them out automatically. You do of course still have to check the result, but it’s significantly quicker.
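If you want to try the same trick, a minimal sketch using the OpenAI Python client might look like this. The model name and prompt wording are my own assumptions; any capable chat model should cope.

```python
# Minimal sketch of the dictation clean-up workflow via the OpenAI API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def clean_dictation(raw_text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; substitute whatever you use
        messages=[
            {"role": "system",
             "content": "Fix misheard or misrecognised words in this "
                        "dictated text. Preserve the author's wording and "
                        "style; change nothing that is already correct."},
            {"role": "user", "content": raw_text},
        ],
    )
    return response.choices[0].message.content

# A typical speech-to-text mishearing: 'dB' transcribed as 'dead bees'.
print(clean_dictation("Set the gate threshold to around minus forty dead bees."))
```

You still read the output before trusting it, of course - the point is that checking is faster than correcting.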
Music is also a language. Are there tasks which regularly get in our way in music production and which might benefit from a similar use of a music LLM? Here are three suggestions:
Tidying up MIDI
MIDI data is inherently ‘neater’ than its messy cousin, audio. Most of it is distinctly binary - on or off - and even MIDI CC data has a hugely reduced resolution compared to audio, so it seems a great candidate for simple, intelligent cleanup. All DAWs have plenty of tools for selecting MIDI data according to user-defined parameters. For example Pro Tools, criticised by some for lagging behind competitors in the depth of its MIDI features, allows selection of notes by note range, velocity, position or duration. This manual process isn’t what I mean. I mean something which uses a large data set, like the Large Language Model (LLM) behind ChatGPT, to identify patterns in what music usually ‘looks like’ and flag data which looks ‘wrong’.
Of course there will be cases where this approach isn’t appropriate - if you’re playing atonal free jazz then your music won’t fit the data set - but most music is similar, and rather than having to direct a DAW’s actions, I’d rather just check them.
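As a crude stand-in for that learned approach, here’s a sketch which flags notes whose velocity or duration sits far outside the track’s own distribution - typical of accidental string brushes and hung notes. A real music-trained model would learn far subtler patterns; the pretty_midi library, the file name and the z-score threshold are just assumptions for the example.

```python
# Flag MIDI notes that are statistical outliers within their own track -
# a crude, rule-based stand-in for a learned 'this looks wrong' model.
import numpy as np
import pretty_midi

def flag_suspect_notes(path: str, z_limit: float = 2.5):
    pm = pretty_midi.PrettyMIDI(path)
    suspects = []
    for inst in pm.instruments:
        if len(inst.notes) < 3:
            continue  # not enough notes to establish a norm
        durations = np.array([n.end - n.start for n in inst.notes])
        velocities = np.array([float(n.velocity) for n in inst.notes])
        for note in inst.notes:
            # How many standard deviations from the track's own norm?
            z_dur = abs((note.end - note.start) - durations.mean()) / (durations.std() + 1e-9)
            z_vel = abs(note.velocity - velocities.mean()) / (velocities.std() + 1e-9)
            if z_dur > z_limit or z_vel > z_limit:
                suspects.append((inst.name, note))  # flag for review, don't auto-delete
    return suspects

for name, note in flag_suspect_notes("take.mid"):  # hypothetical capture
    print(name, note)
```

Note that it flags rather than deletes - exactly the ‘I’d rather just check them’ workflow described above.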
Guitar Synths
If we’re looking for a stubborn problem which has been around for too long, the guitar synth is probably a candidate. Pitch-to-MIDI conversion needs to hear a certain amount of the waveform before it can identify the pitch - and the lower the note, the longer that takes - which has prompted many impractical solutions. Various forms of switches, including conductive frets, have been tried, but the fact that no one solution has emerged suggests that interfering with the interface of guitar string and fret isn’t the way to go.
Noise is also an issue. Guitars go ‘Chaaangg’, and the ‘Ch’ bit - the attack of the note - is noise-based: the note hasn’t yet established itself. This adds to the confusion when it comes to identifying note data. Add to that the fact that most guitarists find their natural playing is too loose not to confuse a guitar synth, necessitating very precise playing, usually at the expense of the performance.
However, much guitar playing is reasonably predictable. The notes accessible during any passage are limited (yes, there are virtuosos who can do the seemingly impossible, but they are, by definition, outliers), and if we look at what actually gets played by the majority of people the majority of the time, the potential next note in any given passage becomes relatively restricted. Add to that the fact that repetition is such a feature of music, and the possibility of a guitar synth using AI to improve pitch tracking seems credible.
This would be both challenging and risky for a live performance, but so is using Auto-Tune, and people do that. In the studio, though, it could be genuinely useful. You might argue that basing decisions on what people usually do would wreck a Zappa solo, but that’s missing the point: if a correction is incorrect, you don’t use it.
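To show the shape of the idea, here’s a toy example of combining an ambiguous pitch-detector reading with a musical prior, so the note which best explains both the audio evidence and what usually follows the previous note wins. The note set, transition table and detector probabilities are all invented for the example; a real system would learn them from a large corpus of performances.

```python
# Toy 'musical prior' for pitch tracking: weight the detector's candidate
# pitches by how likely each note is to follow the previous one.
import numpy as np

NOTES = ["E2", "F#2", "G2", "A2", "B2"]

# Rows = previous note, columns = probability of the next note.
# These values are invented; in practice they'd be learned from real playing.
TRANSITIONS = np.array([
    [0.10, 0.30, 0.30, 0.20, 0.10],   # after E2
    [0.25, 0.10, 0.35, 0.20, 0.10],   # after F#2
    [0.20, 0.25, 0.10, 0.30, 0.15],   # after G2
    [0.15, 0.15, 0.30, 0.10, 0.30],   # after A2
    [0.20, 0.10, 0.20, 0.35, 0.15],   # after B2
])

def best_guess(prev_note: str, detector_probs: np.ndarray) -> str:
    # Combine evidence and prior; the biggest product wins.
    posterior = detector_probs * TRANSITIONS[NOTES.index(prev_note)]
    return NOTES[int(np.argmax(posterior))]

# Early in the attack the detector is torn between E2 and G2; after an F#2,
# the prior says G2 is the likelier continuation, so it tips the balance.
ambiguous = np.array([0.35, 0.05, 0.40, 0.15, 0.05])
print(best_guess("F#2", ambiguous))   # -> G2
```

Crucially, the prior only tips ambiguous readings - a confident detection still wins outright.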
Real Time Tempo Following
Something which has always frustrated me is that if you use loops, sequences or a backing track live, the onus is on you as the performer to stay in time with the technology. If a click track were a musician it would get sacked, because it never listens to its bandmates. I love the potential offered by using loops, arpeggiators and sequencers live, but it’s often just not worth the flexibility you have to give up and the complexity you inevitably add. The point of live performance is the spontaneous interaction of musicians, and a click track interferes with that. Why can’t a DAW or a hardware instrument understand and follow the timing of a group of musicians?
Happily, this is at least partly possible today: there is a tempo-following function built into Ableton Live 11, and it looks impressive. It needs a clear guide to follow - usually drums - and it can track significant tempo changes around a pre-established target tempo. See it in use in the video below.
I’d really like to see this available more widely. I’m not aware of any hardware instrument which offers this capability, but I’d love to see a sampler or workstation keyboard which could listen to and stay in time with a group of musicians. Something the Ableton follow feature can’t do is deal with sudden changes of tempo, and I’d really like to see something which could understand the characteristic cues which denote a song’s finish. I imagine something which learns the song over repeated practice performances, just like a person does, but stays flexible in terms of song structure and length. It would be challenging, but I think it’s not outside the realms of possibility for a sampler to figure out that the slowdown at the end of repeating choruses is the end of the song, and not to just keep playing out the loop until someone hits stop!
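For the curious, here’s a naive sketch of the ‘listening’ half of such a tempo follower: estimate tempo from recent beat times and nudge the clock toward the band, rather than dictating a fixed grid. Real implementations are far more robust; the smoothing factor, window size and the assumption that each detected onset is a beat are all mine.

```python
# Naive tempo follower: ease the clock toward the tempo implied by the
# band's recent beats, instead of forcing the band to follow the clock.
from collections import deque

class TempoFollower:
    def __init__(self, start_bpm: float = 120.0, smoothing: float = 0.2):
        self.bpm = start_bpm
        self.smoothing = smoothing     # 0 = ignore the band, 1 = snap instantly
        self.beats = deque(maxlen=8)   # recent detected beat times, in seconds

    def on_beat(self, t: float):
        self.beats.append(t)
        if len(self.beats) < 2:
            return
        times = list(self.beats)
        intervals = [b - a for a, b in zip(times, times[1:])]
        observed_bpm = 60.0 / (sum(intervals) / len(intervals))
        # Drift gently toward the band, click-track pride notwithstanding.
        self.bpm += self.smoothing * (observed_bpm - self.bpm)

follower = TempoFollower()
for t in [0.0, 0.48, 0.97, 1.44, 1.92]:  # a drummer pushing ahead of 120 BPM
    follower.on_beat(t)
print(round(follower.bpm, 1))            # clock has eased up toward ~125 BPM
```

Spotting the end-of-song slowdown would need more than this - something which has learned the arrangement over those repeated rehearsals - but gentle tempo tracking is the easy first step.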
Some Of This Exists, But Is It Doing The Right Jobs?
Some of the mix-assistant tools from companies such as iZotope take a similar approach to make decisions about the sonics of mixes. The issue I have with these is that that’s not something I really want any help with. If I had a human assistant, I probably wouldn’t offload mix decisions to them; I’d rather have them doing the laborious, repetitive stuff.
Can you think of studio tasks you’d like to be able to delegate to a virtual assistant?
Photo by Elijah Merrell on Unsplash