Brief Summary
The recent resignation of Ed Newton-Rex from his position as VP of Audio at Stability AI, because he couldn’t reconcile himself with his former company’s use of copyrighted music to train its AI music generation model, has cast new light on the issues around this rapidly developing area. We examine those issues.
Going Deeper
The explosive progress of generative AI and Large Language Models can’t have escaped your attention and, perhaps inevitably, after the initial question of ‘can we?’ comes the more difficult question of ‘should we?’. An essential part of creating generative AI tools like ChatGPT is training them, and for that you need a lot of training material: examples of the kind of content you want to generate.
Computers are excellent at identifying patterns in large data sets, and Large Language Models are trained on huge amounts of example data, from which they can create very convincing human-like responses to prompts. A tool like ChatGPT does this by predicting the next word or sentence based on the patterns identified during training. Given a large enough body of example data to train on, these tools can give the impression of understanding their responses, though the precise nature of true understanding remains an elusive concept.
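To make the ‘predicting the next word’ idea concrete, here is a toy sketch in Python (the function names and the tiny corpus are my own illustration, not anything from a real system): a bigram model that simply counts which word follows which in its training text, then predicts the most frequently seen follower. Real Large Language Models use neural networks trained on vastly more data, but the underlying idea of prediction from patterns observed in training material is the same.

```python
from collections import defaultdict, Counter

def train_bigrams(text):
    """Count, for each word, how often each other word follows it."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequently observed follower of `word`, or None."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# A deliberately tiny 'training set' -- real models ingest billions of words.
corpus = (
    "the cat sat on the mat and the cat slept on the mat "
    "while the cat watched the dog"
)
model = train_bigrams(corpus)
print(predict_next(model, "the"))  # "cat" -- the word seen most often after "the"
print(predict_next(model, "sat"))  # "on"
```

Notice that the model has no notion of what a cat or a mat is; it only reproduces statistical patterns from its training text, which is the crux of the debate about whether such training constitutes ‘learning’ in any human sense.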
While much of the attention has initially been on text-based output, with suitable training data many kinds of tasks are suited to this approach, and images, computer code and audio have all been subjects of AI-based approaches, with varying degrees of success. The shortcomings of these early iterations shouldn’t be seen as any kind of predictor of future performance. Just because something is clunky or seems fake today doesn’t mean that it won’t be entirely convincing at some point in the future.
But regardless of what is or isn’t possible, ethical concerns remain, and an interesting recent development which illustrates the discomfort AI creates in the creative community is the decision taken by Ed Newton-Rex to resign from his position as VP of Audio at Stability AI, the company behind a new product for AI music generation, because he didn’t agree with the company’s position that training generative AI models on copyrighted works is ‘fair use’.
What Is ‘Fair Use’?
Fair use is a crucial aspect of US copyright law that allows the limited use of copyrighted material without permission from the copyright owner. It serves as an essential safeguard for creativity, scholarship and free expression, enabling individuals to use copyrighted works for purposes such as criticism, commentary, news reporting, teaching and research. Central to this is the idea that such use is ‘transformative’, a test which AI music generation arguably meets. The law varies around the world: in the UK, for example, the Copyright, Designs and Patents Act specifically allows the mining of text and data to which researchers have lawful access, but only for non-commercial research purposes.
The possibilities opened up by AI inevitably raise legal questions with which legislation will eventually catch up, but in the meantime individuals draw their own conclusions, and in the opinion of Ed Newton-Rex the use of copyrighted music to train AI without the permission of the copyright holder isn’t acceptable.
This raises the question: is he right? Is the use of large amounts of recorded music to train an AI model fundamentally any different to the way music has always been created, apart of course from the direct involvement of a human composer?
How Original Is Original Music Anyway?
All creative work takes elements of what the listener has already heard. “Originality is undetected plagiarism” is a quote attributed to the Anglican priest William Ralph Inge, and while pithy, it illustrates a serious point: for everything other than the very unconventional, the creative output of writers, composers and artists is always based on things which have gone before. This is particularly true of music, based as it is on a formalised system of music theory, on well-established conventions of form and genre and, in most cases, on a desire to sound familiar and to fulfil the expectations of the listener. Is training AI any different to a human listening to and learning from music?
Emotionally I think it is, but I struggle to make that argument definitively; my head doesn’t follow my heart on this one. I’ve spent a significant amount of my life listening to music and I readily concede that everything I do creatively is based on that. If it’s directly based on something specific, it’s plagiarism, but most of the time it’s less straightforward: either borrowing ideas from multiple sources or from conventions which are ultimately the combination of many, many examples. Anything I’ve ever done which I might think is completely original is almost certainly just something derivative, the inspiration for which I can’t pin down. As far as I can see there is no fundamental difference between this and training an AI model, apart from that of scale and accessibility.
A Big Difference Between Human And AI Training
If there is a difference, it lies between the finite ability of a human to listen to and assimilate music and the speed and scale of automated data ingestion and learning (however shallow for the time being) in AI training. I have no idea how many hours of music I’ve listened to in my life, but I’m sure a suitably powerful computer could ingest in minutes something which has taken me a lifetime to experience. That exploration and accumulation represents an investment on my part, and I’m uncomfortable with it being potentially devalued by an automated process which might one day soon be able to produce equivalent results without understanding or caring about the end product. That’s bleak!
Cloning Voices
An area where I find it far easier to align my head and my heart when it comes to AI music is the cloning of singers’ voices. On something like the tone and performance style of an instrumental player (AI-generated Gilmour guitar solos or Miles Davis trumpet, say) I think it would be difficult to draw a clear line between what is and isn’t OK: the instrument itself, however strongly influenced by the player, is still a mass-produced item. But a singer’s voice, like their visual likeness, is unique to them, and the increasing fidelity and realism of cloned voices is, for me at least, very problematic. When it comes to AI-generated music, I take comfort, for the time being at least, in the fact that at present it can only create convincing, derivative but frankly ‘bad’ music. Then again, early drum machines didn’t sound like a drummer, but a modern VI…
What do you think about human-written music forming the basis of the next generation of AI music creation tools, without its creators being consulted? Share your thoughts in the comments.