You might remember that I linked to Apple’s AI-narrated audiobooks a little while back. The voices sounded eerily human, although they still struggled with emotional nuance, especially in fiction. For nonfiction, however, they were nearly as good as, if not better than, some human narrators.
There was some book industry speculatin’ around which idiot narrators allowed themselves to be recorded and Borged into Tim Cook’s army of robot voices.
Turns out, none of them. At least if the word ‘allowed’ has any meaning.
Wired just reported that some indie authors who’d been using the Findaway audiobook service to hook up with narrators missed a cheeky little paragraph down the ass end of their contracts allowing the fruit company to use the voice recordings for ‘machine learning’ purposes.
Gary Furlong, a Texas-based audiobook narrator, had worried for a while that synthetic voices created by algorithms could steal work from artists like himself. Early this month, he felt his worst fears had been realized.
Furlong was among the narrators and authors who became outraged after learning of a clause in contracts between authors and leading audiobook distributor Findaway Voices, which gave Apple the right to “use audiobooks files for machine learning training and models.” Findaway was acquired by Spotify last June.
…“It was very disheartening,” says Furlong, who has narrated over 300 audiobooks and is one of more than a dozen narrators and authors who told WIRED of their concerns with Findaway’s agreement. “It feels like a violation to have our voices being used to train something for which the purpose is to take our place,” says Andy Garcia-Ruse, a narrator from Kansas City.
…
After Furlong first learned of Apple’s algorithms being written into Findaway agreements early this month, he contacted Isobel Starling, an author he’d worked with who distributed titles with the company. She was shocked to find a clause titled “Machine Learning” near the bottom of her lengthy agreement with Findaway.
Starling says the company had not specifically informed her about that part of the agreement, nor compensated her for it. She believes she missed it because it was buried beneath more conventional sections prohibiting hate speech and sexually explicit material. Although Furlong narrated the audiobook and his voice would potentially be ingested by Apple’s machine learning algorithms, he was not party to the agreement that was signed by Starling as the book’s rights holder.
Isobel Starling points out that the sneaky clause is particularly sneaky, indeed outright douchebaggy, because Apple used a contract between Findaway and Starling to appropriate Gary Furlong’s voice. All for the purpose of building a tool that would put him and other narrators out of work.
It’s not just Apple, of course. The publishers and tech platforms will all be scheming to get this done because audio is one of the few growth areas in publishing. And paying the narrator is far and away the biggest part of any production cost. It’s why I don’t do audiobooks for most of my indie titles unless I can get someone like Audible to stump up the cost. (And, of course, they then get 18 months of exclusive use. A fair trade, I reckon.)
For me, this feels way worse than generative AI systems gobbling up terabytes of visual imagery to create ‘new’ art. But honestly, if I think hard about it, I can’t really see much difference other than the more ‘diffuse’ sources used by start-ups like Midjourney. Sure, they suck up billions or even trillions of massively distributed images, while Apple was stealing the voices of maybe a couple of dozen performers… but in the end, the technology is still based on stolen content.
And I say this as someone who remains a hopeless Apple fanboi and a paid subscriber to Midjourney, which produced the image used above.
HumancentiPad anyone?
OMG.
Wow. Also a big fan of Apple, but this is pretty bad. I'm in the same boat re: why I've never produced an audiobook. The cost is prohibitive, but there's a reason why: the narrator puts a lot of work into converting your title into a quality audio experience.
Zactly
Yeah, my daughter has been sounding the alarm about this type of thing with me for a while, and her concerns have real merit. What is to keep some goof in the near future from using a better version of ChatGPT to produce a so-called work to sell on Amazon for 99 cents using the prompt "80,000 word sci-fi action novel style Jason Lambright John Birmingham Joe Haldeman"? Then this same actor cruises over to Midjourney: "cover art poisonous mushroom alien world." Total cost involved? Not much. Creative effort? Close to zero. This will happen in the near future, if it isn't already a thing. The number of titles on Amazon will explode, further diluting the pool and decreasing opportunities for writers who actually write and their accompanying ecosystems.
I can't remember the name, but that is a photo of one of the lameass Batman villains from the Silver Age comics.
Yeah - this is what worries me - all the content stealing, like the remains of signatures left in generative images from these things. But I kinda wanted a future where you could get a book read by your grandfather who has long since passed. Hmmm, heading into Black Mirror territory here. "Audible, I have just purchased a copy of John Birmingham's A Girl in Time, voice selection (from a long list) Sam Elliott: female role, Millie Bobby Brown circa 2020s for male roles please, goddammit I said please to the AI again"
If once you start down the dark path forever will it rule your destiny. I can almost hear Palpatine cackle.
I've certainly started to see "AI-generated" images used in presentations where once you would have seen Shutterstock or Getty Images. Very variable results, of course.
I'd be surprised if there weren't already similar clauses on most of the spoken-word internet publishing sites, including YouTube and the rest. At first they would have had their eye on building databases to train their speech _recognisers_, but if they can turn a buck doing speech synthesis, you can be sure that they will.
Plausible-sounding audio synthesis is quite difficult, technically. It's only just started to happen, after years of robot-voice and uncanny-valley. Lots of R&D being done at the moment.