wil wheaton vs. text 2 speech

Posted on 26 February, 2009 By Wil

There's quite a dustup at the moment about an editorial the president of the Author's Guild wrote in the New York Times, railing against Amazon's Kindle 2, which has a text to speech feature that he claims creates unauthorized derivative works and should be stopped at all costs.

I'm not the only author who thinks this is ridiculous: John Scalzi, Cory Doctorow, and Neil Gaiman all agree. (Um. Not that I'm comparing myself to them; they're just people I know, who I respect and admire, who also have a stake in this.)

Scalzi says: "I pity the person who thinks a bland computer text reading of Zoe’s Tale is an optimal experience, especially when Tavia Gilbert’s spectacular reading of the book exists out there to get. Yes, one is free and the other isn’t, but you do get what you pay for."

Cory says: "Time and again, the Author's Guild has shown itself to be the epitome
of a venal special interest group, the kind of grasping, foolish
posturers that make the public cynically assume that the profession it
represents is a racket, not a trade. This is, after all, the same gang
of weirdos who opposed the used book trade going online."

Neil says: "When you buy a book, you're also buying the right to read it aloud,
have it read to you by anyone, read it to your children on long car
trips, record yourself reading it and send that to your girlfriend etc.
This is the same kind of thing, only without the ability to do the
voices properly, and no-one's going to confuse it with an
audiobook. And that any authors' societies or publishers who are
thinking of spending money on fighting a fundamentally pointless legal
case would be much better off taking that money and advertising and
promoting what audio books are and what's good about them with it."

But what if we're all wrong? As an author, performer, and consumer of audiobooks, what does this mean for me?

To find out, I picked a short passage from Sunken Treasure and read it. Then, I took the identical passage, and let my computer read it. I recorded the whole thing and put together something I call "Wil Wheaton versus Text 2 Speech" so you can hear for yourself.

It's about 5MB and just about 10 minutes long.

Download Wil_wheaton_vs_text_2_speech

Edited to add: My friend Jamais wrote an extremely insightful and thoughtful commentary
on the whole text 2 speech issue. He's really smart and you should read
it, regardless of where you currently fall in the debate.

Here's John Scalzi's rebuttal, which everyone should also read, and Neil Gaiman's final word which is also a must-read. Not that it matters, but I totally agree with both of them.

Also, this post has attracted a lot of traffic, and people are asking me about my own audiobooks. I'll point you to my virtual bookshelf, where you can learn everything you ever wanted to know about all my books, including the audio versions.

Books

Comments (137)

Christina says:

26 February, 2009 at 4:01 pm

I was really late to the party when it comes to audiobooks. In fact, I still really haven’t listened to any other than Wil’s. I guess I just figured they would be kinda boring, like listening to a lecture. While I could totally see the value for someone with reading or visual difficulties, or if you were stuck in a long commute with no other time to read, I never really saw the appeal. I think that’s pretty much how text-to-speech would be – the voices and inflections have certainly improved a lot over time, but I’d rather just read it if I had the option.
HOWEVER – at least as far as Mr. Wheaton’s work is concerned, I am a total convert. His readings are so compelling and personal, I find I can’t do anything else while listening – I’m that engaged. I highly recommend the audiobooks to anyone who hasn’t given them a listen yet.
Wil says:

26 February, 2009 at 4:04 pm

Well, yeah. Everyone knows that when the alien invasion comes, all you need is a bottle of Shasta and your all-Rush mixtape.
Wil says:

26 February, 2009 at 4:06 pm

There’s a SENSATIONAL audiobook called METAtropolis that didn’t even exist in print form until it’d been out for months. It is just incredible, and it shows the real value of getting talented actors to perform material that’s already well-written. I highly recommend it.
Also, the audioversion of America: The Book: The Audiobook is hilarious and even better than the print version.
brandilionknits says:

26 February, 2009 at 4:17 pm

I would like to append this list to include Hot Pockets.
That is all.
CS Clark says:

26 February, 2009 at 4:20 pm

I don’t agree with the Author’s Guild on this, but: 1) not every book released needs soothing dulcet tones – good enough is good enough for many textbooks – so let’s not go too far with the importance of real people reading real audiobooks, if the reading aloud thing isn’t important Amazon wouldn’t be using it as such a big selling point*; and 2) there was a lot more solidarity shown during the writer’s strike even though similar arguments could have been made at some point about how the new media downloads and streaming were a tiny market, would make the original work more valuable, would never replace the value of yada yada yada. Not trying to make a comparison of the situation, just of the solidarity.
In case we forget, this isn’t the Author’s Guild vs The Blind or the Author’s Guild vs an open-source free as in beer reader for Project Gutenberg – it’s the Author’s Guild vs. a multinational corporation selling closed $350 doohickeys. Since when did we become so blasé about taking their side?
*e.g. check out the placement of ‘And now Kindle** can read to you’ in the text on this page.
**As well as stealing business plans from Apple they also steal dropping the definite article. Sigh.
Clinton the Trekkie says:

26 February, 2009 at 4:23 pm

TTS does have its uses.
However, for an audio book, I much prefer a person reading it to me. They can add inflection that TTS still doesn’t have. Maybe it will someday in the future, but not at the moment.
Keith Coogan says:

26 February, 2009 at 4:24 pm

I was really interested on where you stood on this subject.
Now I know… and I agree.
Hugh Laurie reading ANYTHING would be way cooler than the Kindle’s Text-to-Robot-Speech feature hacking away at your delicate words and mangling rhythm, meter, pacing, intonation and inflection.
Wil says:

26 February, 2009 at 4:26 pm

KEITH!
Have you watched Toy Soldiers on Hulu? I watched the first 2 reels recently, and remembered how much fun we had.
www.google.com/accounts/o8/id?id=AItOawnaskBrktAoanLlx6vPkAlRA7gbjJTgI00 says:

26 February, 2009 at 4:30 pm

As always – an excellent post. I just started reading the blog a few months ago after a friend referred me to it. Truly great stuff.
I read Roy Blount Jr’s article and I also thought he was way off base. Machine-read text is no comparison to a well-done audio book. It’s a quixotic venture and unlikely to net the Author’s Guild any value – and more likely will hurt them in the long run. Next thing you know these guys will be staking out the library trying to extort a fee every time someone checks out their book.
So – when can we expect an audio version of Sunken Treasure? These are great stories, but I think having you read them adds a whole other dimension.
Also – I just listened to the D&D podcast (parts 1 and 2). You guys are a riot. Looking forward to hearing the rest of the game.
www.google.com/accounts/o8/id?id=AItOawmGf7eBNsVEvmpIGFS4ZXR3_NwW6VR2JOk says:

26 February, 2009 at 4:31 pm

Great example, Wil.
Beeteedubs, on an unrelated note I just noticed that the guy who interviewed you for LuLu is a guy named Nick Popio. I used to hang with that guy in college! I’m now consumed with jealousy at the fact that he gets to speak (even through e-mail) to Wil-mfin’-Wheaton.
Wil says:

26 February, 2009 at 4:33 pm

“Beeteedubs” is my new favorite phrase that nobody would understand if you said it ten years ago.
acomawunda says:

26 February, 2009 at 4:37 pm

audio books read by a real person are sooo much better, it’s like having the person in there giving you a personal performance. i enjoyed the story and couldn’t stand to listen to the whole story when it was on Alex…sorry
melanie says:

26 February, 2009 at 4:48 pm

was this just a clever ploy to get people to invest in the audio-version of your books? cuz if so… WELL DONE, SIR!! you’ve convinced me.
…now if i can just find that pesky credit card…
DeLynn says:

26 February, 2009 at 5:01 pm

That’s an awesome blog post, Wil… the audio example you gave made it that much more awesome! Text2Speech can *never* compete with a well-read audiobook!!!
Keith Coogan says:

26 February, 2009 at 5:06 pm

You simply have to read this to believe it…
http://membershipfirst.blogspot.com/2009/02/education-of-sag-board-of-directors_23.html
www.google.com/accounts/o8/id?id=AItOawkk1qAujA4AVzyf0wSgZg8ShaLcYmcdYnY says:

26 February, 2009 at 5:17 pm

You mentioned that some argue that this is similar to the strides taken by computer animation. The problem with that argument is that hero animation worth looking at is still manipulated frame-by-frame. Large shots may have extensive automation, but it is tweaked constantly until it looks right. Crowds may be automated, but there is probably extensive use of hand-crafted animation cycles.
I’m certain that a convincing computer simulation of voice is possible with current technology, but it requires tons of tweaking, mixing, and editing. I haven’t heard MacInTalk in the wild, but I’m sure there was some work required to make it sound suitable for use in Wall-E.
A Kindle may be a powerful TTS tool, but without a human to guide its every syllable, it can’t hold a candle to a human reader.
CL Jahn says:

26 February, 2009 at 5:47 pm

Sure, you read better than a Kindle. Compare that voice against Roy Blount, Jr. He might really be threatened with replacement!
fall-apart says:

26 February, 2009 at 6:04 pm

Still, even if it is in the distant future (24th century, perhaps?), is it better to set the legal precedent now?
I’m not arguing in favour of that position, just recognising that there may actually be a point there…
Wil says:

26 February, 2009 at 6:12 pm

I agree with Neil that the money and energy the Author’s Guild is expending on this right now would be much more wisely invested in “advertising and promoting what audio books are and what’s good about them with it.”
I get the point of fighting for things. I’m a SAG guy and formerly sat on our board of directors so I’m all pro-union, but I’m also pro-being smart about things. And it’s just my opinion at the moment, and everyone is entitled to their own, of course, but I think that fighting this now is not very smart, especially the copyright angle.
SeanF says:

26 February, 2009 at 7:01 pm

The line “When Richard was loony on the cocaine, she made it okay,” as read by Alex, will stay with me to my grave.
I think I want Alex to give my eulogy.
www.google.com/accounts/o8/id?id=AItOawlc4dolmbW2ku_G72uRLek6JQ-6S-3tAmY says:

26 February, 2009 at 7:52 pm

I reject the comparison of Text to Speech to computer animation. The reason is that computer animation isn’t done just by the computer. At least not in good computer animation. There is a lot of shit computer animation out there. Some of it is that better software tools can make better animation. But the essential element to Pixar movies is that human animators direct the movements. So although the frames are rendered by the computer and even interpolated by computer, the key positions are artistic selections made by a human. They’ve improved the software to create lighting effects and the physics of particles. They’ve modeled movie camera lenses and camera movements. But it’s a human that figures out which of these elements to use and when. It’s an artistic choice.
There is a valid comparison between good and bad computer animation and human read books versus text to speech books. The best of it makes you forget the medium and focus on characters and the story. Only good computer animation and only good reading make you focus on the story and character. Bad text to speech (any) or bad audio book productions are an irritant.
JoeHawkinsMusic says:

26 February, 2009 at 9:43 pm

Hilarious. Great argument as well.
Did you use a pop filter on that recording? I noticed some ‘plosives going on.
Jeez… that’s the audiogeek engineer in me coming out. I was about to ask what mic was used, interface, AD/DA converters, etc. Lame isn’t it?
Anyways, good job.
masukomi says:

26 February, 2009 at 10:01 pm

I just have to say that as much as I’ve enjoyed your writing, if that sample of you reading is even remotely reflective of what the audio versions of them would be like, then I can’t wait to hear them, and would happily shell out money to do so. 🙂
masukomi says:

26 February, 2009 at 10:13 pm

ok, maybe not $35 per book (especially not for a book I already own), but I’m happy to see that they’re not DRM’d, because I wouldn’t pay any money for that, as I’d like to be able to listen again years after whatever player those DRM’d audiobooks tend to come with, has gone to the great bit-bucket in the sky. 🙂
Wil says:

26 February, 2009 at 10:28 pm

Oh shit. I usually check really closely for pops, but I didn’t this time.
Well, next time whatever I do will be pop-free!
fall-apart says:

26 February, 2009 at 10:55 pm

Great perspective! Fight for something, rather than against something. Do you think authors will start to bundle ebook rights with their audiobook rights when they sell to publishers?
ÜberBeth says:

26 February, 2009 at 10:57 pm

Only just came across this, you should get a kick out of it: http://friendlyhostility.com/d/20090126.html
Cannonball Jones says:

27 February, 2009 at 1:39 am

Can’t believe the Author’s Guild is panicking about this, are they actually time travellers from the Victorian Era, terrified of the electronic witchcraft? They’re never going to win a fight against technology, just point them towards the RIAA’s less than stellar record – the only way to win is embrace it and work with it.
Ah hell, at least it gives us someone to laugh at for a while 🙂
www.google.com/accounts/o8/id?id=AItOawnO5gV5ke18GEG9uuEStY02vUgANvjYgOA says:

27 February, 2009 at 4:28 am

So if an actor reads the lines written by a screenwriter who has adapted a book written by an author and someone runs it through speech-to-text software, does it create a new derivative work or are we getting into a mobius-strip of evil if we start wandering down that path?
Great reading of your own work – the Mr. Roboto version will never compare even if text-to-speech evolves to the point of sounding like a real person.
sqlrob.livejournal.com says:

27 February, 2009 at 6:10 am

Except in Beowulf, they WERE actors. A lot of that was motion captured.
kimnbri says:

27 February, 2009 at 6:12 am

Obvious differences and I agree with ya Wil, but seriously; who doesn’t want professor Stephen Hawking to read them a Dr. Seuss book or two? I think the sheer entertainment factor of how badly this thing reads is worth the purchase price alone. Wouldn’t it be lovely if we could tweak it with accents and lisps!? How about simulated Celeb voices… (the bible à la Shatner)
BR1AN
P.S. RIP, Philip José Farmer
jwordsmith.wordpress.com says:

27 February, 2009 at 6:23 am

By the way, if you did a podcast of you reading your blog entries (or riffing on the day’s topic), I would sign up in a heartbeat.
Marc Costanzo says:

27 February, 2009 at 7:04 am

I have not read the rest of the comments, but I think it is clear that Wil’s reading had the nuances necessary to convey the feeling and emotion of the story. That is not really a good description, but when Wil was reading I felt like I was there; all of my senses were working as I picked up on how his voice conveyed the story. A computer-generated voice does not, and may never be able to, tingle all of my senses when listening like the human voice does.
Marc Costanzo
John Welch says:

27 February, 2009 at 7:15 am

I think the *way* they’re going about it is silly, but I think that bringing it up now is not stupid at all.
It’s been a sad fact that if the various publishing agencies *can* screw an author or a creator over to make a few bucks, they will not only do so, but they’ll spend a lot of time on it, in the hope of profits down the line.
TTS has been kind of ignored, because hey, unless you’re visually impaired, it’s a gee-gaw. However, what happens when a publishing house with some money decides to throw some money behind it? It will get better, and fast. Tech *always* gets better, and always faster than we thought it would.
(Seriously, look at hard drives in the last ten years, and tell me who realistically thought 500GB laptop drives would be not just available, but cheap.)
Sure, a human can give a better nuanced performance. But, you pay for that. So right now, I can get a quality audio book from Audible.com for $7.50US (i know that’s not the only source or price, but bear with me.) Now, with the current arrangement, lots of people make money from that, some more than others.
What happens when TTS gets “good enough”, and a publisher decides to offer TTS at say…$2.00US. Sure, it’s not as good, but it’s two bucks.
Two bucks that the publisher is making, and only the publisher. “book” sales go up, but who benefits? Not the Author. They don’t have TTS in their agreement. Just the publisher. and maybe Amazon.
“Good enough & cheap” explains why people will buy a piece of shit car that will barely survive the loan, instead of spending a bit more and getting one that will last a couple decades. Or why people buy cheap shit every year, instead of buying quality once. Ask Wal*Mart, they’ll tell you, you can make an assload of money off of cheap shit.
So yeah, as someone who writes a lot, (even if it is mostly for free), and someone who is married to an artist and watches her daily fights, not just with dipshits who think if they can see it, it’s public domain, but with publishing and game companies, I think the guild is dead on to bring this issue up *now*, and get the dialog started *now*.
I just think they need to not have a dillhole leading the discussion, and to not be so damned alarmist, because that is completely torpedoing the benefits of figuring out a good course of action before it’s a real problem.
You know, like the music companies could have done ten years ago.
jupo42.livejournal.com says:

27 February, 2009 at 7:30 am

Yes, computer animation can rival live action in some areas, and still not in others. It’s come a long way from The Last Starfighter and still has a ways to go before we can finally get rid of those pesky actors. 😉
But this isn’t even a good comparison. CGI still requires physical labor, just as creating an audiobook does. Scores of people design, model, stage, and animate these scenes. No one has yet created a 3d program that takes a raw script and dumps out a blockbuster automatically. Nor do I think anyone will create a TTS that can realistically read a book without some sort of markup to guide the narration program.
When Amazon starts doing that, then you can worry. But you might also have better grounds for saying they’re creating derivative works and slapping them for infringement.
KCFlatlander says:

27 February, 2009 at 7:33 am

TTS sounds like someone in the Kindle2 think tank said “Hey, let’s try this and see how it works.”
I work with software vendors who put stuff into their software “just because they can” and because, at that time, it made sense, either from a R&D standpoint, or a “let’s feel this out” standpoint.
Either way, it’s in there, and you can use it, or not. Software/hardware features are just that: features. Doesn’t mean you have to use it.
TTS, in its current form, can’t emulate inflection, sarcasm (although I desperately want to patent a Sarcasm font), the highs and lows, or the emotional connection that an author can convey. It won’t replace an Audiobook for me….
Question for Wil from an Audiobook standpoint: When you do an audiobook with characters vs. your own current books on your life and comings/goings, how do you understand the emotional being of the character? Only through the read, or do you get to meet with the author on the back story?
tjlatta says:

27 February, 2009 at 7:38 am

Even smart people can have stupid prejudices when it comes to new technology.
Socrates was against the concept of writing because he thought it would destroy our ability to memorize facts and stories.
EDIT: Oh, and I can’t download the mp3. Did it get removed?
Verhoodled says:

27 February, 2009 at 7:43 am

Timely discussion! Last night my Speak-and-Spell walked in on me using my Blackberry. Awkward! So I was up all night reassuring my sobbing Speak-and-Spell that she wasn’t outdated. “No, no, no,” I told her. “Nothing could ever replace you and your cute yellow, orange, and red buttons.”
To which she replied, “Lol.”
I’m a lucky man.
adelagia.livejournal.com says:

27 February, 2009 at 7:58 am

Well, and your big brother. Failing that, a million allowances’ worth of quarters, I guess.
Sihaya says:

27 February, 2009 at 8:07 am

…Special sunglasses…
or a glass of water and a baseball bat…
or maybe a pocketful of Jelly Babies.
But that’s all.
www.google.com/accounts/o8/id?id=AItOawmjQuxOnwKE9nxtWj_n51icQ7Zf0p90EVg says:

27 February, 2009 at 8:13 am

It is absurd to think that computers will (in the foreseeable future) be able to understand the emotional context of the written word and how to convey that emotion verbally. And when they do, they will probably be writing their own books, and who is to stop them from reading those aloud?
No, what we really need to worry about is the Kindle 3. I hear that it will be able to create a 90 minute movie based on any e-book you put on it, with CG actors and everything. It will at least be better than The Da Vinci Code.
MissMeliss says:

27 February, 2009 at 8:42 am

First, doesn’t greater accessibility = greater sales, and isn’t that a good thing?
Second, Microsoft Reader has offered this function on ebooks for years. Granted, no one will admit to actually using it, but still…
Third, please add chocolate to the alien invasion preparedness kit. Chocolate may not fix everything, but it certainly helps.
Harv says:

27 February, 2009 at 8:49 am

I figured most authors being the intelligent rational humans they are would come out against this guy’s stand on this issue.
It really makes no sense at all. The automated reading software is just not even close to a nuanced performance by an actual person. It certainly doesn’t replace it and it’s doubtful it will get there anytime soon, but even if it does it’s not like they can duplicate Wil Wheaton’s voice and mannerisms to read the book. There is always the uniqueness of the person reading to push when selling the audiobook.
And if they do get to the point where they can duplicate Wil as an AI in our lifetimes then I want to be Neo when we all get installed into the Matrix.
dostrow says:

27 February, 2009 at 8:55 am

And here I thought we had all learned all you need is your towel and your thumb.
kamiilyaan.livejournal.com says:

27 February, 2009 at 9:01 am

Good reading, Wil.
I noticed that when you came back on after Alex was done, your speech was more Alex-like.
Which brings me to a tangent– Real Life Comics ( http://www.reallifecomics.com/ ) this week.
Darc Ranger says:

27 February, 2009 at 9:25 am

Neil Gaiman references Wil’s posting on this topic. Nice.
http://journal.neilgaiman.com/2009/02/end-of-audiobook-argument.html
cpt-barcode.livejournal.com says:

27 February, 2009 at 9:28 am

This seems like a huge waste of effort by the guild. The vast majority of book purchases are going to be either text or audio, but not both. (I assert this without any attempt to consult industry sales figures, but the anecdotal evidence compels me.) There might be a title that I want in both text and audio form, but so far that has not been the case. If they’re only going to get royalties once from any one consumer, why spend time and treasure to shut out new markets?
www.google.com/accounts/o8/id?id=AItOawm7R-8-R3--3080u58ip1TjyIVgJ3U6R7U says:

27 February, 2009 at 9:54 am

Spacewriter here, Wil. I’m an author, too, and I cannot for the life of me figure out why the AG thinks this is an issue.
I think they need to get with the times, here.
Wil says:

27 February, 2009 at 10:25 am

Really?
That sounds like it could be fun, sort of like the way I do audio versions of my books, but add commentary to them?
Wil says:

27 February, 2009 at 10:30 am

I don’t disagree with a single thing you said, John. Thanks for adding to the discussion.

Comments are closed.
