There's quite a dustup at the moment about an editorial the president of the Authors Guild wrote in the New York Times, railing against Amazon's Kindle 2, which has a text-to-speech feature that he claims creates unauthorized derivative works and should be stopped at all costs.
I'm not the only author who thinks this is ridiculous: John Scalzi, Cory Doctorow, and Neil Gaiman all agree. (Um. Not that I'm comparing myself to them; they're just people I know, who I respect and admire, who also have a stake in this.)
Scalzi says: "I pity the person who thinks a bland computer text reading of Zoe’s Tale is an optimal experience, especially when Tavia Gilbert’s spectacular reading of the book exists out there to get. Yes, one is free and the other isn’t, but you do get what you pay for."
Cory says: "Time and again, the Author's Guild has shown itself to be the epitome
of a venal special interest group, the kind of grasping, foolish
posturers that make the public cynically assume that the profession it
represents is a racket, not a trade. This is, after all, the same gang
of weirdos who opposed the used book trade going online."
Neil says: "When you buy a book, you're also buying the right to read it aloud,
have it read to you by anyone, read it to your children on long car
trips, record yourself reading it and send that to your girlfriend etc.
This is the same kind of thing, only without the ability to do the
voices properly, and no-one's going to confuse it with an
audiobook. And that any authors' societies or publishers who are
thinking of spending money on fighting a fundamentally pointless legal
case would be much better off taking that money and advertising and
promoting what audio books are and what's good about them with it."
But what if we're all wrong? As an author, performer, and consumer of audiobooks, what does this mean for me?
To find out, I picked a short passage from Sunken Treasure and read it. Then, I took the identical passage, and let my computer read it. I recorded the whole thing and put together something I call "Wil Wheaton versus Text 2 Speech" so you can hear for yourself.
It's about 5MB and just about 10 minutes long.
Download Wil_wheaton_vs_text_2_speech
Edited to add: My friend Jamais wrote an extremely insightful and thoughtful commentary
on the whole text 2 speech issue. He's really smart and you should read
it, regardless of where you currently fall in the debate.
Here's John Scalzi's rebuttal, which everyone should also read, and Neil Gaiman's final word, which is also a must-read. Not that it matters, but I totally agree with both of them.
Also, this post has attracted a lot of traffic, and people are asking me about my own audiobooks. I'll point you to my virtual bookshelf, where you can learn everything you ever wanted to know about all my books, including the audio versions.
I’ve only read a few audiobooks (I plan to do more, and I’m looking into public domain works right now, as a matter of fact) and I’ve only had the chance to talk to an author once, and that was because he was a friend of mine, and I wanted to give him the right to say, “I’d rather you didn’t make that guy sound like [character I thought would be fun to base him on].”
As far as I’m concerned as an actor who is performing them, a book and a script are very similar starting points, with slight differences in language. I prepare characters for audiobooks the same way I prepare characters from a script. I look at what the writer is telling the audience with the character’s actions and how he relates to other characters, what the overall piece is about, and what the character’s function is in the story as a whole.
When she said it, was she all, “L? O? L?”
i love alex. he reminds me of the electro-professor from my old clonepod. why couldn’t i have just stayed at cadmus forever…
Funny thing is I started reading this blog, because I enjoyed the keynote Wil gave at PAX. His delivery was amazing, and got me hooked on his other works. As for the debate, I think it is a pathetic attempt by the Guild to try and squeeze more money out of the consumers. Anybody who is using the Kindle’s text-to-speech feature to listen to an entire book is NOT going to buy the audiobook anyway. Most people will probably use it for certain situations where they can’t read the book, but don’t want to stop “reading” the book. If anything it might push them to want to actually purchase the audiobook, to have a more human voice read it to them.
No competition. Full stop. Only a human could express such emotion.
PS – Mr Wheaton you have the loveliest voice, I could listen to you all day.
And for those who were talking about Final Fantasy….I made this
As a blind reader who uses TTS every day:
1. TTS sounds much better if you speed it up. You were running that voice way too slowly. When I’m on mac, I speed it up to the fastest speed OS X allows. The faster the voice goes, the fewer mistakes you hear. Also, the faster you get the text read; a major advantage for work-related reading.
2. The text you were using had several OCR errors, or typing errors, or something. Quotes and punctuation seemed to be missing in several places, throwing off the TTS system. Near the start, the “fi” in the word film seems to be missing, causing the voice to just read “lm”. The advantage of TTS is that because the reading is so absolutely regular, you can tell how the text is punctuated. If a human is reading, not so much.
3. Bland TTS is always better than a reader you dislike. As an example, I scanned in the HP books because I absolutely cannot stand the way Jim Dale does his voices. I’d rather listen to TTS than hear him squeak his way through his female characters.
I strongly believe that, if everyone used current TTS software eight hours a day, every day, they’d stop purchasing audio books and stick with the familiar TTS voice. After a while, you just stop hearing it at all, and a different voice is distracting. But on the other hand, the only way everyone would start doing this is if they all went blind.
To a tee. Hey waitaminute — how did you know that? 😉
I kept waiting for Alex to say, “Danger Will Robinson, danger!”
The other thing that makes the animated-movie-to-TTS argument a bit of a red herring is that animated movies are created over the course of months and months at the cost of millions of dollars, whereas TTS software is placed into the unenviable position of having to be able to read text on the fly: “Alex” blew the two different pronunciations of “Live” at least once because somebody didn’t go through and manually tell it which one was right.
It is possible, like you say, that in some mysterious future software could be created that, with a human’s help, could read a piece of text as well as a human, but there’s no way that doing so would ever bring an added value over a dude in a room with a microphone the way that animated movies offer something new over a dude with a camera. The human voice wins because it’s the best at what it does at its price. GO US.
As someone who is totally blind, I can’t help but be a little envious of what the Kindle has to offer. There’s a web site for people who can’t read print where we can go to download books. (It’s all legal, and it requires registration, and the books are distributed in a nonstandard format.) Do you know, it took those people years to acquire 40,000 books? It only took Amazon three months. Yes, I’d prefer human-narrated audiobooks, too; but the reality is that not every book is recorded. I’m hoping that the Kindle 3 will have talking menus. I can’t imagine what it would be like to be able to access over 200,000 books. Ah–I see my “Wil vs. Alex” download is now complete. Better go check that out…
There have already been a lot of Stephen Hawking jokes, but I would enjoy having some of “A Brief History of Time” read by the computer. Really.
On the subject of technological advancement, is it just me or does the Mac’s performance sound almost exactly the same as Mac text reading software sounded 20 years ago?
Oh. Forgot to add that I think it might be a stretch to say that the TTS is creating a derivative work in this instance. It could depend on how the courts
decide to define a “copy.” Unless the Kindle is actually creating another e-copy, I…don’t think the guild could win this one. But I’m still only a law
student.
I don’t think the text-to-speech feature is that threatening, because it sounds like a bad cartoon robot voice.
However, I do really really hate the Kindle. A lot.
I actually got a Kindle 2 yesterday for my birthday. The text-to-speech function is just barely tolerable…neat concept, but it’s no Steve Eley. (BTW: I think I found EscapePod.org from a WWDN post originally – thanks for raising my standards)
Just wanted to point out that the Kindle store only returns two hits for a search of Wheaton – ‘The 7-Day Dating and Relationship Plan for Gay Men’ and ‘Understanding Lifestyle Sport’, which is even creepier in context.
Get your books up there, Wil, I want to own them!
It seems your premise is that the TTS quality is so low at present that no one would choose this method of consumption for a given work over a professionally produced audio version of a book. I will grant you that TTS has not yet matured. However, albeit a poor semblance of the original material in spoken form, it does in fact reproduce the work in its entirety. I think a more appropriate analogy for consideration might be to ask how you as an author would feel if I reprinted the whole of your most recent book onto dingy recycled paper using an old mimeograph machine, one page at a time. It would be smudged, loose-leafed and hard to read so surely my copy of your property would not detract from your book sales or infringe upon your IP? This then should be an acceptable substitute for your work that I should be allowed to provide to your audience without your involvement or approval. No, in fact, no matter the quality or perceived value by comparison of my low-grade product, your publisher would deliver a C&D and, should I fail to comply, would understandably take me to court for redress. Audio publishing rights are as important and valuable as book publishing rights.
Maybe this is because I’m an animator, dating an animator with a brother who’s a visual effects animator…
But it’s not the computer that makes good animation. It’s not the technology. Luxo Jr, Pixar’s first film, was basically scripted, not animated, and it’s amazing because of the people behind it.
The new Polar Express was creepy and had that uncanny valley because it was “all” computer, in that there was only motion capture. Good mo-cap needs a team of animators, enhancing the performance, fixing eye direction….
check this out to learn more-
http://wardomatic.blogspot.com/2004/12/polar-express-virtual-train-wreck.html
All good animation is because of the people behind it. The people are 95% of the job- the computer is simply the tool that finishes the other 5%.
As my animation teachers say, if you let the computer do any of your work, you’re doing it wrong.
Anyway… that’s my two cents as an animator who’s rather passionate about what I do 🙂 Otherwise, I fully agree!
I think there should be more talk about how Amazon should do more to make the Kindle more accessible to the blind and visually impaired. I wrote about this on my blog.
http://caldeas.com/2009/02/25/capitalizing-on-accessibility-the-authors-guild-is-out-of-touch/
This tool could help the blind read textbooks, which are very hard to make accessible because of how frequently they are updated.
Spot on, Wil. The human touch. There is a spark of creation- not just the essence of the story- or even the inflection/intonation/melody/rhythm- (though hopefully you will continue to record ALL your own works)
Walter Benjamin said (1969) “Even the most perfect reproduction of a work of art is lacking in one element: its presence in time and space, its unique existence at the place where it happens to be.” (anthro. of media class…gik…) Not to mention the quality of reproduction. Bah. We are so obsessed with ultimate portability in order to FURTHER respect and enjoy our arts, not to detract from their authenticity. The fight is a long shot, & I mos def would not spend any more time thinking that this was the new opusnapster. No loss of HP.
+love
The voice of “Alex” reminds me of a robotic voice from some ’80s movie. I can *almost* place it…. If I stuck out my tongue, you’d see it printed there…. *sigh* Failing brain power.
In other news…Wil, you said “Speak & Spell.” I love it.
Short Circuit?
Marty’s jacket from Back to the Future 2?
definitely not HAL from “2001”…..
This is gonna drive me batty.
P.S. compared to your reading, Alex’s reading made me guffaw. No, really. ;o)
While I hate DRM as much as the next guy (this is DRM right?) I can definitely see what has this guy worried. How many people are prepared to watch crappy cam screen rips of new release movies (do they actually cost sales??? – dunno). We have lossy audio, crappy flash videos etc., all things that “Joe Public” has time and time again proven they are more than willing to watch/listen to for free rather than pay for a “quality” copy.
Unfortunately I know nothing about the publishing business, therefore my comparisons could be completely off base, but if the basis of your argument that “it’s no big deal” is that “it’s too crappy for people to listen to rather than pay for” I think you are mistaken. That’s not to say there aren’t valid arguments … I just think that ain’t one of them.
Anyway..hello from Australia…long time blog reader…first time commenter 🙂 Loved your Criminal Minds ep (not as much screen time as most perps but some cool “psycho killer” off screen action for sure) !
Bah! They caved in – selectively. Must have scared them into thinking it was a deal-breaker or something…
http://bits.blogs.nytimes.com/2009/02/27/amazon-backs-off-text-to-speech-feature-in-kindle/
Sharing stories goes back to caveman days when they gathered around a campfire. And THOSE shared stories were in the form of oral (audio) story telling.
I think it’s a good thing that should be nurtured because telling (and sharing) a story no matter what the format is the bottom line.
(And I doubt you’d find any blind readers who don’t like the new talking feature on Kindle.)
@wilw :
Amazon Backs Off Text-to-Speech Feature in Kindle
from /. -> http://tinyurl.com/auv6pm
All of this discussion has reminded me of my 8th grade English class, in which our teacher played a recording of Patrick Stewart reading Dickens’s A Christmas Carol. I could listen to that man read for the rest of my life.
Wil — please, pretty please with sugar on top — do an audiobook of Sunken Treasure. 🙂
“booktwo.org” (http://booktwo.org/) has linked to this post.
Unfortunately, it looks like Amazon is folding, which sucks.
http://news.cnet.com/8301-1023_3-10184406-93.html
Despite listening to the same words both times, I really felt like I heard two very different accounts. One, Wil’s reading, concerned a young boy actor, steeped in awe and insecurity as he tries to pass muster alongside adult professionals.
The other was the story of a little robot boy, trying to fit in with humans. (Why? Why was he programmed to feel pain?) At first I imagined a concerned looking Susan Sarandon, scooching over on the bed to make room for this little forlorn metallic guy. Then, when Alex read her words, I was suddenly visualising a robotic Susan Sarandon.
Maybe narration would be a good way of delivering such a twist in the story, whereby hearing the voices you discover who’s a robot and who isn’t (perhaps this would necessitate a mash-up of Wil and Alex narration?).
Or maybe it’s like Blade Runner and the twist would be that Susan Sarandon had never realised she was a non-human replicant herself?
[Thinks: Would Thelma and Louise have had any less impact than it did, had the closing images involved one nail-polished metallic claw covering another one in a gesture of support and solidarity??]
This just in – Amazon has caved:
SAN FRANCISCO (AFP) — Amazon is yielding to concerns of authors by letting them selectively silence a read-aloud feature in Kindle 2 electronic book readers that hit the market in February.
The US Authors Guild had warned that the new Kindle feature could pose a “significant challenge” to the publishing industry and hinted at possible legal action by saying they were studying the matter closely.
“Kindle 2’s experimental text-to-speech feature is legal: no copy is made, no derivative work is created, and no performance is being given,” Amazon said late Friday in an announcement posted online.
“Nevertheless, we strongly believe many rightsholders will be more comfortable with the text-to-speech feature if they are in the driver’s seat.”
First! I agree. Getting upset over that is just silly. However, it is good to cast the wary eye. There are much, much better t2s programs out there. I gave a look through my bookmarks, and called it a lost cause. I did some googling, and found two things of interest.
Better Speech:
http://www.research.att.com/~ttsweb/tts/demo.php
and Singing:
http://www.vocaloid.com/en/index.html
There used to be longer samples of the singing voice, but I can’t find them. Some can be found:
http://www.zero-g.co.uk/index.cfm?articleid=802
I’ve listened to a number of audio books. I like the ones with a cast. I love (most) audio books read by the author. Using Neil Gaiman as an example, he does the voices as he imagined them. Software won’t ever touch that.
What I have used since 1989 as a test for the quality of a text-to-speech generator, on Macs, on Windows machines, and even the Apple IIgs, is the Captain’s monologue from the Star Trek title sequences. The day that text-to-speech can render “Space…the final frontier”, etc., with the excitement and clarity that William Shatner or Patrick Stewart bring to the reading will be the day that text-to-speech is an economic threat to the human audiobook reader. Right now, even after twenty years of further development, it isn’t even close.
i hate the computer voice oh so much. my original mac back in the day would use that voice for everything. the only time i ever used it since was to study for an exam, i had my computer read me my notes as i studied. i don’t think authors should have much fear, that voice is all too annoying
When I purchase a book or a video or a CD–or legally download the same–I am purchasing usage rights. In my understanding, that allows me to do anything I want with it so long as I don’t make money with it or deny money to the originator. That (in my words) pretty much describes my rights.
If I want to copy or convert from one type of content to the other for my own use, the author/creator has not been denied anything. I think it should be my right to convert printed or electronic words to spoken words or even braille. It should also be my right to convert purchased audio books or human speech to written words if I was deaf.
You know what they say opinions are like.. but here’s mine anyway:
I am disappointed that Amazon ‘caved’ (even partially) in the matter. Here is my take: To me it’s all about the ‘audience’. I liken this to the whole ‘illegal downloads’ thing. Look at CDs. You can download a rip of a CD off the internet at a very high bitrate, which to most ears is no different than listening to the actual disc. For this ‘audience’, the experience they are getting is exactly the same. Same music and virtually the same quality. That is why they download pirated CDs rather than buying them (plus it’s free). It’s the same with movies. You can download movie files of various qualities, all the way up to Blu-Ray rips. Pair that with a PC connected to your big screen plasma… and the experience for this ‘audience’ is virtually the same as purchasing the disc. Now how does this relate to the Kindle issue? While nothing is being pirated and nothing is illegal, it still has correlations. It’s still all about the ‘audience’ and the ‘experience’. Let’s look at the ‘audience’ for audio books: They enjoy the natural, dramatic and professional presentation they receive when they buy a tape, CD or digital download of an audio book. Does Amazon’s Kindle provide that same experience? No it does not (I can attest to this, since I just bought my wife a Kindle). The ‘audience’ for audio books will never receive the same experience from text-to-speech as they do from a professional recording… hence, publishers and authors are not going to lose sales of audio books to the text-to-speech feature of Kindle or any other device or software. What I would have rather seen Amazon do is agree not to develop or improve upon the text-to-speech feature any further (not that the technology has improved much in the last 5 years anyway), but not limit its use in any way. That way the option is always there, but it will never be able to threaten audio book sales by coming close to the quality or experience.
I don’t think you can look at text-to-speech as a free audio book for the price of an ebook. If people want an audio book for free, they’ll download an illegal rip, just like they do for the latest music. They aren’t going to settle for text-to-speech… they want the experience. Limiting what people can do with content they purchase just isn’t right (whether it’s physical or digital). If I own it, I should be able to do what I want with it (as long as it’s legal, which text-to-speech is, since no copy is made, no derivative work is created, and no performance is being given). If I buy a physical book, I can read it when I want, I can do with it whatever I want, I can even burn it when I’m finished reading it. I can read it to my wife or my kids. I own it and that is my right. The only thing I can’t do is copy the book (photocopy or scan) and give it to someone else or read it aloud and charge people to listen to me. I should have those same rights with a digital copy. I can read it aloud, have Kindle read it to me, delete it when I’m done. The only things I should not be able to do with it are read it aloud for profit or crack the DRM on it and distribute it to others. As long as I’m not breaking those laws, then the publishers should have no right to limit what I can do with the content that I’ve purchased and own. Those are just my thoughts though.
I had a revelation today. Something I never thought of before. The Kindle 2 could be an incredible boon to the disabled. My brother is quadriplegic. He’s done the audiobook thing often. But can you imagine having a Kindle 2 where all you need is just one device to listen to thousands of books? It could really open things up for him. I’m sorry, but in this instance, I am for the Kindle 2 and against audiobooks, simply for the ease of my brother.
For those of you who listen to TWIT (This Week in Tech) with Leo Laporte (thanks for getting me hooked on this, Wil! I started listening with your appearance): they discussed this on the last episode and even referenced your blog posts!! I felt “so” informed since I was of course familiar with your post already. Just thought if your ears were burning you would like to know the cause!
The episode is TWIT episode # 184
It is laughable to think that someone would equate one of those performances with the other. I can only imagine that the Authors Guild was worrying that if they let this pathetic implementation slide, then they would have to let the next version and the one after that slide while they gradually improved. At some point maybe 10 years down the road the voice would be good enough to actually compete with live actors, but the precedent would have been set, so the battle would be lost.
How long do you think it will be before someone makes a Mac app (maybe even an iPhone app) that uses the webcam to capture an image of the Kindle page, runs it through OCR, and reads it to you in the “Alex voice”? The only tech that is missing is some way for the Mac to turn the page on the Kindle.