
This industry seems like it could be put out of business on the low to mediocre end by deep learning.


Despite advances in deep learning, it's still very easy to tell a TTS voice from a human voice. Even companies like Apple that have paid special attention[0] to the naturalness of TTS can't get it completely right.

Also, did you notice that in all major animated films (think Disney or Pixar), while the imagery is all computer-generated, the voices are not?

[0]: https://machinelearning.apple.com/2017/08/06/siri-voices.htm...


Here's why I think animated films are done like that.

The best/only way to get most of that computer-generated imagery is by huge amounts of manual labour: designing, animating, simulating, sometimes motion-capturing. It's painstaking detail work involving many people.

The best way to get the voices is with a small amount of manual labour: voice acting.

If you put as much manual effort into controlling the nuances of a TTS engine as goes into the imagery, you might get acceptable results, but it's far easier and cheaper to use voice actors. In fact, the easiest way to tell a TTS engine exactly what you want would probably be to voice act it yourself and have the engine mimic you. That might be worth trying when remapping vocal anatomy (e.g. a woman voicing a man or vice versa, or a monster, etc.), but for most purposes it's easier to hire appropriate voice actors and/or manipulate the recorded audio than to use it to drive a resynthesis by simulation.


Also, it is a way to get A-list celebs involved and cash in on their popularity.


Maybe someday we will see (or hear) Siri's voice star in one of those Disney flicks.


Weird example, since Siri is based on one real woman's voice. One of the WaveNet "personalities" might be a better example.


Or the Japanese voice idols?


Vocaloids?

Those are deliberately made to sound unnatural. Not that it changes anything, and they've already shown up once or twice in anime.

(Though the only example I can name off the bat is Black Rock Shooter, and that doesn't include the voice. It's complicated. Mato is complicated, too.)


I use Apple's TTS to read books. It may not sound like a human, but after a few hours of listening to it the strange nature of the voice gets abstracted away by your brain. It's very functional. I knocked out Neal Stephenson's Anathem a week ago in about two days using TTS to read probably 75% of it.

What it doesn't do is the acting. In an audiobook, the voice actor will change their voice in various ways for dramatic effect, and in that respect the book becomes something like a radio drama. With TTS you're getting "just the book". I think that's the major difference and the refuge in which voice actors might hope for continued employment.


I can imagine using TTS to catch up on news articles, magazine articles, reviewing a textbook, maybe even listening to opinion columns while out on the go or multitasking with my Photoshop time, but I can't imagine it doing anything except ruining a novel, or anything else involving drama or comedy.


Far from ruining novels, it's quite pleasant. After you've become accustomed to it, it becomes a very low-fatigue way to read for extended periods of time. I find books read by TTS just as immersive as reading print. In fact, to me the experience of TTS feels closer to reading print than print does to audiobooks.

I guess what I'm saying is don't underestimate neuroplasticity. I wager you could even achieve casual fluency with morse code if you listened to it long enough. I'm under the impression that some telegraph operators did.


That's really interesting. Basically the audiobook is more of an interpretation of the text; it's perhaps closer to a movie adaptation. Once you drop the expectation that the TTS will "tell you a story" and instead expect it to "tell you the words", it becomes just a different input stream. I didn't think about it like that until now. I'll give it a try.


I listen to a lot of non-fiction via TTS, but I haven't found it all that satisfactory for novels. Although part of the reason is that if I'm listening to a novel via TTS, it's because there isn't an audiobook available and I've had to do some hacky OCR to get the text in the first place.


I do this as well. I used Samsung's TTS engine at first, but Google's has mostly caught up. As a bonus, I can switch between listening (in the car or working out) and reading (most other times) without losing my place.


The audible audiobook version of Anathem is extremely good in my opinion. Probably one of my favorite readings... I’m surprised that speech synthesis does a reasonable job given the language involved.


The Audible version of The Baroque Cycle is one of the best voice performances I've heard.

Having said that, some fairly small-scale audiobooks that have the authors narrating them are also very good, as you can hear the interest and the passion of the author in their subject:

e.g.

https://www.audible.co.uk/pd/Wilding-Audiobook/B07DDMZ16R

https://www.audible.co.uk/pd/Exactly-Audiobook/B07CQ3RPKC


As it stands, I think this is just a matter of time.

https://www.youtube.com/watch?v=DWK_iYBl8cA impresses me; it just sounds like a low-quality recording.

It's concerning given how computing power and resources advance over time.


Check out the same kind of work from the Lyrebird team [1].

[1] https://www.descript.com/lyrebird-ai



It costs hundreds of millions of dollars to animate and market a feature Pixar film. It would be very penny-wise/pound-foolish to try to skimp on voice acting. Animated films will be the last place to start using TTS. Honestly, they might ADR people in low-budget live-action with TTS before it reaches the major animation studios.


> Animated films will be the last place to start using TTS.

Well, big budget theatrical animated films from major studios like Pixar or DreamWorks, sure.

But most animated films aren't those.


The imagery is computer generated because it is the cheaper solution compared to hand-drawn animation.


I don't know... whilst I'm not an audiobook fan myself, people who are fans seem very sensitive to different audiobook performers. For example, James Marsters was unable to read just one book in the Dresden Files series, and lots of fans quickly noticed and complained.

It's similar to saying that voice actors (and then actual visual actors) will soon be out of business because we can soon 'automate' that too...

Voice (audiobook) acting is a _performance_, not a routine task to tick the "available on Audible" box.


I was just about to post this when I saw your comment. Yeah, when we accidentally got the non-Marsters (still good) audiobook, my wife and I were immediately like "this is not right". Marsters is the only audiobook personality I know and is talked about by fans all the time (I think he used to be in Buffy).

For anyone who hasn't ever read the Dresden Files, it is a fantasy series set in modern Chicago with a young wizard in way over his head. The series starts off kinda bleh with the first few books, but steadily picks up pace as the main character gets more involved with all the crazy things going on. The author works basically every single (ok, a lot of) diverse mythology in as if it were all real (Odin, Mab, the Erlking, Skinwalkers, Trolls, the Fae, Necromancers, 4 different kinds of Vampires, Roman Gods, the White God, Angels, Demons, Lucifer, Dragons, Cthulhu, and about a dozen other bad and good things), all with a single coherent plot. The main character starts to see that all of these major supernatural entities are moving their armies around like pieces in 3D chess. The characters take some time to develop, but are seriously good.


next book when


Peace Talks is supposed to be out in December or January, I think. He has been wrapping up the editing since September.

The Dresden Files subreddit has one of the author's beta readers who sends us some updates, although no spoilers of course. She says it will be as big a whammy as "Changes".

There was also a new Goodman Gray short story released recently.


Oh man, I can't wait! Thanks for the update.


Marsters did eventually record that book. I listened to the original (non-Marsters) recording and definitely missed his reading of the book. I felt the other reader did a decent job (way better than most readers), but Marsters simply does a masterful job.


You mean a "Marster-ful" job ;-)


Peter Kenny is the narrator for most of Iain Banks' Culture series. Hearing a different narrator for Matter just felt off; I've come to associate Peter's accent with the setting.


Solid example. That lone book using a separate narrator was jarring. I agree with the performance aspect. It will take general AI to understand the timing of a joke or when to pause for drama. One must literally understand the text to deliver a good narration. I don't expect to see AI capable of that in my lifetime.


The most I could see it doing is letting voice actors change their voice. Using it by itself is not feasible, because DL doesn't know where to put emphasis or where to speed up and slow down, and it doesn't sound emotional. I don't know who would want their book read by a monotone voice. Maybe it could voice dictionaries and encyclopedias.


Tbh that doesn't sound like a particularly troublesome task. Super-set the language with some new accents, symbols, and new punctuation markers and then off you go.


Hate to break it to you, but injecting human emotion into speech is more than just putting intonation and punctuation in the right place.


I read the parent post as suggesting that the appropriate "acting" cues and emphasis might be explicitly annotated in the text, just as intonation is marked by punctuation in normal text.

However, I'd guess that it's simply quicker and more efficient for someone to just act and speak the lines with the intended dramatic effect; explicitly annotating exactly how they should be said takes much more time and effort.
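For what it's worth, here's a rough sketch of what that kind of explicit annotation could look like. The line of dialogue is a hypothetical example of mine, not from any real production pipeline, but the tags are standard SSML, the W3C speech-synthesis markup that many TTS engines accept for pauses, emphasis, and prosody:

    import xml.etree.ElementTree as ET

    # Hypothetical annotated line of dialogue using standard SSML elements
    # (<break>, <prosody>, <emphasis>). A TTS engine with SSML support would
    # consume this markup directly; here we only check that it is well-formed.
    ssml = """<speak>
      He opened the door <break time="500ms"/> and whispered,
      <prosody rate="slow" volume="soft">we are not alone.</prosody>
      <emphasis level="strong">Run.</emphasis>
    </speak>"""

    ET.fromstring(ssml)  # raises ParseError if the annotation is malformed
    print(ssml)

Even for one throwaway line, that's a lot of hand annotation compared with an actor simply reading it the right way, which is roughly the effort trade-off described above.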


If anything, it could be _improved_ by deep learning. Shitty narrators will disappear, and we'll get high-production-value audiobooks instead, assuming humans want to get paid; a la Dune, where there's an entire cast of well-selected voice actors narrating the book, and the result is amazing.


I agree. I once put down an audiobook (Xenocide or Children of the Mind, I don't remember which one) because I couldn't stand one of the voice actors. She did a Japanese accent so overdone and so bad that I just skipped all those sections of the story.

Having the choice between a monotone narration and the voice actors in this case would be a much welcome improvement.


At least one of the narrators (who were all American) did a Chinese accent for some sections of Xenocide, and I felt that was the wrong move, considering that the characters were speaking to each other in their native language; they weren't Chinese characters speaking English.


Oh yep. I listened to a full-cast reading of The Man of Legends by Kenneth Johnson a couple of months ago and the narration was pretty good for the most part. There was a section told from the point of view of a young Hispanic girl and I couldn't understand much of it at all. Really annoying.


Depends whether by "the industry" you mean "businesses making audiobooks" or "people working as audiobook narrators."


Let's put it this way: it's difficult for me to find the time to even listen to an excellently narrated audiobook. I'm not going to listen to something narrated by a machine without intonation if I can avoid it. But I had to suffer through one book (Disciplined Entrepreneurship, if you must know) where narration was so bad I wanted to drive my car into a tree more than once. Those guys will be replaced with a machine. The rest have nothing to worry about. The people described in the article are clearly far above anything machines will be able to do in the foreseeable future. Their professional work is closer to that of a theater actor than it is to reading the book.


Maybe using this technology posted here a few days ago?

https://news.ycombinator.com/item?id=21525878


Low to mediocre? There are more than enough existing audio files for specific voice actors to give it the Deepfakes treatment and synthesize their voices accurately. Of course, that's just voice synthesis; for stories they would have to be able to do different voices for different characters and understand context and emotional moods and the like.


Sounds like AI would basically make the low-to-mediocre end of the business _exist_ at all. It's really, really hard to find audiobooks of some lesser-known stuff, and I hate having to use TTS because it's awful, but if TTS sounded like that one demo Google did of their AI appointment-booking service for Pixel, I'd be way less unhappy about it.


There's a lot of spam already in this regard, e.g. on YouTube, where there are bots posting machine-spoken textbooks or wiki pages.


That's just ghastly spam content designed to generate advertising income.

I am not sure if some of the brands advertising on YouTube realise what crap inventory their adverts are being run on.


Look up DeepZen.


Yep — hard to believe that Satya Nadella is only 5 years into the job at Microsoft. It’s like a completely different company.


The risk reward profile of the S&P 500 is way better. Buy Legos if you like them, sell them opportunistically, but don’t get into it as an asset class.


You mean the multi-tens-of-thousands-of-dollars FPGA setup that Xilinx hawks doesn't have source control?


Commercial FPGA tools are a joke. I really hope SymbiFlow [1][2] will make the development experience way better.

[1] https://symbiflow.github.io

[2] https://github.com/SymbiFlow


I understand there is some validity to this criticism. Xilinx has always been slow to adopt standard software development practices. But really, Xilinx tools have never cost "multi tens of thousands of dollars". The most expensive tools are several thousand dollars. There are also free tools for smaller devices / certain flows. Source control is done with Tcl scripting (simple text files) and standard source control tools. It's not as bad as it was in the past. Cheers.


Microsemi Libero is a joke and is probably the greatest hindrance to getting actual work done. I wish Xilinx had bought them out instead of Microchip so there could actually be some improvement to the tools. You can throw every bit of computing resource at its dinosaur PnR tool and it will just happily chug along at 1% CPU usage. They just came out with some update that made a 20% improvement to PnR: literally hours of time gained back from watching a wheel spin. What is it doing? Who knows; it will probably fail and not tell you why. Zero source control, almost zero documentation for scripting, and a design tool that nearly crashes everything just to tell you there are IP core updates available.

The only parts that actually work reliably are whatever ancient Actel-branded tools are hidden in the suite.


This is one of the reasons I jumped ship from writing VHDL for Xilinx parts and am now a Java dev. It’s worth noting that Altera, the other major FPGA vendor, is not noticeably better in this regard (or at least they weren’t when I last used their stuff ~5 years ago).

(The other reason is that I am unbelievably bad at getting the damn thing to meet timing.)


So every time you compile/synthesize the result is different?


Not with the same exact source files / same software version. But sometimes small changes in the design, or changing tool versions can cause the design to not meet timing. It's just the nature of map / place and route.


It often is. Place-and-route is typically implemented as simulated annealing, which is a randomized algorithm. Unless you explicitly force it to reuse the same seed, you'll get slightly different results each time -- and even if you do set a seed, small changes to the HDL may result in a vastly different result.
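To make the seed point concrete, here is a toy sketch in Python. It is not a real placer, just the bare shape of simulated annealing: propose random moves, always accept improvements, accept worse moves with a temperature-dependent probability, and cool down. The cost function and all the constants are made up purely for illustration:

    import math
    import random

    def cost(x):
        # Bumpy 1-D stand-in for a placement cost function (many local minima).
        return (x - 3.0) ** 2 + 2.0 * math.sin(5.0 * x)

    def anneal(seed, steps=20000):
        rng = random.Random(seed)
        x, temp = rng.uniform(-10, 10), 10.0
        for _ in range(steps):
            cand = x + rng.gauss(0, 0.5)   # propose a random move
            delta = cost(cand) - cost(x)
            # Always accept improvements; accept worse moves with probability
            # exp(-delta/T), which lets the search escape local minima early on.
            if delta < 0 or rng.random() < math.exp(-delta / temp):
                x = cand
            temp *= 0.9995                 # cooling schedule
        return x, cost(x)

    for seed in (1, 2, 3):
        # Same seed reproduces the same run; a different seed can settle into a
        # different (but similarly good) solution.
        print(seed, anneal(seed))

Real place-and-route is vastly more elaborate, but the same property holds: fix the seed (and the inputs and the tool version) and the run is reproducible; change any of them and you can land somewhere else, sometimes on the wrong side of timing.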


People like to lambast "enterprisey" software on HN, but it's got nothing on the typical software written by a hardware company.

In their defense, nobody is buying FPGAs for the tooling, so "just good enough to check the boxes" is probably the correct economic decision given the current state of affairs. It's probably 10x as expensive to write software that is actually pleasant to use, and while they sell a lot of FPGAs, they don't sell very many FPGA dev tools, so the economies of scale aren't there.


It’s another word for mirror and has cool vintage/steampunk/Victorian associations, so it’s going to be used by a lot of different people.


I think it’s more that Americans culturally tend toward the “extreme sports” mentality in many aspects of their lives. Biggest, baddest, loudest, most intense, most successful, most committed, hardest working: all are excessively venerated here.


That's not particularly reflected in other forms of intoxication though. A ton of people in the US drink super light beer for instance (vs hard alcohol). They also tend to avoid espresso shots in favor of shake-like products.

I'm not saying you're incorrect but other consumption habits don't always lean toward the most extreme option.


> they also tend to avoid espresso shots in favor of shake-like products.

Maybe they're just competing to see who can have the biggest, sweetest, most extravagant beverage, rather than who can make it the most concentrated. If the name of your coffee beverage takes five minutes to pronounce, you're the coolest cat in town.


I mean, there are still plenty of alcoholics and people with beer bellies. Light beer is more conducive to American activities, maybe. If you are sitting out in the hot sun, are you going to start ripping shots or drinking red wine, or a cold beer? A cup of drip coffee typically has more caffeine than a shot of espresso as well.


My original point was that maybe strong weed was more conducive to American activities :)


Damn, can’t believe Fred Smith is still running FedEx.


What would the behavior be if this happened in user space?


File browser / user session would crash.


Is that less bad? Wouldn’t that nuke the experience too?


When explorer (the shell and file browser; I don't know if those are fully separate under Win 10) crashes, it is automatically restarted; it does not close any other running programs and does not log you out.


...and that's probably the reason why they don't seem to get fixed. I've had clean installs of Win10 and 8.1 crash it often, just by browsing through a lot of files.


That's a regular occurrence for me and people I know. My boss inserted a USB drive and Explorer crashed. It restarted itself, but crashed again, until the drive was unplugged, at which point Windows made a bunch of error noises and popped up a "Catastrophic failure" dialog. It then worked OK after the USB drive was reinserted. The laptop cost ~3500€ and was installed and set up by the IT department with extra care because it's for the boss. It didn't matter; Windows 10 doesn't pick its victims.


It’s a good test of whether, when the entrepreneur needs to close a big sale, they can get connected to the purchasing decision maker.


If you only trust facts that you verify yourself firsthand, you can’t really live on the accumulated progress of human expertise. Your level of technological sophistication will max out at “can make a thatch hut and gather pineapples.”


> If you only trust facts that you verify yourself firsthand, you can’t really live on the accumulated progress of human expertise.

Arguably, the differentiating factor between humans and animals is our ability to organize on large scales. Monkeys and killer whales can coordinate behavior in groups of sometimes up to 50 or maybe 100 individuals. Humans, however, can coordinate behavior across millions of people. Building a modern automobile, a computer chip, or the social security system requires coordination across so many individuals that no one person can fully and deeply understand the entire system.

The ability of humans to believe information that they do not fully know to be true is probably one of the defining traits of humanity.

