Captioning by Google

November 11, 2011 § 3 Comments

Improved automatic cc from Google/YouTube? Here’s an article from June:

We also saw a comment the other day about Google’s “improved” automatic
captioning, and we are wondering if anyone can tell us whether there is any
evidence that it has improved over the past six months, or the past year. The
early automatic cc was unreliable (poor) overall, so we’ve not tried
it over the past few months; yet, as always, we look forward to

For example, if the system approaches professional quality
(98% accuracy), then it might one day be used (drum roll
here) in “real life”: on our mobiles, with all those folks whom we
cannot understand in person, with good audio, for all voices, good
acoustics, etc.! That’ll be the day 🙂
CCAC is the place to be for captioning advocacy.
“Never doubt that a small group of thoughtful, committed citizens can
change the world; indeed, it’s the only thing that ever does” Margaret


§ 3 Responses to Captioning by Google

  • Liz says:

    So far I have seen no improvement. Captioning needs to be done as we know it. That way, what is said will be written. It’s a long way about it, but at least we can value the video as a hearing person would. Currently these automatic captions still make no sense from what I have seen. So if videos are only going to use that, I won’t watch.

  • I don’t have a cell phone running Android, the Google system that also has general voice recognition. Could someone who does reply about the cell phone part of the question, perhaps?
    Re automatic captioning on YouTube: in December 2009, in “Accessibility and Literacy: Two Sides of the Same Coin”, under “Short words are the rub”, I used “Don’t get sucked in by the rip…”, a video by the University of New South Wales, to illustrate YouTube’s automatic captioning. For instance, I wrote:

    … there are over 10 different transcriptions – all wrong – for the 30+ occurrences of the word “rip.” The word is in the title (“Don’t get sucked in by the rip…”), it is explained in the video description (“Rip currents are the greatest hazards on our beaches.”), but STT software just attempts to recognize the audio. It can’t look around for other clues when the audio is ambiguous.

    I have tried the auto-captioning for that video just now, and it presently gets “rip” or “rips” right 4 times. So there is some improvement there, even for such a pain in the neck of a one-syllable word as “rip”. My impression is that the overall automatic captioning for that video has improved even more markedly in 2 years, but I can’t back that up with evidence, unfortunately, because my 2009 notes were not complete enough.
    Anyway, this automatic captioning still seems far short of the 98% accuracy mentioned in the post. However, the point is that it only shows if a user requests it, and users are forewarned of its “beta” (in progress) nature. Therefore they should expect some faulty transcriptions.
    We could probably monitor the evolution of this automatic captioning by regularly downloading the automatic transcription files for our own videos and comparing them with their human captioning. This could be done in wiki pages that keep a history of revisions, for instance.
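
The comparison suggested in the comment above could be made measurable with a word error rate (WER): the number of word substitutions, insertions and deletions needed to turn the automatic transcript into the human one, divided by the number of words in the human one. The Python sketch below is only an illustration under stated assumptions. The function names are hypothetical, both transcripts are assumed to be already available as plain text (no caption file parsing or downloading is shown), and reading the post's "98% accuracy" as a WER of about 2% is an assumption, since the post does not say how that figure is measured.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into words, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words.

    reference  -- the human captioning, as plain text (assumed input format)
    hypothesis -- the automatic captioning, as plain text (assumed input format)
    """
    ref = tokenize(reference)
    hyp = tokenize(hypothesis)
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j words of the automatic transcript.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    human = "Don't get sucked in by the rip"
    automatic = "Don't get sucked in by the ripped"  # made-up example of a miss
    print(f"WER: {word_error_rate(human, automatic):.1%}")
```

Recording this figure for the same video each time its automatic transcript is downloaded would give the kind of history the comment asks for. A single number hides which words go wrong, though, so keeping the transcripts themselves (for example in the wiki pages mentioned above) remains useful.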


You are currently reading Captioning by Google at CCAC Blog.

