I’m confused

Why is transcription so inaccurate, even if I have the caller spell the word(s)?

Twilio transcription seems to be horribly inaccurate. I have tried both speaking and spelling words, and neither works very well. No, I do not have an accent or speech impediment. Is there any way to get better results out of the transcription?
5 people have this question
  • Hi Jason, sorry to hear you're confused/frustrated with our transcription service. It's a relatively new feature for us, and we're constantly working to improve it. If you have audio files with transcriptions that you'd like to show us, they will help us diagnose the problem further (but if that is too much trouble, no worries). You can send them to help@twilio.com.

    Based on our experience, transcription tends to work much better if there is minimal background noise, and some cell phones (like BlackBerry) tend to pick up more noise than others. I hope that is helpful, and we'll definitely keep you posted as we continue to improve this feature of Twilio.
  • I’m thankful
    Hi Danielle,

    Thanks for the prompt reply.

    I will try to gather examples and send them as I progress with my testing.

    I have worked with other products that used a parameter to hint to the transcription engine what it should expect. For example:

    <Record ... type="digits,characters,creditcard,date,etc" />

    The product I'm thinking of actually trapped the caller in a loop until it received acceptable input for the specified type.

    You might pass that on to your developers because it could help increase the accuracy of the transcriptions if the engine knew it should be listening for digits, characters, or just words.
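    A minimal sketch of the validate-and-re-prompt loop Jason describes, in Python. The input types (digits, characters, creditcard) come from his example; the function names, the whitespace stripping, the Luhn checksum for card numbers, the card-length range, and the retry cap are all assumptions for illustration, not anything Twilio's <Record> verb provides. Each string in the list stands in for one caller attempt.

```python
import re
from typing import Iterable, Optional

# Hypothetical input types from the type="digits,characters,creditcard,..."
# suggestion above; this is not a Twilio feature.

def luhn_ok(number: str) -> bool:
    """Luhn checksum, the check digit scheme used by payment card numbers."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:        # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def normalize(transcript: str, kind: str) -> Optional[str]:
    """Return a cleaned value if the transcript fits `kind`, else None."""
    compact = re.sub(r"\s+", "", transcript)   # "4 2" -> "42"
    if kind == "digits":
        return compact if re.fullmatch(r"[0-9]+", compact) else None
    if kind == "characters":
        return compact.upper() if re.fullmatch(r"[A-Za-z]+", compact) else None
    if kind == "creditcard":
        # The 12-19 digit length range is an assumption for this sketch.
        ok = re.fullmatch(r"[0-9]{12,19}", compact) and luhn_ok(compact)
        return compact if ok else None
    raise ValueError(f"unsupported input type: {kind}")

def collect(attempts: Iterable[str], kind: str, max_tries: int = 3) -> Optional[str]:
    """Take successive transcripts until one fits `kind` or tries run out."""
    for n, transcript in enumerate(attempts):
        if n >= max_tries:
            break
        value = normalize(transcript, kind)
        if value is not None:
            return value
        # A live call flow would prompt the caller again at this point.
    return None

print(collect(["for two", "4 2"], "digits"))  # prints 42
```

    Capping the retries with max_tries is a deliberate departure from trapping the caller indefinitely; after the last attempt, one option is to fall back to keypad input.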
  • Hi Jason,

    These are great suggestions; however, it sounds like what you're actually looking for is some kind of automated speech recognition (ASR) technology, and while we would love to, we don't offer that as a feature of the Twilio API yet. We'll definitely keep your feedback in mind as we plan the next features.

    Any feedback you can give us on transcription quality is very helpful: the recordings help us get a sense of what might have gone wrong (noisy background, volume, call quality, etc.).

    We'll keep you posted on new features as they become available, too.
  • I’m happy
    Jason, I feel your pain. I too wish the transcription were better. The next route I'm taking is a bit of a twist, but it's worked in the past: I plan to wire up a Mechanical Turk bridge so that audio responses are pushed to Turk workers for transcription.

    Like anything, that will cost a bit, but on my Twilio project the audio is the data I want most, and I need it transcribed really well. Turk workers can help with that. Automating the setup will be a fun challenge.

    I'm collecting only qualitative data, nothing that would raise a privacy issue, and the Turk workers will never know the context of the questions that prompted each audio answer, only the task of transcribing it.

    That's the first fix that came to mind for this issue. Granted, it only works in English...

    dan
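    One way to sketch the privacy property dan describes: keep the question context on your own server and hand workers nothing but the audio URL. The Answer and TranscriptionBridge names, the task fields, and the identifiers below are hypothetical, and nothing here calls the Mechanical Turk API; it only shows how the data could be separated before a task is posted.

```python
import itertools
from dataclasses import dataclass

@dataclass(frozen=True)
class Answer:
    call_id: str        # private: identifies the call (hypothetical field)
    question: str       # private: the prompt the caller was answering
    recording_url: str  # the only thing a worker needs to see

class TranscriptionBridge:
    """Hypothetical bridge: workers see audio only; context stays server-side."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._pending: dict[int, Answer] = {}

    def make_task(self, answer: Answer) -> dict:
        task_id = next(self._ids)
        self._pending[task_id] = answer  # context is kept locally
        return {
            "task_id": task_id,
            "audio_url": answer.recording_url,
            "instructions": "Type exactly what the speaker says.",
        }

    def receive(self, task_id: int, text: str) -> tuple[Answer, str]:
        # Re-attach the worker's transcript to the private context.
        return self._pending.pop(task_id), text

bridge = TranscriptionBridge()
answer = Answer(call_id="CALL-0001",  # hypothetical identifiers
                question="What would you change about the event?",
                recording_url="https://example.com/recordings/0001.wav")
task = bridge.make_task(answer)
print(task)  # contains no question text and no call ID
```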
  • I’m intrigued
    Using MTurk for this is a great idea (I know because I had the same idea independently, following SpinVoxGate). I'm definitely interested to hear how it goes, particularly if you're able to make your code available.

    Sam
  • Jason,

    One of the issues you will experience is the quality of the audio delivered from the telephone network. Many different audio codecs are used to deliver a telephone call, and many of them are not suitable for capturing audio accurately enough for speech recognition.

    If you understand the types of equipment your target audience is likely to use, you can determine whether the technology in use is suitable.

    Twilio,

    What speech recognition (SR) modeling do you use for your solution? Do you use a phonetic or a large-vocabulary speech recognition engine? Each has its strengths and weaknesses and is best suited to different types of application, so understanding the SR techniques you use would help determine whether the service provides the capability a specific application needs.

    SR applications can be very powerful and deliver great value, but there is often a need for additional processing and understanding around the core transcription to deliver that value.

    Regards
    Graeme
    kis-consulting.com
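    Graeme doesn't say which codecs he has in mind. As one assumed, concrete example, G.711 μ-law, a common narrowband telephony codec, carries 8,000 samples per second at 8 bits each, so nothing above 4 kHz survives and every sample is squeezed into one of 256 logarithmic codes. The Python sketch below follows the widely used bias-based μ-law expansion (not any Twilio code) to show how coarse those levels get.

```python
def ulaw_to_linear(code: int) -> int:
    """Expand one 8-bit G.711 mu-law code to a 16-bit linear PCM sample."""
    code = ~code & 0xFF              # mu-law bytes are stored bit-inverted
    sign = code & 0x80
    exponent = (code >> 4) & 0x07    # 3-bit segment number
    mantissa = code & 0x0F           # 4-bit step within the segment
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude

# All 256 codes map to only 255 distinct levels (0 appears twice).
levels = sorted({ulaw_to_linear(c) for c in range(256)})
print(len(levels), levels[0], levels[-1])  # prints 255 -32124 32124
```

    The gap between adjacent levels grows from 8 near silence to 1,024 in the loudest segment, which is part of why narrowband call audio gives a recognizer less to work with than a wideband recording.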
  • Hello,

    Is there any news on transcription quality improvements? This thread looks quite old, and I wanted to know whether there was an update.

    Thank you,
    Olya
  • I’m very disappointed
    Could Twilio transcription actually be getting worse?
    I was testing a Twilio app, and I counted from 1 to 12.
    Here is the transcription:

    "12 please for bye thanks 7 Pete 9101112"

    Notice there are no spaces between the numbers, so "one two" is indistinguishable from "twelve". Yucch!

    Also:

    "three" == "please"
    "four" == "for"
    "six" == "thanks"
    "eight" == "Pete"
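    One way to put a number on the counting test above is word error rate (WER): the word-level edit distance between what was said and what was transcribed, divided by the number of words said. This Python sketch writes the reference with digits to match the transcript's style; that choice is an assumption, and a fairer comparison would first normalize number words and digits to a single form.

```python
def word_errors(reference: list[str], hypothesis: list[str]) -> int:
    """Word-level Levenshtein distance: substitutions + deletions + insertions."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def wer(reference: str, hypothesis: str) -> float:
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    return word_errors(ref, hyp) / len(ref)

reference = "1 2 3 4 5 6 7 8 9 10 11 12"
hypothesis = "12 please for bye thanks 7 Pete 9101112"
print(round(wer(reference, hypothesis), 3))  # prints 0.917
```

    The best alignment matches only "7", so 11 of the 12 reference words count as errors.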
  • There are lots of transcription businesses around, so it has become quite confusing for people to pick a good transcription company. From my past experience, I suggest GMR Transcription, one of the leading transcription companies, or Transcription Vendors, whose database lists independent and local transcribers, which makes it easy for people who are looking for transcriptionists. I also suggest going with certified transcriptionists.
  • Finding a good transcription service can be difficult. Every website claims to be the best. After many years in the transcription business, I suggest first that you not use price as the determining factor. Although offshore companies are usually less expensive than US services, their grasp of English/American syntax and spelling isn't the best. Ask where the transcription is being done, even if the company has a US address; many companies outsource offshore. If you have ongoing projects and/or high volume, it's worth the time investment to interview a prospective transcription service provider and to see if there are any company reviews about them online. The best way to find a good transcription service is to ask others who use services for a recommendation.
  • I’m confident
    Dear Jason,
    I had a very bad experience with transcription services as well, but I found a solution when I started working with NYC Transcription Services. This company has a lot of experience and excellent service. Try it; maybe it will work for you as well.
  • Do NOT use Twilio transcription. The service does not produce quality transcriptions. I think Twilio's transcription service uses SpinVox. You used to be able to use the SpinVox API to automate voice transcription, but the SpinVox transcription API is no longer available to individuals, only to large companies. This is because Nuance bought SpinVox and stopped offering the transcription service to individuals. In my experience, even though Nuance is a communications company, it is very bad at communicating with customers: it took me a month of speaking to many people at Nuance to find out that the SpinVox API is "under a review". My suggestion to Twilio would be to offer various levels of transcription to developers; I would pay more for a quality transcription service. It is VERY frustrating that Twilio has not reacted to people's complaints about its poor transcription service. I am a big fan of Twilio, but the service lacks two things:

    VOICE RECOGNITION
    QUALITY VOICE TRANSCRIPTION

    What is annoying is the Twilio PR you receive, e.g. "It is under review". I think the lack of voice recognition is due to a limitation of Asterisk voice servers, so if you want VOICE RECOGNITION, you might want to look at some of Twilio's competitors that do offer this facility.