Adrian Ispas

April 7, 2024

Vatis Tech Takes a Leap Forward: Announcing Our v6 Transcription Model

Experience the Future of Speech Recognition Today

Try Vatis now, no credit card required.

We're delighted to announce that we've once again pushed the boundaries of speech-to-text technology. Our commitment to advancing speech-to-text for the Romanian language has yielded significant results with the latest upgrade of our model from v5 to v6.

With substantial improvements in accuracy, the v6 model is a testament to our continuous effort to provide a best-in-class solution for our users.

A Deeper Dive into the Results

The metric that best captures our model's evolution is the Word Error Rate (WER). For those new to speech recognition, WER is the standard metric for measuring the performance of a speech-to-text system. It is the number of word errors (substitutions, insertions, and deletions) divided by the total number of words in the reference transcript, that is, the words actually spoken. A lower WER indicates better accuracy. For instance, a WER of 0.1 implies a 90% accuracy rate.
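To make the definition concrete, here is a minimal Python sketch of how WER can be computed from a reference transcript and a model's output. It uses a standard word-level edit distance; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration, not a description of our internal evaluation tooling.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in the reference."""
    ref = reference.split()  # assumption: simple whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words processed so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


reference = "we tested the new transcription model on phone call audio"
hypothesis = "we tested the new transcription model on phone call radio"
print(word_error_rate(reference, hypothesis))  # 0.1 (1 substitution in 10 words)
```

In practice, both texts are usually normalized first (casing, punctuation, numbers), since those choices can change the measured WER noticeably.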


Let’s delve into the numbers

Overall WER improvement from v5 to v6: about 8% (relative reduction in WER)

v5 WER: 0.064488
v6 WER: 0.059555

This confirms that we've strengthened our transcription model for the Romanian language: its overall WER of 0.059555 corresponds to roughly 94% accuracy across diverse datasets and challenging audio types.


Spotlight on Specific Improvements

Phone Calls: phone call transcription is one of the most demanding environments for speech-to-text technology. Varying audio clarity, different accents, and background noise all present challenges. Our v6 model achieves a relative reduction in WER of about 20% on this type of audio.

v5 WER for Phone Calls: 0.07226 (implying 92.77% accuracy)
v6 WER for Phone Calls: 0.05806 (implying 94.19% accuracy)

Phone Calls WER for v6 Transcription Model

Legal Audios: in a sector where precision is paramount, our model has achieved a relative improvement in WER of roughly 16.5% when transcribing legal audio recordings.

v5 WER for Legal Audios: 0.07166 (implying 92.83% accuracy)
v6 WER for Legal Audios: 0.05984 (implying 94.02% accuracy)

Legal Audios WER for v6 Transcription Model
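The improvement percentages quoted in this post are relative reductions in WER, not differences in accuracy points. As a quick sanity check, the short Python sketch below recomputes them from the v5 and v6 figures listed above; the helper name `relative_wer_reduction` is ours, introduced only for this illustration.

```python
def relative_wer_reduction(old_wer: float, new_wer: float) -> float:
    """Relative WER reduction in percent: how much of the old error rate was removed."""
    if old_wer <= 0:
        raise ValueError("old_wer must be positive")
    return (old_wer - new_wer) / old_wer * 100


# (v5 WER, v6 WER) pairs as published in this post
results = {
    "Overall": (0.064488, 0.059555),
    "Phone calls": (0.07226, 0.05806),
    "Legal audios": (0.07166, 0.05984),
}

for name, (v5, v6) in results.items():
    print(f"{name}: {relative_wer_reduction(v5, v6):.1f}% relative WER reduction")
```

This prints roughly 7.6% overall, 19.7% for phone calls, and 16.5% for legal audios, in line with the rounded figures above.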


How Can One Calculate Accuracy Based on WER?

To put it simply, Accuracy can be calculated as:

Accuracy (%) = (1 − WER) × 100

So, if you have a WER of 0.1 (or 10%), the accuracy of the speech-to-text model would be:

Accuracy = (1 − 0.1) × 100 = 90%
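The same conversion is easy to script. Here is a small Python sketch that applies the formula to the v6 figures from this post; the function name `accuracy_from_wer` is an illustrative choice. Keep in mind that this is a simplification: because insertions count as errors, WER can exceed 1 on very poor transcripts, in which case the formula yields a negative value.

```python
def accuracy_from_wer(wer: float) -> float:
    """Accuracy in percent, using Accuracy = (1 - WER) * 100."""
    if wer < 0:
        raise ValueError("WER cannot be negative")
    return (1.0 - wer) * 100.0


# WER values taken from this post (v6 model), plus the 0.1 example
examples = {
    "Example (WER 0.1)": 0.1,
    "v6 overall": 0.059555,
    "v6 phone calls": 0.05806,
    "v6 legal audios": 0.05984,
}

for label, wer in examples.items():
    print(f"{label}: {accuracy_from_wer(wer):.2f}% accuracy")
```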


Robust Evaluation Using Extensive Datasets

To verify the effectiveness of our upgrades, we used 50 datasets for evaluation, comprising 100,000 data samples in total. Testing at this scale gives us confidence in the results and gives users assurance that the improvements carry over to real-world scenarios.
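When results come from many datasets, there is more than one way to combine them into a single WER. The Python sketch below contrasts two common options: pooling all errors and reference words together, or averaging the per-dataset WERs. This post does not specify which aggregation produced the overall figures, and the dataset names and counts in the example are made up purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class DatasetResult:
    name: str       # hypothetical dataset label
    errors: int     # substitutions + deletions + insertions
    ref_words: int  # total words in the reference transcripts


def pooled_wer(results: list[DatasetResult]) -> float:
    """Total errors over total reference words; larger datasets weigh more."""
    return sum(r.errors for r in results) / sum(r.ref_words for r in results)


def mean_wer(results: list[DatasetResult]) -> float:
    """Unweighted mean of per-dataset WERs; every dataset counts equally."""
    return sum(r.errors / r.ref_words for r in results) / len(results)


# Illustrative numbers only, not actual evaluation data
results = [
    DatasetResult("phone_calls", errors=60, ref_words=1000),
    DatasetResult("legal", errors=30, ref_words=500),
    DatasetResult("news", errors=10, ref_words=500),
]

print(f"Pooled WER: {pooled_wer(results):.4f}")
print(f"Mean WER:   {mean_wer(results):.4f}")
```

The two numbers differ whenever datasets vary in size or difficulty, so it is worth stating which one is being reported when comparing models.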


Wrapping Up

At Vatis Tech, we're driven by the desire to innovate and refine our solutions. Our Romanian speech-to-text solution's latest upgrade is a clear manifestation of this commitment.

We extend our gratitude to our dedicated team, our partners, and most importantly, our users, who continually motivate us to strive for excellence.

With the v6 model now available, we invite you to experience its heightened accuracy firsthand. Stay tuned for more advancements in the near future!
