Announcing Our v6 Transcription Model
Adrian Ispas

Adrian Ispas

April 7, 2024

Vatis Tech Takes a Leap Forward: Announcing Our v6 Transcription Model

TABLE OF CONTENTS

Experience the Future of Speech Recognition Today

Try Vatis now, no credit card required.

We're delighted to announce that we've once again pushed the boundaries of speech-to-text technology. Our commitment to advancing speech to text for the Romanian language has yielded significant results with the latest upgrade from v5 to v6 of our model.

With substantial improvements in accuracy, the v6 model stands as a testament to our continuous endeavor to provide the best in-class solution for our users.

A Deeper Dive into the Results

The most impressive metric showcasing our model's evolution is the Word Error Rate (WER). For those new to the world of speech recognition, WER is a standard metric used to measure the performance of a speech-to-text conversion. It calculates the ratio of incorrect words (substitutions, insertions, deletions) to the total number of words spoken. A lower WER indicates better accuracy. For instance, if WER is 0.1, it implies a 90% accuracy rate.

Let’s delve into the numbers

Overall WER improvement from v5 to v6: +8%

  • v5 WER: 0.064488
  • v6 WER: 0.059555
Overall WER for v6 Transcription Model
Overall WER for v6 Transcription Model


This signifies that we've successfully consolidated our transcription model for the Romanian language, consistently achieving an impressive 95% accuracy across diverse datasets and challenging audio types.

Spotlight on Specific Improvements

Phone Calls: one of the most dynamic environments for speech-to-text technology is in phone call transcriptions. Varied clarity, different accents, and background noises can present challenges. Our v6 model proudly showcases a substantial 20% reduction in error rates.

  • v5 WER for Phone Calls: 0.07226 (implying 92.77% accuracy)
  • v6 WER for Phone Calls: 0.05806 (implying 94.19% accuracy)
Phone Call WER for v6 Transcription Model
Phone Calls WER for v6 Transcription Model

Legal Audios: in a sector where precision is paramount, our model has achieved a 17% improvement in error rate when transcribing legal documentation.

  • v5 WER for Legal Audios: 0.07166 (implying 92.83% accuracy)
  • v6 WER for Legal Audios: 0.05984 (implying 94.01% accuracy)
Legal Audios WER for v6 Transcription Model
Legal Audios WER for v6 Transcription Model

How Can One Calculate Accuracy Based on WER?

To put it simply, Accuracy can be calculated as:

Accuracy = (1 − WER) × 100

So, if you have a WER of 0.1 (or 10%), the accuracy of the speech-to-text model would be:

Accuracy = (1 − 0.1) × 100 = 90

Robust Evaluation Using Extensive Datasets

To ensure the effectiveness of our upgrades, we employed 50 datasets for evaluation. These datasets comprised a whopping 100,000 data samples, guaranteeing a comprehensive and exhaustive assessment. Such thorough testing not only validates our results but also provides users with the assurance that our improvements are genuinely beneficial in real-world scenarios.

Wrapping Up

At Vatis Tech, we're driven by the desire to innovate and refine our solutions. Our Romanian speech-to-text solution's latest upgrade is a clear manifestation of this commitment. 

We extend our gratitude to our dedicated team, our partners, and most importantly, our users, who continually motivate us to strive for excellence. 

With the v6 model now available, we invite you to experience its heightened accuracy firsthand. Stay tuned for more advancements in the near future!

Continue Reading

Experience the Future of Speech Recognition Today

Try Vatis now, no credit card required.

Waveform visual