Born from a simple observation: product labels contain critical information, but they’re not accessible to everyone. Small print, complex medical terminology, or even language barriers can make labels a challenge for many people.
_The problem:
How many times have I struggled to read the ingredients on a food package? Now imagine facing this challenge daily due to visual impairment, language barriers, or other limitations. Critical information like dosage instructions, allergen warnings, and consumption guidelines becomes dangerously inaccessible.
_The wake-up call:
After one too many close calls heard about on social media, and family stories about struggling to read medication instructions, I realized I had been taking text accessibility for granted. Sure, manual assistance works, but what about independence and privacy? I think everyone deserves access to product information without depending on others.
_The solution:
This Text-to-Voice project was built to bridge this accessibility gap. It’s not just a simple OCR tool: it understands the structure of product labels, extracts key information like product names, ingredients, and usage instructions, and converts it to natural-sounding speech. The hardest part? Recognizing and prioritizing the most critical information, like consumption instructions, even when the image quality is poor.
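The "understand the structure" step can be sketched as regular-expression field matching over the raw OCR text. This is a minimal illustration: the field names, patterns, and the `parse_label` function are hypothetical, not the project's actual parsing grammar.

```python
import re

# Hypothetical field patterns covering English and German label keywords;
# a real label grammar would be considerably richer.
FIELD_PATTERNS = {
    "ingredients": re.compile(r"(?:Ingredients|Zutaten)\s*:\s*(.+)", re.IGNORECASE),
    "usage": re.compile(r"(?:Directions|Usage|Anwendung)\s*:\s*(.+)", re.IGNORECASE),
}

def parse_label(text: str) -> dict:
    """Pull known fields out of raw OCR text, line by line."""
    fields = {}
    for line in text.splitlines():
        for name, pattern in FIELD_PATTERNS.items():
            match = pattern.search(line)
            if match:
                fields[name] = match.group(1).strip()
    return fields
```

Keeping the patterns in a dictionary makes it easy to prioritize critical fields (e.g., read usage instructions aloud first) and to add new languages later.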
_Key features:
- Text extraction from images using both traditional OCR and AI vision models
- Structure-aware parsing of product labels
- Support for English and German
- Natural-sounding text-to-speech conversion
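Since the tool supports both English and German, the text-to-speech step needs a language choice per label. A crude heuristic sketch is shown below; the function name, marker characters, and word list are my own assumptions for illustration, not the project's code.

```python
def pick_tts_language(text: str) -> str:
    """Guess 'de' vs 'en' for the TTS engine from simple surface features."""
    # German-specific characters are a strong signal on their own.
    german_markers = set("äöüßÄÖÜ")
    # A handful of common German function/label words as a backup signal.
    german_words = {"und", "mit", "nicht", "zutaten", "anwendung"}
    if any(ch in german_markers for ch in text):
        return "de"
    words = {w.strip(".,:;").lower() for w in text.split()}
    if words & german_words:
        return "de"
    return "en"
```

In practice a proper language-detection library would be more robust, but a heuristic like this is cheap and has no extra dependencies.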
_Technical highlights:
The project gave me hands-on experience with:
- Building RESTful APIs with FastAPI
- Computer vision techniques for image preprocessing
- Hybrid OCR approaches combining traditional and AI-based methods
- Natural language processing for extracting structured information
- Regular expressions for pattern matching
- Text-to-speech integration
_Learning outcomes:
This project taught me how challenging the variability of real-world images can be: different lighting conditions, angles, and label formats. I also ran into the limitations of traditional OCR with OpenCV, which could not recognize German characters correctly, which in turn garbled the voice output and the user's interpretation. Combining traditional OCR with AI-powered vision models produced a more reliable solution: in this project, image preprocessing was done with OpenCV, text extraction defaulted to the AI model, and cv2-based OCR served as the fallback.
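The AI-first extraction with a cv2 fallback described above can be sketched as a small wrapper; the two callables stand in for the actual model and OCR calls, which are not shown in this write-up.

```python
from typing import Callable

def extract_text(image_bytes: bytes,
                 ai_extract: Callable[[bytes], str],
                 cv2_extract: Callable[[bytes], str]) -> str:
    """Try the AI vision model first; fall back to traditional OCR
    if the call fails or returns nothing usable."""
    try:
        text = ai_extract(image_bytes)
        if text.strip():
            return text
    except Exception:
        pass  # network error, rate limit, etc. -- fall through to the local path
    return cv2_extract(image_bytes)
```

Treating an empty result the same as a failure matters here: a vision model that silently returns nothing is just as useless to the user as one that raises.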
What’s most satisfying? Knowing this tool could help people access critical information independently, whether they’re dealing with visual impairments, language barriers, or simply aging eyes.
Check out the code.
_Future plans:
I’m working on expanding language support, adding support for other file formats (e.g., PDF), building a mobile app interface, and improving the handling of specialized label formats like prescription medications.
Remember: Technology is most meaningful when it makes everyday information accessible to everyone.