ioAudio's Back Story

ioAudio isn't your typical text-to-speech tool.
And this isn't a typical story of launching fast and iterating.
Speed to market matters, but patience and timing matter most.

I didn't spend long nights dreaming up million-dollar ideas for software products to bring to market. In fact, the concept for my first text-to-speech (TTS) product came as an epiphany while I was dealing with the kind of everyday stress you likely know well.

Picture this: the year is 2012. I was recently married and working in Los Angeles, a whole two hours from my home in San Diego (yes, I spent four hours a day driving to and from work!). I worked in marketing at a telecom SaaS company in Hollywood, so as you'd imagine, I received a ton of emails and attachments throughout the day. But because of my long commute, I had to wait until I reached the office to catch up (sometimes while rushing down a hall to an important meeting).

So it hit me: what if I could listen to my emails and documents while driving to work? This was by no means a brand-new concept, but it was definitely in its infancy at the time. Most providers offered a downloadable utility or browser extension with limited functionality for long-form content. I searched for a product I could use, but found only monotonous, robotic-sounding text-to-speech tools. It was hard enough leaving for work before sunrise; the last thing I needed was to be lulled to sleep by an excruciatingly boring robotic voice. So I opted for dozens of audiobooks and podcasts instead.

Then came the epiphany: I knew I wasn't the only one who felt this way, which meant there could be a market for a better solution. And that was the birth of AudioProposals, my first MVP (minimum viable product), primarily focused on increasing productivity for salespeople. Was it a flawless concept with hundreds of customers and investors tearing down my doors? Not by a long shot. On the surface, AudioProposals sounded like the perfect text-to-speech software. Users could create limitless playlists, use any file format, and it was all in the cloud. But in the fine details, there were several major issues. Can you spot the first one?

AudioProposals mobile app screenshots

AudioProposals took a whopping three days to convert your text into a human-sounding voice. Not something you'd expect in this generation of instant gratification. And it was all because we were using actual humans to edit out the nonsense (table of contents, headers, footers, outlines, etc.), record voice-overs, master the audio, package it, and send it. So not only was the process long, but there were also security concerns for businesses that felt uneasy about people reading classified documents.

AudioProposals website screenshots

Then to make matters worse, our pricing plans were, well...take a look for yourself:

AudioProposals and ioAudio pricing examples

First, let me state that it was typical back then for text-to-speech solutions to charge by the page, number of characters, or number of voices. But in reality, who knows how many pages they'll need per month? How many voices do you really need? The average person sure doesn't know, and this could become costly for business people with hundreds of pages to convert each month. Trying to compete with others on the market meant a race to the bottom on feature differentiation, rates, quality of service, and the metrics that matter to customers (such as ROI). Needless to say, the margins were horrible and scaling the business would have been next to impossible.

It got to the point where I couldn't get 100% behind my pitch, let alone sell the idea to investors. If I wasn't convinced, how could I expect customers or investors to be? I wouldn't settle for anything less than an optimal end-to-end experience. So down went AudioProposals. But not for long. I had another iteration to bring to the public: my MVP #2, ioAudio.

This time around, I decided to get help from my audience by gathering as much feedback as possible. If you're familiar with Lean Startup strategies, this phase brings things back to identifying the problem-solution fit. After all, this product is for them. So to better understand the customer, I met with and surveyed hundreds of business executives and managers about their problems and ideal solutions when it came to reading documents and increasing productivity. Here is what I found:

  • 72% said they don't have enough time to read everything
  • 45% had no idea how to resolve the problem, but wanted a way to put time back into their day
  • There were way too many emails and attachments, and blog posts or web pages were typically too long

So in order to tackle daily reading, they had to:

  • Come to the office earlier or leave later
  • Prioritize emails based on the client, peers, and deadlines
  • Delegate it to other staff members
  • Visually scan documents (and miss stuff)
  • Skip out on reading it altogether (and risk being unprepared for meetings)

So our team quickly put together a solution and learned how to use design and development sprints to release ioAudio. This time around, we decided to automate the previously manual process so our customers could quickly get conversion requests into the hands of editors and get them back within one day (vs. three days). Unlike AudioProposals, ioAudio didn't have playlists (which we have since brought back). We had a portal for our editors, as well as one for the narrators. Check it out below.

Screenshots of the first version of ioAudio

Screenshots of the first version of ioAudio in app, after log in

Then, to determine whether our solution was a fit, we enlisted the help of the awesome team at Centercode and planned a beta test. Although we ultimately didn't pull the trigger on the full beta test, the process helped us identify a backstory, acceptance criteria, and key metrics for validation. It also enabled us to build a working prototype and solution to meet our customers' needs. We have used, and continue to use, customer feedback and transparency to build and improve our product. In fact, we run short-interval sprints to implement customer requests and changes quickly.

ioAudio beta test plan

Unfortunately, MVP #2 didn't make it either. We needed a way to make the process even faster. But I didn't give up on my dream, and I continued to meet with customers, research the newest technologies, and learn best practices for going to market. One of the hardest parts was seeing competitors come and go. They would go to market, meet with some success, and ultimately run out of runway due to a lack of product-market fit or significant traction. The customers and demand were there, but the operating costs were too high to produce a quality product at scale. Since it didn't make sense for some of these operations to seek new funding, we saw consolidation and acquisitions in the market, with larger players picking up smaller companies for their IP (intellectual property). Simply put, the market had a need, and I just needed to wait for TTS technology to catch up. And once it did (thank you Amazon, Google, Microsoft, and IBM), along came the new ioAudio.

Around 2016-2017, we saw a major shift in text-to-speech technology. For example, Amazon launched Amazon Polly. Polly grew out of Amazon's acquisition of Ivona and its priorities to improve voice-powered assistants (Alexa), convert Kindle books into audio, and serve call center, automotive, healthcare, and consumer electronics use cases, along with ways to deliver listening experiences in multiple languages. The underlying architecture became more accessible too. Computing power and content delivery networks kept improving at lower costs, and TTS neural networks and deep learning emerged that leveraged natural language processing, machine learning, and artificial intelligence.

After hearing the natural-speaking TTS products on the market, I knew it was time. This was the key to widening our margins and offering our customers feasible prices; proper customer, sales, and product support; and a larger library of integrations. But even with these crucial components, it took years to find a solution that met the needs of both individuals and companies. Companies such as Slack, Drift, Gong, and Chorus started proving that automating conversations in chatbots is feasible and affordable for commercial use cases. I consider AI- and NLP-powered text-to-speech a cousin to conversational chatbots, so it's been exciting to see their customer adoption and overall success.

ioAudio web app screenshot

Flash forward to 2020. Here I am with an excellent team of people, partners, and suppliers who've helped build a text-to-speech platform designed with you in mind, focusing on:

  • Ease-of-use
  • Affordable and easy-to-understand pricing
  • Unlimited text-to-speech conversions
  • Non-boring, natural-speaking AI
  • Collaboration and sharing features
  • Analytics to help identify how you can be more productive
  • Sales and product consultations
  • Customer support and care

And we follow through with this by sticking to our core values of:

  1. Being better listeners
  2. Embracing change
  3. Never giving up
  4. Valuing others' time as our own
  5. Taking small actions to produce big results
  6. Building sustainability through innovation and alignment

So please join us today on this journey!

Every day we're one step closer to delivering the optimal text-to-speech experience you deserve.

Sign up to be notified of ioAudio's launch, and share your ideas here so we can build the best for you!

ioAudio's Founder, Spencer Parikh

Spencer Parikh

Founder, ioAudio