The art of speaking
SpeechNow.co is a useful AI-powered text-to-speech (TTS) tool that got a lot of attention when it first launched a couple of years ago, but which has since been overtaken by other sites in terms of features. Its range of voices and general ease of use, however, make it a useful part of the toolkit for those looking to streamline voiceovers.
It works as a browser-based application that uses AI algorithms to convert text into spoken-word recordings. The process is straightforward: users input their text into the platform, select their desired voice and language, and the AI generates a high-quality audio file that, with a few provisos, does a good job of sounding human. I've been using it myself for a couple of years now, mainly to provide some colour when inserting multiple voices into podcasts. Its simplicity is its main attraction, although I do wish it could do a little more to make it truly useful.

Creating synthesized speech can be as simple as copying text and selecting a voice style.
How does it work?
By and large, SpeechNow provides the ability to produce voiceovers that sound genuinely natural, although just hitting synthesise without tweaking is still liable to produce slightly robotic results. When it first launched, SpeechNow's AI technology ensured that the output mimicked the nuances and intonations of human speech much more effectively than many other TTS products on the market, making it ideal for creating voiceovers for video content, podcasts, and audiobooks.
This overall assessment, however, neglects the fact that SpeechNow can still be caught out by the vagaries of human intonation, tending to run more slowly and with a greater number of errors when it encounters longer, more grammatically complex sentences. Don't expect to copy and paste text and simply hit record if you want to get the best output. SpeechNow generally works well, but it requires tweaking and fine-tuning to avoid uncanny valley effects where the underlying technology reveals its electronic nature. With alternatives, notably Google's TTS available via NotebookLM and some other services, it can be much easier to create natural-sounding speech with less need to tinker with settings.
What is genuinely impressive about the application, however, is the sheer range of voices that it can draw upon. It has been very noticeable that the number of voices in the library has expanded considerably over the past few years, very much in contrast to Google, which offers you any voice you need in some products so long as it's west-coast American. In SpeechNow, the range of voices runs from Arabic to Welsh, Japanese to Bulgarian.
Two types of voice are on offer: standard models and "neural" models, the latter built on machine learning approaches that are much more effective for speech synthesis. Indeed, any hope you have of producing effective voiceovers pretty much depends on using the neural models.
Is it effective?
This is a question where my answer is somewhat different today from what it was two years ago, when SpeechNow was one of the few widely available TTS services out there. Such services have proliferated recently and their overall quality has improved, which is one reason why SpeechNow is harder to recommend than it once was.
The biggest frustration with its current incarnation stems less from the necessary complexities of its fine-tuning than from the fact that it misses entire categories of feature, notably voice cloning, which is available with services such as ElevenLabs (to be reviewed soon) or Resemble.ai. It is also not as competitively priced as some of its competitors, which offer lower rates for more limited use of their services, something that might work very well for the person who just needs occasional voiceover work. That said, it also offers regular lifetime deals, which can be very appealing if you want a low-cost, online TTS engine that does a good enough job for short voice effects, if not entire podcasts.