London Book Fair 2015. A huge event with crowds of readers, writers, publishers and journalists, all bound by one passion: books. And a great moment in the history of publishing: the Official Launch of OpenBooks.com.
As revolutionaries attract each other, at London Book Fair we met Xavier Anguera - a passionate innovator and the leader of an enthusiastic team who are also trying to make the world better. Their contribution is a brand-new technology for creating two-in-one audio eBooks. But Sinkronigo Publishing makes 1 plus 1 equal 3 :-) Their added value is a very special feature: the text is highlighted as it’s spoken in the audio eBook. Wow - this rocks!!!
Okay - it’s been done before, but only through time-consuming manual alignment of text and speech. Those guys at Sinkronigo Publishing cracked the problem and created a technology that combines text and audio at word, sentence or paragraph level, with the precision of a samurai’s blow.
“Sinkronigo” means “synchronization” in Esperanto. (Just to recap: this artificial language was created in the 19th century in the hope of making international communication easier [“Esperanto” itself means “one who hopes”]: only 16 grammar rules, no exceptions or irregularities, easy to learn and politically neutral.) Sinkronigo’s technology is universal and language-independent.
OpenBooks.com and Sinkronigo Publishing have embraced each other with great enthusiasm, and this alliance is going to make some read-aloud noise in the DRM-freedom revolution :-) Among the audio eBooks available at OpenBooks.com are classics by James Joyce, Henry James, F. Scott Fitzgerald, Jerome K. Jerome, Sun Tzu and many others.
Monika, OpenBooks.com: Where did the idea of Sinkronigo come from? What came first - the technical possibility or users’ need for the functionality?
Xavier Anguera, CEO and founder of Sinkronigo: I had been thinking about creating talking books with real voices for a long time before founding Sinkronigo. It’s well known that having someone read a text out loud to you while you follow along has many advantages for people of all ages. I could see that people were using text-to-speech (TTS) technology to have text read to them. While this proved to me the usefulness of having speech and text together, I thought it wasn’t good enough for long texts, as the robotic-sounding voice can make you tired very quickly. Having helped develop TTS technology in the past, I know it will take a while before TTS becomes good enough to be indistinguishable from a human reading the text.
In order to build what I had in mind, I had to wait for the right hardware and software to become available and I needed to develop the right technology to allow for any text to be accurately aligned with a human recording.
In 2010, when the first-generation iPad was introduced, I saw an opportunity to take a step in this direction. Together with a friend, I developed basic text-to-audio alignment technology and a native iPad application to show text and audio together. We published the results in a research paper at an international conference, but then stopped working on the project, as the alignment technology was still not mature enough and we believed that a standalone iOS application wasn’t the right way to reach people.
On the application side, a big step forward was the definition of the ePub3 standard in late 2011 (with read-aloud capabilities) and its integration by some vendors (including Apple, Kobo and Google) into their readers. This opened up the possibility for us to reach the general public, who had started reading books on these devices.
On the technology side, there are several solutions to the problem of aligning audio to text, although most of them are too slow, not robust enough or not accurate enough. During 2014 I developed a technology that avoids these issues and allows very fast, robust and accurate synchronization of text and audio. I also developed the tools needed to integrate any text content and its associated narration into an eBook compliant with the readers mentioned above. Sinkronigo was incorporated at the beginning of 2015 to produce our own read-aloud eBooks as well as to offer the service to authors and publishers.
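For readers curious about what such an eBook contains: ePub3’s read-aloud feature is specified as Media Overlays, where a SMIL file pairs each text fragment (by its element id in the chapter’s XHTML) with an audio clip, and the reading system highlights the fragment while its clip plays. The Python sketch below is not Sinkronigo’s tool. It assumes an alignment step has already produced start/end times per fragment, and the function names, file names and fragment ids are hypothetical. A complete EPUB would also need the overlay referenced from the package manifest (the `media-overlay` attribute).

```python
from xml.sax.saxutils import quoteattr


def format_clock(seconds: float) -> str:
    """Format seconds as a SMIL full clock value, H:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours}:{minutes:02d}:{secs:02d}.{millis:03d}"


def build_media_overlay(text_href: str, audio_href: str, fragments) -> str:
    """Build a minimal SMIL Media Overlay document (hypothetical helper).

    fragments: list of (element_id, start_sec, end_sec) tuples, assumed to
    come from a text-to-audio aligner. Each element_id must be the id of a
    word, sentence or paragraph element in the XHTML chapter.
    """
    pars = []
    for i, (frag_id, start, end) in enumerate(fragments, start=1):
        # One <par> = one highlightable text fragment + the audio clip for it.
        pars.append(
            f'    <par id="par{i}">\n'
            f'      <text src={quoteattr(f"{text_href}#{frag_id}")}/>\n'
            f'      <audio src={quoteattr(audio_href)}'
            f' clipBegin="{format_clock(start)}" clipEnd="{format_clock(end)}"/>\n'
            f'    </par>'
        )
    return (
        '<smil xmlns="http://www.w3.org/ns/SMIL" version="3.0">\n'
        '  <body>\n' + "\n".join(pars) + "\n  </body>\n</smil>\n"
    )


# Usage with made-up word timings for two words of a chapter.
overlay = build_media_overlay(
    "chapter1.xhtml", "audio/chapter1.mp3",
    [("w1", 0.0, 0.42), ("w2", 0.42, 0.95)],
)
print(overlay)
```

Word-level highlighting simply means more, shorter `<par>` entries; sentence- or paragraph-level synchronization uses fewer, longer ones.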
I am impressed - your work and accomplishments are very interdisciplinary and demand combined skills and knowledge from several quite distinct domains. It takes a Renaissance-man personality. Are you more of a humanist booklover or a technology person? Do you find this distinction relevant?
X.A.: Thank you for your flattering words. I had a good laugh at your statement that I have a “Renaissance-man personality”. I come from a highly technical background, which I combined early on with the study of linguistics and speech production; this later led me to pursue a PhD in signal processing applied to speech. I love both technology and books, but I became very interested in read-aloud eBooks because I could see how my contribution could help people improve their skills in a new language and read better in their own.
It’s true that listening to long texts narrated by the artificial voice of a text-to-speech system is not pleasant, and nothing can replace human narration - at least at the current level of technology. Sinkronigo doesn't use talking robots but real human narrators. How many speakers do you work with?
X.A.: Our system works independently of who is speaking. The only adjustments we need to make are when we want to incorporate a new language. Publishers working with us usually send us their recorded narrations and the text, and we create the eBook instantly. For the eBooks we produce ourselves in English, we’ve been using Librivox.com recordings. For other languages, we’re currently building a pool of voice talent and plan to start releasing books narrated by them very soon.
Regarding TTS, do you believe it’s technologically possible to develop it far enough to recognize subtle emotions expressed in the text and emulate them in speech? Like anxiety, passion, fear or joy?
X.A.: Like other speech processing technologies, TTS has evolved enormously over the last ten years. For instance, there are now commercial systems that can be used perfectly well for short excerpts without tiring the listener. One of their limitations, as you point out, is the lack of rich expressiveness in the resulting speech.
I’m aware of a lot of research aimed at adding such expressiveness to synthetic voices, although I believe it will still take several years until we reach the point where listeners find an audiobook read by a machine as agreeable as one read by a human. Note that for synthetic speech to contain rich emotions, the system first needs to correctly infer where these emotions belong in the text and, second, needs a large enough inventory of sound snippets covering every emotional state, so that it can glue them together into the final, emotional voice.
If an author wants to have their work published in a Sinkronigo version, how should they proceed? And if authors would like to read their books themselves, can they do that?
X.A.: Turning a text eBook into a read-aloud eBook is very simple. All we need is the eBook itself and a recording of each chapter.
If authors want to turn their eBook into a read-aloud format, they can contact us for a quote and for help with the recording setup. Our prices are very reasonable.
Which books do you love most? What role do they play in your life?
X.A.: I must admit that I’m not an avid reader of actual books. I used to read much more before I started doing research, which meant reading lots of technical papers instead. Lately, though, as we’re producing books by many classic authors, I’m getting interested and sometimes read or listen to some of the eBooks we produce.