Initiatives

Mycroft consists of a number of components that power the voice assistant platform. Covered here are the technologies we use in our voice stack, the open source software teams we collaborate with, our open voice data projects, and Mycroft-integrated products.

Technology

Mimic

Text-to-Speech

Mimic II
Mimic II is a state-of-the-art, machine learning Text-to-Speech engine. It provides a suite of tools for collecting training data and voice recordings, correcting pronunciations, and generating speech from the Mycroft Home service. Read a bit more about Mimic II in this post.

GitHub Repo

Mimic I
The original Mimic is a fast, lightweight Text-to-Speech (TTS) engine developed by Mycroft and VocaliD, built on top of Carnegie Mellon University’s Flite software. Mimic uses text as an input and outputs speech using the chosen voice. It is light enough to run on Raspberry Pi class hardware.

📺 Watch a short video that explains Mimic I (1:18)

GitHub Repo | Documentation

Precise

Wake Word Listener

Precise is a Wake Word listener. Its job is to continually listen for a set wake word and activate when the sounds or speech match the wake word. Unlike other hotword detection products, Mycroft Precise is fully open source. Take a look at a comparison here.

Precise has been the default Wake Word listener on every Mycroft device since mid-March 2018. Before then, PocketSphinx was the default. PocketSphinx recognizes Wake Words by matching phonemes (units of speech sound) from a CMU pronunciation dictionary.

In contrast, Precise is based on a neural network that is trained on sound patterns rather than word patterns. This reduces the dependence it has on particular languages or accents.

GitHub Repo | Documentation
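To make the "continually listen and activate" idea concrete, here is a minimal sketch of the triggering step only. It is not Precise's code: it assumes some model has already produced one probability per audio chunk, and the names `detect_activations`, `threshold`, and `trigger_level` are hypothetical choices for this illustration.

```python
from typing import Iterable, Iterator


def detect_activations(
    scores: Iterable[float],
    threshold: float = 0.5,
    trigger_level: int = 3,
) -> Iterator[int]:
    """Yield the chunk index at which a wake word is considered detected.

    scores: one probability per audio chunk (assumed to come from a model).
    An activation fires once when `trigger_level` consecutive chunks score
    above `threshold`; the streak resets as soon as a chunk falls below it.
    """
    streak = 0
    for index, score in enumerate(scores):
        if score > threshold:
            streak += 1
            if streak == trigger_level:
                yield index
        else:
            streak = 0


if __name__ == "__main__":
    # Made-up scores for nine consecutive audio chunks.
    chunk_scores = [0.1, 0.7, 0.2, 0.8, 0.9, 0.95, 0.97, 0.1, 0.6]
    print(list(detect_activations(chunk_scores)))
```

Requiring several consecutive high scores rather than a single one is one simple way to trade missed activations against inadvertent ones.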

Adapt

Natural Language Understanding

Adapt is an intent parser – a library for converting natural language into machine-readable data structures, such as JSON. It is lightweight and is designed to run on devices with limited computing resources, such as embedded devices.

📺 Watch a short video that explains Adapt (1:36)

Adapt takes in natural language as input, and outputs a data structure that includes:

  • The intent – what the user is trying to do
  • A match probability – how confident Adapt is that the intent has been correctly identified
  • A tagged list of entities – data that can be used by Skills to perform functions

GitHub Repo | Documentation
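The three outputs listed above can be pictured with a small stand-alone sketch. This is a toy keyword matcher written for illustration, not Adapt's API or algorithm; the vocabulary, the intent name, the dictionary keys, and the crude confidence formula are all assumptions.

```python
import json
import re

# Hypothetical vocabulary: entity type -> known keywords.
VOCABULARY = {
    "WeatherKeyword": ["weather", "forecast"],
    "Location": ["tokyo", "seattle", "london"],
}

# Hypothetical intents: intent name -> entity types that must all be present.
INTENTS = {
    "WeatherIntent": ["WeatherKeyword", "Location"],
}


def parse(utterance):
    """Return a dict with intent, confidence, and tagged entities, or None."""
    words = re.findall(r"[a-z']+", utterance.lower())
    if not words:
        return None

    # Tag the first word found for each entity type.
    tagged = {}
    for entity_type, keywords in VOCABULARY.items():
        for word in words:
            if word in keywords:
                tagged[entity_type] = word
                break

    for intent_name, required in INTENTS.items():
        if all(entity_type in tagged for entity_type in required):
            entities = {entity_type: tagged[entity_type] for entity_type in required}
            # Crude stand-in for a confidence score: share of words that were tagged.
            matched = sum(1 for word in words if word in entities.values())
            return {
                "intent_type": intent_name,
                "confidence": round(matched / len(words), 2),
                "entities": entities,
            }
    return None


if __name__ == "__main__":
    print(json.dumps(parse("What is the weather like in Tokyo?"), indent=2))
```

A Skill receiving this structure would read the entities (here, the location) to decide what to do.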

Padatious

Natural Language Understanding

Padatious is an efficient and agile neural network intent parser. It is an alternative to the Adapt intent parser. Unlike Adapt, which uses small groups of unique words, Padatious is trained on the sentence as a whole.

Padatious has a number of key benefits:

  • Intents are easy to create. Simply provide example sentences.
  • The machine learning model in Padatious requires a relatively small amount of data.
  • Machine learning models need to be trained. The model used by Padatious is quick and easy to train.
  • Intents run independently of each other. This allows new skills to be installed quickly without retraining all other skill intents.
  • With Padatious, you can easily extract entities and then use these in Skills. For example, "Find the nearest gas station" yields { "place": "gas station" }

GitHub Repo | Documentation
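The entity extraction described in the last bullet can be sketched as follows. Padatious itself trains a neural network on the example sentences; this sketch swaps in plain regular-expression template matching purely to show the shape of the input (example sentences with a `{place}` slot) and the output (an intent name plus extracted entities). The intent name `find.place` and both function names are hypothetical.

```python
import re


def compile_template(template):
    """Turn 'find the nearest {place}' into a regex with one named group per slot."""
    # re.split with a capture group alternates literal text and slot names.
    parts = re.split(r"\{(\w+)\}", template.lower())
    pattern = ""
    for position, part in enumerate(parts):
        if position % 2 == 0:
            pattern += re.escape(part)      # literal text
        else:
            pattern += f"(?P<{part}>.+)"    # entity slot
    return re.compile(pattern)


def extract(utterance, intents):
    """Return (intent_name, entities) for the first matching example sentence."""
    text = utterance.lower().strip().rstrip(".?!")
    for intent_name, examples in intents.items():
        for example in examples:
            match = compile_template(example).fullmatch(text)
            if match:
                return intent_name, match.groupdict()
    return None, {}


if __name__ == "__main__":
    intents = {
        "find.place": [
            "find the nearest {place}",
            "where is the closest {place}",
        ],
    }
    print(extract("Find the nearest gas station", intents))
```

Unlike this exact-match sketch, a trained model can also handle sentences that differ from the examples it was given.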

Collaboration

DeepSpeech

Speech-to-Text

We are working with Mozilla to build DeepSpeech, an open Speech-to-Text technology that can understand your voice. It is a fully open source STT engine, based on Baidu’s Deep Speech architecture and implemented with Google’s TensorFlow framework.

GitHub Repo

Mozilla Things

Controlling the devices around you using only your voice is the dream of every sci-fi fan. Pairing Mycroft with Internet of Things (IoT) technologies makes this dream a reality. We are supporting the Mozilla Things team to make IoT control systems that are both easy to use and easy to set up.

Mozilla WebThings is Mozilla’s open source implementation of the Web of Things, which connects real-world objects to the World Wide Web.

Mozilla IoT on GitHub

KDE

KDE is an international community that develops free and open source software. One of KDE’s most recognized products is the Plasma Desktop, the default desktop environment on many Linux distributions, including openSUSE, OpenMandriva, and Kubuntu. Mycroft works with KDE to provide voice interaction across the broad range of platforms Plasma runs on.

GitHub Repo

Additionally, the Mycroft GUI is built in collaboration with the KDE Plasma team. This interface fuses voice and screen interaction, paving the way for the Mark II and other exciting devices in the future.

Mycroft GUI on GitHub | Mark II GUI on GitHub

Data

Open Voice Dataset

The Open Voice Dataset is a collection of user-donated voice data used to train and improve Mycroft. Anyone using Mycroft can join the Open Voice Dataset and contribute recordings of their voice. Wake word samples are used to reduce false positives (inadvertent activations) and false negatives (missed activations). Utterance samples are used to improve broader voice technologies such as Mozilla Common Voice.
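As a rough illustration of the two wake word error types, the sketch below computes a false positive rate and a false negative rate from labeled samples. The data layout (pairs of "was the wake word actually spoken" and "did the listener activate") and the function name are assumptions made for this example, not a description of how Mycroft evaluates its models.

```python
def wake_word_error_rates(samples):
    """Compute error rates from (is_wake_word, activated) boolean pairs.

    false_positive_rate: share of non-wake-word samples that caused an activation.
    false_negative_rate: share of wake word samples that did not cause one.
    """
    samples = list(samples)
    positives = [activated for is_wake_word, activated in samples if is_wake_word]
    negatives = [activated for is_wake_word, activated in samples if not is_wake_word]

    false_negatives = sum(1 for activated in positives if not activated)
    false_positives = sum(1 for activated in negatives if activated)

    return {
        "false_positive_rate": false_positives / len(negatives) if negatives else 0.0,
        "false_negative_rate": false_negatives / len(positives) if positives else 0.0,
    }


if __name__ == "__main__":
    # Made-up labels: three wake word clips and four clips of other audio.
    labeled = [
        (True, True), (True, False), (True, True),
        (False, False), (False, True), (False, False), (False, False),
    ]
    print(wake_word_error_rates(labeled))
```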

Donating voice data is completely voluntary and is offered as an opt-in option.

Mozilla Common Voice

Project Common Voice is a campaign that brings people together to donate their voices in their native languages to an open repository, or to help by validating existing recordings. Both can be done right in the browser. Any researcher can use these datasets to create and test new voice technologies, and they can also be used to train production-quality Speech-to-Text systems like DeepSpeech.

GitHub Repo

Mimic Recording Studio

Mimic Recording Studio is an application you can install to record voice samples. These samples serve as training data for producing a distinct voice for the Mimic Text-to-Speech engine, that is, a new Mycroft voice. Mimic II uses machine learning techniques to create a model that sounds like the voice on which it was trained.

GitHub Repo

Built on Mycroft

All of the initiatives above have broader applicability, and are being adopted by commercial-grade projects that strengthen the overall Mycroft ecosystem.

Chatterbox is the first build-it-yourself, program-it-yourself smart speaker kit designed for kids to learn about artificial intelligence by creating their very own voice-activated skills.

Neongecko provides a Conversation Processing Technology with Neon AI that adds voice and speech to websites, devices, and business systems.

Q.bo One is an interactive open source robot for makers, developers, kids, and educators. The robot has a pair of servos to control head movement and a mouth made out of LEDs. Q.bo interacts with the world via multiple microphones, speakers, and HD cameras, and is equipped with capacitive sensors on the top of its head so it can react appropriately when touched.

Ezra is a privacy-focused smart assistant for public and private schools, home-schooling co-ops, district administrators, daycares, DoD schools, government training, or just about any other education environment. Ezra can do things like take attendance, administer development assessments, and even give spelling tests to an entire class – all through the power of voice.

Sickweather maps and predicts the chance of sickness in your area. Mycroft works with Sickweather on a cough detection system in public spaces that respects individual privacy.