I’ve been at Mycroft for 24 months now, and one of my major challenges is that my wife hardly uses our Mark I at home. I am passionate about Mycroft, and I want my family to see all the hard work that we put into this product.
To be fair, my wife, Ashley, doesn’t really like voice assistants. She hates Siri, and she views the other ones we’ve had as just more annoying “Nate” toys. But her challenge with Mycroft is different. It’s a lot like the issue a Community Member voiced on Twitter recently.
The voice system doesn’t recognize my voice. So it’s
Me: “Mycroft, turn on the light.”
*nothing*
“ANDY, MAKE MYCROFT TURN ON THE LIGHT!”
Andy: “Mycroft, turn on the light.”
*light comes on*
I would rather just flip a switch.
— Katie (@katiek_wanders) May 22, 2019
The Problem
Making a product with Machine Learning as a key pillar of the technology requires a lot of pieces to come together, but at the root it is all about data: clean, diverse, and accessible data. The giants in this space have the budgets to buy data, buy companies, and trick their users into giving up privacy in exchange for using the product.
At Mycroft, we face a real challenge, because at the heart of our company are Privacy and User Agency. That means we do not, by default, store any recordings, transcriptions, or data when people use Mycroft. To obtain the data we need to improve Mycroft, we ask those who are comfortable to contribute to our Open Dataset. Contributing means explicitly Opting-In to share your recordings, transcriptions, and usage data. Once Opted-In, you retain the right to Opt-Out and delete your dataset at any time.
The more people across different genders, accents, and ages who Opt-In, the more diverse the dataset that trains our Precise wake word model becomes. You can do this in under a minute at https://account.mycroft.ai/profile! Please consider it!
Now we find ourselves in an almost humorous Catch-22. Mycroft only hears my wife 50% of the time, so even if she uses it as often as I do, the utterances she submits will be only half of what I donate to the dataset. Magnify that across the portion of our Community that has Opted-In, and you get a ratio that does not reflect the makeup of the world.
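To make that concrete, here is a toy calculation (the numbers are invented purely for illustration): because only the utterances Mycroft actually hears can be contributed, equal usage still produces an unequal dataset.

```python
# Toy model of the Catch-22: only recognized utterances reach the dataset.
my_attempts, her_attempts = 100, 100   # we talk to Mycroft equally often
my_hit_rate, her_hit_rate = 1.0, 0.5   # but it hears her half the time

my_contrib = my_attempts * my_hit_rate       # 100 utterances donated
her_contrib = her_attempts * her_hit_rate    # 50 utterances donated

share = her_contrib / (my_contrib + her_contrib)
print(f"Her share of the dataset: {share:.0%}")  # -> 33%, not 50%
```

And a model retrained on that skewed dataset gets even better at hearing me, not her, which is exactly the loop we need to break.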
The Solution
We’re developing a three-pronged approach here at Mycroft.
First, we need more people with varying voices to use the device and Opt-In. This builds the dataset.
Second, we need to tag those utterances using the Precise Tagger, which we’ll relaunch soon, and then train new global Precise models on them, improving accuracy. A rough sketch of what that retraining might look like follows below.
Lastly, we need to think outside the box for edge cases. For this, we have started building a local machine learning tool that lets people train their wake word directly on-device.
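To give a feel for the second prong, here is a rough sketch of retraining a global model on tagged utterances. Everything here is illustrative rather than our production pipeline: random arrays stand in for real tagged audio features, and the window size, filenames, and training settings are my assumptions. The real code lives in the mycroft-precise repository, which models the wake word as a small recurrent network over MFCC feature windows.

```python
import numpy as np
from tensorflow.keras import layers, models

T, N_MFCC = 29, 13                        # feature window size (illustrative)
wake = np.random.rand(200, T, N_MFCC)     # clips tagged "wake word"
other = np.random.rand(800, T, N_MFCC)    # clips tagged "not wake word"

X = np.concatenate([wake, other])
y = np.concatenate([np.ones(len(wake)), np.zeros(len(other))])

model = models.Sequential([
    layers.GRU(20, input_shape=(T, N_MFCC)),  # tiny, so it can run on-device
    layers.Dense(1, activation="sigmoid"),    # probability of "wake word"
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=32)
model.save("hey_mycroft.h5")              # filename and format are illustrative
```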
Personal Precise Trainer
This personal Precise trainer will let an individual record themselves speaking the wake word a few times, then run a training pass on top of the general wake word model, creating a custom model that is immediately available and belongs to them alone.
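The exact mechanics are still being worked out, but conceptually the personal training step might look something like the sketch below, continuing the assumptions from the earlier example. The key detail is mixing the owner’s recordings with background audio so the model learns this voice rather than learning to fire on everything; the sample counts, learning rate, and filenames are all assumptions, not the shipped implementation.

```python
import numpy as np
from tensorflow.keras.models import load_model
from tensorflow.keras.optimizers import Adam

model = load_model("hey_mycroft.h5")        # the general wake word model

T, N_MFCC = 29, 13
user = np.random.rand(12, T, N_MFCC)        # ~a dozen recordings of the owner
background = np.random.rand(48, T, N_MFCC)  # household noise and other speech

X = np.concatenate([user, background])
y = np.concatenate([np.ones(len(user)), np.zeros(len(background))])

# A low learning rate nudges the general model toward this one voice
# without erasing what it already knows about everyone else's.
model.compile(optimizer=Adam(learning_rate=1e-4), loss="binary_crossentropy")
model.fit(X, y, epochs=20, batch_size=8)

model.save("hey_mycroft_personal.h5")       # loaded just for this household
```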
This approach brings several benefits. For one, Mycroft is immediately able to recognize voices previously missed by the global model. Additionally, this system sets the stage for potential future features like speaker identification.
We will also add the ability to submit these local recordings to the Open Dataset, allowing you to improve the global model for others who sound like you. This is all early work, but experimental results have been very promising!
This will not be an easy or quick problem to solve, but from what I have seen so far, it appears to be the right approach for us to design, develop, test, and release for you all.
Nate Tomasi is the COO of Mycroft AI. He’s built and operated strong teams and processes at companies like Captify Health and Rhythm Engineering.