For the last year, Mycroft and Mozilla have been building a relationship based on our shared interests. I was invited to join them at their company get-together last week and want to share info about the event and what we’ve been up to this year.
Every six months the entire global Mozilla organization gets together in one physical location to share knowledge via presentations and organized sessions, to plan the next six months, and to let collaborators meet face-to-face. As someone who has worked in remote teams for the past 20 years, I can attest to the value of being in the same space to build camaraderie and rapidly share ideas.
As part of these All Hands, Mozilla also invites a number of “volunteers” from outside the company. These are individuals who aren’t paid by Mozilla, but who work closely with the organization. This year, I was included in this group.
The Spring 2018 All Hands was June 11 – 15 in San Francisco. In all, approximately 1200 employees and 70 volunteers attended.
There is quite a bit of common ground between what Mycroft and Mozilla are doing. Mycroft has, obviously, been working on voice interaction tech. Additionally, the level of access we have to hardware makes Mycroft a great platform for Internet of Things (IoT) interaction and control. We can currently combine voice with control of equipment like Hue lights, IoT hubs like Wink, or larger ecosystems like Home Assistant. We can also work with hardware directly, reacting to GPIO pins connected to switches and sensors, and communicate over the local network with equipment within the home.
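As a sketch of that last kind of integration, the snippet below builds a service call against Home Assistant's REST API to switch a light on. The `/api/services/light/turn_on` route is Home Assistant's documented service endpoint; the host name, token, and entity ID are placeholders, and this is an illustration rather than Mycroft's actual skill code.

```python
# Sketch: driving a Home Assistant instance over its REST API.
# Host, token, and entity_id are placeholders you supply yourself.
import json
import urllib.request


def build_turn_on_request(host: str, token: str, entity_id: str) -> urllib.request.Request:
    """Build (but do not send) a POST that calls the light.turn_on service."""
    url = f"http://{host}:8123/api/services/light/turn_on"
    body = json.dumps({"entity_id": entity_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_turn_on_request("homeassistant.local", "LONG_LIVED_TOKEN", "light.kitchen")
# urllib.request.urlopen(req)  # uncomment on a network with a real instance
```

Separating "build the request" from "send it" keeps the voice-skill logic testable without a live Home Assistant on the network.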
These capabilities overlap with several Mozilla teams…
Machine Learning / DeepSpeech
Our tightest collaboration thus far is with the DeepSpeech team. Kelly Davis and crew have been implementing the DeepSpeech speech-to-text architecture, and Mycroft has been one of the earliest actual consumers of the technology. Mycroft can currently use DeepSpeech running in the Mycroft Home cloud or on your own private instance.
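As an illustration of that second option, pointing a Mycroft device at a private DeepSpeech server is a small `mycroft.conf` change. The fragment below reflects the `deepspeech_server` STT module as I recall it; option names and the endpoint path may differ between releases, so treat it as a sketch rather than authoritative configuration.

```json
{
  "stt": {
    "module": "deepspeech_server",
    "deepspeech_server": {
      "uri": "http://localhost:8080/stt"
    }
  }
}
```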
We are also both working on implementations of the Tacotron text-to-speech architecture. Mycroft has a Mimic2 implementation and has created a complete 15-hour dataset we are using for a joint voice benchmark.
DeepSpeech is young (currently version 0.2.0-alpha.6) and still rapidly evolving on both the code and the published models. The current model is noticeably weak in noisy environments and with rapid, conversational speech. The Mycroft community is providing access to the kind of data needed to train a model to handle this. I’m very excited about what we can achieve jointly!
Mozilla began Common Voice to gather the kind of language data needed for building technologies like DeepSpeech. While their CC0 (aka “public domain”) data licensing model is different from Mycroft’s OpenVoice dataset, the collaborative ethos is very similar. We are sharing technical and social lessons about working with a community to achieve better things together in data gathering and tagging – across a spectrum of languages.
Through this team, I met another volunteer, Dewi Jones of Bangor University in Wales. He and I had several discussions about what it will take to build a fully-functional Welsh Mycroft as part of their Welsh Language Technology program. Ffantastig!
Web of Things
Mozilla’s new IoT framework aims to simplify and unify the physical world with web technologies into a Web of Things. Mozilla integrated Mycroft’s Adapt intent parser into the platform several months ago to simplify working with all sorts of natural language commands. They are also working hard on their Things Gateway, built on the Raspberry Pi, where Mycroft/Picroft would obviously offer some powerful possibilities.
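To make "intent parsing" concrete, here is a toy engine in the spirit of Adapt: register vocabularies of keywords, then match an utterance against the keyword groups an intent requires. This is NOT the real Adapt API – class and method names here are invented stand-ins for illustration only.

```python
# Toy keyword-driven intent matcher (illustrative stand-in, not Adapt itself).
from typing import Optional


class TinyIntentEngine:
    def __init__(self) -> None:
        self.entities: dict[str, set[str]] = {}   # keyword group -> known words
        self.intents: dict[str, list[str]] = {}   # intent name -> required groups

    def register_entity(self, word: str, group: str) -> None:
        self.entities.setdefault(group, set()).add(word.lower())

    def register_intent(self, name: str, required: list[str]) -> None:
        self.intents[name] = required

    def determine_intent(self, utterance: str) -> Optional[str]:
        """Return the first intent whose required keyword groups all appear."""
        words = set(utterance.lower().split())
        for name, required in self.intents.items():
            if all(words & self.entities.get(group, set()) for group in required):
                return name
        return None


engine = TinyIntentEngine()
for w in ("on", "off"):
    engine.register_entity(w, "OnOffKeyword")
engine.register_entity("light", "LightKeyword")
engine.register_intent("ToggleLight", ["OnOffKeyword", "LightKeyword"])
```

The real Adapt adds confidence scoring, optional keywords, and regex entities on top of this basic keyword-matching idea.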
Scout
Scout caused a bit of a stir in the tech journalism world! I won’t say much, but really Scout is just an early experiment in how consumers think of voice technologies and how they might interact in collaboration with, and independently of, a screen. Mozilla has spent almost all of its existence developing the technologies for today’s web, and these are inherently visual. We are all learning how consumers think of speech differently than written text.
Both Mycroft and Mozilla are very focused on technology that enables and represents the user, rather than the other way around. This is a particularly tricky thing to do in the data-driven machine learning world. Our efforts in collaborative data gathering and tagging, remote and federated learning, and data ownership and licensing are leading the way to a powerful AND ethical future.
I believe trust is a new economic benchmark, and one at which both our organizations excel.
I really enjoyed finally meeting old and new colleagues face to face and look forward to what we can do together in the future!