I have some unfortunate news to share.
Before I get into that, I want to point out that Mycroft has always had ambitious plans, but has never been a large company. At the peak of my tenure here we had 13 people. Every one of those people made sacrifices because they believed in Mycroft’s mission. I am grateful to everyone who stuck with us, despite the challenges, stress and uncertainty over the last three years, when they could have easily found better pay and less stress elsewhere.
Since starting here in early 2020 I’ve had to make some of the toughest decisions I’ve ever faced, and none more so than at the end of last year. At the end of November, just after the Mark II entered production, I was faced with the reality that I had to lay off most of the Mycroft staff. At present, our staff is two developers, one customer service agent and one attorney. Moreover, without immediate new investment, we will have to cease development by the end of the month. I want to explain what this means for our customers and community, and for Mycroft’s future in general.
The consequence of the first round of layoffs was that we were unable to make as much progress on the software as we had planned for after the launch of the Mark II. It has also greatly impacted customer service response times and our ability to engage with the community in general. Fortunately, manufacturing and shipping of the Mark II has not been affected, as these processes are handled by our manufacturing partner. All components have been purchased, and all Mark II orders outstanding and those placed in the future will be delivered.
We’ve been diligently pursuing options to ensure that all devices shipped to date and in the future will continue to operate and that our customers’ privacy will continue to be protected. The first measure we’ve taken is to ensure that even if we must shut down our servers at some point in the future, all Mark IIs will continue to operate. Our efforts to push everything to the “edge” and to improve our privacy policies have made this possible as a natural stop on our technology roadmap.
The second measure we’ve taken is to enlist the aid of one of our long-time partners to ensure continuity of development and maintenance of the Classic Core code base. This will also have the benefit of bringing back some of the most requested features by our community.
So … is this goodbye? Not quite. We’ve accomplished a lot in the last few years, and along the way I’ve learned that there are still many untapped opportunities in the voice assistant space. The mission of a privacy-first voice assistant for every human that wants one is yet to be realized, and we are exploring new pathways to get us there. Rest assured that regardless of what happens, no devices will become ‘bricks’ and our commitment to customers’ privacy will not be compromised.
This has been a very difficult message to write. I came out of retirement three years ago, invested a truly inadvisable percentage of my personal savings, and gave my best effort to advance Mycroft’s mission. I am still very emotionally invested in Mycroft and its vast potential, and I believe the challenge to privacy is an important and a solvable one, to say nothing of access for people with disabilities, under-resourced languages and other pressing issues that open source, privacy-respecting software is uniquely positioned to address.
There is much more to be said and many other topics that I will cover in future posts over the coming days.
Mycroft AI’s primary mission has always been to create a true privacy-respecting voice assistant, one that is truly a personal assistant rather than a household spying device, a device which does what you want it to do rather than what the mega-corporation that sold it to you wants it to do. So one of the greatest challenges for us has been the lack of a fast, accurate, flexible Speech to Text (STT) engine that can run locally. While the product is still in early days of development, we believe we finally have an answer to this problem. We call it Grokotron.
For a voice assistant like Mycroft, speech recognition must be performed very quickly and with a high degree of accuracy. Improvements on both fronts are one of the reasons that voice interfaces have exploded in recent years. When it comes to automatic speech recognition, the differences between 80%, 90%, 95% or even higher accuracy may sound like small potatoes, but they are absolutely game changing for how usable a system is in the real world.
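To put rough numbers on why those few percentage points matter, here is a back-of-the-envelope sketch. It rests on assumptions that are ours for illustration only: that the accuracy figure is per word, that errors on each word are independent, and that a typical command is six words long. Real recognizers don't behave this neatly, but the trend holds.

```python
def command_success_rate(word_accuracy: float, words: int) -> float:
    """Chance that every word in a command is transcribed correctly.

    Assumes each word is recognized independently with the same accuracy,
    which is a simplification made purely for illustration.
    """
    return word_accuracy ** words


# Hypothetical six-word command, e.g. "set a timer for ten minutes".
COMMAND_LENGTH = 6

rates = {
    accuracy: command_success_rate(accuracy, COMMAND_LENGTH)
    for accuracy in (0.80, 0.90, 0.95, 0.99)
}

for accuracy, rate in rates.items():
    print(f"{accuracy:.0%} per word -> {rate:.0%} of commands fully correct")
```

Under these assumptions, going from 80% to 95% per-word accuracy takes a six-word command from being fully correct roughly a quarter of the time to nearly three quarters of the time.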
We’ve tried a lot of local STT options over the years, and while there’s been incredible work going into many projects, unfortunately nothing has come close to providing the level of experience we think is required for a general purpose voice assistant.
For this reason, by default Mycroft has used Google’s STT cloud services and layered on some additional privacy protections. We proxy the requests through Mycroft’s servers and delete identifying data related to these requests as soon as possible. (You can read more about that here.) But as much as we try to mitigate the privacy exposure inherent in such a system, this has always been a stop gap solution – a necessary evil in order to provide a quality voice experience.
We want Grokotron, our new STT module (based on the great work done on the Kaldi project), to break this reliance. It is not yet ready to replace big data cloud services for all users and all use cases, but we have big plans for it and look forward to it becoming a viable replacement for cloud services for those who want a zero-trust privacy solution.
Grokotron provides limited domain automatic speech recognition on low-resource hardware like the Raspberry Pi 4 that comes in the Mark II. It does this extremely quickly, and of course completely offline. Grokotron’s impressive accuracy and performance are due to its hybrid nature. It includes both an acoustic model and a grammar of expected expressions, and the grammar constrains what the model can transcribe. This grammar is easy to define and extend with a simple markup language. That means that while the range of expressions Grokotron can process is limited, it can be quite large and can be practically extended to cover nearly anything a voice assistant needs.
So whilst it won’t yet be transcribing your original space opera screenplay about an invasion led by the first Pontifex Dvorn… It can understand all of your requests to check the weather, set a timer, even play different music.
To show this in action, we wanted to share a complete proof of concept image. This is a Mark II Sandbox image running the new Dinkum software with a couple of tweaks.
- It has Grokotron pre-configured for STT
- It does not need to be connected to the internet to function.
- It has our backend pairing completely disabled, so even if you do connect to the internet, it won’t touch our servers.
Because this image is designed to run completely offline, functions normally provided by our backend are not available, including paid APIs like the weather and Wolfram Alpha. Some settings normally configured on the backend, such as the device’s location, must also be set manually within your mycroft.conf. See the Grokotron documentation for details.
The grammar pre-configured on this image does not yet cover all expressions which Mycroft’s core intent system can understand. However, it is straightforward to update the grammar and retrain the model on-device. Details on the sentence template syntax and training commands can also be found in the Grokotron documentation.
A Mycroft system already knows the majority of utterances that it is expecting to hear. These strings form the basis of both intent matching and integration test cases. A future optimization would be to reduce duplication of these definitions, and with Grokotron utilize them to provide a local-only grammar model for any Skill that gets installed.
Even big cloud STT systems have trouble with proper nouns. Media libraries are a classic challenge here. Beyonce is only known because of how popular she is, but how about Ke$ha, or Urthboy? These names are trained into cloud based models courtesy of partnerships with streaming media providers, but for open source tools these terms have traditionally been a bridge too far. Grokotron can use entity lists to define exactly the names it needs to recognize for each individual user’s case, which can go a long way to mitigating this problem. Even better, such lists can be compiled and the model efficiently retrained on the fly. For instance, on ingestion of a music library, artist names could automatically be compiled into Grokotron’s grammar. This is just one feature we plan to work on to make Grokotron the best local STT system out there.
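To make the idea of compiling entity lists into a grammar a little more concrete, here is a minimal sketch of expanding sentence templates with a list of artist names. This is not Grokotron’s template syntax or API (see the Grokotron documentation for those). The `{artist}` slot notation, the `expand` helper and the sample names are all hypothetical, chosen only to show the general technique.

```python
from itertools import product


def expand(template: str, entities: dict[str, list[str]]) -> list[str]:
    """Expand every {slot} in a template with each value from its entity list.

    Hypothetical helper for illustration; not part of Grokotron.
    """
    # Only consider slots that actually appear in this template.
    slots = [name for name in entities if "{" + name + "}" in template]
    sentences = []
    # With no slots, product() yields one empty tuple, so the template
    # is emitted once, unchanged.
    for values in product(*(entities[name] for name in slots)):
        sentence = template
        for name, value in zip(slots, values):
            sentence = sentence.replace("{" + name + "}", value)
        sentences.append(sentence)
    return sentences


# Hypothetical entity list, e.g. gathered while scanning a music library.
entities = {"artist": ["beyonce", "kesha", "urthboy"]}
templates = ["play {artist}", "play some music by {artist}", "set a timer"]

# The full set of sentences the recognizer would be constrained to.
grammar = [s for t in templates for s in expand(t, entities)]
print(grammar)
```

Adding a new artist to the list and re-running the expansion is all it takes to grow the set of expected sentences, which is the property that makes retraining on ingestion of a media library plausible.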
Without further ado, you can find the first Grokotron image here:
Download Grokotron
Grokotron Documentation
As a special introductory price, the Mark II has been on sale for $349, but the discounted price is ending in 24 hours!
Get my discounted Mark II
In case you’ve been living under a rock, the Mycroft Mark II is a state-of-the-art, privacy-respecting smart speaker designed for you and your family, from kids to grandparents. At the same time, it is open hardware and software for developers and makers.
The Mark II provides a great voice experience without sacrificing privacy. Mycroft never collects any data from you unless you ask us to (opt-in rather than opt-out).
With active noise cancellation, a bright full-color touch-screen and 10W of room-filling audio, the Mark II can go head-to-head with other high-end smart speakers offered by the big-tech companies.
By default it comes with our most robust software, known as Dinkum. However, we are firm believers that if you buy a piece of hardware, then you should be able to do whatever you want with it.
To this end, we have released a range of Mycroft Sandbox images providing everything developers and more advanced users need to customize their Mark II, and even create their own voice projects. Mycroft Sandbox images start with a Raspberry Pi OS Lite base that can be built upon to fit your specific development needs.
We’ve also been hearing from other open voice projects about their efforts to package their software for use on the Mark II hardware as well. You can expect future blog posts exploring some of these.
Whether you want a great Mycroft experience, or are looking for an open voice development platform – you should snap up a Mark II before this sale finishes.
Get my discounted Mark II now!
We hope developers will find the Mark II an excellent platform for all kinds of projects, whether they base them on Mycroft or forge their own paths. Given the diversity in our community, we wanted to give multiple OS options, so you can pick the one that is right for you.
On the inside, the Mark II has a Raspberry Pi 4 with 2GB of RAM, and a custom-designed board that implements the hardware features necessary for a premium voice-driven experience. This includes the hardware-accelerated noise-canceling microphone array, a 20W audio amplifier, I/O breakouts and the power supply. The Mark II also includes quality 2” speaker drivers and a custom-tuned resonating chamber for quality sound.
We wanted to ship an amazing development platform with a stable and functional version of Mycroft suitable for anyone. This necessitated a substantial refactoring of the Mycroft Core code into what we’ve dubbed Mycroft Dinkum (after the dinkum thinkum in The Moon is a Harsh Mistress).
We’ve already received feedback that people want the ability to utilize previously created Skills. The Skill Installer was a feature in Classic Core that needed more extensive rework than we’ve been able to do for Dinkum. We are tackling this in a few stages. The first step is to document how to port Skills to Dinkum and how to add them to the device. You can find the first of these guides in our Mark II Documentation.
Sandbox Images
To facilitate development, we have created a number of open source images in the Mycroft Sandbox, providing everything developers and more advanced users need. Mycroft Sandbox images start with a Raspberry Pi OS Lite base that can be customized to fit your specific development needs on the Mark II.
- Dinkum Sandbox – provides a more traditional Linux environment for the same Dinkum software that comes installed by default on all Mark IIs. This is the recommended starting point for new projects and enterprise solutions on the Mark II (it’s what we use internally!).
- Classic Core Sandbox – For those who are familiar with previous versions of Mycroft Core, this is an image with the long standing Mycroft Core software including all of the existing tooling you know and love. Good for those who want to tap into the existing ecosystem of Skills.
- Mark II Hardware Sandbox – If you want to start a project from scratch, this comes with all of the drivers and a HAL needed to utilize the Mark II hardware, but no other Mycroft software. Perfect for those wanting to utilize the hardware in their own ways.
The Sandbox images are meant for developers, so unlike the retail software they don’t have an automatic update mechanism. For a full list of software available on the Mark II – see the Mark II documentation.
We can’t wait to see the amazing things devs do with our hardware.
Mark II Software Development Roadmap
At launch, Dinkum has a core of standard Skills built in, among them:
- Home assistant integration
- Radio streaming
- A local music player
- Headline news
- Alarms and timers
- Weather reports
- General questions and answers provided by DuckDuckGo, Wikipedia and WolframAlpha
We are already hard at work extending the capabilities of Dinkum. Some examples of upcoming features include:
- 100% local functionality (including excellent local speech to text capabilities built in)
- Improved documentation and tools for porting existing Classic Core Skills
- Tools for creating new Skills for Dinkum
- Media server integration
- Bluetooth™ streaming audio
Feedback
Whether you have a Mark II already sitting in your kitchen, are eagerly awaiting your delivery, or are about to order one – let us know what features or improvements are most important to you for our future roadmap:
Suggest a new feature or improvement
Or if you spot any quirks or issues, then please report them here:
Report an issue
That’s right, we’ve made the big decision to move Mycroft Chat, our Mattermost instance, to Mattermost! More specifically their new managed cloud offering.
Mattermost is an open source collaboration platform that Mycroft has been using for many years. Over this time, we have always self-hosted Mattermost. As a free and open source project this made sense to us. However, like any piece of infrastructure, it comes with its own costs: the direct hosting costs, as well as the time cost of maintaining that infrastructure and testing and deploying new releases, amongst other things.
So we’ve made the decision to migrate our self-hosted instance to their new Mattermost Cloud offering. This means that our community gets the latest features of Mattermost as soon as they become available as well as robust and reliable infrastructure to maximize uptime. For our internal team it means we can spend more time focusing on our core business – creating the best privacy-focused open-source voice technologies.
Many people will have seen that we’ve had a few scheduled outages of Mattermost over the last few months. These have primarily been in preparation for the move, and now the time is upon us.
The final migration is scheduled for this Saturday the 29th of October. Mycroft Chat will be unavailable for the duration of the migration, and we will be making announcements in Chat and the Mycroft Forums before and after the migration is complete.
When the migration is complete, the primary URL for Mycroft Chat will be at cloud.mattermost.com rather than mycroft.ai. However, you will always be able to find Mycroft Chat by navigating to: https://chat.mycroft.ai
The first time you log in to the new Mycroft Chat…
You will be prompted to reset your password. This is an intentional requirement as we will not be transferring server side secrets to the new cloud platform. Without going into too much detail, let’s have a quick look at everyone’s favorite food additive: salt!
When your password is sent to a server it is passed through a one-way hashing algorithm along with a touch of salt (that’s actually what it’s called). The salt adds additional protection by further obfuscating your password against specific types of attacks. Just like with passwords, it’s best practice not to re-use a salt across multiple services. So when you first try to log in to the new Mattermost Chat server, it will not recognize your password because it uses a different salt during hashing. This is why you have to change your password.
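For the curious, here is a minimal sketch of why the same password stops matching once the salt changes. It is not how Mattermost actually stores passwords. The choice of PBKDF2, the iteration count, the sample password and the variable names are all assumptions made for illustration.

```python
import hashlib
import hmac
import os


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a one-way hash from a password and a salt (PBKDF2-HMAC-SHA256)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


password = "correct horse battery staple"  # illustrative only

# Each server generates and keeps its own random salt.
old_salt = os.urandom(16)
new_salt = os.urandom(16)

old_hash = hash_password(password, old_salt)
new_hash = hash_password(password, new_salt)

# Same password, same salt: the hash is reproducible, so the login succeeds.
same_server_ok = hmac.compare_digest(old_hash, hash_password(password, old_salt))

# Same password, different salt: the hashes differ, so the login fails.
cross_server_ok = hmac.compare_digest(old_hash, new_hash)

print(same_server_ok, cross_server_ok)
```

Because the hash is one-way, there is no way to convert a stored hash from one salt to another. The only route is to ask for the password again and hash it afresh, which is what the reset does.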
Once you’ve logged in you might notice a few other changes.
- All messages posted prior to the migration will be marked unread.
- Any favorited DM channels will need to be re-added to your favorites list.
- URLs to specific messages will have changed. Channel URLs will stay the same, but the hash identifying specific messages within a channel will have been recalculated.
Of course if you notice anything out of place and not included in this list, please reach out to us so we can investigate. Despite doing a range of testing, migrations always have the possibility of surprises.
We’ll see you on the other side…
On Tuesday, October 25th, for a short period our website started advertising essay writing services. As you might expect, that was spam and it shouldn’t have happened. However, we can assure you that no data was compromised. We always want to be open when things go wrong, as much as when they go right, so here’s the lowdown on how it happened.
Like most companies and websites, we experience a never ending barrage of cyberattacks. As you are reading this, I can basically guarantee that someone, somewhere is trying to compromise one of our servers for one reason or another. Most of these are automated, not targeted at us specifically, just probing as many websites as possible looking for a crack in the firewall, and yesterday they found one.
Recently, while addressing a completely unrelated issue, we temporarily disabled some of the security policies that mitigate particular attack methods, brute force attacks among them.
A brute force attack is a fancy way of saying – try all the things! In this case, try all the possible passwords. If you think of a simple combination lock with 4 numbers, there are only 10,000 possible combinations – so even a human can eventually try them all and find the correct combination to open the lock. Add a computer to the mix and that whole process could take a few seconds.
But what if you limit the number of guesses allowed? If you only allow 3 wrong attempts per day, all of a sudden you’ve leveled the playing field. Even for a computer that simple 4-digit combination lock might take up to 9 years to crack! Now I can assure you that the passwords our team use are more complex than a 4-digit number, but the same principles apply – the math is just a little more complicated.
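The arithmetic behind that figure is easy to check. The sketch below works through the worst case, where the correct combination happens to be the last one tried. The helper name is ours, and the longer password at the end is a hypothetical example, not a description of our actual password policy.

```python
def worst_case_days(keyspace: int, attempts_per_day: int) -> int:
    """Days needed to try every possibility at a fixed daily attempt limit."""
    # Ceiling division using integers only, so huge keyspaces stay exact.
    return -(-keyspace // attempts_per_day)


ATTEMPTS_PER_DAY = 3

# 4-digit combination lock: 0000 through 9999.
lock_keyspace = 10 ** 4
lock_days = worst_case_days(lock_keyspace, ATTEMPTS_PER_DAY)
lock_years = lock_days / 365.25

print(f"4-digit lock: {lock_days} days, about {lock_years:.1f} years")

# Hypothetical 8-character password drawn from lowercase letters and digits.
password_keyspace = 36 ** 8
password_years = worst_case_days(password_keyspace, ATTEMPTS_PER_DAY) / 365.25

print(f"8-character password: about {password_years:.2e} years")
```

The lock works out to 3,334 days, a little over 9 years. With the rate limit switched off, the same 10,000 guesses are only limited by how fast the server can answer them.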
Now for the good news. The account they gained access to did not have administrative privileges or system level access, and all passwords we use are unique. So they could not install anything malicious nor use the credentials they discovered to log in to any other system, but they could publish new content to our blog, and they did.
Thanks to our vigilant community members for flagging this, we were able to take it down very quickly, and immediately locked down the compromised account. We then engaged the assistance of independent experts to help us confirm exactly what actions were taken by the attackers after gaining access. We confirmed that no other malicious activity had taken place, and most importantly that no customer data had been accessed in any way.
Whilst it’s great that nothing major happened, it shouldn’t have happened in the first place. Like most security fails, it was human error, and we’ll be reviewing our processes going forward to see how we can continue to improve.
But for now – definitely don’t buy essay writing services from us. And if you care deeply about the thing you’re locking up, probably skip the combination locks too.