At Mycroft, our aim is to build an open-source, privacy-respecting voice assistant that can be used by everyone across the world. To create something that anyone can use, we must provide the best possible experience, not just for techies and tinkerers, but for the whole family. Rigorous quality assurance (QA) testing is critical to achieving this goal.
In its most basic form, QA is about defining a standard and then testing against it. It can be a manual process: set up a device, talk to it, and note down what you find. This, however, is pretty time-intensive, gives results that can vary from one run to the next, and does not scale well. Having a clear, documented and automated process means that we can get repeatable results any time a change is made. That allows our team to develop more quickly, knowing that if something breaks, we will find out about it.
So, shortly after joining Mycroft, I was asked to use my newly acquired data analysis skills to improve the existing quality assurance processes.
Voice interaction is one of the most critical pieces of Mycroft for ensuring a good user experience. To measure a Skill's success from a user's perspective, we need to test a large number of utterances against the expected successful outcome for each request. By doing this for every utterance, we can measure the pass or fail rate for each Skill and for Mycroft as a whole. The greater the number of successful interactions with a smart device, the better the overall experience.
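To make the idea of a pass rate concrete, here is a minimal sketch in Python. It is not our actual test harness: the `pass_rates` function, the result format and the sample utterances are all made up for illustration. It assumes each test has already been reduced to a Skill name, the utterance spoken, and whether the response matched the expected outcome.

```python
from collections import defaultdict


def pass_rates(results):
    """Return (per-Skill pass percentages, overall pass percentage).

    `results` is an iterable of (skill, utterance, passed) tuples.
    This shape is an assumption made for the example.
    """
    totals = defaultdict(int)
    passes = defaultdict(int)
    for skill, _utterance, passed in results:
        totals[skill] += 1
        if passed:
            passes[skill] += 1

    per_skill = {
        skill: 100.0 * passes[skill] / totals[skill] for skill in totals
    }
    total_tests = sum(totals.values())
    # Avoid dividing by zero when no tests were run.
    overall = 100.0 * sum(passes.values()) / total_tests if total_tests else 0.0
    return per_skill, overall


# Invented sample data, not real test results.
results = [
    ("timer", "set a timer for five minutes", True),
    ("timer", "start a ten minute timer", True),
    ("weather", "what's the weather like", True),
    ("weather", "will it rain tomorrow", False),
]

per_skill, overall = pass_rates(results)
print(per_skill)  # {'timer': 100.0, 'weather': 50.0}
print(overall)    # 75.0
```

Note that the overall figure here is weighted by the number of utterances, so a Skill with many tests counts for more than one with only a few. Averaging the per-Skill percentages instead would be an equally reasonable choice.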
Over time we can now monitor the success rates of these tests and know the progress being made. Mycroft will not only continue to build upon its stack with new features, but also refine its existing capabilities as we discover new ways that people are using the device or asking questions. As this grows we can easily add new utterances to our QA process, expanding Mycroft’s known and tested vocabulary.
Over the last few months we have been focusing on our top 8 Skills. These represent the majority of interactions that an average person might have with Mycroft. They include things like setting timers, checking the weather or hearing the latest news bulletin.
Each Skill has a range of utterances being tested, and the results can then be broken down into percentages based on the number of correct responses generated.
While you might at times see drops, such as in the Spotify Skill, our new process means we can be made aware of the change and fix it before it reaches your device.
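Catching a drop like that comes down to comparing the pass rates from one test run with those from the previous run. The sketch below shows one way this could be done. The `find_regressions` function, the `tolerance` parameter and all of the numbers are hypothetical, chosen only to illustrate the comparison.

```python
def find_regressions(previous, current, tolerance=0.0):
    """Return Skills whose pass rate fell by more than `tolerance` points.

    `previous` and `current` map a Skill name to its pass percentage.
    Skills missing from `current` are skipped rather than flagged.
    """
    regressions = {}
    for skill, old_rate in previous.items():
        new_rate = current.get(skill)
        if new_rate is None:
            continue
        if new_rate < old_rate - tolerance:
            regressions[skill] = (old_rate, new_rate)
    return regressions


# Invented figures for illustration only, not real results.
previous = {"timer": 95.0, "weather": 90.0, "news": 80.0}
current = {"timer": 96.0, "weather": 89.0, "news": 62.0}

dropped = find_regressions(previous, current, tolerance=2.0)
print(dropped)  # {'news': (80.0, 62.0)}
```

A small tolerance keeps a one-point wobble from raising an alarm, while a real drop still stands out.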
Moving forward, Mycroft will have a standard metric to test against, ensuring that new features work properly and that old ones are monitored for bugs. We are continuing to grow the range of utterances being tested for each Skill, and at times need to debate what constitutes a passing grade. There is currently a divide in the team on whether or not a remix of Baby by Justin Bieber can ever constitute a pass…
I am very excited to see the improvements that will be made to Mycroft thanks to our updated quality assurance processes. Ultimately my job is to break things, and break things a lot. Whilst our aim is for a 100% pass rate, we are also looking at a constantly moving target. So I will be happy every time I succeed in breaking something because it means we can identify a new problem and fix it.