We recently added a new piece of software architecture to Mycroft known as a CommonPlaySkill. This is the first of a series of “Common” infrastructure pieces which will make working with Mycroft much more natural and powerful.
What is a Skill?
First a quick review: a Skill adds new abilities to your Mycroft. Think of it like the scene in the Matrix where Neo learns Jiu-Jitsu. Plug in a Skill and suddenly Mycroft has new powers. Skills have two primary pieces: intents, which define the patterns of words to listen for, and handlers, which perform an action when an intent is triggered.
For example, a simple skill can handle phrases like “tell me a joke”. The skill has an intent which spells out an interest in that phrase (along with related phrases like “I want to hear a joke”, etc). That intent is connected to a handler which looks up a random joke and has Mycroft read it to you. Hilarity ensues.
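To make that concrete, here is a minimal sketch of how such a skill could be wired up. The class name, intent file, and joke list are placeholders for illustration, not the actual joke skill:

from mycroft import MycroftSkill, intent_file_handler
import random

JOKES = [
    "I would tell you a UDP joke, but you might not get it.",
    "There are only 10 kinds of people: those who know binary and those who don't.",
]

class JokeSkill(MycroftSkill):
    @intent_file_handler('joke.intent')
    def handle_joke(self, message):
        # joke.intent would list phrases such as "tell me a joke"
        # and "I want to hear a joke"
        self.speak(random.choice(JOKES))

def create_skill():
    return JokeSkill()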
Why do we need CommonPlay?
Clearly, the skill system is really powerful! But it has an inherent limitation: it selects a handler purely from word patterns. While I can easily define a pattern that captures the phrase “play something”, without a deeper understanding of that something Mycroft can’t tell which player to use from the words alone.
Here are some example phrases that illustrate the challenges:
play Zork
This one is easy – there is a game called Zork, just play it.
play the News
This one is easy too – fire up NPR!
Play Huey Lewis and the News
Looking at this naively (as if I’ve never heard of Huey Lewis), is Huey Lewis a reporter or a singer? Which skill should handle this?
play The Latest Single by The Hot New Band
Even if I understand this is a song request, it is impossible to tell from these words which music service has the legal contracts in place to be able to play the music.
play Ragtime
Is this a band? A music style? A movie? Yes to all of these. What should Mycroft do?
CommonPlay Approach
A single skill (skill-playback-control) currently captures all of the “play *” style utterances, like those listed above. This skill will now query all the CommonPlay skills and give them an opportunity to respond with:
- I can potentially handle that request
- This is how confident I feel in my handling
After the CommonPlay skills respond, there are a few ways to continue. If only one skill replies, it is the winner and will handle the request. When there are multiple respondents, the highest confidence wins. If there are several with about the same confidence, we can ask the user to pick the winner.
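As an illustration of that arbitration, here is a toy sketch of the selection logic; the function name, confidence numbers, and tie margin are invented for the example and are not the actual code in skill-playback-control:

def pick_winner(responses, tie_margin=0.1):
    # responses: list of (skill_name, confidence) pairs gathered
    # from the CommonPlay query
    if not responses:
        return None                      # nobody volunteered
    ranked = sorted(responses, key=lambda r: r[1], reverse=True)
    if len(ranked) > 1 and ranked[0][1] - ranked[1][1] < tie_margin:
        return "ask-the-user"            # too close to call, let the user pick
    return ranked[0][0]                  # highest confidence wins

# "play the news" might get a single volunteer; "play Ragtime" several
print(pick_winner([("npr-news", 0.8)]))                         # npr-news
print(pick_winner([("spotify", 0.72), ("film-player", 0.70)]))  # ask-the-user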
Gory Details
As they say, the devil is in the details. How do you catch the query? How do you format the response? What does “confidence” mean? We wrapped all of this up in a class called CommonPlaySkill, which itself derives from the familiar MycroftSkill. To participate in the CommonPlay system you only need to derive your skill from CommonPlaySkill and override a handful of methods. Here is all you need to connect a News skill to the CommonPlay system.
def CPS_match_query_phrase(self, phrase):
    if self.voc_match(phrase, "News"):
        return ("news", CPSMatchLevel.TITLE)
And:
def CPS_start(self, phrase, data):
    # Begin the news stream
    self.CPS_play(self.url_rss)
That’s it. The first method answers the CommonPlay query, claiming any phrase that contains the word “News”. The framework will generate a standardized confidence level based on the given CPSMatchLevel and the number of words in the phrase that were used in the “news” title match it found.
The second method is invoked by the framework if the query match is determined to be the best match.
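For context, here is a rough sketch of how those two methods might sit inside a complete skill. The import path reflects mycroft-core, while the class name and stream URL are illustrative stand-ins rather than the published News Skill:

from mycroft.skills.common_play_skill import CommonPlaySkill, CPSMatchLevel

class NewsSkill(CommonPlaySkill):
    def __init__(self):
        super().__init__()
        # Illustrative stream URL; the real skill gets this from its settings
        self.url_rss = "https://example.com/news-stream.mp3"

    def CPS_match_query_phrase(self, phrase):
        # Called for every "play ..." request to see if we want it
        if self.voc_match(phrase, "News"):
            return ("news", CPSMatchLevel.TITLE)

    def CPS_start(self, phrase, data):
        # Called only if this skill won the query
        self.CPS_play(self.url_rss)

def create_skill():
    return NewsSkill()

The voc_match call looks the term up in a vocabulary file bundled with the skill (e.g. News.voc), so matching isn’t limited to one literal spelling.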
You can see the entire News Skill on Github. It also has an intent which supports a few other non-“play” phrases such as “what is the news” and “tell me the latest news”. As you can see, it has all the capabilities of a regular skill in addition to being in the CommonPlay system.
I won’t bore you with lines of code here, but you can see more examples involving complex matches on the Pandora/Pianobar Skill and the Spotify Skill.
So Much in Common
This is the first of several “Common” skill frameworks I have planned. The CommonQASkill will allow Question and Answer skills to search their databases for answers and then present the best answer found. A good example of why this is needed is the question “How old is …”. From those words alone (not knowing the specific name) you can’t tell if the best answer would be in Wikipedia, IMDB, or Wookieepedia (a Star Wars knowledge base). It might even best be answered by a skill that tracks refrigerator contents: “How old is my milk?”. The CommonQASkill framework will allow each of these skills to look at the specific query and report back how confidently they can answer that question.
A CommonIoTSkill is also coming, making it easy to combine multiple types of Internet of Things systems. These skills will be able to handle identical verbal requests such as “turn on the light” by looking at context clues, such as the location of the Mycroft unit that heard the words.
Something for Everyone
Everyone is welcome to create a Common Skill. The framework will likely evolve, but by deriving from the CommonPlaySkill class, your skill will receive the benefits of this evolution. Play on!
Steve has been building cutting-edge yet highly usable technology for over 25 years, previously leading teams at Autodesk and Rhythm Engineering. He now leads the development team at Mycroft as a partner and CTO.