Lingua Franca is our multilingual Natural Language Processing library. It allows Mycroft to both understand and respond with naturally expressed entities such as numbers, dates and times.
Today I’m proud to announce that we are releasing this library as a standalone Python package enabling any project to utilize it for their own natural language processing.
Lingua Franca is one of the broadest available libraries of Natural Language Processing parsers and formatters, particularly for practical and difficult tasks commonly needed in voice control systems. It is ready-to-use and currently has support for Danish, Dutch, English, French, German, Hungarian, Italian, Portuguese, Spanish, and Swedish.
It provides heuristic parsing routines to extract numbers, dates, times, or durations from a spoken language transcription. This includes support for idioms such as “the day after tomorrow” and contextual understanding that “set an alarm for 7” refers to 7:00am tomorrow when spoken at night, or 7:00pm today when spoken in the morning.
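To make the “set an alarm for 7” example concrete, here is a minimal sketch of one heuristic that produces the behaviour described above: pick the next moment a 12-hour clock will show the requested hour. The `resolve_bare_hour` helper is hypothetical and written only for this post; it is not Lingua Franca's actual implementation, which may weigh context differently.

```python
from datetime import datetime, timedelta


def resolve_bare_hour(hour, now):
    """Return the next time a 12-hour clock shows `hour` after `now`.

    Hypothetical helper for illustration; not part of Lingua Franca.
    """
    if not 1 <= hour <= 12:
        raise ValueError("hour must be between 1 and 12")
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    # A bare "7" could mean 07:00 or 19:00, today or tomorrow.
    candidates = [
        midnight + timedelta(days=day, hours=(hour % 12) + offset)
        for day in (0, 1)
        for offset in (0, 12)
    ]
    # Pick the earliest candidate that is still in the future.
    return min(c for c in candidates if c > now)


# Spoken at 10pm: the next "7" is 7:00am tomorrow.
print(resolve_bare_hour(7, datetime(2019, 8, 1, 22, 0)))
# Spoken at 9am: the next "7" is 7:00pm today.
print(resolve_bare_hour(7, datetime(2019, 8, 1, 9, 0)))
```

The assumption baked into this sketch is that a person always means the nearest upcoming occurrence, which is a reasonable default for alarms but not for every kind of request.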
The library also provides natural language formatters for numbers, dates, times and durations as well as utilities for working with lists in multiple languages. Output can target written or spoken language — e.g. “1001” or “one thousand and one”.
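The difference between written and spoken output can be illustrated with a toy English-only formatter. This is a sketch under stated assumptions: `format_number` and its `spoken` flag are made up for this example, it only handles 0 to 9999, and it does not reflect how Lingua Franca's own formatters are named or implemented.

```python
ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def spoken_below_100(n):
    """Spell out an integer from 0 to 99 in English."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"


def format_number(n, spoken=False):
    """Hypothetical formatter: digits for text, words for speech."""
    if not 0 <= n <= 9999:
        raise ValueError("this sketch only handles 0 to 9999")
    if not spoken:
        return str(n)
    parts = []
    thousands, rest = divmod(n, 1000)
    hundreds, below = divmod(rest, 100)
    if thousands:
        parts.append(f"{ONES[thousands]} thousand")
    if hundreds:
        parts.append(f"{ONES[hundreds]} hundred")
    if below or not parts:
        words = spoken_below_100(below)
        # Join the final part with "and", as in "one thousand and one".
        parts.append(f"and {words}" if parts else words)
    return " ".join(parts)


print(format_number(1001))               # 1001
print(format_number(1001, spoken=True))  # one thousand and one
```

Even this small case hints at why a shared library helps: the rules for “and”, for teens, and for word order differ in every language.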
And, of course, it is open source and ever growing!
This library has always been a core part of Mycroft. For Skill Authors it provides a range of methods that make building complex interactions with dates and numbers simple and consistent. However, we also saw Community members manually extracting the library for use in their own projects.
Through conversations with the Community, we saw how valuable it would be for a whole range of projects to use this library outside of Mycroft. If we can help these projects by making our code more easily accessible, we want to do that. There’s also a benefit back to Mycroft: the more people using Lingua Franca, the more incentive they have to contribute to the code base, both improving existing languages and adding support for new ones.
The simplest way to use Lingua Franca in your own project is through the Python Package Index, PyPI.org. It is now available at: https://pypi.org/project/lingua-franca/
If you haven’t used packages from PyPI before, check out this tutorial to learn all about Python packages and PyPI.
You can also clone or fork the Lingua Franca repository on GitHub if you’d prefer.
We always welcome new contributors to Mycroft. There are many ways to contribute to the project, but if you’re a programmer then this is a great option.
If you use Mycroft or another project that makes use of Lingua Franca, and something isn’t working as expected, then we want to hear about it. The issues page on GitHub is where we track both bugs and feature requests for the library. For a useful bug report, remember to include:
which method you were trying to use (if known)
the input phrase or string
the output you expected
the actual output from the system (if any)
any error messages that were spoken or displayed.
By creating tests first, we are explicitly defining the outcomes that are expected from the software. This allows those working on the code to understand what is expected from a range of user inputs, particularly if they are not native speakers of the language. Once a test is added, it is run every time the software is tested, ensuring that any future work doesn’t break previous functionality.
Let’s look at a basic example test for the extract_number method:
self.assertEqual(extract_number("this is one simple test"), 1)
The above test will pass if the extract_number method receives the string “this is one simple test” and returns the integer 1.
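For context, here is roughly how such an assertion sits inside a complete Python unittest case. To keep the sketch runnable on its own, it defines a tiny stand-in `extract_number` that only knows a handful of English words; in a real contribution you would import the library's own parser instead (it lives in the `lingua_franca.parse` module) and would not write this stand-in at all. The class and test names are illustrative choices.

```python
import unittest

# Stand-in for illustration only: a tiny lookup that is NOT the
# library's parser and covers just a few English number words.
_NUMBER_WORDS = {"zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
                 "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9}


def extract_number(text):
    """Return the first number found in `text`, or None (stand-in behaviour)."""
    for word in text.lower().split():
        if word.isdigit():
            return int(word)
        if word in _NUMBER_WORDS:
            return _NUMBER_WORDS[word]
    return None


class TestExtractNumber(unittest.TestCase):
    def test_number_word(self):
        # The example from this post: a spelled-out number in a sentence.
        self.assertEqual(extract_number("this is one simple test"), 1)

    def test_digits(self):
        # The same expectation when the number is written as a digit.
        self.assertEqual(extract_number("this is 2 test"), 2)
```

Saved in a file such as `test_extract_number.py` (a name chosen for this example), the tests can be run with `python -m unittest test_extract_number`.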
Check out the current list of issues to see which ones need help defining tests.
If you’re a programmer who is reasonably fluent in a spoken language, this is a great way to contribute to open source.
Gez is the Director of Developer Relations at Mycroft. He comes from the land down under, has a strange love of crocodiles, and one day hopes to play the ukulele. If he’s not hanging out in our Community Chat and Forums, he is probably getting lost in the bush.