I had the opportunity to attend Microsoft Build this year, and one thing that stood out to me was the variety of machine learning APIs (known as Microsoft Cognitive Services) that Microsoft has been releasing. And it's not just Microsoft: other big players such as Google, IBM, and Amazon have also been putting out specialized, pre-trained machine learning APIs that can perform various tasks. But what exactly is machine learning, and why should you care?
Machine learning is a type of artificial intelligence where a system is able to learn and improve by observing patterns within a set of training data. The more training data that the system receives, the better the results should be. It's exciting to see more machine learning APIs becoming available since this makes machine learning more accessible to those who may not have the means, budget, or desire to build and train a custom machine learning solution.
For this article, we took a look at some of Microsoft's Cognitive Services offerings, specifically APIs that are ready to go out of the box without any training. Here are some of these APIs and capabilities we found particularly interesting, and how they might be used to make your life easier.
Computer Vision API
The Computer Vision API consists of a number of endpoints that take an image and analyze and/or process it. Its capabilities include (among other things) the ability to:
- Generate a caption that will describe the contents of an image
- Generate tags describing the contents of an image
- Convert an image to a thumbnail of specified proportions, and automatically crop it to retain the most important/interesting part of the image
- Convert images of handwritten or printed text (OCR) to actual text
- Detect whether an image contains inappropriate content
- Identify celebrities or landmarks within an image
This API could be used for a wide range of possible applications. For example, we tried out the caption generation endpoint, and it generally does a good job of describing what is going on in an image based on what is visually apparent. While this might not be the most useful caption for someone who can see the image, it could be very helpful for accessibility purposes (for example, for users who rely on a screen reader). To ensure good accessibility on a site, one of the most important (and often neglected) tasks is to supply alt text for all images.
The captions that the Computer Vision API generates are well suited to this application. Setting up something in the CMS or admin area to automatically populate the alt text field for uploaded images could greatly reduce the effort required from content editors. Editors should still be able to review and improve the captions, since the API does not have a 100% success rate and may miss something or make a mistake.
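As a rough sketch of the alt text idea above, the snippet below picks the most confident caption from an analysis result and falls back to manual entry when confidence is low. The response shape (a `description` object holding a list of `captions`, each with `text` and `confidence`), the function name `choose_alt_text`, and the 0.6 threshold are all assumptions made for illustration, not a documented contract.

```python
from typing import Optional

# Hypothetical cut-off; tune it against your own images.
MIN_CONFIDENCE = 0.6


def choose_alt_text(analysis: dict, min_confidence: float = MIN_CONFIDENCE) -> Optional[str]:
    """Return the best caption as alt text, or None if an editor should write it.

    `analysis` is assumed to look like:
    {"description": {"captions": [{"text": "...", "confidence": 0.9}]}}
    """
    captions = analysis.get("description", {}).get("captions", [])
    if not captions:
        return None
    # Pick the caption the service is most confident about.
    best = max(captions, key=lambda c: c.get("confidence", 0.0))
    if best.get("confidence", 0.0) < min_confidence:
        return None  # Too uncertain: leave the field empty for manual review.
    text = best.get("text", "").strip()
    return text or None


sample = {"description": {"captions": [{"text": "a dog sitting on a couch", "confidence": 0.92}]}}
print(choose_alt_text(sample))
```

Returning `None` rather than a low-confidence caption keeps the review step explicit: the CMS can flag those images for an editor instead of silently publishing a poor description.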
Another potential application incorporates the thumbnail generation endpoint. Often, when images are used on a site, multiple aspect ratios and sizes of the same image may be required for use in different contexts or on different devices. It would be helpful to automate this so that content editors don't need to supply numerous sizes of the same image, but auto-cropping an image using a naïve approach is not always acceptable since it can result in important parts of the image being cut off.
Microsoft's smart thumbnail generation could help reduce the effort needed to create thumbnail-size versions of images, as it attempts to detect and retain the most important part of the image within the thumbnail. In our experimentation, thumbnail generation did a good job of retaining the important parts of an image, especially in images containing people or animals.
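To illustrate why a naïve crop can cut off the subject, here is a minimal sketch of crop-box arithmetic. It is not Microsoft's algorithm. It only shows the difference between centering the crop on the image and centering it on a focus point, which is assumed to come from some detection step. The function name `crop_box` and its parameters are hypothetical.

```python
def crop_box(width, height, target_ratio, focus=None):
    """Return (left, top, right, bottom) for the largest crop with the given
    width/height ratio.

    `focus` is an (x, y) pixel position to center the crop on. When omitted,
    the crop is centered on the image, which is the naive approach.
    """
    # Largest crop of the requested aspect ratio that fits inside the image.
    if width / height > target_ratio:
        crop_h = height
        crop_w = round(height * target_ratio)
    else:
        crop_w = width
        crop_h = round(width / target_ratio)

    fx, fy = focus if focus is not None else (width / 2, height / 2)

    # Center on the focus point, then clamp so the box stays inside the image.
    left = min(max(round(fx - crop_w / 2), 0), width - crop_w)
    top = min(max(round(fy - crop_h / 2), 0), height - crop_h)
    return (left, top, left + crop_w, top + crop_h)


# A 1000x500 image cropped to a square, with the subject near the right edge.
print(crop_box(1000, 500, 1.0))                    # naive center crop
print(crop_box(1000, 500, 1.0, focus=(900, 250)))  # crop centered on the subject
```

In this example the naive crop spans x = 250 to 750 and misses a subject at x = 900, while the focused crop shifts to the right edge and keeps it in frame.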
Other possible use cases might include automatically adding tags to uploaded photos, auto-flagging of potentially inappropriate user photos, or a tool to import scanned versions of a paper form and using handwriting recognition/OCR to convert them to a text-based format.
Face API
The Face API consists of a number of endpoints revolving around the analysis and recognition of faces in images, including:
- The ability to pass in an image that includes one or more faces, and receive a list of those faces detected within the image. For each face, it will guess attributes such as age, gender, emotions, type of facial hair, makeup worn, and hair color/baldness. Emotions are broken down into percentages of happiness, sadness, surprise, anger, contempt, disgust, fear, and neutral.
- The ability to pass in two images and determine the likelihood that the two faces belong to the same person.
- The ability to create groups of people and, for each person, supply one or more photos of their face. Once this data is supplied, it may be used to identify those people in new images.
When testing the Face API, we found that it generally does a good job with facial analysis, though it tends to perform best when the face is clearly visible and seen primarily from the front. If a person is in profile, the API is a lot less likely to pick up on that face. Additionally, while it does a pretty good job of guessing things like age, gender, and emotion, it was not quite as good at determining whether a person is wearing makeup or has facial hair.
There are many potential applications for these endpoints. For example, if you have many user-submitted photos or other photos representing customers, this API could be leveraged to come up with demographic data including gender and age. With photos gathered at regular intervals from a trade show booth, analytics could be gathered regarding both demographics and emotional reactions.
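The demographic idea above could be sketched roughly as follows. The code takes a list of detected faces and summarizes average age, gender counts, and the dominant emotion per face. The input shape (each face carrying a `faceAttributes` object with `age`, `gender`, and an `emotion` score map) and the function name `summarize_faces` are assumptions for illustration.

```python
from collections import Counter
from statistics import mean


def summarize_faces(faces):
    """Aggregate per-face attribute guesses into simple demographic totals.

    Each face is assumed to look like:
    {"faceAttributes": {"age": 31.0, "gender": "female",
                        "emotion": {"happiness": 0.9, "neutral": 0.1}}}
    """
    ages = [f["faceAttributes"]["age"] for f in faces]
    genders = Counter(f["faceAttributes"]["gender"] for f in faces)
    # For each face, keep only the emotion with the highest score.
    dominant = Counter(
        max(f["faceAttributes"]["emotion"].items(), key=lambda kv: kv[1])[0]
        for f in faces
    )
    return {
        "count": len(faces),
        "average_age": mean(ages) if ages else None,
        "genders": dict(genders),
        "dominant_emotions": dict(dominant),
    }


sample_faces = [
    {"faceAttributes": {"age": 30.0, "gender": "female",
                        "emotion": {"happiness": 0.9, "neutral": 0.1}}},
    {"faceAttributes": {"age": 40.0, "gender": "male",
                        "emotion": {"happiness": 0.2, "surprise": 0.7, "neutral": 0.1}}},
]
print(summarize_faces(sample_faces))
```

Because the attributes are guesses, totals like these are better read as rough trends across many photos than as precise counts.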
For applications that deal with a distinct set of people, such as members of a sports team or a group of employees, the ability to identify individuals could be used to automatically tag people within photos.
Text Analytics API
The Text Analytics API is a relatively simple API consisting of just a few endpoints, each of which performs some analysis on a piece of text. It can:
- Extract key phrases from a piece of text
- Analyze the overall sentiment of the text
- Identify the language that text is written in
In our experimentation, the sentiment analysis seems to be accurate most of the time, although it does tend to come through as mostly positive or mostly negative with not a whole lot of variance in between. Certain subtleties of language (e.g. sarcasm) are lost on it. It seems to be best at handling input that resembles reviews. The ability to pull out key phrases tends to grab a lot of words and phrases, rather than totally whittling it down to a short list.
A possible use case that could leverage both sentiment analysis and key phrase extraction would be a tool that analyzes user feedback. This tool could group feedback into positive and negative feedback, and then list out the most frequently occurring phrases in each group. This could be useful for identifying trends, and could provide insights into areas for improvement.
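A minimal sketch of such a feedback tool is shown below. It assumes each piece of feedback has already been run through sentiment analysis and key phrase extraction, and that the results were stored as a `score` between 0 and 1 plus a list of `key_phrases`. Those field names, the 0.5 threshold, and the function name `summarize_feedback` are assumptions for illustration.

```python
from collections import Counter


def summarize_feedback(items, threshold=0.5, top_n=3):
    """Group feedback by sentiment and list the most frequent phrases per group.

    Each item is assumed to look like:
    {"score": 0.9, "key_phrases": ["friendly staff", "fast shipping"]}
    """
    groups = {"positive": Counter(), "negative": Counter()}
    for item in items:
        label = "positive" if item["score"] >= threshold else "negative"
        # Lowercase phrases so "Fast shipping" and "fast shipping" count together.
        groups[label].update(phrase.lower() for phrase in item["key_phrases"])
    return {label: counts.most_common(top_n) for label, counts in groups.items()}


feedback = [
    {"score": 0.9, "key_phrases": ["friendly staff", "fast shipping"]},
    {"score": 0.8, "key_phrases": ["Fast shipping"]},
    {"score": 0.1, "key_phrases": ["broken packaging"]},
    {"score": 0.2, "key_phrases": ["broken packaging", "slow refund"]},
]
print(summarize_feedback(feedback))
```

Since the scores tend to cluster near the extremes, a single threshold is a reasonable starting point. A neutral band in the middle could be added if more nuance is needed.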
Bing Custom Search
In April 2017, Google began discontinuing the paid tier of the popular Google Site Search. By 2018, the product will be retired entirely. This move leaves a hole in the market for robust, easily integrated, commercial-grade site search options. Bing Custom Search (currently in preview, expected to be released in Fall 2017) could help fill this void.
Bing Custom Search provides a web interface where the custom search instance may be set up and configured, including the ability to include one or more websites within the search results, promote specific results for certain queries, weight some sections of the site higher than others in search, exclude pages from search, and so on. This setup and configuration is accomplished through a friendly GUI and does not need to be done by a developer. Once the site search is configured, a developer can integrate the custom search instance into a site using the Bing Web Search API.
We tested out Bing Custom Search and found it easy to get up and running and easy to work with. It also includes some nice features, such as autocorrection of misspelled searches, hit highlighting in results, and ad-free results. Once it is fully released, Bing Custom Search should be a nice option for those seeking custom site search.
In addition to what has been covered in this article, there are many other machine learning APIs out there, including both pre-trained APIs like the ones covered here and trainable APIs that can be used for more customizable solutions. We hope this overview has sparked some ideas about what is possible with machine learning APIs.