Know your Intent: State of the Art results in Intent Classification

THE AUGMENTATION

We started with a simple data augmentation technique. We took the under-sampled classes and applied dictionary-based synonym replacement to the nouns and verbs in each sentence to increase the total number of samples.

Original sentence: What can I do to improve my running skills?
Augmented data generated:
* What can I do to advance my running skills?
* What can I do to better my running skills?
* What can I do to correct my running know-how?
* What can I do to correct my running proficiency?

It did help in our case. Accuracy improved by 1.4 to 2 percent.
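The synonym replacement described above can be sketched roughly as follows. This is a minimal illustration, not the code we used: the `SYNONYMS` dictionary, the `augment` function and the `max_variants` parameter are hypothetical names, and the sketch looks words up directly instead of first tagging nouns and verbs.

```python
from itertools import product

# Hypothetical toy synonym dictionary (an assumption for illustration only).
SYNONYMS = {
    "improve": ["advance", "better", "correct"],
    "skills": ["know-how", "proficiency"],
}


def augment(sentence, synonyms=SYNONYMS, max_variants=10):
    """Return up to max_variants copies of sentence with synonyms swapped in."""
    options = []
    for token in sentence.split():
        # Separate trailing punctuation so "skills?" still matches "skills".
        core = token.rstrip("?.!,")
        tail = token[len(core):]
        # Keep the original word as the first option, then its synonyms.
        alternatives = [core] + synonyms.get(core.lower(), [])
        options.append([alt + tail for alt in alternatives])

    variants = []
    # Enumerate every combination of per-word choices, skipping the original.
    for combo in product(*options):
        candidate = " ".join(combo)
        if candidate != sentence:
            variants.append(candidate)
        if len(variants) >= max_variants:
            break
    return variants


if __name__ == "__main__":
    for line in augment("What can I do to improve my running skills?"):
        print(line)
```

In practice a part-of-speech tagger and a real thesaurus would replace the toy dictionary, and the generated sentences are worth a manual check, since not every synonym fits every context.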

Also, since this is a chatbot dataset, users are more likely to make spelling mistakes, so we need to take that into account too. One simple way to correct spelling mistakes is to compute the Levenshtein distance and map a misspelled word to its nearest neighbour.
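As a rough sketch of that idea, the following computes the edit distance with the standard dynamic-programming recurrence and maps an unknown word to its closest vocabulary entry. The `correct` helper, its `max_distance` cut-off and the example vocabulary are assumptions made for illustration.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or match)
            ))
        prev = curr
    return prev[-1]


def correct(word, vocabulary, max_distance=2):
    """Map word to its nearest vocabulary entry, if one is close enough.

    max_distance is a hypothetical threshold: words with no sufficiently
    close neighbour are returned unchanged.
    """
    if not vocabulary or word in vocabulary:
        return word
    best = min(vocabulary, key=lambda v: levenshtein(word, v))
    return best if levenshtein(word, best) <= max_distance else word


if __name__ == "__main__":
    vocab = ["running", "skills", "improve"]
    print(correct("runing", vocab))
```

The linear scan over the vocabulary is fine for a small word list. A large dictionary would likely need an index such as a BK-tree.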

A dictionary with vocabulary specific to the dataset can be used instead of a general English dictionary for domain-specific spelling correction. However, our dataset is very small, so there is a high chance that a word would fall outside the dataset-specific dictionary. Hence, we did not create a domain-specific dictionary.

However, we tried a different approach. For each key on the keyboard, we mapped the keys nearest to it. Typos are very likely caused by pressing a key adjacent to the intended one. So, we created a list of all the keys and the distances to their nearby keys. We also restricted the nearby keys to the immediately surrounding ones (a distance of one key). We then set a mistake probability P.

For any character C, swap it with another character C’ that is within d distance from C, and has an error probability greater than P. We predefined d and P.
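A minimal sketch of this keyboard-noise idea is shown below, under several assumptions: a plain QWERTY letter grid that ignores the physical stagger between rows, Chebyshev distance on that grid as the meaning of d, and P read as the per-character probability of making a swap. The names `ROWS`, `nearby_keys` and `add_typos` are hypothetical.

```python
import random

# Assumed layout: the three QWERTY letter rows, treated as an unstaggered grid.
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
POS = {ch: (r, c) for r, row in enumerate(ROWS) for c, ch in enumerate(row)}


def nearby_keys(ch, d=1):
    """Return the keys within d grid steps of ch (empty if ch is not a letter key)."""
    if ch not in POS:
        return []
    r, c = POS[ch]
    return sorted(
        k for k, (kr, kc) in POS.items()
        if k != ch and max(abs(kr - r), abs(kc - c)) <= d
    )


def add_typos(text, p=0.1, d=1, rng=None):
    """Swap each letter for a nearby key with probability p (one reading of P)."""
    rng = rng or random.Random()
    out = []
    for ch in text:
        neighbours = nearby_keys(ch.lower(), d)
        if neighbours and rng.random() < p:
            swapped = rng.choice(neighbours)
            # Preserve the case of the original character.
            out.append(swapped.upper() if ch.isupper() else swapped)
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(add_typos("What can I do to improve my running skills?",
                    p=0.1, rng=random.Random(42)))
```

Passing a seeded `random.Random` keeps the generated noise reproducible across runs, which helps when comparing models trained on the augmented data.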