NLP could be a big boon in building conversational regional-language chatbots, but a lot needs to be done before we get there.
By Prakash Kumar and Apoorv Vishnoi
Have you ever wondered how spam emails get filtered out, how search engines surface the most appropriate pages, how a word processor keeps correcting our grammar, how voice assistants talk to us, or how, while composing a message on WhatsApp, we get suggestions for the next few words? How do evaluators quickly spot plagiarism in assignments submitted online? Behind all of these is a major Artificial Intelligence (AI) application called Natural Language Processing, or NLP.
We humans have always been fascinated by others who can talk like us, whether it is a parrot or a voice assistant like Alexa. Parrots simply repeat what they hear without understanding the true meaning of what they are saying. Our voice assistants, however, are smarter than that: they can do meaningful tasks for us, control our devices and even hold an interactive conversation with us. The field that gives voice assistants this power is Natural Language Processing (NLP). Simply put, NLP makes it possible for devices to process natural language in written or spoken form and interact with us. In the case of spoken commands, there is an extra step of converting speech to text when we talk to the device, and text to speech when the device speaks to us.
How do these devices understand natural languages like English, Hindi or French? Well, computers and their algorithms do not really understand words. The text is converted to numbers by computer programs, since computers understand only numbers. In NLP there is an intermediate step: converting human language into computable properties, called feature vectors, that capture characteristics such as intent, timing and sentiment. This is required because language, unlike computer commands, has many nuances. All of this happens using specialised machine learning and deep learning algorithms. These applications are then trained for specific tasks using large amounts of data. When we use such an application on our phone to dictate a message, we not only use the NLP application but also provide more data that the program uses to enhance its performance. In other words, these applications keep on learning, which is why we notice an improvement in their performance the more we use them.
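To illustrate the text-to-numbers step, here is a minimal bag-of-words sketch in Python. The corpus is invented for the example, and real NLP systems use far richer feature vectors than raw word counts.

```python
# Minimal bag-of-words sketch: each sentence becomes a vector of word counts.
# The corpus here is invented for illustration.

def build_vocabulary(sentences):
    """Collect every unique word across the corpus, in first-seen order."""
    vocab = []
    for sentence in sentences:
        for word in sentence.lower().split():
            if word not in vocab:
                vocab.append(word)
    return vocab

def vectorize(sentence, vocab):
    """Count how often each vocabulary word appears in the sentence."""
    words = sentence.lower().split()
    return [words.count(word) for word in vocab]

corpus = ["turn the light on", "turn the fan off"]
vocab = build_vocabulary(corpus)
print(vocab)                        # ['turn', 'the', 'light', 'on', 'fan', 'off']
print(vectorize(corpus[0], vocab))  # [1, 1, 1, 1, 0, 0]
```

Once every sentence is a vector of numbers like this, it can be fed into the machine learning algorithms mentioned above.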
We get alerts about the sentiment prevailing on social media platforms like Twitter, especially when an incident has taken place, such as India losing a cricket match in the T20 World Cup. How is that done? Reading a tweet, we can tell whether the text displays a positive, negative or neutral sentiment. NLP-based applications are trained to analyse tweets by looking at words and phrases after removing articles, punctuation marks and so on. A dictionary of words and phrases is prepared from previous tweets, marking each as negative, positive or neutral. New tweets carrying the appropriate hashtag, #T20WorldCup in our example, are then evaluated using NLP to determine whether people are happy, sad or neutral about the result.
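The dictionary-based approach described above can be sketched in a few lines of Python. The word lists here are made up for illustration; real sentiment models are trained on large labelled corpora rather than hand-written lexicons.

```python
# Lexicon-based sentiment scoring; the word lists are invented for illustration.
POSITIVE = {"great", "win", "happy", "brilliant"}
NEGATIVE = {"sad", "lost", "terrible", "angry"}
STOPWORDS = {"the", "a", "an", "is", "was", "that"}

def sentiment(tweet):
    # Strip punctuation and hashtag/handle symbols, then drop stopwords.
    words = [w.strip(".,!?#@").lower() for w in tweet.split()]
    words = [w for w in words if w and w not in STOPWORDS]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"

print(sentiment("So sad that India lost the match"))  # Negative
print(sentiment("What a great win! #T20WorldCup"))    # Positive
```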
Businesses are using NLP-based tools. One such application is identifying the topic of a large document and summarising it. Using a well-trained algorithm to condense these reports into key points saves a lot of time. Tools that write summary reports are being used by law firms and government departments that have to go through a large number of pages.
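One simple way to see the idea behind summarisation is extractive scoring: rank each sentence by the total frequency of its words across the document and keep the top scorers. This toy version only illustrates the principle; the tools used by law firms and government departments rely on far more sophisticated trained models.

```python
# Toy extractive summariser: score sentences by the total frequency of their
# words across the document and keep the top-scoring ones.
from collections import Counter

def summarise(text, n_sentences=1):
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    # Count word frequencies over the whole document.
    freq = Counter(w.lower() for s in sentences for w in s.split())
    # Sentences built from frequent words score higher.
    scored = sorted(sentences,
                    key=lambda s: sum(freq[w.lower()] for w in s.split()),
                    reverse=True)
    return ". ".join(scored[:n_sentences]) + "."

report = "NLP is useful. NLP tools summarise reports. Cats sleep."
print(summarise(report))  # NLP tools summarise reports.
```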
Governments invite suggestions from people on draft policies. The sheer number of responses makes it difficult to analyse thousands of submissions manually. NLP tools are ideally placed to do this for government agencies.
Chatbots, which are NLP-based tools, have suddenly become very popular on websites. They are trained on FAQs and can answer queries effectively, easing the burden on helpdesk staff. They also resolve purely informational queries whose answers many people find difficult to locate on websites. Many government departments have started using chatbots. The next step could be multilingual chatbots on government portals that let people from various states interact and get information or help by speaking or typing in their mother tongue.
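At its simplest, an FAQ-trained chatbot can be sketched as picking the stored question that shares the most words with the user's query and returning its answer. The FAQ entries below are invented for illustration; production chatbots use trained intent-matching models rather than word overlap.

```python
# Toy FAQ chatbot: return the answer for the stored question that shares the
# most words with the user's query. The FAQ entries are invented.

FAQS = {
    "how do i reset my password": "Use the 'Forgot password' link on the login page.",
    "what are the office hours": "The helpdesk is open 9 am to 5 pm on weekdays.",
}

def answer(query):
    # Normalise the query, then pick the best-overlapping stored question.
    query_words = {w.strip("?,.!").lower() for w in query.split()}
    best = max(FAQS, key=lambda q: len(query_words & set(q.split())))
    return FAQS[best]

print(answer("I forgot my password, how do I reset it?"))
```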
Voice assistants are the most well-known application of NLP. They are a blend of multiple emerging technologies. A human says, "Hey assistant, turn the light on." Through speech recognition, the spoken words are converted to text; the text is converted to numbers that the system understands; the numbers go through a set of trained NLP algorithms, which produce an action message. That message is conveyed to the light bulb using the Internet of Things (IoT), and finally the bulb turns on. Of course, all this happens in a fraction of a second.
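The pipeline above can be sketched with each stage stubbed out. The function names and the tiny command grammar are assumptions invented for this example, not any real assistant's API; in a real system each stub would be a trained model or a network call.

```python
# Sketch of the voice-assistant pipeline with each stage stubbed out.
# Function names and the command grammar are invented for this example.

def speech_to_text(audio):
    # A speech-recognition model would run here; we pretend the "audio"
    # string is already its own transcript.
    return audio.lower()

def parse_intent(text):
    # A trained NLP model would map the text to an intent; this toy version
    # just pattern-matches a light command.
    words = text.replace(",", "").split()
    if "light" in words:
        if "on" in words:
            return {"device": "light", "action": "on"}
        if "off" in words:
            return {"device": "light", "action": "off"}
    return {"device": None, "action": None}

def send_iot_command(intent):
    # The IoT layer would forward this to the bulb; here we just report it.
    return f"{intent['device']} turned {intent['action']}"

text = speech_to_text("Hey assistant, turn the light on")
print(send_iot_command(parse_intent(text)))  # light turned on
```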
Another landmark use case of NLP is language translation. Many apps are available that help users translate from one language to another. For a country like ours, with 22 official languages, NLP-trained translators can help us convert documents, websites and public notices into various regional languages. In fact, they can even help us converse in regional languages.
While a lot of progress has been made, at the moment NLP cannot give us very high accuracy on such tasks; even 75% accuracy is considered quite good. One reason is that languages have complex attributes such as sarcasm, irony and idioms. For example, a person wrote "I want to jump from the bridge", and the app responded with a list of six bridges in the vicinity. A frustrated passenger tweeted, "@XXX Thank you for sending my baggage to Hyderabad and flying me to Calcutta at the same time. Brilliant service. #XXX." The airline chatbot replied, "Glad to hear that. #KeepFlying XXX." NLP algorithms are also very domain-specific: an algorithm trained to analyse legal documents will not be suitable for analysing medical records.
Finally, NLP algorithms give better results for languages like English, for which a large amount of data is available. If we want similar performance in regional languages, we will need to put a lot of effort into collecting and cleaning data and training these applications.