An engineer from the Johns Hopkins Centre for Language and Speech Processing has developed a machine learning model that can distinguish functions of speech in transcripts of dialogues produced by language understanding (LU) systems. The approach could eventually help computers “understand” spoken or written text in much the same way that humans do.
The new model identifies the intent behind words and organises them into categories such as “Statement,” “Question,” or “Interruption” in the final transcript, a task called “dialogue act recognition.” By providing other models with a more organised and segmented version of text to work with, the new model could become a first step in making sense of a conversation.
This new method means that LU systems no longer have to deal with huge, unstructured chunks of text, which they struggle with when trying to classify things such as the topic, sentiment, or intent of the text. Instead, they can work with a series of expressions, which are saying very specific things, like a question or interruption. My model enables these systems to work where they might have otherwise failed.
– Piotr Zelasko, Assistant Research Scientist, Johns Hopkins Centre for Language and Speech Processing
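To make the idea of a segmented, labelled transcript concrete, here is a minimal sketch of what such output might look like. It is not the researchers' model: the punctuation-based rules, the `DialogueAct` class, and the function names are all hypothetical stand-ins for a trained classifier, used only to show the shape of the result.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class DialogueAct:
    speaker: str
    text: str
    label: str  # "Statement", "Question", or "Interruption"


def label_segment(text: str) -> str:
    """Toy heuristic (an assumption, not the published method)."""
    if text.endswith("?"):
        return "Question"
    if text.endswith("--"):
        # A trailing "--" is assumed to mark a cut-off utterance.
        return "Interruption"
    return "Statement"


def segment_turn(speaker: str, turn: str) -> List[DialogueAct]:
    """Split one speaker turn into segments and label each one."""
    # Split on whitespace that follows ".", "?", "!" or "--".
    parts = re.split(r"(?<=[.?!])\s+|(?<=--)\s+", turn.strip())
    return [DialogueAct(speaker, p, label_segment(p)) for p in parts if p]


if __name__ == "__main__":
    turn = "I was charged twice. Can I get a-- Sorry, go ahead."
    for act in segment_turn("customer", turn):
        print(f"{act.label:<12} {act.text}")
```

A downstream system would then receive a list of short, labelled expressions rather than one unstructured block of text. A real system would have to infer these boundaries even when the transcript has no reliable punctuation.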
The researchers adapted some recently introduced language-understanding models to organise and categorise words and phrases, and investigated how different variables, such as punctuation, affect those models’ performance.
While working in the industry on human-to-human conversational analytics, the researchers noticed that many natural language processing algorithms operate well only when the text has a clear structure, such as when a person speaks in complete sentences. However, in real life, people seldom speak so formally, making it difficult for systems to ascertain exactly where a sentence starts and ends. Zelasko wanted to make sure his system could understand an ordinary conversation.
This model could eventually help companies that use speech analytics, a process that some businesses use to gain insights from analysis of interactions between customers and call centre customer service representatives. Speech analytics usually involves automatic transcription of conversations and keyword searches, providing limited opportunities for insight.
With the old approach, people might be able to say that the highlights of a conversation involve whatever type of phone the customer owns, ‘technical issues,’ and ‘refund,’ but what if somebody was just exploring their options and did not actually request a refund? That is why a system needs to actually understand the conversation and not simply scan it for keywords.
This model could also someday be used by physicians, saving them the valuable time they now spend taking notes while interacting with patients. Instead, a device using this model could quickly go through the transcript of the conversation, fill out forms, and write notes automatically, allowing doctors to focus on their patients.
As reported by OpenGov Asia, U.S. experts who design and evaluate programs, policies, and technologies to modernise government have come to the aid of the Santa Clara County Public Health Department. In a study detailed in Proceedings of the National Academy of Sciences, the RegLab team describes how it applied machine learning to transform contact tracing in Santa Clara County and narrowed the health gap between the county’s Latino and other communities.
Contact tracers usually start with only the most basic information about the people they call, such as the patient’s name, address, date of birth and test result. Researchers combined that bare-bones data with demographic information from the census and other administrative data.
A machine-learning algorithm analysed and weighed data such as census block group, age, and name-based race and ethnicity information from census and mortgage data to identify patterns that would predict a language preference. Contacts were scored as to which language they would likely prefer before they were assigned to a tracer.
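The scoring step can be illustrated with a small sketch. Everything here is an assumption for illustration: the article does not describe the study's actual model, so the logistic form, the feature names, the weights, the threshold, and the choice of Spanish as the predicted language are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Contact:
    # Hypothetical features, loosely based on the data types named above.
    block_group_spanish_share: float  # share of Spanish speakers in the census block group, 0..1
    age: int
    name_hispanic_probability: float  # name-based ethnicity estimate, 0..1


# Made-up weights for illustration only; a real model would learn these from data.
WEIGHTS = {"bias": -3.0, "block_group": 2.5, "age": 0.01, "name": 3.0}


def spanish_preference_score(contact: Contact) -> float:
    """Return a score in (0, 1) from a logistic function of the features."""
    z = (
        WEIGHTS["bias"]
        + WEIGHTS["block_group"] * contact.block_group_spanish_share
        + WEIGHTS["age"] * contact.age
        + WEIGHTS["name"] * contact.name_hispanic_probability
    )
    return 1.0 / (1.0 + math.exp(-z))


def route(contact: Contact, threshold: float = 0.5) -> str:
    """Assign the contact to a tracer pool before the first call is made."""
    if spanish_preference_score(contact) >= threshold:
        return "spanish_speaking_tracer"
    return "general_pool"


if __name__ == "__main__":
    likely = Contact(block_group_spanish_share=0.8, age=40, name_hispanic_probability=0.9)
    unlikely = Contact(block_group_spanish_share=0.05, age=30, name_hispanic_probability=0.05)
    print(route(likely), route(unlikely))
```

The point of the sketch is the order of operations: each contact is scored first and then routed, so the tracer who makes the first call is more likely to speak the contact's preferred language.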