Project

Tim Schrader, Johannes Richard Bartsch, Johannes Rudolf Herkner, Paul Heinze, Rostislav Iudin - Data Science Cooperation Project

Coronabot

For some time now, the world has been confronted with the challenges of Covid-19. In addition to the infection itself, the population also faces the problem of keeping up to date with the latest infection figures. To counter this problem, the federal government has developed C-19, a chatbot that provides citizens with information about the pandemic. At the beginning of the project, the bot was limited to an FAQ-based question system in which predefined questions could be mapped to important documents provided by the ministries. As part of the “Question Answering and Chatbots” module, C-19 was expanded in cooperation with the federal government. The aim was to integrate a data-based approach that lets C-19 answer specific user questions using the data of the Robert Koch Institute.

In this context, a REST endpoint was created, which C-19 can use to answer questions about infection and death rates for user-specified time periods and locations. The system provided through this REST endpoint is a question-answering system in its own right. It consists of various components that are combined into a pipeline using the Qanary framework. The resulting system supports both German and English questions, presents information through visualisations such as diagrams and colour highlighting, and integrates dialogue management. The latter enables the system to ask counter-questions when information is missing and thus helps users find the content they are looking for. For example, ambiguities can be resolved without the user having to ask the entire question again.
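
The counter-question behaviour described above can be illustrated with a small slot-filling sketch. It is not the project's implementation: the slot names, the question texts and both function names are hypothetical, and it assumes that only intent and location are mandatory because the time window has a default.

```python
from typing import Dict, Optional

# Hypothetical counter-questions, one per piece of information the
# pipeline cannot do without (the time window has a default).
COUNTER_QUESTIONS = {
    "intent": "Are you asking about infections or deaths?",
    "location": "Which location are you interested in?",
}


def next_counter_question(state: Dict[str, Optional[str]]) -> Optional[str]:
    """Return the question for the first missing slot, or None if complete."""
    for slot, question in COUNTER_QUESTIONS.items():
        if not state.get(slot):
            return question
    return None


def merge_reply(state: Dict[str, Optional[str]], slot: str, value: str) -> Dict[str, Optional[str]]:
    """Add the user's reply to the dialogue state without losing earlier slots."""
    updated = dict(state)
    updated[slot] = value
    return updated
```

Because the earlier slots are kept in the state, the user only has to supply the missing piece instead of repeating the whole question.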

To identify whether the user is asking about infection or death rates, DistilBERT is used for intent classification. DistilBERT is a smaller, distilled version of BERT that achieves comparable results. To improve inference performance, the model is deployed via TensorFlow Serving.
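
As a rough illustration of how a client could talk to a model behind TensorFlow Serving, the sketch below builds a request for the REST predict endpoint and picks the highest-scoring label from a response. The model name, the label order and the input names `input_ids` and `attention_mask` are assumptions that depend on how the model was exported; tokenisation and the actual HTTP call are left out.

```python
import json
from typing import Dict, List, Tuple

# Assumed label order of the classifier head; not stated in the source.
LABELS = ["infections", "deaths"]


def build_predict_request(host: str, model: str, input_ids: List[int]) -> Tuple[str, str]:
    """Build URL and JSON body for TensorFlow Serving's REST predict call."""
    # 8501 is TensorFlow Serving's default REST port.
    url = f"http://{host}:8501/v1/models/{model}:predict"
    # Row format: one entry in "instances" per input example.
    instance = {"input_ids": input_ids, "attention_mask": [1] * len(input_ids)}
    return url, json.dumps({"instances": [instance]})


def pick_intent(response: Dict) -> str:
    """Select the label with the highest score from a predict response."""
    scores = response["predictions"][0]
    best = max(range(len(scores)), key=scores.__getitem__)
    return LABELS[best]
```

The body returned by `build_predict_request` would be sent as an HTTP POST, and the parsed JSON reply passed to `pick_intent`.
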
The location the user is asking about is determined using Stanza or spaCy, depending on the question language. The location found is then compared with the list of locations present in the data of the Robert Koch Institute. If the user asks about a city that is not directly present in the data, a mapping is used to determine the closest hierarchical level (e.g. a district) that can be found in the data.
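
The fallback to a higher hierarchical level can be sketched as a lookup that climbs a parent mapping until it reaches a name that exists in the data. The location names, both tables and the function name are illustrative assumptions, not the project's actual mapping.

```python
from typing import Optional

# Hypothetical sample of location names present in the data source.
KNOWN_LOCATIONS = {"Berlin", "Leipzig", "Landkreis Harz"}

# Hypothetical mapping from places missing in the data to the next
# hierarchical level (here: a district).
PARENT_OF = {"Wernigerode": "Landkreis Harz", "Quedlinburg": "Landkreis Harz"}


def resolve_location(name: str) -> Optional[str]:
    """Return a location usable for a data query, or None if unresolvable."""
    candidate = name.strip()
    seen = set()
    # Climb the hierarchy until a location from the data is reached.
    while candidate not in KNOWN_LOCATIONS:
        if candidate in seen or candidate not in PARENT_OF:
            return None  # unknown place, or a cycle in the mapping
        seen.add(candidate)
        candidate = PARENT_OF[candidate]
    return candidate
```

A `None` result is a natural trigger for a counter-question asking the user to name a different location.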

To extract the desired information from the data source, a time window is also required. By default, this is the entire period of the pandemic. However, SUTime (part of CoreNLP) makes it possible to narrow down the time period. The corresponding component examines the user input for time expressions and converts them into a start and end date. In addition to concrete dates, relative expressions such as “yesterday” or “last week” are also recognised.
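
The conversion from a time expression to a start and end date can be illustrated as follows. This sketch deliberately replaces SUTime with a tiny hand-written rule set, so it covers only two relative expressions and ISO dates; the function name and the assumed default start date are hypothetical.

```python
from datetime import date, timedelta
from typing import Tuple

# Assumed start of the default window; the source only says
# "the entire period of the pandemic".
PANDEMIC_START = date(2020, 1, 1)


def resolve_time_window(expression: str, today: date) -> Tuple[date, date]:
    """Map a time expression to an inclusive (start, end) pair of dates."""
    text = expression.strip().lower()
    if text == "yesterday":
        day = today - timedelta(days=1)
        return day, day
    if text == "last week":
        # Monday of the current week, then one full week back.
        this_monday = today - timedelta(days=today.weekday())
        start = this_monday - timedelta(days=7)
        return start, start + timedelta(days=6)
    try:
        # Concrete dates in ISO format, e.g. "2021-03-15".
        day = date.fromisoformat(text)
        return day, day
    except ValueError:
        # No usable time information: fall back to the whole period.
        return PANDEMIC_START, today
```

Passing `today` explicitly keeps the resolution of relative expressions reproducible, which is useful when testing the component in isolation.
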
With the information obtained, queries are compiled to retrieve the data from the API of the Robert Koch Institute. After retrieving the data, the system stores further information that enables the visualisation of the data. For example, diagrams can be created that show the course of infection over the last 7 days of the queried period.
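
Selecting the data points for such a diagram can be sketched as below. The input format (a dictionary of daily counts keyed by date) and the function name are assumptions made for illustration; the structure of the actual API response is not described here.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple


def last_seven_days(daily_cases: Dict[date, int], start: date, end: date) -> List[Tuple[date, int]]:
    """Return (day, count) pairs for up to the last 7 days of [start, end]."""
    # Never reach back before the start of the queried period.
    first = max(start, end - timedelta(days=6))
    series = []
    for offset in range((end - first).days + 1):
        day = first + timedelta(days=offset)
        # Days missing from the data count as 0 (an assumption of this sketch).
        series.append((day, daily_cases.get(day, 0)))
    return series
```

If the queried period is shorter than a week, the series simply contains fewer points.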

At the end of the pipeline, the Natural Language Generation component creates an answer for the user. The component uses all the acquired data to select an appropriate response template and fill it with content. To give users the feeling that they are talking to a fellow human being, different templates can be used depending on the response type.
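
Template selection and filling might look roughly like the following sketch. The template texts, the intent names and the slot names are invented for illustration and are not taken from the project.

```python
import random
from typing import Dict, List, Optional

# Hypothetical templates; several variants per response type make the
# answers sound less mechanical.
TEMPLATES: Dict[str, List[str]] = {
    "infections": [
        "There were {count} reported infections in {location} between {start} and {end}.",
        "{location} recorded {count} infections from {start} to {end}.",
    ],
    "deaths": [
        "There were {count} reported deaths in {location} between {start} and {end}.",
    ],
}


def generate_answer(intent: str, slots: Dict[str, str], rng: Optional[random.Random] = None) -> str:
    """Pick one template for the response type and fill in the slot values."""
    rng = rng or random.Random()
    template = rng.choice(TEMPLATES[intent])
    return template.format(**slots)
```

Passing a seeded `random.Random` makes the choice of template reproducible, for example in tests.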
 
The system was tested using end-to-end micro-benchmarking. This makes it possible to test the individual components both separately and in the context of the whole pipeline. The advantage of this method is that it can also reveal problematic dependencies between the components. This approach makes it possible to pinpoint weak spots in the pipeline and make appropriate adjustments.
With the addition of the developed system, the C-19 bot can now use the data collected by the Robert Koch Institute (RKI). This enables users to get additional information on infection and death numbers during the Covid-19 pandemic.