Sahu, Surya Prakash

CodeQueries : Benchmarking query answering over source code - Bengaluru : IISc, 2023 - ix, 69 p. col. ill. ; 29.1 cm × 20.5 cm e-Thesis 1003 KB

Includes bibliographical references and index.

MTech (Res); 2023; Computer science and automation

Software developers often ask questions about the security, performance, and maintainability of their code. Through an iterative debugging process, developers analyze the code to find answers to these queries. This process can be viewed as a question-answering task that requires developers to identify code spans satisfying certain properties. Many of these queries can be answered by existing code analysis tools such as CodeQL. However, using such tools requires effort to design, implement, and verify each query. In this work, we propose an alternative to code analysis tools by formulating query answering over source code as a span prediction problem. In the proposed approach, a neural model predicts the answer spans in the code in response to a query. The model also identifies the supporting-fact spans required to justify the predicted answers. Pre-trained language models for code are fine-tuned on CodeQueries, a newly prepared and challenging dataset for query answering over source code. We show that the proposed approach performs well on this task when only the relevant code blocks are provided as input to the model. Experiments on the dataset also show that the approach is robust to noisy span labels and can handle code with minor syntax errors. Large code inputs and limited training examples adversely affect model performance, and we suggest methods to address these issues. Based on our study, we believe that the proposed neural approach can serve as an additional tool in a developer's toolbox for query answering over source code.


Natural Language Processing
Extractive Question-Answering
Code Understanding

005 / SUR