Abstract:
Grammar inference is a major topic in the field of programming languages. To create and enhance the ecosystem of any programming language (which generally consists of optimized compilers, well-defined maintenance, and comprehensive documentation), inferring a grammar for that language is unavoidable. Automated inference of grammars from strings of code has existed for some time, and the field has progressed through various stages: from template-based systems to black-box-dependent models and, more recently, to neural network-based approaches. We introduce a novel technique in this continuum: Large Language Model (LLM)-based systems. This approach leverages the capabilities of large language models to enhance performance, providing a fresh perspective and a potentially transformative impact on grammar inference. We harness LLMs' ability to generalize and to consume domain-specific knowledge, inferring grammars in a context-aware manner.
Our method outperforms the state-of-the-art TREEVADA approach on both precision and recall, improving precision by 8% and recall by 28%, while also making the entire process more generalized and constraint-free. However, our evaluation methodology suffers from the absence of a language entirely unseen by the LLMs. Because our baseline approach, TREEVADA, requires a lexer-parser setup within its oracle, comparing the two methods on such a novel language is currently beyond the scope of our work.
Description:
Supervised by
Mr. Md. Jubair Ibna Mostafa,
Assistant Professor,
Department of Computer Science and Engineering (CSE)
Islamic University of Technology (IUT)
Board Bazar, Gazipur, Bangladesh
This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Software Engineering, 2024.