Abstract:
Grammar inference is a major topic in the field of programming languages. To create and enhance the ecosystem of any programming language (which generally consists of optimized compilers, well-defined maintenance, and comprehensive documentation), inferring a grammar for that language is unavoidable. Automated inference of grammars from strings of code has existed for some time, and the field has progressed through various stages: from template-based systems to black-box-dependent models and, more recently, to neural network-based approaches. We introduce a novel technique in this continuum: Large Language Model (LLM)-based systems. This approach leverages the capabilities of large language models to enhance performance, providing a fresh perspective and a potentially transformative impact on grammar inference. We harness LLMs' ability to generalize and to consume domain-specific knowledge, inferring grammars in a context-aware manner.
Our method outperforms the state-of-the-art TREEVADA approach on both precision and recall, improving precision by 8% and recall by 28%, while also making the entire process more generalized and constraint-free. However, our evaluation methodology suffers from the absence of a language entirely unseen by the LLMs. Because our baseline approach, TREEVADA, requires a lexer-parser setup within its oracle, comparing the two methods on such a novel language is currently beyond the scope of our work.
Description:
Supervised by
Mr. Md. Jubair Ibna Mostafa,
Assistant Professor,
Department of Computer Science and Engineering (CSE)
Islamic University of Technology (IUT)
Board Bazar, Gazipur, Bangladesh
This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Software Engineering, 2024.