ML Model Access and Collaborative Training through Decentralized Data Contribution

Islam, Md Samin Yasar; Khan, Arafat Kabir; Noor, Agni Otulonio

URI: http://hdl.handle.net/123456789/2377

Date: 2024-06-05

Abstract:

Machine learning has been at the heart of various industries by enabling systems to learn from data and improve their performance over time. Its applications span a wide range of fields, from healthcare to finance and entertainment to transportation, and extend into more specialized domains such as Natural Language Processing and sentiment analysis. However, conventional machine learning (ML) approaches suffer from a few fundamental problems: the model must be retrained frequently to stay up to date, training relies heavily on private datasets, and using the model for inference carries a financial cost. Frequent retraining is one of the major financial burdens in preparing an ML model, because the dataset has to be rebuilt and the model retrained each time. Private datasets also raise questions about how the model was trained, and third parties interested in studying or improving the existing model cannot do so because they lack access to the dataset. Finally, companies and other end users of the model must bear expenses to actually use it for inference.
Blockchain technology can help tackle these issues. A blockchain is a distributed ledger that maintains a series of transactions that cannot be changed or, more precisely, are immutable. Each transaction is validated by "miners" or other network participants and recorded in a "block" of data within the blockchain. These blocks document the precise time and sequence of transactions. They are securely linked together, ensuring that no block can be altered or inserted between existing blocks. As new blocks are added, they reinforce the validity of the preceding blocks, thereby strengthening the integrity of the entire blockchain.
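The linking of blocks described above can be illustrated with a minimal Python sketch. It assumes each block is sealed with a SHA-256 hash over a canonical JSON encoding of its contents and stores the hash of its predecessor; the `Block` and `Ledger` names are hypothetical, and mining, consensus, and networking are deliberately left out, so this is not how the thesis's blockchain (or any production ledger) is implemented.

```python
import hashlib
import json
import time
from dataclasses import dataclass


@dataclass
class Block:
    index: int
    timestamp: float
    transactions: list
    prev_hash: str
    hash: str = ""

    def compute_hash(self) -> str:
        # Assumption: SHA-256 over a canonical (sorted-key) JSON encoding of
        # every field except the stored hash itself.
        payload = json.dumps(
            {
                "index": self.index,
                "timestamp": self.timestamp,
                "transactions": self.transactions,
                "prev_hash": self.prev_hash,
            },
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class Ledger:
    """Hypothetical append-only chain; no mining, consensus, or networking."""

    def __init__(self) -> None:
        genesis = Block(index=0, timestamp=0.0, transactions=[], prev_hash="0" * 64)
        genesis.hash = genesis.compute_hash()
        self.chain = [genesis]

    def add_block(self, transactions: list) -> Block:
        prev = self.chain[-1]
        block = Block(prev.index + 1, time.time(), list(transactions), prev.hash)
        block.hash = block.compute_hash()  # seal the block's contents
        self.chain.append(block)
        return block

    def is_valid(self) -> bool:
        for i, block in enumerate(self.chain):
            # Any edit to a block's contents changes its recomputed hash.
            if block.hash != block.compute_hash():
                return False
            # Each block must point at the hash of the block right before it,
            # so a block slipped in between breaks the link that follows it.
            if i > 0 and block.prev_hash != self.chain[i - 1].hash:
                return False
        return True


if __name__ == "__main__":
    ledger = Ledger()
    ledger.add_block([{"contributor": "alice", "sample": "text-1"}])
    ledger.add_block([{"contributor": "bob", "sample": "text-2"}])
    print(ledger.is_valid())  # True
    ledger.chain[1].transactions[0]["sample"] = "forged"
    print(ledger.is_valid())  # False
```

Tampering with a recorded transaction or splicing a block into the middle both surface as a failed check, which is the property the abstract relies on; in a real network, it is the miners or validating peers, not a single `is_valid` call, that enforce it.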
Blockchain is therefore highly secure and trustworthy because of its robust design, and its distributed nature also makes it accessible and transparent to everyone. The objective of this study is to enhance trust, transparency, and community engagement in the development of machine learning models through decentralised computing and collaborative learning, which is where blockchain supports this research. This study evaluates a Naive Bayes classifier using a decentralised approach and compares its results with those of a Sparse Perceptron model from previous research. Benchmarks run with Hyperledger Caliper show that the Naive Bayes model outperforms the Sparse Perceptron model in throughput, average latency, and peak latency. The study's findings demonstrate that the field of machine learning could be enhanced by adopting a decentralised, shared, and collaborative approach.
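The abstract does not say how the Naive Bayes model is stored or updated on the ledger. As a sketch under that uncertainty, the Python below shows a multinomial Naive Bayes classifier with additive (Laplace) smoothing whose entire state is a set of counters, so each data contribution is a handful of increments rather than a retraining pass. The class `IncrementalNaiveBayes` and its `contribute` method are hypothetical names, the code runs entirely off-chain, and it should not be read as the thesis's implementation.

```python
import math
from collections import defaultdict


class IncrementalNaiveBayes:
    """Hypothetical multinomial Naive Bayes whose whole state is counters."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha                      # additive (Laplace) smoothing constant
        self.class_counts = defaultdict(int)    # label -> number of contributed samples
        self.feature_counts = defaultdict(lambda: defaultdict(int))  # label -> token -> count
        self.total_tokens = defaultdict(int)    # label -> total tokens for that label
        self.vocab = set()

    def contribute(self, tokens, label) -> None:
        # One contribution only increments counters; earlier data is never revisited.
        self.class_counts[label] += 1
        for tok in tokens:
            self.feature_counts[label][tok] += 1
            self.total_tokens[label] += 1
            self.vocab.add(tok)

    def predict(self, tokens):
        n_samples = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / n_samples)  # log prior
            denom = self.total_tokens.get(label, 0) + self.alpha * v
            for tok in tokens:
                if tok not in self.vocab:
                    continue  # skip tokens no contributor has ever supplied
                count = self.feature_counts.get(label, {}).get(tok, 0)
                score += math.log((count + self.alpha) / denom)  # log likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    nb = IncrementalNaiveBayes()
    nb.contribute(["great", "movie"], "pos")
    nb.contribute(["awful", "plot"], "neg")
    nb.contribute(["great", "acting"], "pos")
    print(nb.predict(["great", "plot"]))  # pos
```

Because a contribution touches only a few counters, each update is small and cheap to record, which is one plausible reason a count-based model could show better throughput and latency than a weight-based model in a ledger benchmark; the abstract itself does not state this explanation.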

Description:

Supervised by Dr. Md. Azam Hossain, Associate Professor, Department of Computer Science and Engineering (CSE), Islamic University of Technology (IUT), Board Bazar, Gazipur, Bangladesh. This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Software Engineering, 2024.
