Abstract:
Zero-shot Semantic Segmentation (ZS3) is a daunting task, as it requires
segmenting objects into classes that were never seen during training. One popular
approach divides ZS3 into two sub-tasks: generating mask proposals and assigning class labels to the pixels inside those regions. However, many existing
approaches struggle to produce masks with sufficient generalization capability, resulting in notable performance limitations, particularly on unseen
classes. In this regard, we propose using “Dynamic Kernels” to improve object
understanding within a ZS3 model during training. By harnessing the intrinsic inductive biases of these kernels, we aim to produce superior mask proposals
that represent objects more accurately. These dynamic kernels adapt based on
data from the seen classes, allowing them to gain insights into unseen objects. In addition,
for segment classification, our proposed system utilizes the Contrastive Language-Image Pre-Training (CLIP) architecture. This integration improves the model’s
generalizability by leveraging CLIP’s cross-modal training capabilities. Combining
dynamic kernels with CLIP proves advantageous, as it
allows for finer granularity in processing, yielding performance gains
on both seen and unseen classes. Our proposed ZSK-Net surpasses existing
state-of-the-art methods, achieving remarkable improvements of +10.4 and
+0.9 in hIoU on the Pascal VOC and COCO-Stuff datasets, respectively.
Description:
Supervised by
Dr. Md. Hasanul Kabir,
Professor,
Co-Supervised by
Sabbir Ahmed,
Assistant Professor,
Department of Computer Science and Engineering (CSE),
Islamic University of Technology (IUT),
Board Bazar, Gazipur-1704, Bangladesh