Industrial defect detection and localization using large vision-language model

Qin, Hao

Full metadata record

DC Field	Value	Language
dc.contributor	Department of Electrical and Electronic Engineering	en_US
dc.creator	Qin, Hao	-
dc.identifier.uri	https://theses.lib.polyu.edu.hk/handle/200/13900	-
dc.language	English	en_US
dc.publisher	Hong Kong Polytechnic University	en_US
dc.rights	All rights reserved	en_US
dc.title	Industrial defect detection and localization using large vision-language model	en_US
dcterms.abstract	Industrial Anomaly Detection (IAD) is crucial for identifying defects in manufacturing processes, where anomalies often signify potential failures that can lead to significant losses. However, real-world anomaly detection faces challenges, particularly the scarcity of defective samples due to the rarity of abnormal conditions in actual production. Traditional methods combining machine learning and image processing often fall short of industrial requirements. In recent years, deep learning models, particularly those leveraging vision-language approaches like CLIP, have demonstrated remarkable progress in zero-shot and few-shot anomaly detection.	en_US
dcterms.abstract	This dissertation explores the application of a CLIP-based framework for industrial defect detection and localization. We propose a novel approach that enhances the existing Prompt Learning paradigm, addressing its limitations in detecting fine-grained anomalies. The key contributions of this work include: 1) designing a category-independent prompt template that avoids semantic interference by focusing on anomalies rather than object-specific descriptions, and 2) introducing learnable token embeddings into the text encoder to refine textual representation and improve alignment with anomaly semantics.	en_US
dcterms.abstract	Our model combines these improvements with advanced attention mechanisms to enhance both global and local feature representation. Experiments were conducted on two benchmark datasets, MVTec AD and VisA, under zero-shot and few-shot settings. The results demonstrate that our approach achieves state-of-the-art performance, with an AUPRO of 91.5 on MVTec AD and 83.5 on VisA, outperforming existing methods like AnomalyCLIP and WinCLIP. Visual comparisons of segmentation maps further confirm the model's superior ability to capture fine-grained and localized defect patterns.	en_US
dcterms.abstract	While our method represents a step forward in unsupervised IAD, further research is needed to meet the high accuracy demands of industrial applications. Future work could focus on leveraging incremental data for model fine-tuning and exploring the broader potential of CLIP-based models in industrial settings.	en_US
dcterms.extent	vii, 45 pages : color illustrations	en_US
dcterms.isPartOf	PolyU Electronic Theses	en_US
dcterms.issued	2024	en_US
dcterms.educationalLevel	M.Sc.	en_US
dcterms.educationalLevel	All Master	en_US
dcterms.LCSH	Anomaly detection (Computer science)	en_US
dcterms.LCSH	Machine learning -- Industrial applications	en_US
dcterms.LCSH	Quality control	en_US
dcterms.LCSH	Hong Kong Polytechnic University -- Dissertations	en_US
dcterms.accessRights	restricted access	en_US

Files in This Item:

File	Description	Size	Format
8308.pdf	For All Users (off-campus access for PolyU Staff & Students only)	6.52 MB	Adobe PDF

Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/13900