Author: Chen, Xutao
Title: Leveraging CLIP and deep learning for robust cataract surgery phase classification
Degree: M.Sc.
Year: 2024
Department: Department of Electrical and Electronic Engineering
Pages: v, 44 pages : color illustrations
Language: English
Abstract: • Objective:
This study investigates the effectiveness of large pretrained vision-language models, specifically CLIP, for cataract surgery phase classification. We evaluate CLIP's capability in feature extraction from surgical videos and propose a novel phase recognition framework that leverages CLIP embeddings for accurate and efficient classification.
• Material and Methods:
This study utilizes the CATARACT dataset, comprising 50 annotated cataract surgery videos with labeled frames corresponding to distinct surgical phases. We propose a two-step approach for cataract phase classification. First, the CLIP model extracts features from all video frames, and a Siamese network performs similarity analysis on these features to detect phase transitions, segmenting each video into multiple phases. Second, multiple instance learning classifies the resulting video segments, leveraging the similarity-based splitting for robust phase recognition.
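The transition-detection step above can be illustrated with a minimal sketch. This is not the thesis implementation: it assumes frame embeddings have already been extracted (here replaced by synthetic vectors), and it stands in for the Siamese network with a simple cosine-similarity comparison of consecutive frames, flagging a candidate phase boundary wherever similarity drops below a threshold. The function name and threshold value are illustrative assumptions.

```python
import numpy as np

def detect_phase_transitions(embeddings, threshold=0.9):
    """Return indices where cosine similarity between consecutive frame
    embeddings drops below `threshold`, i.e. candidate phase boundaries.
    `embeddings` is an (n_frames, dim) array of per-frame features."""
    # L2-normalize so the row-wise dot product equals cosine similarity.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = np.sum(normed[:-1] * normed[1:], axis=1)
    return [i + 1 for i, s in enumerate(sims) if s < threshold]

# Synthetic example: two "phases" whose embeddings cluster around
# different directions, with a clear shift at frame index 5.
rng = np.random.default_rng(0)
phase_a = rng.normal(0.0, 0.01, size=(5, 8)) + np.eye(8)[0]
phase_b = rng.normal(0.0, 0.01, size=(5, 8)) + np.eye(8)[1]
frames = np.vstack([phase_a, phase_b])
print(detect_phase_transitions(frames))  # boundary detected at index 5
```

In the actual framework a learned Siamese network replaces the raw cosine threshold, but the segmentation logic, splitting the video wherever consecutive-frame similarity collapses, is the same.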
• Results:
The proposed method achieved an average video-level classification accuracy of 74.8% across 25 test videos from the CATARACT dataset. These results outperform existing state-of-the-art methods and highlight the effectiveness and consistency of the approach in accurately classifying cataract surgery phases.
• Discussion:
Our results indicate that the CLIP-extracted features effectively capture both the changes in frame categories and the temporal variations within the video, as demonstrated by the t-SNE analysis. This suggests that these features have strong potential for detecting phase transitions and ensuring accurate phase classification in cataract surgery videos. The ability to distinguish both spatial and temporal aspects of the data highlights the robustness of the method in addressing the challenges of phase detection.
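A t-SNE inspection of the kind described above can be sketched as follows. This is a hedged illustration, not the thesis code: the CLIP embeddings are replaced by synthetic clusters, and the perplexity value is an arbitrary choice for this toy data size.

```python
import numpy as np
from sklearn.manifold import TSNE

# Synthetic stand-in for CLIP frame embeddings from two surgical phases.
rng = np.random.default_rng(1)
emb = np.vstack([
    rng.normal(0.0, 0.1, size=(10, 16)) + 1.0,  # phase 1 cluster
    rng.normal(0.0, 0.1, size=(10, 16)) - 1.0,  # phase 2 cluster
])

# Project the high-dimensional embeddings to 2-D; well-separated phases
# should appear as distinct clusters in the projection.
coords = TSNE(n_components=2, perplexity=5, random_state=0).fit_transform(emb)
print(coords.shape)  # one 2-D point per frame
```

Plotting `coords` colored by phase label is then a quick visual check of whether the features separate phases, as the t-SNE analysis in the thesis is reported to show.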
• Conclusion:
These findings underscore the potential of integrating CLIP-based feature extraction with deep learning models for robust cataract phase classification. The proposed framework provides a fast, effective, and promising solution for surgical assistance systems.
Rights: All rights reserved
Access: restricted access

Files in This Item:
File: 8294.pdf
Description: For All Users (off-campus access for PolyU Staff & Students only)
Size: 5.59 MB
Format: Adobe PDF


Copyright Undertaking

As a bona fide Library user, I declare that:

  1. I will abide by the rules and legal ordinances governing copyright regarding the use of the Database.
  2. I will use the Database for the purpose of my research or private study only and not for circulation or further reproduction or any other purpose.
  3. I agree to indemnify and hold the University harmless from and against any loss, damage, cost, liability or expenses arising from copyright infringement or unauthorized usage.

By downloading any item(s) listed above, you acknowledge that you have read and understood the copyright undertaking as stated above, and agree to be bound by all of its terms.

Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/13888