Author: Chen, Yiliang
Title: Surgical action triplet recognition via triplet disentanglement
Advisors: Qin, Jing (SN)
Degree: M.Phil.
Year: 2024
Subject: Robotics in medicine
Surgery -- Data processing
Deep learning (Machine learning)
Hong Kong Polytechnic University -- Dissertations
Department: School of Nursing
Pages: xiii, 70 pages : color illustrations
Language: English
Abstract: This study investigates the analysis of surgical triplets in robotic procedure videos. The increasing integration of robotic techniques in minimally invasive surgeries necessitates systematic approaches for procedure interpretation. The research addresses three principal components of a surgical triplet: the instrument, the action, and the targeted organ, as well as their interrelationships, which collectively characterize the surgical process.
However, recognizing surgical triplets presents distinct technical challenges. Surgical scenes are visually complex: instruments overlap, tools sit at the periphery of the frame, and blood and surgical smoke cause occlusions. The difficulty increases with subtle variations between similar actions and instruments, combined with non-operative movements. Learning to handle these challenges directly through end-to-end network training is extremely difficult because of the intricate relationships and dependencies between triplet components. In addition, current datasets impose further constraints, including annotation inconsistencies and temporal sparsity at 1 frame per second (fps), which limits the available temporal information. Moreover, only one dataset is available in this domain, which makes it hard to validate the effectiveness of proposed methods comprehensively.
To address these challenges, our methodology consists of two main parts. The first implements a triplet disentanglement model that breaks down the complex recognition task into smaller, manageable sub-tasks. By solving these sub-tasks sequentially, this approach effectively addresses the high learning difficulty inherent in directly modeling the intricate relationships between triplet components in surgical activity recognition.
The second part involves dataset diversification and enhancement through the development of a new surgical triplet dataset from a public prostate surgery dataset. This expansion beyond existing datasets addresses the limited validation scope in surgical triplet recognition. The new dataset features more comprehensive annotations that help resolve action ambiguity issues present in existing triplet datasets. Additionally, to address the temporal sparsity challenge from 1 fps sampling, we enhanced our disentanglement model with a temporal smoothness loss function. This method improves prediction consistency across sequential frames and video segments, effectively incorporating temporal context into surgical procedure interpretation.
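As an illustration of the idea behind a temporal smoothness loss, the sketch below penalizes frame-to-frame changes in per-frame class probabilities. It is a minimal example under stated assumptions, not the formulation used in the thesis: the function name `temporal_smoothness_loss`, the squared-difference penalty, and the input layout (one list of class probabilities per frame) are all hypothetical choices made here for clarity.

```python
from typing import Sequence


def temporal_smoothness_loss(probs: Sequence[Sequence[float]]) -> float:
    """Mean squared change in class probabilities between consecutive frames.

    probs[t][c] is the predicted probability of class c at frame t.
    This layout and the squared-difference penalty are assumptions for
    illustration; the thesis abstract does not specify the exact form.
    """
    if len(probs) < 2:
        return 0.0  # a single frame has no neighbour to compare against

    total = 0.0
    count = 0
    for prev, curr in zip(probs, probs[1:]):
        if len(prev) != len(curr):
            raise ValueError("every frame must have the same number of classes")
        for p, c in zip(prev, curr):
            total += (c - p) ** 2
            count += 1
    return total / count if count else 0.0


# Hypothetical predictions for three consecutive frames and two classes.
steady = [[0.9, 0.1], [0.8, 0.2], [0.9, 0.1]]
flicker = [[0.9, 0.1], [0.1, 0.9], [0.9, 0.1]]
print(temporal_smoothness_loss(steady) < temporal_smoothness_loss(flicker))
```

In a training setup, a term of this kind would typically be added to the main classification loss with a small weight, so that predictions that flicker between adjacent frames are discouraged without overriding the per-frame evidence.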
Both methods are validated experimentally on the new dataset and on the existing public dataset, and comparative analyses characterize the performance of the proposed methods against existing methods across these datasets. These approaches contribute to applications in surgical video analysis and procedural training assessment.
Rights: All rights reserved
Access: open access

Files in This Item:
File | Description | Size | Format
7985.pdf | For All Users | 11.43 MB | Adobe PDF




Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/13535