Author: Huo, Shengzeng
Title: Manipulating deformable objects by learning-based representations, planning and control
Advisors: Navarro-Alarcon, David (ME)
Degree: Ph.D.
Year: 2025
Department: Department of Mechanical Engineering
Pages: xxvi, 172 pages : color illustrations
Language: English
Abstract: The manipulation of deformable objects has emerged as a prominent area of research within robotic manipulation, owing to its extensive range of applications. In contrast to rigid objects, deformable objects are significantly more challenging to manipulate due to their infinite degrees of freedom and complex non-linear dynamics. Despite substantial advancements in this field in recent years, many approaches rely on strong assumptions, such as the presence of an explicit goal, structured environments, and pre-grasping. These assumptions considerably limit their applicability in real-world scenarios.
The objective of this thesis is to enhance the ability of robots to manipulate deformable objects in real-world environments. Specifically, we aim to remove strong assumptions in the field of Deformable Object Manipulation (DOM), enabling robots to solve real problems in practical environments. We focus on two types of objects commonly encountered in everyday life: Deformable Linear Objects (DLOs) and cloth-like items. Given the challenges involved in accurately modeling deformable objects, we propose a variety of data-driven, learning-based methods to improve the robustness of the controller. Three aspects are taken into consideration: representation, planning, and control.
In the first study, we address the problem of contact-based manipulation of DLOs towards desired shapes with a dual-arm robotic system. To alleviate the burden of high-dimensional continuous state-action spaces, we model DLOs as kinematic multibody systems via our proposed keypoint encoding network. This novel encoding is trained on a synthetic labeled image dataset without requiring any manual annotations and can be directly transferred to real manipulation scenarios. Our goal-conditioned policy efficiently rearranges the configuration of the DLO based on the keypoints. The proposed hierarchical action framework tackles the manipulation problem in a coarse-to-fine manner (with high-level task planning and low-level motion control) by leveraging two action primitives. The identification of deformation properties is bypassed since the algorithm replans its motion after each bimanual execution. Experimental results reveal that our method achieves high performance in state representation and shaping manipulation of DLOs under environmental constraints.
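A minimal Python sketch of this replan-after-each-execution loop is given below; all function names (get_image, encode_keypoints, plan_primitive, execute) are hypothetical placeholders standing in for the learned and robotic components described above, not the thesis implementation.

    import numpy as np

    def replan_and_execute(get_image, encode_keypoints, plan_primitive,
                           execute, goal_kpts, max_steps=20, tol=0.02):
        """Coarse-to-fine shaping loop: re-encode keypoints after every
        bimanual execution instead of identifying deformation properties."""
        for _ in range(max_steps):
            kpts = encode_keypoints(get_image())            # (K, 2) keypoint state
            err = np.linalg.norm(kpts - goal_kpts, axis=1).mean()
            if err < tol:                                   # shape close enough to goal
                return True
            # High-level planner selects one of the two action primitives
            primitive, params = plan_primitive(kpts, goal_kpts)
            execute(primitive, params)                      # low-level motion control
        return False

Because the state is re-encoded from a fresh image at every iteration, modeling error in the deformation simply appears as residual keypoint error and is corrected by the next replanning step.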
In the second study, we propose a self-supervised planning and control approach to address the challenge of rearranging DLOs for implicit goals. Most existing research in this area focuses on shape control for a provided explicit goal and does not consider physical constraints, which limits its applicability in many real-world scenarios. Specifically, we consider the context of making both ends of the object reachable (inside the robot's access range) and graspable (outside potential collision regions) by dual-arm robots. First, we describe the object with sequential keypoints and parameterize the correspondence-based action. Second, we develop a generator capable of producing multiple explicit targets that adhere to the implicit conditions. Third, we learn value models to select the most promising explicit target as guidance and determine the goal-conditioned action. All models within our policy are trained in a self-supervised manner on data collected from simulations. Importantly, the learned policy can be directly applied to real-world settings since we do not rely on accurate dynamic models. We validate the performance of our method with simulations and real-world experiments.
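The three-stage policy can be illustrated with the hypothetical Python sketch below; generator, value_model, and policy are placeholder names for the learned models described above, with assumed signatures chosen for illustration only.

    import numpy as np

    def select_target_and_act(kpts, generator, value_model, policy,
                              n_candidates=32):
        """Sample explicit targets satisfying the implicit goal, score them
        with a learned value model, and act toward the most promising one."""
        candidates = [generator.sample(kpts) for _ in range(n_candidates)]
        scores = np.array([value_model(kpts, g) for g in candidates])
        best_goal = candidates[int(np.argmax(scores))]      # assigned explicit target
        return policy(kpts, best_goal)                      # goal-conditioned action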
In the third study, we address the problem of contact-rich fabric manipulation. Considering that fabrics must make contact with different rigid objects in different tasks, we solve this problem efficiently with imitation learning. Specifically, robots imitate human behavior from a single demonstration video immediately, without the need for additional training or exploration. To accomplish this objective, we formulate the problem as adapting to a new scene based on prior knowledge learned in a meta scene within the task family. Our approach consists of three key modules: (1) learning general prior knowledge through random exploration in simulation, including state representations, dynamic models, and the constrained action space of the task; (2) extracting a state alignment-based reward function from a single demonstration video; and (3) real-time optimization of the imitation policy under systematic safety constraints using sampling-based model predictive control (MPC). By following this strategy, we establish an efficient one-shot imitation-from-video approach that simplifies the learning and execution of robot skills in real-world applications. Furthermore, since we do not assume strong dynamic consistency between scenes, the priors can be learned in simulation, avoiding the need to collect data in the real world. To evaluate the effectiveness of our approach, we focus on contact-rich fabric manipulation, a common scenario in industrial and domestic tasks. Through detailed numerical simulations and real-world hardware experiments, we demonstrate that our method enables rapid skill acquisition for challenging manipulation tasks.
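As a rough illustration of the third module, one step of random-shooting, sampling-based MPC with a state-alignment reward could be sketched as follows; the learned dynamics model, the demonstration state sequence, and all dimensions are assumptions made for the sketch, and the safety constraints are omitted for brevity.

    import numpy as np

    def sampling_mpc_step(state, dynamics, demo_states, horizon=5,
                          n_samples=64, action_dim=4, bound=0.05):
        """Roll sampled action sequences through the learned dynamics and
        score them by alignment with the demonstration state sequence."""
        seqs = np.random.uniform(-bound, bound,
                                 size=(n_samples, horizon, action_dim))
        returns = np.zeros(n_samples)
        for i, seq in enumerate(seqs):
            s = state
            for t, a in enumerate(seq):
                s = dynamics(s, a)                          # learned dynamics model
                ref = demo_states[min(t, len(demo_states) - 1)]
                returns[i] -= np.linalg.norm(s - ref)       # state-alignment reward
        return seqs[int(np.argmax(returns))][0]             # execute first action only

Only the first action of the best-scoring sequence is executed before replanning, which is what keeps the optimization closed-loop.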
In the fourth study, we focus on the task of hanging crumpled garments on a rack, a common scenario in household environments. This context presents two primary challenges: (1) effectively perceiving and grasping the collars of garments that exhibit severe deformations and self-occlusions; (2) accurately aligning the collars of garments with the supporting parts of the rack from a partial egocentric view. To address these challenges, we propose a confidence-guided grasping strategy that actively searches for the collars of garments. Specifically, we utilize handovers between dual robotic arms to automatically collect real-world data for the collar detection network. During deployment, we compute the grasping pose using depth-aware contour extraction and introduce a closed-loop evaluation method to assess the success of the grasp. Additionally, we address the task of hanging garments on the rack by aligning with a predefined pose in the demonstration and retrieving the interaction trajectory. To achieve precise alignment between the collar and the supporting item in spatial contexts, we propose a two-layered hanging strategy that comprises a coarse approach followed by a fine transformation. We conduct extensive experiments and demonstrate significant improvements of our framework over existing garment grasping and active pose alignment methods in terms of success rate.
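The confidence-guided search can be pictured with the hypothetical loop below; observe, detect_collar, regrasp_and_flip, grasp_at, and verify_grasp are placeholder names for the perception and manipulation routines described above, not the system's actual interfaces.

    def confidence_guided_grasp(observe, detect_collar, regrasp_and_flip,
                                grasp_at, verify_grasp,
                                conf_thresh=0.8, max_attempts=5):
        """Actively re-manipulate the garment until the collar detector is
        confident, then grasp and verify the grasp in closed loop."""
        for _ in range(max_attempts):
            rgbd = observe()
            collar, conf = detect_collar(rgbd)   # network trained on handover data
            if conf < conf_thresh:
                regrasp_and_flip()               # dual-arm handover exposes the collar
                continue
            pose = grasp_at(collar, rgbd)        # depth-aware contour-based grasp pose
            if verify_grasp(pose):               # closed-loop success evaluation
                return pose
        return None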
In conclusion, the detailed physical experiments and demonstrations validate the practicality of our proposed methods. These strong results show that our proposed frameworks enhance robots' capabilities in DOM.
Rights: All rights reserved
Access: open access

Files in This Item:
8135.pdf (For All Users): 13.14 MB, Adobe PDF



Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/13689