Author: | Hu, Zhejing |
Title: | Computational creativity in music : music generation and style transfer |
Advisors: | Liu, Yan (COMP) |
Degree: | Ph.D. |
Year: | 2023 |
Subject: | Computer composition (Music) ; Composition (Music) ; Creative ability ; Hong Kong Polytechnic University -- Dissertations |
Department: | Department of Computing |
Pages: | xvii, 123 pages : color illustrations |
Language: | English |
Abstract: | Computational creativity, an interdisciplinary field blending artificial intelligence and psychology, simulates creative thought and behavior. The field has seen remarkable advances in artistic creation in recent years, such as transforming a photograph captured with a smartphone into a Picasso-style painting using deep learning techniques. Studies in computational artistic creation highlight intriguing aspects of human intelligence and hold great potential for advancing true machine intelligence. Among the various subareas of artistic creation with computational models, this thesis focuses on computational creativity in music, covering an array of valuable and challenging problems. For instance, music generation requires the study of abstract musical structure, while music style transfer requires studying content and style features of music in different application scenarios. Despite current advances, music generation and music style transfer still have room for improvement, necessitating new techniques. We therefore propose a novel framework that improves the performance of music generation and music style transfer via computational creativity models.

First, we propose a Motif-to-music Generation Model (MGM) that studies the structure of music at the motif level and generates a complete piece of music from a few key motifs. MGM contains a motif-level repetition generator (MRG) and an outline-to-music generator (O2MG): MRG learns to generate motif-level repetitions, while O2MG learns to generate a complete piece of music from the music's outline. MGM is trained on a new music repetition dataset (MRD), mitigating the tendency of machine-composed music to be random and structureless. Both subjective and objective experimental results demonstrate that MGM can generate varied motif-level repetitions and complete pieces of music of high quality.

Second, we introduce a novel Call-Response Generator (CRG) featuring a knowledge-enhanced mechanism that studies the structure of music at the phrase level and generates a response to a given musical call. We manually segment 909 pop songs and label 19,155 call-response pairs, design a knowledge-enhancement module that selects instructive training data based on rhythm, melody, and harmony quality, and train the composition module on the call-response pairs, augmenting it with musical knowledge. Our experiments show that the proposed model generates a wide variety of engaging and appealing responses for different musical calls, enhancing the quality of machine-composed music.

Third, we introduce a novel transfer model for a new music style transfer task, therapeutic music transfer. The limited availability of music pieces suited to different individuals has long posed a challenge in music therapy. We create therapeutic music from input music even when data samples are scarce. Given the scarcity of both therapeutic music and a user's favorite music, we design a new domain adaptation algorithm that transfers what is learned for music genre classification to therapeutic music transfer, and then apply a joint minimization technique to optimize the output music. Subjective experiments with anxiety sufferers show that the customized therapeutic music achieves better and more stable performance in anxiety reduction.
Fourth, we introduce the User Preference Transformer (UP-Transformer) for a new music style transfer task, User Preference Music Transfer (UPMT). This task tailors music to a user's individual preferences, which can significantly enhance musical diversity and improve mental health. The UP-Transformer is a combined model that uses deep learning for training and relies, at inference, on prior knowledge drawn from a single piece of music favored by the user. We improve a Transformer-based model by introducing a new loss function, the favorite-aware loss, which takes the user's favorite music into account. During inference, we perform UPMT using the musical knowledge extracted from the user's favorite piece. To address the problem of evaluating melodic similarity in music style transfer, we propose a new metric, pattern similarity (PS), to assess the similarity between two pieces of music. Both objective and subjective evaluations demonstrate that music transferred with the proposed method outperforms traditional methods in musicality, similarity, and user preference. |
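Note on the third contribution: the abstract mentions a joint minimization step that optimizes the output music after domain adaptation from genre classification, but does not spell it out. The sketch below is a minimal, hypothetical reading, assuming a differentiable piano-roll-style representation `x`, a pretrained genre classifier adapted to a therapeutic/non-therapeutic head, and illustrative weights `alpha` and `beta`; none of these names come from the thesis.

```python
import torch
import torch.nn.functional as F

def therapeutic_transfer(x_input, classifier, therapeutic_class=1,
                         steps=200, lr=0.05, alpha=1.0, beta=0.5):
    """Hypothetical joint minimization: push the classifier toward the
    therapeutic class (style term) while staying close to the user's
    input music (content term).
    x_input: (1, ...) differentiable music representation, e.g. a piano roll.
    classifier: maps x to (1, num_classes) logits."""
    x = x_input.detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    target = torch.tensor([therapeutic_class])
    for _ in range(steps):
        opt.zero_grad()
        style = F.cross_entropy(classifier(x), target)  # sound "therapeutic"
        content = F.mse_loss(x, x_input)                # preserve the input piece
        (alpha * style + beta * content).backward()
        opt.step()
    return x.detach()
```

In this reading, the adapted genre classifier supplies the style signal, which is one plausible way to reuse a model trained on plentiful genre-labeled data in a low-data therapeutic setting.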
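Note on the fourth contribution: the favorite-aware loss is named but not defined in the abstract. The following is a minimal sketch of one plausible formulation for a token-based music Transformer: standard next-token cross-entropy plus a term rewarding probability mass on tokens that occur in the user's favorite piece. The function name, `lam`, and `favorite_tokens` are hypothetical.

```python
import torch
import torch.nn.functional as F

def favorite_aware_loss(logits, targets, favorite_tokens, lam=0.1):
    """logits: (batch, seq, vocab); targets: (batch, seq);
    favorite_tokens: 1-D LongTensor of token ids from the favorite piece."""
    vocab = logits.size(-1)
    # Standard next-token cross-entropy.
    ce = F.cross_entropy(logits.reshape(-1, vocab), targets.reshape(-1))
    # Indicator over the vocabulary for tokens in the favorite piece.
    mask = torch.zeros(vocab, device=logits.device)
    mask[favorite_tokens] = 1.0
    # Average probability assigned to favorite-piece tokens; maximizing
    # it biases generation toward the user's favorite material.
    fav_mass = (logits.softmax(dim=-1) * mask).sum(dim=-1).mean()
    return ce - lam * torch.log(fav_mass + 1e-8)
```

The weight `lam` would trade off fidelity to the training corpus against the pull toward the favorite piece.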
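Likewise, the pattern similarity (PS) metric is only named above. A minimal sketch of one plausible pattern-based formulation, assuming melodies given as MIDI pitch sequences: count shared pitch-interval n-grams (transposition-invariant melodic patterns), normalized by the n-gram count of the transferred piece. All names and the choice n=4 are assumptions for illustration.

```python
from collections import Counter

def intervals(pitches):
    # Pitch intervals make the comparison transposition-invariant.
    return [q - p for p, q in zip(pitches, pitches[1:])]

def ngrams(seq, n):
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]

def pattern_similarity(transferred, reference, n=4):
    grams_t = Counter(ngrams(intervals(transferred), n))
    grams_r = Counter(ngrams(intervals(reference), n))
    shared = sum((grams_t & grams_r).values())  # multiset intersection
    return shared / max(sum(grams_t.values()), 1)

# A melody compared with its exact transposition scores 1.0.
print(pattern_similarity([60, 62, 64, 65, 67, 65, 64],
                         [65, 67, 69, 70, 72, 70, 69]))
```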
Rights: | All rights reserved |
Access: | open access |
Please use this identifier to cite or link to this item:
https://theses.lib.polyu.edu.hk/handle/200/12741