Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor | Department of Computing | en_US |
| dc.contributor.advisor | Yiu, Man Lung Ken (COMP) | en_US |
| dc.creator | Wong, Tsz Kan | - |
| dc.identifier.uri | https://theses.lib.polyu.edu.hk/handle/200/14132 | - |
| dc.language | English | en_US |
| dc.publisher | Hong Kong Polytechnic University | en_US |
| dc.rights | All rights reserved | en_US |
| dc.title | Development of a data analytics assistant using large language models | en_US |
| dcterms.abstract | The rapid advancement of Large Language Models (LLMs) has significantly influenced various domains, including data science and business analytics. This dissertation explores the transformative potential of LLMs in democratizing access to data insights, with a focus on simplifying data querying and analysis for non-technical business users. The research investigates two primary questions: how LLMs can be leveraged to answer business questions with data analytics reasoning, and how LLMs can be adapted to generate SQL queries from natural language inputs. | en_US |
| dcterms.abstract | The study presents the development of AutoAnalyst, a system that uses LLMs to automate data analysis and provide insights directly from complex datasets through a natural language interface. In addition, the study improves the Text-to-SQL module by introducing and evaluating several LLM-based strategies for Text-to-SQL query generation, integrating methods such as LangChain, C3, and DIN-SQL to address challenges in schema comprehension and query accuracy. | en_US |
| dcterms.abstract | The experimental results demonstrate that the AutoAnalyst system, powered by GPT-4, significantly outperforms the existing baseline ChatGPT system in terms of completion rates and resource cost. In addition, a comparative analysis between human data analysts and the AutoAnalyst system reveals that the latter can complete tasks at a fraction of the cost, providing substantial operational savings for businesses. Results of further experiments conducted on benchmarks such as the Spider and BIRD datasets also demonstrate that the integrated C3+DIN strategy with GPT-4 offers high accuracy in SQL generation, although at a higher computational cost. The findings highlight the trade-offs between accuracy, efficiency, and cost in real-world applications of LLMs for data analytics. | en_US |
| dcterms.abstract | This research contributes to the evolving field of artificial intelligence in data analytics by offering insights into the development of more accessible, scalable, and powerful data analytics tools. It highlights the potential of LLMs to transform data-driven decision-making processes, making advanced data analysis capabilities available to a wider range of users. | en_US |
| dcterms.extent | xiv, 134 pages : color illustrations | en_US |
| dcterms.isPartOf | PolyU Electronic Theses | en_US |
| dcterms.issued | 2025 | en_US |
| dcterms.educationalLevel | Eng.D. | en_US |
| dcterms.educationalLevel | All Doctorate | en_US |
| dcterms.accessRights | restricted access | en_US |
Files in This Item:
| File | Description | Size | Format |
|---|---|---|---|
| 8583.pdf | For All Users (off-campus access for PolyU Staff & Students only) | 16.8 MB | Adobe PDF |
Please use this identifier to cite or link to this item:
https://theses.lib.polyu.edu.hk/handle/200/14132

