Web-based data mining and discovery of useful herbal ingredients (WD²UHI)

Pao Yue-kong Library Electronic Theses Database


Author: Wong, Ho Kei Jackei
Title: Web-based data mining and discovery of useful herbal ingredients (WD²UHI)
Degree: Ph.D.
Year: 2010
Subject: Herbs -- Therapeutic use -- Data processing.
Medicine, Chinese -- Data processing.
Medicine, Chinese -- Formulae, receipts, prescriptions -- Data processing.
Hong Kong Polytechnic University -- Dissertations
Department: Dept. of Computing
Pages: 512 pages : illustrations
Language: English
InnoPac Record: http://library.polyu.edu.hk/record=b2898257
URI: http://theses.lib.polyu.edu.hk/handle/200/8549
Abstract: This PhD thesis is in the TCM (Traditional Chinese Medicine) area, focusing on discovering trusted and useful herbal ingredients from the enterprise angle. In the research, a conceptual framework of six essential elements is proposed, namely: i) enterprise TCM ontology; ii) automation of ontology-based system generation, directly from the given iconic specification by adhering to the Meta-Interface (MI) concept; iii) the concept of "living ontology" and its "reversible" implementation support; iv) text mining of open data sources (e.g. the open web and other public knowledge repertoires) in an on-line manner; v) techniques to define the associations/relevance among various ontological entities, namely, automatic semantic aliasing and neural network; and vi) system trustworthiness attained by achieving cross-layer semantic transitivity. We call this trusted conceptual framework, which represents the aim of this research, the WD²UHI (Web-based Data Mining and Discovery of Useful Herbal Ingredients) platform. The trustworthiness of the platform is ensured throughout the research process because all the prototypes at different stages are verified in clinical environments, involving physicians and treatment of patients whenever appropriate. The meaning of trustworthiness or "being trusted" adheres to the spirit of RFC 2828 (Internet Security Glossary). From the above, the two objectives of this project are clear: i) 1st objective - the proposal and development of the trusted WD²UHI platform, and ii) 2nd objective - the proposal of novel methods to discover herbal ingredients correctly and meaningfully. The trusted WD²UHI platform in the 1st objective involves details in two areas: i) reliable client/server communication over the mobile Internet; and ii) meaningful herbal information discovery, which must adhere to IT (Information Technology) formalisms and globally accepted TCM formal principles.
Finding suitable and efficacious methods to guarantee reliable client/server communication over the mobile Internet and defining TCM formalisms for meaningful TCM discoveries require a colossal amount of work, which would exceed the time/effort constraints imposed on this PhD research. For this reason, a decision had to be made on the basis of my extensive preliminary explorations: the rest of the research energy is focused mainly on the second objective, which itself requires a substantial amount of effort, as shown by the scale of the essential elements defined for the proposed conceptual framework. As mentioned above, this research covers two domains of formalisms - IT and TCM. Since my TCM knowledge is limited, I continuously consulted different TCM experts (e.g. physicians, including those of the YOT (Yan Oi Tong) mobile clinics that treat hundreds of patients daily in the Hong Kong SAR, and pharmacologists from other parties) about the applicable TCM formalisms. The research activities are organized as a fast prototyping process, which feeds the useful experience of the current stage into the next stage incrementally, re-orienting the research direction when necessary. The TCM formalism identified for this research is the SIMILARITY/SAME (i.e. "同") principle (or "同病異治, 異病同治", i.e. "same illness, different treatments; different illnesses, same treatment", in classical TCM terminology). If the three different sets of prescriptions for Illness (A, a) (i.e. illness A for region a), Illness (A, b) (i.e. the same illness name for region b) and Illness (A, c) (i.e. the same illness name for region c) are P_Aa, P_Ab, and P_Ac respectively, then by the SIMILARITY/SAME principle the total/common set of usable prescriptions for treating the three illnesses should be P_all = P_Aa ∪ P_Ab ∪ P_Ac. The ∪ operation (i.e. union) associates the three different sets of prescriptions into a single pool (i.e. the common set P_all) by their common attributes/factors.
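The pooling step described above can be illustrated with ordinary set union. This is only a minimal sketch: the function name and the prescription identifiers are hypothetical and do not come from the thesis, and real prescriptions would carry far richer attributes than a string label.

```python
def pooled_prescriptions(*regional_sets):
    """Return the union of any number of regional prescription sets."""
    pool = set()
    for prescriptions in regional_sets:
        pool |= set(prescriptions)  # set union, i.e. the ∪ operation
    return pool


# Hypothetical prescription identifiers for the same illness name in
# three regions (a, b, c); these labels are illustrative only.
p_region_a = {"rx1", "rx2"}
p_region_b = {"rx2", "rx3"}
p_region_c = {"rx4"}

p_all = pooled_prescriptions(p_region_a, p_region_b, p_region_c)
print(sorted(p_all))
```

A prescription that appears in more than one regional set is kept once in the pool, which is the behaviour the union operation implies.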
In fact, the three illnesses are defined by some additional attributes on top of the common set, due to geographical and epidemiological differences. [Figure a.1 : see article file for the details of the abstract] The generation of testing prototypes in this research is automatic and adheres to the meta-interface (MI) philosophy originally proposed by Nong's Company Limited, which also allows my prototypes to be generated/customized from its production enterprise TCM ontology core (onto-core) for real-life mobile-clinic operations. The original MI paradigm proposed by Nong's is only a conceptual "shell" containing insufficient details for implementation, but the Nong's enterprise TCM onto-core was already a production version when I started my PhD research. Making the "shell" MI paradigm work is part of the PhD research endeavour.
The Nong's enterprise TCM onto-core (ontology core) is skeletally built from classical information already enshrined in canons, treatises, and case histories since the ancient past, via a consensus certification process. Since the resultant consensus-certified onto-core does not evolve automatically, it risks stagnating with old knowledge. The OCOE&CID (On-line Continuous Ontology Evolution and Clinical Intelligence Discovery) paradigm proposed in my PhD research is the advanced, implementable version of the "shell" Nong's MI philosophy. It neutralizes the danger of knowledge stagnation by opening up the closed skeletal TCM onto-core with the help of continuous on-line text mining and automatic semantic aliasing (ASA). The ASA weights the similarity between two terms (e.g. Ter₁ and Ter₂). Ter₁ = Ter₂ means that the two terms are synonyms or logically the same. In the logical expression P(Ter₁ ∪ Ter₂) = P(Ter₁) + P(Ter₂) - P(Ter₁ ∩ Ter₂), which is an IT formalism, the symbols ∪ and ∩ stand for union and intersection respectively. If Ter₁ and Ter₂ are only similar, Ter₁ ≠ Ter₂ is logically true; they are then aliases (not synonyms). P(Ter₁ ∩ Ter₂) represents the degree of similarity (probability) between Ter₁ and Ter₂. P(Ter₁) and P(Ter₂) are probabilities for the multi-representations (other meanings). [Figure a.2 : see article file for the details of the abstract] In this research there are two types of discoveries conceptually: i) Type 1 - if the discovery is not within the context of the extant skeletal ontology (i.e. Part A in Figure a.2); and ii) Type 2 - if the discovery is within the current context of the working ontology (i.e. "Part A and Part B" together). Type 1 is considered "high-level" and Type 2 "low-level". In fact, any discovery is determined by the relevance index, which can be computed by the ASA mechanism or the NN (neural network) backpropagation approach.
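The inclusion-exclusion expression quoted for ASA can be checked numerically. The sketch below is only an illustration of that formula, not the ASA implementation from the thesis: the function name and all probability values are hypothetical.

```python
def union_probability(p_ter1, p_ter2, p_overlap):
    """Inclusion-exclusion: P(T1 ∪ T2) = P(T1) + P(T2) - P(T1 ∩ T2).

    p_overlap plays the role of the degree of similarity between
    the two terms.
    """
    values = (p_ter1, p_ter2, p_overlap)
    if not all(0.0 <= p <= 1.0 for p in values):
        raise ValueError("probabilities must lie in [0, 1]")
    if p_overlap > min(p_ter1, p_ter2):
        raise ValueError("the overlap cannot exceed either term's probability")
    total = p_ter1 + p_ter2 - p_overlap
    if total > 1.0 + 1e-12:
        raise ValueError("the inputs imply a union probability above 1")
    return total


# Hypothetical values: two aliases that overlap only partially.
alias_union = union_probability(0.6, 0.5, 0.3)

# Hypothetical values: two synonyms, where the overlap equals each term.
synonym_union = union_probability(0.7, 0.7, 0.7)
```

In the synonym case the union collapses to the probability of either term alone, which matches the reading that synonyms are logically the same.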
The NN approach is particularly suitable for Type 2 discovery of individual herbal ingredients. Since the named NN module is trained only with the prescribed dataset, training is considered complete in the context of Type 2 discovery as long as the NN has learned all the "current knowledge" intertwined and embedded in the current training dataset. It produces potential discoveries, which should later be decided upon by TCM domain experts. The solutions proposed in my PhD research have contributed to 16 publications so far. All the stated PhD research objectives have been achieved. The research has also uncovered many relevant problems, which should be resolved in future work, including: i) reducing the ANT value defined by ANT = Σ_{j=1}^{k→∞} j·P_j = 1/(1-δ), where δ is the channel error probability, in order to parallelize a very large database (VLDB), such as a sizeable TCM ontology, for fast system response; and ii) identifying other formal TCM principles to facilitate more effective discovery of herbal ingredients, with the help of practicing TCM physicians and domain experts.
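The ANT expression in the abstract relates a sum of j·Pj terms to 1/(1-δ). The abstract does not define Pj, so the sketch below assumes a geometric form, Pj = δ^(j-1)·(1-δ), i.e. the first j-1 attempts hit a channel error and the j-th does not. Under that assumption the partial sums converge to 1/(1-δ). The function names are hypothetical.

```python
def ant_partial_sum(delta, k):
    """Sum of j * P_j for j = 1..k, with the ASSUMED geometric form
    P_j = delta**(j - 1) * (1 - delta). This form is not stated in
    the abstract."""
    if not 0.0 <= delta < 1.0:
        raise ValueError("delta must lie in [0, 1)")
    return sum(j * delta ** (j - 1) * (1.0 - delta) for j in range(1, k + 1))


def ant_closed_form(delta):
    """Closed form 1 / (1 - delta) that the sum approaches as k grows."""
    if not 0.0 <= delta < 1.0:
        raise ValueError("delta must lie in [0, 1)")
    return 1.0 / (1.0 - delta)


# Compare a long partial sum against the closed form for an
# illustrative channel error probability.
delta = 0.2
gap = abs(ant_partial_sum(delta, 200) - ant_closed_form(delta))
print(gap < 1e-9)
```

Because 1/(1-δ) grows as δ approaches 1, lowering the channel error probability is what lowers ANT under this reading.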

Files in this item

Files Size Format
b28982575.pdf 7.205Mb PDF
Copyright Undertaking
As a bona fide Library user, I declare that:
  1. I will abide by the rules and legal ordinances governing copyright regarding the use of the Database.
  2. I will use the Database for the purpose of my research or private study only and not for circulation or further reproduction or any other purpose.
  3. I agree to indemnify and hold the University harmless from and against any loss, damage, cost, liability or expenses arising from copyright infringement or unauthorized usage.
By downloading any item(s) listed above, you acknowledge that you have read and understood the copyright undertaking as stated above, and agree to be bound by all of its terms.

