From Possibilistic Rule-Based Systems to Machine Learning - A Discussion Paper

Author(s):  
Didier Dubois ◽  
Henri Prade
Author(s):  
Raymond Chiong

Named Entity Recognition (NER) is an important research area within Information Extraction (IE) in the field of Natural Language Processing. NER is a subtask of IE that seeks to identify named entities in text documents and classify them into predefined categories. A considerable amount of work has been done on NER in recent years, owing to the increasing demand for automated text processing and the wide availability of electronic corpora. While it is relatively easy and natural for a human reader to understand the context of a given article, getting a machine to understand and differentiate between words is a big challenge. For instance, the word ‘brown’ may refer to a person called Mr. Brown or to the colour of an item. Human readers can easily discern the meaning of the word from the context of the sentence, but it is almost impossible for a computer to interpret it without additional information. To deal with this issue, researchers in the NER field have proposed various rule-based systems (Wakao, Gaizauskas & Wilks, 1996; Krupka & Hausman, 1998; Maynard, Tablan, Ursu, Cunningham & Wilks, 2001). These systems achieve high recognition accuracy with the help of lists of known named entities called gazetteers. The problem with the rule-based approach is that it lacks robustness and portability. It incurs steep maintenance costs, especially when new rules need to be introduced for new information or new domains. A better option is thus to use a machine learning approach that is trainable and adaptable. Three well-known machine learning approaches that have been used extensively in NER are the Hidden Markov Model (HMM), the Maximum Entropy Model (MEM) and the Decision Tree.
Many of the existing machine learning-based NER systems (Bikel, Schwartz & Weischedel, 1999; Zhou & Su, 2002; Borthwick, Sterling, Agichtein & Grishman, 1998; Bender, Och & Ney, 2003; Chieu & Ng, 2002; Sekine, Grishman & Shinnou, 1998) achieve near-human performance for named entity tagging, even though their overall performance is still about 2% short of the rule-based systems. There have also been many attempts to improve NER performance using a hybrid approach that combines handcrafted rules with statistical models (Mikheev, Moens & Grover, 1999; Srihari & Li, 2000; Seon, Ko, Kim & Seo, 2001). These systems achieve relatively good performance in their targeted domains owing to comprehensive handcrafted rules. Nevertheless, the portability problem remains unsolved when it comes to NER across various domains. As such, this article presents a hybrid machine learning approach that uses MEM and HMM successively. Two statistical models are used in succession instead of one because of their distinctive natures. HMM achieves better performance than any other statistical model and is generally regarded as the most successful of the machine learning approaches. However, it suffers from the sparseness problem, which means that a considerable amount of data is needed for it to achieve acceptable performance. MEM, on the other hand, maintains reasonable performance even when little training data is available. The idea is therefore to walk through the testing corpus using MEM first in order to generate a temporary tagging result, and to use this pass simultaneously as a training process for HMM. During the second pass, the corpus is tagged with HMM to produce the final tagging. In this process, the temporary tagging result generated by MEM is used as a reference for subsequent error checking and correction.
When little training data is available, the final result can still be reliable thanks to the contribution of the initial MEM tagging result.
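The two-pass idea can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the article's system: `first_pass_tag` is a hypothetical stand-in for the MEM tagger (a simple capitalisation heuristic), the HMM is estimated from the provisional tags with add-one smoothing and decoded with Viterbi, and the error checking step is reduced to listing the positions where the two passes disagree.

```python
import math
from collections import defaultdict

TAGS = ["O", "NAME"]  # illustrative tag set, not the article's categories

def first_pass_tag(tokens):
    """Hypothetical stand-in for the MEM pass: non-initial capitalised tokens are NAME."""
    return ["NAME" if i > 0 and tok[0].isupper() else "O"
            for i, tok in enumerate(tokens)]

def train_hmm(tagged_sentences):
    """Count tag transitions and word emissions from provisionally tagged sentences."""
    trans = defaultdict(lambda: defaultdict(int))
    emit = defaultdict(lambda: defaultdict(int))
    vocab = set()
    for tokens, tags in tagged_sentences:
        prev = "<s>"
        for tok, tag in zip(tokens, tags):
            trans[prev][tag] += 1
            emit[tag][tok.lower()] += 1
            vocab.add(tok.lower())
            prev = tag
    return trans, emit, vocab

def log_prob(counts, key, n_outcomes):
    """Add-one smoothed log probability of `key` under a count table."""
    total = sum(counts.values())
    return math.log((counts.get(key, 0) + 1) / (total + n_outcomes))

def viterbi(tokens, trans, emit, vocab):
    """Most likely tag sequence for a non-empty token list under the HMM."""
    n_words = len(vocab) + 1  # one extra outcome reserved for unseen words
    scores = [{t: log_prob(trans["<s>"], t, len(TAGS))
                  + log_prob(emit[t], tokens[0].lower(), n_words) for t in TAGS}]
    back = [{t: None for t in TAGS}]
    for i in range(1, len(tokens)):
        scores.append({})
        back.append({})
        for tag in TAGS:
            best_prev = max(
                TAGS,
                key=lambda p: scores[i - 1][p] + log_prob(trans[p], tag, len(TAGS)),
            )
            scores[i][tag] = (scores[i - 1][best_prev]
                              + log_prob(trans[best_prev], tag, len(TAGS))
                              + log_prob(emit[tag], tokens[i].lower(), n_words))
            back[i][tag] = best_prev
    tag = max(TAGS, key=lambda t: scores[-1][t])
    path = [tag]
    for i in range(len(tokens) - 1, 0, -1):
        tag = back[i][tag]
        path.append(tag)
    return path[::-1]

def two_pass_tag(sentences):
    """Pass 1 tags and trains the HMM; pass 2 re-tags and reports disagreements."""
    provisional = [(s, first_pass_tag(s)) for s in sentences]
    trans, emit, vocab = train_hmm(provisional)
    results = []
    for tokens, first in provisional:
        final = viterbi(tokens, trans, emit, vocab)
        disagreements = [i for i, (a, b) in enumerate(zip(first, final)) if a != b]
        results.append((final, disagreements))
    return results
```

Calling `two_pass_tag([["mr", "Brown", "bought", "a", "brown", "coat"]])` returns, per sentence, the final tags and the token positions where the second pass overrode the first. Those positions are the natural candidates for the error checking the article describes.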


2021 ◽  
Vol 12 (2) ◽  
pp. 136
Author(s):  
Arnan Dwika Diasmara ◽  
Aditya Wikan Mahastama ◽  
Antonius Rachmat Chrismanto

Abstract. Intelligent System of the Battle of Honor Board Game with Decision Making and Machine Learning. The Battle of Honor is a board game in which 2 players face each other and try to bring down their opponent's flag. The game requires a third party to act as referee because the players cannot see each other's pawns during play. The solution is to implement Rule-Based Systems (RBS) in a system developed with Unity to support the referee's role in making decisions based on the rules of the game. The researchers also developed an Artificial Intelligence (AI) opponent by applying Case-Based Reasoning (CBR). The application of CBR is supported by the nearest neighbor algorithm, which finds cases with a high degree of similarity. In basic testing, the highest CBR accuracy obtained among the 3 testers was 97.101%. In scenario testing, the AI acting as referee was analyzed through colliding pieces and gave the correct decision in determining victory.
Keywords: The Battle of Honor, CBR, RBS, unity, AI
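Nearest-neighbour retrieval in CBR can be sketched as below. The abstract does not give the case representation or the similarity formula, so everything specific here is an assumption: the feature names (`my_rank`, `enemy_distance`, `flag_distance`), the weights, the stored moves, and the commonly used weighted-similarity form.

```python
from dataclasses import dataclass

@dataclass
class Case:
    features: dict  # feature name -> value normalised to [0, 1]
    action: str     # move that was taken in this past case

# Hypothetical feature weights; a larger weight means more influence on similarity.
WEIGHTS = {"my_rank": 0.5, "enemy_distance": 0.3, "flag_distance": 0.2}

def similarity(a, b):
    """Weighted similarity in [0, 1]: one minus the weighted mean absolute difference."""
    total = sum(WEIGHTS.values())
    diff = sum(w * abs(a[name] - b[name]) for name, w in WEIGHTS.items())
    return 1.0 - diff / total

def retrieve(case_base, query):
    """Return the stored case most similar to the query, together with its score."""
    best = max(case_base, key=lambda c: similarity(c.features, query))
    return best, similarity(best.features, query)

# Made-up case base for illustration only.
case_base = [
    Case({"my_rank": 0.9, "enemy_distance": 0.1, "flag_distance": 0.8}, "attack"),
    Case({"my_rank": 0.2, "enemy_distance": 0.1, "flag_distance": 0.5}, "retreat"),
    Case({"my_rank": 0.5, "enemy_distance": 0.9, "flag_distance": 0.2}, "advance"),
]

query = {"my_rank": 0.8, "enemy_distance": 0.2, "flag_distance": 0.7}
best, score = retrieve(case_base, query)
print(best.action, round(score, 3))  # attack 0.9
```

The AI opponent would then reuse the retrieved case's action, possibly after adapting it to the current board.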


2011 ◽  
Vol 18 (5) ◽  
pp. 552-556 ◽  
Author(s):  
Özlem Uzuner ◽  
Brett R South ◽  
Shuying Shen ◽  
Scott L DuVall

Abstract The 2010 i2b2/VA Workshop on Natural Language Processing Challenges for Clinical Records presented three tasks: a concept extraction task focused on the extraction of medical concepts from patient reports; an assertion classification task focused on assigning assertion types for medical problem concepts; and a relation classification task focused on assigning relation types that hold between medical problems, tests, and treatments. i2b2 and the VA provided an annotated reference standard corpus for the three tasks. Using this reference standard, 22 systems were developed for concept extraction, 21 for assertion classification, and 16 for relation classification. These systems showed that machine learning approaches could be augmented with rule-based systems to determine concepts, assertions, and relations. Depending on the task, the rule-based systems can either provide input for machine learning or post-process the output of machine learning. Ensembles of classifiers, information from unlabeled data, and external knowledge sources can help when the training data are inadequate.
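The two ways of combining rules with a learned model mentioned above can be sketched as follows. Everything in this example is hypothetical and not taken from any challenge system: the negation cue list, the stand-in `model_predict` function, and the label names are for illustration only.

```python
NEGATION_CUES = {"no", "denies", "without"}  # hypothetical rule lexicon

def rule_negated(tokens, concept_index, window=3):
    """Rule: a concept counts as negated if a cue occurs within `window` tokens before it."""
    start = max(0, concept_index - window)
    return any(tok.lower() in NEGATION_CUES for tok in tokens[start:concept_index])

def features_with_rule(tokens, concept_index):
    """Option 1: the rule's output becomes an input feature for a learned classifier."""
    return {
        "word": tokens[concept_index].lower(),
        "rule_negated": rule_negated(tokens, concept_index),
    }

def model_predict(features):
    """Stand-in for a trained classifier; it always predicts 'present' in this sketch."""
    return "present"

def postprocess(label, tokens, concept_index):
    """Option 2: the rule corrects the model's output after prediction."""
    if label == "present" and rule_negated(tokens, concept_index):
        return "absent"
    return label

tokens = "Patient denies chest pain".split()
feats = features_with_rule(tokens, 3)
label = postprocess(model_predict(feats), tokens, 3)
print(feats["rule_negated"], label)  # True absent
```

In the first option the classifier is free to learn how much to trust the rule; in the second the rule has the final say, which is simpler but propagates any rule errors directly.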


Author(s):  
Padmavathi .S ◽  
M. Chidambaram

Text classification has become increasingly significant in managing and organizing text data due to the tremendous growth of online information. It classifies documents into a fixed number of predefined categories. The rule-based approach and the machine learning approach are the two ways of performing text classification. In the rule-based approach, documents are classified using manually defined rules. In the machine learning approach, the classification rules or the classifier are derived automatically from example documents; this approach offers higher recall and faster processing. This paper presents an investigation of text classification using different machine learning techniques.


Author(s):  
Praveen Kumar Dwivedi ◽  
Surya Prakash Tripathi

Background: Fuzzy systems are employed in several fields such as data processing, regression, pattern recognition, classification and management, owing to their ability to handle uncertainty and to explain the features of a complex system without involving a specific mathematical model. Fuzzy rule-based systems (FRBS), or fuzzy rule-based classifiers when designed mainly for classification, are fuzzy systems that consist of a set of fuzzy logical rules; they are extensions of classical rule-based systems containing "If-then" rules. The design of any fuzzy system has two main objectives, interpretability and accuracy, which conflict with each other: improving one degrades the other. This condition is termed the Interpretability-Accuracy Trade-off. To handle it, Multi-Objective Evolutionary Algorithms (MOEA) are often applied in the design of fuzzy systems. This paper reviews approaches to developing fuzzy systems using Evolutionary Multi-Objective Optimization (EMO) algorithms with respect to the Interpretability-Accuracy Trade-off, current research trends, and improvements in the design of fuzzy classifiers using MOEA as the authors' future scope. Methods: A state-of-the-art review was conducted of various fuzzy classifier designs and of their optimization in multi-objective terms. Results: This article reviews the different Evolutionary Multi-Objective Optimization (EMO) algorithms in the context of the Interpretability-Accuracy Trade-off in fuzzy classification. Conclusion: Evolutionary multi-objective algorithms are being deployed in the development of fuzzy systems.
Improvements in design using these algorithms involve issues such as higher dimensionality, exponentially inhabited solution, the I-A trade-off, interpretability quantification, and describing the ability of the system of the fuzzy domain, etc. The authors' future focus is to identify the most efficient and robust multi-objective evolutionary algorithm for developing optimized fuzzy systems with higher accuracy and interpretability. Further attention will go to creating new metrics or parameters for measuring the interpretability of fuzzy systems and new EMO processes or methods for handling the I-A trade-off.
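The interpretability-accuracy trade-off can be made concrete with Pareto dominance, the comparison that multi-objective evolutionary algorithms rely on. The following is a minimal sketch with made-up candidate classifiers, assuming that interpretability is approximated by rule count (fewer rules is better) and that accuracy is to be maximised; the review itself does not prescribe these measures.

```python
from typing import NamedTuple

class Candidate(NamedTuple):
    name: str
    accuracy: float  # objective to maximise
    n_rules: int     # objective to minimise (assumed proxy for interpretability)

def dominates(a, b):
    """a dominates b if it is no worse on both objectives and strictly better on one."""
    no_worse = a.accuracy >= b.accuracy and a.n_rules <= b.n_rules
    better = a.accuracy > b.accuracy or a.n_rules < b.n_rules
    return no_worse and better

def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]

# Made-up fuzzy classifier designs for illustration only.
candidates = [
    Candidate("A", 0.92, 40),
    Candidate("B", 0.88, 12),
    Candidate("C", 0.85, 15),  # dominated by B: less accurate and more rules
    Candidate("D", 0.80, 5),
]

front = pareto_front(candidates)
print([c.name for c in front])  # ['A', 'B', 'D']
```

No design on the front is best on both objectives, which is exactly the trade-off: moving from D to A buys accuracy at the cost of a larger, less interpretable rule base.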


Author(s):  
Katherine Darveau ◽  
Daniel Hannon ◽  
Chad Foster

There is growing interest in the study and practice of applying data science (DS) and machine learning (ML) to automate decision making in safety-critical industries. As an alternative or augmentation to human review, there are opportunities to explore these methods for classifying aviation operational events by root cause. This study seeks to apply a thoughtful approach to design, compare, and combine rule-based and ML techniques to classify events caused by human error in aircraft/engine assembly, maintenance or operation. Event reports contain a combination of continuous parameters, unstructured text entries, and categorical selections. A Human Factors approach to classifier development prioritizes the evaluation of distinct data features and entry methods to improve modeling. Findings, including the performance of tested models, led to recommendations for the design of textual data collection systems and classification approaches.

