Die ontwikkeling van ’n fleksievormgenereerder vir Afrikaans

S. Pilon

doi:10.4102/lit.v29i1.102

Die ontwikkeling van ’n fleksievormgenereerder vir Afrikaans

Literator ◽

10.4102/lit.v29i1.102 ◽

2008 ◽

Vol 29 (1) ◽

pp. 93-110

Author(s):

S. Pilon

Keyword(s):

Machine Learning ◽

Training Data ◽

Rule Based ◽

Plural Form ◽

Inflected Form ◽

Core Technology ◽

Average Accuracy ◽

Rule Based Approach

The development of an inflected form generator for Afrikaans

This article describes the development of an inflected form generator for Afrikaans. Two requirements are set for the generator: it must be able to generate one specific inflected form of a lemma, and it must be able to generate all possible inflected forms of a lemma. The decision to use machine learning instead of the more traditional rule-based approach in developing this core technology is explained, and a brief overview is given of the development of LIA, a lemmatiser for Afrikaans. Experiments with three different methods show that the most effective way of developing an inflected form generator for Afrikaans is to train a separate classifier for each affix: one classifier to generate the plural form, one to generate the diminutive, one to generate the plural of the diminutive, et cetera. The final inflected form generator for Afrikaans (AIL-3) reaches an average accuracy of 86.37% on the training data and 86.88% on a small amount of new data. With the help of a preprocessing module, AIL-3 is shown to meet the requirements set for an Afrikaans inflected form generator. Finally, suggestions are made on how to improve the accuracy of AIL-3.
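The idea of one classifier per affix can be illustrated with a minimal sketch. This is not the AIL-3 implementation: the toy suffix-lookup learner, the class name `SuffixClassifier`, the helper `generate_all`, and the handful of training pairs are all assumptions made for illustration. Each classifier memorises which edit (strip some characters, append a string) was seen for each word-final suffix, and backs off to shorter suffixes for unseen lemmas.

```python
from collections import Counter, defaultdict


def edit_rule(lemma, form):
    """Describe lemma -> form as (characters to strip, string to append)."""
    shared = 0
    while shared < min(len(lemma), len(form)) and lemma[shared] == form[shared]:
        shared += 1
    return (len(lemma) - shared, form[shared:])


class SuffixClassifier:
    """Toy learner (hypothetical): maps word-final suffixes to edit rules."""

    def __init__(self, max_suffix=3):
        self.max_suffix = max_suffix
        self.rules_by_suffix = defaultdict(Counter)

    def _suffixes(self, lemma):
        # Longest suffix first, ending with the empty suffix as a fallback.
        longest = min(self.max_suffix, len(lemma))
        return [lemma[len(lemma) - n:] for n in range(longest, -1, -1)]

    def train(self, pairs):
        for lemma, form in pairs:
            rule = edit_rule(lemma, form)
            for suffix in self._suffixes(lemma):
                self.rules_by_suffix[suffix][rule] += 1
        return self

    def generate(self, lemma):
        # Apply the most frequent rule stored for the longest known suffix.
        for suffix in self._suffixes(lemma):
            if suffix in self.rules_by_suffix:
                strip, append = self.rules_by_suffix[suffix].most_common(1)[0][0]
                return lemma[:len(lemma) - strip] + append
        return lemma


# One classifier per affix, each trained on a tiny illustrative sample.
generators = {
    "plural": SuffixClassifier().train(
        [("boek", "boeke"), ("hond", "honde"),
         ("tafel", "tafels"), ("appel", "appels")]
    ),
    "diminutive": SuffixClassifier().train(
        [("boek", "boekie"), ("hond", "hondjie"), ("tafel", "tafeltjie")]
    ),
}


def generate_all(lemma):
    """Generate every inflected form for which a classifier exists."""
    return {affix: clf.generate(lemma) for affix, clf in generators.items()}
```

Calling `generators["plural"].generate(lemma)` corresponds to the first requirement (one specific inflected form), while `generate_all(lemma)` corresponds to the second (all inflected forms the available classifiers can produce).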


Die ontwikkeling van ’n woordafbreker en kompositumanaliseerder vir Afrikaans

Literator ◽

10.4102/lit.v29i1.99 ◽

2008 ◽

Vol 29 (1) ◽

pp. 21-42 ◽

Cited By ~ 1

Author(s):

S. Pilon ◽

M.J. Puttkammer ◽

G.B. Van Huyssteen

Keyword(s):

Machine Learning ◽

Training Data ◽

Practical Implementation ◽

Manual Annotation ◽

Machine Learning Technique ◽

Rule Based ◽

The Core ◽

Learning Classifier ◽

Learning Technique ◽

Rule Based Approach

The development of a hyphenator and compound analyser for Afrikaans

This article describes the development of two core technologies for Afrikaans, viz. a hyphenator and a compound analyser. As no annotated Afrikaans data existed prior to this project to serve as training data for a machine learning classifier, the core technologies in question were first developed using a rule-based approach. When evaluated, the rule-based hyphenator obtains an f-score of 90.84%, while the rule-based compound analyser reaches an f-score of only 78.20%. Since these results are insufficient for practical implementation, it was decided to use a machine learning technique (memory-based learning) instead. Training data for each of the two core technologies was then developed using "TurboAnnotate", an interface designed to improve the accuracy and speed of manual annotation. The machine-learned hyphenator, trained on 39 943 words, reaches an f-score of 98.11%, while the compound analyser reaches an f-score of 90.57% after being trained on 77 589 annotated words. It is concluded that machine learning (specifically memory-based learning) seems an appropriate approach for developing core technologies for Afrikaans.
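The f-scores reported here can be read with the standard definition in mind. The sketch below assumes the usual harmonic mean of precision and recall, computed over predicted break positions within a word. The abstract does not state how the scores were actually calculated, and the function names and example positions are hypothetical.

```python
def f_score(true_positives, false_positives, false_negatives, beta=1.0):
    """F-beta score from raw counts; beta=1 gives the balanced f-score."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def score_breaks(gold_breaks, predicted_breaks):
    """Compare two sets of break positions (character offsets) for one word."""
    gold, predicted = set(gold_breaks), set(predicted_breaks)
    tp = len(gold & predicted)       # breaks found at the right position
    fp = len(predicted - gold)       # breaks inserted where none belongs
    fn = len(gold - predicted)       # breaks that were missed
    return f_score(tp, fp, fn)


# Hypothetical word with gold breaks after characters 2 and 5,
# and a system that predicts breaks after characters 2 and 6.
example = score_breaks({2, 5}, {2, 6})
```

In this example one of two predicted breaks is correct and one of two gold breaks is found, so precision, recall, and the f-score are all 0.5.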


Named Entity Recognition for a Low Resource Language

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b2085.098319 ◽

2019 ◽

Vol 8 (3) ◽

pp. 587-590

Keyword(s):

Machine Learning ◽

Named Entity Recognition ◽

Training Data ◽

Entity Recognition ◽

Linguistic Knowledge ◽

Rule Based ◽

Low Resource ◽

Named Entity ◽

The North ◽

Rule Based Approach

This paper studies named entity recognition for Kokborok using a rule-based approach. Named entity recognition is one of the applications of natural language processing and is considered a subtask of information extraction. It is the process of identifying the named entities needed for a specific task. We have studied a named entity recognition system for the Kokborok language. Kokborok is the official language of the state of Tripura, situated in the north-eastern part of India. It is also widely spoken in other parts of the north-eastern states of India and in adjoining areas of Bangladesh. Named entities include the names of persons, organizations, locations, etc. Named entity recognition is studied using the machine learning approach, the rule-based approach, or a hybrid approach combining the two. Rule-based named entity recognition is influenced by linguistic knowledge of the language. The machine learning approach requires a large amount of training data, and Kokborok, being a low-resource language, has a very limited amount of training data. The rule-based approach requires linguistic rules, and its results do not depend on the size of the available data. We have framed heuristic rules for identifying named entities based on linguistic knowledge of the language. An encouraging result is obtained after we test our data with the rule-based approach. We also tried to study and frame the rules for the counting system in Kokborok in this paper. The rule-based approach to named entity recognition is found suitable for a low-resource language with limited digital work and no named-entity-tagged data. We have framed a suitable algorithm that uses the rules to solve the named entity recognition task and obtain a desirable result.


A Brief Survey on Text Classification Using Various Machine Learning Techniques

International Journal of Advanced Research in Computer Science and Software Engineering ◽

10.23956/ijarcsse.v8i1.521 ◽

2018 ◽

Vol 8 (1) ◽

pp. 14

Author(s):

Padmavathi S. ◽

M. Chidambaram

Keyword(s):

Machine Learning ◽

Text Classification ◽

Fixed Number ◽

Machine Learning Techniques ◽

Online Information ◽

Rule Based ◽

Learning Techniques ◽

Machine Learning Approach ◽

Rule Based Approach

Text classification has grown more significant in managing and organizing text data due to the tremendous growth of online information. It classifies documents into a fixed number of predefined categories. The rule-based approach and the machine learning approach are the two ways of performing text classification. In the rule-based approach, documents are classified according to manually defined rules. In the machine learning approach, the classification rules or classifier are defined automatically from example documents; this approach offers higher recall and faster processing. This paper presents an investigation of text classification using different machine learning techniques.
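The contrast between the two approaches can be sketched in a few lines. Everything below is an illustrative assumption rather than anything taken from the surveyed work: the categories, the keyword rules, the four example documents, and the choice of a small multinomial naive Bayes model as the learned classifier.

```python
import math
from collections import Counter, defaultdict

# Rule-based approach: a person writes the keyword rules by hand (hypothetical rules).
RULES = {"sport": {"match", "goal"}, "finance": {"bank", "stock"}}


def classify_by_rules(text, default="unknown"):
    words = set(text.lower().split())
    for label, keywords in RULES.items():
        if words & keywords:
            return label
    return default


# Machine learning approach: the classifier is derived from example documents.
def train_naive_bayes(examples):
    word_counts = defaultdict(Counter)   # label -> word frequencies
    doc_counts = Counter()               # label -> number of documents
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocabulary


def classify_by_model(text, model):
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        # Log prior plus add-one smoothed log likelihood of each word.
        score = math.log(doc_counts[label] / total_docs)
        denominator = sum(word_counts[label].values()) + len(vocabulary)
        for word in text.lower().split():
            score += math.log((word_counts[label][word] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


examples = [
    ("the team scored a late goal", "sport"),
    ("the striker won the match", "sport"),
    ("the bank raised interest rates", "finance"),
    ("stock prices fell sharply", "finance"),
]
model = train_naive_bayes(examples)
```

For the text "the striker scored", the hand-written rules find no keyword and return "unknown", whereas the trained model assigns "sport" because it has seen those words in sport examples. This is one way a learned classifier can recover documents that a fixed rule set misses.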


Three level method using machine learning and rule based approach for extracting web-table information

30th Annual Conference of IEEE Industrial Electronics Society, 2004. IECON 2004 ◽

10.1109/iecon.2004.1432313 ◽

2005 ◽

Author(s):

Sung-Won Jung ◽

Sung-Shin Lim ◽

Hyuk-Chul Kwon

Keyword(s):

Machine Learning ◽

Rule Based ◽

Level Method ◽

Rule Based Approach


A tree-based learning approach for document structure analysis and its application to web search

Natural Language Engineering ◽

10.1017/s1351324914000023 ◽

2014 ◽

Vol 21 (4) ◽

pp. 569-605 ◽

Cited By ~ 2

Author(s):

F. Canan Pembe ◽

Tunga Güngör

Keyword(s):

Machine Learning ◽

Web Search ◽

Learning Algorithm ◽

Classification Problem ◽

Support Vector ◽

Learning Approach ◽

Rule Based ◽

Document Structure ◽

Rule Based Approach ◽

Extraction Model

In this paper, we study the problem of structural analysis of Web documents, aiming to extract the sectional hierarchy of a document. In general, a document can be represented as a hierarchy of sections and subsections with corresponding headings and subheadings. We developed two machine learning models: a heading extraction model and a hierarchy extraction model. Heading extraction was formulated as a classification problem, whereas a tree-based learning approach was employed in hierarchy extraction. For this purpose, we developed an incremental learning algorithm based on support vector machines and perceptrons. The models were evaluated in detail with respect to the performance of the heading and hierarchy extraction tasks. For comparison, a baseline rule-based approach was used that relies on heuristics and HTML document object model tree processing. The machine learning approach, which is a fully automatic approach, outperformed the rule-based approach. We also analyzed the effect of document structuring on automatic summarization in the context of Web search. The results of the task-based evaluation on TREC queries showed that structured summaries are superior to unstructured summaries both in terms of accuracy and user ratings, and enable the users to determine the relevancy of search results more accurately than search engine snippets.


Applying Artificial Intelligence for Operating System Fingerprinting

Engineering Proceedings ◽

10.3390/engproc2021007051 ◽

2021 ◽

Vol 7 (1) ◽

pp. 51

Author(s):

Rubén Pérez-Jove ◽

Cristian R. Munteanu ◽

Alejandro Pazos Sierra ◽

José M. Vázquez-Naya

Keyword(s):

Machine Learning ◽

Operating System ◽

Computer Security ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Reference Database ◽

Penetration Test ◽

Rule Based ◽

Learning Techniques ◽

Rule Based Approach

In the field of computer security, knowing which specific version of an operating system is running on a machine can be useful, for example to assist in a penetration test or to monitor the devices connected to a specific network. One of the most widespread tools providing this functionality is Nmap, which follows a rule-based approach for this process. In this context, applying machine learning techniques seems to be a good option for addressing this task. The present work explores the strengths of different machine learning algorithms for operating system fingerprinting, using the Nmap reference database. Moreover, some optimizations were applied to the method that gave the best results, random forest, obtaining an accuracy higher than 96%.


Extracting cancer mortality statistics from death certificates: A hybrid machine learning and rule-based approach for common and rare cancers

Artificial Intelligence in Medicine ◽

10.1016/j.artmed.2018.04.011 ◽

2018 ◽

Vol 89 ◽

pp. 1-9 ◽

Cited By ~ 8

Author(s):

Bevan Koopman ◽

Guido Zuccon ◽

Anthony Nguyen ◽

Anton Bergheim ◽

Narelle Grayson

Keyword(s):

Machine Learning ◽

Cancer Mortality ◽

Death Certificates ◽

Mortality Statistics ◽

Rule Based ◽

Rare Cancers ◽

Hybrid Machine ◽

Rule Based Approach


Learning statistical models of phenotypes using noisy labeled training data

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocw028 ◽

2016 ◽

Vol 23 (6) ◽

pp. 1166-1173 ◽

Cited By ~ 58

Author(s):

Vibhu Agarwal ◽

Tanya Podchiyska ◽

Juan M Banda ◽

Veena Goel ◽

Tiffany I Leung ◽

...

Keyword(s):

Diabetes Mellitus ◽

Machine Learning ◽

Myocardial Infarction ◽

Type 2 Diabetes ◽

Type 2 Diabetes Mellitus ◽

Statistical Models ◽

Training Data ◽

Rule Based ◽

Precision And Accuracy

Objective: Traditionally, patient groups with a phenotype are selected through rule-based definitions whose creation and validation are time-consuming. Machine learning approaches to electronic phenotyping are limited by the paucity of labeled training datasets. We demonstrate the feasibility of utilizing semi-automatically labeled training sets to create phenotype models via machine learning, using a comprehensive representation of the patient medical record.

Methods: We use a list of keywords specific to the phenotype of interest to generate noisy labeled training data. We train L1 penalized logistic regression models for a chronic and an acute disease and evaluate the performance of the models against a gold standard.

Results: Our models for Type 2 diabetes mellitus and myocardial infarction achieve precision and accuracy of 0.90, 0.89, and 0.86, 0.89, respectively. Local implementations of the previously validated rule-based definitions for Type 2 diabetes mellitus and myocardial infarction achieve precision and accuracy of 0.96, 0.92 and 0.84, 0.87, respectively. We have demonstrated feasibility of learning phenotype models using imperfectly labeled data for a chronic and acute phenotype. Further research in feature engineering and in specification of the keyword list can improve the performance of the models and the scalability of the approach.

Conclusions: Our method provides an alternative to manual labeling for creating training sets for statistical models of phenotypes. Such an approach can accelerate research with large observational healthcare datasets and may also be used to create local phenotype models.
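The keyword-based labeling step described in the methods can be sketched as follows. This is only an illustration under stated assumptions: the keyword list, the example notes, and the function name `noisy_label` are hypothetical, and the abstract does not specify how keywords were matched against the patient record. The resulting labels are noisy by design and would then feed a model such as the L1 penalized logistic regression the authors mention.

```python
import re

# Hypothetical keyword list for one phenotype of interest.
PHENOTYPE_KEYWORDS = {"diabetes", "metformin"}


def noisy_label(note, keywords=PHENOTYPE_KEYWORDS):
    """Return 1 if any keyword occurs as a whole word in the note, else 0."""
    tokens = set(re.findall(r"[a-z0-9]+", note.lower()))
    return 1 if tokens & keywords else 0


# Invented example notes, used only to show the labeling step.
notes = [
    "Patient started on metformin last year.",
    "Presents with a sprained ankle.",
    "History of type 2 diabetes, well controlled.",
]

# Each note is paired with its noisy label to form a training set.
training_set = [(note, noisy_label(note)) for note in notes]
labels = [label for _, label in training_set]
```

A keyword match does not guarantee the phenotype (a note might say a condition was ruled out), which is why such labels are treated as imperfect and the trained model is evaluated against a separate gold standard.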


Detection on sarcasm using machine learning classifiers and rule based approach

IOP Conference Series Materials Science and Engineering ◽

10.1088/1757-899x/1055/1/012105 ◽

2021 ◽

Vol 1055 (1) ◽

pp. 012105

Author(s):

K. Sentamilselvan ◽

P. Suresh ◽

G K Kamalam ◽

S. Mahendran ◽

D. Aneri

Keyword(s):

Machine Learning ◽

Rule Based ◽

Machine Learning Classifiers ◽

Learning Classifiers ◽

Rule Based Approach


Maximizing SLU Performance with Minimal Training Data Using Hybrid RNN Plus Rule-based Approach

10.18653/v1/w18-5043 ◽

2018 ◽

Cited By ~ 1

Author(s):

Takeshi Homma ◽

Adriano S. Arantes ◽

Maria Teresa Gonzalez Diaz ◽

Masahito Togami

Keyword(s):

Training Data ◽

Rule Based ◽

Rule Based Approach
