Die ontwikkeling van ’n woordafbreker en kompositumanaliseerder vir Afrikaans

S. Pilon; M.J. Puttkammer; G.B. Van Huyssteen

doi:10.4102/lit.v29i1.99

Die ontwikkeling van ’n woordafbreker en kompositumanaliseerder vir Afrikaans

Literator ◽

10.4102/lit.v29i1.99 ◽

2008 ◽

Vol 29 (1) ◽

pp. 21-42 ◽

Cited By ~ 1

Author(s):

S. Pilon ◽

M.J. Puttkammer ◽

G.B. Van Huyssteen

Keyword(s):

Machine Learning ◽

Training Data ◽

Practical Implementation ◽

Manual Annotation ◽

Machine Learning Technique ◽

Rule Based ◽

The Core ◽

Learning Classifier ◽

Learning Technique ◽

Rule Based Approach

The development of a hyphenator and compound analyser for Afrikaans. This article describes the development of two core technologies for Afrikaans, viz. a hyphenator and a compound analyser. As no annotated Afrikaans data existed prior to this project to serve as training data for a machine learning classifier, the core technologies in question were first developed using a rule-based approach. When evaluated, the rule-based hyphenator obtained an f-score of 90,84%, while the rule-based compound analyser reached an f-score of only 78,20%. Since these results were disappointing and insufficient for practical implementation, a machine learning technique (memory-based learning) was used instead. Training data for each of the two core technologies was then developed using “TurboAnnotate”, an interface designed to improve the accuracy and speed of manual annotation. The machine-learned hyphenator was trained on 39 943 words and reaches an f-score of 98,11%, while the compound analyser reaches an f-score of 90,57% after being trained on 77 589 annotated words. It is concluded that machine learning (specifically memory-based learning) seems an appropriate approach for developing core technologies for Afrikaans.
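Memory-based learning stores training instances and classifies new ones by their similarity to what is stored. The following is a minimal sketch of that idea applied to hyphenation, not the authors' implementation: the window size, the overlap metric, the class name `MemoryBasedHyphenator` and the one-word training set are all assumptions made for illustration.

```python
from collections import Counter

WINDOW = 3  # characters of context on each side of a gap (assumed value)


def instances(hyphenated):
    """Yield (features, is_break) for every gap between two letters of a word."""
    word = hyphenated.replace("-", "")
    breaks, offset = set(), 0
    for ch in hyphenated:
        if ch == "-":
            breaks.add(offset)  # break occurs before word[offset]
        else:
            offset += 1
    padded = "_" * WINDOW + word + "_" * WINDOW
    for gap in range(1, len(word)):
        centre = gap + WINDOW
        features = tuple(padded[centre - WINDOW:centre + WINDOW])
        yield features, gap in breaks


def overlap(a, b):
    """Similarity = number of feature positions with identical characters."""
    return sum(x == y for x, y in zip(a, b))


class MemoryBasedHyphenator:
    def __init__(self, k=1):
        self.k = k
        self.memory = []  # every training instance is kept verbatim

    def train(self, hyphenated_words):
        for word in hyphenated_words:
            self.memory.extend(instances(word))

    def hyphenate(self, word):
        """Insert a hyphen wherever the k nearest stored gaps vote for a break."""
        out = [word[0]]  # assumes a non-empty word
        for (features, _), ch in zip(instances(word), word[1:]):
            nearest = sorted(self.memory,
                             key=lambda m: overlap(m[0], features),
                             reverse=True)[:self.k]
            votes = Counter(label for _, label in nearest)
            if votes[True] > votes[False]:
                out.append("-")
            out.append(ch)
        return "".join(out)


hyphenator = MemoryBasedHyphenator(k=1)
hyphenator.train(["ta-fel"])  # toy, illustrative training data
print(hyphenator.hyphenate("tafel"))
```

With a single training word the model can only reproduce what it has memorised. The sketch is meant to show the instance representation, not realistic accuracy.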

Named Entity Recognition for a Low Resource Language

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b2085.098319 ◽

2019 ◽

Vol 8 (3) ◽

pp. 587-590

Keyword(s):

Machine Learning ◽

Named Entity Recognition ◽

Training Data ◽

Entity Recognition ◽

Linguistic Knowledge ◽

Rule Based ◽

Low Resource ◽

Named Entity ◽

The North ◽

Rule Based Approach

This paper studies named entity recognition for Kokborok using a rule-based approach. Named entity recognition is an application of natural language processing and is considered a subtask of information extraction; it identifies the named entities relevant to a specific task. Kokborok is an official language of the state of Tripura, situated in the north-eastern part of India. It is also widely spoken in other parts of north-eastern India and in adjoining areas of Bangladesh. Named entities include the names of persons, organizations, locations and so on. Named entity recognition can be approached with machine learning, with rules, or with a hybrid that combines the two. Rule-based named entity recognition is shaped by linguistic knowledge of the language. The machine learning approach requires a large amount of training data, and Kokborok, being a low-resource language, has very little. The rule-based approach requires linguistic rules, and its results do not depend on the amount of data available. We framed heuristic rules for identifying named entities based on linguistic knowledge of the language, and obtained encouraging results when testing our data with the rule-based approach. We also studied and framed rules for the counting system in Kokborok. The rule-based approach to named entity recognition is found suitable for a low-resource language with limited digital work and no named-entity-tagged data. We framed an algorithm that uses these rules to solve the named entity recognition task.
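The abstract does not list the heuristic rules themselves, so the sketch below only illustrates the general shape of a rule-based tagger: a gazetteer lookup plus context cues. Every rule, cue word and name in it is a hypothetical placeholder (English cue words, a two-entry gazetteer) and none of it is taken from the paper's Kokborok rules.

```python
# All three resources below are hypothetical placeholders for illustration.
PERSON_TITLES = {"Mr", "Mrs", "Dr"}            # a title cue marks the next token as a person
ORG_SUFFIXES = {"University", "Council"}       # a suffix cue marks an organization
LOCATION_GAZETTEER = {"Tripura", "Agartala"}   # tiny illustrative gazetteer


def tag_entities(tokens):
    """Return one tag per token: PER, ORG, LOC, or O (outside any entity)."""
    tags = ["O"] * len(tokens)
    for i, tok in enumerate(tokens):
        if tok in LOCATION_GAZETTEER:
            tags[i] = "LOC"
        elif i > 0 and tokens[i - 1].rstrip(".") in PERSON_TITLES and tok[:1].isupper():
            tags[i] = "PER"
        elif tok in ORG_SUFFIXES and i > 0 and tokens[i - 1][:1].isupper():
            # The organization rule overrides an earlier tag on the preceding token.
            tags[i - 1] = tags[i] = "ORG"
    return tags


sentence = "Dr Debbarma visited Tripura University in Agartala".split()
print(list(zip(sentence, tag_entities(sentence))))
```

No training data is involved: coverage depends entirely on how well the hand-written rules and word lists match the language, which is the trade-off the abstract describes.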

Density Based Clustering with Integrated One-Class SVM for Noise Reduction

International Journal of Informatics and Communication Technology (IJ-ICT) ◽

10.11591/ijict.v6i3.pp199-208 ◽

2017 ◽

Vol 6 (3) ◽

pp. 199

Author(s):

K. Nafees Ahmed ◽

T. Abdul Razak

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Data Analysis ◽

Noise Reduction ◽

Spatial Clustering ◽

Support Vector ◽

Machine Learning Technique ◽

Learning Classifier ◽

Density Based Clustering ◽

Learning Technique

Information extraction from data is one of the key necessities for data analysis. The unsupervised nature of data leads to complex computational methods for analysis. This paper presents Noise Reduced DBSCAN (NRDBSCAN), a modified variant of DBSCAN: a density-based spatial clustering technique integrated with a one-class Support Vector Machine (SVM), a machine learning technique used here for noise reduction. Analysis of DBSCAN shows that it depends heavily on accurate thresholds, the absence of which yields suboptimal results. However, identifying accurate threshold settings is unattainable. Noise is one of the major side-effects of the threshold gap. The proposed work reduces noise by integrating a machine learning classifier into the operation structure of DBSCAN. The experimental results indicate high homogeneity levels in the clustering process.
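The abstract does not say how the one-class SVM is wired into DBSCAN. As one possible reading, and only as a sketch under that assumption, the code below runs scikit-learn's `DBSCAN`, fits a `OneClassSVM` on each resulting cluster, and reassigns a noise point to a cluster whose model accepts it. The function name `dbscan_with_ocsvm` and all parameter values are illustrative.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.svm import OneClassSVM


def dbscan_with_ocsvm(X, eps=0.3, min_samples=5, nu=0.05):
    """Cluster with DBSCAN, then let per-cluster one-class SVMs reclaim noise points.

    This integration strategy is an assumption, not the published NRDBSCAN procedure.
    """
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    refined = labels.copy()
    noise = np.flatnonzero(labels == -1)          # DBSCAN marks noise with -1
    for cluster in sorted(set(labels) - {-1}):
        if noise.size == 0:
            break
        model = OneClassSVM(nu=nu, gamma="scale").fit(X[labels == cluster])
        accepted = model.predict(X[noise]) == 1   # +1 means "looks like this cluster"
        refined[noise[accepted]] = cluster
        noise = noise[~accepted]                  # only still-unclaimed points remain
    return labels, refined


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (100, 2)), rng.normal(5, 0.3, (100, 2))])
labels, refined = dbscan_with_ocsvm(X)
print("noise before:", int((labels == -1).sum()), "after:", int((refined == -1).sum()))
```

Points that DBSCAN already assigned to a cluster are never relabelled, so the refinement can only reduce the noise count.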

An Unsupervised Machine-Learning Technique for the Definition of a Rule-Based Control Strategy in a Complex HEV

SAE International Journal of Alternative Powertrains ◽

10.4271/2016-01-1243 ◽

2016 ◽

Vol 5 (2) ◽

pp. 308-327 ◽

Cited By ~ 7

Author(s):

Roberto Finesso ◽

Ezio Spessa ◽

Mattia Venditti

Keyword(s):

Machine Learning ◽

Control Strategy ◽

Machine Learning Technique ◽

Unsupervised Machine Learning ◽

Rule Based ◽

Learning Technique ◽

Definition Of

Die ontwikkeling van ’n fleksievormgenereerder vir Afrikaans

Literator ◽

10.4102/lit.v29i1.102 ◽

2008 ◽

Vol 29 (1) ◽

pp. 93-110

Author(s):

S. Pilon

Keyword(s):

Machine Learning ◽

Training Data ◽

Rule Based ◽

Plural Form ◽

Inflected Form ◽

Core Technology ◽

Average Accuracy ◽

Rule Based Approach

The development of an inflected form generator for Afrikaans. This article describes the development of an inflected form generator for Afrikaans. Two requirements are set for this inflected form generator, viz. to generate only one specific inflected form of a lemma and to generate all possible inflected forms of a lemma. The decision to use machine learning instead of the more traditional rule-based approach in the development of this core technology is explained, and a brief overview of the development of LIA, a lemmatiser for Afrikaans, is given. Experiments are done with three different methods, and it is shown that the most effective way of developing an inflected form generator for Afrikaans is to train a separate classifier for each affix. Thus one classifier is trained to generate the plural form, one to generate the diminutive, one to generate the plural of the diminutive, et cetera. The final inflected form generator for Afrikaans (AIL-3) reaches an average accuracy of 86,37% on the training data and 86,88% on a small amount of new data. It is indicated that, with the help of a preprocessing module, AIL-3 meets the requirements that were set for an Afrikaans inflected form generator. Finally, suggestions are made on how to improve the accuracy of AIL-3.
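The one-classifier-per-affix design can be pictured as a dictionary of independent models, one per inflection category. The sketch below is an assumption-laden illustration and not AIL-3: each toy `AffixClassifier` memorises suffix rewrite rules from (lemma, inflected form) pairs and reuses the rule of the stored lemma with the longest shared ending. The class name, the similarity measure and the tiny training lists are all hypothetical.

```python
def rewrite_rule(lemma, inflected):
    """Return (strip, add): remove `strip` from the lemma's end, then append `add`."""
    i = 0
    while i < min(len(lemma), len(inflected)) and lemma[i] == inflected[i]:
        i += 1
    return lemma[i:], inflected[i:]


def shared_ending(a, b):
    """Number of characters two words have in common, counted from the end."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


class AffixClassifier:
    """Toy model for ONE affix: reuse the rule of the most similar known lemma."""

    def __init__(self, pairs):
        self.examples = [(lemma, rewrite_rule(lemma, form)) for lemma, form in pairs]

    def generate(self, lemma):
        _, (strip, add) = max(self.examples,
                              key=lambda ex: shared_ending(ex[0], lemma))
        stem = lemma[:len(lemma) - len(strip)] if lemma.endswith(strip) else lemma
        return stem + add


# One classifier per affix, each trained only on its own (illustrative) data.
classifiers = {
    "plural": AffixClassifier([("boek", "boeke"), ("tafel", "tafels")]),
    "diminutive": AffixClassifier([("boek", "boekie"), ("tafel", "tafeltjie")]),
}

# Requirement 1: one specific form. Requirement 2: all forms of a lemma.
print(classifiers["plural"].generate("hoek"))
print({affix: clf.generate("hoek") for affix, clf in classifiers.items()})
```

Keeping the classifiers separate means each one only has to learn the regularities of a single affix, and generating all forms of a lemma is just a loop over the dictionary.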

Scalable density based spatial clustering with integrated one-class SVM for noise reduction

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i2.9.10093 ◽

2018 ◽

Vol 7 (2.22) ◽

pp. 28

Author(s):

K. Nafees Ahmed ◽

T. Abdul Razak

Keyword(s):

Machine Learning ◽

Data Analysis ◽

Side Effects ◽

Information Extraction ◽

Noise Reduction ◽

Spatial Clustering ◽

Machine Learning Technique ◽

Learning Classifier ◽

Learning Technique ◽

High Scalability

Information extraction from data is one of the key necessities for data analysis. The unsupervised nature of data leads to complex computational methods for analysis. This paper presents NRDBSCAN, a modified variant of DBSCAN: a density-based spatial clustering technique integrated with a one-class SVM, a machine learning technique used here for noise reduction. Analysis of DBSCAN shows that it depends heavily on accurate thresholds, the absence of which yields suboptimal results. However, identifying accurate threshold settings is unattainable. Noise is one of the major side-effects of the threshold gap. The proposed work reduces noise by integrating a machine learning classifier into the operation structure of DBSCAN. Further, the proposed technique is parallelized using the Spark architecture, thereby increasing its scalability and its ability to handle large amounts of data. Experiments and comparisons with similar techniques indicate high scalability and high homogeneity in the clustering process.

Predicting On-Axis Rotorcraft Dynamic Responses Using Machine Learning Techniques

10.20944/preprints201907.0348.v1 ◽

2019 ◽

Author(s):

Ryan Jackson ◽

Michael Jump ◽

Peter Green

Keyword(s):

Machine Learning ◽

Dynamic Response ◽

Real Time ◽

Computational Cost ◽

Dynamic Responses ◽

Training Data ◽

Machine Learning Techniques ◽

Machine Learning Technique ◽

Learning Technique ◽

Gp Model

Physical-law based models are widely utilized in the aerospace industry. One such use is to provide flight dynamics models for use in flight simulators. For human-in-the-loop use, such simulators must run in real-time. Due to the complex physics of rotorcraft flight, to meet this real-time requirement, simplifications to the underlying physics sometimes have to be applied to the model, leading to model response errors in the predictions compared to the real vehicle. This study investigated whether a machine-learning technique could be employed to provide rotorcraft dynamic response predictions, with the ultimate aim of this model taking over when the physics-based model's accuracy degrades. In the current work, a machine-learning technique was employed to train a model to predict the dynamic response of a rotorcraft. Machine learning was facilitated using a Gaussian Process (GP) non-linear autoregressive model, which predicted the on-axis pitch rate, roll rate, yaw rate and heave responses of a Bo105 rotorcraft. A variational sparse GP model was then developed to reduce the computational cost of implementing the approach on large data sets. It was found that both of the GP models were able to provide accurate on-axis response predictions, particularly when the input contained all four control inceptors and one lagged on-axis response term. The predictions made showed improvement compared to a corresponding physics-based model. The reduction of training data to one-third (rotational axes) or one-half (heave axis) resulted in only minor degradation of the GP model predictions.
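An autoregressive model with exogenous inputs predicts the current response from the control inputs together with earlier values of the response itself. The sketch below only shows how such regression rows might be assembled for the configuration the abstract highlights (four control inceptors plus one lagged on-axis response term). The function name `narx_rows`, the choice to use the controls at the current time step, and the sample numbers are assumptions. The resulting rows would then be passed to a regressor such as a Gaussian Process.

```python
def narx_rows(controls, response, lags=1):
    """Build (inputs, targets) for a non-linear autoregressive model.

    controls: one tuple of control-inceptor positions per time step
    response: the on-axis response (e.g. pitch rate) per time step
    lags:     how many past response values to append to each input row
    """
    rows, targets = [], []
    for t in range(lags, len(response)):
        lagged = [response[t - k] for k in range(1, lags + 1)]
        rows.append(list(controls[t]) + lagged)   # controls at t, response before t
        targets.append(response[t])
    return rows, targets


# Made-up numbers purely to show the shapes involved.
controls = [(0, 0, 0, 0), (1, 0, 0, 0), (1, 1, 0, 0)]
pitch_rate = [0.0, 0.1, 0.3]
rows, targets = narx_rows(controls, pitch_rate, lags=1)
print(rows)
print(targets)
```

The first `lags` time steps produce no row because their lagged response values do not exist yet.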

A Brief Survey on Text Classification Using Various Machine Learning Techniques

International Journal of Advanced Research in Computer Science and Software Engineering ◽

10.23956/ijarcsse.v8i1.521 ◽

2018 ◽

Vol 8 (1) ◽

pp. 14

Author(s):

Padmavathi S. ◽

M. Chidambaram

Keyword(s):

Machine Learning ◽

Text Classification ◽

Fixed Number ◽

Machine Learning Techniques ◽

Online Information ◽

Rule Based ◽

Learning Techniques ◽

Machine Learning Approach ◽

Rule Based Approach

Text classification has become more significant in managing and organizing text data owing to the tremendous growth of online information. It assigns documents to a fixed number of predefined categories. The rule-based approach and the machine learning approach are the two ways of classifying text. In the rule-based approach, documents are classified according to manually defined rules. In the machine learning approach, the classification rules, or the classifier, are derived automatically from example documents. The machine learning approach offers higher recall and faster processing. This paper surveys text classification using different machine learning techniques.
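The contrast between the two approaches can be made concrete with a small sketch. The keyword rules, the category names and the four training sentences below are invented for illustration, and the learned model is a plain multinomial naive Bayes with add-one smoothing, chosen here only because it is short. The survey itself does not single out this technique.

```python
import math
from collections import Counter, defaultdict

# Rule-based: a person writes the keyword lists by hand (hypothetical rules).
RULES = {"sport": {"match", "goal"}, "finance": {"bank", "stock"}}


def classify_by_rules(text, default="unknown"):
    words = set(text.lower().split())
    for label, keywords in RULES.items():
        if words & keywords:
            return label
    return default


# Machine learning: word statistics are derived from labelled example documents.
class NaiveBayes:
    def fit(self, docs, labels):
        self.word_counts = defaultdict(Counter)
        self.doc_counts = Counter(labels)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        def score(label):
            counts = self.word_counts[label]
            total = sum(counts.values()) + len(self.vocab)  # add-one smoothing
            s = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
            for w in text.lower().split():
                s += math.log((counts[w] + 1) / total)
            return s
        return max(self.doc_counts, key=score)


docs = ["the team won the match", "late goal wins match",
        "bank raises interest rates", "stock prices fall"]
labels = ["sport", "sport", "finance", "finance"]
model = NaiveBayes().fit(docs, labels)

print(classify_by_rules("interest rates rise"), model.predict("interest rates rise"))
```

The rule set has no keyword for "interest rates rise" and falls back to its default, while the learned model picks up "interest" and "rates" from the examples. This is one way a learned classifier can reach documents that hand-written rules miss.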

What Should Investors Care About? Mutual Fund Ratings by Analysts vs. Machine Learning Technique

SSRN Electronic Journal ◽

10.2139/ssrn.3702749 ◽

2020 ◽

Author(s):

Si Cheng ◽

Ruichang Lu ◽

Xiaojun Zhang

Keyword(s):

Machine Learning ◽

Mutual Fund ◽

Machine Learning Technique ◽

Learning Technique

The Development of a Quantitative Precipitation Forecast Correction Technique Based on Machine Learning for Hydrological Applications

Atmosphere ◽

10.3390/atmos11010111 ◽

2020 ◽

Vol 11 (1) ◽

pp. 111 ◽

Cited By ~ 2

Author(s):

Chul-Min Ko ◽

Yeong Yun Jeong ◽

Young-Mi Lee ◽

Byung-Sik Kim

Keyword(s):

Machine Learning ◽

Heavy Rainfall ◽

Extreme Rainfall ◽

Machine Learning Techniques ◽

Precipitation Forecast ◽

Machine Learning Technique ◽

Rainfall Forecast ◽

Quantitative Precipitation Forecast ◽

Correction Technique ◽

Learning Technique

This study aimed to enhance the accuracy of extreme rainfall forecast, using a machine learning technique for forecasting hydrological impact. In this study, machine learning with XGBoost technique was applied for correcting the quantitative precipitation forecast (QPF) provided by the Korea Meteorological Administration (KMA) to develop a hydrological quantitative precipitation forecast (HQPF) for flood inundation modeling. The performance of machine learning techniques for HQPF production was evaluated with a focus on two cases: one for heavy rainfall events in Seoul and the other for heavy rainfall accompanied by Typhoon Kong-rey (1825). This study calculated the well-known statistical metrics to compare the error derived from QPF-based rainfall and HQPF-based rainfall against the observational data from the four sites. For the heavy rainfall case in Seoul, the mean absolute errors (MAE) of the four sites, i.e., Nowon, Jungnang, Dobong, and Gangnam, were 18.6 mm/3 h, 19.4 mm/3 h, 48.7 mm/3 h, and 19.1 mm/3 h for QPF and 13.6 mm/3 h, 14.2 mm/3 h, 33.3 mm/3 h, and 12.0 mm/3 h for HQPF, respectively. These results clearly indicate that the machine learning technique is able to improve the forecasting performance for localized rainfall. In addition, the HQPF-based rainfall shows better performance in capturing the peak rainfall amount and spatial pattern. Therefore, it is considered that the HQPF can be helpful to improve the accuracy of intense rainfall forecast, which is subsequently beneficial for forecasting floods and their hydrological impacts.
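The mean absolute error used above is the average magnitude of the difference between forecast and observation. As a small worked example, the sketch below defines the metric and then computes the relative MAE reduction per site from the figures quoted in the abstract for the Seoul case. The helper name and the two-value demonstration series are illustrative, and the percentages are derived here, not reported by the authors.

```python
def mean_absolute_error(forecast, observed):
    """Average of |forecast - observed| over paired values."""
    return sum(abs(f - o) for f, o in zip(forecast, observed)) / len(observed)


# Illustrative series only: errors of 2 and 3 give an MAE of 2.5.
print(mean_absolute_error([10, 20], [12, 17]))

# MAE values in mm/3 h as quoted in the abstract for the Seoul heavy-rainfall case.
qpf_mae = {"Nowon": 18.6, "Jungnang": 19.4, "Dobong": 48.7, "Gangnam": 19.1}
hqpf_mae = {"Nowon": 13.6, "Jungnang": 14.2, "Dobong": 33.3, "Gangnam": 12.0}

# Relative reduction in percent achieved by the corrected forecast at each site.
reduction = {site: 100 * (qpf_mae[site] - hqpf_mae[site]) / qpf_mae[site]
             for site in qpf_mae}
for site, pct in reduction.items():
    print(f"{site}: {pct:.1f}% lower MAE")
```

On these figures the corrected forecast lowers the MAE by roughly 27% to 37%, depending on the site.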

Voice Pathology Detection Using Machine Learning Technique

2020 IEEE 5th International Symposium on Telecommunication Technologies (ISTT) ◽

10.1109/istt50966.2020.9279346 ◽

2020 ◽

Author(s):

Fahad Taha AL-Dhief ◽

Nurul Mu'azzah Abdul Latiff ◽

Nik Noordini Nik Abd. Malik ◽

Naseer Sabri ◽

Marina Mat Baki ◽

...

Keyword(s):

Machine Learning ◽

Machine Learning Technique ◽

Learning Technique ◽

Voice Pathology Detection
