scholarly journals Die ontwikkeling van ’n woordafbreker en kompositumanaliseerder vir Afrikaans

Literator ◽  
2008 ◽  
Vol 29 (1) ◽  
pp. 21-42 ◽  
Author(s):  
S. Pilon ◽  
M.J. Puttkammer ◽  
G.B. Van Huyssteen

The development of a hyphenator and compound analyser for Afrikaans The development of two core-technologies for Afrikaans, viz. a hyphenator and a compound analyser is described in this article. As no annotated Afrikaans data existed prior to this project to serve as training data for a machine learning classifier, the core-technologies in question are first developed using a rule-based approach. The rule-based hyphenator and compound analyser are evaluated and the hyphenator obtains an fscore of 90,84%, while the compound analyser only reaches an f-score of 78,20%. Since these results are somewhat disappointing and/or insufficient for practical implementation, it was decided that a machine learning technique (memory-based learning) will be used instead. Training data for each of the two core-technologies is then developed using “TurboAnnotate”, an interface designed to improve the accuracy and speed of manual annotation. The hyphenator developed using machine learning has been trained with 39 943 words and reaches an fscore of 98,11% while the f-score of the compound analyser is 90,57% after being trained with 77 589 annotated words. It is concluded that machine learning (specifically memory-based learning) seems an appropriate approach for developing coretechnologies for Afrikaans.

Kokborok named entity recognition using the rules based approach is being studied in this paper. Named entity recognition is one of the applications of natural language processing. It is considered a subtask for information extraction. Named entity recognition is the means of identifying the named entity for some specific task. We have studied the named entity recognition system for the Kokborok language. Kokborok is the official language of the state of Tripura situated in the north eastern part of India. It is also widely spoken in other part of the north eastern state of India and adjoining areas of Bangladesh. The named entities are like the name of person, organization, location etc. Named entity recognitions are studied using the machine learning approach, rule based approach or the hybrid approach combining the machine learning and rule based approaches. Rule based named entity recognitions are influence by the linguistic knowledge of the language. Machine learning approach requires a large number of training data. Kokborok being a low resource language has very limited number of training data. The rule based approach requires linguistic rules and the results are not depended on the size of data available. We have framed a heuristic rules for identifying the named entity based on linguistic knowledge of the language. An encouraging result is obtained after we test our data with the rule based approach. We also tried to study and frame the rules for the counting system in Kokborok in this paper. The rule based approach to named entity recognition is found suitable for low resource language with limited digital work and absence of named entity tagged data. We have framed a suitable algorithm using the rules for solving the named entity recognition task for obtaining a desirable result.


Author(s):  
K. Nafees Ahmed ◽  
T. Abdul Razak

<p>Information extraction from data is one of the key necessities for data analysis. Unsupervised nature of data leads to complex computational methods for analysis. This paper presents a density based spatial clustering technique integrated with one-class Support Vector Machine (SVM), a machine learning technique for noise reduction, a modified variant of DBSCAN called Noise Reduced DBSCAN (NRDBSCAN). Analysis of DBSCAN exhibits its major requirement of accurate thresholds, absence of which yields suboptimal results. However, identifying accurate threshold settings is unattainable. Noise is one of the major side-effects of the threshold gap. The proposed work reduces noise by integrating a machine learning classifier into the operation structure of DBSCAN. The Experimental results indicate high homogeneity levels in the clustering process.</p>


Literator ◽  
2008 ◽  
Vol 29 (1) ◽  
pp. 93-110
Author(s):  
S. Pilon

The development of an inflected form generator for Afrikaans In this article the development of an inflected form generator for Afrikaans is described. Two requirements are set for this inflected form generator, viz. to generate only one specific inflected form of a lemma and to generate all possible inflected forms of a lemma. The decision to use machine learning instead of the more traditional rule-based approach in the development of this core-technology is explained and a brief overview of the development of LIA, a lemmatiser for Afrikaans, is given. Experiments are done with three different methods and it is shown that the most effective way of developing an inflected form generator for Afrikaans is by training different classifiers for each affix. Therefore a classifier is trained to generate a plural form, one to generate the diminutive, one to generate the plural of diminutive, et cetera. The final inflected form generator for Afrikaans (AIL-3) reaches an average accuracy of 86,37% on the training data and 86,88% on a small amount of new data. It is indicated that, with the help of a preprocessing module, AIL-3 meets the requirements that were set for an Afrikaans inflected form generator. Finally suggestions are made on how to improve the accuracy of AIL-3.


2018 ◽  
Vol 7 (2.22) ◽  
pp. 28
Author(s):  
K. Nafees Ahmed ◽  
T. Abdul Razak

Information extraction from data is one of the key necessities for data analysis. Unsupervised nature of data leads to complex computational methods for analysis. This paper presents a density based spatial clustering technique integrated with one-class SVM, a machine learning technique for noise reduction, a modified variant of DBSCAN called NRDBSCAN. Analysis of DBSCAN exhibits its major requirement of accurate thresholds, absence of which yields suboptimal results. However, identifying accurate threshold settings is unattainable. Noise is one of the major side-effects of the threshold gap. The proposed work reduces noise by integrating a machine learning classifier into the operation structure of DBSCAN. Further, the proposed technique is parallelized using Spark architecture, thereby increasing its scalability and its ability to handle large amounts of data. Experiments and comparisons with similar techniques indicate high scalability levels and high homogeneity levels in the clustering process.


Author(s):  
Ryan Jackson ◽  
Michael Jump ◽  
Peter Green

Physical-law based models are widely utilized in the aerospace industry. One such use is to provide flight dynamics models for use in flight simulators. For human-in-the-loop use, such simulators must run in real-time. Due to the complex physics of rotorcraft flight, to meet this real-time requirement, simplifications to the underlying physics sometimes have to be applied to the model, leading to model response errors in the predictions compared to the real vehicle. This study investigated whether a machine-learning technique could be employed to provide rotorcraft dynamic response predictions, with the ultimate aim of this model taking over when the physics-based model's accuracy degrades. In the current work, a machine-learning technique was employed to train a model to predict the dynamic response of a rotorcraft. Machine learning was facilitated using a Gaussian Process (GP) non-linear autoregressive model, which predicted the on-axis pitch rate, roll rate, yaw rate and heave responses of a Bo105 rotorcraft. A variational sparse GP model was then developed to reduce the computational cost of implementing the approach on large data sets. It was found that both of the GP models were able to provide accurate on-axis response predictions, particularly when the input contained all four control inceptors and one lagged on-axis response term. The predictions made showed improvement compared to a corresponding physics-based model. The reduction of training data to one-third (rotational axes) or one-half (heave axis) resulted in only minor degradation of the GP model predictions.


Author(s):  
Padmavathi .S ◽  
M. Chidambaram

Text classification has grown into more significant in managing and organizing the text data due to tremendous growth of online information. It does classification of documents in to fixed number of predefined categories. Rule based approach and Machine learning approach are the two ways of text classification. In rule based approach, classification of documents is done based on manually defined rules. In Machine learning based approach, classification rules or classifier are defined automatically using example documents. It has higher recall and quick process. This paper shows an investigation on text classification utilizing different machine learning techniques.


Atmosphere ◽  
2020 ◽  
Vol 11 (1) ◽  
pp. 111 ◽  
Author(s):  
Chul-Min Ko ◽  
Yeong Yun Jeong ◽  
Young-Mi Lee ◽  
Byung-Sik Kim

This study aimed to enhance the accuracy of extreme rainfall forecast, using a machine learning technique for forecasting hydrological impact. In this study, machine learning with XGBoost technique was applied for correcting the quantitative precipitation forecast (QPF) provided by the Korea Meteorological Administration (KMA) to develop a hydrological quantitative precipitation forecast (HQPF) for flood inundation modeling. The performance of machine learning techniques for HQPF production was evaluated with a focus on two cases: one for heavy rainfall events in Seoul and the other for heavy rainfall accompanied by Typhoon Kong-rey (1825). This study calculated the well-known statistical metrics to compare the error derived from QPF-based rainfall and HQPF-based rainfall against the observational data from the four sites. For the heavy rainfall case in Seoul, the mean absolute errors (MAE) of the four sites, i.e., Nowon, Jungnang, Dobong, and Gangnam, were 18.6 mm/3 h, 19.4 mm/3 h, 48.7 mm/3 h, and 19.1 mm/3 h for QPF and 13.6 mm/3 h, 14.2 mm/3 h, 33.3 mm/3 h, and 12.0 mm/3 h for HQPF, respectively. These results clearly indicate that the machine learning technique is able to improve the forecasting performance for localized rainfall. In addition, the HQPF-based rainfall shows better performance in capturing the peak rainfall amount and spatial pattern. Therefore, it is considered that the HQPF can be helpful to improve the accuracy of intense rainfall forecast, which is subsequently beneficial for forecasting floods and their hydrological impacts.


Author(s):  
Fahad Taha AL-Dhief ◽  
Nurul Mu'azzah Abdul Latiff ◽  
Nik Noordini Nik Abd. Malik ◽  
Naseer Sabri ◽  
Marina Mat Baki ◽  
...  

Sign in / Sign up

Export Citation Format

Share Document