A Semantic Scattering model for the automatic interpretation of English genitives

2009 ◽  
Vol 15 (2) ◽  
pp. 215-239 ◽  
Author(s):  
ADRIANA BADULESCU ◽  
DAN MOLDOVAN

AbstractAn important problem in knowledge discovery from text is the automatic extraction of semantic relations. This paper addresses the automatic classification of thesemantic relationsexpressed by English genitives. A learning model is introduced based on the statistical analysis of the distribution of genitives' semantic relations in a corpus. The semantic and contextual features of the genitive's noun phrase constituents play a key role in the identification of the semantic relation. The algorithm was trained and tested on a corpus of approximately 20,000 sentences and achieved an f-measure of 79.80 per cent for of-genitives, far better than the 40.60 per cent obtained using a Decision Trees algorithm, the 50.55 per cent obtained using a Naive Bayes algorithm, or the 72.13 per cent obtained using a Support Vector Machines algorithm on the same corpus using the same features. The results were similar for s-genitives: 78.45 per cent using Semantic Scattering, 47.00 per cent using Decision Trees, 43.70 per cent using Naive Bayes, and 70.32 per cent using a Support Vector Machines algorithm. The results demonstrate the importance of word sense disambiguation and semantic generalization/specialization for this task. They also demonstrate that different patterns (in our case the two types of genitive constructions) encode different semantic information and should be treated differently in the sense that different models should be built for different patterns.

2020 ◽  
Vol 26 (1) ◽  
pp. 157-172
Author(s):  
Dirk Krüger ◽  
Moritz Krell

ZusammenfassungVerfahren des maschinellen Lernens können dazu beitragen, Aussagen in Aufgaben im offenen Format in großen Stichproben zu analysieren. Am Beispiel von Aussagen von Biologielehrkräften, Biologie-Lehramtsstudierenden und Fachdidaktiker*innen zu den fünf Teilkompetenzen von Modellkompetenz (NTraining = 456; NKlassifikation = 260) wird die Qualität maschinellen Lernens mit vier Algorithmen (naïve Bayes, logistic regression, support vector machines und decision trees) untersucht. Evidenz für die Validität der Interpretation der Kodierungen einzelner Algorithmen liegt mit zufriedenstellender bis guter Übereinstimmung zwischen menschlicher und computerbasierter Kodierung beim Training (345–607 Aussagen je nach Teilkompetenz) vor, bei der Klassifikation (157–260 Aussagen je nach Teilkompetenz) reduziert sich dies auf eine moderate Übereinstimmung. Positive Korrelationen zwischen dem kodierten Niveau und dem externen Kriterium Antwortlänge weisen darauf hin, dass die Kodierung mit naïve Bayes keine gültigen Ergebnisse liefert. Bedeutsame Attribute, die die Algorithmen bei der Klassifikation nutzen, entsprechen relevanten Begriffen der Niveaufestlegungen im zugrunde liegenden Kodierleitfaden. Abschließend wird diskutiert, inwieweit maschinelles Lernen mit den eingesetzten Algorithmen bei Aussagen zur Modellkompetenz die Qualität einer menschlichen Kodierung erreicht und damit für Zweitkodierungen oder in Vermittlungssituationen genutzt werden könnte.


2013 ◽  
pp. 1306-1316
Author(s):  
Wei Xiong ◽  
Min Song ◽  
Lori deVersterre

Word sense disambiguation is the problem of selecting a sense for a word from a set of predefined possibilities. This is a significant problem in the biomedical domain where a single word may be used to describe a gene, protein, or abbreviation. In this paper, we evaluate SENSATIONAL, a novel unsupervised WSD technique, in comparison with two popular learning algorithms: support vector machines (SVM) and K-means. Based on the accuracy measure, our results show that SENSATIONAL outperforms SVM and K-means by 2% and 17%, respectively. In addition, we develop a polysemy-based search engine and an experimental visualization application that utilizes SENSATIONAL’s clustering technique.


Information ◽  
2020 ◽  
Vol 11 (8) ◽  
pp. 383
Author(s):  
Francis Effirim Botchey ◽  
Zhen Qin ◽  
Kwesi Hughes-Lartey

The onset of COVID-19 has re-emphasized the importance of FinTech especially in developing countries as the major powers of the world are already enjoying the advantages that come with the adoption of FinTech. Handling of physical cash has been established as a means of transmitting the novel corona virus. Again, research has established that, been unbanked raises the potential of sinking one into abject poverty. Over the years, developing countries have been piloting the various forms of FinTech, but the very one that has come to stay is the Mobile Money Transactions (MMT). As mobile money transactions attempt to gain a foothold, it faces several problems, the most important of them is mobile money fraud. This paper seeks to provide a solution to this problem by looking at machine learning algorithms based on support vector machines (kernel-based), gradient boosted decision tree (tree-based) and Naïve Bayes (probabilistic based) algorithms, taking into consideration the imbalanced nature of the dataset. Our experiments showed that the use of gradient boosted decision tree holds a great potential in combating the problem of mobile money fraud as it was able to produce near perfect results.


2020 ◽  
Vol 10 (2) ◽  
Author(s):  
Mahmood Umar ◽  
Nor Bahiah Ahmad ◽  
Anazida Zainal

This study investigates the performance of machine learning algorithms for sentiment analysis of students’ opinions on programming assessment. Previous researches show that Support Vector Machines (SVM) performs the best among all techniques, followed by Naïve Bayes (NB) in sentiment analysis. This study proposes a framework for classifying sentiments, as positive or negative using NB algorithm and Lexicon-based approach on small data set. The performance of NB algorithm was evaluated using SVM. NB and SVM conquer the Lexicon-based approach opinion lexicon technique in terms of accuracy in the specific area for which it is trained. The Lexicon-based technique, on the other hand, avoids difficult steps needed to train the classifier. Data was analyzed from 75 first year undergraduate students in School of Computing, Universiti Teknologi Malaysia taking programming subject. The student’s sentiments were gathered based on their opinions for the zero-score policy for unsuccessful compilation of program during skill-based test. The result of the study reveals that the students tend to have negative sentiments on programming assessment as it gives them scary emotions. The experimental result of applying NB algorithm yields a prediction accuracy of 85% which outperform both the SVM with 70% and Lexicon-based approach with 60% accuracy. The result shows that NB works better than SVM and Lexicon-based approach on small dataset. 


Machine learning is one of the fast growing aspect in current world. Machine learning (ML) and Artificial Neural Network (ANN) are helpful in detection and diagnosis of various heart diseases. Naïve Bayes Classification is a vital approach of classification in machine learning. The heart disease consists of set of range disorders affecting the heart. It includes blood vessel problems such as irregular heart beat issues, weak heart muscles, congenital heart defects, cardio vascular disease and coronary artery disease. Coronary heart disorder is a familiar type of heart disease. It reduces the blood flow to the heart leading to a heart attack. In this paper the UCI machine learning repository data set consisting of patients suffering from heart disease is analyzed using Naïve Bayes classification and support vector machines. The classification accuracy of the patients suffering from heart disease is predicted using Naïve Bayes classification and support vector machines. Implementation is done using R language.


Sign in / Sign up

Export Citation Format

Share Document