YorAA: An Authorship Attribution of Yorùbá Texts

2021 ◽  
Vol 28 (1) ◽  
Author(s):  
Abayomi O. Agbeyangi ◽  
Safiriyu I. Eludiora ◽  
Felix A. Fabunmi

The process of establishing the most likely author of a collection of texts or documents whose authorship must be verified is known as authorship attribution. Several studies on this task have been reported in the literature, but very little work has been reported on Yorùbá texts. This paper reports the development of an automatic authorship attribution system for written Yorùbá texts (YorAA). The literary works of six Yorùbá authors were considered. Stylometric features were extracted from the texts using a bag-of-words (BoW) approach and a lexical/syntactic word-frequency approach. Support Vector Machine, Multilayer Perceptron and Random Forest algorithms were used for classification. The experimental results showed that YorAA achieved average accuracy, recall, precision and F1 scores of 95%, 83%, 84% and 84%, respectively, across all six authors. The results demonstrate that, given a database of written Yorùbá texts large enough to extract relevant stylometric features for each author, and appropriate methods and tools applied to those features, the authorship of the texts can be identified or verified.
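The BoW feature step and the per-author averaging of the reported metrics can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the YorAA implementation: the whitespace tokenizer and the helper names `bag_of_words` and `macro_metrics` are hypothetical, and accuracy is assumed to be computed one-vs-rest per author and then averaged. Under that one-vs-rest assumption, true negatives count toward each author's accuracy, which is one possible reason accuracy can sit well above recall in a multi-author setting.

```python
from collections import Counter


def bag_of_words(texts):
    """Build a shared vocabulary and raw term-count vectors.

    Assumption: a simple lowercase + whitespace tokenizer (hypothetical;
    the paper's tokenization is not specified)."""
    tokenized = [t.lower().split() for t in texts]
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for toks in tokenized:
        vec = [0] * len(vocab)
        for word, count in Counter(toks).items():
            vec[index[word]] = count
        vectors.append(vec)
    return vocab, vectors


def macro_metrics(y_true, y_pred):
    """Per-author (one-vs-rest) accuracy, precision, recall and F1, averaged over authors."""
    labels = sorted(set(y_true) | set(y_pred))
    n = len(y_true)
    acc = prec = rec = f1 = 0.0
    for lab in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == lab and p == lab)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != lab and p == lab)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == lab and p != lab)
        tn = n - tp - fp - fn
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        acc += (tp + tn) / n  # one-vs-rest accuracy includes true negatives
        prec += p
        rec += r
        f1 += 2 * p * r / (p + r) if p + r else 0.0
    k = len(labels)
    return {"accuracy": acc / k, "precision": prec / k, "recall": rec / k, "f1": f1 / k}
```

The count vectors would then be passed to a classifier such as an SVM; that step is omitted here to keep the sketch dependency-free.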

2020 ◽  
Author(s):  
Laércio Mesquita ◽  
Antônio De C. Filho ◽  
Alcilene De Sousa ◽  
Patrícia Drumond

CADx systems have been gaining increasing attention because of their importance in the medical field, where they make diagnosis by specialists more accurate. Building these tools relies on computational methods, digital image processing and knowledge about the disease. In this work, phylogenetic diversity indices are used for texture-based feature extraction. These indices serve as features for the classifiers Support Vector Machine, Random Forest, Random Basis Function and MultiLayer Perceptron, which identify the presence of mass or non-mass in mammography tissue, thus forming part of a Computer-Aided Diagnosis system. To validate the methodology, 200 mammography images were used, of which 100 contain masses and the remaining 100 do not. The results are promising: the best results reach an accuracy of 91.5%, a sensitivity of 89.5%, a specificity of 94% and a false-positive rate of 0.085 per exam.
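The reported evaluation figures can all be derived from the counts of a binary confusion matrix (mass vs. non-mass). The sketch below assumes that "false positives per exam" means the total number of false positives divided by the number of exams; the paper's exact definition is not given, and the function name `diagnostic_rates` is hypothetical.

```python
def diagnostic_rates(tp, fn, tn, fp, n_exams):
    """Accuracy, sensitivity, specificity and false positives per exam.

    tp/fn are counted on mass images, tn/fp on non-mass images.
    Assumption: fp_per_exam = total false positives / number of exams."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true-positive rate on mass images
        "specificity": tn / (tn + fp),  # true-negative rate on non-mass images
        "fp_per_exam": fp / n_exams,
    }
```

For example, with the hypothetical counts `diagnostic_rates(tp=90, fn=10, tn=94, fp=6, n_exams=200)` the function returns an accuracy of 0.92, a sensitivity of 0.9, a specificity of 0.94 and 0.03 false positives per exam; these are illustrative numbers, not the paper's results.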


2021 ◽  
Author(s):  
Ronieri Nogueira de Sousa ◽  
Roney Nogueira de Sousa ◽  
Rhyan Ximenes de Brito ◽  
Janaide Nogueira de Sousa Ximenes

Dyslexia is one of the most common learning difficulties in classrooms. This study therefore aimed to classify children with or without dyslexia by applying Computational Intelligence (CI) techniques. The methodology used a public dataset and applied the neural architectures Multilayer Perceptron (MLP), Radial Basis Function (RBF) and Extreme Learning Machine (ELM), and the statistical classifiers Support Vector Machine (SVM), Random Forest (RF) and K-Nearest Neighbors (K-NN), together with k-fold cross-validation, SMOTE and z-score normalization. The results showed that the SVM classifier achieved the best average hit rate, with 98.03% accuracy.
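The three preprocessing techniques named above (z-score normalization, SMOTE oversampling and k-fold splitting) can be sketched with the standard library alone. This is a minimal illustration, not the authors' pipeline: the helper names are hypothetical, the folds are built by striding rather than shuffling, and SMOTE is shown in its basic form (interpolating between a minority sample and one of its k nearest minority neighbours).

```python
import math
import random


def zscore_fit(rows):
    """Per-feature mean and population standard deviation (0 replaced by 1 to avoid division by zero)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0 for j in range(d)]
    return means, stds


def zscore_apply(rows, means, stds):
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]


def smote(minority, n_new, k=5, seed=0):
    """Basic SMOTE: new point = x + u * (neighbour - x), with u uniform in [0, 1)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x, excluding x itself
        neighbours = sorted((p for p in minority if p is not x), key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbours)
        u = rng.random()
        synthetic.append([a + u * (b - a) for a, b in zip(x, nn)])
    return synthetic


def kfold_indices(n, k):
    """Yield (train, test) index lists; fold i takes every k-th sample starting at i."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test
```

A common design choice is to fit the z-score statistics and generate SMOTE samples inside each training fold only, so that no information from the test fold leaks into training.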


2020 ◽  
Vol 5 (2) ◽  
pp. 504
Author(s):  
Matthias Omotayo Oladele ◽  
Temilola Morufat Adepoju ◽  
Olaide Abiodun Olatoke ◽
Oluwaseun Adewale Ojo

Yorùbá is one of the three main languages spoken in Nigeria. It is a tonal language in which tone is marked by accents on the vowels. The Yorùbá alphabet has twenty-five (25) letters, one of which is a digraph (GB). Because typing up handwritten Yorùbá documents is difficult, there is a need for a handwriting recognition system that can convert handwritten texts to digital format. This study discusses an offline Yorùbá handwritten word recognition system (OYHWR) that recognizes Yorùbá uppercase letters. Handwritten characters and words were collected from different writers using a paint application and M708 graphics tablets. The characters were used for training and the words for testing. The images were pre-processed, and their geometric features were extracted using zoning and gradient-based feature extraction. Geometric features are the different line types that form a particular character, such as vertical, horizontal and diagonal lines. The geometric features used are the number of horizontal lines, the number of vertical lines, the number of right-diagonal lines, the number of left-diagonal lines, the total length of all horizontal lines, the total length of all vertical lines, the total length of all right-slanting lines, the total length of all left-slanting lines, and the area of the skeleton. Each character image was divided into 9 zones, and gradient feature extraction was used to obtain the horizontal and vertical gradient components and the geometric features in each zone. The words were fed into a support vector machine classifier, and performance was evaluated by recognition accuracy. Because the standard support vector machine is a two-class classifier, a multiclass least-squares support vector machine (LSSVM) was used for word recognition, with a one-vs-one strategy and an RBF kernel. The recognition accuracies obtained for the tested words were 66.7%, 83.3%, 85.7%, 87.5% and 100%.
The low recognition rate for some of the words may be due to similarities in the extracted features.
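The zoning step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the character image is a 2-D list of pixel values, it is split into a 3 × 3 grid of zones (9 zones, as in the abstract), and the gradient is approximated with central differences clamped at the border. The paper does not state its gradient operator, and the geometric line features are not reproduced here; `zone_gradient_features` is a hypothetical name.

```python
def zone_gradient_features(img, zones=3):
    """Split img into zones x zones regions and return, per zone,
    the summed |horizontal| and |vertical| gradient magnitudes.

    Assumption: central differences, clamped at the image border."""
    h, w = len(img), len(img[0])
    feats = []
    for zr in range(zones):
        for zc in range(zones):
            r0, r1 = zr * h // zones, (zr + 1) * h // zones
            c0, c1 = zc * w // zones, (zc + 1) * w // zones
            gx_sum = gy_sum = 0.0
            for r in range(r0, r1):
                for c in range(c0, c1):
                    gx = img[r][min(c + 1, w - 1)] - img[r][max(c - 1, 0)]  # horizontal component
                    gy = img[min(r + 1, h - 1)][c] - img[max(r - 1, 0)][c]  # vertical component
                    gx_sum += abs(gx)
                    gy_sum += abs(gy)
            feats.extend([gx_sum, gy_sum])
    return feats  # length 2 * zones * zones (18 for 9 zones)
```

For a 6 × 6 image containing a single vertical stroke, every vertical-gradient feature is zero and only the zones bordering the stroke receive horizontal-gradient energy, which is the kind of directional cue the classifier relies on.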


2020 ◽  
Vol 27 (4) ◽  
pp. 329-336 ◽  
Author(s):  
Lei Xu ◽  
Guangmin Liang ◽  
Baowen Chen ◽  
Xu Tan ◽  
Huaikun Xiang ◽  
...  

Background: Cell lytic enzymes are highly evolved proteins that can destroy the cell structure and kill bacteria. Unlike antibiotics, cell lytic enzymes do not cause serious drug resistance in pathogenic bacteria, whereas drug resistance to antibiotics is becoming an increasingly serious problem. The study of cell wall lytic enzymes therefore aims to find an efficient way of curing bacterial infections, and cell lytic enzymes are a good choice for this purpose. Cell lytic enzymes include endolysins and autolysins, which differ in the purpose for which they break down the cell wall. Identifying the type of a cell lytic enzyme is therefore meaningful for the study of cell wall enzymes. Objective: In this article, we aim to predict the type of cell lytic enzyme. Because cell lytic enzymes help kill bacteria, their type is worth studying; however, determining it by experimental methods is time-consuming. We therefore propose an efficient computational method for predicting the type of cell lytic enzyme. Method: We propose a computational method for distinguishing endolysins from autolysins. First, a data set containing 27 endolysins and 41 autolysins is built. Each protein is then represented by its tripeptide composition, and the features with larger confidence degrees are selected. Finally, a support vector machine classifier is trained on the labeled feature vectors and used to predict the type of cell lytic enzyme. Results: The experimental results show that the proposed method attains an overall accuracy of 97.06% when 44 features are selected. Compared with Ding's method, it improves the overall accuracy by nearly 4.5% in relative terms ((97.06 − 92.9)/92.9 ≈ 4.5%). The performance of the proposed method remains stable when between 40 and 70 features are selected.
The overall accuracy of the optimal tripeptide feature set is 94.12%, whereas that of Chou's amphiphilic PseAAC method is 76.2%; that is, using the optimal tripeptide feature set improves the overall accuracy by nearly 18 percentage points. Conclusion: This paper proposes an efficient method for identifying endolysins and autolysins, in which a support vector machine predicts the type of cell lytic enzyme. The experimental results show that the overall accuracy of the proposed method is 94.12%, which is better than that of some existing methods. In conclusion, the 44 selected features improve the overall accuracy of identifying the type of cell lytic enzyme, and the support vector machine outperforms other classifiers when using the selected feature set on the benchmark data set.
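Tripeptide composition maps each protein sequence to a fixed-length vector with one entry per possible three-residue word, that is 20³ = 8000 entries for the 20 standard amino acids, from which the method above selects a small subset (44 features). The sketch below assumes overlapping windows, skips windows containing non-standard residues, and normalizes by the number of valid windows; the paper's exact normalization and its confidence-based feature selection are not reproduced.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
TRIPEPTIDES = ["".join(t) for t in product(AMINO_ACIDS, repeat=3)]  # 8000 words
INDEX = {tp: i for i, tp in enumerate(TRIPEPTIDES)}


def tripeptide_composition(seq):
    """Relative frequency of each tripeptide over overlapping windows.

    Assumption: windows with non-standard residues are skipped and the
    counts are divided by the number of valid windows."""
    seq = seq.upper()
    vec = [0.0] * len(TRIPEPTIDES)
    windows = [seq[i:i + 3] for i in range(len(seq) - 2)]
    valid = [w for w in windows if w in INDEX]
    for w in valid:
        vec[INDEX[w]] += 1
    if valid:
        vec = [v / len(valid) for v in vec]
    return vec
```

For instance, `"ACDA"` yields the windows `ACD` and `CDA`, so each of those two entries is 0.5 and all other 7998 entries are zero.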


Author(s):  
Shikhar P. Acharya ◽  
Ivan G. Guardiola

Radio Frequency (RF) devices emit a certain amount of Unintended Electromagnetic Emissions (UEEs). UEEs are generally unique to each device and can therefore serve as a signature for detection and identification. The difficulty is that UEEs are very low in power and are often buried deep within the noise band. This research applies a Support Vector Machine (SVM) to detect and identify RF devices from their UEEs. Experimental results show that the SVM can detect RF devices within the noise band and can also identify RF devices from their UEEs.
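The abstract does not specify the SVM kernel, solver or feature representation, so the following is only a self-contained sketch of a linear SVM, trained with Pegasos-style stochastic sub-gradient descent on the L2-regularized hinge loss. This solver is swapped in for illustration; the input vectors (for example, spectral magnitudes of a captured emission) and the function names are hypothetical. Labels are +1 (device present) and −1 (noise only).

```python
import random


def train_linear_svm(X, y, lam=0.1, iterations=2000, seed=0):
    """Pegasos-style training of a linear SVM (labels must be +1/-1).

    A constant 1.0 feature is appended so the last weight acts as a
    (regularized) bias term."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    for t in range(1, iterations + 1):
        x, label = rng.choice(data)
        eta = 1.0 / (lam * t)  # decreasing step size
        margin = label * sum(wi * xi for wi, xi in zip(w, x))
        w = [(1 - eta * lam) * wi for wi in w]  # shrink: gradient of the L2 term
        if margin < 1:  # hinge loss active: step toward the sample
            w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```

A kernelized SVM (for example with an RBF kernel) would usually be preferred when the emission signatures are not linearly separable; the linear version is kept here because it fits in a few dependency-free lines.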


2013 ◽  
Vol 721 ◽  
pp. 367-371
Author(s):  
Yong Kui Sun ◽  
Zhi Bin Yu

This paper presents a method for fault diagnosis of analog circuits using multifractal analysis. The faulty response of the circuit under test is analyzed with the multifractal formalism, and the fault features consist of multifractal spectrum parameters. A support vector machine is used to identify the faults. Experimental results show that the proposed method is effective, with a diagnosis accuracy reaching 98%.
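The multifractal formalism mentioned above can be sketched with box counting. With box measures μᵢ(ε), the partition function Z(q, ε) = Σ μᵢ^q scales as ε^τ(q); the singularity strength is α = dτ/dq and the spectrum is f(α) = qα − τ(q). The sketch below assumes a non-negative signal (for example the absolute value of the circuit response), dyadic box sizes and central finite differences; which spectrum parameters the paper feeds to the SVM is not stated, so the width Δα = α_max − α_min is shown only as one common example. The function names are hypothetical.

```python
import math


def _slope(xs, ys):
    """Least-squares slope of ys against xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def multifractal_spectrum(signal, qs, scales):
    """Return (alphas, f_alphas) for a non-negative 1-D signal with a positive sum."""
    total, n = sum(signal), len(signal)
    taus = []
    for q in qs:
        log_eps, log_z = [], []
        for s in scales:
            # box measures; an incomplete trailing box is dropped
            boxes = [sum(signal[i:i + s]) / total for i in range(0, n - s + 1, s)]
            z = sum(mu ** q for mu in boxes if mu > 0)
            log_eps.append(math.log(s / n))
            log_z.append(math.log(z))
        taus.append(_slope(log_eps, log_z))  # tau(q) from the log-log slope
    alphas, fs = [], []
    for i in range(1, len(qs) - 1):  # Legendre transform via central differences
        a = (taus[i + 1] - taus[i - 1]) / (qs[i + 1] - qs[i - 1])
        alphas.append(a)
        fs.append(qs[i] * a - taus[i])
    return alphas, fs


def spectrum_width(alphas):
    """Delta alpha, one example of a scalar spectrum feature."""
    return max(alphas) - min(alphas)
```

As a sanity check, a constant signal is monofractal: τ(q) = q − 1, so every α and f(α) equals 1 and the spectrum width is 0, whereas a response distorted by a fault would typically widen the spectrum.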

