Boosting Stemmer Performance Using Cache Method

Author(s):  
Muhammad Fadly Tanjung

Stemming is the process of reducing a word to its base form by removing its affixes. This is important for supporting better information retrieval. Research on stemming algorithms includes the Nazief & Adriani algorithm, confix stripping, enhanced confix stripping, and the Arifin and Porter algorithms. Stemming algorithms for Bahasa Indonesia fall into two groups: those that use a dictionary and those that do not. Some studies have shown that dictionary-based stemmers have high accuracy but low processing speed, while stemmers without a dictionary have lower accuracy but higher processing speed. In this study, two methods were compared, a stemmer with a cache and a stemmer without a cache, to measure the difference in processing speed for dictionary-based stemmers. The test data is text obtained from a corpus site. The analysis measures the speed, memory usage, and CPU usage of each method and then compares the methods. The results on the test data showed that the cache method improved stemmer performance.
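The caching idea described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the root-word dictionary, the affix lists, and the function names `stem_uncached` and `stem_cached` are all hypothetical, and the stripping logic is a toy stand-in rather than the Nazief & Adriani or confix-stripping rules. The point is only that repeated words skip the dictionary search once their stem has been stored.

```python
from functools import lru_cache

# Hypothetical toy data: a tiny root-word dictionary and affix lists.
ROOT_WORDS = {"makan", "baca", "main", "ajar"}
PREFIXES = ("mem", "ber", "di", "me", "pe")
SUFFIXES = ("kan", "an", "i")


def stem_uncached(word: str) -> str:
    """Strip at most one prefix and one suffix; accept the first dictionary hit."""
    word = word.lower()
    if word in ROOT_WORDS:
        return word
    # The empty prefix/suffix means "leave that side of the word untouched".
    for prefix in ("",) + PREFIXES:
        if not word.startswith(prefix):
            continue
        base = word[len(prefix):]
        for suffix in ("",) + SUFFIXES:
            if suffix and not base.endswith(suffix):
                continue
            candidate = base[: len(base) - len(suffix)] if suffix else base
            if candidate in ROOT_WORDS:
                return candidate
    return word  # no dictionary match: return the word unchanged


@lru_cache(maxsize=None)
def stem_cached(word: str) -> str:
    """Same stemmer, but results for previously seen words come from memory."""
    return stem_uncached(word)


if __name__ == "__main__":
    for token in "dimakan membaca dimakan bermain membaca".split():
        print(token, "->", stem_cached(token))
    print(stem_cached.cache_info())
```

Because natural-language text repeats a small set of words very often, a cache of this kind trades some memory for fewer dictionary lookups, which is consistent with the speed, memory, and CPU comparison the abstract reports.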

Author(s):  
Arum Ratnaningsih ◽  
Titi Anjarini

The background of this study is students' lack of confidence in using the Indonesian language, which makes them passive in lectures. The purpose of this study is to make learning humanist, active, and fun. The research method is R&D with the Project Based Learning (PBL) model and an Inquiry approach. The subjects were 130 PGSD students of the University of Muhammadiyah Purworejo, divided into 2 experimental classes, 2 control classes, and 1 scale-test class. Data analysis used source triangulation, method triangulation, instrument triangulation, and SPSS. The result is an improvement in the Indonesian language skills of PGSD students. The experimental class scored higher than the control class, so it can be concluded that the PBL learning model can significantly improve the Indonesian language skills of PGSD students at the University of Muhammadiyah Purworejo.


2019 ◽  
Vol 8 (5) ◽  
pp. 668 ◽  
Author(s):  
Yang Cao ◽  
Xin Fang ◽  
Johan Ottosson ◽  
Erik Näslund ◽  
Erik Stenberg

Background: Severe obesity is a global public health threat of growing proportions. Accurate models to predict severe postoperative complications could be of value in the preoperative assessment of potential candidates for bariatric surgery. So far, traditional statistical methods have failed to produce high accuracy. We aimed to find a useful machine learning (ML) algorithm to predict the risk of severe complications after bariatric surgery. Methods: We trained and compared 29 supervised ML algorithms using information from 37,811 patients who underwent a bariatric surgical procedure between 2010 and 2014 in Sweden. The algorithms were then tested on 6250 patients operated on in 2015. We applied the synthetic minority oversampling technique to address the fact that only 3% of patients experienced severe complications. Results: Most of the ML algorithms showed high accuracy (>90%) and specificity (>90%) in both the training and test data. However, none of the algorithms achieved an acceptable sensitivity in the test data. We also tried to tune the hyperparameters of the algorithms to maximize sensitivity, but did not identify one with a sensitivity high enough for use in clinical practice in bariatric surgery. However, a minor but perceptible improvement was found with deep neural network (NN) ML. Conclusion: In predicting severe postoperative complications among bariatric surgery patients, ensemble algorithms outperform base algorithms. Compared with other ML algorithms, deep NN has the potential to improve accuracy and deserves further investigation. The oversampling technique should be considered for imbalanced data, where the number of outcomes of interest is relatively small.
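The gap between high accuracy and poor sensitivity reported above is what one would expect on strongly imbalanced data. The sketch below uses invented numbers (1,000 hypothetical patients, 3% with a complication) rather than the study's data, and the function name `confusion_metrics` is an assumption for illustration. It shows that a classifier which never predicts a complication still reaches 97% accuracy and 100% specificity while detecting no true cases.

```python
def confusion_metrics(y_true, y_pred):
    """Return accuracy, sensitivity and specificity for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Sensitivity (recall): share of true positives that were detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Specificity: share of true negatives that were correctly cleared.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical cohort: 30 of 1,000 patients (3%) have a severe complication.
    y_true = [1] * 30 + [0] * 970
    # A degenerate model that always predicts "no complication".
    y_pred = [0] * 1000
    print(confusion_metrics(y_true, y_pred))
```

This is why sensitivity, rather than accuracy alone, is the metric the abstract treats as decisive for clinical use.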


2011 ◽  
Vol 367 ◽  
pp. 133-141
Author(s):  
P.B. Osofisan ◽  
J.O. Ilevbare

The main objective of this research was to use an Artificial Neural Network (ANN) based method to solve the power flow problem for a power system in Nigeria. This was achieved using the Backpropagation (multilayered feed-forward) Neural Network model. Two Backpropagation neural networks were designed and trained: one for computing voltage magnitudes on all buses and the other for computing voltage phase angles on all PV and PQ buses, under different load and generation conditions, for a 7-bus 132 kV power system in South-West Nigeria (Ayede). Because historical field records were unavailable, data representing different loading and/or generation scenarios had to be generated using the Newton-Raphson non-linear iterative method. A total of 250 scenarios were generated, of which 50% were used to train the ANNs, 25% were used for validation, and the remaining 25% were used as test data. The ANN used for computing voltage magnitudes showed very high accuracy on all test data, with a Mean Square Error (MSE) of less than 10^-6. The ANN used for computing voltage phase angles showed very high accuracy on about 80% of the test data and acceptable results on about 97% of the test data. The MSE over all the test data for the ANN computing voltage phase angles was less than 10^-2.


2020 ◽  
Author(s):  
Rianto Rianto ◽  
Achmad Benny Mutiara ◽  
Eri Prasetyo Wibowo ◽  
Paulus Insap Santosa

Stemming has long been used in data pre-processing for information retrieval, where it reduces affixed words to their root words. However, there are not many stemming methods for non-formal Indonesian text processing. Existing stemming methods have high accuracy for formal Indonesian but low accuracy for non-formal Indonesian. Thus, a stemming method that yields a high-accuracy classifier model for non-formal Indonesian is still an open challenge. This study introduces a new stemming method to solve problems in non-formal Indonesian text data pre-processing. Furthermore, this study aims to provide comprehensive research on improving the accuracy of text classifier models by strengthening the stemming method. Using the Support Vector Machine algorithm, a text classifier model is developed, and its accuracy is checked. The experimental evaluation was done by testing 550 datasets in Indonesian using two different stemming methods. The results show that with the proposed stemming method, the text classifier model has higher accuracy than with the existing methods, with scores of 0.85 and 0.73, respectively. In the future, the proposed stemming method can be used to develop Indonesian text classifier models for various purposes, including text clustering, summarization, hate speech detection, and other text processing applications.


2019 ◽  
Vol 6 (1) ◽  
pp. 38-42
Author(s):  
Erick Fernando ◽  
Surjandy Surjandy ◽  
Muhamad Irsan ◽  
Hetty Rohayani A H ◽  
Fachruddin Fachruddin

This article presents AES encryption on the Raspberry Pi mini PC. The application also aims to illustrate that the AES algorithm can be applied with limited resources. The research used an experimental approach, implementing the algorithm on mini PC hardware with XAMPP software (PHP, Apache). The AES algorithm was tested on text data using PHP on an Apache web server. The results show that the AES algorithm runs well on minimal hardware such as the Raspberry Pi mini PC, with very fast processing times even for large amounts of text data. The AES algorithm can therefore be widely adopted for various applications on Raspberry Pi mini PCs, offering strong practicality in information security and reliability.


2020 ◽  
Author(s):  
Kadi L. Saar ◽  
Alexey S. Morgunov ◽  
Runzhang Qi ◽  
William E. Arter ◽  
Georg Krainer ◽  
...  

Intracellular phase separation of proteins into biomolecular condensates is increasingly recognised as an important phenomenon for cellular compartmentalisation and regulation of biological function. Different hypotheses about the parameters that determine the tendency of proteins to form condensates have been proposed with some of them probed experimentally through the use of constructs generated by sequence alterations. To broaden the scope of these observations, here, we established an in silico strategy for understanding on a global level the associations between protein sequence and condensate formation, and used this information to construct machine learning classifiers for predicting liquid–liquid phase separation (LLPS) from protein sequence. Our analysis highlighted that LLPS-prone sequences are more disordered, hydrophobic and of lower Shannon entropy than sequences in the Protein Data Bank or the Swiss-Prot database, and have their disordered regions enriched in polar, aromatic and charged residues. Using these determining features together with neural network based word2vec sequence embeddings, we developed machine learning classifiers for predicting protein condensate formation. Our model, trained to distinguish LLPS-prone sequences from structured proteins, achieved high accuracy (93%; 25-fold cross-validation) and identified condensate forming sequences from external independent test data at 97% sensitivity. Moreover, in combination with a classifier that had developed a nuanced insight into the features governing protein phase behaviour by learning to distinguish between sequences of varying LLPS propensity, the sensitivity was supplemented with high specificity (approximated ROC-AUC of 0.85). These results provide a platform rooted in molecular principles for understanding protein phase behaviour. The predictor is accessible from https://deephase.ch.cam.ac.uk/.

Significance Statement: The tendency of many cellular proteins to form protein-rich biomolecular condensates underlies the formation of subcellular compartments and has been linked to various physiological functions. Understanding the molecular basis of this fundamental process and predicting protein phase behaviour have therefore become important objectives. To develop a global understanding of how protein sequence determines its phase behaviour, here, we constructed bespoke datasets of proteins of varying phase separation propensity and identified explicit biophysical and sequence-specific features common to phase separating proteins. Moreover, by combining this insight with neural network based sequence embeddings, we trained machine learning classifiers that identified phase separating sequences with high accuracy, including from independent external test data. The predictor is available from https://deephase.ch.cam.ac.uk/.
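One of the sequence features mentioned above, Shannon entropy, can be illustrated with a short sketch. The version below assumes entropy is computed in bits over the residue frequencies of a single sequence; the abstract does not state the exact definition the authors used, and the function name `shannon_entropy` and the example strings are invented for illustration.

```python
from collections import Counter
from math import log2


def shannon_entropy(sequence: str) -> float:
    """Shannon entropy (in bits) of the residue distribution of a sequence."""
    n = len(sequence)
    if n == 0:
        return 0.0
    counts = Counter(sequence)
    # H = -sum(p * log2(p)) over the observed residue frequencies p.
    return -sum((c / n) * log2(c / n) for c in counts.values())


if __name__ == "__main__":
    # Made-up strings, not real proteins: a repetitive, low-complexity
    # sequence versus one that uses a wider range of residues.
    print(shannon_entropy("GGSGGSGGSGGS"))
    print(shannon_entropy("MKVLAETQRDPW"))
```

Under this definition, a repetitive low-complexity sequence scores lower than a compositionally diverse one, which matches the direction of the difference the abstract reports for LLPS-prone sequences.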


2021 ◽  
Author(s):  
Rianto Rianto ◽  
Achmad Benny Mutiara ◽  
Eri Prasetyo Wibowo ◽  
Paulus Insap Santosa

Background: Stemming has long been used in data pre-processing to retrieve information by tracing affixed words back to their roots. In the Indonesian setting, existing stemming methods have been studied and shown to achieve a high level of accuracy. However, there are not many stemming methods for non-formal Indonesian text processing. This study introduces a new stemming method to solve problems in non-formal Indonesian text data pre-processing. Furthermore, this study aims to improve the accuracy of text classifier models by strengthening the stemming method. Using the Support Vector Machine algorithm, a text classifier model is developed, and its accuracy is checked. The experimental evaluation was done by testing 550 datasets in Indonesian using two different stemming methods. Findings: The results show that with the proposed stemming method, the text classifier model has higher accuracy than with the existing methods, with scores of 0.85 and 0.73, respectively. These results indicate that the proposed stemming method produces a classifier model with a small error rate, so it predicts the class of an object more accurately. Conclusion: The existing Indonesian stemming methods are still oriented towards formal Indonesian sentences, so they are of limited use for non-formal Indonesian sentences. This motivates the suggestion to develop a corpus that normalizes non-formal Indonesian into formal Indonesian, to be used as a better stemming method. Using the corpus as a stemming method can improve the accuracy of the classifier model. In the future, the proposed corpus and stemming method can be used for various purposes, including text clustering, summarization, hate speech detection, and other text processing applications in Indonesian.


2016 ◽  
Vol 8 (2) ◽  
pp. 88
Author(s):  
Muhammad Ridha Damanik ◽  
Deny Setiawan

This development research aims to produce a character-based authentic assessment instrument for the skills domain at the Faculty of Social Sciences (FIS), Unimed. The research subjects were: (1) five experts for product validation, qualified as experts in (a) PIPS, (b) character education, (c) authentic assessment instruments, (d) the Indonesian language, and (e) psychology; (2) FIS Unimed lecturers; and (3) FIS Unimed students. The method used is development research of the formative research type, focused on two stages: the preliminary stage and the formative evaluation stage, which comprises self evaluation, prototyping (expert reviews, one-to-one, and small group), and a field test. Data were collected through questionnaires and field trials. The expert validation results show that the character-based authentic assessment instrument for the skills domain is valid and rated very good. The field trial results (small scale and large scale) show that the instrument developed has very good validity and effectiveness. Thus, based on the expert validation and the field trials, it can be concluded that the character-based authentic assessment instrument for the skills domain is valid, effective, and rated very good, so it can be used to measure students' attainment of character values, particularly in the skills domain.


2016 ◽  
Author(s):  
Martin Šošić ◽  
Mile Šikić

We present Edlib, an open-source C/C++ library for exact pairwise sequence alignment using edit distance. We compare Edlib to other libraries and show that it is the fastest while not lacking in functionality, and can also easily handle very large sequences. Being easy to use, flexible, fast and low on memory usage, we expect it to be a cornerstone for many future bioinformatics tools. Source code, installation instructions and test data are freely available for download at https://github.com/Martinsos/edlib, implemented in C/C++ and supported on Linux, MS Windows, and Mac OS. Contact: [email protected]
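For readers unfamiliar with the quantity being computed, edit distance can be sketched with the textbook dynamic-programming recurrence. This is a plain two-row Levenshtein implementation written for illustration only; it is not Edlib's code and makes no claim about the algorithm Edlib uses internally. The function name `edit_distance` is an assumption.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    # Keep only two rows of the DP table, so memory is O(len(b)).
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (ch_a != ch_b)
            deletion = previous[j] + 1      # drop ch_a from a
            insertion = current[j - 1] + 1  # insert ch_b into a
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
    print(edit_distance("ACGT", "AGT"))
```

This baseline runs in time proportional to the product of the two sequence lengths, which is why a specialised library matters for the very large sequences the abstract mentions.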


2018 ◽  
Author(s):  
Yang Cao ◽  
Xin Fang ◽  
Johan Ottosson ◽  
Erik Näslund ◽  
Erik Stenberg

Accurate models to predict severe postoperative complications could be of value in the preoperative assessment of potential candidates for bariatric surgery. Traditional statistical methods have so far failed to produce high accuracy. To find a useful algorithm to predict the risk of severe complications after bariatric surgery, we trained and compared 29 supervised machine learning (ML) algorithms using information from 37,811 patients who underwent a bariatric surgical procedure between 2010 and 2014 in Sweden. The algorithms were then tested on 6,250 patients operated on in 2015. Most ML algorithms showed high accuracy (>90%) and specificity (>0.9) in both the training and test data. However, none achieved an acceptable sensitivity in the test data. ML methods may improve the accuracy of prediction, but we did not identify one with a sensitivity high enough for use in clinical practice in bariatric surgery. Further investigation of deeper neural network algorithms is needed.

