Influence of GUJarati STEmmeR in Supervised Learning of Web Page Categorization

2021 ◽  
Vol 13 (3) ◽  
pp. 23-34
Author(s):  
Chandrakant D. Patel ◽  
Jayesh M. Patel

With the large quantity of information available online, retrieving the correct information for a user query is essential. A large amount of data is available in digital form in multiple languages. Various approaches aim to increase the effectiveness of online information retrieval, but the standard approach to answering a user query is to search the documents in the corpus word by word for the given query. This approach is very time-intensive, and it can miss related documents that are equally important. To avoid these issues, stemming has been used extensively in Information Retrieval Systems (IRS) to improve retrieval accuracy in all languages. This paper addresses the problem of stemming in Web Page Categorization (WPC) for the Gujarati language, deriving stem words with the GUJSTER algorithm [1]. The GUJSTER algorithm is based on morphological rules that are used to derive the root or stem word from inflected words of the same class. In particular, we consider the influence of the extracted stem or root word on the integrity of web page classification using supervised machine learning algorithms. This research focuses on the analysis of WPC for the Gujarati language and verifies the influence of a stemming algorithm in a WPC application, with improved accuracy of between 63% and 98% across supervised machine learning models, using the standard split of 80% for training and 20% for testing.
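The rule-based stemming and 80/20 split described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the GUJSTER rule set: the suffix list (a few common Gujarati case markers and postpositions), the minimum stem length, and the function names `stem` and `split_80_20` are all assumptions made for the example.

```python
import random

# Hypothetical suffix list for illustration only: a few common Gujarati
# case markers and postpositions. This is NOT the GUJSTER rule set.
SUFFIXES = ["માં", "થી", "ને", "નો", "ની", "નું", "ના"]
MIN_STEM_LENGTH = 2  # assumed guard against over-stemming short words


def stem(word: str) -> str:
    """Strip the longest matching suffix, keeping at least MIN_STEM_LENGTH code points."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LENGTH:
            return word[: -len(suffix)]
    return word


def split_80_20(documents, seed=0):
    """Shuffle a copy of the documents and split it 80% train / 20% test."""
    shuffled = list(documents)
    random.Random(seed).shuffle(shuffled)
    cut = (len(shuffled) * 4) // 5
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    # Inflected forms of the same noun collapse to one stem.
    print(stem("ઘરમાં"), stem("ઘરથી"), stem("ઘર"))
    train, test = split_80_20(range(10))
    print(len(train), len(test))
```

In a categorization pipeline, each token would be passed through `stem` before feature extraction, so that inflected forms of one word count as the same feature.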

Author(s):  
Inssaf El Guabassi ◽  
Zakaria Bousalem ◽  
Rim Marah ◽  
Aimad Qazdar

In recent years, people have increasingly demanded to predict the future with certainty, and predicting the right information in any area is becoming a necessity. One way to predict the future with certainty is to determine the possible future. In this sense, machine learning is a way to analyze huge datasets to make strong predictions or decisions. The main objective of this research work is to build a predictive model for evaluating students’ performance. Hence, the contributions are threefold. The first is to apply several supervised machine learning algorithms (i.e., ANCOVA, Logistic Regression, Support Vector Regression, Log-linear Regression, Decision Tree Regression, Random Forest Regression, and Partial Least Squares Regression) to our education dataset. The second is to compare and evaluate the algorithms used to create a predictive model, based on various evaluation metrics. The third is to determine the most important factors that influence the success or failure of students. The experimental results showed that Log-linear Regression provides better predictions and highlighted the behavioral factors that influence students’ performance.
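The abstract does not name the evaluation metrics used to compare the regression models. As a hedged sketch, assuming common choices such as MAE, RMSE, and R², a comparison could look like this; the function names and the `predictions` dictionary layout are hypothetical.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination (assumes y_true is not constant)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rank_models(y_true, predictions):
    """Return model names sorted from lowest to highest RMSE."""
    return sorted(predictions, key=lambda name: rmse(y_true, predictions[name]))
```

Here `predictions` maps a model name to its list of predicted scores on the same held-out students, so every model is ranked against identical ground truth.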


Author(s):  
P. Singh ◽  
V. Maurya ◽  
R. Dwivedi

Abstract. Landslides are among the most common natural disasters, triggered mainly by heavy rainfall, cloudbursts, earthquakes, volcanic eruptions, unorganized road construction, and deforestation. In India, field surveying is the most common method used to identify potential landslide regions and update the landslide inventories maintained by the Geological Survey of India, but it is very time-consuming, costly, and inefficient. Alternatively, advanced remote sensing technologies allow rapid and easy data acquisition and help to improve on traditional landslide detection methods. Supervised machine learning algorithms, for example, Support Vector Machine (SVM), are challenging conventional techniques by predicting disasters with high accuracy. In this research work, we have used open-source datasets (Landsat 8 multi-band images and JAXA ALOS DSM) and Google Earth Engine (GEE) to identify landslides in Rudraprayag using machine learning techniques. Rudraprayag is a district of Uttarakhand state in India, which has long been a focus of geological studies due to its high density of landslide-prone zones. For training and validation, labeled landslide locations obtained from the landslide inventory (prepared by the Geological Survey of India) and layers such as NDVI, NDWI, and slope (generated from JAXA ALOS DSM and Landsat 8 satellite multi-band imagery) were used. Landslide identification was performed using SVM, Classification and Regression Trees (CART), Minimum Distance, Random Forest (RF), and Naïve Bayes techniques, among which SVM and RF outperformed all other techniques by achieving an 87.5% true positive rate (TPR).
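The spectral layers and the reported metric can be illustrated with a small sketch. It assumes the standard NDVI definition and the green/NIR form of NDWI (the abstract does not say which NDWI variant was used), and the counts in the usage note are made up for the arithmetic, not taken from the study.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 where the denominator is zero."""
    denom = a + b
    return 0.0 if denom == 0 else (a - b) / denom


def ndvi(nir, red):
    """NDVI from near-infrared and red reflectance (Landsat 8 bands 5 and 4)."""
    return normalized_difference(nir, red)


def ndwi(green, nir):
    """NDWI, assumed green/NIR variant (Landsat 8 bands 3 and 5)."""
    return normalized_difference(green, nir)


def true_positive_rate(true_positives, false_negatives):
    """TPR (recall): share of actual landslide samples that were detected."""
    return true_positives / (true_positives + false_negatives)
```

For example, a classifier that detects 7 of 8 labeled landslide samples has `true_positive_rate(7, 1) == 0.875`, which is how a figure such as 87.5% arises.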


2020 ◽  
Vol 14 (2) ◽  
pp. 140-159
Author(s):  
Anthony-Paul Cooper ◽  
Emmanuel Awuni Kolog ◽  
Erkki Sutinen

This article builds on previous research around the exploration of the content of church-related tweets. It does so by exploring whether the qualitative thematic coding of such tweets can, in part, be automated by the use of machine learning. It compares three supervised machine learning algorithms to understand how useful each algorithm is at a classification task, based on a dataset of human-coded church-related tweets. The study finds that one such algorithm, Naïve-Bayes, performs better than the other algorithms considered, returning Precision, Recall and F-measure values which each exceed an acceptable threshold of 70%. This has far-reaching consequences at a time when the high volume of social media data, in this case Twitter data, means that the resource intensity of manual coding approaches can act as a barrier to understanding how the online community interacts with, and talks about, church. The findings presented in this article offer a way forward for scholars of digital theology to better understand the content of online church discourse.
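The Precision, Recall, and F-measure check described above can be sketched as follows. This is an illustrative computation for a single theme code against human-coded labels; the function names are hypothetical, and the F-measure is assumed to be the balanced F1 score.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F1 for one class, treating `positive` as the target label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def exceeds_threshold(scores, threshold=0.70):
    """True only if every score is strictly above the threshold."""
    return all(score > threshold for score in scores)
```

With `y_true` as the human codes and `y_pred` as the classifier output, `exceeds_threshold(precision_recall_f1(y_true, y_pred, theme))` mirrors the requirement that all three values exceed 70%.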


2021 ◽  
Vol 1916 (1) ◽  
pp. 012042
Author(s):  
Ranjani Dhanapal ◽  
A AjanRaj ◽  
S Balavinayagapragathish ◽  
J Balaji

Diagnostics ◽  
2021 ◽  
Vol 11 (4) ◽  
pp. 642
Author(s):  
Yi-Da Wu ◽  
Ruey-Kai Sheu ◽  
Chih-Wei Chung ◽  
Yen-Ching Wu ◽  
Chiao-Chi Ou ◽  
...  

Background: Antinuclear antibody pattern recognition is vital for autoimmune disease diagnosis but is labor-intensive to interpret manually. To develop an automated pattern recognition system, we established machine learning models based on the International Consensus on Antinuclear Antibody Patterns (ICAP) at a competent level, with mixed-pattern recognition, and evaluated their consistency with human reading. Methods: 51,694 human epithelial cell (HEp-2) images, with patterns assigned by experienced medical technologists and collected in a medical center, were used to train six machine learning algorithms, which were compared by their performance. Next, we chose the best performing model to test its consistency with five experienced readers and two beginners. Results: The mean F1 score in each classification of the best performing model was 0.86, evaluated on Testing Data 1. For the inter-observer agreement test on Testing Data 2, the average agreement was 0.849 (κ) among five experienced readers, 0.844 between the best performing model and experienced readers, and 0.528 between experienced readers and beginners. The results indicate that the proposed model outperformed beginners and achieved excellent agreement with experienced readers. Conclusions: This study demonstrated that the developed model could reach excellent agreement with experienced human readers using machine learning methods.
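Inter-observer agreement of this kind is commonly measured with Cohen's kappa. As a sketch, assuming the agreement figures above are kappa values computed for pairs of readers (or model versus reader), the statistic for one pair can be computed as below; the function name and the toy labels are illustrative.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(1 for a, b in zip(labels_a, labels_b) if a == b) / n
    # Expected agreement by chance, from each rater's label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)
```

An average agreement across several readers would then be the mean of `cohen_kappa` over the relevant rater pairs.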


Author(s):  
David Blondheim

Abstract. Machine learning (ML) is unlocking patterns and insight into data to provide financial value and knowledge for organizations. Use of machine learning in manufacturing environments is increasing, yet sometimes these applications fail to produce meaningful results. A critical review of how defects are classified is needed to appropriately apply machine learning in a production foundry and other manufacturing processes. Four elements associated with defect classification are proposed: Binary Acceptance Specifications, Stochastic Formation of Defects, Secondary Process Variation, and Visual Defect Inspection. These four elements create data space overlap, which influences the bias associated with training supervised machine learning algorithms. If this influence is significant enough, the predicted error of the model exceeds a critical error threshold (CET). There is no financial motivation to implement the ML model in the manufacturing environment if its error is greater than the CET. The goal is to bring awareness to these four elements, define the critical error threshold, and offer guidance and future study recommendations on data collection and machine learning that will increase the success of ML within manufacturing.

