Discovering News Frames

The Internet is a major source of news content. Current efforts to evaluate online news content, including text, storyline, and sources, are limited by the use of small-scale manual techniques that are time-consuming and dependent on human judgment. This article explores the use of machine learning algorithms and mathematical techniques for Internet-scale data mining and semantic discovery of news content, enabling researchers to mine, analyze, and visualize large-scale datasets. This research has the potential to inform the integration and application of data mining to address real-world socio-environmental issues, including water insecurity in the Southwestern United States. The paper establishes a formal definition of framing and proposes an approach for discovering the distinct patterns that characterize prominent frames. The authors' experimental evaluation shows that the proposed process is an effective approach for advancing semi-supervised machine learning and may assist in advancing tools for making sense of unstructured text.

A supervised machine-learning method for optimizing the automatic transmission system of wind turbines

Engineering Solid Mechanics ◽

10.5267/j.esm.2021.11.001 ◽

2022 ◽

Vol 10 (1) ◽

pp. 35-56 ◽

Cited By ~ 1

Author(s):

Habeeb A. H. R. Aladwani ◽

Mohd Khairol Anuar Ariffin ◽

Faizal Mustapha

Keyword(s):

Machine Learning ◽

Wind Speed ◽

Wind Turbines ◽

Large Scale ◽

Learning Algorithm ◽

Automatic Transmission ◽

Transmission System ◽

Supervised Machine Learning ◽

Small Scale ◽

Low Efficiency

Large-scale wind turbines mostly use Continuously Variable Transmission (CVT) as the transmission system, which is highly efficient but also complex and costly. In contrast, the small-scale wind turbines available on the market offer only a one-speed gearing system with a fixed gear ratio, which results in low energy-harvesting efficiency and leads to gear failure. In this research, a supervised machine-learning algorithm is proposed to improve the energy-harvesting efficiency of the automatic transmission system in vertical axis wind turbines (VAWT). The aim is to find the best adjustment for the VAWT while the automatic transmission system is taken into account. For this purpose, the system is simulated and tested under various gear-ratio conditions, with a centrifugal clutch used for automatic gear shifting. The outcomes indicate that the automatic transmission system can successfully adjust the rotational speed in line with the wind speed. As a result, the voltage and power harvested by the VAWT with the automatic transmission system improve significantly. It is therefore concluded that automatic VAWTs equipped with machine-learning capability can readjust themselves to the wind speed more efficiently.

Research on the Application of Machine Learning Algorithms in Credit Risk Assessment of Minor Enterprises

CONVERTER ◽

10.17762/converter.220 ◽

2021 ◽

pp. 696-706

Author(s):

Huichao Mi

Keyword(s):

Machine Learning ◽

Credit Risk ◽

Industrial Development ◽

Large Scale ◽

Manufacturing Industry ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Support Vector ◽

Small Scale ◽

Scale Data

Under the influence of COVID-19, minor enterprises, especially those in the manufacturing industry, are facing greater financial pressure, and the likelihood of non-performing loans is increasing. It is very important for financial institutions to reduce financial risk while providing the financial support that minor enterprises need to promote industrial development and economic recovery. To understand how well machine learning algorithms predict enterprise credit risk, the research designs five models (Logistic Regression, Decision Tree, Naïve Bayes, Support Vector Machine, and Deep Neural Network) and uses SMOTE and undersampling to handle imbalanced data. Experiments show that the machine learning algorithms achieve high accuracy on both large-scale and small-scale data.
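The two resampling ideas named in the abstract can be sketched in plain Python. This is a minimal illustration, not the paper's implementation: the function names, the toy loan data, and the use of a single nearest neighbour (SMOTE normally interpolates towards one of k nearest neighbours) are all assumptions made for brevity.

```python
import math
import random
from collections import Counter


def random_undersample(samples, labels, seed=0):
    """Randomly drop samples so every class is as small as the rarest one."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    n_min = min(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        for x in rng.sample(xs, n_min):
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


def smote_like_oversample(minority, n_new, seed=0):
    """Create n_new synthetic points on the segment between a minority
    sample and its nearest other minority sample (simplified, k = 1).
    Requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a = rng.choice(minority)
        b = min((m for m in minority if m is not a),
                key=lambda m: math.dist(a, m))
        u = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([ai + u * (bi - ai) for ai, bi in zip(a, b)])
    return synthetic


# Hypothetical toy data: 8 performing loans (0), 2 non-performing loans (1).
X = [[float(i), i * 0.5] for i in range(10)]
y = [0] * 8 + [1] * 2

X_under, y_under = random_undersample(X, y)
under_counts = Counter(y_under)

minority = [x for x, label in zip(X, y) if label == 1]
synthetic = smote_like_oversample(minority, n_new=6)
over_counts = Counter(y + [1] * len(synthetic))
```

Undersampling discards majority-class records, while the SMOTE-style step adds interpolated minority records; which one suits a given credit dataset depends on how much data can be spared.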

A Scalable Machine Learning Pipeline for Paddy Rice Classification Using Multi-Temporal Sentinel Data

Remote Sensing ◽

10.3390/rs13091769 ◽

2021 ◽

Vol 13 (9) ◽

pp. 1769

Author(s):

Vasileios Sitokonstantinou ◽

Alkiviadis Koukos ◽

Thanassis Drivas ◽

Charalampos Kontoes ◽

Ioannis Papoutsis ◽

...

Keyword(s):

Machine Learning ◽

Satellite Data ◽

High Performance ◽

Large Scale ◽

Paddy Rice ◽

Machine Learning Algorithms ◽

Classification Model ◽

Supervised Machine Learning ◽

Rice Area ◽

Multi Temporal

The demand for rice production in Asia is expected to increase by 70% in the next 30 years, which makes evident the need for balanced productivity and effective food security management at the national and continental level. Consequently, the timely and accurate mapping of paddy rice extent and the assessment of its productivity are of utmost significance. In turn, this requires continuous area monitoring and large-scale mapping at the parcel level, through the processing of big satellite data of high spatial resolution. This work designs and implements a paddy rice mapping pipeline in South Korea that is based on a time series of Sentinel-1 and Sentinel-2 data for the year 2018. We address two challenges: the first is the ability of our model to manage big satellite data and scale to a nationwide application; the second is the algorithm's capacity to cope with the scarcity of labeled data for training supervised machine learning algorithms. Specifically, we implement an approach that combines unsupervised and supervised learning. First, we generate pseudo-labels for rice classification from a single site (Seosan-Dangjin) by using a dynamic k-means clustering approach. The pseudo-labels are then used to train a Random Forest (RF) classifier that is fine-tuned to generalize to two other sites (Haenam and Cheorwon). The optimized model was then tested against 40 labeled plots, evenly distributed across the country. The paddy rice mapping pipeline is scalable, as it has been deployed in a High Performance Data Analytics (HPDA) environment using distributed implementations of both the k-means and RF classifiers. When tested across the country, our model provided an overall accuracy of 96.69% and a kappa coefficient of 0.87. Moreover, the accurate paddy rice area map was produced early in the year (late July), which is key for timely decision-making. Finally, the performance of the generalized paddy rice classification model, when applied to the sites of Haenam and Cheorwon, was compared to the performance of two equivalent models that were trained with locally sampled labels. The results were comparable and highlighted the success of the model's generalization and its applicability to other regions.
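For readers unfamiliar with the two scores reported above, the following sketch shows how overall accuracy and Cohen's kappa are computed from true and predicted labels. The function name and the ten toy labels are hypothetical and are not taken from the paper's data.

```python
from collections import Counter


def accuracy_and_kappa(y_true, y_pred):
    """Return (overall accuracy, Cohen's kappa) for two label sequences."""
    n = len(y_true)
    # Observed agreement: fraction of samples labelled identically.
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement: product of the marginal class frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    # Undefined (division by zero) when chance agreement is exactly 1.
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical toy labels: 4 rice parcels and 6 non-rice parcels.
y_true = ["rice"] * 4 + ["other"] * 6
y_pred = ["rice", "rice", "rice", "other",
          "other", "other", "other", "other", "other", "rice"]

accuracy, kappa = accuracy_and_kappa(y_true, y_pred)
```

Kappa discounts the agreement expected by chance, which is why it can be noticeably lower than accuracy when one class dominates.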

Text Polarity Detection using Multiple Supervised Machine Learning Algorithms

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.c8449.019320 ◽

2020 ◽

Vol 9 (3) ◽

pp. 1612-1618

Keyword(s):

Machine Learning ◽

Social Media ◽

Sentiment Analysis ◽

Large Scale ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

The Public ◽

Social Media Platforms ◽

Day By Day

Sentiment analysis is the classification of a review, opinion, or statement into categories, which gives businesses and developers clarity about the specific sentiments of customers or other groups of interest. Such categorized data are critical to business development and to understanding public opinion. The need for accurate, large-scale sentiment analysis on social media platforms is growing day by day. In this paper, a number of machine learning algorithms are trained and applied to Twitter datasets, and their accuracies are determined separately for the different polarities of the data, giving an indication of which algorithm works best and which works worst.

Supervised Machine Learning Algorithms for Sentiment Analysis of Bangla Newspaper

International Journal of Innovative Computing ◽

10.11113/ijic.v11n2.321 ◽

2021 ◽

Vol 11 (2) ◽

pp. 15-23

Author(s):

Sabrina Jahan Maisha ◽

Nuren Nafisa ◽

Abdul Kadar Muhammad Masum

Keyword(s):

Machine Learning ◽

Sentiment Analysis ◽

Language Processing ◽

Nearest Neighbor ◽

Online News ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Support Vector ◽

K Nearest Neighbor ◽

Aged People

The Bangla language is rich enough to support a variety of Natural Language Processing (NLP) tasks, yet it has received little attention in the NLP field. In this age of digitalization, a large amount of Bangla news content is generated on online platforms, and some of it is inappropriate for children or elderly people. Motivated by the need to filter news content easily, this work performs document-level sentiment analysis (SA) on Bangla online news. The dataset was created by collecting news from an online Bangla newspaper archive, and the documents were manually annotated as positive or negative. A composite “Pipeline” class combining a Count Vectorizer, a TF-IDF transformer, and machine learning (ML) classifiers is used to extract features and train the models. Six supervised ML classifiers, namely Multinomial Naive Bayes (MNB), K-Nearest Neighbor (K-NN), Random Forest (RF), C4.5 Decision Tree (DT), Logistic Regression (LR), and Linear Support Vector Machine (LSVM), are compared to find the best classifier for the proposed model. Very little work has been done on SA of Bangla news, so this work is a small contribution to the field. The model performed well under both percentage-split validation and 10-fold cross-validation. Among the six classifiers, RF performed best with 99% accuracy; LSVM had the lowest accuracy at 80%, which is still considered a good result. The model also performed well on recent and critical Bangla news, indicating that the extracted features are appropriate.
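The feature-extraction stage described above (term counts followed by TF-IDF weighting) can be illustrated without any library. The sketch below is an assumption-laden stand-in, not the authors' pipeline: it uses whitespace tokenization, one common smoothed IDF formula, L2 normalization, and three hypothetical English toy documents in place of Bangla news.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalized TF-IDF vectors) for a list of strings."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed inverse document frequency (one common variant).
    idf = {tok: math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1 for tok in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)  # the "count vectorizer" step
        row = [counts[tok] * idf[tok] for tok in vocab]
        norm = math.sqrt(sum(w * w for w in row)) or 1.0
        vectors.append([w / norm for w in row])
    return vocab, vectors


# Hypothetical toy documents standing in for news articles.
docs = ["good news today", "bad news today", "good good result"]
vocab, vectors = tfidf_vectors(docs)
```

In the second document, the rarer term "bad" receives a larger weight than "news", which appears in two of the three documents; this down-weighting of common terms is the point of the TF-IDF step before a classifier is trained.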

COVID-19 Forecasting using Multivariate Linear Regression

10.21203/rs.3.rs-71963/v1 ◽

2020 ◽

Author(s):

R. Suganya ◽

R.Arunadevi ◽

Seyed M.Buhari

Keyword(s):

Machine Learning ◽

Linear Regression ◽

Large Scale ◽

Learning Algorithms ◽

Multivariate Linear Regression ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Error Matrix ◽

The World

Coronavirus disease 2019 (COVID-19) is an infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). It was first identified in December 2019 in Wuhan, the capital of China’s Hubei province. The objective of this research is to propose a forecasting model, built with machine learning algorithms, using the available COVID-19 dataset from the most affected regions across the world. Regression models are among the supervised machine learning techniques used to analyze large-scale data. This research applies Multivariate Linear Regression to predict the number of confirmed COVID-19 cases and deaths over spans of one and two weeks. The experimental results explain 99% of the variability in prediction, with an R-squared score of 0.992. The algorithms are evaluated using error metrics such as Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and accuracy for the most affected regions across the world.
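The evaluation measures named in the abstract have simple definitions, sketched below in plain Python. The function name and the four toy case counts are hypothetical and unrelated to the study's dataset.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R-squared) for paired observations and predictions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    r_squared = 1 - ss_res / ss_tot
    return mae, rmse, r_squared


# Hypothetical daily case counts and model predictions.
y_true = [100, 150, 200, 250]
y_pred = [110, 140, 210, 240]

mae, rmse, r_squared = regression_metrics(y_true, y_pred)
```

Here every prediction is off by 10 cases, so MAE and RMSE are both 10, and R-squared is 1 - 400/12500 = 0.968, meaning the predictions account for about 97% of the variance in the toy series.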

Exploring the Use of Machine Learning to Automate the Qualitative Coding of Church-related Tweets

Fieldwork in Religion ◽

10.1558/firn.40610 ◽

2020 ◽

Vol 14 (2) ◽

pp. 140-159

Author(s):

Anthony-Paul Cooper ◽

Emmanuel Awuni Kolog ◽

Erkki Sutinen

Keyword(s):

Machine Learning ◽

Online Community ◽

High Volume ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Social Media Data ◽

Twitter Data ◽

Resource Intensity ◽

Media Data ◽

Better Than

This article builds on previous research exploring the content of church-related tweets. It does so by examining whether the qualitative thematic coding of such tweets can, in part, be automated using machine learning. It compares three supervised machine learning algorithms on a classification task, based on a dataset of human-coded church-related tweets, to understand how useful each algorithm is. The study finds that one such algorithm, Naïve Bayes, performs better than the other algorithms considered, returning Precision, Recall and F-measure values that each exceed an acceptable threshold of 70%. This has far-reaching consequences at a time when the high volume of social media data, in this case Twitter data, means that the resource intensity of manual coding approaches can act as a barrier to understanding how the online community interacts with, and talks about, church. The findings presented in this article offer a way forward for scholars of digital theology to better understand the content of online church discourse.
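To make the Naïve Bayes approach concrete, here is a minimal multinomial Naïve Bayes text classifier with add-one (Laplace) smoothing. It is a sketch under stated assumptions: the two thematic codes ("worship", "event"), the toy tweets, and the function names are all hypothetical, and the study's actual features and coding scheme are not reproduced here.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, doc):
    """Return the class with the highest log-posterior for one document."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in doc.lower().split():
            if token in vocab:  # ignore words never seen in training
                # Add-one smoothing avoids zero probabilities.
                score += math.log((word_counts[label][token] + 1)
                                  / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical human-coded tweets with two thematic codes.
train_docs = ["great sermon this morning", "prayer and worship tonight",
              "church bake sale saturday", "youth group picnic saturday"]
train_labels = ["worship", "worship", "event", "event"]

model = train_nb(train_docs, train_labels)
predictions = [predict_nb(model, d)
               for d in ["sermon and prayer", "picnic saturday"]]
```

On real data the predictions would be compared against a held-out set of human-coded tweets to obtain the Precision, Recall and F-measure values the abstract reports.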
