A Comprehensive Review on Online News Popularity Prediction using Machine Learning Approach

2019 ◽  
Vol 5 (1) ◽  
pp. 7
Author(s):  
Priyanka Rathord ◽  
Dr. Anurag Jain ◽  
Chetan Agrawal

With the help of the Internet, online news can spread around the world instantly. Most people now read and share news online, for instance through social media such as Twitter and Facebook. Typically, the popularity of a news article is indicated by the number of reads, likes, or shares. For online news stakeholders such as content providers and advertisers, it is very valuable if the popularity of news articles can be accurately predicted prior to publication. Thus, it is interesting and meaningful to use machine learning techniques to predict the popularity of online news articles. Various works have addressed the prediction of online news popularity. The popularity of news depends on various features, such as sharing of online news on social media, visitors’ comments, and likes for news articles. It is necessary to know what makes one online news article more popular than another. Unpopular articles need to be optimized for greater popularity. In this paper, different methodologies that predict the popularity of online news articles are analyzed. These methodologies are compared, their parameters are considered, and improvements are suggested. The proposed methodology describes an online news popularity prediction system.

One of the most dynamic and invigorating advancements in information technology is the advent of the Internet of Things (IoT). The IoT is a domain of interrelated computational and digital devices with the intelligence to transfer data. Despite the swift expansion of IoT devices throughout the world, the security of things is not at the expected level. As a consequence of the ubiquitous nature of the IoT environment, most users do not have the expertise or willingness to secure devices by themselves. A machine learning approach could be very effective in addressing security challenges in the IoT environment. In recent related papers, researchers have used machine learning techniques, approaches, or methods for securing things in the IoT environment. This paper attempts to review the related research on machine learning approaches to securing IoT devices.


2020 ◽  
pp. 193-201 ◽  
Author(s):  
Hayder A. Alatabi ◽  
Ayad R. Abbas

In recent years, social media has achieved widespread use worldwide; statistics indicate that more than three billion people are on social media, leading to large quantities of data online. To analyze these large quantities of data, a special classification method known as sentiment analysis is used. This paper presents a new sentiment analysis system based on machine learning techniques, which aims to create a process to extract the polarity from social media texts. By using machine learning techniques, sentiment analysis has achieved great success around the world. This paper investigates this topic and proposes a sentiment analysis system built on the Bayesian Rough Decision Tree (BRDT) algorithm. The experimental results show the success of this system, with an accuracy of more than 95% on social media data.


Author(s):  
Zhao Zhang ◽  
Yun Yuan ◽  
Xianfeng (Terry) Yang

Accurate and timely estimation of freeway traffic speeds by short segments plays an important role in traffic monitoring systems. In the literature, the ability of machine learning techniques to capture the stochastic characteristics of traffic has been demonstrated. Also, the deployment of intelligent transportation systems (ITSs) has provided enriched traffic data, which enables the adoption of a variety of machine learning methods to estimate freeway traffic speeds. However, limitations in data quality and coverage remain a big challenge in current traffic monitoring systems. To overcome this problem, this study aims to develop a hybrid machine learning approach, by creating a new training variable based on the second-order traffic flow model, to improve the accuracy of traffic speed estimation. Grounded on a novel integrated framework, the estimation is performed using three machine learning techniques, that is, Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Artificial Neural Network (ANN). All three models are trained with the integrated dataset including the traffic flow model estimates and the iPeMS and PeMS data from the Utah Department of Transportation (DOT). Further, using the PeMS data as the ground truth for model evaluation, the comparisons between the hybrid approach and pure machine learning models show that the hybrid approach can effectively capture the time-varying pattern of the traffic and help improve the estimation accuracy.
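The idea of feeding a traffic flow model estimate to a learner as an extra training variable can be sketched as follows. This is only an illustration under stated assumptions: the abstract uses a second-order traffic flow model, whereas the sketch substitutes the much simpler Greenshields speed-density relation as a stand-in, and the function names, feature layout, and the default free-flow speed and jam density are hypothetical.

```python
def greenshields_speed(density, free_flow_speed=65.0, jam_density=120.0):
    """Speed predicted by Greenshields' linear speed-density relation.

    Stand-in for the second-order model named in the abstract.
    The default parameter values are illustrative assumptions.
    """
    # Clamp density to the physically meaningful range [0, jam_density].
    density = min(max(density, 0.0), jam_density)
    return free_flow_speed * (1.0 - density / jam_density)


def add_model_feature(rows):
    """Append the flow-model speed estimate to each feature row.

    Each row is assumed to start with the measured density;
    the remaining entries are any other sensor features.
    """
    return [row + [greenshields_speed(row[0])] for row in rows]


# Hypothetical sensor rows: [density, occupancy].
augmented = add_model_feature([[0.0, 0.10], [60.0, 0.45]])
```

The augmented rows could then be passed to any regressor (for example RF, XGBoost, or an ANN), so that the learner sees both the raw measurements and the physics-based estimate.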


Pollution exposure and human health in industrially contaminated areas are always a concern. The need for industrialization makes it necessary to concentrate on a sustainable life for residents in the vicinity of industrial areas rather than to oppose the industrialists. The epidemiological literature reveals that air pollution is one of the major health risks faced by residents of industrial areas. The main pollutants in industry-related air pollution are particulate matter (PM2.5, PM10), SO2, NO2, and other pollutants depending on the industry. Data for epidemiological studies, obtained from different sources with limited public access, include residents’ sociodemographic characteristics, health problems, and the air quality index for personal exposure to pollutants. This combined data and the limited resources make the analysis so complex that statistical methods cannot compensate. Our review finds that there is a growing literature that evaluates the connection between ambient air pollution exposure and associated health events of residents in industrially polluted areas using statistical methods, mainly regression models. Very few studies apply machine learning techniques to determine the impact of common air pollution exposure on human health. Most machine learning approaches to epidemiological studies end at air pollution exposure monitoring and do not correlate exposure with diseases. A machine learning approach to epidemiological studies can automatically characterize residents’ exposure to pollutants and the associated health effects. The uniqueness of the model depends on appropriate, exhaustive data that characterizes the features, and on the machine learning algorithm used to build the model.
In this contribution, we discuss various existing approaches that evaluate residents’ health effects and the source of irritation in association with air pollution exposure, focusing on machine learning techniques and the mathematical background for epidemiological studies for residents’ sustainable life.


2020 ◽  
Vol 13 (9) ◽  
pp. 204
Author(s):  
Rodrigo A. Nava Lara ◽  
Jesús A. Beltrán ◽  
Carlos A. Brizuela ◽  
Gabriel Del Rio

Polypharmacologic human-targeted antimicrobials (polyHAM) are potentially useful in the treatment of complex human diseases where the microbiome is important (e.g., diabetes, hypertension). We previously reported a machine-learning approach to identify polyHAM from FDA-approved human targeted drugs using a heterologous approach (training with peptides and non-peptide compounds). Here we discover that polyHAM are more likely to be found among antimicrobials displaying a broad-spectrum antibiotic activity and that topological, but not chemical features, are most informative to classify this activity. A heterologous machine-learning approach was trained with broad-spectrum antimicrobials and tested with human metabolites; these metabolites were labeled as antimicrobials or non-antimicrobials based on a naïve text-mining approach. Human metabolites are not commonly recognized as antimicrobials yet circulate in the human body where microbes are found and our heterologous model was able to classify those with antimicrobial activity. These results provide the basis to develop applications aimed to design human diets that purposely alter metabolic compounds proportions as a way to control human microbiome.


Author(s):  
Erick Omuya ◽  
George Okeyo ◽  
Michael Kimwele

Social media has been embraced by different people as a convenient and official medium of communication. People write messages and attach images and videos on Twitter, Facebook and other social media, which they then share. Social media therefore generates a lot of data that is rich in sentiments from these updates. Sentiment analysis has been used to determine the opinions of clients, for instance, relating to a particular product or company. Knowledge-based and machine learning approaches are among the strategies that have been used to analyze these sentiments. The performance of sentiment analysis is, however, distorted by noise, the curse of dimensionality, the data domains and the size of the data used for training and testing. This research aims at developing a model for sentiment analysis in which dimensionality reduction and the use of different parts of speech improve sentiment analysis performance. It uses natural language processing for filtering, storing and performing sentiment analysis on the data from social media. The model is tested using Naïve Bayes, Support Vector Machines and K-Nearest Neighbor machine learning algorithms, and its performance is compared with that of two other sentiment analysis models. Experimental results show that the model improves sentiment analysis performance using machine learning techniques.
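As a rough illustration of the kind of pipeline described above, the following sketch trains a multinomial Naïve Bayes classifier on a handful of made-up posts. It is not the authors' model: the toy data, the stop-word list (used here as a crude stand-in for the dimensionality reduction and part-of-speech filtering mentioned in the abstract), and all function names are assumptions.

```python
import math
from collections import Counter

# Hypothetical labelled posts, invented for illustration only.
TRAIN = [
    ("i love this phone it is great", "pos"),
    ("great service and friendly staff", "pos"),
    ("what a wonderful happy day", "pos"),
    ("i hate this phone it is terrible", "neg"),
    ("terrible service and rude staff", "neg"),
    ("what an awful sad day", "neg"),
]

# Dropping function words shrinks the feature space (assumed filter).
STOPWORDS = {"i", "this", "it", "is", "and", "a", "an", "what"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOPWORDS]


def train_naive_bayes(samples):
    """Count documents per label and word occurrences per label."""
    word_counts, doc_counts, vocab = {}, Counter(), set()
    for text, label in samples:
        tokens = tokenize(text)
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior score."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in doc_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for token in tokenize(text):
            if token in vocab:  # ignore words never seen in training
                # Laplace (add-one) smoothing avoids zero probabilities.
                count = word_counts[label][token] + 1
                score += math.log(count / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


model = train_naive_bayes(TRAIN)
```

Calling `predict(model, "great phone and wonderful staff")` scores the post against both classes and returns the more likely label; a real system would replace the stop-word filter with proper part-of-speech tagging and feature selection.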


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Apostolos Ampountolas ◽  
Mark P. Legg

Purpose: This study aims to predict hotel demand through text analysis by investigating keyword series to increase the precision of demand predictions. To do so, this paper presents a framework for modeling hotel demand that incorporates machine learning techniques.
Design/methodology/approach: The empirical forecasting is conducted by introducing a segmented machine learning approach that leverages hierarchical clustering tied to machine learning and deep learning techniques. These features allow the model to yield more precise estimates. This study evaluates an extensive range of social media–derived words with the most significant probability of gradually establishing an understanding of an optimal outcome. Analyses were performed on a major hotel chain in an urban market setting within the USA.
Findings: The findings indicate that while traditional methods, namely the naïve approach and ARIMA models, struggled with forecasting accuracy, segmented boosting methods (XGBoost) leveraging social media predict hotel occupancy with greater precision for all examined time horizons. Additionally, the segmented learning approach improved the forecasts’ stability and robustness while mitigating common overfitting issues within a highly dimensional data set.
Research limitations/implications: Incorporating social media into a segmented learning framework can augment the accuracy of the current generation of forecasting methods. Moreover, the segmented learning approach mitigates the negative effects of market shifts (e.g. COVID-19) that can reduce the life cycles of in-production forecasts. The ability to be more robust to market deviations will allow hospitality firms to minimize development time.
Originality/value: The results are expected to generate insights by providing revenue managers with an instrument for predicting demand.
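For readers unfamiliar with the naïve baseline that the findings mention, here is a minimal sketch: the naïve forecast simply repeats the last observed value over the horizon. The occupancy figures are invented, and the use of mean absolute percentage error (MAPE) as the accuracy measure is an assumption, since the abstract does not state which metric was used.

```python
def naive_forecast(history, horizon):
    """Repeat the last observed value for every future step."""
    return [history[-1]] * horizon


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes no actual value is zero.
    """
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical daily occupancy percentages.
history = [70.0, 75.0, 80.0]
actual_next = [100.0, 50.0]
baseline = naive_forecast(history, horizon=len(actual_next))
baseline_error = mape(actual_next, baseline)
```

Any more elaborate model, such as ARIMA or a boosted ensemble, would be judged by whether it achieves a lower error than this baseline on the same horizon.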


2021 ◽  
Author(s):  
Emily Hunt ◽  
Joshua O.S. Hunt ◽  
Vernon J. Richardson ◽  
David Rosser

In this paper, we investigate whether misstatement risk estimated using advanced machine learning techniques, hereafter referred to as estimated misstatement risk (EMR), approximates auditors' risk assessments in practice. We find that auditors price EMR and that auditor turnover is more likely to occur when EMR increases, indicating that EMR is associated with auditors' risk assessment. We also find evidence that EMR is positively and significantly associated with audit fees and auditor switching for companies with Big N auditors but not for other companies, suggesting that Big N auditors are more responsive to risks captured by EMR. Additional analyses reveal that companies switching auditors when EMR increases are more likely to engage non-Big N auditors. Surprisingly, we find little evidence that the association between audit quality and EMR differs by auditor type. Our findings are consistent with the notion that the documented association between audit fees and EMR primarily reflects a risk premium in our setting.


2022 ◽  
pp. 349-366
Author(s):  
Roopashree S. ◽  
Anitha J. ◽  
Madhumathy P.

Ayurvedic medicine uses herbs to cure many ailments without side effects. The biggest concern related to Ayurvedic medicine is the extinction of many important medicinal herbs, which may be due to insufficient knowledge, weather conditions, and urbanization. Another concern is the lack of online information on Indian herbs, because such knowledge depends on books and experts. This has motivated the use of machine learning techniques to identify Indian medicinal herbs and reveal some details about them, because until now they have been identified manually, which is cumbersome and may lead to errors. Many researchers have shown decent results in identifying and classifying plants with good accuracy and robustness, but no complete framework or strong evidence has been presented for Indian medicinal herbs. Accordingly, the chapter aims to provide an outline of how machine learning techniques can be adopted to enrich the knowledge of Indian herbs, which benefits both the common man and domain experts with wide information on traditional herbs.


A Distributed Denial of Service (DDoS) attack is one of the deadliest weapons: it overwhelms a server or network by sending a flood of packets towards it. The attack disrupts the services running on the target, thereby blocking legitimate traffic from accessing those services. Various advanced machine learning techniques have been applied to detect different types of DDoS attacks, but the attack still remains a potential threat to the world. There are two broad categories of machine learning techniques: supervised and unsupervised. The supervised approach requires labelled attack traffic datasets, whereas the unsupervised approach analyses incoming network traffic and then categorizes it. In this paper we have attempted to apply four different classifiers for the detection of DDoS attacks: Logistic Regression, Naïve Bayes, K-Nearest Neighbor and Artificial Neural Network. The chosen classifiers provide stable results when the dataset is large. We compared their detection accuracy on the KDD dataset, which is a benchmark dataset in the field of network security. This paper is novel in that it explains each pre-processing step with Python conversion functions and explains in detail all the classifiers and their detection accuracy, with their functions in Python as well.
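To give a flavour of the pre-processing and classification steps this abstract refers to, here is a small self-contained sketch of one of the four classifiers, K-Nearest Neighbor. It is not the paper's code: the toy traffic records, the three features, and every function name are hypothetical, and the real KDD dataset has many more attributes.

```python
import math
from collections import Counter

# Hypothetical flows: (protocol, packets_per_second, bytes_per_packet, label).
RECORDS = [
    ("tcp", 12, 540, "normal"),
    ("udp", 9, 300, "normal"),
    ("tcp", 15, 620, "normal"),
    ("icmp", 900, 64, "attack"),
    ("udp", 1200, 60, "attack"),
    ("icmp", 1100, 70, "attack"),
]

# Conversion step 1: map categorical protocol names to integer codes.
PROTOCOL_CODES = {
    name: code for code, name in enumerate(sorted({r[0] for r in RECORDS}))
}


def to_numeric(protocol, pps, bpp):
    """Turn one raw record into a list of floats (unknown protocols raise KeyError)."""
    return [float(PROTOCOL_CODES[protocol]), float(pps), float(bpp)]


RAW = [to_numeric(*record[:3]) for record in RECORDS]
LABELS = [record[3] for record in RECORDS]

# Conversion step 2: min-max scaling so no single feature dominates distances.
LOWS = [min(column) for column in zip(*RAW)]
HIGHS = [max(column) for column in zip(*RAW)]


def scale(row):
    """Rescale each feature to [0, 1] using the training minima and maxima."""
    return [
        (value - low) / (high - low) if high > low else 0.0
        for value, low, high in zip(row, LOWS, HIGHS)
    ]


TRAIN = [scale(row) for row in RAW]


def classify(protocol, pps, bpp, k=3):
    """Label a flow by majority vote among its k nearest training flows."""
    query = scale(to_numeric(protocol, pps, bpp))
    ranked = sorted(zip(TRAIN, LABELS), key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

For example, `classify("udp", 1000, 62)` votes among the three closest training flows. Note that integer-coding the protocol imposes an artificial ordering on the categories; one-hot encoding is a common alternative.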

