Machine Learning with Crowdsourcing: A Brief Summary of the Past Research and Future Directions

Author(s):  
Victor S. Sheng ◽  
Jing Zhang

With crowdsourcing systems, labels can be obtained at low cost, which facilitates the creation of training sets for prediction model learning. However, the labels obtained from crowdsourcing are often imperfect, which poses great challenges for model learning. Since 2008, the machine learning community has recognized the great opportunities brought by crowdsourcing and has developed a large number of techniques to deal with inaccuracy, randomness, and uncertainty when learning with crowdsourcing. This paper summarizes the technical progress in this field during the past eleven years. We focus on two fundamental issues: data (label) quality and prediction model quality. For data quality, we summarize ground truth inference methods and some machine-learning-based methods that further improve data quality. For prediction model quality, we summarize several learning paradigms developed under the crowdsourcing scenario. Finally, we discuss several promising future research directions to attract researchers to contribute to crowdsourcing.
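
As a rough illustration of what ground truth inference means, the sketch below uses majority voting, a common baseline for aggregating noisy crowd labels. It is not taken from the paper: the function name `majority_vote`, the item identifiers, and the labels are all hypothetical.

```python
from collections import Counter


def majority_vote(labels_by_item):
    """Infer one label per item by majority vote over its crowd labels.

    Ties go to the label that appears first in the item's label list,
    because Counter.most_common keeps first-seen order for equal counts.
    """
    inferred = {}
    for item, labels in labels_by_item.items():
        inferred[item] = Counter(labels).most_common(1)[0][0]
    return inferred


# Hypothetical crowd labels: three workers labeled each item.
crowd_labels = {
    "img1": ["cat", "cat", "dog"],
    "img2": ["dog", "dog", "dog"],
    "img3": ["cat", "dog", "dog"],
}
inferred = majority_vote(crowd_labels)
```

Majority voting treats every worker as equally reliable; methods that model per-worker reliability relax that assumption.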

2021 ◽  
Author(s):  
Syeda Nadia Firdaus

Social networks have been a hot topic of interest for computer science researchers in recent years. Social networks such as Facebook, Twitter, and Instagram play an important role in information diffusion. Social network data are created by their users. Users’ online activities and behavior have been studied in various past research efforts to better understand how information is diffused on social networks. In this study, we focus on Twitter and explore the impact of user behavior on retweet activity. To represent a user’s behavior for predicting their retweet decision, we introduce 10-dimensional emotion features and 35-dimensional personality-related features. We consider how a user’s behavior differs between acting as an author and acting as a retweeter, and propose a machine-learning-based retweet prediction model that accounts for this difference. We also propose two approaches for a matrix factorization retweet prediction model, which learns the latent relation between users and tweets to predict the user’s retweet decision. In our experiments, we find that models based on user-behavior-related features improve on baseline models by 3% - 6% in terms of F1-score. When only the user’s behavior as a retweeter is considered, the data processing time is reduced while the prediction accuracy remains comparable to the case in which both retweeting and posting behaviors are considered. In the proposed matrix factorization models, we incorporate tweet features into the basic factorization model through newly defined regularization terms and improve the performance by 3% - 4% in terms of F1-score. Finally, we compare the performance of the machine learning and matrix factorization models for retweet prediction and find that neither model is superior to the other in all cases. Therefore, different models should be used depending on how the prediction results will be used.
A machine learning model is preferable when a model’s performance quality is important, such as for tweet re-ranking and tweet recommendation. Matrix factorization is the preferred option when the model’s positive retweet prediction capability is more important, such as for marketing campaigns and finding potential retweeters.


2021 ◽  
Vol 17 (2) ◽  
pp. 1-44
Author(s):  
Francesco Concas ◽  
Julien Mineraud ◽  
Eemil Lagerspetz ◽  
Samu Varjonen ◽  
Xiaoli Liu ◽  
...  

The significance of air pollution and the problems associated with it are fueling deployments of air quality monitoring stations worldwide. The most common approach for air quality monitoring is to rely on environmental monitoring stations, which unfortunately are very expensive both to acquire and to maintain. Hence, environmental monitoring stations are typically sparsely deployed, resulting in limited spatial resolution for measurements. Recently, low-cost air quality sensors have emerged as an alternative that can improve the granularity of monitoring. The use of low-cost air quality sensors, however, presents several challenges: They suffer from cross-sensitivities between different ambient pollutants; they can be affected by external factors, such as traffic, weather changes, and human behavior; and their accuracy degrades over time. Periodic re-calibration can improve the accuracy of low-cost sensors, particularly with machine-learning-based calibration, which has shown great promise due to its capability to calibrate sensors in-field. In this article, we survey the rapidly growing research landscape of low-cost sensor technologies for air quality monitoring and their calibration using machine learning techniques. We also identify open research challenges and present directions for future research.


2019 ◽  
Vol 8 (3) ◽  
pp. 7071-7081

Current-generation real-world data sets processed through machine learning are imbalanced by nature. This imbalanced data presents researchers with a challenging scenario in the context of prediction for both machine learning and data mining algorithms. Past research studies observe that most imbalanced data sets consist of major classes and minor classes, with the major class dominating the minor class. Several standard and hybrid prediction algorithms have been proposed in various application domains, but most of the real-time data sets analyzed in these studies are imbalanced by nature, which affects the accuracy of the prediction. This paper presents a systematic survey of past research studies to analyze intrinsic data characteristics and the techniques used for handling class-imbalanced data. In addition, this study reveals the research gaps, trends and patterns in existing studies and briefly discusses future research directions.


2021 ◽  
Vol 12 (1) ◽  
pp. 89
Author(s):  
Ruiqi Chen ◽  
Tianyu Wu ◽  
Yuchen Zheng ◽  
Ming Ling

In Internet of Things (IoT) scenarios, it is challenging to deploy Machine Learning (ML) algorithms on low-cost Field Programmable Gate Arrays (FPGAs) in a real-time, cost-efficient, and high-performance way. This paper introduces Machine Learning on FPGA (MLoF), a series of ML IP cores implemented on low-cost FPGA platforms, aiming to help more IoT developers achieve comprehensive performance in various tasks. With Verilog, we deploy and accelerate Artificial Neural Networks (ANNs), Decision Trees (DTs), K-Nearest Neighbors (k-NNs), and Support Vector Machines (SVMs) on 10 different FPGA development boards from seven producers. Additionally, we analyze and evaluate our design with six datasets, and compare the best-performing FPGAs with traditional SoC-based systems including NVIDIA Jetson Nano, Raspberry Pi 3B+, and STM32L476 Nucleo. The results show that Lattice’s ICE40UP5 achieves the best overall performance with low power consumption, on which MLoF reduces power by 891% and increases performance by 9 times on average. Moreover, its cost, power, Latency Production (CPLP) outperforms SoC-based systems by 25 times, which demonstrates the significance of MLoF in endpoint deployment of ML algorithms. Furthermore, we make all of the code open-source in order to promote future research.


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Rajshree Varma ◽  
Yugandhara Verma ◽  
Priya Vijayvargiya ◽  
Prathamesh P. Churi

Purpose: The rapid advancement of technology in online communication and fingertip access to the Internet has resulted in the expedited dissemination of fake news to engage a global audience at a low cost by news channels, freelance reporters and websites. Amid the coronavirus disease 2019 (COVID-19) pandemic, individuals are exposed to these false and potentially harmful claims and stories, which may harm the vaccination process. Psychological studies reveal that the human ability to detect deception is only slightly better than chance; therefore, there is a growing need to seriously consider developing automated strategies to combat fake news that traverses these platforms at an alarming rate. This paper systematically reviews the existing fake news detection technologies by exploring various machine learning and deep learning techniques pre- and post-pandemic, which has never been done before to the best of the authors’ knowledge.
Design/methodology/approach: The detailed literature review on fake news detection is divided into three major parts. The authors searched papers no later than 2017 on fake news detection approaches on deep learning and machine learning. The papers were initially searched through the Google Scholar platform, and they have been scrutinized for quality. The authors kept “Scopus” and “Web of Science” as quality indexing parameters. All research gaps and available databases, data pre-processing, feature extraction techniques and evaluation methods for current fake news detection technologies have been explored and illustrated using tables, charts and trees.
Findings: The paper is divided into two approaches, namely machine learning and deep learning, to present a better understanding and a clear objective. Next, the authors present a viewpoint on which approach is better and on future research trends, issues and challenges for researchers, given the relevance and urgency of a detailed and thorough analysis of existing models. This paper also delves into fake news detection during COVID-19, and it can be inferred that research and modeling are shifting toward the use of ensemble approaches.
Originality/value: The study also identifies several novel automated web-based approaches used by researchers to assess the validity of pandemic news that have proven to be successful, although currently reported accuracy has not yet reached consistent levels in the real world.


2018 ◽  
Vol 7 (3.33) ◽  
pp. 51
Author(s):  
Min Sun Kim ◽  
Eun Soo Choi ◽  
Min Soo Kang

KONEPS is the National Comprehensive Electronic Procurement System of the Public Procurement Service. If KONEPS can identify bidding possibilities and trends before bidding, companies will be able to bid more efficiently. The data used in the experiment were the "Progress Bidding Classification" data from the Procurement Information Open Portal. Preprocessing was performed to facilitate prediction model learning. Prior to learning, the 1,158 preprocessed data sets were normalized to match the range of the data or to make the distributions similar. After normalization, we selected the number of clusters. K-means clustering produced three groups: a bid-dropping rate of 77 to 80% with an allocated budget of about 2 billion won (₩), a bid-dropping rate of 83 to 87% with an allocated budget of about 1 billion won, and a bid-dropping rate of 87 to 90% with allocated budgets distributed around 500 million won. It can also be confirmed that the clusters are divided at an enterprise count of 58. These results make it possible to study tendering trends through the community by learning prediction models of the bidder companies, the number of bidders, and the tendency of the bidding business, and they will help KONEPS develop the next-generation ISP.
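
The normalize-then-cluster pipeline described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the six sample rows (bid-dropping rate in percent, budget in billions of won) are invented to resemble the reported clusters, min-max scaling is an assumed choice of normalization, and the farthest-point seeding is an assumption made only to keep the example deterministic.

```python
def min_max_normalize(rows):
    """Scale each column to [0, 1]; a constant column maps to 0.0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_seeds(points, k):
    """Deterministic seeding: start at points[0], then repeatedly pick the
    point farthest from its nearest already-chosen seed."""
    seeds = [list(points[0])]
    while len(seeds) < k:
        nxt = max(points,
                  key=lambda p: min(squared_distance(p, s) for s in seeds))
        seeds.append(list(nxt))
    return seeds


def k_means(points, k, iterations=100):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = farthest_point_seeds(points, k)
    assignment = None
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for each point.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return assignment, centroids


# Hypothetical rows: (bid-dropping rate in %, budget in billions of won).
rows = [
    (78.0, 2.0), (79.5, 2.1),
    (85.0, 1.0), (86.0, 0.95),
    (88.5, 0.5), (89.0, 0.55),
]
labels, centroids = k_means(min_max_normalize(rows), k=3)
```

Normalizing first matters here because the two features have very different ranges; without it, the distance computation would be dominated by the bid-dropping rate.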


2021 ◽  
pp. 147592172110545
Author(s):  
Furui Wang

Recently, the issue of bolt looseness has attracted more attention due to its severe consequences. Among different methods for bolt looseness detection, the active sensing method based on stress wave signals is preferred since it is low-cost and highly robust. However, current active sensing methods depend on permanent contact sensors, which may be impractical. Moreover, the investigation of multi-bolt looseness detection via active sensing has been very limited so far. With these deficiencies in mind, we propose a new robotic-assisted active sensing method based on our newly designed PZT-enabled smart gloves (SGs) and the position-based visual servoing (PBVS) technique. Another main contribution is a new Siamese CapsNet that classifies stress wave signals under different cases for multi-bolt looseness detection. Compared to machine learning (ML) and traditional deep learning techniques such as Convolutional Neural Networks (CNNs), the proposed Siamese CapsNet model can achieve better performance and can recognize signals that were never used during training, which is impossible for common classification methods. Finally, an experiment is conducted to verify the effectiveness of the proposed method and the Siamese CapsNet, which can guide future research significantly.


Author(s):  
A. Al-Shammari ◽  
E. Levin ◽  
R. Shults

Abstract. This paper provides an overview of oil spill scenarios and the remote sensing methods used for detecting and mapping the spills. It also discusses the different kinds of thermal sensors used in oil spill detection. As UAS are becoming an important player in the oil and gas industry because of their low operating costs, this research involved working with a cheap thermal airborne sensor mounted on a DJI Phantom 4 system. Data were collected in two scenarios: the first was at a petroleum company location in Michigan’s Upper Peninsula, and the second was an indoor experiment simulating an offshore spill. The aim of this research is to inspect the capability of the inexpensive Lepton LWIR sensor to detect areas contaminated with oil. Data processing to create classification maps involved using ArcGIS 10.5.1, ERDAS Imagine 2015 and ENVI 5.3. Based on accuracy assessment (confusion matrices) of the classified images and comparison of the classified images with ground truth, the results show that the Lepton thermal sensor worked well in differentiating oil from water but was not a good option when there were many objects in the area of interest. Future research recommendations and conclusions are presented.

