Development of a web-based application using machine learning algorithms to facilitate systematic literature reviews

2017 ◽  
Vol 28 ◽  
pp. v518
Author(s):  
H-L. Wong ◽  
T. Luechtefeld ◽  
A. Prawira ◽  
Z. Patterson ◽  
J. Workman ◽  
...  
2020 ◽  
Vol 9 (1) ◽  
Author(s):  
E. Popoff ◽  
M. Besada ◽  
J. P. Jansen ◽  
S. Cope ◽  
S. Kanters

Abstract

Background: Despite existing research on text mining and machine learning for title and abstract screening, the role of machine learning (ML) within systematic literature reviews (SLRs) for health technology assessment (HTA) remains unclear, given a lack of extensive testing and of guidance from HTA agencies. We sought to address two knowledge gaps: to extend ML algorithms to provide a reason for exclusion, aligning with current practices, and to determine optimal parameter settings for feature-set generation and ML algorithms.

Methods: We used abstract and full-text selection data from five large SLRs (n = 3089 to 12,769 abstracts) across a variety of disease areas. Each SLR was split into training and test sets. We developed a multi-step algorithm to categorize each citation as included; excluded, with the relevant PICOS criterion; or unclassified. We used a bag-of-words approach for feature-set generation and compared support vector machines (SVMs), naïve Bayes (NB), and bagged classification and regression trees (CART) for classification. We also compared alternative training-set strategies: using full data versus downsampling (i.e., reducing excludes to balance includes and excludes, because machine learning algorithms perform better on balanced data), and using inclusion/exclusion decisions from abstract versus full-text screening. Performance was compared in terms of specificity, sensitivity, accuracy, and matching the reason for exclusion.

Results: The best-fitting model (optimized for sensitivity and specificity) was based on the SVM algorithm, using training data based on full-text decisions, downsampling, and excluding words occurring fewer than five times. The sensitivity and specificity of this model ranged from 94 to 100% and 54 to 89%, respectively, across the five SLRs. On average, 75% of excluded citations were excluded with a reason, and 83% of these matched the reviewers' original reason for exclusion. Sensitivity improved significantly when both downsampling and abstract decisions were used.

Conclusions: ML algorithms can improve the efficiency of the SLR process, and the proposed algorithms could reduce the workload of a second reviewer by identifying exclusions with a relevant PICOS reason, thus aligning with HTA guidance. Downsampling can be used to improve study selection, and the improvements achieved using full-text exclusions have implications for a learn-as-you-go approach.
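The best-fitting configuration reported above (bag-of-words features, words occurring at least five times, a linear SVM, and a downsampled training set) can be approximated with off-the-shelf tooling. Below is a minimal sketch using scikit-learn; the file names, column names, and downsampling helper are illustrative assumptions, not the authors' actual pipeline. Sensitivity and specificity are computed as per-class recall.

```python
# Minimal sketch of the configuration described above: bag-of-words
# features (min_df=5 drops words occurring fewer than five times),
# a linear SVM, and a downsampled training set. File and column
# names are assumptions for illustration.
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC
from sklearn.metrics import recall_score

def downsample(df, label_col="label"):
    """Reduce the majority 'exclude' class (0) to the size of the
    'include' class (1) so the training data is balanced."""
    includes = df[df[label_col] == 1]
    excludes = df[df[label_col] == 0].sample(n=len(includes), random_state=0)
    return pd.concat([includes, excludes]).sample(frac=1, random_state=0)

train = pd.read_csv("train_citations.csv")   # columns: abstract, label
test = pd.read_csv("test_citations.csv")
train = downsample(train)

vectorizer = CountVectorizer(min_df=5)       # bag-of-words feature set
X_train = vectorizer.fit_transform(train["abstract"])
X_test = vectorizer.transform(test["abstract"])

clf = LinearSVC()
clf.fit(X_train, train["label"])
pred = clf.predict(X_test)

# Sensitivity = recall on includes; specificity = recall on excludes.
print("sensitivity:", recall_score(test["label"], pred, pos_label=1))
print("specificity:", recall_score(test["label"], pred, pos_label=0))
```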


Detecting spam reviews is an important task for present-day e-commerce websites and apps. We address the problem of fake review detection in user reviews in e-commerce applications, which is essential for combating opinion spam. First, we analyze the characteristics of fake reviews and then apply machine learning algorithms to the resulting data. Spam or fake reviews of items reduce the reliability of decision making and competitive analysis. The presence of fake reviews prevents customers from making sound decisions about sellers and can also damage the goodwill of the platform. Because spammers can post reviews on social media platforms to promote or harm a specific product or firm, detecting both the spammers and their spam is key to understanding the opinions expressed on such sites. To this end, we present a framework called NetSpam, which uses spam features to model review datasets as heterogeneous information networks and maps spam detection to a classification problem over these networks.
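NetSpam itself is graph-based, modeling reviews as a heterogeneous information network; the supervised-learning step the abstract starts from can still be illustrated with a plain text classifier. The sketch below is a baseline only, with toy labeled reviews standing in for a real dataset, and is not the NetSpam algorithm.

```python
# A baseline text classifier for fake-review detection, illustrating
# the supervised-learning starting point described above. This is not
# NetSpam, which works on a heterogeneous information network; the
# toy reviews and labels below are assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

reviews = [
    "Best product ever!!! Buy now, amazing deal, five stars!!!",
    "The zipper broke after two weeks, but the fabric is decent.",
    "AMAZING!!! Perfect!!! Everyone must buy this immediately!!!",
    "Battery lasts about six hours with moderate use.",
]
labels = [1, 0, 1, 0]  # 1 = fake/spam review, 0 = genuine

# TF-IDF over unigrams and bigrams feeding a naive Bayes classifier.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), MultinomialNB())
model.fit(reviews, labels)

print(model.predict(["Incredible!!! Best seller!!! Buy buy buy!!!"]))  # -> [1]
```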


2022 ◽  
Vol 12 (1) ◽  
pp. 1-14
Author(s):  
Parmeet Kaur ◽  
Sanya Deshmukh ◽  
Pranjal Apoorva ◽  
Simar Batra

Humongous volumes of data are generated every minute by individual users as well as organizations. This data can be turned into a valuable asset only if it is analyzed, interpreted and used to improve processes or benefit users. One source contributing huge amounts of data every year is the large number of web-based crowd-funding projects. These projects and their related campaigns help ventures raise money by acquiring small amounts of funding from many small organizations and individuals. The funds raised for crowd-funded projects, and hence their success, depend on multiple elements of the project. The current work predicts the success of a new venture by analyzing and visualizing existing data and determining the parameters on which a project's success depends. The prediction of a project's outcome is performed by applying machine learning algorithms to crowd-funding data stored in the NoSQL database MongoDB. The results of this work can prove beneficial for investors seeking an estimate of a project's chances of success before investing in it.
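The pipeline described (crowd-funding records in MongoDB feeding a success classifier) might look like the sketch below. The database name, collection name, field names, and choice of a random forest are all assumptions for illustration, not details from the paper.

```python
# Sketch: pull crowd-funding campaign records from MongoDB and train
# a success classifier. Database, collection, and field names are
# assumptions, not the authors' schema.
import pandas as pd
from pymongo import MongoClient
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

client = MongoClient("mongodb://localhost:27017/")
docs = client["crowdfunding"]["projects"].find(
    {}, {"goal": 1, "duration_days": 1, "backers": 1, "succeeded": 1, "_id": 0}
)
df = pd.DataFrame(list(docs)).dropna()

X = df[["goal", "duration_days", "backers"]]
y = df["succeeded"]  # 1 = funded, 0 = not funded
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```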


2019 ◽  
Vol 8 (S3) ◽  
pp. 41-44
Author(s):  
K. Nagaramani ◽  
K. Vandanarao ◽  
B. Mamatha

Most web-based social systems, such as Facebook, Twitter, mailing systems and other social networks, are developed for users to share their information and to interact and engage with the community. These social networks often cause trouble for users through spam messages, threatening messages, hackers and so on. Many researchers have worked on this problem and proposed several approaches to detect spam, hackers and other threats. In this paper we discuss tools to detect spam messages in social networks. We use the RF, SVM, KNN and MLP machine learning algorithms in RapidMiner and WEKA, which give better results when compared with other tools.
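The paper runs its RF, SVM, KNN and MLP comparison inside RapidMiner and WEKA; the same four-way benchmark can be sketched in scikit-learn, as below. The toy messages are assumptions standing in for a real labeled social-network dataset.

```python
# Four-way classifier comparison (RF, SVM, KNN, MLP) on spam vs.
# legitimate messages, mirroring the RapidMiner/WEKA setup described
# above. Toy data only; a real study would use a labeled corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

msgs = [
    "WIN a FREE phone now, click this link!!!",
    "Meeting moved to 3pm, see you there.",
    "You have been selected for a cash prize, reply now!",
    "Can you review my pull request today?",
    "Cheap followers for sale, limited offer!!!",
    "Dinner at mom's on Sunday?",
    "Claim your reward, account verification required!",
    "Slides for tomorrow's lecture are attached.",
]
labels = [1, 0, 1, 0, 1, 0, 1, 0]  # 1 = spam, 0 = legitimate

models = {
    "RF":  RandomForestClassifier(random_state=0),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(n_neighbors=3),
    "MLP": MLPClassifier(max_iter=2000, random_state=0),
}
for name, clf in models.items():
    pipe = make_pipeline(TfidfVectorizer(), clf)
    scores = cross_val_score(pipe, msgs, labels, cv=2)
    print(f"{name}: mean accuracy {scores.mean():.2f}")
```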


Artificial intelligence and expert systems play a key role in modern medical science for disease prediction, surveillance interventions, cost efficiency, better quality of life, and more. With the arrival of new web-based data sources and systematic data collection through surveys and medical reporting, there is a pressing need to develop effective recommendation systems that can support practitioners in better decision making. Machine learning algorithms (MLAs) are a powerful tool that enables computers to learn from data. While many novel MLAs are constantly being developed, there remains a need for more systematic, robust algorithms that can make predictions with the highest possible accuracy, sensitivity and specificity. This study reviews previously published work on different algorithms, their advantages and limitations, which shall help make future recommendations for researchers and experts seeking to develop an effective algorithm for predicting the likelihood of various diseases.
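The review's three evaluation criteria (accuracy, sensitivity, specificity) all fall directly out of a binary confusion matrix. The short sketch below shows the computation; the label vectors are illustrative, not data from any of the reviewed studies.

```python
# Accuracy, sensitivity and specificity from a binary confusion
# matrix, the metrics the review uses to judge prediction algorithms.
# Labels are illustrative only.
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]   # 1 = disease present
y_pred = [1, 0, 1, 0, 0, 0, 1, 1, 1, 0]   # model output

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("accuracy:   ", (tp + tn) / (tp + tn + fp + fn))
print("sensitivity:", tp / (tp + fn))   # true positive rate
print("specificity:", tn / (tn + fp))   # true negative rate
```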


2020 ◽  
Vol 39 (5) ◽  
pp. 6579-6590
Author(s):  
Sandy Çağlıyor ◽  
Başar Öztayşi ◽  
Selime Sezgin

The motion picture industry is one of the largest industries worldwide and has significant importance in the global economy. Considering the high stakes and high risks in the industry, forecast models and decision support systems are gaining importance. Several attempts have been made to estimate the theatrical performance of a movie before or at the early stages of its release. Nevertheless, these models are mostly used for predicting domestic performance, and the industry still struggles to predict box office performance in overseas markets. In this study, the aim is to design a forecast model using different machine learning algorithms to estimate the theatrical success of US movies in Turkey. A dataset of 1559 movies is constructed from various sources. First, independent variables are grouped as pre-release, distributor type, and international distribution based on their characteristics. The number of attendances is discretized into three classes. Four popular machine learning algorithms (artificial neural networks, decision tree regression, gradient boosting trees and random forests) are employed, and the impact of each variable group is assessed by comparing the performance of the resulting models. The number of target classes is then increased to five and eight, and the results are compared with models previously developed in the literature.
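The target construction described above (attendance counts discretized into three classes before classification) can be sketched as follows. The file name, feature columns, and the two learners shown are assumptions for illustration, not the authors' dataset or full model set.

```python
# Sketch: discretize attendance into three classes, then compare
# classifiers by cross-validated accuracy. File and column names
# are hypothetical placeholders.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

df = pd.read_csv("us_movies_turkey.csv")  # hypothetical 1559-movie dataset

# Equal-frequency binning into low / medium / high attendance.
df["attendance_class"] = pd.qcut(df["attendance"], q=3,
                                 labels=["low", "medium", "high"])

features = ["budget", "runtime", "distributor_type", "intl_release_count"]
X = pd.get_dummies(df[features])          # one-hot encode categoricals
y = df["attendance_class"]

for clf in (RandomForestClassifier(random_state=0),
            GradientBoostingClassifier(random_state=0)):
    scores = cross_val_score(clf, X, y, cv=5)
    print(type(clf).__name__, scores.mean().round(3))
```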

