A Systematic Literature Review on Using Machine Learning Algorithms for Software Requirements Identification on Stack Overflow

Context. The improvements made in the last couple of decades in the requirements engineering (RE) processes and methods have witnessed a rapid rise in effectively using diverse machine learning (ML) techniques to resolve several multifaceted RE issues. One such challenging issue is the effective identification and classification of the software requirements on Stack Overflow (SO) for building quality systems. The appropriateness of ML-based techniques to tackle this issue has revealed quite substantial results, much effective than those produced by the usual available natural language processing (NLP) techniques. Nonetheless, a complete, systematic, and detailed comprehension of these ML based techniques is considerably scarce. Objective. To identify or recognize and classify the kinds of ML algorithms used for software requirements identification primarily on SO. Method. This paper reports a systematic literature review (SLR) collecting empirical evidence published up to May 2020. Results. This SLR study found 2,484 published papers related to RE and SO. The data extraction process of the SLR showed that (1) Latent Dirichlet Allocation (LDA) topic modeling is among the widely used ML algorithm in the selected studies and (2) precision and recall are amongst the most commonly utilized evaluation methods for measuring the performance of these ML algorithms. Conclusion. Our SLR study revealed that while ML algorithms have phenomenal capabilities of identifying the software requirements on SO, they still are confronted with various open problems/issues that will eventually limit their practical applications and performances. Our SLR study calls for the need of close collaboration venture between the RE and ML communities/researchers to handle the open issues confronted in the development of some real world machine learning-based quality systems.

Download Full-text

Systematic literature review of machine learning methods used in the analysis of real-world data for patient-provider decision making

BMC Medical Informatics and Decision Making ◽

10.1186/s12911-021-01403-2 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Alan Brnabic ◽

Lisa M. Hess

Keyword(s):

Machine Learning ◽

Decision Making ◽

Literature Review ◽

Systematic Literature Review ◽

Real World ◽

Learning Algorithms ◽

External Validation ◽

Machine Learning Algorithms ◽

Learning Methods ◽

Machine Learning Methods

Abstract Background Machine learning is a broad term encompassing a number of methods that allow the investigator to learn from the data. These methods may permit large real-world databases to be more rapidly translated to applications to inform patient-provider decision making. Methods This systematic literature review was conducted to identify published observational research of employed machine learning to inform decision making at the patient-provider level. The search strategy was implemented and studies meeting eligibility criteria were evaluated by two independent reviewers. Relevant data related to study design, statistical methods and strengths and limitations were identified; study quality was assessed using a modified version of the Luo checklist. Results A total of 34 publications from January 2014 to September 2020 were identified and evaluated for this review. There were diverse methods, statistical packages and approaches used across identified studies. The most common methods included decision tree and random forest approaches. Most studies applied internal validation but only two conducted external validation. Most studies utilized one algorithm, and only eight studies applied multiple machine learning algorithms to the data. Seven items on the Luo checklist failed to be met by more than 50% of published studies. Conclusions A wide variety of approaches, algorithms, statistical software, and validation strategies were employed in the application of machine learning methods to inform patient-provider decision making. There is a need to ensure that multiple machine learning approaches are used, the model selection strategy is clearly defined, and both internal and external validation are necessary to be sure that decisions for patient care are being made with the highest quality evidence. Future work should routinely employ ensemble methods incorporating multiple machine learning algorithms.

Download Full-text

Use of AI and Machine Learning for Asthma Patients: A Systematic Literature Review

Malaysian Journal of Medical and Biological Research ◽

10.18034/mjmbr.v7i2.517 ◽

2020 ◽

Vol 7 (2) ◽

pp. 129-134

Author(s):

Takudzwa Fadziso

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

Literature Review ◽

Systematic Literature Review ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Modern Times ◽

Predictive Algorithms ◽

Asthma Patients

In modern times, the collection of data is not a big deal but using it in a meaningful is a challenging task. Different organizations are using artificial intelligence and machine learning for collecting and utilizing the data. These should also be used in the medical because different disease requires the prediction. One of these diseases is asthma that is continuously increasing and affecting more and more people. The major issue is that it is difficult to diagnose in children. Machine learning algorithms can help in diagnosing it early so that the doctors can start the treatment early. Machine learning algorithms can perform this prediction so this study will be helpful for both the doctors and patients. There are different machine learning predictive algorithms are available that have been used for this purpose.

Download Full-text

Machine Learning Approaches to Bike-Sharing Systems: A Systematic Literature Review

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10020062 ◽

2021 ◽

Vol 10 (2) ◽

pp. 62

Author(s):

Vitória Albuquerque ◽

Miguel Sales Dias ◽

Fernando Bacao

Keyword(s):

Machine Learning ◽

Literature Review ◽

Systematic Literature Review ◽

Smart Cities ◽

Research Question ◽

Machine Learning Algorithms ◽

Urban Transport ◽

Machine Learning Techniques ◽

Future Research ◽

Bike Sharing

Cities are moving towards new mobility strategies to tackle smart cities’ challenges such as carbon emission reduction, urban transport multimodality and mitigation of pandemic hazards, emphasising on the implementation of shared modes, such as bike-sharing systems. This paper poses a research question and introduces a corresponding systematic literature review, focusing on machine learning techniques’ contributions applied to bike-sharing systems to improve cities’ mobility. The preferred reporting items for systematic reviews and meta-analyses (PRISMA) method was adopted to identify specific factors that influence bike-sharing systems, resulting in an analysis of 35 papers published between 2015 and 2019, creating an outline for future research. By means of systematic literature review and bibliometric analysis, machine learning algorithms were identified in two groups: classification and prediction.

Download Full-text

SALTClass: classifying clinical short notes using background knowledge from unlabeled data

10.1101/801944 ◽

2019 ◽

Author(s):

Ayoub Bagheri ◽

Daniel Oberski ◽

Arjan Sammani ◽

Peter G.M. van der Heijden ◽

Folkert W. Asselbergs

Keyword(s):

Machine Learning ◽

Language Processing ◽

Text Classification ◽

Latent Dirichlet Allocation ◽

Machine Learning Algorithms ◽

Unlabeled Data ◽

Specific Information ◽

Short Text ◽

Link Type ◽

Python Package

AbstractBackgroundWith the increasing use of unstructured text in electronic health records, extracting useful related information has become a necessity. Text classification can be applied to extract patients’ medical history from clinical notes. However, the sparsity in clinical short notes, that is, excessively small word counts in the text, can lead to large classification errors. Previous studies demonstrated that natural language processing (NLP) can be useful in the text classification of clinical outcomes. We propose incorporating the knowledge from unlabeled data, as this may alleviate the problem of short noisy sparse text.ResultsThe software package SALTClass (short and long text classifier) is a machine learning NLP toolkit. It uses seven clustering algorithms, namely, latent Dirichlet allocation, K-Means, MiniBatchK-Means, BIRCH, MeanShift, DBScan, and GMM. Smoothing methods are applied to the resulting cluster information to enrich the representation of sparse text. For the subsequent prediction step, SALTClass can be used on either the original document-term matrix or in an enrichment pipeline. To this end, ten different supervised classifiers have also been integrated into SALTClass. We demonstrate the effectiveness of the SALTClass NLP toolkit in the identification of patients’ family history in a Dutch clinical cardiovascular text corpus from University Medical Center Utrecht, the Netherlands.ConclusionsThe considerable amount of unstructured short text in healthcare applications, particularly in clinical cardiovascular notes, has created an urgent need for tools that can parse specific information from text reports. Using machine learning algorithms for enriching short text can improve the representation for further applications.AvailabilitySALTClass can be downloaded as a Python package from Python Package Index (PyPI) website athttps://pypi.org/project/saltclassand from GitHub athttps://github.com/bagheria/saltclass.

Download Full-text

Stock Market Prediction using Machine Learning: A Systematic Literature Review

American Journal of Trade and Policy ◽

10.18034/ajtp.v4i3.521 ◽

2017 ◽

Vol 4 (3) ◽

pp. 123-128

Author(s):

Siddhartha Vadlamudi

Keyword(s):

Machine Learning ◽

Literature Review ◽

Stock Market ◽

Systematic Literature Review ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

The Other ◽

Stock Market Prediction ◽

Stock Value ◽

Challenging Tasks

Different machine learning algorithms are discussed in this literature review. These algorithms can be used for predicting the stock market. The prediction of the stock market is one of the challenging tasks that must have to be handled. In this paper, it is discussed how the machine learning algorithms can be used for predicting the stock value. Different attributes are identified that can be used for training the algorithm for this purpose. Some of the other factors are also discussed that can have an effect on the stock value.

Download Full-text

Stress Detection using Natural Language Processing and Machine Learning over social Interactions

10.21203/rs.3.rs-994868/v1 ◽

2021 ◽

Author(s):

Tanya Nijhawan ◽

Girija Attigeri ◽

Ananthakrishna T

Keyword(s):

Machine Learning ◽

Social Interactions ◽

Language Processing ◽

Large Scale ◽

Latent Dirichlet Allocation ◽

Well Being ◽

Machine Learning Algorithms ◽

Textual Data ◽

The Status ◽

Micro Level

Abstract Cyberspace is a vast soapbox for people to post anything that they witness in their day-to-day lives. Subsequently, it can be used as a very effective tool in detecting the stress levels of an individual based on the posts and comments shared by him/her on social networking platforms. We leverage large-scale datasets with tweets to successfully accomplish sentiment analysis with the aid of machine learning algorithms. We take the help of a capable deep learning pre-trained model called BERT to solve the problems which come with sentiment classification. The BERT model outperforms a lot of other well-known models for this job without any sophisticated architecture. We also adopted Latent Dirichlet Allocation which is an unsupervised machine learning method that’s skilled in scanning a group of documents, recognizing the word and phrase patterns within them, and gathering word groups and alike expressions that most precisely illustrate a set of documents. This helps us predict which topic is linked to the textual data. With the aid of the models suggested, we will be able to detect the emotion of users online. We are primarily working with Twitter data because Twitter is a website where people express their thoughts often. In conclusion, this proposal is for the well- being of one’s mental health. The results are evaluated using various metric at macro and micro level and indicate that the trained model detects the status of emotions bases on social interactions.

Download Full-text

An Empirical Evaluation of Machine Learning Algorithms for Identifying Software Requirements on Stack Overflow: Initial Results

2019 IEEE 10th International Conference on Software Engineering and Service Science (ICSESS) ◽

10.1109/icsess47205.2019.9040720 ◽

2019 ◽

Author(s):

Arshad Ahmad ◽

Chong Feng ◽

Adnan Tahir ◽

Asif Khan ◽

Muhammad Waqas ◽

...

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Empirical Evaluation ◽

Machine Learning Algorithms ◽

Software Requirements ◽

Stack Overflow ◽

Initial Results

Download Full-text

An Intelligent Literature Review: an Inductive Approach to define Machine Learning Applications in the clinical domain

10.21203/rs.3.rs-1090813/v1 ◽

2022 ◽

Author(s):

Renu Sabharwal ◽

Shah Jahan Miah

Keyword(s):

Machine Learning ◽

Literature Review ◽

Systematic Literature Review ◽

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Big Data Analytics ◽

Decision Makers ◽

Machine Learning Applications ◽

Two Stages ◽

Clinical Domain

Abstract Big data analytics utilizes different analytics techniques to transform large volume and diversified big dataset. The analytics uses various computational methods such as different Machine Learning (ML) in convert raw data to valuable insights. The ML assist individuals to perform work activities quicker and better, and empower decision-makers in system use. Since academics and industry practitioners have growing interests on ML, how different applications of ML in specific problem domains have been explored, but not in a holistic manner from the past literature. This paper aims to promote the utilization of intelligent literature review for researchers by introducing a step-by-step framework on a case providing the code template. We offer an intelligent literature review to obtain in-depth analytical insight of ML applications in the clinical domain to: a) develop the intelligent literature framework using traditional literature and Latent Dirichlet Allocation (LDA) topic modeling, b) analyze research documents using traditional systematic literature review revealing ML applications, and c) identify topics from documents using LDA topic modeling. We used a PRISMA framework for the traditional literature review, reviewed four databases (e.g. IEEE, PubMed, Scopus, and Google Scholar), which are published between 2016 and 2021 (September). The framework comprises two stages – Traditional systematic literature review and LDA topic modeling. The intelligent literature review framework reviewed 305 research documents in a transparent, reliable, and faster way.

Download Full-text

Machine learning in medicine: a practical introduction to natural language processing

BMC Medical Research Methodology ◽

10.1186/s12874-021-01347-1 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Conrad J. Harrison ◽

Chris J. Sidey-Gibbons

Keyword(s):

Machine Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Latent Dirichlet Allocation ◽

Mental Health Problems ◽

Characteristic Curve ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Support Vector

Abstract Background Unstructured text, including medical records, patient feedback, and social media comments, can be a rich source of data for clinical research. Natural language processing (NLP) describes a set of techniques used to convert passages of written text into interpretable datasets that can be analysed by statistical and machine learning (ML) models. The purpose of this paper is to provide a practical introduction to contemporary techniques for the analysis of text-data, using freely-available software. Methods We performed three NLP experiments using publicly-available data obtained from medicine review websites. First, we conducted lexicon-based sentiment analysis on open-text patient reviews of four drugs: Levothyroxine, Viagra, Oseltamivir and Apixaban. Next, we used unsupervised ML (latent Dirichlet allocation, LDA) to identify similar drugs in the dataset, based solely on their reviews. Finally, we developed three supervised ML algorithms to predict whether a drug review was associated with a positive or negative rating. These algorithms were: a regularised logistic regression, a support vector machine (SVM), and an artificial neural network (ANN). We compared the performance of these algorithms in terms of classification accuracy, area under the receiver operating characteristic curve (AUC), sensitivity and specificity. Results Levothyroxine and Viagra were reviewed with a higher proportion of positive sentiments than Oseltamivir and Apixaban. One of the three LDA clusters clearly represented drugs used to treat mental health problems. A common theme suggested by this cluster was drugs taking weeks or months to work. Another cluster clearly represented drugs used as contraceptives. Supervised machine learning algorithms predicted positive or negative drug ratings with classification accuracies ranging from 0.664, 95% CI [0.608, 0.716] for the regularised regression to 0.720, 95% CI [0.664,0.776] for the SVM. Conclusions In this paper, we present a conceptual overview of common techniques used to analyse large volumes of text, and provide reproducible code that can be readily applied to other research studies using open-source software.

Download Full-text

Federated Learning in a Medical Context: A Systematic Literature Review

ACM Transactions on Internet Technology ◽

10.1145/3412357 ◽

2021 ◽

Vol 21 (2) ◽

pp. 1-31

Author(s):

Bjarne Pfitzner ◽

Nico Steckhan ◽

Bert Arnrich

Keyword(s):

Machine Learning ◽

Literature Review ◽

Systematic Literature Review ◽

Data Privacy ◽

Research Area ◽

Learning Models ◽

Related Data ◽

Private Data ◽

Large Databases ◽

Machine Learning Models

Data privacy is a very important issue. Especially in fields like medicine, it is paramount to abide by the existing privacy regulations to preserve patients’ anonymity. However, data is required for research and training machine learning models that could help gain insight into complex correlations or personalised treatments that may otherwise stay undiscovered. Those models generally scale with the amount of data available, but the current situation often prohibits building large databases across sites. So it would be beneficial to be able to combine similar or related data from different sites all over the world while still preserving data privacy. Federated learning has been proposed as a solution for this, because it relies on the sharing of machine learning models, instead of the raw data itself. That means private data never leaves the site or device it was collected on. Federated learning is an emerging research area, and many domains have been identified for the application of those methods. This systematic literature review provides an extensive look at the concept of and research into federated learning and its applicability for confidential healthcare datasets.

Download Full-text