Anatomy of a Data Science Software Toolkit That Uses Machine Learning to Aid ‘Bench-to-Bedside’ Medical Research—With Essential Concepts of Data Mining and Analysis Explained

Data science and machine learning are buzzwords of the early 21st century. Both are now pervasive, but how do these concepts translate into use by researchers and clinicians in the life sciences and medicine? Here, we describe a software toolkit that is just large enough in scale to be maintained and extended by a small team, and that is optimised for problems that arise in small and medium-sized laboratories. In particular, a single person may manage the system from data ingestion through statistics preparation to predictions. At the system’s core is a graph-type database, which keeps it flexible with respect to the irregular, constantly changing data types that are common during explorative research. At the system’s outermost shell, the concept of ‘user stories’ is introduced to help end-user researchers perform tasks grouped by their expertise, ranging from simple data input and data curation through statistics to predictions via machine learning algorithms. We compiled a sizable list of existing, modular Python libraries for data analysis that may serve as a reference in the field and may be incorporated into this software. We also provide an insight into basic concepts, such as labelled vs. unlabelled data, supervised vs. unsupervised learning, regression vs. classification, and evaluation by different error metrics, as well as the more advanced concept of cross-validation. Finally, we show some examples from our laboratory using blood sample and blood clot data from thrombosis patients (sufferers of stroke, heart and peripheral thrombosis disease), and how such tools can help to set realistic expectations and reveal caveats.
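
The cross-validation and error-metric concepts named in this abstract can be illustrated with a minimal sketch. It is not the toolkit described by the authors: the one-feature least-squares model, the helper names `fit_line` and `k_fold_cv`, and the toy data are all assumptions chosen for illustration, and only the Python standard library is used.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a * x + b for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx else 0.0
    return a, mean_y - a * mean_x


def k_fold_cv(xs, ys, k=5, seed=0):
    """Return (MAE, RMSE), each averaged over k held-out folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    maes, rmses = [], []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        # Fit only on the training part, then score on the unseen fold.
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errors = [ys[i] - (a * xs[i] + b) for i in fold]
        maes.append(sum(abs(e) for e in errors) / len(errors))
        rmses.append(math.sqrt(sum(e * e for e in errors) / len(errors)))
    return sum(maes) / k, sum(rmses) / k


# Hypothetical toy data (not from the paper): y = 2x + 1 plus small noise.
rng = random.Random(42)
xs = [float(i) for i in range(30)]
ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.5) for x in xs]
mae, rmse = k_fold_cv(xs, ys, k=5)
print(f"MAE={mae:.3f} RMSE={rmse:.3f}")
```

Each sample is held out exactly once, so the reported errors come only from data the model did not see during fitting, which is the point of cross-validation when setting realistic expectations.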

Drill-Core Mineral Abundance Estimation Using Hyperspectral and High-Resolution Mineralogical Data

Remote Sensing ◽ 10.3390/rs12071218 ◽ 2020 ◽ Vol 12 (7) ◽ pp. 1218

Author(s): Laura Tuşa ◽ Mahdi Khodadadzadeh ◽ Cecilia Contreras ◽ Kasra Rafiezadeh Shahi ◽ Margret Fuchs ◽ ...

Keyword(s): Machine Learning ◽ High Resolution ◽ Ore Deposits ◽ Machine Learning Algorithms ◽ Training Data ◽ Support Vector ◽ Drill Core ◽ Data Types ◽ Mineralogical Characterization ◽ Core Samples

Due to the extensive drilling performed every year in exploration campaigns for the discovery and evaluation of ore deposits, drill-core mapping is becoming an essential step. While valuable mineralogical information is extracted during core logging by on-site geologists, the process is time-consuming and depends on the observer and their individual background. Hyperspectral short-wave infrared (SWIR) data is used in the mining industry as a tool to complement traditional logging techniques and to provide a rapid and non-invasive analytical method for mineralogical characterization. Additionally, Scanning Electron Microscopy-based image analyses using a Mineral Liberation Analyser (SEM-MLA) provide exhaustive high-resolution mineralogical maps, but can only be performed on small areas of the drill-cores. We propose to use machine learning algorithms to combine the two data types and upscale the quantitative SEM-MLA mineralogical data to drill-core scale. In this way, quasi-quantitative maps over entire drill-core samples are obtained. Our upscaling approach increases result transparency and reproducibility by employing physics-based data acquisition (hyperspectral imaging) combined with mathematical models (machine learning). The procedure is tested on five drill-core samples with varying training data using random forest, support vector machine and neural network regression models. The obtained mineral abundance maps are further used for the extraction of mineralogical parameters such as mineral association.
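
The upscaling idea (train a regressor where both spectra and SEM-MLA abundances exist, then predict abundances for the rest of the core) might be sketched as follows. This is only an illustration under stated assumptions: a k-nearest-neighbour regressor stands in for the random forest, support vector and neural network models named in the abstract, and the three-band spectra, abundance values and the function name `knn_regress` are hypothetical.

```python
def knn_regress(train_x, train_y, query, k=3):
    """Predict by averaging the targets of the k nearest training spectra."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), y)
        for x, y in zip(train_x, train_y)
    )
    nearest = dists[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Hypothetical pixels inside the small SEM-MLA area:
# a 3-band spectrum per pixel and the measured abundance of one mineral.
labelled_spectra = [
    (0.10, 0.20, 0.30),
    (0.12, 0.22, 0.31),
    (0.80, 0.70, 0.60),
    (0.82, 0.69, 0.58),
]
labelled_abundance = [0.05, 0.07, 0.90, 0.88]

# Hypothetical pixels from the rest of the drill core (spectra only).
core_spectra = [(0.11, 0.21, 0.30), (0.81, 0.70, 0.59)]

# "Upscaling": predict an abundance for every pixel that lacks SEM-MLA data.
abundance_map = [
    knn_regress(labelled_spectra, labelled_abundance, s, k=2)
    for s in core_spectra
]
print(abundance_map)
```

The same train-on-overlap, predict-everywhere pattern applies regardless of which regression model is plugged in.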

Heart Disease Prediction using Machine Learning

International Journal for Research in Applied Science and Engineering Technology ◽ 10.22214/ijraset.2021.36372 ◽ 2021 ◽ Vol 9 (VI) ◽ pp. 846-852

Author(s): Prof. Dr. R. Sandhiya

Keyword(s): Machine Learning ◽ Heart Disease ◽ Data Science ◽ Machine Learning Algorithms ◽ Assessment Process ◽ Disease Prediction ◽ Test Results ◽ Medical Field ◽ Modern Age ◽ Linear Svm

In recent times, the diagnosis of heart disease has become a very critical task in the medical field. In the modern age, one person dies every minute due to heart disease. Data science has an important role in processing large amounts of data in the field of health sciences. Since the diagnosis of heart disease is a complex task, the assessment process should be automated to avoid the risks associated with it and to alert the patient in advance. This paper uses the heart disease dataset available in the UCI Machine Learning Repository. The proposed work assesses the risk of heart disease in a patient by applying various data mining methods such as Naive Bayes, Decision Tree, KNN, Linear SVM, RBF SVM, Gaussian Process, Neural Network, AdaBoost, QDA and Random Forest. This paper provides a comparative study by analyzing the performance of various machine learning algorithms. Test results confirm that the KNN algorithm achieved the highest accuracy (97%) among the implemented ML algorithms.
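
A comparative study of this kind comes down to training several classifiers on the same training split and scoring them on the same held-out test split. The sketch below is an assumption-laden illustration, not the paper's code: it uses a hand-written KNN and a majority-class baseline instead of the ten listed methods, and the two-feature records are hypothetical toy values, not the UCI heart disease dataset.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k nearest training points (squared Euclidean)."""
    order = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def majority_predict(train_y):
    """Baseline that always predicts the most frequent training label."""
    return Counter(train_y).most_common(1)[0][0]


# Hypothetical toy records: two scaled features -> 1 = disease, 0 = no disease.
train_x = [(0.35, 0.45), (0.40, 0.50), (0.42, 0.48), (0.65, 0.75),
           (0.70, 0.80), (0.68, 0.72), (0.38, 0.52)]
train_y = [0, 0, 0, 1, 1, 1, 0]
test_x = [(0.37, 0.47), (0.66, 0.78), (0.41, 0.51), (0.72, 0.74)]
test_y = [0, 1, 0, 1]

# Score every model on the same held-out test set so the numbers are comparable.
scores = {
    "knn": accuracy(test_y, [knn_predict(train_x, train_y, q) for q in test_x]),
    "majority": accuracy(test_y, [majority_predict(train_y)] * len(test_x)),
}
print(scores)
```

Including a trivial baseline is a deliberate design choice: an accuracy figure is easier to interpret when it can be compared with what always predicting the most common class would achieve.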

Machine Learning Techniques for Internet of Things

Advances in Systems Analysis, Software Engineering, and High Performance Computing - Integrating the Internet of Things Into Software Engineering Practices ◽ 10.4018/978-1-5225-7790-4.ch008 ◽ 2019 ◽ pp. 160-180

Author(s): P. Priakanth ◽ S. Gopikrishnan

Keyword(s): Machine Learning ◽ Data Science ◽ Learning Algorithms ◽ Machine Learning Algorithms ◽ Independent Learning ◽ Machine Learning Techniques ◽ Analytical Models ◽ Guided Learning ◽ Learning Techniques ◽ Learning Machine

The idea of an intelligent, independent learning machine has fascinated humans for decades. The philosophy behind machine learning is to automate the creation of analytical models in order to enable algorithms to learn continuously with the help of available data. Since IoT will be among the major sources of new data, data science will make a great contribution to making IoT applications more intelligent. Machine learning can be applied in cases where the desired outcome is known (guided learning), where the data is not known beforehand (unguided learning), or where the learning is the result of interaction between a model and the environment (reinforcement learning). This chapter answers the following questions: How could machine learning algorithms be applied to IoT smart data? What is the taxonomy of machine learning algorithms that can be adopted in IoT? And what are the characteristics of real-world IoT data that require data analytics?
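
The difference between guided and unguided learning can be made concrete with a small sketch. Everything here is an assumption for illustration: the sensor readings, the labels, and the helper names are hypothetical, a nearest-centroid rule stands in for guided learning, and a two-cluster k-means on one-dimensional values stands in for unguided learning. Reinforcement learning is not covered.

```python
def nearest_centroid_fit(values, labels):
    """Guided learning: compute one mean per known label."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    return {lab: sum(vs) / len(vs) for lab, vs in groups.items()}


def nearest_centroid_predict(centroids, value):
    """Assign the label whose mean is closest to the new value."""
    return min(centroids, key=lambda lab: abs(centroids[lab] - value))


def two_means(values, iterations=10):
    """Unguided learning: split 1-D values into two clusters without labels."""
    lo, hi = min(values), max(values)
    for _ in range(iterations):
        low = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high = [v for v in values if abs(v - lo) > abs(v - hi)]
        if not high:  # all values identical, nothing to split
            break
        lo, hi = sum(low) / len(low), sum(high) / len(high)
    return lo, hi


# Hypothetical temperature readings from an IoT sensor.
readings = [20.1, 20.4, 19.8, 35.2, 34.9, 35.5]
labels = ["normal", "normal", "normal", "hot", "hot", "hot"]

# Guided: labels are available, so the model learns one centre per label.
centroids = nearest_centroid_fit(readings, labels)
print(nearest_centroid_predict(centroids, 33.0))

# Unguided: the same readings without labels; the grouping is discovered.
clusters = two_means(readings)
print(clusters)
```

On this toy data both routes find the same two groups, but only the guided model can name them, because the names come from the labels.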

A Literature Review on Thyroid Hormonal Problems in Women Using Data Science and Analytics

Advances in Data Mining and Database Management - Handbook of Research on Engineering, Business, and Healthcare Applications of Data Science and Analytics ◽ 10.4018/978-1-7998-3053-5.ch021 ◽ 2021 ◽ pp. 416-428

Author(s): R. Suganya ◽ Rajaram S. ◽ Kameswari M.

Keyword(s): Machine Learning ◽ Literature Review ◽ Data Science ◽ Learning Algorithms ◽ Research Literature ◽ Machine Learning Algorithms ◽ Thyroid Disorder ◽ Classification Models ◽ Indian Women ◽ Using Data

Currently, thyroid disorders are common and widespread among women worldwide. In India, seven out of ten women suffer from thyroid problems. Various studies in the research literature estimate that goiter is prevalent in about 35% of Indian women examined. Preventive measures must be taken in the early stages; otherwise, the disorder can cause infertility in women. This review discusses various analytics models that are used to handle different types of thyroid problems in women. The chapter analyzes and compares different classification models, both machine learning algorithms and deep learning algorithms, for classifying different thyroid problems. Literature on both machine learning and deep learning algorithms is considered. This literature review on thyroid problems will help to analyze the causes and characteristics of thyroid disorders. The dataset used to build and validate the algorithms was provided by the UCI Machine Learning Repository.

Machine Learning Techniques for Internet of Things

Research Anthology on Artificial Intelligence Applications in Security ◽ 10.4018/978-1-7998-7705-9.ch067 ◽ 2021 ◽ pp. 1490-1506

Author(s): P. Priakanth ◽ S. Gopikrishnan

Keyword(s): Machine Learning ◽ Data Science ◽ Learning Algorithms ◽ Machine Learning Algorithms ◽ Independent Learning ◽ Machine Learning Techniques ◽ Analytical Models ◽ Guided Learning ◽ Learning Techniques ◽ Learning Machine

Introduction to Data Science and Machine Learning Algorithms

Data Science and Multiple Criteria Decision Making Approaches in Finance - Multiple Criteria Decision Making ◽ 10.1007/978-3-030-74176-1_1 ◽ 2021 ◽ pp. 1-15

Author(s): Gökhan Silahtaroğlu ◽ Hasan Dinçer ◽ Serhat Yüksel

Keyword(s): Machine Learning ◽ Data Science ◽ Learning Algorithms ◽ Machine Learning Algorithms

Machine learning algorithms, applications, and practices in data science

Handbook of Statistics - Principles and Methods for Data Science ◽ 10.1016/bs.host.2020.01.002 ◽ 2020 ◽ pp. 81-206

Author(s): Kalidas Yeturu

Keyword(s): Machine Learning ◽ Data Science ◽ Learning Algorithms ◽ Machine Learning Algorithms

Application of Data Science and Machine Learning Algorithms for ROP Optimization in West Texas: Turning Data into Knowledge

10.4043/29288-ms ◽ 2019 ◽ Cited By ~ 2

Author(s): Christine Ikram Noshi

Keyword(s): Machine Learning ◽ Data Science ◽ Learning Algorithms ◽ Machine Learning Algorithms ◽ West Texas

Large-Scale Machine Learning Algorithms for Biomedical Data Science

Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics - BCB '19 ◽ 10.1145/3307339.3342130 ◽ 2019

Author(s): Heng Huang

Keyword(s): Machine Learning ◽ Large Scale ◽ Data Science ◽ Learning Algorithms ◽ Machine Learning Algorithms ◽ Biomedical Data

Machine learning for determining accurate outcomes in criminal trials

Law Probability and Risk ◽ 10.1093/lpr/mgaa003 ◽ 2020 ◽ Vol 19 (1) ◽ pp. 43-65

Author(s): Jane Mitchell ◽ Simon Mitchell ◽ Cliff Mitchell

Keyword(s): Machine Learning ◽ Decision Making ◽ Data Science ◽ Positive Impact ◽ Machine Learning Algorithms ◽ Test Cases ◽ Judicial Process ◽ Criminal Trials ◽ Wrongful Convictions ◽ Potential Applications

Advances in mathematical and computational technologies have brought unique and ground-breaking benefits to diverse fields throughout society (engineering, medicine, economics, etc.). Within legal systems, however, the potential applications of data science and innovative mathematical tools have yet to be embraced with the same ambition. The complex decision-making that is needed for reaching just verdicts is often seen as out of reach for such approaches and, in the case of criminal trials, this inhibits exploration into whether machine learning could have a positive impact. Here, through assigning numerical scores to prosecution and defence evidence, and employing an approach based on dimensionality reduction, we showed that evidence strands presented at historical murder trials could be used to train effective machine-learning algorithms (or models). We tested the evidence quantification approach with the trained model and showed that, through machine learning, criminal cases could be clearly classified (probability >99.9%) as belonging to either a guilty or a not-guilty category. The classification was found to be as expected for all test cases. All guilty test cases that were not wrongful convictions were correctly assigned to the guilty category by our model and, crucially, test cases that were wrongful convictions were correctly assigned to the not-guilty category. This work demonstrated the potential for machine learning to benefit criminal trial decision-making, and should motivate further testing and development of the model and datasets for assisting the judicial process.
