scholarly journals Anatomy of a Data Science Software Toolkit That Uses Machine Learning to Aid ‘Bench-to-Bedside’ Medical Research—With Essential Concepts of Data Mining and Analysis Explained

2021 ◽  
Vol 11 (24) ◽  
pp. 12135
Author(s):  
László Beinrohr ◽  
Eszter Kail ◽  
Péter Piros ◽  
Erzsébet Tóth ◽  
Rita Fleiner ◽  
...  

Data science and machine learning are buzzwords of the early 21st century. Now pervasive through human civilization, how do these concepts translate to use by researchers and clinicians in the life-science and medical field? Here, we describe a software toolkit, just large enough in scale, so that it can be maintained and extended by a small team, optimised for problems that arise in small/medium laboratories. In particular, this system may be managed from data ingestion statistics preparation predictions by a single person. At the system’s core is a graph type database, so that it is flexible in terms of irregular, constantly changing data types, as such data types are common during explorative research. At the system’s outermost shell, the concept of ’user stories’ is introduced to help the end-user researchers perform various tasks separated by their expertise: these range from simple data input, data curation, statistics, and finally to predictions via machine learning algorithms. We compiled a sizable list of already existing, modular Python platform libraries usable for data analysis that may be used as a reference in the field and may be incorporated into this software. We also provide an insight into basic concepts, such as labelled-unlabelled data, supervised vs. unsupervised learning, regression vs. classification, evaluation by different error metrics, and an advanced concept of cross-validation. Finally, we show some examples from our laboratory using our blood sample and blood clot data from thrombosis patients (sufferers from stroke, heart and peripheral thrombosis disease) and how such tools can help to set up realistic expectations and show caveats.

2020 ◽  
Vol 12 (7) ◽  
pp. 1218
Author(s):  
Laura Tuşa ◽  
Mahdi Khodadadzadeh ◽  
Cecilia Contreras ◽  
Kasra Rafiezadeh Shahi ◽  
Margret Fuchs ◽  
...  

Due to the extensive drilling performed every year in exploration campaigns for the discovery and evaluation of ore deposits, drill-core mapping is becoming an essential step. While valuable mineralogical information is extracted during core logging by on-site geologists, the process is time consuming and dependent on the observer and individual background. Hyperspectral short-wave infrared (SWIR) data is used in the mining industry as a tool to complement traditional logging techniques and to provide a rapid and non-invasive analytical method for mineralogical characterization. Additionally, Scanning Electron Microscopy-based image analyses using a Mineral Liberation Analyser (SEM-MLA) provide exhaustive high-resolution mineralogical maps, but can only be performed on small areas of the drill-cores. We propose to use machine learning algorithms to combine the two data types and upscale the quantitative SEM-MLA mineralogical data to drill-core scale. This way, quasi-quantitative maps over entire drill-core samples are obtained. Our upscaling approach increases result transparency and reproducibility by employing physical-based data acquisition (hyperspectral imaging) combined with mathematical models (machine learning). The procedure is tested on 5 drill-core samples with varying training data using random forests, support vector machines and neural network regression models. The obtained mineral abundance maps are further used for the extraction of mineralogical parameters such as mineral association.


Author(s):  
Prof. Dr. R. Sandhiya

In recent times, the diagnosis of heart disease has become a very critical task in the medical field. In the modern age, one person dies every minute due to heart disease. Data science has an important role in processing big amounts of data in the field of health sciences. Since the diagnosis of heart disease is a complex task, the assessment process should be automated to avoid the risks associated with it and alert the patient in advance. This paper uses the heart disease dataset available in the UCI Machine Learning Repository. The proposed work assesses the risk of heart disease in a patient by applying various data mining methods such as Naive Bayes, Decision Tree, KNN, Linear SVM, RBF SVM, Gaussian Process, Neural Network, Adabost, QDA and Random Forest. This paper provides a comparative study by analyzing the performance of various machine learning algorithms. Test results confirm that the KNN algorithm achieved the highest 97% accuracy compared to other implemented ML algorithms.


Author(s):  
P. Priakanth ◽  
S. Gopikrishnan

The idea of an intelligent, independent learning machine has fascinated humans for decades. The philosophy behind machine learning is to automate the creation of analytical models in order to enable algorithms to learn continuously with the help of available data. Since IoT will be among the major sources of new data, data science will make a great contribution to make IoT applications more intelligent. Machine learning can be applied in cases where the desired outcome is known (guided learning) or the data is not known beforehand (unguided learning) or the learning is the result of interaction between a model and the environment (reinforcement learning). This chapter answers the questions: How could machine learning algorithms be applied to IoT smart data? What is the taxonomy of machine learning algorithms that can be adopted in IoT? And what are IoT data characteristics in real-world which requires data analytics?


Author(s):  
R. Suganya ◽  
Rajaram S. ◽  
Kameswari M.

Currently, thyroid disorders are more common and widespread among women worldwide. In India, seven out of ten women are suffering from thyroid problems. Various research literature studies predict that about 35% of Indian women are examined with prevalent goiter. It is very necessary to take preventive measures at its early stages, otherwise it causes infertility problem among women. The recent review discusses various analytics models that are used to handle different types of thyroid problems in women. This chapter is planned to analyze and compare different classification models, both machine learning algorithms and deep leaning algorithms, to classify different thyroid problems. Literature from both machine learning and deep learning algorithms is considered. This literature review on thyroid problems will help to analyze the reason and characteristics of thyroid disorder. The dataset used to build and to validate the algorithms was provided by UCI machine learning repository.


Author(s):  
P. Priakanth ◽  
S. Gopikrishnan

The idea of an intelligent, independent learning machine has fascinated humans for decades. The philosophy behind machine learning is to automate the creation of analytical models in order to enable algorithms to learn continuously with the help of available data. Since IoT will be among the major sources of new data, data science will make a great contribution to make IoT applications more intelligent. Machine learning can be applied in cases where the desired outcome is known (guided learning) or the data is not known beforehand (unguided learning) or the learning is the result of interaction between a model and the environment (reinforcement learning). This chapter answers the questions: How could machine learning algorithms be applied to IoT smart data? What is the taxonomy of machine learning algorithms that can be adopted in IoT? And what are IoT data characteristics in real-world which requires data analytics?


2020 ◽  
Vol 19 (1) ◽  
pp. 43-65
Author(s):  
Jane Mitchell ◽  
Simon Mitchell ◽  
Cliff Mitchell

Abstract Advances in mathematical and computational technologies have brought unique and ground-breaking benefits to diverse fields throughout society (engineering, medicine, economics, etc.). Within legal systems, however, the potential applications of data science and innovative mathematical tools have yet to be embraced with the same ambition. The complex decision-making that is needed for reaching just verdicts is often seen as out of reach for such approaches and, in the case of criminal trials, this inhibits exploration into whether machine learning could have a positive impact. Here, through assigning numerical scores to prosecution and defence evidence, and employing an approach based on dimensionality reduction, we showed that evidence strands presented at historical murder trials could be used to train effective machine-learning algorithms (or models). We tested the evidence quantification approach with the trained model and showed that, through machine learning, criminal cases could be clearly classified (probability >99.9%) as belonging to either a guilty or a not-guilty category. The classification was found to be as expected for all test cases. All guilty test cases that were not wrongful convictions were correctly assigned to the guilty category by our model and, crucially, test cases that were wrongful convictions were correctly assigned to the not-guilty category. This work demonstrated the potential for machine learning to benefit criminal trial decision-making, and should motivate further testing and development of the model and datasets for assisting the judicial process.


Sign in / Sign up

Export Citation Format

Share Document