A machine learning-based approach for estimating and testing associations with multivariate outcomes

Abstract: We propose a method for summarizing the strength of association between a set of variables and a multivariate outcome. Classical summary measures are appropriate when linear relationships exist between covariates and outcomes, while our approach provides an alternative that is useful in situations where complex relationships may be present. We utilize machine learning to detect nonlinear relationships and covariate interactions and propose a measure of association that captures these relationships. A hypothesis test about the proposed associative measure can be used to test the strong null hypothesis of no association between a set of variables and a multivariate outcome. Simulations demonstrate that this hypothesis test has greater power than existing methods against alternatives where covariates have nonlinear relationships with outcomes. We additionally propose measures of variable importance for groups of variables, which summarize each group's association with the outcome. We demonstrate our methodology using data from a birth cohort study on childhood health and nutrition in the Philippines.
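As a rough illustration of testing a null of no association with a multivariate outcome when relationships may be nonlinear, the sketch below uses a generic permutation test. It is not the authors' procedure: the abstract does not specify their learner or test statistic, so a leave-one-out 1-nearest-neighbour predictor and a pooled R² are swapped in here as assumptions, and all function names and the toy data are hypothetical.

```python
import math
import random

def nearest_neighbors(x):
    """For each x[i], index of the closest other point (leave-one-out 1-NN)."""
    n = len(x)
    return [min((j for j in range(n) if j != i), key=lambda j: abs(x[i] - x[j]))
            for i in range(n)]

def loo_r2(nbrs, y):
    """Leave-one-out R^2 pooled over all outcome columns of y."""
    n, k = len(y), len(y[0])
    means = [sum(row[c] for row in y) / n for c in range(k)]
    sse = sum((y[i][c] - y[nbrs[i]][c]) ** 2 for i in range(n) for c in range(k))
    sst = sum((y[i][c] - means[c]) ** 2 for i in range(n) for c in range(k))
    return 1.0 - sse / sst

def permutation_p_value(x, y, n_perm=199, seed=0):
    """Permute outcome rows to break any link with x, then compare pooled R^2."""
    rng = random.Random(seed)
    nbrs = nearest_neighbors(x)      # depends only on x, so computed once
    observed = loo_r2(nbrs, y)
    rows = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(rows)            # whole rows move, keeping outcomes joint
        if loo_r2(nbrs, rows) >= observed:
            exceed += 1
    return observed, (1 + exceed) / (1 + n_perm)

# Toy data (assumed): two outcomes that depend nonlinearly on one covariate.
x = [i / 20 - 1 for i in range(41)]
y = [[xi ** 2, math.sin(3 * xi)] for xi in x]
observed, p_value = permutation_p_value(x, y)
print(f"pooled LOO R^2 = {observed:.3f}, permutation p-value = {p_value:.3f}")
```

Shuffling whole outcome rows preserves the dependence among the outcomes while breaking any dependence on the covariate, which is what a null of no association with the multivariate outcome requires.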


Modeling Relevance Relations Using Machine Learning Techniques

Advances in Machine Learning Applications in Software Engineering ◽

10.4018/978-1-59140-941-1.ch008 ◽

2011 ◽

pp. 168-207

Author(s):

Jelber Sayyad Shirabad ◽

Timothy C. Lethbridge ◽

Stan Matwin

Keyword(s):

Machine Learning ◽

Empirical Evaluation ◽

Machine Learning Techniques ◽

Learning Techniques ◽

Software Engineers ◽

Field Deployment ◽

Complex Relationships ◽

Using Data

This chapter presents the notion of relevance relations, an abstraction to represent relationships between software entities. Relevance relations map tuples of software entities to values that reflect how related they are to each other. Although there are no clear definitions for these relationships, software engineers can typically identify instances of these complex relationships. We show how a classifier can model a relevance relation. We also present the process of creating such models by using data mining and machine learning techniques. In a case study, we applied this process to a large legacy system; our system learned models of a relevance relation that predict whether a change in one file may require a change in another file. Our empirical evaluation shows that the predictive quality of such models makes them a viable choice for field deployment. We also show how, by assigning different misclassification costs, such models can be tuned to meet the user's needs in terms of precision and recall.
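The effect of misclassification costs on precision and recall can be sketched with a simple cost-based decision threshold. This is a minimal illustration under stated assumptions, not the chapter's implementation: the scores, labels, cost values, and function names are hypothetical, and the classifier is assumed to output a probability that a pair of files is relevant.

```python
def cost_threshold(cost_fp, cost_fn):
    """Probability above which predicting 'relevant' has lower expected cost.

    Predict positive when p * cost_fn > (1 - p) * cost_fp,
    i.e. when p > cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)

def precision_recall(scores, labels, threshold):
    """Precision and recall when scores >= threshold are predicted positive."""
    tp = fp = fn = 0
    for score, label in zip(scores, labels):
        predicted = score >= threshold
        if predicted and label == 1:
            tp += 1
        elif predicted and label == 0:
            fp += 1
        elif not predicted and label == 1:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical predicted probabilities and true labels (1 = relevant pair).
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]

# Equal costs versus a missed relevant pair costing four times a false alarm.
equal = precision_recall(scores, labels, cost_threshold(1, 1))
recall_heavy = precision_recall(scores, labels, cost_threshold(1, 4))
print("equal costs:", equal)
print("false negatives 4x as costly:", recall_heavy)
```

Raising the cost of false negatives lowers the threshold, so more pairs are flagged: recall rises while precision typically falls.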


Instant medical care and drug suggestion service using data mining and machine learning based intelligent self-diagnosis medical system

International Journal of Advanced Life Sciences ◽

10.26627/ijals/2017/10.03.0022 ◽

2017 ◽

Vol 10 (03) ◽

pp. 318-325

Author(s):

sudha M

Keyword(s):

Machine Learning ◽

Data Mining ◽

Medical Care ◽

Medical System ◽

Using Data


Using Machine Learning Methods to Identify Particle Types from Doppler Lidar Measurements in Iceland

Remote Sensing ◽

10.3390/rs13132433 ◽

2021 ◽

Vol 13 (13) ◽

pp. 2433

Author(s):

Shu Yang ◽

Fengchao Peng ◽

Sibylle von Löwis ◽

Guðrún Nína Petersen ◽

David Christian Finger

Keyword(s):

Machine Learning ◽

Weather Conditions ◽

Dust Storms ◽

Machine Learning Algorithms ◽

Lidar Data ◽

Data Sets ◽

Doppler Lidar ◽

Lidar Measurements ◽

Using Data ◽

Filter Noise

Doppler lidars are used worldwide for wind monitoring and recently also for the detection of aerosols. Automatic algorithms that classify the signals retrieved from lidar measurements are very useful for users. In this study, we explore the value of machine learning to classify backscattered signals from Doppler lidars using data from Iceland. We combined supervised and unsupervised machine learning algorithms with conventional lidar data processing methods and trained two models to filter noise signals and classify Doppler lidar observations into different classes, including clouds, aerosols and rain. The results reveal a high accuracy for noise identification and for aerosol and cloud classification. However, precipitation is under-detected. The method was tested on data sets from two instruments during different weather conditions, including three dust storms during the summer of 2019. Our results reveal that this method can provide an efficient, accurate and real-time classification of lidar measurements. Accordingly, we conclude that machine learning can open new opportunities for lidar data end-users, such as aviation safety operators, to monitor dust in the vicinity of airports.


Validating Game-Theoretic Models of Terrorism: Insights from Machine Learning

Games ◽

10.3390/g12030054 ◽

2021 ◽

Vol 12 (3) ◽

pp. 54

Author(s):

James T. Bang ◽

Atin Basuchoudhary ◽

Aniruddha Mitra

Keyword(s):

Machine Learning ◽

Terror Attacks ◽

Nonlinear Relationships ◽

Game Theoretic ◽

Competing Models

There are many competing game-theoretic analyses of terrorism. Most of these models suggest nonlinear relationships between terror attacks and some variable of interest. However, to date, there have been very few attempts to empirically sift between competing models of terrorism or identify nonlinear patterns. We suggest that machine learning can be an effective way of undertaking both. This feature can help build more salient game-theoretic models to help us understand and prevent terrorism.


Classification and photometric redshift estimation of quasars in photometric surveys

Proceedings of the International Astronomical Union ◽

10.1017/s1743921320001829 ◽

2020 ◽

Vol 15 (S359) ◽

pp. 40-41

Author(s):

L. M. Izuti Nakazono ◽

C. Mendes de Oliveira ◽

N. S. T. Hirata ◽

S. Jeram ◽

A. Gonzalez ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Nearest Neighbour ◽

Random Forest Algorithm ◽

Photometric Redshift ◽

Using Data

Abstract: We present a machine learning methodology to separate quasars from galaxies and stars using data from S-PLUS in the Stripe-82 region. In terms of quasar classification, we achieved 95.49% for precision and 95.26% for recall using a Random Forest algorithm. For photometric redshift estimation, we obtained a precision of 6% using k-Nearest Neighbour.


Effective Prediction of Heart Disease Using Data Mining and Machine Learning: A Review

2021 International Conference on Artificial Intelligence and Smart Systems (ICAIS) ◽

10.1109/icais50930.2021.9395963 ◽

2021 ◽

Author(s):

Simran Verma ◽

Abhishek Gupta

Keyword(s):

Machine Learning ◽

Data Mining ◽

Heart Disease ◽

Using Data


Investigating the Spread of Coronavirus Disease via Edge-AI and Air Pollution Correlation

ACM Transactions on Internet Technology ◽

10.1145/3424222 ◽

2021 ◽

Vol 21 (4) ◽

pp. 1-10

Author(s):

V. Gomathy ◽

K. Janarthanan ◽

Fadi Al-Turjman ◽

R. Sitharthan ◽

M. Rajesh ◽

...

Keyword(s):

Machine Learning ◽

Air Pollution ◽

Mortality Rate ◽

Respiratory Diseases ◽

Past Research ◽

Viral Disease ◽

Machine Learning Method ◽

Learning Method ◽

Death Rates ◽

Using Data

Coronavirus disease 2019 (COVID-19) is a highly infectious viral disease that affected millions of people worldwide in 2020. Several studies have shown that COVID-19 results in a severe acute respiratory syndrome and may lead to death. Past research has linked long-term exposure to air pollution to a greater number of respiratory diseases. This article investigates the spread of COVID-19 in relation to air pollution by applying linear regression, a machine learning method, in an edge computing setting. The analysis is based on the death rates caused by COVID-19 and on the regional distribution of death rates relative to hazardous air pollution, using data retrieved from the Copernicus Sentinel-5P satellite. The results obtained indicate that the mortality rate due to the spread of COVID-19 is 77% higher in areas with polluted air. The investigation also finds that COVID-19 severely affected 68% of the individuals who had been exposed to polluted air.


Machine Learning Protocols in Early Cancer Detection Based on Liquid Biopsy: A Survey

Life ◽

10.3390/life11070638 ◽

2021 ◽

Vol 11 (7) ◽

pp. 638

Author(s):

Linjing Liu ◽

Xingjian Chen ◽

Olutomilayo Olayemi Petinrin ◽

Weitong Zhang ◽

Saifur Rahaman ◽

...

Keyword(s):

Machine Learning ◽

Cancer Detection ◽

Liquid Biopsy ◽

Cancer Diagnostics ◽

Machine Learning Techniques ◽

Learning Techniques ◽

Cancer Subtype ◽

Tumor Origin ◽

Complex Relationships ◽

Learning Protocols

With advances in liquid biopsy technology, there is increasing evidence that body fluids such as blood, urine, and saliva could harbor potential biomarkers associated with tumor origin. Traditional correlation analysis methods are no longer sufficient to capture the high-resolution complex relationships between biomarkers and cancer subtype heterogeneity. To address this challenge, researchers have proposed applying machine learning techniques to liquid biopsy data to explore tumor origin. In this survey, we review the machine learning protocols and provide corresponding code demos for the approaches mentioned. We discuss algorithmic principles and frameworks extensively developed to reveal cancer mechanisms and consider the future prospects in biomarker exploration and cancer diagnostics.


A New Approach to the Quality Control of Slope and Aspect Classes Derived from Digital Elevation Models

Remote Sensing ◽

10.3390/rs13112069 ◽

2021 ◽

Vol 13 (11) ◽

pp. 2069

Author(s):

M. V. Alba-Fernández ◽

F. J. Ariza-López ◽

M. D. Jiménez-Gamero

Keyword(s):

Quality Control ◽

Hypothesis Test ◽

Simulated Data ◽

Slope Aspect ◽

New Approach ◽

Reference Set ◽

Digital Elevation ◽

Using Data ◽

Elevation Model ◽

Degree Of Similarity

The usefulness of the parameters (e.g., slope, aspect) derived from a Digital Elevation Model (DEM) is limited by its accuracy. In this paper, a thematic-like quality control (class-based) of aspect and slope classes is proposed. A product can be compared against a reference dataset, which provides the quality requirements to be achieved, by comparing the product proportions of each class with those of the reference set. If a distance between the product proportions and the reference proportions is smaller than a small enough positive tolerance, which is fixed by the user, it will be considered that the degree of similarity between the product and the reference set is acceptable, and hence that its quality meets the requirements. A formal statistical procedure, based on a hypothesis test, is developed and its performance is analyzed using simulated data. It uses the Hellinger distance between the proportions. The application to the slope and aspect is illustrated using data derived from a 2×2 m DEM (reference) and 5×5 m DEM in Allo (province of Navarra, Spain).
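The distance comparison described above can be sketched as follows. This is a minimal illustration of the Hellinger distance between class proportions and a plain tolerance check only; it does not reproduce the paper's formal hypothesis test, and the class counts, tolerance value, and function names are hypothetical.

```python
import math

def proportions(counts):
    """Convert per-class counts into proportions that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]

def hellinger_distance(p, q):
    """Hellinger distance between two discrete distributions, in [0, 1]."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of classes")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared / 2.0)

def within_tolerance(product_counts, reference_counts, tolerance):
    """Return the distance and whether it is below the user-fixed tolerance."""
    d = hellinger_distance(proportions(product_counts),
                           proportions(reference_counts))
    return d, d < tolerance

# Hypothetical cell counts per slope class in the product and the reference.
product_counts = [120, 300, 580]
reference_counts = [100, 320, 580]
distance, acceptable = within_tolerance(product_counts, reference_counts,
                                        tolerance=0.05)
print(f"Hellinger distance = {distance:.4f}, acceptable = {acceptable}")
```

A point comparison like this ignores sampling variability in the observed proportions, which is what a formal test based on the same distance is meant to account for.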


Using of data science in healthcare

Problems of Innovation and Investment Development ◽

10.33813/2224-1213.24.2021.15 ◽

2021 ◽

pp. 149-156

Author(s):

Ihor Ponomarenko ◽

Oleksandra Lubkovska

Keyword(s):

Machine Learning ◽

Health Care ◽

Business Intelligence ◽

Data Science ◽

Large Data ◽

Science Methods ◽

Medical Field ◽

Learning Methods ◽

Machine Learning Methods ◽

Using Data

The subject of the research is the approach to the possibility of using data science methods in the field of health care for integrated data processing and analysis in order to optimize economic and specialized processes. The purpose of this article is to address issues related to the specifics of using Data Science methods in the field of health care on the basis of comprehensive information obtained from various sources. Methodology. The research methodology comprises system-structural and comparative analyses (to study the application of BI systems in the process of working with large data sets); monograph (the study of various software solutions in the market of business intelligence); and economic analysis (when assessing the possibility of using business intelligence systems to strengthen the competitive position of companies). Scientific novelty. The main sources of data on key processes in the medical field are identified. Examples of innovative methods of collecting information in the field of health care, which are becoming widespread in the context of digitalization, are presented. The main sources of data in the field of health care used in Data Science are revealed. The specifics of the application of machine learning methods in the field of health care under conditions of increasing competition between market participants and increasing demand for relevant products from the population are presented. Conclusions. The intensification of the integration of Data Science in the medical field is due to the increase of digitized data (statistics, textual information, visualizations, etc.). Through the use of machine learning methods, doctors and other health professionals have new opportunities to improve the efficiency of the health care system as a whole. Key words: Data science, efficiency, information, machine learning, medicine, Python, healthcare.
