A machine learning-based approach for estimating and testing associations with multivariate outcomes

2020 ◽  
Vol 0 (0) ◽  
Author(s):  
David Benkeser ◽  
Andrew Mertens ◽  
John M. Colford ◽  
Alan Hubbard ◽  
Benjamin F. Arnold ◽  
...  

Abstract: We propose a method for summarizing the strength of association between a set of variables and a multivariate outcome. Classical summary measures are appropriate when linear relationships exist between covariates and outcomes, while our approach provides an alternative that is useful in situations where complex relationships may be present. We utilize machine learning to detect nonlinear relationships and covariate interactions and propose a measure of association that captures these relationships. A hypothesis test about the proposed associative measure can be used to test the strong null hypothesis of no association between a set of variables and a multivariate outcome. Simulations demonstrate that this hypothesis test has greater power than existing methods against alternatives where covariates have nonlinear relationships with outcomes. We additionally propose measures of variable importance for groups of variables, which summarize each group's association with the outcome. We demonstrate our methodology using data from a birth cohort study on childhood health and nutrition in the Philippines.

Author(s):  
Jelber Sayyad Shirabad ◽  
Timothy C. Lethbridge ◽  
Stan Matwin

This chapter presents the notion of relevance relations, an abstraction to represent relationships between software entities. Relevance relations map tuples of software entities to values that reflect how related they are to each other. Although there are no clear definitions for these relationships, software engineers can typically identify instances of them. We show how a classifier can model a relevance relation. We also present the process of creating such models by using data mining and machine learning techniques. In a case study, we applied this process to a large legacy system; our system learned models of a relevance relation that predict whether a change in one file may require a change in another file. Our empirical evaluation shows that the predictive quality of such models makes them a viable choice for field deployment. We also show how assigning different misclassification costs tunes such models to meet the user's needs in terms of precision and recall.
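The effect of misclassification costs on precision and recall can be illustrated with a minimal sketch. It assumes a classifier that outputs a probability for the "related" class and uses the standard cost-sensitive decision rule (predict positive when the probability is at least cost_fp / (cost_fp + cost_fn), with zero cost for correct decisions). The function names, costs, and scores are hypothetical; the chapter's own tuning procedure is not described in the abstract and may differ.

```python
def cost_threshold(cost_fp, cost_fn):
    """Probability threshold that minimizes expected misclassification cost.

    Assumes correct decisions cost nothing. Predicting positive costs
    (1 - p) * cost_fp on average, predicting negative costs p * cost_fn,
    so positive is cheaper when p >= cost_fp / (cost_fp + cost_fn).
    """
    if cost_fp <= 0 or cost_fn <= 0:
        raise ValueError("costs must be positive")
    return cost_fp / (cost_fp + cost_fn)


def classify(probabilities, cost_fp, cost_fn):
    """Label each probability as positive (True) or negative (False)."""
    threshold = cost_threshold(cost_fp, cost_fn)
    return [p >= threshold for p in probabilities]


# Hypothetical predicted probabilities that two files are related.
scores = [0.15, 0.30, 0.55, 0.80]

# Equal costs: threshold 0.5.
print(classify(scores, cost_fp=1.0, cost_fn=1.0))

# Missing a related pair is four times as costly: threshold drops to 0.2,
# so more pairs are flagged (higher recall, typically lower precision).
print(classify(scores, cost_fp=1.0, cost_fn=4.0))
```

Raising the cost of false negatives lowers the threshold and favors recall; raising the cost of false positives does the opposite and favors precision.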


2021 ◽  
Vol 13 (13) ◽  
pp. 2433
Author(s):  
Shu Yang ◽  
Fengchao Peng ◽  
Sibylle von Löwis ◽  
Guðrún Nína Petersen ◽  
David Christian Finger

Doppler lidars are used worldwide for wind monitoring and recently also for the detection of aerosols. Algorithms that automatically classify the signals retrieved from lidar measurements are very useful to users. In this study, we explore the value of machine learning to classify backscattered signals from Doppler lidars using data from Iceland. We combined supervised and unsupervised machine learning algorithms with conventional lidar data processing methods and trained two models to filter noise signals and classify Doppler lidar observations into different classes, including clouds, aerosols and rain. The results reveal a high accuracy for noise identification and for aerosol and cloud classification. However, precipitation is underdetected. The method was tested on data sets from two instruments during different weather conditions, including three dust storms during the summer of 2019. Our results reveal that this method can provide an efficient, accurate and real-time classification of lidar measurements. Accordingly, we conclude that machine learning can open new opportunities for lidar data end-users, such as aviation safety operators, to monitor dust in the vicinity of airports.


Games ◽  
2021 ◽  
Vol 12 (3) ◽  
pp. 54
Author(s):  
James T. Bang ◽  
Atin Basuchoudhary ◽  
Aniruddha Mitra

There are many competing game-theoretic analyses of terrorism. Most of these models suggest nonlinear relationships between terror attacks and some variable of interest. However, to date, there have been very few attempts to empirically sift between competing models of terrorism or identify nonlinear patterns. We suggest that machine learning can be an effective way of undertaking both. This feature can help build more salient game-theoretic models to help us understand and prevent terrorism.


2020 ◽  
Vol 15 (S359) ◽  
pp. 40-41
Author(s):  
L. M. Izuti Nakazono ◽  
C. Mendes de Oliveira ◽  
N. S. T. Hirata ◽  
S. Jeram ◽  
A. Gonzalez ◽  
...  

Abstract: We present a machine learning methodology to separate quasars from galaxies and stars using data from S-PLUS in the Stripe-82 region. In terms of quasar classification, we achieved 95.49% for precision and 95.26% for recall using a Random Forest algorithm. For photometric redshift estimation, we obtained a precision of 6% using k-Nearest Neighbour.
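For readers unfamiliar with the two classification metrics quoted above, the following minimal sketch shows how precision and recall are computed from confusion counts. The function name and the counts are hypothetical and are not taken from the S-PLUS study.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Return (precision, recall) for one class from its confusion counts.

    precision = TP / (TP + FP): share of predicted quasars that are quasars.
    recall    = TP / (TP + FN): share of actual quasars that were found.
    """
    predicted_pos = true_pos + false_pos
    actual_pos = true_pos + false_neg
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    return precision, recall


# Hypothetical counts, for illustration only.
precision, recall = precision_recall(true_pos=90, false_pos=10, false_neg=30)
print(f"precision={precision:.2f} recall={recall:.2f}")
```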


2021 ◽  
Vol 21 (4) ◽  
pp. 1-10
Author(s):  
V. Gomathy ◽  
K. Janarthanan ◽  
Fadi Al-Turjman ◽  
R. Sitharthan ◽  
M. Rajesh ◽  
...  

Coronavirus Disease 2019 (COVID-19) is a highly infectious viral disease that affected millions of people worldwide in 2020. Several studies have shown that COVID-19 results in a severe acute respiratory syndrome and may lead to death. Past research has attributed a large number of respiratory diseases to long-term exposure to air pollution. This article investigates the spread of COVID-19 as a result of air pollution by applying linear regression, a machine learning method, in an edge computing setting. The analysis in this investigation has been based on the death rates caused by COVID-19 as well as the region of death rates based on hazardous air pollution using data retrieved from the Copernicus Sentinel-5P satellite. The results obtained in the investigation prove that the mortality rate due to the spread of COVID-19 is 77% higher in areas with polluted air. This investigation also proves that COVID-19 severely affected 68% of the individuals who had been exposed to polluted air.


Life ◽  
2021 ◽  
Vol 11 (7) ◽  
pp. 638
Author(s):  
Linjing Liu ◽  
Xingjian Chen ◽  
Olutomilayo Olayemi Petinrin ◽  
Weitong Zhang ◽  
Saifur Rahaman ◽  
...  

With advances in liquid biopsy technology, there is increasing evidence that body fluids such as blood, urine, and saliva could harbor potential biomarkers associated with tumor origin. Traditional correlation analysis methods are no longer sufficient to capture the high-resolution complex relationships between biomarkers and cancer subtype heterogeneity. To address this challenge, researchers have proposed applying machine learning techniques to liquid biopsy data to explore tumor origin. In this survey, we review the machine learning protocols and provide corresponding code demos for the approaches mentioned. We discuss algorithmic principles and frameworks extensively developed to reveal cancer mechanisms and consider the future prospects in biomarker exploration and cancer diagnostics.


2021 ◽  
Vol 13 (11) ◽  
pp. 2069
Author(s):  
M. V. Alba-Fernández ◽  
F. J. Ariza-López ◽  
M. D. Jiménez-Gamero

The usefulness of the parameters (e.g., slope, aspect) derived from a Digital Elevation Model (DEM) is limited by its accuracy. In this paper, a thematic-like quality control (class-based) of aspect and slope classes is proposed. A product can be compared against a reference dataset, which provides the quality requirements to be achieved, by comparing the product proportions of each class with those of the reference set. If a distance between the product proportions and the reference proportions is smaller than a small enough positive tolerance, which is fixed by the user, it will be considered that the degree of similarity between the product and the reference set is acceptable, and hence that its quality meets the requirements. A formal statistical procedure, based on a hypothesis test, is developed and its performance is analyzed using simulated data. It uses the Hellinger distance between the proportions. The application to the slope and aspect is illustrated using data derived from a 2×2 m DEM (reference) and 5×5 m DEM in Allo (province of Navarra, Spain).
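The distance-versus-tolerance comparison described above can be sketched as follows. This is only an illustration under stated assumptions: it uses the common normalization of the Hellinger distance (values in [0, 1]), the function names, class proportions, and tolerance are hypothetical, and the plain comparison shown here is not the formal hypothesis test developed in the paper, which also accounts for sampling variability.

```python
import math


def hellinger_distance(p, q):
    """Hellinger distance between two discrete distributions.

    H(p, q) = sqrt( 0.5 * sum_i (sqrt(p_i) - sqrt(q_i))^2 ), in [0, 1].
    """
    if len(p) != len(q):
        raise ValueError("both proportion vectors need the same classes")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared / 2.0)


def within_tolerance(product, reference, tolerance):
    """Naive check: is the product close enough to the reference?"""
    return hellinger_distance(product, reference) < tolerance


# Hypothetical proportions of four slope classes.
reference = [0.40, 0.30, 0.20, 0.10]
product = [0.38, 0.31, 0.21, 0.10]

print(hellinger_distance(product, reference))
print(within_tolerance(product, reference, tolerance=0.05))
```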


Author(s):  
Ihor Ponomarenko ◽  
Oleksandra Lubkovska

The subject of the research is the approach to the possibility of using data science methods in the field of health care for integrated data processing and analysis in order to optimize economic and specialized processes. The purpose of this article is to address issues related to the specifics of the use of Data Science methods in the field of health care on the basis of comprehensive information obtained from various sources. Methodology. The research methodology comprises system-structural and comparative analyses (to study the application of BI-systems in the process of working with large data sets); monograph (the study of various software solutions in the market of business intelligence); economic analysis (when assessing the possibility of using business intelligence systems to strengthen the competitive position of companies). Scientific novelty: the main sources of data on key processes in the medical field. Examples of innovative methods of collecting information in the field of health care, which are becoming widespread in the context of digitalization, are presented. The main sources of data in the field of health care used in Data Science are revealed. The specifics of the application of machine learning methods in the field of health care in the conditions of increasing competition between market participants and increasing demand for relevant products from the population are presented. Conclusions. The intensification of the integration of Data Science in the medical field is due to the increase of digitized data (statistics, textual information, visualizations, etc.). Through the use of machine learning methods, doctors and other health professionals have new opportunities to improve the efficiency of the health care system as a whole. Key words: Data science, efficiency, information, machine learning, medicine, Python, healthcare.

