Meta-analysis cum machine learning approaches address the structure and biogeochemical potential of marine copepod associated bacteriobiomes

2021, Vol 11 (1)
Author(s): Balamurugan Sadaiappan, Chinnamani PrasannaKumar, V. Uthara Nambiar, Mahendran Subramanian, Manguesh U. Gauns

Abstract. Copepods are the dominant members of the zooplankton community and the most abundant form of life. It is imperative to obtain insights into the copepod-associated bacteriobiomes (CAB) in order to identify the specific bacterial taxa associated with a copepod and to understand how they vary between different copepods. Analysing the potential genes within the CAB may reveal their intrinsic role in biogeochemical cycles. To this end, machine-learning models and PICRUSt2 analysis were deployed to analyse 16S rDNA gene sequences (approximately 16 million reads) of CAB belonging to five copepod genera, viz., Acartia spp., Calanus spp., Centropages sp., Pleuromamma spp., and Temora spp. Overall, we used gradient boosting classifiers to predict 50 sub-OTUs (s-OTUs) as important across the five copepod genera. Among these, 15 s-OTUs were predicted to be important in Calanus spp. and 20 s-OTUs in Pleuromamma spp. Four bacterial s-OTUs (Acinetobacter johnsonii, Phaeobacter, Vibrio shilonii and Piscirickettsiaceae) were identified as important in Calanus spp., and eight s-OTUs (Marinobacter, Alteromonas, Desulfovibrio, Limnobacter, Sphingomonas, Methyloversatilis, Enhydrobacter and Coriobacteriaceae) were predicted as important in Pleuromamma spp., in both cases for the first time. Our meta-analysis revealed that the CAB of Pleuromamma spp. had a high proportion of potential genes responsible for methanogenesis and nitrogen fixation, whereas the CAB of Temora spp. had a high proportion of potential genes involved in assimilatory sulphate reduction and cyanocobalamin synthesis. The CAB of both Pleuromamma spp. and Temora spp. carry potential genes responsible for iron transport.
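The abstract says gradient boosting classifiers flagged s-OTUs as "important" but does not describe the model or how importance was scored. As a hedged illustration only, the Python sketch below implements first-order gradient boosting for a binary label with depth-one decision stumps under log-loss, and accumulates each feature's split gain as a normalized importance score. The abundance table, labels, learning rate and number of rounds are hypothetical, and this is not the authors' pipeline.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(X, residuals):
    """Return the (feature, threshold, left_value, right_value, gain) split that
    most reduces the squared error of the residuals, or None if no split exists."""
    mean_all = sum(residuals) / len(residuals)
    base_sse = sum((r - mean_all) ** 2 for r in residuals)
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            gain = base_sse - sse
            if best is None or gain > best[4]:
                best = (j, t, lm, rm, gain)
    return best

def gradient_boost(X, y, n_rounds=20, lr=0.1):
    """Binary gradient boosting with stumps; returns (model, normalized importances)."""
    p0 = min(max(sum(y) / len(y), 1e-6), 1 - 1e-6)
    f0 = math.log(p0 / (1 - p0))  # start from the log-odds of the base rate
    F = [f0] * len(y)
    stumps, gains = [], [0.0] * len(X[0])
    for _ in range(n_rounds):
        # The negative gradient of the log-loss w.r.t. the raw score is y - p.
        residuals = [yi - sigmoid(fi) for yi, fi in zip(y, F)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        j, t, lv, rv, gain = stump
        stumps.append((j, t, lv, rv))
        gains[j] += gain  # gain-based importance: total error reduction per feature
        F = [fi + lr * (lv if row[j] <= t else rv) for fi, row in zip(F, X)]
    total = sum(gains) or 1.0
    return (f0, lr, stumps), [g / total for g in gains]

def predict_proba(model, x):
    f0, lr, stumps = model
    return sigmoid(f0 + sum(lr * (lv if x[j] <= t else rv) for j, t, lv, rv in stumps))

# Hypothetical relative abundances of three s-OTUs in six samples;
# label 1 = host belongs to one copepod genus, 0 = any other genus.
X = [[0.30, 0.10, 0.05], [0.25, 0.12, 0.07], [0.28, 0.09, 0.06],
     [0.05, 0.11, 0.06], [0.02, 0.10, 0.05], [0.04, 0.13, 0.07]]
y = [1, 1, 1, 0, 0, 0]
model, importance = gradient_boost(X, y)
ranking = sorted(range(len(importance)), key=lambda j: importance[j], reverse=True)
print(ranking, [round(v, 3) for v in importance])
```

In this toy table only the first s-OTU separates the two groups, so it receives all of the importance. Production libraries compute importance in several different ways (split counts, gain, permutation), so rankings depend on the definition chosen.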

2021
Author(s): Balamurugan Sadaiappan, Prasannakumar Chinnamani, Uthara V Nambiar, Mahendran Subramanian, Manguesh U Gauns

2019, Vol 11 (10), pp. 1195
Author(s): Minsang Kim, Myung-Sook Park, Jungho Im, Seonyoung Park, Myong-In Lee

This study compared the skill of models based on three machine learning (ML) algorithms, namely decision trees (DT), random forest (RF), and support vector machines (SVM), with that of a model based on Linear Discriminant Analysis (LDA) in detecting tropical cyclone (TC) formation. Eight predictors were derived from WindSat satellite measurements of ocean surface wind and precipitation over the western North Pacific for 2005–2009. All of the ML approaches performed better, with significantly higher hit rates (94–96%) than LDA (~77%), although their false alarm rates (21–28%) were higher than that of LDA (~13%). In addition, the ML models could detect TC formation as early as 26–30 h before it was first diagnosed as a tropical depression in the JTWC best track, which was 5 to 9 h earlier than LDA. Skill differences among the ML models were smaller than the differences between the ML models and LDA. Large year-to-year variation in forecast lead time was common to all models because of the limited sampling of orbiting satellites. This study highlights that ML approaches provide improved skill for detecting TC formation compared with conventional linear approaches.
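Hit rate and false alarm rate come from a contingency table of detections against best-track events. The abstract does not define them, so this Python sketch assumes the common forecast-verification conventions: hit rate (probability of detection) = hits / (hits + misses), and false alarm ratio = false alarms / (hits + false alarms). The counts are hypothetical.

```python
def hit_rate(hits: int, misses: int) -> float:
    """Fraction of observed TC formations that the model detected."""
    return hits / (hits + misses)

def false_alarm_ratio(hits: int, false_alarms: int) -> float:
    """Fraction of model detections that did not match an observed formation."""
    return false_alarms / (hits + false_alarms)

# Hypothetical verification counts for one model.
hits, misses, false_alarms = 48, 2, 15
print(f"hit rate = {hit_rate(hits, misses):.2f}")
print(f"false alarm ratio = {false_alarm_ratio(hits, false_alarms):.2f}")
```

Some studies instead report the probability of false detection, false alarms / (false alarms + correct negatives), so values are only comparable when the same definition is used.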


2018, Vol 7 (11), pp. 428
Author(s): Hyung-Chul Lee, Soo Yoon, Seong-Mi Yang, Won Kim, Ho-Geol Ryu, ...

Acute kidney injury (AKI) after liver transplantation has been reported to be associated with increased mortality. Recently, machine learning approaches were reported to have better predictive ability than classic statistical analysis. We compared the performance of machine learning approaches with that of logistic regression analysis for predicting AKI after liver transplantation. We reviewed 1211 patients and obtained preoperative and intraoperative anesthesia- and surgery-related variables. The primary outcome was postoperative AKI as defined by the Acute Kidney Injury Network criteria. The following machine learning techniques were used: decision tree, random forest, gradient boosting machine, support vector machine, naïve Bayes, multilayer perceptron, and deep belief networks. These techniques were compared with logistic regression analysis in terms of the area under the receiver operating characteristic curve (AUROC). AKI developed in 365 patients (30.1%). The gradient boosting machine had the best AUROC of all analyses, both for predicting AKI of all stages (0.90, 95% confidence interval [CI] 0.86–0.93) and for stage 2 or 3 AKI. The AUROC of logistic regression analysis was 0.61 (95% CI 0.56–0.66). Decision tree and random forest techniques showed moderate performance (AUROC 0.86 and 0.85, respectively). The AUROCs of the support vector machine, naïve Bayes, neural network, and deep belief network were smaller than those of the other models. In our comparison of seven machine learning approaches with logistic regression analysis, the gradient boosting machine showed the best performance, with the highest AUROC. An internet-based risk estimator was developed based on our gradient boosting model. However, prospective studies are required to validate our results.
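The AUROC used to rank these models has a simple rank interpretation: it equals the probability that a randomly chosen patient who developed AKI receives a higher predicted risk than a randomly chosen patient who did not (the normalized Mann–Whitney U statistic). A minimal pure-Python sketch, quadratic in sample size and meant only for illustration; the risks and outcomes are hypothetical.

```python
def auroc(scores, labels):
    """AUROC as the probability that a random positive case scores higher than a
    random negative case, with tied scores counted as one half."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical predicted AKI risks (1 = AKI developed, 0 = no AKI).
risk = [0.9, 0.4, 0.5, 0.1]
aki = [1, 1, 0, 0]
print(auroc(risk, aki))  # 0.75
```

Here three of the four positive/negative pairs are ranked correctly, giving 0.75; an AUROC of 0.5 corresponds to random ranking.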


2020, Vol 7 (1), pp. 190824
Author(s): Jasmeet Kaler, Jurgen Mitsch, Jorge A. Vázquez-Diosdado, Nicola Bollard, Tania Dottorini, ...

Lameness in sheep is the biggest cause of concern regarding poor health and welfare in sheep-producing countries. Best practice for lameness relies on rapid treatment, yet there are no objective measures of lameness detection. Accelerometers and gyroscopes have been widely used in human activity studies, and their use is becoming increasingly common in livestock. In this study, we used 23 datasets (10 non-lame and 13 lame sheep) from an accelerometer- and gyroscope-based ear sensor with a sampling frequency of 16 Hz to develop and compare algorithms that can differentiate lameness within three different activities (walking, standing and lying). We show for the first time that features extracted from accelerometer and gyroscope signals can differentiate between lame and non-lame sheep while standing, walking and lying. The random forest algorithm performed best for classifying lameness, with an accuracy of 84.91% within lying, 81.15% within standing and 76.83% within walking, and overall it correctly classified over 80% of sheep within activities. Both accelerometer- and gyroscope-based features ranked among the top 10 features for classification. Our results suggest that novel behavioural differences between lame and non-lame sheep across all three activities could be used to develop an automated system for lameness detection.
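Classifiers such as the random forest above operate on features computed over windows of the raw sensor stream. The abstract does not list the study's features, so the Python sketch below uses hypothetical choices: non-overlapping 10 s windows at the stated 16 Hz, and the mean and population standard deviation of the accelerometer and gyroscope vector magnitudes. The stationary test signal is also hypothetical.

```python
import math
from statistics import mean, pstdev

SAMPLING_HZ = 16  # sampling frequency of the ear sensor stated in the abstract

def magnitude(samples):
    """Euclidean norm of each (x, y, z) sample."""
    return [math.sqrt(x * x + y * y + z * z) for x, y, z in samples]

def window_features(acc, gyro, window_s=10):
    """Split synchronized accelerometer and gyroscope streams into
    non-overlapping windows and compute simple per-window statistics.
    The window length and feature set are illustrative assumptions."""
    n = SAMPLING_HZ * window_s
    length = min(len(acc), len(gyro))
    features = []
    for start in range(0, length - n + 1, n):
        acc_mag = magnitude(acc[start:start + n])
        gyro_mag = magnitude(gyro[start:start + n])
        features.append({
            "acc_mean": mean(acc_mag),
            "acc_std": pstdev(acc_mag),
            "gyro_mean": mean(gyro_mag),
            "gyro_std": pstdev(gyro_mag),
        })
    return features

# Hypothetical 20 s of a stationary sensor: gravity on z, no rotation.
acc = [(0.0, 0.0, 1.0)] * (SAMPLING_HZ * 20)
gyro = [(0.0, 0.0, 0.0)] * (SAMPLING_HZ * 20)
feats = window_features(acc, gyro)
print(len(feats), feats[0])
```

Each dictionary would become one row of the feature matrix, labelled lame or non-lame and grouped by activity before training a classifier.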


2020, Vol 5 (8), pp. 62
Author(s): Clint Morris, Jidong J. Yang

Generating meaningful inferences from crash data is vital to improving highway safety. Classic statistical methods are fundamental to crash data analysis and are often valued for their interpretability. However, given the complexity of crash mechanisms and the associated heterogeneity, classic statistical methods, which lack versatility, might not be sufficient for granular crash analysis because of the high-dimensional features involved in crash-related data. In contrast, machine learning approaches, which are more flexible in structure and capable of harnessing the richer data sources available today, emerge as a suitable alternative. With the aid of new methods for model interpretation, complex machine learning models, previously considered enigmatic, can be properly interpreted. In this study, two modern machine learning techniques, Linear Discriminant Analysis and eXtreme Gradient Boosting, were explored to classify three major types of multi-vehicle crashes (i.e., rear-end, same-direction sideswipe, and angle) that occurred on Interstate 285 in Georgia. The study demonstrated the utility and versatility of modern machine learning methods in the context of crash analysis, particularly in understanding the potential features underlying different crash patterns on freeways.
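To make the Linear Discriminant Analysis step concrete, here is a textbook two-feature, multi-class LDA in pure Python: per-class means and priors, a pooled within-class covariance, and assignment to the class with the largest linear discriminant score. The abstract does not describe the study's features or implementation, so the two numeric features and the labelled records are hypothetical; XGBoost is not shown.

```python
import math

def fit_lda(X, y):
    """Fit a two-feature LDA: per-class means, class priors and the inverse of
    the pooled within-class covariance matrix."""
    classes = sorted(set(y))
    n = len(X)
    means, priors = {}, {}
    for c in classes:
        rows = [x for x, lab in zip(X, y) if lab == c]
        means[c] = [sum(col) / len(rows) for col in zip(*rows)]
        priors[c] = len(rows) / n
    s = [[0.0, 0.0], [0.0, 0.0]]
    for x, lab in zip(X, y):
        d = [x[i] - means[lab][i] for i in range(2)]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    dof = n - len(classes)  # unbiased pooled estimate
    a, b, c2, e = s[0][0] / dof, s[0][1] / dof, s[1][0] / dof, s[1][1] / dof
    det = a * e - b * c2
    inv = [[e / det, -b / det], [-c2 / det, a / det]]  # closed-form 2x2 inverse
    return classes, means, priors, inv

def predict_lda(model, x):
    """Assign x to the class with the largest linear discriminant score
    x^T S^-1 mu_c - 0.5 mu_c^T S^-1 mu_c + log(prior_c)."""
    classes, means, priors, inv = model

    def score(c):
        m = means[c]
        w = [inv[0][0] * m[0] + inv[0][1] * m[1], inv[1][0] * m[0] + inv[1][1] * m[1]]
        return x[0] * w[0] + x[1] * w[1] - 0.5 * (m[0] * w[0] + m[1] * w[1]) + math.log(priors[c])

    return max(classes, key=score)

# Hypothetical two-feature crash records labelled by crash type.
X = [(1.0, 0.0), (1.2, 0.2), (0.8, 0.1), (1.1, -0.1),
     (3.0, 1.0), (3.2, 1.2), (2.8, 0.9), (3.1, 1.1),
     (1.0, 4.0), (1.2, 4.2), (0.9, 3.8), (1.1, 4.1)]
y = ["rear-end"] * 4 + ["sideswipe"] * 4 + ["angle"] * 4
model = fit_lda(X, y)
print(predict_lda(model, (3.0, 1.05)))
```

Because all classes share one covariance matrix, the decision boundaries are linear. This makes LDA easy to interpret but less flexible than a boosted tree ensemble.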


2019
Author(s): Allan C. Just, Yang Liu, Meytar Sorek-Hamer, Johnathan Rush, Michael Dorman, ...

Abstract. The atmospheric products of the Multi-Angle Implementation of Atmospheric Correction (MAIAC) algorithm include column water vapor (CWV) at a 1 km resolution, derived from daily overpasses of NASA’s Moderate Resolution Imaging Spectroradiometer (MODIS) instruments aboard the Aqua and Terra satellites. We have recently shown that machine learning using extreme gradient boosting (XGBoost) can improve the estimation of MAIAC aerosol optical depth (AOD). Although MAIAC CWV is generally well validated (Pearson’s R > 0.97 versus CWV from AERONET sun photometers), it has not yet been assessed whether machine-learning approaches can further improve CWV. Using a novel spatiotemporal cross-validation approach to avoid overfitting, our XGBoost model, with nine features derived from land use terms, date, and ancillary variables from the MAIAC retrieval, quantifies and can correct a substantial portion of measurement error relative to collocated measurements at AERONET sites (26.9 % and 16.5 % decrease in root mean square error (RMSE) for Terra and Aqua datasets, respectively) in the Northeastern USA, 2000–2015. We use machine-learning interpretation tools to illustrate complex patterns of measurement error and describe a positive bias in MAIAC Terra CWV worsening in recent summertime conditions. We validate our predictive model on MAIAC CWV estimates at independent stations from the SuomiNet GPS network, where our corrections decrease the RMSE by 19.7 % and 9.5 % for Terra and Aqua MAIAC CWV. Empirically correcting for measurement error with machine-learning algorithms is a post-processing opportunity to improve satellite-derived CWV data for Earth science and remote sensing applications.
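The abstract does not spell out its spatiotemporal cross-validation scheme. As an assumed simplification, this Python sketch blocks only in space: every sample from a given station lands in the same test fold, so the model is never scored on a station it was trained on. A spatiotemporal scheme would additionally block on time. The sketch also shows the RMSE and percent-decrease arithmetic behind figures such as the 26.9 % reduction. Station IDs and values are hypothetical.

```python
import math

def group_kfold(groups, k):
    """Yield (train_idx, test_idx) pairs in which all samples sharing a group
    label (here, a hypothetical station ID) fall in the same test fold."""
    unique = sorted(set(groups))
    fold_of = {g: i % k for i, g in enumerate(unique)}
    for f in range(k):
        test = [i for i, g in enumerate(groups) if fold_of[g] == f]
        train = [i for i, g in enumerate(groups) if fold_of[g] != f]
        yield train, test

def rmse(pred, obs):
    """Root mean square error between predictions and observations."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

def pct_decrease(before, after):
    """Relative RMSE reduction in percent."""
    return 100.0 * (before - after) / before

stations = ["S1", "S1", "S2", "S2", "S3", "S3"]  # hypothetical station IDs
for train, test in group_kfold(stations, k=3):
    print(sorted({stations[i] for i in test}), "held out")
print(rmse([1.0, 2.0], [1.0, 4.0]), pct_decrease(10.0, 7.31))
```

Random (ungrouped) folds would put nearby or repeated measurements from the same station in both training and test sets, which tends to make error estimates optimistic.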


2020, Vol 13 (9), pp. 4669–4681
Author(s): Allan C. Just, Yang Liu, Meytar Sorek-Hamer, Johnathan Rush, Michael Dorman, ...

Abstract. The atmospheric products of the Multi-Angle Implementation of Atmospheric Correction (MAIAC) algorithm include column water vapor (CWV) at a 1 km resolution, derived from daily overpasses of NASA's Moderate Resolution Imaging Spectroradiometer (MODIS) instruments aboard the Aqua and Terra satellites. We have recently shown that machine learning using extreme gradient boosting (XGBoost) can improve the estimation of MAIAC aerosol optical depth (AOD). Although MAIAC CWV is generally well validated (Pearson's R > 0.97 versus CWV from AERONET sun photometers), it has not yet been assessed whether machine-learning approaches can further improve CWV. Using a novel spatiotemporal cross-validation approach to avoid overfitting, our XGBoost model, with nine features derived from land use terms, date, and ancillary variables from the MAIAC retrieval, quantifies and can correct a substantial portion of measurement error relative to collocated measurements at AERONET sites (26.9 % and 16.5 % decrease in root mean square error (RMSE) for Terra and Aqua datasets, respectively) in the Northeastern USA, 2000–2015. We use machine-learning interpretation tools to illustrate complex patterns of measurement error and describe a positive bias in MAIAC Terra CWV worsening in recent summertime conditions. We validate our predictive model on MAIAC CWV estimates at independent stations from the SuomiNet GPS network where our corrections decrease the RMSE by 19.7 % and 9.5 % for Terra and Aqua MAIAC CWV. Empirically correcting for measurement error with machine-learning algorithms is a postprocessing opportunity to improve satellite-derived CWV data for Earth science and remote sensing applications.


2020
Author(s): Hang Qiu, Lin Luo, Ziqi Su, Li Zhou, Liya Wang, ...

Abstract. Background: Accumulating evidence has linked environmental exposure, such as ambient air pollution and meteorological factors, to the development and severity of cardiovascular diseases (CVDs), resulting in increased healthcare demand. Effective prediction of demand for healthcare services, particularly demand associated with peak events of CVDs, can be useful in optimizing the allocation of medical resources. However, few studies have adopted machine learning approaches with excellent predictive abilities to forecast the healthcare demand for CVDs. This study aims to develop and compare several machine learning models for predicting the peak demand days of CVD admissions using hospital admissions data, air quality data and meteorological data from Chengdu, China, from 2015 to 2017.
Methods: Six machine learning algorithms, namely logistic regression (LR), support vector machine (SVM), artificial neural network (ANN), random forest (RF), extreme gradient boosting (XGBoost), and light gradient boosting machine (LightGBM), were applied to build the predictive models with a unique feature set. The area under the receiver operating characteristic curve (AUC), logarithmic loss function, accuracy, sensitivity, specificity, precision, and F1 score were used to evaluate the predictive performance of the six models.
Results: The LightGBM model exhibited the highest AUC (0.940, 95% CI: 0.900-0.980), which was significantly higher than that of LR (0.842, 95% CI: 0.783-0.901), SVM (0.834, 95% CI: 0.774-0.894) and ANN (0.890, 95% CI: 0.836-0.944), but did not differ significantly from that of RF (0.926, 95% CI: 0.879-0.974) or XGBoost (0.930, 95% CI: 0.878-0.982). In addition, LightGBM had the best logarithmic loss (0.218), accuracy (91.3%), specificity (94.1%), precision (0.695), and F1 score (0.725). Feature importance analysis indicated that meteorological conditions and air pollutants contributed 32% and 43% of the prediction, respectively.
Conclusion: This study suggests that ensemble learning models, especially LightGBM, can effectively predict peak events of CVD admissions and could therefore be a very useful decision-making tool for medical resource management.
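Apart from AUC, the evaluation metrics listed in the Methods can all be computed from a thresholded confusion matrix and the predicted probabilities. The Python sketch below assumes a 0.5 decision threshold (the abstract does not state one) and clips probabilities before taking logarithms in the log loss. Labels and probabilities are hypothetical.

```python
import math

def binary_metrics(y_true, y_prob, threshold=0.5, eps=1e-15):
    """Threshold predicted probabilities and compute common binary metrics."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1

    def ratio(num, den):
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)
    precision = ratio(tp, tp + fp)
    # Log loss: mean negative log-likelihood, probabilities clipped away from 0 and 1.
    clipped = [min(max(p, eps), 1 - eps) for p in y_prob]
    log_loss = -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                    for y, p in zip(y_true, clipped)) / len(y_true)
    return {
        "accuracy": ratio(tp + tn, len(y_true)),
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "precision": precision,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
        "log_loss": log_loss,
    }

# Hypothetical labels (1 = peak-demand day) and model probabilities.
metrics = binary_metrics([1, 1, 0, 0], [0.9, 0.4, 0.2, 0.6])
print(metrics)
```

Unlike the other metrics, log loss uses the probabilities directly, so it penalizes confident wrong predictions even when the thresholded classification is unchanged.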


Author(s): Gebreab K. Zewdie, David J. Lary, Estelle Levetin, Gemechu F. Garuma

Allergies to airborne pollen are a significant issue affecting millions of Americans. Consequently, accurately predicting the daily concentration of airborne pollen is of significant public benefit in providing timely alerts. This study presents a method for the robust estimation of the concentration of airborne Ambrosia pollen using a suite of machine learning approaches, including deep learning and ensemble learners. Each of these approaches uses data from the European Centre for Medium-Range Weather Forecasts (ECMWF) atmospheric weather and land surface reanalysis. The suite of empirical models was developed with deep neural networks, extreme gradient boosting, random forests and Bayesian ridge regression. The training data, comprising twenty-four years of daily pollen concentration measurements together with ECMWF weather and land surface reanalysis data from 1987 to 2011, were used to develop the machine learning predictive models. The last six years of the dataset, from 2012 to 2017, were used to independently test the performance of the models. The correlation coefficients between the estimated and actual pollen abundance on the independent validation dataset were 0.82, 0.81, 0.81 and 0.75 for the deep neural networks, random forest, extreme gradient boosting and Bayesian ridge regression, respectively, showing that machine learning can be used to effectively forecast the concentrations of airborne pollen.
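The evaluation above uses a chronological hold-out (train through 2011, test on 2012–2017) and reports correlation coefficients. This Python sketch shows both steps under stated assumptions: the 2011 cutoff comes from the abstract, the records and predictions are hypothetical, and the coefficient is assumed to be Pearson's r.

```python
import math

def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def chronological_split(records, last_train_year=2011):
    """Train on years up to last_train_year and test on later years, so the
    test period lies strictly after the training period."""
    train = [r for r in records if r["year"] <= last_train_year]
    test = [r for r in records if r["year"] > last_train_year]
    return train, test

# Hypothetical daily records; "pollen" is the observed concentration.
records = [{"year": 2010, "pollen": 12.0}, {"year": 2011, "pollen": 30.0},
           {"year": 2012, "pollen": 25.0}, {"year": 2013, "pollen": 8.0}]
train, test = chronological_split(records)
observed = [r["pollen"] for r in test]
predicted = [20.0, 10.0]  # hypothetical model output for the test days
print(len(train), len(test), pearson_r(predicted, observed))
```

Splitting by time rather than at random keeps information from the test years out of training, which matters for forecasts of a seasonal, autocorrelated quantity such as pollen concentration.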

