An Automated Snow Mapper Powered by Machine Learning

2021 ◽  
Vol 13 (23) ◽  
pp. 4826
Author(s):  
Haojie Wang ◽  
Limin Zhang ◽  
Lin Wang ◽  
Jian He ◽  
Hongyu Luo

Snow preserves fresh water and impacts regional climate and the environment. Enabled by modern satellite Earth observations, fast and accurate automated snow mapping is now possible. In this study, we developed the Automated Snow Mapper Powered by Machine Learning (AutoSMILE), the first machine learning-based open-source system for snow mapping. It is built in a Python environment and based on object-based analysis. AutoSMILE was first applied to a mountainous area of 1002 km² in Bome County, eastern Tibetan Plateau, using a multispectral image from Sentinel-2B, a digital elevation model, and machine learning algorithms including random forest and a convolutional neural network. Taking only 5% of the study area as the training zone, AutoSMILE yielded highly accurate results over the rest of the study area: producer’s accuracy, user’s accuracy, intersection over union and overall accuracy reached 99.42%, 98.78%, 98.21% and 98.76%, respectively, at the object level, corresponding to 98.84%, 98.35%, 97.23% and 98.07% at the pixel level. The model trained in Bome County was subsequently used to map snow in the Qimantag Mountain region of the northern Tibetan Plateau, where it achieved a high overall accuracy of 97.22%. AutoSMILE outperformed threshold-based methods at both sites, particularly in handling complex land cover. The performance and robustness of AutoSMILE in these case studies suggest that it is a fast and reliable tool for large-scale, high-accuracy snow mapping and monitoring.
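For readers who want to reproduce the four reported metrics, the sketch below shows how producer’s accuracy, user’s accuracy, intersection over union and overall accuracy all follow from a single confusion matrix. This is a minimal illustration assuming boolean snow/non-snow masks; the function name and random test masks are hypothetical and not part of the AutoSMILE codebase.

```python
import numpy as np

def snow_mapping_metrics(pred, ref):
    """Four standard accuracy metrics from boolean prediction/reference
    masks (illustrative helper, not from the AutoSMILE codebase)."""
    pred, ref = pred.astype(bool), ref.astype(bool)
    tp = np.sum(pred & ref)           # snow correctly mapped
    fp = np.sum(pred & ~ref)          # false alarms
    fn = np.sum(~pred & ref)          # missed snow
    tn = np.sum(~pred & ~ref)         # non-snow correctly mapped
    producers = tp / (tp + fn)        # recall w.r.t. reference snow
    users = tp / (tp + fp)            # precision of mapped snow
    iou = tp / (tp + fp + fn)         # intersection over union
    overall = (tp + tn) / pred.size   # overall accuracy
    return producers, users, iou, overall

# Example with synthetic masks that disagree on ~2% of pixels
rng = np.random.default_rng(0)
ref = rng.random((100, 100)) > 0.5
pred = ref ^ (rng.random((100, 100)) > 0.98)
print(snow_mapping_metrics(pred, ref))
```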

2021 ◽  
Vol 2 (1) ◽  
Author(s):  
Yi Chen ◽  
Baosheng Wu ◽  
Zhongyu Xiong ◽  
Jinbo Zan ◽  
Bangwen Zhang ◽  
...  

Abstract: The main rivers that originate from the Tibetan Plateau are important as a resource and for the sedimentary and biogeochemical exchange between mountains and oceans. However, the dominant mechanism for the evolution of eastern Tibetan river systems remains ambiguous. Here we conduct geomorphological analyses of river systems and assess catchment-average erosion rates in the eastern Tibetan Plateau using a digital elevation model and cosmogenic radionuclide data. We find that major dividing ranges have northeast-oriented asymmetric geometries and that erosion rates decrease in the same direction. This coincides with the northeastward indentation of India, and we suggest it indicates a primarily tectonic influence on the large-scale configuration of eastern Tibetan river systems. In contrast, low-order streams appear to be controlled by fluvial self-organization processes. We propose that this distinction between high- and low-order channel evolution highlights the importance of local optimization in optimal channel network models in tectonically active areas.
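The abstract does not spell out how catchment-average erosion rates are derived from cosmogenic radionuclide concentrations, but the standard first-order relation for steady-state erosion (spallation-only production, negligible radioactive decay) is easy to sketch. All constants and sample values below are illustrative assumptions, not values from the paper.

```python
# Steady-state cosmogenic erosion-rate relation: E = P * Lambda / (rho * N)
# (a textbook simplification; the paper's actual workflow may differ).

LAMBDA = 160.0   # spallation attenuation length, g/cm^2 (typical value)
RHO = 2.65       # rock density, g/cm^3

def erosion_rate_cm_per_yr(P, N):
    """P: surface 10Be production rate (atoms/g/yr),
    N: measured 10Be concentration (atoms/g)."""
    return (P * LAMBDA) / (RHO * N)

# Example with assumed values: P = 5 atoms/g/yr, N = 2e5 atoms/g
E = erosion_rate_cm_per_yr(5.0, 2e5)
print(f"{E * 1e4:.1f} mm/kyr")  # cm/yr -> mm/kyr; prints ~15.1
```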


2021 ◽  
Vol 11 (15) ◽  
pp. 6787
Author(s):  
Jože M. Rožanec ◽  
Blaž Kažič ◽  
Maja Škrjanc ◽  
Blaž Fortuna ◽  
Dunja Mladenić

Demand forecasting is a crucial component of demand management, directly impacting manufacturing companies’ planning, revenues, and actors throughout the supply chain. We evaluate 21 baseline, statistical, and machine learning algorithms for forecasting smooth and erratic demand in a real-world use case. The product data were obtained from a European original equipment manufacturer targeting the global automotive industry market. Our research shows that global machine learning models outperform local models. We show that forecast errors from global models can be constrained by pooling product data based on past demand magnitude. We also propose a set of metrics and criteria for a comprehensive understanding of demand forecasting models’ performance.
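A minimal sketch of the magnitude-based pooling idea, assuming a pandas frame of per-product demand with precomputed lag features; the column names, pool count and random-forest learner are illustrative choices, not the authors’ actual pipeline.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

def train_pooled_models(df, n_pools=3):
    """Train one 'global' model per pool of products with similar past
    demand magnitude, instead of one model per product (local).
    Expects columns: product_id, demand, lag_1, lag_2, lag_3."""
    # Pool products by the magnitude of their historical mean demand
    magnitude = df.groupby("product_id")["demand"].mean()
    pools = pd.qcut(magnitude, q=n_pools, labels=False)
    df = df.assign(pool=df["product_id"].map(pools))

    models = {}
    features = ["lag_1", "lag_2", "lag_3"]
    for pool_id, group in df.groupby("pool"):
        model = RandomForestRegressor(n_estimators=200, random_state=0)
        model.fit(group[features], group["demand"])
        models[pool_id] = model
    return models, pools
```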


2021 ◽  
Vol 28 (1) ◽  
pp. e100251
Author(s):  
Ian Scott ◽  
Stacey Carter ◽  
Enrico Coiera

Machine learning algorithms are being used to screen and diagnose disease, prognosticate and predict therapeutic responses. Hundreds of new algorithms are being developed, but whether they improve clinical decision making and patient outcomes remains uncertain. If clinicians are to use algorithms, they need to be reassured that key issues relating to their validity, utility, feasibility, safety and ethical use have been addressed. We propose a checklist of 10 questions that clinicians can ask of those advocating for the use of a particular algorithm, but which do not expect clinicians, as non-experts, to demonstrate mastery over what can be highly complex statistical and computational concepts. The questions are: (1) What is the purpose and context of the algorithm? (2) How good were the data used to train the algorithm? (3) Were there sufficient data to train the algorithm? (4) How well does the algorithm perform? (5) Is the algorithm transferable to new clinical settings? (6) Are the outputs of the algorithm clinically intelligible? (7) How will this algorithm fit into and complement current workflows? (8) Has use of the algorithm been shown to improve patient care and outcomes? (9) Could the algorithm cause patient harm? and (10) Does use of the algorithm raise ethical, legal or social concerns? We provide examples where an algorithm may raise concerns and apply the checklist to a recent review of diagnostic imaging applications. This checklist aims to assist clinicians in assessing algorithm readiness for routine care and in identifying situations where further refinement and evaluation are required prior to large-scale use.


2020 ◽  
Vol 8 (Suppl 3) ◽  
pp. A62-A62
Author(s):  
Dattatreya Mellacheruvu ◽  
Rachel Pyke ◽  
Charles Abbott ◽  
Nick Phillips ◽  
Sejal Desai ◽  
...  

Background: Accurately identified neoantigens can be effective therapeutic agents in both adjuvant and neoadjuvant settings. A key challenge for neoantigen discovery has been the availability of accurate prediction models for MHC peptide presentation. We have shown previously that our proprietary model based on (i) large-scale, in-house mono-allelic data, (ii) custom features that model antigen processing, and (iii) advanced machine learning algorithms has strong performance. We have extended this work by systematically integrating large quantities of high-quality, publicly available data, implementing new modelling algorithms, and rigorously testing our models. These extensions lead to substantial improvements in performance and generalizability. Our algorithm, named Systematic HLA Epitope Ranking Pan Algorithm (SHERPA™), is integrated into the ImmunoID NeXT Platform®, our immuno-genomics and transcriptomics platform specifically designed to enable the development of immunotherapies.

Methods: In-house immunopeptidomic data were generated using stably transfected HLA-null K562 cell lines that express a single HLA allele of interest, followed by immunoprecipitation using the W6/32 antibody and LC-MS/MS. Public immunopeptidomics data were downloaded from repositories such as MassIVE and processed uniformly using in-house pipelines to generate peptide lists filtered at a 1% false discovery rate. Other metrics (features) were either extracted from source data or generated internally by re-processing samples using the ImmunoID NeXT Platform.

Results: We generated large-scale, high-quality immunopeptidomics data using approximately 60 mono-allelic cell lines that unambiguously assign peptides to their presenting alleles, and used these data to create our primary models. Briefly, our primary ‘binding’ algorithm models MHC-peptide binding using the peptide and binding pockets, while our primary ‘presentation’ model uses additional features to model antigen processing and presentation. Both primary models achieve significantly higher precision across all recall values in multiple test data sets, including mono-allelic cell lines and multi-allelic tissue samples. To further improve performance, we expanded the diversity of our training set using high-quality, publicly available mono-allelic immunopeptidomics data. Furthermore, multi-allelic data were integrated by resolving peptide-to-allele mappings using our primary models. We then trained a new model using the expanded training data and a new composite machine learning architecture. The resulting secondary model further improves performance and generalizability across several tissue samples.

Conclusions: Improving technologies for neoantigen discovery is critical for many therapeutic applications, including personalized neoantigen vaccines and neoantigen-based biomarkers for immunotherapies. Our new and improved algorithm (SHERPA) has significantly higher performance than a state-of-the-art public algorithm and furthers this objective.
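The comparison of “precision across all recall values” is a standard precision-recall analysis. Below is a minimal scikit-learn sketch of that evaluation on synthetic scores; SHERPA itself is proprietary, so the labels, scores and model names here are placeholders.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, auc

# Synthetic benchmark: y_true = 1 for an observed presented peptide,
# 0 for a decoy; two score vectors stand in for competing models.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)
score_a = y_true * 0.6 + rng.random(1000) * 0.5   # stronger model
score_b = y_true * 0.3 + rng.random(1000) * 0.8   # weaker model

for name, score in [("model A", score_a), ("model B", score_b)]:
    precision, recall, _ = precision_recall_curve(y_true, score)
    print(name, "PR-AUC:", round(auc(recall, precision), 3))
```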


2015 ◽  
Vol 32 (6) ◽  
pp. 821-827 ◽  
Author(s):  
Enrique Audain ◽  
Yassel Ramos ◽  
Henning Hermjakob ◽  
Darren R. Flower ◽  
Yasset Perez-Riverol

Motivation: In any macromolecular polyprotic system (for example protein, DNA or RNA), the isoelectric point, commonly referred to as the pI, can be defined as the point of singularity in a titration curve, corresponding to the solution pH at which the net overall surface charge, and thus the electrophoretic mobility, of the ampholyte sums to zero. Several modern analytical biochemistry and proteomics methods depend on the isoelectric point as a principal feature for protein and peptide characterization. Protein separation by isoelectric point is a critical part of 2-D gel electrophoresis, a key precursor of proteomics, where discrete spots can be digested in-gel and proteins subsequently identified by analytical mass spectrometry. Peptide fractionation according to pI is also widely used in current proteomics sample preparation procedures prior to LC-MS/MS analysis. Accurate theoretical prediction of pI would therefore expedite such analyses. While pI calculation is widely used, it remains largely untested, motivating our efforts to benchmark pI prediction methods.

Results: Using data from the database PIP-DB and one publicly available dataset as our reference gold standard, we have undertaken the benchmarking of pI calculation methods. We find that methods vary in their accuracy and are highly sensitive to the choice of basis set. The machine-learning algorithms, especially the SVM-based algorithm, showed superior performance when studying peptide mixtures. In general, learning-based pI prediction methods (such as Cofactor, SVM and Branca) require a large training dataset, and their resulting performance strongly depends on the quality of that data. In contrast with iterative methods, machine-learning algorithms have the advantage of being able to add new features to improve the accuracy of prediction.

Contact: [email protected]

Availability and Implementation: The software and data are freely available at https://github.com/ypriverol/pIR.

Supplementary information: Supplementary data are available at Bioinformatics online.
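For context, the iterative methods benchmarked here typically find the pI by bisecting the pH at which the Henderson-Hasselbalch net charge crosses zero. A minimal sketch, using EMBOSS-style pKa values as the basis set (the very choice the benchmark shows results are sensitive to):

```python
# Iterative (bisection) pI calculation with EMBOSS-style pKa values.
PKA_POS = {"nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEG = {"cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}

def net_charge(seq, ph):
    """Henderson-Hasselbalch net charge of a peptide at a given pH."""
    charge = 1 / (1 + 10 ** (ph - PKA_POS["nterm"]))
    charge -= 1 / (1 + 10 ** (PKA_NEG["cterm"] - ph))
    for aa in seq:
        if aa in PKA_POS:
            charge += 1 / (1 + 10 ** (ph - PKA_POS[aa]))
        elif aa in PKA_NEG:
            charge -= 1 / (1 + 10 ** (PKA_NEG[aa] - ph))
    return charge

def isoelectric_point(seq, lo=0.0, hi=14.0, tol=1e-4):
    # Net charge decreases monotonically with pH, so bisect its zero
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(isoelectric_point("ACDKER"), 2))
```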


Author(s):  
Martin Meier ◽  
Eliana de Souza ◽  
Marcio Rocha Francelino ◽  
Elpídio Inácio Fernandes Filho ◽  
Carlos Ernesto Gonçalves Reynaud Schaefer

2021 ◽  
Author(s):  
El houssaine Bouras ◽  
Lionel Jarlan ◽  
Salah Er-Raki ◽  
Riad Balaghi ◽  
Abdelhakim Amazirh ◽  
...  

Cereals are the main crop in Morocco. Their production exhibits high inter-annual variability due to uncertain rainfall and recurrent drought periods. Considering the importance of this resource to the country's economy, it is important for decision makers to have reliable forecasts of annual cereal production in order to anticipate import needs. In this study, we assessed the joint use of satellite-based drought indices, weather data (precipitation and temperature) and climate data (pseudo-oscillation indices including NAO and the leading modes of sea surface temperature -SST- in the mid-latitudes and the tropics) to predict cereal yields at the agricultural-province level using machine learning algorithms (Support Vector Machine -SVM-, Random Forest -RF- and eXtreme Gradient Boost -XGBoost-) in addition to Multiple Linear Regression (MLR). We also evaluated the models at different lead times along the growing season, from January (about 5 months before harvest) to March (2 months before harvest). The results show that combining data from the different sources outperformed the use of a single dataset, with the highest accuracy obtained when all three data sources were considered in model development. In addition, the results show that the models can accurately predict yields in January (5 months before harvest) with R² = 0.90 and an RMSE of about 3.4 Qt ha⁻¹. Among the models compared, XGBoost performed best at predicting yields. Moreover, building a specific model for each province improves the statistical metrics by approximately 10-50%, depending on the province, relative to a single global model applied to all provinces. The results of this study show that machine learning is a promising tool for cereal yield forecasting, and the proposed methodology can be extended to different crops and regions.
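A minimal sketch of lead-time-specific model training and cross-validated scoring, assuming a feature matrix of January-available predictors (drought indices, cumulative precipitation and temperature, climate indices); the synthetic data and random-forest learner stand in for the study’s actual provinces and tuned models.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic stand-ins: 120 province-years, 8 January-available features
rng = np.random.default_rng(0)
X_jan = rng.random((120, 8))
y = 3 * X_jan[:, 0] + rng.normal(0, 0.3, 120)  # synthetic yields, Qt/ha

# One model per lead time; here, the January (5-months-ahead) model
model = RandomForestRegressor(n_estimators=300, random_state=0)
r2 = cross_val_score(model, X_jan, y, cv=5, scoring="r2")
print("cross-validated R²:", r2.mean().round(2))
```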


Author(s):  
Manjunath Thimmasandra Narayanapppa ◽  
T. P. Puneeth Kumar ◽  
Ravindra S. Hegadi

Recent technological advancements have led to the generation of huge volumes of data from distinct domains (scientific sensors, health care, user-generated data, financial companies, and internet and supply-chain systems) over the past decade. The term big data was coined to capture the meaning of this emerging trend. In addition to its huge volume, big data exhibits several unique characteristics compared with traditional data. For instance, big data is generally unstructured and requires more real-time analysis. This development calls for new system platforms for data acquisition, storage, transmission and large-scale data processing. In recent years, the analytics industry's interest has expanded toward big data analytics to uncover the potential concealed in big data, such as hidden patterns or unknown correlations. The main goal of this chapter is to explore the importance of machine learning algorithms and of the computational environment, including the hardware and software required to perform analytics on big data.


2019 ◽  
Vol 11 (9) ◽  
pp. 1096 ◽  
Author(s):  
Hiroyuki Miura

Rapid identification of affected areas and volumes in a large-scale debris flow disaster is important for early-stage recovery and debris management planning. This study introduces a methodology for fusion analysis of optical satellite images and a digital elevation model (DEM) for simplified quantification of volumes in a debris flow event. LiDAR data, pre- and post-event Sentinel-2 images and a pre-event DEM of Hiroshima, Japan, affected by the debris flow disaster in July 2018, are analyzed in this study. Erosion depth by the debris flows is empirically modeled from the pre- and post-event LiDAR-derived DEMs. Erosion areas are detected from change detection of the satellite images and DEM-based debris flow propagation analysis with predefined sources. The volumes and their spatial pattern are estimated by multiplying the detected erosion areas by the empirical erosion depth. The estimated volumes show good agreement with the LiDAR-derived volumes.
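The core volume calculation (erosion area times erosion depth on a gridded DEM) can be sketched in a few lines of numpy; the cell size, toy DEMs and function below are illustrative assumptions, not the paper’s implementation, which derives depth from an empirical model rather than direct differencing at the target site.

```python
import numpy as np

CELL = 5.0  # DEM cell size in metres (assumed)

def debris_volume(dem_pre, dem_post, erosion_mask):
    """Eroded volume (m^3) inside the detected erosion areas, from
    pre/post DEM differencing."""
    depth = dem_pre - dem_post                      # positive = material removed
    depth = np.where(erosion_mask, np.clip(depth, 0, None), 0.0)
    return depth.sum() * CELL * CELL                # depth times cell area

# Toy 3x3 example: total depth 9 m over 25 m^2 cells -> 225 m^3
pre = np.full((3, 3), 100.0)
post = pre - np.array([[0, 1, 0], [2, 3, 2], [0, 1, 0]], dtype=float)
print(debris_volume(pre, post, post < pre), "m^3")
```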

