Soil Mapping Based on the Integration of the Similarity-Based Approach and Random Forests

Digital soil mapping (DSM) is currently the primary framework for predicting the spatial variation of soil information (soil type or soil properties). Random forests and similarity-based methods have been used widely in DSM. However, the accuracy of the similarity-based approach is limited, and the performance of random forests is affected by the quality of the feature set. The objective of this study was to present a method for soil mapping by integrating the similarity-based approach and the random forests method. The Heshan area (Heilongjiang province, China) was selected as the case study for mapping soil subgroups. The results of the regular validation samples showed that the overall accuracy of the integrated method (71.79%) is higher than that of a similarity-based approach (58.97%) and random forests (66.67%). The results of the 5-fold cross-validation showed that the overall accuracy of the integrated method, similarity-based approach, and random forests range from 55% to 72.73%, 43.48% to 69.57%, and 54.17% to 70.83%, with an average accuracy of 66.61%, 57.39%, and 59.62%, respectively. These results suggest that the proposed method can produce a high-quality covariate set and achieve a better performance than either the random forests or similarity-based approach alone.
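
The fold-wise accuracies quoted above come from a 5-fold cross-validation. As a rough illustration of how such per-fold overall accuracies and their average are obtained, here is a minimal sketch. It is not the authors' implementation: the helper names, the toy labels, and the `majority_class` placeholder model are all hypothetical, and a random forest or the integrated method would take the place of the placeholder.

```python
import random
from statistics import mean

def overall_accuracy(y_true, y_pred):
    """Share of samples whose predicted class equals the reference class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(t == p for t, p in zip(y_true, y_pred))
    return hits / len(y_true)

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and deal them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(samples, labels, fit, k=5, seed=0):
    """Return one overall-accuracy value per fold.

    `fit(train_x, train_y)` must return a callable mapping a sample to a label.
    """
    scores = []
    for held_out in k_fold_indices(len(samples), k, seed):
        held = set(held_out)
        train = [i for i in range(len(samples)) if i not in held]
        predict = fit([samples[i] for i in train], [labels[i] for i in train])
        predictions = [predict(samples[i]) for i in held_out]
        scores.append(overall_accuracy([labels[i] for i in held_out], predictions))
    return scores

def majority_class(train_x, train_y):
    """Hypothetical placeholder model: always predicts the most frequent training label."""
    most_common = max(set(train_y), key=train_y.count)
    return lambda sample: most_common

# Toy data (made up for illustration): 20 samples, 15 of class "A" and 5 of class "B".
toy_labels = ["A"] * 15 + ["B"] * 5
toy_samples = list(range(20))
scores = cross_validate(toy_samples, toy_labels, majority_class, k=5)
print("per-fold accuracy:", scores)
print("range:", min(scores), "to", max(scores), "mean:", mean(scores))
```

Reporting the minimum, maximum and mean of the per-fold values mirrors the way the abstract summarises the three methods.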

Using the Google™ Search Engine for Health Information: Is There a Problem? Case Study: Supplements for Cancer

Current Developments in Nutrition ◽

10.1093/cdn/nzab002 ◽

2021 ◽

Vol 5 (2) ◽

Author(s):

Hannah C Cai ◽

Leanne E King ◽

Johanna T Dwyer

Keyword(s):

Health Information ◽

Search Engine ◽

Information Quality ◽

Nutrition Information ◽

High Quality ◽

Search Results ◽

Health And Nutrition ◽

Quality Rating

ABSTRACT We assessed the quality of online health and nutrition information using a Google™ search on “supplements for cancer”. Search results were scored using the Health Information Quality Index (HIQI), a quality-rating tool consisting of 12 objective criteria related to website domain, lack of commercial aspects, and authoritative nature of the health and nutrition information provided. Possible scores ranged from 0 (lowest) to 12 (“perfect” or highest quality). After eliminating irrelevant results, the remaining 160 search results had median and mean scores of 8. One-quarter of the results were of high quality (score of 10–12). There was no correlation between high-quality scores and early appearance in the sequence of search results, where results are presumably more visible. Also, 496 advertisements, over twice the number of search results, appeared. We conclude that the Google™ search engine may have shortcomings when used to obtain information on dietary supplements and cancer.

An open-source R-package and web application for high-quality probabilistic predictions in hydrology

10.5194/egusphere-egu21-8549 ◽

2021 ◽

Author(s):

Jason Hunter ◽

Mark Thyer ◽

Dmitri Kavetski ◽

David McInerney

Keyword(s):

Open Source ◽

Web Application ◽

R Package ◽

Error Model ◽

Objective Functions ◽

High Quality ◽

Wide Range ◽

Probabilistic Error

Probabilistic predictions provide crucial information regarding the uncertainty of hydrological predictions, which are a key input for risk-based decision-making. However, they are often excluded from hydrological modelling applications because suitable probabilistic error models can be challenging both to construct and to interpret, and the quality of results is often reliant on the objective function used to calibrate the hydrological model.

We present an open-source R-package and an online web application that achieve the following two aims. Firstly, these resources are easy to use and accessible, so that users need not have specialised knowledge in probabilistic modelling to apply them. Secondly, the probabilistic error model that we describe provides high-quality probabilistic predictions for a wide range of commonly used hydrological objective functions. It does so by including an innovation that resolves a long-standing issue relating to model assumptions, which previously prevented this broad application.

We demonstrate our methods by comparing our new probabilistic error model with an existing reference error model in an empirical case study that uses 54 perennial Australian catchments, the hydrological model GR4J, 8 common objective functions and 4 performance metrics (reliability, precision, volumetric bias and errors in the flow duration curve). The existing reference error model introduces additional flow dependencies into the residual error structure when it is used with most of the study objective functions, which in turn leads to poor-quality probabilistic predictions. In contrast, the new probabilistic error model achieves high-quality probabilistic predictions for all objective functions used in this case study.

The new probabilistic error model, the open-source software and the web application aim to facilitate the adoption of probabilistic predictions in the hydrological modelling community, and to improve the quality of predictions and of the decisions that are made using those predictions. In particular, our methods can be used to achieve high-quality probabilistic predictions from hydrological models that are calibrated with a wide range of common objective functions.

A Novel Method for Gender and Age Detection Based on EEG Brain Signals

The International Arab Journal of Information Technology ◽

10.34028/iajit/18/5/10 ◽

2021 ◽

Vol 18 (5) ◽

Author(s):

Haitham Issa ◽

Sali Issa ◽

Wahab Shah

Keyword(s):

Cross Validation ◽

Image Feature ◽

Emotional States ◽

Time Frequency ◽

Brain Signals ◽

Average Accuracy ◽

Gender And Age ◽

Novel Method ◽

Fold Cross Validation ◽

Validation Strategy

This paper presents a new gender and age classification system based on Electroencephalography (EEG) brain signals. First, the Continuous Wavelet Transform (CWT) technique is used to obtain the time-frequency information of a single EEG electrode for eight distinct emotional states, instead of the usual neutral or relaxed states. Then, sequential steps are implemented to extract the improved grayscale image feature. For system evaluation, a three-fold cross-validation strategy is applied to construct four different classifiers. The experimental test shows that the proposed extracted feature with a Convolutional Neural Network (CNN) classifier improves the performance of both gender and age classification, and achieves an average accuracy of 96.3% and 89% for gender and age classification, respectively. Moreover, the ability to predict human gender and age across different emotional states is demonstrated in practice.

Using Multivariate Statistical Methods to Analyze High-Quality Bicycle Path Service Systems: A Case Study of Popular Bicycle Paths in Taiwan

Sustainability ◽

10.3390/su12177185 ◽

2020 ◽

Vol 12 (17) ◽

pp. 7185

Author(s):

Shinn-Jou Lin ◽

Guey-Shin Shyu ◽

Wei-Ta Fang ◽

Bai-You Cheng

Keyword(s):

Explanatory Power ◽

Value Added ◽

Service Systems ◽

Image Management ◽

High Quality ◽

Multivariate Statistical ◽

The Individual ◽

Quantitative Indicators

Taiwan has promoted bicycle tourism for nearly 20 years, and the bicycle paths it has constructed throughout the island are diverse in design. In the present study, an evaluation scale for bicycle path sightseeing potential was devised with a focus on the overall service quality of the paths; 30 popular bicycle paths were analyzed using a field survey, with expert consultation on quantitative indicators, and a qualitative analysis entailing interviews with people regarding the bicycle paths. A multivariate statistical analysis was performed on the quality of the service systems for these paths. The results revealed that the quality of these service systems is influenced by four principal components, namely, landscape attractiveness, image management, bicycle-specific paths, and accessibility, for a total explanatory power of 76.21%; the individual explanatory power of these components was 25.89%, 21.49%, 16.81%, and 12.03%, respectively. Bicycle path conditions, service maintenance, and cleanliness and bicycle specificity are required for future high-quality bicycle paths; diverse bicycle rental services and bicycle types, entrance visibility, and ecological introduction boards along paths are value-added factors to bicycle path quality.
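
As a quick check on the figures above, the four component shares can be summed directly. The sketch below uses only the percentages reported in the abstract; the helper name `cumulative_share` is introduced here for illustration. The rounded shares add up to 76.22%, which differs from the reported total of 76.21% by 0.01, a gap that is consistent with each share having been rounded to two decimals.

```python
# Per-component explanatory power (%) as reported in the abstract, in the order given.
components = {
    "landscape attractiveness": 25.89,
    "image management": 21.49,
    "bicycle-specific paths": 16.81,
    "accessibility": 12.03,
}

def cumulative_share(shares):
    """Running total of explanatory power, rounded to two decimals at each step."""
    running, out = 0.0, []
    for value in shares:
        running += value
        out.append(round(running, 2))
    return out

cumulative = cumulative_share(components.values())
total = cumulative[-1]

for name, running_total in zip(components, cumulative):
    print(f"{name:26s} cumulative {running_total:.2f}%")
print(f"sum of rounded shares: {total:.2f}% (abstract reports 76.21%)")
```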

The impact of indexing approaches on Arabic text classification

Journal of Information Science ◽

10.1177/0165551515625030 ◽

2016 ◽

Vol 43 (2) ◽

pp. 159-173 ◽

Cited By ~ 10

Author(s):

Amer Al-Badarneh ◽

Emad Al-Shawakfa ◽

Basel Bani-Ismail ◽

Khaleel Al-Rababah ◽

Safwan Shatnawi

Keyword(s):

Cross Validation ◽

Arabic Text ◽

Word Form ◽

Bayes Classifier ◽

Stem Form ◽

Average Accuracy ◽

Arabic Text Classification ◽

The Impact ◽

And Storage ◽

Fold Cross Validation

This paper investigates the impact of using different indexing approaches (full-word, stem, and root) when classifying Arabic text. In this study, the naïve Bayes classifier is used to construct the multinomial classification models and is evaluated using stratified k-fold cross-validation (k ranges from 2 to 10). It also uses a corpus that consists of 1000 normalized Arabic documents. The results of one experiment in this study show that significant accuracy improvements occur in most k-folds when the full-word form is used. Further experiments show that the classifier achieved its highest accuracy with eight folds, that is, a 7/8–1/8 train–test ratio, regardless of the indexing approach used. The overall results of this study show that the classifier achieved a maximum micro-average accuracy of 99.36% with either the full-word form or the stem form. This indicates that the stem form is the better choice for classifying Arabic text: it makes the corpus dataset smaller, which improves both processing time and storage utilization, while still achieving the highest level of accuracy.
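
To make the 7/8–1/8 train–test ratio concrete, the following minimal sketch shows one way stratified k-fold splitting can be done. It is not the authors' code: the function names are introduced here, and the class layout (10 classes of 100 documents each) is an assumption chosen only so that the toy corpus has 1000 documents like the one in the study.

```python
from collections import defaultdict

def stratified_folds(labels, k):
    """Assign each document index to one of k folds, class by class.

    Dealing the members of every class round-robin keeps the class mix of
    each fold close to that of the whole corpus.
    """
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    position = 0  # rotation continues across classes so fold sizes stay even
    for members in by_class.values():
        for index in members:
            folds[position % k].append(index)
            position += 1
    return folds

def train_test_splits(labels, k):
    """Yield (train, test) index lists; each fold serves as the test set once."""
    folds = stratified_folds(labels, k)
    for held_out in range(k):
        test = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test

# Assumed toy corpus: 1000 documents, 10 classes of 100 documents each.
corpus_labels = [c for c in range(10) for _ in range(100)]
for train, test in train_test_splits(corpus_labels, k=8):
    print(len(train), "training documents,", len(test), "test documents")
```

With k = 8 and 1000 documents, every split trains on seven folds and tests on the remaining one, which is where the 7/8–1/8 ratio comes from.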

A novel computational model for predicting potential LncRNA-disease associations based on both direct and indirect features of LncRNA-disease pairs

BMC Bioinformatics ◽

10.1186/s12859-020-03906-7 ◽

2020 ◽

Vol 21 (1) ◽

Author(s):

Yubin Xiao ◽

Zheng Xiao ◽

Xiang Feng ◽

Zhiping Chen ◽

Linai Kuang ◽

...

Keyword(s):

Computational Model ◽

Cross Validation ◽

State Of The Art ◽

Prediction Methods ◽

Good Prediction ◽

Average Case ◽

Comparison Results ◽

Disease Associations ◽

Fold Cross Validation

Abstract Background: Accumulating evidence has demonstrated that long non-coding RNAs (lncRNAs) are closely associated with human diseases, and knowing the relationships between lncRNAs and diseases is useful for the diagnosis and treatment of diseases. Due to the high costs and time complexity of traditional bio-experiments, in recent years, more and more computational methods have been proposed by researchers to infer potential lncRNA-disease associations. However, these state-of-the-art prediction methods still have various limitations. Results: In this manuscript, a novel computational model named FVTLDA is proposed to infer potential lncRNA-disease associations. The major novelty of FVTLDA lies in the integration of direct and indirect features related to lncRNA-disease associations, such as the feature vectors of lncRNA-disease pairs and their corresponding association probability fractions, which guarantees that FVTLDA can be utilized to predict diseases without known related lncRNAs and lncRNAs without known related diseases. Moreover, FVTLDA neither relies solely on known lncRNA-disease associations nor requires any negative samples, which guarantees that it can infer potential lncRNA-disease associations more equitably and effectively than traditional state-of-the-art prediction methods. Additionally, to avoid the limitations of single-model prediction techniques, we combine FVTLDA with Multiple Linear Regression (MLR) and with an Artificial Neural Network (ANN) for data analysis, respectively. Simulation experiment results show that FVTLDA with MLR can achieve reliable AUCs of 0.8909, 0.8936 and 0.8970 in 5-Fold Cross Validation (fivefold CV), 10-Fold Cross Validation (tenfold CV) and Leave-One-Out Cross Validation (LOOCV), respectively, while FVTLDA with ANN can achieve reliable AUCs of 0.8766, 0.8830 and 0.8807 in fivefold CV, tenfold CV, and LOOCV, respectively. Furthermore, in case studies of gastric cancer, leukemia and lung cancer, experiment results show that 8, 8 and 8 of the top 10 candidate lncRNAs predicted by FVTLDA with MLR, and 8, 7 and 8 of the top 10 candidate lncRNAs predicted by FVTLDA with ANN, have been verified by recent literature. Compared with the representative prediction model KATZLDA, FVTLDA with MLR and FVTLDA with ANN achieve average case study contrast scores of 0.8429 and 0.8515, respectively, both notably higher than the average case study contrast score of 0.6375 achieved by KATZLDA. Conclusion: The simulation results show that FVTLDA has good prediction performance, which is a good supplement to future bioinformatics research.
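
The AUC values reported above summarise how well a model ranks true associations above other candidate pairs. As a generic illustration of the metric only, and not of FVTLDA itself, the sketch below computes AUC through the pairwise (Mann-Whitney) formulation. The function name, the toy scores, and the convention that label 1 marks a held-out known association and label 0 any other candidate pair are assumptions made for this example.

```python
def auc(scores, labels):
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    Returns the share of positive/negative pairs in which the positive item
    receives the higher score; tied scores count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Toy example (made-up scores): two held-out associations, two other candidates.
toy_scores = [0.9, 0.4, 0.6, 0.1]
toy_labels = [1, 1, 0, 0]
print(auc(toy_scores, toy_labels))  # 3 of the 4 positive/negative pairs are ordered correctly: 0.75
```

The double loop is quadratic in the number of pairs, which is fine for a sketch; a rank-based computation would be preferable on large candidate sets.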

The Polypharmacology Browser PPB2: Target Prediction Combining Nearest Neighbors with Machine Learning

10.26434/chemrxiv.6895646.v1 ◽

2018 ◽

Cited By ~ 1

Author(s):

Mahendra Awale ◽

Jean-Louis Reymond

Keyword(s):

Machine Learning ◽

Cross Validation ◽

Nearest Neighbor ◽

Target Prediction ◽

Molecular Shape ◽

Public Access ◽

Molecular Fingerprints ◽

Small Molecule Drug ◽

Fold Cross Validation

Here we report PPB2 as a target prediction tool assigning targets to a query molecule based on ChEMBL data. PPB2 computes ligand similarities using molecular fingerprints encoding composition (MQN), molecular shape and pharmacophores (Xfp), and substructures (ECfp4), and features an unprecedented combination of nearest neighbor (NN) searches and Naïve Bayes (NB) machine learning, together with simple NN searches, NB and Deep Neural Network (DNN) machine learning models as further options. Although NN(ECfp4) gives the best results in terms of recall in a 10-fold cross-validation study, combining NN searches with NB machine learning provides superior precision statistics, as well as better results in a case study predicting off-targets of a recently reported TRPV6 calcium channel inhibitor, illustrating the value of this combined approach. PPB2 is available to assess possible off-targets of small molecule drug-like compounds by public access at ppb2.gdb.tools.
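
The nearest-neighbor part of such a tool can be illustrated with a small sketch. This is not PPB2's code and covers only a plain NN search, not the combination with Naïve Bayes. The abstract does not say which similarity measure is used, so Tanimoto (Jaccard) similarity over sets of on-bit positions is assumed here, and the fingerprints and target names are invented for the example.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of on-bit positions."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def nearest_neighbor_targets(query, library, top_n=3):
    """Rank targets by the similarity of their most similar known ligand.

    `library` is an iterable of (fingerprint, target) pairs, where each
    fingerprint is a set of on-bit positions.
    """
    best = {}
    for fingerprint, target in library:
        similarity = tanimoto(query, fingerprint)
        if similarity > best.get(target, -1.0):
            best[target] = similarity
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]

# Invented toy library: (fingerprint on-bits, hypothetical target name).
toy_library = [
    ({1, 2, 3, 5}, "target_A"),
    ({7, 8}, "target_A"),
    ({1, 2}, "target_B"),
    ({1, 2, 3, 4}, "target_C"),
]
query_fingerprint = {1, 2, 3, 4}
for target, similarity in nearest_neighbor_targets(query_fingerprint, toy_library):
    print(target, round(similarity, 2))
```

Keeping only the best-scoring ligand per target means a target is suggested as soon as one of its known ligands closely resembles the query.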

Sports and Health as Cornerstones of Tourism Development: Case Study of Montenegro

Sports Science and Human Health - Different Approaches ◽

10.5772/intechopen.89386 ◽

2020 ◽

Author(s):

Anđela Jakšić-Stojanović ◽

Neven Šerić

Keyword(s):

Natural Resources ◽

Driving Force ◽

Tourism Development ◽

International Market ◽

The Other ◽

Special Focus ◽

Health Tourism ◽

High Quality

Modern tourism is increasingly focused on specific forms of tourism, among which sports and health tourism play a very important role. This is not surprising, given that they are interconnected activities that complement each other and give each other a completely new dimension. On the one hand, sports and health are very important content of the tourist offer because they enable tourists to become active participants in various activities; on the other hand, they are an important driving force for visiting a particular destination. The aim of this chapter is to provide a theoretical and practical framework for this issue, with a special focus on the case study of Montenegro. According to the results of the research that was carried out, the general conclusion is that Montenegro has extremely valuable natural resources and potential for the development of sports and health tourism, but many challenges remain to be addressed in order to improve the quality of the tourist offer and the level of tourists’ satisfaction, as well as to create a completely new image of the destination and position it as a high-quality sports and health tourist destination on the international market.

Integrated Quality-Based Production-Distribution Planning in Two-Echelon Supply Chains

Mathematical Problems in Engineering ◽

10.1155/2021/6615634 ◽

2021 ◽

Vol 2021 ◽

pp. 1-10

Author(s):

Husein Pasha ◽

Isa Nakhai Kamalabadi ◽

Alireza Eydi

Keyword(s):

Mixed Integer ◽

High Quality ◽

Swarm Optimization ◽

Integrated Production ◽

Quality Degradation ◽

Production Distribution ◽

Quality Of Products ◽

Over Time

Integrated production-distribution (P-D) planning has become one of the most essential areas in supply chain (SC) management in recent years, especially in the case of perishable products, whose quality can change over time. Nonetheless, the models suggested so far have focused on the P-D stages of the chain, although the delivery of high-quality products to customers is of paramount significance in the perishable SC. In the present paper, a multiobjective, mixed-integer, and nonlinear programming (MOMINLP) mathematical model was developed for integrated P-D planning of deteriorating items in a two-echelon SC that emphasizes quality degradation. Quality is monitored and calculated as a function of temperature and time throughout the SC, and the main purpose of the model is, first, to increase the quality of products delivered to customers and, second, to minimize the SC costs. To optimize the problem, the particle swarm optimization (PSO) approach was also incorporated into the model. The obtained model was applied to a case study in Protein Gostar Sina Company in Iran, which resulted in decreased P-D costs as well as increased customer satisfaction.

A Novel Computational Model for Predicting Potential LncRNA-Disease Associations based on Both Direct and Indirect Features of LncRNA-Disease Pairs

10.21203/rs.2.18937/v3 ◽

2020 ◽

Author(s):

Yubin Xiao ◽

Zheng Xiao ◽

Xiang Feng ◽

Zhiping Chen ◽

Linai Kuang ◽

...

Keyword(s):

Computational Model ◽

Cross Validation ◽

State Of The Art ◽

Prediction Methods ◽

Good Prediction ◽

Average Case ◽

Comparison Results ◽

Disease Associations ◽

Fold Cross Validation

Abstract Background: Accumulating evidence has demonstrated that long non-coding RNAs (lncRNAs) are closely associated with human diseases, and knowing the relationships between lncRNAs and diseases is useful for the diagnosis and treatment of diseases. Due to the high costs and time complexity of traditional bio-experiments, in recent years, more and more computational methods have been proposed by researchers to infer potential lncRNA-disease associations. However, these state-of-the-art prediction methods still have various limitations. Results: In this manuscript, a novel computational model named FVTLDA is proposed to infer potential lncRNA-disease associations. The major novelty of FVTLDA lies in the integration of direct and indirect features related to lncRNA-disease associations, such as the feature vectors of lncRNA-disease pairs and their corresponding association probability fractions, which guarantees that FVTLDA can be utilized to predict diseases without known related lncRNAs and lncRNAs without known related diseases. Moreover, FVTLDA neither relies solely on known lncRNA-disease associations nor requires any negative samples, which guarantees that it can infer potential lncRNA-disease associations more equitably and effectively than traditional state-of-the-art prediction methods. Additionally, to avoid the limitations of single-model prediction techniques, we combine FVTLDA with Multiple Linear Regression (MLR) and with an Artificial Neural Network (ANN) for data analysis, respectively. Simulation experiment results show that FVTLDA with MLR can achieve reliable AUCs of 0.8909, 0.8936 and 0.8970 in 5-Fold Cross Validation (5-fold CV), 10-Fold Cross Validation (10-fold CV) and Leave-One-Out Cross Validation (LOOCV), respectively, while FVTLDA with ANN can achieve reliable AUCs of 0.8766, 0.8830 and 0.8807 in 5-fold CV, 10-fold CV, and LOOCV, respectively. Furthermore, in case studies of gastric cancer, leukemia and lung cancer, experiment results show that 8, 8 and 8 of the top 10 candidate lncRNAs predicted by FVTLDA with MLR, and 8, 7 and 8 of the top 10 candidate lncRNAs predicted by FVTLDA with ANN, have been verified by recent literature. Compared with the representative prediction model KATZLDA, FVTLDA with MLR and FVTLDA with ANN achieve average case study contrast scores of 0.8429 and 0.8515, respectively, both notably higher than the average case study contrast score of 0.6375 achieved by KATZLDA. Conclusion: The simulation results show that FVTLDA has good prediction performance, which is a good supplement to future bioinformatics research.
