Discrimination of Brazilian propolis according to the seasoning using chemometrics and machine learning based on UV-Vis scanning data

Summary: Propolis is a chemically complex biomass produced by honeybees (Apis mellifera) from plant resins mixed with salivary enzymes, beeswax, and pollen. The biological activities described for propolis have also been identified in the resins of its donor plants. However, a major challenge for standardizing the chemical composition and biological effects of propolis is to better understand how seasonality influences the chemical constituents of this raw material. Propolis quality depends, among other variables, on the local flora, which is strongly influenced by biotic and abiotic factors over the seasons. Unravelling the effect of harvest season on the chemical profile of propolis is therefore of recognized importance. Fast, cheap, and robust analytical techniques seem the best choice for large-scale quality control in the most demanding markets, e.g., human health applications. To this end, UV-Visible (UV-Vis) scanning spectrophotometry was applied to hydroalcoholic extracts (HE) of seventy-three propolis samples collected over the seasons in 2014 (summer, spring, autumn, and winter) and 2015 (summer and autumn) in Southern Brazil. Machine learning and chemometric techniques were then applied to the UV-Vis dataset to gain insight into how seasonality affects the claimed chemical heterogeneity of propolis samples, as determined by changes in the flora of the region under study. Descriptive and classification models were built following a chemometric approach, i.e., principal component analysis (PCA) and hierarchical clustering analysis (HCA), supported by scripts written in the R language. The UV-Vis profiles combined with chemometric analysis revealed a typical pattern in propolis samples collected in summer.
Importantly, PCA-based discrimination improved when the dataset was restricted to the fingerprint region of phenolic compounds (λ = 280–400 nm). This suggests that, besides their biological activities, these secondary metabolites also play a relevant role in discriminating and classifying this complex matrix with bioinformatics tools. Finally, a series of machine learning approaches, e.g., partial least squares discriminant analysis (PLS-DA), k-nearest neighbors (kNN), and decision trees, proved complementary to PCA and HCA and provided relevant information on sample discrimination.
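The abstract attributes the improved PCA discrimination to restricting the data to the 280–400 nm phenolic window. A minimal sketch of that step follows, assuming mean-centred absorbance spectra and PCA computed by singular value decomposition. It is written in Python with NumPy rather than the authors' R scripts, the function name `pca_on_window` is hypothetical, and the spectra are synthetic, not the study's data.

```python
import numpy as np

def pca_on_window(spectra, wavelengths, lo=280.0, hi=400.0, n_components=2):
    """Mean-centre the absorbances inside [lo, hi] nm and project them onto
    the leading principal components (computed with an SVD)."""
    mask = (wavelengths >= lo) & (wavelengths <= hi)   # keep only the spectral window
    X = spectra[:, mask]
    Xc = X - X.mean(axis=0)                            # column-wise mean centring
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]    # sample coordinates (PC scores)
    explained = s**2 / np.sum(s**2)                    # fraction of variance per component
    return scores, explained[:n_components]

# Hypothetical data: 12 "samples" scanned from 200 to 600 nm in 2 nm steps.
rng = np.random.default_rng(0)
wavelengths = np.arange(200.0, 602.0, 2.0)
band = np.exp(-((wavelengths - 330.0) / 25.0) ** 2)   # synthetic phenolic-like band
season = np.repeat([0, 1], 6)                           # two hypothetical seasonal groups
spectra = (1.0 + 0.8 * season)[:, None] * band + 0.02 * rng.standard_normal((12, wavelengths.size))

scores, explained = pca_on_window(spectra, wavelengths)
print(scores[:, 0].round(2))   # PC1 scores; the sign of a PC is arbitrary
print(explained.round(3))
```

Whether PC1 separates seasonal groups in real extracts depends on the data; the synthetic band here is constructed so that it does.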

Recent Progress in Machine Learning-based Prediction of Peptide Activity for Drug Discovery

Current Topics in Medicinal Chemistry ◽

10.2174/1568026619666190122151634 ◽

2019 ◽

Vol 19 (1) ◽

pp. 4-16 ◽

Cited By ~ 6

Author(s):

Qihui Wu ◽

Hanzhong Ke ◽

Dongli Li ◽

Qi Wang ◽

Jiansong Fang ◽

...

Keyword(s):

Machine Learning ◽

Drug Discovery ◽

Large Scale ◽

Recent Progress ◽

High Specificity ◽

Learning Approaches ◽

Anticancer Peptides ◽

The Past ◽

Traditional Approaches ◽

Large Scale Screening

Over the past decades, peptides as therapeutic candidates have received increasing attention in drug discovery, especially antimicrobial peptides (AMPs), anticancer peptides (ACPs), and anti-inflammatory peptides (AIPs). Peptides are considered able to regulate various complex diseases that were previously intractable. In recent years, the critical problem of antimicrobial resistance has driven the pharmaceutical industry to look for new therapeutic agents. Compared with small-molecule organic drugs, peptide-based therapy exhibits high specificity and minimal toxicity. Peptides are therefore widely used in the design and discovery of new potent drugs. Currently, large-scale screening of peptide activity with traditional approaches is costly, time-consuming, and labor-intensive. Hence, in silico methods, mainly machine learning approaches, have been introduced to predict peptide activity because of their accuracy and effectiveness. In this review, we document the recent progress in machine learning-based prediction of peptide activity, which will be of great benefit to the discovery of potentially active AMPs, ACPs, and AIPs.

Monitoring the Foliar Nutrients Status of Mango Using Spectroscopy-Based Spectral Indices and PLSR-Combined Machine Learning Models

Remote Sensing ◽

10.3390/rs13040641 ◽

2021 ◽

Vol 13 (4) ◽

pp. 641

Author(s):

Gopal Ramdas Mahajan ◽

Bappa Das ◽

Dayesh Murgaokar ◽

Ittai Herrmann ◽

Katja Berger ◽

...

Keyword(s):

Machine Learning ◽

Remote Sensing ◽

Partial Least Square ◽

Least Square ◽

Partial Least Square Regression ◽

Support Vector ◽

Spectral Indices ◽

Learning Models ◽

Leaf Nutrients ◽

Machine Learning Models

Conventional methods of estimating plant nutrients for nutrient management require a large number of leaf or tissue samples and extensive chemical analysis, which is time-consuming and expensive. Remote sensing is a viable tool for estimating a plant's nutritional status and determining the appropriate amount of fertilizer input. The aim of the study was to use remote sensing to characterize the foliar nutrient status of mango through the development of spectral indices, multivariate analysis, chemometrics, and machine learning modeling of the spectral data. A spectral database of leaf samples in the 350–1050 nm wavelength range, together with the measured leaf nutrient contents, was analyzed to develop spectral indices and multivariate models. The normalized difference and ratio spectral indices and the multivariate models (partial least squares regression (PLSR), principal component regression, and support vector regression (SVR)) were ineffective in predicting any of the leaf nutrients. An approach combining PLSR with machine learning models was found to predict most of the nutrients best. Based on independent validation performance and summed ranks, the best-performing models were Cubist (R² ≥ 0.91, ratio of performance to deviation (RPD) ≥ 3.3, and ratio of performance to interquartile distance (RPIQ) ≥ 3.71) for nitrogen, phosphorus, potassium, and zinc; SVR (R² ≥ 0.88, RPD ≥ 2.73, RPIQ ≥ 3.31) for calcium, iron, copper, and boron; and elastic net (R² ≥ 0.95, RPD ≥ 4.47, RPIQ ≥ 6.11) for magnesium and sulfur. The results revealed the potential of hyperspectral remote sensing data for the non-destructive estimation of mango leaf macro- and micronutrients. The developed approach is suggested for use within operational retrieval workflows for the precision management of mango orchard nutrients.
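The validation statistics quoted above can be computed from observed and predicted nutrient values as sketched below. This assumes R² = 1 − SS_res/SS_tot, RPD = sample standard deviation of the reference values divided by RMSE, and RPIQ = (Q3 − Q1)/RMSE with NumPy's default percentile interpolation. The abstract does not state which conventions the authors used, the helper `validation_metrics` is hypothetical, and the numbers in the example are made up.

```python
import numpy as np

def validation_metrics(observed, predicted):
    """Return (R2, RMSE, RPD, RPIQ) for one set of validation predictions."""
    y = np.asarray(observed, dtype=float)
    y_hat = np.asarray(predicted, dtype=float)
    rmse = np.sqrt(np.mean((y - y_hat) ** 2))
    r2 = 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
    rpd = np.std(y, ddof=1) / rmse          # sample SD of the reference values / RMSE
    q1, q3 = np.percentile(y, [25, 75])
    rpiq = (q3 - q1) / rmse                 # interquartile range / RMSE
    return r2, rmse, rpd, rpiq

r2, rmse, rpd, rpiq = validation_metrics([1, 2, 3, 4, 5], [1.1, 1.9, 3.2, 3.8, 5.0])
print(round(r2, 3), round(rpd, 2), round(rpiq, 2))  # → 0.99 11.18 14.14
```

RPIQ is often preferred over RPD for skewed reference distributions, because the interquartile range is less sensitive to extreme values than the standard deviation.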

The Influence of Food Service Quality and Innovation Production on Local Product of Thailand: Study of Creative Agriculture

Research in World Economy ◽

10.5430/rwe.v11n5p469 ◽

2020 ◽

Vol 11 (5) ◽

pp. 469

Author(s):

Waleerak Sittisom ◽

Thammarak Srimarut

Keyword(s):

Data Analysis ◽

Customer Satisfaction ◽

Service Quality ◽

Food Service ◽

Partial Least Square ◽

Least Square ◽

Raw Material ◽

Primary Data ◽

Local Product ◽

Product Promotion

Creative agriculture encompasses broad and deep knowledge of a product, from the preparation of its raw materials to the end consumer. It therefore deals with the in-depth analysis, production process, and commercialization of a product. The present study explored the relationships among food service quality, innovation in production, customer satisfaction, and local product promotion. Both food service quality and innovation in production increase customer satisfaction and local product promotion. Higher customer satisfaction is also associated with greater local product promotion. Primary data were collected through a survey of 300 food engineers working for different food-providing companies. The data were then analyzed using partial least squares (PLS), and the study's conclusions were drawn from the results of this analysis.

Machine learning identifies an immunological pattern associated with multiple juvenile idiopathic arthritis subtypes

Annals of the Rheumatic Diseases ◽

10.1136/annrheumdis-2018-214354 ◽

2019 ◽

Vol 78 (5) ◽

pp. 617-628 ◽

Cited By ~ 5

Author(s):

Erika Van Nieuwenhove ◽

Vasiliki Lagou ◽

Lien Van Eyck ◽

James Dooley ◽

Ulrich Bodenhofer ◽

...

Keyword(s):

Machine Learning ◽

Juvenile Idiopathic Arthritis ◽

Large Scale ◽

Inflammatory Diseases ◽

Adaptive Immune System ◽

Healthy Children ◽

Learning Approaches ◽

Data Set ◽

Immune Signature ◽

Systemic JIA

Objectives: Juvenile idiopathic arthritis (JIA) is the most common class of childhood rheumatic diseases, with distinct disease subsets that may have diverging pathophysiological origins. Both adaptive and innate immune processes have been proposed as primary drivers, which may account for the observed clinical heterogeneity, but few in-depth studies have been performed. Methods: Here we profiled the adaptive immune system of 85 patients with JIA and 43 age-matched controls using in-depth flow cytometry and machine learning approaches. Results: Immune profiling identified immunological changes in patients with JIA. This immune signature was shared across a broad spectrum of childhood inflammatory diseases. The immune signature was identified in clinically distinct subsets of JIA, but was accentuated in patients with systemic JIA and in those with active disease. Despite the extensive overlap in the immunological spectrum exhibited by healthy children and patients with JIA, machine learning analysis of the data set discriminated patients with JIA from healthy controls with ~90% accuracy. Conclusions: These results pave the way for large-scale longitudinal immune phenotyping studies of JIA. The ability to discriminate between patients with JIA and healthy individuals provides proof of principle for using machine learning to identify immune signatures that are predictive of treatment response group.

Understanding the urban microclimate of Lyon by integrating complementary predictors at different scales into regression models

Climatologie ◽

10.1051/climat/202017002 ◽

2020 ◽

Vol 17 ◽

pp. 2

Author(s):

Lucille Alonso ◽

Florent Renard

Keyword(s):

Machine Learning ◽

Partial Least Square ◽

Least Square ◽

Light Detection And Ranging ◽

Landsat 8 ◽

Light Detection ◽

Climate Change ◽

Urban Environment ◽

Climate Changes

Climate change is a major contemporary phenomenon with multiple consequences. In urban areas, it exacerbates the urban heat island. Both of these climatic manifestations affect residents' health and the thermal discomfort they experience in the city. It is therefore necessary to estimate air temperature as accurately as possible at every point of a territory, particularly given the current rationalization of Météo France's network of fixed weather stations. Spatially explicit knowledge of air temperature is increasingly in demand to feed quantitative models in a wide range of fields, such as hydrology, ecology, and climate change studies. This study therefore models air temperature, measured during 4 mobile campaigns carried out in the summer months between 2016 and 2019 in Lyon under clear skies, using regression models based on 33 explanatory variables derived from traditionally used data and from remote sensing data acquired by LiDAR (Light Detection And Ranging) or the Landsat 8 satellite. Three types of statistical regression were tested: partial least squares regression, multiple linear regression, and a machine learning method, random forest for classification and regression. For example, for 30 August 2016, multiple linear regression explained 89% of the variance, with a mean RMSE of only 0.23 °C. Variables such as surface temperature, NDVI, and MNDWI strongly affect the estimation model.
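Of the three regression types tested, multiple linear regression is the simplest to illustrate. The sketch below fits it by ordinary least squares and reports the share of variance explained (R²) and the RMSE, the two figures quoted for 30 August 2016. The predictor names echo variables mentioned in the abstract, but the data, the coefficients, and the helper `fit_mlr` are hypothetical; this is not the study's model.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least squares with an intercept; returns coefficients, R^2 and RMSE."""
    A = np.column_stack([np.ones(len(y)), X])       # prepend intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    y_hat = A @ coef
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot                       # share of variance explained
    rmse = np.sqrt(ss_res / len(y))                  # in the units of y (deg C here)
    return coef, r2, rmse

# Hypothetical transect points: surface temperature (deg C), NDVI, MNDWI.
rng = np.random.default_rng(42)
n = 200
lst = rng.uniform(25, 45, n)
ndvi = rng.uniform(0.0, 0.8, n)
mndwi = rng.uniform(-0.6, 0.4, n)
X = np.column_stack([lst, ndvi, mndwi])
air_t = 12.0 + 0.4 * lst - 2.5 * ndvi - 1.0 * mndwi + rng.normal(0, 0.25, n)

coef, r2, rmse = fit_mlr(X, air_t)
print(coef.round(2), round(r2, 3), round(rmse, 3))
```

R² and RMSE computed on the fitting data, as here, are optimistic; an independent validation set gives a fairer picture.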

Impacts of multicollinearity on CAPT modalities: An heterogeneous machine learning framework for computer-assisted French phoneme pronunciation training

PLoS ONE ◽

10.1371/journal.pone.0257901 ◽

2021 ◽

Vol 16 (10) ◽

pp. e0257901

Author(s):

Yanjing Bi ◽

Chao Li ◽

Yannick Benezeth ◽

Fan Yang

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Support Vector Machines ◽

Partial Least Square ◽

Least Square ◽

Support Vector ◽

Computer Assisted ◽

Long Distance ◽

Relationship Analysis ◽

Vector Machines

Phoneme pronunciation is usually considered a basic skill in learning a foreign language. Practicing pronunciation in a computer-assisted way is helpful in self-directed or distance learning environments. Recent research indicates that machine learning is a promising method for building high-performance computer-assisted pronunciation training (CAPT) modalities. Many data-driven classification models, such as support vector machines, back-propagation networks, deep neural networks, and convolutional neural networks, are increasingly used for this purpose. Yet the acoustic waveforms of phonemes are essentially modulations of the base vibrations of the vocal cords, which makes the predictors collinear and distorts the classification models. A commonly used solution is to suppress predictor collinearity with the partial least squares regression algorithm, which yields high-quality predictor weights through analysis of the relationships among predictors. However, as linear regressors, classifiers of this type have very simple topological structures, which limits their generality. To address this issue, this paper presents a heterogeneous phoneme recognition framework that combines partial least squares with support vector machines to further benefit phoneme pronunciation diagnosis tasks. A French phoneme data set containing 4830 samples was built for the evaluation experiments. The experiments show that the new method improves the accuracy of phoneme classifiers by 0.21–8.47% compared with state-of-the-art methods at different training data densities.
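The collinearity argument can be made concrete with a small sketch of PLS1 via the NIPALS algorithm: each latent score is computed from a deflated predictor matrix, so the scores are mutually orthogonal even when the raw predictors are nearly collinear. In the framework described above, such scores would then be passed to a support vector machine; that step, and the paper's actual multi-class setup for French phonemes, are not reproduced here. The function `pls1_scores` and the synthetic features are hypothetical.

```python
import numpy as np

def pls1_scores(X, y, n_components):
    """NIPALS PLS1: extract latent scores t_1..t_k from predictors X for a single
    response y. Each score comes from the deflated X, so the scores are orthogonal."""
    X = np.asarray(X, dtype=float) - np.mean(X, axis=0)
    y = np.asarray(y, dtype=float) - np.mean(y)
    scores = []
    for _ in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)          # weight vector: direction of max covariance with y
        t = X @ w                       # latent score for every sample
        tt = t @ t
        p = X.T @ t / tt                # X loadings
        q = (y @ t) / tt                # y loading
        X = X - np.outer(t, p)          # deflate X: remove what t explains
        y = y - q * t                   # deflate y
        scores.append(t)
    return np.column_stack(scores)

# Hypothetical, strongly collinear acoustic-like features.
rng = np.random.default_rng(1)
base = rng.standard_normal(300)
X = np.column_stack([base,
                     2.0 * base + 0.01 * rng.standard_normal(300),
                     -base + 0.01 * rng.standard_normal(300),
                     rng.standard_normal(300)])
y = (base + 0.5 * X[:, 3] > 0).astype(float)   # binary label coded 0/1

T = pls1_scores(X, y, n_components=2)
print(np.corrcoef(X[:, 0], X[:, 1])[0, 1].round(4))   # close to 1: collinear inputs
print(np.corrcoef(T[:, 0], T[:, 1])[0, 1].round(4))   # close to 0: orthogonal scores
```

Coding a binary label as 0/1 and regressing on it is a simplification; multi-class problems are usually handled with PLS2 on one-hot targets.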

Multivariate Classification of Drugs using Parametric and Nonparametric Machine Learning Models

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.c8740.019320 ◽

2020 ◽

Vol 9 (3) ◽

pp. 2021-2027

Keyword(s):

Machine Learning ◽

Drug Discovery ◽

Biological Activities ◽

Biological Effects ◽

Recursive Feature Elimination ◽

Drug Candidate ◽

Learning Models ◽

Machine Learning Models ◽

Non Parametric

In pharmaceutical research, the traditional drug discovery process is time-consuming and expensive, as many compounds are experimentally tested for their biological activities. A series of laboratory experiments is conducted to analyze a newly synthesized drug's pharmaceutical activities and its biological effects on humans. For each new drug, the required clinical properties can be predicted using machine learning models, which greatly reduces experimental cost. This paper explores parametric and non-parametric machine learning models to classify drugs' administration properties and toxicity. The multinomial classification of drugs was based on their physicochemical and ADMET properties. Balanced data samples were drawn from ChEMBL and pre-processed. Features were reduced using Recursive Feature Elimination, and the attributes were ranked by importance to remove highly correlated attributes. The performance of parametric and non-parametric machine learning models was analyzed on cheminformatic data that include physicochemical, biological, and pharmaceutical properties of the drug molecules. Selecting a potent drug candidate together with its administration properties greatly reduces wet-lab experimental time and cost. Multiclass classification can be performed efficiently using a non-parametric machine learning model. Optimal feature engineering, hyperparameter tuning, and hybrid algorithms could yield more accurate predictions for cheminformatics data in future work.
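Recursive Feature Elimination repeatedly fits a model, ranks features by importance, and drops the weakest until the desired number remains. The sketch below uses an ordinary least-squares ranker on standardized features and a continuous synthetic target, which is a simplification of the paper's multinomial classification setting; the helper `rfe_linear` and the data are hypothetical. In practice, scikit-learn's `sklearn.feature_selection.RFE` wraps the same loop around any estimator exposing `coef_` or `feature_importances_`.

```python
import numpy as np

def rfe_linear(X, y, n_features_to_select):
    """Recursive Feature Elimination with an ordinary least-squares ranker.
    Features are standardised so |coefficient| is comparable across columns;
    at each step the feature with the smallest |coefficient| is dropped."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    remaining = list(range(X.shape[1]))
    eliminated = []                                  # feature indices, in order of removal
    while len(remaining) > n_features_to_select:
        A = np.column_stack([np.ones(len(y)), Z[:, remaining]])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        weakest = int(np.argmin(np.abs(coef[1:])))   # skip the intercept
        eliminated.append(remaining.pop(weakest))
    return remaining, eliminated

rng = np.random.default_rng(7)
X = rng.standard_normal((200, 6))          # six hypothetical molecular descriptors
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.1 * rng.standard_normal(200)
selected, dropped = rfe_linear(X, y, n_features_to_select=2)
print(selected, dropped)
```

Refitting after every removal, rather than ranking once, lets the importance of the remaining features be re-estimated once a correlated competitor has gone.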

Talent management and organizational sustainability: role of sustainable behaviour

International Journal of Organizational Analysis ◽

10.1108/ijoa-06-2020-2253 ◽

2021 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Muhammad Mujtaba ◽

Muhammad Shujaat Mubarik

Keyword(s):

Structural Equation ◽

Large Scale ◽

Talent Management ◽

Partial Least Square ◽

Least Square ◽

Three Dimensions ◽

Organizational Sustainability ◽

Substantial Impact ◽

Content Type

Purpose: This study aims to examine the role of talent management (TM) in improving organizational sustainability (OS). The study also investigates the role of employees' sustainable behaviour (SB) in achieving three-dimensional sustainability goals (i.e. economic, social and environmental). Design/methodology/approach: This study focused on medium and large-scale manufacturing firms, with a sample of 196 firms. Data were collected through closed-ended questionnaires using the cluster sampling technique. Partial least squares structural equation modelling (PLS-SEM) was used to estimate the modelled relationships. Findings: Results show a significant direct impact of TM on OS. Likewise, all three dimensions of TM (acquisition, development and retention) have a substantial impact on OS. Results confirm that employees' SB positively mediates the relationship between TM and OS. Research limitations/implications: The study focuses on the manufacturing sector of Pakistan. The findings imply that TM strategies to attract, develop and retain talented employees are an indispensable source of sustainability under talent shortages. Moreover, employees' sustainable behaviour plays a positive role between TM and OS, because sustainable success requires not only employees' expertise but also their dedication. Practical implications: This study enhances the understanding of TM's role in improving OS. The findings imply that a firm should consider TM as the apex strategy for elevating performance. Findings also reveal the need to adopt a comprehensive strategy or system to manage an organization's talent. Originality/value: Linking TM with OS and SB is the novelty of the study.

Machine Learning Based Taxonomy and Analysis of English Learners' Translation Errors

International Journal of Computer-Assisted Language Learning and Teaching ◽

10.4018/ijcallt.2019070105 ◽

2019 ◽

Vol 9 (3) ◽

pp. 68-83

Author(s):

Ying Qin

Keyword(s):

Machine Learning ◽

English Learners ◽

Large Scale ◽

Learning Approaches ◽

EFL Learners ◽

Translation Error ◽

Chinese Learners ◽

Error Taxonomy ◽

Skill Improvement ◽

Translation Errors

This study extracts comments from a large-scale translation corpus of Chinese EFL learners to study the taxonomy of translation errors. Two unsupervised machine learning approaches are used to obtain computational evidence for a translation error taxonomy. After manual revision, ten types of English-to-Chinese (E2C) and eight types of Chinese-to-English (C2E) translation errors are confirmed. According to the hierarchical clustering results, there are probably three top-level error categories. In addition, three supervised learning methods are applied to automatically recognize error types, with the best performance reaching F1 = 0.85 on E2C and F1 = 0.90 on C2E translation. Further comparison with intuitive or theoretical studies of translation taxonomy reveals some phenomena that accompany the improvement of Chinese learners' language skills. Machine learning-based analysis of translation problems provides objective insight into, and understanding of, students' translations.
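The F1 scores reported above are the harmonic mean of precision and recall. The abstract does not say how per-class scores were averaged; the sketch below assumes macro averaging (the unweighted mean over the gold-standard classes). The helper `macro_f1` and the error-type labels are hypothetical, not the study's taxonomy.

```python
from collections import Counter

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 = 2PR / (P + R) over the classes in y_true."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1   # predicted class p was wrong
            fn[t] += 1   # gold class t was missed
    f1s = []
    for c in set(y_true):
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1s.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    return sum(f1s) / len(f1s)

# Hypothetical error labels
gold = ["omission", "omission", "word choice", "word choice", "grammar"]
pred = ["omission", "word choice", "word choice", "word choice", "grammar"]
print(round(macro_f1(gold, pred), 4))  # → 0.8222
```

Macro averaging weights rare error types as heavily as frequent ones; micro or support-weighted averaging would give different figures on imbalanced data.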

Big Data’s Role in Health and Risk Messaging

Oxford Research Encyclopedia of Communication ◽

10.1093/acrefore/9780190228613.013.359 ◽

2017 ◽

Author(s):

Bradford William Hesse

Keyword(s):

Machine Learning ◽

Big Data ◽

Risk Communication ◽

Large Scale ◽

Protein Identification ◽

Machine Learning Algorithms ◽

National Committee ◽

Learning Approaches ◽

Road Map ◽

Data Flows

The presence of large-scale data systems can be felt, consciously or not, in almost every facet of modern life, whether through the simple act of selecting travel options online, purchasing products from online retailers, or navigating through the streets of an unfamiliar neighborhood using global positioning system (GPS) mapping. These systems operate through the momentum of big data, a term introduced by data scientists to describe a data-rich environment enabled by a superconvergence of advanced computer-processing speeds and storage capacities; advanced connectivity between people and devices through the Internet; the ubiquity of smart, mobile devices and wireless sensors; and the creation of accelerated data flows among systems in the global economy. Some researchers have suggested that big data represents the so-called fourth paradigm in science, wherein the first paradigm was marked by the evolution of the experimental method, the second was brought about by the maturation of theory, the third was marked by an evolution of statistical methodology as enabled by computational technology, while the fourth extended the benefits of the first three, but also enabled the application of novel machine-learning approaches to an evidence stream that exists in high volume, high velocity, high variety, and differing levels of veracity. In public health and medicine, the emergence of big data capabilities has followed naturally from the expansion of data streams from genome sequencing, protein identification, environmental surveillance, and passive patient sensing. In 2001, the National Committee on Vital and Health Statistics published a road map for connecting these evidence streams to each other through a national health information infrastructure. Since then, the road map has spurred national investments in electronic health records (EHRs) and motivated the integration of public surveillance data into analytic platforms for health situational awareness. 
More recently, the boom in consumer-oriented mobile applications and wireless medical sensing devices has opened up the possibility for mining new data flows directly from altruistic patients. In the broader public communication sphere, the ability to mine the digital traces of conversation on social media presents an opportunity to apply advanced machine learning algorithms as a way of tracking the diffusion of risk communication messages. In addition to utilizing big data for improving the scientific knowledge base in risk communication, there will be a need for health communication scientists and practitioners to work as part of interdisciplinary teams to improve the interfaces to these data for professionals and the public. Too much data, presented in disorganized ways, can lead to what some have referred to as “data smog.” Much work will be needed for understanding how to turn big data into knowledge, and just as important, how to turn data-informed knowledge into action.
