Predicting Heritability of Oil Palm Breeding Using Phenotypic Traits and Machine Learning

2021 ◽  
Vol 13 (22) ◽  
pp. 12613
Author(s):  
Najihah Ahmad Latif ◽  
Fatini Nadhirah Mohd Nain ◽  
Nurul Hashimah Ahamed Hassain Malim ◽  
Rosni Abdullah ◽  
Muhammad Farid Abdul Rahim ◽  
...  

Oil palm is one of the main crops grown to help achieve sustainability in Malaysia. Selecting the best breeds produces quality crops and increases crop yields. This study examines machine learning (ML) in oil palm breeding (OPB) using factors other than genetic data. First, data types, phenotypic traits, current ML models, and evaluation techniques are identified through a literature survey. The survey found that phenotype and genotype data are widely used in oil palm breeding programs. The average bunch weight, bunch number, and fresh fruit bunch are the most important characteristics that can influence the genetic improvement of progenies. Although machine learning approaches have been applied to increase the productivity of the crop, most studies focus on molecular markers or genotypes for plant breeding rather than on phenotype. Theoretically, ML applied to phenotypic data on offspring should be able to predict high breeding values. Therefore, a new ML conceptual framework for studying the phenotype and progeny data of oil palm breeds is presented at the end of the paper and discussed in relation to achieving the Sustainable Development Goals (SDGs).

Author(s):  
Mamehgol Yousefi ◽  
Azmin Shakrine ◽  
Samsuzana bt. Abd Aziz ◽  
Syaril Azrad ◽  
Mohamed Mazmira ◽  
...  

2021 ◽  
Vol 11 (8) ◽  
pp. 785
Author(s):  
Quentin Miagoux ◽  
Vidisha Singh ◽  
Dereck de Mézquita ◽  
Valerie Chaudru ◽  
Mohamed Elati ◽  
...  

Rheumatoid arthritis (RA) is a multifactorial, complex autoimmune disease that involves various genetic, environmental, and epigenetic factors. Systems biology approaches provide the means to study complex diseases by integrating different layers of biological information. Combining multiple data types can help compensate for missing or conflicting information and limit the possibility of false positives. In this work, we aim to unravel mechanisms governing the regulation of key transcription factors in RA and derive patient-specific models to gain more insights into the disease heterogeneity and the response to treatment. We first use publicly available transcriptomic datasets (peripheral blood) relative to RA and machine learning to create an RA-specific transcription factor (TF) co-regulatory network. The TF cooperativity network is subsequently enriched in signalling cascades and upstream regulators using a state-of-the-art, RA-specific molecular map. Then, the integrative network is used as a template to analyse patients’ data regarding their response to anti-TNF treatment and identify master regulators and upstream cascades affected by the treatment. Finally, we use the Boolean formalism to simulate in silico subparts of the integrated network and identify combinations and conditions that can switch on or off the identified TFs, mimicking the effects of single and combined perturbations.
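The Boolean simulation step described in this abstract can be illustrated with a minimal sketch. The node names and update rules below are hypothetical and are not taken from the paper's network; the sketch only shows how synchronous Boolean updates and a clamped perturbation can switch a target transcription factor on or off.

```python
# Hypothetical toy network: node names and rules are illustrative assumptions,
# not the RA-specific network built in the paper.
RULES = {
    "TNF": lambda s: s["TNF"],                          # input node keeps its value
    "INHIBITOR": lambda s: s["INHIBITOR"],              # input node keeps its value
    "NFKB": lambda s: s["TNF"] and not s["INHIBITOR"],  # activated by TNF unless inhibited
    "TF": lambda s: s["NFKB"],                          # target TF follows NFKB
}


def step(state, clamp=None):
    """Apply one synchronous update; clamped nodes are forced to fixed values."""
    nxt = {node: bool(rule(state)) for node, rule in RULES.items()}
    nxt.update(clamp or {})
    return nxt


def steady_state(initial, clamp=None, max_steps=50):
    """Iterate until a fixed point is reached; return None if none is found."""
    state = dict(initial)
    state.update(clamp or {})
    for _ in range(max_steps):
        nxt = step(state, clamp)
        if nxt == state:
            return state
        state = nxt
    return None


START = {"TNF": True, "INHIBITOR": False, "NFKB": False, "TF": False}
unperturbed = steady_state(START)
perturbed = steady_state(START, clamp={"INHIBITOR": True})  # single perturbation
```

Comparing `unperturbed["TF"]` with `perturbed["TF"]` shows whether the clamped node switches the target off. Combined perturbations would pass several nodes in `clamp`.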


2016 ◽  
Author(s):  
Shraddha Pai ◽  
Shirley Hui ◽  
Ruth Isserlin ◽  
Muhammad A Shah ◽  
Hussam Kaka ◽  
...  

Patient classification has widespread biomedical and clinical applications, including diagnosis, prognosis and treatment response prediction. A clinically useful prediction algorithm should be accurate and generalizable, and should be able to integrate diverse data types and handle sparse data. A clinical predictor based on genomic data needs to be easily interpretable to drive hypothesis-driven research into new treatments. We describe netDx, a novel supervised patient classification framework based on patient similarity networks. netDx meets the above criteria and particularly excels at data integration and model interpretability. As a machine learning method, netDx demonstrates consistently excellent performance in a cancer survival benchmark across four cancer types by integrating up to six genomic and clinical data types. In these tests, netDx has significantly higher average performance than most other machine-learning approaches across most cancer types, and its best model outperforms all other methods for two cancer types. In comparison to traditional machine learning-based patient classifiers, netDx results are more interpretable, visualizing the decision boundary in the context of patient similarity space. When patient similarity is defined by pathway-level gene expression, netDx identifies biological pathways important for outcome prediction, as demonstrated in diverse data sets of breast cancer and asthma. Thus, netDx can serve both as a patient classifier and as a tool for discovery of biological features characteristic of disease. We provide a complete software implementation of netDx along with sample files and automation workflows in R.


2020 ◽  
Vol 12 (22) ◽  
pp. 9320 ◽  
Author(s):  
Ana De Las Heras ◽  
Amalia Luque-Sendra ◽  
Francisco Zamora-Polo

The unprecedented urban growth of recent years requires improved urban planning and management to make urban spaces more inclusive, safe, resilient and sustainable. Additionally, humanity faces the COVID pandemic, which especially complicates the management of Smart Cities. A possible solution to address these two problems (environmental and health) in Smart Cities may be the use of Machine Learning techniques. One of the objectives of our work is to thoroughly analyze the link between the concepts of Smart Cities, Machine Learning techniques and their applicability. In this work, an exhaustive study of the relationship between Smart Cities and the applicability of Machine Learning (ML) techniques is carried out with the aim of optimizing sustainability. For this, the ML models, analyzed from the point of view of the models, techniques and applications, are studied. The areas and dimensions of sustainability addressed are analyzed, and the Sustainable Development Goals (SDGs) are discussed. The main objective is to propose a model (EARLY) that allows us to tackle these problems in the future. An inclusive perspective on applicability, sustainability scopes and dimensions, SDGs, tools, data types and Machine Learning techniques is provided. Finally, a case study applied to an Andalusian city is presented.


Author(s):  
Glen Williams ◽  
Lucas Puentes ◽  
Jacob Nelson ◽  
Jessica Menold ◽  
Conrad Tucker ◽  
...  

Online data repositories provide designers and engineers with a convenient source of data. Over time, the wealth and type of readily available data within online repositories have greatly expanded. This data increase permits new uses for machine learning approaches which rely on large, high-dimensional datasets. However, a comparison of the efficacies of attribute-based data, which lends itself well to traditional machine learning algorithms, and form-based data, which lends itself to deep machine learning algorithms, has not been fully established. This paper presents one such comparison for an exemplar dataset. As the efficacy of different machine learning approaches may depend on the specific application, this method is intended to lay the groundwork for future studies that produce more extensive comparisons. Specifically, this work makes use of a manufactured gear dataset for sale price prediction. Two traditional machine learning algorithms, M5Rules and SMOreg, are selected due to their applicability to the gear attribute-based data. These algorithms are compared to a neural network model that is trained on a voxelized version of the gears’ 3D models, defined in this work as form-based data. Results show that both data types provide comparable predictive accuracy.
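As a rough illustration of what "voxelized" form-based data means, the sketch below converts a set of 3D surface points into occupied voxel indices on a cubic grid. It assumes the model is available as a point list; the abstract does not state how the gears' 3D models were actually voxelized, and the function name and `resolution` parameter are hypothetical.

```python
def voxelize(points, resolution):
    """Return the set of occupied (i, j, k) voxel indices for a list of 3D points.

    Assumption: the shape is given as (x, y, z) points and is scaled uniformly
    into a cubic grid with `resolution` cells per axis.
    """
    if not points:
        return set()
    mins = [min(p[axis] for p in points) for axis in range(3)]
    maxs = [max(p[axis] for p in points) for axis in range(3)]
    # Use the largest extent for all axes so the shape is not distorted.
    extent = max(hi - lo for lo, hi in zip(mins, maxs)) or 1.0
    occupied = set()
    for p in points:
        index = tuple(
            min(int((p[axis] - mins[axis]) / extent * resolution), resolution - 1)
            for axis in range(3)
        )
        occupied.add(index)
    return occupied
```

The resulting index set can be expanded into a dense 0/1 grid, which is the kind of fixed-size input a neural network can consume.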


Cancers ◽  
2020 ◽  
Vol 12 (10) ◽  
pp. 2811
Author(s):  
Gift Nyamundanda ◽  
Katherine Eason ◽  
Justin Guinney ◽  
Christopher J. Lord ◽  
Anguraj Sadanandam

One of the major challenges in defining clinically relevant and less heterogeneous tumor subtypes is assigning biological and/or clinical interpretations to etiological (intrinsic) subtypes. Conventional clustering/subtyping approaches often fail to define such subtypes, as they involve several discrete steps. Here we demonstrate a unique machine-learning method, phenotype mapping (PhenMap), which jointly integrates single omics data with phenotypic information using three published breast cancer datasets (n = 2045). The PhenMap framework uses a modified factor analysis method that is governed by the key assumption that features from different omics data types are correlated due to specific “hidden/mapping” variables, termed context-specific mapping variables (CMV). These variables can be simultaneously modeled with phenotypic data as covariates to yield functional subtypes and their associated features (e.g., genes) and phenotypes. In one example, we demonstrate the identification and validation of six novel “functional” (discrete) subtypes with differential responses to a cyclin-dependent kinase (CDK)4/6 inhibitor and etoposide by jointly integrating transcriptome profiles with four different drug response data from 37 breast cancer cell lines. These robust subtypes are also present in patient breast tumors with different prognoses. In another example, we modeled patient gene expression profiles and clinical covariates together to identify continuous subtypes with clinical/biological implications. Overall, this genome-phenome machine-learning integration tool, PhenMap, identifies functional and phenotype-integrated discrete or continuous subtypes with clinical translational potential.


2021 ◽  
Vol 12 ◽  
Author(s):  
Temidayo Adeluwa ◽  
Brett A. McGregor ◽  
Kai Guo ◽  
Junguk Hur

A major challenge in drug development is safety and toxicity concerns due to drug side effects. One such side effect, drug-induced liver injury (DILI), is considered a primary factor in regulatory clearance. The Critical Assessment of Massive Data Analysis (CAMDA) 2020 CMap Drug Safety Challenge goal was to develop prediction models based on gene perturbation of six preselected cell-lines (CMap L1000), extended structural information (MOLD2), toxicity data (TOX21), and FDA reporting of adverse events (FAERS). Four types of DILI classes were targeted, including two clinically relevant scores and two control classifications, designed by the CAMDA organizers. The L1000 gene expression data had variable drug coverage across cell lines, with only 247 out of 617 drugs in the study measured in all six cell types. We addressed this coverage issue by using Kru-Bor ranked merging to generate a singular drug expression signature across all six cell lines. These merged signatures were then narrowed down to the top and bottom 100, 250, 500, or 1,000 genes most perturbed by drug treatment. These signatures were subjected to feature selection using Fisher’s exact test to identify genes predictive of DILI status. Models based solely on expression signatures had varying results for clinical DILI subtypes, with accuracy ranging from 0.49 to 0.67 and Matthews Correlation Coefficient (MCC) values ranging from -0.03 to 0.1. Models built using FAERS, MOLD2, and TOX21 had similar results in predicting clinical DILI scores, with accuracy ranging from 0.56 to 0.67 and MCC scores ranging from 0.12 to 0.36. To incorporate these various data types with expression-based models, we utilized soft, hard, and weighted ensemble voting methods using the top three performing models for each DILI classification. These voting models achieved balanced accuracies of up to 0.54 and 0.60 for the two clinically relevant DILI subtypes. Overall, from our experiment, traditional machine learning approaches may not be optimal as a classification method for the current data.
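For readers unfamiliar with the two metrics reported in this abstract, the following sketch computes the Matthews Correlation Coefficient and balanced accuracy from binary confusion-matrix counts using their standard definitions. The function names are illustrative, and the convention of returning 0.0 when a denominator is zero is an assumption of this sketch, not something stated in the abstract.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient for a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report 0.0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def balanced_accuracy(tp, tn, fp, fn):
    """Mean of sensitivity (true positive rate) and specificity (true negative rate)."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return (sensitivity + specificity) / 2
```

An MCC near 0 and a balanced accuracy near 0.5 both correspond to chance-level prediction on a binary task, which helps put the reported ranges in context.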


Genetika ◽  
2020 ◽  
Vol 52 (3) ◽  
pp. 1021-1029
Author(s):  
Rad Naroui ◽  
Gholamali Keykha ◽  
Jahangir Abbaskoohpayegani ◽  
Ramin Rafezi

Phenotyping of native cultivars is becoming increasingly essential, as they are an important genetic source for breeders. The variability of morphological properties plays a critical role in melon breeding. In this paper, various machine learning approaches were implemented to identify melon accession classes. A field experiment was conducted at the Zahak Agriculture station to differentiate 144 melon accessions based on 14 traits. For this, Partial Least Square Discriminant Analysis (PLS-DA), Support Vector Machine (SVM), Random Forest (RF), K-Nearest Neighbor (KNN) and Classification And Regression Trees (CART) were compared. The most commonly used performance measures, comprising overall accuracy, kappa value, Receiver Operating Characteristics (ROC) and Area Under Curve (AUC), were used to assess the accuracy of the models. CART showed the best performance among the compared methods. The AUC and kappa value were 0.85 and 0.80, respectively, and fruit weight was the most important trait affecting diversity in melon accessions. Based on these results, CART is reliable for identifying melon accession classes.
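The kappa value mentioned in this abstract measures agreement between predicted and true classes beyond what chance would produce. A minimal sketch of Cohen's kappa under its standard definition follows; the function name and the toy labels in the usage note are illustrative and are not data from the study.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement from the marginal class frequencies of both label lists.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # both lists contain a single identical class
    return (observed - expected) / (1 - expected)
```

For example, `cohen_kappa(list("aabb"), list("abbb"))` has observed agreement 0.75 and chance agreement 0.5, so it evaluates to 0.5.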

