PhGC: A Machine Learning Based Workflow for Phenotype-Genotype Co-analysis on Autism

10.29007/ctfl ◽  
2020 ◽  
Author(s):  
Safa Shubbar ◽  
Chen Fu ◽  
Zhi Liu ◽  
Anthony Wynshaw-Boris ◽  
Qiang Guan

Autism spectrum disorder (ASD) is a heterogeneous disorder, and diagnostic tools attempt to identify homogeneous subtypes within it. Previous studies have found many behavioral/physiological comorbidities for ASD, but the association between these comorbidities and the underlying genetic mechanisms remains unknown. In this paper, we leverage machine learning to investigate the relationship between genotype and phenotype in ASD. To this end, we propose the PhGC pipeline, which uses a machine learning approach to identify behavioral phenotypes of ASD based on their corresponding genomics data. We use unsupervised clustering algorithms to extract the core members of each cluster, then profile the core-member subsets to explore their characteristics using genotype data from the same dataset. Our genome annotation results showed that most of the alleles whose frequencies differ among clusters were represented by the core members.
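The abstract does not say how a "core member" of a cluster is defined. As a minimal sketch only, the following assumes that cluster labels are already available and treats the fraction of points closest to each cluster centroid as the core; the function name `core_members`, the `keep_fraction` parameter, and the toy data are all hypothetical and are not taken from the PhGC paper.

```python
import math
from collections import defaultdict


def core_members(points, labels, keep_fraction=0.5):
    """Return, per cluster label, the indices of the points nearest the centroid.

    Assumption: "core" means the keep_fraction of members closest to the
    cluster centroid in Euclidean distance. The paper may define it differently.
    """
    by_cluster = defaultdict(list)
    for idx, label in enumerate(labels):
        by_cluster[label].append(idx)

    cores = {}
    for label, members in by_cluster.items():
        dim = len(points[members[0]])
        # Centroid = coordinate-wise mean of the cluster's points.
        centroid = [
            sum(points[i][d] for i in members) / len(members) for d in range(dim)
        ]
        # Rank members by distance to the centroid, nearest first.
        ranked = sorted(members, key=lambda i: math.dist(points[i], centroid))
        n_keep = max(1, math.ceil(keep_fraction * len(members)))
        cores[label] = sorted(ranked[:n_keep])
    return cores


# Hypothetical 2-D phenotype scores with precomputed cluster labels.
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (2.0, 2.0),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.2), (9.0, 9.0),
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

cores = core_members(points, labels, keep_fraction=0.75)
print(cores)
```

In this toy data, the outlying point of each cluster (indices 3 and 7) is excluded from the core, so downstream profiling would only see the tightly grouped members.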

2018 ◽  
Vol 74 (2) ◽  
pp. 210-224 ◽  
Author(s):  
Jernej Jevšenak ◽  
Sašo Džeroski ◽  
Saša Zavadlav ◽  
Tom Levanič

2020 ◽  
Vol 2020 ◽  
pp. 1-13
Author(s):  
Qingfeng Zhou ◽  
Chun Janice Wong ◽  
Xian Su

Since the number of bicycles is critical to the sustainable development of dockless PBS, this research applied a machine learning approach to bicycle quantity management using OFO bike operation data in Shenzhen. First, two clustering algorithms were used to identify bicycle gathering areas, and the available bike number and its coefficient of variation were analyzed for each type of bicycle gathering area. Second, five classification algorithms were compared on their accuracy in distinguishing the types of bicycle gathering areas using 25 impact factors. Finally, we explored how the knowledge gained from existing dockless bicycle operation data can guide the number planning and management of public bicycles. We found the following. (1) There were 492 OFO bicycle gathering areas, which can be divided into four types: high inefficient, normal inefficient, high efficient, and normal efficient. The high inefficient and normal inefficient areas gathered about 110,000 bicycles with low usage. (2) Having more types of bicycle gathering area affects the accuracy of the classification algorithm. Among the five classification algorithms, random forest performed best at identifying bicycle gathering area types, with an accuracy of more than 75%. (3) The characteristics of the 25 impact factors differed clearly across the four types of bicycle gathering areas. It is feasible to use these factors to predict area type in order to optimize the number of available bicycles, reduce operating costs, and improve utilization efficiency. This work helps operators and government understand the characteristics of dockless PBS and contributes to the long-term sustainable development of the system through a machine learning approach.
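The coefficient of variation of the available bike number mentioned above is the standard deviation divided by the mean. The sketch below illustrates that ratio on made-up hourly counts; the use of the population standard deviation, the function name, and the sample values are assumptions, since the abstract does not specify how the statistic was computed.

```python
import statistics


def coefficient_of_variation(counts):
    """Population standard deviation divided by the mean of the counts."""
    mean = statistics.fmean(counts)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return statistics.pstdev(counts) / mean


# Hypothetical available-bike counts sampled at four times of day.
stable_area = [40, 42, 38, 40]
volatile_area = [10, 70, 20, 60]

cv_stable = coefficient_of_variation(stable_area)
cv_volatile = coefficient_of_variation(volatile_area)
print(f"stable: {cv_stable:.3f}, volatile: {cv_volatile:.3f}")
```

Both hypothetical areas average 40 available bikes, but the second fluctuates far more, which is the kind of difference the coefficient of variation is meant to expose.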


2018 ◽  
Author(s):  
Bun Yamagata ◽  
Takashi Itahashi ◽  
Junya Fujino ◽  
Haruhisa Ohta ◽  
Motoaki Nakamura ◽  
...  

An endophenotype is a measurable and heritable component between genetics and diagnosis that exists in both individuals with a diagnosis and their unaffected siblings. We aimed to identify an endophenotype pattern consisting of multiple connections. We enrolled adult male individuals with the autism spectrum disorder (ASD) endophenotype (i.e., individuals with ASD and their unaffected siblings) and individuals without the ASD endophenotype (i.e., pairs of typical development (TD) siblings) and used a machine learning approach to classify people with and without the endophenotype based on resting-state functional connections (FCs). A sparse logistic regression successfully classified people according to endophenotype (area under the curve=0.78, classification accuracy=75%), suggesting the existence of an endophenotype pattern. A binomial test identified nine FCs that were consistently selected as inputs for the classifier. The least absolute shrinkage and selection operator with these nine FCs predicted the severity of communication impairment among individuals with ASD (r=0.68, p=0.021). In addition, two of the nine FCs were statistically significantly correlated with the severity of communication impairment (r=0.81, p=0.0026 and r=-0.60, p=0.049). The current findings suggest that an ASD endophenotype pattern exists in FCs in a multivariate manner and is associated with the clinical ASD phenotype.
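A binomial test of the kind mentioned above asks whether a feature is selected more often than chance would predict. The following is a minimal sketch of a one-sided test; the number of classifier runs, the selection count, and the chance selection rate are hypothetical, because the abstract does not report them.

```python
from math import comb


def binomial_tail(k, n, p):
    """One-sided p-value: P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical setting: one connection is selected in 9 of 10 classifier
# runs, and the assumed chance rate of being selected in any one run is 0.5.
n_runs = 10
n_selected = 9
chance_rate = 0.5

p_value = binomial_tail(n_selected, n_runs, chance_rate)
print(f"p = {p_value:.4f}")
```

Under these assumed numbers, a small p-value would indicate that the connection is selected more consistently than the assumed chance rate allows.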

