Evolutionary Sparse Learning for Phylogenomics

2021 ◽  
Author(s):  
Sudhir Kumar ◽  
Sudip Sharma

We introduce a supervised machine learning approach with sparsity constraints for phylogenomics, referred to as evolutionary sparse learning (ESL). ESL builds models with genomic loci (such as genes, proteins, genomic segments, and positions) as parameters. Using the Least Absolute Shrinkage and Selection Operator (LASSO), ESL selects only the most important genomic loci to explain a given phylogenetic hypothesis or the presence/absence of a trait. ESL does not directly model conventional parameters such as rates of substitution between nucleotides, rate variation among positions, and phylogeny branch lengths. Instead, ESL directly employs the concordance of variation across sequences in an alignment with the evolutionary hypothesis of interest. ESL provides a natural way to combine different molecular and non-molecular data types and to incorporate biological and functional annotations of genomic loci directly in model building. We propose positional, gene, function, and hypothesis sparsity scores, illustrate their use through an example, and suggest several applications of ESL. The ESL framework has the potential to drive the development of a new class of computational methods that will complement traditional approaches in evolutionary genomics. ESL's fast computation times and small memory footprint will also help democratize big data analytics and improve scientific rigor in phylogenomics.
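As a rough illustration of the LASSO step described above (a sketch, not the authors' implementation), the snippet below one-hot encodes alignment positions and fits a sparse linear model against a hypothetical branch/trait label; the alignment, labels, and regularization strength are all assumptions.

```python
# Illustrative sketch of LASSO-based locus selection in the spirit of ESL.
# The alignment, labels, and alpha are hypothetical; this is not the authors' code.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import OneHotEncoder

# Toy alignment: rows = sequences (species), columns = positions.
alignment = np.array([list("ACGTA"),
                      list("ACGTT"),
                      list("TCGAA"),
                      list("TCGAT")])
# Hypothetical response: +1/-1 for sequences on either side of a tested branch.
y = np.array([1, 1, -1, -1])

# One-hot encode each alignment position so every (position, base) pair is a feature.
X = OneHotEncoder(sparse_output=False).fit_transform(alignment)

# LASSO drives most coefficients to exactly zero, keeping only informative positions.
model = Lasso(alpha=0.1).fit(X, y)
informative = np.flatnonzero(model.coef_)  # indices of retained (position, base) features
print("retained features:", informative)
```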

2020 ◽  
Author(s):  
Fernando Lopes ◽  
Larissa R Oliveira ◽  
Amanda Kessler ◽  
Yago Beux ◽  
Enrique Crespo ◽  
...  

Abstract The phylogeny and systematics of fur seals and sea lions (Otariidae) have long been studied with diverse data types, including an increasing amount of molecular data. However, only a few phylogenetic relationships have reached acceptance because of strong gene-tree/species-tree discordance. Divergence time estimates for the group also vary widely between studies. These uncertainties have impeded understanding of the biogeographical history of the group, such as when and how trans-equatorial dispersal and subsequent speciation events occurred. Here we used high-coverage genome-wide sequencing for 14 of the 15 species of Otariidae to elucidate the phylogeny of the family and its bearing on taxonomy and biogeographical history. Despite extreme topological discordance among gene trees, we found a fully supported species tree that agrees with the few well-accepted relationships and establishes the monophyly of the genus Arctocephalus. Our data support a relatively recent trans-hemispheric dispersal at the base of a southern clade, which rapidly diversified into six major lineages between 3 and 2.5 Ma. Otaria diverged first, followed by Phocarctos and then four major lineages within Arctocephalus. However, we found Zalophus to be non-monophyletic, with the California sea lion (Z. californianus) grouping closer to the Steller sea lion (Eumetopias jubatus) than to the Galapagos sea lion (Z. wollebaeki), with evidence for introgression between the two genera. Overall, the high degree of genealogical discordance was best explained by incomplete lineage sorting resulting from quasi-simultaneous speciation within the southern clade, with introgression playing a subordinate role in explaining the incongruence among and within prior phylogenetic studies of the family.
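A common way to frame the ILS-versus-introgression argument (illustrative only; not the authors' pipeline) is to count the three possible rooted-triplet topologies across gene trees: under pure incomplete lineage sorting the two minor topologies are expected at equal frequency, so a significant skew points to gene flow. The counts below are made up.

```python
# Sketch: test whether gene-tree discordance is symmetric (ILS-like) or skewed
# (suggesting introgression). Topology counts are hypothetical.
from scipy.stats import binomtest

# Hypothetical gene-tree counts for the three resolutions of a species triplet.
counts = {
    "((californianus,jubatus),wollebaeki)": 5200,  # majority (species-tree) topology
    "((californianus,wollebaeki),jubatus)": 2900,  # minor topology 1
    "((jubatus,wollebaeki),californianus)": 1900,  # minor topology 2
}
major, minor1, minor2 = counts.values()

# Under incomplete lineage sorting alone, the two minor topologies should be
# equally frequent; a significant excess of one suggests gene flow.
test = binomtest(minor1, minor1 + minor2, p=0.5)
print(f"minor-topology skew p-value: {test.pvalue:.3g}")
```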


2020 ◽  
Vol 4 (2) ◽  
pp. 5 ◽  
Author(s):  
Ioannis C. Drivas ◽  
Damianos P. Sakas ◽  
Georgios A. Giannakopoulos ◽  
Daphne Kyriaki-Manessi

In the Big Data era, search engine optimization deals with the encapsulation of datasets related to website performance in terms of architecture, content curation, and user behavior, with the purpose of converting them into actionable insights that improve visibility and findability on the Web. In this respect, big data analytics expands the opportunities for developing new methodological frameworks composed of valid, reliable, and consistent analytics that are practically useful for developing well-informed strategies for organic traffic optimization. In this paper, a novel methodology is implemented to increase organic search engine visits based on the impact of multiple SEO factors. To achieve this, the authors examined 171 cultural heritage websites and the retrieved analytics about their performance and the user experience within them. Cultural heritage organizations include and present massive Web-based collections through their websites. Users interact with these collections, producing behavioral analytics in a variety of data types that come from multiple devices, at high velocity, and in large volumes. Nevertheless, prior research indicates that these massive cultural collections are difficult to browse and exhibit low visibility and findability in the semantic Web era. Against this backdrop, this paper proposes the computational development of a search engine optimization (SEO) strategy that utilizes the generated big cultural data analytics to improve the visibility of cultural heritage websites. Going one step further, the statistical results of the study are integrated into a two-stage predictive model. First, a fuzzy cognitive mapping process is generated as an aggregated macro-level descriptive model. Second, a micro-level data-driven agent-based model follows. The purpose of the model is to predict the most effective combinations of factors for achieving enhanced visibility and organic traffic on cultural heritage organizations' websites. To this end, the study contributes to the knowledge of researchers and practitioners in the big cultural analytics sector who aim to implement strategies for greater visibility and findability of cultural collections on the Web.
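A fuzzy cognitive map of SEO factors can be simulated with a simple iterative update: concept activations are propagated through a signed influence matrix and squashed until a fixed point. The concepts and weights below are invented for illustration and are not taken from the study.

```python
# Minimal fuzzy cognitive map (FCM) sketch: iterate activations through a
# signed influence matrix with a sigmoid squashing function.
import numpy as np

concepts = ["page_speed", "content_curation", "crawl_errors", "organic_traffic"]
# W[i, j] = influence of concept i on concept j (hypothetical weights).
W = np.array([
    [0.0, 0.0, -0.3, 0.6],   # faster pages reduce crawl errors, raise traffic
    [0.0, 0.0, 0.0, 0.5],    # better curation raises traffic
    [0.0, 0.0, 0.0, -0.7],   # crawl errors suppress traffic
    [0.0, 0.0, 0.0, 0.0],    # traffic is an outcome node
])

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
a = np.array([0.9, 0.7, 0.2, 0.5])  # initial activation scenario

for _ in range(50):                  # iterate toward a fixed point
    a_next = sigmoid(a @ W + a)      # incoming influence plus self-memory term
    if np.allclose(a_next, a, atol=1e-6):
        break
    a = a_next

print(dict(zip(concepts, a.round(3))))
```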


2020 ◽  
Vol 6 (1) ◽  
pp. 67-101
Author(s):  
Yong Gui ◽  
Ronggui Huang ◽  
Yi Ding

Left-leaning social thought is not a unitary and coherent theoretical system, and leftists can be divided into divergent groups. Based on inductive qualitative observations, this article proposes a theoretical typology along two dimensions, theoretical resources and position orientations, to describe left-wing social thought communicated in online space. Empirically, we used a mixed approach, integrating case observations with big-data analyses of Weibo tweets, to investigate three types of left-leaning social thought: state-centered leftism, populist leftism, and liberal leftism, consistent with the proposed typology. State-centered leftism features strong support of the state and the current regime and a negative attitude toward the West; populist leftism is characterized by unequivocal affirmation of the revolutionary legacy and support for disadvantaged grassroots; and liberal leftism harbors a grassroots position and a decided affirmation of individual rights. In addition, we used supervised machine learning and social network analysis techniques to identify online communities that harbor the aforementioned left-leaning social thoughts, and we analyzed the interaction patterns within and across communities as well as the evolution of community structures. We found that during the study period of 2012–2014, the liberal leftists gradually declined and the corresponding communities dissolved, while interactions between populist leftists and state-centered leftists intensified, and the ideational cleavage between these two camps increased online confrontation. This article demonstrates that the mixed-method approach of integrating traditional methods with big-data analytics has enormous potential in the sub-discipline of digital sociology.
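For the supervised-classification step, a minimal sketch might look like the following, assuming a TF-IDF text representation and labels hand-coded from the case observations; neither the features nor the labels are specified in the abstract, and the placeholder texts are English stand-ins for segmented Weibo posts.

```python
# Sketch: classify tweets into three left-leaning camps with TF-IDF features
# and logistic regression. Texts and labels are placeholders, not study data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

tweets = [
    "support the state and oppose western interference",   # placeholder posts
    "defend the revolutionary legacy and the grassroots",
    "individual rights and grassroots voices come first",
]
labels = ["state-centered", "populist", "liberal"]          # hand-coded labels

clf = make_pipeline(
    TfidfVectorizer(max_features=20000),   # bag-of-words features per tweet
    LogisticRegression(max_iter=1000),
)
clf.fit(tweets, labels)
print(clf.predict(["the grassroots deserve individual rights"]))
```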


Data & Policy ◽  
2021 ◽  
Vol 3 ◽  
Author(s):  
Munisamy Gopinath ◽  
Feras A. Batarseh ◽  
Jayson Beckman ◽  
Ajay Kulkarni ◽  
Sei Jeong

Abstract Focusing on seven major agricultural commodities with a long history of trade, this study employs data-driven analytics, namely supervised machine learning (ML) and neural networks, to decipher patterns of trade. The supervised ML and neural network techniques are trained on data up to 2010 and 2014, respectively. Results show the high relevance of ML models for forecasting trade patterns in the near and long term relative to traditional approaches, which are often subjective assessments or time-series projections. While supervised ML techniques quantified key economic factors underlying agricultural trade flows, neural network approaches provided better fits over the long term.
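A schematic version of the comparison could be set up as below; the predictors, the synthetic data frame, and the particular model classes are assumptions for illustration, not the study's actual commodities, variables, or specifications.

```python
# Sketch: train a tree-based ML model and a neural network on pre-cutoff years
# and compare out-of-sample forecasts of a trade-flow target. Data are synthetic.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "year": np.repeat(np.arange(1990, 2020), 7),   # 30 years x 7 commodities
    "gdp_importer": rng.lognormal(3, 1, 210),      # hypothetical predictors
    "exchange_rate": rng.normal(1, 0.2, 210),
    "tariff": rng.uniform(0, 0.3, 210),
})
df["trade_flow"] = (2 * df.gdp_importer - 50 * df.tariff
                    + rng.normal(0, 1, 210))       # synthetic target

features = ["gdp_importer", "exchange_rate", "tariff"]
train, test = df[df.year <= 2010], df[df.year > 2010]   # train on data until 2010

for name, model in [("gradient boosting", GradientBoostingRegressor()),
                    ("neural net", MLPRegressor(hidden_layer_sizes=(32, 32),
                                                max_iter=2000, random_state=0))]:
    model.fit(train[features], train.trade_flow)
    mae = mean_absolute_error(test.trade_flow, model.predict(test[features]))
    print(f"{name}: test MAE = {mae:.2f}")
```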


2020 ◽  
Vol 11 (4) ◽  
pp. 6870-6875
Author(s):  
Prem Jacob T ◽  
Polakam Sukanya ◽  
Thatiparthi Madhavi

The segmentation of magnetic resonance images plays a critical role in medical applications because it extracts the required region from the image. In general, there is no single universal approach to image segmentation. Tumor segmentation from MRI data is a critical, time-consuming manual task performed by medical specialists. In this paper, a brain cancer prediction system is described. The framework uses computer-based methods to detect tumor blocks and classify the tumor using an artificial neural network (ANN). Image processing techniques such as histogram equalization, image segmentation, image enhancement, and feature extraction have been developed for detecting brain tumors in the MRI images of cancer patients. This paper focuses on a new and widely used algorithm for brain tumor segmentation of MRI scan images using ANN and SVM classifiers, chosen for their simplicity and computational efficiency, to accurately localize the cancerous region. The MATLAB output is displayed on a PC, and the result is also transferred to an embedded system over wired communication. To the best of our knowledge in the area of medical big data analytics, none of the existing work has focused on both data types. Compared with several runs of standard algorithms, the proposed algorithm achieves a computation accuracy of 94.8%, with a convergence speed faster than that of decision-tree disease risk prediction.
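As a hedged illustration of the classification step (not the paper's MATLAB implementation), the sketch below trains an SVM on a few hand-crafted intensity and texture features extracted from image patches; the feature set and data are invented.

```python
# Sketch: classify MRI regions as tumor / non-tumor from simple hand-crafted
# features using an SVM. Features and data are synthetic placeholders.
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def extract_features(region: np.ndarray) -> list:
    """Toy feature vector: mean intensity, variance, and edge energy."""
    gy, gx = np.gradient(region.astype(float))
    return [region.mean(), region.var(), float(np.hypot(gx, gy).mean())]

rng = np.random.default_rng(1)
healthy = [rng.normal(100, 10, (16, 16)) for _ in range(40)]   # synthetic patches
tumor = [rng.normal(160, 30, (16, 16)) for _ in range(40)]

X = np.array([extract_features(r) for r in healthy + tumor])
y = np.array([0] * 40 + [1] * 40)                              # 0 = healthy, 1 = tumor

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X, y)
print("training accuracy:", clf.score(X, y))
```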


2020 ◽  
Author(s):  
Mike A. Nalls ◽  
Cornelis Blauwendraat ◽  
Lana Sargent ◽  
Dan Vitale ◽  
Hampton Leonard ◽  
...  

Summary
Background: Previous research using genome-wide association studies (GWAS) has identified variants that may contribute to lifetime risk of multiple neurodegenerative diseases. However, whether there are common mechanisms that link neurodegenerative diseases is uncertain. Here, we focus on one gene, GRN, encoding progranulin, and the potential mechanistic interplay between genetic risk, gene expression in the brain, and inflammation across multiple common neurodegenerative diseases.
Methods: We utilized GWAS, expression quantitative trait locus (eQTL) mapping, and Bayesian colocalization analyses to evaluate potential causal and mechanistic inferences. We integrate various molecular data types from public resources to infer disease connectivity and shared mechanisms using a data-driven process.
Findings: eQTL analyses combined with GWAS identified significant functional associations between increasing genetic risk in the GRN region and decreased expression of the gene in Parkinson's, Alzheimer's, and amyotrophic lateral sclerosis. Additionally, colocalization analyses show a connection between blood-based inflammatory biomarkers relating to platelets and GRN expression in the frontal cortex.
Interpretation: GRN expression mediates neuroinflammatory function related to general neurodegeneration. This analysis suggests shared mechanisms for Parkinson's, Alzheimer's, and amyotrophic lateral sclerosis.
Funding: National Institute on Aging, National Institute of Neurological Disorders and Stroke, and the Michael J. Fox Foundation.
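For the Bayesian colocalization step, a compact sketch of the standard approximate-Bayes-factor approach (in the style of the widely used coloc method; the z-scores, variances, and priors below are placeholders, not the study's data) could look like this:

```python
# Sketch of ABF-based colocalization (coloc-style): compute per-SNP Wakefield
# approximate Bayes factors for two traits, then posterior probabilities for
# hypotheses H0-H4. Inputs are synthetic placeholders.
import numpy as np
from scipy.special import logsumexp

def log_abf(z, v, w=0.04):
    """Wakefield log approximate Bayes factor; v = var(beta), w = prior variance."""
    r = w / (v + w)
    return 0.5 * np.log(1 - r) + 0.5 * r * z**2

rng = np.random.default_rng(2)
n_snps = 500
z1, v1 = rng.normal(0, 1.5, n_snps), np.full(n_snps, 0.01)  # trait 1 (e.g., GWAS)
z2, v2 = rng.normal(0, 1.5, n_snps), np.full(n_snps, 0.01)  # trait 2 (e.g., eQTL)

l1, l2 = log_abf(z1, v1), log_abf(z2, v2)
p1, p2, p12 = 1e-4, 1e-4, 1e-5        # conventional coloc prior probabilities

lH0 = 0.0
lH1 = np.log(p1) + logsumexp(l1)       # causal SNP for trait 1 only
lH2 = np.log(p2) + logsumexp(l2)       # causal SNP for trait 2 only
# H3: two distinct causal SNPs -> sum over i != j, computed by subtraction.
l_both = logsumexp(l1) + logsumexp(l2)
l_same = logsumexp(l1 + l2)
lH3 = np.log(p1) + np.log(p2) + l_same + np.log(np.expm1(l_both - l_same))
lH4 = np.log(p12) + l_same             # H4: one shared causal SNP

lh = np.array([lH0, lH1, lH2, lH3, lH4])
pp = np.exp(lh - logsumexp(lh))        # normalize to posterior probabilities
print({f"PP{i}": round(float(p), 3) for i, p in enumerate(pp)})
```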


2021 ◽  
Vol 40 (5) ◽  
pp. 324-334
Author(s):  
Rongxin Huang ◽  
Zhigang Zhang ◽  
Zedong Wu ◽  
Zhiyuan Wei ◽  
Jiawei Mei ◽  
...  

Seismic imaging using full-wavefield data that includes primary reflections, transmitted waves, and their multiples has been the holy grail for generations of geophysicists. Using full-wavefield data effectively requires a forward-modeling process to generate full-wavefield data, an inversion scheme to minimize the difference between modeled and recorded data, and, more importantly, an accurate velocity model to correctly propagate and collapse the energy of different wave modes. All of these elements have been embedded in the framework of full-waveform inversion (FWI) since it was proposed three decades ago. However, for a long time the application of FWI did not find its way into the domain of full-wavefield imaging, mostly owing to the lack of data sets with good constraints to ensure the convergence of inversion, of the compute power required to handle large data sets and extend the inversion frequency to the bandwidth needed for imaging, and, most significantly, of stable FWI algorithms that could work with different data types in different geologic settings. Recently, with the advancement of high-performance computing and progress in FWI algorithms at tackling issues such as cycle skipping and amplitude mismatch, FWI has found success using different data types in a variety of geologic settings, providing some of the most accurate velocity models for generating significantly improved migration images. Here, we take a step further and modify the FWI workflow to output the subsurface image or reflectivity directly, potentially eliminating the need for the time-consuming conventional seismic imaging process of preprocessing, velocity model building, and migration. Compared with a conventional migration image, the reflectivity image output directly from FWI often provides additional structural information with better illumination and a higher signal-to-noise ratio, naturally, as a result of many iterations of least-squares fitting of the full-wavefield data.
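The core of the inversion scheme, iteratively updating a model to minimize the misfit between modeled and recorded data, can be sketched with a toy linear operator standing in for the wave-equation solver. Everything below is schematic: a real FWI uses finite-difference wave propagation and adjoint-state gradients, not a random matrix.

```python
# Schematic inversion loop in the spirit of FWI: update a model m to minimize
# ||d_obs - F(m)||^2. A linear operator G stands in for the (nonlinear)
# wave-equation forward modeling; this is a toy, not real FWI.
import numpy as np

rng = np.random.default_rng(3)
n_model, n_data = 50, 200
G = rng.normal(size=(n_data, n_model))            # stand-in forward operator
m_true = np.zeros(n_model); m_true[20:30] = 1.0   # "true" velocity anomaly
d_obs = G @ m_true + rng.normal(0, 0.01, n_data)  # recorded data with noise

m = np.zeros(n_model)                             # starting model
step = 1.0 / np.linalg.norm(G, 2) ** 2            # stable steepest-descent step
for it in range(200):
    residual = G @ m - d_obs                      # modeled minus recorded data
    m -= step * (G.T @ residual)                  # adjoint applied to residual
print("model error:", np.linalg.norm(m - m_true))
```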


Web Services ◽  
2019 ◽  
pp. 1430-1443
Author(s):  
Louise Leenen ◽  
Thomas Meyer

Governments, military forces, and other organisations responsible for cybersecurity deal with vast amounts of data that have to be understood in order to lead to intelligent decision making. Due to the vast amounts of information pertinent to cybersecurity, automation is required for processing and decision making, specifically to present advance warning of possible threats. The ability to detect patterns in vast data sets, and to understand the significance of detected patterns, is essential in the cyber defence domain. Big data technologies supported by semantic technologies can improve cybersecurity, and thus cyber defence, by providing support for the processing and understanding of the huge amounts of information in the cyber environment. The term big data analytics refers to advanced analytic techniques such as machine learning, predictive analysis, and other intelligent processing techniques applied to large data sets that contain different data types. The purpose is to detect patterns, correlations, trends, and other useful information. Semantic technology is a knowledge representation paradigm in which the meaning of data is encoded separately from the data itself. The use of semantic technologies such as logic-based systems to support decision making is becoming increasingly popular. However, most automated systems are currently based on syntactic rules, which are generally not sophisticated enough to deal with the complexity of the decisions that must be made. The incorporation of semantic information allows for increased understanding and sophistication in cyber defence systems. This paper argues that both big data analytics and semantic technologies are necessary to provide countermeasures against cyber threats. An overview of the use of semantic technologies and big data technologies in cyber defence is provided, and important areas for future research in the combined domains are discussed.
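To make concrete the idea of meaning being encoded separately from the data, here is a minimal sketch using RDF triples and a SPARQL query; the namespace, ontology terms, and instance facts are invented for illustration, not a real threat ontology.

```python
# Sketch: encode threat knowledge as RDF triples and query it with SPARQL.
# The namespace and facts are hypothetical, for illustration only.
from rdflib import Graph, Literal, Namespace, RDF

CYBER = Namespace("http://example.org/cyber#")
g = Graph()

# Instance data: an observed event, plus background knowledge about its type.
g.add((CYBER.event42, RDF.type, CYBER.PortScan))
g.add((CYBER.event42, CYBER.sourceIP, Literal("203.0.113.7")))
g.add((CYBER.PortScan, CYBER.indicatorOf, CYBER.ReconnaissancePhase))

# The semantics do the work: flag events whose *type* indicates reconnaissance.
results = g.query("""
    PREFIX cyber: <http://example.org/cyber#>
    SELECT ?event ?ip WHERE {
        ?event a ?etype ;
               cyber:sourceIP ?ip .
        ?etype cyber:indicatorOf cyber:ReconnaissancePhase .
    }
""")
for event, ip in results:
    print(f"advance warning: {event} from {ip}")
```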


2017 ◽  
pp. 83-99
Author(s):  
Sivamathi Chokkalingam ◽  
Vijayarani S.

The term Big Data refers to large-scale information management and analysis technologies that exceed the capability of traditional data processing technologies. Big Data is differentiated from traditional technologies in three ways: the volume, velocity, and variety of data. Big data analytics is the process of analyzing large data sets that contain a variety of data types to uncover hidden patterns, unknown correlations, market trends, customer preferences, and other useful business information. Since Big Data is a newly emerging field, there is a need for the development of new technologies and algorithms for handling big data. The main objective of this paper is to provide knowledge about the various research challenges of Big Data analytics. A brief overview of the various types of Big Data analytics is presented. For each type of analytics, the paper describes its process steps and tools and gives a banking application. Some research challenges of big data analytics, and possible solutions to them, are also discussed.
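As a toy banking illustration of descriptive analytics, one of the types such surveys cover, the sketch below summarizes a fabricated transaction table; a real pipeline would read from a data lake or warehouse.

```python
# Sketch: descriptive analytics on a toy banking transaction table with pandas.
# Data are fabricated placeholders.
import pandas as pd

tx = pd.DataFrame({
    "customer": ["A", "A", "B", "B", "C", "C"],
    "channel":  ["atm", "online", "online", "atm", "online", "branch"],
    "amount":   [120.0, 45.5, 310.0, 80.0, 15.25, 500.0],
})

# Summarize spend per customer and per channel: simple patterns at a glance.
print(tx.groupby("customer")["amount"].agg(["count", "sum", "mean"]))
print(tx.groupby("channel")["amount"].sum().sort_values(ascending=False))
```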

