Degree-Based Sampling Method with Partition-Based Subgraph Finder for Larger Motif Detection

2011 ◽  
Vol 135-136 ◽  
pp. 509-515
Author(s):  
Jia Ji Zhou ◽  
De Sheng Kong ◽  
Jie Yue He

Network motifs are subnetworks that appear in a network far more frequently than in randomized networks. They have attracted much attention for uncovering the structural design principles of complex networks. One previous approach to motif detection is the sampling method, introduced to make this computationally challenging task tractable. However, it suffers from sampling bias and from problems in probability assignment. In addition, subgraph search, which is very time-consuming, is a critical process in motif detection because subgraphs of given sizes must be enumerated both in the original input graph and in an ensemble of randomly generated graphs. We therefore present a Degree-based Sampling Method with Partition-based Subgraph Finder for larger motif detection. Inspired by an intrinsic feature of real biological networks, Degree-based Sampling is a new solution for probability assignment based on node degree. Partition-based Subgraph Finder takes its inspiration from the idea of partitioning, which improves computational efficiency and lowers space consumption. An experimental study on the UETZ and E.COLI data sets shows that the proposed method achieves greater accuracy and efficiency than previous methods and scales better with increasing subgraph size.
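
The idea of assigning sampling probabilities by degree can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' algorithm: the toy graph, the function names `degree_probabilities` and `sample_connected_subgraph`, and the frontier-expansion strategy are all hypothetical choices made for the example.

```python
import random

def degree_probabilities(adj):
    """Return a selection probability for each node, proportional to its degree."""
    total = sum(len(neighbors) for neighbors in adj.values())
    return {node: len(neighbors) / total for node, neighbors in adj.items()}

def sample_connected_subgraph(adj, k, rng):
    """Grow a connected k-node subgraph, drawing each new node with
    probability proportional to its degree (an assumed weighting scheme)."""
    nodes = sorted(adj)
    start = rng.choices(nodes, weights=[len(adj[n]) for n in nodes], k=1)[0]
    chosen = {start}
    while len(chosen) < k:
        # Candidate nodes: neighbors of the current subgraph not yet chosen.
        frontier = sorted({v for u in chosen for v in adj[u]} - chosen)
        if not frontier:
            return None  # the connected component has fewer than k nodes
        weights = [len(adj[v]) for v in frontier]
        chosen.add(rng.choices(frontier, weights=weights, k=1)[0])
    return frozenset(chosen)

# Hypothetical undirected toy graph stored as an adjacency dictionary.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"a", "c", "e"},
    "e": {"d"},
}
probs = degree_probabilities(adj)
sample = sample_connected_subgraph(adj, 3, random.Random(0))
```

In a full motif-detection pipeline the sampled subgraphs would still need to be classified by isomorphism class and corrected for their unequal sampling probabilities, which this sketch does not attempt.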

2021 ◽  
Vol 50 (1) ◽  
pp. 138-152
Author(s):  
Mujeeb Ur Rehman ◽  
Dost Muhammad Khan

Anomaly detection has recently attracted growing attention from data mining researchers, as its importance has risen steadily in practical domains such as product marketing, fraud detection, medical diagnosis, fault detection and many other fields. Outlier detection in high dimensional data poses exceptional challenges because of the curse of dimensionality, under which distant and neighboring points come to resemble one another. Traditional algorithms and techniques detect outliers in the full feature space. They concentrate largely on low dimensional data and are therefore ineffective at discovering anomalies in data sets with a high number of dimensions. Finding anomalies in a high dimensional data set becomes very difficult and tedious when all subsets of projections need to be explored. All data points in high dimensional data behave like similar observations because of an intrinsic property: the contrast between the distances separating observations approaches zero as the number of dimensions tends to infinity. This research work proposes a novel technique that explores the deviation among all data points and embeds its findings inside well-established density-based techniques. This state-of-the-art technique opens a new direction of research toward resolving the inherent problems of high dimensional data in which outliers reside within clusters of different densities. A high dimensional data set from the UCI Machine Learning Repository is used to test the proposed technique, and its results are compared with those of density-based techniques to evaluate its efficiency.
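
The distance-concentration effect behind the curse of dimensionality can be illustrated numerically. The sketch below is not the technique proposed in the paper; it assumes points drawn uniformly from the unit hypercube and uses a hypothetical helper, `relative_contrast`, that measures how much farther the farthest point is than the nearest one.

```python
import math
import random

def relative_contrast(dim, n_points, rng):
    """Return (d_max - d_min) / d_min for Euclidean distances from one random
    query point to n_points random points in the unit hypercube."""
    query = [rng.random() for _ in range(dim)]
    dists = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)

rng = random.Random(42)
contrast_low = relative_contrast(dim=2, n_points=200, rng=rng)
contrast_high = relative_contrast(dim=1000, n_points=200, rng=rng)
```

With these settings the contrast in 1000 dimensions comes out far smaller than in 2 dimensions, which is why nearest-neighbor and density estimates lose discriminating power in the full feature space.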


Author(s):  
Daniel Lukic ◽  
Jonas Eberle ◽  
Jana Thormann ◽  
Carolus Holzschuh ◽  
Dirk Ahrens

DNA-barcoding and DNA-based species delimitation are major tools in DNA taxonomy. Sampling has been a central debate in this context, because the geographical composition of samples affects the accuracy and performance of DNA-barcoding. The performance of complex DNA-based species delimitation should therefore be tested under simpler conditions, in the absence of geographic sampling bias. Here, we present an empirical data set sampled from a single locality in a Southeast-Asian biodiversity hotspot (Laos: Phou Pan mountain). We investigate the performance of various species delimitation approaches on a megadiverse assemblage of herbivore chafer beetles (Coleoptera: Scarabaeidae) to infer whether species delimitation suffers in the same way from exaggerated infraspecific variation despite the lack of the geographic genetic variation that led to inconsistencies between entities from DNA-based and morphology-based species inference in previous studies. For this purpose, a 658 bp fragment of the mitochondrial cytochrome c oxidase subunit 1 (cox1) was analysed for a total of 186 individuals of 56 morphospecies. Tree-based and distance-based species delimitation methods were used. All approaches showed a rather limited match ratio (max. 77%) with morphospecies. PTP and TCS predominantly over-split morphospecies, while 3% clustering and ABGD also lumped several species into one entity. ABGD revealed the highest congruence between molecular operational taxonomic units (MOTUs) and morphospecies. Disagreements between morphospecies and MOTUs are discussed in the context of historically acquired geographic genetic differentiation, incomplete lineage sorting, and hybridization. The study once again highlights how important morphology still is for correctly interpreting the results of molecular species delimitation.
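
Distance-based delimitation with a fixed 3% threshold can be sketched as below. This is an illustrative simplification, not the pipeline used in the study: it assumes pre-aligned sequences of equal length, an uncorrected p-distance, and single-linkage grouping, and the sequence names and data are invented for the example.

```python
def p_distance(seq_a, seq_b):
    """Uncorrected p-distance: proportion of differing sites in two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(seq_a, seq_b) if x != y) / len(seq_a)

def threshold_clusters(seqs, threshold=0.03):
    """Single-linkage clustering: join any two sequences within the threshold."""
    names = list(seqs)
    parent = {name: name for name in names}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if p_distance(seqs[a], seqs[b]) <= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())

# Hypothetical 100-site alignment: ind2 differs from ind1 at 2 sites, ind3 at 8 sites.
base = "ACGT" * 25
seqs = {
    "ind1": base,
    "ind2": "TT" + base[2:],
    "ind3": "T" * 10 + base[10:],
}
motus = threshold_clusters(seqs)
```

Here `ind1` and `ind2` fall into one cluster and `ind3` forms its own, mirroring how a fixed threshold can lump closely related morphospecies or split variable ones.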


2004 ◽  
Vol 83 (2) ◽  
pp. 143-154 ◽  
Author(s):  
WEN ZENG ◽  
SUJIT GHOSH ◽  
BAILIAN LI

Diallel mating is a frequently used design for estimating the additive and dominance genetic (polygenic) effects involved in quantitative traits observed in the half- and full-sib progenies generated in plant breeding programmes. Gibbs sampling has been used for making statistical inferences for a mixed-inheritance model (MIM) that includes both major genes and polygenes. However, using this approach it has not been possible to incorporate the genetic properties of major genes with the additive and dominance polygenic effects in a diallel mating population. A parent block Gibbs sampling method was developed in this study to make statistical inferences about the major gene and polygenic effects on quantitative traits for progenies derived from a half-diallel mating design. Using simulated data sets with different major and polygenic effects, the proposed method accurately estimated the major and polygenic effects of quantitative traits, and possible genotypes of parents and progenies. The impact of specifying different prior distributions was examined and was found to have little effect on inference on the posterior distribution. This approach was applied to an experimental data set of Loblolly pine (Pinus taeda L.) derived from a 6-parent half-diallel mating. The result indicated that there might be a recessive major gene affecting height growth in this diallel population.


2011 ◽  
Vol 183-185 ◽  
pp. 734-738
Author(s):  
Lin Cong Zhou ◽  
Yi Feng Zheng ◽  
Jian Hui Qiu

The evaluation of the reliability of structural systems is of extreme importance in structural design, especially when the variables are random. A method is presented to efficiently assess the random response of stochastic structures. The article applies a two-level sampling method to fiber elements. First, the homogeneous random fields of concrete and rebar are generated by modified Latin hypercube sampling. Then a section discretization method is adopted to assign random variables to the fibers of the concrete section. The algorithm is then used to analyze the random response of a concrete beam, and the results indicate that the method is efficient.
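
For orientation, plain Latin hypercube sampling can be sketched as follows. The abstract refers to a modified variant whose details are not given here, so this shows only the standard scheme on the unit hypercube; the function name `latin_hypercube` and the sample sizes are assumptions made for the example.

```python
import random

def latin_hypercube(n_samples, n_dims, rng):
    """Standard Latin hypercube sample on [0, 1)^n_dims.

    Each dimension is split into n_samples equal strata, and every stratum
    is used exactly once per dimension."""
    samples = [[0.0] * n_dims for _ in range(n_samples)]
    for d in range(n_dims):
        strata = list(range(n_samples))
        rng.shuffle(strata)  # independent random pairing of strata per dimension
        for i, stratum in enumerate(strata):
            # Draw a uniform point inside the assigned stratum.
            samples[i][d] = (stratum + rng.random()) / n_samples
    return samples

points = latin_hypercube(n_samples=5, n_dims=2, rng=random.Random(1))
```

To drive a structural model, each coordinate would then be mapped through the inverse cumulative distribution of the corresponding material property, a step omitted here.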


1997 ◽  
Vol 54 (5) ◽  
pp. 1135-1141 ◽  
Author(s):  
B G Fraser ◽  
D D Williams

A series of interstitial faunal samples was taken from a riffle in the Speed River, southern Ontario, Canada, to compare the field performance of four hyporheic samplers: the standpipe, colonization, and freeze corers and a pump sampler. Each of the samplers proved useful for collecting purely qualitative data, but statistical differences in some of the measured quantitative parameters were identified. The colonization corer significantly underestimated invertebrate density at each of the depths tested (20, 40, and 60 cm below the surface of the river bed). Taxonomic richness did not differ among the samplers. A sampling bias in the pump sampling method was identified in terms of both the proportion of insect larvae captured and the mean chironomid body size and is probably the result of a filtering effect of the interstices. Sampling precision estimates of density, richness, and organismal size ranged from 20 to 40%, but no pattern among the four samplers for any of the measures was observed. We conclude that, whereas the standpipe and freeze coring methods most effectively characterize the hyporheos, one of the other methods might prove acceptable under specific field circumstances or under certain practical constraints.


Paleobiology ◽  
1995 ◽  
Vol 21 (1) ◽  
pp. 74-91 ◽  
Author(s):  
Anne Raymond ◽  
Cheryl Metz

In phytogeographic data sets, the number of assemblages or floras from each interval may provide a test of the influence of sampling intensity on land-plant diversity. Using a data set of Silurian and Devonian compression-impression plant genera from Laurussia and the Acadian terrain, regression of five measures of land-plant diversity (total diversity, mean genus richness of floras, median assemblage diversity, most diverse assemblage, and standing diversity at interval boundaries) against the number of assemblages or floras from thirteen intervals suggests that sampling bias influences all of the diversity measures to some extent, including within-habitat measures. The standing diversity of land plants at interval boundaries, the measure least influenced by sampling (r = 0.65, p = 0.05), increased steadily from the Middle Silurian to the late Givetian/early–middle Frasnian boundary, fell sharply in the early–middle Frasnian and remained low throughout the late Frasnian–middle Famennian. Standing diversity rose dramatically in the late Famennian and Strunian (latest Devonian): the Frasnian–Famennian extinction event may have affected land plants. The standing diversity of Silurian and Devonian microspore genera at interval boundaries mirrors that of compression-impression genera: neither record supports a land-plant diversity equilibrium during the Devonian.


2015 ◽  
Vol 89 (24) ◽  
pp. 12341-12348 ◽  
Author(s):  
Tiago Gräf ◽  
Bram Vrancken ◽  
Dennis Maletich Junqueira ◽  
Rúbia Marília de Medeiros ◽  
Marc A. Suchard ◽  
...  

ABSTRACT: The phylogeographic history of the Brazilian HIV-1 subtype C (HIV-1C) epidemic is still unclear. Previous studies have mainly focused on the capital cities of Brazilian federal states, and the fact that HIV-1C infections increase at a higher rate than subtype B infections in Brazil calls for a better understanding of the process of spatial spread. A comprehensive sequence data set sampled across 22 Brazilian locations was assembled and analyzed. A Bayesian phylogeographic generalized linear model approach was used to reconstruct the spatiotemporal history of HIV-1C in Brazil, considering several potential explanatory predictors of the viral diffusion process. Analyses were performed on several subsampled data sets in order to mitigate potential sample biases. We reveal a central role for the city of Porto Alegre, the capital of the southernmost state, in the Brazilian HIV-1C epidemic (HIV-1C_BR), and the northward expansion of HIV-1C_BR could be linked to source populations with higher HIV-1 burdens and larger proportions of HIV-1C infections. The results presented here bring new insights to the continuing discussion about the HIV-1C epidemic in Brazil and raise an alternative hypothesis for its spatiotemporal history. The current work also highlights how sampling bias can confound phylogeographic analyses and demonstrates the importance of incorporating external information to protect against this.
IMPORTANCE: Subtype C is responsible for the largest HIV infection burden worldwide, but our understanding of its transmission dynamics remains incomplete. Brazil witnessed a relatively recent introduction of HIV-1C compared to HIV-1B, but it swiftly spread throughout the south, where it now circulates as the dominant variant. The northward spread has been comparatively slow, and HIV-1B still prevails in that region. While epidemiological data and viral genetic analyses have each shed light on the dynamics of spread in isolation, their combination has not yet been explored. Here, we complement publicly available sequences and new genetic data from 13 cities with epidemiological data to reconstruct the history of HIV-1C spread in Brazil. The combined approach results in more robust reconstructions and can protect against sampling bias. We found evidence for an alternative view of the HIV-1C spatiotemporal history in Brazil that, contrary to previous explanations, integrates seamlessly with other observational data.


2019 ◽  
Vol 15 (1) ◽  
pp. 155014771882052 ◽  
Author(s):  
Bowen Qin ◽  
Fuyuan Xiao

Owing to its efficiency in handling uncertain information, Dempster–Shafer evidence theory has become the most important tool in many information fusion systems. However, how to determine the basic probability assignment, which is the first step in evidence theory, is still an open issue. In this article, a new method integrating interval number theory and the k-means++ clustering method is proposed to determine the basic probability assignment. First, the k-means++ clustering method is used to calculate the lower and upper bounds of the interval numbers from training data. Then, the differentiation degree, based on the distance and similarity of interval numbers between the test sample and the constructed models, is defined to generate the basic probability assignment. Finally, Dempster’s combination rule is used to combine multiple basic probability assignments into the final basic probability assignment. Experiments on the Iris data set, which is widely used in classification problems, show that the proposed method is effective both in determining the basic probability assignment and in classification, reaching a classification accuracy of 96.7%.
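
The final fusion step, Dempster’s combination rule, can be sketched as follows. The rule itself is standard; the two input assignments and their mass values are invented for illustration and do not come from the paper, and the interval-number step that would generate them is not shown.

```python
def dempster_combine(m1, m2):
    """Combine two basic probability assignments with Dempster's rule.

    Each assignment maps a focal element (a frozenset of hypotheses) to its mass."""
    combined = {}
    conflict = 0.0
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            intersection = a & b
            if intersection:
                combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
            else:
                conflict += mass_a * mass_b  # mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; the rule is undefined")
    # Renormalize by the non-conflicting mass.
    return {focal: mass / (1.0 - conflict) for focal, mass in combined.items()}

# Hypothetical evidence from two attributes about two Iris classes.
SETOSA = frozenset({"setosa"})
VERSICOLOR = frozenset({"versicolor"})
EITHER = SETOSA | VERSICOLOR

m_attr1 = {SETOSA: 0.6, VERSICOLOR: 0.3, EITHER: 0.1}
m_attr2 = {SETOSA: 0.5, VERSICOLOR: 0.2, EITHER: 0.3}
fused = dempster_combine(m_attr1, m_attr2)
```

In this example the conflicting mass is 0.27, so the surviving masses are divided by 0.73 and the combined belief concentrates further on the class both sources favor.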


Koedoe ◽  
2020 ◽  
Vol 62 (1) ◽  
Author(s):  
Jody M. Barends ◽  
Darren W. Pietersen ◽  
Guinevere Zambatis ◽  
Donovan R.C. Tye ◽  
Bryan Maritz

To effectively conserve and manage species, it is important to (1) understand how they are spatially distributed across the globe at both broad and fine spatial resolutions and (2) elucidate the determinants of these distributions. However, information pertaining to the distributions of many species remains poor as occurrence data are often scarce or collected with varying motivations, making the resulting patterns susceptible to sampling bias. Exacerbating an already limited quantity of occurrence data with an assortment of biases hinders their effectiveness for research, thus making it important to identify and understand the biases present within species occurrence data sets. We quantitatively assessed occurrence records of 126 reptile species occurring in the Kruger National Park (KNP), South Africa, to quantify the severity of sampling bias within this data set. We collated a data set of 7118 occurrence records from museum, literature and citizen science sources and analysed these at a biologically relevant spatial resolution of 1 km × 1 km. As a result of logistical challenges associated with sampling in KNP, approximately 92% of KNP is data deficient for reptile occurrences at the 1 km × 1 km resolution. Additionally, the spatial coverage of available occurrences varied at species and family levels, and the majority of occurrence records were strongly associated with publicly accessible human infrastructure. Furthermore, we found that sampled areas within KNP were not necessarily ecologically representative of KNP as a whole, suggesting that areas of unique environmental space remain to be sampled. Our findings highlight the need for substantially greater sampling effort for reptiles across KNP and emphasise the need to carefully consider the sampling biases within existing data should these be used for conservation management decision-making. Modelling species distributions could potentially serve as a short-term solution, but a concomitant increase in surveys across the park is needed.
Conservation implications: The sampling biases present within KNP reptile occurrence data inhibit the inference of fine-scale species distributions within and across the park, which limits the usage of these data towards meaningfully informing conservation management decisions as applicable to reptile species in KNP.


F1000Research ◽  
2015 ◽  
Vol 3 ◽  
pp. 139 ◽  
Author(s):  
Giovanni Scardoni ◽  
Gabriele Tosadori ◽  
Mohammed Faizan ◽  
Fausto Spoto ◽  
Franco Fabbri ◽  
...  

The growing size and complexity of the experimental data from which biological networks are generated have increased the need for tools that help in categorizing nodes by their topological relevance. Here we present CentiScaPe, a Cytoscape app specifically designed to calculate centrality indexes used for the identification of the most important nodes in a network. CentiScaPe is a comprehensive suite of algorithms dedicated to network node centrality analysis, computing several centralities for undirected, directed and weighted networks. The results of the topological analysis can be integrated with data sets from lab experiments, such as expression or phosphorylation levels for each protein represented in the network. Our app opens new perspectives in the analysis of biological networks, since the integration of topological analysis with lab experimental data enhances the predictive power of the bioinformatics analysis.
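
Two of the simplest centrality indexes, degree and closeness, can be computed as in the sketch below. This is not CentiScaPe code and does not use the Cytoscape API; it assumes an undirected, unweighted network stored as an adjacency dictionary, and the protein names are placeholders.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def degree_centrality(adj, node):
    """Number of direct neighbors of the node."""
    return len(adj[node])

def closeness_centrality(adj, node):
    """Reachable nodes divided by the sum of their distances (0.0 if isolated)."""
    dist = bfs_distances(adj, node)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total > 0 else 0.0

# Hypothetical three-protein path: P1 - P2 - P3.
network = {"P1": {"P2"}, "P2": {"P1", "P3"}, "P3": {"P2"}}
scores = {
    node: (degree_centrality(network, node), closeness_centrality(network, node))
    for node in network
}
```

The middle node scores highest on both indexes, and ranking nodes this way is the starting point for combining topology with experimental values such as expression levels.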

