Detecting communities by suspecting the maximum degree nodes

2019 ◽  
Vol 33 (13) ◽  
pp. 1950133 ◽  
Author(s):  
Mei Chen ◽  
Mei Zhang ◽  
Ming Li ◽  
Mingwei Leng ◽  
Zhichong Yang ◽  
...  

Detecting the natural communities in a real-world network can uncover its underlying structure and potential function. In this paper, a novel community detection algorithm, SUM, is introduced. The fundamental idea of SUM is that a node with relatively low degree stays faithful to its community, because it only has links with nodes in one community, whereas a node with relatively high degree has links not only within its community but also outside it, which can cause confusion when detecting communities. Based on this idea, SUM detects communities by suspecting the links of the maximum degree nodes to their neighbors within a community, while relying mainly on the nodes with relatively low degree. SUM defines a similarity that takes into account both the commonality and the rejective degree of two adjacent nodes. After putting similar nodes into one community, SUM generates initial communities by reassigning the maximum degree nodes. Next, SUM assigns unlabeled nodes to the initial communities and adjusts each border node to its most linked community. To evaluate its effectiveness, SUM is compared with seven baselines, including four classical and three state-of-the-art methods, on a wide range of complex networks. On small networks with ground-truth community structures, results are demonstrated visually as well as measured quantitatively with ARI, NMI and Modularity. On relatively large networks without ground-truth community structures, the algorithms are evaluated by Modularity. Experimental results indicate that SUM can effectively determine high-quality community structures on both small and relatively large networks, and that it outperforms the compared state-of-the-art methods.
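As a rough illustration of this workflow, the sketch below seeds communities from low-degree nodes first and assigns the maximum-degree nodes only afterwards. It is not the authors' code: a Jaccard neighborhood similarity stands in for the paper's commonality/rejective-degree measure, and all function names are ours.

```python
# Illustrative sketch of SUM's core idea (not the authors' implementation):
# seed communities from low-degree nodes, treat maximum-degree nodes with
# suspicion, and only assign them once initial communities exist.
import networkx as nx

def jaccard_similarity(G, u, v):
    """Neighborhood overlap of two adjacent nodes (stand-in similarity)."""
    nu, nv = set(G[u]), set(G[v])
    return len(nu & nv) / len(nu | nv) if nu | nv else 0.0

def sum_like_communities(G):
    max_deg = max(dict(G.degree).values())
    labels = {}
    # Pass 1: low-degree nodes join their most similar neighbor's community.
    for u in sorted(G, key=G.degree):
        if G.degree(u) == max_deg:
            continue  # suspect the maximum-degree nodes for now
        best = max(G[u], key=lambda v: jaccard_similarity(G, u, v), default=None)
        labels[u] = labels.get(best, u) if best is not None else u
    # Pass 2: reassign each maximum-degree node to its most linked community.
    for u in G:
        if G.degree(u) == max_deg:
            counts = {}
            for v in G[u]:
                if v in labels:
                    counts[labels[v]] = counts.get(labels[v], 0) + 1
            labels[u] = max(counts, key=counts.get) if counts else u
    return labels

G = nx.karate_club_graph()
print(sum_like_communities(G))
```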

2020 ◽  
Vol 36 (10) ◽  
pp. 3011-3017 ◽  
Author(s):  
Olga Mineeva ◽  
Mateo Rojas-Carulla ◽  
Ruth E Ley ◽  
Bernhard Schölkopf ◽  
Nicholas D Youngblut

Abstract Motivation Methodological advances in metagenome assembly are rapidly increasing the number of published metagenome assemblies. However, identifying misassemblies is challenging due to a lack of closely related reference genomes that can act as pseudo ground truth. Existing reference-free methods are no longer maintained, can make strong assumptions that may not hold across a diversity of research projects, and have not been validated on large-scale metagenome assemblies. Results We present DeepMAsED, a deep learning approach for identifying misassembled contigs without the need for reference genomes. Moreover, we provide an in silico pipeline for generating large-scale, realistic metagenome assemblies for comprehensive model training and testing. DeepMAsED accuracy substantially exceeds the state-of-the-art when applied to large and complex metagenome assemblies. Our model estimates a 1% contig misassembly rate in two recent large-scale metagenome assembly publications. Conclusions DeepMAsED accurately identifies misassemblies in metagenome-assembled contigs from a broad diversity of bacteria and archaea without the need for reference genomes or strong modeling assumptions. Running DeepMAsED is straightforward, as is model re-training with our dataset generation pipeline. Therefore, DeepMAsED is a flexible misassembly classifier that can be applied to a wide range of metagenome assembly projects. Availability and implementation DeepMAsED is available from GitHub at https://github.com/leylabmpi/DeepMAsED. Supplementary information Supplementary data are available at Bioinformatics online.
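For a concrete picture of what such a reference-free classifier can look like, the sketch below shows a small 1D CNN over per-position contig features that emits a misassembly probability per contig. The architecture, feature count, and contig length are assumptions chosen for illustration; this is not the published DeepMAsED model.

```python
# Hedged sketch of a reference-free misassembly classifier (assumptions
# throughout, not the DeepMAsED architecture): a 1D CNN over per-position
# contig features such as read coverage or discordant-read counts.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

N_FEATURES = 8     # per-position features (coverage, SNP count, ...), assumed
MAX_LEN = 10000    # contigs padded/trimmed to a fixed length, assumed

model = tf.keras.Sequential([
    tf.keras.Input(shape=(MAX_LEN, N_FEATURES)),
    layers.Conv1D(32, 9, activation="relu"),
    layers.MaxPooling1D(4),
    layers.Conv1D(64, 9, activation="relu"),
    layers.GlobalMaxPooling1D(),            # contig-level summary vector
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # P(contig is misassembled)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["AUC"])

# Toy batch: 16 synthetic contigs with random binary misassembly labels.
X = np.random.rand(16, MAX_LEN, N_FEATURES).astype("float32")
y = np.random.randint(0, 2, size=(16, 1))
model.fit(X, y, epochs=1, verbose=0)
```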


2021 ◽  
Author(s):  
Zbigniew Jelonek ◽  
Monika Fabiańska ◽  
Iwona Jelonek

Abstract Thirty-one batches of commercial charcoal from various regions of Poland and Germany were tested for the presence of twenty toxic elements and polycyclic aromatic hydrocarbons (PAHs). PAHs were determined using gas chromatography-mass spectrometry (GC-MS), while elements that are toxic to living organisms were determined using atomic absorption spectroscopy (AAS). The elements were classified as representing a very high degree of hazard (As, Cd, Cu, Hg, and Pb), a high degree of hazard (Zn, Ba, Cr, Mn, and Mo), a moderate degree of hazard (Co, Ni, Sn, and Te), and a low degree of hazard for living organisms and the environment (Ag, Bi, Ce, Se, Sr, and Zr). Among the most toxic elements, the highest concentration in the whole tested material was recorded for Cu. In addition, considerable amounts of Ba, Mn, and Sr, i.e., elements representing a high or moderate degree of hazard, were found in the tested charcoals. Moreover, all charcoals contained a wide range of PAHs, from naphthalene to benzo(ghi)perylene, with concentrations ranging between 12.55 and 3554.11 ng/g of charcoal. In total, 25 unsubstituted PAHs were identified in the charcoal extracts. PAH distributions were dominated by 5-ring PAHs. The results indicate high carcinogenicity, with ∑PAHcarc/∑PAHtot close to 1, as well as high TEQ and MEQ values. Thus, prolonged exposure to charcoal and charcoal dust might cause serious health problems. This applies to employees actively involved in the production and transport of charcoal and, to a lesser extent, to users of this fuel.
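The TEQ figure cited above is, in the standard toxic-equivalency approach, a TEF-weighted sum over the carcinogenic PAHs. The sketch below illustrates that calculation; the TEF values follow one commonly used scheme (values differ between schemes), and the concentrations are invented for illustration, not taken from the study.

```python
# Minimal sketch of a toxic-equivalency calculation:
#   TEQ = sum(concentration_i * TEF_i) over the carcinogenic PAHs,
# with benzo[a]pyrene-relative toxic equivalency factors (TEFs).
# TEF values below are from one widely used scheme (schemes vary);
# the sample concentrations are hypothetical (ng/g charcoal).
TEF = {
    "benzo[a]anthracene": 0.1, "chrysene": 0.01,
    "benzo[b]fluoranthene": 0.1, "benzo[k]fluoranthene": 0.1,
    "benzo[a]pyrene": 1.0, "indeno[1,2,3-cd]pyrene": 0.1,
    "dibenz[a,h]anthracene": 1.0, "benzo[ghi]perylene": 0.01,
}
conc_ng_per_g = {  # hypothetical measured sample
    "benzo[a]pyrene": 120.0, "chrysene": 85.0, "benzo[b]fluoranthene": 60.0,
}
teq = sum(conc_ng_per_g.get(pah, 0.0) * tef for pah, tef in TEF.items())
print(f"TEQ = {teq:.1f} ng BaP-eq/g")
```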


2019 ◽  
Vol 9 (20) ◽  
pp. 4364 ◽  
Author(s):  
Frédéric Bousefsaf ◽  
Alain Pruski ◽  
Choubeila Maaoui

Remote pulse rate measurement from facial video has gained particular attention over the last few years. Research exhibits significant advancements and demonstrates that common video cameras are reliable devices for measuring a large set of biomedical parameters without any contact with the subject. A new framework for measuring and mapping pulse rate from video is presented in this pilot study. The method, which relies on 3D convolutional networks, is fully automatic and does not require any special image preprocessing. In addition, the network ensures concurrent mapping by producing a prediction for each local group of pixels. A particular training procedure that employs only synthetic data is proposed. Preliminary results demonstrate that this 3D convolutional network can effectively extract pulse rate from video without any frame processing. The trained model was compared with other state-of-the-art methods on public data. Results exhibit significant agreement between estimated and ground-truth measurements: the root mean square error computed from pulse rate values assessed with the 3D convolutional network is 8.64 bpm, while it exceeds 10 bpm for the other state-of-the-art methods. Robustness to natural motion and increased performance are the two main avenues that will be considered in future works.
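The sketch below illustrates the kind of architecture the abstract describes: a small 3D CNN that consumes a clip of face video and emits one prediction per local spatial block, giving the concurrent mapping. All shapes and layer sizes are invented for illustration; this is not the authors' released model.

```python
# Hedged sketch of a 3D CNN for per-block pulse rate mapping (assumed
# shapes and layers, not the paper's network).
import torch
import torch.nn as nn

class PulseNet3D(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=(5, 3, 3), padding=(2, 1, 1)),
            nn.ReLU(),
            nn.MaxPool3d((2, 2, 2)),
            nn.Conv3d(16, 32, kernel_size=(5, 3, 3), padding=(2, 1, 1)),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d((1, 8, 8)),  # collapse time, keep an 8x8 map
        )
        self.head = nn.Conv2d(32, 1, kernel_size=1)  # pulse estimate per block

    def forward(self, clip):                  # clip: (B, 3, T, H, W)
        f = self.features(clip).squeeze(2)    # (B, 32, 8, 8)
        return self.head(f)                   # (B, 1, 8, 8) local estimates

clip = torch.randn(1, 3, 64, 64, 64)          # 64 frames of 64x64 RGB video
print(PulseNet3D()(clip).shape)               # torch.Size([1, 1, 8, 8])
```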


2020 ◽  
Vol 38 (2) ◽  
pp. 276-292
Author(s):  
Khawla Asmi ◽  
Dounia Lotfi ◽  
Mohamed El Marraki

Purpose The state-of-the-art methods designed for overlapping community detection are limited by high execution time, as in CPM, or by the need to supply parameters such as the number of communities in Bigclam and Nise_sph, which is nontrivial information. Accuracy remains the primary goal, and current state-of-the-art methods fail to achieve high correspondence with the ground truth on many networks. The paper aims to address this issue. Design/methodology/approach The authors offer a new method that explores the union of all maximum spanning trees (UMST) and models the strength of links between nodes. Each node in the UMST is linked with its most similar neighbor. From this model, the authors extract a local community for each node and then combine the produced communities according to their number of shared nodes. Findings Experiments on eight real-world data sets and four sets of artificial networks show that the proposed method achieves clear improvements over six state-of-the-art methods (BigClam, OSLOM, Demon, SE, DMST and ST) in terms of F-score and ONMI on the networks with ground truth (Amazon, Youtube, LiveJournal and Orkut). For the other networks, it provides communities with good overlapping modularity. Originality/value In this paper, the authors investigate the UMST for overlapping community detection.
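A minimal sketch of this pipeline, under stated assumptions: edge weights are shared-neighbor counts (a stand-in for the authors' link-strength model), a single maximum spanning tree approximates the UMST, and local communities are read off by linking each node to its most similar tree neighbor.

```python
# Rough sketch of the UMST idea (not the authors' code); see the
# assumptions in the comments below.
import networkx as nx

def max_spanning_tree(G):
    """One maximum spanning tree over shared-neighbor weights; the true
    UMST would take the union over all maximum spanning trees."""
    H = nx.Graph()
    for u, v in G.edges:
        H.add_edge(u, v, weight=len(set(G[u]) & set(G[v])))
    return nx.maximum_spanning_tree(H, weight="weight")

def local_communities(G):
    T = max_spanning_tree(G)
    best_link = nx.Graph()
    for u in T:
        # attach each node to its most similar (heaviest) tree neighbor
        v = max(T[u], key=lambda x: T[u][x]["weight"], default=None)
        if v is not None:
            best_link.add_edge(u, v)
    return list(nx.connected_components(best_link))

G = nx.karate_club_graph()
for c in local_communities(G):
    print(sorted(c))
```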


Author(s):  
Kuo-Liang Chung ◽  
Yu-Ling Tseng ◽  
Tzu-Hsien Chan ◽  
Ching-Sheng Wang

In this paper, we first propose a fast and effective region-based depth map upsampling method, and then propose a joint upsampling and location map-free reversible data hiding method, simply called the JUR method. In the proposed upsampling method, all the missing depth pixels are partitioned into three disjoint regions: the homogeneous, semi-homogeneous, and non-homogeneous regions. Then, we propose depth copying, mean value, and bicubic interpolation approaches to quickly reconstruct the three kinds of missing depth pixels, respectively. The proposed JUR method embeds data without any location map overhead by using the neighboring ground truth depth pixels of each missing depth pixel, achieving substantial quality and embedding capacity merits. Comprehensive experiments have been carried out to justify not only the execution-time and quality merits of the depth maps upsampled by our method relative to the state-of-the-art methods, but also the embedding capacity and quality merits of our JUR method when compared with the state-of-the-art methods.
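The sketch below illustrates the region-based upsampling idea only (not the reversible data hiding part): each missing depth pixel is classified by the variance of its known low-resolution neighborhood and filled by depth copying, neighbor mean, or bicubic interpolation. The variance thresholds and region tests are our own assumptions, not the paper's.

```python
# Hedged sketch of region-based depth upsampling: homogeneous regions get
# depth copying, semi-homogeneous regions the neighbor mean, and
# non-homogeneous regions bicubic interpolation. Thresholds are invented.
import numpy as np
from scipy import ndimage

def upsample_depth(lr, scale=2, t_homo=1.0, t_semi=25.0):
    hr = np.full((lr.shape[0] * scale, lr.shape[1] * scale), np.nan)
    hr[::scale, ::scale] = lr                   # known low-resolution pixels
    bicubic = ndimage.zoom(lr, scale, order=3)  # fallback bicubic surface
    for i, j in zip(*np.where(np.isnan(hr))):
        yy, xx = i // scale, j // scale
        patch = lr[max(yy - 1, 0):yy + 2, max(xx - 1, 0):xx + 2]
        var = patch.var()
        if var < t_homo:          # homogeneous: copy the nearest depth
            hr[i, j] = lr[yy, xx]
        elif var < t_semi:        # semi-homogeneous: mean of neighbors
            hr[i, j] = patch.mean()
        else:                     # non-homogeneous: bicubic interpolation
            hr[i, j] = bicubic[i, j]
    return hr

lr = np.random.rand(4, 4) * 100   # toy low-resolution depth map
print(upsample_depth(lr).round(1))
```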


2019 ◽  
Author(s):  
Mateo Rojas-Carulla ◽  
Ruth E. Ley ◽  
Bernhard Schölkopf ◽  
Nicholas D. Youngblut

Abstract Motivation/background Methodological advances in metagenome assembly are rapidly increasing the number of published metagenome assemblies. However, identifying misassemblies is challenging due to a lack of closely related reference genomes that can act as pseudo ground truth. Existing reference-free methods are no longer maintained, can make strong assumptions that may not hold across a diversity of research projects, and have not been validated on large-scale metagenome assemblies. Results We present DeepMAsED, a deep learning approach for identifying misassembled contigs without the need for reference genomes. Moreover, we provide an in silico pipeline for generating large-scale, realistic metagenome assemblies for comprehensive model training and testing. DeepMAsED accuracy substantially exceeds the state-of-the-art when applied to large and complex metagenome assemblies. Our model estimates close to a 5% contig misassembly rate in two recent large-scale metagenome assembly publications. Conclusions DeepMAsED accurately identifies misassemblies in metagenome-assembled contigs from a broad diversity of bacteria and archaea without the need for reference genomes or strong modelling assumptions. Running DeepMAsED is straightforward, as is model re-training with our dataset generation pipeline. Therefore, DeepMAsED is a flexible misassembly classifier that can be applied to a wide range of metagenome assembly projects. Availability DeepMAsED is available from GitHub at https://github.com/leylabmpi/DeepMAsED.


Author(s):  
Anjali Sifar ◽  
Nisheeth Srivastava

Supervised learning operates on the premise that labels unambiguously represent ground truth. This premise is reasonable in domains wherein a high degree of consensus is easily possible for any given data record, e.g. in agreeing on whether an image contains an elephant or not. However, there are several domains wherein people disagree with each other on the appropriate label to assign to a record, e.g. whether a tweet is toxic. We argue that data labeling must be understood as a process with some degree of domain-dependent noise and that any claims of predictive prowess must be sensitive to the degree of this noise. We present a method for quantifying labeling noise in a particular domain wherein people are seen to disagree with their own past selves on the appropriate label to assign to a record: choices under prospect uncertainty. Our results indicate that 'state-of-the-art' choice models of decisions from description, by failing to consider the intrinsic variability of human choice behavior, find themselves in the odd position of predicting humans' choices better than the same humans' own previous choices for the same problem. We conclude with observations on how the predicament we empirically demonstrate in our work could be handled in the practice of supervised learning.
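The measurement logic can be made concrete with a small simulation: if a person's test-retest agreement with their own earlier answers sits below a model's reported accuracy, the model is 'predicting' better than the labels themselves can support. Everything below is synthetic; the 20% self-disagreement rate and the model accuracy are invented numbers.

```python
# Synthetic illustration of test-retest agreement as a noise ceiling on
# predictive accuracy (all numbers invented).
import numpy as np

rng = np.random.default_rng(0)
n = 200
first_pass = rng.integers(0, 2, size=n)   # choices at time 1
flip = rng.random(n) < 0.2                # assumed 20% self-disagreement
second_pass = np.where(flip, 1 - first_pass, first_pass)

self_agreement = (first_pass == second_pass).mean()
model_accuracy = 0.85                     # hypothetical reported accuracy

print(f"test-retest agreement: {self_agreement:.2f}")
if model_accuracy > self_agreement:
    print("model 'beats' the person's own past choices: "
          "suspect label noise rather than model skill")
```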


Resources ◽  
2021 ◽  
Vol 10 (7) ◽  
pp. 69
Author(s):  
Zbigniew Jelonek ◽  
Monika Fabiańska ◽  
Iwona Jelonek

Thirty-one batches of commercial charcoal from various regions of Poland and Germany were tested for the presence of 20 toxic elements and polycyclic aromatic hydrocarbons (PAHs). PAHs were determined using gas chromatography-mass spectrometry (GC-MS), while elements that are toxic to living organisms were determined using atomic absorption spectroscopy (AAS). They were classified as elements representing a very high degree of hazard (As, Cd, Cu, Hg, and Pb), a high degree of hazard (Zn, Ba, Cr, Mn, and Mo), a moderate degree of hazard (Co, Ni, Sn, and Te), and a low degree of hazard for living organisms and the environment (Ag, Bi, Ce, Se, Sr, and Zr). In regard to the most toxic elements, the highest concentration in the whole tested material was recorded for Cu. In addition, considerable amounts of Ba, Mn, and Sr, i.e., elements representing a high or moderate degree of hazard, were found in the tested charcoals. Moreover, all charcoals contained a wide range of PAHs, from naphthalene to benzo(ghi)perylene, with concentrations in a range between 12.55 and 3554.11 ng/g charcoal. In total, 25 unsubstituted PAHs were identified in the charcoal extracts. PAH distributions were dominated by five-ring PAHs. The results indicate high carcinogenicity with ∑PAHcarc/∑PAHtot close to 1, as well as high TEQ and MEQ values. Thus, prolonged exposure to charcoal and charcoal dust might cause serious health problems. This applies to employees actively involved in the production and transport of charcoal and, to a lesser extent, users of this fuel.


Author(s):  
Elena Genad’evna Sidorova

The starting point is a definition of linguo-ecology as an interdisciplinary science that studies language phenomena as part of a system, as well as through the prism of the communicative needs of native speakers. The object of the research is proper geographical names that denominate settlements: oikonyms. The concept of the oikonymic space of a region is introduced as the set of fixed nominations of settlements located in a certain territory. The main problems that violate the ecological criteria for settlement names are characterized. A scale of oikonyms is developed according to their degree of uniqueness within the region, with four levels: unique names have the maximum degree; single-root, morphemically close names a high degree; complex or compound oikonyms a medium degree; and completely identical, doublet oikonyms a low degree. Zones of linguo-ecological risk in the oikonymic space of the Volgograd region are identified, which stem from the contradiction between the desire for accuracy in nomination and the tendency toward speech economy. The deficiency of symbolic settlement names that do not reflect a relationship with specific geographic objects is demonstrated, as well as the unjustified nature of certain renamings, which violate the principle of historical continuity.


2021 ◽  
Vol 6 (1) ◽  
Author(s):  
Claudio Durán ◽  
Alessandro Muscoloni ◽  
Carlo Vittorio Cannistraci

Abstract Markov clustering is an effective unsupervised pattern recognition algorithm for data clustering in high-dimensional feature space. However, its community detection performance in complex networks has been far from that of state-of-the-art methods such as Infomap and Louvain. The crucial issue is to convert the unweighted network topology into a 'smart-enough' pre-weighted connectivity that adequately steers the stochastic flow procedure behind Markov clustering. Here we introduce a conceptual innovation and discuss how to leverage notions of network latent geometry in order to design similarity measures for pre-weighting the adjacency matrix used in Markov clustering community detection. Our results demonstrate that the proposed strategy improves Markov clustering significantly, to the extent that it is often close to the performance of current state-of-the-art community detection methods. These findings emerge for both synthetic 'realistic' networks (with known ground-truth communities) and real networks (with community metadata), and even when the real network connectivity is corrupted by noise artificially induced by missing or spurious links. Our study enhances the general understanding of how network geometry plays a fundamental role in the design of algorithms based on network navigability.
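A self-contained sketch of the pre-weighting idea: weight each existing edge by a similarity measure before running the expansion/inflation loop of Markov clustering. The common-neighbor weight below is a simple stand-in for the latent-geometry similarities the paper designs, and the MCL loop is a bare-bones illustration, not a production implementation.

```python
# Pre-weight the adjacency matrix, then run a minimal Markov clustering
# (expansion + inflation) loop. Weighting scheme is a stand-in assumption.
import numpy as np
import networkx as nx

def preweight(G):
    """Weight each existing edge by a common-neighbor similarity."""
    A = nx.to_numpy_array(G)
    W = np.zeros_like(A)
    n = len(A)
    for i in range(n):
        for j in range(n):
            if A[i, j]:
                W[i, j] = A[i].dot(A[j]) + 1.0  # shared neighbors; +1 keeps edge
    return W

def mcl(W, inflation=2.0, iters=50):
    """Bare-bones Markov clustering on a pre-weighted matrix."""
    M = W + np.eye(len(W))        # self-loops stabilize the flow
    M = M / M.sum(axis=0)         # column-stochastic transition matrix
    for _ in range(iters):
        M = M @ M                 # expansion spreads the flow
        M = M ** inflation        # inflation strengthens strong flows
        M = M / M.sum(axis=0)
    clusters = set()
    for row in M:                 # attractor rows define the clusters
        members = tuple(np.flatnonzero(row > 1e-6))
        if members:
            clusters.add(members)
    return [list(c) for c in clusters]

G = nx.karate_club_graph()
print(mcl(preweight(G)))
```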

