DeepMAsED: evaluating the quality of metagenomic assemblies

2020 ◽  
Vol 36 (10) ◽  
pp. 3011-3017 ◽  
Author(s):  
Olga Mineeva ◽  
Mateo Rojas-Carulla ◽  
Ruth E Ley ◽  
Bernhard Schölkopf ◽  
Nicholas D Youngblut

Motivation: Methodological advances in metagenome assembly are rapidly increasing the number of published metagenome assemblies. However, identifying misassemblies is challenging due to a lack of closely related reference genomes that can act as pseudo ground truth. Existing reference-free methods are no longer maintained, can make strong assumptions that may not hold across a diversity of research projects, and have not been validated on large-scale metagenome assemblies. Results: We present DeepMAsED, a deep learning approach for identifying misassembled contigs without the need for reference genomes. Moreover, we provide an in silico pipeline for generating large-scale, realistic metagenome assemblies for comprehensive model training and testing. DeepMAsED accuracy substantially exceeds the state of the art when applied to large and complex metagenome assemblies. Our model estimates a 1% contig misassembly rate in two recent large-scale metagenome assembly publications. Conclusions: DeepMAsED accurately identifies misassemblies in metagenome-assembled contigs from a broad diversity of bacteria and archaea without the need for reference genomes or strong modeling assumptions. Running DeepMAsED is straightforward, as is model re-training with our dataset generation pipeline. Therefore, DeepMAsED is a flexible misassembly classifier that can be applied to a wide range of metagenome assembly projects. Availability and implementation: DeepMAsED is available from GitHub at https://github.com/leylabmpi/DeepMAsED. Supplementary information: Supplementary data are available at Bioinformatics online.

2019 ◽  
Author(s):  
Mateo Rojas-Carulla ◽  
Ruth E. Ley ◽  
Bernhard Schölkopf ◽  
Nicholas D. Youngblut

Motivation/background: Methodological advances in metagenome assembly are rapidly increasing the number of published metagenome assemblies. However, identifying misassemblies is challenging due to a lack of closely related reference genomes that can act as pseudo ground truth. Existing reference-free methods are no longer maintained, can make strong assumptions that may not hold across a diversity of research projects, and have not been validated on large-scale metagenome assemblies. Results: We present DeepMAsED, a deep learning approach for identifying misassembled contigs without the need for reference genomes. Moreover, we provide an in silico pipeline for generating large-scale, realistic metagenome assemblies for comprehensive model training and testing. DeepMAsED accuracy substantially exceeds the state of the art when applied to large and complex metagenome assemblies. Our model estimates a contig misassembly rate of close to 5% in two recent large-scale metagenome assembly publications. Conclusions: DeepMAsED accurately identifies misassemblies in metagenome-assembled contigs from a broad diversity of bacteria and archaea without the need for reference genomes or strong modelling assumptions. Running DeepMAsED is straightforward, as is model re-training with our dataset generation pipeline. Therefore, DeepMAsED is a flexible misassembly classifier that can be applied to a wide range of metagenome assembly projects. Availability: DeepMAsED is available from GitHub at https://github.com/leylabmpi/DeepMAsED.


Author(s):  
A. V. Ponomarev

Introduction: Large-scale human-computer systems that involve people of varying skills and motivation in information processing are currently used in a wide spectrum of applications. An acute problem in such systems is assessing the expected quality of each contributor, for example, in order to penalize incompetent or inaccurate contributors and to promote diligent ones. Purpose: To develop a method of assessing the expected quality of contributors in community tagging systems. The method should use only the generally unreliable and incomplete information provided by contributors (with ground truth tags unknown). Results: A mathematical model is proposed for community image tagging (including a model of a contributor), along with a method of assessing the expected quality of a contributor. The method is based on comparing the tag sets provided by different contributors for the same images; it is a modification of the pairwise comparison method in which the preference relation is replaced by a special domination characteristic. The expected quality of contributors is evaluated as a positive eigenvector of a pairwise domination characteristic matrix. Community tagging simulation has confirmed that the proposed method adequately estimates the expected quality of community tagging system contributors (provided that the contributors' behavior fits the proposed model). Practical relevance: The obtained results can be used in the development of systems based on the coordinated efforts of a community (primarily, community tagging systems).
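The eigenvector step mentioned above can be illustrated with a minimal sketch. The abstract does not define the domination characteristic, so the matrix D below is purely hypothetical (D[i][j] > 0 standing for how strongly contributor i dominates contributor j), and power iteration is used here only as one common way to obtain the positive eigenvector of a positive matrix; it is not necessarily the computation used by the author.

```python
def principal_eigenvector(matrix, iterations=200, tol=1e-12):
    """Power iteration on a positive square matrix.

    Returns the dominant eigenvector, normalized so its entries sum to 1.
    """
    n = len(matrix)
    v = [1.0 / n] * n  # start from a uniform positive vector
    for _ in range(iterations):
        # Multiply the matrix by the current estimate.
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        total = sum(w)
        w = [x / total for x in w]  # renormalize to sum to 1
        if max(abs(a - b) for a, b in zip(v, w)) < tol:
            return w
        v = w
    return v


# Hypothetical pairwise domination matrix for three contributors
# (illustrative values only, not taken from the paper).
D = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]

quality = principal_eigenvector(D)
print(quality)
```

In this toy matrix the first contributor dominates the other two, so it receives the largest share of the normalized quality score.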


2015 ◽  
Vol 821-823 ◽  
pp. 528-532 ◽  
Author(s):  
Dirk Lewke ◽  
Karl Otto Dohnke ◽  
Hans Ulrich Zühlke ◽  
Mercedes Cerezuela Barret ◽  
Martin Schellenberger ◽  
...  

One challenge for volume manufacturing of 4H-SiC devices is the state-of-the-art wafer dicing technology: mechanical blade dicing, which suffers from high tool wear and low feed rates. In this paper we discuss Thermal Laser Separation (TLS) as a novel dicing technology for large-scale production of SiC devices. We compare the latest TLS experimental data from fully processed 4H-SiC wafers with results obtained by mechanical dicing. In particular, we consider typical product-relevant features such as process control monitoring (PCM) structures and backside metallization, the quality of the diced SiC devices, and productivity. We show that, with feed rates up to two orders of magnitude higher than the state of the art, no tool wear, and high quality of the diced chips, TLS has very promising potential to meet the demands of volume manufacturing of 4H-SiC devices.


2019 ◽  
Vol 11 (9) ◽  
pp. 190 ◽  
Author(s):  
Jamal ◽  
Xianqiao ◽  
Aldabbas

Emotion detection in social media is an effective way to measure people's mood about a specific topic, news item, or product. It has a wide range of applications, including identifying psychological conditions such as anxiety or depression in users. However, distinguishing useful emotion features in a large corpus of text is challenging because emotions are subjective, with fuzzy boundaries that may be expressed through different terminology and perceptions. To tackle this issue, this paper presents a hybrid deep learning approach based on TensorFlow with Keras for emotion detection on a large-scale, imbalanced tweet dataset. First, preprocessing steps are used to extract useful features from raw tweets and remove noisy data. Second, the entropy weighting method is used to compute the importance of each feature. Third, a class balancer is applied to balance the classes. Fourth, Principal Component Analysis (PCA) is applied to transform highly correlated features into normalized forms. Finally, a TensorFlow-based deep learning algorithm with Keras is proposed to predict high-quality features for emotion classification. The proposed methodology is evaluated on a dataset of 1,600,000 tweets collected from the website 'kaggle'. The proposed approach is compared with other state-of-the-art techniques at different training ratios, and the results indicate that it outperforms them.
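The entropy weighting step can be sketched as follows. The abstract does not give the exact formula, so this assumes the commonly used entropy weight method: each feature column is normalized to a distribution, its Shannon entropy is scaled to [0, 1], and features with lower entropy (more variation across samples) receive larger weights. The function name and the toy feature matrix are illustrative assumptions, not taken from the paper.

```python
import math


def entropy_weights(rows):
    """Entropy weight method for a matrix of non-negative feature values.

    rows: list of samples, each a list of feature values.
    Returns one weight per feature; the weights sum to 1.
    """
    n = len(rows)
    if n < 2:
        raise ValueError("need at least two samples")
    m = len(rows[0])
    divergences = []
    for j in range(m):
        column = [row[j] for row in rows]
        total = sum(column)
        entropy = 0.0
        for x in column:
            if x > 0:  # treat 0 * log(0) as 0
                p = x / total
                entropy -= p * math.log(p)
        entropy /= math.log(n)  # scale entropy to [0, 1]
        divergences.append(max(0.0, 1.0 - entropy))
    norm = sum(divergences)
    if norm == 0.0:
        return [1.0 / m] * m  # all features equally uninformative
    return [d / norm for d in divergences]


# Toy feature matrix (assumed values): feature 0 is constant across
# samples, feature 1 varies, so feature 1 should receive the weight.
features = [
    [1.0, 1.0],
    [1.0, 3.0],
    [1.0, 5.0],
]
weights = entropy_weights(features)
print(weights)
```

A constant feature has maximal entropy and therefore a weight of (approximately) zero, which matches the intuition that it carries no information for separating classes.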


Complexity ◽  
2019 ◽  
Vol 2019 ◽  
pp. 1-12 ◽  
Author(s):  
Adrian Carballal ◽  
Carlos Fernandez-Lozano ◽  
Nereida Rodriguez-Fernandez ◽  
Luz Castro ◽  
Antonino Santos

An important topic in evolutionary art is the development of systems that can mimic the aesthetic decisions made by human beings, e.g., fitness evaluations made by humans using interactive evolution in generative art. This paper focuses on the analysis of several datasets used for aesthetic prediction based on ratings from photography websites and psychological experiments. Since these datasets present problems, we propose a new dataset that is a subset of DPChallenge.com. Subsequently, three different evaluation methods were considered, one derived from the ratings available at DPChallenge.com and two obtained under experimental conditions related to the aesthetics and quality of images. We observed different criteria in the DPChallenge.com ratings, which had more to do with photographic quality than with aesthetic value. Finally, we explored learning systems other than state-of-the-art ones in order to predict these three values. The obtained results were similar to those using state-of-the-art procedures.


2018 ◽  
Vol 2018 ◽  
pp. 1-10 ◽  
Author(s):  
Samira Melki ◽  
Moncef Gueddari

The production of phosphoric acid by the Tunisian Chemical Group in Sfax, Tunisia, has degraded the groundwater quality of the Sfax-Agareb aquifer, mainly through the infiltration of phosphogypsum leachates. Spatiotemporal monitoring of groundwater quality was carried out by bimonthly sampling between October 2013 and October 2014. The samples collected in this study were subjected to measurements of physicochemical parameters and to analysis of the major elements, orthophosphates, fluorine, trace metals, and stable isotopes (¹⁸O, ²H). The results show that the infiltration of phosphogypsum leachates has a major effect on the downstream part of the aquifer, where the highest values of conductivity, SO₄²⁻, Ortho-P, and F⁻ and the lowest pH were recorded. In addition, the results indicate that the phosphogypsum leachates contain much higher amounts of Cr, Cd, Zn, Cu, Fe, and Al than the groundwater. Spatiotemporal variation of the conductivity and of the concentrations of major elements is linked to the infiltration of phosphogypsum leachates as well as to a wide range of factors such as the natural recharge conditions and the water residence time. The ¹⁸O and ²H contents show that the water of the Sfax-Agareb aquifer originates from recent rainfall and undergoes large-scale evaporation.


2018 ◽  
Vol 175 ◽  
pp. 03001
Author(s):  
Han Yang ◽  
Chen Kerui ◽  
Li Yang ◽  
Qu Bao

In the twenty-first century, China has vigorously promoted research on and construction of AC and DC transmission technology in order to ensure the optimal allocation of energy resources on a large scale [1]. In the construction of AC UHV transmission lines, the welding quality of the tower and its stiffening plates, which form the load-bearing and tension-carrying welded structure, plays an important role in the overall quality of the steel structure. In the past, semi-automatic CO2 welding with solid-core wire often produced weld spatter that was difficult to clean up and had low welding efficiency. Semi-automatic CO2 flux-cored arc welding (FCAW) tolerates a wide range of current and voltage and offers a fast melting speed, which makes it significant for improving the process. This paper describes the application of the technology in practical engineering and develops a basic training strategy for welding technicians working on grid steel structures. It also presents V-groove plate butt FCAW welding as a typical welding project, in the hope that this welding process will continue to spread.


2017 ◽  
Author(s):  
Roye Rozov ◽  
Gil Goldshlager ◽  
Eran Halperin ◽  
Ron Shamir

Motivation: We present Faucet, a 2-pass streaming algorithm for assembly graph construction. Faucet builds an assembly graph incrementally as each read is processed. Thus, reads need not be stored locally, as they can be processed while downloading data and then discarded. We demonstrate this functionality by performing streaming graph assembly of publicly available data, and observe that the ratio of disk use to raw data size decreases as coverage is increased. Results: Faucet pairs the de Bruijn graph obtained from the reads with additional metadata derived from them. We show that these metadata (coverage counts collected at junction k-mers and connections bridging between junction pairs) contain the most salient information needed for assembly, and demonstrate that they enable cleaning of metagenome assembly graphs, greatly improving contiguity while maintaining accuracy. We compared Faucet's resource use and assembly quality to state-of-the-art metagenome assemblers, as well as leading resource-efficient genome assemblers. Faucet used orders of magnitude less time and disk space than the specialized metagenome assemblers MetaSPAdes and Megahit, while also improving on their memory use; this broadly matched the performance of other assemblers optimizing resource efficiency, namely Minia and LightAssembler. However, on the metagenomes tested, Faucet's outputs had 14-110% higher mean NGA50 lengths compared to Minia, and 2-11-fold higher mean NGA50 lengths compared to LightAssembler, the only other streaming assembler available. Availability: Faucet is available at https://github.com/Shamir-Lab/[email protected],[email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
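To make the notion of junctions in a de Bruijn graph concrete, here is a minimal in-memory sketch. It is not Faucet's implementation: it ignores streaming, reverse complements, and coverage counts, and it uses (k-1)-mers as nodes with each k-mer contributing one edge, which is one common textbook convention. The function names and the toy reads are assumptions made for illustration.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build successor and predecessor maps over (k-1)-mer nodes.

    Every k-mer in a read adds an edge from its prefix to its suffix.
    """
    successors = defaultdict(set)
    predecessors = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            left, right = kmer[:-1], kmer[1:]
            successors[left].add(right)
            predecessors[right].add(left)
    return successors, predecessors


def junction_nodes(successors, predecessors):
    """Return nodes with more than one outgoing or incoming edge."""
    nodes = set(successors) | set(predecessors)
    return {
        node
        for node in nodes
        if len(successors.get(node, ())) > 1
        or len(predecessors.get(node, ())) > 1
    }


# Two toy reads that share a prefix and then diverge after "GT".
reads = ["ACGTA", "ACGTC"]
succ, pred = build_de_bruijn(reads, k=3)
junctions = junction_nodes(succ, pred)
print(junctions)
```

Non-junction nodes lie on unbranched paths, which is why information collected only at junctions can, in principle, summarize much of the graph's structure.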


Author(s):  
С.И. Носков ◽  
М.П. Базилевский ◽  
Ю.А. Трофимов ◽  
А. Буяннэмэх

The article addresses the problem of constructing an efficiency function (an aggregated criterion, or convolution of criteria) for the sections of the Ulan Bator Railway (UBZhD) that combines specially weighted partial characteristics of how well these sections operate. The problem is solved using the information and computational technology (ICT) developed at Irkutsk State Transport University for multi-criteria assessment of the operating efficiency of complex socio-economic and technical systems. At the model level, the ICT expresses this efficiency as a single number (for example, as a percentage), which opens up broad opportunities for managing such systems: in particular, it enables large-scale multifactor comparative analysis of the activities of homogeneous organizational and other structures and supports decisions of widely varying kinds on that basis. An efficiency function for the UBZhD sections has been constructed that includes the following weighted partial indicators: loading, static load, unloading, wagon dispatch, passenger transportation, idle time of wagons with one processing operation, idle time of local wagons, idle time of transit wagons with processing, and idle time of transit wagons without processing. Based on this function, the efficiency of each section is calculated on a 100% scale. All preference indicators are ordered by decreasing importance. Such information, generated annually, can be very useful to UBZhD management for making a wide range of managerial decisions, including personnel decisions. Similar work could be carried out for RAO Russian Railways.
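The idea of a weighted efficiency function scaled to 100% can be sketched as follows. This is only a generic weighted-sum illustration, not the ICT procedure described in the article: the min-max normalization, the handling of "lower is better" indicators such as idle time, and all section names, indicator names, values, and weights are assumptions introduced for the example.

```python
def efficiency_scores(sections, weights, lower_is_better=()):
    """Weighted sum of min-max normalized indicators, scaled to 0-100.

    sections: {section name: {indicator name: value}}
    weights: {indicator name: weight}
    lower_is_better: indicators where smaller values are preferable
    """
    names = list(sections)
    scores = {name: 0.0 for name in names}
    total_weight = sum(weights.values())
    for indicator, weight in weights.items():
        values = [sections[name][indicator] for name in names]
        low, high = min(values), max(values)
        for name in names:
            if high == low:
                normalized = 1.0  # indicator does not discriminate
            else:
                normalized = (sections[name][indicator] - low) / (high - low)
                if indicator in lower_is_better:
                    normalized = 1.0 - normalized
            scores[name] += weight * normalized
    return {name: 100.0 * s / total_weight for name, s in scores.items()}


# Hypothetical data for two sections and two indicators.
sections = {
    "Section A": {"loading": 120.0, "idle_time": 4.0},
    "Section B": {"loading": 80.0, "idle_time": 10.0},
}
weights = {"loading": 0.6, "idle_time": 0.4}

scores = efficiency_scores(sections, weights, lower_is_better={"idle_time"})
print(scores)
```

With this scaling, a section that is best on every indicator scores 100 and one that is worst on every indicator scores 0, which makes sections directly comparable by a single number.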


2021 ◽  
Vol 13 (19) ◽  
pp. 3971
Author(s):  
Wenxiang Chen ◽  
Yingna Li ◽  
Zhengang Zhao

Insulator detection is one of the most significant issues in high-voltage transmission line inspection using unmanned aerial vehicles (UAVs) and has attracted attention from researchers all over the world. The state-of-the-art models in object detection perform well in insulator detection, but the precision is limited by the scale of the dataset and parameters. Recently, the Generative Adversarial Network (GAN) was found to offer excellent image generation. Therefore, we propose a novel model called InsulatorGAN based on using conditional GANs to detect insulators in transmission lines. However, due to the fixed categories in datasets such as ImageNet and Pascal VOC, the generated insulator images are of a low resolution and are not sufficiently realistic. To solve these problems, we established an insulator dataset called InsuGenSet for model training. InsulatorGAN can generate high-resolution, realistic-looking insulator-detection images that can be used for data expansion. Moreover, InsulatorGAN can be easily adapted to other power equipment inspection tasks and scenarios using one generator and multiple discriminators. To give the generated images richer details, we also introduced a penalty mechanism based on a Monte Carlo search in InsulatorGAN. In addition, we proposed a multi-scale discriminator structure based on a multi-task learning mechanism to improve the quality of the generated images. Finally, experiments on the InsuGenSet and CPLID datasets demonstrated that our model outperforms existing state-of-the-art models by advancing both the resolution and quality of the generated images as well as the position of the detection box in the images.

