Learning Models
Recently Published Documents

2022 ◽ Vol 9 (3) ◽ pp. 1-22
Mohammad Daradkeh

This study presents a data analytics framework for analyzing the topics and sentiments associated with COVID-19 vaccine misinformation on social media. A total of 40,359 tweets related to COVID-19 vaccination were collected between January 2021 and March 2021. Misinformation was detected using multiple predictive machine learning models. A Latent Dirichlet Allocation (LDA) topic model was used to identify the dominant topics in COVID-19 vaccine misinformation, and the sentiment orientation of the misinformation was analyzed using a lexicon-based approach. An independent-samples t-test was performed to compare the numbers of replies, retweets, and likes of misinformation with different sentiment orientations. Based on the data sample, the results show that COVID-19 vaccine misinformation spanned 21 major topics. Across all misinformation topics, the average numbers of replies, retweets, and likes of tweets with negative sentiment were 2.26, 2.68, and 3.29 times higher, respectively, than those of tweets with positive sentiment.
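A minimal sketch of the pipeline this abstract describes, LDA topic extraction plus lexicon-based sentiment scoring, on a handful of invented tweets (the texts, lexicon entries, and topic count below are illustrative assumptions, not the study's data):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

tweets = [
    "vaccine causes terrible side effects avoid it",
    "the vaccine is safe and effective get vaccinated",
    "microchips hidden in the vaccine terrible plot",
    "trials show the vaccine is safe and effective",
]

# Topic modelling: fit LDA on a term-count matrix.
vec = CountVectorizer(stop_words="english")
counts = vec.fit_transform(tweets)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)  # one topic distribution per tweet

# Lexicon-based sentiment: net count of positive minus negative words,
# using a tiny invented lexicon.
POSITIVE = {"safe", "effective"}
NEGATIVE = {"terrible", "avoid", "plot"}

def sentiment(text):
    words = text.split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

scores = [sentiment(t) for t in tweets]
print(doc_topics.shape, scores)
```

In a real pipeline the lexicon would be a published resource (e.g. a sentiment dictionary) and the topic count would be chosen via coherence scores rather than fixed at two.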

2021 ◽ Vol 10 (1-2) ◽ pp. 30-42
Guan-Yuan Wang

Abstract Since the smartphone market is an oligopoly, consumer purchase intention is usually driven by brand preference. This research analyses the customer-to-customer market for second-hand smartphones, showing how the brand factor affects consumers' purchasing behaviour. It is found that the recovery value and life cycle of Apple smartphones are, respectively, higher and longer than those of other brands. Moreover, the recovery value of other brands' smartphones is significantly driven by the debut dates of Apple smartphones, implicitly forming a consumption cycle. In addition, machine learning models are able to predict the recovery value with an accuracy of 93.55%.
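The recovery-value prediction step could look roughly like the sketch below, fitted on synthetic listings with an assumed brand-dependent depreciation curve (the paper's actual features, model, and 93.55% figure are not reproduced here):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 400
age_months = rng.uniform(0, 48, n)
is_apple = rng.integers(0, 2, n)          # assumed brand indicator
launch_price = rng.uniform(300, 1200, n)

# Assumed depreciation: value halves every 24 months for Apple,
# every 12 months for other brands, plus small pricing noise.
half_life = np.where(is_apple == 1, 24.0, 12.0)
value = launch_price * 0.5 ** (age_months / half_life)
value += rng.normal(0, 10, n)

X = np.column_stack([age_months, is_apple, launch_price])
X_tr, X_te, y_tr, y_te = train_test_split(X, value, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)
r2 = model.score(X_te, y_te)  # held-out R^2
```

On this clean synthetic data the held-out R² is high by construction; real listing data would be far noisier.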

2021 ◽ Vol 7 (1)
Elisabeth J. Schiessler ◽  
Tim Würger ◽  
Sviatlana V. Lamaka ◽  
Robert H. Meißner ◽  
Christian J. Cyron ◽  

Abstract The degradation behaviour of magnesium and its alloys can be tuned by small organic molecules. However, automatically identifying effective organic additives within the vast chemical space of potential compounds requires sophisticated tools. Herein, we propose two systematic approaches to sparse feature selection for identifying the molecular descriptors most relevant to the corrosion inhibition efficiency of chemical compounds: one based on the classical statistical tool of analysis of variance, the other on random forests. We demonstrate how both, when combined with deep neural networks, can help to predict the corrosion inhibition efficiencies of chemical compounds for the magnesium alloy ZE41. In particular, we demonstrate that this framework outperforms predictions relying on a random selection of molecular descriptors. Finally, we point out how autoencoders could be used in the future to enable even more accurate automated predictions of corrosion inhibition efficiencies.

2021 ◽ Vol 14 (23)
Hany Gamal ◽  
Salaheldin Elkatatny ◽  
Ahmed Abdulhamid Mahmoud

Andrew McDonald ◽  

Decades of subsurface exploration and characterization have led to the collation and storage of large volumes of well-related data. The amount of data gathered daily continues to grow rapidly as technology and recording methods improve. With the increasing adoption of machine-learning techniques in the subsurface domain, it is essential that the quality of the input data is carefully considered when working with these tools. If the input data are of poor quality, the impact on the precision and accuracy of the prediction can be significant. Consequently, this can impact key decisions about the future of a well or a field. This study focuses on well-log data, which can be highly multidimensional, diverse, and stored in a variety of file formats. Well-log data exhibit key characteristics of big data: volume, variety, velocity, veracity, and value. Well data can include numeric values, text values, waveform data, image arrays, maps, and volumes, all of which can be indexed by time or depth in a regular or irregular way. A significant portion of time can be spent gathering and quality-checking data prior to carrying out petrophysical interpretations and applying machine-learning models. Well-log data can be affected by numerous issues causing a degradation in data quality. These include missing data ranging from single data points to entire curves, noisy data from tool-related issues, borehole washout, processing issues, incorrect environmental corrections, and mislabeled data. Having vast quantities of data does not mean it can all be passed into a machine-learning algorithm with the expectation that the resultant prediction is fit for purpose. It is essential that the most important and relevant data are passed into the model through appropriate feature selection techniques. Not only does this improve the quality of the prediction, but it also reduces computational time and can provide a better understanding of how the models reach their conclusion.
This paper reviews data quality issues typically faced by petrophysicists when working with well-log data and deploying machine-learning models. This is achieved by first providing an overview of machine learning and big data within the petrophysical domain, followed by a review of the common well-log data issues, their impact on machine-learning algorithms, and methods for mitigating their influence.
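One of the simplest QC steps the review describes, flagging missing samples, might look like this in pandas; the -999.25 placeholder is a common null convention in LAS well-log files, and the curves and values below are invented:

```python
import numpy as np
import pandas as pd

# Toy log section: GR (gamma ray) and RHOB (bulk density) curves,
# with -999.25 marking missing samples, a common LAS null value.
logs = pd.DataFrame({
    "DEPTH": [1000.0, 1000.5, 1001.0, 1001.5, 1002.0],
    "GR":    [55.2, -999.25, 60.1, 58.7, -999.25],
    "RHOB":  [2.45, 2.47, -999.25, 2.50, 2.51],
})

# Replace the null placeholder with NaN so pandas treats it as missing.
logs = logs.replace(-999.25, np.nan)

# Per-curve completeness report, a typical first QC check before any
# petrophysical interpretation or model training.
completeness = logs.drop(columns="DEPTH").notna().mean()
print(completeness)
```

Real QC would go further, e.g. cross-checking against borehole caliper data for washout intervals and validating curve mnemonics against a standard dictionary.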

2021 ◽  
Igor Soares ◽  
Fernando Camargo ◽  
Adriano Marques ◽  
Oliver Crook

Abstract Genome engineering is undergoing unprecedented development and is now becoming widely available. To ensure responsible biotechnology innovation and to reduce misuse of engineered DNA sequences, it is vital to develop tools to identify the lab-of-origin of engineered plasmids. Genetic engineering attribution (GEA), the ability to make sequence-lab associations, would support forensic experts in this process. Here, we propose a method, based on metric learning, that ranks the most likely labs-of-origin whilst simultaneously generating embeddings for plasmid sequences and labs. These embeddings can be used to perform various downstream tasks, such as clustering DNA sequences and labs, as well as using them as features in machine learning models. Our approach employs circular-shift augmentation and is able to correctly rank the lab-of-origin 90% of the time within its top 10 predictions, outperforming all current state-of-the-art approaches. We also demonstrate that we can perform few-shot learning and obtain 76% top-10 accuracy using only 10% of the sequences. This means we outperform the previous CNN approach using only one-tenth of the data. We also demonstrate that we are able to extract key signatures in plasmid sequences for particular labs, allowing for an interpretable examination of the model's outputs. CCS Concepts: Information systems → Similarity measures; Learning to rank.
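The circular-shift augmentation mentioned above can be sketched in a few lines: a plasmid is a circular molecule, so any rotation of its sequence string represents the same plasmid (the sequence below is invented):

```python
import random

def circular_shifts(seq, n_augment, seed=0):
    """Return n_augment random rotations of a circular DNA sequence."""
    rng = random.Random(seed)
    shifts = [rng.randrange(len(seq)) for _ in range(n_augment)]
    return [seq[k:] + seq[:k] for k in shifts]

plasmid = "ATGCGTACGTTAGC"
augmented = circular_shifts(plasmid, 3)
```

During training, each rotation is presented as an extra view of the same plasmid-lab pair, teaching the embedding model to be invariant to the arbitrary linearization point of the circular sequence.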

2021 ◽  
Christina Humer ◽  
Henry Heberle ◽  
Floriane Montanari ◽  
Thomas Wolf ◽  
Florian Huber ◽  

The introduction of machine learning to small molecule research – an inherently multidisciplinary field in which chemists and data scientists combine their expertise and collaborate – has been vital to making screening processes more efficient. In recent years, numerous models that predict pharmacokinetic properties or bioactivity have been published, and these are used on a daily basis by chemists to make decisions and prioritize ideas. The emerging field of explainable artificial intelligence is opening up new possibilities for understanding the reasoning that underlies a model. In small molecule research, this means relating contributions of substructures of compounds to their predicted properties, which in turn allows identification of the areas of a compound with the greatest influence on the outcome. However, there has been no interactive visualization tool that facilitates such interdisciplinary collaboration towards the interpretability of machine learning models for small molecules. To fill this gap, we present CIME (ChemInformatics Model Explorer), an interactive web-based system that allows users to inspect chemical data sets, visualize model explanations, compare interpretability techniques, and explore subgroups of compounds. The tool is model-agnostic and can be run on a server or a workstation.
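As a toy illustration of the attribution idea such tools visualize (not CIME's own method): with a linear model over substructure-count features, a substructure's contribution to one prediction is simply its coefficient times its count. All features and values below are invented:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Rows: compounds; columns: counts of three assumed substructures.
X = np.array([[1, 0, 2],
              [0, 1, 1],
              [2, 1, 0],
              [1, 2, 1]], dtype=float)
y = np.array([3.0, 1.5, 2.5, 3.5])  # synthetic property values

model = LinearRegression().fit(X, y)

# Per-substructure contribution for compound 0: coefficient * count.
contribs = model.coef_ * X[0]
pred = model.intercept_ + contribs.sum()  # reconstructs the prediction
```

For nonlinear models the same decomposition is approximated with techniques such as SHAP values or atom-level attributions, which is where comparing interpretability techniques side by side becomes useful.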
