Business analytics using machine learning and large-scale textual data: Three essays

Abstract Cyberspace is a vast soapbox for people to post anything that they witness in their day-to-day lives. Subsequently, it can be used as a very effective tool in detecting the stress levels of an individual based on the posts and comments shared by him/her on social networking platforms. We leverage large-scale datasets with tweets to successfully accomplish sentiment analysis with the aid of machine learning algorithms. We take the help of a capable deep learning pre-trained model called BERT to solve the problems which come with sentiment classification. The BERT model outperforms a lot of other well-known models for this job without any sophisticated architecture. We also adopted Latent Dirichlet Allocation which is an unsupervised machine learning method that’s skilled in scanning a group of documents, recognizing the word and phrase patterns within them, and gathering word groups and alike expressions that most precisely illustrate a set of documents. This helps us predict which topic is linked to the textual data. With the aid of the models suggested, we will be able to detect the emotion of users online. We are primarily working with Twitter data because Twitter is a website where people express their thoughts often. In conclusion, this proposal is for the well- being of one’s mental health. The results are evaluated using various metric at macro and micro level and indicate that the trained model detects the status of emotions bases on social interactions.

Download Full-text

Domain Independent Automatic Labeling system for Large-scale Social Data using Lexicon and Web-based Augmentation

Information Technology And Control ◽

10.5755/j01.itc.49.1.23769 ◽

2020 ◽

Vol 49 (1) ◽

pp. 36-54

Author(s):

Shaheen Khatoon ◽

Lamis Abu Romman

Keyword(s):

Machine Learning ◽

Large Scale ◽

Web Based ◽

Stage Of Development ◽

Textual Data ◽

Semantic Orientation ◽

Training Examples ◽

Domain Independent ◽

Labeling System ◽

Tedious Process

Recently, with the large-scale adoption of social media, people have begun to express their opinion on these sites in the form of reviews. Potential consumers often forced to wade through huge amount of reviews to make informed decision. Sentiment analysis has become rapid and effective way to automatically gauge consumers’ opinion. However, such analysis often requires tedious process of manual tagging of large training examples or manually building a lexicon for the purpose of classifying reviews as positive or negative. In this paper, we present a method to automate the tedious process of labeling large textual data in an unsupervised, domain independent and scalable manner. The proposed method combines the lexicon-based and Web-based Point Wise Mutual Information (PMI) statistics to find the Semantic Orientation (SO) of opinion expressed in a review. Based on proposed methods a system called Domain Independent Automatic Labeling System (DIALS) has been implemented, which takes collection of text from any domain as input and generates fully labeled dataset in an unsupervised and scalable manner. The result generated can be used to track and summarize online discussion and/or use to train any classifier in the next stage of development. The effectiveness of system is tested by comparing it with baseline machine learning and lexicon-based methods. Experiments on multi-domains dataset has shown that proposed method consistently shown improved recall and accuracy as compared to baseline machine learning and lexicon-based methods.

Download Full-text

Information And Map-Made Control With The Functions Of Business Analytics For Urban Management

Promyshlennoe i Grazhdanskoe Stroitel stvo ◽

10.33622/0869-7019.2019.08.72-78 ◽

2019 ◽

pp. 72-78

Keyword(s):

Large Scale ◽

Information Support ◽

Urban Management ◽

Business Analytics ◽

Administrative District ◽

The North ◽

Model Of Management ◽

Complex Development ◽

Coordination System ◽

Key Aspects

The key aspects of the process of designing and developing an information and cartographic control tool with business analytics functions for the municipal level of urban management are considered. The review of functionality of the developed tool is given. Examples of its use for the analysis and monitoring of implementation of the program of complex development of territories are given. The importance of application of information support of management and coordination at all levels of management as an integral part of the basic model of management and coordination system of large-scale urban projects of dispersed construction is proved. Information and map-made tool with business intelligence functions was used and was highly appreciated in the preparation of information-analytical and presentation materials of the North-Eastern Administrative District of Moscow. Its use made it possible to significantly optimize the list of activities of the program of integrated development of territories, their priority and timing.

Download Full-text

Large-Scale Data Learning Method for Anomaly Detection using Machine Learning for Monitoring Vibration in Vehicle Equipment

IEEJ Transactions on Industry Applications ◽

10.1541/ieejias.140.480 ◽

2020 ◽

Vol 140 (6) ◽

pp. 480-487

Author(s):

Minoru Kondo

Keyword(s):

Machine Learning ◽

Anomaly Detection ◽

Large Scale ◽

Learning Method ◽

Large Scale Data ◽

Scale Data

Download Full-text

Coded Computing: Mitigating Fundamental Bottlenecks in Large-Scale Distributed Computing and Machine Learning

10.1561/9781680837056 ◽

2020 ◽

Author(s):

Songze Li ◽

Salman Avestimehr

Keyword(s):

Machine Learning ◽

Distributed Computing ◽

Large Scale

Download Full-text

Evolution of Metastable Structures in Bimetallic Catalysts from Microscopy and Machine-Learning Molecular Dynamics

10.26434/chemrxiv.11811660.v1 ◽

2020 ◽

Author(s):

Jin Soo Lim ◽

Jonathan Vandermause ◽

Matthijs A. van Spronsen ◽

Albert Musaelian ◽

Christopher R. O’Connor ◽

...

Keyword(s):

Machine Learning ◽

Molecular Dynamics ◽

Large Scale ◽

Materials Science ◽

Complete Characterization ◽

Layer By Layer ◽

Surface Restructuring ◽

Metastable Structures ◽

Mechanistic Investigation ◽

Underlying Mechanisms

Restructuring of interface plays a crucial role in materials science and heterogeneous catalysis. Bimetallic systems, in particular, often adopt very different composition and morphology at surfaces compared to the bulk. For the first time, we reveal a detailed atomistic picture of the long-timescale restructuring of Pd deposited on Ag, using microscopy, spectroscopy, and novel simulation methods. Encapsulation of Pd by Ag always precedes layer-by-layer dissolution of Pd, resulting in significant Ag migration out of the surface and extensive vacancy pits. These metastable structures are of vital catalytic importance, as Ag-encapsulated Pd remains much more accessible to reactants than bulk-dissolved Pd. The underlying mechanisms are uncovered by performing fast and large-scale machine-learning molecular dynamics, followed by our newly developed method for complete characterization of atomic surface restructuring events. Our approach is broadly applicable to other multimetallic systems of interest and enables the previously impractical mechanistic investigation of restructuring dynamics.

Download Full-text

Epigenetic Target Prediction with Accurate Machine Learning Models

10.26434/chemrxiv.13522313 ◽

2021 ◽

Author(s):

Norberto Sánchez-Cruz ◽

Jose L. Medina-Franco

Keyword(s):

Machine Learning ◽

Small Molecules ◽

Predictive Models ◽

Large Scale ◽

Target Prediction ◽

Quantitative Measure ◽

Learning Models ◽

Discovery Research ◽

Drug Discovery Research ◽

Machine Learning Models

<p>Epigenetic targets are a significant focus for drug discovery research, as demonstrated by the eight approved epigenetic drugs for treatment of cancer and the increasing availability of chemogenomic data related to epigenetics. This data represents a large amount of structure-activity relationships that has not been exploited thus far for the development of predictive models to support medicinal chemistry efforts. Herein, we report the first large-scale study of 26318 compounds with a quantitative measure of biological activity for 55 protein targets with epigenetic activity. Through a systematic comparison of machine learning models trained on molecular fingerprints of different design, we built predictive models with high accuracy for the epigenetic target profiling of small molecules. The models were thoroughly validated showing mean precisions up to 0.952 for the epigenetic target prediction task. Our results indicate that the herein reported models have considerable potential to identify small molecules with epigenetic activity. Therefore, our results were implemented as freely accessible and easy-to-use web application.</p>

Download Full-text

Efficient Image Retrieval approach for Large-scale Chest X Ray data using Hand-Crafted Features and Machine Learning Algorithms

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v6i11.890896 ◽

2018 ◽

Vol 6 (11) ◽

pp. 890-896

Author(s):

Irene Getzi S ◽

D. Christopher Durairaj ◽

V Joseph Raj

Keyword(s):

Machine Learning ◽

Image Retrieval ◽

Large Scale ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

X Ray ◽

Chest X Ray

Download Full-text

Adsorption Isotherm Predictions for Multiple Molecules in MOFs Using the Same Deep Learning Model

10.26434/chemrxiv.9894224.v1 ◽

2019 ◽

Author(s):

Ryther Anderson ◽

Achay Biong ◽

Diego Gómez-Gualdrón

Keyword(s):

Neural Network ◽

Machine Learning ◽

Molecular Simulation ◽

Large Scale ◽

Learning Model ◽

Operating Conditions ◽

Small Subset ◽

Screening Methods ◽

Large Set ◽

Metal Organic

<div>Tailoring the structure and chemistry of metal-organic frameworks (MOFs) enables the manipulation of their adsorption properties to suit specific energy and environmental applications. As there are millions of possible MOFs (with tens of thousands already synthesized), molecular simulation, such as grand canonical Monte Carlo (GCMC), has frequently been used to rapidly evaluate the adsorption performance of a large set of MOFs. This allows subsequent experiments to focus only on a small subset of the most promising MOFs. In many instances, however, even molecular simulation becomes prohibitively time consuming, underscoring the need for alternative screening methods, such as machine learning, to precede molecular simulation efforts. In this study, as a proof of concept, we trained a neural network as the first example of a machine learning model capable of predicting full adsorption isotherms of different molecules not included in the training of the model. To achieve this, we trained our neural network only on alchemical species, represented only by their geometry and force field parameters, and used this neural network to predict the loadings of real adsorbates. We focused on predicting room temperature adsorption of small (one- and two-atom) molecules relevant to chemical separations. Namely, argon, krypton, xenon, methane, ethane, and nitrogen. However, we also observed surprisingly promising predictions for more complex molecules, whose properties are outside the range spanned by the alchemical adsorbates. Prediction accuracies suitable for large-scale screening were achieved using simple MOF (e.g. geometric properties and chemical moieties), and adsorbate (e.g. forcefield parameters and geometry) descriptors. Our results illustrate a new philosophy of training that opens the path towards development of machine learning models that can predict the adsorption loading of any new adsorbate at any new operating conditions in any new MOF.</div>

Download Full-text

Recent Progress in Machine Learning-based Prediction of Peptide Activity for Drug Discovery

Current Topics in Medicinal Chemistry ◽

10.2174/1568026619666190122151634 ◽

2019 ◽

Vol 19 (1) ◽

pp. 4-16 ◽

Cited By ~ 6

Author(s):

Qihui Wu ◽

Hanzhong Ke ◽

Dongli Li ◽

Qi Wang ◽

Jiansong Fang ◽

...

Keyword(s):

Machine Learning ◽

Drug Discovery ◽

Large Scale ◽

Recent Progress ◽

High Specificity ◽

Learning Approaches ◽

Anticancer Peptides ◽

The Past ◽

Traditional Approaches ◽

Large Scale Screening

Over the past decades, peptide as a therapeutic candidate has received increasing attention in drug discovery, especially for antimicrobial peptides (AMPs), anticancer peptides (ACPs) and antiinflammatory peptides (AIPs). It is considered that the peptides can regulate various complex diseases which are previously untouchable. In recent years, the critical problem of antimicrobial resistance drives the pharmaceutical industry to look for new therapeutic agents. Compared to organic small drugs, peptide- based therapy exhibits high specificity and minimal toxicity. Thus, peptides are widely recruited in the design and discovery of new potent drugs. Currently, large-scale screening of peptide activity with traditional approaches is costly, time-consuming and labor-intensive. Hence, in silico methods, mainly machine learning approaches, for their accuracy and effectiveness, have been introduced to predict the peptide activity. In this review, we document the recent progress in machine learning-based prediction of peptides which will be of great benefit to the discovery of potential active AMPs, ACPs and AIPs.

Download Full-text