CryptoChem for Encoding and Storing Information Using Chemical Structures

2020 ◽  
Author(s):  
Phyo Phyo Zin ◽  
Xinhao Li ◽  
Dhoha TRIKI ◽  
Denis Fourches

This study presents CryptoChem, a new method and associated software to securely store and transfer information using chemicals. Relying on the concept of Big Chemical Data, molecular descriptors, and machine learning techniques, CryptoChem offers a highly complex and robust system with multiple layers of security for transmitting confidential information. This technology adds previously untapped layers of complexity and is therefore relevant to different types of applications and users. The algorithm directly uses chemical structures and their properties as the central element of the secured storage. QSDR (Quantitative Structure-Data Relationship) models are used as private keys to encode and decode the data. Herein, we validate the software with a series of five datasets consisting of numerical and textual information of increasing size and complexity. We discuss (i) the initial concept and current features of CryptoChem, (ii) the associated Molread and Molwrite programs, which encode messages as series of molecules and decode them with an ensemble of QSDR machine learning models, (iii) the Analogue Retriever and Label Swapper methods, which add further layers of security, (iv) the results of encoding and decoding the five datasets using CryptoChem, and (v) the comparison of CryptoChem to contemporary encryption methods. CryptoChem is freely available for testing at https://github.com/XinhaoLi74/CryptoChem
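The encode/decode cycle described above can be illustrated with a toy sketch. Everything in it is hypothetical and is not CryptoChem's actual implementation: "molecules" are replaced by 2-D descriptor vectors, a shuffled symbol-to-label table plays the role of a secret key (loosely echoing the Label Swapper idea), and a nearest-centroid classifier stands in for the ensemble of QSDR models.

```python
import math
import random

# Hypothetical toy sketch, not the authors' code: descriptor vectors stand in
# for molecules, a shuffled symbol->label table is the secret key, and a
# nearest-centroid classifier stands in for the QSDR model ensemble.

def build_library(n_labels, per_label, rng):
    """Toy library: per_label descriptor vectors clustered around a distinct centre per label."""
    library = {}
    for label in range(n_labels):
        cx, cy = 10.0 * label, 5.0 * (label % 3)
        library[label] = [
            (cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1)) for _ in range(per_label)
        ]
    return library

def build_key(alphabet, seed):
    """Secret symbol -> label assignment obtained by shuffling the labels."""
    labels = list(range(len(alphabet)))
    random.Random(seed).shuffle(labels)
    return dict(zip(alphabet, labels))

def encode(message, key, library, rng):
    """Replace each symbol by a randomly chosen 'molecule' that carries its label."""
    return [rng.choice(library[key[ch]]) for ch in message]

def train_centroids(library):
    """Fit the stand-in decoder: one centroid per label."""
    return {
        label: tuple(sum(coord) / len(vecs) for coord in zip(*vecs))
        for label, vecs in library.items()
    }

def decode(molecules, centroids, key):
    """Predict each molecule's label, then invert the secret key to recover the text."""
    inverse = {label: ch for ch, label in key.items()}
    chars = []
    for vec in molecules:
        label = min(centroids, key=lambda lab: math.dist(vec, centroids[lab]))
        chars.append(inverse[label])
    return "".join(chars)

alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "
rng = random.Random(0)
library = build_library(len(alphabet), per_label=5, rng=rng)
key = build_key(alphabet, seed=42)
encoded = encode("HELLO WORLD", key, library, rng)
decoded = decode(encoded, train_centroids(library), key)
print(decoded)  # HELLO WORLD
```

Because each symbol is drawn from several interchangeable "molecules", the same message can be encoded differently each time. Without the key and the fitted decoder, the descriptor vectors alone do not reveal the text.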


2020 ◽  
Vol 28 (2) ◽  
pp. 253-265 ◽  
Author(s):  
Gabriela Bitencourt-Ferreira ◽  
Amauri Duarte da Silva ◽  
Walter Filgueira de Azevedo

Background: The elucidation of the structure of cyclin-dependent kinase 2 (CDK2) made it possible to develop targeted scoring functions for virtual screening aimed at identifying new inhibitors of this enzyme. CDK2 is a protein target for the development of drugs intended to modulate cell-cycle progression and control; such drugs have potential anticancer activity. Objective: Our goal here is to review recent applications of machine learning methods to predict ligand-binding affinity for protein targets. To assess the predictive performance of classical and targeted scoring functions, we focused our analysis on CDK2 structures. Methods: Experimental structural data are available for hundreds of binary complexes of CDK2 with different ligands, many of them with inhibition constant information. We investigate computational methods to calculate binding affinity for CDK2 through classical scoring functions and machine-learning models. Results: Analysis of the predictive performance of classical scoring functions available in docking programs such as Molegro Virtual Docker, AutoDock4, and AutoDock Vina indicated that these methods failed to predict binding affinity with a significant correlation with experimental data. Targeted scoring functions developed through supervised machine learning techniques showed a significant correlation with experimental data. Conclusion: Here, we described the application of supervised machine learning techniques to generate a scoring function to predict binding affinity. Machine learning models showed superior predictive performance compared with classical scoring functions. Analysis of the computational models obtained through machine learning indicated that they could capture essential structural features responsible for binding affinity against CDK2.
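A targeted scoring function of the kind described can be sketched as a linear model fitted by least squares to per-complex features, then assessed by the Pearson correlation between predicted and experimental affinities. The two features and all numbers below are hypothetical, and ordinary least squares is only a simple stand-in for the supervised techniques used in the study.

```python
import math

# Hypothetical sketch: fit affinity ~ w0 + w1*f1 + w2*f2 by ordinary least
# squares, then score the fit with Pearson's r against "experimental" values.

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_linear_scoring(features, affinities):
    """Weights (intercept first) from the normal equations X^T X w = X^T y."""
    X = [[1.0] + list(f) for f in features]
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(X, affinities)) for i in range(p)]
    return solve(xtx, xty)

def predict(weights, f):
    return weights[0] + sum(w * x for w, x in zip(weights[1:], f))

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Toy data generated exactly as affinity = 1 + 2*f1 - f2.
features = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 5.0)]
affinities = [1.0, 4.0, 3.0, 6.0, 6.0]
weights = fit_linear_scoring(features, affinities)
predicted = [predict(weights, f) for f in features]
r = pearson_r(predicted, affinities)
```

In practice the correlation should be computed on complexes held out from training. A perfect r on the training set, as in this toy, says nothing about generalization.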


Author(s):  
Ritu Khandelwal ◽  
Hemlata Goyal ◽  
Rajveer Singh Shekhawat

Introduction: Machine learning is an intelligent technology that acts as a bridge between business and data science. With data science, businesses aim to extract valuable insights from the data available to them. Bollywood, a multi-million-dollar industry, makes up a large part of Indian cinema. This paper attempts to predict whether an upcoming Bollywood movie will be a Blockbuster, Superhit, Hit, Average, or Flop. To this end, machine learning techniques (classification and prediction) are applied. Building a classifier or prediction model starts with a learning stage, in which a training dataset is used to train the model with some technique or algorithm; the rules generated in this stage form the model, which can then predict future trends in different types of organizations. Methods: Classification and prediction techniques such as Support Vector Machine (SVM), Random Forest, Decision Tree, Naïve Bayes, Logistic Regression, AdaBoost, and KNN are applied to find the most efficient and effective model. All of these can be applied through GUI-based workflows organized into categories such as Data, Visualize, Model, and Evaluate. Result: Building a classifier or prediction model starts with a learning stage, in which a training dataset is used to train the model with some technique or algorithm; the rules generated in this stage form the model, which can then predict future trends in different types of organizations. Conclusion: This paper focuses on a comparative analysis based on different parameters, such as accuracy and the confusion matrix, to identify the best model for predicting movie success. Using the predicted success rate, production houses can plan their advertising and choose the best time to release a movie to gain higher profits.
Discussion: Data mining is the process of discovering patterns in large datasets; the relationships it uncovers help solve business problems and predict forthcoming trends. Such predictions can help production houses plan their advertising and costs, and by taking these factors into account they can make a movie more profitable.
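The comparative analysis by accuracy and confusion matrix can be sketched for the five success classes as follows. The model names are taken from the Methods list, but the labels and predictions are invented toy values, not results from the paper.

```python
# Hypothetical sketch: compare classifiers by a 5-class confusion matrix and accuracy.
CLASSES = ["Blockbuster", "Superhit", "Hit", "Average", "Flop"]

def confusion_matrix(y_true, y_pred, classes=CLASSES):
    """Rows are actual classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix

def accuracy(matrix):
    """Fraction of predictions that fall on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(sum(row) for row in matrix)

def best_model(y_true, predictions):
    """Pick the model with the highest accuracy from {name: predicted labels}."""
    scores = {name: accuracy(confusion_matrix(y_true, pred))
              for name, pred in predictions.items()}
    return max(scores, key=scores.get), scores

# Toy labels and predictions (made up for illustration).
y_true = ["Hit", "Flop", "Average", "Hit", "Blockbuster", "Flop"]
predictions = {
    "Random Forest": ["Hit", "Flop", "Average", "Average", "Blockbuster", "Flop"],
    "Naive Bayes": ["Hit", "Average", "Average", "Flop", "Superhit", "Flop"],
}
best, scores = best_model(y_true, predictions)
```

The confusion matrix also shows *which* classes are confused, for example a Hit predicted as Average. Accuracy alone hides this.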


2021 ◽  
Vol 11 (3) ◽  
pp. 1323
Author(s):  
Medard Edmund Mswahili ◽  
Min-Jeong Lee ◽  
Gati Lother Martin ◽  
Junghyun Kim ◽  
Paul Kim ◽  
...  

Cocrystals are of great interest in industrial applications as well as academic research, and screening suitable coformers for active pharmaceutical ingredients is the most crucial and challenging step in cocrystal development. Machine learning techniques have recently attracted researchers in many fields, including pharmaceutical research such as quantitative structure-activity/property relationship (QSAR/QSPR) modeling. In this paper, we develop machine learning models to predict cocrystal formation. We extract descriptor values from the simplified molecular-input line-entry system (SMILES) strings of the compounds and compare the machine learning models experimentally on our collected dataset of 1476 instances. We found that the artificial neural network shows the greatest potential, achieving the best accuracy, sensitivity, and F1 score. We also found that the model achieved comparable performance using about half of the descriptors, as chosen by feature selection algorithms. We believe this will contribute to faster and more accurate cocrystal development.


Automating OMR-based answer sheets plays a crucial role in analyzing student performance measured as CO-PO (Course Outcome and Program Outcome) attainment. In the proposed work, the marks evaluation sheet is taken as the input image, and a frame cropping technique is applied to extract the table of marks, which is subdivided into cells saved as individual images. To recognize the handwritten digit in each frame, various machine learning models are trained. Experimental results show that the convolutional neural network performs best at identifying the digits in the frames. The outputs are then converted to CSV format, which is used to evaluate CO-PO attainment for each learner. The experiments compared various machine learning techniques to select the optimal model.
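The cropping-and-export pipeline can be sketched as below. It assumes an evenly spaced grid that fills the whole image, which is a simplification, since real sheets would need the table borders detected first. The image is a plain 2-D list of pixel values, and the digit values stand in for the output of a trained recognizer such as the CNN, which is not implemented here.

```python
import csv
import io

# Hypothetical sketch: split a table image into cell images, then write the
# recognized marks to CSV. Assumes an evenly spaced grid filling the image.

def crop_cells(image, n_rows, n_cols):
    """Return an n_rows x n_cols grid of cell images (each a 2-D list of pixels)."""
    h, w = len(image), len(image[0])
    cells = []
    for r in range(n_rows):
        top, bottom = r * h // n_rows, (r + 1) * h // n_rows
        row_cells = []
        for c in range(n_cols):
            left, right = c * w // n_cols, (c + 1) * w // n_cols
            row_cells.append([line[left:right] for line in image[top:bottom]])
        cells.append(row_cells)
    return cells

def marks_to_csv(digit_rows, header):
    """Serialize recognized digits (one list per student row) as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(digit_rows)
    return buf.getvalue()

# Toy 4x6 "image" whose pixel value encodes its position (10*row + col).
image = [[10 * r + c for c in range(6)] for r in range(4)]
cells = crop_cells(image, n_rows=2, n_cols=3)

# Stand-in for the digit recognizer's output on one student's row of marks.
digits = [[7, 5, 9]]
csv_text = marks_to_csv(digits, ["Q1", "Q2", "Q3"])
```

Each cell image would be passed to the trained digit classifier, and the CSV then feeds the CO-PO attainment calculation.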


2022 ◽  
pp. 220-249
Author(s):  
Md Ariful Haque ◽  
Sachin Shetty

The financial sector is a lucrative cyber-attack target because attacks promise immediate financial gain. As a result, financial institutions face challenges in developing systems that can automatically identify security breaches and separate fraudulent transactions from legitimate ones. Today, organizations widely use machine learning techniques to identify fraudulent behavior in customers' transactions. However, applying machine learning is often challenging because financial institutions' confidentiality policies prevent them from sharing customer transaction data. This chapter discusses some crucial challenges of handling cybersecurity and fraud in the financial industry and of building machine learning-based models to address them. The authors use an open-source e-commerce transaction dataset to illustrate the forensic process by creating a machine learning model to classify fraudulent transactions. Overall, the chapter focuses on how machine learning models can help detect and prevent fraudulent activities in the financial sector in the age of cybersecurity.
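A fraud classifier of the kind described can be sketched as logistic regression trained by batch gradient descent. Logistic regression is chosen here only for illustration, and the two features (a standardized amount and a night-time flag) and the toy transactions are hypothetical. Because fraud is usually rare, the sketch up-weights the positive class through an assumed `pos_weight` parameter.

```python
import math

# Hypothetical sketch: weighted logistic regression for fraud classification.

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(X, y, lr=0.5, epochs=1000, pos_weight=1.0):
    """Batch gradient descent on log-loss; pos_weight > 1 up-weights fraud (y == 1)."""
    n_features = len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            weight = pos_weight if yi == 1 else 1.0
            err = weight * (sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi)
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def fraud_probability(w, b, x):
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

# Toy transactions: (standardized amount, is_night); label 1 = fraudulent.
X = [(-1.0, 0), (-0.8, 0), (-0.9, 1), (-0.6, 0), (-0.7, 1), (-0.5, 0), (0.9, 1), (1.0, 1)]
y = [0, 0, 0, 0, 0, 0, 1, 1]
w, b = train_logistic(X, y, pos_weight=3.0)  # 6 legitimate vs 2 fraudulent
```

On real, heavily imbalanced transaction data, accuracy is misleading. Precision and recall on the fraud class are more informative ways to judge such a model.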


Artificial intelligence (AI) can be implemented using machine learning, which allows computers to learn automatically and improve from previous experience without being explicitly programmed. Programs developed with machine learning can access data and use it to learn. This paper focuses on applying machine learning to sports, namely to predicting the winning team of an IPL match. Cricket is a popular and unpredictable sport; in the T20 format in particular, the course of the whole game can change within a single over. Millions of spectators watch the Indian Premier League (IPL) every year, so developing a technique to forecast match outcomes becomes a real-time problem. Many aspects and features determine the result of a T20 cricket match, each with a weighted impact on the result, and this paper describes all of them in detail. A multivariate regression-based approach is proposed to measure each team's points in the league. The past performance of each team determines its probability of winning a match against a particular opponent. Finally, a set of seven factors or attributes is identified that can be used to predict the IPL match winner. Various machine learning models were trained to predict the winner in the interval between the toss and the start of the match. The performance of the developed models was evaluated with various classification techniques, of which Random Forest and Decision Tree gave good results.
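The idea that a team's past performance against a particular opponent gives its probability of winning can be sketched as a smoothed head-to-head win rate. The Laplace smoothing (`alpha`) is an assumption added so that teams with no shared history start at 0.5. The match records and team abbreviations are toy data, not the paper's.

```python
# Hypothetical sketch: head-to-head win probability from past match results.

def head_to_head_probability(matches, team, opponent, alpha=1.0):
    """Estimate P(team beats opponent) from (team_a, team_b, winner) records.

    alpha is a pseudo-count added to both wins and losses (Laplace smoothing).
    Matches between the pair with no winner (e.g. washed out) are ignored.
    """
    wins = losses = 0
    for a, b, winner in matches:
        if {a, b} == {team, opponent}:
            if winner == team:
                wins += 1
            elif winner == opponent:
                losses += 1
    return (wins + alpha) / (wins + losses + 2 * alpha)

# Toy match history (made up for illustration).
matches = [
    ("CSK", "MI", "MI"),
    ("MI", "CSK", "CSK"),
    ("CSK", "MI", "CSK"),
    ("RCB", "KKR", "KKR"),
]
p_csk_mi = head_to_head_probability(matches, "CSK", "MI")
p_mi_csk = head_to_head_probability(matches, "MI", "CSK")
p_csk_rcb = head_to_head_probability(matches, "CSK", "RCB")
```

The two directions of a pairing sum to 1, so the estimate is consistent whichever team is asked about. A feature like this could be one input among the attributes fed to a classifier.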


Author(s):  
Daniel Elton ◽  
Zois Boukouvalas ◽  
Mark S. Butrico ◽  
Mark D. Fuge ◽  
Peter W. Chung

We present a proof of concept that machine learning techniques can be used to predict the properties of CNOHF energetic molecules from their molecular structures. We focus on a small but diverse dataset consisting of 109 molecular structures spread across ten compound classes. Until now, candidate molecules for energetic materials have been screened using predictions from expensive quantum simulations and thermochemical codes. We present a comprehensive comparison of machine learning models and several molecular featurization methods: sum over bonds, custom descriptors, Coulomb matrices, bag of bonds, and fingerprints. The best featurization was sum over bonds (bond counting), and the best model was kernel ridge regression. Despite the small dataset, we obtain acceptable errors and Pearson correlations for the out-of-sample prediction of detonation pressure, detonation velocity, explosive energy, heat of formation, density, and other properties. By including another dataset with 309 additional molecules in our training, we show how the error can be pushed lower, although convergence with the number of molecules is slow. Our work paves the way for future applications of machine learning in this domain, including automated lead generation and interpreting machine learning models to obtain novel chemical insights.
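Sum-over-bonds featurization combined with kernel ridge regression can be sketched as follows. Several choices here are assumptions: the bond-type vocabulary, the input format (precomputed lists of bond-type strings rather than parsed structures), the Gaussian (RBF) kernel, and the `gamma`/`lam` values. The targets are made-up numbers, not properties of real energetic molecules.

```python
import math

# Hypothetical sketch: bond-counting features + kernel ridge regression (RBF kernel).
BOND_TYPES = ["C-H", "C-C", "C-N", "N-O", "N=O", "C=O", "C-F"]  # assumed vocabulary

def sum_over_bonds(bonds):
    """Bond-counting feature vector: occurrences of each bond type in a bond list."""
    return [float(bonds.count(t)) for t in BOND_TYPES]

def rbf(x, z, gamma):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_krr(X, y, gamma=0.5, lam=1e-6):
    """Kernel ridge regression: solve (K + lam * I) alpha = y for the dual weights."""
    n = len(X)
    K = [[rbf(X[i], X[j], gamma) + (lam if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    return solve(K, y)

def predict_krr(alpha, X_train, x, gamma=0.5):
    """Prediction is a kernel-weighted sum over the training molecules."""
    return sum(a * rbf(xt, x, gamma) for a, xt in zip(alpha, X_train))

# Toy training molecules given as bond lists, with invented target values.
train_bonds = [
    ["C-H"] * 3 + ["C-N", "N-O", "N=O"],
    ["C-C"] + ["C-H"] * 5 + ["C-N", "N-O", "N=O"],
    ["C-H"] * 3 + ["C=O", "C-F"],
]
targets = [1.0, 2.0, 3.0]
X_train = [sum_over_bonds(bl) for bl in train_bonds]
alpha = fit_krr(X_train, targets)
fitted = [predict_krr(alpha, X_train, x) for x in X_train]
```

With a very small `lam` the model nearly interpolates the training targets. On a real dataset of about a hundred molecules, `lam` and `gamma` would be tuned by cross-validation to control overfitting.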

