Stegomalware: A Systematic Survey of Malware Hiding and Detection in Images, Machine Learning Models and Research Challenges

Keyword(s):

Current Status

Malware Analysis

Source Of Infection

Adversarial Networks

File Formats

Malware is commonly distributed to a victim network through file attachments in phishing emails or through illegitimate files downloaded from the internet, when the victim interacts with the source of infection. To detect and prevent malware distribution on the victim machine, existing endpoint security applications may leverage sophisticated techniques such as signature-based, anomaly-based, or machine learning detection. The well-known file formats Portable Executable (PE) for Windows and Executable and Linkable Format (ELF) for Linux-based operating systems are the usual subjects of malware analysis, and malware detection for these files is well advanced for real-time use. However, detecting malware payloads hidden in multimedia such as cover images using steganography remains a challenge for enterprises, because such payloads are rarely seen and usually act as stagers in sophisticated attacks. In this article, to our knowledge, we are the first to address the knowledge gap between current academic research on image steganography and steganalysis, which focuses on data hiding, and the current status of stegomalware (malware payloads hidden in images) used in cyberattacks against enterprises. We present the history of stegomalware, its generation tools, and a description of the relevant file format specifications. Based on our findings, we present a detailed review of image steganography techniques, including recent Generative Adversarial Network (GAN)-based models, and of image steganalysis methods, including Deep Learning, together with the opportunities and challenges in stegomalware generation and detection.
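The abstract refers to payloads hidden in cover images by steganography. The sketch below is an illustration only and is not a method taken from the survey. It shows least-significant-bit (LSB) substitution, a classic spatial-domain hiding technique. It assumes the cover image has already been decoded into a flat list of 8-bit grayscale pixel values. The function names `embed_lsb` and `extract_lsb` are hypothetical.

```python
def embed_lsb(pixels: list[int], payload: bytes) -> list[int]:
    """Hide each payload bit in the least-significant bit of one 8-bit pixel value."""
    # Unpack the payload into bits, most-significant bit of each byte first.
    bits = [(byte >> (7 - i)) & 1 for byte in payload for i in range(8)]
    if len(bits) > len(pixels):
        raise ValueError("cover image too small for payload")
    stego = list(pixels)
    for idx, bit in enumerate(bits):
        # Clear the LSB, then set it to the payload bit; the pixel changes by at most 1.
        stego[idx] = (stego[idx] & ~1) | bit
    return stego


def extract_lsb(pixels: list[int], n_bytes: int) -> bytes:
    """Rebuild n_bytes of payload from the LSBs of the first 8 * n_bytes pixels."""
    out = bytearray()
    for b in range(n_bytes):
        value = 0
        for i in range(8):
            value = (value << 1) | (pixels[8 * b + i] & 1)
        out.append(value)
    return bytes(out)


if __name__ == "__main__":
    cover = list(range(100, 164))  # 64 hypothetical grayscale pixel values
    stego = embed_lsb(cover, b"MZ")  # "MZ" is the PE file header magic
    assert extract_lsb(stego, 2) == b"MZ"
    print(max(abs(a - b) for a, b in zip(cover, stego)))  # never more than 1
```

Each pixel changes by at most one intensity level, so the stego image usually looks identical to the cover. This helps explain why such payloads are hard to spot without dedicated steganalysis.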

Handling Imbalanced Data in Intrusion Detection Systems using Generative Adversarial Networks

Research and Development on Information and Communication Technology

10.32913/mic-ict-research.v2020.n1.894

2020

Vol 2020 (1)

pp. 1-13

Author(s):

Ly Vu

Quang Uy Nguyen

Keyword(s):

Machine Learning

Intrusion Detection

Detection System

Imbalanced Data

Original Data

Detection Systems

Adversarial Networks

Attack Data

Machine learning-based intrusion detection has become more popular in the research community thanks to its capability of discovering unknown attacks. To develop a good detection model for an intrusion detection system (IDS) using machine learning, a large number of attack and normal data samples are required in the learning process. While normal data is relatively easy to collect, attack data is much rarer and harder to gather. Consequently, IDS datasets are often dominated by normal data, and machine learning models trained on those imbalanced datasets are ineffective at detecting attacks. In this paper, we propose a novel solution to this problem that uses generative adversarial networks to generate synthesized attack data for IDS. The synthesized attacks are merged with the original data to form the augmented dataset. Three popular machine learning techniques are trained on the augmented dataset. The experiments conducted on three common IDS datasets and one dataset of our own show that machine learning algorithms achieve better performance when trained on the dataset augmented by the generative adversarial networks than when trained on the original dataset or with other sampling techniques. A visualization technique was also used to analyze the properties of the data synthesized by the generative adversarial networks and by the other techniques.
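The augmentation step described above, in which synthesized attacks are merged with the original data, can be sketched as follows. This is a minimal sketch under stated assumptions and not the authors' implementation. The generator is passed in as a callable. The `jitter_sampler` used in the demo is a hypothetical placeholder that perturbs copies of existing attack rows; it is not a GAN. In the paper's setting, a trained GAN generator would take its place. All names are hypothetical.

```python
import random
from collections import Counter
from typing import Callable, Sequence

Sample = tuple[list[float], int]  # (feature vector, class label)
SampleGenerator = Callable[[list[list[float]], int], list[list[float]]]


def augment_minority(data: Sequence[Sample], minority_label: int,
                     generate: SampleGenerator) -> list[Sample]:
    """Append synthesized minority-class rows until that class matches the largest class."""
    counts = Counter(label for _, label in data)
    deficit = max(counts.values()) - counts[minority_label]
    minority_x = [x for x, label in data if label == minority_label]
    synthetic = generate(minority_x, deficit) if deficit > 0 else []
    # Augmented dataset = original data + synthesized attack samples.
    return list(data) + [(x, minority_label) for x in synthetic]


def jitter_sampler(seed: int = 0) -> SampleGenerator:
    """Hypothetical placeholder, NOT a GAN: copies random minority rows and adds small Gaussian noise."""
    rng = random.Random(seed)

    def generate(minority_x: list[list[float]], n: int) -> list[list[float]]:
        return [[v + rng.gauss(0.0, 0.01) for v in rng.choice(minority_x)]
                for _ in range(n)]

    return generate


if __name__ == "__main__":
    normal = [([0.0, 0.0], 0) for _ in range(8)]   # majority class (normal traffic)
    attacks = [([1.0, 1.0], 1), ([0.9, 1.1], 1)]   # minority class (attacks)
    augmented = augment_minority(normal + attacks, 1, jitter_sampler())
    print(dict(Counter(label for _, label in augmented)))  # {0: 8, 1: 8}
```

Keeping the generator as a parameter lets a real GAN generator, SMOTE, or random oversampling be swapped in. Comparisons against other sampling techniques, like the ones the abstract reports, need exactly this kind of swap.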

Proposed Improvements for Automated Chemical Safety Evaluations Using In-Silico Techniques

10.20944/preprints202005.0408.v1

2020

Author(s):

Bryan Jordan

Keyword(s):

Machine Learning

Drug Discovery

Chemical Space

Training Dataset

Chemical Safety

Adversarial Networks

Wide Range

Traditional Drug

The vastness of chemical space constrains traditional drug-discovery methods to the organic laws that guide the chemistry involved in filtering through candidates. Using machine learning to intelligently generate compounds that meet a wide range of objectives can significantly reduce the time and effort needed to filter through a broad range of candidates. This paper details how Generative Adversarial Networks, novel machine learning techniques for formatting the training dataset, and quantum computing offer new ways to expedite drug discovery.

Toward a Comparison of Classical and New Privacy Mechanism

Entropy

10.3390/e23040467

2021

Vol 23 (4)

pp. 467

Author(s):

Daniel Heredia-Ductram

Miguel Nunez-del-Prado

Hugo Alatrista-Salas

Keyword(s):

Machine Learning

Differential Privacy

Current Effort

Privacy Concerns

Adversarial Networks

Learning Techniques

Statistical Disclosure

Big Data Technologies

In recent decades, the development of interconnectivity, pervasive systems, citizen sensors, and Big Data technologies has allowed us to gather large amounts of data from different sources worldwide. This phenomenon has raised privacy concerns around the globe, compelling states to enforce data protection laws. In parallel, privacy-enhancing techniques have emerged to meet regulatory requirements, allowing companies and researchers to exploit individual data in a privacy-aware way. Thus, data curators need to find the most suitable algorithms to meet a required trade-off between utility and privacy. This crucial task can take a lot of time because benchmarks of privacy techniques are lacking. To fill this gap, in the current effort we compare classical privacy techniques, such as Statistical Disclosure Control and Differential Privacy, with more recent techniques, such as Generative Adversarial Networks and Machine Learning Copies, using an entire commercial database. The obtained results show the evolution of privacy techniques and depict new uses of privacy-aware Machine Learning techniques.
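Differential Privacy is one of the classical techniques named in the abstract. The example below is a hedged illustration and not the authors' benchmark configuration. It uses the Laplace mechanism, which releases a counting query with ε-differential privacy by adding Laplace noise of scale Δf/ε; the sensitivity Δf of a count is 1. The function names and the example data are hypothetical.

```python
import math
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one Laplace(0, scale) sample by inverse-CDF sampling."""
    u = rng.random() - 0.5  # uniform on [-0.5, 0.5)
    while u == -0.5:        # avoid log(0) at the boundary
        u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))


def private_count(values, predicate, epsilon: float, rng: random.Random) -> float:
    """Release a count with epsilon-differential privacy via the Laplace mechanism.

    Adding or removing one record changes a count by at most 1 (sensitivity 1),
    so the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(42)
    ages = [23, 35, 41, 52, 67, 29, 38]  # hypothetical records
    # True answer is 3 (41, 52, 67); the released value is 3 plus noise.
    print(private_count(ages, lambda a: a >= 40, epsilon=0.5, rng=rng))
```

A smaller ε gives stronger privacy but noisier answers. This is the utility/privacy trade-off that the abstract says data curators must navigate.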

Spatial landslide susceptibility assessment using machine learning techniques assisted by additional data created with generative adversarial networks

Geoscience Frontiers

10.1016/j.gsf.2020.09.002

2021

Vol 12 (2)

pp. 625-637

Author(s):

Husam A. H. Al-Najjar

Biswajeet Pradhan

Keyword(s):

Machine Learning

Landslide Susceptibility

Additional Data

Susceptibility Assessment

Landslide Susceptibility Assessment

Adversarial Networks

Learning Techniques

ORGANIC (1).pdf

10.26434/chemrxiv.5309668.v1

2017

Author(s):

Benjamin Sanchez-Lengeling

Carlos Outeiral

Gabriel L. Guimaraes

Alan Aspuru-Guzik

Keyword(s):

Machine Learning

Learning Community

Chemical Species

Material Design

Organic Photovoltaic

Generative Adversarial Network

Adversarial Network

Adversarial Networks

Photovoltaic Material

Molecular discovery seeks to generate chemical species tailored to very specific needs. In this paper, we present ORGANIC, a framework based on Objective-Reinforced Generative Adversarial Networks (ORGAN) that can produce a distribution over molecular space matching a certain set of desirable metrics. This methodology combines two successful techniques from the machine learning community: a Generative Adversarial Network (GAN), to create non-repetitive, sensible molecular species, and Reinforcement Learning (RL), to bias this generative distribution towards certain attributes. We explore several applications, ranging from the optimization of random physicochemical properties to candidates for drug discovery and organic photovoltaic material design.
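The abstract says ORGANIC combines a GAN with reinforcement learning to bias generation toward desired attributes. ORGANIC builds on the ORGAN formulation. As we understand that formulation, the reward for a generated sequence $Y_{1:T}$ mixes the discriminator's score with a task objective. This equation is not stated in the abstract above, and the notation is ours.

```latex
R(Y_{1:T}) = \lambda \, D_\phi(Y_{1:T}) + (1 - \lambda) \, O(Y_{1:T}), \qquad \lambda \in [0, 1]
```

Here $D_\phi(Y_{1:T})$ is the discriminator's estimate that the sequence is real, and $O(Y_{1:T})$ is the objective, for example a physicochemical property score. The weight $\lambda$ sets the balance: $\lambda = 1$ reduces to purely adversarial training, while $\lambda = 0$ optimizes the objective alone.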

Cocrystal Prediction Using Machine Learning Models and Descriptors

Applied Sciences

10.3390/app11031323

2021

Vol 11 (3)

pp. 1323

Author(s):

Medard Edmund Mswahili

Min-Jeong Lee

Gati Lother Martin

Junghyun Kim

Paul Kim

...

Keyword(s):

Machine Learning

Academic Research

Pharmaceutical Research

Learning Models

Pharmaceutical Ingredients

Learning Techniques

Comparable Performance

Selection Algorithms

Machine Learning Models

Cocrystals are of great interest in industrial applications as well as academic research, and screening suitable coformers for active pharmaceutical ingredients is the most crucial and challenging step in cocrystal development. Recently, machine learning techniques have been attracting researchers in many fields, including pharmaceutical research areas such as quantitative structure-activity/property relationships. In this paper, we develop machine learning models to predict cocrystal formation. We extract descriptor values from the simplified molecular-input line-entry system (SMILES) representations of compounds and compare the machine learning models through experiments on our collected dataset of 1476 instances. We found that the artificial neural network shows great potential, as it has the best accuracy, sensitivity, and F1 score. We also found that the model achieved comparable performance with about half of the descriptors, chosen by feature selection algorithms. We believe that this will contribute to faster and more accurate cocrystal development.
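The abstract compares models by accuracy, sensitivity, and F1 score. These are standard metrics computed from binary confusion-matrix counts. The sketch below shows the arithmetic. Treating cocrystal formation as the positive class is an assumption, and the counts are hypothetical rather than results from the paper.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Accuracy, sensitivity (recall) and F1 score from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0   # true-positive rate
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)         # harmonic mean of precision and recall
    return {"accuracy": accuracy, "sensitivity": sensitivity, "f1": f1}


if __name__ == "__main__":
    # Hypothetical counts, not results from the paper.
    m = classification_metrics(tp=50, fp=10, fn=5, tn=35)
    print({k: round(v, 3) for k, v in m.items()})
    # {'accuracy': 0.85, 'sensitivity': 0.909, 'f1': 0.87}
```

Equivalently, F1 = 2TP / (2TP + FP + FN). With the counts above, that is 100 / 115 ≈ 0.87.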

Machine Learning Data Center Workloads Using Generative Adversarial Networks

ACM SIGMETRICS Performance Evaluation Review

10.1145/3439602.3439611

2020

Vol 48 (2)

pp. 21-23

Author(s):

Boudewijn R. Haverkort

Felix Finkbeiner

Pieter-Tjerk de Boer

Keyword(s):

Machine Learning

Data Center

Adversarial Networks

Learning Data

A New Era of Neuro-Oncology Research Pioneered by Multi-Omics Analysis and Machine Learning

Biomolecules

10.3390/biom11040565

2021

Vol 11 (4)

pp. 565

Author(s):

Satoshi Takahashi

Masamichi Takahashi

Shota Tanaka

Shunsaku Takayanagi

Hirokazu Takami

...

Keyword(s):

Machine Learning

Clinical Situation

Current Status

Omics Data

Amount Of Information

Oncology Research

Omics Analysis

Learning Techniques

Comprehensive Survey

Although the incidence of central nervous system (CNS) cancers is not high, they significantly reduce patients' quality of life and result in high mortality rates. A low incidence also means a low number of cases, which in turn means little available information. To compensate, researchers have tried to increase the amount of information available from a single test using high-throughput technologies. This approach, referred to as single-omics analysis, has only been partially successful, as one type of data may not be able to appropriately describe all the characteristics of a tumor. It is presently unclear what type of data can describe a particular clinical situation. One way to solve this problem is to use multi-omics data: when many types of data are available, a selected data type or a combination of them may effectively resolve a clinical question. Hence, we conducted a comprehensive survey of papers in the field of neuro-oncology that used multi-omics data for analysis and found that most of them utilized machine learning techniques. This indicates that machine learning techniques are useful in multi-omics analysis. In this review, we discuss the current status of multi-omics analysis in the field of neuro-oncology and the importance of using machine learning techniques.