binary file
Recently Published Documents


TOTAL DOCUMENTS

516
(FIVE YEARS 14)

H-INDEX

41
(FIVE YEARS 2)

2022 ◽  
Vol 22 (1) ◽  
pp. 1-22
Author(s):  
Yanchen Qiao ◽  
Weizhe Zhang ◽  
Xiaojiang Du ◽  
Mohsen Guizani

With the construction of smart cities, the number of Internet of Things (IoT) devices is growing rapidly, leading to explosive growth in malware designed for IoT devices. This malware poses a serious threat to the security of IoT devices. Traditional malware classification methods rely mainly on feature engineering: to improve accuracy, they extract a large number of different types of features from malware files, which makes classification highly complex. To address these issues, this article proposes a malware classification method based on Word2Vec and a multilayer perceptron (MLP). First, for each malware sample, Word2Vec is used to compute word vectors for all bytes of the binary file and all instructions in the assembly file. Second, these vectors are combined into a 256x256x2-dimensional matrix. Finally, a deep learning network based on MLP is designed and trained, and the resulting model is used to classify the test samples. The experimental results show that the method achieves a high accuracy of 99.54%.


Author(s):  
Paul Ntim Yeboah ◽  
Stephen Kweku Amuquandoh ◽  
Haruna Balle Baz Musah

Conventional approaches to tackling malware attacks have proven futile at detecting never-before-seen (zero-day) malware. Research has shown, however, that zero-day malicious files are mostly semantics-preserving variants of existing malware, generated via obfuscation methods. In this paper we propose and evaluate a machine learning based malware detection model using an ensemble approach. We employ an ensemble strategy in which multiple feature sets, generated from different n-gram sizes of opcode sequences, are trained using a single classifier. The model predictions for the individual feature sets are weighted and averaged to reach a final verdict on whether a binary file is malicious or benign. To obtain the optimal weight combination for the ensemble feature sets, we applied a grid search over a set of pre-defined weights in the range 0 to 1. With a balanced dataset of 2000 samples, an ensemble of opcode n-grams of sizes 1 and 2, with respective weights 0.3 and 0.7, yielded the best detection accuracy of 98.1% using a random forest (RF) classifier. An ensemble of n-gram sizes 2 and 3 obtained the best precision, 99.7%, using a weight of 0.5 for both models.
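The weighting step described in this abstract can be illustrated with a minimal sketch. It assumes each per-n-gram model outputs a probability that a file is malicious, that the two weights are normalised by their sum, and that accuracy is the selection criterion. The function names and the toy probabilities are hypothetical and are not taken from the paper.

```python
from itertools import product


def weighted_average(probs_a, probs_b, w_a, w_b):
    """Combine two models' malicious-class probabilities by a weighted average."""
    total = w_a + w_b
    return [(w_a * a + w_b * b) / total for a, b in zip(probs_a, probs_b)]


def accuracy(probs, labels, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    preds = [1 if p >= threshold else 0 for p in probs]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def grid_search_weights(probs_a, probs_b, labels, step=0.1):
    """Try every weight pair on a grid over [0, 1] and keep the most accurate."""
    grid = [round(i * step, 10) for i in range(int(round(1 / step)) + 1)]
    best_weights, best_acc = None, -1.0
    for w_a, w_b in product(grid, repeat=2):
        if w_a + w_b == 0:  # both weights zero: average is undefined
            continue
        acc = accuracy(weighted_average(probs_a, probs_b, w_a, w_b), labels)
        if acc > best_acc:
            best_weights, best_acc = (w_a, w_b), acc
    return best_weights, best_acc


# Toy data (invented): outputs of a 1-gram model and a 2-gram model.
labels = [1, 1, 0, 0]
probs_1gram = [0.9, 0.3, 0.1, 0.6]
probs_2gram = [0.4, 0.9, 0.6, 0.2]
best_weights, best_acc = grid_search_weights(probs_1gram, probs_2gram, labels)
```

In this toy data each model alone misclassifies two of the four samples, while a weighted combination classifies all four correctly, which is the effect the ensemble is meant to exploit.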


Author(s):  
Velin Kralev ◽  
Radoslava Kraleva ◽  
Petia Koprinkova-Hristova

Data modeling and data processing are important activities in any scientific research. This research focuses on the modeling and processing of data generated by a saccadometer. The approach is based on the relational data model, but the processing and storage of the data are done with client datasets. The experiments were performed with 26 randomly selected files from a total of 264 experimental sessions. The data from each experimental session was stored in three different formats: text, binary, and extensible markup language (XML) based. The results showed that the text format and the binary format were the most compact. Several actions related to data processing were analyzed. The two fastest actions were loading data from a binary file and storing data into a binary file. In contrast, the two slowest actions were storing the data in XML format and loading the data from a text file. The conversion of data from text format to binary format also turned out to be time-consuming. Moreover, the time required to perform this conversion is not proportional to the number of records processed.
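A comparison of this kind can be sketched by serialising the same records to text, fixed-width binary, and XML. The record layout (an integer id and two floating-point fields) and all names below are hypothetical; the abstract does not describe the saccadometer's actual schema or the client-dataset formats used.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical records: (id, time, value). Not the saccadometer's real schema.
records = [(i, i * 0.5, 100.0 + i) for i in range(1000)]

# Little-endian, no padding: 4-byte int + two 8-byte doubles = 20 bytes.
RECORD = struct.Struct("<idd")


def to_text(rows):
    """Tab-separated text, one record per line."""
    return "\n".join(f"{i}\t{t}\t{v}" for i, t, v in rows).encode("utf-8")


def to_binary(rows):
    """Fixed-width binary records packed back to back."""
    return b"".join(RECORD.pack(*row) for row in rows)


def from_binary(blob):
    """Read fixed-width records back; no parsing of digits is needed."""
    return [RECORD.unpack_from(blob, off) for off in range(0, len(blob), RECORD.size)]


def to_xml(rows):
    """One XML element per record, with the fields as attributes."""
    root = ET.Element("session")
    for i, t, v in rows:
        ET.SubElement(root, "rec", id=str(i), t=str(t), v=str(v))
    return ET.tostring(root)


sizes = {
    "text": len(to_text(records)),
    "binary": len(to_binary(records)),
    "xml": len(to_xml(records)),
}
```

Wrapping each function in `time.perf_counter()` calls would give load and store timings for this toy layout; the relative sizes and timings depend heavily on the schema, so they need not match the results reported in the abstract.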


Author(s):  
Soham Pinge ◽  
Nitish Wadhavkar ◽  
Amey Singh ◽  
Amol Deshpande

Analysis of the dimensions of a product, its visualization as a 3D model, and volume estimation positively impact the operations of the manufacturing department in an industry. In this paper, a system has been designed to integrate features such as 3D reconstruction, dimensional analysis, and support for manipulating the reconstructed model. Research was carried out to estimate the volume of objects using their virtual 3D models and to allow the user to edit a model, which can later be compared with the original object. Pictures of an object are captured from different planes and, using image processing, the object is reconstructed in 3D; the reconstruction is used to estimate parameters such as volume, surface area, and maximum dimensions across different planes. The edited 3D model is then used to generate a binary file representing the changes. The system provides detailed information to the user along with flexibility, thus enhancing the way in which the manufacturing industry works, as it requires no manual input or interaction once the process begins.


2021 ◽  
Vol 7 ◽  
pp. e346
Author(s):  
Ferhat Ozgur Catak ◽  
Javed Ahmed ◽  
Kevser Sahinbas ◽  
Zahid Hussain Khand

Due to advances in malware capabilities, cyber-attacks have been widely observed in the digital world. Cyber-attacks can hit an organization hard, causing damage such as data breaches, financial loss, and reputation loss. Some of the most prominent examples of ransomware attacks in history are WannaCry and Petya, which impacted companies’ finances across the globe. Both WannaCry and Petya rendered operational processes inoperable by targeting critical infrastructure. It is nearly impossible for anti-virus applications using traditional signature-based methods to detect this type of malware, because it has different characteristics on each infected computer. The most important feature of this type of malware is that it changes its contents using a mutation engine, producing a different hash representation of the executable file as it propagates from one computer to another. To overcome this camouflage method, we created three-channel image files of malicious software. Because attackers modify the contents of the malware to produce different variants of the same software, we created variants of the images by applying data augmentation methods. This article aims to provide image-augmentation-enhanced deep convolutional neural network (CNN) models for detecting malware families in a metamorphic malware environment. The main contributions of the article consist of three components: image generation from malware samples, image augmentation, and classification of the malware families using a CNN model. In the first component, the collected malware samples are converted from binary files into 3-channel images using the windowing technique. The second component of the system creates augmented versions of the images, and the last component builds a classification model. This study uses five different deep CNN models for malware family detection. The results obtained by the classifier demonstrate accuracy of up to 98%, which is quite satisfactory.
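The conversion from binary file to 3-channel image can be illustrated with a minimal sketch. The abstract does not specify the windowing technique, so the version below makes a simple assumption: consecutive byte triples become (R, G, B) pixels, rows have a fixed width, and the tail is zero-padded. The horizontal flip stands in for one possible augmentation. All names are hypothetical.

```python
def bytes_to_rgb_rows(data: bytes, width: int = 4):
    """Map raw bytes to rows of (R, G, B) pixels, zero-padding the last row.

    Assumption: each consecutive triple of bytes forms one pixel; this is
    not necessarily the windowing scheme used in the paper.
    """
    bytes_per_row = width * 3
    padded = data + b"\x00" * (-len(data) % bytes_per_row)
    rows = []
    for start in range(0, len(padded), bytes_per_row):
        row_bytes = padded[start:start + bytes_per_row]
        rows.append([tuple(row_bytes[i:i + 3]) for i in range(0, bytes_per_row, 3)])
    return rows


def flip_horizontal(rows):
    """A simple augmentation: mirror each pixel row left to right."""
    return [row[::-1] for row in rows]


# Toy input: seven bytes standing in for a malware sample's contents.
image = bytes_to_rgb_rows(bytes(range(7)), width=2)
augmented = flip_horizontal(image)
```

The nested lists can be handed to an image or array library for resizing before training; that step is omitted here to keep the sketch dependency-free.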


Author(s):  
William A Freyman ◽  
Kimberly F McManus ◽  
Suyash S Shringarpure ◽  
Ethan M Jewett ◽  
Katarzyna Bryc ◽  
...  

Abstract Estimating the genomic location and length of identical-by-descent (IBD) segments among individuals is a crucial step in many genetic analyses. However, the exponential growth in the size of biobank and direct-to-consumer (DTC) genetic data sets makes accurate IBD inference a significant computational challenge. Here we present the templated positional Burrows-Wheeler transform (TPBWT) to make fast IBD estimates robust to genotype and phasing errors. Using haplotype data simulated over pedigrees with realistic genotyping and phasing errors we show that the TPBWT outperforms other state-of-the-art IBD inference algorithms in terms of speed and accuracy. For each phase-aware method, we explore the false positive and false negative rates of inferring IBD by segment length and characterize the types of error commonly found. Our results highlight the fragility of most phased IBD inference methods; the accuracy of IBD estimates can be highly sensitive to the quality of haplotype phasing. Additionally we compare the performance of the TPBWT against a widely used phase-free IBD inference approach that is robust to phasing errors. We introduce both in-sample and out-of-sample TPBWT-based IBD inference algorithms and demonstrate their computational efficiency on massive-scale datasets with millions of samples. Furthermore we describe the binary file format for TPBWT-compressed haplotypes that results in fast and efficient out-of-sample IBD computes against very large cohort panels. Finally, we demonstrate the utility of the TPBWT in a brief empirical analysis exploring geographic patterns of haplotype sharing within Mexico. Hierarchical clustering of IBD shared across regions within Mexico reveals geographically structured haplotype sharing and a strong signal of isolation by distance. Our software implementation of the TPBWT is freely available for non-commercial use in the code repository https://github.com/23andMe/phasedibd.


2020 ◽  
Vol 12 (12) ◽  
pp. 219
Author(s):  
Pin Yang ◽  
Huiyu Zhou ◽  
Yue Zhu ◽  
Liang Liu ◽  
Lei Zhang

The emergence of a large amount of new malicious code poses a serious threat to network security, and most of it consists of derivative versions of existing malicious code. Classifying malicious code helps to analyze the evolutionary trends of malicious code families and to trace the source of cybercrime. Existing malware classification methods emphasize the depth of the neural network, which leads to long training times and large computational costs. In this work, we propose the shallow neural network-based malware classifier (SNNMAC), a malware classification model based on shallow neural networks and static analysis. Our approach bridges the gap between precise but slow methods and fast but less precise methods in existing works. For each sample, we first generate n-grams from the opcode sequence of the binary file with a decompiler. An improved n-gram algorithm based on control transfer instructions is designed to reduce the n-gram dataset. Then, the SNNMAC exploits a shallow neural network, replacing the fully connected layer and softmax with an average pooling layer and hierarchical softmax, to learn from the dataset and perform classification. We perform experiments on the Microsoft malware dataset. The evaluation results show that the SNNMAC outperforms most related works, with 99.21% classification precision, and reduces the training time by more than half compared with methods using deep neural networks (DNNs).
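Opcode n-gram extraction can be sketched as follows. The abstract does not detail the improved algorithm, so this sketch assumes one plausible reading: the opcode sequence is cut after each control transfer instruction, and n-grams are generated only within the resulting blocks, which removes n-grams that span a block boundary. The opcode set and function names are illustrative assumptions, not the SNNMAC implementation.

```python
from collections import Counter

# Illustrative subset of x86 control transfer opcodes (an assumption).
CONTROL_TRANSFER = {"jmp", "je", "jne", "call", "ret"}


def split_blocks(opcodes):
    """Cut the opcode sequence after every control transfer instruction."""
    blocks, current = [], []
    for op in opcodes:
        current.append(op)
        if op in CONTROL_TRANSFER:
            blocks.append(current)
            current = []
    if current:  # trailing opcodes with no closing control transfer
        blocks.append(current)
    return blocks


def ngrams(seq, n):
    """All contiguous n-grams of a sequence, as tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def block_ngram_counts(opcodes, n):
    """Count n-grams that stay inside a single block."""
    counts = Counter()
    for block in split_blocks(opcodes):
        counts.update(ngrams(block, n))
    return counts


# Toy opcode sequence standing in for disassembler output.
ops = ["push", "mov", "call", "mov", "add", "ret"]
plain = Counter(ngrams(ops, 2))
reduced = block_ngram_counts(ops, 2)
```

On the toy sequence, plain 2-grams include `("call", "mov")`, which crosses a block boundary; the block-aware count omits it, so the feature set is smaller.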


2020 ◽  
Author(s):  
William A. Freyman ◽  
Kimberly F. McManus ◽  
Suyash S. Shringarpure ◽  
Ethan M. Jewett ◽  
Katarzyna Bryc ◽  
...  

Abstract Estimating the genomic location and length of identical-by-descent (IBD) segments among individuals is a crucial step in many genetic analyses. However, the exponential growth in the size of biobank and direct-to-consumer (DTC) genetic data sets makes accurate IBD inference a significant computational challenge. Here we present the templated positional Burrows-Wheeler transform (TPBWT) to make fast IBD estimates robust to genotype and phasing errors. Using haplotype data simulated over pedigrees with realistic genotyping and phasing errors we show that the TPBWT outperforms other state-of-the-art IBD inference algorithms in terms of speed and accuracy. For each phase-aware method, we explore the false positive and false negative rates of inferring IBD by segment length and characterize the types of error commonly found. Our results highlight the fragility of most phased IBD inference methods; the accuracy of IBD estimates can be highly sensitive to the quality of haplotype phasing. Additionally we compare the performance of the TPBWT against a widely used phase-free IBD inference approach that is robust to phasing errors. We introduce both in-sample and out-of-sample TPBWT-based IBD inference algorithms and demonstrate their computational efficiency on massive-scale datasets with millions of samples. Furthermore we describe the binary file format for TPBWT-compressed haplotypes that results in fast and efficient out-of-sample IBD computes against very large cohort panels. Finally, we demonstrate the utility of the TPBWT in a brief empirical analysis exploring geographic patterns of haplotype sharing within Mexico. Hierarchical clustering of IBD shared across regions within Mexico reveals geographically structured haplotype sharing and a strong signal of isolation by distance. Our software implementation of the TPBWT is freely available for non-commercial use in the code repository https://github.com/23andMe/phasedibd.


2020 ◽  
Vol 36 (9) ◽  
pp. 2929-2931 ◽  
Author(s):  
Hakan Ozadam ◽  
Michael Geng ◽  
Can Cenik

Abstract Summary: Ribosome occupancy measurements enable protein abundance estimation and help infer mechanisms of translation. Recent studies have revealed that sequence read lengths in ribosome profiling data are highly variable and carry critical information. Consequently, data analyses require the computation and storage of multiple metrics for a wide range of ribosome footprint lengths. We developed a software ecosystem including a new efficient binary file format named ‘ribo’. Ribo files store all essential data grouped by ribosome footprint lengths. Users can assemble ribo files using our RiboFlow pipeline, which processes raw ribosome profiling sequencing data. RiboFlow is highly portable and customizable across a large number of computational environments, with built-in capabilities for parallelization. We also developed interfaces for writing and reading ribo files in the R (RiboR) and Python (RiboPy) environments. Using RiboR and RiboPy, users can efficiently access ribosome profiling quality control metrics, generate essential plots and carry out analyses. Altogether, these components create a software ecosystem for researchers to study translation through ribosome profiling. Availability and implementation: For a quickstart, please see https://ribosomeprofiling.github.io. Source code, installation instructions and links to documentation are available on GitHub: https://github.com/ribosomeprofiling. Supplementary information: Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 12 (1) ◽  
Author(s):  
Fatemeh Mansouri Hanis ◽  
Mehdi Teimouri

Abstract Objectives: Classification of textual file formats is a topic of interest in network forensics. There are few publicly available datasets of files with textual formats. Moreover, there is no public dataset of file fragments of textual file formats. A major research challenge in file fragment classification of textual file formats is therefore to compare the performance of the developed methods on the same datasets. Data description: In this study, we present a dataset that contains file fragments of five textual file formats: Binary file format for Word 97–Word 2003, Microsoft Word open XML format, portable document format, rich text file, and standard text document. The dataset contains file fragments in three different languages: English, Persian, and Chinese. For each pair of file format and language, 1500 file fragments are provided, so the dataset contains 22,500 file fragments in total.

