Malicious JavaScript Detection based on Clustering Techniques

2021 ◽  
Vol 13 (6) ◽  
pp. 11-21
Author(s):  
Nguyen Hong Son ◽  
Ha Thanh Dung

Malicious JavaScript code is still a problem for websites and web users. The complexity and obfuscation of this code render signature-based detection by antivirus programs ineffective. So far, alternative methods using machine learning have achieved encouraging results and have detected malicious JavaScript code with high accuracy. However, models built with supervised learning depend on the number of labeled samples available and require significant computational resources to train. The rapid growth of malicious JavaScript poses a real challenge to solutions based on supervised learning, because such models lack experience with new forms of malicious JavaScript code. In this paper, we address this challenge with a method for detecting malicious JavaScript based on clustering techniques. The model comprises the known samples to be analyzed, the features to be extracted, and a detection technique applied to the output clusters. The method is not computationally demanding, and experiments on typical cases gave positive results; in particular, it detected new forms of malicious JavaScript code.
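A minimal sketch of such a clustering-based pipeline follows. The character n-gram TF-IDF features, k-means, and the cluster-flagging rule are assumptions for illustration, not the paper's exact design:

```python
# Sketch: flag unseen scripts by clustering them together with known
# malicious samples. Character n-gram TF-IDF and k-means are assumed
# stand-ins for the paper's features and clustering technique.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def flag_suspicious(scripts, known_malicious, n_clusters=8):
    """Return one boolean per script: True if it lands in a cluster
    that also contains a known-malicious sample."""
    corpus = scripts + known_malicious
    X = TfidfVectorizer(analyzer="char", ngram_range=(3, 5)).fit_transform(corpus)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X)
    # Clusters holding known-malicious code are treated as suspicious, so
    # new variants that group with them are detected without any labels.
    bad = set(labels[len(scripts):])
    return [label in bad for label in labels[:len(scripts)]]
```

Because cluster membership, not a trained decision boundary, drives the flagging, a new obfuscated variant only needs to resemble known malware in feature space to be caught.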

1998 ◽  
Vol 26 (5) ◽  
pp. 679-708 ◽  
Author(s):  
Horst Spielmann ◽  
Michael Balls ◽  
Jack Dupuis ◽  
Wolfgang J. W. Pape ◽  
Odile de Silva ◽  
...  

In 1996, the Scientific Committee on Cosmetology of DGXXIV of the European Commission asked the European Centre for the Validation of Alternative Methods to test eight UV filter chemicals from the 1995 edition of Annex VII of Directive 76/768/EEC in a blind trial in the in vitro 3T3 cell neutral red uptake phototoxicity (3T3 NRU PT) test, which had been scientifically validated between 1992 and 1996. Since all the UV filter chemicals on the positive list of EU Directive 76/768/EEC have been shown not to be phototoxic in vivo in humans under use conditions, only negative effects would be expected in the 3T3 NRU PT test. To balance the number of positive and negative chemicals, ten phototoxic and ten non-phototoxic chemicals were tested under blind conditions in four laboratories. Moreover, to assess the optimum concentration range for testing, information was provided on appropriate solvents and on the solubility of the coded chemicals. In this study, the phototoxic potential of test chemicals was evaluated in a prediction model in which either the Photoirritation Factor (PIF) or the Mean Photo Effect (MPE) was determined. The results obtained with both PIF and MPE were highly reproducible in the four laboratories, and the correlation between in vitro and in vivo data was almost perfect. All the phototoxic test chemicals provided a positive result at concentrations of 1 μg/ml, while nine of the ten non-phototoxic chemicals gave clear negative results, even at the highest test concentrations. One of the UV filter chemicals gave positive results in three of the four laboratories only at concentrations greater than 100 μg/ml; the other laboratory correctly identified all 20 of the test chemicals. An analysis of the impact that exposure concentrations had on the performance of the test revealed that the optimum concentration range in the 3T3 NRU PT test for determining the phototoxic potential of chemicals is between 0.1 μg/ml and 10 μg/ml, and that false positive results can be obtained at concentrations greater than 100 μg/ml. Therefore, the positive results obtained with some of the UV filter chemicals only at concentrations greater than 100 μg/ml do not indicate a phototoxic potential in vivo. When this information was taken into account during calculation of the overall predictivity of the 3T3 NRU PT test in the present study, an almost perfect correlation of in vitro versus in vivo results was obtained (between 95% and 100%), whether PIF or MPE was used to predict the phototoxic potential. The management team and participants therefore conclude that the 3T3 NRU PT test is a valid test for correctly assessing the phototoxic potential of UV filter chemicals, provided that the defined concentration limits are taken into account.
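For reference, the PIF endpoint used in this prediction model is, in the standard 3T3 NRU PT literature (it is not restated in the abstract above), the ratio of the half-maximal cytotoxic concentrations measured without and with UV exposure:

```latex
% Photoirritation Factor, as conventionally defined for the 3T3 NRU PT
% test: cytotoxicity shifted to lower concentrations under UV signals
% phototoxic potential.
\[
  \mathrm{PIF} \;=\; \frac{\mathrm{IC}_{50}(-\mathrm{UV})}{\mathrm{IC}_{50}(+\mathrm{UV})}
\]
```

In that literature a PIF of roughly 5 or more is conventionally read as predicting phototoxic potential; the MPE is instead a weighted average comparison of the full concentration-response curves with and without UV.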


Author(s):  
Mamta Bisht ◽  
Richa Gupta

Script recognition is a necessary first step for text recognition. In the deep learning era, this task has two essential requirements: a large labeled dataset for training, and the computational resources to train models. When these requirements cannot be met, alternative methods are needed. This motivates transfer learning, in which knowledge from a model previously trained on a benchmark dataset is reused on a smaller dataset for another task, saving computational power because only a fraction of the model's total parameters must be retrained. Here we study two pre-trained models and fine-tune them for script classification tasks. First, the VGG-16 pre-trained model is fine-tuned on the publicly available CVSI-15 and MLe2e datasets for script recognition. Second, a well-performing model on a Devanagari handwritten character dataset is adopted and fine-tuned on the Kaggle Devanagari numeral dataset for numeral recognition. The performance of the proposed fine-tuned models depends on how similar the target dataset is to the original dataset, and it is analyzed with widely used optimizers.
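A minimal sketch of the kind of fine-tuning described, assuming a Keras/TensorFlow setup; the frozen-base strategy, head layers, and hyperparameters are illustrative choices, not the paper's exact configuration:

```python
# Sketch: fine-tuning VGG-16 for script classification. Freezing the
# convolutional base and training a small new head is one common
# strategy; the paper's exact layers and settings may differ.
import tensorflow as tf

def build_script_classifier(n_scripts, input_shape=(224, 224, 3)):
    base = tf.keras.applications.VGG16(
        weights="imagenet", include_top=False, input_shape=input_shape)
    base.trainable = False  # reuse ImageNet features; train only the new head
    model = tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(n_scripts, activation="softmax"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model
```

Only the two dense layers are trained here, which is what makes fine-tuning viable on small datasets and modest hardware; the choice of optimizer passed to `compile` is exactly the knob the paper's analysis varies.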


2021 ◽  
Vol 12 (1) ◽  
pp. 29-52
Author(s):  
Raja Guru R. ◽  
Naresh Kumar P.

Unmanned aerial vehicles (UAVs) play a significant role in finding victims in post-disaster zones, where humans cannot risk their lives under the critical conditions of the disaster environment. The proposed design incorporates autonomous vision-based navigation through the disaster environment based on general graph theory, with dynamically changing lengths between two or more nodes, where each node is a pathway. A camera fixed on the UAV continuously captures the surroundings and processes the footage frame by frame on-site, using image-processing techniques on a system-on-chip (SoC). The system identifies victims in the zone and the pathways available for traversal. The UAV uses an ultrasonic rangefinder to avoid collisions with obstacles. If a victim is detected, the system alerts the rescue team and transmits the frames over a CRN to the off-site console. The UAV learns a navigation policy that achieves high accuracy in real-time environments; communication over the CRN is uninterrupted and useful during such emergencies.
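The navigation described amounts to shortest-path replanning over a pathway graph whose edge lengths change as sensors report obstacles. A minimal sketch under that reading (the graph representation and update rule are assumptions, not the paper's implementation):

```python
# Sketch: replanning over a pathway graph with dynamically updated edge
# lengths. The dict-of-dicts graph and the "raise the edge length when
# the rangefinder sees an obstacle" rule are illustrative assumptions.
import heapq

def shortest_path(graph, start, goal):
    """graph: {node: {neighbor: length}}; returns the node sequence."""
    dist, prev = {start: 0.0}, {}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == goal:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]

# On an obstacle report, the affected edge is effectively removed and
# the route recomputed from the UAV's current node:
graph = {"a": {"b": 1.0, "c": 4.0}, "b": {"d": 1.0}, "c": {"d": 1.0}, "d": {}}
graph["a"]["b"] = float("inf")  # rangefinder flags the a-b pathway
print(shortest_path(graph, "a", "d"))  # ['a', 'c', 'd']
```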


Author(s):  
Baoying Wang ◽  
Imad Rahal ◽  
Richard Leipold

Data clustering is a discovery process that partitions a data set into groups (clusters) such that data points within the same group have high similarity while being very dissimilar to points in other groups (Han & Kamber, 2001). The ultimate goal of data clustering is to discover natural groupings in a set of patterns, points, or objects without prior knowledge of any class labels. In fact, in the machine-learning literature, data clustering is typically regarded as a form of unsupervised learning as opposed to supervised learning. In unsupervised learning or clustering, there is no training function as in supervised learning. There are many applications for data clustering including, but not limited to, pattern recognition, data analysis, data compression, image processing, understanding genomic data, and market-basket research.
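A minimal illustration of the idea that groupings are discovered from the data alone, with no class labels supplied; the synthetic points and k-means are purely for concreteness, since the entry does not prescribe any particular algorithm:

```python
# Minimal illustration of clustering as unsupervised learning: natural
# groups emerge from the data without any training labels.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
points = np.vstack([rng.normal(0, 0.5, (50, 2)),   # one natural group
                    rng.normal(5, 0.5, (50, 2))])  # another, far away
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(points)
print(labels[:5], labels[-5:])  # the two groups receive distinct cluster ids
```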


Author(s):  
Patricia Rich ◽  
Mark Blokpoel ◽  
Ronald de Haan ◽  
Maria Otworowska ◽  
Marieke Sweers ◽  
...  

Abstract Many compelling examples have recently been provided in which people achieve impressive epistemic success, e.g. draw highly accurate inferences, by using simple heuristics and very little information. This is possible by taking advantage of the features of the environment. The examples suggest an easy and appealing naturalization of rationality: on the one hand, people clearly can apply simple heuristics, and on the other hand, they intuitively ought to do so when this brings them high accuracy at little cost. The 'ought-can' principle is satisfied, and rationality is meaningfully normative. We show, however, that this naturalization program is endangered by a computational wrinkle in the adaptation process taken to be responsible for this heuristics-based ('ecological') rationality: for the adaptation process to guarantee even minimal rationality, it requires astronomical computational resources, making the problem intractable. We consider various plausible auxiliary assumptions in an attempt to remove this obstacle, and show that they do not succeed; intractability is a robust property of adaptation. We discuss the implications of our findings for the project of naturalizing rationality.


2019 ◽  
Vol 487 (2) ◽  
pp. 2701-2717 ◽  
Author(s):  
Ross O’Connell ◽  
Daniel J Eisenstein

Abstract Covariance matrix estimation is a persistent challenge for cosmology. We focus on a class of model covariance matrices that can be generated with high accuracy and precision, using a tiny fraction of the computational resources that would be required to achieve comparably precise covariance matrices using mock catalogues. In previous work, the free parameters in these models were determined using sample covariance matrices computed using a large number of mocks, but we demonstrate that those parameters can be estimated consistently and with good precision by applying jackknife methods to a single survey volume. This enables model covariance matrices that are calibrated from data alone, with no reference to mocks.
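A sketch of the jackknife idea the abstract invokes: estimate the covariance of a binned statistic from delete-one resamples of a single survey volume. The data layout is an illustrative assumption, not the authors' pipeline:

```python
# Sketch: delete-one jackknife covariance of a binned statistic (e.g. a
# correlation function) measured on subvolumes of a single survey. The
# (n_subsamples, n_bins) layout is an illustrative assumption.
import numpy as np

def jackknife_covariance(measurements):
    """measurements[i] is the statistic with subvolume i deleted."""
    n = measurements.shape[0]
    diff = measurements - measurements.mean(axis=0)
    # The (n - 1)/n prefactor compensates for the strong overlap
    # between delete-one resamples.
    return (n - 1) / n * diff.T @ diff
```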


2020 ◽  
Vol 1 (2) ◽  
pp. 1-4
Author(s):  
Priyam Guha ◽  
Abhishek Mukherjee ◽  
Abhishek Verma

This research paper deals with using supervised machine learning algorithms to detect the authenticity of banknotes. In this research we were successful in achieving very high accuracy (of the order of 99%) by applying some data preprocessing tricks and then running the processed data through supervised learning algorithms such as SVM, Decision Trees, Logistic Regression, and KNN. We then proceed to analyze the misclassified points. We examine the confusion matrix to find out which algorithms had more false positives and which had more false negatives.
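A compact sketch of the described recipe with scikit-learn: preprocess, train the four classifiers, and read false positives and false negatives off each confusion matrix. The UCI banknote-authentication dataset URL and column names are assumptions about the data used:

```python
# Sketch: banknote authentication with the four classifiers the paper
# names. Dataset URL, column names, and the class-1-as-forged coding
# are assumptions, not details from the paper.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

url = ("https://archive.ics.uci.edu/ml/machine-learning-databases/"
       "00267/data_banknote_authentication.txt")
data = pd.read_csv(url, header=None,
                   names=["variance", "skewness", "curtosis", "entropy", "class"])
X_train, X_test, y_train, y_test = train_test_split(
    data.iloc[:, :4], data["class"], test_size=0.2, random_state=0)
scaler = StandardScaler().fit(X_train)
for clf in (SVC(), DecisionTreeClassifier(), LogisticRegression(),
            KNeighborsClassifier()):
    clf.fit(scaler.transform(X_train), y_train)
    pred = clf.predict(scaler.transform(X_test))
    # With class 1 taken as positive, entry [0][1] counts false
    # positives and entry [1][0] counts false negatives.
    print(type(clf).__name__, confusion_matrix(y_test, pred).tolist())
```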


2014 ◽  
Vol 490-491 ◽  
pp. 1391-1398
Author(s):  
Xiao Ping Li

This article argues for the necessity and feasibility of using Data Mining and Knowledge Discovery in case-based reasoning (CBR). The paper focuses on a method for weighting feature items based on least-squares parameter identification and, on this basis, implements similarity-based case retrieval, applied to a typical case database for railway rescue. The simulation results show that the least-squares method can effectively estimate and identify the feature parameters, and can correct them continuously online. The high accuracy and fast convergence of the assigned parameters show that the algorithm has practical application value.
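A minimal sketch of least-squares feature weighting for similarity-based retrieval, in the spirit of the description above; the linear similarity model and the batch solver (instead of a recursive, online-correcting variant) are illustrative assumptions:

```python
# Sketch: fit feature weights by least squares, then retrieve the most
# similar stored case under those weights. Similarity form and batch
# (rather than recursive, online) solver are illustrative assumptions.
import numpy as np

def fit_feature_weights(F, s):
    """F[i] holds per-feature similarities for a past case pair; s[i] is
    the expert-assessed overall similarity. Solve min_w ||F w - s||^2."""
    w, *_ = np.linalg.lstsq(F, s, rcond=None)
    return w

def retrieve(query, case_base, w):
    """Return the index of the stored case most similar to the query,
    scoring by the weighted sum of per-feature similarities
    (feature values assumed normalized to [0, 1])."""
    scores = [w @ (1.0 - np.abs(query - case)) for case in case_base]
    return int(np.argmax(scores))
```

An online variant would replace `lstsq` with recursive least squares so the weights are corrected as each new solved case arrives, matching the continuous online correction the abstract reports.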


AI Magazine ◽  
2020 ◽  
Vol 41 (3) ◽  
pp. 45-62
Author(s):  
Shivali Agarwal ◽  
Jayachandu Bandlamudi ◽  
Atri Mandal ◽  
Anupama Ray ◽  
Giriprasad Sridhara

In this article, we present an end-to-end automated helpdesk email ticket assignment system driven by high accuracy, coverage, business continuity, scalability, and optimal usage of computational resources. The primary objective of the system is to determine the problem mentioned in an incoming email ticket and then automatically dispatch it to an appropriate resolver group with high accuracy. While meeting this objective, it should also be able to operate at the desired accuracy levels in the face of changing business needs by automatically adapting to the changes. The proposed system uses a system of classifiers with separate strategies for handling frequent and sparse resolver groups, augmented with a semiautomatic rule engine and retraining strategies, to ensure that it is accurate, robust, and adaptive to changing business needs. Our system has been deployed in production for six major service providers in diverse service domains and currently assigns, on average, 100,000 emails per month with an accuracy close to ninety percent, covering at least ninety percent of email tickets. This translates to human-level accuracy and results in net savings of more than 50,000 man-hours of effort per annum. To date, our deployed system has served more than two million tickets in production.
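A toy sketch of the frequent/sparse split the abstract describes; the models, the confidence threshold, and the fallback rule here are illustrative stand-ins for the paper's system of classifiers, rule engine, and retraining strategies:

```python
# Toy sketch: a discriminative model for frequent resolver groups, a
# nearest-neighbor fallback for sparse ones. All modeling choices are
# illustrative stand-ins for the deployed system's components.
import numpy as np
from collections import Counter
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

class TicketRouter:
    def __init__(self, min_freq=50):
        self.min_freq = min_freq
        self.vec = TfidfVectorizer(max_features=50000)
        self.freq_clf = LogisticRegression(max_iter=1000)
        self.sparse_clf = KNeighborsClassifier(n_neighbors=3)

    def fit(self, emails, groups):
        X = self.vec.fit_transform(emails)
        groups = np.asarray(groups)
        counts = Counter(groups.tolist())
        mask = np.array([counts[g] >= self.min_freq for g in groups])
        self.freq_clf.fit(X[mask], groups[mask])
        self.sparse_clf.fit(X[~mask], groups[~mask])
        return self

    def predict(self, emails, threshold=0.6):
        X = self.vec.transform(emails)
        proba = self.freq_clf.predict_proba(X)
        routed = self.freq_clf.classes_[proba.argmax(axis=1)].astype(object)
        # Low-confidence tickets fall through to the sparse-group model;
        # the deployed system instead applies its rule engine and retrains.
        low = proba.max(axis=1) < threshold
        if low.any():
            routed[low] = self.sparse_clf.predict(X[low])
        return routed
```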


2020 ◽  
Vol 14 (3) ◽  
pp. 329-341
Author(s):  
Yiming Lin ◽  
Daokun Jiang ◽  
Roberto Yus ◽  
Georgios Bouloukakis ◽  
Andrew Chio ◽  
...  

This paper explores the data cleaning challenges that arise in using WiFi connectivity data to locate users at semantic indoor locations such as buildings, regions, and rooms. WiFi connectivity data consists of sporadic connections between devices and nearby WiFi access points (APs), each of which may cover a relatively large area within a building. Our system, entitled semantic LOCATion cleanER (LOCATER), casts semantic localization as a series of data cleaning tasks: first, it treats the problem of determining the AP to which a device is connected between any two of its connection events as a missing-value detection and repair problem; it then associates the device with a semantic subregion (e.g., a conference room within the region) by casting this as a location disambiguation problem. LOCATER uses a bootstrapping semi-supervised learning method for coarse localization and a probabilistic method to achieve finer localization. The paper shows that LOCATER achieves high accuracy at both the coarse and fine levels.
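A heavily simplified sketch of the two cleaning steps, with naive stand-ins for LOCATER's learned methods; the carry-forward repair rule, the prior/affinity scoring, and the data shapes are all assumptions for illustration:

```python
# Heavily simplified sketch of LOCATER's two cleaning steps; the
# carry-forward gap repair and the prior-times-affinity scoring are
# naive stand-ins for its semi-supervised and probabilistic methods.

def repair_gaps(events, max_gap=600):
    """events: time-sorted (timestamp, ap) pairs. Treat short gaps between
    connection events as missing values and fill them by assuming the
    device stayed at the previous AP."""
    repaired = [events[0]]
    for (t0, ap0), (t1, ap1) in zip(events, events[1:]):
        if t1 - t0 <= max_gap:
            repaired.append(((t0 + t1) / 2, ap0))
        repaired.append((t1, ap1))
    return repaired

def disambiguate(ap, rooms, prior, affinity):
    """Coarse region -> fine room: score each room in the AP's coverage
    region by a usage prior times the device's historical affinity."""
    scores = {r: prior[r] * affinity.get(r, 1e-3) for r in rooms[ap]}
    best = max(scores, key=scores.get)
    return best, scores[best] / sum(scores.values())
```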

