Detecting Malicious URLs using R-CNN and Cloud

Author(s):  
Miss. Priyanka Vasant Ambilwade

In today’s information age, the use of websites, mobile apps, and other forms of information sharing has increased, and with it the number of malicious URLs. These malicious URLs are forwarded to users and divert their attention from what they are actually searching for toward unnecessary and harmful content, wasting considerable time and money. Such URLs have given rise to credential theft, financial theft, and the harassment of users who fall into traps set by attackers when accessing them. To counter this menace, there is a need to detect these URLs and prevent users from accessing them. While studying the techniques put forward in various research papers, we found a few approaches particularly interesting and useful. The first detects malicious URLs using a CNN and GRU. The second proposes a text mining technique based on Natural Language Processing (NLP) that can be used for classification. The third is a combination of CNN and NLP. From this study we concluded that NLP and CNN should be combined to implement a successful malicious URL detection system. In this paper we therefore propose a fusion of R-CNN, NLP, and the cloud. The main work is to collect malicious and healthy URLs from multiple internet sources and combine them into a single dataset. We will then use Google Cloud to create our own blacklisted-URL database rather than depending on multiple internet sources. In our system, we first create a blacklist database in the cloud and then apply classification to it using NLP and the SVM machine learning algorithm. The second step uses the same URL dataset to train an R-CNN algorithm and obtain the identified malicious URLs as output. In the final phase we compare the results from the SVM and the R-CNN and analyse which is more efficient, along with the strengths and weaknesses of each technique.
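The NLP-plus-classifier step described above can be sketched in miniature. The example below uses character trigrams as URL features and a simple perceptron as a stand-in for the SVM (the URLs, labels, and weights are illustrative assumptions, not the paper's dataset):

```python
from collections import defaultdict

def url_trigrams(url):
    """Character trigrams serve as a crude NLP-style feature set for a URL."""
    return [url[i:i + 3] for i in range(len(url) - 2)]

def train_perceptron(samples, epochs=10):
    """samples: list of (url, label) with label +1 (malicious) / -1 (healthy)."""
    w = defaultdict(float)
    for _ in range(epochs):
        for url, y in samples:
            score = sum(w[g] for g in url_trigrams(url))
            if y * score <= 0:               # misclassified: nudge weights
                for g in url_trigrams(url):
                    w[g] += y
    return w

def predict(w, url):
    return 1 if sum(w[g] for g in url_trigrams(url)) > 0 else -1

# Hypothetical blacklist/whitelist entries standing in for the cloud-hosted dataset
train = [
    ("http://free-prizes.win/login.php", 1),
    ("http://bank-verify.xyz/update", 1),
    ("https://www.wikipedia.org/", -1),
    ("https://docs.python.org/3/", -1),
]
w = train_perceptron(train)
print(predict(w, "http://free-prizes.win/claim"))  # → 1
```

A production system would replace the perceptron with a proper SVM and draw its blacklist from the cloud database described in the paper.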

Author(s):  
Yaseen Khather Yaseen ◽  
Alaa Khudhair Abbas ◽  
Ahmed M. Sana

Today, images are a part of communication between people. However, images are also used to share information by hiding and embedding messages within them, and images received through social media or email can contain harmful content that users cannot see and are therefore unaware of. This paper presents a model for detecting spam in images. The model is a combination of optical character recognition, natural language processing, and a machine learning algorithm. Optical character recognition extracts the text from images, and natural language processing uses linguistic capabilities to detect and classify the language, distinguishing between normal text and slang. Features for the selected images are then extracted using the bag-of-words model, and the machine learning algorithm is run to detect any spam that may be present. Finally, the model predicts whether or not the image contains harmful content. The results show that the proposed method, combining a machine learning algorithm with optical character recognition and natural language processing, provides higher detection accuracy than machine learning alone.
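The bag-of-words step in this pipeline is straightforward to illustrate. A minimal sketch, assuming the OCR stage has already produced the text (the vocabulary and the OCR output below are hypothetical):

```python
from collections import Counter

def bag_of_words(text, vocabulary):
    """Map extracted text onto a fixed-length count vector over a vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocabulary]

# Hypothetical vocabulary and OCR output, for illustration only
vocab = ["free", "winner", "click", "meeting", "report"]
ocr_text = "FREE entry! Click now, guaranteed WINNER click here"
print(bag_of_words(ocr_text, vocab))  # → [1, 1, 2, 0, 0]
```

The resulting count vector is what the downstream machine learning algorithm would consume to decide whether the image text is spam.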


A great deal of research has gone into natural language processing and state-of-the-art deep learning algorithms that convert English text into a data structure without loss of meaning. The advent of neural networks for learning word representations as vectors has also revolutionized automatic feature extraction from text corpora. Combining word embeddings with a deep learning algorithm such as a convolutional neural network improves accuracy for text classification. In this era of the Internet of Things, with voluminous amounts of data overwhelming users, determining the veracity of that data is a very challenging task. There are many truth discovery algorithms in the literature that help resolve the conflicts arising from multiple sources of data; these algorithms estimate the trustworthiness of the data and the reliability of its sources. In this paper, a convolution-based truth discovery algorithm with multitasking is proposed to estimate the genuineness of the data in a given text corpus. The proposed algorithm was tested on the Quora questions dataset, and experimental results showed improved accuracy and speed over existing approaches.
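The word-embedding idea underpinning this abstract can be sketched with averaged word vectors and cosine similarity, the simplest way to compare two Quora-style questions (the 3-dimensional toy vectors below are assumptions, not learned embeddings):

```python
import math

# Toy 3-dimensional word vectors standing in for learned embeddings
emb = {
    "how": [0.1, 0.3, 0.0], "do": [0.2, 0.1, 0.1],
    "i": [0.0, 0.2, 0.1], "learn": [0.7, 0.1, 0.5],
    "python": [0.9, 0.2, 0.4], "study": [0.6, 0.2, 0.5],
}

def sentence_vector(sentence):
    """Average the word vectors (skipping out-of-vocabulary words)."""
    vecs = [emb[w] for w in sentence.lower().split() if w in emb]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

q1 = sentence_vector("How do I learn Python")
q2 = sentence_vector("How do I study Python")
print(cosine(q1, q2))  # close to 1.0 for near-duplicate questions
```

A CNN-based truth discovery model replaces this simple averaging with learned convolutional features, but the vector-space intuition is the same.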


Author(s):  
Zhiwen Xiong

Machine learning is a branch of artificial intelligence. Deep learning is a complex machine learning approach with unique advantages in image recognition, speech recognition, natural language processing, and industrial process control, and it is widely used in the field of wireless communication. Predicting geological disasters such as landslides is currently a difficult problem. Because landslides are difficult to detect at an early stage, this paper proposes a GPS-based continuous detection system over a wireless communication network and applies it to landslide deformation monitoring to enable early treatment and prevention. The article introduces a GPS multi-antenna detection system based on deep learning over wireless communication, together with the time series analysis method and its application. The test results show that the GPS multi-antenna detection system over the wireless communication network has great advantages in response time, with high accuracy and small error: horizontal accuracy is within 0–2 mm and vertical accuracy is about 1 mm. The analysis method is simple and efficient, and yields good results for short-term deformation prediction.
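Short-term deformation prediction from a GPS displacement series can be illustrated with the simplest form of time series analysis, a least-squares linear trend. This is a minimal sketch; the displacement readings and the one-step-ahead model are illustrative assumptions, not the paper's method:

```python
def linear_trend_forecast(series, steps_ahead=1):
    """Fit y = a + b*t by least squares and extrapolate one step ahead,
    a minimal stand-in for short-term deformation prediction."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    b = (sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
         / sum((t - t_mean) ** 2 for t in range(n)))
    a = y_mean - b * t_mean
    return a + b * (n - 1 + steps_ahead)

# Hypothetical daily horizontal displacement readings in millimetres
displacement_mm = [0.0, 0.4, 0.9, 1.3, 1.8]
print(round(linear_trend_forecast(displacement_mm), 2))  # → 2.23
```

A monitoring system would flag the slope exceeding a safety threshold, triggering early treatment before the deformation accelerates.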


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Lisa Grossman Liu ◽  
Raymond H. Grossman ◽  
Elliot G. Mitchell ◽  
Chunhua Weng ◽  
Karthik Natarajan ◽  
...  

The recognition, disambiguation, and expansion of medical abbreviations and acronyms is of utmost importance for preventing medically dangerous misinterpretation in natural language processing. To support recognition, disambiguation, and expansion, we present the Medical Abbreviation and Acronym Meta-Inventory, a deep database of medical abbreviations. A systematic harmonization of eight source inventories across multiple healthcare specialties and settings identified 104,057 abbreviations with 170,426 corresponding senses. Automated cross-mapping of synonymous records using state-of-the-art machine learning reduced redundancy, which simplifies future application. Additional features include semi-automated quality control to remove errors. The Meta-Inventory demonstrated high completeness and coverage of abbreviations and senses in new clinical text, a substantial improvement over the next-largest repository (6–14% increase in abbreviation coverage; 28–52% increase in sense coverage). To our knowledge, the Meta-Inventory is the most complete compilation of medical abbreviations and acronyms in American English to date. The multiple sources and high coverage support application in varied specialties and settings, allowing for cross-institutional natural language processing, which previous inventories did not support. The Meta-Inventory is available at https://bit.ly/github-clinical-abbreviations.
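The cross-mapping of synonymous records can be sketched with simple string similarity, merging near-duplicate sense strings into one canonical record. This is a toy stand-in for the machine learning cross-mapper (the threshold and the "MS" sense strings below are assumptions):

```python
from difflib import SequenceMatcher

def cross_map(senses, threshold=0.85):
    """Greedily merge sense strings whose similarity ratio exceeds the
    threshold, keeping the first occurrence as the canonical record."""
    canonical = []
    for s in senses:
        key = s.lower().strip()
        if not any(SequenceMatcher(None, key, c).ratio() >= threshold
                   for c in canonical):
            canonical.append(key)
    return canonical

# Hypothetical sense strings harvested for the abbreviation "MS"
senses = ["multiple sclerosis", "Multiple Sclerosis ",
          "mitral stenosis", "multiple  sclerosis"]
print(cross_map(senses))  # → ['multiple sclerosis', 'mitral stenosis']
```

Note that the two genuinely different senses survive the merge: disambiguating between them in context is exactly the problem the Meta-Inventory is built to support.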


Sensors ◽  
2021 ◽  
Vol 21 (2) ◽  
pp. 656
Author(s):  
Xavier Larriva-Novo ◽  
Víctor A. Villagrá ◽  
Mario Vega-Barbas ◽  
Diego Rivera ◽  
Mario Sanz Rodrigo

Security in IoT networks is currently mandatory due to the high amount of data that must be handled. These systems are vulnerable to a range of cybersecurity attacks, which are increasing in number and sophistication. For this reason, new intrusion detection techniques have to be developed that are as accurate as possible for these scenarios. Intrusion detection systems based on machine learning algorithms have already shown high performance in terms of accuracy. This research proposes the study and evaluation of several preprocessing techniques based on traffic categorization for a machine learning neural network algorithm. The evaluation uses two benchmark datasets, UGR16 and UNSW-NB15, as well as one of the most widely used datasets, KDD99. The preprocessing techniques were evaluated with respect to scaling and normalization functions. All of these preprocessing models were applied to different sets of characteristics based on a categorization composed of four groups of features: basic connection features, content characteristics, statistical characteristics, and finally a group composed of traffic-based features and connection-direction-based traffic characteristics. The objective of this research is to evaluate this categorization by using various data preprocessing techniques to obtain the most accurate model. Our proposal shows that, by applying the categorization of network traffic together with several preprocessing techniques, accuracy can be enhanced by up to 45%. Preprocessing a specific group of characteristics yields greater accuracy, allowing the machine learning algorithm to correctly classify the parameters related to possible attacks.
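The two preprocessing families compared here, scaling and normalization, can be shown side by side on one feature column. A minimal sketch (the "bytes per flow" values are illustrative, not drawn from UGR16, UNSW-NB15, or KDD99):

```python
import statistics

def min_max_scale(values):
    """Rescale a feature column to the [0, 1] range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Zero-mean, unit-variance scaling of a feature column."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]

# Hypothetical "basic connection" feature: bytes transferred per flow
flow_bytes = [40, 60, 80, 100]
print([round(x, 2) for x in min_max_scale(flow_bytes)])  # → [0.0, 0.33, 0.67, 1.0]
print([round(x, 2) for x in standardize(flow_bytes)])    # → [-1.34, -0.45, 0.45, 1.34]
```

The paper's finding is that which of these transforms works best depends on the feature group (basic, content, statistical, or traffic-based), hence evaluating them per category.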


Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1081
Author(s):  
Tamon Miyake ◽  
Shintaro Yamamoto ◽  
Satoshi Hosono ◽  
Satoshi Funabashi ◽  
Zhengxue Cheng ◽  
...  

Gait phase detection, which detects foot-contact and foot-off states during walking, is important for various applications, such as synchronous robotic assistance and health monitoring. Gait phase detection systems have been proposed with various wearable devices, sensing inertial, electromyography, or force myography information. In this paper, we present a novel gait phase detection system with static standing-based calibration using muscle deformation information. The gait phase detection algorithm can be calibrated within a short time using muscle deformation data by standing in several postures; it is not necessary to collect data while walking for calibration. A logistic regression algorithm is used as the machine learning algorithm, and the probability output is adjusted based on the angular velocity of the sensor. An experiment is performed with 10 subjects, and the detection accuracy of foot-contact and foot-off states is evaluated using video data for each subject. The median accuracy is approximately 90% during walking based on calibration for 60 s, which shows the feasibility of the static standing-based calibration method using muscle deformation information for foot-contact and foot-off state detection.
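The detection step described above, logistic regression with a probability output adjusted by the sensor's angular velocity, can be sketched as follows. The features, trained weights, and especially the adjustment rule are illustrative assumptions; the paper's exact adjustment may differ:

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def contact_probability(features, weights, bias):
    """Plain logistic regression score for the foot-contact state."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)

def adjusted_probability(p, angular_velocity, gain=0.1):
    """Illustrative adjustment: bias the probability by the sensor's
    angular velocity, then clamp to [0, 1]."""
    return min(1.0, max(0.0, p + gain * angular_velocity))

# Hypothetical muscle-deformation features and trained parameters
features = [0.8, 0.2, 0.5]        # normalized deformation sensor readings
weights = [1.5, -0.7, 0.9]
bias = -0.4
p = contact_probability(features, weights, bias)
print(round(adjusted_probability(p, angular_velocity=0.5), 2))  # → 0.8
```

In the paper's setup, the regression weights would be fitted from the static standing postures, which is what makes the walking-free calibration possible.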


2018 ◽  
Vol 2018 ◽  
pp. 1-13 ◽  
Author(s):  
Mohamed Idhammad ◽  
Karim Afdel ◽  
Mustapha Belouch

Cloud computing services are often delivered over the HTTP protocol. This facilitates access to services and reduces costs for both providers and end-users, but it also increases the exposure of Cloud services to HTTP DDoS attacks. HTTP request methods are often used to exploit web servers’ vulnerabilities and to create multiple HTTP DDoS attack scenarios, such as Low-and-Slow or Flooding attacks. Existing HTTP DDoS detection systems are challenged by the large volumes of network traffic generated by these attacks, low detection accuracy, and high false positive rates. In this paper we present a detection system for HTTP DDoS attacks in a Cloud environment based on information-theoretic entropy and the Random Forest ensemble learning algorithm. A time-based sliding window algorithm is used to estimate the entropy of the network header features of the incoming traffic. When the estimated entropy exceeds its normal range, the preprocessing and classification tasks are triggered. To assess the proposed approach, various experiments were performed on the public CIDDS-001 dataset. The proposed approach achieves satisfactory results with an accuracy of 99.54%, an FPR of 0.4%, and a running time of 18.5 s.
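The entropy-over-a-sliding-window trigger can be sketched directly. Below, Shannon entropy is computed over a count-based window of a single header feature (source identifiers); the paper uses a time-based window and real header fields, so the stream and window size here are illustrative assumptions:

```python
import math
from collections import Counter, deque

def shannon_entropy(items):
    """Shannon entropy (bits) of the value distribution in one window."""
    counts = Counter(items)
    total = len(items)
    return sum((c / total) * math.log2(total / c) for c in counts.values())

def windowed_entropy(stream, window_size=8):
    """Entropy of a header feature over a sliding window of recent packets."""
    window = deque(maxlen=window_size)
    out = []
    for value in stream:
        window.append(value)
        if len(window) == window_size:
            out.append(shannon_entropy(window))
    return out

# Hypothetical source-ID stream: diverse traffic, then a single flooding source
ids = ["a", "b", "c", "d", "e", "f", "g", "h"] + ["x"] * 8
ent = windowed_entropy(ids)
print(round(ent[0], 2), round(ent[-1], 2))  # → 3.0 0.0
```

The entropy collapse at the end of the stream is the anomaly signal: when it leaves the normal range, the preprocessing and Random Forest classification stages are triggered.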

