Automated Drone Detection Using YOLOv4

Drones ◽  
2021 ◽  
Vol 5 (3) ◽  
pp. 95
Author(s):  
Subroto Singha ◽  
Burchan Aydin

Drones are increasing in popularity and are reaching the public faster than ever before. Consequently, the chances of a drone being misused are multiplying. Automated drone detection is necessary to prevent unauthorized and unwanted drone interventions. In this research, we designed an automated drone detection system using YOLOv4. The model was trained using drone and bird datasets. We then evaluated the trained YOLOv4 model on the testing dataset, using mean average precision (mAP), frames per second (FPS), precision, recall, and F1-score as evaluation metrics. Next, we collected our own videos of two types of drones, performed drone detection on them, and calculated the FPS to measure detection speed at three altitudes. Our methodology outperformed previous similar studies, achieving an mAP of 74.36%, precision of 0.95, recall of 0.68, and F1-score of 0.79. For video detection, we achieved an FPS of 20.5 on the DJI Phantom III and an FPS of 19.0 on the DJI Mavic Pro.
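The F1-score reported above follows from the reported precision and recall, since F1 is their harmonic mean. The sketch below only checks that arithmetic; the function name `f1_score` is illustrative and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract above.
f1 = f1_score(0.95, 0.68)
print(round(f1, 2))  # 0.79
```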

Author(s):  
Anthony Anggrawan ◽  
Azhari

Information retrieval is the task of searching for information based on a user’s query, with the aim of finding the documents that match the user’s needs. This research uses the Vector Space Model to determine the percentage similarity of each student’s assignment. The application is implemented in PHP with a MySQL database. The results are presented by ranking documents according to their similarity to the query, with a mean average precision of 0.874. This value indicates how closely the application agrees with the assessment made by experts, and it was obtained from an evaluation using 5 queries compared against 25 sample documents. The time needed to compute similarity depends on the number of assignments submitted: the more assignments are compared, the longer the computation takes.
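A vector space model ranks documents by the similarity between a query vector and each document vector. The abstract does not state the term weighting or the similarity measure that was used, and the original system was written in PHP. The Python sketch below is therefore only an illustration that assumes whitespace tokenization, TF-IDF weights with a smoothed IDF, and cosine similarity. All function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase whitespace tokenization, no stemming or stop words.
    return text.lower().split()


def tf_idf_vectors(texts: list[str]) -> list[dict[str, float]]:
    """Build sparse TF-IDF vectors (term -> weight) for a list of texts."""
    tokenized = [tokenize(t) for t in texts]
    n = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF keeps a nonzero weight for terms that occur in every text.
    idf = {term: math.log((1 + n) / (1 + df)) + 1 for term, df in doc_freq.items()}
    return [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query: str, documents: list[str]) -> list[tuple[int, float]]:
    """Return (document index, similarity) pairs, most similar first."""
    vectors = tf_idf_vectors([query] + documents)
    query_vec, doc_vecs = vectors[0], vectors[1:]
    scores = [(i, cosine(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical documents, for illustration only.
ranking = rank("vector space model",
               ["vector space model retrieval", "cooking pasta recipes"])
print(ranking[0][0])  # index of the most similar document
```

Multiplying each similarity by 100 gives a percentage of the kind the abstract describes.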


2018 ◽  
Vol 10 (1) ◽  
pp. 57-64 ◽  
Author(s):  
Rizqa Raaiqa Bintana ◽  
Chastine Fatichah ◽  
Diana Purwitasari

Community-based question answering (CQA) helps people find the information they need through a community. When people cannot find that information, they post a new question, and this can cause the CQA archive to grow with duplicated questions. It is therefore important to find questions in the CQA archive that are semantically similar to a new question. In this study, we use a convolutional neural network for semantic modeling of sentences to obtain the words that represent the content of the documents and of the new question. Retrieving questions that are semantically the same as a new question (query) from the archive of question-answer documents with the convolutional neural network method yields a mean average precision of 0.422. By comparison, the vector space model yields a mean average precision of 0.282. Index Terms: community-based question answering, convolutional neural network, question retrieval
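Mean average precision for retrieval can be computed from a ranked result list per query and the set of documents judged relevant for it. The abstract does not describe its evaluation protocol, so this sketch assumes the common information-retrieval definition, in which average precision is normalized by the number of relevant documents and each document appears at most once in a ranking. The names and the toy data are hypothetical.

```python
def average_precision(ranked_ids: list[str], relevant_ids: set[str]) -> float:
    """Average of precision@k over the ranks k at which a relevant item appears."""
    if not relevant_ids:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant_ids)


def mean_average_precision(runs: list[tuple[list[str], set[str]]]) -> float:
    """Mean of average precision over (ranking, relevant set) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Two hypothetical queries with hand-made rankings.
runs = [
    (["a", "b", "c", "d"], {"a", "c"}),  # AP = (1/1 + 2/3) / 2
    (["x", "y"], {"y"}),                 # AP = 1/2
]
print(mean_average_precision(runs))
```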


2021 ◽  
pp. 1-11
Author(s):  
Tingting Zhao ◽  
Xiaoli Yi ◽  
Zhiyong Zeng ◽  
Tao Feng

YTNR (Yunnan Tongbiguan Nature Reserve) is located in the westernmost part of China’s tropical regions and is the only area in China with the tropical biota of the Irrawaddy River system. The reserve has abundant tropical flora and fauna resources. To realize real-time detection of wild animals in this area, this paper proposes an improved YOLO (You Only Look Once) network. The original YOLO model achieves high detection accuracy, but because of its complex structure it cannot reach a fast detection speed on a CPU platform. Therefore, the lightweight network MobileNet is introduced to replace the backbone feature extraction network in YOLO, which enables real-time detection on the CPU platform. To address the difficulty of collecting wild animal image data, the research team deployed 50 high-definition cameras in the study area and conducted continuous observations for more than 1,000 hours. In the end, this research uses 1410 wildlife images collected in the field and 1577 wildlife images from the internet, manually annotated by domain experts, to construct the research data set. Transfer learning is also introduced to address the insufficient training data and the resulting difficulty in fitting the network. The experimental results show that our model, trained on a training set containing 2419 animal images, has a mean average precision of 93.6% and an FPS (frames per second) of 3.8 on the CPU. Compared with YOLO, the mean average precision is increased by 7.7%, and the FPS value is increased by 3.


Sensors ◽  
2021 ◽  
Vol 21 (5) ◽  
pp. 1820
Author(s):  
Xiaotao Shao ◽  
Qing Wang ◽  
Wei Yang ◽  
Yun Chen ◽  
Yi Xie ◽  
...  

Existing pedestrian detection algorithms cannot effectively extract features of heavily occluded targets, which results in lower detection accuracy. To address heavy occlusion in crowds, we propose a multi-scale feature pyramid network based on ResNet (MFPN) to enhance the features of occluded targets and improve detection accuracy. MFPN includes two modules, namely a double feature pyramid network (FPN) integrated with ResNet (DFR) and repulsion loss of minimum (RLM). The double FPN improves the architecture to further enhance the semantic information and contours of occluded pedestrians, and provides a new way to extract features of occluded targets. The features extracted by our network are more separated and clearer, especially for heavily occluded pedestrians. Repulsion loss is introduced to improve the loss function, keeping predicted boxes away from the ground truths of unrelated targets. In experiments on the public CrowdHuman dataset, we obtain 90.96% AP, the best performance, a gain of 5.16% AP over the FPN-ResNet50 baseline. Compared with state-of-the-art works, our method improves the performance of the pedestrian detection system.


Author(s):  
Peikai Yan ◽  
Shaohua Li ◽  
Zhou Zhou ◽  
Qian Liu ◽  
Jiahui Wu ◽  
...  

Objective: Little is known about the efficacy of using artificial intelligence to identify laryngeal carcinoma from images of vocal lesions taken in different hospitals with multiple laryngoscope systems. This multicenter study aimed to establish an artificial intelligence system and provide a reliable auxiliary tool to screen for laryngeal carcinoma. Study Design: Multicenter case-control study. Setting: Six tertiary care centers. Participants: Laryngoscopy images were collected from 2179 patients with vocal lesions. Outcome Measures: An automatic detection system for laryngeal carcinoma was established based on Faster R-CNN and used to distinguish malignant from benign vocal lesions in 2179 laryngoscopy images acquired from 6 hospitals with 5 types of laryngoscopy systems. Pathology was the gold standard for identifying malignant and benign vocal lesions. Results: Of the 89 cases in the malignant group, the classifier correctly identified laryngeal carcinoma in 66 patients (sensitivity 74.16%), and of the 640 cases in the benign group, it correctly identified benign laryngeal lesions in 503 (specificity 78.59%). Furthermore, the CNN-based classifier achieved an overall accuracy of 78.05% with a negative predictive value of 95.63% on the testing dataset. Conclusion: This automatic diagnostic system has the potential to assist clinical diagnosis of laryngeal carcinoma, which may improve and standardize the diagnostic capacity of endoscopists using different laryngoscopes.
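The percentages in the Results can be reproduced from the four confusion-matrix counts implied by the abstract (66 of 89 malignant cases and 503 of 640 benign cases correctly classified). The sketch below recomputes them. Reading the 95.63% figure as the negative predictive value is an inference from the arithmetic, and the function name is hypothetical.

```python
def screening_metrics(tp: int, fn: int, tn: int, fp: int) -> dict[str, float]:
    """Standard binary-classification rates from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),               # true positive rate
        "specificity": tn / (tn + fp),               # true negative rate
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "npv": tn / (tn + fn),                       # negative predictive value
    }


# Counts derived from the abstract: 89 malignant cases, 640 benign cases.
metrics = screening_metrics(tp=66, fn=89 - 66, tn=503, fp=640 - 503)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
# sensitivity: 74.16%
# specificity: 78.59%
# accuracy: 78.05%
# npv: 95.63%
```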


Entropy ◽  
2021 ◽  
Vol 23 (11) ◽  
pp. 1507
Author(s):  
Feiyu Zhang ◽  
Luyang Zhang ◽  
Hongxiang Chen ◽  
Jiangjian Xie

Deep convolutional neural networks (DCNNs) have achieved breakthrough performance on bird species identification using spectrograms of bird vocalizations. To address the imbalance of the bird vocalization dataset, a single feature identification model (SFIM) with residual blocks and a modified, weighted cross-entropy function was proposed. To further improve identification accuracy, two multi-channel fusion methods were built with three SFIMs. One fused the outputs of the feature extraction parts of the three SFIMs (feature fusion mode); the other fused the outputs of the classifiers of the three SFIMs (result fusion mode). The SFIMs were trained with three different kinds of spectrograms, calculated through the short-time Fourier transform, the mel-frequency cepstrum transform and the chirplet transform, respectively. To cope with the huge number of trainable model parameters, transfer learning was used in the multi-channel models. Using our own vocalization dataset as a sample set, we found that the result fusion mode model outperforms the other proposed models, with the best mean average precision (MAP) reaching 0.914. A comparison of three spectrogram durations, 100 ms, 300 ms and 500 ms, reveals that the 300 ms duration is the best for our own dataset. We suggest determining the duration based on the duration distribution of bird syllables. On the BirdCLEF2019 training dataset, the highest classification mean average precision (cmAP) reached 0.135, which indicates that the proposed model has a certain generalization ability.
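A weighted cross-entropy loss counters class imbalance by scaling each sample's loss by a weight for its true class, so that errors on rare species cost more. The abstract does not say how the authors modified or weighted their loss. The sketch below assumes plain inverse-frequency weights and is not the paper's function. The class counts and function names are hypothetical.

```python
import math


def inverse_frequency_weights(class_counts: list[int]) -> list[float]:
    """Weight each class by total / (num_classes * count); rare classes weigh more."""
    total = sum(class_counts)
    num_classes = len(class_counts)
    return [total / (num_classes * count) for count in class_counts]


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    norm = sum(exps)
    return [e / norm for e in exps]


def weighted_cross_entropy(logits: list[float], target: int,
                           weights: list[float]) -> float:
    """Cross-entropy for one sample, scaled by the weight of its true class."""
    probs = softmax(logits)
    return -weights[target] * math.log(probs[target])


# Hypothetical two-class dataset with a 90/10 imbalance.
weights = inverse_frequency_weights([90, 10])
common = weighted_cross_entropy([0.0, 0.0], target=0, weights=weights)
rare = weighted_cross_entropy([0.0, 0.0], target=1, weights=weights)
print(rare > common)  # True: the same mistake costs more on the rare class
```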


2021 ◽  
Vol 13 (22) ◽  
pp. 4675
Author(s):  
William Yamada ◽  
Wei Zhao ◽  
Matthew Digman

An automatic method of obtaining geographic coordinates of bales using monovision un-crewed aerial vehicle imagery was developed utilizing a data set of 300 images with a 20-megapixel resolution containing a total of 783 labeled bales of corn stover and soybean stubble. The relative performance of image processing with Otsu’s segmentation, you only look once version three (YOLOv3), and region-based convolutional neural networks was assessed. As a result, the best option in terms of accuracy and speed was determined to be YOLOv3, with 80% precision, 99% recall, 89% F1 score, 97% mean average precision, and a 0.38 s inference time. Next, the impact of using lower-cost cameras was evaluated by reducing image quality to one megapixel. The lower-resolution images resulted in decreased performance, with 79% precision, 97% recall, 88% F1 score, 96% mean average precision, and 0.40 s inference time. Finally, the output of the YOLOv3 trained model, density-based spatial clustering, photogrammetry, and map projection were utilized to predict the geocoordinates of the bales with a root mean squared error of 2.41 m.
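A root mean squared error in metres summarizes how far the predicted bale positions fall from reference positions. The abstract does not specify how the error was computed, so the sketch below assumes horizontal Euclidean error between matched points in a projected coordinate system whose units are metres. The function name and the coordinates are hypothetical.

```python
import math


def horizontal_rmse(predicted: list[tuple[float, float]],
                    reference: list[tuple[float, float]]) -> float:
    """RMSE of the planar distance between matched (x, y) points."""
    if not predicted or len(predicted) != len(reference):
        raise ValueError("need two non-empty point lists of equal length")
    squared_errors = [
        (px - rx) ** 2 + (py - ry) ** 2
        for (px, py), (rx, ry) in zip(predicted, reference)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical projected coordinates in metres, for illustration only.
predicted = [(0.0, 0.0), (3.0, 4.0)]
reference = [(0.0, 0.0), (0.0, 0.0)]
print(horizontal_rmse(predicted, reference))  # sqrt((0 + 25) / 2)
```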


2021 ◽  
Author(s):  
Navroop Kaur ◽  
Meenakshi Bansal ◽  
Sukhwinder Singh S

In modern times, firewalls and antivirus packages are not enough to protect an organization from numerous cyber attacks. A computer IDS (Intrusion Detection System) is a crucial component that contributes to the success of an organization. An IDS is a software application responsible for scanning an organization’s networks for suspicious activities and policy violations. It ensures the secure and reliable functioning of the network within an organization. IDSs have undergone huge transformations since their origin to cope with advancing computer crime. The primary aim of an IDS has been to improve attack detection without degrading the performance of the network. This research paper elaborates on the different types of IDS and the functions they perform. The NSL KDD dataset is used for training and testing. Seven prominent classifiers, LR (Logistic Regression), NB (Naïve Bayes), DT (Decision Tree), AB (AdaBoost), RF (Random Forest), kNN (k Nearest Neighbor), and SVM (Support Vector Machine), are studied along with their pros and cons, and feature selection is applied to improve the performance evaluation parameters (Accuracy, Precision, Recall, and F1Score). The paper presents a detailed flowchart and algorithm for performing feature selection using XGB (Extreme Gradient Booster) for four categories of attacks: DoS (Denial of Service), Probe, R2L (Remote to Local Attack), and U2R (User to Root Attack). The selected features are ranked by their occurrence. The implementation was conducted at five different ratios: 60-40%, 70-30%, 90-10%, 50-50%, and 80-20%. Different classifiers scored best for different performance evaluation parameters at different ratios. NB achieved the best Accuracy and Recall values. DT and RF consistently performed with high accuracy. NB, SVM, and kNN achieved good F1Score.


2021 ◽  
Author(s):  
Komuravelli Prashanth ◽  
Kalidas Yeturu

There are millions of scanned documents worldwide in around 4 thousand languages. Searching for information in a scanned document requires a text layer to be available and indexed. Preparing a text layer requires recognizing character and sub-region patterns and associating them with a human interpretation. Developing an optical character recognition (OCR) system for each and every language is a very difficult task, if not an impossible one. There is a strong need for systems that build on top of existing OCR technologies by learning from them and unifying the multitude of disparate systems. In this regard, we propose an algorithm that leverages the fact that we are dealing with scanned documents of handwritten text regions from across diverse domains and language settings. We observe that the text regions have consistent bounding box sizes, and any large-font or tiny-font scenarios can be handled in the preprocessing or postprocessing phases. The image subregions in scanned text documents are smaller than the subregions formed by common objects in general-purpose images. We propose and validate the hypothesis that a much simpler convolutional neural network (CNN) with very few layers and fewer filters can be used for detecting individual subregion classes. For detection of several hundred classes, multiple such simpler models can be pooled to operate simultaneously on a document. The advantage of using pools of subregion-specific models is the ability to deal with the incremental addition of hundreds of newer classes over time, without disturbing the previous models in the continual learning scenario. Such an approach has a distinctive advantage over a single monolithic model, in which subregion classes share and interfere via a bulky common neural network. We report here an efficient algorithm for building subregion-specific lightweight CNN models.
The training data for the proposed CNN requires engineering synthetic data points that consider both the pattern of interest and non-patterns. We propose and validate the hypothesis that an image canvas containing an optimal amount of pattern and non-pattern can be formulated using a mean squared error loss function to influence filter training from the data. The CNN thus trained can identify the character-object in the presence of several other objects on a generalized test image of a scanned document. In this setting, a key observation is that, in a CNN, learning a filter depends not only on the abundance of patterns of interest but also on the presence of a non-pattern context. Our experiments have led to the following key observations: (i) a pattern cannot be over-expressed in isolation, (ii) a pattern cannot be under-expressed either, (iii) a non-pattern can be salt-and-pepper type noise, and finally (iv) it is sufficient to provide a non-pattern context to a modest representation of a pattern to obtain strong individual sub-region class models. We have carried out studies and reported mean average precision scores on various data sets, including (1) MNIST digits (95.77), (2) E-MNIST capital alphabet (81.26), (3) EMNIST small alphabet (73.32), (4) Kannada digits (95.77), (5) Kannada letters (90.34), (6) Devanagari letters (100), (7) Telugu words (93.20), (8) Devanagari words (93.20), and also on medical prescriptions, and observed high-performance metrics of mean average precision over 90%. The algorithm serves as a kernel in the automatic annotation of digital documents in diverse scenarios such as annotation of ancient manuscripts and hand-written health records.

