Analysis and Text Classification of Privacy Policies From Rogue and Top-100 Fortune Global Companies

2019 ◽  
Vol 13 (2) ◽  
pp. 47-66
Author(s):  
Martin Boldt ◽  
Kaavya Rekanar

In the present article, the authors investigate to what extent supervised binary classification can be used to distinguish between legitimate and rogue privacy policies posted on web pages. Fifteen classification algorithms are evaluated using a data set that consists of 100 privacy policies from legitimate websites (belonging to companies that top the Fortune Global 500 list) as well as 67 policies from rogue websites. A manual analysis of all policy content was performed, and clear statistical differences in terms of both length and adherence to seven general privacy principles were found. Privacy policies from legitimate companies have a 98% adherence to the seven privacy principles, which is significantly higher than the 45% associated with rogue companies. Out of the 15 evaluated classification algorithms, Naïve Bayes Multinomial is the most suitable candidate to solve the problem at hand. Its models show the best performance, with an AUC measure of 0.90 (0.08), which outperforms most of the other candidates in the statistical tests used.
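
The abstract above reports performance as an AUC value. As a rough illustration of what that number measures, the following sketch computes AUC as the fraction of positive/negative pairs that a classifier ranks correctly. The function name, labels and scores are invented for this example and are not taken from the article.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the rank (Mann-Whitney U) formulation.

    labels: iterable of 0/1, where 1 marks the positive class.
    scores: classifier scores, higher meaning "more likely positive".
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical scores for six policies (1 = rogue, 0 = legitimate).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
print(roc_auc(labels, scores))
```

An AUC of 1.0 would mean every rogue policy is scored above every legitimate one, while 0.5 corresponds to random ranking.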

2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Rajit Nair ◽  
Santosh Vishwakarma ◽  
Mukesh Soni ◽  
Tejas Patel ◽  
Shubham Joshi

Purpose: The 2019 coronavirus disease (COVID-19), which first appeared in December 2019 in the city of Wuhan, China, rapidly spread around the world and became a pandemic. It has had a devastating impact on daily lives, public health and the global economy. Positive cases must be identified as soon as possible to avoid further spread of the disease and to allow swift care of affected patients. The need for supportive diagnostic instruments has increased, as no specific automated toolkits are available. Recent results from radiology imaging techniques indicate that such images provide valuable details on COVID-19. Using advanced artificial intelligence (AI) technologies together with radiological imagery can help diagnose this condition accurately and help address the lack of specialist doctors in isolated areas. In this research, a new model for automatic detection of COVID-19 from raw chest X-ray images is presented. The proposed model, DarkCovidNet, is designed to provide accurate diagnostics for binary classification (COVID vs no detection) and multi-class classification (COVID vs no results vs pneumonia). The implemented model achieved an average precision of 98.46% and 91.352% for binary and multi-class classification, respectively, and an average accuracy of 98.97% and 87.868%. The DarkNet model, which serves as the classifier in the "you only look once" (YOLO) real-time object detection system, was used in this research. A total of 17 convolutional layers, with different filters on each layer, have been implemented. This platform can be used by radiologists to verify their initial screening and can also be used for screening patients through the cloud. Design/methodology/approach: This study uses the CNN-based Darknet-19 model, which acts as the basis for the real-time object detection system. The architecture of this system is designed so that it can detect objects in real time. 
This study has developed the DarkCovidNet model based on the Darknet architecture with fewer layers and filters. Before discussing the DarkCovidNet model, consider the Darknet architecture and its functionality. Typically, the DarkNet architecture consists of 19 convolution layers and 5 max-pooling layers. Assume as a convolution layer, and as a pooling layer. Findings: The work discussed in this paper is used to diagnose various radiology images and to develop a model that can accurately predict or classify the disease. The data set used in this work consists of COVID-19 and non-COVID-19 images taken from various sources. The deep learning model DarkCovidNet is applied to the data set and has shown significant performance in both binary and multi-class classification. In binary classification, the model has shown an average accuracy of 98.97% for the detection of COVID-19, whereas in multi-class classification it has achieved an average accuracy of 87.868% when classifying COVID-19, no detection and pneumonia. Research limitations/implications: One significant limitation of this work is that a limited number of chest X-ray images were used. The number of COVID-19 patients is increasing rapidly. In the future, the model will be applied to a larger data set, which can be generated from local hospitals, and its performance on that data will be checked. Originality/value: Deep learning technology has made significant changes in the field of AI by generating good results, especially in pattern recognition. A conventional CNN structure includes a convolution layer that extracts features from the input using the filters it applies, a pooling layer that reduces the size for computational efficiency and a fully connected layer, which is a neural network. 
A CNN model is created by combining one or more such layers, and its internal parameters are adjusted to accomplish a particular task, such as classification or object recognition.
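
The roles of the convolution and pooling layers described above can be illustrated with a minimal pure-Python sketch. It is not the DarkCovidNet implementation: the toy image, the filter values and the function names are assumptions made for this example, and the "convolution" is the unflipped sliding-window product commonly used in CNNs.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool2d(feature_map, size=2):
    """Keep the maximum of each non-overlapping size x size window."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Toy 5x5 "image" with a vertical edge, and a 2x2 horizontal-difference filter.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
kernel = [[-1, 1], [-1, 1]]
features = conv2d_valid(image, kernel)   # 4x4 feature map, responds at the edge
pooled = max_pool2d(features)            # 2x2 map after pooling
```

The filter responds only where the intensity changes, and pooling halves each spatial dimension while keeping the strongest response in each window.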


Author(s):  
Alexander M. Zolotarev ◽  
Brian J. Hansen ◽  
Ekaterina A. Ivanova ◽  
Katelynn M. Helfrich ◽  
Ning Li ◽  
...  

Background: Atrial fibrillation (AF) can be maintained by localized intramural reentrant drivers. However, AF driver detection by clinical surface-only multielectrode mapping (MEM) has relied on subjective interpretation of activation maps. We hypothesized that application of machine learning to electrogram frequency spectra may accurately automate driver detection by MEM and add some objectivity to the interpretation of MEM findings. Methods: Temporally and spatially stable single AF drivers were mapped simultaneously in explanted human atria (n=11) by subsurface near-infrared optical mapping (NIOM; 0.3 mm² resolution) and 64-electrode MEM (higher density or lower density with 3 and 9 mm² resolution, respectively). Unipolar MEM and NIOM recordings were processed by Fourier transform analysis into 28 407 total Fourier spectra. Thirty-five features for machine learning were extracted from each Fourier spectrum. Results: Targeted driver ablation and NIOM activation maps efficiently defined the center and periphery of AF driver preferential tracks and provided validated annotations for driver versus nondriver electrodes in MEM arrays. Compared with analysis of single electrogram frequency features, averaging the features from each of the 8 neighboring electrodes significantly improved classification of AF driver electrograms. The classification metrics increased when less strict annotations, including driver periphery electrodes, were added to the driver center annotation. Notably, the F1-score for binary classification of the higher-density catheter data set was significantly higher than that of the lower-density catheter (0.81±0.02 versus 0.66±0.04, P<0.05). The trained algorithm correctly highlighted 86% of driver regions with higher-density but only 80% with lower-density MEM arrays (81% for lower-density+higher-density arrays together). 
Conclusions: The machine learning model pretrained on Fourier spectrum features allows efficient classification of electrogram recordings as AF driver or nondriver compared with the NIOM gold standard. Future application of the NIOM-validated machine learning approach may improve the accuracy of AF driver detection for targeted ablation treatment in patients.
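
The neighbor-averaging step mentioned in the Results can be sketched as follows. This is only an illustration under stated assumptions: the electrodes are taken to lie on a regular grid, the electrode's own value is included in the mean, and edge electrodes average over whichever neighbors exist. The feature values and the function name are hypothetical.

```python
def neighbor_average(grid):
    """Replace each cell by the mean of the cells in its 3x3 neighborhood.

    Assumption: the electrode's own value is included in the mean, and
    edge electrodes simply average over the neighbors that exist.
    """
    rows, cols = len(grid), len(grid[0])
    smoothed = []
    for r in range(rows):
        row = []
        for c in range(cols):
            window = [
                grid[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            row.append(sum(window) / len(window))
        smoothed.append(row)
    return smoothed


# Hypothetical 3x3 patch holding one spectral feature per electrode.
patch = [
    [6.0, 6.0, 6.0],
    [6.0, 9.0, 6.0],
    [6.0, 6.0, 6.0],
]
smoothed = neighbor_average(patch)
```

Averaging spreads an isolated high value over its surroundings, which is one plausible reason why spatial context can stabilize per-electrode classification.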


2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Patrick Beyersdorffer ◽  
Wolfgang Kunert ◽  
Kai Jansen ◽  
Johanna Miller ◽  
Peter Wilhelm ◽  
...  

Abstract Uncontrolled movements of laparoscopic instruments can lead to inadvertent injury of adjacent structures. The risk becomes evident when the dissecting instrument is located outside the field of view of the laparoscopic camera. Technical solutions to ensure patient safety are appreciated. The present work evaluated the feasibility of an automated binary classification of laparoscopic image data using Convolutional Neural Networks (CNN) to determine whether the dissecting instrument is located within the laparoscopic image section. A unique record of images was generated from six laparoscopic cholecystectomies in a surgical training environment to configure and train the CNN. By using a temporary version of the neural network, the annotation of the training image files could be automated and accelerated. A combination of oversampling and selective data augmentation was used to enlarge the fully labeled image data set and prevent loss of accuracy due to imbalanced class volumes. Subsequently the same approach was applied to the comprehensive, fully annotated Cholec80 database. The described process led to the generation of extensive and balanced training image data sets. The performance of the CNN-based binary classifiers was evaluated on separate test records from both databases. On our recorded data, an accuracy of 0.88 with regard to the safety-relevant classification was achieved. The subsequent evaluation on the Cholec80 data set yielded an accuracy of 0.84. The presented results demonstrate the feasibility of a binary classification of laparoscopic image data for the detection of adverse events in a surgical training environment using a specifically configured CNN architecture.
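
The class-balancing idea mentioned above can be sketched with plain random oversampling. This covers only the oversampling part, not the selective data augmentation, and the frame names, labels and function name are hypothetical.

```python
import random


def oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until classes are balanced."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    # Every class is grown to the size of the largest class.
    target = max(len(group) for group in by_class.values())
    out_samples, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Hypothetical frame names: 5 frames with the instrument visible, 2 without.
frames = ["in_1", "in_2", "in_3", "in_4", "in_5", "out_1", "out_2"]
labels = ["inside"] * 5 + ["outside"] * 2
balanced_frames, balanced_labels = oversample(frames, labels)
```

In practice the duplicated images would typically be augmented (for example flipped or rotated) so that the network does not see exact copies.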


PLoS ONE ◽  
2021 ◽  
Vol 16 (7) ◽  
pp. e0253764
Author(s):  
Qingfang He ◽  
Guang Cheng ◽  
Huimin Ju

Breast cancer is the malignant tumor with the highest incidence in women and seriously endangers women’s health. With the help of computer vision technology, automatically classifying pathological tissue images has important application value in assisting doctors to reach a rapid and accurate diagnosis. Breast pathological tissue images have complex and diverse characteristics, and the medical data sets of such images are small, which makes it difficult to classify breast pathological tissues automatically. In recent years, most research has focused on the simple binary classification of benign and malignant, which cannot meet the actual needs for classification of pathological tissues. Therefore, based on deep convolutional neural networks, model ensembling, transfer learning and feature fusion, this paper designs BCDnet, an eight-class breast pathology diagnosis model. A user inputs the patient’s breast pathological tissue image, and the model automatically determines what the disease is (Adenosis, Fibroadenoma, Tubular Adenoma, Phyllodes Tumor, Ductal Carcinoma, Lobular Carcinoma, Mucinous Carcinoma or Papillary Carcinoma). The model uses the VGG16 and Resnet50 convolutional bases in parallel. The two convolutional bases obtain breast tissue image features from different fields of view. After the information output by the fully connected layers of the two convolutional bases is fused, it is classified and output by the SoftMax function. The model experiment uses the publicly available BreaKHis data set. The number of samples of each class in the data set is extremely unevenly distributed. Compared with binary classification, the number of samples in each class of the eight-class classification is also smaller. 
Therefore, the image segmentation method is used to expand the data set and the non-repeated random cropping method is used to balance the data set. Based on the balanced data set and the unbalanced data set, the BCDnet model, the pre-trained model Resnet50+ fine-tuning, and the pre-trained model VGG16+ fine-tuning are used for multiple comparison experiments. In the comparison experiment, the BCDnet model performed outstandingly, and the correct recognition rate of the eight-class classification model is higher than 98%. The results show that the model proposed in this paper and the method of improving the data set are reasonable and effective.
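
The fuse-then-classify step described above can be illustrated with a small sketch. It assumes that fusion means concatenating the two branch outputs, which the abstract does not state, and it uses tiny invented vectors and weights rather than anything from BCDnet.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(features_a, features_b, weights, biases):
    """Concatenate two feature vectors, apply one linear layer, then softmax."""
    fused = list(features_a) + list(features_b)
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(logits)


# Hypothetical 2-dimensional outputs of the two branches and a 3-class head.
branch_a = [1.0, 0.0]
branch_b = [0.0, 2.0]
weights = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 1.0, 0.0],
]
biases = [0.0, 0.0, 0.0]
probs = fuse_and_classify(branch_a, branch_b, weights, biases)
predicted_class = probs.index(max(probs))
```

The softmax output sums to one, so each entry can be read as the model's probability for one class; an eight-class model would simply have eight rows in the final layer.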


2018 ◽  
Vol 1 (1) ◽  
pp. 6 ◽  
Author(s):  
Lubna Farhi ◽  
Razia Zia ◽  
Zain Anwar Ali

Brain cancer has remained one of the key causes of death in people of all ages. One way to improve survival among patients is to correctly diagnose cancer in its early stages. Recently, machine learning has become a very important tool in medical image classification. Our approach is to examine and compare various machine learning classification algorithms that help in brain tumor classification of Magnetic Resonance (MR) images. We have compared Artificial Neural Network (ANN), K-nearest Neighbor (KNN), Decision Tree (DT), Support Vector Machine (SVM) and Naïve Bayes (NB) classifiers to determine the accuracy of each classifier and find the best amongst them for classification of cancerous and noncancerous brain MR images. We have used 86 MR images and extracted a large number of features for each image. Since an equal number of images has been used, there is no suspicion of the results being biased. For our data set the most accurate results were provided by ANN. It was found that ANN provides better results for medium to large databases of brain MR images.
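
As an illustration of one of the compared classifiers, the following is a minimal k-nearest-neighbor sketch. The two-dimensional feature vectors and labels are invented for the example; the article's actual features and settings are not given in the abstract.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    nearest = sorted(
        zip(train_x, train_y),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-feature descriptions of six MR images.
train_x = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15),
           (0.9, 0.8), (0.8, 0.9), (0.85, 0.85)]
train_y = ["noncancerous"] * 3 + ["cancerous"] * 3

print(knn_predict(train_x, train_y, (0.8, 0.8)))
print(knn_predict(train_x, train_y, (0.2, 0.2)))
```

Comparing classifiers as the abstract describes would mean running each one on the same held-out images and comparing the share of correct predictions.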


Author(s):  
Ruslan Babudzhan ◽  
Konstantyn Isaienkov ◽  
Oleksii Vodka ◽  
Danilo Krasiy ◽  
Ivan Zadorozhny ◽  
...  

The work describes the processing of rolling bearing operation data and its use in constructing a mathematical model for binary classification of the operating state of bearings by a convolutional neural network with varying dilation factors of the convolutional-layer kernels. To classify bearings with defects, we used vibration acceleration data from our own test bench and a publicly available data set. The work also investigated a method for generalizing the classification of bearing signals obtained as a result of fundamentally different experiments and having different standard sizes. To unify signals, the following processing method is proposed: select data segments with an offset, move to the frequency domain using the fast Fourier transform, cut off frequencies exceeding 10 times the shaft rotation frequency, restore the signal while retaining 10 shaft rotation periods, scale the resulting signal by dividing it by the orbit diameter of the rolling body, and interpolate the signal at 2048 points. This algorithm also makes it possible to generate a balanced sample for building a mathematical model, a feature provided by varying the step used to split the initial signal. The advantage of this algorithm over the classical methods of oversampling or undersampling is the generation of new objects that specify the statistical parameters of the general population. The signal processing algorithm was used both for binary classification problems within one data set and for training on one data set and testing on another. To enlarge the data set for training and testing the mathematical model, the bootstrapping method is used, based on multiple generation of samples using the Monte Carlo method. The quality of the mathematical model of binary classification was assessed by the proportion of correct answers. The problem is formulated as the problem of minimizing binary cross entropy. 
The results obtained are presented in the form of graphs demonstrating the neural network training process and graphs of the distribution density of metrics.
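
Two of the signal-unification steps listed above, the frequency cutoff at 10 times the shaft rotation frequency and the interpolation to 2048 points, can be sketched as follows. This is a simplified illustration, not the authors' pipeline: it uses a naive discrete Fourier transform instead of an FFT, omits the segment selection and the scaling by orbit diameter, and the sample rate, shaft frequency and test signal are assumptions.

```python
import cmath
import math


def dft(signal):
    """Naive O(n^2) discrete Fourier transform (fine for short demo signals)."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def inverse_dft(spectrum):
    """Inverse of `dft`, returning the real part of each sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
             for k in range(n)) / n).real
        for t in range(n)
    ]


def low_pass(signal, sample_rate, cutoff_hz):
    """Zero every DFT bin whose frequency exceeds cutoff_hz, then invert."""
    n = len(signal)
    spectrum = dft(signal)
    for k in range(n):
        # Bin k and bin n - k represent the same physical frequency.
        freq = min(k, n - k) * sample_rate / n
        if freq > cutoff_hz:
            spectrum[k] = 0
    return inverse_dft(spectrum)


def resample_linear(signal, num_points):
    """Linearly interpolate `signal` onto `num_points` evenly spaced positions."""
    last = len(signal) - 1
    out = []
    for i in range(num_points):
        pos = i * last / (num_points - 1)
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(signal[lo] * (1 - frac) + signal[hi] * frac)
    return out


# Assumed demo values: 64 samples at 64 Hz, shaft rotating at 1 Hz.
sample_rate, shaft_hz, n = 64, 1.0, 64
clean = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]            # 2 Hz
noisy = [c + 0.5 * math.sin(2 * math.pi * 20 * t / n)                    # + 20 Hz
         for t, c in enumerate(clean)]
filtered = low_pass(noisy, sample_rate, cutoff_hz=10 * shaft_hz)
unified = resample_linear(filtered, 2048)
```

In this toy case the 20 Hz component lies above the cutoff and is removed, so the filtered signal matches the 2 Hz component, and every signal ends up with the same length regardless of its original sampling.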


2020 ◽  
Vol 36 (3) ◽  
pp. 807-821
Author(s):  
Heidi Kühnemann ◽  
Arnout van Delden ◽  
Dick Windmeijer

Classification of enterprises by main economic activity according to NACE codes is a challenging but important task for national statistical institutes. Since manual editing is time-consuming, we investigated the automatic prediction from dedicated website texts using a knowledge-based approach. To that end, concept features were derived from a set of domain-specific keywords. Furthermore, we compared flat classification to a specific two-level hierarchy which was based on an approach used by manual editors. We limited ourselves to Naïve Bayes and Support Vector Machines models and only used texts from the main web pages. As a first step, we trained a filter model that classifies whether websites contain information about economic activity. The resulting filtered data set was subsequently used to predict 111 NACE classes. We found that using concept features did not improve the model performance compared to a model with character n-grams, i.e. non-informative features. Neither did the two-level hierarchy improve the performance relative to a flat classification. Nonetheless, prediction of the best three NACE classes clearly improved the overall prediction performance compared to a top-one prediction. We conclude that more effort is needed in order to achieve good results with a knowledge-based approach and discuss ideas for improvement.
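
The comparison between top-one and best-three prediction mentioned above can be made concrete with a short sketch of top-k accuracy. The codes and rankings are invented for illustration and serve only as opaque class labels.

```python
def top_k_accuracy(true_labels, ranked_predictions, k):
    """Share of cases whose true label is among the k highest-ranked predictions."""
    hits = sum(
        1 for truth, ranked in zip(true_labels, ranked_predictions)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)


# Hypothetical NACE-like codes; each inner list is ordered from most to least likely.
truth = ["62.01", "47.11", "56.10", "41.20"]
ranked = [
    ["62.01", "62.02", "63.11"],
    ["47.19", "47.11", "46.39"],
    ["56.30", "55.10", "56.10"],
    ["43.21", "43.22", "42.11"],
]
top1 = top_k_accuracy(truth, ranked, 1)
top3 = top_k_accuracy(truth, ranked, 3)
```

Top-k accuracy can never be lower than top-one accuracy, so the interesting question is how large the gain is, for example when the three candidates are shown to a manual editor.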


Author(s):  
Shankar Shambhu ◽  
Deepika Koundal ◽  
Prasenjit Das ◽  
Chetan Sharma

The COVID-19 pandemic has hit the world with such force that the world's leading economies are finding it challenging to recover from it. Even countries with the best medical facilities cannot handle the increasing number of cases and fatalities. This disease causes significant damage to the human lungs and respiratory system, which can lead to death. In the proposed work, computed tomography (CT) images of the respiratory system are analyzed to distinguish infected from non-infected people. Deep learning binary classification algorithms have been applied, which have shown an accuracy of 86.9% on 746 chest CT images with COVID-19-related symptoms.


2021 ◽  
Vol 35 (4) ◽  
pp. 341-347
Author(s):  
Aparna Gullapelly ◽  
Barnali Gupta Banik

Classifying moving objects in video surveillance can be difficult, and it is challenging to classify rigid and non-rigid objects with high accuracy. Here rigid and non-rigid objects are limited to vehicles and people. A CNN is used for the binary classification of rigid and non-rigid objects. A deep-learning system using convolutional neural networks was trained using Python, and objects were categorized according to their appearance. The classification is supported by a data set containing two classes of images, rigid and non-rigid, that differ in illumination.

