private datasets: Recently Published Documents

TOTAL DOCUMENTS: 28 (five years: 19)
H-INDEX: 6 (five years: 2)

2022 ◽  
Vol 3 (1) ◽  
pp. 1-15
Author(s):  
Divya Jyothi Gaddipati ◽  
Jayanthi Sivaswamy

Early detection and treatment of glaucoma are of interest because it is a chronic eye disease that leads to irreversible loss of vision. Existing automated systems rely largely on fundus images for glaucoma assessment due to their fast acquisition and cost-effectiveness. Optical Coherence Tomography (OCT) images provide vital and unambiguous information about nerve fiber loss and optic cup morphology, which are essential for disease assessment. However, the high cost of OCT is a deterrent to deployment in large-scale screening. In this article, we present a novel CAD solution wherein both OCT and fundus images are leveraged to learn a model that maps fundus features to the OCT feature space. We show how this model can subsequently be used to detect glaucoma given an image from only one modality (fundus). The proposed model has been validated extensively on four public and two private datasets. It attained an AUC/Sensitivity of 0.9429/0.9044 on a diverse set of 568 images, which is superior to the figures obtained by a model trained only on fundus features. Cross-validation was also performed on nearly 1,600 images drawn from a private (OD-centric) and a public (macula-centric) dataset, and the proposed model was found to outperform the state-of-the-art method by 8% (public) to 18% (private). Thus, we conclude that fundus-to-OCT feature space mapping is an attractive option for glaucoma detection.
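The core idea of mapping fundus features into the OCT feature space can be illustrated with a minimal linear stand-in. The paper learns this mapping with a deep model; here a least-squares fit on synthetic, hypothetical feature vectors (dimensions 32 and 16 are invented for illustration) shows the train-on-both, test-on-fundus-only workflow:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical paired training features: fundus and OCT vectors per eye.
X_fundus = rng.normal(size=(200, 32))                            # fundus features
W_true = rng.normal(size=(32, 16))
Y_oct = X_fundus @ W_true + 0.01 * rng.normal(size=(200, 16))    # OCT features

# Fit a least-squares mapping from fundus space to OCT space.
W, *_ = np.linalg.lstsq(X_fundus, Y_oct, rcond=None)

# At test time only a fundus image is needed: project its features into
# OCT space, then run the glaucoma classifier on the predicted features.
x_new = rng.normal(size=(1, 32))
oct_features_pred = x_new @ W
print(oct_features_pred.shape)  # (1, 16)
```

A deep network replaces the linear map in practice, but the deployment asymmetry is the same: OCT is required only during training.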


2022 ◽  
Vol 1 ◽  
Author(s):  
Mickael Tardy ◽  
Diana Mateus

In breast cancer screening, binary classification of mammograms is a common task that aims to determine whether a case is malignant or benign. A Computer-Aided Diagnosis (CADx) system based on a trainable classifier requires clean data and labels coming from a confirmed diagnosis. Unfortunately, such labels are not easy to obtain in clinical practice, since the histopathological reports of a biopsy may not be available alongside the mammograms, while normal cases may lack an explicit follow-up confirmation. Such ambiguities result either in reducing the number of samples eligible for training or in label uncertainty that may degrade performance. In this work, we maximize the number of training samples by relying on multi-task learning. We design a deep-neural-network-based classifier that yields multiple outputs in one forward pass. The predicted classes include binary malignancy, cancer probability estimation, breast density, and image laterality. Since few samples have all classes available and confirmed, we propose to introduce the uncertainty related to the classes as a per-sample weight during training. Such weighting prevents updating the network's parameters when training on uncertain or missing labels. We evaluate our approach on the public INBreast dataset and private datasets, showing statistically significant improvements compared to baseline and independent state-of-the-art approaches. Moreover, we use mammograms from the Susan G. Komen Tissue Bank for fine-tuning, further demonstrating the ability to improve performance in our multi-task learning setup from raw clinical data. We achieved a binary classification AUC of 80.46 on our private dataset and 85.23 on the INBreast dataset.
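The per-sample weighting idea can be sketched as a masked, weighted multi-task loss: a weight of zero for a missing or uncertain label removes that term from the objective, so it never drives a parameter update. The function below is an illustrative sketch, not the paper's exact formulation:

```python
import numpy as np

def weighted_multitask_loss(task_losses, label_weights):
    """Combine per-task, per-sample losses, down-weighting uncertain labels.

    task_losses  : (n_samples, n_tasks) raw loss per sample and task
    label_weights: (n_samples, n_tasks) in [0, 1]; 0 = missing/uncertain label
    """
    weighted = task_losses * label_weights
    # Normalize by the total weight so sparsely labeled tasks still matter.
    return weighted.sum() / np.maximum(label_weights.sum(), 1e-8)

losses = np.array([[0.7, 0.2, 0.1],
                   [0.4, 0.9, 0.3]])
weights = np.array([[1.0, 0.0, 0.5],   # task 2 label missing for sample 0
                    [1.0, 1.0, 1.0]])
print(round(weighted_multitask_loss(losses, weights), 4))  # 0.5222
```

The zero-weight entry (sample 0, task 2) contributes nothing, exactly the behavior described for samples without a confirmed label.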


Sensors ◽  
2021 ◽  
Vol 21 (17) ◽  
pp. 5848
Author(s):  
Mohamed Chouai ◽  
Petr Dolezel ◽  
Dominik Stursa ◽  
Zdenek Nemec

In the field of computer vision, object detection consists of automatically finding objects in images and reporting their positions. The most common fields of application are safety systems (pedestrian detection, behavior identification) and control systems. Another important application is head/person detection, which is fundamental to road safety, rescue, surveillance, etc. In this study, we developed a new approach based on two parallel DeepLabv3+ networks to improve the performance of a person detection system. For the implementation of our semantic segmentation model, we established a working methodology with two types of ground truths extracted from the bounding boxes given by the original annotations. The approach was evaluated on our two private datasets as well as on a public dataset. To demonstrate the performance of the proposed system, a comparative analysis was carried out against two state-of-the-art deep learning semantic segmentation models: SegNet and U-Net. By achieving 99.14% global accuracy, the results demonstrate that the developed strategy is an efficient way to build a deep neural network model for semantic segmentation. This strategy can be used not only for human head detection but also in several other semantic segmentation applications.
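Deriving segmentation ground truths from box annotations can be sketched as follows: each bounding box is rasterized into a binary mask, and a second, shrunken variant gives a tighter target. This is only an illustration of the general bounding-box-to-mask step; the shrink factor and the exact definition of the paper's two ground-truth types are assumptions:

```python
import numpy as np

def masks_from_boxes(shape, boxes):
    """Build two segmentation ground truths from bounding boxes.

    Returns a full-box mask and a shrunken 'core' mask, standing in
    for two ground-truth variants derived from the same annotations
    (the 25% shrink per side is illustrative).
    """
    full = np.zeros(shape, dtype=np.uint8)
    core = np.zeros(shape, dtype=np.uint8)
    for r0, c0, r1, c1 in boxes:
        full[r0:r1, c0:c1] = 1
        dr, dc = (r1 - r0) // 4, (c1 - c0) // 4
        core[r0 + dr:r1 - dr, c0 + dc:c1 - dc] = 1
    return full, core

full, core = masks_from_boxes((20, 20), [(4, 4, 12, 12)])
print(full.sum(), core.sum())  # 64 16
```

The two masks can then train the two parallel segmentation branches against complementary targets.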


2021 ◽  
Vol 3 ◽  
Author(s):  
Niccolò Marini ◽  
Sebastian Otálora ◽  
Damian Podareanu ◽  
Mart van Rijthoven ◽  
Jeroen van der Laak ◽  
...  

Algorithms proposed in computational pathology can automatically analyze digitized tissue samples in histopathological images to help diagnose diseases. Tissue samples are scanned at high resolution and usually saved as images with several magnification levels, namely whole slide images (WSIs). Convolutional neural networks (CNNs) represent the state-of-the-art computer vision methods for the analysis of histopathology images, targeting detection, classification, and segmentation. However, the development of CNNs that work with multi-scale images such as WSIs is still an open challenge: the image characteristics and the CNN properties impose architecture designs that are not trivial, so single-scale CNN architectures are still often used. This paper presents Multi_Scale_Tools, a library that aims to facilitate exploiting the multi-scale structure of WSIs. Multi_Scale_Tools currently includes four components: a pre-processing component, a scale detector, a multi-scale CNN for classification, and a multi-scale CNN for segmentation. The pre-processing component includes methods to extract patches at several magnification levels. The scale detector identifies the magnification level of images that do not carry this information, such as images from the scientific literature. The multi-scale CNNs are trained by combining features and predictions that originate from different magnification levels. The components were developed using private datasets, including colon and breast cancer tissue samples, and tested on private and public external data sources, such as The Cancer Genome Atlas (TCGA). The results demonstrate the library's effectiveness and applicability: the scale detector accurately predicts multiple levels of image magnification and generalizes well to independent external data, and the multi-scale CNNs outperform single-magnification CNNs on both classification and segmentation tasks.
The code is developed in Python and will be made publicly available upon publication. It aims to be easy to use and easy to extend with additional functions.
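The pre-processing step of extracting co-centered patches at several magnification levels can be sketched as below. This is not Multi_Scale_Tools' actual API; real WSIs are read from a pyramid file format, which block-mean downsampling stands in for here:

```python
import numpy as np

def extract_multiscale_patches(wsi, center, patch=64, factors=(1, 2, 4)):
    """Extract co-centered patches at several magnification levels.

    wsi    : 2D array standing in for the highest-magnification plane
    center : (row, col) at full resolution
    factors: downsampling factor per level (1 = native magnification)
    """
    patches = []
    r, c = center
    for f in factors:
        half = patch * f // 2
        region = wsi[r - half:r + half, c - half:c + half]
        # Block-mean downsampling stands in for reading a pyramid level.
        region = region.reshape(patch, f, patch, f).mean(axis=(1, 3))
        patches.append(region)
    return patches

wsi = np.arange(512 * 512, dtype=float).reshape(512, 512)
pyramid = extract_multiscale_patches(wsi, center=(256, 256))
print([p.shape for p in pyramid])  # three 64x64 patches
```

Each patch has the same pixel size but a progressively wider field of view, which is what lets a multi-scale CNN combine fine texture with surrounding context.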


Author(s):  
Cuong Tran ◽  
Ferdinando Fioretto ◽  
Pascal Van Hentenryck ◽  
Zhiyan Yao

Many agencies release datasets and statistics about groups of individuals that are used as input to a number of critical decision processes. To conform with privacy and confidentiality requirements, these agencies are often required to release privacy-preserving versions of the data. This paper studies the release of differentially private datasets and analyzes their impact on some critical resource allocation tasks from a fairness perspective. The paper shows that, when decisions take differentially private data as input, the noise added to achieve privacy disproportionately impacts some groups over others. The paper analyzes the reasons for these disproportionate impacts and proposes guidelines to mitigate them. The proposed approaches are evaluated on critical decision problems that use differentially private census data.
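The mechanism at issue can be illustrated with the standard Laplace mechanism for a count query: noise with scale sensitivity/ε is added before release. The same absolute noise that is negligible for a large population is a large relative error for a small group, which is the root of the disproportionate impact the paper analyzes. The ε value and counts below are illustrative:

```python
import numpy as np

def laplace_count(true_count, epsilon, sensitivity=1.0, rng=None):
    """Release a differentially private count via the Laplace mechanism."""
    if rng is None:
        rng = np.random.default_rng()
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

rng = np.random.default_rng(0)
# With epsilon = 0.1 and sensitivity 1, the noise scale is b = 10: the
# expected absolute error (~10) is tiny for a count of 100,000 but a
# ~20% relative error for a small group of 50.
releases = [laplace_count(50, epsilon=0.1, rng=rng) for _ in range(10_000)]
noise = np.array(releases) - 50
print(np.mean(np.abs(noise)))  # close to the scale b = 10
```

Downstream allocation formulas that divide by or threshold on such counts therefore treat small groups much more erratically than large ones.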


Author(s):  
Dongju Yang ◽  
Xiaojian Wang ◽  
Hanshuo Zhang

The key to the in-depth management of science and technology is to model the behavioral characteristics of scientific and technological personnel and then to find groups by analyzing the diverse associations among them. Aiming at the analysis of team relationships among scientific and technological personnel, this paper proposes a relational-graph-based method to recognize groups of such personnel. A relationship model of scientific and technological personnel was designed, and, based on it, entity and relationship recognition and extraction were performed on structured and unstructured source data to construct a relational graph. An improved Hadoop-based frequent item mining algorithm was proposed, which derives groups of scientific and technological personnel by mining and analyzing the data in the relational graph. The proposed method was evaluated on both open and private datasets and compared with several classical algorithms. The results show that the method significantly improves execution efficiency.
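The frequent-item-mining core of such a method can be shown single-machine (the paper distributes it over Hadoop): treat each paper's author set as a transaction and keep the co-author pairs that meet a support threshold. The toy data are invented for illustration:

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_support):
    """Count co-occurring item pairs and keep those meeting the support threshold.

    transactions: iterable of sets (e.g. the authors of one paper)
    """
    counts = Counter()
    for items in transactions:
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

papers = [{"A", "B", "C"}, {"A", "B"}, {"B", "C"}, {"A", "B", "D"}]
print(frequent_pairs(papers, min_support=2))  # {('A', 'B'): 3, ('B', 'C'): 2}
```

In a MapReduce setting the pair counting is the map phase and the threshold filter the reduce phase; the frequent pairs then seed the group (community) extraction from the relational graph.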


2021 ◽  
Author(s):  
Vincent Alcazer ◽  
Pierre Sujobert

Mutation detection by next-generation sequencing (NGS) is routinely used for cancer diagnosis. Selecting an optimal set of genes for a given cancer is not trivial, as it has to maximize informativity (i.e., the number of patients with at least one mutation in the panel) while minimizing panel length in order to reduce sequencing costs and increase sensitivity. We propose herein Panel Informativity Optimizer (PIO), an open-source software package developed in R with a user-friendly graphical interface to help optimize cancer NGS panel informativity. Using patient-level mutational data from either private datasets or preloaded datasets of 91 independent cohorts from 31 different cancer types, PIO selects an optimal set of genomic intervals to maximize informativity while minimizing panel size for a given cancer type. Different options are offered, such as the definition of genomic intervals at the gene or exon level, and the use of an optimization strategy at the patient or patient-per-kilobase level. PIO can also propose an optimal set of genomic intervals to increase the informativity of custom panels. A panel tester function is also available for panel benchmarking. Using public databases, as well as data from real-life settings, we demonstrate that PIO allows panel size reductions of up to 1,000 kb and accurately predicts the performance of custom or commercial panels. PIO is available online at https://vincentalcazer.shinyapps.io/Panel_informativity_optimizer/ or can be set up on a local machine from https://github.com/VincentAlcazer/PIO.
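The patient-per-kilobase strategy resembles greedy weighted set cover: repeatedly pick the interval that covers the most not-yet-informative patients per kilobase of sequence. The sketch below (in Python, with invented gene data) illustrates that idea; it is not PIO's actual algorithm or API:

```python
def greedy_panel(mutations, sizes_kb):
    """Greedily pick genes maximizing newly covered patients per kilobase.

    mutations: {gene: set of patient ids with >= 1 mutation in that gene}
    sizes_kb : {gene: genomic interval length in kb}
    """
    covered, panel = set(), []
    genes = set(mutations)
    while genes:
        best = max(genes, key=lambda g: len(mutations[g] - covered) / sizes_kb[g])
        gain = mutations[best] - covered
        if not gain:            # no gene adds new informative patients
            break
        panel.append(best)
        covered |= gain
        genes.remove(best)
    return panel, covered

muts = {"TP53": {1, 2, 3}, "KRAS": {3, 4}, "EGFR": {5}}
sizes = {"TP53": 2.0, "KRAS": 1.0, "EGFR": 5.0}
panel, covered = greedy_panel(muts, sizes)
print(panel, sorted(covered))  # ['KRAS', 'TP53', 'EGFR'] [1, 2, 3, 4, 5]
```

Note that KRAS is chosen first despite TP53 covering more patients: per kilobase, KRAS is the more efficient interval, which is exactly the trade-off between informativity and panel length.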


Author(s):  
T. Shiva Rama Krishna

Malicious software, or malware, continues to pose a major security concern in this digital age, as computer users, corporations, and governments witness exponential growth in malware attacks. Current malware detection solutions adopt static and dynamic analysis of malware signatures and behaviour patterns, which are time-consuming and ineffective at identifying unknown malware. Recent malware uses polymorphic, metamorphic, and other evasive techniques to change its behaviour quickly and to generate large numbers of variants. Since new malware samples are predominantly variants of existing ones, machine learning algorithms (MLAs) have recently been employed to conduct effective malware analysis. This requires extensive feature engineering, feature learning, and feature representation. By using advanced MLAs such as deep learning, the feature engineering phase can be avoided entirely. Though some recent research studies exist in this direction, the performance of the algorithms is biased by the training data. There is a need to mitigate this bias and evaluate these methods independently in order to arrive at enhanced methods for effective zero-day malware detection. To fill this gap in the literature, this work evaluates classical MLAs and deep learning architectures for malware detection, classification, and categorization with both public and private datasets. The train and test splits of the public and private datasets used in the experimental analysis are disjoint from each other and were collected on different timescales. In addition, we propose a novel image processing technique with optimal parameters for MLAs and deep learning architectures. A comprehensive experimental evaluation of these methods indicates that deep learning architectures outperform classical MLAs. Overall, this work proposes an effective visual detection of malware using a scalable, hybrid deep learning framework for real-time deployments.
The visualization and deep learning architectures for the static, dynamic, and image-processing-based hybrid approach in a big data environment constitute a new, enhanced method for effective zero-day malware detection.
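"Visual detection of malware" commonly means rendering a binary's raw bytes as a grayscale image that a CNN then classifies. A minimal sketch of that conversion (the row width and padding scheme are illustrative choices, not the paper's exact parameters):

```python
import numpy as np

def bytes_to_grayscale(blob, width=16):
    """Render a binary's raw bytes as a 2D grayscale image for a CNN.

    Each byte becomes one pixel intensity (0-255); the blob is
    zero-padded so it fills a whole number of rows.
    """
    data = np.frombuffer(blob, dtype=np.uint8)
    rows = -(-len(data) // width)               # ceiling division
    padded = np.zeros(rows * width, dtype=np.uint8)
    padded[:len(data)] = data
    return padded.reshape(rows, width)

img = bytes_to_grayscale(bytes(range(40)), width=16)
print(img.shape)  # (3, 16): 40 bytes padded to 48, reshaped to 3 rows
```

Because variants of a malware family share long byte runs, their images share visual texture, which is what lets an image classifier generalize across variants without hand-crafted features.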


2021 ◽  
Vol 13 (11) ◽  
pp. 2207
Author(s):  
Fengcheng Ji ◽  
Dongping Ming ◽  
Beichen Zeng ◽  
Jiawei Yu ◽  
Yuanzhao Qing ◽  
...  

Aircraft serve as both a means of transportation and weaponry, making their detection from remote sensing images crucial for civil and military applications. However, detecting aircraft effectively remains a problem due to the diversity of their pose, size, and position, and the variety of other objects in the image. At present, target detection methods based on convolutional neural networks (CNNs) lack sufficient extraction of remote sensing image information and post-processing of detection results, which leads to high missed-detection and false-alarm rates when facing complex and dense targets. To address these problems, we propose a target detection model based on Faster R-CNN that combines multi-angle feature extraction with a majority voting strategy. Specifically, we designed a multi-angle transformation module that transforms the input image to realize multi-angle feature extraction of the targets, and we added a majority voting mechanism at the end of the model to consolidate the results of the multi-angle feature extraction. The average precision (AP) of this method reaches 94.82% and 95.25% on the public and private datasets, respectively, which is 6.81% and 8.98% higher than that of Faster R-CNN. The experimental results show that the method detects aircraft effectively, obtaining better performance than mature target detection networks.
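The majority-voting stage can be sketched as keeping only the detections that more than half of the rotated views agree on. Matching detections across angles (e.g. by IoU after rotating boxes back) is assumed already done here; the view count, ids, and threshold are illustrative:

```python
from collections import Counter

def majority_vote(detections_per_angle):
    """Keep detections confirmed by a majority of rotated views.

    detections_per_angle: one set of matched detection ids per rotation
    angle (cross-angle association, e.g. by IoU, is assumed done).
    """
    n_views = len(detections_per_angle)
    votes = Counter()
    for dets in detections_per_angle:
        votes.update(dets)
    return {d for d, n in votes.items() if n > n_views / 2}

views = [{"plane_1", "plane_2"},           # 0 degrees
         {"plane_1", "plane_2", "ghost"},  # 90 degrees
         {"plane_2"}]                      # 180 degrees
print(sorted(majority_vote(views)))  # ['plane_1', 'plane_2']
```

The spurious "ghost" detection, seen from only one angle, is voted out, which is how the mechanism suppresses false alarms on complex backgrounds.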


2021 ◽  
Vol 51 (2) ◽  
pp. 1-17
Author(s):  
Nicoleta González Cancelas ◽  
Beatriz Molina Serrano ◽  
Francisco Soler Flores

The Spanish Port System is immersed in a process of digital transformation towards the concept of Ports 4.0. This entails new regulatory and connectivity requirements, making it necessary to implement the new technologies the market offers for digitalization. Digitalizing the individual processes is a first step that helps the exchange of digital information between the members of the port community. The next step will mean that the information flow between the participants of a port community happens in a reliable, efficient, and paperless way. However, for the Spanish port sector, data exchange carries a competitive disadvantage. That is why Federated Learning is proposed. This approach allows several organizations in the port sector to collaborate in the development of models without directly sharing sensitive port data among themselves. Instead of gathering the data on a single server, each organization's data remains on its own server, and the algorithms and predictive models travel between them. The goal of this approach is to benefit from a large combined dataset, which increases Machine Learning performance while respecting data ownership and privacy. Through an inter-institution, or "cross-silo", Federated Learning model, different institutions contribute to training with their local datasets, collaborating to train a learning machine that discovers patterns in highly sensitive private datasets. This environment is characterized by a smaller number of participants than the mobile (cross-device) case, with typically better bandwidth and less intermittency.
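The cross-silo setup described above can be sketched with Federated Averaging (FedAvg): each institution trains the shared model on its local data, and only the resulting parameters, weighted by local dataset size, are combined on the coordinator. This is the generic FedAvg aggregation step, not a specific port-sector system; the silo sizes and weights are invented:

```python
import numpy as np

def federated_average(local_weights, n_samples):
    """FedAvg aggregation: average locally trained parameter vectors,
    weighted by each silo's dataset size. Raw data never leaves a silo."""
    total = sum(n_samples)
    return sum(w * (n / total) for w, n in zip(local_weights, n_samples))

# Three port-community institutions train the same model locally;
# only the resulting parameter vectors are sent to the coordinator.
silo_weights = [np.array([1.0, 2.0]),
                np.array([3.0, 4.0]),
                np.array([5.0, 6.0])]
silo_sizes = [100, 100, 200]
global_w = federated_average(silo_weights, silo_sizes)
print(global_w)  # [3.5 4.5]
```

The aggregated model is then broadcast back for the next local training round; sensitive cargo or customs records never cross institutional boundaries.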

