feature spaces
Recently Published Documents


TOTAL DOCUMENTS: 328 (five years: 107)
H-INDEX: 26 (five years: 4)

2022 ◽  
Vol 40 (1) ◽  
pp. 1-29
Author(s):  
Hanrui Wu ◽  
Qingyao Wu ◽  
Michael K. Ng

Domain adaptation aims at improving the performance of learning tasks in a target domain by leveraging the knowledge extracted from a source domain. To this end, one can perform knowledge transfer between these two domains. However, this problem becomes extremely challenging when the data of these two domains are characterized by different types of features, i.e., the feature spaces of the source and target domains are different, which is referred to as heterogeneous domain adaptation (HDA). To solve this problem, we propose a novel model called Knowledge Preserving and Distribution Alignment (KPDA), which learns an augmented target space by jointly minimizing information loss and maximizing domain distribution alignment. Specifically, we seek to discover a latent space, where the knowledge is preserved by exploiting the Laplacian graph terms and reconstruction regularizations. Moreover, we adopt the Maximum Mean Discrepancy to align the distributions of the source and target domains in the latent space. Mathematically, KPDA is formulated as a minimization problem with orthogonal constraints, which involves two projection variables. Then, we develop an algorithm based on the Gauss–Seidel iteration scheme and split the problem into two subproblems, which are solved by searching algorithms based on the Barzilai–Borwein (BB) stepsize. Promising results demonstrate the effectiveness of the proposed method.
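The Maximum Mean Discrepancy used by KPDA to align the source and target distributions can be illustrated with a minimal NumPy sketch (the Gaussian kernel, `sigma`, and the toy data below are illustrative choices, not taken from the paper):

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    # Pairwise squared Euclidean distances between rows of X and rows of Y.
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-d2 / (2 * sigma**2))

def mmd2(X, Y, sigma=1.0):
    # Biased estimate of the squared Maximum Mean Discrepancy:
    # the RKHS distance between the two empirical mean embeddings.
    return (gaussian_kernel(X, X, sigma).mean()
            + gaussian_kernel(Y, Y, sigma).mean()
            - 2 * gaussian_kernel(X, Y, sigma).mean())

rng = np.random.default_rng(0)
same = mmd2(rng.normal(0, 1, (200, 3)), rng.normal(0, 1, (200, 3)))
shifted = mmd2(rng.normal(0, 1, (200, 3)), rng.normal(2, 1, (200, 3)))
```

Samples drawn from the same distribution give a near-zero `mmd2`, while a mean-shifted pair gives a clearly larger value; minimizing such a term over projected source and target features is what drives the distribution alignment.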


Algorithms ◽  
2021 ◽  
Vol 14 (12) ◽  
pp. 356
Author(s):  
Szabolcs Szekér ◽  
Ágnes Vathy-Fogarassy

An essential criterion for the proper implementation of case-control studies is selecting appropriate case and control groups. In this article, a new simulated annealing-based control group selection method is proposed, which solves the problem of selecting individuals in the control group as a distance optimization task. The proposed algorithm pairs the individuals in the n-dimensional feature space by minimizing the weighted distances between them. The weights of the dimensions are based on the odds ratios calculated from the logistic regression model fitted on the variables describing the probability of membership of the treated group. For finding the optimal pairing of the individuals, simulated annealing is utilized. The effectiveness of the newly proposed Weighted Nearest Neighbours Control Group Selection with Simulated Annealing (WNNSA) algorithm is presented by two Monte Carlo studies. Results show that the WNNSA method can outperform the widely applied greedy propensity score matching method in feature spaces where only a few covariates characterize individuals and the covariates can only take a few values.
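The core of the WNNSA idea — pairing cases with controls by minimizing weighted distances via simulated annealing — can be sketched as follows (the cooling schedule, toy data, and function names are illustrative assumptions, not the authors' implementation):

```python
import math
import random

def weighted_dist(a, b, w):
    # Weighted Euclidean distance; in WNNSA the weights come from
    # odds ratios of a logistic regression, here they are given directly.
    return sum(wi * (ai - bi) ** 2 for wi, ai, bi in zip(w, a, b)) ** 0.5

def total_cost(cases, controls, assign, w):
    return sum(weighted_dist(cases[i], controls[j], w)
               for i, j in enumerate(assign))

def sa_match(cases, controls, w, iters=5000, t0=1.0, cooling=0.999, seed=0):
    # Start from an arbitrary pairing; propose swapping two control
    # assignments and accept worse pairings with a temperature-dependent
    # probability, so the search can escape local minima.
    rng = random.Random(seed)
    assign = list(range(len(cases)))
    cost = total_cost(cases, controls, assign, w)
    t = t0
    for _ in range(iters):
        i, j = rng.sample(range(len(cases)), 2)
        assign[i], assign[j] = assign[j], assign[i]
        new = total_cost(cases, controls, assign, w)
        if new < cost or rng.random() < math.exp((cost - new) / t):
            cost = new
        else:
            assign[i], assign[j] = assign[j], assign[i]  # revert the swap
        t *= cooling
    return assign, cost

cases = [(0.0,), (1.0,), (2.0,)]
controls = [(2.1,), (0.1,), (1.1,)]
assign, cost = sa_match(cases, controls, w=(1.0,))
```

On this tiny example the annealer recovers the globally optimal pairing (case 0 with control 1, case 1 with control 2, case 2 with control 0, total cost 0.3).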


2021 ◽  
Author(s):  
Joseph Caffarini ◽  
Klevest Gjini ◽  
Brinda Sevak ◽  
Roger Waleffe ◽  
Mariel Kalkach-Aparicio ◽  
...  

Abstract In this study we designed two deep neural networks to encode 16-feature latent spaces for early seizure detection in intracranial EEG and compared them to 16 widely used engineered metrics: Epileptogenicity Index (EI), Phase Locked High Gamma (PLHG), Time and Frequency Domain Cho–Gaines Distance (TDCG, FDCG), relative band powers, and log absolute band powers (from the alpha, beta, theta, delta, gamma, and high gamma bands). The deep learning models were pretrained for seizure identification on the time and frequency domains of one-second single-channel clips of 127 seizures (from 25 different subjects) using "leave-one-out" (LOO) cross-validation. Each neural network extracted unique feature spaces that were used to train a Random Forest Classifier (RFC) for seizure identification and latency tasks. The Gini importance of each feature was calculated from the pretrained RFC, enabling the most significant features (MSFs) for each task to be identified. The MSFs were then extracted from the UPenn and Mayo Clinic's Seizure Detection Challenge data to train another RFC for the contest, which obtained an AUC score of 0.93, demonstrating a transferable method for identifying interpretable biomarkers for seizure detection.
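Gini-importance-based feature ranking of the kind used here can be illustrated without a full random forest: the impurity decrease of the best single-threshold split is a rough stand-in for the importance a tree ensemble would assign. A minimal sketch (the synthetic "informative" and "noise" features below are illustrative, not EEG metrics):

```python
import numpy as np

def gini(y):
    # Gini impurity of a binary label vector.
    p = y.mean()
    return 1.0 - p ** 2 - (1.0 - p) ** 2

def stump_importance(x, y):
    # Largest impurity decrease achievable by one threshold split on x:
    # a rough proxy for the Gini importance a random forest assigns.
    best = 0.0
    for thr in np.quantile(x, np.linspace(0.1, 0.9, 9)):
        left, right = y[x <= thr], y[x > thr]
        if len(left) == 0 or len(right) == 0:
            continue
        child = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        best = max(best, gini(y) - child)
    return best

rng = np.random.default_rng(7)
n = 500
informative = rng.normal(size=n)   # a feature that actually drives the label
noise = rng.normal(size=n)         # an uninformative feature
labels = (informative > 0).astype(float)

scores = {"informative": stump_importance(informative, labels),
          "noise": stump_importance(noise, labels)}
```

The informative feature receives a far higher score than the noise feature, which is exactly how the most significant features (MSFs) rise to the top of an importance ranking.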


2021 ◽  
pp. 102549
Author(s):  
Deanna Sessions ◽  
Venkatesh Meenakshisundaram ◽  
Andrew Gillman ◽  
Alexander Cook ◽  
Kazuko Fuchi ◽  
...  

2021 ◽  
Vol 11 (21) ◽  
pp. 10464
Author(s):  
Muhammad Asam ◽  
Shaik Javeed Hussain ◽  
Mohammed Mohatram ◽  
Saddam Hussain Khan ◽  
Tauseef Jamal ◽  
...  

Malware is a key component of cyber-crime, and its analysis is the first line of defence against cyber-attack. This study proposes two new malware classification frameworks: Deep Feature Space-based Malware classification (DFS-MC) and Deep Boosted Feature Space-based Malware classification (DBFS-MC). In the proposed DFS-MC framework, deep features are generated from the customized CNN architectures and are fed to a support vector machine (SVM) algorithm for malware classification, while, in the DBFS-MC framework, the discrimination power is enhanced by first combining the deep feature spaces of two customized CNN architectures to achieve boosted feature spaces. Further, the detection of exceptional malware is performed by providing the deep boosted feature space to the SVM. The performance of the proposed malware classification frameworks is evaluated on the MalImg malware dataset using the hold-out cross-validation technique. Malware variants like Autorun.K, Swizzor.gen!I, Wintrim.BX and Yuner.A are hard to classify correctly due to minor inter-class differences in their features. The proposed DBFS-MC improves performance for these difficult-to-discriminate malware classes using the idea of feature boosting through customized CNNs. The proposed classification framework DBFS-MC showed good results in terms of accuracy (98.61%), F-score (0.96), precision (0.96), and recall (0.96) on stringent test data, using 40% unseen data.
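The "boosted feature space" idea — concatenating the embeddings of two backbones before the downstream classifier — reduces to a simple array operation. A minimal sketch (the random-projection "backbones" and toy data stand in for the paper's learned CNNs; an SVM would then be trained on `boosted`):

```python
import numpy as np

def fake_cnn_features(images, seed):
    # Stand-in for a customized CNN backbone: a fixed random projection
    # of the flattened input. The real backbones are learned, of course.
    proj = np.random.default_rng(seed).normal(size=(images.shape[1], 8))
    return np.tanh(images @ proj)

rng = np.random.default_rng(1)
# Two toy "malware image" classes that differ in mean intensity.
class_a = rng.normal(0.0, 1.0, size=(50, 64))
class_b = rng.normal(1.0, 1.0, size=(50, 64))
X = np.vstack([class_a, class_b])
y = np.array([0] * 50 + [1] * 50)

# Boosted feature space: concatenate the two backbones' embeddings,
# so every sample carries the discriminative cues of both models.
boosted = np.hstack([fake_cnn_features(X, seed=2), fake_cnn_features(X, seed=3)])
```

Each of the 100 samples now lives in a 16-dimensional space that joins both 8-dimensional embeddings; the SVM stage then operates on this richer representation.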


Author(s):  
Zheng Jiang ◽  
Si-Rui Xiao ◽  
Rong Liu

Abstract The biological functions of DNA and RNA generally depend on their interactions with other molecules, such as small ligands, proteins and nucleic acids. However, our knowledge of the nucleic acid binding sites for different interaction partners is very limited, and identification of these critical binding regions is not a trivial task. Herein, we performed a comprehensive comparison between binding and nonbinding sites and among different categories of binding sites in these two nucleic acid classes. From the structural perspective, RNA may interact with ligands through forming binding pockets and contact proteins and nucleic acids using protruding surfaces, while DNA may adopt regions closer to the middle of the chain to make contacts with other molecules. Based on structural information, we established a feature-based ensemble learning classifier to identify the binding sites by fully using the interplay among different machine learning algorithms, feature spaces and sample spaces. Meanwhile, we designed a template-based classifier by exploiting structural conservation. The complementarity between the two classifiers motivated us to build an integrative framework for improving prediction performance. Moreover, we utilized a post-processing procedure based on the random walk algorithm to further correct the integrative predictions. Our unified prediction framework yielded promising results for different binding sites and outperformed existing methods.
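Random-walk post-processing of per-site scores can be sketched in a few lines: each node's score is repeatedly blended with its neighbours', so isolated prediction errors get pulled toward their structural neighbourhood. The restart weight `alpha`, the chain graph, and the raw scores below are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def random_walk_correct(scores, W, alpha=0.8, iters=50):
    # Iterate s <- (1 - alpha) * s0 + alpha * P @ s, where P is the
    # row-normalised adjacency. This is random walk with restart,
    # smoothing scores over the graph while staying anchored to s0.
    P = W / W.sum(axis=1, keepdims=True)
    s = scores.copy()
    for _ in range(iters):
        s = (1 - alpha) * scores + alpha * P @ s
    return s

# A 5-node chain; node 2 got a spuriously low raw score.
W = np.zeros((5, 5))
for i in range(4):
    W[i, i + 1] = W[i + 1, i] = 1.0
raw = np.array([0.9, 0.9, 0.1, 0.9, 0.9])
corrected = random_walk_correct(raw, W)
```

The outlier node's score is lifted toward its high-scoring neighbours, which is the kind of correction such a post-processing step applies to integrative binding-site predictions.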


Drones ◽  
2021 ◽  
Vol 5 (4) ◽  
pp. 104
Author(s):  
Zaide Duran ◽  
Kubra Ozcan ◽  
Muhammed Enes Atik

With the development of photogrammetry technologies, point clouds have found a wide range of use in academic and commercial areas. This situation has made it essential to extract information from point clouds. In particular, artificial intelligence applications have been used to extract information from point clouds of complex structures. Point cloud classification is also one of the leading areas where these applications are used. In this study, the classification of point clouds obtained by aerial photogrammetry and Light Detection and Ranging (LiDAR) technology belonging to the same region is performed by using machine learning. For this purpose, nine popular machine learning methods have been used. Geometric features obtained from point clouds were used for the feature spaces created for classification. Color information is also added to these in the photogrammetric point cloud. According to the LiDAR point cloud results, the highest overall accuracy, 0.96, was obtained with the Multilayer Perceptron (MLP) method, and the lowest, 0.50, with the AdaBoost method. For the photogrammetric point cloud, MLP again achieved the highest overall accuracy (0.90), while the Gaussian Naive Bayes (GNB) method gave the lowest, with 0.25 overall accuracy.
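Geometric features of the kind used for these point-cloud feature spaces are commonly derived from the eigenvalues of the local covariance of a point's neighbourhood. A minimal sketch (the synthetic planar patch below is illustrative; the paper's exact feature set is not reproduced here):

```python
import numpy as np

def geometric_features(neighbours):
    # Eigenvalues of the local covariance (sorted descending) give the
    # standard point-cloud shape descriptors: linearity, planarity,
    # and sphericity of the neighbourhood.
    cov = np.cov(neighbours.T)
    l1, l2, l3 = sorted(np.linalg.eigvalsh(cov), reverse=True)
    return {"linearity": (l1 - l2) / l1,
            "planarity": (l2 - l3) / l1,
            "sphericity": l3 / l1}

rng = np.random.default_rng(0)
# A locally planar patch: wide in x/y, nearly flat in z (like a roof).
plane = rng.normal(size=(200, 3)) * np.array([1.0, 1.0, 0.01])
feats = geometric_features(plane)
```

For the flat patch, planarity dominates while sphericity stays near zero; stacking such descriptors per point yields the feature vectors the classifiers are trained on.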


Water ◽  
2021 ◽  
Vol 13 (18) ◽  
pp. 2551
Author(s):  
Susana C. Gomes ◽  
Susana Vinga ◽  
Rui Henriques

Monitoring disruptions to water distribution dynamics is essential to detect leakages and to signal fraudulent and deviant consumption, among other events of interest. State-of-the-art methods for detecting anomalous behavior from flow rate and pressure signals show limited degrees of success, as they generally neglect the rich spatial and temporal content of the signals produced by the multiple sensors placed at different locations of a water distribution network (WDN). This work shows that it is possible to (1) describe the dynamics of a WDN through spatiotemporal correlation analysis of pressure and volumetric flow rate sensors, and (2) analyze disruptions to the expected correlation to detect burst leakage dynamics and additional deviant phenomena. Results gathered from Portuguese WDNs reveal that the proposed shift from raw signal views to correlation-based views offers a simpler and more robust means to handle the irregularity of consumption patterns and the heterogeneity of leakage profiles (both in terms of burst volume and location). We further show that the disruption caused by leakages can be detected shortly after the burst, highlighting the actionability of the proposed correlation-based principles for anomaly detection in heterogeneous and georeferenced time series. The computational approach is provided as an open-source tool available at GitHub.
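The correlation-based view can be illustrated with two simulated sensor streams: flow rate and pressure are normally strongly anti-correlated, and a burst breaks that relationship in the affected window. A minimal sketch (the signals, window size, and burst model are illustrative assumptions, not the paper's data or method details):

```python
import numpy as np

def window_correlations(a, b, win):
    # Pearson correlation of two sensor streams over non-overlapping windows.
    return np.array([np.corrcoef(a[i:i + win], b[i:i + win])[0, 1]
                     for i in range(0, len(a) - win + 1, win)])

rng = np.random.default_rng(3)
t = np.arange(600)
demand = np.sin(2 * np.pi * t / 100)              # shared consumption pattern
flow = demand + 0.05 * rng.normal(size=600)       # flow-rate sensor
pressure = -demand + 0.05 * rng.normal(size=600)  # pressure drops as flow rises
pressure[400:500] = 0.5 * rng.normal(size=100)    # burst decouples the sensors

corr = window_correlations(flow, pressure, win=100)
burst_window = int(np.argmax(corr))  # where the usual anti-correlation breaks
```

The normal windows sit near a correlation of -1, while the burst window jumps toward 0, so monitoring the correlation rather than the raw signals flags the disruption directly.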


2021 ◽  
Vol 118 (39) ◽  
pp. e2021699118
Author(s):  
Jae-Young Son ◽  
Apoorva Bhandari ◽  
Oriel FeldmanHall

In order to navigate a complex web of relationships, an individual must learn and represent the connections between people in a social network. However, the sheer size and complexity of the social world makes it impossible to acquire firsthand knowledge of all relations within a network, suggesting that people must make inferences about unobserved relationships to fill in the gaps. Across three studies (n = 328), we show that people can encode information about social features (e.g., hobbies, clubs) and subsequently deploy this knowledge to infer the existence of unobserved friendships in the network. Using computational models, we test various feature-based mechanisms that could support such inferences. We find that people’s ability to successfully generalize depends on two representational strategies: a simple but inflexible similarity heuristic that leverages homophily, and a complex but flexible cognitive map that encodes the statistical relationships between social features and friendships. Together, our studies reveal that people can build cognitive maps encoding arbitrary patterns of latent relations in many abstract feature spaces, allowing social networks to be represented in a flexible format. Moreover, these findings shed light on open questions across disciplines about how people learn and represent social networks and may have implications for generating more human-like link prediction in machine learning algorithms.
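The similarity heuristic that leverages homophily amounts to predicting a friendship whenever two people's feature overlap is high. A minimal sketch (the people, hobbies, and threshold are invented for illustration; the studies' computational models are far richer):

```python
def jaccard(a, b):
    # Feature overlap between two people: the homophily heuristic
    # guesses that similar people are friends.
    return len(a & b) / len(a | b)

people = {
    "ana":   {"climbing", "chess", "jazz"},
    "bo":    {"climbing", "chess", "running"},
    "carol": {"opera", "gardening", "running"},
}

def predict_friends(people, threshold=0.4):
    # Predict an unobserved link for every pair whose overlap clears
    # the threshold.
    names = sorted(people)
    return {(u, v) for i, u in enumerate(names) for v in names[i + 1:]
            if jaccard(people[u], people[v]) >= threshold}

links = predict_friends(people)
```

Here only the pair sharing two of four features is predicted to be friends; a cognitive-map strategy would instead learn which feature combinations statistically predict friendship, rather than relying on raw similarity.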

