Matrix Completion of World Trade

Author(s):  
Giorgio Gnecco ◽  
Federico Nutarelli ◽  
Massimo Riccaboni

Abstract This work applies Matrix Completion (MC) – a class of machine-learning methods commonly used in the context of recommendation systems – to analyze economic complexity. MC is applied to reconstruct the Revealed Comparative Advantage (RCA) matrix, whose elements express the relative advantage of countries in given classes of products, as evidenced by yearly trade flows. A high-accuracy binary classifier is derived from the MC application, with the aim of discriminating between elements of the RCA matrix that are, respectively, higher/lower than one. We introduce a novel Matrix cOmpletion iNdex of Economic complexitY (MONEY) based on MC, related to the degree of predictability of the RCA entries of different countries (the lower the predictability, the higher the complexity). Unlike previously developed economic complexity indices, MONEY takes into account several singular vectors of the matrix reconstructed by MC, whereas other indices are based on only one or two eigenvectors of a suitable symmetric matrix derived from the RCA matrix. Finally, MC is compared with a state-of-the-art economic complexity index (GENEPY), showing that the false positive rate per country of a binary classifier constructed from the average entry-wise output of MC is a proxy for GENEPY.
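As a hedged illustration of the MC-plus-thresholding pipeline described above, the sketch below completes a partially observed matrix with a rank-truncated iterative SVD (one standard MC heuristic; the paper's exact MC algorithm may differ) and then thresholds the reconstructed entries at one. All data and parameter choices are synthetic stand-ins.

```python
import numpy as np

def iterative_svd_complete(M, mask, rank=10, n_iter=100):
    """Minimal iterative-SVD matrix completion sketch.

    M    : (countries x products) RCA matrix with observed entries
    mask : boolean array, True where an entry is observed
    """
    X = np.where(mask, M, 0.0)          # initialize missing entries at zero
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        X_low = (U[:, :rank] * s[:rank]) @ Vt[:rank]  # rank-truncated reconstruction
        X = np.where(mask, M, X_low)    # keep observed entries fixed
    return X_low

rng = np.random.default_rng(0)
rca = np.exp(rng.normal(size=(50, 200)))   # synthetic stand-in for an RCA matrix
mask = rng.random(rca.shape) > 0.2         # hide 20% of the entries
completed = iterative_svd_complete(rca, mask)
pred_advantage = completed >= 1.0          # binary classifier: reconstructed RCA >= 1
```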

2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Ginette Lafit ◽  
Francis Tuerlinckx ◽  
Inez Myin-Germeys ◽  
Eva Ceulemans

Abstract Gaussian Graphical Models (GGMs) are extensively used in many research areas, such as genomics, proteomics, neuroimaging, and psychology, to study the partial correlation structure of a set of variables. This structure is visualized by drawing an undirected network, in which the variables constitute the nodes and the partial correlations the edges. In many applications, it makes sense to impose sparsity (i.e., some of the partial correlations are forced to zero) because sparsity is theoretically meaningful and/or because it improves the predictive accuracy of the fitted model. However, as we will show by means of extensive simulations, state-of-the-art estimation approaches for imposing sparsity on GGMs, such as the Graphical lasso, ℓ1 regularized nodewise regression, and joint sparse regression, fall short because they often yield too many false positives (i.e., partial correlations that are not properly set to zero). In this paper we present a new estimation approach that makes it possible to better control the false positive rate. Our approach consists of two steps: First, we estimate an undirected network using one of the three state-of-the-art estimation approaches. Second, we try to detect the false positives by flagging the partial correlations that are smaller in absolute value than a given threshold, which is determined through cross-validation; the flagged correlations are set to zero. Applying this new approach to the same simulated data shows that it indeed performs better. We also illustrate our approach by using it to estimate (1) a gene regulatory network for breast cancer data, (2) a symptom network of patients with a diagnosis within the nonaffective psychotic spectrum and (3) a symptom network of patients with PTSD.
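The two-step procedure lends itself to a short sketch. Below, scikit-learn's GraphicalLassoCV plays the role of the first-step estimator, partial correlations are recovered from the estimated precision matrix, and the small ones are zeroed. The paper selects the threshold by cross-validation; a fixed illustrative value is used here, and the data are synthetic.

```python
import numpy as np
from sklearn.covariance import GraphicalLassoCV

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))          # n samples x p variables (synthetic)

# Step 1: estimate a sparse precision matrix with the graphical lasso.
model = GraphicalLassoCV().fit(X)
theta = model.precision_

# Partial correlations from the precision matrix:
# rho_ij = -theta_ij / sqrt(theta_ii * theta_jj)
d = np.sqrt(np.diag(theta))
pcor = -theta / np.outer(d, d)
np.fill_diagonal(pcor, 1.0)

# Step 2: flag partial correlations below a threshold and set them to zero
# (the paper determines this threshold through cross-validation).
tau = 0.05
pcor_thresholded = np.where(np.abs(pcor) < tau, 0.0, pcor)
```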


2020 ◽  
Author(s):  
Pui Anantrasirichai ◽  
Juliet Biggs ◽  
Fabien Albino ◽  
David Bull

Satellite interferometric synthetic aperture radar (InSAR) can be used to measure surface deformation for a variety of applications. Recent satellite missions, such as Sentinel-1, produce a large amount of data, meaning that visual inspection is impractical. Here we use deep learning, which has proved successful at object detection, to overcome this problem. First, we present the use of convolutional neural networks (CNNs) for detecting rapid deformation events, which we test on a global dataset of over 30,000 wrapped interferograms at 900 volcanoes. We compare two potential training datasets: data augmentation applied to archive examples and synthetic models. Both are able to detect true positives, but the data augmentation approach has a false positive rate of 0.205% and the synthetic approach has a false positive rate of 0.036%. We then present an enhanced technique for measuring slow, sustained deformation over a range of scales, from volcanic unrest to urban sources of deformation such as coalfields. By rewrapping cumulative time series, detection performance is improved when the deformation rate is slow, as more fringes are generated without altering the signal-to-noise ratio. We adapt the method to persistent scatterer InSAR data, which is sparse in nature, by using spatial interpolation methods such as modified matrix completion. Finally, we discuss future perspectives for machine learning applications on InSAR data.
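The rewrapping step is simple enough to sketch: mapping a cumulative (unwrapped) phase series back into (-π, π] generates additional fringes from a slow signal. A minimal version with synthetic data follows; the function name and the toy signal are illustrative only.

```python
import numpy as np

def rewrap(cumulative_phase):
    """Rewrap a cumulative (unwrapped) phase time series back into
    interferometric fringes, i.e. map the phase into (-pi, pi]."""
    return np.angle(np.exp(1j * cumulative_phase))

# Slow deformation accumulates phase over time; rewrapping the cumulative
# series generates more fringes without altering the signal-to-noise ratio.
t = np.linspace(0.0, 1.0, 100)
cumulative = 8 * np.pi * t              # synthetic slow, sustained signal
wrapped = rewrap(cumulative)            # values now lie in (-pi, pi]
```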


2019 ◽  
Vol 20 (2) ◽  
pp. 302 ◽  
Author(s):  
Jingzhong Gan ◽  
Jie Qiu ◽  
Canshang Deng ◽  
Wei Lan ◽  
Qingfeng Chen ◽  
...  

Protein phosphorylation is an important chemical modification catalyzed by kinases, and it plays important roles in many cellular processes. Predicting kinase–substrate interactions is vital to understanding the mechanisms of many diseases. Many computational methods have been proposed to identify kinase–substrate interactions; however, their prediction accuracy still needs to be improved. It is therefore necessary to develop an efficient computational method to predict kinase–substrate interactions. In this paper, we propose a novel computational approach, KSIMC, to identify kinase–substrate interactions based on matrix completion. First, kinase similarity and substrate similarity are calculated by aligning kinase–kinase and substrate–substrate sequences, respectively. Then, the original association network is adjusted based on the similarities. Finally, matrix completion is used to predict potential kinase–substrate interactions. The experimental results show that our method outperforms other state-of-the-art algorithms. Furthermore, the relevant databases and scientific literature verify the effectiveness of our algorithm for identifying new kinase–substrate interactions.
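A hedged sketch of a KSIMC-style pipeline follows. The similarity adjustment shown (pre- and post-multiplying the association matrix by similarity matrices) and the rank-truncated SVD completion are illustrative stand-ins, not the paper's exact formulation; the paper derives the similarity matrices from sequence alignment.

```python
import numpy as np

rng = np.random.default_rng(2)
n_kinases, n_substrates = 30, 80
A = (rng.random((n_kinases, n_substrates)) > 0.95).astype(float)  # known interactions

# Placeholder similarity matrices (identity here; the paper computes them
# from kinase-kinase and substrate-substrate sequence alignments).
S_k = np.eye(n_kinases)
S_s = np.eye(n_substrates)

# One plausible adjustment: smooth the association network with the similarities.
A_adj = S_k @ A @ S_s

# Complete the adjusted matrix with a rank-truncated SVD (one MC variant).
U, s, Vt = np.linalg.svd(A_adj, full_matrices=False)
r = 5
scores = (U[:, :r] * s[:r]) @ Vt[:r]    # high scores suggest candidate interactions
```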


2020 ◽  
Vol 2020 (1) ◽  
pp. 235-255 ◽  
Author(s):  
Tobias Pulls ◽  
Rasmus Dahlberg

Abstract Website Fingerprinting (WF) attacks are a subset of traffic analysis attacks where a local passive attacker attempts to infer which websites a target victim is visiting over an encrypted tunnel, such as the anonymity network Tor. We introduce the security notion of a Website Oracle (WO) that gives a WF attacker the capability to determine whether a particular monitored website was among the websites visited by Tor clients at the time of a victim’s trace. Our simulations show that combining a WO with a WF attack—which we refer to as a WF+WO attack—significantly reduces false positives for about half of all website visits and for the vast majority of websites visited over Tor. The measured false positive rate is on the order of one false positive per million classified website traces for websites around Alexa rank 10,000. Less popular monitored websites show orders of magnitude lower false positive rates. We argue that WOs are inherent to the setting of anonymity networks and should be an assumed capability of attackers when assessing WF attacks and defenses. Sources of WOs are abundant and available to a wide range of realistic attackers, e.g., due to the use of DNS, OCSP, and real-time bidding for online advertisement on the Internet, as well as the abundance of middleboxes and access logs. Access to a WO indicates that the evaluation of WF defenses in the open world should focus on the highest possible recall an attacker can achieve. Our simulations show that augmenting the Deep Fingerprinting WF attack by Sirinam et al. [60] with access to a WO significantly improves the attack against five state-of-the-art WF defenses, rendering some of them largely ineffective in this new WF+WO setting.
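The core WF+WO idea, using the oracle to discard unconfirmed WF positives, can be sketched in a few lines. All names and data below are hypothetical; the paper's simulations are far more detailed.

```python
def wf_plus_wo(wf_predictions, website_oracle):
    """Combine a WF attack with a Website Oracle (WO): keep a monitored-site
    prediction only if the WO confirms the site was visited at trace time.

    wf_predictions : dict mapping trace_id -> predicted monitored site (or None)
    website_oracle : callable(site) -> bool, True if the site was visited
    """
    confirmed = {}
    for trace_id, site in wf_predictions.items():
        if site is not None and website_oracle(site):
            confirmed[trace_id] = site      # oracle agrees: keep the positive
        # otherwise the WF positive is discarded, removing many false positives
    return confirmed

# Usage with stand-in data (all names hypothetical):
preds = {"t1": "example.org", "t2": "monitored.net", "t3": None}
visited = {"monitored.net"}
print(wf_plus_wo(preds, lambda s: s in visited))   # {'t2': 'monitored.net'}
```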


2019 ◽  
Vol 2019 (4) ◽  
pp. 292-310 ◽  
Author(s):  
Sanjit Bhat ◽  
David Lu ◽  
Albert Kwon ◽  
Srinivas Devadas

Abstract In recent years, there have been several works that use website fingerprinting techniques to enable a local adversary to determine which website a Tor user visits. While the current state-of-the-art attack, which uses deep learning, outperforms prior art with medium to large amounts of data, it attains marginal to no accuracy improvements when both use small amounts of training data. In this work, we propose Var-CNN, a website fingerprinting attack that leverages deep learning techniques along with novel insights specific to packet sequence classification. In open-world settings with large amounts of data, Var-CNN attains an over 1% higher true positive rate (TPR) than state-of-the-art attacks while achieving a 4× lower false positive rate (FPR). Var-CNN’s improvements are especially notable in low-data scenarios, where it reduces the FPR of prior art by 3.12% while increasing the TPR by 13%. Overall, the insights used to develop Var-CNN can be applied to future deep-learning-based attacks and substantially reduce the amount of training data needed to perform a successful website fingerprinting attack. This shortens the time needed for data collection and lowers the likelihood of data staleness issues.
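For orientation, here is a toy 1D CNN over packet-direction sequences in the general spirit of CNN-based WF attacks; it is not the actual Var-CNN architecture, and all layer sizes and data are illustrative.

```python
import torch
import torch.nn as nn

class TinyDirectionCNN(nn.Module):
    """Toy 1D CNN classifying packet-direction sequences (+1/-1) into sites;
    a sketch in the spirit of deep-learning WF attacks, not Var-CNN itself."""
    def __init__(self, n_sites):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=8), nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(32, 64, kernel_size=8), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.classifier = nn.Linear(64, n_sites)

    def forward(self, x):                 # x: (batch, 1, sequence_length)
        return self.classifier(self.features(x).squeeze(-1))

model = TinyDirectionCNN(n_sites=100)
dirs = torch.randint(0, 2, (4, 1, 5000)).float() * 2 - 1   # fake +1/-1 sequences
logits = model(dirs)                                        # shape (4, 100)
```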


1995 ◽  
Vol 41 (11) ◽  
pp. 1614-1616 ◽  
Author(s):  
C Moore ◽  
D Lewis ◽  
J Leikin

Abstract To determine the number of false-negative results produced by inefficient extraction of drugs from meconium, three published procedures were compared by using previously confirmed positive and negative meconium specimens. The methods were not equivalent in their ability to extract drugs from the matrix. To determine the number of false positives reported by the use of screen-only (unconfirmed) results, 535 screen-positive meconium specimens were subjected to confirmation by gas chromatography-mass spectrometry. Fifty-seven percent of the samples were confirmed positive for one or more of the drugs under investigation, showing that a false-positive rate as high as 43% may exist when unconfirmed screening results are used.
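The 43% figure follows directly from the reported confirmation rate; a one-line check:

```python
# Of 535 screen-positive specimens, 57% were confirmed by GC-MS.
screen_positives = 535
confirmed = round(0.57 * screen_positives)            # ~305 specimens
false_positive_rate = 1 - confirmed / screen_positives
print(f"{false_positive_rate:.0%}")                   # ~43%
```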


2018 ◽  
Vol 2018 ◽  
pp. 1-12 ◽  
Author(s):  
Wendong Wang ◽  
Jianjun Wang

In this paper, we propose a new method for the matrix completion problem. Unlike most existing matrix completion methods, which only pursue the low rank of the underlying matrices, the proposed method simultaneously optimizes their low rank and smoothness, so that the two priors reinforce each other and hence yield better performance. In particular, with the introduction of a modified second-order total variation, the proposed method becomes very competitive even when compared with recently emerged matrix completion methods that also combine the low-rank and smoothness priors of matrices. An efficient algorithm is developed to solve the induced optimization problem. Extensive experiments further confirm the superior performance of the proposed method over many state-of-the-art methods.
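One common way to write a low-rank-plus-smoothness completion model of this kind is sketched below; the paper's modified second-order total variation may differ in detail, so treat this as a generic formulation rather than the paper's exact objective.

```latex
\min_{X}\; \|X\|_{*} \;+\; \lambda\,\mathrm{TV}_2(X)
\quad \text{s.t.}\quad P_{\Omega}(X) = P_{\Omega}(M),
\qquad
\mathrm{TV}_2(X) \;=\; \sum_{i,j} \bigl| x_{i-1,j} - 2\,x_{i,j} + x_{i+1,j} \bigr|
```

Here \(\|X\|_*\) is the nuclear norm promoting low rank, \(\mathrm{TV}_2\) penalizes second-order differences (shown column-wise only, for brevity), \(\Omega\) is the set of observed entries of \(M\), and \(P_\Omega\) is the corresponding sampling operator.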


2021 ◽  
Author(s):  
Clement Agret ◽  
Bastien Cazaux ◽  
Antoine Limasset

Motivation: To keep up with the scale of genomic databases, several methods rely on locality-sensitive hashing to efficiently find potential matches within large genome collections. Existing solutions rely on MinHash or HyperLogLog fingerprints and require reading the whole index to perform a query. Such solutions cannot be considered scalable with the growing number of documents to index. Results: We present NIQKI, a novel structure using well-designed fingerprints that leads to theoretical and practical query-time improvements, outperforming the state of the art by orders of magnitude. Our contribution is threefold. First, we generalize the concept of HyperMinHash fingerprints into (h,m)-HMH fingerprints that can be tuned to present the lowest false positive rate given the expected sub-sampling applied. Second, we provide a structure able to index any kind of fingerprint, based on inverted indexes that provide optimal queries, namely linear in the size of the output. Third, we implemented these approaches in a tool dubbed NIQKI that can index and calculate pairwise distances for over one million bacterial genomes from GenBank in a matter of days on a small cluster. We show that our approach can be orders of magnitude faster than the state of the art with comparable precision. We believe this approach can lead to tremendous improvements, allowing fast queries that scale to extensive genomic databases. Availability and implementation: We wrote the NIQKI index as an open-source C++ library under the AGPL3 license, available at https://github.com/Malfoy/NIQKI. It is designed as a user-friendly tool and comes with usage samples.
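The inverted-index idea behind the output-linear queries can be sketched compactly. The fingerprint scheme and API below are simplified stand-ins for NIQKI's actual (h,m)-HMH machinery: each fingerprint value at each position maps to the documents holding it, so a query only touches the entries it shares with indexed documents.

```python
from collections import defaultdict

class FingerprintIndex:
    """Minimal inverted index over sketch fingerprints (illustrative only)."""
    def __init__(self):
        self.index = defaultdict(list)           # (position, value) -> [doc ids]

    def add(self, doc_id, fingerprint):
        for pos, value in enumerate(fingerprint):
            self.index[(pos, value)].append(doc_id)

    def query(self, fingerprint):
        hits = defaultdict(int)                  # doc id -> shared entries
        for pos, value in enumerate(fingerprint):
            for doc_id in self.index[(pos, value)]:
                hits[doc_id] += 1                # work is linear in the output
        return dict(hits)

idx = FingerprintIndex()
idx.add("genomeA", [3, 7, 7, 1])
idx.add("genomeB", [3, 2, 7, 0])
print(idx.query([3, 7, 7, 9]))   # {'genomeA': 3, 'genomeB': 2}
```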


2020 ◽  
Author(s):  
Hanyu Li ◽  
Michał Januszewski ◽  
Viren Jain ◽  
Peter H. Li

Abstract Recent advances in 3d electron microscopy are yielding ever larger reconstructions of brain tissue, encompassing thousands of individual neurons interconnected by millions of synapses. Interpreting reconstructions at this scale demands advances in the automated analysis of neuronal morphologies, for example by identifying morphological and functional subcompartments within neurons. We present a method that for the first time uses full 3d input (voxels) to automatically classify reconstructed neuron fragments as axon, dendrite, or somal subcompartments. Based on 3d convolutional neural networks, this method achieves a mean f1-score of 0.972, exceeding the previous state of the art of 0.955. The resulting predictions can support multiple analysis and proofreading applications. In particular, we leverage finely localized subcompartment predictions for automated detection and correction of merge errors in the volume reconstruction, successfully detecting 90.6% of inter-class merge errors with a false positive rate of only 2.7%.
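A toy voxel-based classifier along these lines is sketched below; it is a minimal 3d CNN mapping a voxel cube to one of the three subcompartment classes, not the paper's network or training setup.

```python
import torch
import torch.nn as nn

class SubcompartmentNet(nn.Module):
    """Toy 3d CNN: voxel cube of a neuron fragment -> axon/dendrite/soma."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
            nn.Linear(32, 3),                     # axon / dendrite / soma logits
        )

    def forward(self, x):                         # x: (batch, 1, D, H, W)
        return self.net(x)

model = SubcompartmentNet()
voxels = torch.rand(2, 1, 32, 32, 32)             # fake segmented voxel cubes
print(model(voxels).shape)                        # torch.Size([2, 3])
```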


2019 ◽  
Vol 17 (05) ◽  
pp. 1940009 ◽  
Author(s):  
Rick Gelhausen ◽  
Sebastian Will ◽  
Ivo L. Hofacker ◽  
Rolf Backofen ◽  
Martin Raden

Efficient computational tools for the identification of putative target RNAs regulated by prokaryotic sRNAs rely on thermodynamic models of RNA secondary structures. While they typically predict RNA–RNA interaction complexes accurately, they yield many highly ranked false positives in target screens. One obvious source of this low specificity is the inability of current secondary-structure-based models to reflect steric constraints, which nevertheless govern the kinetic formation of RNA–RNA interactions. For example, extensions of short initial kissing-hairpin interactions are often kinetically prohibited, even when thermodynamically favorable, since they would require unwinding of intra-molecular helices as well as sterically impossible bending of the interaction helix. Another source is the consideration of unstable and thus unlikely subinteractions that enable better scoring of longer interactions. In consequence, efficient prediction methods that do not consider such effects show a high false positive rate. To increase prediction accuracy, we devise IntaRNAhelix, a dynamic programming algorithm that length-restricts runs of consecutive inter-molecular base pairs (perfect canonical stackings), which we hypothesize to implicitly model the steric and kinetic effects. The novel method is implemented by extending the state-of-the-art tool IntaRNA. Our comprehensive bacterial sRNA target prediction benchmark demonstrates significant improvements in prediction accuracy and enables more than 40-times faster computations. These results indicate, supporting our hypothesis, that stable helix composition increases the accuracy of interaction prediction models compared with the current state-of-the-art approach.
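The helix-length restriction itself is easy to illustrate: reject candidate interactions containing a run of perfectly stacked inter-molecular base pairs longer than a cap. The simplified check below is a stand-in for IntaRNAhelix's length-restricted dynamic programming, not the tool itself.

```python
def helix_runs_ok(pairing, max_helix_len=10):
    """Check that no run of consecutive inter-molecular base pairs
    (perfect canonical stackings) exceeds max_helix_len.

    pairing: list of (i, j) base-pair index tuples, sorted by i.
    """
    run = 1
    for (i1, j1), (i2, j2) in zip(pairing, pairing[1:]):
        if i2 == i1 + 1 and j2 == j1 - 1:   # perfect stacking continues the helix
            run += 1
            if run > max_helix_len:
                return False
        else:
            run = 1                          # an interior loop/bulge resets the run
    return True

# A 12-bp uninterrupted helix violates a 10-bp cap:
helix = [(i, 30 - i) for i in range(12)]
print(helix_runs_ok(helix))                  # False
```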

