Deep Hurdle Networks for Zero-Inflated Multi-Target Regression: Application to Multiple Species Abundance Estimation

A key problem in computational sustainability is to understand the distribution of species across landscapes over time. This question gives rise to challenging large-scale prediction problems since (i) hundreds of species have to be simultaneously modeled and (ii) the survey data are usually inflated with zeros due to the absence of species for a large number of sites. The problem of tackling both issues simultaneously, which we refer to as the zero-inflated multi-target regression problem, has not been addressed by previous methods in statistics and machine learning. In this paper, we propose a novel deep model for the zero-inflated multi-target regression problem. To this end, we first model the joint distribution of multiple response variables as a multivariate probit model and then couple the positive outcomes with a multivariate log-normal distribution. By penalizing the difference between the two distributions’ covariance matrices, a link between both distributions is established. The whole model is cast as an end-to-end learning framework and we provide an efficient learning algorithm for our model that can be fully implemented on GPUs. We show that our model outperforms the existing state-of-the-art baselines on two challenging real-world species distribution datasets concerning bird and fish populations.
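
The hurdle idea in this abstract can be illustrated for a single species. The sketch below is an assumption-laden simplification, not the authors' model: it is univariate, uses a probit link for presence and a log-normal density for positive abundance, and omits the covariance coupling across species that the paper introduces. The function and parameter names (`hurdle_log_likelihood`, `eta`, `mu`, `sigma`) are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def hurdle_log_likelihood(y, eta, mu, sigma):
    """Log-likelihood of one observation y >= 0 under a probit hurdle.

    eta   : probit score, so P(y > 0) = Phi(eta)      (assumed parameterization)
    mu    : mean of log(y) given y > 0                (assumed parameterization)
    sigma : standard deviation of log(y) given y > 0
    """
    p_present = normal_cdf(eta)
    if y == 0:
        # Zero observation: only the presence/absence part contributes.
        return math.log(1.0 - p_present)
    # Positive observation: presence probability times log-normal density.
    log_density = (
        -math.log(y * sigma * math.sqrt(2.0 * math.pi))
        - (math.log(y) - mu) ** 2 / (2.0 * sigma ** 2)
    )
    return math.log(p_present) + log_density
```

Summing this quantity over sites gives a per-species objective; the multi-target model described above would replace the independent terms with joint distributions over all species.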

Semantic Image Retrieval with Feature Space Rankings

International Journal of Semantic Computing ◽

10.1142/s1793351x17400074 ◽

2017 ◽

Vol 11 (02) ◽

pp. 171-192 ◽

Cited By ~ 2

Author(s):

Kai Li ◽

Guo-Jun Qi ◽

Jun Ye ◽

Tuoerhongjiang Yusuph ◽

Kien A. Hua

Keyword(s):

Large Scale ◽

Learning Algorithm ◽

Error Function ◽

Rank Correlation ◽

Feature Space ◽

Search Problem ◽

Research Attention ◽

Semantic Image Retrieval ◽

Low Dimensional ◽

Efficient Learning

Learning to hash is receiving increasing research attention due to its effectiveness in addressing the large-scale similarity search problem. Most existing hashing algorithms focus on learning hash functions in the form of numeric quantization of some projected feature space. In this work, we propose a novel hash learning method that encodes features’ relative ordering instead of quantizing their numeric values in a set of low-dimensional ranking subspaces. We formulate the ranking-based hash learning problem as the optimization of a continuous probabilistic error function using softmax approximation and present an efficient learning algorithm to solve the problem. As a generalization of Winner-Take-All (WTA) hashing, the proposed algorithm naturally enjoys the numeric stability benefits of rank correlation measures while being optimized to achieve high precision with very compact code. Additionally, the proposed method can be easily extended to nonlinear kernel spaces to discover ranking structures that cannot be revealed in linear subspaces. We demonstrate through extensive experiments that the proposed method can achieve competitive performance compared to a number of state-of-the-art hashing methods.
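
For orientation, plain Winner-Take-All hashing, which the abstract names as the special case being generalized, can be sketched as follows. This shows only the fixed-permutation baseline, not the learned ranking subspaces or the softmax relaxation proposed in the paper; the helper names and the `window` parameter are illustrative assumptions.

```python
import random


def make_permutations(dim, count, seed=0):
    """Draw `count` random permutations of the feature indices 0..dim-1."""
    rng = random.Random(seed)
    permutations = []
    for _ in range(count):
        perm = list(range(dim))
        rng.shuffle(perm)
        permutations.append(perm)
    return permutations


def wta_hash(features, permutations, window):
    """Return one code symbol per permutation.

    Each symbol is the position (0..window-1) of the largest value among
    the first `window` features selected by that permutation.
    """
    code = []
    for perm in permutations:
        candidates = [features[i] for i in perm[:window]]
        code.append(max(range(window), key=lambda j: candidates[j]))
    return code
```

Because each symbol depends only on which candidate is largest, the code is unchanged by any strictly increasing transformation of the feature values, which is the rank-based stability property the abstract refers to.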

CONVERGENCE ANALYSIS OF CASCADE ERROR PROJECTION - AN EFFICIENT LEARNING ALGORITHM FOR HARDWARE IMPLEMENTATION

International Journal of Neural Systems ◽

10.1142/s0129065700000181 ◽

2000 ◽

Vol 10 (03) ◽

pp. 199-210 ◽

Cited By ~ 5

Author(s):

Tuan A. Duong ◽

Allen R. Stubberud

Keyword(s):

Neural Network ◽

Convergence Analysis ◽

Hardware Implementation ◽

Learning Algorithm ◽

Time Series Prediction ◽

Single Layer ◽

Mathematical Foundation ◽

Cascade Correlation ◽

Efficient Learning ◽

Prediction Problems

In this paper, we present a mathematical foundation, including a convergence analysis, for cascade architecture neural networks. Our analysis shows that the convergence of the cascade architecture neural network is assured because it satisfies the Liapunov criteria in an added hidden unit domain rather than in the time domain. From this analysis, a mathematical foundation for the cascade correlation learning algorithm can be found. Furthermore, it becomes apparent that the cascade correlation scheme is a special case of the mathematical analysis from which an efficient hardware learning algorithm called Cascade Error Projection (CEP) is proposed. CEP provides efficient learning in hardware and is faster to train, because part of the weights are deterministically obtained, and the remaining weights, from the inputs to the hidden unit, are learned as a single-layer perceptron with the previously determined weights kept frozen. In addition, one can start with zero weight values (rather than random finite weight values) when the learning of each layer commences. Further, unlike the cascade correlation algorithm (where a pool of candidate hidden units is added), only a single hidden unit is added at a time, which simplifies hardware implementation. Finally, 5- to 8-bit parity and chaotic time series prediction problems are investigated; the simulation results demonstrate that weight quantization of 4 bits or more is sufficient for training a neural network using CEP. In addition, it is demonstrated that this technique can compensate for lower bit weight resolution by incorporating additional hidden units. However, generalization may suffer somewhat with lower bit weight quantization.

Relationship between total plasma homocysteine and the risk of aneurysms – a meta-analysis

VASA ◽

10.1024/0301-1526/a000891 ◽

2020 ◽

pp. 1-6

Author(s):

Hanji Zhang ◽

Dexin Yin ◽

Yue Zhao ◽

Yezhou Li ◽

Dejiang Yao ◽

...

Keyword(s):

Large Scale ◽

Meta Analysis ◽

Single Species ◽

Total Plasma ◽

Control Groups ◽

Randomized Controlled ◽

Randomized Controlled Studies ◽

The Difference ◽

The Relationship ◽

Healthy Participants

Summary: Our meta-analysis focused on the relationship between homocysteine (Hcy) level and the incidence of aneurysms, and also examined the relationship between smoking, hypertension and aneurysms. A systematic literature search of the PubMed, Web of Science, and Embase databases (up to March 31, 2020) identified 19 studies, including 2,629 aneurysm patients and 6,497 healthy participants. Combined analysis of the included studies showed that smoking, hypertension and hyperhomocysteinemia (HHcy) were more frequent in aneurysm patients than in the control groups, and the total plasma Hcy level in aneurysm patients was also higher. These findings suggest that smoking, hypertension and HHcy may be risk factors for the development and progression of aneurysms. Although the heterogeneity of the meta-analysis was significant, subgroup analysis indicated that it might stem from differences in race and disease species. Large-scale randomized controlled studies of single species and single disease species are needed in the future to improve the accuracy of the results.

TO THE PROBLEMS OF COMBATING CORRUPTION IN MODERN RUSSIA AND INTERNATIONAL COOPERATION IN THE FIELD OF ANTI-CORRUPTION

Current problems of jurisprudence ◽

10.29039/02032-6/064-074 ◽

2020 ◽

Author(s):

Angela Dranishnikova

Keyword(s):

Large Scale ◽

The State ◽

Point Of View ◽

Modern Life ◽

Existing Problems ◽

Moral Point ◽

Material Goods ◽

The Russian Federation ◽

The Difference ◽

The Government

In the article, the author reflects on the existing problems of the fight against corruption in the Russian Federation. The author focuses on the opacity of the work of state bodies, which leads to an increase in bribery and corruption. The topic is socially pressing today, since its significance is growing on a large scale at all levels of modern life. Democratic institutions are being jeopardized, the gap between social strata in access to material goods is growing, the moral state of society is suffering, and citizens are losing confidence in the government and in the top officials of the state.

Large-Scale Analysis of the Spatiotemporal Changes of Net Ecosystem Production in Hindu Kush Himalayan Region

Remote Sensing ◽

10.3390/rs13061180 ◽

2021 ◽

Vol 13 (6) ◽

pp. 1180

Author(s):

Da Guo ◽

Xiaoning Song ◽

Ronghai Hu ◽

Xinming Zhu ◽

Yazhen Jiang ◽

...

Keyword(s):

Large Scale ◽

Net Primary Production ◽

Carbon Sink ◽

Carbon Sources ◽

Spatial Dynamics ◽

Tibet Plateau ◽

Geostatistical Model ◽

Hindu Kush ◽

Temporal And Spatial ◽

The Difference

The Hindu Kush Himalayan (HKH) region is one of the most ecologically vulnerable regions in the world. Several studies have been conducted on the dynamic changes of grassland in the HKH region, but few have considered grassland net ecosystem productivity (NEP). In this study, we quantitatively analyzed the temporal and spatial changes of NEP magnitude and the influence of climate factors in the HKH region from 2001 to 2018. The NEP magnitude was obtained by calculating the difference between the net primary production (NPP) estimated by the Carnegie–Ames–Stanford Approach (CASA) model and the heterotrophic respiration (Rh) estimated by a geostatistical model. The results showed that the grassland ecosystem in the HKH region exhibited weak net carbon uptake, with an NEP value of 42.03 gC∙m−2∙yr−1 and a total net carbon sequestration of 0.077 Pg C. NEP gradually increased from west to east, and in the Qinghai–Tibet Plateau it gradually increased from northwest to southeast. The grassland carbon sources and sinks differed with altitude: the grassland was a carbon sink at 3000–5000 m, while grasslands below 3000 m and above 5000 m were carbon sources. Grassland NEP exhibited the strongest correlation with precipitation, and its response to precipitation was lagged: the correlation between NEP and the precipitation of the previous year was stronger than that of the current year. NEP was negatively correlated with temperature but not with solar radiation. The study of the temporal and spatial dynamics of NEP in the HKH region can provide a theoretical basis to help herders balance grazing and forage.
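
The NEP definition used in this abstract is a simple difference, NEP = NPP − Rh, with positive values indicating a carbon sink. A minimal sketch, assuming both inputs are in the same units (for example gC per square metre per year); the function names are hypothetical, and the CASA and geostatistical models that produce the inputs are not reproduced here.

```python
def net_ecosystem_production(npp, rh):
    """NEP as net primary production minus heterotrophic respiration."""
    return npp - rh


def carbon_role(nep):
    """Classify a pixel or region by the sign of its NEP."""
    if nep > 0:
        return "sink"
    if nep < 0:
        return "source"
    return "neutral"
```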

Neural methods for effective, efficient, and exposure-aware information retrieval

ACM SIGIR Forum ◽

10.1145/3476415.3476434 ◽

2021 ◽

Vol 55 (1) ◽

pp. 1-2

Author(s):

Bhaskar Mitra

Keyword(s):

Information Retrieval ◽

Language Processing ◽

Large Scale ◽

Web Search ◽

Real Life ◽

Inverted Index ◽

Information Need ◽

Product Model ◽

Performance Improvements ◽

Deep Model

Neural networks with deep architectures have demonstrated significant performance improvements in computer vision, speech recognition, and natural language processing. The challenges in information retrieval (IR), however, are different from these other application areas. A common form of IR involves ranking of documents (or short passages) in response to keyword-based queries. Effective IR systems must deal with the query-document vocabulary mismatch problem by modeling relationships between different query and document terms and how they indicate relevance. Models should also consider lexical matches when the query contains rare terms not seen during training (such as a person's name or a product model number), and should avoid retrieving semantically related but irrelevant results. In many real-life IR tasks, the retrieval involves extremely large collections containing billions of documents, such as the document index of a commercial Web search engine. Efficient IR methods should take advantage of specialized IR data structures, such as the inverted index, to efficiently retrieve from large collections. Given an information need, the IR system also mediates how much exposure an information artifact receives by deciding whether it should be displayed, and where it should be positioned, among other results. Exposure-aware IR systems may optimize for additional objectives, besides relevance, such as parity of exposure for retrieved items and content publishers. In this thesis, we present novel neural architectures and methods motivated by the specific needs and challenges of IR tasks. We ground our contributions with a detailed survey of the growing body of neural IR literature [Mitra and Craswell, 2018].
Our key contribution towards improving the effectiveness of deep ranking models is developing the Duet principle [Mitra et al., 2017], which emphasizes the importance of incorporating evidence based on both patterns of exact term matches and similarities between learned latent representations of query and document. To efficiently retrieve from large collections, we develop a framework to incorporate query term independence [Mitra et al., 2019] into any arbitrary deep model, which enables large-scale precomputation and the use of an inverted index for fast retrieval. In the context of stochastic ranking, we further develop optimization strategies for exposure-based objectives [Diaz et al., 2020]. Finally, this dissertation also summarizes our contributions towards benchmarking neural IR models in the presence of large training datasets [Craswell et al., 2019] and explores the application of neural methods to other IR tasks, such as query auto-completion.
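
The query term independence idea can be sketched as follows: if a document's score is a sum of per-term scores, those scores can be precomputed offline and stored in an inverted index, so that retrieval reduces to lookups and additions. This is a toy illustration under stated assumptions, not the framework from the thesis: `term_score` stands in for whatever model produces per-term relevance (a neural model in the thesis, a simple term count in the test below), and only terms that occur in a document are indexed.

```python
from collections import defaultdict


def build_index(docs, term_score):
    """Precompute term-document scores into an inverted index.

    docs       : dict mapping doc_id -> document text
    term_score : callable (term, text) -> float, a placeholder scorer
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in set(text.split()):
            index[term][doc_id] = term_score(term, text)
    return index


def retrieve(query, index, top_k=10):
    """Score documents as the sum of precomputed per-term scores."""
    totals = defaultdict(float)
    for term in query.split():
        for doc_id, score in index.get(term, {}).items():
            totals[doc_id] += score
    # Highest score first; ties broken by doc_id for determinism.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
```

The design point is that no model inference happens at query time; all model calls are made once per term-document pair when the index is built.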

Can Mandatory Disclosure Policies Promote Corporate Environmental Responsibility?—Quasi-Natural Experimental Research on China

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph18116033 ◽

2021 ◽

Vol 18 (11) ◽

pp. 6033

Author(s):

Yue Liu ◽

Pierre Failler ◽

Liming Chen

Keyword(s):

Property Rights ◽

Large Scale ◽

Environmental Responsibility ◽

Mandatory Disclosure ◽

Corporate Environmental Responsibility ◽

Disclosure Policy ◽

Protection Information ◽

The Difference ◽

Csr Disclosure ◽

Corporate Social

Corporate environmental responsibility (CER) is an important component of the corporate social responsibility (CSR) report, and an important carrier for enterprises to disclose environmental protection information. Based on the corporate micro data, this paper evaluates the effect of a mandatory CSR disclosure policy on the fulfillment of corporate environmental responsibility by adopting the difference-in-differences model (DID) with the release of a mandatory disclosure policy of China in 2008 as a quasi-natural experiment. The study draws the following conclusions: First, a mandatory CSR disclosure policy can promote the fulfillment of CER. Second, after the implementation of a mandatory CSR disclosure policy, enterprises can improve their CER level through two channels: improving the quality of environmental management disclosure and increasing the number of patents. Third, the heterogeneity of the impacts of mandatory CSR disclosure on CER is reflected in three aspects: different CER levels, different corporate scales and a different property rights structure. In terms of the CER level, there is an inverted U-shaped relationship between the CER level and mandatory CSR disclosure effect. In terms of the corporate scale, mandatory disclosure of CSR plays a greater role in large-scale enterprises. In terms of the structure of property rights, mandatory CSR disclosure has a greater effect on non-state-owned enterprises.
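
The core of a difference-in-differences (DID) comparison can be written in a few lines. The sketch below computes only the basic two-group, two-period estimate from group means; it is an illustration of the general technique, not the regression specification used in the paper, which would typically include controls and fixed effects. The function names are hypothetical.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Basic difference-in-differences estimate from four outcome samples.

    The change in the control group is subtracted from the change in the
    treated group, which removes any trend common to both groups.
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change
```

In the setting described above, the treated group would be firms subject to the 2008 mandatory disclosure policy and the outcome would be a CER measure; the estimate is only interpretable as a policy effect if the two groups would otherwise have followed parallel trends.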

Investigating the properties of a galaxy group at z = 0.6

Proceedings of the International Astronomical Union ◽

10.1017/s1743921320001945 ◽

2020 ◽

Vol 15 (S359) ◽

pp. 188-189

Author(s):

Daniela Hiromi Okido ◽

Cristina Furlanetto ◽

Marina Trevisan ◽

Mônica Tergolina

Keyword(s):

Large Scale ◽

The Other ◽

Numerical Density ◽

Spectroscopy Data ◽

Stellar Kinematics ◽

Galaxy Groups ◽

The Galaxy ◽

The Difference ◽

The Impact ◽

The Universe

Galaxy groups offer an important perspective on how the large-scale structure of the Universe has formed and evolved, being great laboratories to study the impact of the environment on the evolution of galaxies. We aim to investigate the properties of a galaxy group that is gravitationally lensing HELMS18, a submillimeter galaxy at z = 2.39. We obtained multi-object spectroscopy data using Gemini-GMOS to investigate the stellar kinematics of the central galaxies, determine its members and obtain the mass, radius and the numerical density profile of this group. Our final goal is to build a complete description of this galaxy group. In this work we present an analysis of its two central galaxies: one is an active galaxy with z = 0.59852 ± 0.00007, while the other is a passive galaxy with z = 0.6027 ± 0.0002. Furthermore, the difference between the redshifts obtained using emission and absorption lines indicates an outflow of gas with velocity v = 278.0 ± 34.3 km/s relative to the galaxy.

Deep learning-based framework for the distinction of membranous nephropathy: a new approach through hyperspectral imagery

BMC Nephrology ◽

10.1186/s12882-021-02421-y ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Tianqi Tu ◽

Xueling Wei ◽

Yue Yang ◽

Nianrong Zhang ◽

Wei Li ◽

...

Keyword(s):

Deep Learning ◽

Renal Biopsy ◽

Membranous Nephropathy ◽

Learning Algorithm ◽

Hyperspectral Imagery ◽

Chinese Patients ◽

Support Vector ◽

Deep Learning Algorithm ◽

The Difference ◽

Complex Deposition

Background: Common subtypes seen in Chinese patients with membranous nephropathy (MN) include idiopathic membranous nephropathy (IMN) and hepatitis B virus-related membranous nephropathy (HBV-MN). However, the morphologic differences are not visible under the light microscope in certain renal biopsy tissues. Methods: We propose here a deep learning-based framework for processing hyperspectral images of renal biopsy tissue to define the difference between IMN and HBV-MN based on the component of their immune complex deposition. Results: The proposed framework can achieve an overall accuracy of 95.04% in classification, which also leads to better performance than support vector machine (SVM)-based algorithms. Conclusion: IMN and HBV-MN can be correctly separated via the deep learning framework using hyperspectral imagery. Our results suggest the potential of the deep learning algorithm as a new method to aid in the diagnosis of MN.

Grounding semantic transparency in context

Morphology ◽

10.1007/s11525-021-09382-w ◽

2021 ◽

Author(s):

Rossella Varvara ◽

Gabriella Lapesa ◽

Sebastian Padó

Keyword(s):

Large Scale ◽

Point Of View ◽

Distributional Semantics ◽

Semantic Transparency ◽

Inclusion Measure ◽

The Difference ◽

Semantic Point ◽

The Many ◽

The Relationship

We present the results of a large-scale corpus-based comparison of two German event nominalization patterns: deverbal nouns in -ung (e.g., die Evaluierung, ‘the evaluation’) and nominal infinitives (e.g., das Evaluieren, ‘the evaluating’). Among the many available event nominalization patterns for German, we selected these two because they are both highly productive and challenging from the semantic point of view. Both patterns are known to keep a tight relation with the event denoted by the base verb, but with different nuances. Our study targets a better understanding of the differences in their semantic import. The key notion of our comparison is that of semantic transparency, and we propose a usage-based characterization of the relationship between derived nominals and their bases. Using methods from distributional semantics, we bring to bear two concrete measures of transparency which highlight different nuances: the first one, cosine, detects nominalizations which are semantically similar to their bases; the second one, distributional inclusion, detects nominalizations which are used in a subset of the contexts of the base verb. We find that only the inclusion measure helps in characterizing the difference between the two types of nominalizations, in relation with the traditionally considered variable of relative frequency (Hay, 2001). Finally, the distributional analysis allows us to frame our comparison in the broader coordinates of the inflection vs. derivation cline.
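
The two kinds of transparency measure contrasted in this abstract can be sketched on sparse context vectors. Cosine is standard; for distributional inclusion the abstract does not say which formula is used, so the sketch below assumes a Weeds-style precision (the share of the nominalization's context weight that falls on contexts also seen with the base verb). Vectors are plain dictionaries from context word to non-negative weight, and all names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors (dict: context -> weight)."""
    dot = sum(weight * v.get(context, 0.0) for context, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def inclusion(narrow, broad):
    """Share of `narrow`'s context weight lying on contexts `broad` also has.

    A value of 1.0 means every context of the nominalization is also a
    context of the base verb (assumed Weeds-precision-style measure).
    """
    total = sum(narrow.values())
    if total == 0.0:
        return 0.0
    shared = sum(w for c, w in narrow.items() if broad.get(c, 0.0) > 0.0)
    return shared / total
```

The two measures can disagree: a nominalization used in a strict subset of the verb's contexts has inclusion 1.0 while its cosine with the verb may still be well below 1, which is the kind of nuance the abstract attributes to the inclusion measure.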
