Controlling Hidden Layer Capacity Through Lateral Connections

1997 ◽  
Vol 9 (6) ◽  
pp. 1381-1402 ◽  
Author(s):  
Kwabena Agyepong ◽  
Ravi Kothari

We investigate the effects of including selected lateral interconnections in a feedforward neural network. In a network with one hidden layer consisting of m hidden neurons labeled 1, 2, …, m, hidden neuron j is connected fully to the inputs, the outputs, and hidden neuron j + 1. As a consequence of the lateral connections, each hidden neuron receives two error signals: one from the output layer and one through the lateral interconnection. We show that the use of these lateral interconnections among the hidden-layer neurons facilitates controlled assignment of role and specialization of the hidden-layer neurons. In particular, we show that as training progresses, hidden neurons become progressively specialized, starting from the fringes (i.e., the lower- and higher-numbered hidden neurons, e.g., 1, 2, m − 1, m) and leaving the neurons in the center of the hidden layer (i.e., those numbered close to m/2) unspecialized or functionally identical. Consequently, the network behaves like network-growing algorithms without the explicit need to add hidden units, and like soft weight sharing due to the functionally identical neurons in the center of the hidden layer. Experimental results from one classification and one function approximation problem are presented to illustrate the selective specialization of the hidden-layer neurons. In addition, the improved generalization that results from a decrease in the effective number of free parameters is illustrated through a simple function approximation example and a real-world data set. Besides reducing the number of free parameters, the localization of weight sharing may also enable a procedure for determining the number of hidden-layer neurons required for a given learning task.
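A minimal NumPy sketch of the described architecture may help fix ideas (this is not the authors' code; the weight scales and sizes are illustrative). Because hidden neuron j + 1 receives a lateral input from hidden neuron j, the hidden activations are computed sequentially:

```python
import numpy as np

def forward(x, W_in, w_lat, W_out, f=np.tanh):
    """x: (n_in,), W_in: (m, n_in), w_lat: (m - 1,), W_out: (n_out, m)."""
    m = W_in.shape[0]
    h = np.zeros(m)
    h[0] = f(W_in[0] @ x)              # neuron 1 sees the inputs only
    for j in range(1, m):              # neuron j + 1 also sees h[j]
        h[j] = f(W_in[j] @ x + w_lat[j - 1] * h[j - 1])
    return W_out @ h, h

rng = np.random.default_rng(0)
n_in, m, n_out = 4, 8, 1
y, h = forward(rng.normal(size=n_in),
               0.1 * rng.normal(size=(m, n_in)),
               0.1 * rng.normal(size=m - 1),
               0.1 * rng.normal(size=(n_out, m)))
```

During backpropagation, each h[j] would then accumulate one error term from the output layer and a second one propagated back through the lateral weight, matching the two error signals described above.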

2011 ◽  
Vol 21 (03) ◽  
pp. 247-263 ◽  
Author(s):  
J. P. FLORIDO ◽  
H. POMARES ◽  
I. ROJAS

In function approximation problems, one of the most common ways to evaluate a learning algorithm is to partition the original data set (input/output data) into two sets: a learning set, used for building models, and a test set, used for genuine out-of-sample evaluation. When the partition into learning and test sets does not take into account the variability and geometry of the original data, it can lead to unbalanced, unrepresentative learning and test sets and, thus, to wrong conclusions about the accuracy of the learning algorithm. How the partitioning is made is therefore a key issue, and it becomes more important when the data set is small, owing to the need to reduce the pessimistic effects caused by removing instances from the original data set. In this work, we propose a deterministic data-mining approach that distributes a data set (input/output data) into two representative, balanced sets of roughly equal size, taking the variability of the data into consideration, with the purpose of allowing both a fair evaluation of a learning algorithm's accuracy and reproducible machine learning experiments, which are usually based on random partitions. The sets are generated using a combination of a clustering procedure especially suited for function approximation problems and a distribution algorithm that splits the data set into two sets within each cluster based on a nearest-neighbor approach. In the experiments section, the performance of the proposed methodology is reported in a variety of situations through an ANOVA-based statistical study of the results.
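A rough sketch of the cluster-then-distribute idea (assumptions: scikit-learn's k-means stands in for the authors' function-approximation-oriented clustering, and the nearest-neighbor distribution is approximated by alternating over a distance ordering within each cluster):

```python
import numpy as np
from sklearn.cluster import KMeans

def balanced_split(X, y, n_clusters=5, seed=0):
    """Deterministically split (X, y) into two representative halves."""
    Z = np.hstack([X, y.reshape(len(y), -1)])    # cluster in input/output space
    labels = KMeans(n_clusters, n_init=10, random_state=seed).fit_predict(Z)
    set_a, set_b = [], []
    for c in range(n_clusters):
        idx = np.where(labels == c)[0]
        # order cluster members by distance to the cluster mean, then
        # alternate, so neighboring points land in different sets
        d = np.linalg.norm(Z[idx] - Z[idx].mean(axis=0), axis=1)
        ordered = idx[np.argsort(d)]
        set_a.extend(ordered[0::2])
        set_b.extend(ordered[1::2])
    return np.array(set_a), np.array(set_b)
```

Because every step is deterministic for a fixed seed, the resulting partition is reproducible, which is the property the paper emphasizes.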


2019 ◽  
Vol 10 (37) ◽  
pp. 31-44
Author(s):  
Engin Kandıran ◽  
Avadis Hacınlıyan

Artificial neural networks are commonly accepted as a very successful tool for global function approximation. For this reason, many studies consider them a good approach to forecasting chaotic time series. For a given time series, the Lyapunov exponent is a good indicator of whether the series is chaotic or not. In this study, we use three different neural network architectures to test the capability of neural networks to forecast time series generated from different dynamical systems. In addition to forecasting the time series, the Lyapunov exponents of the studied systems are estimated using a feedforward neural network with a single hidden layer.
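As a hedged illustration of the forecasting setup (not the paper's exact configuration), the sketch below trains a single-hidden-layer feedforward network on a delay embedding of a scalar time series and scores its one-step-ahead predictions:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def delay_embed(x, d):
    """Rows are [x_t, ..., x_{t+d-1}]; targets are x_{t+d}."""
    X = np.stack([x[i:len(x) - d + i] for i in range(d)], axis=1)
    return X, x[d:]

# toy series standing in for a trajectory of a dynamical system
x = np.sin(0.3 * np.arange(2000)) + 0.01 * np.random.default_rng(0).normal(size=2000)
X, y = delay_embed(x, d=5)
model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
model.fit(X[:1500], y[:1500])
print("test MSE:", np.mean((model.predict(X[1500:]) - y[1500:]) ** 2))
```

A positive largest Lyapunov exponent of the generating system bounds how far ahead such a one-step model can be iterated before prediction errors grow exponentially.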


Multilayer perceptron neural networks (MLPNNs) consist of an input layer, at least one hidden layer, and an output layer. The number of neurons in the hidden layer affects the network's performance, and choosing it is considered a difficult task. This research aims to examine the performance of seven heuristic methods that have been used to estimate the number of neurons in the hidden layer. The effectiveness of these methods was verified using six benchmark data sets. The number of hidden-layer neurons selected by each heuristic method for every data set was used to train the MLP. The results demonstrate that the number of hidden neurons selected by each method yields different accuracy and stability compared with the other methods. For the Wine data set, the Hush method selected 26 neurons and achieved the best accuracy, 99.90%, while the lowest accuracy, 67.51%, was achieved by the Sheela method using 4 neurons. For the Ionosphere data set, the Ke method obtained the best result, 97.97% accuracy with 22 neurons, while the lowest accuracy, 96.95%, was achieved by the Kayama method with 5 neurons. For the Iris data set, the Hush method achieved the best accuracy, 97.19%, with 8 neurons; for the same data set, the lowest result, 92.33%, was obtained by the Kayama method using 3 neurons. For the WBC data set, the best accuracy, 96.40%, was achieved by the Sheela and Kaastra methods using 4 and 7 neurons, respectively, while the Kanellopoulos method achieved the lowest accuracy, 94.18%, with 7 neurons. For the Glass data set, the best accuracy, 87.15%, was obtained by the Hush method using 18 neurons, and the lowest, 82.27%, by the Wang method using 6 neurons. Finally, for the PID data set, the Kayama method achieved 75.31% accuracy with 3 neurons, whereas the Kanellopoulos method obtained 72.17% using 24 neurons.
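The evaluation loop has a simple shape: each heuristic maps the data set's dimensions to a hidden-layer width, which is then used to train and score an MLP. The two formulas below are common rules of thumb included only for illustration; they are not necessarily among the seven methods compared above:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

# illustrative heuristics: hidden width from input/output dimensions
heuristics = {
    "geometric_mean": lambda n_in, n_out: max(1, round((n_in * n_out) ** 0.5)),
    "two_thirds":     lambda n_in, n_out: max(1, round(2 * n_in / 3) + n_out),
}

X, y = load_iris(return_X_y=True)
n_in, n_out = X.shape[1], len(set(y))
for name, rule in heuristics.items():
    h = rule(n_in, n_out)
    clf = MLPClassifier(hidden_layer_sizes=(h,), max_iter=2000, random_state=0)
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: {h} hidden neurons, CV accuracy {acc:.4f}")
```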


1997 ◽  
Vol 9 (1) ◽  
pp. 205-225 ◽  
Author(s):  
Rudy Setiono

An algorithm for extracting rules from a standard three-layer feedforward neural network is proposed. The trained network is first pruned, not only to remove redundant connections but, more importantly, to identify the relevant inputs. The algorithm generates rules from the pruned network by considering only a small number of activation values at the hidden units. If the number of inputs connected to a hidden unit is sufficiently small, rules that describe how each of its activation values is obtained can be readily generated. Otherwise, the hidden unit is split and treated as a set of output units, with each output unit corresponding to one activation value; a hidden layer is inserted and a new subnetwork is formed, trained, and pruned. This process is repeated until every hidden unit in the network has a relatively small number of input units connected to it. Examples of how the proposed algorithm works are shown using real-world data arising from molecular biology and signal processing. Our results show that for these complex problems, the algorithm can extract reasonably compact rule sets with high predictive accuracy.
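A minimal sketch of the key discretization step (an assumption: simple 1-D k-means stands in for the paper's activation-value grouping). After pruning, each hidden unit's activations over the training set are reduced to a handful of discrete levels, so the network's behavior can be enumerated as rules over those levels:

```python
import numpy as np

def discretize_activations(acts, n_levels=3):
    """acts: (n_samples,) activations of one hidden unit.
    Returns the level centroids and each sample's level index."""
    centroids = np.quantile(acts, np.linspace(0, 1, n_levels))  # initial guesses
    for _ in range(20):                                         # Lloyd iterations
        labels = np.argmin(np.abs(acts[:, None] - centroids[None, :]), axis=1)
        for k in range(n_levels):
            if np.any(labels == k):
                centroids[k] = acts[labels == k].mean()
    return centroids, labels

acts = np.tanh(np.random.default_rng(0).normal(size=500))
levels, labels = discretize_activations(acts)
print("activation levels:", np.round(levels, 2))
```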


2021 ◽  
pp. 1-13
Author(s):  
Hailin Liu ◽  
Fangqing Gu ◽  
Zixian Lin

Transfer learning methods exploit similarities between different data sets to improve the performance of a target task by transferring knowledge from source tasks to the target task. "What to transfer" is a main research issue in transfer learning. Existing transfer learning methods generally need to acquire the shared parameters by integrating human knowledge. However, in many real applications, which parameters can be shared is unknown beforehand. A transfer learning model is essentially a special multi-objective optimization problem. Consequently, this paper proposes a novel auto-sharing parameter technique for transfer learning based on multi-objective optimization and solves the optimization problem using a multi-swarm particle swarm optimizer. Each task objective is optimized simultaneously by a sub-swarm. The current best particle from the sub-swarm of the target task is used to guide the search of the particles of the source tasks, and vice versa. The target task and source tasks are jointly solved by sharing the information of the best particles, which works as an inductive bias. Experiments are carried out to evaluate the proposed algorithm on several synthetic data sets and two real-world data sets (a school data set and a landmine data set), showing that the proposed algorithm is effective.
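A compact sketch of the cross-guidance idea (not the authors' implementation; the quadratic task losses and all coefficients are toy placeholders). Two sub-swarms each optimize one task, and every velocity update is additionally pulled toward the other sub-swarm's best particle, which acts as the shared inductive bias:

```python
import numpy as np

rng = np.random.default_rng(0)

# toy related objectives standing in for the target and source task losses
loss_t = lambda p: np.sum((p - 1.0) ** 2, axis=-1)
loss_s = lambda p: np.sum((p - 1.2) ** 2, axis=-1)

def step(pos, vel, pbest, gbest_own, gbest_other, loss,
         w=0.7, c1=1.4, c2=1.4, c3=0.6):
    r1, r2, r3 = rng.random((3, *pos.shape))
    vel = (w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest_own - pos)
           + c3 * r3 * (gbest_other - pos))      # guidance from the other task
    pos = pos + vel
    better = loss(pos) < loss(pbest)
    pbest[better] = pos[better]
    return pos, vel, pbest

pos_t, vel_t = rng.normal(size=(20, 5)), np.zeros((20, 5))
pos_s, vel_s = rng.normal(size=(20, 5)), np.zeros((20, 5))
pbest_t, pbest_s = pos_t.copy(), pos_s.copy()
for _ in range(100):
    g_t = pbest_t[np.argmin(loss_t(pbest_t))]
    g_s = pbest_s[np.argmin(loss_s(pbest_s))]
    pos_t, vel_t, pbest_t = step(pos_t, vel_t, pbest_t, g_t, g_s, loss_t)
    pos_s, vel_s, pbest_s = step(pos_s, vel_s, pbest_s, g_s, g_t, loss_s)
print("target-task best loss:", loss_t(pbest_t).min())
```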


Author(s):  
Serkan Kiranyaz ◽  
Junaid Malik ◽  
Habib Ben Abdallah ◽  
Turker Ince ◽  
Alexandros Iosifidis ◽  
...  

Abstract The recently proposed network model, Operational Neural Networks (ONNs), generalizes conventional Convolutional Neural Networks (CNNs), which are homogeneous and restricted to a linear neuron model. As a heterogeneous network model, ONNs are based on a generalized neuron model that can encapsulate any set of non-linear operators, boosting diversity and allowing highly complex and multi-modal functions or spaces to be learned with minimal network complexity and training data. However, the default method for finding optimal operators in ONNs, the so-called Greedy Iterative Search (GIS), usually takes several training sessions to find a single operator set per layer. This is not only computationally demanding; the network heterogeneity is also limited, since the same set of operators is then used for all neurons in each layer. To address this deficiency and exploit a superior level of heterogeneity, this study focuses on searching for the best possible operator set(s) for the hidden neurons of the network based on the "synaptic plasticity" paradigm, which poses the essential learning theory in biological neurons. During training, each operator set in the library can be evaluated by its synaptic-plasticity level and ranked from worst to best, and an "elite" ONN can then be configured using the top-ranked operator sets found at each hidden layer. Experimental results over highly challenging problems demonstrate that elite ONNs, even with few neurons and layers, can achieve superior learning performance compared to GIS-based ONNs and, as a result, the performance gap over CNNs widens further.
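A schematic sketch of the ranking step alone (heavy assumptions: the three-entry operator library and the sensitivity-based score below are placeholders; in the paper the score is the synaptic-plasticity level monitored during training):

```python
import numpy as np

rng = np.random.default_rng(0)

# toy operator library: (nodal, pool, activation) triples
library = {
    "mul+sum+tanh": (np.multiply, np.sum, np.tanh),
    "sin+sum+tanh": (lambda w, x: np.sin(w * x), np.sum, np.tanh),
    "exp+max+tanh": (lambda w, x: np.exp(w * x) - 1.0, np.max, np.tanh),
}

def score(ops, n_probes=64, eps=1e-3):
    """Placeholder plasticity proxy: mean output sensitivity of one neuron
    to a small synaptic-weight perturbation, over random inputs."""
    nodal, pool, act = ops
    w = rng.normal(size=8)
    x = rng.normal(size=(n_probes, 8))
    out = np.array([act(pool(nodal(w, r))) for r in x])
    out2 = np.array([act(pool(nodal(w + eps, r))) for r in x])
    return np.mean(np.abs(out2 - out)) / eps

ranked = sorted(library, key=lambda k: score(library[k]), reverse=True)
print("top-ranked operator set for this layer:", ranked[0])
```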


2015 ◽  
Vol 17 (5) ◽  
pp. 719-732
Author(s):  
Dulakshi Santhusitha Kumari Karunasingha ◽  
Shie-Yui Liong

A simple clustering method is proposed for extracting representative subsets from lengthy data sets. The main purpose of the extracted subset is to use it to build prediction models (in the form of approximating functional relationships) instead of using the entire large data set. Such smaller subsets of data are often required in the exploratory analysis stages of studies that involve resource-consuming investigations. A few recent studies have used a subtractive clustering method (SCM) for such data extraction, in the absence of clustering methods designed for function approximation. SCM, however, requires several parameters to be specified. This study proposes a clustering method that requires only a single parameter to be specified, yet is shown to be as effective as SCM. A method to find suitable values for this parameter is also proposed. Because it has only a single parameter, the proposed clustering method is shown to be orders of magnitude more efficient to use than SCM. The effectiveness of the proposed method is demonstrated on phase-space prediction of three univariate time series and prediction of two multivariate data sets. Some drawbacks of SCM when applied to data extraction are identified, and the proposed method is shown to resolve them.
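A minimal sketch of one plausible single-parameter scheme (an assumption: the paper's actual clustering rule may differ). Here the single parameter r is a neighborhood radius; a point joins the representative subset only if no already-selected point lies within r of it:

```python
import numpy as np

def extract_subset(X, r):
    """Greedy, leader-style selection of representative row indices."""
    reps = [0]                                   # start from the first point
    for i in range(1, len(X)):
        d = np.linalg.norm(X[reps] - X[i], axis=1)
        if d.min() > r:                          # far from all representatives
            reps.append(i)
    return np.array(reps)

X = np.random.default_rng(0).normal(size=(2000, 3))
idx = extract_subset(X, r=0.8)
print(f"kept {len(idx)} of {len(X)} points")
```

Larger r gives a smaller, coarser subset, so tuning this one value trades subset size against coverage.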


Author(s):  
Shaoqiang Wang ◽  
Shudong Wang ◽  
Song Zhang ◽  
Yifan Wang

Abstract We aim to automatically detect dynamic EEG signals in order to reduce the time cost of epilepsy diagnosis. In recognizing epileptic electroencephalogram (EEG) signals, traditional machine learning and statistical methods require manual feature engineering to show excellent results on a single data set, and the manually selected features may carry a bias and cannot be guaranteed to remain valid and generalizable on real-world data. In practical applications, deep learning methods can free people from feature engineering to a certain extent: as long as the focus is on improving data quality and quantity, the model can learn automatically and keep improving. In addition, deep learning methods can extract many features that are difficult for humans to perceive, making the resulting models more robust. Based on the design idea of the ResNeXt deep neural network, this paper designs a Time-ResNeXt network structure suitable for time-series EEG epilepsy detection. The accuracy of Time-ResNeXt in detecting EEG epilepsy reaches 91.50%. The Time-ResNeXt network structure produces extremely advanced performance on the benchmark data set (the Bern-Barcelona data set) and has great potential for improving clinical practice.
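A hedged sketch of a ResNeXt-style 1-D residual block for EEG windows (the channel counts, cardinality, and window length are illustrative, not the paper's Time-ResNeXt configuration):

```python
import torch
import torch.nn as nn

class ResNeXtBlock1d(nn.Module):
    def __init__(self, channels, cardinality=8, bottleneck=32):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv1d(channels, bottleneck, 1, bias=False),
            nn.BatchNorm1d(bottleneck), nn.ReLU(inplace=True),
            # grouped conv realizes the split-transform-merge paths of ResNeXt
            nn.Conv1d(bottleneck, bottleneck, 3, padding=1,
                      groups=cardinality, bias=False),
            nn.BatchNorm1d(bottleneck), nn.ReLU(inplace=True),
            nn.Conv1d(bottleneck, channels, 1, bias=False),
            nn.BatchNorm1d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):                     # x: (batch, channels, time)
        return self.relu(x + self.branch(x))

x = torch.randn(4, 64, 512)                   # 4 EEG windows, 64 ch, 512 samples
print(ResNeXtBlock1d(64)(x).shape)            # torch.Size([4, 64, 512])
```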


2015 ◽  
Vol 4 (2) ◽  
pp. 317-342 ◽  
Author(s):  
Daniel M. Kselman ◽  
Eleanor Neff Powell ◽  
Joshua A. Tucker

This paper develops a novel argument as to the conditions under which new political parties will form in democratic states. Our approach hinges on the manner in which politicians evaluate the policy implications of new party entry alongside considerations of incumbency for its own sake. We demonstrate that if candidates care sufficiently about policy outcomes, then the likelihood of party entry should increase with the effective number of status quo parties in the party system. This relationship weakens, and eventually disappears, as politicians' emphasis on "office-seeking" motivations increases relative to their interest in public policy. We test these predictions with both aggregate electoral data from contemporary Europe and a data set on legislative volatility in Turkey, uncovering support for the argument that party system fragmentation should positively affect the likelihood of entry when policy-seeking motivations are relevant, but not otherwise.

