linear networks
Recently Published Documents


Total documents: 335 (five years: 43)
H-index: 24 (five years: 3)

Author(s):  
Maria Refinetti ◽  
Stéphane d'Ascoli ◽  
Ruben Ohana ◽  
Sebastian Goldt

Abstract Direct Feedback Alignment (DFA) is emerging as an efficient and biologically plausible alternative to backpropagation for training deep neural networks. Despite relying on random feedback weights for the backward pass, DFA successfully trains state-of-the-art models such as Transformers. On the other hand, it notoriously fails to train convolutional networks. An understanding of the inner workings of DFA to explain these diverging results remains elusive. Here, we propose a theory of feedback alignment algorithms. We first show that learning in shallow networks proceeds in two steps: an alignment phase, where the model adapts its weights to align the approximate gradient with the true gradient of the loss function, is followed by a memorisation phase, where the model focuses on fitting the data. This two-step process has a degeneracy breaking effect: out of all the low-loss solutions in the landscape, a network trained with DFA naturally converges to the solution which maximises gradient alignment. We also identify a key quantity underlying alignment in deep linear networks: the conditioning of the alignment matrices. The latter enables a detailed understanding of the impact of data structure on alignment, and suggests a simple explanation for the well-known failure of DFA to train convolutional neural networks. Numerical experiments on MNIST and CIFAR10 clearly demonstrate degeneracy breaking in deep non-linear networks and show that the align-then-memorise process occurs sequentially from the bottom layers of the network to the top.
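To make the idea of gradient alignment concrete, here is a minimal NumPy sketch of DFA on a two-layer network. It is not the authors' code: the synthetic linear teacher, the layer sizes, the tanh activation, the learning rate, and the function name `train_dfa` are all assumptions chosen for illustration. The hidden layer is updated with a fixed random feedback matrix `B` in place of the transposed output weights, and the cosine between the DFA update and the true backpropagation gradient is recorded at every step.

```python
import numpy as np

def train_dfa(steps=300, lr=0.05, seed=0):
    """Train a two-layer tanh network with DFA on synthetic data (illustrative setup)."""
    rng = np.random.default_rng(seed)
    n, d, h, k = 256, 20, 32, 5                      # samples, inputs, hidden units, outputs
    X = rng.standard_normal((n, d))
    Y = X @ rng.standard_normal((d, k)) / np.sqrt(d)  # hypothetical linear teacher
    W1 = rng.standard_normal((d, h)) / np.sqrt(d)
    W2 = 0.01 * rng.standard_normal((h, k))
    B = rng.standard_normal((k, h)) / np.sqrt(k)      # fixed random feedback weights

    losses, cosines = [], []
    for _ in range(steps):
        A = np.tanh(X @ W1)                           # hidden activations
        E = A @ W2 - Y                                # output error
        losses.append(0.5 * np.mean(np.sum(E**2, axis=1)))

        dact = 1.0 - A**2                             # derivative of tanh
        g_bp = X.T @ ((E @ W2.T) * dact) / n          # true gradient w.r.t. W1
        g_dfa = X.T @ ((E @ B) * dact) / n            # DFA update: B replaces W2.T
        denom = np.linalg.norm(g_bp) * np.linalg.norm(g_dfa) + 1e-12
        cosines.append(np.sum(g_bp * g_dfa) / denom)  # gradient alignment

        W2 -= lr * (A.T @ E) / n                      # output layer: ordinary gradient step
        W1 -= lr * g_dfa                              # hidden layer: DFA step
    return np.array(losses), np.array(cosines)

if __name__ == "__main__":
    losses, cosines = train_dfa()
    print("loss: first", losses[0], "last", losses[-1])
    print("alignment: first", cosines[0], "last", cosines[-1])
```

Plotting `cosines` against `losses` is one way to inspect whether alignment changes before the loss does in this toy setting; the sketch makes no claim to reproduce the paper's experiments.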


2021 ◽  
pp. 127911
Author(s):  
A. Dellios ◽  
Peter D. Drummond ◽  
Bogdan Opanchuk ◽  
Run Yan Teh ◽  
Margaret D. Reid

2021 ◽  
Author(s):  
Federico Basili ◽  
Stefano Parrino ◽  
Giacomo Peruzzi ◽  
Alessandro Pozzebon

Author(s):  
Wei Huang ◽  
Weitao Du ◽  
Richard Yi Da Xu

The prevailing thinking is that orthogonal weights are crucial to enforcing dynamical isometry and speeding up training. The increase in learning speed that results from orthogonal initialization in linear networks is well established. However, while the same is believed to hold for nonlinear networks when the dynamical isometry condition is satisfied, the training dynamics behind this contention have not been thoroughly explored. In this work, we study the dynamics of ultra-wide networks across a range of architectures, including Fully Connected Networks (FCNs) and Convolutional Neural Networks (CNNs), with orthogonal initialization via the neural tangent kernel (NTK). Through a series of propositions and lemmas, we prove that two NTKs, one corresponding to Gaussian weights and one to orthogonal weights, are equal when the network width is infinite. Further, during training, the NTK of an orthogonally-initialized infinite-width network should theoretically remain constant. This suggests that orthogonal initialization cannot speed up training in the NTK (lazy training) regime, contrary to the prevailing thinking. To explore under what circumstances orthogonality can accelerate training, we conduct a thorough empirical investigation outside the NTK regime. We find that when the hyper-parameters are set to achieve a linear regime in the nonlinear activation, orthogonal initialization can improve the learning speed with a large learning rate or large depth.
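The dynamical isometry mentioned at the start of this abstract can be illustrated for a deep linear network with a short NumPy sketch. It is an illustration only and does not touch the NTK analysis: the depth, the width, and the function names `orthogonal` and `product_singular_values` are assumptions. The product of orthogonal weight matrices keeps every singular value at 1, whereas the product of variance-scaled Gaussian matrices spreads them widely.

```python
import numpy as np

def orthogonal(n, rng):
    """Draw an n x n orthogonal matrix via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))   # sign fix so the draw is not biased by QR conventions

def product_singular_values(depth, width, init, seed=0):
    """Singular values of the end-to-end map of a deep linear network at initialization."""
    rng = np.random.default_rng(seed)
    prod = np.eye(width)
    for _ in range(depth):
        if init == "orthogonal":
            w = orthogonal(width, rng)
        else:                         # Gaussian weights with variance 1/width
            w = rng.standard_normal((width, width)) / np.sqrt(width)
        prod = w @ prod
    return np.linalg.svd(prod, compute_uv=False)

if __name__ == "__main__":
    for init in ("orthogonal", "gaussian"):
        s = product_singular_values(depth=10, width=64, init=init)
        print(init, "max:", s.max(), "min:", s.min())
```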


Author(s):  
Т.С. Глотова ◽  
Д.В. Журавлёв ◽  
В.В. Глотов

Various types of microwave devices can be described using incident and reflected waves that propagate in the transmission lines connected to them. The relationship between these waves is described by the scattering matrix, or S-parameter matrix. Evaluating differential structures is necessary to ensure optimal circuit performance. Combined differential and common-mode (mixed-mode) scattering parameters (S-parameters) are well suited to accurate measurements of linear networks at radio frequencies. We present the transformation between standard S-parameters and mixed-mode S-parameters, together with a graphical comparison of standard and mixed-mode S-parameter losses. The mixed-mode S-parameters obtained with the described method agree well when the stimulus and response share the same mode (common or differential) and vary slightly when the modes differ. We fabricated a differential structure and measured it with a two-port vector network analyzer and a four-port mixed-mode network analyzer. The mode-conversion method can be used to predict the behavior of mixed-mode parameters with a traditional two-port vector network analyzer, but a four-port mixed-mode network analyzer is still required to accurately measure the effect of mode conversion in real integrated differential test structures.
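The transformation between standard and mixed-mode S-parameters is commonly written as a similarity transform, S_mm = M S M^T, with an orthogonal matrix M. The NumPy sketch below assumes a four-port device in which single-ended ports 1 and 2 form the first differential pair and ports 3 and 4 the second, and it orders the result as [[S_dd, S_dc], [S_cd, S_cc]]. Both the port pairing and the function names are assumptions for illustration; the abstract does not state which convention the authors use.

```python
import numpy as np

# Rows map single-ended waves to: differential pair 1, differential pair 2,
# common-mode pair 1, common-mode pair 2 (assumed pairing: ports 1-2 and 3-4).
M = np.array([
    [1, -1, 0,  0],
    [0,  0, 1, -1],
    [1,  1, 0,  0],
    [0,  0, 1,  1],
]) / np.sqrt(2)

def standard_to_mixed(s):
    """Convert a 4x4 single-ended S-matrix to mixed-mode form [[Sdd, Sdc], [Scd, Scc]]."""
    return M @ s @ M.T               # M is orthogonal, so its inverse is its transpose

def mixed_to_standard(s_mm):
    """Inverse transform back to single-ended S-parameters."""
    return M.T @ s_mm @ M

if __name__ == "__main__":
    # Two ideal, uncoupled through lines: port 1 <-> port 3 and port 2 <-> port 4.
    s = np.array([
        [0, 0, 1, 0],
        [0, 0, 0, 1],
        [1, 0, 0, 0],
        [0, 1, 0, 0],
    ], dtype=complex)
    s_mm = standard_to_mixed(s)
    print("Sdd:\n", s_mm[:2, :2])
    print("Sdc (mode conversion):\n", s_mm[:2, 2:])
```

For this ideal symmetric example the mode-conversion blocks vanish, which matches the expectation that a perfectly balanced structure does not convert between differential and common modes.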


Author(s):  
Johannes Nokkala ◽  
Rodrigo Martinez-Pena ◽  
Roberta Zambrini ◽  
Miguel C. Soriano

2021 ◽  
pp. 1-1
Author(s):  
Fatemeh Ghaffari ◽  
Seyed Pooya Shariatpanahi ◽  
Mahdi Jafari Siavoshani ◽  
Behnam Bahrak
