Abstract

This paper introduces two deep convolutional neural network training techniques that lead to more robust feature subspace separation in comparison to traditional training. Assume that the dataset has M labels. The first method creates M deep convolutional neural networks called $$\{\text{DCNN}_i\}_{i=1}^{M}$$. Each network $$\text{DCNN}_i$$ is composed of a convolutional neural network ($$\text{CNN}_i$$) and a fully connected neural network ($$\text{FCNN}_i$$). During training, a set of projection matrices $$\{\mathbf{P}_i\}_{i=1}^M$$ is created and adaptively updated as representations of the feature subspaces $$\{\mathcal{S}_i\}_{i=1}^M$$.
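For concreteness, a minimal sketch of one way to represent a subspace $$\mathcal{S}_i$$ by a projection matrix: take the span of the top-$$r$$ left singular vectors of the class-$$i$$ feature matrix. The rank $$r$$ and the plain SVD construction are assumptions for illustration; the paper's adaptive update rule is not reproduced here.

import numpy as np

def projection_matrix(features, r):
    # features: (d, n) array whose columns are CNN_i feature vectors of class i.
    # Assumed construction: S_i is spanned by the top-r left singular vectors.
    U, _, _ = np.linalg.svd(features, full_matrices=False)
    B = U[:, :r]        # orthonormal basis for S_i
    return B @ B.T      # P_i = B B^T, the orthogonal projector onto S_i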
A rejection value is computed for each training sample based on its projections onto the feature subspaces. Each $$\text{FCNN}_i$$ acts as a binary classifier with a cost function whose main parameter is the rejection value. A threshold value $$t_i$$ is determined for the $$i^{th}$$ network $$\text{DCNN}_i$$. A testing strategy utilizing $$\{t_i\}_{i=1}^M$$ is also introduced.
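One plausible reading of the rejection value is the residual norm of a feature vector after projection onto $$\mathcal{S}_i$$, with testing accepting only labels whose residual falls below the corresponding threshold $$t_i$$. Both the residual form and the argmin tie-break below are illustrative assumptions, not the paper's stated procedure.

import numpy as np

def rejection_value(f, P):
    # Distance from feature vector f to the subspace represented by projector P
    return np.linalg.norm(f - P @ f)

def classify(f, projectors, thresholds):
    # Keep labels whose rejection value passes the per-network threshold t_i,
    # then return the accepted label with the smallest residual (None if all reject).
    r = [rejection_value(f, P) for P in projectors]
    accepted = [i for i, (ri, ti) in enumerate(zip(r, thresholds)) if ri < ti]
    return min(accepted, key=lambda i: r[i]) if accepted else None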
The second method creates a single DCNN and computes a cost function whose parameters depend on subspace separation, measured by the geodesic distance on the Grassmannian manifold between each subspace $$\mathcal{S}_i$$ and the sum of all remaining subspaces $$\{\mathcal{S}_j\}_{j=1,\,j\ne i}^M$$.
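The geodesic (arc-length) distance on the Grassmannian is standardly computed from the principal angles between two subspaces, obtained as the arccosines of the singular values of $$\mathbf{B}_i^{\top}\mathbf{B}_j$$ for orthonormal bases $$\mathbf{B}_i,\mathbf{B}_j$$. A sketch of this standard formula follows; how the paper folds the distance into its cost function is not shown here.

import numpy as np

def grassmann_geodesic_distance(Bi, Bj):
    # Bi, Bj: (d, r) matrices with orthonormal columns spanning two subspaces.
    s = np.linalg.svd(Bi.T @ Bj, compute_uv=False)
    theta = np.arccos(np.clip(s, -1.0, 1.0))  # principal angles between subspaces
    return np.linalg.norm(theta)              # arc-length distance on the Grassmannian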
The proposed methods are tested using multiple network topologies. It is shown that while the first method works better for smaller networks, the second method performs better for complex architectures.