Deep Variational Autoencoder with Shallow Parallel Path for Top-N Recommendation (VASP)
Vojtěch Vančura
Faculty of Information Technology
Czech Technical University in Prague
Prague, Czech Republic
vancurv@fit.cvut.cz
Pavel Kordík
Faculty of Information Technology
Czech Technical University in Prague
Prague, Czech Republic
pavel.kordik@fit.cvut.cz
0000-0003-1433-0089
Abstract—The recently introduced EASE algorithm presents a simple and elegant way to solve the top-N recommendation task. In this paper, we introduce Neural EASE, which further improves the performance of this algorithm by incorporating techniques for training modern neural networks. There is also growing interest in the recsys community in using variational autoencoders (VAE) for this task. We introduce FLVAE, a deep autoencoder that benefits from multiple non-linear layers without an information bottleneck while not overfitting towards the identity. We show how to train FLVAE in parallel with Neural EASE, achieving state-of-the-art performance on the MovieLens 20M dataset and competitive results on the Netflix Prize dataset.
Index Terms—recommender systems, variational autoencoders,
I. INTRODUCTION
With the increasing amount of information on the Web, Recommender Systems (RS) are an important way to overcome infobesity. At the same time, companies such as Netflix, YouTube, Amazon, and Google make significant revenues from recommendations¹. Thus, RS have gained increasing attention over the past two decades.
As online companies grow, RS have to scale to millions of active users and millions of items. Training and recall speed are also increasingly important, as the available content often changes dynamically and RS need to react in real time.
The proper evaluation of RS is also an increasingly important topic, as offline evaluation is often a biased predictor of online performance [17]. For offline evaluation, the recsys community has shifted towards Top-N approaches [9], because evaluating performance by the root mean squared error of a predicted rating matrix can be very misleading.
In the Top-N recommendation scenario [3], the RS recommends the N most relevant items to every user. This is the typical case in various domains, including media, news, and e-commerce.
Various approaches have been proposed for solving the Top-N recommendation task, for example, collaborative filtering with matrix factorization. Recently, sparse-data autoencoders [10], [13], [16], [18], [19], [23] have gained much attention and provided state-of-the-art results on this task. We examined various proposed models, including denoising autoencoders, variational autoencoders, and the shallow autoencoder EASE [20], which, despite being a simple linear model, provides competitive and explainable results while addressing the biggest problem of sparse autoencoders: overfitting towards the identity.
¹ According to [25], 80% of movies on Netflix and 60% of videos on YouTube are watched based on recommendations; recommendations are responsible for 35% of sales revenue on Amazon [22].
Inspired by [2], our motivation was to build an RS model that is as elegant and explainable as EASE while leveraging the potential of deep autoencoders to model complex non-linear patterns in the data.
To do this, we had to overcome several issues, most importantly overfitting towards the identity.
Traditionally, overfitting towards the identity is addressed by using dropout in the input layer [10], [13], [19]. However, this approach is not effective enough and does not enable truly deep architectures.
In this work, we make three major contributions to address the issues mentioned above. We propose:
• the use of focal loss for training autoencoders for Top-N recommendation;
• a simple yet effective data augmentation technique that prevents Top-N recommending autoencoders from overfitting towards the identity;
• a joint-learning technique based on the Hadamard product for training combinations of different models.
As a demonstration, we build VASP, a Variational Autoencoder with a Shallow Parallel Path. VASP combines a deep variational autoencoder and a neural variant of the shallow EASE, trained jointly to model both linear and non-linear patterns in the data. VASP achieves state-of-the-art performance on the MovieLens 20M dataset and competitive results on the Netflix Prize dataset.
II. RELATED WORK
Matrix Factorization (MF) has been the model of first choice for many years, ever since the team “BellKor’s Pragmatic Chaos” won the Netflix Prize [12].
In 2015, Sedhain et al. proposed AutoRec [18], an autoencoder-based model for collaborative filtering with explicit ratings that outperformed the baseline models of the time.
arXiv:2102.05774v1 [cs.LG] 10 Feb 2021
After the emergence of variational autoencoders (VAE), Liang et al. proposed the collaborative filtering model MultVAE in 2018 [13]. This model uses a multinomial likelihood for the data distribution.
H. Steck proposed EASE [20], the Embarrassingly Shallow Autoencoder, which, unlike deep architectures, has no hidden layers. When introduced, this approach was able to beat SOTA models.
Several techniques for improving MultVAE have been proposed recently. RecVAE [19] uses a separate regularization term in the form of the KL divergence between the current parameter distribution and the distribution from the previous training step, which prevents instability during training.
H+VAMP [10] implements a variational autoencoder with a variational mixture of posteriors prior (VampPrior) to learn better latent representations of user-item interactions.
During the evolution of recommender systems, many simple, shallow (linear, wide) architectures and complex deep architectures have been proposed. Cheng et al. combined these two approaches in a single framework called Wide & Deep Learning [2] and introduced a technique called joint training. The authors also point out the distinction between joint training and ensembling. We are inspired by this work, but our approach is quite different. We do not process item attributes, only the interactions; therefore, the deep path is not designed to encode items but to find non-linear interaction patterns. The voting scheme is also different.
In [4], the authors propose ensembling pre-trained recommender models with a variational autoencoder. However, joint learning of such a model seems problematic from the perspective of scalability and practical usability.
Many other deep learning techniques originally developed for computer vision or natural language processing were later successfully used in other fields, such as recommender systems. Residual networks [7] are a good example. Another example is the use of the approach from [8] for the dense layers in RecVAE [19].
We follow this trend by adopting Focal Loss (FL) [14] for recommender systems. FL was introduced for object detection to address the imbalance between the background class and the other classes. It down-weights the loss of well-classified examples, so the loss is relatively higher for training examples that are difficult to classify. This fits the situation in collaborative filtering well, where some items are much more popular than others. Niche items are more difficult to recommend because they have few interactions. A higher loss for these items pushes the recommender system to focus more on cold-start and niche items.
Another essential idea when training an autoencoder on sparse data is to prevent overfitting towards the identity function between the input and the output layer of the autoencoder [21]. In [24], the authors proposed the Split-Brain Autoencoder, which prevents learning the identity by splitting the input image into two separate channels: the grayscale channel $X_1$ and the color channels $X_2$. Learning is then performed separately by training two networks: $F_1$ performs automatic colorization by learning $X_2$ from $X_1$, and $F_2$ makes a grayscale prediction by learning $X_1$ from $X_2$. In contrast, our approach uses only one neural network, with automated data augmentation as a preprocessing step.
Fig. 1. VASP Architecture
III. OUR APPROACH
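Following the notation of [13], we index users as $u \in \{1, \dots, U\}$ and items as $i \in \{1, \dots, I\}$, and denote the user-item interaction matrix as $\mathbf{X} \in \mathbb{N}^{U \times I}$. The lowercase $\mathbf{x}_u = [x_{u1}, x_{u2}, \dots, x_{uI}]^\top \in \mathbb{N}^{I}$ denotes the interaction history of user $u$, and $\hat{\mathbf{x}}_u = [\hat{x}_{u1}, \hat{x}_{u2}, \dots, \hat{x}_{uI}]^\top \in \mathbb{R}^{I}$ denotes the predicted scores for user $u$.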
A. Neural EASE (NEASE)
Following [20], the EASE model can be described as
$$\hat{\mathbf{x}}_u = \mathbf{W} \cdot \mathbf{x}_u, \tag{1}$$
where $\mathbf{W} \in \mathbb{R}^{I \times I}$ is the weight matrix. The diagonal of $\mathbf{W}$ is constrained to zero to prevent learning the identity function between the input and the output. In [20], the authors proposed using the squared loss between the data $\mathbf{x}_u$ and the predicted scores $\hat{\mathbf{x}}_u$ because this training objective has a closed-form solution. The authors also suggest that more complex loss functions may lead to better prediction accuracy at a higher computational cost.
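For reference, the closed-form solution mentioned above can be sketched in a few lines. The sketch follows the solution derived in [20], $\mathbf{P} = (\mathbf{X}^\top \mathbf{X} + \lambda \mathbf{I})^{-1}$ and $B_{ij} = -P_{ij} / P_{jj}$ for $i \neq j$ with a zero diagonal, where $\lambda$ is the L2-regularization weight used in [20]. It is plain Python intended only for small toy matrices, and the helper names (`invert`, `ease_weights`, `ease_scores`) are ours. Because [20] writes predictions as a row vector multiplied by $\mathbf{B}$, the matrix $\mathbf{W}$ in (1) corresponds to $\mathbf{B}^\top$.

```python
# Sketch of the closed-form EASE solution of [20] for small toy data (pure Python).
# lam is the L2-regularization weight from [20]; all function names are ours.


def invert(matrix):
    """Invert a small dense square matrix with Gauss-Jordan elimination."""
    n = len(matrix)
    # Augment [A | I] and reduce the left half to the identity.
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ease_weights(X, lam):
    """Item-item matrix B with zero diagonal: B_ij = -P_ij / P_jj, P = (X^T X + lam*I)^-1."""
    n_items = len(X[0])
    gram = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
             for j in range(n_items)] for i in range(n_items)]
    P = invert(gram)
    return [[0.0 if i == j else -P[i][j] / P[j][j] for j in range(n_items)]
            for i in range(n_items)]


def ease_scores(x_u, B):
    """Scores for one user, x_u^T B, which equals W . x_u in Eq. (1) with W = B^T."""
    return [sum(x_u[i] * B[i][j] for i in range(len(x_u))) for j in range(len(B))]


# Usage on a toy matrix with 4 users and 3 items.
X = [[1, 1, 0], [0, 1, 1], [1, 0, 1], [1, 1, 1]]
B = ease_weights(X, lam=0.5)
print(ease_scores([1, 0, 0], B))
```

The diagonal of `B` is exactly zero by construction, so an item never contributes to its own score.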
To enable running the model in parallel to a deep autoen-
coder, we interpret the EASE model (1) as a single-layer
perceptron without bias nodes and with forced zeros on the
diagonal, which can be trained with any suitable loss function
using backpropagation.
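The single-layer perceptron view can be illustrated with a minimal sketch. It assumes plain gradient descent on the squared loss and enforces the zero diagonal by resetting it after every update, which is our stand-in for a kernel constraint; the learning rate, the number of epochs, and all function names are illustrative. Any other differentiable loss could replace `squared_loss` together with its gradient.

```python
# Sketch: NEASE as a bias-free single-layer perceptron x_hat_u = W . x_u,
# trained by gradient descent with the diagonal of W projected back to zero.


def forward(W, x):
    """Compute W . x without a bias term."""
    return [sum(W[i][j] * x[j] for j in range(len(x))) for i in range(len(W))]


def squared_loss(W, X):
    """Mean over users of the summed squared reconstruction error."""
    total = 0.0
    for x in X:
        total += sum((a - b) ** 2 for a, b in zip(forward(W, x), x))
    return total / len(X)


def train_nease(X, lr=0.05, epochs=200):
    n = len(X[0])
    W = [[0.0] * n for _ in range(n)]
    for _ in range(epochs):
        grad = [[0.0] * n for _ in range(n)]
        for x in X:
            err = [a - b for a, b in zip(forward(W, x), x)]
            for i in range(n):
                for j in range(n):
                    grad[i][j] += 2.0 * err[i] * x[j] / len(X)
        for i in range(n):
            for j in range(n):
                W[i][j] -= lr * grad[i][j]
            W[i][i] = 0.0  # projection step: keeps the identity out of reach
    return W


X = [[1, 1, 0], [0, 1, 1], [1, 0, 1], [1, 1, 1]]
W = train_nease(X)
print(squared_loss(W, X))
```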
Our experiments with several different loss functions are
described in Section VI.
B. MultVAE with focal loss (FLVAE)
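Consistently with other variational autoencoders [11], the generative process of the MultVAE model starts by sampling a $k$-dimensional latent representation $\mathbf{z}_u$ from a standard Gaussian prior [13]. Then, under the assumption that the interaction history $\mathbf{x}_u$ has been drawn from a multinomial distribution, a neural network $f_\theta(\cdot)$ is used to produce a probability distribution $\pi(\mathbf{z}_u)$ over the $I$ items:
$$\mathbf{z}_u \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_k), \qquad \pi(\mathbf{z}_u) \propto \exp\{f_\theta(\mathbf{z}_u)\}, \qquad \mathbf{x}_u \sim \mathrm{Mult}(N_u, \pi(\mathbf{z}_u)),$$
where $N_u = \sum_i x_{ui}$ is the total number of interactions of user $u$.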
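The generative process can be simulated directly. The sketch below assumes a hypothetical linear decoder `f_theta` with fixed weights `THETA` (k = 2 latent dimensions, I = 4 items) standing in for the neural network, and uses only the Python standard library.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]


def sample_user(f_theta, k, n_interactions, rng):
    """Draw an interaction-count vector x_u following the MultVAE generative process."""
    z = [rng.gauss(0.0, 1.0) for _ in range(k)]  # z_u ~ N(0, I_k)
    pi = softmax(f_theta(z))                     # pi(z_u) proportional to exp{f_theta(z_u)}
    counts = [0] * len(pi)
    for item in rng.choices(range(len(pi)), weights=pi, k=n_interactions):
        counts[item] += 1                        # x_u ~ Mult(N_u, pi(z_u))
    return counts


# Hypothetical linear decoder with k = 2 and I = 4.
THETA = [[0.5, -1.0], [1.0, 0.2], [-0.3, 0.8], [0.0, 0.0]]


def f_theta(z):
    return [sum(w * zk for w, zk in zip(row, z)) for row in THETA]


rng = random.Random(0)
x_u = sample_user(f_theta, k=2, n_interactions=10, rng=rng)
print(x_u)
```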
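The variational autoencoder then aims to maximize the average marginal likelihood $p(\mathbf{x}_u) = \int p(\mathbf{x}_u \mid \mathbf{z}_u)\, p(\mathbf{z}_u)\, d\mathbf{z}_u$. Since $f_\theta(\cdot)$ is a neural network, the posterior $p(\mathbf{z}_u \mid \mathbf{x}_u)$ becomes intractable, and the log marginal likelihood is approximated with the evidence lower bound (ELBO):
$$\log p(\mathbf{x}_u) \geq \mathbb{E}_{q}\left[\log p(\mathbf{x}_u \mid \mathbf{z}_u) - \mathrm{KL}\left(q(\mathbf{z}_u \mid \mathbf{x}_u) \,\|\, p(\mathbf{z}_u)\right)\right], \tag{2}$$
where $q(\mathbf{z}_u \mid \mathbf{x}_u; \phi)$ is the variational approximation of the posterior distribution, $p(\mathbf{z}_u)$ is the prior distribution, $\phi$ and $\theta$ are the parameters of $q(\mathbf{z}_u \mid \mathbf{x}_u)$ and $p(\mathbf{x}_u \mid \mathbf{z}_u)$, respectively, $\log p(\mathbf{x}_u \mid \mathbf{z}_u)$ is the log-likelihood for user $u$, and KL is the Kullback-Leibler divergence.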
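A single-sample estimate of (2) can be sketched as follows. As in [13], we assume a Gaussian variational posterior with diagonal covariance whose mean `mu` and log-variance `log_var` would come from an encoder; here they are hypothetical fixed values, and `f_theta` is again a toy linear decoder. The multinomial coefficient is dropped because it does not depend on the model parameters.

```python
import math
import random


def log_softmax(logits):
    """Numerically stable log-softmax."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_norm for v in logits]


def gaussian_kl(mu, log_var):
    """Closed-form KL(N(mu, diag(exp(log_var))) || N(0, I))."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


def elbo_estimate(x_u, mu, log_var, f_theta, rng):
    """Single-sample Monte Carlo estimate of the ELBO (2) for one user."""
    # Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I).
    z = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]
    log_pi = log_softmax(f_theta(z))
    # Multinomial log-likelihood up to the constant multinomial coefficient.
    log_lik = sum(x * lp for x, lp in zip(x_u, log_pi))
    return log_lik - gaussian_kl(mu, log_var)


# Hypothetical decoder (k = 2 latent dimensions, I = 3 items) and encoder outputs.
THETA = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]


def f_theta(z):
    return [sum(w * zk for w, zk in zip(row, z)) for row in THETA]


x_u = [1, 1, 0]
mu, log_var = [0.3, -0.2], [-1.0, -0.5]
print(elbo_estimate(x_u, mu, log_var, f_theta, random.Random(0)))
```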
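FL [14] is defined as
$$\mathrm{FL}(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t), \tag{3}$$
where $p_t$ is
$$p_t = \begin{cases} \hat{x}_{ui} & \text{if } x_{ui} = 1, \\ 1 - \hat{x}_{ui} & \text{otherwise}, \end{cases} \tag{4}$$
and $\alpha_t$ and $\gamma$ act as hyperparameters.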
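Equations (3) and (4) translate into a per-item loss for sigmoid outputs. This sketch assumes the α-balancing convention of [14], where $\alpha_t = \alpha$ for positive items and $1 - \alpha$ otherwise, with the defaults $\alpha = 0.25$ and $\gamma = 2.0$ reported in Section V-D. It compares the focal loss with plain binary cross-entropy to show how confidently predicted items are down-weighted.

```python
import math


def focal_loss(x_ui, x_hat_ui, alpha=0.25, gamma=2.0, eps=1e-7):
    """Focal loss (3) for one user-item pair with a sigmoid output x_hat_ui in (0, 1)."""
    p_t = x_hat_ui if x_ui == 1 else 1.0 - x_hat_ui   # Eq. (4)
    alpha_t = alpha if x_ui == 1 else 1.0 - alpha     # alpha-balancing of [14] (assumption)
    p_t = min(max(p_t, eps), 1.0 - eps)               # avoid log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def bce(x_ui, x_hat_ui, eps=1e-7):
    """Plain binary cross-entropy, for comparison."""
    p_t = x_hat_ui if x_ui == 1 else 1.0 - x_hat_ui
    p_t = min(max(p_t, eps), 1.0 - eps)
    return -math.log(p_t)


# An "easy" positive (already scored 0.9) versus a "hard" one (scored 0.1).
for score in (0.9, 0.1):
    print(score, bce(1, score), focal_loss(1, score))
```

With $\gamma = 0$ and $\alpha = 0.5$, the focal loss reduces to half of the binary cross-entropy, which is a quick sanity check for any implementation.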
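Since maximising the log-likelihood is the same as minimising the cross-entropy, and focal loss can be understood as a form of weighted cross-entropy, the ELBO (2) can be rewritten as
$$\log p(\mathbf{x}_u) \geq \mathbb{E}_{q}\left[\alpha_t \left(1 - p(\mathbf{x}_u \mid \mathbf{z}_u)\right)^{\gamma} \log p(\mathbf{x}_u \mid \mathbf{z}_u) - \mathrm{KL}\left(q(\mathbf{z}_u \mid \mathbf{x}_u) \,\|\, p(\mathbf{z}_u)\right)\right]. \tag{5}$$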
C. VASP
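A recommender model $m$ can be expressed as a function $m(\cdot)\colon \mathbf{x}_u \mapsto \hat{\mathbf{x}}_u$. If $m$ uses a sigmoid function on the output, then $\hat{x}_{ui} \in [0, 1]$ for every item $i$. We propose joint learning with the Hadamard product [15], denoted $\odot$, for combining any number $n \in \mathbb{N}$ of recommender models $m_1, \dots, m_n$ as
$$m_{\odot}(\mathbf{x}_u) = \bigodot_{j=1}^{n} m_j(\mathbf{x}_u) = m_1(\mathbf{x}_u) \odot m_2(\mathbf{x}_u) \odot \dots \odot m_n(\mathbf{x}_u). \tag{6}$$
Since $m_j(\mathbf{x}_u) = \hat{\mathbf{x}}_{ju}$ and $\mathbf{x}_u$ is the same for all models in the combination, (6) can be directly rewritten as
$$m_{\odot}(\mathbf{x}_u) = \bigodot_{j=1}^{n} \hat{\mathbf{x}}_{ju}, \tag{7}$$
while
$$\hat{\mathbf{x}}_{ju} \in [0, 1]^{I} \quad \text{for } j = 1, \dots, n. \tag{8}$$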
In Wide & Deep [2], the networks are combined by summation (a logical OR), so the activation of a single network is sufficient for a positive output. In VASP, we use the Hadamard product (a logical AND), meaning that both networks have to agree.
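The difference between the two combination rules can be seen on toy scores. In the sketch below, `hadamard_combine` implements (6)-(7) for any number of score vectors, while `logit_sum_combine` mimics a Wide & Deep-style sum of logits followed by a sigmoid; the logits are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def hadamard_combine(*score_vectors):
    """Eqs. (6)-(7): element-wise product of per-item scores from n models, each in [0, 1]."""
    combined = [1.0] * len(score_vectors[0])
    for scores in score_vectors:
        combined = [c * s for c, s in zip(combined, scores)]
    return combined


def logit_sum_combine(*logit_vectors):
    """Wide & Deep-style combination for contrast: sum the logits, then apply a sigmoid."""
    return [sigmoid(sum(vals)) for vals in zip(*logit_vectors)]


# Hypothetical logits of two models for three items:
# item 0: both confident, item 1: only model A confident, item 2: neither.
logits_a = [4.0, 6.0, -3.0]
logits_b = [4.0, -2.0, -3.0]
print(hadamard_combine([sigmoid(v) for v in logits_a], [sigmoid(v) for v in logits_b]))
print(logit_sum_combine(logits_a, logits_b))
```

For item 1, where only one model is confident, the product stays low while the logit sum remains high, which is the AND versus OR behaviour described above.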
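The proposed VASP architecture (Fig. 1) combines two models, NEASE and FLVAE, by element-wise multiplication (7):
$$m_{\mathrm{VASP}}(\mathbf{x}_u) = m_{\mathrm{FLVAE}}(\mathbf{x}_u) \odot m_{\mathrm{NEASE}}(\mathbf{x}_u) = \hat{\mathbf{x}}_{\mathrm{FLVAE},u} \odot \hat{\mathbf{x}}_{\mathrm{NEASE},u}. \tag{9}$$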
To satisfy condition (8), we add a sigmoid function to (1):
$$\hat{\mathbf{x}}_{\mathrm{NEASE},u} = \sigma(\mathbf{W} \cdot \mathbf{x}_u). \tag{10}$$
Since $m_{\mathrm{FLVAE}}$ and $m_{\mathrm{NEASE}}$ are both fully differentiable, $m_{\mathrm{VASP}}$ is also fully differentiable, and backpropagation can be used to optimize it.
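Differentiability of (9) follows from the product rule: the gradient reaching each path is scaled by the other path's output. The sketch below uses a hypothetical fixed vector `deep` standing in for the FLVAE output and a toy weight matrix `W`, and computes the analytic gradients of one VASP output; they can be verified numerically with finite differences.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def nease_scores(W, x):
    """Eq. (10): sigmoid(W . x); W is assumed to have a zero diagonal."""
    return [sigmoid(sum(w * xj for w, xj in zip(row, x))) for row in W]


def vasp_scores(deep_scores, W, x):
    """Eq. (9): Hadamard product of the deep-path scores and the NEASE scores."""
    return [d * s for d, s in zip(deep_scores, nease_scores(W, x))]


def vasp_grad_row(deep_scores, W, x, i):
    """Analytic gradients of output i w.r.t. the deep score d_i and row i of W.

    Product rule: d(d_i * s_i)/d(d_i) = s_i and
    d(d_i * s_i)/dW_ij = d_i * s_i * (1 - s_i) * x_j.
    """
    s_i = nease_scores(W, x)[i]
    return s_i, [deep_scores[i] * s_i * (1.0 - s_i) * xj for xj in x]


# Hypothetical toy values: 3 items and a fixed stand-in for the FLVAE output.
W = [[0.0, 0.8, -0.4], [0.5, 0.0, 0.3], [-0.2, 0.6, 0.0]]
x = [1.0, 0.0, 1.0]
deep = [0.7, 0.9, 0.2]
print(vasp_scores(deep, W, x))
print(vasp_grad_row(deep, W, x, 1))
```

One consequence of this design is that when either path predicts a score close to zero for an item, the gradient flowing into the other path for that item is also small.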
D. Data augmentation to prevent learning identity
Inspired by [24], we prevent learning the identity by randomly splitting the input interactions $\mathbf{x}_u$ into two parts, $\mathbf{x}_{Au}$ and $\mathbf{x}_{Bu}$, before every training epoch, so that
$$x_{Aui} = \begin{cases} 0 & \text{if } x_{ui} = 0, \\ 1 - x_{Bui} & \text{otherwise}, \end{cases} \qquad x_{Bui} = \begin{cases} 0 & \text{if } x_{ui} = 0, \\ 1 - x_{Aui} & \text{otherwise}, \end{cases} \tag{11}$$
while
$$\sum_{i=1}^{I} x_{Aui} \approx \sum_{i=1}^{I} x_{Bui}. \tag{12}$$
In one training step, the autoencoder learns $\mathbf{x}_{Bu}$ from $\mathbf{x}_{Au}$, and in another, it learns $\mathbf{x}_{Au}$ from $\mathbf{x}_{Bu}$. Thus, the autoencoder still sees all the data but never sees the identity mapping and cannot learn it (as demonstrated in Fig. 2).
Fig. 2. Data augmentation to prevent overfitting towards identity
Fig. 3. Explaining VASP on the MovieLens20M dataset: the output of the joint model (left) was linearly decomposed into the EASE component (middle) and the FLVAE component (right) to demonstrate that EASE learns the more apparent linear dependencies and FLVAE the non-linear ones. Red = horror movies, Blue = children's movies, Green = western movies, and Yellow = noir.
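The splitting scheme in (11) and (12) can be sketched as below, assuming binary interaction vectors and an exactly balanced random split, so that the two halves differ in size by at most one. A fresh split would be drawn before every epoch; the function names are ours.

```python
import random


def split_interactions(x_u, rng):
    """Randomly split a binary interaction vector into disjoint halves x_A and x_B (Eqs. 11-12)."""
    positives = [i for i, v in enumerate(x_u) if v == 1]
    rng.shuffle(positives)
    half = set(positives[: len(positives) // 2])
    x_a = [1 if (v == 1 and i in half) else 0 for i, v in enumerate(x_u)]
    x_b = [1 if (v == 1 and i not in half) else 0 for i, v in enumerate(x_u)]
    return x_a, x_b


def augmented_pairs(X, rng):
    """(input, target) pairs for one epoch: A -> B and B -> A for every user."""
    pairs = []
    for x_u in X:
        x_a, x_b = split_interactions(x_u, rng)
        pairs.append((x_a, x_b))  # learn x_B by showing x_A
        pairs.append((x_b, x_a))  # learn x_A by showing x_B
    return pairs


rng = random.Random(0)
print(augmented_pairs([[1, 0, 1, 1, 0, 1]], rng))
```

Because the input and the target of each pair never share an item, reproducing the input cannot reduce the loss.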
IV. INTERPRETING VASP
To analyze the effect of the shallow and deep components of the proposed ensemble model, we implemented a workflow that visualizes the movie embeddings learned by the individual models and by the ensemble.
The principle was as follows. We performed a sensitivity analysis of the models by feeding one-hot vectors to the input and generating output probabilities (reconstructions). These probabilities were then transformed to distances by linear scaling and projected into a t-SNE plot. See Fig. 3 for further details.
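The sensitivity analysis can be sketched as follows. The linear scaling to distances is not fully specified above, so the mapping used here (min-max scaling of the off-diagonal scores, then one minus the average of both directions to obtain a symmetric matrix) is our assumption; `model` stands for any callable that maps an input vector to output scores, and the toy weights are hypothetical. The resulting matrix could then be passed to a t-SNE implementation that accepts precomputed distances, for example scikit-learn's `TSNE(metric="precomputed", init="random")`.

```python
def one_hot(i, n):
    return [1.0 if j == i else 0.0 for j in range(n)]


def sensitivity_matrix(model, n_items):
    """Row i holds the model's output scores when only item i is given as input."""
    return [model(one_hot(i, n_items)) for i in range(n_items)]


def to_distances(P):
    """Min-max scale the scores and turn them into symmetric distances (assumed mapping)."""
    flat = [v for i, row in enumerate(P) for j, v in enumerate(row) if i != j]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0
    n = len(P)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                sim_ij = (P[i][j] - lo) / span
                sim_ji = (P[j][i] - lo) / span
                D[i][j] = 1.0 - 0.5 * (sim_ij + sim_ji)
    return D


# Hypothetical linear model in which items 0 and 1 are strongly related.
W_TOY = [[0.0, 0.9, 0.1], [0.9, 0.0, 0.1], [0.1, 0.1, 0.0]]


def toy_model(x):
    return [sum(w * v for w, v in zip(row, x)) for row in W_TOY]


print(to_distances(sensitivity_matrix(toy_model, 3)))
```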
V. EXPERIMENTAL SETUP
To verify our assumptions experimentally, we implemented three models: a neural variant of EASE, a deep variational autoencoder, and VASP, a joint-learning model consisting of NEASE and the deep variational autoencoder combined by the Hadamard product as described in Section III-C. We trained these models on two datasets, MovieLens20M and the Netflix Prize dataset, and compared the results with various baselines, including current SOTA models.
A. Datasets
1) MovieLens20M [6]: A dataset of 27,000 movies rated by 138,000 users, with 20 million ratings in total. We preprocessed the dataset according to [13]: since this dataset contains explicit ratings, we converted the data to implicit interactions by considering only ratings of four or higher as valid interactions². Only users with five or more interactions remain in the dataset after preprocessing. We randomly chose 10,000 users as a test set and trained our models on the rest.
2) Netflix Prize Dataset [1]: The dataset from the Netflix Prize, with over 100 million ratings from 480,000 randomly chosen, anonymous Netflix customers on more than 17,000 movie titles. We converted explicit ratings to implicit interactions by the same method as for MovieLens20M. We randomly chose 40,000 users as a test set and trained our models on the rest.
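The preprocessing of both datasets can be sketched as below, assuming the raw data is an iterable of `(user, item, rating)` tuples. The thresholds follow the text (ratings of four or higher, at least five interactions), and `n_test` would be 10,000 for MovieLens20M and 40,000 for Netflix; the function names and the fixed seed are ours.

```python
import random
from collections import defaultdict


def preprocess(ratings, min_rating=4.0, min_interactions=5):
    """Binarize explicit ratings and keep only users with enough interactions."""
    users = defaultdict(set)
    for user, item, rating in ratings:
        if rating >= min_rating:  # only ratings of four or higher count as interactions
            users[user].add(item)
    return {u: items for u, items in users.items() if len(items) >= min_interactions}


def split_users(user_items, n_test, seed=0):
    """Randomly hold out n_test users for testing; the remaining users are for training."""
    users = sorted(user_items)
    rng = random.Random(seed)
    test = set(rng.sample(users, n_test))
    train = [u for u in users if u not in test]
    return train, sorted(test)


ratings = [(1, i, 5.0) for i in range(5)] + [(2, i, 3.0) for i in range(6)]
kept = preprocess(ratings)
print(sorted(kept))
```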
B. Metrics
We evaluate our models in the same way as in [13]. First, we sample 80% of each test user's interactions as input for the model, and then we measure Recall@k and NDCG@k of the predicted interactions against the remaining 20% of the user's interactions.
NDCG for the top-k recommended items, denoted NDCG@k, is defined as
$$\mathrm{NDCG@}k = \frac{\mathrm{DCG@}k}{\mathrm{IDCG@}k}, \tag{13}$$
where
$$\mathrm{DCG@}k = \sum_{i=1}^{k} \frac{2^{rel_i} - 1}{\log_2(i + 1)} \tag{14}$$
and
$$\mathrm{IDCG@}k = \sum_{i=1}^{|R_k|} \frac{2^{rel_i} - 1}{\log_2(i + 1)}. \tag{15}$$
Here, $rel_i$ is the relevance of the recommendation at position $i$ (for IDCG, in the ideal ordering), and $R_k$ is the list of the held-out 20% interactions acting as the "true" user interactions.
Recall for the top-k recommended items, denoted Recall@k, is defined as
$$\mathrm{Recall@}k = \frac{|\hat{R}_k \cap R_k|}{|R_k|}, \tag{16}$$
where $\hat{R}_k$ is the list of the top-k items recommended by the evaluated model.
² Note that we use implicit interactions because this is the prevalent case in practical recommendation tasks.
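A sketch of both metrics with binary relevance (so $2^{rel_i} - 1$ equals 1 for a hit and 0 otherwise) is given below. It reads $|R_k|$ as $\min(k, |R|)$, i.e., the number of held-out items that can fit into a list of length $k$, which to our understanding matches the normalization in the evaluation code of [13]. The helper `top_k` excludes the 80% input items from the candidates, which we also assume follows [13]; all names are ours.

```python
import math


def top_k(scores, seen, k):
    """Rank items by score, excluding the input (fold-in) items, and keep the top k."""
    candidates = [i for i in range(len(scores)) if i not in seen]
    return sorted(candidates, key=lambda i: scores[i], reverse=True)[:k]


def ndcg_at_k(ranked, relevant, k):
    """NDCG@k with binary relevance, Eqs. (13)-(15); positions are 1-based in the formula."""
    dcg = sum(1.0 / math.log2(i + 2) for i, item in enumerate(ranked[:k]) if item in relevant)
    n_ideal = min(k, len(relevant))  # ideal list truncated at k
    idcg = sum(1.0 / math.log2(i + 2) for i in range(n_ideal))
    return dcg / idcg if idcg > 0 else 0.0


def recall_at_k(ranked, relevant, k):
    """Recall@k, Eq. (16), normalized by min(k, |R|)."""
    hits = len(set(ranked[:k]) & set(relevant))
    denom = min(k, len(relevant))
    return hits / denom if denom > 0 else 0.0


ranked = top_k([0.9, 0.8, 0.1, 0.5], seen={0}, k=2)
print(ranked, ndcg_at_k(ranked, {3}, 2), recall_at_k(ranked, {3}, 2))
```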
C. Baselines
We chose several autoencoder-based models for a top-
n recommendation, including MultDAE and MultVAE from
[13], EASE from [20] and current SOTA models, RecVAE [19]
and H+VAMP [10] as baselines for performance evaluation.
D. Implemented models
1) Neural EASE: NEASE was implemented as a dense layer with forced zeros on the diagonal, enforced by a kernel constraint. We evaluate three different loss functions: mean squared error, cosine proximity loss, and focal loss. Since EASE has forced zeros on the diagonal of its parameter matrix to prevent learning the identity, no data augmentation was used.
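For completeness, the two other losses can be sketched per user as follows. We assume the cosine proximity loss is the negative cosine similarity between $\mathbf{x}_u$ and $\hat{\mathbf{x}}_u$, which is the convention of the Keras `CosineSimilarity` loss. Unlike MSE, it is invariant to multiplying the predictions by a positive constant, which also leaves the item ranking unchanged.

```python
import math


def mse_loss(x_u, x_hat_u):
    """Mean squared error over all items of one user."""
    return sum((a - b) ** 2 for a, b in zip(x_u, x_hat_u)) / len(x_u)


def cosine_proximity_loss(x_u, x_hat_u, eps=1e-12):
    """Negative cosine similarity: minimizing it maximizes the cosine between x_u and x_hat_u."""
    dot = sum(a * b for a, b in zip(x_u, x_hat_u))
    norm = math.sqrt(sum(a * a for a in x_u)) * math.sqrt(sum(b * b for b in x_hat_u))
    return -dot / max(norm, eps)


x_u = [1.0, 0.0, 1.0]
print(mse_loss(x_u, [0.3, 0.1, 0.2]), cosine_proximity_loss(x_u, [0.3, 0.1, 0.2]))
```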
2) Variational Autoencoder: We implemented FLVAE with a densely connected residual network in both the encoder and the decoder. Sigmoid activation was used on the output. The data augmentation described in Section III-D was used to prevent the autoencoder from learning the identity. The default values $\alpha_t = 0.25$ and $\gamma = 2.0$ in (3) were used.
3) VASP: VASP is built by connecting NEASE and FLVAE with the Hadamard product as described in Section III-C. The hyperparameters were the same as for the plain FLVAE. We evaluated three variants: pre-trained EASE and FLVAE joined together as an ensemble, joint training from the start, and an alternating approach in which FLVAE and EASE were trained separately in alternating steps. We prevent our model from learning the identity by using the data augmentation described in Section III-D.
E. Hyperparameters
We used 2048 units for the latent space and 4096 units for the hidden layers. We used seven densely connected residual hidden layers in the encoder and five layers of the same architecture in the decoder.
We trained our model for 50 epochs with a learning rate of 0.00005 and batches of 1024 samples. Then, we lowered the learning rate to 0.00001 and trained for another 20 epochs. Finally, we performed fine-tuning with a learning rate of 0.000001 for another 20 epochs.
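The three-stage schedule can be written as a small function of the 0-based epoch index; the sketch below is ours, and the name `learning_rate` is hypothetical. In TensorFlow, such a function could be passed to the `tf.keras.callbacks.LearningRateScheduler` callback.

```python
def learning_rate(epoch):
    """Learning rate for a 0-based epoch index under the three-stage schedule above."""
    if epoch < 50:
        return 5e-5   # main training: 50 epochs
    if epoch < 70:
        return 1e-5   # lowered rate: another 20 epochs
    if epoch < 90:
        return 1e-6   # fine-tuning: another 20 epochs
    raise ValueError("the schedule covers 90 epochs in total")


print([learning_rate(e) for e in (0, 49, 50, 69, 70, 89)])
```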
All models were implemented in TensorFlow [5], and the source code with notes for reproducing the results is publicly available on our GitHub page.
VI. RESULTS AND DISCUSSION
Neural EASE: We evaluated three variants of the model that differ in the loss function: mean squared error, cosine loss, and focal loss (see Table I).
TABLE I
RESULTS WITH DIFFERENT LOSS FUNCTIONS FOR THE EASE MODEL ON
MOVIELENS20M DATASET.
Loss function      NDCG@100   Recall@20   Recall@50
MSE                0.425      0.393       0.523
Cosine proximity   0.431      0.403       0.532
Focal loss         0.377      0.343       0.426
We found that the cosine proximity loss leads to the best performance of the model, as the authors of [20] anticipated.
FLVAE: The authors of MultVAE reject deeper architectures, stating that "going deeper does not improve performance." We investigated the matter and believe that overfitting towards the identity is to blame. We address this issue by adopting the data augmentation described in Section III-D. This approach allowed us to build much bigger models than in [13], [19], or [10], where 200 units for the latent space and 600 units in the hidden layers were used.
VASP: We evaluated three methods of training models connected by the Hadamard product (see Table II).
TABLE II
RESULTS WITH DIFFERENT TRAINING APPROACH FOR THE VASP MODEL
ON MOVIELENS20M DATASET.
Training approach      NDCG@100   Recall@20   Recall@50
Pretrained ensemble    0.442      0.414       0.545
Alternating training   0.436      0.401       0.543
Joint learning         0.448      0.414       0.552
First, we connected the pre-trained models and evaluated them as an ensemble. Then, we initialized the joint model and trained it from scratch. Lastly, we experimented with the alternating approach, in which the weights of FLVAE were frozen in one step and the weights of the EASE model in the next. However, this approach did not perform better than joint learning from the start.
Finally, we compared our base models NEASE and FLVAE and the jointly learned ensemble VASP with state-of-the-art approaches (see Table III). Significantly best-performing models are in bold. VASP achieved the highest NDCG@100 on the MovieLens 20M dataset and matched the best Recall@20, reaching the SOTA for this dataset. It also performed well on the Netflix dataset, as the second-highest-ranking model.
In future experiments, we will carefully analyse the H+Vamp Gated model, which performed better on Netflix. We will try to put it in an ensemble with Neural EASE or use the idea of the variational mixture of posteriors to improve the performance of our deep FLVAE model.
VII. CONCLUSION
We showed that EASE is still a compelling Top-N recommendation model that can match current SOTA baselines.
TABLE III
RESULTS
               | MovieLens 20M                  | Netflix Prize Dataset
Model          | NDCG@100  Recall@20  Recall@50 | NDCG@100  Recall@20  Recall@50
Mult-DAE       | 0.419     0.387      0.524     | 0.380     0.344      0.438
Mult-VAE       | 0.426     0.395      0.537     | 0.386     0.351      0.444
EASE           | 0.420     0.391      0.521     | 0.393     0.362      0.445
RecVAE         | 0.442     0.414      0.553     | 0.394     0.361      0.452
H+Vamp Gated   | 0.445     0.413      0.551     | 0.409     0.376      0.463
Neural EASE    | 0.431     0.403      0.532     | 0.395     0.363      0.447
FLVAE          | 0.445     0.409      0.547     | 0.398     0.363      0.450
VASP           | 0.448     0.414      0.552     | 0.406     0.372      0.457
We proposed a data augmentation method to prevent overfitting towards the identity and showed experimentally that this method leads to better performance of autoencoders used for top-N recommendation.
We proposed a novel joint-learning technique for training multiple models together. Using it, we constructed VASP, a Variational Autoencoder with a Shallow Parallel Path, and showed experimentally that a variational autoencoder connected with a parallel simple shallow linear model can match current sophisticated SOTA models and even outperform them in some cases.
ACKNOWLEDGMENT
Our research has been supported by the Grant Agency of the Czech Technical University in Prague (SGS20/213/OHK3/3T/18), the Czech Science Foundation (GAČR 18-18080S), Recombee, and VUSTE-APIS.
REFERENCES
[1] James Bennett and Stan Lanning. The Netflix Prize. KDD Cup and
Workshop, 2007.
[2] Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar
Chandra, Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai,
Mustafa Ispir, Rohan Anil, Zakaria Haque, Lichan Hong, Vihan Jain, Xi-
aobing Liu, and Hemal Shah. Wide & Deep Learning for Recommender
Systems. RecSys 2017 - Proceedings of the 11th ACM Conference on
Recommender Systems, pages 396–397, jun 2016.
[3] Paolo Cremonesi, Yehuda Koren, and Roberto Turrin. Performance of
recommender algorithms on top-N recommendation tasks. In RecSys’10
- Proceedings of the 4th ACM Conference on Recommender Systems,
2010.
[4] Ahlem Drif, Houssem Eddine Zerrad, and Hocine Cherifi. Ensvae:
Ensemble variational autoencoders for recommendations. IEEE Access,
8:188335–188351, 2020.
[5] Martín Abadi et al. TensorFlow: Large-scale machine learning on
heterogeneous systems, 2015. Software available from tensorflow.org.
[6] F. Maxwell Harper and Joseph A. Konstan. The movielens datasets:
History and context. ACM Trans. Interact. Intell. Syst., 5(4), December
2015.
[7] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep resid-
ual learning for image recognition. Proceedings of the IEEE Computer
Society Conference on Computer Vision and Pattern Recognition, 2016-
December:770–778, 2016.
[8] Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q.
Weinberger. Densely connected convolutional networks. Proceedings
- 30th IEEE Conference on Computer Vision and Pattern Recognition,
CVPR 2017, 2017-January:2261–2269, 2017.
[9] George Karypis. Evaluation of item-based top-n recommendation
algorithms. In Proceedings of the tenth international conference on
Information and knowledge management, pages 247–254, 2001.
[10] Daeryong Kim and Bongwon Suh. Enhancing VAEs for collaborative
filtering: Flexible priors & gating mechanisms. In RecSys 2019 - 13th
ACM Conference on Recommender Systems, 2019.
[11] Diederik P. Kingma and Max Welling. Auto-encoding variational bayes.
In 2nd International Conference on Learning Representations, ICLR
2014 - Conference Track Proceedings, 2014.
[12] Yehuda Koren, Robert Bell, and Chris Volinsky. Matrix factorization
techniques for recommender systems. Computer, 2009.
[13] Dawen Liang, Rahul G. Krishnan, Matthew D. Hoffman, and Tony
Jebara. Variational autoencoders for collaborative filtering. In The Web
Conference 2018 - Proceedings of the World Wide Web Conference,
WWW 2018, 2018.
[14] Tsung Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollar.
Focal Loss for Dense Object Detection. IEEE Transactions on Pattern
Analysis and Machine Intelligence, 42(2):318–327, 2020.
[15] Elizabeth Million. The hadamard product. Creative commons, 2007.
[16] Andrew Ng et al. Sparse autoencoder. CS294A Lecture notes,
72(2011):1–19, 2011.
[17] Tomas Rehorek, P Kordik, et al. Comparing offline and online evaluation
results of recommender systems. In In Proceedings of the REVEAL
workshop at RecSyS conference (RecSyS’18), 2018.
[18] Suvash Sedhain, Aditya Krishna Menon, Scott Sanner, and Lexing Xie.
AutoRec. In Proceedings of the 24th International Conference on World
Wide Web - WWW ’15 Companion, pages 111–112, New York, New
York, USA, 2015. ACM Press.
[19] Ilya Shenbin, Anton Alekseev, Elena Tutubalina, Valentin Malykh, and
Sergey I. Nikolenko. RecVAE: A new variational autoencoder for top-n
recommendations with implicit feedback. In WSDM 2020 - Proceedings
of the 13th International Conference on Web Search and Data Mining,
pages 528–536. Association for Computing Machinery, Inc, jan 2020.
[20] Harald Steck. Embarrassingly shallow autoencoders for sparse data.
In The Web Conference 2019 - Proceedings of the World Wide Web
Conference, WWW 2019, 2019.
[21] Harald Steck. Autoencoders that don’t overfit towards the identity.
Advances in Neural Information Processing Systems, 33, 2020.
[22] Panagiotis Symeonidis and Andreas Zioupos. Matrix and Tensor
Factorization Techniques for Recommender Systems. 2016.
[23] Pascal Vincent, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine
Manzagol. Extracting and composing robust features with denoising
autoencoders. In Proceedings of the 25th international conference on
Machine learning, pages 1096–1103, 2008.
[24] Richard Zhang, Phillip Isola, and Alexei A Efros. Split-brain au-
toencoders: Unsupervised learning by cross-channel prediction. In
Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, pages 1058–1067, 2017.
[25] Shuai Zhang, Lina Yao, Aixin Sun, and Yi Tay. Deep learning based
recommender system: A survey and new perspectives. ACM Computing
Surveys, 52(1), 2019.