
Deep Variational Autoencoder with Shallow Parallel Path for Top-N Recommendation (VASP)

Vojtěch Vančura

Faculty of Information Technology

Czech Technical University in Prague

Prague, Czech Republic

vancurv@fit.cvut.cz

Pavel Kordík

Faculty of Information Technology

Czech Technical University in Prague

Prague, Czech Republic

pavel.kordik@fit.cvut.cz

0000-0003-1433-0089

Abstract—Recently introduced EASE algorithm presents a

simple and elegant way to solve the top-N recommendation task. In this paper, we introduce Neural EASE to further improve the performance of this algorithm by incorporating techniques for training modern neural networks. There is also growing interest in the recsys community in using variational autoencoders (VAE) for this task. We introduce the deep autoencoder FLVAE, which benefits from multiple non-linear layers without an information bottleneck while not overfitting towards the identity. We show how to train FLVAE in parallel with Neural EASE and achieve state-of-the-art performance on the MovieLens 20M dataset and competitive results on the Netflix Prize dataset.

Index Terms—recommender systems, variational autoencoders,

I. INTRODUCTION

With the increasing amount of information on the Web, Recommender Systems (RS) are an important way to overcome infobesity. At the same time, companies like Netflix, YouTube, Amazon, and Google generate significant revenue from recommendations. RS have therefore gained increasing attention over the past two decades.

As online companies grow, RS have to scale to millions of active users and millions of items. Speed of training and recall are also increasingly important, as the available content often changes dynamically and RS need to be able to react in real time.

Proper evaluation of RS is also an increasingly important topic, as offline evaluation is often a biased predictor of online performance [17]. For offline evaluation, the recsys community has shifted towards Top-N approaches [9], as evaluating performance based on root mean squared error over a predicted rating matrix can be very misleading.

In the Top-N recommendation scenario [3], the RS recommends the N most relevant items for every user. This is the typical case in various domains, including media, news, and e-commerce.

Various approaches have been proposed for solving the Top-N recommendation task, for example collaborative filtering with matrix factorization. Recently, sparse-data autoencoders [10], [13], [16], [18], [19], [23] have gained much attention and provided state-of-the-art results on this task. We examined various proposed models, including denoising autoencoders, variational autoencoders, and the shallow autoencoder called EASE [20], which, despite being a simple linear model, provides competitive and explainable results while addressing the biggest problem with sparse autoencoders: overfitting towards identity.

Footnote: According to [25], 80% of movies on Netflix and 60% of videos on YouTube are watched based on recommendations; recommendations are responsible for 35% of sales revenue on Amazon [22].

Inspired by [2], our motivation was to build an RS model that is as elegant and explainable as EASE while leveraging the potential of deep autoencoders to model complex nonlinear patterns in the data.

In order to do this, we had to overcome several issues, most

importantly, the overﬁtting towards identity.

Traditionally, overfitting towards identity is addressed by using dropout in the input layer [10], [13], [19]. However, this approach is not effective enough and does not enable the use of really deep architectures.

In this work, we propose three major contributions to

address the issues mentioned above. We propose:

• the usage of focal loss for training autoencoders for Top-N recommendation.

• a simple yet effective data augmentation technique to prevent Top-N recommending autoencoders from overfitting towards identity.

• a joint-learning technique based on the Hadamard product for training different combinations of various models.

As a demonstration, we build VASP, a Variational Autoencoder with a Shallow parallel Path. VASP combines a deep Variational Autoencoder and a neural variant of shallow EASE, jointly trained to model both linear and non-linear patterns in the data. VASP was able to achieve state-of-the-art performance on the MovieLens 20M dataset and competitive results on the Netflix Prize dataset.

II. RELATED WORK

Matrix Factorization (MF) has been the first-choice model for many years since the team “BellKor’s Pragmatic Chaos” won the Netflix Prize [12].

In 2015, Sedhain et al. proposed AutoRec [18], an autoencoder-based model for collaborative filtering with explicit ratings that outperformed the baseline models of the time.

arXiv:2102.05774v1 [cs.LG] 10 Feb 2021

After the emergence of variational autoencoders (VAE), the collaborative filtering model MultVAE was proposed in 2018 by Liang et al. [13]. This model uses a multinomial log-likelihood for the data distribution.

H. Steck proposed EASE [20], the Embarrassingly Shallow Autoencoder, which, as opposed to deep architectures, has no hidden layers. This approach was able to beat SOTA models when introduced.

Several techniques for improving MultVAE were proposed recently. RecVAE [19] uses a separate regularization term in the form of the KL divergence between the actual parameter distribution and the distribution from the previous training step, preventing instability during training.

H+VAMP [10] implements a variational autoencoder with a variational mixture of posteriors prior (VampPrior), with the goal of learning better latent representations of user-item interactions.

During the evolution of recommender systems, many simple, shallow (linear, wide) and complex deep architectures have been proposed. Cheng et al. proposed a combination of those two approaches in a single framework called Wide & Deep Learning [2] and introduced a technique called joint training. The authors also point out the distinction between joint training and ensembling. We are inspired by this work, but our approach is quite different. We do not process item attributes, just the interactions; therefore, the deep path is not designed to encode items but to find nonlinear interaction patterns. Also, the voting scheme is different.

In [4], the authors propose ensembling pre-trained recommender models with a variational autoencoder. However, joint learning of such a model seems to be problematic from the perspective of scaling and practical usability.

Many other deep learning techniques originally developed for computer vision or natural language processing were later successfully used in other fields such as recommender systems. Residual networks [7] are a good example. Another example is the use of the approach from [8] for dense layers in RecVAE [19].

We follow this trend by adopting Focal Loss (FL) from [14] for recommender systems. This approach was introduced for imbalanced classes in object detection, addressing the imbalance between the background class and other classes. The loss is higher for examples in the training set that are difficult to classify. This fits the situation in collaborative filtering well, where some items are more popular than others. It is more difficult to recommend niche items, as they do not have many interactions. A higher loss for these items pushes the recommender system to focus more on cold-start and niche items.

Another essential idea when training an autoencoder on sparse data is to prevent overfitting towards learning the identity function between the input and the output layer of the autoencoder [21]. In [24], the authors proposed the Split-Brain Autoencoder, which prevents learning identity by splitting the input image into two separate channels: the grayscale channel $X_1$ and the color channels $X_2$. Learning is then performed separately by training two networks, $F_1$ to perform automatic colorization by learning $X_2$ by showing $X_1$, and $F_2$ to make a grayscale prediction by learning $X_1$ by showing $X_2$. On the other hand, our approach uses only one neural network with automated data augmentation as a preprocessing step.

Fig. 1. VASP Architecture

III. OUR APPROACH

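Following notation from [13], we index users as $u \in \{1, ..., U\}$, items as $i \in \{1, ..., I\}$, and the user-item interaction matrix as $X \in \mathbb{N}^{U \times I}$. Lowercase $x_u = [x_{u1}, x_{u2}, ..., x_{uI}]^\top \in \mathbb{N}^{I}$ denotes the interaction history and $\hat{x}_u = [\hat{x}_{u1}, \hat{x}_{u2}, ..., \hat{x}_{uI}]^\top \in \mathbb{R}^{I}$ the predicted ratings of the user $u$.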

A. Neural EASE (NEASE)

Following [20], the EASE model can be described as:

$$\hat{x}_u = W \cdot x_u, \tag{1}$$

where $W \in \mathbb{R}^{|I| \times |I|}$ is the weight matrix. The diagonal of $W$ is constrained to zero to prevent learning the identity function between the input and the output. In [20], the author proposed using the square loss between the data $x_u$ and the predicted scores $\hat{x}_u$ because this training objective has a closed-form solution. The author also suggests that using more complex loss functions may lead to better prediction accuracy at higher computational cost.
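For orientation, the closed-form solution mentioned above can be sketched in NumPy. The formula follows [20] rather than this paper, and the function name `ease_closed_form`, the regularization parameter `l2`, and the toy matrix `X` are assumptions made for illustration. The sketch uses the row-vector convention, so scores for all users are computed as `X @ W`.

```python
import numpy as np

def ease_closed_form(X, l2=10.0):
    """Closed-form EASE weights from [20]; l2 is a hypothetical name for the L2 penalty."""
    gram = X.T @ X + l2 * np.eye(X.shape[1])  # regularized item-item Gram matrix
    P = np.linalg.inv(gram)
    W = P / (-np.diag(P))                     # column j is divided by -P[j, j]
    np.fill_diagonal(W, 0.0)                  # zero diagonal: an item cannot predict itself
    return W

# Toy interaction matrix (5 users x 4 items), made up for this example.
X = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 0],
              [0, 0, 1, 1],
              [1, 0, 0, 1]], dtype=float)

W = ease_closed_form(X, l2=1.0)
scores = X @ W  # predicted scores, one row per user
```

The explicit matrix inverse is what makes this approach expensive for large item catalogs.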

To enable running the model in parallel to a deep autoen-

coder, we interpret the EASE model (1) as a single-layer

perceptron without bias nodes and with forced zeros on the

diagonal, which can be trained with any suitable loss function

using backpropagation.
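The single-layer interpretation can be illustrated with a small NumPy sketch. It is not the authors' TensorFlow code: the multiplicative `mask` stands in for a kernel constraint, the data are random, and plain gradient descent on the mean squared error is used only to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items = 5
X = (rng.random((32, n_items)) < 0.4).astype(float)  # random toy interactions

mask = 1.0 - np.eye(n_items)  # zeros on the diagonal, ones elsewhere
W = rng.normal(scale=0.01, size=(n_items, n_items)) * mask

def forward(X, W):
    # Linear layer without bias; the mask keeps the diagonal at zero.
    return X @ (W * mask)

def mse(X, W):
    return float(np.mean((forward(X, W) - X) ** 2))

loss_before = mse(X, W)
lr = 0.05
for _ in range(200):
    err = forward(X, W) - X
    grad = (X.T @ err) * (2.0 / err.size)  # gradient of the mean squared error
    W -= lr * grad * mask                  # masked update preserves the zero diagonal
loss_after = mse(X, W)
```

Any differentiable loss could replace `mse` here, which is the point of treating EASE as a perceptron.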

Our experiments with several different loss functions are

described in Section VI.

B. MultVAE with focal loss (FLVAE)

Consistent with any other variational autoencoder [11], the MultVAE model's generative process starts by sampling a $k$-dimensional latent representation $z_u$ from a standard Gaussian prior [13]. Then, under the assumption that the interaction history has been drawn from a multinomial distribution, a neural network $f_\theta(\cdot)$ is used to produce a probability distribution $\pi(z_u)$ over $I$ items:

$$z_u \sim \mathcal{N}(0, I_k), \qquad \pi(z_u) \propto \exp\{f_\theta(z_u)\}, \qquad x_u \sim \mathrm{Mult}(N_u, \pi(z_u))$$
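The generative process can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the linear map `theta` is a hypothetical stand-in for the neural network $f_\theta$, and the sizes `k`, `n_items`, and `n_u` are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
k, n_items, n_u = 3, 6, 10  # latent size, number of items, interactions to draw

theta = rng.normal(size=(k, n_items))  # hypothetical linear stand-in for f_theta

def softmax(logits):
    e = np.exp(logits - logits.max())  # subtract the max for numerical stability
    return e / e.sum()

z_u = rng.standard_normal(k)    # z_u ~ N(0, I_k)
pi = softmax(z_u @ theta)       # pi(z_u) proportional to exp{f_theta(z_u)}
x_u = rng.multinomial(n_u, pi)  # x_u ~ Mult(N_u, pi(z_u))
```

Each entry of `x_u` counts how often the corresponding item was drawn, and the counts sum to `n_u`.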

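The variational autoencoder then aims to maximize the average marginal likelihood $p(x_u) = \int p_\theta(x_u \mid z_u)\, p(z_u)\, dz_u$. Since $f_\theta(\cdot)$ is a neural network, $p(x_u)$ becomes intractable and it is approximated with the evidence lower bound (ELBO):

$$\log p(x_u) \geq \mathbb{E}_{q_\phi(z_u \mid x_u)}\left[\log p_\theta(x_u \mid z_u) - KL\left(q_\phi(z_u \mid x_u) \,\|\, p(z_u)\right)\right] \tag{2}$$

where $q_\phi(z_u \mid x_u)$ is a variational approximation of the posterior distribution, $p(z_u)$ is the prior distribution, $\phi$ and $\theta$ are the parameters of $q_\phi(z_u \mid x_u)$ and $p_\theta(x_u \mid z_u)$ respectively, $\log p_\theta(x_u \mid z_u)$ is the log-likelihood for user $u$, and $KL$ is the Kullback-Leibler divergence.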

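FL [14] is defined as

$$FL(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t), \tag{3}$$

where $p_t$ is:

$$p_t = \begin{cases} \hat{x}_{ui} & \text{if } x_{ui} = 1 \\ 1 - \hat{x}_{ui} & \text{otherwise} \end{cases} \tag{4}$$

and $\alpha_t$, $\gamma$ act as hyperparameters.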
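Equations (3) and (4) translate almost directly into NumPy. The sketch below treats $\alpha_t$ as a single constant `alpha`, which is an assumption based on how the hyperparameter is reported in this paper. The clipping constant `eps` is also an addition, used only to avoid `log(0)`.

```python
import numpy as np

def focal_loss(x, x_hat, alpha=0.25, gamma=2.0, eps=1e-7):
    """Element-wise focal loss for binary targets x and predictions x_hat in (0, 1)."""
    x_hat = np.clip(x_hat, eps, 1.0 - eps)      # guard against log(0)
    p_t = np.where(x == 1, x_hat, 1.0 - x_hat)  # equation (4)
    return -alpha * (1.0 - p_t) ** gamma * np.log(p_t)  # equation (3)

x = np.array([1.0, 1.0, 0.0, 0.0])
x_hat = np.array([0.9, 0.1, 0.1, 0.9])
losses = focal_loss(x, x_hat)
```

In this example the confidently correct predictions (first and third entries) receive a far smaller loss than the confidently wrong ones. With `alpha=1.0` and `gamma=0.0` the function reduces to ordinary binary cross-entropy.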

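Since maximising the log-likelihood is the same as minimising cross-entropy, and focal loss can be understood as a form of weighted cross-entropy, the ELBO (2) can be easily rewritten:

$$\log p(x_u) \geq \mathbb{E}_{q_\phi(z_u \mid x_u)}\left[\alpha_t \left(1 - p_\theta(x_u \mid z_u)\right)^{\gamma} \log p_\theta(x_u \mid z_u) - KL\left(q_\phi(z_u \mid x_u) \,\|\, p(z_u)\right)\right] \tag{5}$$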
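Read as a training loss, the negated bracket in (5) is a focal term plus a KL term. The sketch below assumes a diagonal Gaussian approximate posterior with mean `mu` and log-variance `log_var`, which is the usual choice in [11] and [13] but is not spelled out in this paper. The weight `beta` is a hypothetical extra parameter, set to 1 to match (5).

```python
import numpy as np

def focal_term(x, x_hat, alpha=0.25, gamma=2.0, eps=1e-7):
    """Focal loss summed over items (equations (3) and (4))."""
    x_hat = np.clip(x_hat, eps, 1.0 - eps)
    p_t = np.where(x == 1, x_hat, 1.0 - x_hat)
    return float(np.sum(-alpha * (1.0 - p_t) ** gamma * np.log(p_t)))

def gaussian_kl(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)) in closed form."""
    return float(0.5 * np.sum(mu ** 2 + np.exp(log_var) - 1.0 - log_var))

def flvae_loss(x, x_hat, mu, log_var, beta=1.0):
    # Negative of the bracket in (5): focal reconstruction term plus KL term.
    return focal_term(x, x_hat) + beta * gaussian_kl(mu, log_var)

x = np.array([1.0, 0.0, 1.0, 0.0])
x_hat = np.array([0.8, 0.2, 0.3, 0.6])
mu = np.array([0.5, -0.5])
log_var = np.zeros(2)
loss = flvae_loss(x, x_hat, mu, log_var)
```

When the posterior equals the prior (`mu` and `log_var` both zero), the KL term vanishes and only the focal term remains.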

C. VASP

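A recommender model $m$ can be expressed as a function $m(\cdot) : x_u \rightarrow \hat{x}_u$. If $m$ uses a sigmoid function on the output, it is obvious that $\hat{x}_u \in \langle 0, 1 \rangle$. We propose joint learning with the Hadamard product [15], denoted $\odot$, for combining any number $n$, $n \in \mathbb{N}$, of recommender models $m_j$:

$$m(x_u) = \bigodot_{j=1}^{n} m_j(x_u) = m_1(x_u) \odot m_2(x_u) \odot \dots \odot m_n(x_u) \tag{6}$$

Since $m_j(x_u) = \hat{x}_{j,u}$ and $x_u$ is the same for all models in the combination, (6) can be directly rewritten as

$$m(x_u) = \hat{x}_u = \bigodot_{j=1}^{n} \hat{x}_{j,u} \tag{7}$$

while

$$\hat{x}_{j,u} \in \langle 0, 1 \rangle. \tag{8}$$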

While in Wide & Deep [2], networks are combined with

the summation (logical OR), in VASP we use the Hadamard

product (logical AND), meaning that both networks have to

agree. In [2], activation of a single network is sufﬁcient for

positive output.

The proposed VASP architecture (Fig. 1) uses a combination of two models, NEASE and FLVAE, ensembled by element-wise multiplication (7):

$$m_{\mathrm{VASP}}(x_u) = m_{\mathrm{FLVAE}}(x_u) \odot m_{\mathrm{NEASE}}(x_u) = \hat{x}_{\mathrm{FLVAE},u} \odot \hat{x}_{\mathrm{NEASE},u} \tag{9}$$

To satisfy condition (8), we add a sigmoid function to (1):

$$\hat{x}_{\mathrm{NEASE},u} = \sigma(W \cdot x_u) \tag{10}$$

Since $m_{\mathrm{FLVAE}}$ and $m_{\mathrm{NEASE}}$ are both fully differentiable, $m_{\mathrm{VASP}}$ is also fully differentiable and backpropagation can be used to optimize $m_{\mathrm{VASP}}$.
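The AND-like behaviour of the Hadamard combination is easy to see on a toy example. The numbers below are made up, and `hadamard_combine` is a hypothetical helper name; the sketch only illustrates the element-wise product of per-model scores that all lie between 0 and 1.

```python
import numpy as np
from functools import reduce

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def hadamard_combine(predictions):
    """Element-wise product of any number of score vectors."""
    return reduce(np.multiply, predictions)

# Made-up scores for four items from the two paths.
x_hat_flvae = np.array([0.9, 0.9, 0.1, 0.1])
x_hat_nease = sigmoid(np.array([3.0, -3.0, 3.0, -3.0]))  # sigmoid on the linear output

x_hat_vasp = hadamard_combine([x_hat_flvae, x_hat_nease])
top_item = int(np.argmax(x_hat_vasp))
```

Only the first item, which both paths score highly, keeps a high combined score. Items favoured by a single path are suppressed, whereas a sum of the two scores would still rank them well.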

D. Data augmentation to prevent learning identity

Inspired by [24], we prevent learning identity by randomly splitting the input interactions $x_u$ before every training epoch into two parts, $x_{Au}$ and $x_{Bu}$, so that:

$$x_{Aui} = \begin{cases} 0 & \text{if } x_{ui} = 0 \\ 1 - x_{Bui} & \text{otherwise} \end{cases} \qquad x_{Bui} = \begin{cases} 0 & \text{if } x_{ui} = 0 \\ 1 - x_{Aui} & \text{otherwise} \end{cases} \tag{11}$$

while

$$\sum_{i=1}^{I} x_{Aui} \approx \sum_{i=1}^{I} x_{Bui} \tag{12}$$

The autoencoder is learning $x_{Bui}$ by showing $x_{Aui}$ in one training step and then $x_{Aui}$ by showing $x_{Bui}$ in another. Thus the autoencoder still sees all the data, but does not see the identity and cannot learn it (as demonstrated by Figure 2).

Fig. 2. Data augmentation to prevent overfitting towards identity

Fig. 3. Explaining VASP on MovieLens20M Dataset: Output of the joint model (left) was linearly decomposed to the EASE component (middle) and the FLVAE component (right) to demonstrate that EASE is learning more apparent linear dependencies and FLVAE the non-linear ones. Red = Horror, Blue = Children's movies, Green = Western movies and Yellow = Noir.
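A split satisfying (11) and (12) can be sketched as follows for a single user. The helper name `split_interactions` is hypothetical, and the sketch assumes binary interactions: it shuffles the interacted items and assigns the first half to part A and the rest to part B.

```python
import numpy as np

def split_interactions(x_u, rng):
    """Randomly split a binary interaction vector into two disjoint parts."""
    x_a = np.zeros_like(x_u)
    x_b = np.zeros_like(x_u)
    items = rng.permutation(np.flatnonzero(x_u))  # interacted items in random order
    half = len(items) // 2
    x_a[items[:half]] = 1  # part A
    x_b[items[half:]] = 1  # part B receives the remaining items
    return x_a, x_b

rng = np.random.default_rng(0)
x_u = np.array([1, 0, 1, 1, 0, 1, 1, 0])
x_a, x_b = split_interactions(x_u, rng)
```

One training step would then use `x_a` as input and `x_b` as target, and another step the reverse. Calling the function again before the next epoch yields a fresh split.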

IV. INTERPRETING VASP

In order to analyze the effect of shallow and deep compo-

nents of the proposed ensemble model, we have implemented

a workﬂow to produce a visualization of movie embeddings

learned by individual models and the ensemble.

The principle was the following. We performed a sensitivity

analysis of models by putting one-hot vectors to the input

and generating output probabilities (reconstructions). These

probabilities were then transformed to distances by linear

scaling and then projected into a t-SNE plot. See Figure 3

for further details.
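The first two steps of this workflow can be sketched in NumPy. Everything model-specific here is an assumption: `model` is a hypothetical stand-in with random weights rather than a trained recommender, and the linear scaling maps the highest probability to distance 0 and the lowest to distance 1, since the exact scaling is not specified in the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items = 4
W = rng.normal(size=(n_items, n_items))  # hypothetical stand-in for a trained model

def model(x):
    # Placeholder recommender: sigmoid of a linear map.
    return 1.0 / (1.0 + np.exp(-(x @ W)))

one_hot_inputs = np.eye(n_items)  # row i activates item i alone
probs = model(one_hot_inputs)     # row i: reconstruction for item i

lo, hi = probs.min(), probs.max()
distances = 1.0 - (probs - lo) / (hi - lo)  # assumed scaling: high probability, small distance
```

The resulting item-by-item matrix could then be handed to a t-SNE implementation to obtain a two-dimensional layout.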

V. EXPERIMENTAL SETUP

To verify our assumptions experimentally, we implemented three models: a neural variant of EASE, a deep Variational Autoencoder, and VASP, a joint-learning model consisting of NEASE and the deep Variational Autoencoder ensembled by the Hadamard product as described in Section III-C. We trained those models on two datasets, MovieLens20M and the Netflix Prize dataset, and compared the results against various baselines, including current SOTA models.

A. Datasets

1) MovieLens20M [6]: A dataset of 27,000 movies rated by 138,000 users, generating 20 million ratings in total. We preprocessed the dataset according to [13]: since this dataset contains explicit ratings, we converted the data to implicit interactions by considering only ratings of four or higher as valid interactions. Only users with five or more interactions remain in the dataset after preprocessing. We randomly chose 10,000 users as a test set and trained our models on the rest.

2) Netflix Prize Dataset [1]: The dataset from the Netflix Prize, with over 100 million ratings from 480,000 randomly chosen, anonymous Netflix customers over 17,000 movie titles. We converted explicit ratings to implicit interactions by the same method as we used on MovieLens20M. We randomly chose 40,000 users as a test set and trained our models on the rest.

B. Metrics

We evaluate our models in the same way as in [13]. First, we sample 80% of the test user's interactions as input for the model, and then we measure Recall@k and NDCG@k for predicted interactions against the remaining 20% of the user's interactions.

NDCG for the top-k recommended items, denoted NDCG@k, is defined as

$$NDCG@k = \frac{DCG@k}{IDCG@k} \tag{13}$$

where

$$DCG@k = \sum_{i=1}^{k} \frac{2^{rel_i} - 1}{\log_2(i + 1)} \tag{14}$$

and

$$IDCG@k = \sum_{i=1}^{|REL_k|} \frac{2^{rel_i} - 1}{\log_2(i + 1)} \tag{15}$$

$rel_i$ is the relevance of the recommendation at position $i$ and $REL_k$ is the list of those 20% of interactions acting as "true" user interactions.

Recall for the top-k recommended items, denoted Recall@k, is defined as

$$Recall@k = \frac{|REL_k \cap R_k|}{|REL_k|} \tag{16}$$

where $R_k$ is the top-k items recommended by the evaluated model.

Footnote: Note that we use implicit interactions because it is the prevalent case for practical recommendation tasks.
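Both metrics can be sketched for a single user with binary relevance. The function names, the toy `scores`, and the `held_out` set are assumptions for illustration. The optional `cap_at_k` flag reflects that [13] normalizes recall by the smaller of k and the number of held-out items. The ideal ranking for NDCG is taken to place all held-out items first, up to k positions.

```python
import numpy as np

def recall_at_k(scores, held_out, k, cap_at_k=False):
    """Share of held-out items found among the k highest-scored items."""
    top_k = np.argsort(-scores)[:k]
    hits = len(set(top_k.tolist()) & held_out)
    denom = min(k, len(held_out)) if cap_at_k else len(held_out)
    return hits / denom

def ndcg_at_k(scores, held_out, k):
    """NDCG@k with binary relevance (1 if the item is held out, else 0)."""
    top_k = np.argsort(-scores)[:k]
    rel = np.array([1.0 if i in held_out else 0.0 for i in top_k])
    discounts = 1.0 / np.log2(np.arange(2, len(top_k) + 2))  # 1 / log2(i + 1)
    dcg = float(np.sum((2.0 ** rel - 1.0) * discounts))
    n_ideal = min(k, len(held_out))  # ideal ranking puts held-out items first
    idcg = float(np.sum(discounts[:n_ideal]))
    return dcg / idcg

scores = np.array([0.9, 0.1, 0.8, 0.3, 0.7])  # toy model scores for five items
held_out = {0, 3, 4}                          # toy held-out interactions
recall = recall_at_k(scores, held_out, k=3)
ndcg = ndcg_at_k(scores, held_out, k=3)
```

Here the top three items are 0, 2, and 4, so two of the three held-out items are recovered. A full evaluation would typically also exclude the items used as model input from the ranking, which this sketch omits.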

C. Baselines

We chose several autoencoder-based models for a top-

n recommendation, including MultDAE and MultVAE from

[13], EASE from [20] and current SOTA models, RecVAE [19]

and H+VAMP [10] as baselines for performance evaluation.

D. Implemented models

1) Neural EASE: was implemented as a dense layer with zeros forced on the diagonal by a kernel constraint. We evaluated three different loss functions: mean squared error, cosine proximity loss, and focal loss. Since EASE has forced zeros on the diagonal of the parameter matrix to prevent learning identity, no data augmentation was used.

2) Variational Autoencoder: we implemented FLVAE with a densely connected residual network in both the encoder and the decoder. Sigmoid activation was used on the output.

The data augmentation described in Section III-D was used to prevent the autoencoder from learning identity.

Default values of $\alpha_t = 0.25$ and $\gamma = 2.0$ in (3) were used.

3) VASP: is built by connecting Neural EASE and FLVAE by the Hadamard product as described in III-C. Hyperparameters were the same as for plain FLVAE. We evaluated three variants: pre-trained EASE and FLVAE joined together as an ensemble, joint training from the start, and an alternating approach where FLVAE and EASE were trained separately in every step. We prevent our model from learning identity by using the data augmentation described in III-D.

E. Hyperparameters

We used 2048 units for latent space and 4096 units for

hidden layers. We used seven densely connected residual

hidden layers for the encoder and ﬁve layers of the same

architecture in the decoder.

We trained our model for 50 epochs with a learning rate of 0.00005 and batches of 1024 samples. Then, we lowered the learning rate to 0.00001 and trained for another 20 epochs. Finally, we performed fine-tuning with a learning rate of 0.000001 for another 20 epochs.
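The three training phases can be written down as a small schedule. The names `PHASES`, `BATCH_SIZE`, and `learning_rate` are hypothetical, and the batch size is stated in the text only for the first phase, so keeping it unchanged afterwards is an assumption.

```python
# (number of epochs, learning rate) for each phase, as listed in the text
PHASES = [(50, 5e-5), (20, 1e-5), (20, 1e-6)]
BATCH_SIZE = 1024  # stated for the first phase; assumed unchanged afterwards

def learning_rate(epoch):
    """Return the learning rate for a zero-based epoch index."""
    start = 0
    for n_epochs, lr in PHASES:
        if epoch < start + n_epochs:
            return lr
        start += n_epochs
    raise ValueError("epoch is past the end of the schedule")

total_epochs = sum(n_epochs for n_epochs, _ in PHASES)
```

A function of this shape can be passed to most training loops or scheduler callbacks that expect a per-epoch learning rate.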

All models were implemented in TensorFlow [5], and the source code with notes for reproducing the results is publicly available on our GitHub page.

VI. RESULTS AND DISCUSSION

Neural EASE: We evaluated three different variants of the model based on the loss function used: mean squared error, cosine loss, and focal loss (see Table I).

TABLE I
RESULTS WITH DIFFERENT LOSS FUNCTIONS FOR THE EASE MODEL ON MOVIELENS20M DATASET.

Loss function used   NDCG@100   Recall@20   Recall@50
MSE                  0.425      0.393       0.523
Cosine proximity     0.431      0.403       0.532
Focal loss           0.377      0.343       0.426

We found that using cosine proximity loss leads to better

performance of the model as authors of [20] expected.

FLVAE: The authors of MultVAE reject deeper architectures by stating that "going deeper does not improve performance." We investigated the matter and believe that overfitting towards identity was to blame. We address this issue by adopting the data augmentation described in Section III-D. This approach allowed us to build much bigger models than in [13], [19], or [10], where 200 units for the latent space and 600 units in hidden layers were used.

VASP: We evaluated three methods of training models connected by the Hadamard product (see Table II).

TABLE II
RESULTS WITH DIFFERENT TRAINING APPROACHES FOR THE VASP MODEL ON MOVIELENS20M DATASET.

Training approach      NDCG@100   Recall@20   Recall@50
Pretrained ensemble    0.442      0.414       0.545
Alternating training   0.436      0.401       0.543
Joint learning         0.448      0.414       0.552

First, we connected pre-trained models and evaluated them as an ensemble. Then we initialized the joint model and trained it from scratch. Lastly, we experimented with the alternating approach, where the weights of FLVAE were frozen in one step and the weights of the EASE model were frozen in the next. However, this approach did not perform better than joint learning from the start.

Finally, we compared our base models NEASE and FLVAE and the jointly learned ensemble VASP to state-of-the-art approaches (see Table III). Significantly best performing models are in bold. Our VASP outperformed other models and achieved the SOTA for the MovieLens 20M dataset. It also performed quite well on the Netflix dataset (second highest ranking model).

In our future experiments, we will carefully analyse the

H+Vamp Gated model that performed better on Netﬂix. We

will try to put it in the ensemble with the Neural EASE or

use the idea of Variational Mixture of Posteriors to improve

performance of our deep FLVAE model.

VII. CONCLUSION

We proved EASE to be a compelling Top-N recommenda-

tion model that can still match current SOTA baselines.

TABLE III
RESULTS

                MovieLens 20M                      Netflix Prize Dataset
Model           NDCG@100   Recall@20   Recall@50   NDCG@100   Recall@20   Recall@50
Mult-DAE        0.419      0.387       0.524       0.380      0.344       0.438
Mult-VAE        0.426      0.395       0.537       0.386      0.351       0.444
EASE            0.420      0.391       0.521       0.393      0.362       0.445
RecVAE          0.442      0.414       0.553       0.394      0.361       0.452
H+Vamp Gated    0.445      0.413       0.551       0.409      0.376       0.463
Neural EASE     0.431      0.403       0.532       0.395      0.363       0.447
FLVAE           0.445      0.409       0.547       0.398      0.363       0.450
VASP            0.448      0.414       0.552       0.406      0.372       0.457

We proposed a data augmentation method to prevent overfitting to identity and showed experimentally that using this method leads to better performance of autoencoders used for top-N recommendation.

We proposed a novel joint-learning technique for training multiple models together. Using it, we constructed VASP, a Variational Autoencoder with a parallel Shallow Path, and showed experimentally that a variational autoencoder connected with a parallel simple shallow linear model can match current sophisticated SOTA models and even outperform them in some cases.

ACKNOWLEDGMENT

Our research has been supported by the Grant Agency of the Czech Technical University in Prague (SGS20/213/OHK3/3T/18), the Czech Science Foundation (GAČR 18-18080S), Recombee and VUSTE-APIS.

REFERENCES

[1] James Bennett and Stan Lanning. The Netﬂix Prize. KDD Cup and

Workshop, 2007.

[2] Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar

Chandra, Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai,

Mustafa Ispir, Rohan Anil, Zakaria Haque, Lichan Hong, Vihan Jain, Xi-

aobing Liu, and Hemal Shah. Wide & Deep Learning for Recommender

Systems. RecSys 2017 - Proceedings of the 11th ACM Conference on

Recommender Systems, pages 396–397, jun 2016.

[3] Paolo Cremonesi, Yehuda Koren, and Roberto Turrin. Performance of

recommender algorithms on top-N recommendation tasks. In RecSys’10

- Proceedings of the 4th ACM Conference on Recommender Systems,

2010.

[4] Ahlem Drif, Houssem Eddine Zerrad, and Hocine Cheriﬁ. Ensvae:

Ensemble variational autoencoders for recommendations. IEEE Access,

8:188335–188351, 2020.

[5] Martín Abadi et al. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.

[6] F. Maxwell Harper and Joseph A. Konstan. The movielens datasets:

History and context. ACM Trans. Interact. Intell. Syst., 5(4), December

2015.

[7] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep resid-

ual learning for image recognition. Proceedings of the IEEE Computer

Society Conference on Computer Vision and Pattern Recognition, 2016-

December:770–778, 2016.

[8] Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q.

Weinberger. Densely connected convolutional networks. Proceedings

- 30th IEEE Conference on Computer Vision and Pattern Recognition,

CVPR 2017, 2017-January:2261–2269, 2017.

[9] George Karypis. Evaluation of item-based top-n recommendation

algorithms. In Proceedings of the tenth international conference on

Information and knowledge management, pages 247–254, 2001.

[10] Daeryong Kim and Bongwon Suh. Enhancing VAEs for collaborative

ﬁltering: Flexible priors & gating mechanisms. In RecSys 2019 - 13th

ACM Conference on Recommender Systems, 2019.

[11] Diederik P. Kingma and Max Welling. Auto-encoding variational bayes.

In 2nd International Conference on Learning Representations, ICLR

2014 - Conference Track Proceedings, 2014.

[12] Yehuda Koren, Robert Bell, and Chris Volinsky. Matrix factorization

techniques for recommender systems. Computer, 2009.

[13] Dawen Liang, Rahul G. Krishnan, Matthew D. Hoffman, and Tony

Jebara. Variational autoencoders for collaborative ﬁltering. In The Web

Conference 2018 - Proceedings of the World Wide Web Conference,

WWW 2018, 2018.

[14] Tsung Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollar.

Focal Loss for Dense Object Detection. IEEE Transactions on Pattern

Analysis and Machine Intelligence, 42(2):318–327, 2020.

[15] Elizabeth Million. The hadamard product. Creative commons, 2007.

[16] Andrew Ng et al. Sparse autoencoder. CS294A Lecture notes,

72(2011):1–19, 2011.

[17] Tomas Rehorek, P Kordik, et al. Comparing offline and online evaluation results of recommender systems. In Proceedings of the REVEAL workshop at RecSyS conference (RecSyS'18), 2018.

[18] Suvash Sedhain, Aditya Krishna Menon, Scott Sanner, and Lexing Xie.

AutoRec. In Proceedings of the 24th International Conference on World

Wide Web - WWW ’15 Companion, pages 111–112, New York, New

York, USA, 2015. ACM Press.

[19] Ilya Shenbin, Anton Alekseev, Elena Tutubalina, Valentin Malykh, and

Sergey I. Nikolenko. RecVAE: A new variational autoencoder for top-n

recommendations with implicit feedback. In WSDM 2020 - Proceedings

of the 13th International Conference on Web Search and Data Mining,

pages 528–536. Association for Computing Machinery, Inc, jan 2020.

[20] Harald Steck. Embarrassingly shallow autoencoders for sparse data.

In The Web Conference 2019 - Proceedings of the World Wide Web

Conference, WWW 2019, 2019.

[21] Harald Steck. Autoencoders that don’t overﬁt towards the identity.

Advances in Neural Information Processing Systems, 33, 2020.

[22] Panagiotis Symeonidis and Andreas Zioupos. Matrix and Tensor

Factorization Techniques for Recommender Systems. 2016.

[23] Pascal Vincent, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine

Manzagol. Extracting and composing robust features with denoising

autoencoders. In Proceedings of the 25th international conference on

Machine learning, pages 1096–1103, 2008.

[24] Richard Zhang, Phillip Isola, and Alexei A Efros. Split-brain au-

toencoders: Unsupervised learning by cross-channel prediction. In

Proceedings of the IEEE Conference on Computer Vision and Pattern

Recognition, pages 1058–1067, 2017.

[25] Shuai Zhang, Lina Yao, Aixin Sun, and Yi Tay. Deep learning based

recommender system: A survey and new perspectives. ACM Computing

Surveys, 52(1), 2019.
