YoshuaBengioInterview.txt

Hi, Yoshua, I'm really glad
you could join us here today. >> I'm very glad, too. >> Today you're not just a researcher or
engineer in deep learning. You've become one of the institutions and
one of the icons of deep learning, but I'd really like to hear
the story of how it started. So how did you end up getting into deep
learning, and then pursuing this journey? >> Right, well, actually,
it started when I was a kid, adolescent, reading a lot of science fiction,
like, I guess, many of us. And when I started my graduate studies in
1985, I started reading neural net papers, and that's where I got all excited,
and it became really a passion. >> And actually, what was that like in,
what, mid 80s, right, 1985, reading these papers, do you remember? >> Yeah. Well, coming from the courses I had taken
in classical AI with expert systems, and suddenly discovering that there
was all this world of thinking about how humans might be learning,
and human intelligence. And how we might draw connections between
that and artificial intelligence and computers. That was really exciting for
me when I discovered this literature, and I started reading the connectionists,
of course. So the papers from Geoff Hinton,
[INAUDIBLE], and so on. And I worked on recurrent nets,
I worked on speech recognition, I worked on HMMs, so graphical models. And then quickly, I moved to AT&T Bell
Labs and MIT, where I did postdocs. And that's where I discovered some
of the issues with the long-term dependencies with training neural nets. And then shortly after,
I got recruited at UdeM back in Montreal, where I had spent most
of my adolescent years. >> So as someone who's been there for
the last several decades and seen it all, certainly seen a lot of it,
tell me a bit about how your thinking about deep learning, about neural
networks has evolved over this time? >> We start with experiments,
with intuitions, and theory sort of comes later. We now understand a lot better,
for example, why backprop is working so well,
why depth is so important. And these kinds of notions, we didn't have
any solid justification for in those days. When we started working on deep nets in
the early 2000s, we had the intuition that it made a lot of sense that a deeper
network should be more powerful. But we didn't know how to take that and prove it, and of course,
our experiments, initially, didn't work. >> And actually, what were the most important things
that you think turned out to be right? And what were the biggest surprises
of what turned out to be wrong, compared to what we knew 30 years ago? >> Sure, so one of the biggest
mistakes I made was to think, like everyone else in the 90s, that you needed smooth nonlinearities
in order for backprop to work. Because I thought that if we had
something like rectifying nonlinearities, where you have a flat part,
that it would be really hard to train, because the derivative would be zero in so
many places. And when we started
experimenting with ReLU, with deep nets around 2010,
I was obsessed with the idea that, we should be careful about whether neurons
won't saturate too much on the zero part. But in the end, it turned out that,
actually, the ReLU was working a lot better than the sigmoids and tanh,
and that was a big surprise. We did this, exploring this because of
the biological connection, actually, not because we thought that it
would be easier to optimize. But it turned out to work better, whereas
I thought it would be harder to train. >> So let me ask you, what is the relationship between
deep learning and the brain? There's the obvious answer, but
I'm curious what's your answer to that? >> Well, the initial insight
that really got me excited with neural nets was this idea from
the connectionists that information is distributed across
the activation of many neurons. Rather than being represented
by sort of the grandmother cell, as they were calling it,
a symbolic representation. That was the traditional
view in classical AI. And I still believe this is
a really important thing, and I see people rediscovering
the importance of that, even recently. So that was really a foundation. The depth thing is something that
came later, in the early 2000s, but it wasn't something I was thinking
about in the 90s, for example. >> Right, right, and I remember you
built a lot of relatively shallow, but very distributed representations for
the word embeddings, right, very early on. >> Right, that's right, yeah, that's one of the things that I got
really excited about in the late 90s. Actually, my brother, Samy, and I worked
on the idea that we could use neural nets to tackle the curse of dimensionality,
which was believed to be one of the central issues with
the statistical learning. And the fact that we could have these
distributed representations could be used to represent joint distributions over many
random variables in a very efficient way. And it turned out to work quite well,
and then I extended this to joint distributions over sequences of words, and
this is how the word embeddings were born. Because I thought,
this will allow generalization across words that have similar
semantic meaning and so on. >> So over the last couple decades, your
research group has invented more ideas than anyone can summarize
in a few minutes. So I'm curious, what are the inventions or ideas you're most proud
of from your group? >> Right, so I think I mentioned long-term
dependencies, the study of that. I think people still don't
understand it well enough. Then there's the story I mentioned
about curse of dimensionality, joint distributions with neural nets,
which became, more recently, the [INAUDIBLE] that Hugo Larochelle did. And then, as I said,
that gave rise to all sorts of work on learning word embeddings for
joint distributions for words. Then came, I think, probably the best
known of the work we did with deep learning, with stacks of
auto encoders and stacks of RBMs. One thing then, it was the work on
understanding better the difficulties of training deep nets with
the initialization ideas, and also,
the vanishing gradient in deep nets. And that work actually was the one which
gave rise to the experiments showing the importance of piecewise
linear activation functions. Then I would say some of the most
important work regards the work we did with unsupervised learning,
the denoising auto-encoders, the GANs, which are very popular these days,
the generative adversarial networks. The work we did with neural machine
translation using attention, which turned out to be really
important for making translation work. And it's currently used in industrial
systems, like Google Translate. But this attention thing actually
really changed my views on neural nets. We used to think of neural nets as machines
that can map a vector to a vector. But really with attention mechanisms, you
can now handle any kind of data structure. And this is really opening up
a lot of interesting avenues. In the direction of actually
connecting to biology, one thing that I've been working
on in the last couple of years is, how could we come up with something like
backprop but that brains could implement. And we have a few papers in that direction
that seem to be interesting for the neuroscience people. And then we're continuing in
that direction of course. >> One of the topics that I know
you've been thinking a lot about is the relationship between deep learning and the brain,
can you tell us a bit more about that? >> The biological thing is something I've
been thinking about for a while actually and having a lot of,
I would say daydreaming about. Because I think of it like a puzzle. So we have these pieces of evidence
from what we know from the brain and from learning in the brain like
spike timing dependent plasticity. And on the other hand, we have all of
these concepts from machine learning. The idea of globally training
the whole system with respect to an objective function,
and the idea of backprop. And what does backprop mean? Like, what does credit
assignment really mean? When I started thinking about how brains
could do something like backprop, it prompted me to think about, well, maybe
there's some more general concepts behind backprop which make it so efficient which
allow us to be efficient with backprop. And maybe there's a larger family of
ways to do credit assignment, and that connects to questions that people in
reinforcement learning have been asking. So it's interesting how sometimes
asking a simple question leads you to thinking about so many different
things, and forces you to think about so many elements that you like to
bring together like a big puzzle. So this has gone on for a number of years. And I need to say that this whole
endeavor, like many of the ones that I have followed, has been highly
inspired by Geoff Hinton's thoughts. So in particular,
he gave this talk in 2007, I think, at the first deep learning
workshop on what he thought was the way that
the brain is working. How kind of temporal
code could be used for potentially doing some
of the job of backprop. And that led to a lot of the ideas that
I've explored in recent years with this. Yeah, so it's kind of
an interesting story that has been running for a decade now, basically. >> One of the topics I've heard you
speak about multiple times as well is unsupervised learning. Can you share your perspective on that? >> Yes, yes, so
unsupervised learning is really important. Right now, our industrial systems
are based on supervised learning, which essentially requires humans to
define what the important concepts are for the problem and
to label those concepts in the data. And we build all these amazing toys and
services and systems using this. But humans are able to do much more. They are able to explore and
discover new concepts by observation and interaction with the world. A two year old is able to
understand intuitive physics. In other words, she understands gravity,
she understands pressure, she understands inertia. She understands liquid, solids. And of course, her parents never told
her about any of this stuff, right? So how did she figure it out? So that's the kind of question that
unsupervised learning is trying to answer. It's not just about we have labels or
we don't have labels. It's about actually building a mental construction that explains how
the world works by observation. And more recently, I've been combining the ideas in unsupervised learning with
the ideas in reinforcement learning. Because I believe that there
is a very strong indication about the important underlying concepts
that we're trying to disentangle, we're trying to separate from each other. That a human or machine can get
by interacting with the world, by exploring the world and trying
things and trying to control things. So these are I think tightly coupled
to the original ideas of unsupervised learning. So my take on unsupervised learning, 15 years ago when we started
doing the auto-encoders and the RBMs and so on, was very focused on the idea
of learning good representations. And I still think this is
an essential question. But the thing we don't know is how and
what is a good representation? How do we figure out an objective
function, for example? So we've tried many things over the years. And that's actually one of the cool things
about unsupervised learning research, that there are so many different ideas, so many different ways that this
problem can be attacked. And that's just, maybe there's another one
we'll discover next year that's completely different and maybe the brain is using
something else completely different. So it's not incremental research, it's something that in
itself is very exploratory. We don't have a good definition of what's
the right objective function to even measure that a system is doing
a good job on unsupervised learning. So of course, it's challenging,
but at the same time, it leaves open a wide
field of possibilities, which is what researchers really love, at
least that's something that appeals to me. >> So today, there's so
much going on in deep learning. And I think we've passed
the point where it's possible for any one human to read every single
deep learning paper being published. So I'm curious, what in deep
learning today excites you the most? >> So I'm very ambitious, and
I feel like the current state of the science of deep learning is
far from where I'd like to see it. And I have the impression that our systems
right now make the kind of mistakes that suggest they have a very
superficial understanding of the world. So what excites me the most now is sort
of direction of research where we're not trying to build systems that
are going to do something useful. We're just going back to principles about,
how can a computer observe the world, interact with the world, and
discover how that world works? Even if that world is simple, something
that we can program as a kind of video game, we don't know how to do that well. And that's cool, because I don't have to
compete with Google, and Facebook, and Baidu, and so on, right? Because this is a kind of basic research that can be done by anyone in
their garage and could change the world. So there are many, of course,
many directions to attack this. But I see a lot of the fruitful
interactions between ideas in deep learning and reinforcement learning
being really important there. And I'm really excited that
the progress in this direction could have a huge impact on
practical applications actually. Because if you look at some of the big
challenges that we have in applications, like how we deal with new domains, or categories on which we
have too few examples. And in cases where humans are very
good at solving those problems. So these transfer learning and
generalization issues, they would become much easier to
tackle if we had systems that had a better understanding
of how the world works. A deeper understanding, right? What is actually going on? What are the causes of what I'm seeing? And how could I influence what
I'm seeing by my actions? So these are the kinds of questions
I'm really excited about these days. I think they connect, also, the deep
learning research that has evolved over the last couple of decades
with even older questions in AI. Because a lot of the success in deep
learning has been with perception. So what's left, right? What's left is sort of
high level cognition, which is about understanding at
an abstract level how things work. So our program of understanding high
level abstractions, I think, has not reached those high levels of abstraction,
and so we have to get there. We have to think about reasoning, about
sequential processing of information. We have to think of how
causality works and how machines can discover all
these things by themselves. Potentially guided by humans, but
as much as possible in an autonomous way. >> And it sounds like from
part of what you said that you're a fan of research approaches
where you experiment on, I'm going to use the term toy problem,
not in a disparaging way. >> Right.
>> But on the small problem. And you're optimistic that that
transfers to bigger problems later. >> Yes, yes, it transfers in a way. Of course we're going to have
to do some work to scale up and address those problems. But my main motivation for going for those toy problems is that we can
understand better our failures and we can reduce the problem to
something we can intuitively sort of manipulate and
understand more easily. So sort of a classical divide and
conquer science approach. And also, I think, something people
don't think about enough is that the research cycle can be much faster,
right? So if I can do an experiment in a few
hours, I can progress much faster. If I have to try out a huge model that
tries to capture the whole common sense and everything in the general
knowledge, which eventually we'll do. It's just each experiment just takes
too much time with current hardware. So while our hardware friends are building
machines that are going to be a thousand or a million times faster,
I'm doing those toy experiments. [LAUGH]
>> You know, I've also heard you speak about the science of deep learning,
not just as an engineering discipline, but doing more work to understand
what's really going on. Do you want to share
your thoughts on that? >> Yeah, absolutely. I fear that a lot of the work that we're
doing is sort of like blind people trying to find their way. [LAUGH] And you can get a lot of luck and
find interesting things that way. But really if we sort of
stop a little bit and try to understand what we're doing
in a way that's transferable, because we go down to
principles to theory, but when I say theory I don't mean,
necessarily, math. Of course I like math and so on, but
I don't think that we need that everything be formalized mathematically but
be formalized logically. In the sense that I can convince
somebody that this should work, whether this makes sense. This is the most important aspect. And then math allows us to make
that stronger and tighter. But really it's more about understanding. And it's about also doing our research, not to beat the next baseline,
or benchmark, or beat the other guys in the other lab,
or the other company. It's more about what kind of question
should we ask that would allow us to understand better
the phenomena of interest. What makes, for example, training in deeper networks harder,
or recurrent nets harder? We have some ideas, but
a lot of things we don't understand yet. So we can maybe design experiments whose
goal is not to have a better algorithm, but just to understand better
the algorithms we currently have or what circumstances make the particular
algorithm work better and why. It's the why that really matters. That's what science is about. It's why. >> Right.
Today there are a lot of people that want to enter the field. And I'm sure you've answered this
a lot in one-on-one settings, but with all the people watching this on
video, what advice would you have for people that want to get into AI,
get into deep learning? >> Right, so first of all,
there are different motivations and different things you can do. What you need to become a deep learning
researcher may not be the same as if you want to be an engineer who's going to
use deep learning to build products. There's a different level of understanding
that's needed in both cases. But in any case in both cases, practice. So to really master a subject
like deep learning, of course you have to read a lot. You have to practice programming
the things yourself. Very often I interview students
who have used software. And these days there's so much good
software around that you can just plug and play and
understand nothing of what you're doing. Or at such a superficial level
that then it becomes hard to figure out when it doesn't work and
what's going wrong. So actually trying to implement things
yourself, even if it's inefficient. But just to make sure you really
understand what is going on is really useful, and trying things yourself. >> So don't just use one of the
programming frameworks where you can do everything in a few lines of code, but
you don't really know what just happened. >> Exactly, exactly, and
I would say even more than that. Trying to derive the thing yourself
from first principles, if you can. That really helps. But yeah, the usual things
you have to do like reading, looking at other people's code,
writing your own code, doing lots of experiments, making sure
you understand everything you do. So especially for the science part of it, trying to ask why am I doing this,
why are people doing this? Maybe the answer is somewhere in
the book and you have to read more. But it's even better if you can
actually figure it out by yourself. >> Yeah, cool, yeah. And in fact, of the things I read,
you and Ian Goodfellow and Aaron Courville wrote
a highly regarded book. >> Thank you, thank you. Yes, it's selling a lot. It's a bit crazy. I feel like there are more people
reading this book than people who can read it [LAUGH] right now. But yeah, also, the proceedings of the ICLR conference are probably the best
concentrated place of good papers. Of course there are really good papers
at NIPS and ICML and other conferences. But if you really want to go for a lot
of good papers, just read the last few ICLR proceedings, and that will give
you a really good view of the field. >> Cool, yeah. Any other thoughts? When people ask you for advice, how does
someone become good at deep learning? >> Well,
it depends on where you come from. Don't be afraid of the math. Just develop the intuitions, and
then the math becomes much easier to understand once you get the hang of
what's going on at the intuitive level. And one good news is that you don't
need five years of PhD to become proficient at deep learning. You can actually learn pretty quickly. If you have a good background
in computer science and math, you can learn enough to use it and
build things and start research experiments
in just a few months. Something like six months for
people with the right training. Maybe they don't know anything
about machine learning, but if they're good at math and
computer science, it can be very fast. And of course, so that means you need
to have the right training in math and computer science. Sometimes what you learn in just
computer science courses is not enough. You need some continuous math, especially. So this is probability, algebra and
optimization, for example. >> I see.
And calculus. >> And calculus, yeah. >> Thanks a lot, Yoshua, for sharing all
of the comments and insights and advice. Even though I've known you for a long
time, there are many details of your early history that I didn't know until now,
so thank you. >> Well, thank you, Andrew, for doing this special recording and what you're doing. I hope it's going to be
used by a lot of people.
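
[Illustrative aside, not part of the recorded interview] Bengio's remark that he expected rectifiers to be hard to train because "the derivative would be zero in so many places", and his surprise that ReLU nonetheless beat the sigmoids, can be sketched numerically. What follows is a minimal sketch under strong assumptions: it follows a single path through a deep net, treats every weight as 1, and uses a made-up pre-activation value of 0.5 at each layer. The function names are hypothetical and are not taken from any code by Bengio's group.

```python
import math


def sigmoid(x):
    """Logistic sigmoid, a smooth nonlinearity."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    """Derivative of the sigmoid; it never exceeds 0.25."""
    s = sigmoid(x)
    return s * (1.0 - s)


def relu_grad(x):
    """Derivative of ReLU: 1 on the active side, 0 on the flat side."""
    return 1.0 if x > 0 else 0.0


def chain_gradient(grad_fn, pre_activations):
    """Multiply the local derivatives along one path through the layers.

    Assumption: all weights on the path are 1, so only the activation
    derivatives contribute to the product.
    """
    g = 1.0
    for z in pre_activations:
        g *= grad_fn(z)
    return g


depth = 20
pre_acts = [0.5] * depth  # hypothetical: every unit on the path is active

sigmoid_path = chain_gradient(sigmoid_grad, pre_acts)
relu_path = chain_gradient(relu_grad, pre_acts)

print("sigmoid path gradient:", sigmoid_path)  # shrinks geometrically with depth
print("relu path gradient:", relu_path)        # stays at 1.0 on an active path

# The worry Bengio describes: one unit on the flat side blocks the whole path.
print("relu path with one inactive unit:",
      chain_gradient(relu_grad, [0.5, -0.5, 0.5]))  # 0.0
```

In this toy setting the sigmoid path shrinks by a factor of at most 0.25 per layer, while the ReLU path passes the gradient unchanged whenever its units are active. The last line shows the case Bengio was worried about, where a unit on the flat part contributes a zero derivative.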