Learning Machine Learning: A beginner's journey

- December 26, 2016

I have been learning about machine learning and deep learning (ML/DL) for the last year. I think ML/DL is here to stay. I don't think this is a fad or bubble! Here is why:

ML/DL has results. It is hard to argue against success.
ML/DL has been on the rise organically since 1985 (with the backpropagation algorithms) and went through another phase of acceleration after 2005 (with the wide availability of big data and distributed data processing platforms). The rise of ML/DL is following a rising curve pattern, not the pattern for a hyped ephemeral bubble. Since it grew gradually over many years, I am betting it will be around for at least the same amount of time.
ML/DL has been co-developed with applications. It has developed very much on the practice side with trial and error, and its theory is still lagging a bit behind and is unable explain many things. According to Nassim Taleb's heuristics ML/DL is antifragile.
ML/DL has the market behind it. Big money provides big incentive and has been attracting a lot of smart people. This many smart people cannot be wrong.

Certainly there is also a lot of hype about ML/DL. ML/DL proved viable for specific sets of applications, and it is a hyperbole to claim that the general AI has arrived. We are far from it. But that is a good thing, because we will have a lot of juicy problems to work on.

So I am doubling down on ML/DL.

Here are my first impressions learning about ML/DL. ML/DL uses a very different toolkit and approach than the distributed systems field I grew up in. I was initially surprised and taken aback by the very experimental and trial and error nature of ML/DL. ML/DL is dealing with noisy/fuzzy/messy real world data and naturally the field produced statistical and probabilistic tools. Validation is only via showing performance on the test set. The data set is the king. Debugging is a mess, and learning is very opaque. On the other hand, I really like the dynamism in the ML/DL area. There are a lot of resources and platforms and a lot of very interesting applications.

My interest in ML/DL is in its interactions with distributed systems. I am not interested in writing image/text/speech processing applications. I learned about ML/DL to think about two questions:

How can we build better distributed systems/architectures to improve the performance of ML/DL systems/applications?
How can we use ML/DL to build better distributed systems?

These are big questions and will take long to answer properly, so I hope to revisit them later. Below I talk about how I went about learning ML/DL, and in the coming days I hope to write brief summaries introductory ML/DL concepts and mechanisms.

How I went about learning ML/DL

In January, I started following Andrew Ng's machine learning course at Coursera. (Alternatively, here is Ng's course material for CS 229 at Stanford.) After the kids went to sleep, I spent an hour each night following Ng's class videos. Andrew Ng has a nice and simple way of explaining ML concepts. He is a very good teacher.

On a side note, if you like to learn a bit about Ng's thinking process and his approach to life, creativity, and failure, I recommend this interview. It is a very good read.

I really liked the first 3 weeks of Ng's course: Introduction and Linear Regression, Linear Regression with multiple features, and Logistic Regression and regularization. But as the course went to logistic regression with nonlinear decision boundaries, I started to get overwhelmed with the amount of information and complication. And as the course progressed to neural networks, I started to get lost. For example, I could not form a good mental model and picture of forward and backward propagation in neural networks. So those parts didn't stick with me. (I was also not following the programming assignments well.)

I think the problem was that Ng was explaining the neural networks concepts in a general/generic way. That sounded too abstract to me. It might have worked better if he had settled on a small concrete use case and explained the concepts that way.

Recently, I started auditing a deep learning course on Udacity with a Google Engineer, Vincent Vanhoucke. This course offered a simpler introduction to deep learning. The course started with multinomial logistic classification. Since I knew about logistic regression, I could follow this easily. I liked the softmax function, one-hot encoding, and cross entropy ideas as they are all very practical and concrete concepts. The course presented these with the use case of MNIST letter classification for the first 10 letters.

Then using the same MNIST example, the course introduced rectified linear units (ReLu) as a simple way of introducing nonlinearity and showed how to chain multinomial logistic classification with ReLus to construct a deep network that solves the letter classification task much better. This time, I was able to follow the forward and backward propagation ideas much better. Instead of explaining deep networks in a general abstract way, this course explained it in a ReLu-specific way and reusing the letter classification example built with logistical regression. (In Ng's class ReLu came on week 7, when introducing support vector machines).

As a quick way to overview Vincent's course contents, you may watch this YouTube video from another Google engineer. (Watch at 1.5x speed.) The talk uses the same MNIST example and similar approach. But it doesn't go into explanation of forward/backward propagation and deriving the ReLus, instead it focuses more on giving you introduction to TensorFlow skills as well as introducing the basic neural network concepts. Within 1 hour, the talk gets you learning about convolutional networks. The presentation is nice and easy to follow.

Reflecting back, it is interesting how much YouTube helped me to learn about ML/DL. I normally like reading papers better than listening/watching videos, or maybe that is because I am more accustomed to learning that way. But for learning about ML/DL, these YouTube videos have been very helpful. It looks like lecturing is making a come back.

Here is a bonus video of Andrew Ng talking about nuts and bolts of applying deep learning. Ng is a great teacher, so it is easy to follow the concepts presented and learn a lot from this talk.

In the coming days I hope to write brief summaries of the introductory ML/DL concepts I learned from these courses. In my first couple posts on this, I plan to follow the first 3 weeks of Ng's class. There are very good course notes of that course here, and I will summarize even more briefly to mention the big ideas. Then I will switch to Vanhoucke's course to introduce the ideas in multinomial logistical regression and the generalization to neural networks and deep learning from there. I will use the #mlbegin tag for the series. Let's see how that goes.

UPDATE (1/11/17): I wrote them up. Here are the introductory ML/DL concepts I learned from those courses:

Comments

Sagar said…

Try Andrej Karpathy's CS231n:

http://cs231n.stanford.edu/

It gets into deep learning for vision. Most of the recent deep learning papers are quite accessible, and you'll have no trouble following them if you're comfortable with high school calculus.

December 26, 2016 at 1:40 AM

windmaple said…

It's interesting to see that even a CS professor has to make some serious efforts to dabble into ML/DL :)

December 26, 2016 at 3:03 AM

Alen said…

Hi.
Very nice post. I'm also interested in Machine Learning. Thanks for recommending the course for learning. I'll try to take it.

December 26, 2016 at 5:14 AM

ACV said…

This is precisely how I felt about Ng's course. The first 3 parts went smoothly, but then .... no. I also think using Octave was a bad choice actually. Who wants to learn a new language completely useless outside the course. Could have been python or R. Thanks for the suggestions.

December 26, 2016 at 5:58 AM

Thomas said…

I can't understand why MNIST is a common example to teach machine learning. Some other application like text or voice learning will be interesting.

December 26, 2016 at 10:04 AM

Thomas said…

Any other tutotial that does not use MNIST as example? I don't know the reason but almost everywhere is the same. I am interested how deep learning is used for voice or trends predicting.

December 26, 2016 at 10:06 AM

Unknown said…

Hi! Great read. I'm currently interested in learning machine learning and am really overwhelmed by the amount of introductory courses on machine learning. The more I look the more confused I get. Currenly I'm deciding between Ng's course and introduction to machine learning book by Andreas. What do you think? I'm thinking of trying top down approach and Ng's course seems like bottom approach.

December 26, 2016 at 10:46 AM

Kaveh said…

Octave is basically matlab not useless and the syntax is closer mathematical notarion

December 26, 2016 at 4:00 PM

Anonymous said…

do you think these recommendations are also valid for a total stranger with no CS background?
does anyone have any other reco's?

December 27, 2016 at 2:05 AM

Vũ Hải Nam said…

Please take a look at: A complete daily plan for studying to become a #MachineLearning engineer https://github.com/ZuzooVn/machine-learning-for-software-engineers

December 27, 2016 at 10:47 AM

machine learning said…

Hello there.
Exceptionally decent post. I'm additionally intrigued by Machine Learning. A debt of gratitude is in order for suggesting the course to learn. I'll endeavor to take it.

July 27, 2017 at 5:47 AM

Asia said…

Nice post.

October 8, 2018 at 12:31 AM

zabi said…

I'm additionally intrigued by Machine Learning. A debt of gratitude is in order for suggesting the course to learn. I'll endeavor to take it.

June 9, 2019 at 7:51 AM

Indie Test Developer said…

Thanks for sharing wonderful piece of content, Keep posting!

October 22, 2019 at 12:57 AM