Metadata

Posts

Showing posts from January, 2017

Deep Learning With Dynamic Computation Graphs (ICLR 2017)

- January 18, 2017

This is a paper by Google that is under submission to ICLR 2017. Here is the OpenReview link for the paper. The paper pdf as well as paper reviews are openly available there. What a concept! This paper was of interest to me because I wanted to learn about dynamic computation graphs. Unfortunately almost all machine learning/deep learning (ML/DL) frameworks operate on static computation graphs and can't handle dynamic computation graphs. (Dynet and Chainer are exceptions). Using dynamic computation graphs allows dealing with recurrent neural networks (RNNs) better, among other use cases. ( Here is a great article about RNNs and LSTMs . Another good writeup on RNNs is here .) TensorFlow already supports RNNs , but by adding padding to ensure that all input data are of the same size, i.e., the maximum size in the dataset/domain. Even then this support is good only for linear RNNs not good for treeRNNs which is suitable for more advanced natural language processing. This was a ve

My first impressions after a week of using TensorFlow

- January 17, 2017

Last week I went through the TensorFlow (TF) Tutorials here . I found that I hadn't understood some important points about TensorFlow execution, when I read the TensorFlow paper. I am noting them here fresh to capture my experience as a beginner. (As one gathers more experience with a platform, the baffling introductory concepts starts to occur obvious and trivial.) The biggest realization I had was to see a dichotomy in TensorFlow among two phases. The first phase defines a computation graph (e.g., a neural network to be trained and the operations for doing so). The second phase executes the computation/dataflow graph defined in Phase1 on a set of available devices. This deferred execution model enables optimizations in the execution phase by using global information about the computation graph: graph rewriting can be done to remove redundancies, better scheduling decisions can be made, etc. Another big benefit is in enabling flexibility and ability to explore/experiment in the

Google DistBelief paper: Large Scale Distributed Deep Networks

- January 12, 2017

This paper introduced the DistBelief deep neural network architecture. The paper is from NIPS 2012. If you consider the pace of progress in deep learning, that is old and it shows. DistBelief doesn't support distributed GPU training which most modern deep networks (including TensorFlow) employ. The scalability and performance of DistBelief has been long surpassed. On the other hand, the paper is a must read if you are interested in distributed deep network platforms. This is the paper that applied the distributed parameter-server idea to Deep Learning. The parameter-server idea is still going strong as it is suitable to serve the convergent iteration nature of machine learning and deep learning tasks. The DistBelief architecture has been used by the Microsoft Adam project , Baidu Deep Image , Apache Hama , and Petuum's Bosen . Google, though, has since switched from the DistBelief parameter-server to TensorFlow's hybrid dataflow architecture, citing the difficulty of cus

Learning Machine Learning: Deep Neural Networks

- January 11, 2017

This post is part of the ML/DL learning series. Earlier in the series, we covered these: + Learning Machine Learning: A beginner's journey + Linear Regression + Logistic Regression + Multinomial Logistic Regression In this part, we are going to add hidden layers to our neural network, learn how backpropagation works for gradient descent in a deep NN, and finally talk about regularization techniques for avoiding overfitting. For this post also, I follow the course notes from the Udacity Deep Learning Class by Vincent Vanhoucke at Google. Go, take the course. It is a great course to learn about deep learning and TensorFlow. Linear models are limited We constructed a single layer NN for multinomial regression in our last post. How many parameters did that NN have? For an input vector X of size N, and K output classes, you have (N+1)*K parameters to use. N*K is the size of W, and K is the size of b. You will need to use many more parameters in practice. Deep learning cr

Learning Machine Learning: Multinomial Logistic Classification

- January 11, 2017

In the previous post, we got started on classification. Classification is the task of taking an input and giving it a label that says, this is an "A". In the previous post, we covered logistic regression, which made the decision for a single label "A". In this post, we will generalize that to multinomial logistic classification where your job is to figure out which of the K classes a given input belongs to. For this post I follow the course notes from the Udacity Deep Learning Class by Vincent Vanhoucke at Google . I really liked his presentation of the course: very practical and to the point. This must have required a lot of preparation. Going through the video transcript file, I can see that the material has been prepared meticulously to be clear and concise. I strongly recommend the course. The course uses TensorFlow to teach you about Deep Neural Networks in a hands-on manner, and follows the MNIST letter recognition example in the first three lessons. Don&

Learning Machine Learning: Logistic Regression

- January 06, 2017

This is part 2 of learning machine learning introductory concepts. Recall that supervised learning had two basic examples, regression and classification . We covered linear regression in part 1 , and now in part 2 we look at classification. Although the name of the technique used here, logistic regression , includes the word "regression", this is in fact a classification algorithm. It builds on a similar gradient descent approach as we discussed in part 1 in the context of linear regression. (In this post, again I follow/summarize from Andrew Ng's machine learning course at Coursera . Here is Ng's course material for CS 229 at Stanford . There are also good course notes here , and I will summarize even more briefly than those notes to highlight only the big ideas.) Hypothesis representation The goal of the logistic regression algorithm is to determine what class a new input should fall into. Here is an example application. See, line fitting does not make sense

Learning Machine Learning: Introduction and Linear Regression

- January 05, 2017

In an earlier post, I had talked about how I went about learning about machine learning and deep learning (ML/DL) , and said that I would write brief summaries of the introductory ML/DL concepts I learned during that process. I will do part 1 now, otherwise soon I will start to find the introductory concepts obvious and trivial (which they are not). So for all it is worth, and mostly to keep my brain organized, here is the first post on the introductory ML/DL concepts. Supervised and Unsupervised Learning Algorithms Machine learning algorithms are divided broadly into two parts: supervised and unsupervised learning algorithms. In supervised learning, there is a training phase where a supervisor trains the algorithm with examples of how the output relates to the input. Two basic examples of supervised learning are regression , which uses a continuous extrapolation function for output prediction, and classification , which outputs a classification into buckets/groups. The rest of