Tuesday, 2 August 2016

Machine Learning - A simple explanation

I've recently dived into machine learning, and in this blog post I will briefly describe the journey and my experiences so far. My interest grew after I first applied NuPIC, Numenta's Platform for Intelligent Computing, to anomaly detection. This was a project in Intel's Ultimate Coder Challenge and you can find more about it here.

NuPIC is inspired by the neocortex, which makes up the majority of the brain's mass. It has been observed that the neocortex processes stimuli in the same way irrespective of their source. What I mean here is that you can route visual input through your tongue (for example, a camera driving a stimulator placed on the tongue) and somehow you will learn to see! I'm not kidding here, scientists and researchers have tried it.

Coming back to NuPIC, it is open source and developed by Numenta. My experience with NuPIC was quite good. However, I didn't develop everything from scratch, and that is a demerit if you want to learn. That said, Numenta has written a white paper which explains everything in detail.

NuPIC uses Hierarchical Temporal Memory (HTM), previously called the Cortical Learning Algorithm, as the guiding algorithm. HTM is based on the observation that our neocortex is organized as a hierarchy. The neurons at the lowest level of the hierarchy represent and map the stimulus or sensory data and pass their activations on to the layer above them. As we move up the hierarchy, we find that the layers respond to more and more abstract inputs.

One more important term in HTM is SDR, or sparse distributed representation. SDRs can be considered the data structures of the neocortex. Sensory inputs are converted into SDRs by various techniques - spatial and temporal pooling. I will not go into much detail here. You can find my implementation at this link and the code here.

Well, the Intel Ultimate Coder Challenge is over, but I am still learning and working on machine learning. Andrew Ng's course is very helpful. As of now, I can say that this course is the best I have found for anybody who wants to develop an understanding of machine learning.

I say, Machine Learning = Math + Statistics + Probability + Physics + Coding. You also need a good imagination to picture the graphs. And yes, graphs are very helpful in ML and in data visualization in general. A simple plot can suggest the kind of problem you have in front of you (regression, classification, etc.).

Let me take a very simple example of linear regression and we'll try to build an approach to solve the problem.

Suppose we have a data set X = {2, 3, 4, 5}, y = {5, 5, 10, 12}. Given this data set, let's visualize it first.

We haven't defined our problem statement yet. It can be something like - "Given the above data set, predict the value of y at, say, x = 7?" This looks like a mathematical problem, and it can be solved mathematically if one knows the law that relates X to the corresponding y. This law is an equation. In ML it is referred to as a hypothesis. The plot suggests that a straight-line equation can solve the problem, so let's use it.

We have y = mx + c. Here, the known values are x and y, and the unknown terms are m and c. We could find m and c by taking any two data points and putting them into our equation, but the resulting line would satisfy just those two points and won't generalize. Also, imagine a lot of data points. An equation that passes through every one of them would need many more unknown coefficients, so we'd require a much more complex equation, maybe a higher-degree polynomial.
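To see why two points are not enough, here is a small sketch in plain Python. The helper name line_through is my own for illustration, not from any library. It fits a line through the first two points of our data set and then checks how far that line is from all four points.

```python
def line_through(p, q):
    """Return (m, c) for the straight line y = m*x + c through points p and q."""
    (x1, y1), (x2, y2) = p, q
    m = (y2 - y1) / (x2 - x1)   # slope
    c = y1 - m * x1             # intercept
    return m, c

X = [2, 3, 4, 5]
y = [5, 5, 10, 12]

# Use only the first two data points: (2, 5) and (3, 5).
m, c = line_through((X[0], y[0]), (X[1], y[1]))

# Difference between the actual y and the line's output at every x.
residuals = [yi - (m * xi + c) for xi, yi in zip(X, y)]

print("m =", m, "c =", c)
print("residuals =", residuals)
```

The line is exact on the two points we used (residual 0) but is off by 5 and 7 on the other two, which is what "won't generalize" means here.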

ML can solve our problem gracefully. It can come up with a hypothesis which will generalize over a large data set. The above problem is a linear regression problem (regression because the value we predict, y, is continuous, and linear because the relation between x and y is almost linear).

Coming up with the hypothesis that best suits our data set and generalizes to a large extent is the main goal of machine learning. The program learns from data and then comes up with the best hypothesis it can find. To make our program learn, we need to train it with the training data that we provide. Mostly, the data set is divided into a training set and a testing set. The testing set is the part which the program hasn't seen during training, and the program is required to make predictions on it.
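A minimal sketch of such a split, using only Python's standard library. The function name train_test_split, the 25% test fraction and the fixed seed are my own choices for illustration. With only four points a split is not really meaningful, but the mechanics are the same for larger data sets.

```python
import random

def train_test_split(points, test_fraction=0.25, seed=0):
    """Shuffle a copy of the points and hold out a fraction of them for testing."""
    rng = random.Random(seed)        # fixed seed so the split is repeatable
    shuffled = points[:]             # copy, so the original order is untouched
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

points = list(zip([2, 3, 4, 5], [5, 5, 10, 12]))   # (x, y) pairs
train, test = train_test_split(points)

print("training set:", train)
print("testing set: ", test)
```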

How do we train? ML has something called a cost function, which for regression is usually the mean squared error. Practically, what we do is assume some random values for the parameters m and c and put them in our hypothesis. We calculate the output this hypothesis produces and compare it with the actual value (y) at that particular x. We square the difference of the two, repeat this for all the data points, and take the average. This gives us the mean squared error.
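The cost calculation described above can be sketched like this. The name mse_cost is my own, and I divide by n here; some courses divide by 2n instead, which only rescales the value.

```python
def mse_cost(m, c, X, y):
    """Mean squared error of the hypothesis y = m*x + c over the data set."""
    total = 0.0
    for xi, yi in zip(X, y):
        predicted = m * xi + c
        total += (predicted - yi) ** 2   # squared difference for one point
    return total / len(X)                # average over all points

X = [2, 3, 4, 5]
y = [5, 5, 10, 12]

# Two guesses for (m, c): a flat line at y = 5, and a line that slopes upwards.
print(mse_cost(0.0, 5.0, X, y))
print(mse_cost(2.6, -1.1, X, y))
```

The flat line gives a cost of 18.5, while the second guess gives a cost of about 1.05, so the second guess is the better hypothesis for this data.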

But where is the learning? 

Learning is in minimizing the cost function. The cost function is calculated, and then we look for the parameter values that give the minimum cost (as close to zero as the data allows). We repeatedly update the parameters by a small step in the direction opposite to the gradient of the cost, so that we move towards the minimum. This kind of approach is called gradient descent. So when you train with the data set, the program is doing some gradient descent to find the best-fit values of m and c.
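Here is a minimal sketch of gradient descent for our straight-line hypothesis with the mean squared error as the cost. The function name, the learning rate of 0.01 and the 20000 steps are my own assumptions, chosen so that this tiny example settles down. I also start from m = 0 and c = 0 instead of random values so that the result is repeatable.

```python
def gradient_descent(X, y, learning_rate=0.01, steps=20000):
    """Fit y = m*x + c by repeatedly stepping against the gradient of the MSE."""
    m, c = 0.0, 0.0
    n = len(X)
    for _ in range(steps):
        # Partial derivatives of the mean squared error with respect to m and c.
        grad_m = (2.0 / n) * sum((m * xi + c - yi) * xi for xi, yi in zip(X, y))
        grad_c = (2.0 / n) * sum((m * xi + c - yi) for xi, yi in zip(X, y))
        # Move a small step in the direction that reduces the cost.
        m -= learning_rate * grad_m
        c -= learning_rate * grad_c
    return m, c

X = [2, 3, 4, 5]
y = [5, 5, 10, 12]

m, c = gradient_descent(X, y)
print("m =", round(m, 4), "c =", round(c, 4))
```

For this data set the values settle at roughly m = 2.6 and c = -1.1. A learning rate that is too large makes the updates overshoot and blow up, while one that is too small just needs more steps.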

Finally, after getting the values of m and c (the weights), we are ready to predict for new, unseen values of x!
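As a sketch of the prediction step, the snippet below answers the earlier question about the value of y at x = 7. To keep it short and self-contained, it gets m and c from the closed-form least-squares formula for a straight line rather than from an iterative training loop. For this cost function both lead to the same best-fit line. The function names are my own.

```python
def fit_least_squares(X, y):
    """Closed-form best-fit m and c for y = m*x + c (minimizes squared error)."""
    n = len(X)
    mean_x = sum(X) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(X, y))
    sxx = sum((xi - mean_x) ** 2 for xi in X)
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c

def predict(x, m, c):
    """Apply the hypothesis y = m*x + c to a new x."""
    return m * x + c

X = [2, 3, 4, 5]
y = [5, 5, 10, 12]

m, c = fit_least_squares(X, y)
prediction = predict(7, m, c)
print("predicted y at x = 7:", round(prediction, 2))
```

With m of about 2.6 and c of about -1.1, the prediction at x = 7 comes out to about 17.1.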

The above plot shows the best-fit line used to predict the unknown values. You can see that this line passes through the middle of the data points. This would have been a better plot with more data points, but we get the idea.

I hope this helped in developing a basic understanding of an ML problem statement.