Splitting Data into Training and Test

I finally enrolled in the Machine Learning Foundation course on Coursera. I have taken several Coursera courses before, but in my opinion this is the best one so far, and I like it.

Today I learned how to install IPython, which is simple and easy, and then how to start and run the notebook.

The notebook is like a combination of Python and Wikipedia. I can add text to it, execute Python commands in it, and show graphs as well. Pretty cool for a presentation.

Then I learned some basic data science concepts:

  • The goal is to find a function that represents the data well. Today I learned about regression models, where the fitted function can be linear, quadratic, or something more complex. One thing to remember is that the most complex function is not always the best: it may fit the training data very closely and still make poor predictions on new data (overfitting).
  • We should split the data into (1) training data and (2) test data. We fit the function using only the training data, and then evaluate it on the test data. The best function is the one with the lowest error on the test data, not the one with the lowest error on the training data.
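
To make the split idea concrete for myself, here is a small sketch in plain Python. It is not code from the course: the data is made up, and the helper names (train_test_split, fit_line, fit_constant, mean_squared_error) are my own. It splits the data, fits two candidate functions on the training part only, and picks the one with the lower error on the test part.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def fit_line(points):
    """Least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x

def fit_constant(points):
    """Baseline that always predicts the mean y; returned as (0, mean)."""
    mean_y = sum(y for _, y in points) / len(points)
    return 0.0, mean_y

def mean_squared_error(model, points):
    """Average squared difference between predictions and actual y."""
    a, b = model
    return sum((y - (a * x + b)) ** 2 for x, y in points) / len(points)

# Made-up data (an assumption for illustration): y is roughly 3x + 2 plus noise.
rng = random.Random(42)
data = [(x, 3 * x + 2 + rng.gauss(0, 1)) for x in range(50)]

# Hold out 20% of the data for testing; fit only on the remaining 80%.
train, test = train_test_split(data, test_fraction=0.2, seed=1)

candidates = {"constant": fit_constant(train), "line": fit_line(train)}
test_errors = {name: mean_squared_error(m, test) for name, m in candidates.items()}

# The best function is the one with the lowest error on the test data.
best = min(test_errors, key=test_errors.get)
print(test_errors, "best:", best)
```

The important part is that the test data is never used for fitting, so the test error is an honest estimate of how each function would do on data it has not seen.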

That’s all for today. I have to go out now for some me time, and I’ll be back when I’m feeling a little fresher.


