Splitting Data into Training and Test Sets

I finally took the Machine Learning Foundation course on Coursera. I have taken several Coursera courses before, but in my opinion this is the best one so far, and I like it.

Today I learned how to install IPython, which is very simple, and then started running the notebook.

The notebook is like a combination of Python and Wikipedia. I can add text to it, execute Python commands in it, and show graphs as well. Pretty cool for a presentation.

Then I learned some basic concepts of data science:

  • Find a better function to represent the data. Today I learned about the linear regression model, where the fitted function can be linear, quadratic, or something more complex. One thing to remember is that the most complex function is not always the best: it may fit the training data very closely and still predict new data badly (overfitting).
  • Split the data into (1) training data and (2) test data. We fit the function using the training set only, and then we test it on the test set. The best function is the one that gives the smallest error on the test set.
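To make the two points above concrete for myself, here is a minimal sketch using NumPy. Everything in it is an assumption for illustration and not from the course: the data is made up (a quadratic curve plus noise), `split_train_test` and `mean_squared_error` are hypothetical helper names, and the 25% test fraction and the degrees 1, 2 and 8 are arbitrary choices.

```python
import numpy as np


def split_train_test(x, y, test_fraction=0.25, seed=0):
    """Shuffle the indices, then hold out a fraction of the points for testing."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    n_test = int(round(len(x) * test_fraction))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return x[train_idx], y[train_idx], x[test_idx], y[test_idx]


def mean_squared_error(coeffs, x, y):
    """Average squared difference between the polynomial's predictions and y."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))


# Made-up data: a quadratic trend plus random noise (assumption, not course data).
rng = np.random.default_rng(42)
x = np.linspace(-1.0, 1.0, 40)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(scale=0.3, size=x.size)

x_train, y_train, x_test, y_test = split_train_test(x, y)

# Fit each candidate function on the training data only,
# then measure its error on both the training and the test data.
results = {}
for degree in (1, 2, 8):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = mean_squared_error(coeffs, x_train, y_train)
    test_err = mean_squared_error(coeffs, x_test, y_test)
    results[degree] = (train_err, test_err)
    print(f"degree {degree}: train error {train_err:.3f}, test error {test_err:.3f}")

# Pick the function with the smallest error on the held-out test data.
best_degree = min(results, key=lambda d: results[d][1])
print("best degree by test error:", best_degree)
```

The training error can only go down as the degree goes up, because a higher-degree polynomial can always reproduce a lower-degree one. That is why the training error alone cannot tell us which function is best, and why the last step compares the test errors instead.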

That’s all for today. I have to go out now for some me time, and I’ll be back when I’m a little fresher.
