Friday, May 3, 2013

What is Data Science?

I'm taking the Introduction to Data Science course on Coursera, so I'll post a few tidbits about what I learn over the next several weeks.

First topic, what is Data Science?

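Well, the term is quite fuzzy, so the answer may depend on whom you ask, but here's Drew Conway's Data Science Diagram, which is commonly referred to when describing data science. It shows data science as the overlap of three areas: hacking skills, math and statistics knowledge, and substantive expertise.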
  
Since a lot of data is electronic nowadays, you need to be able to speak the language to some degree. You don't need to be a CS major or a programmer, but specific skills for working with data are important, from using the command line to put a text file into the right format to programming in R.

The substantive expertise part means being able to explore, discover, and form hypotheses and tests. Basically, it's about asking the right questions and finding the right answers.

Conway points out the danger zone because this is where people "know enough to be dangerous": without a solid grounding in statistics, one might misinterpret the data. Thinking about it, I think the danger zone might be called "Computer Science."

The difference between data science and business intelligence (BI) is that in BI, a data warehouse is often built to perform specific analyses and answer particular questions, which takes a lot of up-front effort. This is usually narrower in scope than data science, and BI is less adaptable when requirements change. In short, BI is about building a particular tool to answer particular questions, whereas data science is more general. It was also noted that BI engineers often do not use or analyze the systems they build themselves.