Netflix competition.
As datasets are getting BIG and COMPLEX, the most difficult challenge for statistical scientists is to figure out “where the information is hidden.” It’s an interactive process of investigation rather than a passive application of algorithms and calculation of error rates. Two skills are critical: **(1)** “look at the data”, which is missing in the mechanical push-button culture; and **(2)** learn “how to question the data”, rather than only answering a specific question. These skills allow data scientists to discover the unexpected in addition to the usual verification of the expected.
This raises the question of whether

- the Data Science training curriculum should look like a *long manual of specialized methods and (series of “cookbook”) algorithms*;
- or, should train students (and industry professionals) in **Sci**entific **D**ata E**x**ploration (Sci-Dx): a systematic and pragmatic approach to data modeling addressing the “Monkey and banana problem” [Pigeon’s approach] for practitioners. [I believe Wolfgang Kohler’s “insight learning” idea can guide us to develop such a curriculum.]

*disparate* statistical procedures from a *common* perspective (thus reducing the size of the manual) and can be *appropriately combined* to build versatile data products brick by brick. ]]>