Netflix competition. As datasets get BIG and COMPLEX, the most difficult challenge for the statistical scientist is to figure out “where is the information hidden.” It is an interactive process of investigation rather than a passive application of algorithms and calculation of error rates. Two skills are critical: (1) “look at the data”, which is missing in the mechanical push-the-button culture; and (2) learn “how to question the data”, rather than only answering a specific question. These skills allow data scientists to discover the unexpected in addition to the usual verification of the expected. This raises the question of whether
- the Data Science training curriculum should look like a long manual of specialized methods and a series of cookbook algorithms;
- or should train students (and industry professionals) in Scientific Data Exploration (Sci-Dx): a systematic and pragmatic approach to data modeling that addresses the “Monkey and banana problem” [Pigeon’s approach] for practitioners. [I believe Wolfgang Köhler’s “insight learning” idea can guide us in developing such a curriculum.]