article by Mark van der Laan, which has a number of noteworthy aspects. I feel it’s an excellent just-in-time reminder, which rightly demands a change in perspective: “We have to start respecting, celebrating, and teaching important theoretical statistical contributions that precisely define the identity of our field.” **The real question is: which topics are those?**
**Answer**: *Which statistical concepts and tools are routinely used by non-statistician data scientists for their data-driven discovery*? *How many of them were discovered in the last three decades (compare that with the number of so-called “top journal” papers that get published every month!)? Are we moving in the right direction?* Isn’t it obvious why “our field has been nearly invisible in key arenas, especially in the ongoing discourse on Big Data and data science” (Davidian 2013)? Selling the same thing under a new name is not going to help (in either research or teaching); we need to invent and recognize new ideas that are both beautiful and useful.
I totally agree with what he said: “Historically, data analysis was the job of a statistician, but, due to the lack of rigor that has developed in our field, I fear our representation in data science is becoming marginalized.” I believe the first step is to go beyond the currently fashionable plug-and-play model-building attitude: let’s make it an *Interactive and Iterative* (thus more enjoyable) process based on a few fundamental and unified rules. Another way of saying the same thing is, “*the smartest thing on the planet is neither man nor machine – it’s the combination of the two*” [George Lee].
He refers to the famous quote “*All models are wrong, but some are useful.*” He also expresses the concern that “Due to this, models that are so unrealistic that they are indexed by a finite dimensional parameter are still the status quo, even though everybody agrees they are known to be false.”
To me the important question is: can we *systematically discover the useful ones* rather than starting with a guess based solely on convenience? Such convenience typically comes in two types: theoretical and computational. (Classical) theoreticians like to stay in the perpetual fantasy world of “optimality,” whereas the (present-day) computational goal is to make it “faster” by hook or by crook.
It seems to me that the ultimate goal is to devise a “*Nonparametric procedure to Discover Parametric models*” (The Principle of NDP), yielding models that are simple and better than “models of convenience.” Do we have any systematic modeling strategy for that? [An example]
“*Stop working on toy problems, stop talking down theory, stop being attached to outdated statistical methods, stop worrying about the politics of our journals and our field. Be a true and proud statistician who is making an impact on the real world of Big Data. The world of data science needs us—let’s rise to the challenge.*”