
Deep Mukhopadhyay, Ph.D.

21st-century statistics

Models of Convenience to Useful Models

April 21, 2015 by deepstatorg

A recent article by Mark van der Laan has a number of noteworthy aspects. I feel it's an excellent just-in-time reminder, one that rightly demands a change in perspective: "We have to start respecting, celebrating, and teaching important theoretical statistical contributions that precisely define the identity of our field." The real question is: which are those topics? One answer: the statistical concepts and tools that non-statistician data scientists routinely use for their data-driven discovery. How many of them were discovered in the last three decades (and compare that with the number of so-called "top journal" papers that get published every month)? Are we moving in the right direction? Isn't it obvious why "our field has been nearly invisible in key arenas, especially in the ongoing discourse on Big Data and data science" (Davidian 2013)? Selling the same thing under a new name is not going to help (in either research or teaching); we need to invent and recognize new ideas that are both beautiful and useful.

I totally agree with what he said: "Historically, data analysis was the job of a statistician, but, due to the lack of rigor that has developed in our field, I fear our representation in data science is becoming marginalized." I believe the first step is to go beyond the currently fashionable plug-and-play style of model building; let's make it an interactive and iterative (and thus more enjoyable) process based on a few fundamental and unified rules. Another way of saying the same thing: "the smartest thing on the planet is neither man nor machine – it's the combination of the two" [George Lee].

He refers to the famous quote "All models are wrong, but some are useful," and he also expresses the concern that "Due to this, models that are so unrealistic that they are indexed by a finite dimensional parameter are still the status quo, even though everybody agrees they are known to be false." To me the important question is: can we systematically discover the useful models, rather than starting with a guess based solely on convenience? Convenience typically comes in two flavors, theoretical and computational: (classical) theoreticians like to stay in the perpetual fantasy world of "optimality," whereas the (present-day) computational goal is to make things "faster" by hook or by crook. It seems to me that the ultimate goal is to devise a "Nonparametric procedure to Discover Parametric models" (the principle of NDP), models that are simple and better than "models of convenience." Do we have any systematic modeling strategy for that? [An example]

"Stop working on toy problems, stop talking down theory, stop being attached to outdated statistical methods, stop worrying about the politics of our journals and our field. Be a true and proud statistician who is making an impact on the real world of Big Data. The world of data science needs us—let's rise to the challenge."
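To give a feel for the NDP idea, here is a minimal sketch (in Python) of the crudest version of such a discovery loop: start from the nonparametric empirical distribution and let it screen a menu of parametric candidates, instead of committing to one "model of convenience" up front. The sample, the candidate list, and the selection rule below are all illustrative assumptions, a toy and not the NDP procedure itself:

```python
# Toy sketch: let the nonparametric empirical distribution screen a
# menu of parametric candidates, rather than fixing one guess a priori.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.gamma(shape=2.0, scale=1.5, size=500)   # data with an "unknown" law

candidates = {"normal": stats.norm, "gamma": stats.gamma,
              "lognormal": stats.lognorm, "exponential": stats.expon}

for name, dist in candidates.items():
    params = dist.fit(x)                          # maximum-likelihood fit
    ks = stats.kstest(x, dist.cdf, args=params)   # nonparametric adequacy check
    # NB: KS p-values are optimistic when parameters are estimated from
    # the same data; this is a screening heuristic, not a formal test.
    print(f"{name:12s} KS = {ks.statistic:.3f}  p = {ks.pvalue:.3f}")

# Candidates the empirical distribution rejects are discarded; among the
# survivors, the simplest parametric form is the "useful" model.
```

The point of the toy is only that the data, not habit, get to veto the parametric guess; a real NDP procedure would have to do this systematically and constructively.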

Filed Under: Blog Tagged With: 21st-century statistics, Model discovery, Next-Generation Statisticians, Science of Model Building, Science of Statistics

The Unsolved Problem of Statistics: The BIG Question

July 3, 2014 by deepstatorg

  • Data Type: discrete, continuous, and mixed (a combination of discrete and continuous data).
  • Data Structure: univariate, multivariate, time series, spatial, image, graph/network, etc. (roughly in order of increasing complexity).
  • Data Pattern: linear/non-linear, stationary/non-stationary, etc.
  • Data Size: small, medium, and big (though this is vague, since today's big data are tomorrow's small data).
There is a long history of developing core statistical modeling principles that remain valid for any combination of the items on this list. A few examples include bivariate continuous regression (Francis Galton), multivariate discrete data (Karl Pearson, Udny Yule, Leo Goodman), mixed data (Thomas Bayes, Student, R.A. Fisher, Fix and Hodges), time series (Norbert Wiener, Box & Jenkins, Emanuel Parzen, John Tukey, David Brillinger), non-stationarity (Clive Granger, Robert Engle), and non-linearity (Grace Wahba, Cleveland). To tackle this rich variety of data, many cultures of statistical science have developed over the last century, which can be broadly classified as (1) parametric confirmatory, (2) nonparametric exploratory, and (3) optimization-driven algorithmic approaches.

United Statistical Algorithm. I claim what we need is a breakthrough: a "Periodic Table of Data Science." Developing new algorithms in an isolated manner will not be enough to justify "learning from data" as a proper scientific endeavor. We have to put some order (by understanding their internal statistical structure) into the current inventory of algorithms, which are mushrooming at a staggering rate these days. The underlying unity of "how they relate to each other" will dictate what the Fundamental Principles of Data Science are. At a more practical level, this will enable data scientists to predict new algorithms in a systematic way rather than by trial and error.

Theory of Data Analysis. How can we develop a consistent and unified framework of data analysis (the foundation of data science) that reveals the interconnectedness among the different branches of statistics? This remains one of the most vexing mysteries of modern statistics. Developing such a theory, leading to a progressive unification of fundamental statistical learning tools, would have enormous implications for the theory and practice of data analysis.

An example of a modern data challenge: is there any fundamental, universal modeling principle for tackling these large varieties of data? This is still an unsolved problem, and it is especially difficult to solve programmatically. Big Data experts have started to recognize that this is a real challenge, a "Research problem! Killing most CIOs," and "if there is any Achilles heel it's going to be this." Check out this recent talk [PDF, Video (from 1:32:00 – 1:38:00)] by Turing laureate Michael Stonebraker at the White House Office of Science & Technology Policy and MIT, March 3, 2014, and also this and this one. I conjecture that statistics can play the crucial role in solving this BIG-Variety problem.
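As a tiny illustration of what a "united" algorithm might mean in code, consider the sketch below. It is my own toy in Python, inspired by rank-based unification ideas rather than a statement of the full framework: the mid-distribution transform F(x) - 0.5*p(x) is well defined for discrete, continuous, and mixed variables alike, so a single correlation formula can serve every row of the data-type taxonomy above.

```python
# Toy illustration of "one algorithm, many data types": map each variable
# through its mid-distribution transform, after which one correlation
# formula applies uniformly. (A sketch only, not the full framework.)
import numpy as np

def mid_rank(x):
    """Mid-distribution transform F(x) - 0.5*p(x); well defined with ties."""
    x = np.asarray(x, dtype=float)
    F = np.array([np.mean(x <= xi) for xi in x])   # empirical CDF at each xi
    p = np.array([np.mean(x == xi) for xi in x])   # point mass at each xi
    return F - 0.5 * p

def unified_corr(x, y):
    """Correlation of mid-ranks: one code path for any pair of data types."""
    u, v = mid_rank(x), mid_rank(y)
    return np.corrcoef(u, v)[0, 1]

rng = np.random.default_rng(1)
z = rng.normal(size=400)
continuous = z + 0.5 * rng.normal(size=400)   # a continuous variable
discrete = (z > 0).astype(int)                # a binary variable
print(unified_corr(continuous, discrete))     # no special casing needed
```

For tie-free continuous data the transform reduces to (rank - 0.5)/n, giving an ordinary Spearman-type correlation; the same code path handles the binary variable without modification, which is the flavor of unification the "Periodic Table" metaphor is after.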

Filed Under: Blog Tagged With: 21st-century statistics, Data Science, history, Open Problem of Statistics


