
Deep Mukhopadhyay, Ph.D.

Blog

Beware of Retail Algorithm Building Culture, Yet Another Example

August 6, 2014 by deepstatorg

How the growing culture of developing RETAIL statistical algorithms and software can keep data scientists from “looking at the data,” leading to useless answers. The importance of Nonparametric Exploratory Statistical Practice in the age of easy-to-use retail statistical software is illustrated using simple examples: http://s.hbr.org/1qVt3Fj

Filed Under: Blog Tagged With: BIG data, retail algorithm

BIG data, BIG opportunity, BIG Tent

July 27, 2014 by deepstatorg

Question: How can we create a BIG TENT to increase the visibility and IMPACT of our profession? There are two possibilities (applied interdisciplinary work and core research), and both efforts are important and should be balanced:

(a) Applying retail algorithms while working in multidisciplinary teams (as described in the PDF). BIG data brings enormous interdisciplinary opportunities for statisticians, and this helps statisticians make a living by applying (traditional) statistical tools plus problem-specific bells and whistles.

(b) Developing wholesale algorithms that can be translated into curriculum. Along with retail, domain-specific problem solving (and retail paper publications), academic statisticians need to develop wholesale algorithms of multidisciplinary utility (like AIC, Bootstrap, kNN, RKHS, SVM, Splines, Random Forests, Lasso, etc.) that other disciplines can routinely use for their data-driven (exploratory, not confirmatory) research.

I fear we might lose our unique identity (and the spirit of our discipline, the science of learning from data) if we focus too much on solving `isolated' problems while busy making a living for ourselves using traditional tools plus some twists and turns; in that case we will produce skilled biologists or engineers, not statisticians. We need to find a balance between these two broad approaches, which share the same goal: to advance the frontiers of statistics.
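To make the retail/wholesale contrast concrete, here is a minimal sketch (my illustration, not from the post) of one “wholesale” tool: a generic bootstrap routine, coded once, that answers very different “retail” questions simply by plugging in a problem-specific statistic. The data and statistics below are invented for illustration only.

```python
import numpy as np

def bootstrap_ci(data, statistic, n_boot=2000, alpha=0.05, seed=None):
    """Generic percentile bootstrap: works for any 1-D sample and any statistic."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    boot_stats = np.array([
        statistic(rng.choice(data, size=data.size, replace=True))
        for _ in range(n_boot)
    ])
    return np.quantile(boot_stats, [alpha / 2, 1 - alpha / 2])

# The same "wholesale" routine serves two unrelated "retail" questions:
gene_expression = np.random.default_rng(1).lognormal(size=50)   # a biologist's sample
daily_returns = np.random.default_rng(2).normal(0, 0.02, 250)   # an economist's sample
print(bootstrap_ci(gene_expression, np.median, seed=0))  # CI for a median expression level
print(bootstrap_ci(daily_returns, np.std, seed=0))       # CI for return volatility
```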

Filed Under: Blog Tagged With: BIG Tent, retail algorithm, wholesale algorithms

Algorithms: Retail and Wholesale

July 4, 2014 by deepstatorg

  • Retail: Solving real scientific problems one at a time for clients/collaborators.
  • Wholesale: Theory and algorithms applicable simultaneously to many clients and problems. An example [PDF].
  • This paper [PDF] argues that academic statisticians should aim to develop “wholesale” algorithms. Our research on United Statistical Algorithms is motivated by the following question (a toy sketch follows the quote below):

    “How to develop a Systematic Data Modeling Strategy? How to design Flexible and Reusable algorithms based on General Theory that can be adapted to solve specific Practical Problems?”
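    As a toy answer to that design question (my sketch, not the paper's United Statistical Algorithms framework), the snippet below separates the general theory, coded once, from the problem-specific plug-in: a reusable permutation test in which only the statistic changes from one practical problem to the next.

```python
import numpy as np

def permutation_test(x, y, statistic, n_perm=5000, seed=None):
    """General theory coded once (exchangeability under the null);
    the practical problem only supplies the statistic."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x), np.asarray(y)
    observed = statistic(x, y)
    pooled = np.concatenate([x, y])
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        if abs(statistic(perm[:x.size], perm[x.size:])) >= abs(observed):
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)   # permutation p-value

# Adapting the same reusable algorithm to two different practical problems:
mean_shift = lambda a, b: a.mean() - b.mean()          # e.g., comparing treatment groups
spread_ratio = lambda a, b: np.log(a.std() / b.std())  # e.g., comparing variability
```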


    Filed Under: Blog Tagged With: retail algorithm, united algorithms, wholesale algorithm

    Data Then Science, Next-Generation Statisticians

    July 3, 2014 by deepstatorg

  • Statistical validation of scientific guesses → scientific validation of statistical findings.
  • Parametric confirmatory modeling (Science + Data) → nonparametric exploratory modeling (Data + Science); see the sketch below.
  • Question: How many such nonparametric exploratory modeling tools (not inferential tools!) have we developed in the last three decades?
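    To make the contrast concrete, here is a small illustrative sketch (mine, not from the post): the confirmatory route assumes a parametric form up front and then checks it, while the exploratory route lets a nonparametric smoother suggest the shape before any scientific interpretation. The simulated data are purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 200))
y = np.sin(x) + rng.normal(0, 0.3, 200)        # the "true" relationship is nonlinear

# Parametric confirmatory (Science + Data): posit a straight line, then assess it.
slope, intercept = np.polyfit(x, y, deg=1)

# Nonparametric exploratory (Data + Science): let the data suggest the shape
# via a simple k-nearest-neighbour running mean, then interpret scientifically.
def running_mean(x, y, grid, k=20):
    nearest = np.argsort(np.abs(x[None, :] - grid[:, None]), axis=1)[:, :k]
    return y[nearest].mean(axis=1)

grid = np.linspace(0, 10, 50)
smoothed = running_mean(x, y, grid)    # reveals the sinusoidal pattern the line misses
```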


    Filed Under: Blog Tagged With: Data Science, Next-Generation Statisticians

    The Unsolved Problem of Statistics: The BIG Question

    July 3, 2014 by deepstatorg

  • Data Type: discrete, continuous, and mixed (a combination of discrete and continuous data).
  • Data Structure: univariate, multivariate, time series, spatial, image, graph/network, etc. (roughly in order of increasing complexity).
  • Data Pattern: linear/non-linear, stationary/non-stationary, etc.
  • Data Size: small, medium, and big (though this may be vague, as today's big data are tomorrow's small data).
  • There is a long history of developing core statistical modeling principles that are valid for any combination of the above list. A few examples include bivariate continuous regression (Francis Galton), multivariate discrete data (Karl Pearson, Udny Yule, Leo Goodman), mixed data (Thomas Bayes, Student, R.A. Fisher, Fix and Hodges), time series (Norbert Wiener, Box & Jenkins, Emanuel Parzen, John Tukey, David Brillinger), non-stationarity (Clive Granger, Robert Engle), and non-linearity (Grace Wahba, Cleveland). To tackle these rich varieties of data, many cultures of statistical science have been developed over the last century, which can be broadly classified as (1) parametric confirmatory, (2) nonparametric exploratory, and (3) optimization-driven algorithmic approaches.

    United Statistical Algorithm. I claim what we need is a breakthrough: a “Periodic Table of Data Science.” Developing new algorithms in an isolated manner will not be enough to justify “learning from data” as a proper scientific endeavor. We have to put some order (by understanding their internal statistical structure) into the current inventory of algorithms, which are mushrooming at a staggering rate these days. The underlying unity in “how they relate to each other” will dictate what the Fundamental Principles of Data Science are. At a more practical level, this will enable data scientists to predict new algorithms in a systematic way rather than by trial and error.

    Theory of Data Analysis. How can we develop such a consistent and unified framework of data analysis (the foundation of data science) that would reveal the interconnectedness among different branches of statistics? This remains one of the most vexing mysteries of modern Statistics. However, developing such a theory (leading to a progressive unification of fundamental statistical learning tools) could have enormous implications for the theory and practice of data analysis.

    An example of a modern data challenge: Is there any fundamental, universal modeling principle to tackle these large varieties of data? This is still an Unsolved Problem, especially difficult to solve programmatically. Big Data experts have started to recognize that this is a real challenging “Research problem! Killing most CIO’s” and that “if there is any achilles heel it’s going to be this.” Check out this recent talk [PDF, Video (from 1:32:00 – 1:38:00)] by Turing laureate Michael Stonebraker at the White House Office of Science & Technology Policy and MIT, March 3, 2014. I conjecture that Statistics can play the crucial role in solving this BIG-Variety problem.
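    As a toy hint of what “one modeling principle across data types” could mean (my sketch under simplifying assumptions, not the author's United Statistical Algorithms framework), the snippet below uses a single rank-based association measure whose call does not change whether the margins are continuous, discrete, or mixed.

```python
import numpy as np
from scipy.stats import rankdata

def rank_association(x, y):
    """One call for continuous, discrete, or mixed pairs: map each margin to
    mid-ranks (a distribution-free common scale), then correlate the ranks."""
    rx, ry = rankdata(x), rankdata(y)          # ties receive mid-ranks
    return np.corrcoef(rx, ry)[0, 1]

rng = np.random.default_rng(0)
continuous = rng.normal(size=300)
discrete = rng.poisson(3, size=300)                                      # count data
mixed = np.where(rng.random(300) < 0.3, 0.0, rng.exponential(size=300))  # point mass at zero

print(rank_association(continuous, continuous**2 + rng.normal(0, 0.5, 300)))  # continuous-continuous
print(rank_association(discrete, continuous))                                 # discrete-continuous
print(rank_association(mixed, discrete))                                      # mixed-discrete
```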

    Filed Under: Blog Tagged With: 21st-century statistics, Data Science, history, Open Problem of Statistics

