
Deep Mukhopadhyay, Ph.D.

Blog

Beware of Retail Algorithm Building Culture, Yet Another Example

August 6, 2014 by deepstatorg

How the growing culture of developing RETAIL statistical algorithms and software can keep data scientists from “looking at the data,” leading to useless answers. The importance of Nonparametric Exploratory Statistical Practice in the age of easy-to-use retail statistical software is illustrated using simple examples: http://s.hbr.org/1qVt3Fj

Filed Under: Blog Tagged With: BIG data, retail algorithm

BIG data, BIG opportunity, BIG Tent

July 27, 2014 by deepstatorg

Question: How can we create a BIG TENT to increase the visibility and IMPACT of our profession? There are two possibilities (applied interdisciplinary work and core research), and both efforts are important and should be balanced:

(a) Applying retail algorithms while working in multidisciplinary teams (as described in the PDF). BIG data brings enormous interdisciplinary opportunities for statisticians, and this helps statisticians make a living by applying (traditional) statistical tools plus problem-specific bells and whistles.

(b) Developing wholesale algorithms that can be translated into curriculum. Along with retail, domain-specific problem solving (and retail paper publications), academic statisticians need to develop wholesale algorithms of multidisciplinary utility (like AIC, Bootstrap, kNN, RKHS, SVM, Splines, Random Forests, Lasso, etc.) that other disciplines can routinely use for their data-driven (exploratory, not confirmatory) research.

I fear we might lose our unique identity (and the spirit of our discipline, the science of learning from data) if we focus too much on solving `isolated' problems while busy making a living for ourselves using traditional tools plus some twists and turns; in that case we will produce skilled biologists or engineers, not statisticians. We need to find a balance between these two broad approaches, which share the same goal: to advance the frontiers of statistics.
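To make the retail/wholesale contrast concrete, here is a minimal sketch (my illustration, not from the post) of one “wholesale” tool: a generic bootstrap routine, coded once, that answers very different “retail” questions simply by plugging in a problem-specific statistic. The data and statistics below are invented for illustration only.

```python
import numpy as np

def bootstrap_ci(data, statistic, n_boot=2000, alpha=0.05, seed=None):
    """Generic percentile bootstrap: works for any 1-D sample and any statistic."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    boot_stats = np.array([
        statistic(rng.choice(data, size=data.size, replace=True))
        for _ in range(n_boot)
    ])
    return np.quantile(boot_stats, [alpha / 2, 1 - alpha / 2])

# The same "wholesale" routine serves two unrelated "retail" questions:
gene_expression = np.random.default_rng(1).lognormal(size=50)   # a biologist's sample
daily_returns = np.random.default_rng(2).normal(0, 0.02, 250)   # an economist's sample
print(bootstrap_ci(gene_expression, np.median, seed=0))  # CI for a median expression level
print(bootstrap_ci(daily_returns, np.std, seed=0))       # CI for return volatility
```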

Filed Under: Blog Tagged With: BIG Tent, retail algorithm, wholesale algorithms

Algorithms: Retail and Wholesale

July 4, 2014 by deepstatorg

  • Retail: Solving real scientific problems one at a time for clients/collaborators.
  • Wholesale: Theory and algorithms applicable simultaneously to many clients and problems. An example [PDF].
  • This paper [PDF] argues that academic statisticians should aim to develop “wholesale” algorithms. Our research on United Statistical Algorithms is motivated by the following question (a toy sketch follows the quote below):

    “How to develop a Systematic Data Modeling Strategy? How to design Flexible and Reusable algorithms based on General Theory that can be adapted to solve specific Practical Problems?”
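    As a toy answer to that design question (my sketch, not the paper's United Statistical Algorithms framework), the snippet below separates the general theory, coded once, from the problem-specific plug-in: a reusable permutation test in which only the statistic changes from one practical problem to the next.

```python
import numpy as np

def permutation_test(x, y, statistic, n_perm=5000, seed=None):
    """General theory coded once (exchangeability under the null);
    the practical problem only supplies the statistic."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x), np.asarray(y)
    observed = statistic(x, y)
    pooled = np.concatenate([x, y])
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        if abs(statistic(perm[:x.size], perm[x.size:])) >= abs(observed):
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)   # permutation p-value

# Adapting the same reusable algorithm to two different practical problems:
mean_shift = lambda a, b: a.mean() - b.mean()          # e.g., comparing treatment groups
spread_ratio = lambda a, b: np.log(a.std() / b.std())  # e.g., comparing variability
```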


    Filed Under: Blog Tagged With: retail algorithm, united algorithms, wholesale algorithm

    Data Then Science, Next-Generation Statisticians

    July 3, 2014 by deepstatorg

  • Statistical validation of scientific guesses → scientific validation of statistical findings.
  • Parametric confirmatory modeling (Science + Data) → nonparametric exploratory modeling (Data + Science); see the sketch below.
  • Question: How many such nonparametric exploratory modeling tools (not inferential tools!) have we developed in the last three decades?
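    To make the contrast concrete, here is a small illustrative sketch (mine, not from the post): the confirmatory route assumes a parametric form up front and then checks it, while the exploratory route lets a nonparametric smoother suggest the shape before any scientific interpretation. The simulated data are purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 200))
y = np.sin(x) + rng.normal(0, 0.3, 200)        # the "true" relationship is nonlinear

# Parametric confirmatory (Science + Data): posit a straight line, then assess it.
slope, intercept = np.polyfit(x, y, deg=1)

# Nonparametric exploratory (Data + Science): let the data suggest the shape
# via a simple k-nearest-neighbour running mean, then interpret scientifically.
def running_mean(x, y, grid, k=20):
    nearest = np.argsort(np.abs(x[None, :] - grid[:, None]), axis=1)[:, :k]
    return y[nearest].mean(axis=1)

grid = np.linspace(0, 10, 50)
smoothed = running_mean(x, y, grid)    # reveals the sinusoidal pattern the line misses
```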


    Filed Under: Blog Tagged With: Data Science, Next-Generation Statisticians

    The Unsolved Problem of Statistics: The BIG Question

    July 3, 2014 by deepstatorg

  • Data Type: discrete, continuous, and mixed (a combination of discrete and continuous data).
  • Data Structure: univariate, multivariate, time series, spatial, image, graph/network, etc. (roughly in order of increasing complexity).
  • Data Pattern: linear/non-linear, stationary/non-stationary, etc.
  • Data Size: small, medium, and big (though this may be vague, as today's big data are tomorrow's small data).
  • There is a long history of developing core statistical modeling principles that are valid for any combination of the above list. A few examples include bivariate continuous regression (Francis Galton), multivariate discrete data (Karl Pearson, Udny Yule, Leo Goodman), mixed data (Thomas Bayes, Student, R.A. Fisher, Fix and Hodges), time series (Norbert Wiener, Box & Jenkins, Emanuel Parzen, John Tukey, David Brillinger), non-stationarity (Clive Granger, Robert Engle), and non-linearity (Grace Wahba, Cleveland). To tackle these rich varieties of data, many cultures of statistical science have been developed over the last century, which can be broadly classified as (1) parametric confirmatory, (2) nonparametric exploratory, and (3) optimization-driven algorithmic approaches.

    United Statistical Algorithm. I claim what we need is a breakthrough: a “Periodic Table of Data Science.” Developing new algorithms in an isolated manner will not be enough to justify “learning from data” as a proper scientific endeavor. We have to put some order (by understanding their internal statistical structure) into the current inventory of algorithms, which are mushrooming at a staggering rate these days. The underlying unity in “how they relate to each other” will dictate what the Fundamental Principles of Data Science are. At a more practical level, this will enable data scientists to predict new algorithms in a systematic way rather than by trial and error.

    Theory of Data Analysis. How can we develop such a consistent and unified framework of data analysis (the foundation of data science) that would reveal the interconnectedness among different branches of statistics? This remains one of the most vexing mysteries of modern Statistics. However, developing such a theory (leading to a progressive unification of fundamental statistical learning tools) could have enormous implications for the theory and practice of data analysis.

    An example of a modern data challenge: Is there any fundamental, universal modeling principle to tackle these large varieties of data? This is still an Unsolved Problem, especially difficult to solve programmatically. Big Data experts have started to recognize that this is a real challenging “Research problem! Killing most CIO’s” and that “if there is any achilles heel it’s going to be this.” Check out this recent talk [PDF, Video (from 1:32:00 – 1:38:00)] by Turing laureate Michael Stonebraker at the White House Office of Science & Technology Policy and MIT, March 3, 2014. I conjecture that Statistics can play the crucial role in solving this BIG-Variety problem.
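    As a toy hint of what “one modeling principle across data types” could mean (my sketch under simplifying assumptions, not the author's United Statistical Algorithms framework), the snippet below uses a single rank-based association measure whose call does not change whether the margins are continuous, discrete, or mixed.

```python
import numpy as np
from scipy.stats import rankdata

def rank_association(x, y):
    """One call for continuous, discrete, or mixed pairs: map each margin to
    mid-ranks (a distribution-free common scale), then correlate the ranks."""
    rx, ry = rankdata(x), rankdata(y)          # ties receive mid-ranks
    return np.corrcoef(rx, ry)[0, 1]

rng = np.random.default_rng(0)
continuous = rng.normal(size=300)
discrete = rng.poisson(3, size=300)                                      # count data
mixed = np.where(rng.random(300) < 0.3, 0.0, rng.exponential(size=300))  # point mass at zero

print(rank_association(continuous, continuous**2 + rng.normal(0, 0.5, 300)))  # continuous-continuous
print(rank_association(discrete, continuous))                                 # discrete-continuous
print(rank_association(mixed, discrete))                                      # mixed-discrete
```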

    Filed Under: Blog Tagged With: 21st-century statistics, Data Science, history, Open Problem of Statistics

