Deep Mukhopadhyay, Ph.D.

Data Science

Two sides of Theoretical Data Science: Analysis and Synthesis

February 26, 2018 by deepstatorg

Theory of [Efficient] Computing: A branch of Theoretical Computer Science that deals with how quickly one can execute (compute) a given algorithm. The critical task is to analyze algorithms carefully, based on their performance characteristics, to make them computationally efficient.

Theory of Unified Algorithms: An emerging branch of Theoretical Statistics that deals with how efficiently one can represent a large class of diverse algorithms using a single unified semantics. The critical task is to put together different “mini-algorithms” into a coherent master algorithm.

For overall development of Data Science, we need both ANALYSIS + SYNTHESIS. However, it is also important to bear in mind the distinction between the two.

Filed Under: Blog Tagged With: Data Science, Science of Statistics

Confirmatory Culture: Time To Reform or Conform?

November 1, 2016 by deepstatorg

THEORY

Culture 1: Algorithm + Theory. The role of theory is to justify or confirm.

Culture 2: Theory + Algorithm. Moving from confirmatory to constructive theory, it explains the statistical origin of the algorithm(s): where they came from. Culture 2 views “Algorithms” as the derived product, not the fundamental starting point [this point of view separates statistical science from machine learning].

PRACTICE 

Culture 1: Science + Data. The job of a statistician is to confirm scientific guesses. Statisticians thus happily play in everyone’s backyard as confirmatists.

Culture 2: Data + Science. An exploratory, nonparametric attitude. The statistician plays in the front yard as the key player, guiding scientists to ask the “right question”.

TEACHING 

Culture 1: It proceeds in the following sequence: for (i in 1:B) { Teach Algorithm-i; Teach Inference-i; Teach Computation-i }. By construction, it requires extensive bookkeeping and memorization of a long list of disconnected algorithms.

Culture 2: The pedagogical efforts emphasize the underlying fundamental principles and statistical logic whose consequences are algorithms. This “short-cut” approach substantially accelerates learning by making it less mechanical and intimidating.

Should we continue to conform to the confirmatory culture, or is it time to reform? The choice is ours, and the consequences are ours as well.

Filed Under: Blog Tagged With: 21st-century statistics, Data Science, Next-Generation Statisticians, Science of Statistics

Data Scientist and Data Mechanic

April 4, 2016 by deepstatorg

Netflix competition. As datasets are getting BIG and COMPLEX, the most difficult challenge for a statistical scientist is to figure out “Where is the information hidden?” It is an interactive process of investigation rather than a passive application of algorithms and calculation of error rates. Two skills are critical: (1) “look at the data”, which is missing in the mechanical push-the-button culture; and (2) learn “how to question the data”, rather than only answering a specific question. They allow data scientists to discover the unexpected in addition to the usual verification of the expected. This raises the question of whether

  • the Data Science training curriculum should look like a long manual of specialized methods and (a series of cookbook) algorithms;
  • or whether it should train students (and industry professionals) in Scientific Data Exploration (Sci-Dx): a systematic and pragmatic approach to data modeling that addresses the “Monkey and banana problem” [Pigeon’s approach] for practitioners. [I believe Wolfgang Köhler’s “insight learning” idea can guide us to develop such a curriculum.]

The first path will produce DataRobots, not Data Scientists. The latter goal looks out of reach unless we figure out how to design the “LEGO Bricks” of Statistical Science (the fundamental building blocks of statistical learning), which help us understand disparate statistical procedures from a common perspective (thus reducing the size of the manual) and can be appropriately combined to build versatile data products brick by brick.

Filed Under: Blog Tagged With: Data Mechanic, Data Science, Data Scientist, Kaggle Syndrome

The Scientific Core of Data Analysis

November 26, 2015 by deepstatorg

Richard Courant’s view: “However, the difficulty that challenges the inventive skill of the applied mathematician is to find suitable coordinate functions.” He also noted that “If these functions are chosen without proper regard for the individuality of the problem the task of computation will become hopeless.” This leads me to the following conjecture: an efficient nonparametric data transformation or representation scheme is the basis for almost all successful learning algorithms. This is the Scientific Core of Data Analysis, and it should be emphasized in the research, teaching, and practice of 21st-century Statistical Science in order to develop a systematic and unified theory of data analysis (the foundation of data science).

Filed Under: Blog Tagged With: 21st-century statistics, Core of Data Analysis, Data Science, Next-Generation Statisticians

Data Then Science, Next-Generation Statisticians

July 3, 2014 by deepstatorg

  • Statistical validation of scientific guesses → scientific validation of statistical findings.
  • Parametric confirmatory (Science + Data) → nonparametric exploratory modeling (Data + Science).
  • Question: How many such nonparametric exploratory modeling tools (not inferential tools!) have we developed in the last three decades?

    Filed Under: Blog Tagged With: Data Science, Next-Generation Statisticians

    Deep Mukhopadhyay
    Statistics Department
    deep [at] unitedstatalgo.com

    EDUCATION

    • Ph.D. (2013), Texas A&M University
    • M.S. (2008), Indian Institute of Technology (IIT), Kanpur
    • B.S. (2006), University of Calcutta, India
