Deep Mukhopadhyay, Ph.D.

“We have not yet gone about structuring the field as a whole in an understandable and effective way. We have large tasks before us, both in developing initial structure and in using this structure to organize what others have done and to see what still others might do. Those of us who recognize the importance of more effective data analysis MUST feel the urgency of transforming it somewhat more nearly into an organized body of knowledge.” 

                                                                                              — Colin Mallows and John Tukey, 1982. 


What’s Next In Data Science?

2019 marks the 150th anniversary of the Mendeleev periodic table. This iconic discovery, which is based on the ingenious observation that the properties of the elements are periodic functions of their atomic numbers, always amuses me. I often wonder: can we one day develop such an organized, connected framework for Statistics, where a range of diverse algorithms and perspectives can peacefully coexist under a single unified umbrella, an “Algorithm of Algorithms”? I’m developing new theoretical principles to blur the boundaries between different cultures and subfields of Statistics and Machine Learning.

A New Frontier in Fundamental Statistics

Are there any general principles for designing statistical algorithms? By general principles, I mean a theoretical framework that is [beautiful] logically coherent, [useful] able to systematically synthesize a large number of techniques that have proved useful in data analysis, and [adaptable] able to generalize them beyond the classical regime. I have been developing one such candidate theory to lay the groundwork for a progressive unification of fundamental statistical learning tools. Our theory has given birth to a new and exciting discipline for 21st-century statistics, called “Nonparametric Data Science,” which is rapidly gaining ground.

To realize this vision, we focus on one important field of statistics at a time, with the goal of simplifying, unifying, and generalizing it using our “Nonparametric Data Science” theory and tools. Under this new framework, a significant number of statistical problems have been tackled to date, including: generalized empirical Bayes (Mukhopadhyay and Fletcher, 2019), large-scale inference (Mukhopadhyay, 2016, 2018, 2020), statistical spectral analysis of graphs (Mukhopadhyay, 2020), universal copula modeling (Mukhopadhyay and Parzen, 2020), high-dimensional data modeling (Mukhopadhyay and Wang, 2018), density estimation (Mukhopadhyay, 2017a), dependence modeling (Parzen and Mukhopadhyay, 2013b), non-linear time series modeling (Mukhopadhyay and Parzen, 2017; Mukhopadhyay and Nandi, 2017), and nonparametric distributed learning (Bruce et al., 2016). All of these results show how our general theory acts as an organizing principle for a wide variety of data analysis endeavors, thereby allowing us to connect different sub-fields of statistics using one universal language.

The Age of ‘Unified Algorithms’ Is Here

A coherent way of designing and understanding data analysis is the ultimate goal of the “Theory of Data Science.” But does it exist at all? The advancements made so far have convinced me that such a theory of ‘United Statistical Algorithms’ is within reach. The field is still nascent and urgently needs bold new ideas.

“Many useful techniques are developed in application areas, and often more than one. For me theoretical analysis that connects them, explains them better, sheds light on their performance is always very gratifying.” — Trevor Hastie (2018).

From a practical standpoint, there is a dire need to bring some order to the current inventory of algorithms, which are mushrooming at a staggering rate, so that we can better understand their statistical core. A theory of ‘United Statistical Algorithms’ provides us with a modern language of data analysis that can assemble different “mini-algorithms” into a coherent master algorithm, simplifying theory, computation, and practice. There is little doubt that this field will keep evolving and expanding rapidly in the coming years, given how widely it is needed across disciplines including statistics, data science, and AI.

 

 

Deep Mukhopadhyay
Statistics Department
deep [at] unitedstatalgo.com

EDUCATION

  • Ph.D. (2013), Texas A&M University
  • M.S. (2008), Indian Institute of Technology (IIT), Kanpur
  • B.S. (2006), University of Calcutta, India

Contact Us

  • Email
    deep@unitedstatalgo.com
  • Address
    Department of Statistics
    Sequoia Hall, 390 Serra Mall
    Stanford, CA 94305

Copyright © 2023