Nonparametric data science: Testing hypotheses in large complex data

    Research output: Chapter in Book/Report/Conference proceeding › Chapter

    1 Scopus citation

    Abstract

    Big data comprises very large structured or unstructured data sets that require novel statistical techniques to extract parameters that are typically not well defined. The availability of massive, complex data sets has created both challenges and opportunities for processing and analysis, which are difficult with traditional data processing techniques. New protocols and methods are needed not only to record, store, and analyze massive live-streaming data sets but also to develop new analytical tools for testing hypotheses and gaining novel insights and discoveries from systems that were previously not understood. There is a need to establish a clear path and to create and implement innovative approaches that are not distribution dependent, in order to increase the understanding of large complex data sets. Data analysis is a challenge because the underlying algorithms lack scalability and the data are complex. Existing statistical tools, most of which were developed to draw inferences from the incomplete information available, have not kept up with the speed of advances in modern technologies that generate massive amounts of continuously streaming data. New approaches based on nonparametric methods have the capability to yield transformational changes in biomedical research; integrate with next-generation technology platforms that can accelerate scientific discovery; use data ecosystems based on the data generated by researchers; facilitate harmonization of data, methods, and technologies; and provide cutting-edge, theory-based nonparametric methods in advanced computing environments. Each upgrade to a larger length scale increases variability and volume, eventually generating a rich data landscape that must be analyzed with cutting-edge analytical tools using both structured and general novel data-mining approaches in a continuous processing mode. Nonparametric analytical tools and concepts are needed to analyze such massive data, to keep up with rapidly growing technology, and to support the analysis of continuously streaming big data.

    Original language: English (US)
    Title of host publication: Data Science
    Subtitle of host publication: Theory and Applications
    Editors: Arni S.R. Srinivasa Rao, C.R. Rao
    Publisher: Elsevier B.V.
    Pages: 201-231
    Number of pages: 31
    ISBN (Print): 9780323852005
    State: Published - Jan 2021

    Publication series

    Name: Handbook of Statistics
    Volume: 44
    ISSN (Print): 0169-7161

    Keywords

    • Big data
    • Efficient
    • Nonparametric methods
    • Predictions
    • Ranked-set
    • Ranks
    • Samples

    ASJC Scopus subject areas

    • Statistics and Probability
    • Modeling and Simulation
    • Applied Mathematics
