TY - CHAP
T1 - Nonparametric data science
T2 - Testing hypotheses in large complex data
AU - Mathur, Sunil
N1 - Publisher Copyright: © 2021 Elsevier B.V.
PY - 2021/1
Y1 - 2021/1
N2 - Big data comprises very large structured or unstructured data sets and requires novel statistical techniques to extract parameters that are typically not well defined. The availability of massive, complex data sets has created both challenges and opportunities for processing and analysis, which are difficult with traditional data processing techniques. New protocols and methods are needed not only to record, store, and analyze massive live-streaming data sets but also to develop new analytical tools for testing hypotheses, so as to gain novel insights and discoveries from systems that were previously not understood. There is a need to establish a clear path and to create and implement innovative approaches that are not distribution dependent, in order to increase the understanding of large, complex data sets. Data analysis is a challenge because of the lack of scalability of the underlying algorithms and the complexity of the data. Existing statistical tools, most of which were developed to draw inferences from incomplete information, have not kept up with the pace of modern technologies that generate massive amounts of continuously streaming data. New approaches based on nonparametric methods have the capability to yield transformational changes in biomedical research; integrate with next-generation technology platforms that can accelerate scientific discovery; use data ecosystems based on the data generated by researchers; facilitate harmonization of data, methods, and technologies; and provide cutting-edge, theory-based nonparametric methods in advanced computing environments. Each upgrade to a larger length scale increases variability and volume, which will eventually generate a rich data landscape that must be analyzed by cutting-edge analytical tools using both structured and general novel data-mining approaches in a continuous processing mode. Nonparametric analytical tools and concepts are needed to analyze such massive data, to keep up with rapidly growing technology, and to support the analysis of continuously streaming big data.
AB - Big data comprises very large structured or unstructured data sets and requires novel statistical techniques to extract parameters that are typically not well defined. The availability of massive, complex data sets has created both challenges and opportunities for processing and analysis, which are difficult with traditional data processing techniques. New protocols and methods are needed not only to record, store, and analyze massive live-streaming data sets but also to develop new analytical tools for testing hypotheses, so as to gain novel insights and discoveries from systems that were previously not understood. There is a need to establish a clear path and to create and implement innovative approaches that are not distribution dependent, in order to increase the understanding of large, complex data sets. Data analysis is a challenge because of the lack of scalability of the underlying algorithms and the complexity of the data. Existing statistical tools, most of which were developed to draw inferences from incomplete information, have not kept up with the pace of modern technologies that generate massive amounts of continuously streaming data. New approaches based on nonparametric methods have the capability to yield transformational changes in biomedical research; integrate with next-generation technology platforms that can accelerate scientific discovery; use data ecosystems based on the data generated by researchers; facilitate harmonization of data, methods, and technologies; and provide cutting-edge, theory-based nonparametric methods in advanced computing environments. Each upgrade to a larger length scale increases variability and volume, which will eventually generate a rich data landscape that must be analyzed by cutting-edge analytical tools using both structured and general novel data-mining approaches in a continuous processing mode. Nonparametric analytical tools and concepts are needed to analyze such massive data, to keep up with rapidly growing technology, and to support the analysis of continuously streaming big data.
KW - Big data
KW - Efficient
KW - Nonparametric methods
KW - Predictions
KW - Ranked-set
KW - Ranks
KW - Samples
UR - http://www.scopus.com/inward/record.url?scp=85094567046&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85094567046&partnerID=8YFLogxK
U2 - 10.1016/bs.host.2020.10.004
DO - 10.1016/bs.host.2020.10.004
M3 - Chapter
AN - SCOPUS:85094567046
SN - 9780323852005
T3 - Handbook of Statistics
SP - 201
EP - 231
BT - Data Science
A2 - Srinivasa Rao, Arni S.R.
A2 - Rao, C.R.
PB - Elsevier B.V.
ER -