The acceleration in the field of data science is well known [see, e.g., D. Donoho, J. Comput. Graph. Stat. 26(4), 745–766 (2017) and references therein]. Improvements in technology for acquisition, storage, and processing have made unprecedented amounts of data available to scientists. In parallel, the pace of methodological advance has been rapid, with new techniques and packages becoming available, it seems, every day. With these affordances come many challenges, notably the volume and variety of the data [Fan et al., Natl. Sci. Rev. 1(2), 293–314 (2014)]. In this Perspective piece, we examine a different challenge, namely how to choose and use the right analysis method, and make an argument for the sharing of raw data.
REFERENCES
1. J. Theiler and S. Eubank, “Don’t bleach chaotic data,” Chaos 3(4), 771–782 (1993).
2. R. Alley, “Ice-core evidence of abrupt climate changes,” Proc. Natl. Acad. Sci. U.S.A. 97, 1331–1334 (2000).
3. See www.spectraworks.com/MAS/Help/ for the KSpectra tool.
4. R. Hegger, H. Kantz, and T. Schreiber, “Practical implementation of nonlinear time series methods: The TISEAN package,” Chaos 9(2), 413–435 (1999).
5. See www.pks.mpg.de/tisean/ for the TISEAN system.
6. H. Kantz and T. Schreiber, Nonlinear Time Series Analysis (Cambridge University Press, Cambridge, UK, 1997).
7. L. Rasmussen, E. Whitley, and L. Welty, “Pragmatic reproducible research: Improving the research process from raw data to results, bit by bit,” J. Clin. Invest. 133, 173741 (2023).
8. See jupyter.org for the Jupyter project.
9. See www.datamation.com/big-data/raw-data/ for “What is Raw Data? Definition, Examples, & Processing Steps.”
10. C. Orzel, “What does it mean to share raw data?,” Forbes, July 2020.
11. J. Garland, T. Jones, M. Neuder, V. Morris, J. White, and E. Bradley, “Anomaly detection in paleoclimate records using permutation entropy,” Entropy 20(12), 931 (2018).
12. Roundtable on Environmental Health Sciences, Research, and Medicine (National Academies Press, Health and Medicine Division, National Academies of Sciences, Engineering, and Medicine, 2016).
13. M. Shimojo, T. S. Bastian, A. S. Hales, S. M. White, K. Iwai, R. E. Hills, A. Hirota, N. M. Phillips, T. Sawada, P. Yagoubov, G. Siringo, S. Asayama, M. Sugimoto, R. Brajša, I. Skokić, M. Bárta, S. Kim, I. de Gregorio-Monsalvo, S. A. Corder, H. S. Hudson, S. Wedemeyer, D. E. Gary, B. De Pontieu, M. Loukitcheva, G. D. Fleishman, B. Chen, A. Kobelski, and Y. Yan, “Observing the sun with the Atacama Large Millimeter/Submillimeter Array (ALMA): High-resolution interferometric imaging,” Sol. Phys. 292(7), 87 (2017).
14. D. Engber, “Daryl Bem proved ESP is real: Which means science is broken,” Slate, 2017.
15. See datascience.nih.gov/tools-and-analytics/best-practices-for-sharing-research-software-faq for FAQs on best practices for sharing research software.
16. R. Abdill, E. Talarico, and L. Grieneisen, “A how-to guide for code-sharing in biology,” arXiv:2401.03068 (2024).
17. M. Porter, “A non-expert’s introduction to data ethics for mathematicians,” arXiv:2201.07794 (2024).
18. I. Hrynaszkiewicz, M. Norton, A. Vickers, and D. Altman, “Preparing raw clinical data for publication: Guidance for journal editors, authors, and peer reviewers,” BMJ 340, c181 (2010).
19. L. Poirier, “Ethnographies of datasets: Teaching critical data analysis through R notebooks,” J. Interact. Technol. Pedag. 18, 1 (2020).
20. D. Donoho, “50 years of data science,” J. Comput. Graph. Stat. 26(4), 745–766 (2017).
21. J. Fan, F. Han, and H. Liu, “Challenges of Big Data analysis,” Natl. Sci. Rev. 1(2), 293–314 (2014).
© 2024 Author(s). Published under an exclusive license by AIP Publishing.