Peptide classification using nanopore-based devices promises to be a breakthrough method in basic research, diagnostics, and analytics. However, the measured blockage currents suffer from a low signal-to-noise ratio and carry a high information density that has not yet been fully deciphered. Simple machine learning approaches based on average current blockade depths and dwell times have been investigated to improve this situation. In this work, a comprehensive statistical analysis of nanopore current signals is performed and shown to be sufficient for classifying up to 42 peptides with over 70% accuracy. Two sets of features, the statistical moments and the catch22 set, are compared both in their representations and after training small classifier neural networks. We demonstrate that complex features of the events, captured in both the catch22 set and the central moments, are key to classifying peptides with otherwise similar mean currents. These results highlight the efficacy of purely statistical analysis of nanopore data and suggest a path forward for more sophisticated classification techniques.
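To illustrate why higher-order statistics can separate events with similar mean currents, the following is a minimal sketch of moment-based feature extraction. It is not the authors' pipeline: the function name `moment_features`, the choice of standardized moments up to fourth order, and the synthetic events are all assumptions made for illustration.

```python
import numpy as np


def moment_features(event, max_order=4):
    """Return [mean, variance, standardized central moments of order 3..max_order].

    Hypothetical helper for illustration; the exact moments and normalization
    used in the paper are not stated in the abstract.
    """
    x = np.asarray(event, dtype=float)
    mean = x.mean()
    centered = x - mean
    var = np.mean(centered ** 2)  # population (biased) variance
    std = np.sqrt(var)
    feats = [mean, var]
    for k in range(3, max_order + 1):
        # Standardized k-th central moment; defined as 0 for a flat signal.
        feats.append(np.mean(centered ** k) / std ** k if std > 0 else 0.0)
    return np.array(feats)


# Two synthetic "events" with (nearly) the same mean and spread but different
# shapes: one symmetric, one skewed. These are made-up data, not measurements.
rng = np.random.default_rng(0)
event_a = rng.normal(loc=0.5, scale=0.05, size=2000)
event_b = 0.5 + 0.05 * (rng.exponential(scale=1.0, size=2000) - 1.0)

feats_a = moment_features(event_a)
feats_b = moment_features(event_b)
print("event_a [mean, var, skew, kurt]:", feats_a)
print("event_b [mean, var, skew, kurt]:", feats_b)
```

In this toy setting the two events are nearly indistinguishable by mean current alone, while the third and fourth standardized moments differ clearly. Feature vectors of this kind could then be passed to a small classifier.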
