The availability of high-throughput experimental platforms for the analysis of biological samples, such as mass spectrometry, microarrays, and Next Generation Sequencing, has made it possible to analyze a whole genome in a single experiment. Such platforms produce an enormous volume of data per experiment, and analyzing this flow of data poses several challenges in terms of data storage, preprocessing, and analysis. To address these issues, efficient and possibly parallel bioinformatics software is needed to preprocess and analyze the data, for instance to highlight genetic variations associated with complex diseases. In this paper we present a parallel algorithm for the preprocessing and statistical analysis of genomics data that handles high-dimensional data with good response times. The proposed system can find statistically significant biological markers that discriminate between classes of patients who respond differently to drugs. Experiments performed on real and synthetic genomic datasets show good speed-up and scalability.
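
The abstract does not say which statistical test the system applies, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes genotypes coded 0/1/2 (copies of the variant allele), patients labelled 1 (responder) or 0 (non-responder), a dominant carrier model, a two-sided Fisher's exact test per marker, and a Bonferroni correction. Because each marker is tested independently, the genotype matrix can be split into chunks that run in separate worker processes. The names `find_markers`, `snp_pvalue`, and `fisher_two_sided` are hypothetical.

```python
"""Sketch: parallel per-SNP association test between two patient classes.

Assumptions (hypothetical, not from the paper): genotypes coded 0/1/2,
labels 1 = responder / 0 = non-responder, dominant model (carrier = g > 0),
two-sided Fisher's exact test, Bonferroni correction.
"""
from concurrent.futures import ProcessPoolExecutor
from math import comb


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows: carrier / non-carrier; columns: responder / non-responder.
    """
    row1, col1, col2 = a + b, a + c, b + d
    total = comb(a + b + c + d, row1)

    def prob(x):
        # Hypergeometric probability that x of the row1 carriers are responders.
        return comb(col1, x) * comb(col2, row1 - x) / total

    p_obs = prob(a)
    p_value = 0.0
    for x in range(max(0, row1 - col2), min(row1, col1) + 1):
        p = prob(x)
        # Tables no more likely than the observed one are "at least as extreme";
        # the small tolerance absorbs floating-point ties.
        if p <= p_obs * (1 + 1e-7):
            p_value += p
    return min(1.0, p_value)


def snp_pvalue(genotypes, labels):
    """Build the carrier x response 2x2 table for one SNP and test it."""
    a = b = c = d = 0
    for g, y in zip(genotypes, labels):
        carrier = g > 0
        if carrier and y == 1:
            a += 1
        elif carrier:
            b += 1
        elif y == 1:
            c += 1
        else:
            d += 1
    return fisher_two_sided(a, b, c, d)


def _score_chunk(args):
    """Worker task: compute p-values for a contiguous chunk of SNP rows."""
    rows, labels = args
    return [snp_pvalue(row, labels) for row in rows]


def find_markers(matrix, labels, alpha=0.05, n_workers=4, chunk=256):
    """Return (snp_index, p_value) pairs significant after Bonferroni.

    matrix[i][j] is the genotype of SNP i in patient j.
    """
    chunks = [(matrix[i:i + chunk], labels) for i in range(0, len(matrix), chunk)]
    if n_workers > 1:
        # Executor.map returns results in input order, so indices stay aligned.
        with ProcessPoolExecutor(max_workers=n_workers) as pool:
            parts = list(pool.map(_score_chunk, chunks))
    else:
        parts = [_score_chunk(t) for t in chunks]
    pvalues = [p for part in parts for p in part]
    threshold = alpha / max(len(pvalues), 1)
    return [(i, p) for i, p in enumerate(pvalues) if p < threshold]
```

For example, `find_markers(matrix, labels, n_workers=8)` would score chunks of 256 SNPs in eight processes. Submitting chunks rather than one task per SNP keeps inter-process overhead small relative to the work per task. When worker processes are not started with `fork` (the default on Windows and macOS), the call must sit under an `if __name__ == "__main__":` guard.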
