We use the Increment of Diversity with Quadratic Discriminant analysis (IDQD) method to predict transcription start sites (TSSs). In prediction on a typical TSS set, both sensitivity and positive predictive value exceed 65% at a positives/negatives ratio of 1:58. Performance was also evaluated with Receiver Operating Characteristic (ROC) curves and Precision-Recall Curves (PRC), yielding an area under the ROC curve (auROC) higher than 96%, and an area under the PRC (auPRC) of approximately 26% at a positives/negatives ratio of 1:679 and 64% at a ratio of 1:113. In a whole-genome search, we predicted classical TSSs (collected in the database dbTSS2006) on chromosomes 4, 21 and 22 and obtained auROC = 93% and auPRC = 40% at a positives/negatives ratio of 1:138, and auROC = 97% and auPRC = 65% at a ratio of 1:68. The work shows that the IDQD method is capable of solving complicated classification problems in bioinformatics.
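The dependence of precision-based measures on the positives/negatives ratio can be illustrated with a short calculation. The sketch below is not part of the IDQD method; it only applies the standard definitions of sensitivity, false positive rate and positive predictive value (PPV). The function names `ppv` and `implied_fpr` are hypothetical, and the 65% figures are taken as round illustrative values from the 1:58 result quoted above. The assumption that sensitivity and false positive rate stay fixed as the ratio changes is ours, made only to isolate the effect of class imbalance.

```python
def ppv(sensitivity, fpr, negatives_per_positive):
    """Positive predictive value when there is one positive
    per `negatives_per_positive` negatives."""
    true_pos = sensitivity                      # expected true positives per positive
    false_pos = fpr * negatives_per_positive    # expected false positives per positive
    return true_pos / (true_pos + false_pos)


def implied_fpr(sensitivity, ppv_value, negatives_per_positive):
    """False positive rate implied by a given sensitivity and PPV
    at a given positives/negatives ratio."""
    false_pos = sensitivity * (1.0 - ppv_value) / ppv_value
    return false_pos / negatives_per_positive


# Illustrative operating point: sensitivity = PPV = 0.65 at ratio 1:58.
fpr_at_58 = implied_fpr(0.65, 0.65, 58)

# Assumption: the same sensitivity and false positive rate at other ratios.
for ratio in (58, 113, 679):
    print(f"1:{ratio}  PPV = {ppv(0.65, fpr_at_58, ratio):.3f}")
```

Under these assumptions the PPV falls steadily as negatives become more numerous, whereas the ROC curve, which is built only from sensitivity and false positive rate, is unchanged. This is consistent with the abstract reporting a high auROC alongside an auPRC that varies strongly with the ratio.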
