This study assesses the efficacy of speaker verification in detecting cloned voices, particularly in safety-critical applications such as healthcare documentation and banking biometrics. It compares deep neural networks such as Deep Speaker with human listeners in recognizing cloned voices, underlining the severe implications of voice cloning in these sectors. In healthcare, cloned voices could endanger patient safety by corrupting dictated medical records, leading to inaccurate diagnoses and treatments. In banking, they threaten biometric security, increasing the risk of financial fraud and identity theft. We tested feature extraction strategies using up to 40 parameters, including MFCC, GFCC, LFCC, CQCC, MSRCC, and others, computed with the Simplified Python Audio Features Extraction (spafe) library or Librosa. We evaluated the feature vectors using Random Forest-derived feature ranking and performed dimensionality reduction with Principal Component Analysis (PCA). Our central research question was whether voice cloning can be used to effectively attack advanced authentication systems. The results reveal the neural network's superiority over human listeners in detecting cloned voices, underscoring the urgent need for sophisticated AI-based security.
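As a rough illustration of the pipeline described above, the sketch below extracts MFCC features with Librosa, ranks them with a Random Forest, and reduces dimensionality with PCA. The synthetic signals, labels, and hyperparameters (e.g., 40 coefficients, 200 trees, 5 principal components) are illustrative placeholders, not the exact configuration used in the study.

```python
# Minimal sketch: MFCC extraction (Librosa), Random Forest feature ranking, and PCA.
# Signals and hyperparameters below are placeholders, not the study's configuration.
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier
from sklearn.decomposition import PCA

SR = 16000  # sample rate in Hz

def extract_mfcc(signal, sr=SR, n_mfcc=40):
    """Return a fixed-length, time-averaged MFCC vector for one utterance."""
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)

# Synthetic stand-ins for genuine vs. cloned utterances (placeholders only).
rng = np.random.default_rng(0)
t = np.linspace(0, 1.0, SR, endpoint=False)
genuine = [np.sin(2 * np.pi * 220 * t) + 0.05 * rng.normal(size=SR) for _ in range(10)]
cloned  = [np.sin(2 * np.pi * 225 * t) + 0.05 * rng.normal(size=SR) for _ in range(10)]

X = np.vstack([extract_mfcc(s.astype(np.float32)) for s in genuine + cloned])
y = np.array([0] * 10 + [1] * 10)  # 0 = genuine speaker, 1 = cloned voice

# Feature ranking from Random Forest importances.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
ranking = np.argsort(rf.feature_importances_)[::-1]
print("Top-5 MFCC coefficients by importance:", ranking[:5])

# Dimensionality reduction with PCA before the verification stage.
pca = PCA(n_components=5)
X_reduced = pca.fit_transform(X)
print("Reduced feature shape:", X_reduced.shape)  # (20, 5)
```

The same structure would apply to the other feature families named in the abstract (GFCC, LFCC, CQCC, MSRCC), with the extraction call swapped for the corresponding routine.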
