In this work, a pre-trained convolutional neural network (CNN) is employed to detect gunfire in audio excerpts of urban sounds. The pre-trained CNN is fine-tuned to the new task through transfer learning, using a smaller set of training signals. Two CNN methods are applied to the time-frequency representation of the audio signals. The first method classifies specific events in the audio signals; the second is an image-based analysis method. The accuracy of the two CNN methods is compared and analyzed with respect to gunfire type and the retrieved urban multipath conditions. A k-means clustering algorithm is employed to identify the gunfire types, and parametric modeling is used to retrieve the urban multipath conditions.
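
As an illustration of the k-means clustering step mentioned above, the following is a minimal sketch in plain Python. It is not the implementation used in this work: the function name `kmeans`, the two-dimensional feature vectors, and the farthest-point initialization are all assumptions made for the example, and the actual features used to separate gunfire types are not specified here.

```python
import math


def kmeans(points, k, max_iters=100):
    """Lloyd's algorithm on a list of equal-length feature tuples.

    Hypothetical helper for illustration only. Centroids are seeded with a
    farthest-point heuristic (an assumption, chosen to keep the example
    deterministic), starting from the first point.
    """
    # Seed: start with the first point, then repeatedly add the point that
    # is farthest from its nearest already-chosen centroid.
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(
            points, key=lambda p: min(math.dist(p, c) for c in centroids)
        )
        centroids.append(farthest)

    labels = None
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels

        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return labels, centroids


# Made-up 2-D feature vectors forming two well-separated groups.
features = [
    (1.0, 1.1), (1.2, 0.9), (0.9, 1.0),
    (5.0, 5.2), (5.1, 4.9), (4.8, 5.0),
]
labels, centroids = kmeans(features, k=2)
print(labels)
```

In practice the number of clusters `k` and the feature representation would have to be chosen to suit the gunfire recordings at hand; the values here are placeholders.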