Non‐speech audio event detection (AED) could be used for low‐cost, spatially diffuse surveillance applications, e.g., monitoring vehicle activity in a national park or footsteps in a hallway. Experiments have shown that non‐speech AED benefits from dynamic inference strategies such as the hidden Markov model (HMM), but that the acoustic features useful for non‐speech events may not be the same as those useful for speech. One possible solution is a tandem HMM: an HMM whose observation vector is constructed from the output of an instantaneous discriminative classifier, e.g., a neural network. The use of tandem HMMs for non‐speech AED is hindered, however, by the relatively small size of most non‐speech audio training corpora. This talk will demonstrate that tandem HMMs can be trained to detect non‐speech audio events using a novel form of regularized training: Baum–Welch back‐propagation (as proposed by Bengio et al.), using the conjugate‐prior adaptive form of the Baum–Welch auxiliary function (as proposed by Lee et al., and as commonly used in maximum a posteriori HMM adaptation).
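
As a rough illustration of the tandem structure described above, the following Python sketch passes each frame through an instantaneous classifier, takes log posteriors as the HMM observation vectors, and scores the sequence with the forward algorithm. It is only a sketch under stated assumptions: the single softmax layer standing in for the neural network, the diagonal-Gaussian state emissions, and all names and sizes (`W`, `b`, `means`, `variances`, two states, three classes) are hypothetical and are not taken from the talk. The regularized Baum–Welch back‐propagation training itself is not shown.

```python
import numpy as np

def logsumexp(x, axis=None):
    """Numerically stable log(sum(exp(x))) along the given axis."""
    m = np.max(x, axis=axis, keepdims=True)
    s = m + np.log(np.sum(np.exp(x - m), axis=axis, keepdims=True))
    return np.squeeze(s, axis=axis)

def frame_posteriors(frames, W, b):
    """Instantaneous classifier: one softmax layer applied to each frame.
    Hypothetical stand-in for the neural network. frames: (T, F) -> (T, K)."""
    z = frames @ W + b
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def tandem_features(posteriors, eps=1e-8):
    """Tandem observation vectors: log class posteriors (assumed transform)."""
    return np.log(posteriors + eps)

def log_gaussian(obs, means, variances):
    """Diagonal-Gaussian log-likelihood of each frame under each state.
    obs: (T, D); means, variances: (S, D) -> (T, S)."""
    diff = obs[:, None, :] - means[None, :, :]
    terms = np.log(2.0 * np.pi * variances)[None] + diff ** 2 / variances[None]
    return -0.5 * terms.sum(axis=-1)

def forward_loglik(log_b, log_pi, log_A):
    """Forward algorithm in the log domain.
    log_b: (T, S) emission log-likelihoods; log_A[i, j] = log P(state j | state i)."""
    alpha = log_pi + log_b[0]
    for t in range(1, len(log_b)):
        alpha = log_b[t] + logsumexp(alpha[:, None] + log_A, axis=0)
    return float(logsumexp(alpha))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, F, K, S = 20, 5, 3, 2                  # frames, raw features, classes, HMM states
    frames = rng.normal(size=(T, F))          # placeholder acoustic features
    W, b = rng.normal(size=(F, K)), np.zeros(K)
    obs = tandem_features(frame_posteriors(frames, W, b))
    means, variances = rng.normal(size=(S, K)), np.ones((S, K))
    log_pi = np.log(np.array([0.5, 0.5]))
    log_A = np.log(np.array([[0.9, 0.1], [0.2, 0.8]]))
    print(forward_loglik(log_gaussian(obs, means, variances), log_pi, log_A))
```

In a trained tandem system the classifier and the HMM parameters would be estimated from data; here they are random placeholders, so the printed log-likelihood has no meaning beyond showing that the pieces connect.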