Most models of animal acoustic communication describe how vocal cues produced by a signaler influence the behavior of a listener. The listener's response depends in large part on the perceived meaning of the signal. But how do signals become meaningful to listeners? In some cases, such as imprinting, signal meaning can be attributed to structural cues that are perceived and acted upon through an innate releasing mechanism. In other cases, signals may be arbitrarily related to objects, individuals, or species. Equivalence theory provides a model of how some of these arbitrary signals may acquire meaning. Here, we describe the theory, together with experimental evidence from cross‐modal matching‐to‐sample tasks, showing how acoustic signals can become referents for visual stimuli. The subject of these behavioral experiments is a California sea lion with extensive experience performing associative learning tasks. The aim of the experiments is to establish multiple auditory‐visual discriminations and then test for the emergence of untrained relationships between disparate visual stimuli linked by a common auditory signal. Preliminary data show successful emergent matching across the visual and auditory modalities. These findings suggest that acoustic signals become meaningful to listeners when learned associations lead to the formation of equivalence classes.
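As an illustration only, and not the analysis used in these experiments, the emergent relations predicted by equivalence theory can be sketched in Python. The sketch assumes that an equivalence class is the reflexive, symmetric, and transitive closure of the trained sample-comparison pairs. The stimulus labels (`tone_1`, `A1`, and so on) and the helper functions `equivalence_classes` and `predicts_match` are hypothetical. Two visual stimuli trained only to a common auditory sample end up in the same class, so the model predicts an untrained visual-visual match between them.

```python
from collections import defaultdict


def equivalence_classes(trained_pairs):
    """Partition stimuli into classes under the reflexive, symmetric,
    transitive closure of the trained (sample, comparison) pairs."""
    parent = {}

    def find(x):
        # Every stimulus starts as its own class representative.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        # Merging classes treats each trained relation as symmetric.
        parent[find(a)] = find(b)

    for sample, comparison in trained_pairs:
        union(sample, comparison)

    groups = defaultdict(set)
    for stimulus in list(parent):
        groups[find(stimulus)].add(stimulus)
    return list(groups.values())


def predicts_match(classes, x, y):
    """True if x and y fall in the same class, whether or not x-y was trained."""
    return any(x in c and y in c for c in classes)


# Hypothetical trained auditory-visual relations: each tone is the
# sample for two different visual comparisons.
trained = [
    ("tone_1", "A1"), ("tone_1", "B1"),
    ("tone_2", "A2"), ("tone_2", "B2"),
]
classes = equivalence_classes(trained)
print(predicts_match(classes, "A1", "B1"))  # untrained, same class: True
print(predicts_match(classes, "A1", "B2"))  # different classes: False
```

A union-find structure is used here because merging the classes of each trained pair directly yields the transitive closure; any connected-components method over the undirected graph of trained pairs would give the same classes.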