Music varies enormously across cultures, but some traits are widespread. Cross-cultural consistency in music could be driven by universal perceptual mechanisms adapted to natural sounds, but supporting evidence has been circumstantial because of the dearth of cross-cultural research. Here, we explore whether such perceptual mechanisms impose universal similarity relations on musical structure that may dissociate from culture-specific aesthetic judgments about music. We measured one possible signature of these similarity relations, the extent to which concurrent notes are perceived as a single sound, in members of a small-scale Amazonian society and in Western listeners. We also measured aesthetic responses to the same stimuli. Unlike Westerners, Amazonian listeners were aesthetically indifferent to whether note combinations were canonically consonant (i.e., had aggregate frequency spectra resembling the harmonic series). Nonetheless, Amazonians were more likely to hear consonant than dissonant combinations as a single sound, with fusion judgments that qualitatively resembled those of Western listeners. Thus, even in a culture with little exposure to Western harmony, reliance on harmonic frequency relations for sound segregation evidently induces consistent perceptual structure in note combinations. The results suggest that perceptual mechanisms for representing music can be shared across cultures, even though the perceptual equivalences they produce give rise to culture-specific aesthetic associations.