In a musical ensemble, performers try to synchronize to a shared tempo by resolving differences in sound-onset timing among individual players, even without a conductor's cue. Listeners and players alike must construct an internal sense of when the beat occurs and update it dynamically as the performance unfolds. Here, we examined this process by simulating individual sound-onset timings with an ensemble of 40 virtual “metronomes” running at approximately 90 bpm and asking listeners to tap along for approximately 10 beats. We manipulated the coupling strength of the metronomes at five levels (very weak, weak, medium, strong, and perfect), with stronger coupling producing a more clearly periodic beat. Inter-tap intervals (ITIs) from eight participants were analyzed in three segments of each trial: early (taps 1–3), middle (taps 4–6), and late (taps 7–9). In addition, the phase coherence of taps across listeners was compared with the stimulus density. Stronger coupling yielded more stable ITIs, whereas ITIs shortened in later segments under the medium and weak conditions. Interestingly, with weaker coupling, taps coincided with the peak stimulus density, whereas with stronger coupling they preceded it. These results suggest that listeners can maintain a collective beat percept even for less synchronized sounds, but they tap less anticipatorily as synchrony decreases.
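The segment-wise ITI analysis and the phase-coherence measure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' analysis pipeline: the function names are hypothetical, the three segments are taken to be ITI indices 1–3, 4–6, and 7–9, and phase coherence is computed as the mean resultant length of tap phases relative to an assumed 90 bpm reference period (about 0.667 s).

```python
import numpy as np

# Hypothetical segment boundaries (assumption): ITI indices 1-3, 4-6, 7-9
# (1-based), i.e. Python slices 0:3, 3:6, 6:9 of the ITI array.
SEGMENTS = {"early": slice(0, 3), "middle": slice(3, 6), "late": slice(6, 9)}


def segment_mean_iti(tap_times_s):
    """Return the mean inter-tap interval (s) in each segment of one trial."""
    itis = np.diff(np.asarray(tap_times_s, dtype=float))
    if itis.size < 9:
        raise ValueError("need at least 10 taps (9 ITIs)")
    return {name: float(itis[sl].mean()) for name, sl in SEGMENTS.items()}


def phase_coherence(tap_times_s, period_s=60.0 / 90.0):
    """Mean resultant length R of tap phases relative to an assumed period.

    R close to 1 means the taps share a common phase; R close to 0 means
    their phases are spread evenly around the beat cycle.
    """
    taps = np.asarray(tap_times_s, dtype=float)
    phases = 2.0 * np.pi * (taps % period_s) / period_s
    return float(np.abs(np.mean(np.exp(1j * phases))))


if __name__ == "__main__":
    # Illustrative (not measured) trial in which the ITIs shorten over time.
    itis = [0.70] * 3 + [0.66] * 3 + [0.62] * 3
    taps = np.concatenate([[0.0], np.cumsum(itis)])
    print(segment_mean_iti(taps))
    print(phase_coherence(taps))
```

Under the same assumptions, pooling the tap times of all listeners for a given trial into one array before calling `phase_coherence` would give an across-listener coherence value.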