Automatic music transcription, a central topic in music signal analysis, is typically limited to equal-tempered music and evaluated with a quarter-tone tolerance. A system is proposed to automatically transcribe microtonal and heterophonic music, and is applied to the makam music of Turkey. Specific traits of this music that deviate from the properties targeted by current transcription tools are discussed, and a collection of instrumental and vocal recordings is compiled, along with aligned microtonal reference pitch annotations. An existing multi-pitch detection algorithm is adapted for transcribing music at 20-cent resolution, and a method for converting a multi-pitch heterophonic output into a single melodic line is proposed. Evaluation metrics for transcribing microtonal music are applied, which use various levels of tolerance for inaccuracies with respect to frequency and time. Results show that the system is able to transcribe microtonal instrumental music at 20-cent resolution with an F-measure of 56.7%, outperforming state-of-the-art methods for the same task. Case studies on transcribed recordings are provided to demonstrate the shortcomings and the strengths of the proposed method.
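To make the idea of a frequency tolerance concrete, the following is a minimal sketch of how a transcribed melodic line might be scored against a reference at a tolerance given in cents. It is not the evaluation procedure of the paper: the frame-wise formulation, the function names `cents_between` and `frame_f_measure`, and the use of `None` for unvoiced frames are all assumptions made for illustration. The only standard fact relied on is that an interval in cents is 1200 times the base-2 logarithm of the frequency ratio.

```python
import math


def cents_between(f_est_hz, f_ref_hz):
    """Signed interval from f_ref_hz to f_est_hz in cents (1200 per octave)."""
    return 1200.0 * math.log2(f_est_hz / f_ref_hz)


def frame_f_measure(est_hz, ref_hz, tol_cents=20.0):
    """Frame-wise precision, recall, and F-measure for one melodic line.

    est_hz and ref_hz are equal-length sequences with one entry per frame;
    None marks a frame without pitch. This frame-wise scoring is an
    illustrative assumption, not the metric definition used in the paper.
    """
    if len(est_hz) != len(ref_hz):
        raise ValueError("estimate and reference must have the same length")

    # A frame is a hit when both lines are pitched and agree within tolerance.
    hits = sum(
        1
        for e, r in zip(est_hz, ref_hz)
        if e is not None and r is not None
        and abs(cents_between(e, r)) <= tol_cents
    )
    n_est = sum(1 for e in est_hz if e is not None)
    n_ref = sum(1 for r in ref_hz if r is not None)

    precision = hits / n_est if n_est else 0.0
    recall = hits / n_ref if n_ref else 0.0
    denom = precision + recall
    f_measure = 2.0 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Hypothetical four-frame example: one close match, one semitone error,
# one spurious estimate, and one missed reference frame.
reference = [440.0, 440.0, None, 220.0]
estimate = [442.0, 466.16, 300.0, None]
p, r, f = frame_f_measure(estimate, reference, tol_cents=20.0)
```

Relaxing `tol_cents` to just over 100 cents would also accept the second frame, which illustrates why a result reported at 20-cent resolution is a stricter statement than one reported at a quarter-tone or semitone tolerance.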
