AUTOMATIC MUSICAL PUNCTUATION: A RULE SYSTEM, AND A NEURAL NETWORK APPROACH

Anders Friberg and Roberto Bresin
Royal Institute of Technology
Speech, Music and Hearing
S-100 44 Stockholm, Sweden
email: andersf@speech.kth.se, roberto@speech.kth.se

Abstract

By punctuation we mean the marking of the melodic structure by inserting commas at the boundaries that divide it into small structural units. In performance, these commas are often signalled by micropauses. Two models that automatically try to find the positions of these commas are presented. The first is based on a set of subrules: some subrules mark a possible comma position, while others remove these marks under different conditions. The second model is a neural network. Both use a small context of at most five notes. The models are evaluated on a set of 52 musical excerpts by comparing their results with the preferred punctuation of a performer.

INTRODUCTION

The marking of structure is an important aspect of musical performance (e.g. Repp, 1992). The characteristic accelerando-ritardando shape commonly used to mark the phrase structure in Romantic classical music has been modelled previously by Todd (1985, 1992) and Friberg (1995a). Another way to mark a structural boundary is to insert a comma, that is, to introduce a micropause at the boundary. By musical punctuation we mean here the insertion of such commas in a melody, thus dividing it into smaller units. In this investigation, two different methods that both try to place these commas automatically at musically appropriate positions are described. The result is evaluated by comparing it with the preferred punctuation of a performer.

There are often several possible ways to place commas. For example, some performers may prefer to mark relatively long melodic units, while others may prefer a finer division. To avoid such individual differences, we chose to use the preferred punctuation of one performer as the reference.

The punctuation resembles the grouping analysis introduced by Lerdahl & Jackendoff (1983). Previous computational models for detecting group boundaries have been presented by, e.g., Tenney & Polansky (1980), Baker (1989) and Cambouropoulos (1997). However, the punctuation differs from a grouping analysis in two ways. (1) The grouping analysis is often assumed to model the listener's cognitive grouping of events, so performance cues, such as micropauses introduced by the performer, can be used in the analysis. The automatic punctuation, on the other hand, tries to model the musician's choice of punctuation on the basis of the score only. (2) The punctuation model attempts to find the positions where a comma would be appropriate in performance, rather than all possible boundaries that emerge from a grouping analysis.

RULE APPROACH

Method

The punctuation rule was developed mainly by means of the analysis-by-synthesis method, using the program Director Musices (Friberg, 1995b). The input is the coded music score, using only pitch and duration information; other marks, such as articulation signs, are disregarded. To identify appoggiaturas, a harmonic analysis is also included in the score description. As in our previous development of performance rules (see e.g. Friberg, 1995b; Sundberg et al., 1991), Lars Frydén served as the musical expert, providing both musical judgements of the synthesized performances and most of the musical ideas used to elaborate the rule. The rule was developed and tested on a corpus of about 60 melodic excerpts of various performers and styles, mainly from the classical repertoire.

Description of the punctuation rule

The overall strategy of the rule is first to identify potential, weighted comma locations on the basis of the melodic context. The maximum context length is five notes: two notes before and two notes after the current note. This is realized by means of six subrules, which mark each note that precedes a potential comma with a weight. If several subrules mark the same position, the weights are added. Figure 1 shows the five main melodic contexts for potential comma locations. They can be divided into two major types: (1) contrast in (a) pitch (melodic leaps, Fig. 1-2) or (b) duration, i.e. at a note followed by shorter note(s) (longest of five, Fig. 1-3; first short, Fig. 1-4; short between long, Fig. 1-5); and (2) melodic tension-release (appoggiatura, Fig. 1-1).

Fig. 1. Five main contexts generating potential comma locations (indicated) according to the punctuation rule. The contexts are (1) after an appoggiatura; (2) in melodic leaps; (3) after the longest note in a series of five; (4) after a note followed by two or more shorter notes of equal duration; and (5) before a note surrounded by longer notes.
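As an illustration, the five contexts of Figure 1 and the additive weighting can be sketched in Python. This is a minimal sketch under stated assumptions, not the implementation in Director Musices: notes carry only a MIDI pitch and a nominal duration; the hypothetical flag `resolves_appoggiatura` stands in for the harmonic analysis (we assume the comma falls after the note that resolves the appoggiatura); the leap threshold `LEAP_MIN` and the unit weights are invented placeholders; and the contrast and tempo scaling of the real rule is left out.

```python
from dataclasses import dataclass

LEAP_MIN = 5  # hypothetical minimum interval (semitones) counted as a leap


@dataclass
class Note:
    pitch: int        # MIDI note number
    duration: float   # nominal duration, e.g. in beats
    resolves_appoggiatura: bool = False  # hypothetical flag from the harmonic analysis


def contexts_at(notes, i):
    """Names of the Figure 1 contexts that mark a potential comma after note i."""
    d = [note.duration for note in notes]
    n = len(notes)
    found = []
    if notes[i].resolves_appoggiatura:                       # (1) after appoggiatura
        found.append("appoggiatura")
    if i + 1 < n and abs(notes[i + 1].pitch - notes[i].pitch) >= LEAP_MIN:
        found.append("leap")                                # (2) melodic leap
    if 2 <= i <= n - 3 and all(d[i] > d[j] for j in range(i - 2, i + 3) if j != i):
        found.append("longest_of_five")                     # (3) longest of five
    if i + 2 < n and d[i + 1] == d[i + 2] < d[i]:
        found.append("first_short")                         # (4) followed by equal shorter notes
    if i + 2 < n and d[i + 1] < d[i] and d[i + 1] < d[i + 2]:
        found.append("short_between_long")                  # (5) next note is short between long
    return found


def potential_commas(notes, unit_weights):
    """Map note index -> summed weight; weights add when several contexts coincide."""
    marks = {}
    for i in range(len(notes)):
        w = sum(unit_weights[c] for c in contexts_at(notes, i))
        if w > 0:
            marks[i] = w
    return marks


# Usage: a toy melody (pitches and durations are illustrative only).
melody = [Note(60, 1), Note(62, 1), Note(64, 2), Note(62, 0.5),
          Note(60, 0.5), Note(67, 1), Note(65, 1)]
unit_weights = {"appoggiatura": 1, "leap": 1, "longest_of_five": 1,
                "first_short": 1, "short_between_long": 1}
print(potential_commas(melody, unit_weights))  # {2: 2, 4: 1}
```

The long note at index 2 is marked by two contexts (longest of five and first short), so its weights add up to 2, while the leap from 60 to 67 marks index 4.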

To further differentiate the potential locations, the weight value is in most cases proportional to context factors. Two major principles govern the scaling of the weights. (1) Contrast: the weight is (a) proportional to the duration ratio between the marked note and a certain context (longest of five, first short), (b) proportional to the leap size (leap), or (c) removed from comparatively short notes. (2) Tempo: weights are reduced for notes of shorter duration, so that fewer marks are placed in sequences of comparatively short notes (leap).

There is also a set of six subrules that handle the interaction of the potential comma locations. For example, weights are reduced when several of the duration contexts apply to the same position, and are removed on notes with relatively short durations. A more complete description of the rule is given in Friberg et al. (1997).
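The two scaling principles and the interaction step can be illustrated with a small Python sketch. Every constant here (`K`, `TEMPO_REF_MS`, `SHORT_NOTE_MS`, `OVERLAP_FACTOR`) is a hypothetical placeholder, as are the choice of reference duration for the contrast ratio and the multiplicative damping used when duration contexts coincide; the actual rule is described in Friberg et al. (1997).

```python
# All constants are hypothetical placeholders, not the published rule parameters.
K = {"appoggiatura": 1.0, "leap": 0.2, "longest_of_five": 1.0,
     "first_short": 1.0, "short_between_long": 1.0}
DURATION_CONTEXTS = {"longest_of_five", "first_short", "short_between_long"}
TEMPO_REF_MS = 500.0    # leap weights shrink for notes shorter than this
SHORT_NOTE_MS = 150.0   # marks on notes shorter than this are removed
OVERLAP_FACTOR = 0.7    # damping when several duration contexts hit one note


def scaled_weight(context, dur_ms, ref_dur_ms=None, leap_semitones=0):
    """Weight of one context at one note, scaled by contrast and tempo."""
    if context in ("longest_of_five", "first_short"):
        # Contrast (a): proportional to the duration ratio between the marked
        # note and a reference duration from its context (assumed given here).
        return K[context] * dur_ms / ref_dur_ms
    if context == "leap":
        # Contrast (b), leap size, times a tempo factor below 1 for short notes.
        return K[context] * abs(leap_semitones) * min(1.0, dur_ms / TEMPO_REF_MS)
    return K[context]


def combine(marks, dur_ms):
    """Interaction step for one position; marks is a list of (context, weight)."""
    if dur_ms < SHORT_NOTE_MS:
        return 0.0  # remove marks on comparatively short notes
    dur_weights = [w for c, w in marks if c in DURATION_CONTEXTS]
    other = sum(w for c, w in marks if c not in DURATION_CONTEXTS)
    if len(dur_weights) > 1:
        # Several duration contexts on the same note: reduce their combined weight.
        return other + sum(dur_weights) * OVERLAP_FACTOR ** (len(dur_weights) - 1)
    return other + sum(dur_weights)


# Usage: a 1000 ms note that is both "longest of five" and "first short".
w_long = scaled_weight("longest_of_five", 1000, ref_dur_ms=500)
w_first = scaled_weight("first_short", 1000, ref_dur_ms=250)
print(combine([("longest_of_five", w_long), ("first_short", w_first)], 1000))
```

Keeping scaling and interaction in separate functions mirrors the two-stage structure of the rule: each context is first weighted on its own, and only then are the coinciding marks at one position reconciled.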

NEURAL NETWORK APPROACH

Description

The architecture of the artificial neural networks (ANNs) used in the present research was derived from ANNs developed in previous research to model musical performance (Bresin et al., 1992, 1993). The input and output data structures had to be modified, since the task of the ANNs in the present work is to indicate the notes in a score that should be punctuated, according to the performer's marks. In the previous models, by contrast, the goal of the ANNs was to produce duration and loudness deviations.

The basic motivation for using ANNs instead of the rules described above was to model the interaction of the subrules better and to see whether a more accurate result could be obtained. Different ANN architectures are under development, and different training runs, based on different subsets of the training melodies, will be performed. In the first training experiments a typical feed-forward structure was used: it placed many punctuation marks in accordance with the performer's indications, but at the same time introduced many musically unacceptable marks. For this reason we implemented the ANN shown in Figure 2, which is the version included in the evaluation below.

There are 15 input nodes using data extracted from the melody. They are divided into three groups of five nodes each, corresponding to a five-note context; the third (central) note is the one for which the punctuation is judged. For each note, the inputs are its relative pitch distance to the next note, its distance in semitones from the root of the prevailing chord, and its nominal duration in milliseconds. The difference from the first network is that two feedback neurons (from the output layer to the input layer) were added to give the ANN information about the decision taken for the previous note (punctuated or not punctuated).
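The input encoding and the feedback loop can be sketched as follows. Only the 15 note features and the two feedback inputs come from the description above; everything else is an assumption made for illustration: zero padding at the melody edges, the pitch-class distance from the chord root (`% 12`), a one-hot encoding of the previous decision in the two feedback inputs, a single tanh hidden layer of hypothetical size 4 with a sigmoid output, and a 0.5 decision threshold. Raw millisecond durations are used as stated in the text; a practical network would normally scale them.

```python
import math
import random


def input_vector(pitches, chord_roots, durations_ms, i, prev_punctuated):
    """17 inputs for judging note i: 3 features x 5 notes, then 2 feedback inputs."""
    intervals, root_dists, durs = [], [], []
    for j in range(i - 2, i + 3):                  # five-note window centred on note i
        if 0 <= j < len(pitches):
            nxt = pitches[j + 1] - pitches[j] if j + 1 < len(pitches) else 0
            intervals.append(nxt)                                  # pitch distance to next note
            root_dists.append((pitches[j] - chord_roots[j]) % 12)  # semitones above chord root
            durs.append(durations_ms[j])                           # nominal duration (ms)
        else:                                      # outside the melody: zero padding (assumption)
            intervals.append(0)
            root_dists.append(0)
            durs.append(0)
    feedback = [1.0, 0.0] if prev_punctuated else [0.0, 1.0]  # assumed one-hot encoding
    return intervals + root_dists + durs + feedback


def forward(x, w_hidden, w_out):
    """One tanh hidden layer and a sigmoid output unit (biases omitted for brevity)."""
    hidden = [math.tanh(sum(w * v for w, v in zip(row, x))) for row in w_hidden]
    return 1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(w_out, hidden))))


def punctuate(pitches, chord_roots, durations_ms, w_hidden, w_out, threshold=0.5):
    """Judge notes left to right, feeding each decision back into the next input."""
    decisions, prev = [], False
    for i in range(len(pitches)):
        x = input_vector(pitches, chord_roots, durations_ms, i, prev)
        prev = forward(x, w_hidden, w_out) > threshold
        decisions.append(prev)
    return decisions


# Usage with random, untrained weights (a real network would be trained first).
random.seed(0)
w_hidden = [[random.uniform(-0.1, 0.1) for _ in range(17)] for _ in range(4)]
w_out = [random.uniform(-1.0, 1.0) for _ in range(4)]
print(punctuate([60, 62, 64, 65], [60, 60, 60, 65], [500, 500, 1000, 500], w_hidden, w_out))
```

Because each decision is fed back into the next input vector, the notes must be judged strictly from left to right.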

This architecture introduced fewer extra punctuation marks than the previous model, and most of them are musically acceptable. Furthermore, this ANN detected punctuation points that matched the performer's indications but had been missed by the ANN without feedback neurons. Results for this ANN are shown in Tables 1 and 2.

Other experiments are under way: different structures of the input data are being tested, and the training melodies will be divided into subsets representing different composers and music styles.

Figure 2. Neural network structure for punctuation point detection. This is the basic structure used to model the punctuation according to the performer.

EVALUATION

In order to assess the outcome of the punctuation, a database of 52 melodic excerpts was assembled. Most of the melodies were taken from the classical repertoire, ranging from Baroque to contemporary music, but folk and popular music were also included. They were divided into two groups of 26 melodies each. The selection was made so that the examples by each composer were divided equally between the two groups; the remaining melodies were divided randomly. The first group served as the optimization group and the second as the test group. All melodies were marked with the preferred punctuation of the expert performer Lars Frydén.
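The composer-balanced split can be sketched in Python. The function name `split_by_composer` is hypothetical, and treating singleton composers and odd leftovers as the "remaining" melodies, assigned in random order to whichever group is currently smaller, is our reading of the procedure rather than a documented detail.

```python
import random
from collections import defaultdict


def split_by_composer(melodies, seed=None):
    """Split (name, composer) pairs into an optimization group and a test group.

    Each composer's examples are shared equally between the groups; the
    remaining melodies (singletons and odd leftovers) are assigned in random
    order, always to the currently smaller group so the sizes stay balanced.
    """
    rng = random.Random(seed)
    by_composer = defaultdict(list)
    for name, composer in melodies:
        by_composer[composer].append(name)

    optimization, test_group, remaining = [], [], []
    for names in by_composer.values():
        rng.shuffle(names)
        half = len(names) // 2
        optimization.extend(names[:half])
        test_group.extend(names[half:2 * half])
        remaining.extend(names[2 * half:])      # 0 or 1 leftover per composer

    rng.shuffle(remaining)
    for name in remaining:
        (optimization if len(optimization) <= len(test_group) else test_group).append(name)
    return optimization, test_group


# Usage with illustrative names (not the actual database).
melodies = ([(f"bach{k}", "Bach") for k in range(4)]
            + [(f"mozart{k}", "Mozart") for k in range(3)]
            + [("sej", "folk1"), ("vide", "folk2"), ("ramel", "folk3")])
opt, tst = split_by_composer(melodies, seed=1)
print(len(opt), len(tst))  # 5 5
```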

The weight parameters of the punctuation rule were optimized on the optimization group. In an iterative process, each weight parameter was varied systematically so as to minimize the discrepancy between the rule-generated punctuation and that of the performer. The distance function minimized was defined as

Distance = (N_LF - N_SAME) + |N_LF - N_RULE|,

where N_LF is the total number of marks by the performer, N_SAME is the total number of marks by the rule that coincide with those of the performer, and N_RULE is the total number of marks by the rule. Minimizing this distance both maximizes N_SAME and forces N_RULE to be close to N_LF.
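The distance function and an iterative parameter search can be sketched in Python. The distance follows the definition above exactly; `coordinate_search`, its step size and its stopping criterion are hypothetical, a simple coordinate-wise search standing in for the unspecified iterative process. In the real optimization, `evaluate` would run the rule with the trial weights on all optimization melodies and return the distance computed from the resulting totals.

```python
def punctuation_distance(n_lf, n_same, n_rule):
    """Distance = (N_LF - N_SAME) + |N_LF - N_RULE|."""
    return (n_lf - n_same) + abs(n_lf - n_rule)


def distance_from_marks(performer, rule):
    """Same distance computed from sets of comma positions (note indices)."""
    return punctuation_distance(len(performer), len(performer & rule), len(rule))


def coordinate_search(params, evaluate, step=1.0, max_rounds=100):
    """Vary one weight parameter at a time, keeping any change that lowers
    evaluate(params); stop when a full pass brings no improvement."""
    params = dict(params)
    best = evaluate(params)
    for _ in range(max_rounds):
        improved = False
        for name in list(params):
            for delta in (step, -step):
                trial = dict(params)
                trial[name] += delta
                d = evaluate(trial)
                if d < best:
                    params, best, improved = trial, d, True
                    break
        if not improved:
            break
    return params, best


# Worked example with the rule totals of Table 1 (optimization group):
print(punctuation_distance(241, 175, 242))              # 66 + 1 = 67
print(distance_from_marks({3, 7, 12}, {3, 8, 12, 15}))  # (3 - 2) + |3 - 4| = 2
```

The first term counts the performer's commas that the rule misses; the second penalizes any mismatch in the total number of commas, so the rule cannot lower the distance simply by placing a comma after every note.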

The ANNs were trained using the well-known error back-propagation algorithm. Selected examples from the optimization group were used as training data: only 127 notes, with 37 punctuation marks, were selected (by contrast, the rule was optimized on all the melodies in the optimization group, comprising more than 1200 notes).

Results

A typical example of punctuation marks is given in Figure 3, which shows the commas placed by the performer, the rule and the ANN. In this example there is general agreement between the two systems and the performer. The resulting numbers of commas in the optimization group for both the rule and the ANN are shown in Table 1; the corresponding results for the test group are shown in Table 2.

Figure 3. Punctuation marks for Schubert's "Unfinished Symphony": comparison between the marks by (1) the performer, (2) the rule system and (3) the neural network.

Table 1. Commas inserted by the performer, the rule and the ANN in the optimization group. The Performer column gives the number of commas placed by the performer (N_LF); N_SAME is the number of positions at which both the performer and the automatic procedure placed a comma. N_SAME/N_RULE and N_SAME/N_ANN are estimates of the efficiency of the automatic punctuation.

                              RULE                           ANN
Melody             Performer  N_RULE  N_SAME  N_SAME/N_RULE  N_ANN  N_SAME  N_SAME/N_ANN
bachbourrek        10         14      9       0.64           12     5       0.42
bachcmajfug        29         14      14      1.00           31     19      0.61
bachhkyrie         4          9       4       0.44           7      3       0.43
bäck               8          9       7       0.78           14     5       0.36
brahms             4          9       4       0.44           9      3       0.33
chpmazop67no4      11         12      5       0.42           12     5       0.42
chpmazurka         11         12      10      0.83           10     7       0.70
clementine         13         10      8       0.80           5      3       0.60
ekor               12         13      12      0.92           10     5       0.50
handel             8          10      5       0.50           17     5       0.29
haydnkors          9          10      9       0.90           17     9       0.53
haydnquart         14         16      13      0.81           15     5       0.33
lucia              3          5       3       0.60           6      3       0.50
mendelson          8          5       5       1.00           8      6       0.75
mozamaj            12         14      5       0.36           10     1       0.10
mozebmaj           2          2       2       1.00           3      2       0.67
mozgmajquart       7          7       5       0.71           6      2       0.33
mozmenuet          4          4       1       0.25           8      1       0.13
ramel              7          7       3       0.43           11     3       0.27
sej                5          4       3       0.75           4      3       0.75
shubertheroic      0          0       0       -              0      0       -
shubertmilitary    22         18      17      0.94           23     12      0.52
shubertunfin       12         13      12      0.92           11     9       0.82
stravpetruska      3          6       3       0.50           7      1       0.14
varmeland          8          5       3       0.60           7      3       0.43
vide               15         14      13      0.93           11     7       0.64
total              241        242     175     0.72           275    128     0.46
(A dash marks an undefined ratio: no commas were placed.)

Table 2. Inserted commas by the performer, rule and ANN in the test group. See Table 1 for explanations.

                              RULE                           ANN
Melody             Performer  N_RULE  N_SAME  N_SAME/N_RULE  N_ANN  N_SAME  N_SAME/N_ANN
bachchaconne       9          14      8       0.57           14     2       0.14
bachgmfuga         5          10      4       0.40           14     2       0.14
bachsarabandek     13         9       6       0.67           13     4       0.31
berwald            6          10      4       0.40           17     3       0.18
bygnan             11         7       4       0.57           15     1       0.07
chpmazurk1         6          6       3       0.50           8      4       0.50
chpwaltz           11         8       8       1.00           8      5       0.63
denforstagang      10         11      5       0.45           28     3       0.11
finnskog           12         8       8       1.00           11     10      0.91
haydnfmaj          6          11      6       0.55           7      0       0.00
haydnpuka          4          4       4       1.00           4      1       0.25
herdinna           20         21      18      0.86           18     11      0.61
mendelmidsum       10         8       7       0.88           12     1       0.08
mozalaturka        3          4       3       0.75           8      1       0.13
mozdminquart       6          7       6       0.86           6      3       0.50
mozgmaj            7          3       1       0.33           15     1       0.07
mozgminsymf        8          8       8       1.00           4      1       0.25
mozpisonat         10         14      5       0.36           15     1       0.07
roman              5          5       5       1.00           8      3       0.38
shubertavemaria    19         18      15      0.83           27     16      0.59
shubertichtraumte  18         14      12      0.86           15     8       0.53
shuberttrauer      9          3       1       0.33           15     4       0.27
sorgeliga          3          15      1       0.07           9      0       0.00
svinstad           7          6       5       0.83           7      3       0.43
varvindar          30         27      17      0.63           15     4       0.27
vila               9          8       6       0.75           15     9       0.60
total              257        259     170     0.66           328    101     0.31

As seen in the tables, the optimization succeeded in making the total number of commas placed by the rule close to the total number placed by the performer. Since this is not directly controlled in the ANN, its totals are somewhat higher. In many cases the ANN gave an output very close to that of the rule system, indicating that the same general principles were used in both cases.

N_SAME/N_RULE and N_SAME/N_ANN were used to estimate the efficiency of the punctuation. The relatively low values are not disappointing on closer examination of the automatic procedures: most marks turn out to be at musically acceptable positions. The following reasons have been identified for the differences between the automatic punctuation and the performer's punctuation. (1) The performer was influenced by the text, especially in some well-known folk songs. One such case is "sorgeliga" in the test group, where the performer chose to put a comma after each sentence instead of considering the small-scale musical structure. The rule and, with fewer marks, the ANN marked the musical structure correctly. (2) The performer sometimes made groups coinciding with the metrical structure. As meter is not taken into account, the automatic procedures generally have problems matching such cases. However, this often results in an "upbeat" phrasing instead, which in most cases is musically acceptable. One such example is "mozamaj" (Mozart, Piano Sonata in A major, K. 331), in which the performer put commas on the bar lines while both the rule and the ANN put commas before the note preceding the bar line. Both strategies were used by expert pianists in this piece, as demonstrated by Gabrielsson (1987). (3) Some of the places where the performer chose to omit a comma at a group boundary could not be predicted by the automatic procedures. This could be due to effects not taken into account here, such as the harmonic progression or the phrase structure of the whole excerpt.
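The efficiency estimate is a simple ratio, but two details of the tables are worth making explicit: the rule totals are consistent with the ratio of the column sums (175/242 ≈ 0.72 in Table 1, 170/259 ≈ 0.66 in Table 2), which weights each melody by its number of commas, and the ratio is undefined for a melody without commas, such as "shubertheroic". A minimal Python sketch, with hypothetical function names:

```python
def efficiency(n_same, n_auto):
    """N_SAME / N_RULE (or N_SAME / N_ANN); None when no commas were placed."""
    return None if n_auto == 0 else n_same / n_auto


def pooled_efficiency(rows):
    """Ratio of column sums over melodies; rows are (n_same, n_auto) pairs."""
    return efficiency(sum(s for s, _ in rows), sum(a for _, a in rows))


print(round(efficiency(175, 242), 2))  # 0.72, rule total in Table 1
print(round(efficiency(170, 259), 2))  # 0.66, rule total in Table 2
print(efficiency(0, 0))                # None, as for "shubertheroic"
```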

In summary, the rule and the ANN succeeded in producing a musically acceptable punctuation. The performer's preferred punctuation could only partly be modelled. Several aspects, such as the text and the meter, were found to influence the performer's choice. These aspects, as well as information about composer or music style, were not taken into account in the present models, but will be included in future developments in this ongoing project.

ACKNOWLEDGEMENTS

We gratefully acknowledge the continuing support, co-operation and supply of ideas by Lars Frydén and Johan Sundberg. This work was supported by the Swedish National Council for Research in the Humanities and Social Sciences, by the Bank of Sweden Tercentenary Foundation and by the EU TMR (Training and Mobility of Researchers) programme.

REFERENCES

Baker, M. (1989). "An artificial intelligence approach to musical grouping analysis", Contemporary Music Review, Spring 1989, Special issue on Science and Music.
Bresin, R., De Poli, G. & Vidolin, A. (1992). "Symbolic and sub-symbolic rules system for real time score performance", Proceedings of the 1992 International Computer Music Conference (ICMC), San Jose, pp. 211-214.
Bresin, R., De Poli, G. & Vidolin, A. (1993). "A neural networks based system for automatic performance of musical scores", Proceedings of the 1993 Stockholm Music Acoustics Conference (SMAC), Stockholm.
Cambouropoulos, E. (forthcoming). "Musical rhythm: A formal model for determining local boundaries, accents and metre in a melodic surface", in M. Leman (ed.), Music, Gestalt & Computing: Studies in Systematic and Cognitive Musicology, Berlin: Springer-Verlag.
Friberg, A. (1995a). "Matching the rule parameters of Phrase arch to performances of 'Träumerei': A preliminary study", in A. Friberg & J. Sundberg (eds.), Proceedings of the KTH Symposium on Grammars for Music Performance, May 27, 1995, pp. 37-44.
Friberg, A. (1995b). A quantitative rule system for musical expression, Doctoral dissertation, Royal Institute of Technology, Sweden.
Friberg, A., Frydén, L. & Sundberg, J. (1997). "A rule for automatic musical punctuation of melodies", Proceedings of the Third Triennial ESCOM Conference, Uppsala, Sweden.
Gabrielsson, A. (1987). "Once again: The theme from Mozart's piano sonata in A major (K. 331). A comparison of five performances", in A. Gabrielsson (ed.), Action and Perception in Rhythm and Music, Stockholm: Royal Swedish Academy of Music, Publication No. 55, pp. 81-103.
Lerdahl, F. & Jackendoff, R. (1983). A generative theory of tonal music, Cambridge, MA: The MIT Press.
Repp, B. H. (1992). "Diversity and commonality in music performance: An analysis of timing microstructure in Schumann's 'Träumerei'", J. Acoust. Soc. Am. 92 (5), pp. 2546-2568.
Sundberg, J., Friberg, A. & Frydén, L. (1991). "Common secrets of musicians and listeners: An analysis-by-synthesis study of musical performance", in P. Howell, R. West & I. Cross (eds.), Representing Musical Structure, London: Academic Press.
Tenney, J. & Polansky, L. (1980). "Temporal Gestalt perception in music", Journal of Music Theory, 24, pp. 205-241.
Todd, N. P. McA. (1985). "A model of expressive timing in tonal music", Music Perception, 3, pp. 33-58.
Todd, N. P. McA. (1992). "The dynamics of dynamics: A model of musical expression", J. Acoust. Soc. Am. 91 (6), pp. 3540-3550.
