Performance of Musical Scores by Means of Neural Networks: Progress and Status Report


Roberto Bresin

 
C.S.C. - Centro di Sonologia Computazionale
Università di Padova
via S. Francesco 11 - 35121 Padova - Italy
Phone: ++39 49 8273757 - Fax: ++39 49 8273733
E.mail: rb@csc.unipd.it

Speech, Music and Hearing Department
Royal Institute of Technology
Box 70014 - 10044 Stockholm - Sweden
Phone: ++46 8 790.7876 - Fax: ++46 8 790.7854
E.mail: roberto@speech.kth.se

Abstract
The research on automatic performance by means of neural networks (NNs) carried out at the C.S.C. since 1990 is briefly summarised. Particular attention is given to the evolution of the architecture of the NNs employed: from the first simple model, able to learn some performance rules, to the latest one, which can simulate the style of a real pianist well. The program MELODIA is also described: developed at the C.S.C., it allows the user to run both rules and NNs and to study the resulting deviations. Further developments of the research are presented, in particular the use of fuzzy set theory together with the NN approach.

  1. Introduction
    At the C.S.C., the research on the automatic performance of musical scores began in 1990. After a first experience based on expert systems, we decided to try artificial neural networks (NNs), since they promised to solve complex problems that were not easy to handle with traditional computational techniques. The present research team is composed of Giovanni Umberto Battel, professional pianist and piano teacher at the "Benedetto Marcello" Conservatory of Music of Venice; Giovanni De Poli, professor of Informatics at the Department of Electronics and Informatics of Padua University and director of the C.S.C.; Alvise Vidolin, live electronics performer and teacher of computer music at the Conservatory of Music of Venice; and myself, engineer at the C.S.C. and free-lance researcher. The thesis work of various students also gives very important support to the research.

  2. Towards a neural networks based model
    The main problem when using NNs is to identify the input and output neurones and to define the structure of the net itself. We started our investigation by studying the set of performance rules developed by Friberg, Fryden, Sundberg, and co-workers at the Speech Department of KTH in Stockholm (Sundberg, 1991; Friberg, 1991). Since the equations of these rules have input parameters that can be deduced from a direct reading of the musical score, these parameters were used as inputs for our evolving NNs. As outputs for the NNs we used the same outputs as the "KTH rules": in the first stage of the research we considered only the time deviation and loudness deviation parameters; in later models we added an output node for the deviation of another variable, very similar to the off-time duration parameter described in the KTH rule system (Bresin & Vecchio, 1994). As shown in detail in a previous work (Bresin & Vecchio, 1994), the KTH rule system can be represented by the following equation:

    $$ y_n = \sum_{i} k_i \, f_i(x_n) \qquad (2.1) $$

    where the meaning of each parameter is the following:
    y_n    deviation of time or of loudness
    f_i()  function that represents the i-th KTH rule, determined heuristically
    x_n    vector of the parameters used by the KTH rules applied to the n-th note
    k_i    constant used to emphasise the deviation due to the i-th rule
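
    The definitions above read as a weighted sum of rule outputs, and a minimal Python sketch of that reading may help. The two rules, their coefficients, and the attribute names (borrowed from the input labels of figure 3.1) are illustrative assumptions, not the actual KTH rules.

```python
from typing import Callable, Dict, List

# Hypothetical note attributes; the keys borrow the input labels of figure 3.1.
Note = Dict[str, float]
Rule = Callable[[Note], float]

def leap_rule(x: Note) -> float:
    """Toy rule (assumption): deviation proportional to the size of a leap."""
    return 2.0 * x["SL"] if x["LP"] else 0.0

def phrase_rule(x: Note) -> float:
    """Toy rule (assumption): fixed deviation at the start or end of a phrase."""
    return 10.0 * x["SEP"]

def rule_system(x: Note, rules: List[Rule], k: List[float]) -> float:
    """Weighted sum of the rule outputs: y_n = sum_i k_i * f_i(x_n)."""
    return sum(k_i * f_i(x) for k_i, f_i in zip(k, rules))

note = {"LP": 1.0, "SL": 3.0, "SEP": 1.0}
y = rule_system(note, [leap_rule, phrase_rule], k=[1.0, 0.5])
print(y)  # 1.0 * 6.0 + 0.5 * 10.0 = 11.0
```

    Changing an entry of k strengthens or weakens a single rule without touching the others, which is the role the text assigns to the constants.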

    The C.S.C. model could be written in the following form:

    $$ y_n = \mathrm{net}(x_n, k) \qquad (2.2) $$

    where the meaning of each parameter is the following:
    y_n    deviation of time or of loudness
    net()  neural net, a non-linear function that computes the deviations
    x_n    the same as in equation (2.1)
    k      vector of constants or of variables that evolve very slowly in time

  3. Evolution of the architecture of the net() function
    After the choice of the input and output parameters, the next step in the design of a NN is to build its structure. The following paragraphs briefly describe the evolution of the architecture of the NNs used in our research on automatic performance, from the first classical feed-forward model to the most recent architecture, the ecological-predictive NN.

    1. Feed-forward NN
      We first adopted this model of NN, trained with the back-propagation algorithm, since it is one of the most widely studied models and is suited to problems that require a reaction to given stimuli. Figure 3.1 shows the final structure of the feed-forward NN used in our experiments: an MLP (Multi-Layer Perceptron) with two hidden layers. As the figure shows, the input variables (stimuli) of the NN are derived from the KTH rule system, and so are the outputs (reactions). This NN was trained to reproduce the KTH rules: some rules were applied to a musical score, and their results were summed and taught to the NN. Listening tests (Battel et al., 1993, 1994) indicated that the NN learned the rules well: the performances produced by the NN received a higher mean rating than those obtained with the KTH rule system.

      Figure 3.1: Feed-forward NN.
      Outputs: DPD=Deviation of Performing Duration, DMD=Deviation of Metrical Duration.
      Inputs: ND=Nominal Duration, MC=Melodic Charge, LP=Leap Presence, SL=Semitones in the Leap, S/E P=Start/End of a Phrase, RN=Repeated Note, LA=Leap Articulation.
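
      The training procedure described above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the hidden-layer sizes, the tanh activation, the learning rate, and the toy_rules target function are invented for the example and do not come from the C.S.C. experiments; only the seven inputs, the two outputs, the two hidden layers, and the use of back-propagation follow the text.

```python
import math
import random

INPUTS = ["ND", "MC", "LP", "SL", "SEP", "RN", "LA"]  # input labels of figure 3.1

def init_layer(n_in, n_out, rng):
    """One row per neurone: n_in weights followed by a bias."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]

def forward(layers, x):
    """Return the activations of every layer; tanh hidden units, linear outputs."""
    acts = [list(x)]
    for idx, layer in enumerate(layers):
        prev = acts[-1] + [1.0]  # the constant 1.0 feeds the bias
        z = [sum(w * a for w, a in zip(neurone, prev)) for neurone in layer]
        is_output = idx == len(layers) - 1
        acts.append(z if is_output else [math.tanh(v) for v in z])
    return acts

def train_step(layers, x, target, lr):
    """One back-propagation update on a single pattern (squared error)."""
    acts = forward(layers, x)
    delta = [o - t for o, t in zip(acts[-1], target)]
    for idx in range(len(layers) - 1, -1, -1):
        layer, prev = layers[idx], acts[idx] + [1.0]
        # Propagate the error to the layer below before changing the weights.
        below = [
            sum(layer[j][i] * delta[j] for j in range(len(layer)))
            * (1.0 - acts[idx][i] ** 2)
            for i in range(len(acts[idx]))
        ] if idx > 0 else []
        for j, neurone in enumerate(layer):
            for i in range(len(neurone)):
                neurone[i] -= lr * delta[j] * prev[i]
        delta = below

def mean_loss(layers, data):
    """Mean squared-error loss over all patterns."""
    total = 0.0
    for x, t in data:
        out = forward(layers, x)[-1]
        total += 0.5 * sum((o - y) ** 2 for o, y in zip(out, t))
    return total / len(data)

def toy_rules(x):
    """Stand-in for the summed rule deviations (assumption, not the KTH rules)."""
    nd, mc, lp, sl, sep, rn, la = x
    dpd = 0.5 * nd + 0.5 * lp * sl
    dmd = 0.5 * sep - 0.3 * rn + 0.2 * mc
    return [dpd, dmd]

rng = random.Random(0)
data = []
for _ in range(40):
    x = [rng.random() for _ in INPUTS]
    data.append((x, toy_rules(x)))

# 7 inputs -> 5 hidden -> 4 hidden -> 2 outputs (hidden sizes are arbitrary here).
sizes = [len(INPUTS), 5, 4, 2]
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

loss_before = mean_loss(layers, data)
for epoch in range(200):
    for x, t in data:
        train_step(layers, x, t, lr=0.05)
loss_after = mean_loss(layers, data)
print(f"mean loss before: {loss_before:.4f}, after: {loss_after:.4f}")
```

      The targets play the role of the summed rule results that were "taught to the NN"; with real data they would be the deviations computed by the rule system for each note of the score.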

    2. Ecological NN
      Since the feed-forward NN described above works only in a context of one or two notes, it was necessary to introduce at least the information on the last deviations produced by the NN, in order to give some continuity to its performing action. We therefore built a feed-forward NN with feed-back neurones, the so-called ecological NN (Fig. 3.2). The word ecological is used by psychologists to stress the action of the NN on the surrounding environment: in our case the feed-back input-neurones could represent the ears of the artificial pianist listening to himself while playing, the other input-neurones represent his eyes, and the output-neurones could be seen as his hands. Figure 3.2 also contains three new special input-neurones (C1, C2, and C3), which we called context neurones: they take into account the attributes of the two notes that follow the current one. These new nodes were needed to teach the NN particular situations: for example, two notes with the same input attributes may correspond to different output deviations; in this case the context nodes resolve the ambiguity, provided that they carry information related to different notes in the two situations.
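
      The feed-back loop described above can be sketched as a note-by-note performance routine. This is a hedged illustration: perform and toy_net are hypothetical names, the net is passed in as an arbitrary function, and the context is simplified to the full attribute vectors of the next two notes, whereas the model in the text uses three dedicated context neurones.

```python
from typing import Callable, List, Sequence

Vector = List[float]

def perform(score: Sequence[Vector], net: Callable[[Vector], Vector],
            n_outputs: int = 2, lookahead: int = 2) -> List[Vector]:
    """Run a net note by note, feeding its last outputs back as inputs."""
    n_attr = len(score[0])
    feedback = [0.0] * n_outputs  # no deviations exist before the first note
    deviations = []
    for n, attrs in enumerate(score):
        # Context: attributes of the notes after the current one (zeros past the end).
        context: Vector = []
        for step in range(1, lookahead + 1):
            following = score[n + step] if n + step < len(score) else [0.0] * n_attr
            context.extend(following)
        x = list(attrs) + context + feedback
        feedback = list(net(x))  # the "hands" act, the "ears" hear the result
        deviations.append(feedback)
    return deviations

def toy_net(x: Vector) -> Vector:
    """Stand-in for a trained NN (assumption): uses the current note attribute,
    the first context value, and the previous first output."""
    return [x[0] + 0.5 * x[-2], x[1]]

result = perform([[1.0], [2.0], [3.0]], toy_net)
print(result)  # [[1.0, 2.0], [2.5, 3.0], [4.25, 0.0]]
```

      Because the previous outputs are part of the input, the same note attributes can produce different deviations at different points of the score, which is the continuity the feed-back neurones are meant to provide.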

    3. Ecological-predictive NN
      Since the beginning of our research, one of the desired tasks was to teach a NN the performing style of a real pianist. To solve this task we used the NNs shown so far, improving the results as we adopted more complex architectures. The NN described in this paragraph has given the best results so far (Fig. 3.3): it consists of two NNs similar to that of figure 3.2, linked together by the two output-neurones of the lower NN, which represent the time deviations related to note n+1, where n is the current note. In this way the artificial pianist can predict the likely values of the next deviations.

      Figure 3.2: Ecological NN with context-neurones

      Figure 3.3: Ecological-predictive NN

      All the NN models described in these paragraphs are also intended to be applied to obtain loudness deviations. In particular, the NN of figure 3.3 approximates very well the style of the real pianist used for the training, who played on a Disklavier grand piano connected via MIDI to a PC (Bresin & Vecchio, 1994).
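
      One possible reading of the link between the two nets of figure 3.3 is sketched below. The wiring is an assumption made for illustration: a lower net (predictor) estimates the two time deviations of note n+1, and those two values are appended to the inputs of the upper net (performer) for note n. The function names and the toy nets are hypothetical.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Net = Callable[[Vector], Vector]

def perform_predictive(score: Sequence[Vector], predictor: Net, performer: Net,
                       n_outputs: int = 2) -> List[Vector]:
    """Perform each note using a prediction of the deviations of the next note."""
    feedback = [0.0] * n_outputs
    result = []
    for n, attrs in enumerate(score):
        following = score[n + 1] if n + 1 < len(score) else [0.0] * len(attrs)
        # Lower net: expected deviations for note n+1, given what was just played.
        predicted = list(predictor(list(following) + feedback))
        # Upper net: deviations for note n, informed by the prediction.
        feedback = list(performer(list(attrs) + predicted + feedback))
        result.append(feedback)
    return result

def toy_predictor(x: Vector) -> Vector:
    """Stand-in (assumption): half of the next note's attribute, twice."""
    return [0.5 * x[0], 0.5 * x[0]]

def toy_performer(x: Vector) -> Vector:
    """Stand-in (assumption): mixes the note, the prediction and the feed-back."""
    return [x[0] + x[1], x[2] - x[3]]

out = perform_predictive([[2.0], [4.0]], toy_predictor, toy_performer)
print(out)  # [[4.0, 2.0], [4.0, -4.0]]
```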

  4. The MELODIA program
    MELODIA is a program implemented at the C.S.C. in order to better understand the KTH rules, to test new rules, to produce input patterns for particular NNs, and to test previously trained NNs. The program can perform via MIDI any score, previously written in a simple language, by applying a set of 19 symbolic rules. Many of these rules were chosen from those proposed by Sundberg and co-workers at the KTH (Sundberg, 1991; Friberg, 1991), others are modified versions of KTH rules, and some are completely new. Unlike the KTH rule system, MELODIA also considers rests, grace notes, staccato, and legato.

    Rules are applied interactively: each time, the user chooses the rule to apply and its weight. When the rule-selection session is finished, the user is asked whether to listen to the results or to view them on the screen. The screen view shows both the graph of the time deviations (in milliseconds) and the graph of the loudness deviations (in decibels) produced by applying the chosen rules to the score. After this phase the user can begin a new session with another score, or listen to and view previously processed scores.

    By applying the symbolic rules to any score, it is also possible to output pattern files for training or testing new neural networks. An important new feature of MELODIA is the ability to perform scores with previously trained neural networks: the structure and the weights of two neural networks can be loaded at a time (one for loudness deviations and one for time deviations) and used to perform any score. The scores to be performed with MELODIA can be written in standard MIDI, so that they can be produced with any score editor and then processed with MELODIA. The performances produced are then saved as standard MIDI files. The program runs on an IBM-compatible personal computer with a Roland MPU-401 or compatible MIDI interface.
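
    To make the two kinds of deviation concrete, the sketch below applies time deviations in milliseconds and loudness deviations in decibels to a list of notes. It is not MELODIA code: the note representation, the function name, and in particular the mapping from decibels to MIDI velocity (a plain amplitude ratio of 10^(dB/20) applied to the velocity) are assumptions, since the actual mapping depends on the synthesiser.

```python
from typing import List, Tuple

# (pitch, nominal duration in ms, MIDI velocity): a hypothetical note format.
ScoreNote = Tuple[int, float, int]
# (pitch, onset in ms, performed duration in ms, MIDI velocity)
PerformedNote = Tuple[int, float, float, int]

def apply_deviations(notes: List[ScoreNote], time_dev_ms: List[float],
                     loud_dev_db: List[float]) -> List[PerformedNote]:
    """Turn nominal notes plus deviations into performed notes."""
    performed = []
    onset = 0.0
    for (pitch, dur_ms, velocity), dt, db in zip(notes, time_dev_ms, loud_dev_db):
        new_dur = max(1.0, dur_ms + dt)  # never let a duration collapse to zero
        gain = 10.0 ** (db / 20.0)       # decibels to amplitude ratio
        # Assumption: velocity scales with amplitude; clamp to the MIDI range 1..127.
        new_velocity = min(127, max(1, round(velocity * gain)))
        performed.append((pitch, onset, new_dur, new_velocity))
        onset += new_dur                 # the next note starts when this one ends
    return performed

score = [(60, 500.0, 64), (62, 500.0, 64)]
performance = apply_deviations(score, time_dev_ms=[20.0, -10.0], loud_dev_db=[0.0, -6.0])
print(performance)  # [(60, 0.0, 520.0, 64), (62, 520.0, 490.0, 32)]
```

    The deviation lists could come either from the weighted rules chosen by the user or from the two trained networks, one for time and one for loudness.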

  5. Future developments and applications of the model
    At present we are planning new listening tests (using the NN of figure 3.3) and further steps in the research. One is to understand the use of the pedals in piano performance, since we have the possibility of using a Disklavier, and to try to formulate some rules. Another step will be the application of MELODIA and of the NNs to the performance of MUSIC V or Csound scores, to help those computer music composers who apply performance rules to their compositions (computer music production is one of the most important activities at the C.S.C.). Another idea is to integrate our model into a more complex environment for automatic performance, such as an artificial orchestra in which NNs, rules, and human performers interact. We are also working on a version of the KTH rule system based on fuzzy set theory, which makes it possible to deal with uncertainty and, at the same time, to keep the knowledge structured as rules. Furthermore, we want to combine the NN approach with the fuzzy set approach, with a view to a fuzzy controller for the performance deviations. This would bring together the capability of the NN approach to extract behaviours from examples and the capability of the fuzzy approach to structure knowledge by means of linguistic variables.

  6. Conclusion
    In conclusion, I would like to stress the importance of a symposium like this one in Aarhus: it is certainly a great opportunity to foster future developments in the fascinating field of automatic performance. The research presented in this paper itself starts from another work, carried out at KTH in Stockholm. My hope is that some of us will build new instruments that allow human performers to play computer-generated music better, in order to reduce the gap between composers and audience.

References