Performance of Musical Scores by Means of Neural Networks
Progress and Status Report
Roberto Bresin
C.S.C. - Centro di Sonologia Computazionale
Universita' di Padova
via S. Francesco 11 - 35121 Padova - Italy
Phone: ++39 49 8273757 - Fax: ++39 49 8273733
E.mail: rb@csc.unipd.it
Speech, Music and Hearing Department
Royal Institute of Technology
Box 70014 - 10044 Stockholm - Sweden
Phone: ++46 8 790.7876 - Fax: ++46 8 790.7854
E.mail: roberto@speech.kth.se
Abstract
The research on automatic performance by means of neural networks (NNs) carried out at
C.S.C. since 1990 is briefly summarised. Particular attention is given to the
evolution of the architecture of the NNs employed: from the first simple model, able to
learn some performance rules, to the latest one, which can simulate the style of a real
pianist well. The program MELODIA is also described: this program, developed
at the C.S.C., makes it possible to run both rules and NNs, and to study the resulting
deviations. Further developments of the research are presented, in particular the use of
fuzzy set theory together with the NN approach.
- Introduction
At the C.S.C., the research on the automatic
performance of musical scores began in 1990. After a
first experience based on expert systems, we decided to
try artificial neural networks (NNs), since
they promised to solve complex problems that were
not easy to handle with traditional computational
techniques.
The present research team is composed of Giovanni
Umberto Battel, professional pianist and teacher of
piano at the Conservatory of Music of Venice
"Benedetto Marcello"; Giovanni De Poli, professor of
Informatics at the Department of Electronics and
Informatics of Padua University and director of the
C.S.C.; Alvise Vidolin, live electronics performer and
teacher of computer music at the Conservatory of
Music of Venice; and myself, engineer at the C.S.C. and
free-lance researcher. The thesis work of various
students has also given very important support to the research.
- Towards a neural networks based model
The main problem when using NNs is to identify the
input and output neurones, and to define the structure
of the net itself.
We started our investigation by studying the set of
performance rules developed by Friberg, Fryden,
Sundberg, and co-workers at the Speech Department
of the Stockholm KTH (Sundberg, 1991; Friberg, 1991).
Since the equations of these rules have input
parameters which can be deduced from a direct reading
of the musical score, these parameters were used as
inputs for our growing NNs. As outputs for the NNs we
used the same outputs as the "KTH rules": in a first step
of the research we considered only the time deviation
and the loudness deviation parameters, and in later
models we also introduced an output node that takes into
account the deviation of another variable, very
similar to the off-time duration parameter described in
the KTH rules system (Bresin & Vecchio, 1994).
As shown in detail in a previous work (Bresin &
Vecchio, 1994), it is possible to represent the KTH rules
system with the following equation:

$$ y_n = \sum_{i} k_i \, f_i(x_n) \qquad (2.1) $$
where the meaning of each parameter is the following:
y_n    deviation of time or of loudness for the n-th note
f_i()  function that represents the i-th KTH rule,
       determined in a heuristic way
x_n    vector of the parameters used by the KTH rules
       applied to the n-th note
k_i    constant used to emphasise the deviation due
       to the i-th rule
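
Reading the definitions above as a weighted sum of rule outputs, the combination of rules for one note can be sketched as follows. This is only an illustration: the two rule functions, the parameter names and the weights are hypothetical and are not the actual KTH rules.

```python
from typing import Callable, Dict, Sequence

# A rule f_i maps the score-derived parameters x_n of one note to a deviation.
Rule = Callable[[Dict[str, float]], float]


def lengthen_long_notes(x: Dict[str, float]) -> float:
    """Hypothetical rule: time deviation (ms) proportional to nominal duration."""
    return x["nominal_duration_ms"] / 20.0


def stress_melodic_charge(x: Dict[str, float]) -> float:
    """Hypothetical rule: time deviation (ms) proportional to melodic charge."""
    return 3.0 * x["melodic_charge"]


def combined_deviation(x_n: Dict[str, float],
                       rules: Sequence[Rule],
                       weights: Sequence[float]) -> float:
    """Weighted sum over the rules: y_n = sum_i k_i * f_i(x_n)."""
    if len(rules) != len(weights):
        raise ValueError("one weight k_i is needed for each rule f_i")
    return sum(k * f(x_n) for f, k in zip(rules, weights))


note = {"nominal_duration_ms": 400.0, "melodic_charge": 2.0}
y_n = combined_deviation(note,
                         [lengthen_long_notes, stress_melodic_charge],
                         [1.0, 0.5])
```

Each weight k_i scales the contribution of one rule, so setting it to zero switches that rule off without changing the others.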
The C.S.C. model could be written in the following
form:

$$ y_n = \mathrm{net}(x_n, k) \qquad (2.2) $$
where the meaning of each parameter is the following:
y_n    deviation of time or of loudness for the n-th note
net()  neural net, a non-linear function that computes
       the deviations
x_n    the same as in equation (2.1)
k      vector of constants, or of variables which
       evolve very slowly in time
- Evolution of the architecture of the net() function
After the choice of the input and output parameters, the
next step in the design of a NN is to build its structure.
The following sections briefly describe the evolution of
the architecture of the NNs used in our research on
automatic performance, from the first classical
feed-forward model to the most recent architecture, the
ecological-predictive NN.
- Feed-forward NN
We first adopted this model of NN, trained with the
back-propagation algorithm, since it is one of
the most widely studied models and it is suitable for
problems in which a reaction is required in the
presence of some stimuli. Figure 3.1 shows
the final structure of the feed-forward NN that we use
in our experiments: the MLP (Multi Layer Perceptron)
has two hidden layers. As the figure shows,
the input variables (stimuli) of the NN are
deduced from the KTH rules system, and so are the
outputs (reaction). This NN was trained to reproduce
the KTH rules: some rules were applied to a musical
score, and their results were summed and taught to
the NN. Some listening tests (Battel et al., 1993, 1994)
indicated that the NN learned the rules well: the
performances produced by the NN received a higher mean
rating than the performances obtained with the KTH
rules system.
Figure 3.1: Feed-forward NN.
Outputs: DPD=Deviation of Performing Duration,
DMD=Deviation of Metrical Duration.
Inputs: ND=Nominal Duration, MC=Melodic Charge,
LP=Leap Presence, SL=Semitones in the Leap,
S/E P=Start/End of a Phrase, RN=Repeated Note,
LA=Leap Articulation.
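
The feed-forward model of this section can be sketched as a small multi-layer perceptron with two hidden layers, trained by back-propagation. The input and output names are taken from Figure 3.1; everything else (hidden-layer sizes, tanh hidden units, linear outputs, learning rate, and the toy target function standing in for the summed rule output) is an assumption made for illustration, not the configuration used in our experiments.

```python
import math
import random

# Names from Figure 3.1; the hidden-layer sizes below are assumptions.
INPUTS = ["ND", "MC", "LP", "SL", "S/E P", "RN", "LA"]
OUTPUTS = ["DPD", "DMD"]
HIDDEN = [8, 4]  # hypothetical sizes of the two hidden layers


def init_network(sizes, rng):
    """One weight matrix per layer; the last entry of each row is the bias."""
    return [[[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
             for _ in range(n_out)]
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def forward(layers, x):
    """Return the activations of every layer, the input included."""
    acts = [list(x)]
    for index, layer in enumerate(layers):
        prev = acts[-1] + [1.0]  # the constant 1.0 feeds the bias weight
        z = [sum(w * a for w, a in zip(row, prev)) for row in layer]
        is_output = index == len(layers) - 1
        acts.append(z if is_output else [math.tanh(v) for v in z])
    return acts


def train_step(layers, x, target, lr=0.05):
    """One back-propagation update on the squared error; returns that error."""
    acts = forward(layers, x)
    delta = [o - t for o, t in zip(acts[-1], target)]  # linear output units
    error = 0.5 * sum(d * d for d in delta)
    for index in range(len(layers) - 1, -1, -1):
        layer, prev = layers[index], acts[index] + [1.0]
        if index > 0:
            # Error signal for the layer below, using the weights before update.
            back = [sum(row[i] * d for row, d in zip(layer, delta))
                    for i in range(len(acts[index]))]
            lower = [b * (1.0 - a * a) for b, a in zip(back, acts[index])]
        for row, d in zip(layer, delta):
            for i in range(len(row)):
                row[i] -= lr * d * prev[i]
        if index > 0:
            delta = lower
    return error


def toy_rule_sum(x):
    """Hypothetical stand-in for the summed rule output taught to the net."""
    return [0.3 * x[0] - 0.2 * x[1], 0.1 * x[2] + 0.2 * x[6]]


rng = random.Random(0)
net = init_network([len(INPUTS)] + HIDDEN + [len(OUTPUTS)], rng)
notes = [[rng.random() for _ in INPUTS] for _ in range(20)]
errors = []  # total squared error accumulated over each training epoch
for epoch in range(200):
    errors.append(sum(train_step(net, x, toy_rule_sum(x)) for x in notes))
```

In this sketch the training targets come from a fixed function of the inputs, which mirrors the idea of teaching the summed rule results to the NN; with real data the targets would be the deviations produced by the rules on a score.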
- Ecological NN
Since the feed-forward NN described above works only
within a context of one or two notes, it was necessary to
introduce at least the information on the last deviations
made by the NN, in order to give a certain continuity to
the performing action of the NN. We therefore built a
feed-forward NN with feed-back neurones, the so-called
ecological NN (Fig. 3.2). The word ecological is used by
psychologists to stress the action of the NN on
the surrounding environment: in our case the feed-back
input-neurones could represent the ears of the artificial
pianist listening to himself while playing, the other
input-neurones represent the eyes of the artificial
pianist, and the output-neurones could be seen as the
hands of the artificial pianist. In figure 3.2 there are
three new special input-neurones (C1, C2, and C3),
which we called context neurones: they take into
account the attributes related to the two notes after the
current one. These new nodes were necessary to teach
the NN particular situations. For example, there could be two
notes with the same input attributes that correspond to
different output deviations; in this case the context
nodes resolve the ambiguity, provided that they carry
information related to different notes in the two
situations.
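
The feed-back loop described above can be sketched as a note-by-note performance in which the deviations computed for one note are appended to the inputs of the next. The ordering of the inputs (score attributes, then context values, then feed-back) and the toy network are assumptions made for this illustration; any trained net with a compatible input size could be passed instead.

```python
from typing import Callable, List, Sequence

# A net maps one input vector to the output deviations for the current note.
Net = Callable[[Sequence[float]], List[float]]


def perform(score_inputs: Sequence[Sequence[float]],
            context_inputs: Sequence[Sequence[float]],
            net: Net,
            n_outputs: int = 2) -> List[List[float]]:
    """Run the net over a score, feeding its last outputs back as inputs."""
    feedback = [0.0] * n_outputs  # no previous deviations before the first note
    deviations = []
    for x_n, c_n in zip(score_inputs, context_inputs):
        # "Eyes": score attributes and context neurones; "ears": feed-back.
        net_input = list(x_n) + list(c_n) + feedback
        y_n = net(net_input)
        deviations.append(list(y_n))
        feedback = list(y_n)  # heard by the net when it plays the next note
    return deviations


def toy_net(v: Sequence[float]) -> List[float]:
    """Hypothetical net: first output depends on its own previous value."""
    return [v[0] + 0.5 * v[-2], 0.5 * v[-1]]


# Three notes with identical score attributes and identical context values.
result = perform([[1.0], [1.0], [1.0]], [[0.0, 0.0, 0.0]] * 3, toy_net)
```

Even though the three notes have the same attributes, the outputs differ from note to note because of the feed-back inputs, which is the continuity effect the ecological NN is meant to provide.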
- Ecological-predictive NN
Since the beginning of our research, one of the desired
tasks was to teach a NN the performing style of a real
pianist. To solve this task we used the NNs shown
so far, improving the results as we adopted more
complex architectures. With the NN described in this
section we achieved the best results so far (Fig.
3.3): there are two NNs similar to
that of figure 3.2, linked together by the two output-
neurones of the lower NN, which represent the time
deviations related to note n+1, where n is the current
note. In this way the artificial pianist can predict
the likely values of the next deviations.
Figure 3.2: Ecological NN with context-neurones
Figure 3.3: Ecological-predictive NN
All the NN models described in these sections are
intended to be applied also to obtain loudness
deviations. In particular the NN of figure 3.3
approximates very well the style of the real pianist used
for the training, who played on a Disklavier Grand
Piano connected via MIDI to a PC (Bresin & Vecchio,
1994).
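
The linkage between the two nets of the ecological-predictive model can be sketched as follows. Since the exact wiring is given only in figure 3.3, the arrangement here is an assumption: a lower net predicts the deviations of note n+1, and its two outputs are appended to the inputs of the upper net, which computes the deviations of the current note n. Both toy nets are hypothetical.

```python
from typing import Callable, List, Sequence

Net = Callable[[Sequence[float]], List[float]]


def perform_predictive(notes: Sequence[Sequence[float]],
                       predictor: Net,
                       performer: Net,
                       n_outputs: int = 2) -> List[List[float]]:
    """Perform each note using a prediction for the following note."""
    feedback = [0.0] * n_outputs  # no previous deviations before the first note
    result = []
    for n, x_n in enumerate(notes):
        # Attributes of note n+1; zeros after the last note (an assumption).
        x_next = notes[n + 1] if n + 1 < len(notes) else [0.0] * len(x_n)
        predicted = predictor(list(x_next) + feedback)   # lower net
        y_n = performer(list(x_n) + feedback + list(predicted))  # upper net
        result.append(list(y_n))
        feedback = list(y_n)
    return result


def toy_predictor(v: Sequence[float]) -> List[float]:
    """Hypothetical lower net: half of the next note's first attribute."""
    return [0.5 * v[0], 0.0]


def toy_performer(v: Sequence[float]) -> List[float]:
    """Hypothetical upper net; v = [x_n, feedback_0, feedback_1, pred_0, pred_1]."""
    return [v[0] + v[3], v[1]]


deviations = perform_predictive([[1.0], [2.0], [4.0]],
                                toy_predictor, toy_performer)
```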
- The MELODIA program
MELODIA is a program implemented at the C.S.C. in
order to better understand the KTH rules, to test new
rules, to produce input patterns for particular NNs, and
to test previously trained NNs.
The program can perform via MIDI any score
previously written in a simple language, applying a set
of 19 symbolic rules. Many of these rules were
chosen from those proposed by Sundberg and co-workers
at the KTH (Sundberg, 1991; Friberg, 1991),
others are modified versions of KTH rules, and
some are completely new. Compared with the
KTH rules system, MELODIA also considers rests,
grace-notes, staccato, and legato. The rules are applied
interactively: at each step the user
chooses the rule to apply and its weight. When the
rule-selection session is finished, the user can
listen to the results or view them on the screen. The
screen view shows both the graph of the time deviations
(in milliseconds) and the graph of the loudness deviations
(in decibels) due to the application of the
chosen rules to the score. After this phase the user can
begin a new session with another
score, or listen to and view previously processed
scores. By applying the symbolic rules to any score, it is
also possible to output pattern files for
training or testing new neural networks. An
important new feature of MELODIA is the possibility of
performing scores with previously trained neural networks:
it is possible to load the structure and the weights of
two neural networks at a time (one for loudness
deviations, and another one for time deviations) and to
perform any score. The scores to be performed with
MELODIA can be written in standard MIDI, so that
they can be produced with any score editor and then
processed with MELODIA. The performances
produced are then saved in standard MIDI files.
The program runs on an IBM-compatible personal
computer with a Roland MPU-401 MIDI
interface or a compatible one.
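
The production of pattern files from the symbolic rules can be sketched as follows. This is not the MELODIA file format, which is not specified here: the line layout (inputs, a separator, then the time deviation in milliseconds and the loudness deviation in decibels) and the two rules are hypothetical, chosen only to show how weighted rule outputs become training targets.

```python
import io
from typing import Callable, List, Sequence, TextIO, Tuple

# A weighted rule is a pair (rule function, weight chosen by the user).
WeightedRule = Tuple[Callable[[Sequence[float]], float], float]


def write_patterns(notes: Sequence[Sequence[float]],
                   time_rules: List[WeightedRule],
                   loudness_rules: List[WeightedRule],
                   out: TextIO) -> None:
    """Write one training pattern per note: inputs | time_dev loudness_dev."""
    for x in notes:
        time_dev = sum(k * f(x) for f, k in time_rules)          # milliseconds
        loudness_dev = sum(k * f(x) for f, k in loudness_rules)  # decibels
        inputs = " ".join(f"{v:g}" for v in x)
        out.write(f"{inputs} | {time_dev:g} {loudness_dev:g}\n")


# Hypothetical note attributes: [nominal duration in ms, melodic charge].
score = [[400.0, 1.0], [200.0, 0.0]]
time_rules = [(lambda x: x[0] / 10.0, 1.0)]      # hypothetical rule, weight 1.0
loudness_rules = [(lambda x: 2.0 * x[1], 0.5)]   # hypothetical rule, weight 0.5

buffer = io.StringIO()
write_patterns(score, time_rules, loudness_rules, buffer)
patterns = buffer.getvalue()
```

Two such files, one per deviation type, would correspond to the two networks (time and loudness) that the program loads at the same time.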
- Future developments and applications of the model
At present we are planning new listening tests
(using the NN of figure 3.3), and further steps in the
research. One is to understand the use of the pedals in
piano performance, since we have access to
a Disklavier, and to try to formulate some rules for it.
Another step will be the application of MELODIA and
of the NNs to the performance of MUSIC V or Csound
scores, to help those computer music composers who
apply performance rules to their compositions
(computer music production is one of
the most important activities at the C.S.C.).
Another idea is to integrate our model into a more
complex environment for automatic performance, such as
an artificial orchestra, in which NNs, rules, and human
performers interact. We are also working on a version
of the KTH rule system based on fuzzy set
theory. In this way it is possible to deal with uncertainty
and, at the same time, to keep the knowledge structured
in rules. Furthermore we want to combine the NN
approach with the fuzzy set one, with a view to a fuzzy
controller for the performance deviations. This would
join the capability of the NN approach to extract
behaviours from examples with the capability of the fuzzy
approach to structure knowledge by linguistic variables.
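
As an illustration of what a rule expressed with linguistic variables might look like, the sketch below defines a variable "duration" with two fuzzy terms and combines two rules by a weighted average. This work is still being planned, so everything here is hypothetical: the triangular membership functions, their breakpoints, the rule consequents, and the weighted-average combination are assumptions, not a description of the planned fuzzy version of the KTH rules.

```python
def triangular(x: float, a: float, b: float, c: float) -> float:
    """Membership in a triangular fuzzy set with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def fuzzy_time_deviation(duration_ms: float) -> float:
    """Time deviation (ms) from two hypothetical fuzzy rules.

    Rule 1: IF duration is short THEN the deviation is 0 ms.
    Rule 2: IF duration is long  THEN the deviation is 40 ms.
    """
    short = triangular(duration_ms, 0.0, 200.0, 600.0)
    long_ = triangular(duration_ms, 200.0, 600.0, 1000.0)
    total = short + long_
    if total == 0.0:
        return 0.0  # the note falls outside both fuzzy sets
    # Weighted average of the rule consequents by degree of membership.
    return (short * 0.0 + long_ * 40.0) / total
```

A note of 400 ms belongs equally to "short" and "long" in this sketch, so it receives a deviation halfway between the two consequents; this gradual transition is what distinguishes the fuzzy formulation from a rule with a hard threshold.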
- Conclusion
In conclusion I would like to stress the importance of a
symposium like this one in Aarhus: it is certainly a great
opportunity to foster future developments in the
fascinating field of automatic performance. The
research presented in this paper itself builds on
earlier work carried out at the KTH in
Stockholm. My hope is that some of us will build
particular instruments that will allow human
performers to play computer-generated music better, in
order to reduce the gap between composers and
audience.
References