
Session 10 - Music performance

INVITED
Large-scale performance studies with intelligent data analysis methods
G Widmer
University of Vienna, Department of Medical Cybernetics and Artificial Intelligence, Vienna, Austria

The paper reviews current work in the area of AI-based computational performance research. The focus is on approaches that study large amounts of empirical data (performance measurements) with advanced data analysis methods. It is shown that intelligent computational techniques (Artificial Intelligence, machine learning, pattern recognition) are essential in all the phases of such data-intensive investigations, from the extraction of expressive information from (audio) recordings through data visualisation to autonomous intelligent data analysis. Examples from our recent work demonstrate what is currently possible. In particular, we focus on a large research program that studies aspects of the performance style of concert pianists with novel visualisation and data analysis methods. A number of serious limitations of this work will become apparent, and we will take these as starting points to formulate some challenging directions for further research.

INVITED
Studies of music performance: a theoretical analysis of empirical findings
P N Juslin
Uppsala University, Department of Psychology, Uppsala, Sweden

The goal of this paper is to outline a psychological approach to expression in music performance that can help to provide a solid foundation for the teaching of expression in music education. I will argue that performance expression is a problem amenable to empirical investigation, and that psychological theory is critical to an understanding of this problem. In my view, a psychological approach to performance expression should consider how this phenomenon reflects human characteristics that are not necessarily unique to the music domain. Research reviewed in this paper will provide support for this view. Drawing on previous research, I propose that performance expression is best conceptualized as a multi-dimensional phenomenon consisting of five primary components: (a) Generative rules that function to clarify the musical structure, (b) Emotional expression that serves to convey emotions to listeners, (c) Random variations that reflect internal time-keeper variance and motor delays, (d) Motion principles that prescribe that certain aspects of the performance should be shaped in accordance with biological motion, and (e) Stylistic unexpectedness that involves local deviations from performance conventions. An analysis of expression in terms of these components, referred to as the GERMS model, has important implications for research on and the teaching of music performance.

ORAL
Expressive Director: a system for the real-time control of music performance synthesis
S Canazza¹, A Roda¹, P Zanon¹, A Friberg²
¹University of Padova, Information Engineering, Padova, Italy; ²Royal Institute of Technology, Speech, Music and Hearing, Stockholm, Sweden

The Expressive Director is a system allowing real-time control of music performance synthesis, in particular regarding expressive and emotional aspects. It allows a user to interact in real time, for example changing the emotional intent from happy to sad, or from a romantic expressive style to a neutral one, while the music is playing.
The model is based on the concept of expressive profiles, which contain, for each note, the expressive deviations in terms of intensity, timing, and articulation. Each profile represents an expressive style, for example an emotion, a musical style, or a single rule. The expressive profiles can be calculated in Director Musices, a program resulting from a long-term research project that contains about 30 rules for modelling music performance. Expressive profiles for any combination of rules can be calculated; thus, all rules in Director Musices can be used for real-time control in the Expressive Director.
One possible expressive input is to move a pointer on the screen in a two-dimensional space (called the "Control Space") onto which the expressive profiles are mapped. For example, the Control Space can be made to represent the Valence-Arousal space from music psychology research. Depending on the position in the Control Space, the system applies the expressive profiles to the music output on the basis of the pre-processed rule palettes.
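As a rough illustration of how such a mapping could work (this is not the actual Expressive Director implementation; the profiles, positions and interpolation scheme below are assumptions), a small Python sketch might blend pre-computed expressive profiles according to the pointer position in the Control Space:

    # Hypothetical sketch: blend expressive profiles by pointer position in a 2D control space.
    # Profile deviations (intensity in dB, timing in ms, articulation ratio) are invented values.
    import numpy as np

    profiles = {
        "happy":   {"pos": np.array([0.8, 0.8]),   "dev": np.array([3.0, -20.0, 0.7])},
        "sad":     {"pos": np.array([-0.8, -0.8]), "dev": np.array([-4.0, 40.0, 1.0])},
        "neutral": {"pos": np.array([0.0, 0.0]),   "dev": np.array([0.0, 0.0, 0.9])},
    }

    def blend_profiles(pointer, profiles):
        """Weight each profile by inverse distance to the pointer and mix its per-note deviations."""
        weights, devs = [], []
        for p in profiles.values():
            weights.append(1.0 / (np.linalg.norm(pointer - p["pos"]) + 1e-6))
            devs.append(p["dev"])
        weights = np.array(weights) / np.sum(weights)
        return np.average(np.array(devs), axis=0, weights=weights)

    # A pointer halfway towards "happy" yields deviations leaning towards the happy profile.
    print(blend_profiles(np.array([0.4, 0.4]), profiles))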

ORAL
Estimation of the note durations in a polyphonic piano transcription system
A M Barbancho, I Barbancho, L J Tardon
Universidad de Malaga, Ingeniería de Comunicaciones, Malaga, Spain

When a piano piece is performed, notes with the same notated duration may last different amounts of time, for expressive reasons (ritardando, accelerando, ...) and because of the human performer. Therefore, the automatic determination of note durations requires a certain musical intelligence to cope with these effects.
In this paper, a system for determining the note durations and the measure division of a polyphonic piano piece is presented. The inputs to this system are the instants at which each onset takes place, together with its intensity, and the duration, in seconds, of each note. The beat and the time signature can either be inputs to the system or be calculated by the system. The possibility of giving the system some information about the style of the music under analysis will also be considered. In order to map durations onto note values, the system will not analyze the notes in isolation, but will consider sets of notes. To carry out the whole process, several techniques will be tried, such as fuzzy-logic-based techniques and minimization algorithms.
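As a minimal sketch of the underlying mapping problem (not the authors' system, which reasons over sets of notes rather than isolated ones), one could quantize each measured duration to the nearest conventional note value given a known beat duration:

    # Illustrative sketch only: quantize performed durations (in seconds) to note values,
    # assuming the beat duration is known and corresponds to a quarter note.
    note_values = {"sixteenth": 0.25, "eighth": 0.5, "quarter": 1.0, "half": 2.0, "whole": 4.0}

    def quantize_duration(seconds, beat_seconds):
        beats = seconds / beat_seconds
        return min(note_values, key=lambda name: abs(note_values[name] - beats))

    # A 0.27 s note at 120 bpm (0.5 s per beat) is closest to an eighth note.
    print(quantize_duration(0.27, 0.5))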

ORAL
Expressiveness analysis of virtual sound movements and its musical applications
A de Gotzen
CSC-DEI, Università di Padova, Padova, Italy

This paper describes a work that investigates the expressiveness of sound motion. The work is divided into three parts: the design of a perceptual test, the statistical analysis of the collected data, and a musical application. The main idea is to find a way to use sound motion as a musical parameter to convey a specific expressive content related to performance gestures. The purposes of this work can be stated as follows:
- To show how multi-modality can be conveyed by acoustic means by changing the position of sound objects in a virtual space;
- To open a new multi-modal channel to convey expressiveness;
- To establish an explicit connection between sound movement and expressiveness.
These purposes are related to the idea that much contemporary music can take advantage of expressive spatialization. In particular, this framework has been used in the opera Medea by the Italian composer Adriano Guarnieri. The need for this kind of investigation comes from contemporary music works in which sound motion is a musical parameter like intensity, timbre or pitch. While the connection between these latter parameters and emotion has been investigated in depth, spatialization still remains an open research path.

POSTER
A statistical approach to expressive intentions recognition in violin performances
R A P Dillon
University of Genoa, DIST, Genova, Italy

This paper presents a possible approach to the automatic understanding of expressive intentions in actual music performances. In particular, we analyze three performances (played with different expressive characteristics such as "brillante", "agitato" and "cupo") of the same excerpt played by a professional violinist, and propose a classification system based on the extraction and analysis of a set of simple audio parameters, extracted in real time by a set of tools implemented in the EyesWeb open platform. The cue extraction process is carried out by looking at a "note" and a "phrase" profile (obtained by squaring and low-pass filtering the incoming signal with different cut-off frequencies). These are compared to obtain, for each note, parameters regarding note duration, articulation and dynamics (needed to show how the performance is evolving). Statistical analysis is then performed on the extracted data to see whether remarkable differences can be found between the original performances. Once this is confirmed, a Hidden Markov Model, developed in MatLab and able to recognize the player's intentions from the audio cues, is described.
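The note and phrase profiles mentioned above are essentially amplitude envelopes obtained with different amounts of smoothing. A minimal sketch of that step (the cut-off frequencies here are assumptions, not the values used by the author) could be:

    # Rough sketch of the envelope-extraction step: square the signal and low-pass filter it
    # with two different cut-off frequencies to obtain a "note" and a "phrase" profile.
    import numpy as np
    from scipy.signal import butter, sosfiltfilt

    def envelope(signal, sr, cutoff_hz):
        sos = butter(2, cutoff_hz, btype="low", fs=sr, output="sos")
        return sosfiltfilt(sos, signal ** 2)

    sr = 16000
    t = np.arange(0, 2.0, 1.0 / sr)
    x = np.sin(2 * np.pi * 440 * t) * (t < 1.0)        # toy signal: one "note", then silence

    note_profile = envelope(x, sr, cutoff_hz=20.0)     # follows individual note boundaries
    phrase_profile = envelope(x, sr, cutoff_hz=2.0)    # follows the broader phrase shape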

POSTER
Automatic analysis of performance variables from audio
A Friberg¹, E Schoonderwaldt¹, P Juslin²
¹KTH, Speech, Music and Hearing, Stockholm, Sweden; ²Uppsala University, Dept. of Psychology, Uppsala, Sweden

Detection of played notes from audio input is often a useful feature in various music computer systems. Previous algorithms often focus on the detection of tone onsets and pitches, resulting in a score representation, for example in the form of a MIDI file. For the purpose of studying the expressive communication between player and listener, we present an algorithm that extracts a larger set of performance variables. Audio input is segmented into tone onsets and offsets using a combination of pitch and sound level analysis. Onset candidates are found by identifying areas of similar pitch and substantial dips in the sound level. Several post-processing steps enhance the recognition. For each detected tone the following variables are computed: pitch, sound level, instantaneous tempo, articulation, attack velocity, spectral content, vibrato rate and vibrato extent. The algorithm has been tested with violin, flute, voice, and electric guitar. Preliminary results using violin and flute indicate an average tone detection accuracy of 97%. The algorithm has been used in a system for quantifying emotional communication between musicians and listeners. A real-time version has been used in the interactive expressive game ESP.
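A highly simplified sketch of the segmentation idea described above (not the published algorithm; the frame-wise representation and thresholds are assumptions) places onset candidates where the frame-wise pitch leaves an area of similar pitch or where the sound level dips substantially:

    # Hedged sketch only: onset candidates from pitch changes and sound-level dips.
    import numpy as np

    def onset_candidates(pitch_midi, level_db, pitch_tol=1.0, dip_db=9.0):
        onsets = [0]
        for i in range(1, len(pitch_midi)):
            pitch_jump = abs(pitch_midi[i] - pitch_midi[i - 1]) > pitch_tol
            level_dip = (level_db[i - 1] - level_db[i]) > dip_db
            if pitch_jump or level_dip:
                onsets.append(i)
        return onsets

    pitch = np.array([60, 60, 60, 62, 62, 62, 62, 64, 64])            # frame-wise pitch (MIDI)
    level = np.array([-20, -19, -20, -21, -20, -20, -32, -20, -19])   # frame-wise level (dB)
    print(onset_candidates(pitch, level))                             # -> [0, 3, 6, 7]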

ORAL
Melodic characterization of monophonic recordings for expressive tempo transformations
E Gómez¹, M Grachten², X Amatriain¹, J L Arcos²
¹Universitat Pompeu Fabra, Music Technology Group, Barcelona, Spain; ²IIIA-CSIC, Artificial Intelligence Research Institute, Spanish Council for Scientific Research, Barcelona, Spain

The work described in this paper aims at characterizing tempo changes in terms of expressivity, in order to develop a system for performing expressive tempo transformations of monophonic instrument phrases. For this purpose, we have developed an analysis tool that extracts a set of acoustic features from monophonic recordings. This set of features is structured and stored following a description scheme derived from the current MPEG-7 standard. These performance descriptions are then compared with their corresponding scores, using edit distance techniques, in order to automatically annotate the expressive transformations performed by the musician. These annotated performance descriptions are then incorporated into a case-based reasoning (CBR) system in order to build a case base of expressive tempo transformations. The transformation system will use this CBR system to perform tempo transformations in an expressive manner. Saxophone performances of jazz standards played by a professional performer have been recorded for this characterization. In this paper, we first describe the acoustic features that have been used for this characterization and how they are structured and stored. Then we explain the analysis methods that have been implemented to extract this set of features from audio signals and how they are processed by the CBR system. Results are finally presented and discussed.
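A minimal sketch of the edit-distance idea used for matching a performance against its score is given below (plain Levenshtein distance over pitch sequences; the actual system uses richer, performance-specific costs):

    # Minimal Levenshtein sketch of score-performance matching; real costs are richer.
    def edit_distance(performed, score):
        m, n = len(performed), len(score)
        d = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(m + 1):
            d[i][0] = i
        for j in range(n + 1):
            d[0][j] = j
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                cost = 0 if performed[i - 1] == score[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,         # extra note in the performance
                              d[i][j - 1] + 1,         # score note not played
                              d[i - 1][j - 1] + cost)  # match or substitution
        return d[m][n]

    print(edit_distance([60, 62, 63, 64], [60, 62, 64]))   # one extra performed note -> 1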

POSTER
Measurements and models of musical articulation
J Jerkert
KTH, Speech, Music and Hearing, Stockholm, Sweden

It has been known for some time that many aspects of interpretation in Western music - such as the final ritardando, intonation of high and low tones, exaggeration of tone length contrasts, etc. - can be partly formulated in terms of general principles. Although different interpreters of course play differently, they still have to follow some basic rules to be considered acceptably musical at all. This investigation tries to find out whether any general rules can be formulated for musical articulation. Articulation is here taken to mean the binding together or separation of the individual tones, for example whether a note is played staccato or legato. A number of commercially available recordings were analyzed, most notably Bach fugue themes on the organ. The onsets and offsets of each tone were estimated from spectrograms. Preliminary results show that different performers articulate the melodies in a similar way, thus indicating the possibility of rule formulations. More complete results will be presented at the conference.
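One common way to express articulation numerically, consistent with the definition above, is the ratio of a tone's sounding duration to its inter-onset interval. A small sketch (the onset/offset times below are invented) is:

    # Illustrative sketch: articulation as the ratio of sounding duration to inter-onset interval
    # (values near 1 suggest legato, values clearly below 1 suggest staccato).
    def articulation_ratios(onsets, offsets):
        ratios = []
        for i in range(len(onsets) - 1):
            ioi = onsets[i + 1] - onsets[i]
            ratios.append((offsets[i] - onsets[i]) / ioi)
        return ratios

    onsets = [0.00, 0.50, 1.00, 1.50]      # estimated from spectrograms, in seconds
    offsets = [0.45, 0.70, 1.48, 1.90]
    print(articulation_ratios(onsets, offsets))   # -> roughly [0.9, 0.4, 0.96]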

POSTER
The percussion instrument operation synchronized with the visual and/or auditory stimuli
T Kamitani, K Kouzuki, M Matsuda
Osaka Electro-Communication University, Joint Institute for Advanced Multimedia Studies, Sijyonawate, Japan

In recent years, video game machines that imitate actual musical instruments have been developed. By using this kind of machine, even those who are inexperienced with a musical instrument can enjoy playing a percussion instrument by synchronizing with the video images and the sounds. However, the human recognition mechanism induced by visual and/or auditory stimuli is not well understood. Therefore, the accuracy of human response times induced by visual and auditory stimuli is measured and discussed in this paper. The results of the experiment make clear that the accuracy of percussion instrument tapping improves when specific visual stimuli are given, whereas the auditory stimuli did not influence the accuracy of tapping under that kind of visual stimulus.

ORAL
Play it again with feeling: Feedback-learning of musical expressivity
J Karlsson¹, P N Juslin¹, A Friberg², E Schoonderwaldt²
¹Uppsala University, Department of Psychology, Uppsala, Sweden; ²Royal Institute of Technology, Speech Music Hearing, Stockholm, Sweden

Findings from empirical research suggest that musicians regard expressive skills as being extremely important in music performance. Yet expressive skills are commonly neglected in music education, perhaps because expression involves tacit knowledge that is difficult to convey from teacher to student. To address this problem, we have created a computer program that aims to improve a performer's communication of emotions by providing feedback that allows the performer to compare his or her performance strategy with an "optimal" performance strategy. The program involves recording of performances, automatic analysis of acoustic parameters, and regression modeling of the performer's playing strategy. The aim of this study was to evaluate the new program in collaboration with music students. The study combined methods from experimental psychology with usability evaluation. Preliminary findings suggest that the program is effective in improving performers' communication of emotions, though the usability measures also show that certain aspects of the program can be improved. The proposed program could potentially help to clarify the elusive relations between the performer's intentions, acoustic variables in the performance, and the listener's impression. Thus, computer-assisted teaching might serve as a complement to traditional teaching strategies, allowing musicians to experiment freely with interpretative ideas.

POSTER
Aspects on time modification in score-based performance control
M Laurson¹, M Kuuskankare²
¹Sibelius Academy, Music and Technology, Helsinki, Finland; ²Sibelius Academy, Department of Doctoral Studies in Musical Performance and Research, Helsinki, Finland

There have already been several attempts to fine-tune timing information when translating a musical score into a computer-generated performance. Lately this approach has become even more important. First, recent advances in research on the modelling of acoustical instruments have led to a situation where we must find new, efficient control strategies that allow these instruments to be played. Second, while various real-time controllers are being actively developed, this technology is not yet mature enough to mimic the complex interaction between a human performer and an acoustical instrument. Our approach, which uses a musical score as a starting point for generating performance control information, has several attractive features: a high-level, musician-oriented user interface, flexibility (almost immediate feedback from the system after editing the score) and precision. In this paper we describe in detail how our notation package can be used to modify the timing of an input score containing basic musical information such as pitches and rhythms. Our tools can be classified as follows. First, we have tools that control timing information globally (tempo functions and global performance rules). Second, there are several tools that modify timing locally (local rules, an offset-time editor, standard and non-standard expressions).
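As a simple illustration of what a global tempo function does (a generic sketch, not the notation package described here), score onsets in beats can be mapped to performance times by stepping through a beat-dependent tempo curve:

    # Generic sketch of a global tempo function; the curve below is invented.
    import numpy as np

    def performance_times(score_beats, tempo_bpm_at):
        """Map score positions (beats) to seconds by stepping through a tempo curve."""
        times = [0.0]
        for b0, b1 in zip(score_beats[:-1], score_beats[1:]):
            bpm = tempo_bpm_at(0.5 * (b0 + b1))               # tempo at the midpoint of the step
            times.append(times[-1] + (b1 - b0) * 60.0 / bpm)
        return times

    # Final ritardando: tempo falls from 120 bpm towards 80 bpm over the last four beats.
    ritard = lambda b: 120.0 if b < 4 else 120.0 - 10.0 * (b - 4)
    print(performance_times(np.arange(0, 9), ritard))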

POSTER
Interaction among physical acoustic knowledge and performance in the accordion and its educational side
R Llanos-Vazquez¹, E Macho-Stadler¹, J Alonso-Moral², M J Elejalde-Garcia¹
¹Universidad del Pais Vasco, Fisica Aplicada 1, Bilbao, Spain; ²Conservatorio Superior de Musica Juan Crisostomo de Arriaga, Bilbao, Spain

The vibration of the free reed, as well as the mechanisms of the buttons and bellows, is of fundamental importance for the sound production of the accordion. The objective of this work is to show how to organize a corpus of physical acoustic knowledge useful for the playing and teaching of this instrument. Concretely, we focus on the analysis of hard attacks (of the finger) versus soft attacks (of the bellows), different button lowering velocities in the attacks, spectral analysis of the same tones in different registers, bellows direction changes, differentiation among different "musettes", as well as the connection between sound envelopes and elements of expressivity. The interplay between the physical and the musical side seems to provide clear and precise possibilities for teaching and playing, and it contributes knowledge that makes those activities more effective. Likewise, drawing on these physical acoustic references, this work establishes coherent foundations for different ways of playing the accordion.

ORAL
Time domain note average energy based music onset detection
R Liu¹, N J L Griffith¹, J Walker², P Murphy²
¹University of Limerick, Computer Science and Information Systems, Limerick, Ireland; ²University of Limerick, Electronic and Computer Engineering, Limerick, Ireland

This paper describes a novel time-domain strategy for the detection of note onsets in music as the first step in music transcription. The detection of onsets is based on the changing energy level. The proposed method involves a technique for calculating what is called the note average energy (NAE). NAE is insensitive both to the dynamic range of the overall energy level of the musical piece and to whether the piece in question is monophonic or polyphonic. More importantly, the new strategy circumvents the threshold problem that has generally dogged previous onset detection methods. The performance of the NAE method is illustrated on a range of music pieces played on a variety of different instruments.
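For comparison only, a naive energy-based onset picker relying on a local threshold (the kind of approach whose threshold problem NAE is designed to circumvent; all parameters below are arbitrary, and this is not the NAE method) might look like this:

    # Generic energy-based onset sketch for comparison only; this is NOT the NAE method.
    import numpy as np

    def energy_onsets(x, sr, frame=0.01, rise_factor=3.0, min_gap=0.1):
        hop = int(frame * sr)
        energies = np.array([np.sum(x[i:i + hop] ** 2) for i in range(0, len(x) - hop, hop)])
        onsets, last = [], -1.0
        for i in range(1, len(energies)):
            recent = np.mean(energies[max(0, i - 10):i]) + 1e-12
            t_i = i * hop / sr
            if energies[i] > rise_factor * recent and t_i - last > min_gap:
                onsets.append(t_i)                   # energy jumps above the recent average
                last = t_i
        return onsets

    sr = 8000
    t = np.arange(0, 1.0, 1.0 / sr)
    x = np.where(t > 0.5, np.sin(2 * np.pi * 440 * t), 0.0)   # toy tone starting at 0.5 s
    print(energy_onsets(x, sr))                                # -> [0.5]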

POSTER
Application of Bayesian Networks to automatic recognition of expressive content of piano improvisations
L Mion
University of Padua, Department of Information Engineering, Padua, Italy

Expressive content detection in human communications can be approached from various perspectives. We focus on the expressiveness conveyed by subjects while performing piano improvisations. A model based on Bayesian Networks for the automatic recognition of a performer's expressive intentions will be presented. Performances were inspired by a set of eight sensorial adjectives, and a Bayesian Network for each of the suggested intentions was designed. Improvisations of six performers were recorded and analysed yielding several relations between expressive intentions and acoustic parameters. Factor analysis on perceptual evaluation of performances determined in how many dimensions listeners discriminate the performances. This analysis was used to set the conditional probabilities among the nodes of each network. The purpose is to learn Bayesian Networks according to listeners' judgement, in order to make networks' behaviour as similar as possible to human perception. The whole set of analyses and methodologies will be described, explaining how results are taken into account to learn the networks. Results show that better automatic recognition is gained when intentions can be explained in terms of variations of acoustic parameters within a short time window. When performers used very different strategies for rendering a given expressive intention, the corresponding network presents uncertain decision rules, thus making the recognition harder.

ORAL
ESP - an interactive collaborative game using non-verbal communication
M L Rinman¹, A Friberg², I Kjellmo³, A Camurriº, D Cirotteau, S Dahl², B Mazzarinoº, B Bendikesen³, H McCarthy
¹Royal Institute of Technology, Centre for User Oriented IT-design, Stockholm, Sweden; ²Royal Institute of Technology, Speech, Music and Hearing, Stockholm, Sweden; ³Octaga/ Telenor, Oslo, Norway; ºUniversity of Genoa, InfoMus Lab, DIST, Genoa, Italy

The interactive game environment EPS (expressive performance space), presented in this short paper, is a work in progress. EPS involves participants in an activity using non-verbal emotional expressions. Two teams compete using expressive gestures in either voice or body movements. Each team has an avatar controlled either by singing into a microphone or by moving in front of a video camera. Participants/players control their avatars using acoustical or motion cues. The avatar is navigated/moved around in a 3D distributed virtual environment using the Octagon server and player system. The voice input is processed using a musical cue analysis module yielding performance variables such as tempo, sound level and articulation, as well as an emotional prediction. Similarly, movements captured by the video camera are analyzed in terms of different movement cues. The target group is children aged 13-16 and the purpose is to develop new forms of collaboration.

POSTER
Mapping a physical correlate of loudness into the velocity space of MIDI-controlled piano tones
T Taguti
Konan University, Information and Systems Engineering, Kobe, Japan

This paper presents a method of mapping a physical correlate of loudness into the velocity space for piano tones of MIDI-controlled musical instruments. The proposed method is based on (1) a listening experiment to determine the contour of equal loudness, and (2) a measurement of the values of the physical correlate concerned. To make the method concrete, the equivalent continuous A-weighted sound pressure level (LAeq) was taken as the physical correlate. The experiment was done with a particular sound synthesis module. A paired comparison was conducted on notes in the range C3 ~ C7, with C4 at different velocities serving as the standard stimuli; the durations of the standard and comparison stimuli were 500 ms each, separated by a 500 ms silence. The subject was a female expert piano player. Using the LAeq values in dB of note C4 as the interval scale, an interpolation formula that maps dB to velocity was established. This formula was implemented as part of a computer-aided piano performance system, so that dynamic expression can be specified in units of dB in the above sense.
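The interpolation step can be illustrated with a simple monotone lookup that is inverted to go from a target level in dB to a MIDI velocity; the calibration points below are invented placeholders, not the measured LAeq values from the paper:

    # Illustration of the mapping step only; calibration values are invented placeholders.
    import numpy as np

    velocities = np.array([16, 32, 48, 64, 80, 96, 112, 127])
    laeq_db = np.array([48.0, 54.0, 59.0, 63.0, 67.0, 70.0, 73.0, 75.0])   # assumed levels per velocity

    def velocity_for_level(target_db):
        v = np.interp(target_db, laeq_db, velocities)    # piecewise-linear inverse mapping
        return int(round(float(np.clip(v, 1, 127))))

    print(velocity_for_level(65.0))    # -> 72, between the velocities measured at 63 and 67 dB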

ORAL
On the relation between performance cues and emotional engagement
R Timmers, A Camurri, G Volpe
University of Genova, DIST, Genova, Italy

A variety of factors may influence listeners' emotional engagement with a musical performance. It may depend on the sight of the pianist, the performers' interpretation of the music, or the musical experience of the listener. In an exploratory study, we investigated which aspects of a performance are most strongly related to listeners' emotional engagement, considering cues present in the sound as well as in the performer's movement. Twelve participants were presented with audio recordings and twelve participants with audio and video recordings of three performances of an Etude by Skriabin. Their task was to indicate their emotional engagement with the music by moving a slider up and down. The results can to a large extent be described as listeners' synchronization with performance aspects: higher energy levels in the movement and sound of the performance corresponded to higher levels of emotional engagement, while lower levels corresponded to lower engagement. A great deal of overlap existed between the movement and audio cues, suggesting an unambiguous interpretation of the music by the performer. This interpretation was best revealed by the shape of energy within a phrase: depending on this shape, the phrase was interpreted as tensing, relaxing or balanced.

POSTER
Analysis of jazz drummers' movements in performance of swing grooves - a preliminary report
C H Waadeland
Norwegian University of Science and Technology, Department of Music, Trondheim, Norway

'Swing groove' is often used to denote a typical rhythmic cymbal pattern played by jazz drummers. Ways of performing the swing groove may be highly individual to different drummers. (In fact, a drummer may sometimes be identified by the way the swing groove is performed.) Moreover, various performances of the swing groove may to varying degrees be musically appropriate (more or less "correct" in relation to various styles or traditions of performance), and may also to varying degrees be "swinging" (i.e., make you want to "swing along with the music"). In this investigation we pose the following question: what can empirical studies of a jazz drummer's movements tell us about the way the drummer performs swing grooves? In our search for answers to this question we have constructed a series of experiments in which kinematic as well as dynamic aspects of jazz drummers' performances of swing grooves are measured and analysed. These experiments are, at present, part of an ongoing investigation, and careful analysis remains to be done before valid results can be given. However, to give an idea of our approach, we outline the basic constituents of the experiments and discuss some methods of analysis. Moreover, we present some speculative thoughts and suggest some possible consequences of our investigation.

POSTER
A singing transcription system using melody tracking algorithm based on Adaptive Round Semitone (ARS) plus music grammar constraints
C K Wang¹, R Y Lyu¹, Y C Chiang²
¹Chang Gung University, Dept. of Electrical Engineering, Kwei-Shan Tao-Yuan, Taiwan; ²National Tsing Hua University, Institute of Statistics, Hsinchu, Taiwan

In this paper, a new approach is proposed for melody tracking in an automatic singing transcription system. The melody tracker is based on the Adaptive Round Semitone (ARS) algorithm, which converts the pitch contour of a singing voice into a sequence of music notes by dynamically quantizing pitch in frequency (Hz) to a set of MIDI numbers, which correspond uniquely to music note names. The pitch of the singing voice is usually much more unstable than that of musical instruments, and a poorly skilled singer may produce even worse pitch accuracy: not only may the pitch level drift upward or downward depending on the singer's mood, but the tuning scale may also be hard to keep constant during singing. ARS deals with these issues by using an adaptive autoregressive model, which dynamically adjusts the singer's tuning reference for the current note based on information from the previous notes. The ARS algorithm has been tested on 117 songs sung by 13 people, roughly classified into normal and poor singers. The error rate achieved is 22.8% for normal singers and 24.1% for poor singers. Compared with other approaches, ARS achieves the lowest error rates for poor singers and appears much less sensitive to the diversity of singers' skills. Furthermore, by adding to the transcription process heuristic music grammar constraints based on music theory, the error rates can be reduced to 19.8% and 20.5% respectively, which beats the other approaches reported in the literature.
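A hedged sketch of the basic round-to-semitone idea with a drifting tuning reference is given below; the actual ARS algorithm uses an adaptive autoregressive model over the previous notes and is more elaborate than this:

    # Sketch only: round pitch (Hz) to MIDI numbers while letting the tuning reference
    # follow the singer's drift. This is a simplification of the ARS idea, not ARS itself.
    import math

    def track_notes(pitches_hz, alpha=0.5):
        offset = 0.0                          # estimated tuning drift of the singer, in semitones
        notes = []
        for f in pitches_hz:
            midi_float = 69 + 12 * math.log2(f / 440.0)
            note = round(midi_float - offset)                 # quantize relative to the drifting reference
            offset += alpha * (midi_float - offset - note)    # let the reference follow the residual error
            notes.append(note)
        return notes

    # A singer drifting progressively flat is still transcribed as C4, D4, E4 (60, 62, 64);
    # without the adaptive offset the last pitch (319.3 Hz, about 55 cents flat) would round to 63.
    print(track_notes([257.1, 287.0, 319.3]))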

ORAL
Learning to recognize famous pianists with machine learning techniques
P G Zanon¹, G Widmer²
¹University of Padua, DEI, Padova, Italy; ²University of Vienna, OeFAI, Vienna, Austria

In this paper we address the question of whether it is possible for a machine to learn to identify famous performers (pianists) based on their style of playing. The task has been accomplished on the basis of features extracted from the original audio CD recordings. Different Mozart piano sonatas performed by several famous pianists have been used to train a number of machine learning algorithms and to test the reliability of the learning process. Systematic experiments were carried out with the aim of covering a range of classifiers and their possible configurations.
The results show that the machine can correctly identify the performer of a recording with a probability significantly higher than chance. Some pianists showed a particularly high degree of recognizability. Moreover, a systematic study of recognition in all possible pairs of performers has been carried out, revealing that some pairs are clearly more distinguishable than others. These findings are consistent with another empirical performance-style study we are currently conducting with novel visualisation techniques. As a next step, we will try to establish exactly which features, i.e., which aspects of the performances, are particularly predictive and indicative of particular artists.
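The overall shape of such an experiment can be sketched as follows (the feature vectors and class structure below are random placeholders, not the study's data or feature set):

    # Schematic sketch only: a standard classifier trained on per-performance feature vectors
    # labelled by pianist, evaluated by cross-validation against the chance level.
    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    n_per_pianist, n_features = 30, 8
    X = np.vstack([rng.normal(loc=i, scale=1.0, size=(n_per_pianist, n_features))
                   for i in range(4)])                    # four synthetic "pianists"
    y = np.repeat(np.arange(4), n_per_pianist)

    scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
    print("mean accuracy:", scores.mean(), "vs. chance:", 1 / 4)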
