
Session 11 - Music Perception

INVITED
Automatic transcription of music
A P Klapuri
Tampere University of Technology, Institute of Signal Processing, Tampere, Finland

The aim of this paper is to describe methods for the automatic transcription of polyphonic music. Music transcription is here understood as a transformation from an acoustic signal into a MIDI-like representation, a "recipe" which allows musically meaningful processing and analysis. Algorithms are discussed that concern three different subproblems. (1) Estimation of the temporal structure of acoustic musical signals, the musical meter. Signal processing methods and a probabilistic model are described which track the metrical pulse at three different time scales: tactus (beat), musical measure, and tatum rate. (2) Estimation of the fundamental frequencies of concurrent musical sounds. A method is described which operates iteratively by detecting and cancelling harmonic sounds which occur in the presence of other harmonic and noisy sounds. The method is compared with alternative approaches presented in the literature. (3) Higher-level musicological modeling to resolve otherwise ambiguous analysis segments. The use of musical key estimation and N-gram modeling is described. Validation experiments are performed by applying the presented methods to the transcription of both synthesized musical signals and real-world acoustic data from various musical genres.
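
The iterative detect-and-cancel idea in subproblem (2) can be illustrated with a short sketch. The Python fragment below is only a toy harmonic-summation version under assumed parameter values (FFT analysis, 1/h harmonic weighting, fixed cancellation factor), not the author's published algorithm.

    # Toy sketch of iterative multiple-F0 estimation: find the most salient
    # harmonic sound by weighted harmonic summation, cancel its partials in
    # the magnitude spectrum, and repeat. All parameters are assumptions
    # chosen for readability, not the method described in the paper.
    import numpy as np

    def f0_salience(mag, freqs, f0, n_harmonics=10):
        """Weighted sum of magnitude at approximate harmonic positions of f0."""
        s = 0.0
        for h in range(1, n_harmonics + 1):
            idx = int(np.argmin(np.abs(freqs - h * f0)))
            s += mag[idx] / h          # weight lower harmonics more heavily
        return s

    def iterative_f0_estimation(x, sr, n_sounds=3, f0_grid=None):
        """Detect the most salient harmonic sound, cancel it, repeat."""
        if f0_grid is None:
            f0_grid = np.arange(60.0, 1000.0, 1.0)       # candidate F0s in Hz
        window = np.hanning(len(x))
        mag = np.abs(np.fft.rfft(x * window))
        freqs = np.fft.rfftfreq(len(x), 1.0 / sr)
        found = []
        for _ in range(n_sounds):
            saliences = [f0_salience(mag, freqs, f0) for f0 in f0_grid]
            best = float(f0_grid[int(np.argmax(saliences))])
            found.append(best)
            # Cancel the detected sound: attenuate bins near its harmonics.
            for h in range(1, 11):
                idx = int(np.argmin(np.abs(freqs - h * best)))
                mag[max(0, idx - 2):idx + 3] *= 0.1
        return found

    # Example: a synthetic two-note chord (roughly A3 + E4); the printed
    # estimates should lie close to 220 Hz and 330 Hz.
    sr = 16000
    t = np.arange(0, 0.1, 1.0 / sr)
    x = sum(np.sin(2 * np.pi * f * h * t) / h
            for f in (220.0, 330.0) for h in range(1, 6))
    print(iterative_f0_estimation(x, sr, n_sounds=2))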

INVITED
Sex, drugs and rock 'n' roll: The evolutionary neurobiology of hearing and hedonism
N P M Todd
Manchester University, Psychology, Manchester, United Kingdom

An issue which has been practically ignored by conventional acoustics is the loudness of many kinds of popular music, including rock and dance music. Many of these kinds of music need to be played above a certain threshold, somewhere around 90 dB HL, before they are considered acceptable. In this paper I outline a theory of hearing in which a primitive acoustic sense, inherited from our lower vertebrate ancestors, has been conserved in humans. This primitive sense is mediated by the sacculus, has a threshold of about 90 dB HL to air-conducted sound but only about 30 dB HL to bone-conducted sound, and is maximally sensitive to low frequencies (less than 500 Hz). According to the theory, a primitive acoustic central pathway to the mesolimbic dopamine system, which plays a role in reproductive vocal behaviour in lower vertebrates, has also been conserved in humans. The theory thus accounts for why loud, low-frequency sound and vibration are rewarding. In support of the theory I discuss some recent experiments using EEG evoked potentials to investigate acoustic sensitivity to bone-conducted low-frequency sound.

ORAL
Brain activity in perception and retention in memory of the pitch of music and speech
E Castro-Sierra¹, H A Poblano²
¹National Autonomous University of Mexico/Hospital Infantil de Mexico Federico Gomez, National School of Music, Mexico, D.F., Mexico; ²Autonomous Metropolitan University, MS Program in Neurological Rehabilitation, Mexico, D.F., Mexico

Brain activity stimulated by music and speech has been studied using fMRI and PET. These techniques commonly employ single stimuli which contrast with others in a sequence. Neural activity is elicited when one of the sounds differs from the remaining ones. These results have led to a distinction between activities of right-hemisphere areas stimulated by music sounds and activities of left-hemisphere areas stimulated by speech sounds. EEG was used to study activities, at different frequencies, of superior and inferior frontal, temporal and parieto-occipital areas of 10 subjects of either sex (age range: 11:11-16:8), native speakers of Zapotec (a Mexican tone language), while responding to a tonal memory test contrasting pairs of musical sequences, a tonal memory test contrasting pairs of speech sequences and a tonal perception test analyzing single speech sequences. α- and β-wave activities in right and left inferior frontal areas correlated with responses to either the music or the speech memory tests. Right parieto-occipital θ-wave activity also correlated with responses to the music test. α- and β-wave activities in left superior frontal, temporal and parieto-occipital areas correlated with responses to the speech perception test. These data point to differential activity of both hemispheres of the brain stimulated by the pitch of tone language samples and to a more localized right hemisphere activity stimulated by the pitch of music samples.
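
One analysis step implied by the abstract is correlating band-limited EEG power with behavioural test scores. The sketch below shows a generic way such a correlation could be computed (Welch power spectral density per band, then a Pearson correlation across subjects); the band limits, sampling rate and random placeholder data are assumptions, not the authors' pipeline.

    # Hypothetical sketch: estimate alpha- and beta-band EEG power for one
    # channel per subject and correlate it with tonal-memory scores.
    import numpy as np
    from scipy.signal import welch

    def band_power(eeg, fs, f_lo, f_hi):
        """Mean power spectral density of one channel within a band."""
        f, pxx = welch(eeg, fs=fs, nperseg=fs * 2)
        mask = (f >= f_lo) & (f <= f_hi)
        return pxx[mask].mean()

    rng = np.random.default_rng(0)
    fs = 256                                    # assumed sampling rate (Hz)
    n_subjects = 10
    scores = rng.uniform(0, 30, n_subjects)     # placeholder test scores

    alpha = [band_power(rng.standard_normal(fs * 60), fs, 8, 12)
             for _ in range(n_subjects)]
    beta = [band_power(rng.standard_normal(fs * 60), fs, 13, 30)
            for _ in range(n_subjects)]

    print("alpha vs score r =", np.corrcoef(alpha, scores)[0, 1])
    print("beta  vs score r =", np.corrcoef(beta, scores)[0, 1])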

ORAL
Looking at perception of continuous tempo drift - a new method for estimating Internal drift and Just Noticeable Difference
S Dahl, S Granqvist
Royal Institute of Technology (KTH), Dept. of Speech, Music and Hearing, Stockholm, Sweden

The method proposed here investigates whether there is such a thing as an internal representation of a "steady tempo" - and whether this representation itself is free from tempo drift. The method uses a modification of the method for Parameter Estimation by Sequential Testing (PEST). Several click sequences are presented to the listener in each test, and depending on the listener's response (correct or incorrect) the magnitude of the tempo drift is modified for the next presentation. While investigating tempo detection, this method does not rule out the possibility that the internal "clock" has an inherent tempo drift. In practice this means that some listeners will perceive that the tempo is increasing when, in fact, it is decreasing, and vice versa. Preliminary results confirm that some listeners tend to bias their answers towards either increasing or decreasing tempo. Our results also indicate that these listeners appear to be consistent in doing this. Thus we would like to propose a model of the detectability of continuous (linear) tempo drift based on a person's Internal drift (ID), which can be isochronous or biased in either direction. Surrounding this ID is an interval corresponding to twice the Just Noticeable Difference (JND).
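
To make the sequential-testing idea concrete, the sketch below runs a very simplified adaptive staircase: the drift magnitude is reduced after a correct answer and increased after an incorrect one, with a simulated listener whose responses are biased by an internal drift. The 1-up/1-down rule, step sizes and simulated listener are assumptions for illustration only; they are not the authors' modified PEST procedure.

    # Minimal sketch of an adaptive staircase for tempo-drift detection.
    import random

    def simulated_listener(drift, internal_drift=0.2, jnd=0.5):
        """Report 'increasing' or 'decreasing'; biased by an internal drift."""
        perceived = drift - internal_drift
        if abs(perceived) < jnd:                 # below threshold: guess
            return random.choice(["increasing", "decreasing"])
        return "increasing" if perceived > 0 else "decreasing"

    def run_staircase(n_trials=40, start_drift=4.0, step=0.5):
        drift = start_drift                      # % tempo change per interval
        history = []
        for _ in range(n_trials):
            sign = random.choice([-1, 1])        # present accel. or decel.
            answer = simulated_listener(sign * drift)
            correct = (answer == "increasing") == (sign > 0)
            # Make the task harder after a correct answer, easier otherwise.
            drift = max(0.1, drift - step if correct else drift + step)
            history.append(drift)
        return history

    random.seed(1)
    track = run_staircase()
    print("final drift magnitudes:", [round(d, 2) for d in track[-5:]])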

POSTER
What can the body movements reveal about a musician's emotional intention?
S Dahl, A Friberg
Royal Institute of Technology (KTH), Dept. of Speech, Music and Hearing, Stockholm, Sweden

Music has an intimate relationship with motion in several respects. Obviously, movements are required to play an instrument, but musicians also move their bodies in ways not directly related to note production. In order to explore to what extent emotional intentions can be conveyed through musicians' movements alone, video recordings were made of a marimba player performing the same piece with the intentions Happy, Sad, Angry and Fearful. Twenty observers watched the video clips, without sound, and rated both the perceived emotional content and movement cues. The videos were presented in four viewing conditions, showing different parts of the player. The observers' ratings for the intended emotions showed that the intentions Happiness, Sadness and Anger were well communicated, while Fear was not. The identification of the intended emotion was only slightly influenced by the viewing condition, although in some cases the head was important. The movement ratings indicate that there are cues that observers use to distinguish between intentions, similar to the cues found for audio signals in music performance. Anger was characterized by large, fast, uneven, and jerky movements; Happiness by large and somewhat fast movements; and Sadness by small, slow, even and smooth movements.

POSTER
Time estimation: Isochronous versus Accelerated audio sequences
D Dall'Osto, S E Ferrer
Universitat Autonoma Barcelona, Psychology of Education, Barcelona, Spain

The duality 'Regularity-Irregularity' frames the present initial study, as a general relationship regarding time and rhythm in music. Regularity in time generally refers to isochronous time intervals, while 'Irregularity' refers to non-periodic and non-isochronous time intervals. Performers who modify the musical pace with expressive intentions introduce amounts of 'irregularity' within the metronomic regularity. Composers also vary duration and proportion by introducing gradual temporal changes, such as accelerations or decelerations, so as to create a sort of 'goal-directed' process. But how do listeners perceive those temporal deviations? A starting point could be to establish whether the perception of duration is influenced by the 'content' of musical stimuli. Musical 'content' is not directly semantic; it refers to a complex internal structure that organizes temporal and non-temporal information together, based on audio, essentially non-verbal, stimuli. The present work focuses on the influence of the 'temporal content' of musical stimuli on the perception of duration. The goal of the presented experiment is to begin studying the effect of gradual processes of temporal deviation on the perception of musical time. As a first case within the dual category regular-irregular, only acceleration is investigated experimentally.
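
The contrast between an isochronous sequence and one with a gradual acceleration can be illustrated with onset times. The sketch below generates both kinds of sequences; the tempo values and sequence length are assumptions and not the stimuli used in the study.

    # Illustrative sketch: onset times for an isochronous click sequence
    # versus one whose tempo accelerates linearly over the sequence.
    import numpy as np

    def onsets_isochronous(n_clicks, tempo_bpm):
        ioi = 60.0 / tempo_bpm                   # inter-onset interval (s)
        return np.arange(n_clicks) * ioi

    def onsets_accelerating(n_clicks, start_bpm, end_bpm):
        # Tempo rises linearly from start_bpm to end_bpm, so successive
        # inter-onset intervals shrink gradually.
        tempi = np.linspace(start_bpm, end_bpm, n_clicks - 1)
        iois = 60.0 / tempi
        return np.concatenate(([0.0], np.cumsum(iois)))

    iso = onsets_isochronous(9, 120)
    acc = onsets_accelerating(9, 120, 140)
    print("isochronous duration :", round(iso[-1], 3), "s")
    print("accelerating duration:", round(acc[-1], 3), "s")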

POSTER
Timbral semantics and the pipe organ
A C Disley, D M Howard
The University of York, Department of Electronics, York, United Kingdom

Words used to describe timbre are usually difficult to define or relate to a measurable phenomenon. This paper attempts to establish if any such words are objective with a common understanding, or if they are all subjective. The pipe organ is used because it is both a complex timbral synthesiser and a non-electronic instrument from which repeatable samples can be taken.
A number of adjectives were gathered from English-speaking subjects. The most common words were selected and used without specific definition as rating scales in comparative listening tests. Comparison with spectral analyses revealed possible cues for some words, and audio examples were synthesised to test these theories. Some adjectives have emerged as having degrees of common understanding across a majority of subjects.

ORAL
Hardness recognition in synthetic sounds
B L Giordano, K Petrini
Università di Padova, Dipartimento di Psicologia Generale, Padova, Italy

Sound source recognition research investigates the recovery of different features of the objects whose interaction leads to the generation of the acoustical signal. Among these features, material type has received particular attention, while the recovery of material properties, such as hardness, has scarcely been considered. Hardness plays a significant role in the musical field too, especially for percussion instruments, where resonating objects of variable hardness are struck with mallets of variable hardness. Comparison of previous results on hardness recognition points toward the perceptual independence of the resonator and exciter properties. This issue was addressed in four experiments conducted on stimuli synthesized with a physical model, which allowed independent manipulation of the exciter and resonator properties. Free identification and forced-choice tasks were used to investigate the ability of listeners to discriminate variations in the exciter from variations in the resonator. Scaling tasks were used to investigate the relationship between the synthesis parameters and the hardness estimates of the exciter and of the resonator. Free identification and forced-choice data reveal a bias toward the interpretation of the acoustical signals in terms of features of the resonating object. Hardness scaling results reveal the perceptual dependence of exciter and resonator properties, although strong individual differences are found.
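
The idea of independently controllable exciter and resonator parameters can be sketched with a toy source-filter model: mallet "hardness" sets the width of the impact pulse, while the resonator is a set of decaying modes. This is only an illustration of that separation, not the physical model used in the study; the mode frequencies, decay times and hardness mapping are assumptions.

    # Toy struck-sound sketch with independent exciter/resonator control.
    import numpy as np

    def struck_sound(exciter_hardness, resonator_decay, sr=44100, dur=1.0):
        # Harder mallet -> shorter contact -> broader excitation spectrum.
        contact = max(1, int(sr * 0.004 / exciter_hardness))   # samples
        pulse = 0.5 * (1 - np.cos(2 * np.pi * np.arange(contact) / contact))
        # Resonator: three inharmonic modes with exponential decay.
        t = np.arange(int(sr * dur)) / sr
        modes = sum(np.exp(-t / (resonator_decay / k)) *
                    np.sin(2 * np.pi * f * t)
                    for k, f in enumerate((400.0, 1120.0, 2240.0), start=1))
        return np.convolve(pulse, modes)[:len(t)]

    soft_mallet = struck_sound(exciter_hardness=0.5, resonator_decay=0.4)
    hard_mallet = struck_sound(exciter_hardness=4.0, resonator_decay=0.4)
    print(len(soft_mallet), len(hard_mallet))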

POSTER
Categorical perception of microtonal intervals
E Huovinen
University of Turku, Department of Musicology, Turku, Finland

This study addresses the effect of simple tonal contexts on the categorical perception of melodic intervals. The starting point is the simple hypothesis that similar intervals may be perceived differently when placed in different positions in relation to a perceptual tonal center. The study is carried out using pitch materials derived from the 19-tone equal temperament, which has been proposed as one of the most plausible alternative tuning systems. A subsidiary aim is thus to vitalize the discussion concerning microtonal tunings with input from experimental psychology. An interval recognition experiment was conducted using five consecutive interval sizes from the 19-tone equal temperament. In each trial, the subjects compared two melodic intervals that were both preceded by the same "bass tone", which was supposed to create a simple tonal context for the comparison. The five basic interval sizes were used in all 19 intervallic relationships to the pitch-class of the context tone, and the last melodic interval of each pair was subjected to systematic alterations. In the discussion of the results, special interest will be directed towards differences in the perceptual interpretation of non-diatonic intervals as a function of their location within a simple tonal context.
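
The pitch material itself follows directly from the definition of 19-tone equal temperament: one step is 1200/19 ≈ 63.16 cents, so a pitch n steps above a reference has frequency f = f_ref · 2^(n/19). The reference pitch and the particular steps printed below are assumptions for illustration.

    # 19-tone equal temperament step sizes and frequencies.
    A4 = 440.0

    def tet19_freq(steps, reference=A4):
        """Frequency of a pitch `steps` 19-TET steps above the reference."""
        return reference * 2 ** (steps / 19)

    step_cents = 1200 / 19
    print(f"one 19-TET step = {step_cents:.2f} cents")
    for n in range(6):                           # a few consecutive interval sizes
        print(f"{n:2d} steps above A4: {tet19_freq(n):7.2f} Hz "
              f"({n * step_cents:6.1f} cents)")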

POSTER
Towards a virtual singer: contribution to the development of the teaching of sung diction through the intermediary of a functionalised system
J Kiss
University Paris 8, Ati Inrev, St Denis, France

Learning to sing, whether in the construction of the physical vocal pattern or in the interpretive preparation of a piece of music, proceeds initially through observation and then imitation of a given physical model. One of the principal difficulties in this mimetic process frequently lies in the different perceptions of the singer and the listener. Without attempting to compensate directly for this problem, we wish to propose an interactive system that reacts to exterior data, thus facilitating the creation of a model of the singer and, at the same time, real-time simulation of his acts. We are offering, within the framework of this undertaking, to create a basic virtual animation. It is important at this stage to define the nature of the work envisaged, which will not consist of an exhaustive project. We intend more specifically to describe the means for encoding the vocal expression and its relation to the facial expression in order to translate a feeling. We will then proceed to the development of a computer-generated synthetic singer, adopting as our method the use of a process of interactive phonation represented by a synthetic actor with simplified expressions. Initially this phonation will be limited to the expression of a small number of phonemes; this part of the process could become the basis of further concrete developments and constitute the foundation for various potential extensions.

POSTER
Musical sound parameters revisited
B Kostek, P Zwan, M Dziubinski
Gdansk University of Technology, Sound & Vision Engineering, Gdansk, Poland

Recently, a new standard, MPEG-7, was established. A set of parameters was defined in order to represent musical sound as a multimedia object. These so-called low-level features are related to the time and frequency domains of musical sounds, as well as to audio waveform parameters. Among others, the following parameters were specified within the standard: log attack time, temporal centroid, spectral flatness and spectrum spread, spectral centroid, harmonic variation, etc. Some of these parameters are related to human cognition of musical sounds, whereas for others the relationship between the parameters and their perceptual meaning is not, or not yet, defined. Therefore the principal aim of this paper is to discuss which of the parameters defined within the MPEG-7 standard could be related to musical timbre. This is done by means of listening tests. Additionally, the set of parameters is used in experiments on automatic recognition of musical instrument sounds and separation of musical duets. Results of the experiments are described and conclusions are drawn.
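
Two of the descriptors named above can be sketched directly from a waveform, as below. The exact MPEG-7 definitions include further details (envelope thresholds, normalisation), so the 10%/90% attack thresholds and the synthetic test tone here are simplifying assumptions, not the normative computation.

    # Sketch of two MPEG-7-style low-level descriptors.
    import numpy as np

    def log_attack_time(x, sr, lo=0.1, hi=0.9):
        env = np.abs(x)
        peak = env.max()
        t_lo = np.argmax(env >= lo * peak) / sr
        t_hi = np.argmax(env >= hi * peak) / sr
        return np.log10(max(t_hi - t_lo, 1.0 / sr))

    def spectral_centroid(x, sr):
        mag = np.abs(np.fft.rfft(x * np.hanning(len(x))))
        freqs = np.fft.rfftfreq(len(x), 1.0 / sr)
        return np.sum(freqs * mag) / np.sum(mag)

    # Test tone: 440 Hz with a 50 ms linear attack ramp.
    sr = 22050
    t = np.arange(0, 0.5, 1.0 / sr)
    env = np.minimum(t / 0.05, 1.0)
    x = env * np.sin(2 * np.pi * 440 * t)

    print("log attack time  :", round(log_attack_time(x, sr), 3))
    print("spectral centroid:", round(spectral_centroid(x, sr), 1), "Hz")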

POSTER
Pitch measurements versus perception of South Indian classical music
A Krishnaswamy
Stanford University, CCRMA / EE, Stanford, United States

The number and type of musical intervals used in Indian music have been the subject of vigorous debate and controversy for many years, primarily due to the lack of convincing experimental data to substantiate or discredit existing theories. Recently, however, we presented detailed pitch tracks of samples of South Indian music and offered insight into why perhaps many people believe that South Indian music uses more than 12 distinct notes per octave. We argued that certain pitch inflexions, which are not always mere ornamentations, could be viewed as different "versions" of the same "basic" notes they modify. We now address a few more related misconceptions about South Indian music, some of which can be traced to how certain notes and phrases are perceived by human listeners as opposed to what is actually being played, and we present appropriate examples of annotated pitch tracks to illustrate our arguments. For example, we consider the effects of using the same note names or syllables to vocalize completely different pitches and inflexions. We also examine the variability in intonation we observed, even in recordings of accomplished musicians, and discuss how it relates to the overall perception of certain intervals and inflexions.
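
A pitch track of the kind discussed can be obtained, in the simplest case, by frame-wise autocorrelation F0 estimation, as sketched below. This is a generic textbook method, not the authors' analysis, and the frame size, hop and F0 range are assumptions; the chirp signal only stands in for a pitch inflexion.

    # Rough autocorrelation pitch-track sketch.
    import numpy as np

    def pitch_track(x, sr, frame=2048, hop=512, f_lo=75.0, f_hi=800.0):
        lag_min = int(sr / f_hi)
        lag_max = int(sr / f_lo)
        f0s = []
        for start in range(0, len(x) - frame, hop):
            seg = x[start:start + frame] * np.hanning(frame)
            ac = np.correlate(seg, seg, mode="full")[frame - 1:]
            lag = lag_min + int(np.argmax(ac[lag_min:lag_max]))
            f0s.append(sr / lag)
        return np.array(f0s)

    # Example: a glide from 200 Hz to 300 Hz should yield a rising track.
    sr = 16000
    t = np.arange(0, 1.0, 1.0 / sr)
    freq = 200 + 100 * t
    x = np.sin(2 * np.pi * np.cumsum(freq) / sr)
    print(pitch_track(x, sr)[::8].round(1))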

ORAL
Structural analysis of listener's emotional responses
M Leman¹, V Vermeulen¹, L De Voogdt¹, A Camurri², B Mazzarino³, G Volpe³
¹University of Ghent, IPEM, Ghent, Belgium; ²University of Genova, DIST (Laboratorio di Informatica Musicale), Genova, Italy; ³University of Genova, DIST (Laboratorio di Informatica Musicale), Genova, Italy

The paper describes an experiment which aims at investigating quantitative relationships between auditory stimuli and emotive/expressive responses recorded from listeners. The experiment starts from a previous one carried out at IPEM and is the result of joint work by the IPEM and DIST staffs in the framework of the EU-IST project MEGA (www.megaproject.org). Tests were carried out using a pool of 60 audio fragments with an average duration of 30 seconds. Two different approaches were considered to analyze listeners' responses: on the one hand, a cognitive evaluation was obtained by asking subjects to fill in a form on which to rate the musical excerpts using a 15-dimensional semantic space; on the other hand, sub-cognitive aspects were addressed by recording listeners' movement by means of a video camera during the listening sessions. Each subject participated in two listening sessions: the first for the cognitive evaluation, the second for motion recording. Data analysis using statistical techniques resulted in (i) reduction of the 15-dimensional semantic space to a 3-dimensional space with the base categories "Arousal", "Dominance" and "Valence" that emerged from the first experiment, and (ii) significant correlations between these three dimensions and a collection of audio and motion cues extracted respectively from the musical excerpts and from the video recordings. Finally, applications in audio mining and man/machine interaction are discussed in brief.
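
The dimension-reduction step described above can be sketched generically: reduce an excerpts-by-adjectives rating matrix to three components and correlate them with an extracted cue. The abstract does not state which technique was used, so plain PCA via SVD and the random placeholder data below are assumptions.

    # Hedged sketch: PCA of a 60-excerpt x 15-adjective rating matrix,
    # followed by correlation of each component with an extracted cue.
    import numpy as np

    rng = np.random.default_rng(0)
    n_excerpts, n_adjectives = 60, 15
    ratings = rng.normal(size=(n_excerpts, n_adjectives))  # mean ratings per excerpt
    tempo_cue = rng.normal(size=n_excerpts)                 # e.g. an extracted audio cue

    # PCA by singular value decomposition of the centred rating matrix.
    centred = ratings - ratings.mean(axis=0)
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    components = centred @ vt[:3].T                         # 3-D space per excerpt

    explained = (s[:3] ** 2) / (s ** 2).sum()
    print("variance explained by 3 components:", explained.round(3))
    for i, name in enumerate(["dim 1", "dim 2", "dim 3"]):
        r = np.corrcoef(components[:, i], tempo_cue)[0, 1]
        print(f"correlation of {name} with the cue: {r:+.2f}")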

POSTER
User-dependent taxonomy of musical features as a conceptual framework for musical audio-mining technology
M Lesaffre¹, M Leman¹, B De Baets², H De Meyer³, J P Martens⁴
¹Ghent University, IPEM, dept. of Musicology, Ghent, Belgium; ²Ghent University, KERMIT, dept. of Applied Mathematics, Biometrics and Process Control, Ghent, Belgium; ³Ghent University, TWI, dept. of Applied Mathematics and Computer Science, Ghent, Belgium; ⁴Ghent University, ELIS, dept. of Electronics and Information Systems, Ghent, Belgium

In musical audio-mining technology, research is centered on system development. Systems allow users to search for and retrieve music by means of content-based text and audio queries. Though these systems are promising, there is a need for a better understanding of the role of user preferences and user profiles. The development of a conceptual taxonomy for musical descriptions is a first step in bridging the gap between system and user. In the first part of this paper, we clarify elements of signification related to spontaneous user behavior and derived user groups, starting from a large-scale experiment with 72 users and 1148 vocal queries. Statistical analysis provides insight into the characteristics of vocal querying, such as the methods used (singing lyrics, singing syllables, humming, whistling), query length, query performance (melodic, rhythmic) and effects of gender, age and musical education. In the second part of the paper we describe the kind of user-dependent conceptual structures underlying the taxonomy. The results aim at providing a better interface between the system modules and a more intuitive interaction.

ORAL
Consistency in listeners' ratings as a function of listening time
G Madison, B Merker
Uppsala University, Dept. of Psychology, Uppsala, Sweden

Ratings of adjectives describing possible experiential properties are one of the most common dependent variables in music perception research. The consistency of such ratings is typically high, allowing statistically significant results to be based upon relatively small numbers of listeners (e.g., 10-30). The wide range of sample durations employed in the literature (from a few seconds to several minutes) raises the question of how sample length might relate to the property one is attempting to measure. We ask as a first step how the validity and reliability of adjective ratings might be affected by sample duration in the short range up to 16 s.
Our aims were to show how the consistency among listeners, in terms of F-ratios, varies as a function of music sample duration. We also check whether mean ratings change significantly, indicating that listeners are prone to alter the nature of their judgements as duration varies.
Stimuli were excerpts from 10 commercially available recordings of instrumental jazz and ethnic ensemble music without vocal content. Each excerpt started at the same position in the recording, but proceeded for either 0.5, 1, 2, 4, 6, 8, or 16 s. Listeners rated 14 adjectives in response to each music example in a split-plot design (10 examples × 7 durations).
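
The consistency measure can be sketched as follows: for each duration, a one-way F-ratio across the 10 music examples, with listeners as replications, indicates how well the excerpts are differentiated. The full study uses a split-plot design, so this one-way simplification, the number of listeners and the simulated ratings are assumptions for illustration only.

    # Sketch: F-ratio of ratings across excerpts as a function of duration.
    import numpy as np
    from scipy.stats import f_oneway

    rng = np.random.default_rng(0)
    durations = [0.5, 1, 2, 4, 6, 8, 16]         # seconds
    n_listeners, n_examples = 20, 10
    true_means = rng.uniform(1, 7, n_examples)   # "true" rating of each excerpt

    for dur in durations:
        # Assume rating noise shrinks as listeners hear more of the excerpt.
        noise = 2.0 / np.sqrt(dur)
        groups = [true_means[i] + rng.normal(0, noise, n_listeners)
                  for i in range(n_examples)]
        f_ratio, p = f_oneway(*groups)
        print(f"{dur:5.1f} s  F = {f_ratio:6.1f}  p = {p:.3g}")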

POSTER
Verbal description of musical sound timbre in the Czech language
O Moravec, J Stepanek
Music faculty of AMU, Sound Studio, Prague, Czech Republic

The words used for the description of musical sound timbre were collected. The research was carried out among people with an active relation to music (instrument players, conductors, composers, sound engineers, etc.). Each respondent filled out a questionnaire composed of two parts. In the first part, the personal background of the respondent (respondent profile) was collected; in the second part, the respondent wrote down the particular expressions he or she uses for timbre description, together with synonymic and antonymic relations among them. A common frequency vocabulary and frequency vocabularies for the selected respondent classes were created from the data. The similarity of the vocabularies and their differences depending on respondent class are studied.

POSTER
Development of a language for specifying saxophone timbre
A J Nykänen
Luleå University of Technology, Human Work Sciences, Luleå, Sweden

When writing requirement specifications for musical instruments, or when writing music, specifying the sound of the instrument is of greatest interest. Today's music notation leaves many aspects of sound open to interpretation. This implies that it is necessary to know the ideal typical of a musical style and period in order to perform music in the way intended by the composer. To investigate how musicians use verbal descriptions of musical sounds, interviews were conducted with saxophone players. The results were analysed with respect to how frequently specific words were used. Sounds described by the most popular words were analysed in a search for physical aspects of the sounds that could be coupled to the descriptions. Since a large number of words were used to describe the sounds, and only a few of them were used frequently by different musicians, the conclusion is that there is a lack of a common language for describing sounds. New ways to describe sounds are desirable, since a larger set of descriptions will increase the possibility of choosing a convenient set of descriptions for musical sounds. Therefore, one new way to describe sounds, based on vocal mimicking, is suggested and evaluated.

POSTER
Influence of duration of tone stationary part on perception of starting transient
Z Otcenasek, J Stepanek, V Syrovy
Music faculty of AMU, Sound studio, Praha, Czech Republic

Sounds of different types of organ pipes were used to study the influence of the duration of the stationary part of a tone on the perception of the starting transient. Recordings of long sounding tones were truncated from the end in successive steps, from an original duration of about 3 s down to a length of 150 ms. The attack transients of the tones remained unchanged; the truncated tone ends were identically modified with a level decrease to silence (fade-out). Sets of tones derived from the same original tone were judged for dissimilarity in pairs, focusing on the perception of the initial transient part only. Results are discussed for tones with various types of transient and stationary part.
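
The stimulus manipulation itself is straightforward to sketch: keep the attack, cut the tone at successively shorter lengths, and fade the cut end to silence so that all versions end identically. The sampling rate, fade length and the synthetic stand-in "tone" below are assumptions.

    # Sketch of truncating a tone with a short fade-out at the cut point.
    import numpy as np

    def truncate_with_fade(x, sr, keep_s, fade_s=0.02):
        y = x[:int(sr * keep_s)].copy()
        n_fade = int(sr * fade_s)
        y[-n_fade:] *= np.linspace(1.0, 0.0, n_fade)   # linear fade to silence
        return y

    sr = 44100
    t = np.arange(0, 3.0, 1.0 / sr)                    # ~3 s stand-in tone
    tone = np.sin(2 * np.pi * 262 * t) * np.minimum(t / 0.05, 1.0)

    versions = {keep: truncate_with_fade(tone, sr, keep)
                for keep in (3.0, 1.5, 0.8, 0.4, 0.15)}
    print({k: round(len(v) / sr, 2) for k, v in versions.items()})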

POSTER
Perceived influence of changes in musical instrument directivity representation
F Otondo, B Kirkwood
Technical University of Denmark, Ørsted-DTU, Acoustic Technology, Kgs. Lyngby, Denmark

The directivity representation of musical instruments in room acoustic simulations has been shown to significantly affect the distribution of the calculated acoustical parameters in a room. Different directivity representations of musical instruments were used to create pairs of room acoustic auralizations in order to test for perceived changes in the sound. Listening tests were designed and conducted with an emphasis on the perception of the acoustical attributes of the room simulations. Results show that changes in the directivity representation of the source can influence the perceived sound in auralizations and that these perceived changes are more pronounced in terms of loudness and reverberance.

POSTER
Effects of the grouping cue on the pitch shift of a pure tone induced by other tones
T Shirado¹, M Yanagida²
¹Communications Research Laboratory, Keihanna Research Center, Kyoto, Japan; ²Doshisha University, Faculty of Engineering, Kyotanabe, Japan

The pitch of a pure tone can be shifted when it is partially masked by other tones. This phenomenon has principally been explained by place models, including ones which assume a certain interaction between the excitation patterns of each tone in the peripheral system. The purpose of our study was to examine whether the pitch shift of a pure tone induced by other tones of lower frequencies is affected by the grouping cue of all the tones into a complex tone. We measured the pitch of the target pure tone under two conditions. In the first condition, the target tone was presented simultaneously with other tones of lower frequencies. The second condition was the same as the first except that the frequencies of the lower tones were harmonically related to the frequency of the target tone. The experimental results suggested that the pitch of the target pure tone was affected by the grouping cue and that this cannot be sufficiently explained by place models.

POSTER
Listeners' common and group perceptual dimensions in violin timbre
J Stepanek, Z Otcenasek
Music Faculty of AMU, Sound Studio, Prague, Czech Republic

Results of listening tests with violin tones are studied with respect to listeners and their perceptual models. Five sets of violin tone recordings (pitches B3, F#4, C5, G5, D6) were used. Twenty experienced listeners - violin players (Academy professors and students) - assessed the dissimilarity in timbre of pairs of tones. The results of the five listening tests (individual dissimilarity matrices) were processed separately using a latent class approach (CLASCAL). This approach yields perceptual spaces of common dimensions shared by all listeners and defines listener classes (groups); the groups differ in their dimension scales. Group perceptual spaces were then calculated for all CLASCAL groups. Common and group perceptual dimensions are compared using various methods.
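
The first step behind such analyses can be sketched as classical (Torgerson) multidimensional scaling of a dissimilarity matrix into a low-dimensional perceptual space. CLASCAL additionally estimates latent listener classes with class-specific dimension weights, which is beyond this illustration; the toy dissimilarity matrix below is an assumption.

    # Classical MDS sketch: dissimilarity matrix -> 2-D perceptual space.
    import numpy as np

    def classical_mds(d, n_dims=2):
        """Embed a symmetric dissimilarity matrix d into n_dims dimensions."""
        n = d.shape[0]
        j = np.eye(n) - np.ones((n, n)) / n          # centring matrix
        b = -0.5 * j @ (d ** 2) @ j                  # double-centred matrix
        eigval, eigvec = np.linalg.eigh(b)
        order = np.argsort(eigval)[::-1][:n_dims]
        return eigvec[:, order] * np.sqrt(np.maximum(eigval[order], 0))

    # Toy dissimilarities among five violin tones (symmetric, zero diagonal).
    d = np.array([[0.0, 1.0, 2.0, 3.0, 3.5],
                  [1.0, 0.0, 1.2, 2.4, 3.0],
                  [2.0, 1.2, 0.0, 1.5, 2.2],
                  [3.0, 2.4, 1.5, 0.0, 1.0],
                  [3.5, 3.0, 2.2, 1.0, 0.0]])
    print(classical_mds(d).round(2))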

ORAL
Neurocognition of music vs. speech sounds: EEG and fMRI evidence
M Tervaniemi¹, S Kruck², A Szameitat², E Schröger³, W De Baene⁴, K Alter², A Friederici²
¹University of Helsinki, Cognitive Brain Research Unit, Department of Psychology, Helsinki, Finland; ²Max-Planck Institute for Cognitive Neuroscience, Leipzig, Germany; ³University of Leipzig, Institut für Allgemeine Psychologie, Leipzig, Germany; ⁴University of Ghent, Department of Psychology, Ghent, Belgium

In both music and speech, emotional information is transmitted via changes in sound frequency, duration, and intensity. The present project was conducted to compare, in a within-subject design, the neural detection of changes in sound frequency, duration, and intensity in music vs. speech sounds.
The subjects were presented with music sound patterns and pseudowords. The pseudoword /ba:ba/ was produced by a trained female native German speaker and thereafter digitized. The first syllable was stressed as indicated by longer duration, higher pitch, and higher intensity. The music sounds (digital samples of a saxophone sound) mimicked /ba:ba/ in the pitch range and the sound duration. As deviants among pseudoword and music sounds, three changes of the standard sound were presented: pitch increase, duration decrease, and intensity increase. Subsequent experiments were conducted with EEG and fMRI techniques.
The data indicate that the neural responses underlying the change detection in music vs. speech sounds differ from each other in strength as well as in their topography. Thus these data provide support for the existence of neural specialization of the human brain to represent sounds with different informational content.

POSTER
Pattern recognition of musical instruments using Hidden Markov Models
R Ventura-Miravet, F Murtagh, J Ming
Queen's University Belfast, Computer Science, Belfast, United Kingdom

Today a large number of musical recordings are available on the Internet. Several useful applications emerge if users are able to automatically search these data for particular musical content, e.g. different musical instruments. In order to search for a particular instrument, we extract acoustic features from a sound sample and then match these features against a database of all stored musical instruments using pattern recognition. Whilst relatively little research has been conducted on musical instrument recognition, there already exists a large body of knowledge concerning speaker recognition. This work is concerned with applying the techniques and modelling methodologies used within speaker recognition to musical instrument recognition. In particular, we present a study that aims to identify musical instruments using statistical approaches that reflect the structure of the data. Given samples of each musical instrument, we perform training to generate an N-state Hidden Markov Model (HMM) that represents the spectral characteristics and time variability of musical patterns. The results show that this approach, which is widely used in speaker recognition, offers a powerful technique for feature extraction and instrument recognition from musical data. We obtain an average recognition accuracy of approximately 94% when discriminating between six different musical instruments.
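
The classification scheme described can be sketched as training one Gaussian HMM per instrument on frame-wise spectral features and classifying a new sample by the highest model likelihood. The sketch assumes the third-party packages librosa (for MFCC features) and hmmlearn (for HMM training) are available, and it uses synthetic tones instead of the authors' recordings; the number of states and feature choices are also assumptions.

    # Minimal per-instrument HMM classification sketch.
    import numpy as np
    import librosa
    from hmmlearn.hmm import GaussianHMM

    def features(y, sr=22050):
        """Frame-wise MFCC feature matrix, shape (frames, coefficients)."""
        return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T

    def tone(freq, partials, sr=22050, dur=1.0, noise=0.01):
        t = np.arange(0, dur, 1.0 / sr)
        sig = sum(a * np.sin(2 * np.pi * freq * k * t)
                  for k, a in enumerate(partials, start=1))
        return sig + noise * np.random.default_rng(0).standard_normal(len(t))

    # Two stand-in "instruments" with different spectral envelopes.
    train = {"bright": tone(220, [1.0, 0.9, 0.8, 0.7]),
             "dull":   tone(220, [1.0, 0.2, 0.05, 0.01])}

    models = {}
    for name, y in train.items():
        m = GaussianHMM(n_components=3, covariance_type="diag",
                        n_iter=25, random_state=0)
        m.fit(features(y))                    # one model per instrument
        models[name] = m

    test = tone(330, [1.0, 0.85, 0.75, 0.6])  # bright-ish test tone
    scores = {name: m.score(features(test)) for name, m in models.items()}
    print(max(scores, key=scores.get), scores)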

ORAL
Influence of rhythmic, melodic, and semantic violations in language and music on the electrical activity in the brain
S Ystad¹, C Magne², S Farner¹, G Pallone¹, V Pasdeloup³, R Kronland-Martinet¹, M Besson²
¹CNRS, Laboratoire de Mécanique et d'Acoustique, Marseille, France; ²CNRS, Physiological and Cognitive Neurosciences Institute, Marseille, France; ³University of Rennes II, Department of linguistics, Rennes, France

The work presented here is part of a larger multidisciplinary project associating audio signal processing, linguistics, and cognitive sciences. It aims at comparing and better understanding music and language processing in the brain. From a music and speech synthesis point of view, this is important when striving for naturalness and expressiveness in synthesized music and language. As a first experiment towards this goal, we developed and used a method to extend a given part of an audio signal without altering the timbre. This allows manipulations of the rhythm and was applied to alter syllable lengths in speech. To make a similar experiment with music, the note durations and melody of musical sequences were digitally modified by altering MIDI codes. Participants were presented with linguistic and musical phrases, and the final words or notes were either semantically congruous or incongruous (e.g., "I take coffee with sugar/dog"). Moreover, the last or second-to-last syllables or notes were increased in duration in order to produce rhythmic incongruities. Thus, the two factors rhythm and semantics/melody were manipulated independently. Changes in brain electrical activity were measured with electrodes on the scalp (event-related potentials, ERPs). Preliminary results show that semantic and melodic violations elicited different ERP components, while the rhythmic violations evoked similar ERPs. Final results and the developed techniques will be discussed in the paper.
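
A hedged stand-in for the speech manipulation described (lengthening part of a signal without changing its timbre) is phase-vocoder time stretching applied only to the chosen segment. The sketch below uses librosa's generic time_stretch for this purpose; it is not the authors' own extension algorithm, and the segment boundaries, stretch factor and placeholder signal are assumptions.

    # Sketch: lengthen one segment of a signal by phase-vocoder stretching.
    import numpy as np
    import librosa

    def lengthen_segment(y, sr, start_s, end_s, factor):
        """Return y with y[start:end] time-stretched by `factor` (>1 = longer)."""
        start, end = int(start_s * sr), int(end_s * sr)
        stretched = librosa.effects.time_stretch(y[start:end], rate=1.0 / factor)
        return np.concatenate([y[:start], stretched, y[end:]])

    sr = 22050
    t = np.arange(0, 1.0, 1.0 / sr)
    y = np.sin(2 * np.pi * 220 * t)               # placeholder for a recorded syllable
    y_long = lengthen_segment(y, sr, 0.4, 0.6, factor=1.5)
    print(len(y) / sr, "s ->", round(len(y_long) / sr, 2), "s")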
