Session 11 - Music Perception

INVITED Automatic transcription of music
 A P Klapuri
 Tampere University of Technology, Institute of Signal Processing, Tampere, Finland
 The aim of this paper is to describe methods for the automatic transcription of polyphonic music. Music transcription is here understood as a transformation from an acoustic signal into a MIDI-like representation, a "recipe" which allows musically meaningful processing and analysis. Algorithms are discussed that concern three different subproblems. (1) Estimation of the temporal structure of acoustic musical signals, the musical meter. Signal processing methods and a probabilistic model are described which track the metrical pulse at three different time scales: tactus (beat), musical measure, and tatum rate. (2) Estimation of the fundamental frequencies of concurrent musical sounds. A method is described which operates iteratively by detecting and cancelling harmonic sounds which occur in the presence of other harmonic and noisy sounds. The method is compared with alternative approaches presented in the literature. (3) Higher-level musicological modeling to resolve otherwise ambiguous analysis segments. The use of musical key estimation and N-gram modeling is described. Validation experiments are performed by applying the presented methods to the transcription of both synthesized musical signals and real-world acoustic data from various musical genres.
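As an illustration of the iterative detect-and-cancel idea in step (2), the following sketch picks the fundamental whose harmonics carry the most weighted energy in an FFT magnitude spectrum, cancels those harmonics, and repeats. The 1 Hz candidate grid, the 1/h partial weighting, and the blunt zeroing of harmonic bins are simplifying assumptions for illustration, not the algorithm evaluated in the paper.

```python
import numpy as np

def detect_and_cancel(x, sr, n_sounds=2, f0_range=(60.0, 1000.0)):
    """Crude iterative multi-F0 sketch: pick the F0 whose harmonics carry the
    most weighted energy, cancel those harmonics, and repeat."""
    spec = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    freqs = np.fft.rfftfreq(len(x), 1.0 / sr)
    candidates = np.arange(f0_range[0], f0_range[1], 1.0)   # 1 Hz grid
    f0s = []
    for _ in range(n_sounds):
        saliences = []
        for f0 in candidates:
            bins = np.searchsorted(freqs, np.arange(f0, freqs[-1], f0))
            bins = bins[bins < len(spec)]
            weights = 1.0 / np.arange(1, len(bins) + 1)      # de-emphasise high partials
            saliences.append(np.sum(weights * spec[bins]))
        best = candidates[int(np.argmax(saliences))]
        f0s.append(best)
        for h in np.arange(best, freqs[-1], best):           # cancel the detected sound
            idx = np.searchsorted(freqs, h)
            spec[max(idx - 2, 0):idx + 3] = 0.0
    return f0s

# toy demo on a synthetic two-tone mixture; a crude salience like this one
# can make octave/subharmonic errors that the real method guards against
sr = 16000
t = np.arange(int(0.5 * sr)) / sr
x = sum(np.sin(2 * np.pi * f * h * t) / h
        for f in (220.0, 330.0) for h in range(1, 6))
print(detect_and_cancel(x, sr))
```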
INVITED Sex, drugs and rock 'n' roll: The evolutionary neurobiology of hearing and hedonism
 N P M Todd
 Manchester University, Psychology, Manchester, United Kingdom
 
An issue which has been practically ignored by conventional 
acoustics is the loudness of many kinds of popular music, including rock and dance 
music. Many of these kinds of music need to be above a certain threshold, somewhere 
around 90 dB HL, before being considered acceptable. In this paper I outline a 
theory of hearing in which a primitive acoustic sense, inherited from our lower 
vertebrate ancestors, has been conserved in humans. This primitive sense is mediated 
by the sacculus and has a threshold of about 90 dB HL to air-conducted sound but 
only about 30 dB HL to bone-conducted sound and is maximally sensitive to low 
frequencies (less than 500 Hz). According to the theory a primitive acoustic central 
pathway to the mesolimbic dopamine system, which plays a role in reproductive 
vocal behaviour in lower vertebrates, has also been conserved in humans. The theory 
thus accounts for why loud, low-frequency sound and vibration are rewarding. In 
support of the theory I discuss some recent experiments using EEG evoked potentials 
to investigate acoustic sensitivity to bone-conducted low frequency sound.
 
 ORAL Brain activity in perception and retention in memory 
  of the pitch of music and speech
 E Castro-Sierra¹, H A Poblano²
 ¹National Autonomous University of Mexico/Hospital 
  Infantil de Mexico Federico Gomez, National School of Music, Mexico, D.F., Mexico; 
  ²Autonomous Metropolitan University, MS Program in Neurological Rehabilitation, 
  Mexico, D.F., Mexico
 Brain activity stimulated by music and speech has been 
  studied using fMRI and PET. These techniques commonly employ single stimuli 
  which contrast with others in a sequence. Neural activity is elicited when one 
  of the sounds differs from the remaining ones. These results have led to a distinction 
  between activities of right-hemisphere areas stimulated by music sounds and 
  activities of left-hemisphere areas stimulated by speech sounds. EEG was used 
  to study activities, at different frequencies, of superior and inferior frontal, 
  temporal and parieto-occipital areas of 10 subjects of either sex (age range: 
  11:11-16:8), native speakers of Zapotec (a Mexican tone language), while responding 
  to a tonal memory test contrasting pairs of musical sequences, a tonal memory 
  test contrasting pairs of speech sequences and a tonal perception test analyzing 
  single speech sequences. α- and β-wave activities in right and left 
  inferior frontal areas correlated with responses to either the music or the 
  speech memory tests. Right parieto-occipital θ-wave activity also correlated 
  with responses to the music test. α- and β-wave activities in left 
  superior frontal, temporal and parieto-occipital areas correlated with responses 
  to the speech perception test. These data point to differential activity of 
  both hemispheres of the brain stimulated by the pitch of tone language samples 
  and to a more localized right hemisphere activity stimulated by the pitch of 
  music samples.

ORAL Looking at perception of continuous tempo drift - a new method for estimating Internal drift and Just Noticeable Difference
 S Dahl, S Granqvist
 Royal Institute of Technology (KTH), Dept. of Speech, 
  Music and Hearing, Stockholm, Sweden
 
The method proposed here investigates if there is such 
a thing as an internal representation of a "steady tempo" - and whether 
this representation itself is free from tempo drift. The method uses a modification 
of the method for Parameter Estimation by Sequential Testing (PEST). Several click 
sequences are presented to the listener in each test and depending on the listener's 
response (correct or incorrect) the magnitude of the tempo drift is modified for 
the next presentation. While investigating tempo detection, this method does not 
rule out the possibility that the internal "clock" can have an inherent 
tempo drift. In practice this means that some listeners will perceive that tempo 
is increasing when, in fact, it is decreasing, and vice versa. Preliminary results 
confirm that some listeners tend to bias their answers towards 
either increasing or decreasing tempo. Our results also indicate that these listeners 
appear to be consistent in doing this. Thus we would like to propose a model of 
detectability of continuous (linear) tempo drift based on a person's Internal 
drift (ID), which can be isochronous or biased in either direction. Surrounding 
this ID is an interval corresponding to twice the Just Noticeable Difference (JND).
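The adaptive logic can be sketched as follows. This is a simplified up-down track with step halving at reversals rather than the full PEST rule set, and the response callback, starting drift, and step sizes are hypothetical placeholders.

```python
import random

def adaptive_drift_track(respond, start_drift=4.0, start_step=2.0,
                         min_step=0.05, n_trials=40):
    """Simplified adaptive track (a staircase-style stand-in for PEST).
    `respond(drift)` returns True if the listener correctly reports the
    direction of a linear tempo drift of the given magnitude (%/s).
    Drift shrinks after a correct answer, grows after an incorrect one,
    and the step size is halved at every reversal."""
    drift, step = start_drift, start_step
    last_direction = None
    history = []
    for _ in range(n_trials):
        correct = respond(drift)
        direction = -1 if correct else +1
        if last_direction is not None and direction != last_direction:
            step = max(step / 2.0, min_step)        # reversal: refine the step
        drift = max(drift + direction * step, 0.0)
        last_direction = direction
        history.append((drift, correct))
    return drift, history                           # final drift ~ JND region

# toy simulated listener whose internal drift is biased by +0.5 %/s
jnd_estimate, _ = adaptive_drift_track(
    lambda d: random.random() < min(1.0, 0.5 + abs(d - 0.5) / 4.0))
```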
 
 POSTER What can the body movements reveal about a musician's 
  emotional intention?
 S Dahl, A Friberg
 Royal Institute of Technology (KTH), Dept. of Speech, 
  Music and Hearing, Stockholm, Sweden
 Music has an intimate relationship with motion in several 
  aspects. Obviously, movements are required to play an instrument, but musicians 
  also move their bodies in ways not directly related to note production. In 
  order to explore to what extent emotional intentions can be conveyed through 
  musicians' movements only, a marimba player was video recorded performing the 
  same piece with the intentions Happy, Sad, Angry and Fearful. 
  Twenty observers watched the video clips without sound and rated both the perceived 
  emotional content and movement cues. The videos were presented in four 
  viewing conditions, showing different parts of the player. The observers' ratings 
  for the intended emotions showed that the intentions Happiness, Sadness and 
  Anger were well communicated, while Fear was not. The identification of the 
  intended emotion was only slightly influenced by the viewing condition, although 
  in some cases the head was important. The movement ratings indicate that there 
  are cues that observers use to distinguish between intentions, similar to 
  the cues found for audio signals in music performance. Anger was characterized 
  by large, fast, uneven, and jerky movements; Happiness by large and somewhat fast 
  movements; and Sadness by small, slow, even and smooth movements.

POSTER Time estimation: Isochronous versus Accelerated audio sequences
 D Dall'Osto, S E Ferrer
 Universitat Autonoma Barcelona, Psychology of Education, Barcelona, Spain
 The duality 'regularity-irregularity' frames this initial study of a general relationship between time and rhythm in music.
Regularity in time generally refers to isochronous time intervals, while 'irregularity' refers to non-periodic, non-isochronous time intervals.
Performers who modify the musical pace with expressive intent introduce amounts of 'irregularity' within the metronomic regularity.
Composers also vary duration and proportion by introducing gradual temporal changes, such as accelerations or decelerations, so as to create a kind of 'goal-directed' process.
But how do listeners perceive these temporal deviations?
A starting point is to ask whether the perception of duration is influenced by the 'content' of musical stimuli.
Musical 'content' is not directly semantic; it refers to a complex internal structure that organizes temporal and non-temporal information, based on essentially non-verbal audio stimuli.
The present work focuses on the influence of the 'temporal content' of musical stimuli on the perception of duration.
The goal of the experiment presented here is to begin studying the effect of gradual temporal deviations on the perception of musical time.
As a first case within the dual category regular-irregular, only acceleration is investigated experimentally.

POSTER Timbral semantics and the pipe organ
 A C Disley, D M Howard
 The University of York, Department of Electronics, York, 
  United Kingdom
 Words used to describe timbre are usually difficult 
  to define or relate to a measurable phenomenon. This paper attempts to establish 
  if any such words are objective with a common understanding, or if they are 
  all subjective. The pipe organ is used because it is both a complex timbral 
  synthesiser and a non-electronic instrument from which repeatable samples can 
  be taken. A number of adjectives were gathered from English speaking subjects. The most 
  common words were selected and used without specific definition as rating scales 
  in comparative listening tests. Comparison with spectral analyses revealed possible 
  cues for some words, and audio examples were synthesised to test these theories. 
  Some adjectives have emerged as having degrees of common understanding across 
  a majority of subjects.
 ORAL Hardness recognition in synthetic sounds
 B L Giordano, K Petrini
 Università di Padova, Dipartimento di Psicologia 
  Generale, Padova, Italy
 Sound source recognition research investigates the recovery of different
  features of the objects whose interaction generates
  the acoustical signal. Among these, material type has received
  particular attention, while the recovery of material properties
  such as hardness has scarcely been considered. Hardness plays a
  significant role in the musical field too, especially for
  percussion instruments, where resonating objects of variable
  hardness are struck with mallets of variable hardness. Comparison
  of previous results on hardness recognition points toward the 
  perceptual independence of the resonator and exciter properties. 
  This issue was addressed in four experiments conducted on stimuli 
  synthesized with a physical model, which allowed independent 
  manipulation of the exciter and resonator properties. Free 
  identification and forced choice tasks have been used to 
  investigate the ability of listeners to discriminate variations in 
  the exciter from variations in the resonator. Scaling tasks have 
  been used to investigate the relationship between the synthesis 
  parameters and the hardness estimates of the exciter and of the 
  resonator. Free identification and forced choice data reveal a 
  bias toward the interpretation of the acoustical signals in terms 
  of features of the resonating object. Hardness scaling results 
  reveal the perceptual dependence of exciter and resonator 
  properties, although strong individual differences are found. 
  POSTER Categorical perception of microtonal intervals
 E Huovinen
 University of Turku, Department of Musicology, Turku, 
  Finland
 
This study addresses the effect of simple tonal contexts 
on the categorical perception of melodic intervals. The starting point is the 
simple hypothesis that similar intervals may be perceived differently when placed 
in different positions in relation to a perceptual tonal center. The study is 
carried out using pitch materials that are derived from the 19-tone equal temperament, 
which has been proposed as one of the most plausible alternative tuning systems. 
A subsidiary aim is thus to vitalize the discussion concerning microtonal tunings 
with input from experimental psychology. An interval recognition experiment was 
conducted using five consecutive interval sizes from the 19-tone equal temperament. 
In each trial, the subjects compared two melodic intervals that were both preceded 
by the same "bass tone", which was supposed to create a simple tonal context 
for the comparison. The five basic interval sizes were used in all 19 intervallic 
relationships to the pitch-class of the context tone, and the last melodic interval 
of each pair was subjected to systematic alterations. In the discussion of the 
results, special interest will be directed towards differences in the perceptual 
interpretation of non-diatonic intervals as a function of their location within 
a simple tonal context. 
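For orientation, the helper below gives 19-tone equal temperament step sizes in cents and the corresponding frequencies (one step is 1200/19 ≈ 63.2 cents). It is only an illustration of the tuning; the specific interval sizes and bass tones used in the experiment are not reproduced here.

```python
STEP_19TET = 1200.0 / 19.0   # width of one 19-TET step in cents

def tet19_interval_cents(steps: int) -> float:
    """Width in cents of an interval spanning `steps` 19-TET steps."""
    return steps * STEP_19TET

def tet19_frequency(base_hz: float, steps: int) -> float:
    """Frequency reached by moving `steps` 19-TET steps up from base_hz."""
    return base_hz * 2.0 ** (steps / 19.0)

# hypothetical example: five consecutive 19-TET interval sizes above A3
for k in range(4, 9):
    print(k, round(tet19_interval_cents(k), 1), "cents,",
          round(tet19_frequency(220.0, k), 2), "Hz")
```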
POSTER Towards a virtual singer: contribution to the development 
  of the teaching of sung diction through the intermediary of a functionalised 
  system
 J Kiss
 University Paris 8, Ati Inrev, St Denis, France
 Learning to sing, whether within the construction of 
  the physical vocal pattern, or the interpretive preparation of a piece of music, 
  proceeds initially through observation and then imitation of a given physical 
  model. One of the principal difficulties in this mimetic process frequently 
  lies in the different perceptions of the singer and the listener. Without attempting 
  to compensate directly for this problem, we wish to propose an interactive system 
  that reacts to exterior data, thus facilitating the creation of a model of the 
  singer and, at the same time, real-time simulation of his or her actions. Within 
  the framework of this undertaking, we propose to create a basic virtual animation. 
  It is important at this stage to define the nature of the work envisaged, which 
  will not consist of an exhaustive project. We intend more specifically to describe 
  the means for encoding the vocal expression and its relation to the facial expression 
  in order to translate a feeling. We will then proceed to the development of 
  a computer generated synthetic singer, adopting as our method the use of a process 
  of interactive phonation represented by a synthetic actor with simplified expressions. 
  Initially this phonation will be limited to the expression of a small number 
  of phonemes; this part of the process could become the basis of further 
  concrete developments and the foundation for various potential extensions.

POSTER Musical sound parameters revisited
 B Kostek, P Zwan, M Dziubinski
 Gdansk University of Technology, Sound & Vision Engineering, 
  Gdansk, Poland
 Recently, a new standard, MPEG-7, was established. A set 
  of parameters was defined in order to represent musical sound as a multimedia 
  object. These so-called low level features are related to time and frequency 
  domain of musical sounds, as well as to audio waveform parameters. Among others 
  the following parameters were specified within the standard: log attack time, 
  temporal centroid, spectral flatness and spectrum spread, spectral centroid, 
  harmonic variation, etc. Some of these parameters are related to human cognition 
  of musical sounds, whereas for others the relationship between the parameter 
  and its perceptual meaning is not yet defined. Therefore the principal aim 
  of this paper is to discuss which of the parameters defined 
  in the MPEG-7 standard can be related to musical timbre. This is done by 
  means of listening tests. Additionally, the set of parameters is used in experiments 
  on automatic recognition of musical instrument sounds and separation 
  of musical duets. Results of the experiments are described and conclusions drawn.
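Two of the listed descriptors are easy to illustrate. The sketch below computes a spectral centroid and a log attack time with plain NumPy; it follows the generic definitions rather than the normative MPEG-7 formulas, so the framing, windowing, and threshold choices are assumptions.

```python
import numpy as np

def spectral_centroid(frame, sr):
    """Amplitude-weighted mean frequency of one windowed frame (Hz)."""
    mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), 1.0 / sr)
    return np.sum(freqs * mag) / (np.sum(mag) + 1e-12)

def log_attack_time(x, sr, lo=0.02, hi=0.9):
    """log10 of the time the amplitude takes to rise from `lo` to `hi` of its
    maximum; thresholds are a common convention, not the normative values,
    and a real implementation would first smooth |x| into an envelope."""
    env = np.abs(x) / (np.max(np.abs(x)) + 1e-12)
    t_lo = np.argmax(env >= lo) / sr
    t_hi = np.argmax(env >= hi) / sr
    return np.log10(max(t_hi - t_lo, 1.0 / sr))

# toy example: a 440 Hz tone with a 50 ms linear attack
sr = 44100
t = np.arange(sr) / sr
x = np.minimum(t / 0.05, 1.0) * np.sin(2 * np.pi * 440 * t)
print(spectral_centroid(x[:2048], sr), log_attack_time(x, sr))
```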
POSTER Pitch measurements versus perception of South Indian classical music
 A Krishnaswamy
 Stanford University, CCRMA / EE, Stanford, United States
 The number and type of musical intervals used in Indian 
  music have been the subject of vigorous debate and controversy for many years, 
  primarily due to the lack of convincing experimental data to substantiate or 
  discredit existing theories. Recently, however, we presented detailed pitch 
  tracks of samples of South Indian music and offered insight into why perhaps 
  many people believe that South Indian music uses more than 12 distinct notes 
  per octave. We argued that certain pitch inflexions, which are not always mere 
  ornamentations, could be viewed as different "versions" of the same 
  "basic" notes they modify. We now address a few more related misconceptions 
  about South Indian music, some of which can be traced to how certain notes and 
  phrases are perceived by human listeners as opposed to what is actually being 
  played, and we present appropriate examples of annotated pitch tracks to illustrate 
  our arguments. For example, we consider the effects of using the same note names 
  or syllables to vocalize completely different pitches and inflexions. We also 
  examine the variability in intonation we observed, even in recordings of accomplished 
  musicians, and discuss how it relates to the overall perception of certain intervals 
  and inflexions.
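A generic autocorrelation pitch tracker of the kind that could produce such pitch tracks is sketched below; it is not the authors' analysis tool, and the frame length, hop size, and voicing threshold are arbitrary choices.

```python
import numpy as np

def pitch_track(x, sr, frame=2048, hop=256, fmin=80.0, fmax=800.0):
    """Plain autocorrelation pitch tracker: one F0 estimate in Hz (or NaN for
    weak frames) per hop. Tracking ornaments and inflexions reliably would
    need better peak picking and post-smoothing than this."""
    lags = np.arange(int(sr / fmax), int(sr / fmin))
    track = []
    for start in range(0, len(x) - frame, hop):
        seg = x[start:start + frame] * np.hanning(frame)
        ac = np.correlate(seg, seg, mode="full")[frame - 1:]
        if ac[0] <= 0:                    # silent frame
            track.append(np.nan)
            continue
        lag = lags[np.argmax(ac[lags])]   # strongest lag in the allowed range
        track.append(sr / lag if ac[lag] / ac[0] > 0.3 else np.nan)
    return np.array(track)
```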
ORAL Structural analysis of listener's emotional responses
 M Leman¹, V Vermeulen¹, L De Voogdt¹, 
  A Camurri², B Mazzarino³, G Volpe³
 ¹University of Ghent, IPEM, Ghent, Belgium; ²University 
  of Genova, DIST (Laboratorio di Informatica Musicale), Genova, Italy; ³University 
  of Genova, DIST (Laboratorio di Informatica Musicale), Genova, Italy
 The paper describes an experiment which aims at investigating 
  quantitative relationships between auditory stimuli and emotive/expressive responses 
  as recorded on listeners. The experiment starts from a previous one carried 
  out at IPEM and is the result of joint work by the IPEM and DIST staffs in 
  the framework of the EU-IST project MEGA (www.megaproject.org). Tests were carried 
  out using a pool of 60 audio fragments with an average duration of 30 seconds. Two 
  different approaches were considered to analyze listeners' responses: on the 
  one hand, a cognitive evaluation was obtained by asking subjects to fill 
  in a form on which to rate the musical excerpts using a 15 dimensional semantic 
  space; on the other hand, sub-cognitive aspects have been addressed by recording 
  listeners' movement by means of a video-camera during the listening sessions. 
  Each subject participated in two listening sessions: the first one for the cognitive 
  evaluation, the second one for motion recording. Data analysis using statistical 
  techniques resulted in (i) reduction of the 15-dimensional semantic space to 
  a 3-dimensional space with the base categories "Arousal", "Dominance" 
  and "Valence" that emerged from the first experiment, and (ii) significant 
  correlations between these three dimensions and a collection of audio and motion 
  cues extracted respectively from the musical excerpts and from the video recordings. 
  Finally, applications in audio mining and man/machine interaction are discussed 
  in brief.
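The abstract does not name the statistical technique behind the reduction in (i); a principal component analysis of an excerpt-by-scale rating matrix is one plausible way to sketch the idea. The rating matrix below is synthetic, so this is an illustration only.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# synthetic stand-in: 60 excerpts rated on 15 semantic scales, averaged over listeners
ratings = rng.normal(size=(60, 15))

pca = PCA(n_components=3)
scores = pca.fit_transform(ratings)      # 60 x 3 excerpt coordinates
print(pca.explained_variance_ratio_)     # variance carried by each dimension
# each retained component would then be interpreted from its loadings,
# e.g. as arousal-, valence-, or dominance-like dimensions
print(pca.components_.shape)             # 3 x 15 loading matrix
```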
POSTER User-dependent taxonomy of musical features as a conceptual framework for musical audio-mining technology
 M Lesaffre¹, M Leman¹, B De Baets², H 
  De Meyer³, J P Martensº
 ¹Ghent University, IPEM, dept. of Musicology, Ghent, 
  Belgium; ²Ghent University, KERMIT, dept. of Applied Mathematics, Biometrics 
  and Process Control, Ghent, Belgium; ³Ghent University, TWI, dept. of Applied 
  Mathematics and Computer Science, Ghent, Belgium; ºGhent University, ELIS, 
  dept. of Electronics and Information Systems, Ghent, Belgium
 In musical audio-mining technology, research is centered 
  on system development. Systems allow users to search and retrieve music by means 
  of content-based text and audio queries. Though these systems are promising, 
  there is a need for a better understanding of the role of user preferences and 
  user profiles. The development of a conceptual taxonomy for musical descriptions 
  is a first step in bridging the gap between system and user. In the first part 
  of this paper, we clarify elements of signification related to spontaneous user 
  behavior and derived user groups, starting from a large-scale experiment with 
  72 users and 1148 vocal queries. Statistical analysis provides insight into 
  the characteristics of vocal querying, such as the methods used (singing lyrics, 
  singing syllables, humming, whistling), query length, query performance (melodic, 
  rhythmic) and effects of gender, age and musical education. In the second part 
  of the paper we describe the kind of user-dependent conceptual structures underlying 
  the taxonomy. The results aim at providing a better interface between the system 
  modules and a more intuitive interaction.

ORAL Consistency in listeners' ratings as a function of listening time
 G Madison, B Merker
 Uppsala University, Dept. of Psychology, Uppsala, Sweden
 Ratings of adjectives describing possible experiential 
  properties are among the most common dependent variables in music perception 
  research. The consistency of such ratings is typically high, allowing statistically 
  significant results to be based upon relatively small numbers of listeners (e.g., 
  10-30). The wide range of sample durations employed in the literature (from 
  a few seconds to several minutes) raises the question of how sample length might 
  relate to the property one is attempting to measure. We ask as a first step 
  how the validity and reliability of adjective ratings might be affected by sample 
  duration in the short range up to 16 s. Our aims were to show how the consistency among listeners, in terms of F-ratios, 
  varies as a function of music sample durations. Also, we check if mean ratings 
  change significantly, indicating that listeners are prone to alter the nature 
  of their judgements as duration varies.
 Stimuli were excerpts from 10 commercially available recordings of instrumental 
  jazz and ethnic ensemble music without vocal content. Each excerpt started at 
  the same position in the recording, but proceeded for either 0.5, 1, 2, 4, 6, 
  8, or 16 s. Listeners rated 14 adjectives in response to each music example 
  in a split-plot design (10 examples × 7 durations).
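One reading of "consistency in terms of F-ratios" is a one-way ANOVA over the ten music examples at each duration, with listeners' ratings as replicates. The SciPy sketch below illustrates that reading on synthetic ratings; it is not the authors' analysis, and the simulated effect of duration is hypothetical.

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(1)
n_listeners, n_examples = 20, 10
durations = [0.5, 1, 2, 4, 6, 8, 16]

for dur in durations:
    # synthetic ratings of one adjective: rows = listeners, cols = examples;
    # assume between-example differences grow with duration (hypothetical)
    example_means = rng.normal(scale=0.2 * np.log2(2 * dur), size=n_examples)
    ratings = example_means + rng.normal(scale=1.0, size=(n_listeners, n_examples))
    # one group of listener ratings per example -> F-ratio across examples
    f_stat, p = f_oneway(*[ratings[:, j] for j in range(n_examples)])
    print(f"{dur:>4} s  F = {f_stat:5.2f}  p = {p:.3g}")
```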
 POSTER Verbal description of musical sound timbre in the Czech language
 O Moravec, J Stepanek
 Music faculty of AMU, Sound Studio, Prague, Czech Republic
 
Words used for the description of musical sound timbre 
were collected. The research was carried out among people with an active relation 
to music (instrument players, conductors, composers, sound engineers, etc.). 
Each respondent filled out a questionnaire composed of two parts. 
In the first part, the respondent's personal background (respondent profile) was 
collected; in the second part, the respondent wrote down the particular expressions he or she 
uses for timbre description, together with synonymic and antonymic relations among them. A common 
frequency vocabulary and frequency vocabularies for the selected respondent classes 
were created from the data. The similarity of the vocabularies and their differences 
depending on the respondent class are studied.
 
 POSTER Development of a language for specifying saxophone timbre
 A J Nykänen
 Luleå University of Technology, Human Work Sciences, 
  Luleå, Sweden
 When writing requirement specifications for musical 
  instruments, or when writing music, specifying the sound of the instrument is 
  of greatest interest. Today's notation of music leaves many aspects of sound 
  open for interpretation. This implies that it is necessary to know the ideal typical 
  of a musical style and period in order to perform music in the way intended by the 
  composer. To investigate how musicians use verbal descriptions of musical sounds, 
  interviews were conducted with saxophone players. The results were analysed 
  with respect to how frequently specific words were used. Sounds described 
  by the most popular words have been analysed to search for physical aspects 
  of the sounds which could be coupled to the description. Since a large number 
  of words have been used to describe the sounds, and only a few of them were 
  used frequently by different musicians, the conclusion is that there is a lack 
  of a common language for describing sounds. New ways to describe sounds are desirable, 
  since a larger set of descriptions will increase the possibility of choosing a 
  convenient set of descriptions for musical sounds. Therefore, one new way to 
  describe sounds, based on vocal mimicking, is suggested and evaluated.

POSTER Influence of duration of tone stationary part on perception of starting transient
 Z Otcenasek, J Stepanek, V Syrovy
 Music faculty of AMU, Sound studio, Praha, Czech Republic
 Sounds of different types of organ pipes were used for 
  the study of the influence of the duration of the tone's stationary part on the perception 
  of the starting transient. Long sounding tone recordings were truncated from the 
  end in successive steps, from an original duration of about 3 s down to a length of 150 
  ms. The attack transients of the tones remained unchanged; the truncated tone ends were equally 
  modified with a level decrease to silence (fade-out). Sets of tones derived from 
  the same original tone were judged on dissimilarity in pairs, focusing on the perception 
  of the initial transient part only. Results are discussed for tones with various 
  types of transient and stationary part.

POSTER Perceived influence of changes in musical instrument directivity representation
 F Otondo, B Kirkwood
 Technical University of Denmark, Ørsted-DTU, Acoustic 
  Technology, Kgs. Lyngby, Denmark
 The directivity representation of musical instruments 
  in room acoustic simulations has been shown to significantly affect the distribution 
  of the calculated acoustical parameters in a room. Different directivity representations 
  of musical instruments were used to create pairs of room acoustic auralizations 
  in order to test for perceived changes in the sound. Listening tests were designed 
  and conducted with an emphasis on the perception of the acoustical attributes 
  of room simulations. Results show that changes in the directivity representation 
  of the source can influence the perceived sound in auralizations and that these 
  perceived changes are more pronounced in terms of loudness and reverberance.

POSTER Effects of the grouping cue on the pitch shift of a pure tone induced by other tones
 T Shirado¹, M Yanagida²
 ¹Communications Research Laboratory, Keihanna Research 
  Center, Kyoto, Japan; ²Doshisha University, Faculty of Engineering, Kyotanabe, 
  Japan
 The pitch of a pure tone can be shifted when it is partially 
  masked by other tones. This phenomenon has principally been explained by place 
  models, including ones which assume a certain interaction between the excitation 
  patterns of each tone in a peripheral system. The purpose of our study was to 
  examine if the pitch shift of a pure tone induced by other tones of lower frequencies 
  is affected by the grouping cue of all the tones into a complex tone. We measured 
  the pitch of the target pure tone under two conditions. In the first condition, the 
  target tone was presented simultaneously with other tones of lower frequencies. 
  The second condition was the same as the first, except that the frequencies 
  of the lower tones were harmonically related to the frequency of the target 
  tone. The experimental results suggested that the pitch of the target pure tone 
  was affected by the grouping cue and could not be sufficiently explained 
  by place models.

POSTER Listeners' common and group perceptual dimensions in violin timbre
 J Stepanek, Z Otcenasek
 Music Faculty of AMU, Sound Studio, Prague, Czech Republic
 Results of listening tests of violin tones are studied 
  according to listeners and their perceptual models. Five sets of violin tone 
  recordings (pitch B3, F#4, C5, G5, D6) were used. Twenty experienced listeners 
  - violin players (Academy professors and students) assessed dissimilarity in 
  timbre in pairs of tones. The results of five listening tests (individual dissimilarity 
  matrices) were separately processed using the latent class approach (CLASCAL). This 
  approach yields perceptual spaces of common dimensions shared by all listeners 
  and defines listener classes (groups); the groups differ in dimension scales. Then 
  group perceptual spaces for all CLASCAL groups were calculated. Common and group 
  perceptual dimensions are compared using various methods.
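CLASCAL itself, a latent-class extension of multidimensional scaling, is not reproduced here. As a rough stand-in, a plain metric MDS of an averaged dissimilarity matrix can be sketched with scikit-learn; the matrix below is synthetic and the latent-class weighting is omitted.

```python
import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(2)
n_tones = 12
# synthetic averaged dissimilarity matrix for one pitch set (symmetric, zero
# diagonal); real input would be the listeners' pairwise timbre judgements
d = rng.uniform(1, 10, size=(n_tones, n_tones))
d = (d + d.T) / 2.0
np.fill_diagonal(d, 0.0)

mds = MDS(n_components=3, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(d)    # n_tones x 3 perceptual-space coordinates
print(coords.shape, round(mds.stress_, 2))
```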
ORAL Neurocognition of music vs. speech sounds: EEG and fMRI evidence
 M Tervaniemi¹, S Kruck², A Szameitat², 
  E Schröger³, W De Baeneº, K Alter², A Friederici²
 ¹University of Helsinki, Cognitive Brain Research 
  Unit, Department of Psychology, Helsinki, Finland; ²Max-Planck Institute 
  for Cognitive Neuroscience, Leipzig, Germany; ³University of Leipzig, Institut 
  für Allgemeine Psychologie, Leipzig, Germany; ºUniversity of Ghent, 
  Department of Psychology, Ghent, Belgium
 In both music and speech, emotional information is transmitted 
  via changes in sound frequency, duration, and intensity. The present project 
  was conducted to compare, in a within-subject design, the neural detection of changes 
  in sound frequency, duration, and intensity in music vs. speech sounds. 
  The subjects were presented with music sound patterns and pseudowords. The pseudoword 
  /ba:ba/ was produced by a trained female native German speaker and thereafter 
  digitized. The first syllable was stressed as indicated by longer duration, 
  higher pitch, and higher intensity. The music sounds (digital samples of a saxophone 
  sound) mimicked /ba:ba/ in the pitch range and the sound duration. As deviants 
  among pseudoword and music sounds, three changes of the standard sound were 
  presented: pitch increase, duration decrease, and intensity increase. Subsequent 
  experiments were conducted with EEG and fMRI techniques.
 The data indicate that the neural responses underlying the change detection 
  in music vs. speech sounds differ from each other in strength as well as in 
  their topography. Thus these data provide support for the existence of neural 
  specialization of the human brain to represent sounds with different informational 
  content.
 POSTER Pattern recognition of musical instruments using Hidden 
  Markov Models
 R Ventura-Miravet, F Murtagh, J Ming
 Queen's University Belfast, Computer Science, Belfast, 
  United Kingdom
 Today a large number of musical recordings are 
  available on the Internet. Several useful applications emerge if users are 
  able to automatically search these data for particular musical content, e.g. 
  different musical instruments. In order to search for a particular instrument, 
  we extract acoustic features given a sound sample and then match these features 
  against a database of all stored musical instruments by using pattern recognition. 
  Whilst relatively little research has been conducted on musical instrument recognition, 
  there already exists a large body of knowledge concerning speaker recognition. 
  This work is concerned with applying the techniques and modelling methodologies 
  that are used within speaker recognition to musical instrument recognition. 
  In particular, we present a study that aims to identify musical instruments 
  using statistical approaches that reflect the structure of the data. Given samples 
  of each musical instrument, we perform training to generate an N-state Hidden 
  Markov Model (HMM) that represents the spectral characteristics and time variability 
  of musical patterns. The results show that this approach, which is widely used 
  in speaker recognition, offers a powerful technique for feature extraction and 
  instrument recognition given musical data. We obtain an average recognition 
  accuracy rate of approximately 94% when discriminating between six different 
  musical instruments.
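A minimal version of this train-and-score pipeline can be sketched with the hmmlearn package. The feature dimensionality (13 values per frame, standing in for MFCCs), the number of states, and the synthetic training data are assumptions, not the authors' configuration.

```python
import numpy as np
from hmmlearn import hmm

def train_instrument_model(feature_seqs, n_states=5):
    """Fit one Gaussian HMM per instrument from a list of feature sequences,
    each an array of shape (frames, n_features), e.g. MFCC frames."""
    X = np.vstack(feature_seqs)
    lengths = [len(seq) for seq in feature_seqs]
    model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=50)
    model.fit(X, lengths)
    return model

def classify(models, features):
    """Return the instrument whose HMM assigns the highest log-likelihood."""
    return max(models, key=lambda name: models[name].score(features))

# toy demo with random 13-dimensional "features" standing in for MFCCs
rng = np.random.default_rng(3)
models = {
    "sax":   train_instrument_model([rng.normal(0, 1, (200, 13)) for _ in range(4)]),
    "piano": train_instrument_model([rng.normal(2, 1, (200, 13)) for _ in range(4)]),
}
print(classify(models, rng.normal(2, 1, (150, 13))))   # expected: "piano"
```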
ORAL Influence of rhythmic, melodic, and semantic violations in language and music on the electrical activity in the brain
 S Ystad¹, C Magne², S Farner¹, G Pallone¹, 
  V Pasdeloup³, R Kronland-Martinet¹, M Besson²
 ¹CNRS, Laboratoire de Mécanique et d'Acoustique, 
  Marseille, France; ²CNRS, Physiological and Cognitive Neurosciences Institute, 
  Marseille, France; ³University of Rennes II, Department of linguistics, 
  Rennes, France
 The work presented here is part of a larger multidisciplinary 
  project associating audio signal processing, linguistics, and cognitive sciences. 
  It aims at comparing and better understanding music and language processing 
  in the brain. From a music and speech synthesis point of view, this is important 
  when striving for naturalness and expressiveness in synthesized music and language. 
  As a first experiment towards this goal, we developed and used a method to extend 
  a given part of an audio signal without altering the timbre. This allows manipulations 
  of the rhythm and was applied to alter syllable lengths in the language. To 
  make a similar experiment related to music, note duration and melody of musical 
  sequences were digitally modified by altering MIDI codes. Participants were 
  presented with linguistic and musical phrases, and the final words or notes 
  were either semantically congruous or incongruous (e.g., "I take coffee 
  with sugar/dog"). Moreover, the second-to-last or the last syllables or notes 
  were increased in duration, in order to produce rhythmic incongruities. Thus, 
  the two factors rhythm and semantics/melody were independently manipulated. 
  Changes in brain electrical activity were measured with electrodes on the scalp 
  (event-related potentials, ERPs). Preliminary results show that semantic and 
  melodic violations elicited different ERP components, while the rhythmic violations 
  evoked similar ERPs. Final results and developed techniques will be discussed 
  in the paper.
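Lengthening a segment without changing pitch or timbre is commonly done with phase-vocoder time stretching. The authors' own extension method is not specified here, so the sketch below only shows the generic idea with librosa; the file name, segment boundaries, and stretch factor are placeholders.

```python
import numpy as np
import librosa
import soundfile as sf

# placeholders: the actual stimuli and segment boundaries are not reproduced here
infile, start_s, end_s, factor = "phrase.wav", 1.20, 1.45, 1.5

y, sr = librosa.load(infile, sr=None)
i0, i1 = int(start_s * sr), int(end_s * sr)

# phase-vocoder time stretch of the chosen segment only; rate < 1 lengthens it
segment = librosa.effects.time_stretch(y[i0:i1], rate=1.0 / factor)

# splice the lengthened segment back between the untouched parts
y_out = np.concatenate([y[:i0], segment, y[i1:]])
sf.write("phrase_stretched.wav", y_out, sr)
```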