Session 11 - Music Perception
INVITED
Automatic transcription of music
A P Klapuri
Tampere University of Technology, Institute of Signal Processing, Tampere, Finland
The aim of this paper is to describe methods for the automatic transcription of polyphonic music. Music transcription is here understood as a transformation from an acoustic signal into a MIDI-like representation, a "recipe" which allows musically meaningful processing and analysis. Algorithms are discussed that concern three different subproblems. (1) Estimation of the temporal structure of acoustic musical signals, the musical meter. Signal processing methods and a probabilistic model are described which track the metrical pulse at three different time scales: tactus (beat), musical measure, and tatum rate. (2) Estimation of the fundamental frequencies of concurrent musical sounds. A method is described which operates iteratively by detecting and cancelling harmonic sounds which occur in the presence of other harmonic and noisy sounds. The method is compared with alternative approaches presented in the literature. (3) Higher-level musicological modeling to resolve otherwise ambiguous analysis segments. The use of musical key estimation and N-gram modeling is described. Validation experiments are performed by applying the presented methods to the transcription of both synthesized musical signals and real-world acoustic data from various musical genres.
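For illustration, the iterative detect-and-cancel idea in subproblem (2) could be sketched roughly as follows; this is a minimal, hypothetical Python/NumPy sketch of the generic principle, not Klapuri's actual estimator.

```python
# Minimal, hypothetical sketch of iterative F0 estimation by detection and
# cancellation (not the algorithm described in the paper): find the most
# salient F0 candidate in the magnitude spectrum, attenuate its harmonics,
# and repeat until the residual salience is negligible or a voice limit is reached.
import numpy as np

def iterative_f0_estimation(signal, sr, max_voices=4, fmin=60.0, fmax=2000.0,
                            n_harmonics=10, cancel_gain=0.1, min_salience=1e-6):
    spectrum = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
    bin_hz = sr / len(signal)                      # frequency resolution per bin
    f0s = []
    for _ in range(max_voices):
        best_f0, best_salience = None, 0.0
        for f0 in np.arange(fmin, fmax, 1.0):      # coarse candidate grid
            harmonics = np.arange(1, n_harmonics + 1) * f0
            idx = np.round(harmonics[harmonics < sr / 2] / bin_hz).astype(int)
            salience = spectrum[idx].sum()         # summed partial magnitudes
            if salience > best_salience:
                best_f0, best_salience = f0, salience
        if best_f0 is None or best_salience < min_salience:
            break
        f0s.append(best_f0)
        harmonics = np.arange(1, n_harmonics + 1) * best_f0
        idx = np.round(harmonics[harmonics < sr / 2] / bin_hz).astype(int)
        spectrum[idx] *= cancel_gain               # cancel the detected sound
    return f0s
```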
INVITED
Sex, drugs and rock 'n' roll: The evolutionary neurobiology of hearing and hedonism
N P M Todd
Manchester University, Psychology, Manchester, United Kingdom
An issue which has been practically ignored by conventional
acoustics is the loudness of many kinds of popular music, including rock and dance
music. Many of these kinds of music need to be played above a certain level, somewhere
around 90 dB HL, before they are considered acceptable. In this paper I outline a
theory of hearing in which a primitive acoustic sense, inherited from our lower
vertebrate ancestors, has been conserved in humans. This primitive sense is mediated
by the sacculus and has a threshold of about 90 dB HL to air-conducted sound but
only about 30 dB HL to bone-conducted sound and is maximally sensitive to low
frequencies (less than 500 Hz). According to the theory a primitive acoustic central
pathway to the mesolimbic dopamine system, which plays a role in reproductive
vocal behaviour in lower vertebrates, has also been conserved in humans. The theory
thus accounts for why loud, low frequency sound and vibration are rewarding. In
support of the theory I discuss some recent experiments using EEG evoked potentials
to investigate acoustic sensitivity to bone-conducted low frequency sound.
ORAL
Brain activity in perception and retention in memory
of the pitch of music and speech
E Castro-Sierra¹, H A Poblano²
¹National Autonomous University of Mexico/Hospital
Infantil de Mexico Federico Gomez, National School of Music, Mexico, D.F., Mexico;
²Autonomous Metropolitan University, MS Program in Neurological Rehabilitation,
Mexico, D.F., Mexico
Brain activity stimulated by music and speech has been
studied using fMRI and PET. These techniques commonly employ single stimuli
which contrast with others in a sequence. Neural activity is elicited when one
of the sounds differs from the remaining ones. These results have led to a distinction
between activities of right-hemisphere areas stimulated by music sounds and
activities of left-hemisphere areas stimulated by speech sounds. EEG was used
to study activities, at different frequencies, of superior and inferior frontal,
temporal and parieto-occipital areas of 10 subjects of either sex (age range:
11:11-16:8), native speakers of Zapotec (a Mexican tone language), while responding
to a tonal memory test contrasting pairs of musical sequences, a tonal memory
test contrasting pairs of speech sequences and a tonal perception test analyzing
single speech sequences. α- and β-wave activities in right and left
inferior frontal areas correlated with responses to either the music or the
speech memory tests. Right parieto-occipital θ-wave activity also correlated
with responses to the music test. α- and β-wave activities in left
superior frontal, temporal and parieto-occipital areas correlated with responses
to the speech perception test. These data point to differential activity of
both hemispheres of the brain stimulated by the pitch of tone language samples
and to a more localized right hemisphere activity stimulated by the pitch of
music samples.
ORAL
Looking at perception of continuous tempo drift - a
new method for estimating Internal drift and Just Noticeable Difference
S Dahl, S Granqvist
Royal Institute of Technology (KTH), Dept. of Speech,
Music and Hearing, Stockholm, Sweden
The method proposed here investigates if there is such
a thing as an internal representation of a "steady tempo" - and whether
this representation itself is free from tempo drift. The method uses a modification
of the method for Parameter Estimation by Sequential Testing (PEST). Several click
sequences are presented to the listener in each test and depending on the listener's
response (correct or incorrect) the magnitude of the tempo drift is modified for
the next presentation. While investigating tempo detection, this method does not
rule out the possibility that the internal "clock" can have an inherent
tempo drift. In practice this means that some listeners will perceive that tempo
is increasing when, in fact, it is decreasing, and vice versa. Preliminary results
confirm that some listeners tend to bias their answers towards
either increasing or decreasing tempo. Our results also indicate that these listeners
appear to be consistent in doing this. Thus we would like to propose a model of
detectability of continuous (linear) tempo drift based on a person's Internal
drift (ID), which can be isochronous or biased in either direction. Surrounding
this ID is an interval corresponding to twice the Just Noticeable Difference (JND).
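A hypothetical Python sketch of such a PEST-like adaptive track (the step rules here are a generic staircase stand-in, not the authors' exact modification of PEST) might look like this:

```python
# Generic adaptive staircase in the spirit of PEST (hypothetical stand-in):
# the tempo-drift magnitude is decreased after a correct response and
# increased after an incorrect one; the step size is halved at each
# reversal of the response pattern, so the track converges on the JND.
def adaptive_drift_track(respond, start_drift=2.0, start_step=1.0,
                         min_step=0.05, max_trials=60):
    """respond(drift) -> True if the listener judged the drift direction correctly."""
    drift, step = start_drift, start_step
    last_correct = None
    history = []
    for _ in range(max_trials):
        correct = respond(drift)
        history.append((drift, correct))
        if last_correct is not None and correct != last_correct:
            step /= 2.0                    # reversal: refine the step
            if step < min_step:
                break
        drift = max(0.0, drift - step) if correct else drift + step
        last_correct = correct
    return drift, history                  # final drift approximates the JND
```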
POSTER
What can the body movements reveal about a musician's
emotional intention?
S Dahl, A Friberg
Royal Institute of Technology (KTH), Dept. of Speech,
Music and Hearing, Stockholm, Sweden
Music has an intimate relationship with motion in several
aspects. Obviously, movements are required to play an instrument, but musicians
also move their bodies in ways not directly related to note production. In
order to explore to what extent emotional intentions can be conveyed through
musicians' movements alone, a marimba player was video-recorded performing the
same piece with the intentions Happy, Sad, Angry and Fearful.
Twenty observers watched the video clips without sound and rated both the perceived
emotional content and movement cues. The videos were presented in four
viewing conditions, showing different parts of the player. The observers' ratings
for the intended emotions showed that the intentions Happiness, Sadness and
Anger were well communicated, while Fear was not. The identification of the
intended emotion was only slightly influenced by the viewing condition, although
in some cases the head was important. The movement ratings indicate that there
are cues that observers use to distinguish between intentions, similar to
the cues found for audio signals in music performance. Anger was characterized
by large, fast, uneven, and jerky movements; Happiness by large and somewhat fast
movements; Sadness by small, slow, even and smooth movements.
POSTER
Time estimation: Isochronous versus Accelerated audio sequences
D Dall'Osto, S E Ferrer
Universitat Autonoma Barcelona, Psychology of Education, Barcelona, Spain
The duality 'Regularity-Irregularity' frames the present initial study, as a general relationship concerning time and rhythm in music.
Regularity in time generally refers to isochronous time intervals, while 'irregularity' refers to non-periodic, non-isochronous time intervals.
Performers who modify the musical pace with expressive intentions introduce amounts of 'irregularity' within the metronomic regularity.
Composers also vary duration and proportion by introducing gradual temporal changes, such as accelerations or decelerations, so as to create a sort of 'goal-directed' process.
But how do listeners perceive those temporal deviations?
A starting point is to establish whether the perception of duration is influenced by the 'content' of musical stimuli.
Musical 'content' is not directly semantic; it refers to a complex internal structure that organizes temporal and non-temporal information together, based on audio and essentially non-verbal stimuli.
The present work focuses on the influence of the 'temporal content' of musical stimuli on the perception of duration.
The goal of the presented experiment is to begin studying the effect of gradual temporal deviations on the perception of musical time.
As a first case within the dual category regular-irregular, only acceleration is investigated experimentally.
POSTER
Timbral semantics and the pipe organ
A C Disley, D M Howard
The University of York, Department of Electronics, York,
United Kingdom
Words used to describe timbre are usually difficult
to define or relate to a measurable phenomenon. This paper attempts to establish
if any such words are objective with a common understanding, or if they are
all subjective. The pipe organ is used because it is both a complex timbral
synthesiser and a non-electronic instrument from which repeatable samples can
be taken.
A number of adjectives were gathered from English speaking subjects. The most
common words were selected and used without specific definition as rating scales
in comparative listening tests. Comparison with spectral analyses revealed possible
cues for some words, and audio examples were synthesised to test these theories.
Some adjectives have emerged as having degrees of common understanding across
a majority of subjects.
ORAL
Hardness recognition in synthetic sounds
B L Giordano, K Petrini
Università di Padova, Dipartimento di Psicologia
Generale, Padova, Italy
Sound source recognition research investigates the recovery of different
features of the objects whose interaction leads to the generation
of the acoustical signal. Among these features, material type has received
particular attention, while the recovery of material properties,
such as hardness, has scarcely been considered. Hardness plays a
significant role in the musical field too, especially for
percussion instruments, where resonating objects of variable
hardness are struck with mallets of variable hardness. Comparison
of previous results on hardness recognition points toward the
perceptual independence of the resonator and exciter properties.
This issue was addressed in four experiments conducted on stimuli
synthesized with a physical model, which allowed independent
manipulation of the exciter and resonator properties. Free
identification and forced choice tasks have been used to
investigate the ability of listeners to discriminate variations in
the exciter from variations in the resonator. Scaling tasks have
been used to investigate the relationship between the synthesis
parameters and the hardness estimates of the exciter and of the
resonator. Free identification and forced choice data reveal a
bias toward the interpretation of the acoustical signals in terms
of features of the resonating object. Hardness scaling results
reveal the perceptual dependence of exciter and resonator
properties, although strong individual differences are found.
POSTER
Categorical perception of microtonal intervals
E Huovinen
University of Turku, Department of Musicology, Turku,
Finland
This study addresses the effect of simple tonal contexts
on the categorical perception of melodic intervals. The starting point is the
simple hypothesis that similar intervals may be perceived differently when placed
in different positions in relation to a perceptual tonal center. The study is
carried out using pitch materials derived from the 19-tone equal temperament,
which has been proposed as one of the most plausible alternative tuning systems.
A subsidiary aim is thus to vitalize the discussion concerning microtonal tunings
with input from experimental psychology. An interval recognition experiment was
conducted using five consecutive interval sizes from the 19-tone equal temperament.
In each trial, the subjects compared two melodic intervals that were both preceded
by the same "bass tone", which was supposed to create a simple tonal context
for the comparison. The five basic interval sizes were used in all 19 intervallic
relationships to the pitch-class of the context tone, and the last melodic interval
of each pair was subjected to systematic alterations. In the discussion of the
results, special interest will be directed towards differences in the perceptual
interpretation of non-diatonic intervals as a function of their location within
a simple tonal context.
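As a small illustrative calculation (not taken from the paper), pitches in 19-tone equal temperament are spaced by a factor of 2^(1/19), i.e. about 63.2 cents per step:

```python
# Illustrative only: frequencies of 19-tone equal temperament steps above a
# reference pitch (the 440 Hz reference is an assumption, not from the paper).
def tet19_frequency(ref_hz, steps):
    return ref_hz * 2 ** (steps / 19)

step_cents = 1200 / 19                 # ~63.16 cents per 19-TET step
five_sizes_hz = [tet19_frequency(440.0, n) for n in range(1, 6)]
```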
POSTER
Towards a virtual singer: contribution to the development
of the teaching of sung diction through the intermediary of a functionalised
system
J Kiss
University Paris 8, Ati Inrev, St Denis, France
Learning to sing, whether within the construction of
the physical vocal pattern, or the interpretive preparation of a piece of music,
proceeds initially through observation and then imitation of a given physical
model. One of the principal difficulties in this mimetic process frequently
lies in the different perceptions of the singer and the listener. Without attempting
to compensate directly for this problem, we wish to propose an interactive system
that reacts to exterior data, thus facilitating the creation of a model of the
singer and, at the same time, real-time simulation of his actions. We propose,
within the framework of this undertaking, to create a basic virtual animation.
It is important at this stage to define the nature of the work envisaged, which
will not consist of an exhaustive project. We intend more specifically to describe
the means for encoding the vocal expression and its relation to the facial expression
in order to translate a feeling. We will then proceed to the development of
a computer generated synthetic singer, adopting as our method the use of a process
of interactive phonation represented by a synthetic actor with simplified expressions.
Initially this phonation will be limited to the expression of a small number
of phonemes; this part of the process could become the basis of further
concrete developments and constitute the foundation for various potential extensions.
POSTER
Musical sound parameters revisited
B Kostek, P Zwan, M Dziubinski
Gdansk University of Technology, Sound & Vision Engineering,
Gdansk, Poland
Recently, a new standard, MPEG-7, was established. A set
of parameters was defined in order to represent musical sound as a multimedia
object. These so-called low-level features are related to the time and frequency
domains of musical sounds, as well as to audio waveform parameters. Among others,
the following parameters were specified within the standard: log attack time,
temporal centroid, spectral flatness, spectrum spread, spectral centroid,
harmonic variation, etc. Some of these parameters are related to human cognition
of musical sounds, whereas for others the relationship between the parameter
and its perceptual meaning is not yet defined. Therefore the principal aim
of this paper is to discuss which of the parameters defined
within the MPEG-7 standard could be related to musical timbre. This is done by
means of listening tests. Additionally, the set of parameters is used in experiments
on automatic recognition of musical instrument sounds and separation
of musical duets. Results of the experiments are described and conclusions drawn.
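For illustration, two of the named low-level features can be sketched roughly as follows; this is a simplified reading in Python/NumPy, not the normative MPEG-7 definitions:

```python
# Simplified sketches of two MPEG-7-style low-level descriptors
# (not the normative definitions from the standard).
import numpy as np

def spectral_centroid(frame, sr):
    """Magnitude-weighted mean frequency of one signal frame."""
    spectrum = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), 1.0 / sr)
    return np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-12)

def spectral_flatness(frame):
    """Ratio of geometric to arithmetic mean of the power spectrum (0..1)."""
    power = np.abs(np.fft.rfft(frame)) ** 2 + 1e-12
    return np.exp(np.mean(np.log(power))) / np.mean(power)
```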
POSTER
Pitch measurements versus perception of South Indian
classical music
A Krishnaswamy
Stanford University, CCRMA / EE, Stanford, United States
The number and type of musical intervals used in Indian
music have been the subject of vigorous debate and controversy for many years,
primarily due to the lack of convincing experimental data to substantiate or
discredit existing theories. Recently, however, we presented detailed pitch
tracks of samples of South Indian music and offered insight into why perhaps
many people believe that South Indian music uses more than 12 distinct notes
per octave. We argued that certain pitch inflexions, which are not always mere
ornamentations, could be viewed as different "versions" of the same
"basic" notes they modify. We now address a few more related misconceptions
about South Indian music, some of which can be traced to how certain notes and
phrases are perceived by human listeners as opposed to what is actually being
played, and we present appropriate examples of annotated pitch tracks to illustrate
our arguments. For example, we consider the effects of using the same note names
or syllables to vocalize completely different pitches and inflexions. We also
examine the variability in intonation we observed, even in recordings of accomplished
musicians, and discuss how it relates to the overall perception of certain intervals
and inflexions.
ORAL
Structural analysis of listener's emotional responses
M Leman¹, V Vermeulen¹, L De Voogdt¹,
A Camurri², B Mazzarino³, G Volpe³
¹University of Ghent, IPEM, Ghent, Belgium; ²University
of Genova, DIST (Laboratorio di Informatica Musicale), Genova, Italy; ³University
of Genova, DIST (Laboratorio di Informatica Musicale), Genova, Italy
The paper describes an experiment which aims at investigating
quantitative relationships between auditory stimuli and emotive/expressive responses
as recorded on listeners. The experiment starts from a previous one carried
out at IPEM and is the result of joint work by the IPEM and DIST staffs in
the framework of the EU-IST project MEGA (www.megaproject.org). Tests were carried
out using a pool of 60 audio fragments with an average duration of 30 seconds. Two
different approaches were considered to analyze listeners' responses: on the
one hand, a cognitive evaluation was obtained by asking subjects to fill
in a form on which to rate the musical excerpts using a 15 dimensional semantic
space; on the other hand, sub-cognitive aspects have been addressed by recording
listeners' movement by means of a video-camera during the listening sessions.
Each subject participated in two listening sessions: the first one for the cognitive
evaluation, the second one for motion recording. Data analysis using statistical
techniques resulted in (i) reduction of the 15 dimensional semantic space to
the 3 dimensional space with the base categories "Arousal", "Dominance"
and "Valence" that emerged from the first experiment, and (ii) significant
correlations between these three dimensions and a collection of audio and motion
cues extracted respectively from the musical excerpts and from the video recordings.
Finally, applications in audio mining and man/machine interaction are discussed
in brief.
POSTER
User-dependent taxonomy of musical features as a conceptual
framework for musical audio-mining technology
M Lesaffre¹, M Leman¹, B De Baets², H
De Meyer³, J P Martensº
¹Ghent University, IPEM, dept. of Musicology, Ghent,
Belgium; ²Ghent University, KERMIT, dept. of Applied Mathematics, Biometrics
and Process Control, Ghent, Belgium; ³Ghent University, TWI, dept. of Applied
Mathematics and Computer Science, Ghent, Belgium; ºGhent University, ELIS,
dept. of Electronics and Information Systems, Ghent, Belgium
In musical audio-mining technology, research is centered
on system development. These systems allow users to search for and retrieve music by means
of content-based text and audio queries. Though such systems are promising,
there is a need for a better understanding of the role of user preferences and
user profiles. The development of a conceptual taxonomy for musical descriptions
is a first step in bridging the gap between system and user. In the first part
of this paper, we clarify elements of signification related to spontaneous user
behavior and derived user groups, starting from a large-scale experiment with
72 users and 1148 vocal queries. Statistical analysis provides insight into
the characteristics of vocal querying, such as the methods used (singing lyrics,
singing syllables, humming, whistling), query length, query performance (melodic,
rhythmic) and effects of gender, age and musical education. In the second part
of the paper we describe the kind of user-dependent conceptual structures underlying
the taxonomy. The results aim at providing a better interface between the system
modules and a more intuitive interaction.
ORAL
Consistency in listeners' ratings as a function of listening
time
G Madison, B Merker
Uppsala University, Dept. of Psychology, Uppsala, Sweden
Ratings of adjectives describing possible experiential
properties are among the most common dependent variables in music perception
research. The consistency of such ratings is typically high, allowing statistically
significant results to be based upon relatively small numbers of listeners (e.g.,
10-30). The wide range of sample durations employed in the literature (from
a few seconds to several minutes) raises the question of how sample length might
relate to the property one is attempting to measure. We ask as a first step
how the validity and reliability of adjective ratings might be affected by sample
duration in the short range up to 16 s.
Our aims were to show how the consistency among listeners, in terms of F-ratios,
varies as a function of music sample durations. Also, we check if mean ratings
change significantly, indicating that listeners are prone to alter the nature
of their judgements as duration varies.
Stimuli were excerpts from 10 commercially available recordings of instrumental
jazz and ethnic ensemble music without vocal content. Each excerpt started at
the same position in the recording, but proceeded for either 0.5, 1, 2, 4, 6,
8, or 16 s. Listeners rated 14 adjectives in response to each music example
in a split-plot design (10 examples × 7 durations).
POSTER
Verbal description of musical sound timbre in the Czech language
O Moravec, J Stepanek
Music faculty of AMU, Sound Studio, Prague, Czech Republic
Words used for the description of musical sound timbre
were collected. The research was carried out among people with an active relation
to music (instrument players, conductors, composers, sound engineers, etc.).
Each respondent filled out a questionnaire composed of two parts.
In the first part, the personal background of the respondent (respondent profile) was
collected; in the second part, the respondent wrote down the particular expressions he
or she uses for timbre description, together with synonymic and antonymic relations among them. A common
frequency vocabulary and frequency vocabularies for the selected respondent classes
were created from the data. The similarity of the vocabularies and their differences
as a function of respondent class are studied.
POSTER
Development of a language for specifying saxophone timbre
A J Nykänen
Luleå University of Technology, Human Work Sciences,
Luleå, Sweden
When writing requirement specifications for musical
instruments, or when writing music, specifying the sound of the instrument is
of greatest interest. Today's notation of music leaves many aspects of sound
open for interpretation. This implies that it is necessary to know the ideal
typical of a musical style and period in order to perform music in the way intended by the
composer. To investigate how musicians use verbal descriptions of musical sounds,
interviews were conducted with saxophone players. The results have been analysed
with respect to how frequently specific words were used. Sounds described
by the most popular words have been analysed to search for physical aspects
of the sounds which could be coupled to the description. Since a large number
of words have been used to describe the sounds, and only a few of them were
used frequently by different musicians, the conclusion is that there is a lack
of common language to describe sounds. New ways to describe sounds are desirable,
since a larger set of descriptions will increase the possibility to choose a
convenient set of descriptions for musical sounds. Therefore, one new way to
describe sounds, based on vocal mimicking, is suggested and evaluated.
POSTER
Influence of duration of tone stationary part on perception
of starting transient
Z Otcenasek, J Stepanek, V Syrovy
Music faculty of AMU, Sound studio, Praha, Czech Republic
Sounds of different types of organ pipes were used to
study the influence of the duration of the stationary part of a tone on the
perception of the starting transient. Recordings of long sounding tones were truncated from the
end in successive steps, from an original duration of about 3 s down to a length of 150
ms. The attack transients of the tones remained unchanged; the truncated tone ends were equally
modified with a level decrease to silence (fade-out). Sets of tones derived from
the same original tone were judged for dissimilarity in pairs, focusing on the perception
of only the initial transient part. Results are discussed for tones with various
types of transient and stationary parts.
POSTER
Perceived influence of changes in musical instrument
directivity representation
F Otondo, B Kirkwood
Technical University of Denmark, Ørsted-DTU, Acoustic
Technology, Kgs. Lyngby, Denmark
The directivity representation of musical instruments
in room acoustic simulations has been shown to significantly affect the distribution
of the calculated acoustical parameters in a room. Different directivity representations
of musical instruments were used to create pairs of room acoustic auralizations
in order to test for perceived changes in the sound. Listening tests were designed
and conducted with an emphasis on the perception of the acoustical attributes
of room simulations. Results show that changes in the directivity representation
of the source can influence the perceived sound in auralizations and that these
perceived changes are more pronounced in terms of loudness and reverberance.
POSTER
Effects of the grouping cue on the pitch shift of a
pure tone induced by other tones
T Shirado¹, M Yanagida²
¹Communications Research Laboratory, Keihanna Research
Center, Kyoto, Japan; ²Doshisha University, Faculty of Engineering, Kyotanabe,
Japan
The pitch of a pure tone can be shifted when it is partially
masked by other tones. This phenomenon has principally been explained by place
models, including ones which assume a certain interaction between the excitation
patterns of each tone in a peripheral system. The purpose of our study was to
examine if the pitch shift of a pure tone induced by other tones of lower frequencies
is affected by the grouping cue of all the tones into a complex tone. We measured
the pitch of the target pure tone under two conditions. In the first condition, the
target tone was presented simultaneously with other tones of lower frequencies.
The second condition was the same as the first, except that the frequencies
of the lower tones were harmonically related to the frequency of the target
tone. The experimental results suggest that the pitch of the target pure tone
was affected by the grouping cue and that it could not be sufficiently explained
by place models.
POSTER
Listeners' common and group perceptual dimensions in
violin timbre
J Stepanek, Z Otcenasek
Music Faculty of AMU, Sound Studio, Prague, Czech Republic
Results of listening tests of violin tones are studied
according to listeners and their perceptual models. Five sets of violin tone
recordings (pitch B3, F#4, C5, G5, D6) were used. Twenty experienced listeners
- violin players (Academy professors and students) assessed dissimilarity in
timbre in pairs of tones. The results of five listening tests (individual dissimilarity
matrixes) were separately processed using latent class approach (CLASCAL). This
approach yields to perceptual spaces of common dimensions shared with all listeners
and defines listener classes (groups); groups differ in dimension scales. Then
group perceptual spaces for all CLASCAL groups were calculated. Common and group
perceptual dimensions are compared using various methods.
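The latent-class CLASCAL analysis itself is specialized, but the underlying step of embedding dissimilarity judgements in a low-dimensional perceptual space can be sketched with ordinary metric MDS; the following is a hypothetical stand-in using scikit-learn, not CLASCAL:

```python
# Hypothetical stand-in for the perceptual-space step: metric MDS on the
# listener-averaged dissimilarity matrix (CLASCAL additionally models latent
# listener classes with class-specific dimension weights, which this omits).
import numpy as np
from sklearn.manifold import MDS

def perceptual_space(dissimilarity_matrices, n_dims=3):
    """dissimilarity_matrices: list of (n_tones, n_tones) symmetric arrays, one per listener."""
    mean_d = np.mean(dissimilarity_matrices, axis=0)
    mds = MDS(n_components=n_dims, dissimilarity="precomputed", random_state=0)
    return mds.fit_transform(mean_d)       # (n_tones, n_dims) coordinates
```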
ORAL
Neurocognition of music vs. speech sounds: EEG and fMRI
evidence
M Tervaniemi¹, S Kruck², A Szameitat²,
E Schröger³, W De Baeneº, K Alter², A Friederici²
¹University of Helsinki, Cognitive Brain Research
Unit, Department of Psychology, Helsinki, Finland; ²Max-Planck Institute
for Cognitive Neuroscience, Leipzig, Germany; ³University of Leipzig, Institut
für Allgemeine Psychologie, Leipzig, Germany; ºUniversity of Ghent,
Department of Psychology, Ghent, Belgium
In both music and speech, emotional information is transmitted
via changes in sound frequency, duration, and intensity. The present project
was conducted to compare, in a within-subject design, the neural detection of
changes in sound frequency, duration, and intensity in music vs. speech sounds.
The subjects were presented with music sound patterns and pseudowords. The pseudoword
/ba:ba/ was produced by a trained female native German speaker and thereafter
digitized. The first syllable was stressed as indicated by longer duration,
higher pitch, and higher intensity. The music sounds (digital samples of a saxophone
sound) mimicked /ba:ba/ in the pitch range and the sound duration. As deviants
among pseudoword and music sounds, three changes of the standard sound were
presented: pitch increase, duration decrease, and intensity increase. Subsequent
experiments were conducted with EEG and fMRI techniques.
The data indicate that the neural responses underlying the change detection
in music vs. speech sounds differ from each other in strength as well as in
their topography. Thus these data provide support for the existence of neural
specialization of the human brain to represent sounds with different informational
content.
POSTER
Pattern recognition of musical instruments using Hidden
Markov Models
R Ventura-Miravet, F Murtagh, J Ming
Queen's University Belfast, Computer Science, Belfast,
United Kingdom
Today a large number of musical recordings are
available on the Internet. Several useful applications emerge if users are
able to automatically search this data for particular musical content, e.g.
different musical instruments. In order to search for a particular instrument,
we extract acoustic features given a sound sample and then match these features
against a database of all stored musical instruments by using pattern recognition.
Whilst relatively little research has been conducted on musical instrument recognition,
there already exists a large body of knowledge concerning speaker recognition.
This work is concerned with applying the techniques and modelling methodologies
that are used within speaker recognition to musical instrument recognition.
In particular, we present a study that aims to identify musical instruments
using statistical approaches that reflect the structure of the data. Given samples
of each musical instrument, we perform training to generate an N-state Hidden
Markov Model (HMM) that represents the spectral characteristics and time variability
of musical patterns. The results show that this approach, which is widely used
in speaker recognition, offers a powerful technique for feature extraction and
instrument recognition given musical data. We obtain an average recognition
accuracy rate of approximately 94% when discriminating between six different
musical instruments.
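A hypothetical sketch of this one-HMM-per-instrument scheme, using the hmmlearn package rather than the authors' own implementation, could look as follows:

```python
# Hypothetical sketch of HMM-based instrument classification: one Gaussian HMM
# is trained per instrument on feature sequences (e.g. MFCC frames), and a test
# excerpt is assigned to the model with the highest log-likelihood.
import numpy as np
from hmmlearn import hmm

def train_models(training_data, n_states=4):
    """training_data: dict mapping instrument name -> list of (T, D) feature arrays."""
    models = {}
    for instrument, sequences in training_data.items():
        X = np.concatenate(sequences)
        lengths = [len(seq) for seq in sequences]
        model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=50)
        model.fit(X, lengths)
        models[instrument] = model
    return models

def classify(models, features):
    """features: (T, D) feature array of an unknown excerpt."""
    return max(models, key=lambda name: models[name].score(features))
```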
ORAL
Influence of rhythmic, melodic, and semantic violations
in language and music on the electrical activity in the brain
S Ystad¹, C Magne², S Farner¹, G Pallone¹,
V Pasdeloup³, R Kronland-Martinet¹, M Besson²
¹CNRS, Laboratoire de Mécanique et d'Acoustique,
Marseille, France; ²CNRS, Physiological and Cognitive Neurosciences Institute,
Marseille, France; ³University of Rennes II, Department of linguistics,
Rennes, France
The work presented here is part of a larger multidisciplinary
project combining audio signal processing, linguistics, and cognitive sciences.
It aims at comparing and better understanding music and language processing
in the brain. From a music and speech synthesis point of view, this is important
when striving for naturalness and expressiveness in synthesized music and language.
As a first experiment towards this goal, we developed and used a method to extend
a given part of an audio signal without altering the timbre. This allows manipulations
of the rhythm and was applied to alter syllable lengths in the linguistic stimuli. To
make a similar experiment related to music, note duration and melody of musical
sequences were digitally modified by altering MIDI codes. Participants were
presented with linguistic and musical phrases, and the final words or notes
were either semantically congruous or incongruous (e.g., "I take coffee
with sugar/dog"). Moreover, the second last or the last syllables or notes
were increased in duration, in order to produce rhythmic incongruities. Thus,
the two factors rhythm and semantics/melody were independently manipulated.
Changes in brain electrical activity were measured with electrodes on the scalp
(Event-Related Potentials, or ERPs). Preliminary results show that semantic and
melodic violations elicited different ERP components, while the rhythmic violations
evoked similar ERPs. Final results and developed techniques will be discussed
in the paper.
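A minimal sketch of the general idea of locally lengthening part of a signal without altering its pitch or timbre, assuming librosa's phase-vocoder time_stretch as a stand-in for the authors' own method:

```python
# Minimal sketch (not the authors' method): slice out a segment, time-stretch
# it with librosa's phase-vocoder-based time_stretch, and splice it back in.
import numpy as np
import librosa

def stretch_segment(y, sr, start_s, end_s, factor):
    """Lengthen y[start_s:end_s] (in seconds) by `factor` (>1 means longer)."""
    i0, i1 = int(start_s * sr), int(end_s * sr)
    segment = y[i0:i1]
    # librosa's `rate` is a speed factor: rate < 1 slows down (lengthens).
    stretched = librosa.effects.time_stretch(segment, rate=1.0 / factor)
    return np.concatenate([y[:i0], stretched, y[i1:]])
```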