From the Speech group:

NICO ANN Toolkit
Higgins Annotation Tool
Resources for Automatic Speech Recognition

From the Music Acoustics group:

Automatic Music Performance
PlayRec (NEW)
RealSimPLE (NEW)
SMP Tools


WaveSurfer is an Open Source tool for sound visualization and manipulation. It has been designed to suit both novice and advanced users. WaveSurfer has a simple and logical user interface that provides functionality in an intuitive way and which can be adapted to different tasks. It can be used as a stand-alone tool for a wide range of tasks in speech research and education. Typical applications are speech/sound analysis and sound annotation/transcription. WaveSurfer can also serve as a platform for more advanced/specialized applications. This is accomplished either through extending the WaveSurfer application with new custom plug-ins or by embedding WaveSurfer visualization components in other applications.

WaveSurfer home page - Download page - WaveSurfer manual


The Snack Sound Toolkit is designed to be used with a scripting language such as Tcl/Tk or Python. Using Snack you can create powerful multi-platform audio applications with just a few lines of code. Snack has commands for basic sound handling, e.g. sound card and disk I/O. Snack also has primitives for sound visualization, e.g. waveforms and spectrograms. It was developed mainly to handle digital recordings of speech, but is just as useful for general audio. Snack has also successfully been applied to other one-dimensional signals. The combination of Snack and a scripting language makes it possible to create sound tools and applications with a minimum of effort. This is due to the rapid development nature of scripting languages. As a bonus you get an application that is cross-platform from start. It is also easy to integrate Snack based applications with existing sound analysis software.

Snack home page - Download page - Snack manual

ESPS source code from the ESPS/waves+ package

The ESPS Toolkit has been licensed to the Centre for Speech Technology thanks to a generous donation from Microsoft and AT&T. An archive of source files is available for download. Only source code from the ESPS library is provided, no source code for waves. Note that the archive is provided "as is". Neither the authors nor the distributors can provide any form of support for this code. Please refer to the included file README.TXT for a full disclaimer. This software is licensed using a BSD style license, see the included file BSD.txt.

Download file

The NICO ANN tool-kit

The NICO Toolkit is an artificial neural network toolkit specifically designed and optimized for automatic speech recognition applications. Networks with both recurrent connections and time-delay windows are easily constructed. The network topology is very flexible -- any number of layers is allowed and layers can be arbitrarily connected. Tools for extracting input-features from the speech signal are included as well as tools for computing target values from standard phonetic label-files.

NICO toolkit home page at SourceForge

The Broker System

The Department of Speech, Music and Hearing needed to develop applications reusing functionality in existing program modules, in some cases distributed over several machines connected to the Internet. This required a method for interprocess communication (IPC) between the modules. The Broker is a server which forwards function calls, results and error codes between program modules over the Internet, and it should fulfill the following criteria:

  • Easy to use in programs
  • Platform independent
  • Uniform interface for all modules

The Broker home page - Download page - Broker documentation

SMP Tools

These programs are made available to you by Svante Granqvist

  • CircFFT
    An interesting way to display FFTs in real time which may be musically relevant
  • Madde
    An additive, real-time, singing synthesiser
  • Swing
    A metronome with adjustable swing factor
  • RTSect
    A real-time dual channel spectrum display
  • TombStone
    A sweep generator/recorder for loudspeaker and microphone measurement
  • Tone
    A real-time tone generator

The SMPTool home page - Home page of Svante Granqvist

Software for Automatic Music Performance

Director Musices

Director Musices (DM) is a program implementing all previously defined rules. Features in DM includes polyphony, midi input/output, performance variable graphs and user rule definition. It is available for GNU/LInux, Macintosh and Windows.


JAPER is a Java applet that can run both under Windows or PowerMac systems. It works in real-time with the MIDI system/hardware of the client machine.


Melodia is a freeware program for Windows 3.* and upper versions to perform music scores. Melodia can load files in different formats (MIDI, CSound, Melodia, Adagio).

Music performance download page

Playrec: Multi-channel Matlab Audio

Playrec by Robert Humphrey was first written during his degree project work here at TMH. Playrec is a Matlab utility (MEX file) that provides simple yet versatile access to soundcards using PortAudio, a free, open-source audio I/O library. It can be used on different platforms (Windows, Macintosh, Unix). It accesses the soundcard(s) via different host API's, including ASIO, WMME and DirectSound under Windows. In particular, Playrec supports non-blocking, continuous, synchronous, multichannel audio I/O for Matlab.

PlayRec official webpage


RealSimple is a free physics teachers' resource for doing acoustics experiments related to music. It contains instructions for building low-cost experiment setups, free software and guidance for doing live signal analysis on classroom computers, and various computer simulations of vibrations in strings and pipes. There is a user forum which is monitored by the KTH Music Acoustics group. The materials also have in-context connections to the larger acoustics and signal processing knowledge base at Stanford University (CCRMA). The Swedish RealSimple site is targeted toward upper secondary school (gymnasium), while the RealSimple web at Stanford is more oriented toward tertiary education. RealSimple was produced with support from the Wallenberg Global Learning Network.

RealSimPLE at KTH - RealSimPLE at Stanford

Higgins Annotation Tool (HAT)

The Higgins Annotation Tool can be used to transcribe and annotate speech with one or more audio tracks (such as dialogue). For each audio track, a number of audio segments are defined. Each audio segment can then be transcribed. Within each transcription, text segments (such as syntactic phrases) may be defined. A set of feature-value pairs may then be annotated for the tracks, audio segments and text segments. The annotation is saved in an XML format.

The HAT home page

Published by: TMH, Speech, Music and Hearing

Last updated: 2013-10-21