Higgins Annotation Tool

Copyright © 2009-2011, Gabriel Skantze.

The Higgins Annotation Tool can be used to transcribe and annotate speech with one or more audio tracks (such as dialogue).

For each audio track, a number of audio segments are defined. Each audio segment can then be transcribed. Within each transcription, text segments (such as syntactic phrases) may be defined. A set of feature-value pairs may then be annotated for the tracks, audio segments and text segments.
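
The hierarchy described above (tracks contain audio segments, audio segments carry a transcription with text segments, and each level can hold feature-value pairs) can be pictured with a minimal Python sketch. All class and field names here are hypothetical and chosen only for illustration; they are not taken from the tool's XML schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TextSegment:
    # Character offsets into the parent transcription (end is exclusive).
    start: int
    end: int
    features: Dict[str, str] = field(default_factory=dict)


@dataclass
class AudioSegment:
    # Position of the segment within its audio track, in milliseconds.
    start_ms: int
    end_ms: int
    transcription: str = ""
    text_segments: List[TextSegment] = field(default_factory=list)
    features: Dict[str, str] = field(default_factory=dict)


@dataclass
class Track:
    audio_file: str
    segments: List[AudioSegment] = field(default_factory=list)
    features: Dict[str, str] = field(default_factory=dict)


# One track with one transcribed audio segment (hypothetical example data).
track = Track(audio_file="speaker1.wav", features={"speaker": "user"})
segment = AudioSegment(start_ms=1200, end_ms=2900,
                       transcription="go to the red house")
segment.features["dialogue_act"] = "request"

# Mark the phrase "the red house" as a text segment with its own feature.
phrase = "the red house"
offset = segment.transcription.index(phrase)
segment.text_segments.append(
    TextSegment(offset, offset + len(phrase), {"phrase": "NP"})
)
track.segments.append(segment)
```

Features are plain string-to-string mappings at every level, which mirrors the "feature-value pairs" wording; the actual file format may represent them differently.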

The annotation is saved in an XML format (schema). To start a new annotation, you can either open a WAV file (stereo or mono) directly in the tool by choosing "New..." from the File menu, or precompile the annotation XML file and open that. To open and annotate more than one audio file simultaneously, you must use the latter option.


Binaries for Windows (a single .exe file):
Download Higgins Annotation Tool (updated 2011-03-10)

Keyboard shortcuts

Control-Space        Play selection, current track
Control-Alt-Space    Play selection/segment, all tracks
                     Play selection/segment from 10%, 20% ... 90%, current track
                     Play selection/segment from 10%, 20% ... 90%, all tracks
Esc                  Stop playing
Control-A            Create audio segment from waveform selection
Control-T            Create text segment from text selection
Down                 Move to next audio segment
Up                   Move to previous audio segment
Control-Down         Move to next audio segment in the same track
Control-Up           Move to previous audio segment in the same track
Alt-Down             Move to next non-segment
Alt-Up               Move to previous non-segment
Control-B            Move cursor from feature annotation to transcription
Control-Z            Zoom selection


Automatically segmenting audio

As part of the IrisTK dialogue system framework, there is a tool for automatically creating the audio segments, using a voice activity detector. If you install IrisTK, you should be able to run (from the command line):

iristk makehat -i speaker1.wav speaker2.wav -o annotation.xml -e 20

You may have to change the energy threshold (20), depending on the volume used when recording. You can also change the end silence threshold (for example, to 1000 milliseconds) by adding "-s 1000". Running "iristk makehat" without arguments lists all options.
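
To make the roles of the two thresholds concrete, here is a minimal Python sketch of energy-based segmentation. It is not IrisTK's voice activity detector: the use of per-frame RMS amplitude as the "energy", the 10 ms frame size, the 500 ms default and the function name are all assumptions made for illustration, and the scale of IrisTK's "-e" value may differ.

```python
import math
from typing import List, Sequence, Tuple


def segment_by_energy(samples: Sequence[float], sample_rate: int,
                      energy_threshold: float, end_silence_ms: int = 500,
                      frame_ms: int = 10) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) spans whose frame energy exceeds a threshold.

    A span is closed only after end_silence_ms of consecutive quiet frames,
    so short pauses inside an utterance do not split it.
    """
    frame_len = max(1, sample_rate * frame_ms // 1000)
    segments: List[Tuple[int, int]] = []
    start = None          # start time of the span being built, if any
    last_voiced_end = 0   # end time of the most recent loud frame
    silence_ms = 0        # consecutive silence seen since that frame

    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        t0 = i * 1000 // sample_rate
        t1 = (i + len(frame)) * 1000 // sample_rate
        if rms >= energy_threshold:
            if start is None:
                start = t0
            last_voiced_end = t1
            silence_ms = 0
        elif start is not None:
            silence_ms += t1 - t0
            if silence_ms >= end_silence_ms:
                segments.append((start, last_voiced_end))
                start = None

    if start is not None:  # audio ended while a span was still open
        segments.append((start, last_voiced_end))
    return segments


# Synthetic signal at 1000 Hz: speech, a 300 ms pause, then more speech.
signal = [0.0] * 100 + [50.0] * 200 + [0.0] * 300 + [50.0] * 100 + [0.0] * 50
short_pause = segment_by_energy(signal, 1000, energy_threshold=20, end_silence_ms=200)
long_pause = segment_by_energy(signal, 1000, energy_threshold=20, end_silence_ms=1000)
```

With a 200 ms end silence threshold the 300 ms pause splits the signal into two segments; with 1000 ms the pause is bridged and a single segment results. Raising the energy threshold has the complementary effect of ignoring quieter frames altogether.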

You can also do automatic transcription using the Nuance cloud-based recognizer. To do this, you first have to register for a free developer account. Then insert your app-id and app-key in the IrisTK/addon/Dragon folder, according to the instructions in the readme file located there. If everything is done correctly, you should be able to do recognition by adding "-r en-us" to the makehat command.
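
When processing many recordings, it can be convenient to assemble the makehat command line from a script. The following Python sketch uses only the options mentioned above ("-i", "-o", "-e", "-s", "-r"); the helper name and its parameters are hypothetical, and it assumes that "iristk" is available on the PATH.

```python
import shlex
from typing import List, Optional, Sequence


def makehat_command(inputs: Sequence[str], output: str,
                    energy: Optional[int] = None,
                    end_silence_ms: Optional[int] = None,
                    recognizer_lang: Optional[str] = None) -> List[str]:
    """Build an argument list for 'iristk makehat' (hypothetical helper)."""
    cmd = ["iristk", "makehat", "-i", *inputs, "-o", output]
    if energy is not None:
        cmd += ["-e", str(energy)]            # energy threshold
    if end_silence_ms is not None:
        cmd += ["-s", str(end_silence_ms)]    # end silence threshold (ms)
    if recognizer_lang is not None:
        cmd += ["-r", recognizer_lang]        # enable recognition, e.g. "en-us"
    return cmd


cmd = makehat_command(["speaker1.wav", "speaker2.wav"], "annotation.xml",
                      energy=20, end_silence_ms=1000, recognizer_lang="en-us")
command_line = shlex.join(cmd)
print(command_line)

# To actually run it (requires an IrisTK installation):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single string avoids quoting problems with file names that contain spaces.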


If you have any questions regarding the tool, please contact Gabriel Skantze.

Published by: TMH, Speech, Music and Hearing

Last updated: 2015-01-22