SASLab Manual

Main window : Tools > Labels > Classify labeled sections

Labeled elements can be classified either by means of a spectrogram cross-correlation procedure (option Spectrogram Templates) or by means of a frequency contour comparison that employs a least mean square criterion (option Frequency Contour Templates).

Spectrogram Templates

This classification option identifies syllables by spectrogram cross-correlation. A user-defined set of template spectrograms is compared with each of the labeled sections in a sound file (the sound events can be labeled either manually or automatically). The comparison is done by a two-dimensional spectrogram cross-correlation. The name of the template spectrogram that provides the maximum correlation coefficient (the highest degree of similarity) is taken as the class of the unknown syllable. An additional threshold comparison on the cross-correlation coefficient increases the reliability of the results: if none of the template spectrograms provides a correlation coefficient larger than the user-defined threshold, the sound event will be labeled as unidentified.
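
The decision rule described above (take the template with the maximum coefficient, then apply the threshold) can be sketched in Python as follows. This is only an illustration, not the program's implementation: the function names are hypothetical, spectrograms are assumed to be equally sized lists of rows, and a zero-lag correlation coefficient stands in for the full two-dimensional cross-correlation.

```python
import math

def correlation_coefficient(a, b):
    """Zero-lag correlation coefficient of two equally sized spectrograms.

    Each spectrogram is a list of rows of numbers. Assumption: this is a
    simplified stand-in for the two-dimensional cross-correlation.
    """
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den if den else 0.0

def classify(section, templates, threshold):
    """Return (class name, coefficient); the name is "?" if unidentified."""
    best_name, best_r = "?", float("-inf")
    for name, template in templates.items():
        r = correlation_coefficient(section, template)
        if r > best_r:
            best_name, best_r = name, r
    # The coefficient must be larger than the threshold to be accepted.
    if best_r <= threshold:
        return "?", best_r
    return best_name, best_r

# Hypothetical toy templates: a rising and a falling pattern.
templates = {
    "e1": [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    "e2": [[0, 0, 1], [0, 1, 0], [1, 0, 0]],
}
section = [[0.9, 0.1, 0], [0, 1, 0], [0, 0, 0.8]]
print(classify(section, templates, threshold=0.5)[0])
```

With these toy values the section is assigned to e1; raising the threshold far enough turns the same section into an unidentified one.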

The template spectrograms must previously have been saved as .son files with the spectrogram window command File > Save Spectrogram (ASCII/Binary).... They can be selected with the Select... button or simply by drag-and-drop. The File Open dialog that is launched allows selecting several files at once (by holding down the usual multi-selection keys). All template spectrograms must reside in the same folder. The templates must have been created with matching spectrogram parameters (same FFT length, frame size, overlap and sample rate). Also, the sample rate of the sound file to be examined must match the sample rate at which the template spectrograms were created. If the option use all template files in the selected folder is activated, all .son files in the selected folder will be used as templates.

In order to reject low-frequency noise that might disturb the correlation procedure, a high-pass cutoff frequency can be specified. All spectrogram components below that frequency will be ignored. Similarly, the low-pass cutoff frequency setting will exclude any signal components above the specified frequency.
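
As a rough illustration of the two cutoff settings, the following Python sketch keeps only the spectrogram rows between the two frequencies. It assumes that row k of the spectrogram corresponds to the frequency k * sample rate / FFT length and that the cutoff frequencies themselves are kept; the function name and parameters are hypothetical.

```python
def band_limited_rows(spectrogram, sample_rate_hz, fft_length,
                      high_pass_hz, low_pass_hz):
    """Keep only the frequency rows between the two cutoff frequencies.

    Assumption: row k holds the frequency bin at k * sample_rate / fft_length.
    """
    bin_width_hz = sample_rate_hz / fft_length
    return [row for k, row in enumerate(spectrogram)
            if high_pass_hz <= k * bin_width_hz <= low_pass_hz]

# Five rows at 0, 1000, 2000, 3000 and 4000 Hz (8000 Hz sample rate, FFT length 8).
spectrogram = [[0.1], [0.2], [0.3], [0.4], [0.5]]
kept = band_limited_rows(spectrogram, 8000, 8, high_pass_hz=1000, low_pass_hz=3000)
print(kept)
```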

To tolerate slight frequency deviations between the templates and the unknown sounds, a max frequency deviation can be specified. Depending on the frequency resolution of the template spectrograms and the specified maximum deviation, the cross-correlation will be repeated for various frequency shifts. The maximum correlation coefficient over all shifts is taken as the similarity score. The cross-correlation algorithm is similar to that of the Avisoft-CORRELATOR application.
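
The repetition over frequency shifts can be pictured with the following Python sketch. It is an assumption-laden illustration rather than the actual algorithm: the number of shifts is taken to be the maximum deviation divided by the frequency resolution (bin width), both spectrograms are assumed to have the same size with one row per frequency bin, and all names are hypothetical.

```python
import math

def corr(rows_a, rows_b):
    """Zero-lag correlation coefficient of two equally sized row lists."""
    xs = [v for row in rows_a for v in row]
    ys = [v for row in rows_b for v in row]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den if den else 0.0

def max_correlation_over_shifts(template, section, bin_width_hz, max_deviation_hz):
    """Repeat the correlation for every whole-bin shift within the limit."""
    max_bins = int(max_deviation_hz // bin_width_hz)
    n = len(template)
    best = -1.0
    for k in range(-max_bins, max_bins + 1):
        # Compare template row i with section row i + k where both exist.
        rows = [i for i in range(n) if 0 <= i + k < n]
        if not rows:
            continue
        t = [template[i] for i in rows]
        s = [section[i + k] for i in rows]
        best = max(best, corr(t, s))
    return best

# The section contains the same pattern as the template, one bin higher.
template = [[0, 0, 0], [1, 2, 1], [0, 0, 0], [0, 0, 0]]
section = [[0, 0, 0], [0, 0, 0], [1, 2, 1], [0, 0, 0]]
print(max_correlation_over_shifts(template, section, 100, 0))
print(max_correlation_over_shifts(template, section, 100, 100))
```

Without any tolerance the two patterns do not match; allowing a deviation of one bin width lets the shifted comparison find the pattern.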

The identification threshold determines the rejection of poor identification results. This threshold is applied directly to the cross-correlation coefficients. Any sound event that only provides coefficients below that threshold will be regarded as unidentified (“?”). Higher thresholds provide more reliable classification results. However, slight deviations between the template and the sound in question might then also prevent the proper recognition of class memberships. Adding more templates that cover as many variations as possible can compensate for this problem. In order to get unified class names for all variations, the filenames of the spectrogram templates (.son) that should belong to the same class must start with identical names. The unified class name must be separated from the rest of the filename by either a dot (.) or an underscore (_). For example, a template set whose filenames start with either e1 or e2 before the separator would create only two classes (e1 and e2); e1_v1 and e1_v2 represent variations that refer to the same class.
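
The naming rule can be illustrated with a few lines of Python. The helper name is hypothetical, and the file e2.son is an assumed example; only e1_v1 and e1_v2 are taken from the text above.

```python
import re

def class_name(template_filename):
    """Return the part of the filename before the first dot or underscore."""
    return re.split(r"[._]", template_filename, maxsplit=1)[0]

# e1_v1 and e1_v2 are variations of one class; e2.son is an assumed example.
template_files = ["e1_v1.son", "e1_v2.son", "e2.son"]
classes = sorted({class_name(name) for name in template_files})
print(classes)
```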


The option create automatic classes will perform a cluster analysis in which a new template is created automatically each time the identification threshold is not exceeded with the current set of templates. In this mode of operation, it is not required (but still possible) to define template files manually because they can all be created automatically. The template files will be saved by default into the Avisoft Bioacoustics documents folder or, alternatively, into a manually specified folder (simply drag the desired folder into the Classification Settings dialog box). The button Delete old c*.son files will delete any automatically created .son template files that are present in the current template folder.
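
One simple way to picture this mode is the following Python sketch, in which every section that does not exceed the threshold becomes a new template. It is a sketch under assumptions, not the program's clustering code: the similarity function is supplied by the caller, and the class names c1, c2, ... are a guess based on the c*.son file pattern.

```python
def auto_classify(sections, similarity, threshold, templates=None):
    """Classify sections and create a new template whenever none matches.

    similarity(a, b) is any score where larger means more similar.
    """
    templates = dict(templates or {})
    labels = []
    counter = 0
    for section in sections:
        best_name, best_score = None, float("-inf")
        for name, template in templates.items():
            score = similarity(section, template)
            if score > best_score:
                best_name, best_score = name, score
        if best_name is None or best_score <= threshold:
            # Threshold not exceeded: this section becomes a new template.
            counter += 1
            while f"c{counter}" in templates:
                counter += 1
            best_name = f"c{counter}"  # hypothetical naming scheme
            templates[best_name] = section
        labels.append(best_name)
    return labels, templates

# Toy data: numbers stand in for spectrograms, closeness for correlation.
toy_sections = [0.0, 0.05, 1.0, 0.95, 0.02]
labels, created = auto_classify(toy_sections, lambda a, b: 1 - abs(a - b), 0.8)
print(labels)
```

In this toy run the first and third sections become templates, and the remaining sections are assigned to them.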

The results of the classification procedure can be assigned directly to the underlying section labels by activating the option replace label texts with class names. The section labels of unidentified (?) elements can be removed by activating the option remove unidentified sections. The option keep text of unidentified labels preserves the original label text of a section that could not be identified (otherwise the label receives the ? character). The option ignore labels with text string (except ? and numbers) excludes those labeled sections that already have a text string with at least one letter in it: only labels with text strings such as ?, 1 or 189 will be processed, and labels with strings such as e1 or species a will be excluded from the classification procedure.
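
The rule behind ignore labels with text string (except ? and numbers) amounts to a simple test for letters, sketched here in Python. The function name is hypothetical, and treating an empty label as one to be processed is an assumption.

```python
def is_processed(label_text):
    """True if the label contains no letter and is therefore classified."""
    return not any(ch.isalpha() for ch in label_text)

for text in ["?", "1", "189", "e1", "species a"]:
    print(repr(text), is_processed(text))
```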

Due to the frame-based generation of the spectrograms, there will be (depending on the selected FFT length) short missing intervals at the beginning and end of each label. This effect can be compensated by activating the option compensate for FFT window size, which will extend the boundaries of the created spectrogram (by shifting the start and end positions appropriately). The related zero padding option will set these margins (FFT length / 2) to zero in order to prevent potential overlapping with adjacent sounds.
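
The effect of the two options on the analyzed sample range might be sketched as follows in Python. This is an interpretation, not the program's code: the margin of FFT length / 2 samples on each side follows the text, while the clipping at the file boundaries, the function name and the parameter names are assumptions.

```python
def extract_section(samples, start, end, fft_length,
                    compensate=True, zero_padding=False):
    """Return the samples used to compute the spectrogram of one label."""
    if not compensate:
        return list(samples[start:end])
    margin = fft_length // 2
    if zero_padding:
        # Margins are filled with zeros instead of the adjacent signal.
        return [0.0] * margin + list(samples[start:end]) + [0.0] * margin
    # Margins are taken from the signal, clipped at the file boundaries.
    lo = max(0, start - margin)
    hi = min(len(samples), end + margin)
    return list(samples[lo:hi])

samples = list(range(20))
print(extract_section(samples, 8, 12, fft_length=4))
print(extract_section(samples, 8, 12, fft_length=4, zero_padding=True))
```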

In case multiple templates have been defined for each class, the option classify based on averaged correlation values will perform the classification based on the maximum average correlation coefficient of all templates belonging to each class.
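
The difference between the default decision and the averaged one can be shown with a small Python sketch. The coefficients are made-up illustration values, the helper names are hypothetical, and the class name is assumed to be the filename part before the first dot or underscore.

```python
import re
from collections import defaultdict

def class_name(template_name):
    """Part of the template name before the first dot or underscore."""
    return re.split(r"[._]", template_name, maxsplit=1)[0]

def best_class(scores, averaged):
    """Pick the class by maximum single or maximum averaged coefficient."""
    per_class = defaultdict(list)
    for name, r in scores.items():
        per_class[class_name(name)].append(r)
    if averaged:
        summary = {c: sum(v) / len(v) for c, v in per_class.items()}
    else:
        summary = {c: max(v) for c, v in per_class.items()}
    return max(summary, key=summary.get)

# Made-up coefficients of one section against four templates.
scores = {"e1_v1": 0.9, "e1_v2": 0.3, "e2_v1": 0.7, "e2_v2": 0.7}
print(best_class(scores, averaged=False))
print(best_class(scores, averaged=True))
```

Here the single best template belongs to e1, but the class average favours e2, so the two modes give different results.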

The option max duration difference between template and section label limits the template comparison to templates whose duration is similar to that of the section label to be examined. This can help prevent misclassifications of long, complex sound elements that would otherwise produce high correlation values when compared to short and less complex templates. The maximum duration difference threshold (in ms) should be selected according to the typical durations of the sound elements. For this to work, the total duration of each template spectrogram must reflect the duration of the sound element in it (there should be uniform margins), and the section labels to be examined must exhibit margins similar to those in the templates.
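
The duration check can be sketched in Python as a simple pre-selection of templates. The function name, the dictionary layout and the example durations are hypothetical.

```python
def templates_within_duration(section_ms, template_durations_ms, max_difference_ms):
    """Names of the templates whose duration is close enough to the section."""
    return [name for name, duration in template_durations_ms.items()
            if abs(duration - section_ms) <= max_difference_ms]

# Made-up durations in ms for a short and a long template.
durations = {"short_call": 45.0, "long_call": 300.0}
print(templates_within_duration(50.0, durations, max_difference_ms=20.0))
```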

The section label layer list box determines which labels will be used for the classification procedure.

Frequency Contour Templates

This classification method compares the frequency contours of the labeled sections with template frequency contours. The similarity between the labeled sections and the templates is determined by calculating least mean squares. The frequency contour templates are defined by .ft files that can be created by the Graphic Synthesizer module (the command File > Save As... saves .ft files that are part of an arrangement (.ARR)). The frequency contours themselves can either be drawn manually with the mouse in the Graphic Synthesizer or be derived from spectrograms (spectrogram window command Tools > Scan frequency contour and amplitude envelope...). The frequency contours of the labeled sections are determined by searching for the peak frequencies in the spectrogram, which means that this kind of classification will only work for whistle-like vocalizations that exhibit clear frequency contours. The advantages of this method are that the similarity score is calculated quickly and that the time scale can be varied by applying the time warping option.

The final classification is done by calculating the root mean square of the frequency differences between the two contours. The marked section receives the class name of the template that provides the minimum overall frequency deviation. If the durations of a template and the detected call differ by more than a predefined percentage (as specified in the +- xxx % field in the section titled duration), that template will be skipped for this call. The option normalize duration aligns the time scales of the templates before the similarity score is calculated. The option time warping will repeat the comparison at different time scales.
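
The contour comparison can be sketched in Python as follows. Several details are assumptions made for the sake of a runnable example: contours are lists of frequencies in Hz sampled at a common time step, the duration difference is expressed relative to the template, normalize duration is modelled as linear interpolation of the template to the length of the call, and all names are hypothetical. Time warping is not covered.

```python
import math

def resample(contour, n):
    """Linearly interpolate a contour to n points."""
    if len(contour) == 1 or n == 1:
        return [contour[0]] * n
    out = []
    for i in range(n):
        pos = i * (len(contour) - 1) / (n - 1)
        j = int(pos)
        nxt = contour[min(j + 1, len(contour) - 1)]
        out.append(contour[j] + (pos - j) * (nxt - contour[j]))
    return out

def rms_deviation(a, b):
    """Root mean square of the frequency differences (common length only)."""
    n = min(len(a), len(b))
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a[:n], b[:n])) / n)

def classify_contour(contour, templates, max_duration_diff_percent,
                     normalize_duration=True):
    """Return (class name, RMS deviation) of the closest template."""
    best_name, best_rms = "?", float("inf")
    for name, template in templates.items():
        diff_percent = abs(len(contour) - len(template)) / len(template) * 100
        if diff_percent > max_duration_diff_percent:
            continue  # durations differ too much, skip this template
        candidate = resample(template, len(contour)) if normalize_duration else template
        rms = rms_deviation(contour, candidate)
        if rms < best_rms:
            best_name, best_rms = name, rms
    return best_name, best_rms

# Made-up contours: an upward and a downward sweep.
contour_templates = {"up": [1000, 2000, 3000, 4000], "down": [4000, 3000, 2000, 1000]}
call = [1100, 1700, 2300, 2900, 3500, 4100]
print(classify_contour(call, contour_templates, max_duration_diff_percent=60))
```

The call is 50 % longer than the templates, so it is only compared when the allowed duration difference is at least that large.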

The option Filter Classification Results picks a single class out of the classified labels (by deleting all section labels that have been assigned to other classes). The take class list box selects the desired class. The option replace label texts with index will assign a running index to the filtered labels.
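
The filtering step can be pictured with the following Python sketch. Labels are represented as hypothetical (start, end, text) tuples, and starting the running index at 1 is an assumption.

```python
def filter_class(labels, take_class, replace_with_index=False):
    """Keep only the labels of one class, optionally renumbering them."""
    kept = [(start, end, text) for start, end, text in labels if text == take_class]
    if replace_with_index:
        # Assumption: the running index starts at 1.
        kept = [(start, end, str(i)) for i, (start, end, _) in enumerate(kept, start=1)]
    return kept

# Made-up classified labels with start and end times in seconds.
classified = [(0.1, 0.2, "e1"), (0.5, 0.7, "e2"), (0.9, 1.0, "e1"), (1.4, 1.5, "?")]
print(filter_class(classified, "e1"))
print(filter_class(classified, "e1", replace_with_index=True))
```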

The Default button will set all parameters to their default settings.

The classification procedure is started by clicking the Start button.

The results will be displayed in the Classification Report window.

Avisoft Bioacoustics last modified on 10 September 2019