bextract is one of the most powerful executables provided by Marsyas. It can be used for complete feature extraction and classification experiments with multiple files, and it serves as a canonical example of how audio analysis algorithms can be expressed in the framework. This documentation refers to the latest refactored version of bextract. The old-style bextract, which used the -e command-line option to specify the feature extractor, is still supported but its use is discouraged.
Suppose that you want to build a real-time music/speech discriminator based on a collection of music files named music.mf and a collection of speech files named speech.mf. These collections can be created either manually or with the mkcollection utility. The following command line extracts the means and variances of timbral features (time-domain Zero Crossings, Spectral Centroid, Rolloff, Flux, and Mel-Frequency Cepstral Coefficients (MFCC)) over a texture window of 1 second.
bextract music.mf speech.mf -w ms.arff -p ms.mpl -cl GS
bextract ms.mf -w ms.arff -p ms.mpl
bextract -mfcc classical.mf jazz.mf rock.mf -w genre.arff
The first two commands are equivalent, assuming that ms.mf is a labeled collection containing the same files as music.mf and speech.mf. The third command specifies that only the MFCC features should be extracted and is an example of classification with three classes.
The results are stored in ms.arff, a text file holding the feature values, which can be used in the Weka machine learning environment for experimentation with different classifiers. After a header describing the features (attributes in Weka terminology), it consists of lines of comma-separated feature values; each line corresponds to one feature vector. The attributes in the generated .arff file have long descriptive names that show the process used to calculate each attribute. In order to associate filenames with the subsequences of feature vectors corresponding to them, each subsequence is prefixed by the filename as a comment in the .arff file. The file is plain text and straightforward to parse; viewing it in a text editor will make the layout clearer.
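To make that layout concrete, a generated .arff file looks roughly like the following sketch. The attribute names here are illustrative placeholders, much shorter than the long descriptive names bextract actually writes, and the values are made up:

```text
@RELATION music_speech

@ATTRIBUTE Mean_ZeroCrossings NUMERIC
@ATTRIBUTE Mean_SpectralCentroid NUMERIC
@ATTRIBUTE output {music,speech}

@DATA
% filename music/track1.wav
0.0421,0.1532,music
0.0398,0.1489,music
% filename speech/talk1.wav
0.1103,0.0712,speech
```

Lines beginning with % are comments in the ARFF format, which is how the per-file filename annotations can coexist with Weka's parser.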
In addition to Weka, the native Marsyas tool kea can be used to perform evaluations (cross-validation, accuracies, confusion matrices) similar to Weka, although with more limited functionality.
While the features are being extracted, a classifier (in the example above a simple naive Bayes, or Gaussian, classifier selected by -cl GS) is trained. When feature extraction is completed, the whole network of feature extraction and classification is stored in ms.mpl and can be used directly as a Marsyas plugin for real-time audio classification.
The resulting plugin makes a classification decision every 20 ms but aggregates the results by majority voting (using the Confidence MarSystem) to display time-stamped output approximately every 1 second. The whole network is stored in ms.mpl, which is loaded into sfplugin; the file to be classified is then played and classified at the same time. The screen output shows the classification results and confidence. The last two commands show that live run-time classification can also be performed directly with bextract. In all cases, collections can be used instead of single files.
sfplugin -p ms.mpl music_file_to_be_classified.wav
sfplugin -p ms.mpl speech_file_to_be_classified.wav
bextract -e ms.mf -tc file_to_be_classified.wav
bextract -e ms.mf -tc collection_to_be_classified.wav
Using the command-line option -sv turns on single-vector feature extraction, where one feature vector is extracted per file. The single-vector representation is useful for many Music Information Retrieval (MIR) tasks such as genre classification, similarity retrieval, and visualization of music collections. The following command can be used to generate a Weka file for genre classification with one vector per file.
bextract -sv cl.mf ja.mf ro.mf -w genres.arff -p genres.mpl
The resulting genres.arff file has only one feature-vector line for each sound file in the collections. In this case, where no -cl command-line argument is specified, a linear Support Vector Machine (SVM) classifier is used instead.
Feature sets are collections of features that can be included in the feature extraction. bextract includes several individual feature sets proposed in the MIR and audio analysis literature, as well as some common combinations of them (for details and the most up-to-date list of supported sets, experienced users can consult the selectFeatureSet() function in bextract.cpp). The feature sets can be separated into three large groups depending on which front-end is used: time-domain, spectral-domain, and LPC-based.
The following feature sets are supported (for definitions consult the MIR literature, check the corresponding code implementations, and send us email with questions about details you don't understand):
By default, stereo files are downmixed to mono by summing the two channels before extracting features. However, bextract also supports the extraction of features based on stereo information. There are feature sets that can only be extracted from stereo files. In addition, it is possible to use any of the feature sets described above and extract features for both the left and right channels, which are concatenated to form a single feature vector.
For example, the first command below calculates MFCCs for both the left and right channels. The second command calculates the Stereo Panning Spectrum Features, which require both channels, as well as the Spectral Centroid for both left and right.
bextract -st -mfcc mymusic.mf -w mymusic.arff
bextract -spsf -st --SpectralCentroid -w mymusic.arff
The feature extraction can be configured in many ways, only some of which are exposed through command-line options. The following options can be used to control various aspects of the feature extraction process (most of the default values assume a 22050 Hz sampling rate):
bextract also supports a mode, called Timeline mode, that allows labeling different sections of an audio recording with different labels. For example, you might have a number of audio files of orca recordings with sections of voiceover, background noise, and orca calls, and want to train a classifier to recognize each of these types of signal. Instead of a label associated with each file in the collection, there is an associated Marsyas timeline file (the format is described below). Running bextract in Timeline mode involves two steps: training a classifier:
bextract -t songs.mf -p out.mpl -pm

Where:
-t songs.mf - a collection file with a song name and its corresponding .mtl (Marsyas Timeline) file on each line
-p out.mpl - the Marsyas plugin to be generated
-pm - mute the output plugin
and predicting labels for a new audio recording:
sfplugin -p out.mpl songmono.wav

Where:
-p out.mpl - the plugin output by bextract in step #1
The songs.mf file is a Marsyas collection file with the path to a song (usually .wav) file and its corresponding Marsyas Timeline (.mtl) file on each line. Here is an example songs.mf file:
/path/to/song1.wav \t /path/to/song1.mtl
/path/to/song2.wav \t /path/to/song2.mtl
/path/to/song3.wav \t /path/to/song3.mtl
Please note that the separator character \t must be an actual tab character; it cannot be any other kind of whitespace.
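Since many editors silently expand tabs into spaces, one reliable way to build such a collection file is with printf, whose \t escape always emits a literal tab (the paths here are placeholders):

```shell
# Append one song/timeline pair to songs.mf with a guaranteed literal tab
printf '%s\t%s\n' /path/to/song1.wav /path/to/song1.mtl >> songs.mf
```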
The .mtl format has three header lines, followed by blocks of 4 lines for each annotated section. The format is:
HEADER:
-------
number of regions
line size (=1)
total size (samples)

FOR EACH REGION:
----------------
start (samples)
classId (mrs_natural)
end (samples)
name (mrs_string)
3
1
2758127
0
0
800000
voiceover
800001
1
1277761
orca
1277762
2
2758127
background
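Assuming the three-header-line, four-lines-per-region layout described above, the example file can be written and sanity-checked from the shell like this (a sketch, not part of bextract; example.mtl is a placeholder name):

```shell
# Write the three-region example .mtl shown above, one field per line
printf '%s\n' 3 1 2758127 \
  0       0 800000  voiceover \
  800001  1 1277761 orca \
  1277762 2 2758127 background > example.mtl

# The header's region count must agree with the body: 3 header lines
# plus 4 lines (start, classId, end, name) per region
n=$(head -n 1 example.mtl)
[ "$(wc -l < example.mtl)" -eq $((3 + 4 * n)) ] && echo "layout OK"
```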
Because the .mtl format is somewhat obtuse, we have written a small Ruby program to convert Audacity label files to .mtl format. This script can be found at marsyas/scripts/generate-mtl.rb. The script is currently hardcoded to recognize the chord changes from songs in the annotated Beatles archive, but you can easily change this by modifying the "chords_array" variable.
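As a rough illustration of that conversion (a hypothetical sketch, not the bundled generate-mtl.rb): an Audacity label track has one tab-separated line per label (start seconds, end seconds, label text), and the region blocks of an .mtl can be derived from it with awk, assuming a 22050 Hz sample rate. The three header lines would still need to be prepended.

```shell
# Sample Audacity label track: start_sec <TAB> end_sec <TAB> label
printf '0.0\t36.28\tvoiceover\n36.28\t57.95\torca\n' > labels.txt

# Emit start / classId / end / name lines, converting seconds to samples;
# class ids are assigned in order of first appearance
awk -F'\t' -v sr=22050 '
  { if (!($3 in id)) id[$3] = n++
    printf "%d\n%d\n%d\n%s\n", $1 * sr, id[$3], $2 * sr, $3 }
' labels.txt > regions.txt
```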