Audio Feature Extraction Toolboxes

I have recently been working on the evaluation of audio feature extraction toolboxes, and I have had a paper on the subject accepted to DAFx. While there is a range of ways to analyse and evaluate each feature extraction toolbox, computation time can be an effective evaluation metric, especially as people within the MIR community are looking at larger and larger data sets. 16.5 hours of audio (8.79 GB) was analysed, and the MFCCs were calculated using eight different feature extraction toolboxes. The computation time for every toolbox was captured, and is shown below.

Toolbox	Time (s)
Aubio	742
Essentia	252
jAudio	840
Librosa	3216
LibXtract	395
Marsyas	526
MIR	1868
YAAFE	211
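
To put these timings in context, the short Python sketch below turns each one into a real-time factor (seconds of audio processed per second of computation) and a slowdown relative to the fastest toolbox. The timings and the 16.5 hour total are taken from this post; the names `TIMES_S`, `AUDIO_SECONDS` and `summarise` are only illustrative.

```python
# Timings in seconds, copied from the results above.
TIMES_S = {
    "Aubio": 742, "Essentia": 252, "jAudio": 840, "Librosa": 3216,
    "LibXtract": 395, "Marsyas": 526, "MIR": 1868, "YAAFE": 211,
}

# 16.5 hours of audio, expressed in seconds.
AUDIO_SECONDS = 16.5 * 3600


def summarise(times, audio_seconds):
    """Return (name, time, real_time_factor, slowdown) rows, fastest first."""
    fastest = min(times.values())
    rows = []
    for name, seconds in sorted(times.items(), key=lambda kv: kv[1]):
        real_time_factor = audio_seconds / seconds  # audio seconds per compute second
        slowdown = seconds / fastest                # 1.0 for the fastest toolbox
        rows.append((name, seconds, real_time_factor, slowdown))
    return rows


for name, seconds, rtf, slowdown in summarise(TIMES_S, AUDIO_SECONDS):
    print(f"{name:<10} {seconds:>5} s  {rtf:7.1f}x real time  {slowdown:5.2f}x slowest-to-fastest ratio")
```

On these numbers, even the slowest toolbox runs well above real time, and the gap between the fastest and slowest is roughly a factor of fifteen.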
The MFCCs were used because they are a computational method that exists within nine of the ten given toolboxes, and so should provide a good basis for comparing computational efficiency. The MFCCs were all calculated with a 512-sample window size and a 256-sample hop size. The input audio is at a variety of sample rates and bit depths, to ensure that the feature extraction tools can handle variable input file formats. This test was run on a MacBook Pro with a 2.9 GHz i7 processor and 8 GB of RAM.
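
As a rough illustration of this kind of measurement, here is a minimal Python sketch of a timing harness. It is not the code used for the results in this post. The window and hop sizes come from the setup described above; `frame_count` and `time_extractor` are hypothetical helper names, and the no-padding framing rule is an assumption, since toolboxes differ in how they pad the ends of a signal.

```python
import time

WINDOW = 512  # samples per analysis window (from the post)
HOP = 256     # samples between the starts of successive windows (from the post)


def frame_count(n_samples, window=WINDOW, hop=HOP):
    """Number of full windows that fit in a signal, assuming no padding."""
    if n_samples < window:
        return 0
    return 1 + (n_samples - window) // hop


def time_extractor(extract, inputs):
    """Total wall-clock seconds spent calling extract(item) on every input."""
    start = time.perf_counter()
    for item in inputs:
        extract(item)
    return time.perf_counter() - start


# Stand-in extractor for demonstration only; a real run would wrap a
# toolbox's MFCC call and pass a list of audio file paths instead.
def dummy_extract(n_samples):
    return [0.0] * frame_count(n_samples)


elapsed = time_extractor(dummy_extract, [44100, 48000, 96000])
print(f"processed 3 dummy inputs in {elapsed:.6f} s")
```

Timing the whole loop with a single `perf_counter` pair keeps the measurement overhead small compared with timing each file separately, at the cost of not showing per-file variation.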

More information will be available in my upcoming paper “An Evaluation of Audio Feature Extraction Toolboxes”, which will be published at DAFx-15 later this year.
