You can write your own recognition and analysis plugins pretty easily and then overlay those on the spectrogram so you get a sense of what your program is doing and why it is going wrong. I don't think I could have ever successfully gotten a recognizer working if I hadn't found Sonic Visualiser. It's awesome.
Been using this for years to recognise chords with the help of the Chordino extension. Great software, but the controls are a bit cumbersome (scrolling, zooming, etc.). Further, when you do audio editing you rarely work at 0dB, so I've had many a shock tabbing into Sonic Visualiser and pressing play.
That's good to know -- if you'd like to go into any more detail (e.g. how you'd ideally like it to handle levels so it doesn't blow your ears off, or where the controls are most awkward), that would also be useful.
I love Sonic Visualiser. I use it all the time for musical analysis and for slicing audio files for playback in SuperCollider. I only wish it had an annotation layer text file format like Praat does for phonetic analysis of speech audio--I know you can easily export CSV files, but the way Praat uses textgrids always felt more intuitive.
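If it helps as a workaround: since SV does export annotation layers as CSV, you can script the gap yourself. Here's a minimal sketch that turns two-column (time, label) rows into a Praat point-tier TextGrid -- the two-column layout and the `csv_points_to_textgrid` helper are my assumptions, so check them against what your layer actually exports.

```python
import csv
import io

def csv_points_to_textgrid(csv_text, tier_name="labels", xmax=None):
    """Convert rows of (time_in_seconds, label) into a minimal Praat
    point-tier ("TextTier") TextGrid. Assumes the two-column layout of
    an SV time-instants export; adjust if your layer has more columns."""
    points = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue
        points.append((float(row[0]), row[1] if len(row) > 1 else ""))
    points.sort()
    if xmax is None:
        xmax = points[-1][0] if points else 1.0
    lines = [
        'File type = "ooTextFile"',
        'Object class = "TextGrid"',
        "",
        "xmin = 0",
        f"xmax = {xmax}",
        "tiers? <exists>",
        "size = 1",
        "item []:",
        "    item [1]:",
        '        class = "TextTier"',
        f'        name = "{tier_name}"',
        "        xmin = 0",
        f"        xmax = {xmax}",
        f"        points: size = {len(points)}",
    ]
    for i, (t, mark) in enumerate(points, 1):
        lines += [
            f"        points [{i}]:",
            f"            number = {t}",
            f'            mark = "{mark}"',
        ]
    return "\n".join(lines) + "\n"
```

Praat should open the result directly as a TextGrid object; interval tiers would need a little more bookkeeping (start/end times per label) but follow the same pattern.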
I would also highly recommend using Sonic Visualiser to prototype an analysis pipeline, then automate the analysis with the Python vamp plugin host [1]. I think there is a command line interface called Sonic Annotator, but I never used it.
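For anyone curious what that automation looks like: with the `vamp` Python module (pip install vamp) the whole pipeline is a few lines. The plugin key below and the use of librosa for loading audio are just illustrative assumptions -- substitute whatever plugin you prototyped in SV.

```python
def run_plugin(audio, sample_rate, plugin_key):
    """Run a Vamp plugin over a mono float array using the 'vamp'
    Python module. Imported lazily so the pure helper below still
    works without the native library installed."""
    import vamp
    return vamp.collect(audio, sample_rate, plugin_key)

def timestamps_to_csv(times, label=""):
    """Format a list of times (in seconds) as 'time,label' rows,
    suitable for importing back into SV as an annotation layer."""
    return "\n".join(f"{t:.6f},{label}" for t in times)

# Illustrative usage (requires the vamp module plus an installed plugin):
#   import librosa  # one common way to load audio for the vamp host
#   audio, rate = librosa.load("track.wav", sr=None, mono=True)
#   result = run_plugin(audio, rate, "vamp-example-plugins:percussiononsets")
```

The exact shape of `vamp.collect`'s return value depends on the plugin's output type (list, vector, or matrix), so inspect it once before wiring up the rest of the pipeline.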
Yes, the .svl file is a single layer from the session format, in XML. (The session format is itself an XML file, but compressed with bzip2 compression.)
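Which means a session is scriptable with nothing but the standard library -- decompress with bz2, then parse as XML. A minimal sketch (the `parse_sv_session` helper name is mine):

```python
import bz2
import xml.etree.ElementTree as ET

def parse_sv_session(raw_bytes):
    """Parse the bytes of a .sv session file (bzip2-compressed XML)
    and return the XML root element. A plain .svl layer file is
    uncompressed XML and can go straight to ET.fromstring."""
    return ET.fromstring(bz2.decompress(raw_bytes))
```

From there the usual ElementTree traversal (`root.iter(...)`, `root.find(...)`) applies.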
One thing SV does lack is a layer file format that can easily be interchanged between SV itself (for label alignment) and a text editor (for bulk text changes). Neither is there any built-in text editor for editing the whole content of a text transcription at once. I suppose this has to do with the initial focus being on music rather than speech, and having no particular desire to "compete" with Praat.
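In the meantime that interchange is scriptable, since .svl is plain XML. A rough sketch that pulls labels out into tab-separated text you can bulk-edit in any editor -- the `<point frame="..." label="..."/>` structure and the model's `sampleRate` attribute are what I believe SV writes for time-instants layers, but verify against your own files:

```python
import xml.etree.ElementTree as ET

def svl_labels_to_text(svl_xml, sample_rate=None):
    """Extract (time, label) pairs from an .svl annotation layer and
    render them as editable tab-separated text. Assumes point elements
    carry a sample-frame position and a label attribute."""
    root = ET.fromstring(svl_xml)
    if sample_rate is None:
        model = root.find(".//model")
        sample_rate = int(model.get("sampleRate"))
    rows = []
    for p in root.iter("point"):
        t = int(p.get("frame")) / sample_rate
        rows.append(f"{t:.6f}\t{p.get('label', '')}")
    return "\n".join(rows)
```

Going the other way (text back to .svl) is the same idea in reverse, rebuilding the point elements with frame = round(time * sampleRate).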
There's a big difference between just showing a bucketed Fourier transform of the whole audible spectrum, and showing a high-resolution, log-scale, musically-informed visualization of the part that can be heard as musical notes.
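Concretely, the difference is in how bin frequencies get mapped onto the vertical axis. A linear FFT view gives every bin equal height; a musically-informed view spaces them by semitone, so each octave gets equal height. A minimal sketch of that mapping (standard 12-TET with A4 = 440 Hz; helper names are mine):

```python
import math

def bin_to_freq(bin_index, sample_rate, fft_size):
    """Centre frequency in Hz of a linear FFT bin."""
    return bin_index * sample_rate / fft_size

def freq_to_midi(freq_hz):
    """Map a frequency to a (fractional) MIDI note number:
    69 = A4 = 440 Hz, 12 semitones per octave."""
    return 69 + 12 * math.log2(freq_hz / 440.0)
```

Plot bins at `freq_to_midi(bin_to_freq(...))` instead of at the raw bin index and the bottom octaves, where most of the melodic and harmonic content lives, stop being squashed into a few pixels.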
More detailed dedicated visualisations, broader support for analysis plugins, and a focus on visualisation and analysis work rather than editing. (Audacity can use some of the same analysis plugins though.)