How precisely can you finetune songs with the Yass editor?
To test the accuracy of the Yass editor and compare it with other editors, I have created the Yass Accuracy Test, consisting of an MP3 audio file and a karaoke text file.
The MP3 contains 440 Hz tones of various lengths: 10s, 1s, 500ms, 400ms, 300ms, 200ms, 100ms, 80ms, 50ms, 30ms, 10ms. The tones are separated by one second of silence. The MP3 file has a frame rate that determines how precisely players can seek to a position in the audio stream. Here, the rate is 38 fps, corresponding to roughly 26ms per frame. When you inspect the waveform (for example in Audacity, see tools), you can see that this frame resolution slightly shifts the actual position and length of the tones.
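The 38 fps / 26ms figures follow from the MP3 frame size. The sketch below assumes an MPEG-1 Layer III stream, which holds 1152 samples per frame. It also assumes a sample rate of 44.1 kHz; the article does not state the sample rate, so that value is an assumption. The class and method names are hypothetical.

```java
import java.util.Locale;

final class FrameTiming {
    // MPEG-1 Layer III: every frame holds 1152 samples (per channel).
    static final int SAMPLES_PER_FRAME = 1152;

    // Duration of one frame in milliseconds.
    static double msPerFrame(int sampleRate) {
        return SAMPLES_PER_FRAME * 1000.0 / sampleRate;
    }

    // Number of frames per second of audio.
    static double framesPerSecond(int sampleRate) {
        return (double) sampleRate / SAMPLES_PER_FRAME;
    }

    public static void main(String[] args) {
        int sampleRate = 44100; // assumption: not stated in the test package
        System.out.println(String.format(Locale.ROOT, "%.2f fps, %.2f ms per frame",
                framesPerSecond(sampleRate), msPerFrame(sampleRate)));
        // prints "38.28 fps, 26.12 ms per frame"
    }
}
```

Any seek position is therefore snapped to a multiple of about 26ms. This is why the shortest tones cannot be located more precisely than that.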
The karaoke text file has a resolution of 300 BPM, equivalent to 50ms per quarterbeat; this determines how accurately you can position notes. A tone that lasts 100ms will span two or three quarterbeats, depending on whether it starts exactly on a quarterbeat or in between two quarterbeats. A 10ms tone may span one or two quarterbeats, that is, 50ms or 100ms. For your tests you can double the BPM in the editor by pressing Ctrl-D, but remember that real songs should not exceed 500 BPM for performance reasons.
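The quarterbeat arithmetic above can be checked with a small sketch. It assumes the relation implied by the numbers in the text: one quarterbeat lasts 60000 / (BPM × 4) ms, which gives 50ms at 300 BPM. A tone covering [start, start + length) then spans every quarterbeat it touches. The class and method names are hypothetical.

```java
final class QuarterbeatSpan {
    // Length of one quarterbeat in ms, e.g. 50 ms at 300 BPM.
    static double quarterbeatMs(double bpm) {
        return 60_000.0 / (bpm * 4);
    }

    // Number of quarterbeats touched by a tone [startMs, startMs + lengthMs).
    static int count(double startMs, double lengthMs, double bpm) {
        double q = quarterbeatMs(bpm);
        int first = (int) Math.floor(startMs / q);                   // quarterbeat containing the start
        int endExclusive = (int) Math.ceil((startMs + lengthMs) / q); // first quarterbeat after the end
        return endExclusive - first;
    }

    public static void main(String[] args) {
        System.out.println(quarterbeatMs(300));   // 50.0
        System.out.println(count(0, 100, 300));   // 2: starts exactly on a quarterbeat
        System.out.println(count(25, 100, 300));  // 3: starts between two quarterbeats
        System.out.println(count(0, 10, 300));    // 1
        System.out.println(count(45, 10, 300));   // 2: crosses the 50 ms boundary
        System.out.println(quarterbeatMs(1200));  // 12.5
        System.out.println(count(0, 30, 1200));   // 3
    }
}
```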
The notes were finetuned with Yass 0.9.7. First, I covered all silent sections with notes, placed precisely so that neither their start nor their end contains any sound. Then I filled the remaining gaps with notes for the individual tones. I found that what I see (the waveform) and what I hear (in Yass and in the Ultrastar Deluxe editor) matched what I intended.
Second, I recreated the same finetuning with the Ultrastar Deluxe editor (USDX, see players). The USDX version is also included in the package. To compare them, select both versions in the Yass library, open them in the editor and switch to multi-line view (press Page Down). You can see that all notes have the same position and length in both versions, except one: the 30ms tone. Why is that?
To understand what happens at that specific tone, I moved the 30ms tone into a separate line and switched to single-line view. In the figure above you can see that in Yass (bottom notes, filled) the note ends 1s after the waveform, while in USDX (upper notes, outlined) the note starts 1s after the waveform. Neither version seems correct: the 30ms note should span the entire waveform, that is, two quarterbeats.
I doubled the resolution twice (pressing Ctrl-D twice) to 1200 BPM and played single 12.5ms sections. This revealed gaps, that is, silent frames, in the MP3 where there should have been sound. My hypothesis was that these frames rely on the bit reservoir (an MP3 Layer III optimization): their actual tone information is stored in an earlier frame, outside of the played section. I therefore re-encoded the MP3 with the RazorLame frontend (see tools), using the “disable bit reservoir” option (-nores), et voilà, my assumption was right: now the 30ms tone is spanned by exactly three 12.5ms sections.
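The bit-reservoir hypothesis can be illustrated with a simplified model. In MPEG-1 Layer III, each frame carries a back-pointer, `main_data_begin` (9 bits, up to 511 bytes). This pointer says how many bytes before the frame's own data area its main data actually starts. Those bytes live in the payload of earlier frames.

The sketch below walks backwards to find the earliest frame a given frame depends on. It makes several simplifying assumptions:
- The per-frame payload sizes and back-pointers are hypothetical numbers, not values read from the test MP3.
- Headers and side information are ignored.
- The names are made up for illustration.

```java
final class BitReservoir {
    /**
     * Simplified model: mainDataSize[i] = bytes of main data physically stored in frame i,
     * mainDataBegin[i] = back-pointer of frame i in bytes (0 = no reservoir use).
     * Returns the index of the earliest frame whose bytes frame k needs for decoding.
     */
    static int earliestFrameNeeded(int[] mainDataSize, int[] mainDataBegin, int k) {
        int bytesBack = mainDataBegin[k];
        int j = k;
        // Step back through preceding frames until their payloads cover the back-pointer.
        while (bytesBack > 0 && j > 0) {
            j--;
            bytesBack -= mainDataSize[j];
        }
        return j;
    }

    public static void main(String[] args) {
        int[] size = {400, 400, 400, 400}; // hypothetical payload sizes
        int[] begin = {0, 0, 300, 500};    // hypothetical back-pointers
        for (int k = 0; k < size.length; k++) {
            System.out.println("frame " + k + " needs data from frame "
                    + earliestFrameNeeded(size, begin, k));
        }
        // frames 2 and 3 both depend on bytes stored in frame 1
    }
}
```

In this model, a player that starts decoding at frame 3 without having read frame 1 lacks part of frame 3's data. This is consistent with, though not proof of, the silent frames observed above. Encoding with the reservoir disabled makes every back-pointer 0, so each frame depends only on itself.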
Conclusion: for now I recommend using MP3 files encoded without bit reservoir. Another solution would be to improve the seek implementation in the player that I use (javazoom.jl.player.advanced.AdvancedPlayer). Playback in Yass 0.9.7 currently determines the latest frame before the section to be played (seek-in) and the first frame after the section (seek-out), and then calls AdvancedPlayer.play(seek-in, seek-out).
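The seek-in/seek-out rule described above can be sketched as follows. This is not the Yass source. It assumes MPEG-1 Layer III frames of 1152 samples and a known sample rate, and the class and method names are hypothetical. The two results are the kind of frame indices that would be handed to `AdvancedPlayer.play(int, int)`.

```java
final class SeekRange {
    static final int SAMPLES_PER_FRAME = 1152; // MPEG-1 Layer III

    // Latest frame starting at or before startMs (seek-in).
    static int seekIn(double startMs, int sampleRate) {
        return (int) Math.floor(startMs * sampleRate / (SAMPLES_PER_FRAME * 1000.0));
    }

    // First frame starting at or after endMs (seek-out).
    static int seekOut(double endMs, int sampleRate) {
        return (int) Math.ceil(endMs * sampleRate / (SAMPLES_PER_FRAME * 1000.0));
    }

    public static void main(String[] args) {
        int sampleRate = 44100; // assumption
        int in = seekIn(1000, sampleRate);   // section starts at 1000 ms
        int out = seekOut(1030, sampleRate); // and lasts 30 ms
        System.out.println(in + " " + out);  // 38 40
    }
}
```

Note that this rule only rounds the section outwards to frame boundaries. It does not account for frames whose data lies in the bit reservoir of an earlier frame.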
For further testing, you can set seek-in/out frame offsets in the preferences. The offsets you set will be added before playback. Tell me if you have insights going beyond this article, or if you disagree with the arguments presented here.
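One way to picture the preference offsets is as integers added to the computed frame range, clamped to the valid frames of the file. The sketch below is hypothetical: it assumes the offsets are plain frame counts, and it uses made-up names rather than Yass's actual preference keys. A negative seek-in offset starts playback earlier. This may give the decoder a chance to read reservoir data from preceding frames, at the cost of playing slightly more audio. Whether that actually removes the gaps depends on how the player skips frames.

```java
final class SeekOffsets {
    // Adds frame offsets to [seekIn, seekOut) and clamps to [0, totalFrames].
    static int[] apply(int seekIn, int seekOut, int inOffset, int outOffset, int totalFrames) {
        int in = Math.max(0, seekIn + inOffset);
        int out = Math.min(totalFrames, seekOut + outOffset);
        if (out < in) {
            out = in; // never return a range that ends before it starts
        }
        return new int[] {in, out};
    }

    public static void main(String[] args) {
        int[] range = apply(38, 40, -2, 1, 1000); // hypothetical offsets
        System.out.println(range[0] + " " + range[1]); // 36 41
    }
}
```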