Structural Segmentation of Dhrupad Vocal Bandish Audio

Supplementary material for poster presentation at ISMIR 2020




Video presentation at ISMIR 2020




Example audio from a section in a concert

The ground-truth metric tempo (m.t.), surface tempo (s.t.), and section boundary annotations for a portion of one recording in the dataset. The audio clips below are taken from the section inside the large dashed blue rectangle.

Mixture
Vocals
Pakhawaj

Audio source: https://musicbrainz.org/recording/178b4cf6-88e6-414d-bfbd-3d90bb368a9a




Examples of different surface tempo multiples

Each figure below is a spectrogram of an 8-second example (Frequency in Hz on y-axis and time in seconds on x-axis).

A dashed white box of width 2.5 seconds is drawn on each spectrogram to highlight the differences in onset density between the examples. The metric tempo (m.t.) in all the examples is between 50 and 60 BPM, so in each case the number of strokes or syllables we expect to see within the dashed box is roughly twice the surface tempo multiple (s.t.m.). The 1-second-wide inset in figure (d) highlights the distinct nature of vocalisation at s.t.m. = 8.
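The "roughly twice the s.t.m." estimate can be checked with a short calculation. The sketch below assumes that the surface tempo equals the metric tempo multiplied by the s.t.m. and that onsets are evenly spaced; the function name `expected_onsets` is illustrative and not taken from the paper's code.

```python
def expected_onsets(metric_tempo_bpm: float, stm: int, window_s: float = 2.5) -> float:
    """Expected number of strokes/syllables in a window of window_s seconds.

    Assumption: surface tempo = metric tempo x s.t.m., with evenly spaced onsets.
    """
    surface_tempo_bpm = metric_tempo_bpm * stm
    onsets_per_second = surface_tempo_bpm / 60.0
    return onsets_per_second * window_s


# Range of expected onset counts for metric tempi of 50 and 60 BPM.
for stm in (1, 2, 4, 8, 16):
    low = expected_onsets(50, stm)
    high = expected_onsets(60, stm)
    print(f"s.t.m. = {stm:2d}: {low:.1f} to {high:.1f} onsets in 2.5 s")
```

Under these assumptions the count per box lies between about 2.1 and 2.5 times the s.t.m., which is consistent with the "roughly twice" rule of thumb.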


(a) Mixture, s.t.m. = 1
(b) Vocals, s.t.m. = 2
(c) Vocals, s.t.m. = 4
(d) Vocals, s.t.m. = 8

(e) Pakhawaj, s.t.m. = 2
(f) Pakhawaj, s.t.m. = 4
(g) Pakhawaj, s.t.m. = 8
(h) Pakhawaj, s.t.m. = 16



YouTube playlist of selected Dhrupad vocal music

(The last couple of videos are lecture demonstrations which offer a detailed explanation of the music form)




Some other model architectures that were experimented with

[1] H. Schreiber and M. Müller, "A Single-Step Approach to Musical Tempo Estimation Using a Convolutional Neural Network", ISMIR 2018
(a) - (g) Training (in blue) and validation (in orange) loss curves for the different architecture variations (inset: the last 70 epochs). (h) Training and validation accuracies at the minimum validation loss for each model.
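Reporting accuracies "at the minimum validation loss" amounts to picking the epoch where the validation loss is lowest and reading off the accuracies recorded at that epoch. The following is a minimal sketch of that selection; the function name and the per-epoch numbers are made up for illustration and are not results from this work.

```python
def metrics_at_min_val_loss(val_loss, train_acc, val_acc):
    """Return (epoch index, training accuracy, validation accuracy)
    at the epoch with the lowest validation loss."""
    best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)
    return best_epoch, train_acc[best_epoch], val_acc[best_epoch]


# Hypothetical per-epoch histories (illustrative values only).
val_loss = [0.90, 0.62, 0.55, 0.58, 0.61]
train_acc = [0.60, 0.74, 0.81, 0.86, 0.90]
val_acc = [0.58, 0.70, 0.76, 0.75, 0.74]

epoch, tr, va = metrics_at_min_val_loss(val_loss, train_acc, val_acc)
print(f"best epoch: {epoch}, train acc: {tr:.2f}, val acc: {va:.2f}")
```

Selecting by validation loss rather than by the final epoch avoids reporting accuracies from a point where the model has started to overfit.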



Musical description of the bandish sections

A table describing each section in a bandish



References

  • M. Clayton, Time in Indian Music: Rhythm, Metre, and Form in North Indian Rāg Performance. Oxford, England: Oxford University Press, 2000.
  • M. A. Rohit and P. Rao, “Structure and automatic segmentation of Dhrupad vocal bandish audio,” Unpublished technical report, arXiv:2008.00756 [eess.AS], 2020.
