Dataset

Detailed description of ragas in the dataset

| RAGA | SCALE | COMMENTS |
|---|---|---|
| Bageshree | S R g m P D n | Distinguished; tanpura tuned to Ma (4) |
| Bahar | S R g m P D n N | Suggests beauty and joyous atmosphere of spring; upper tetrachord, medium-fast tempo |
| Bilaskhani Todi | S r g m P d n | Most movements in lower half of middle octave |
| Jaunpuri | S R g m P d n | Most movements in upper half of middle octave |
| Kedar | S R G m M P D N | Serious; melodic movement not straightforward |
| Marwa | S r G M D N | Major (important) raga; dominance of Re (2); movement can be straightforward |
| Miyan ki Malhar | S R g m P D n N | Associated with rainy season; serious and slow, focus on lower tetrachord |
| Nand | S R G m M P D N | Zigzag movements, especially in descent |
| Shree | S r G M P d N | Serious and awesome; complex melodic movements involving large intervals; zigzag descent |

Table S1: The pitch sets and characteristics of the nine ragas in our dataset (the latter are based on the descriptions on the Music in Motion website). For the scale, lower-case letters refer to the lower (flatter) alternative and upper-case letters to the higher (sharper) alternative of each pitch. This is an extension of Table 2 from the paper.
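The letter-case convention in the caption can be made concrete with a small lookup. The sketch below is illustrative only: the semitone offsets are an assumption (a 12-tone approximation relative to the tonic Sa, with `m` as the lower and `M` as the higher Ma), and `SEMITONES` and `scale_to_semitones` are hypothetical names that do not come from the paper.

```python
# Assumed semitone offsets from the tonic (Sa) in a 12-tone approximation.
# Lower case = lower (flatter) alternative, upper case = higher (sharper) one.
SEMITONES = {
    "S": 0,
    "r": 1, "R": 2,
    "g": 3, "G": 4,
    "m": 5, "M": 6,
    "P": 7,
    "d": 8, "D": 9,
    "n": 10, "N": 11,
}


def scale_to_semitones(scale: str) -> list[int]:
    """Convert a space-separated scale string such as 'S r G M D N'
    into semitone offsets from the tonic."""
    return [SEMITONES[symbol] for symbol in scale.split()]


# Example: the Marwa pitch set as listed in Table S1.
marwa = scale_to_semitones("S r G M D N")
```

A raga that lists both alternatives of a pitch (for example `n N` in Bahar) simply yields both offsets.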

Dataset Description

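| RAGA/SINGER | SCH | CC | AG | SUM |
|---|---|---|---|---|
| Bageshree | 3 | 3 | 3 | 9 |
| Marwa | 4 | 3 | 3 | 10 |
| Bahar | 3 | 3 | 2 | 8 |
| Kedar | 5 | 3 | 2 | 10 |
| Shree | 4 | 3 | 2 | 9 |
| Nand | 3 | 3 | 2 | 8 |
| Miyan ki Malhar | 5 | 3 | 3 | 11 |
| Jaunpuri | 4 | 3 | 3 | 10 |
| Bilaskhani Todi | 4 | 4 | 3 | 11 |
| Sum | 35 | 28 | 23 | 86 |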

Table S2: Number of pieces per singer and raga, where a piece is an alap or pakad recording. Most individual entries in this table comprise 2 alap pieces and 1 pakad piece; alap durations range from 165 to 221 s and pakad durations from 18 to 96 s.

OpenPose skeleton tracking:

Figure S1: The bounding box over which the singer's movements in the video data are normalized. The box is defined by the most extreme positions reached by 11 keypoints (eyes, nose, neck, shoulders, mid-hip, elbows and wrists) along the x and y axes. The x coordinates are normalized over the width of the box and the y coordinates over the height of the box.
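The normalization described in the caption can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the box is computed over all frames of one recording, that coordinates are shifted by the box minimum before dividing (so values fall in [0, 1]), and that missing detections have already been handled. The function name `normalize_keypoints` and the input layout are hypothetical.

```python
def normalize_keypoints(frames):
    """Normalize keypoint positions to a recording-level bounding box.

    frames: list of frames; each frame is a list of (x, y) tuples,
            one per tracked keypoint (assumed layout).
    Returns the same structure with x scaled by the box width and
    y scaled by the box height.
    """
    xs = [x for frame in frames for x, _ in frame]
    ys = [y for frame in frames for _, y in frame]

    # The box is spanned by the most extreme keypoint positions.
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    width, height = x_max - x_min, y_max - y_min
    if width == 0 or height == 0:
        raise ValueError("degenerate bounding box: no movement along one axis")

    # Assumption: subtract the box minimum so the output lies in [0, 1].
    return [
        [((x - x_min) / width, (y - y_min) / height) for x, y in frame]
        for frame in frames
    ]
```

Normalizing x and y separately removes differences in camera framing and singer size, at the cost of not preserving the aspect ratio of the movement.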

Representation of time series from audio/video data

Figure S2 (a): Voicing feature for an example (CC_3b_MM_42) from piece ‘CC_3b_MM’ in our dataset.

Figure S2 (b): Pitch feature for an example (CC_3b_MM_42) from piece ‘CC_3b_MM’ in our dataset.

Figure S2 (c): Video features including the x and y positions of the left and right wrists for an example (CC_3b_MM_42) from piece ‘CC_3b_MM’ in our dataset.

Train-Val splits in terms of pieces

Seen singer Splits:

Val (Singer 1) = Alap_take_1 (Singer 1)

Train (Singer 1) = Alap_take_2 (Singer 1) + Pakad (Singer 1) + All_Pieces (Singer 2) + All_Pieces (Singer 3)

Unseen singer Splits:

Val (Singer 1) = All_Pieces (Singer 1)

Train (Singer 1) = All_Pieces (Singer 2) + All_Pieces (Singer 3)
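The two split rules above can be expressed as simple filters over a list of pieces. The sketch below is a hedged illustration: the dictionary layout with `singer` and `kind` fields, the kind labels (`alap_take_1`, `alap_take_2`, `pakad`), and the function names are assumptions made for this example, not the dataset's actual schema.

```python
def seen_singer_split(pieces, val_singer):
    """Seen-singer split: validate on the first alap take of one singer,
    train on everything else (including that singer's other pieces)."""
    def is_val(piece):
        return piece["singer"] == val_singer and piece["kind"] == "alap_take_1"

    train = [p for p in pieces if not is_val(p)]
    val = [p for p in pieces if is_val(p)]
    return train, val


def unseen_singer_split(pieces, val_singer):
    """Unseen-singer split: validate on all pieces of one singer,
    train only on the remaining singers."""
    train = [p for p in pieces if p["singer"] != val_singer]
    val = [p for p in pieces if p["singer"] == val_singer]
    return train, val
```

Repeating either function with each singer in turn as `val_singer` gives one fold per singer.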