Dataset usage

#1, opened by JonasWeinert

Fascinating idea! Which dataset did you use for fine-tuning?

I used a video dataset that I collected myself. I am trying to do continuous sign language recognition, but I am getting very poor accuracy. My dataset consists of skeleton videos extracted from the original videos. Any suggestions?

Is it words or sentences? I reckon the temporal dimension makes it tricky and requires a massive number of samples. How do you extract the skeletons? Do you model time simultaneously or separately?
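To make the temporal question concrete: clip-level video classifiers usually see only a fixed number of frames per video, so how those frames are picked decides whether every sign in a sentence is visible to the model. The sketch below is a minimal, hypothetical helper (`uniform_frame_indices` is not a library function) that spreads the samples evenly over the whole clip. The default of 16 frames is an assumption chosen for illustration, not a value taken from this thread.

```python
def uniform_frame_indices(num_frames_in_video: int, clip_len: int = 16) -> list[int]:
    """Return clip_len frame indices spread evenly across the whole video.

    Hypothetical helper. clip_len=16 is an illustrative default only.
    """
    if num_frames_in_video <= 0:
        raise ValueError("video has no frames")
    if clip_len <= 0:
        raise ValueError("clip_len must be positive")
    # Split the video into clip_len equal-width segments and take the frame
    # at the centre of each one, so the start, middle and end of the
    # sentence are all represented.
    step = num_frames_in_video / clip_len
    # For videos shorter than clip_len, some indices repeat (frames are reused).
    return [min(int(step * i + step / 2), num_frames_in_video - 1) for i in range(clip_len)]


# A 160-frame video yields one frame from the middle of every 10-frame segment.
print(uniform_frame_indices(160))
```

Sampling a short window instead (consecutive frames from one random offset) would often show the model only one or two of the 3 to 5 signs, which is one plausible reason sentence-level accuracy could suffer.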

It is sentences; each sentence has 3 to 5 words. I extract the skeleton with MediaPipe, redraw only the skeleton key points, and use the result as the input video. I have 15 sentences, with between 24 and 38 videos per sentence. I followed the example code in "Video classification using transformers" on the Hugging Face site.
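Following a video classification example means each whole video gets one label, so this setup is effectively 15-way sentence classification rather than continuous recognition (which would predict a sequence of signs). The sketch below assumes that framing: it builds `label2id`/`id2label` dictionaries of the kind classification models are usually configured with, and does a per-sentence (stratified) train/validation split so that, with only 24 to 38 clips per class, every sentence appears in both sets. The file names and sentence strings are hypothetical placeholders, and the 20% validation fraction is an arbitrary choice.

```python
import random
from collections import defaultdict


def build_label_maps(sentences):
    """Map each distinct sentence to an integer class id, and back."""
    unique = sorted(set(sentences))
    label2id = {s: i for i, s in enumerate(unique)}
    id2label = {i: s for s, i in label2id.items()}
    return label2id, id2label


def stratified_split(samples, val_fraction=0.2, seed=0):
    """Split (video_path, sentence) pairs per sentence so each class is in both sets."""
    by_label = defaultdict(list)
    for path, sentence in samples:
        by_label[sentence].append((path, sentence))
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train, val = [], []
    for sentence in sorted(by_label):
        group = by_label[sentence][:]
        rng.shuffle(group)
        # At least one validation clip per class, but never the whole class.
        n_val = max(1, round(len(group) * val_fraction))
        n_val = min(n_val, len(group) - 1)
        val.extend(group[:n_val])
        train.extend(group[n_val:])
    return train, val


# Hypothetical dataset: 15 sentences with 24..38 clips each.
samples = [
    (f"clip_{s:02d}_{k:03d}.mp4", f"sentence_{s:02d}")
    for s in range(15)
    for k in range(24 + s)
]
label2id, id2label = build_label_maps(sentence for _, sentence in samples)
train, val = stratified_split(samples)
print(len(label2id), len(train), len(val))
```

A random split over all clips could, by chance, leave a sentence with very few training examples; splitting inside each class avoids that.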
