Japanese Corpus Queries

#1
by ShortText - opened

Is the Japanese corpus used for T5 or mT5 based on Kanji-only text, or is everything mixed (Kanji, Katakana, Hiragana)?

I used mixed data.
The data came from the Japanese Wikipedia dump, the Japanese portion of OSCAR, and the Japanese portion of CC-100.
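
If you want to check how the scripts are mixed in a sample of your own text, here is a minimal sketch. It assumes the standard Unicode blocks for Hiragana (U+3040 to U+309F), Katakana (U+30A0 to U+30FF), and the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) as a stand-in for Kanji. The function names `script_of` and `script_counts` are hypothetical helpers for illustration, not part of T5, mT5, or any of the corpora above.

```python
from collections import Counter


def script_of(ch: str) -> str:
    """Classify one character by Unicode block (illustrative, not exhaustive)."""
    cp = ord(ch)
    if 0x3040 <= cp <= 0x309F:
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF:
        return "katakana"
    if 0x4E00 <= cp <= 0x9FFF:
        # Basic CJK Unified Ideographs only; extension blocks are ignored here.
        return "kanji"
    return "other"


def script_counts(text: str) -> Counter:
    """Count non-whitespace characters per script."""
    return Counter(script_of(ch) for ch in text if not ch.isspace())


sample = "私はコーヒーを飲みます"
counts = script_counts(sample)
print(dict(counts))
```

Ordinary Japanese sentences like the sample contain all three scripts, which is what "mixed" means here.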

sonoisa changed discussion status to closed