Spaces: awacke1/Bloom.Big.Science.Continual.Generator

awacke1 committed on Feb 19, 2023

Commit f21323a (1 parent: 7d9e94d)

Update app.py

Files changed (1)

app.py +34 -0


@@ -61,6 +61,40 @@ Here is an outline of some of the most exciting recent developments in AI:
   6. [217 other models optimized for use with Bloom](https://huggingface.co/models?other=bloom)
 - 📚 Datasets:
+- Universal Dependencies: A collection of treebanks with cross-linguistically consistent annotation (part-of-speech tags, morphological features, and syntactic dependencies) for over 100 languages.
+  - [Universal Dependencies official website](https://universaldependencies.org/)
+- WMT 2014: The ninth edition of the Workshop on Statistical Machine Translation, featuring shared translation tasks between English and Czech, French, German, Hindi, and Russian.
+  - [WMT14 website](http://www.statmt.org/wmt14/)
+- The Pile: An 825 GiB English text corpus assembled from 22 diverse sub-datasets, including web text, academic papers, books, and code.
+  - [The Pile official website](https://pile.eleuther.ai/)
+- HumanEval: A benchmark of 164 hand-written Python programming problems, each with a function signature, docstring, and unit tests, used to evaluate code generation.
+  - [Evaluating Large Language Models Trained on Code](https://github.com/openai/human-eval) by Mark Chen et al. (OpenAI).
+- FLORES-101: An evaluation benchmark of 3,001 sentences from English Wikipedia, professionally translated into 101 languages, designed for low-resource and multilingual machine translation.
+  - [The FLORES-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation](https://github.com/facebookresearch/flores) by Naman Goyal et al.
+- CrowS-Pairs: A dataset of 1,508 sentence pairs, each contrasting a more stereotypical sentence with a less stereotypical one, designed for measuring social biases in masked language models.
+  - [CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models](https://github.com/nyu-mll/crows-pairs) by Nikita Nangia, Clara Vania, Rasika Bhalerao, and Samuel R. Bowman.
+- WikiLingua: A dataset of article and summary pairs in 18 languages, sourced from WikiHow, designed for cross-lingual abstractive summarization.
+  - [WikiLingua: A New Benchmark Dataset for Cross-Lingual Abstractive Summarization](https://arxiv.org/abs/2010.03093) by Faisal Ladhak, Esin Durmus, Claire Cardie, and Kathleen McKeown.
+- MTEB: A benchmark for evaluating text embedding models across 8 task types (such as classification, clustering, retrieval, and semantic textual similarity), covering 58 datasets and 112 languages.
+  - [MTEB: Massive Text Embedding Benchmark](https://github.com/embeddings-benchmark/mteb) by Niklas Muennighoff, Nouamane Tazi, Loïc Magne, and Nils Reimers.
+- xP3: A collection of prompts and datasets spanning 46 languages and 16 NLP tasks, used for the multitask finetuning that produced BLOOMZ and mT0.
+  - [Crosslingual Generalization through Multitask Finetuning](https://huggingface.co/datasets/bigscience/xP3) by Niklas Muennighoff et al.
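+- DiaBLa: An English-French dataset of spontaneous written dialogues, mediated by machine translation and annotated with human judgments of translation quality.
+  - [DiaBLa: A Corpus of Bilingual Spontaneous Written Dialogues for Machine Translation](https://github.com/rbawden/DiaBLa-dataset) by Rachel Bawden, Eric Bilinski, Thomas Lavergne, and Sophie Rosset.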
+- 📚 Dataset Papers with Code
   1. [Universal Dependencies](https://paperswithcode.com/dataset/universal-dependencies)
   2. [WMT 2014](https://paperswithcode.com/dataset/wmt-2014)
   3. [The Pile](https://paperswithcode.com/dataset/the-pile)