---
datasets:
- natural_instructions
- the_pile
- cot
- Muennighoff/P3
inference:
  parameters:
    max_new_tokens: 5
    temperature: 1.0
    top_k: 1
language:
- en
pipeline_tag: text-generation
widget:
-
  example_title: "ADE Corpus V2"
  text: |-
    Label the sentence based on whether it is related to an adverse drug effect (ADE). Details are described below:
    Drugs: Names of drugs and chemicals that include brand names, trivial names, abbreviations and systematic names were annotated. Mentions of drugs or chemicals should strictly be in a therapeutic context. This category does not include the names of metabolites, reaction byproducts, or hospital chemicals (e.g. surgical equipment disinfectants).
    Adverse effect: Mentions of adverse effects include signs, symptoms, diseases, disorders, acquired abnormalities, deficiencies, organ damage or death that strictly occur as a consequence of drug intake.
    Possible labels:
    1. ADE-related
    2. not ADE-related

    Sentence: A challenge with clozapine was feasible and showed no clinical symptoms of eosinophilia.
    Label: not ADE-related

    Sentence: CONCLUSIONS: These results suggest that clozapine may cause TD; however, the prevalence is low and the severity is relatively mild, with no or mild self-reported discomfort.
    Label: ADE-related

    Sentence: Best-corrected visual acuity measurements were performed at every visit.
    Label: not ADE-related

    Sentence: These cases were considered unusual in light of the short delay of their onset after initiation of immunosuppressive therapy and their fulminant course: 3 of these patients died of PCP occurring during the first month of treatment with prednisone.
    Label: ADE-related

    Sentence: The INR should be monitored more frequently when bosentan is initiated, adjusted, or discontinued in patients taking warfarin.
    Label: not ADE-related

    Sentence: NEH must be considered in lupus patients receiving cytotoxic agents to avoid inappropriate use of corticosteroids or antibiotics in this self-limited condition.
    Label:
-
  example_title: Banking77
  text: |-
    The following is a banking customer service query. Classify the query into one of the 77 categories available.
    Possible labels:
    1. Refund_not_showing_up
    2. activate_my_card
    3. age_limit
    4. apple_pay_or_google_pay
    5. atm_support
    6. automatic_top_up
    7. balance_not_updated_after_bank_transfer
    8. balance_not_updated_after_cheque_or_cash_deposit
    9. beneficiary_not_allowed
    10. cancel_transfer
    11. card_about_to_expire
    12. card_acceptance
    13. card_arrival
    14. card_delivery_estimate
    15. card_linking
    16. card_not_working
    17. card_payment_fee_charged
    18. card_payment_not_recognised
    19. card_payment_wrong_exchange_rate
    20. card_swallowed
    21. cash_withdrawal_charge
    22. cash_withdrawal_not_recognised
    23. change_pin
    24. compromised_card
    25. contactless_not_working
    26. country_support
    27. declined_card_payment
    28. declined_cash_withdrawal
    29. declined_transfer
    30. direct_debit_payment_not_recognised
    31. disposable_card_limits
    32. edit_personal_details
    33. exchange_charge
    34. exchange_rate
    35. exchange_via_app
    36. extra_charge_on_statement
    37. failed_transfer
    38. fiat_currency_support
    39. get_disposable_virtual_card
    40. get_physical_card
    41. getting_spare_card
    42. getting_virtual_card
    43. lost_or_stolen_card
    44. lost_or_stolen_phone
    45. order_physical_card
    46. passcode_forgotten
    47. pending_card_payment
    48. pending_cash_withdrawal
    49. pending_top_up
    50. pending_transfer
    51. pin_blocked
    52. receiving_money
    53. request_refund
    54. reverted_card_payment?
    55. supported_cards_and_currencies
    56. terminate_account
    57. top_up_by_bank_transfer_charge
    58. top_up_by_card_charge
    59. top_up_by_cash_or_cheque
    60. top_up_failed
    61. top_up_limits
    62. top_up_reverted
    63. topping_up_by_card
    64. transaction_charged_twice
    65. transfer_fee_charged
    66. transfer_into_account
    67. transfer_not_received_by_recipient
    68. transfer_timing
    69. unable_to_verify_identity
    70. verify_my_identity
    71. verify_source_of_funds
    72. verify_top_up
    73. virtual_card_not_working
    74. visa_or_mastercard
    75. why_verify_identity
    76. wrong_amount_of_cash_received
    77. wrong_exchange_rate_for_cash_withdrawal

    Query: My card payment was not successful.
    Label: declined_card_payment

    Query: Is it possible for me to change my PIN number?
    Label: change_pin

    Query: limits on top ups
    Label: top_up_limits

    Query: I live in the EU - can I get a card?
    Label: country_support

    Query: How can I tell the source for my available funds?
    Label: verify_source_of_funds

    Query: Why am I getting declines when trying to make a purchase online?
    Label:
-
  example_title: Overruling
  text: |-
    In law, an overruling sentence is a statement that nullifies a previous case decision as a precedent, by a constitutionally valid statute or a decision by the same or higher ranking court which establishes a different rule on the point of law involved. Label the sentence based on whether it is overruling or not.
    Possible labels:
    1. not overruling
    2. overruling

    Sentence: see mciver, 134 n.c.app. at 588, 518 s.e.2d at 526.
    Label: not overruling

    Sentence: to the extent that paprskar v. state, supra, applied the general test of waiver of constitutional rights set forth in johnson v. zerbst, supra, it is no longer viable.
    Label: overruling

    Sentence: narrowstep, 2010 wl 5422405, at *12.
    Label: not overruling

    Sentence: accordingly, to the extent of any conflict nemecek v. state, 621 s.w.2d 404 (tex.cr.app. 1980) is overruled.
    Label: overruling

    Sentence: the following facts are taken from the administrative record.
    Label: not overruling

    Sentence: see scott, supra at 352; commonwealth v. ruffin, 475 mass. 1003, 1004 (2016).
    Label:
-
  example_title: "Tweet Eval Hate"
  text: |-
    Label whether the following tweet contains hate speech against either immigrants or women. Hate Speech (HS) is commonly defined as any communication that disparages a person or a group on the basis of some characteristic such as race, color, ethnicity, gender, sexual orientation, nationality, religion, or other characteristics.
    Possible labels:
    1. hate speech
    2. not hate speech

    Tweet:
    Label: not hate speech

    Tweet: HOW REFRESHING! In South Korea, there is no such thing as 'political correctness" when it comes to dealing with Muslim refugee wannabes via @user
    Label: hate speech

    Tweet: New to Twitter-- any men on here know what the process is to get #verified?
    Label: not hate speech

    Tweet: UK Pensioner Faces 350 Lashes In Saudi Arabia why does this country exist it does nothing for migrants picks on old men no help from anyone
    Label: not hate speech

    Tweet: RT @user Her:I don't get what u want outta this relationship Him:Well, I was only looking for a bj but u kept coming back
    Label: not hate speech

    Tweet: Dont worry @user you are and will always be the most hysterical woman.
    Label:
---

<h1 style="font-size: 42px">GPT-JT</h1>

# Model Summary
We present GPT-JT, a fork of GPT-J-6B fine-tuned for 20,000 steps. It outperforms most 100B+ parameter models at classification and improves on GPT-J-6B on most tasks. GPT-JT was trained with a new decentralized algorithm on computers connected by slow 1Gbps links.
GPT-JT is a bidirectional dense model, trained with the UL2 objective on the NI, P3, COT, and the Pile datasets.
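
The card does not spell out how bidirectional attention is combined with generation. One common arrangement for UL2-style prefix language models is a prefix-LM mask, in which prompt tokens attend to each other in both directions and later tokens attend causally. The sketch below illustrates that idea in plain Python; `prefix_lm_mask` and its parameters are hypothetical names, and whether GPT-JT uses exactly this mask is an assumption.

```python
from typing import List


def prefix_lm_mask(seq_len: int, prefix_len: int) -> List[List[bool]]:
    """Return mask[i][j] == True when position i may attend to position j.

    Assumption (not stated in the model card): the first `prefix_len`
    positions form a bidirectional prefix, the rest are causal.
    """
    if not 0 <= prefix_len <= seq_len:
        raise ValueError("prefix_len must lie between 0 and seq_len")
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            # Every position sees the whole prefix (bidirectional part);
            # every position also sees itself and earlier positions (causal part).
            row.append(j < prefix_len or j <= i)
        mask.append(row)
    return mask


if __name__ == "__main__":
    # Print the mask for a 5-token sequence with a 3-token prefix.
    for row in prefix_lm_mask(seq_len=5, prefix_len=3):
        print("".join("x" if allowed else "." for allowed in row))
```

With `prefix_len=0` the mask reduces to the ordinary causal mask of a decoder-only model.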

**Please check out our demo: [TOMA-app](https://huggingface.co/spaces/togethercomputer/TOMA-app).**

# Quick Start
```python
from transformers import pipeline

# The text-generation task is inferred from the model's Hub metadata.
pipe = pipeline(model='togethercomputer/GPT-JT-6B-v1')
# The prompt ends with "A:" so that the model completes the answer.
pipe('''I like this! <-- Is it positive or negative?\nA:''')
```

or

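```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("togethercomputer/GPT-JT-6B-v1")
model = AutoModelForCausalLM.from_pretrained("togethercomputer/GPT-JT-6B-v1")
# Tokenize a prompt, generate up to 5 new tokens, and decode the result.
inputs = tokenizer("I like this! <-- Is it positive or negative?\nA:", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=5)[0]))
```
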
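The widget examples in this card's metadata all follow the same few-shot layout: an instruction, a numbered list of possible labels, labelled examples, and a final query ending in `Label:`. The following is a minimal sketch of assembling such a prompt; `build_prompt` and its parameters are hypothetical helpers introduced for illustration, not part of any library.

```python
from typing import List, Sequence, Tuple


def build_prompt(
    instruction: str,
    labels: Sequence[str],
    examples: Sequence[Tuple[str, str]],
    query: str,
    input_name: str = "Sentence",
) -> str:
    """Assemble a few-shot classification prompt in the widget layout."""
    lines: List[str] = [instruction, "Possible labels:"]
    # Number the labels starting from 1, as the widget examples do.
    lines += [f"{i}. {label}" for i, label in enumerate(labels, start=1)]
    # Each labelled example is preceded by a blank line.
    for text, label in examples:
        lines += ["", f"{input_name}: {text}", f"Label: {label}"]
    # The prompt ends with an open "Label:" for the model to complete.
    lines += ["", f"{input_name}: {query}", "Label:"]
    return "\n".join(lines)


prompt = build_prompt(
    instruction="Label the sentence based on whether it is overruling or not.",
    labels=["not overruling", "overruling"],
    examples=[
        ("the following facts are taken from the administrative record.", "not overruling"),
        ("accordingly, to the extent of any conflict nemecek v. state, 621 s.w.2d 404 (tex.cr.app. 1980) is overruled.", "overruling"),
    ],
    query="see scott, supra at 352; commonwealth v. ruffin, 475 mass. 1003, 1004 (2016).",
)
print(prompt)
```

The resulting string can be passed to the `pipe` object from the first snippet, for example `pipe(prompt, max_new_tokens=5)`, which matches the `max_new_tokens` value in this card's inference parameters.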
# Training Data
We fine-tuned [GPT-J-6B](https://huggingface.co/EleutherAI/gpt-j-6B) on NI, P3, COT, and the Pile:
- [Natural-Instructions](https://github.com/allenai/natural-instructions)
- [P3](https://huggingface.co/datasets/Muennighoff/P3)
- [MMLU-COT](https://github.com/jasonwei20/flan-2/blob/main/mmlu-cot.json)
- [the Pile](https://huggingface.co/datasets/the_pile)

# Hyperparameters
We used AdamW with a learning rate of 1e-5 and a global batch size of 64, and trained for 20k steps.
We used mixed-precision training, where activations are in FP16 while the optimizer states are kept in FP32.
We used both data parallelism and pipeline parallelism to conduct training.
During training, we truncated input sequences to 2048 tokens; input sequences shorter than 2048 tokens were concatenated into one long sequence to improve data efficiency.
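
The truncation and concatenation described above can be sketched as a greedy packing routine. This is an illustrative sketch, not the training code: `pack_sequences` and `max_tokens_seen` are hypothetical names, and the sketch assumes that sequences are never split across chunks and that no separator token is inserted between them, neither of which the card specifies.

```python
from typing import Iterable, List

MAX_LEN = 2048  # sequence length stated in the hyperparameters


def pack_sequences(sequences: Iterable[List[int]], max_len: int = MAX_LEN) -> List[List[int]]:
    """Greedily concatenate token-id sequences into chunks of at most max_len tokens."""
    packed: List[List[int]] = []
    current: List[int] = []
    for seq in sequences:
        seq = seq[:max_len]  # truncate over-long inputs
        # Start a new chunk when the next sequence would not fit.
        if current and len(current) + len(seq) > max_len:
            packed.append(current)
            current = []
        current = current + seq
    if current:
        packed.append(current)
    return packed


def max_tokens_seen(steps: int = 20_000, global_batch_size: int = 64, max_len: int = MAX_LEN) -> int:
    """Upper bound on tokens processed, assuming every packed sequence is full."""
    return steps * global_batch_size * max_len


if __name__ == "__main__":
    toy = [[1] * 3, [2] * 2, [3] * 4, [4] * 10]
    print([len(chunk) for chunk in pack_sequences(toy, max_len=5)])
    print(max_tokens_seen())
```

Under the stated hyperparameters, the upper bound works out to 20,000 × 64 × 2048 = 2,621,440,000 tokens, or roughly 2.6 billion.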

# Infrastructure
We used [the Together Research Computer](https://together.xyz/) to conduct training.