Text Generation
Transformers
PyTorch
English
gptj
Inference Endpoints
juewang committed on
Commit
0089ab7
·
1 Parent(s): f277354

Create README.md

  1. README.md +48 -0
README.md ADDED
@@ -0,0 +1,48 @@
1
+ ---
2
+ language:
3
+ - en
4
+ datasets:
5
+ - natural_instructions
6
+ - the_pile
7
+ - cot
8
+ - Muennighoff/P3
9
+ tags:
10
+ - gpt
11
+ pipeline_tag: text-generation
12
+ inference:
13
+   parameters:
14
+     temperature: 1.0
15
+     top_k: 1
16
+ widget:
17
+ - text: "Label the sentence based on whether it is related to an adverse drug effect (ADE). Details are described below:\nDrugs: Names of drugs and chemicals that include brand names, trivial names, abbreviations and systematic names were annotated. Mentions of drugs or chemicals should strictly be in a therapeutic context. This category does not include the names of metabolites, reaction byproducts, or hospital chemicals (e.g. surgical equipment disinfectants).\nAdverse effect: Mentions of adverse effects include signs, symptoms, diseases, disorders, acquired abnormalities, deficiencies, organ damage or death that strictly occur as a consequence of drug intake.\nPossible labels:\n1. ADE-related\n2. not ADE-related\n\nSentence: A challenge with clozapine was feasible and showed no clinical symptoms of eosinophilia.\nLabel: not ADE-related\n\nSentence: CONCLUSIONS: These results suggest that clozapine may cause TD; however, the prevalence is low and the severity is relatively mild, with no or mild self-reported discomfort.\nLabel: ADE-related\n\nSentence: Best-corrected visual acuity measurements were performed at every visit.\nLabel: not ADE-related\n\nSentence: These cases were considered unusual in light of the short delay of their onset after initiation of immunosuppressive therapy and their fulminant course: 3 of these patients died of PCP occurring during the first month of treatment with prednisone.\nLabel: ADE-related\n\nSentence: The INR should be monitored more frequently when bosentan is initiated, adjusted, or discontinued in patients taking warfarin.\nLabel: not ADE-related\n\nSentence: NEH must be considered in lupus patients receiving cytotoxic agents to avoid inappropriate use of corticosteroids or antibiotics in this self-limited condition.\nLabel:"
18
+   example_title: "ADE Corpus V2"
19
+ - text: "The following is a banking customer service query. Classify the query into one of the 77 categories available.\nPossible labels:\n1. Refund_not_showing_up\n2. activate_my_card\n3. age_limit\n4. apple_pay_or_google_pay\n5. atm_support\n6. automatic_top_up\n7. balance_not_updated_after_bank_transfer\n8. balance_not_updated_after_cheque_or_cash_deposit\n9. beneficiary_not_allowed\n10. cancel_transfer\n11. card_about_to_expire\n12. card_acceptance\n13. card_arrival\n14. card_delivery_estimate\n15. card_linking\n16. card_not_working\n17. card_payment_fee_charged\n18. card_payment_not_recognised\n19. card_payment_wrong_exchange_rate\n20. card_swallowed\n21. cash_withdrawal_charge\n22. cash_withdrawal_not_recognised\n23. change_pin\n24. compromised_card\n25. contactless_not_working\n26. country_support\n27. declined_card_payment\n28. declined_cash_withdrawal\n29. declined_transfer\n30. direct_debit_payment_not_recognised\n31. disposable_card_limits\n32. edit_personal_details\n33. exchange_charge\n34. exchange_rate\n35. exchange_via_app\n36. extra_charge_on_statement\n37. failed_transfer\n38. fiat_currency_support\n39. get_disposable_virtual_card\n40. get_physical_card\n41. getting_spare_card\n42. getting_virtual_card\n43. lost_or_stolen_card\n44. lost_or_stolen_phone\n45. order_physical_card\n46. passcode_forgotten\n47. pending_card_payment\n48. pending_cash_withdrawal\n49. pending_top_up\n50. pending_transfer\n51. pin_blocked\n52. receiving_money\n53. request_refund\n54. reverted_card_payment?\n55. supported_cards_and_currencies\n56. terminate_account\n57. top_up_by_bank_transfer_charge\n58. top_up_by_card_charge\n59. top_up_by_cash_or_cheque\n60. top_up_failed\n61. top_up_limits\n62. top_up_reverted\n63. topping_up_by_card\n64. transaction_charged_twice\n65. transfer_fee_charged\n66. transfer_into_account\n67. transfer_not_received_by_recipient\n68. transfer_timing\n69. unable_to_verify_identity\n70. verify_my_identity\n71. verify_source_of_funds\n72. verify_top_up\n73. virtual_card_not_working\n74. visa_or_mastercard\n75. why_verify_identity\n76. wrong_amount_of_cash_received\n77. wrong_exchange_rate_for_cash_withdrawal\n\nQuery: My card payment was not successful.\nLabel: declined_card_payment\n\nQuery: Is it possible for me to change my PIN number?\nLabel: change_pin\n\nQuery: limits on top ups\nLabel: top_up_limits\n\nQuery: I live in the EU - can I get a card?\nLabel: country_support\n\nQuery: How can I tell the source for my available funds?\nLabel: verify_source_of_funds\n\nQuery: Why am I getting declines when trying to make a purchase online?\nLabel:"
20
+   example_title: "Banking77"
21
+ ---
22
+
23
+ <h1 style="font-size: 42px">TOGETHER RESEARCH</h1>
24
+
25
+ # Model Summary
26
+ We present GPT-JT, a fork of GPT-J-6B fine-tuned for 20,000 steps, that outperforms most 100B+ parameter models at classification and improves performance on most tasks. GPT-JT was trained with a new decentralized algorithm over a 1G interconnect.
27
+
28
+ # Quick Start
29
+ ```python
30
+ from transformers import pipeline
31
+ pipe = pipeline(model='togethercomputer/GPT-JT-6B-v1')
32
+ pipe('''Please answer the following question:\n\nQuestion: Where is Zurich?\nAnswer:''')
33
+ ```
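The widget examples in the metadata above use a few-shot classification layout: an instruction, a numbered list of possible labels, then `Sentence:`/`Label:` (or `Query:`/`Label:`) pairs ending with an open `Label:`. The following is a minimal sketch of assembling such a prompt. The helper name `build_prompt` and its parameters are hypothetical and are not part of the model or of `transformers`.

```python
def build_prompt(instruction, labels, examples, query, field="Sentence"):
    """Assemble a few-shot classification prompt (hypothetical helper).

    instruction: task description placed at the top of the prompt
    labels:      list of label strings, rendered as a numbered list
    examples:    list of (text, label) pairs used as demonstrations
    query:       the text to classify; its label is left open
    field:       name of the input field, e.g. "Sentence" or "Query"
    """
    lines = [instruction, "Possible labels:"]
    lines += [f"{i}. {label}" for i, label in enumerate(labels, start=1)]
    lines.append("")  # blank line between the label list and the examples
    for text, label in examples:
        lines += [f"{field}: {text}", f"Label: {label}", ""]
    lines += [f"{field}: {query}", "Label:"]
    return "\n".join(lines)


prompt = build_prompt(
    "Label the sentence based on whether it is related to an adverse drug effect (ADE).",
    ["ADE-related", "not ADE-related"],
    [("Best-corrected visual acuity measurements were performed at every visit.",
      "not ADE-related")],
    "A challenge with clozapine was feasible and showed no clinical symptoms of eosinophilia.",
)
```

The resulting string can be passed to the `pipe` object from the Quick Start snippet in place of the hand-written question prompt.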
34
+ # Training Data
35
+ We fine-tune [GPT-J-6B](https://huggingface.co/EleutherAI/gpt-j-6B) on a mixture of Natural-Instructions (NI), P3, chain-of-thought (CoT), and the Pile data:
36
+ - [Natural-Instructions](https://github.com/allenai/natural-instructions)
37
+ - [P3](https://huggingface.co/datasets/Muennighoff/P3)
38
+ - [MMLU-COT](https://github.com/jasonwei20/flan-2/blob/main/mmlu-cot.json)
39
+ - [The Pile](https://huggingface.co/datasets/the_pile)
40
+
41
+ # Hyperparameters
42
+ We used AdamW with a learning rate of 1e-5 and a global batch size of 64, and trained for 20k steps.
43
+ We used mixed-precision training, where activations are in FP16 while the optimizer states are kept in FP32.
44
+ We used both data parallelism and pipeline parallelism for training.
45
+ During training, we truncated each input sequence to 2048 tokens; input sequences shorter than 2048 tokens were concatenated into one long sequence to improve data efficiency.
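The truncate-and-concatenate step described above can be sketched as follows. This is a minimal illustration and not the training code used for GPT-JT: the function name `pack_sequences`, the greedy packing order, and the separator token id `sep_id` appended after each sequence are all assumptions.

```python
from typing import Iterable, List


def pack_sequences(
    seqs: Iterable[List[int]], max_len: int = 2048, sep_id: int = 0
) -> List[List[int]]:
    """Greedily pack token-id sequences into chunks of at most max_len tokens.

    Each sequence is truncated so that it fits in one chunk together with a
    trailing separator token (sep_id is an assumed placeholder id).
    """
    packed: List[List[int]] = []
    current: List[int] = []
    for seq in seqs:
        # Truncate, leaving room for the separator token.
        seq = seq[: max_len - 1] + [sep_id]
        # Start a new chunk if this sequence would overflow the current one.
        if len(current) + len(seq) > max_len:
            packed.append(current)
            current = []
        current.extend(seq)
    if current:
        packed.append(current)
    return packed


chunks = pack_sequences([[1, 2, 3], [4, 5], [6, 7, 8, 9, 10, 11]], max_len=8)
```

With `max_len=8`, the first two short sequences share one chunk and the third starts a new one, so fewer positions are spent on padding than with one sequence per training example.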
46
+
47
+ # Infrastructure
48
+ We used [the Together Research Computer](https://together.xyz/) to conduct training.