|
--- |
|
language: |
|
- en |
|
- fr |
|
- ro |
|
- de |
|
datasets: |
|
- c4 |
|
tags: |
|
- text2text-generation |
|
- endpoints-template |
|
license: apache-2.0 |
|
--- |
|
|
|
# Fork of [t5-11b](https://huggingface.co/t5-11b) |
|
|
|
> This is a fork of [t5-11b](https://huggingface.co/t5-11b) implementing a custom `handler.py` as an example of how to use `t5-11b` with [inference-endpoints](https://hf.co/inference-endpoints) on a single NVIDIA T4.
|
|
|
--- |
|
|
|
# Model Card for T5 11B - fp16 |
|
|
|
![model image](https://camo.githubusercontent.com/623b4dea0b653f2ad3f36c71ebfe749a677ac0a1/68747470733a2f2f6d69726f2e6d656469756d2e636f6d2f6d61782f343030362f312a44304a31674e51663876727255704b657944387750412e706e67) |
|
|
|
# Use with Inference Endpoints |
|
|
|
Hugging Face Inference Endpoints can be used with an HTTP client in any language. We will use Python and the `requests` library to send our requests (make sure you have it installed: `pip install requests`).
|
|
|
![result](inference.png) |
|
|
|
## Send requests with Python
|
|
|
```python |
|
import json |
|
import requests as r |
|
|
|
ENDPOINT_URL = ""  # url of your endpoint

HF_TOKEN = ""  # your Hugging Face access token
|
|
|
# payload samples |
|
regular_payload = { "inputs": "translate English to German: The weather is nice today." } |
|
parameter_payload = { |
|
"inputs": "translate English to German: Hello my name is Philipp and I am a Technical Leader at Hugging Face", |
|
"parameters" : { |
|
"max_length": 40, |
|
} |
|
} |
|
|
|
# HTTP headers for authorization |
|
headers= { |
|
"Authorization": f"Bearer {HF_TOKEN}", |
|
"Content-Type": "application/json" |
|
} |
|
|
|
# send request |
|
response = r.post(ENDPOINT_URL, headers=headers, json=parameter_payload)
|
generated_text = response.json() |
|
|
|
print(generated_text) |
|
|
|
``` |
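For the text2text-generation task, the endpoint typically responds with a list of objects containing a `generated_text` field. As a minimal sketch (the `extract_generated_text` helper is hypothetical, and the response shape is an assumption based on the common pipeline output format), you could unwrap the response like this:

```python
def extract_generated_text(response_json):
    # hypothetical helper: assumes the usual text2text-generation response
    # shape, a list like [{"generated_text": "..."}]
    if isinstance(response_json, list) and response_json and "generated_text" in response_json[0]:
        return response_json[0]["generated_text"]
    # unexpected shape (e.g. an error object) — return it unchanged for inspection
    return response_json

# example with a mocked response body instead of a live endpoint call
mocked = [{"generated_text": "Das Wetter ist heute schön."}]
print(extract_generated_text(mocked))
```

This keeps the request code above unchanged while making the happy path and the error path explicit.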
|
|