---
datasets:
- HuggingFaceH4/ultrachat_200k
base_model:
- meta-llama/Llama-2-70b-chat-hf
library_name: transformers
---

## meta-llama/Llama-2-70b-chat-hf - W8A8_FP8 Compression

This is a W8A8_FP8-compressed version of [meta-llama/Llama-2-70b-chat-hf](https://huggingface.co/meta-llama/Llama-2-70b-chat-hf), produced with [llmcompressor](https://github.com/vllm-project/llm-compressor): the Linear-layer weights and activations are quantized to 8-bit floating point (FP8).

## Compression Configuration

- Base Model: meta-llama/Llama-2-70b-chat-hf
- Compression Scheme: W8A8_FP8
- Dataset: HuggingFaceH4/ultrachat_200k
- Dataset Split: train_sft
- Number of Samples: 512
- Preprocessor: chat
- Maximum Sequence Length: 4096
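The configuration above maps onto llmcompressor's one-shot calibration workflow. The following is a minimal sketch of how such a checkpoint could be produced, assuming llmcompressor's `oneshot` entry point and `QuantizationModifier` with the compressed-tensors `FP8` preset; it is an illustration, not the exact script used to create this model, and import paths vary between llmcompressor releases.

```python
# Sketch of a one-shot W8A8_FP8 compression run (assumed API, not the
# exact pipeline used to produce this checkpoint).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.transformers import oneshot

MODEL_ID = "meta-llama/Llama-2-70b-chat-hf"
NUM_SAMPLES = 512      # matches "Number of Samples" above
MAX_SEQ_LEN = 4096     # matches "Maximum Sequence Length" above

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Calibration set: 512 samples from the train_sft split, rendered with the
# model's chat template (the "chat" preprocessor noted above).
ds = load_dataset("HuggingFaceH4/ultrachat_200k", split=f"train_sft[:{NUM_SAMPLES}]")
ds = ds.map(
    lambda ex: {"text": tokenizer.apply_chat_template(ex["messages"], tokenize=False)}
)

# W8A8_FP8: FP8 weights and activations on all Linear layers, leaving the
# lm_head in higher precision.
recipe = QuantizationModifier(targets="Linear", scheme="FP8", ignore=["lm_head"])

oneshot(
    model=model,
    dataset=ds,
    recipe=recipe,
    max_seq_length=MAX_SEQ_LEN,
    num_calibration_samples=NUM_SAMPLES,
)

model.save_pretrained("Llama-2-70b-chat-hf-W8A8-FP8", save_compressed=True)
tokenizer.save_pretrained("Llama-2-70b-chat-hf-W8A8-FP8")
```

Because the quantization parameters are stored in the checkpoint's compressed-tensors configuration, the saved model can be loaded directly by inference engines that understand that format, such as vLLM.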