---
license: other
license_name: hyperclovax-seed
license_link: LICENSE
base_model:
- exp-models/HyperCLOVA-X-SEED-Vision-Instruct-3B-Llamafied
---

## Overview

HyperCLOVA-X-SEED-Vision-Instruct-3B-Llamafied is based on a text understanding and generation model developed by NAVER. It demonstrates competitive performance on major benchmarks related to the Korean language and culture, and it supports a context length of up to 16k tokens, enabling it to handle a wide range of tasks.

## Basic Information

- Model Architecture: Transformer-based architecture (dense model)
- Number of Parameters: 3.26B
- Input/Output Format: Text / Text (both input and output are in text format)
- Context Length: 16k tokens
- Knowledge Cutoff Date: The model was trained on data prior to August 2024.
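
For reference, these figures can be checked programmatically. Below is a minimal sketch, assuming the checkpoint is pulled via the Hub id from the metadata above and that the Llamafied conversion exposes a standard Llama-style config; the expected values in the comments are assumptions based on the list above:

```python
from transformers import AutoConfig, AutoModelForCausalLM

repo = "exp-models/HyperCLOVA-X-SEED-Vision-Instruct-3B-Llamafied"

# The config alone is enough to check the context window.
config = AutoConfig.from_pretrained(repo)
print(config.max_position_embeddings)  # expected: 16384, i.e. the 16k context above

# Counting parameters requires loading the weights.
model = AutoModelForCausalLM.from_pretrained(repo)
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e9:.2f}B parameters")  # expected: ~3.26B
```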

## Training and Data

The training data for HyperCLOVA-X-SEED-Vision-Instruct-3B-Llamafied consists of diverse sources, including high-quality datasets. The training process was carried out in four main stages:

1. Pretraining Stage 1: the model learns from a large volume of documents.
2. Pretraining Stage 2: the model receives additional training on high-quality data.
3. Rejection Sampling Fine-Tuning (RFT): enhances the model's knowledge across various domains and its complex reasoning abilities.
4. Supervised Fine-Tuning (SFT): improves the model's instruction-following capabilities.

In addition, smaller models tend to be vulnerable in long-context handling. To address this, reinforcement of long-context understanding was incorporated from the pretraining stages through to the SFT stage, enabling the model to stably support context lengths of up to 16k tokens.
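
The card does not spell out the RFT recipe. As a generic illustration only, the sketch below shows what rejection-sampling data collection can look like; the `score` reward function, the threshold, and the prompts are all hypothetical and do not describe NAVER's actual pipeline:

```python
# Generic illustration of rejection-sampling data collection for RFT.
# NOTE: `score` is a hypothetical reward/verifier callable; the criteria
# actually used to train this model are not described in this card.

def collect_rft_data(model, tokenizer, prompts, score, n_candidates=8, threshold=0.8):
    """Sample several completions per prompt and keep only high-scoring ones."""
    kept = []
    for prompt in prompts:
        inputs = tokenizer(prompt, return_tensors="pt")
        outputs = model.generate(
            **inputs,
            do_sample=True,                    # sampling gives diverse candidates
            num_return_sequences=n_candidates,
            max_new_tokens=512,
        )
        for ids in outputs:
            text = tokenizer.decode(ids, skip_special_tokens=True)
            if score(prompt, text) >= threshold:  # reject low-quality candidates
                kept.append({"prompt": prompt, "completion": text})
    return kept  # accepted pairs would then be used like ordinary SFT data
```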

## Hugging Face Usage Example

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Replace "/path/to/ckpt" with a local checkpoint directory or the Hub id,
# e.g. "exp-models/HyperCLOVA-X-SEED-Vision-Instruct-3B-Llamafied".
model = AutoModelForCausalLM.from_pretrained("/path/to/ckpt")
tokenizer = AutoTokenizer.from_pretrained("/path/to/ckpt")

chat = [
    # The chat template expects a (here empty) tool list turn.
    {"role": "tool_list", "content": ""},
    # System prompt: 'The AI language model is named "CLOVA X" and was made
    # by NAVER. Today is Thursday, April 24, 2025.'
    {"role": "system", "content": "- AI 언어모델의 이름은 \"CLOVA X\" 이며 네이버에서 만들었다.\n- 오늘은 2025년 04월 24일(목)이다."},
    # User turn: "Explain the relationship between the Schrödinger equation
    # and quantum mechanics in as much detail as possible."
    {"role": "user", "content": "슈뢰딩거 방정식과 양자역학의 관계를 최대한 자세히 알려줘."},
]

inputs = tokenizer.apply_chat_template(chat, add_generation_prompt=True, return_dict=True, return_tensors="pt")
output_ids = model.generate(**inputs, max_length=1024, stop_strings=["<|endofturn|>", "<|stop|>"], tokenizer=tokenizer)
print(tokenizer.batch_decode(output_ids))
```
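
Note that `batch_decode` above returns the prompt together with the completion. As an optional follow-up, assuming the `inputs` and `output_ids` variables from the example, the prompt tokens can be sliced off before decoding:

```python
# Decode only the newly generated tokens, skipping the prompt.
prompt_len = inputs["input_ids"].shape[1]
completion = tokenizer.batch_decode(output_ids[:, prompt_len:], skip_special_tokens=True)[0]
print(completion)
```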