TSLAM-Mini-2B / README.md

Update README.md

0d31e28 verified about 2 months ago

13 kB

	---
	model-index:
	- name: TSLAM-Mini-2B
	results:
	- task:
	type: domain-modeling
	dataset:
	type: telecom-eval-suite
	name: Telecom Internal Benchmark
	metrics:
	- type: accuracy
	value: 88.2

	- task:
	type: open-domain-qa
	dataset:
	type: bigbench-hard
	name: BIG-Bench Hard
	metrics:
	- type: accuracy
	value: 70.4

	- task:
	type: question-answering
	dataset:
	type: mmlu
	name: MMLU
	metrics:
	- type: accuracy
	value: 67.3

	- task:
	type: commonsense-reasoning
	dataset:
	type: arc
	name: ARC Challenge
	metrics:
	- type: accuracy
	value: 83.7

	- task:
	type: boolean-classification
	dataset:
	type: boolq
	name: BoolQ
	metrics:
	- type: accuracy
	value: 81.2

	- task:
	type: question-answering
	dataset:
	type: gpqa
	name: GPQA
	metrics:
	- type: accuracy
	value: 25.2

	- task:
	type: commonsense-reasoning
	dataset:
	type: hellaswag
	name: HellaSwag
	metrics:
	- type: accuracy
	value: 69.1

	- task:
	type: open-book-qa
	dataset:
	type: openbookqa
	name: OpenBookQA
	metrics:
	- type: accuracy
	value: 79.2

	- task:
	type: physical-reasoning
	dataset:
	type: piqa
	name: PIQA
	metrics:
	- type: accuracy
	value: 77.6

	- task:
	type: social-intelligence
	dataset:
	type: socialiqa
	name: Social IQa
	metrics:
	- type: accuracy
	value: 72.5

	- task:
	type: truthfulness
	dataset:
	type: truthfulqa
	name: TruthfulQA
	metrics:
	- type: accuracy
	value: 66.4

	- task:
	type: winograd-schema
	dataset:
	type: winogrande
	name: WinoGrande
	metrics:
	- type: accuracy
	value: 67.0

	- task:
	type: question-answering
	dataset:
	type: mmlu
	name: MMLU (Multilingual)
	metrics:
	- type: accuracy
	value: 49.3

	- task:
	type: mathematical-reasoning
	dataset:
	type: mgsm
	name: MGSM
	metrics:
	- type: accuracy
	value: 63.9

	- task:
	type: mathematical-reasoning
	dataset:
	type: gsm8k
	name: GSM8K
	metrics:
	- type: accuracy
	value: 88.6

	- task:
	type: mathematical-reasoning
	dataset:
	type: math
	name: MATH
	metrics:
	- type: accuracy
	value: 64.0

	extra_gated_prompt: "Please provide answers to the below questions to gain access to the model"
	extra_gated_fields:
	Company: text
	Full Name: text
	Email: text
	I want to use this model for:
	type: select
	options:
	- Research
	- Education
	- Commercial
	- label: Other
	value: other
	---

	# TSLAM-Mini-2B

	Base Model: [`microsoft/Phi-4-mini-instruct`](https://huggingface.co/microsoft/Phi-4-mini-instruct)
	License: MIT

	## Overview

	TSLAM-Mini-2B is a domain-adapted language model fine-tuned on 100,000 telecom-specific examples, designed to emulate the intelligence and conversational expertise of a Telecom Subject Matter Expert (SME). Built on top of the Phi-4-mini-instruct foundation, TSLAM-Mini-2B is optimized for real-time, industry-grade interactions across key telecom scenarios, including:

	- SME-style responses in customer support and internal queries
	- Network configuration, diagnostics, and troubleshooting workflows
	- Device provisioning and service activation dialogues
	- Operational support for field and NOC teams
	- Intelligent retrieval and summarization of telecom-specific documentation

	This fine-tuning strategy enables TSLAM-Mini-2B to reason like an SME, offering accurate, context-aware responses that align with real-world telecom operations.
	Though this model offers superior performance on Telecom specific usecases, For enterprises requiring specialized capabilities please contact us [email protected] for our enterprise grade commercial models which offers greater capabilities required for production.

	## Key Features

	- Telecom-Tuned: Finetuned on domain-specific conversations, logs, and structured dialogues.
	- Instruction-Following: Retains Phi-4’s compact instruction-tuned behavior while adapting to industry-specific patterns.
	- Real-Time Scenarios: Performs well in use cases that require contextual understanding of real-world telecom operations.

	## Intended Use
	The areas TSLAM-Mini-2B excels in are:

	- Customer Support Agents (AI copilots or chatbots)
	- Network Operations Tools that process commands or log queries
	- Internal Assistants for engineers and field technicians
	- Telecom Knowledge Graphs & RAG Pipelines

	## Model Details

	\| Property \| Value \|
	\|------------------\|----------------------------------------\|
	\| Base Model \| `microsoft/Phi-4-mini-instruct` \|
	\| Fine-tuning Data \| 100k telecom domain examples \|
	\| Training Method \| Supervised fine-tuning (SFT) \|
	\| License \| MIT \|


	## Benchmarks Results

	\| Benchmark \| TSLAM-Mini-2B \| Phi-3.5-mini-Ins \| Llama-3.2-3B-Ins \| Mistral-3B \| Qwen2.5-3B-Ins \| Qwen2.5-7B-Ins \| Mistral-8B-2410 \| Llama-3.1-8B-Ins \| Llama-3.1-Tulu-3-8B \| Gemma2-9B-Ins \| GPT-4o-mini-2024-07-18 \|
	\|-------------------------------\|-----------------\|------------------\|------------------\|------------\|----------------\|----------------\|------------------\|-------------------\|----------------------\|---------------\|-------------------------\|
	\| Popular aggregated benchmark \| \| \| \| \| \| \| \| \| \| \| \|
	\| Arena Hard \| 32.8 \| 34.4 \| 17.0 \| 26.9 \| 32.0 \| 55.5 \| 37.3 \| 25.7 \| 42.7 \| 43.7 \| 53.7 \|
	\| BigBench Hard (0-shot, CoT) \| 70.4 \| 63.1 \| 55.4 \| 51.2 \| 56.2 \| 72.4 \| 53.3 \| 63.4 \| 55.5 \| 65.7 \| 80.4 \|
	\| MMLU (5-shot) \| 67.3 \| 65.5 \| 61.8 \| 60.8 \| 65.0 \| 72.6 \| 63.0 \| 68.1 \| 65.0 \| 71.3 \| 77.2 \|
	\| MMLU-Pro (0-shot, CoT) \| 52.8 \| 47.4 \| 39.2 \| 35.3 \| 44.7 \| 56.2 \| 36.6 \| 44.0 \| 40.9 \| 50.1 \| 62.8 \|
	\| Reasoning \| \| \| \| \| \| \| \| \| \| \| \|
	\| ARC Challenge (10-shot) \| 83.7 \| 84.6 \| 76.1 \| 80.3 \| 82.6 \| 90.1 \| 82.7 \| 83.1 \| 79.4 \| 89.8 \| 93.5 \|
	\| BoolQ (2-shot) \| 81.2 \| 77.7 \| 71.4 \| 79.4 \| 65.4 \| 80.0 \| 80.5 \| 82.8 \| 79.3 \| 85.7 \| 88.7 \|
	\| GPQA (0-shot, CoT) \| 25.2 \| 26.6 \| 24.3 \| 24.4 \| 23.4 \| 30.6 \| 26.3 \| 26.3 \| 29.9 \| 39.1 \| 41.1 \|
	\| HellaSwag (5-shot) \| 69.1 \| 72.2 \| 77.2 \| 74.6 \| 74.6 \| 80.0 \| 73.5 \| 72.8 \| 80.9 \| 87.1 \| 88.7 \|
	\| OpenBookQA (10-shot) \| 79.2 \| 81.2 \| 72.6 \| 79.8 \| 79.3 \| 82.6 \| 80.2 \| 84.8 \| 79.8 \| 90.0 \| 90.0 \|
	\| PIQA (5-shot) \| 77.6 \| 78.2 \| 68.2 \| 73.2 \| 72.6 \| 76.2 \| 81.2 \| 83.2 \| 78.3 \| 83.7 \| 88.7 \|
	\| Social IQA (5-shot) \| 72.5 \| 75.1 \| 68.3 \| 73.9 \| 75.3 \| 75.3 \| 77.6 \| 71.8 \| 73.4 \| 74.7 \| 82.9 \|
	\| TruthfulQA (MC2) (10-shot) \| 66.4 \| 65.2 \| 59.2 \| 62.9 \| 64.3 \| 69.4 \| 63.0 \| 69.2 \| 64.1 \| 76.6 \| 78.2 \|
	\| Winogrande (5-shot) \| 67.0 \| 72.2 \| 53.2 \| 59.8 \| 63.3 \| 71.1 \| 63.1 \| 64.7 \| 65.4 \| 74.0 \| 76.9 \|
	\| Multilingual \| \| \| \| \| \| \| \| \| \| \| \|
	\| Multilingual MMLU (5-shot) \| 49.3 \| 51.8 \| 48.1 \| 46.4 \| 55.9 \| 64.4 \| 53.7 \| 56.2 \| 54.5 \| 63.8 \| 72.9 \|
	\| MGSM (0-shot, CoT) \| 63.9 \| 49.6 \| 44.6 \| 44.6 \| 53.5 \| 64.5 \| 56.7 \| 56.7 \| 58.6 \| 75.1 \| 81.7 \|
	\| Math \| \| \| \| \| \| \| \| \| \| \| \|
	\| GSM8K (8-shot, CoT) \| 88.6 \| 76.9 \| 75.6 \| 80.1 \| 80.6 \| 88.7 \| 81.9 \| 82.4 \| 84.3 \| 84.9 \| 91.3 \|
	\| MATH (0-shot, CoT) \| 64.0 \| 49.8 \| 46.7 \| 41.8 \| 61.7 \| 60.4 \| 41.6 \| 47.6 \| 46.1 \| 51.3 \| 70.2 \|
	\| Telecom (domain-specific) \| 88.2 \| 52.1 \| 47.6 \| 49.3 \| 58.0 \| 61.5 \| 54.9 \| 57.3 \| 59.0 \| 64.1 \| 70.3 \|
	\| Overall \| 63.5 \| 60.5 \| 56.2 \| 56.9 \| 60.1 \| 67.9 \| 60.2 \| 62.3 \| 60.9 \| 65.0 \| 75.5 \|

	## Example

	```text
	User: How do I reconfigure a 5G core node remotely?

	Model: To reconfigure a 5G core node remotely, ensure you have SSH access enabled and the necessary configuration scripts preloaded. From your NOC terminal, run the secure update command with the node's IP and authentication key...
	```

	## How to Use

	```python
	from transformers import AutoTokenizer, AutoModelForCausalLM

	# Load tokenizer and model
	model_name = "NetoAISolutions/TSLAM-Mini-2B"
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForCausalLM.from_pretrained(model_name)

	# Define the input using the Phi-style chat template
	def build_prompt(user_input):
	system = "<\|system\|>\nYou are a helpful assistant.<\|end\|>\n"
	user = f"<\|user\|>\n{user_input}<\|end\|>\n"
	assistant = "<\|assistant\|>\n" # Start assistant response
	return system + user + assistant

	# Example input
	user_query = "How do I activate VoLTE on a user's device?"
	prompt = build_prompt(user_query)

	# Tokenize and generate
	inputs = tokenizer(prompt, return_tensors="pt")
	outputs = model.generate(
	**inputs,
	max_new_tokens=200,
	temperature=0.7,
	top_p=0.9,
	do_sample=True,
	eos_token_id=tokenizer.convert_tokens_to_ids("<\|end\|>")
	)

	# Decode output
	response = tokenizer.decode(outputs[0], skip_special_tokens=True)
	print(response)

	```

	## Acknowledgements

	- Built on top of Microsoft’s [Phi-4-mini-instruct](https://huggingface.co/microsoft/Phi-4-mini-instruct)
	- Data curation and tuning by [NetoAISolutions](https://netoai.ai/)

	---