Llama-Express.1

Llama-Express.1 is a 1B-parameter model fine-tuned from Llama 3.2 1B on long chain-of-thought datasets. This instruction-tuned, text-only model is optimized for multilingual dialogue, including agentic retrieval and summarization tasks. It outperforms many available open-source and closed chat models.

Use with transformers

Starting with transformers 4.43.0, you can run conversational inference either with the Transformers pipeline abstraction or with the Auto classes and the generate() function.

Make sure your installation is up to date: pip install --upgrade transformers

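```python
import torch
from transformers import pipeline

model_id = "prithivMLmods/Llama-Express.1"

# Load the weights in bfloat16 and let accelerate place them on the available device(s).
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Chat-style input: the pipeline applies the model's chat template to this list.
messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

outputs = pipe(
    messages,
    max_new_tokens=256,
)

# generated_text holds the whole conversation; the last entry is the assistant's
# reply as a {"role": ..., "content": ...} dict.
print(outputs[0]["generated_text"][-1])
```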
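The same conversation can be run with the Auto classes and generate() instead of the pipeline. The sketch below is one way to do it, assuming a transformers release whose tokenizer supports apply_chat_template with return_dict=True; the helper names load_model and generate_reply are hypothetical and not part of the model card or the library. Heavy imports sit inside load_model so generate_reply can be reused without importing torch up front.

```python
def load_model(model_id: str = "prithivMLmods/Llama-Express.1"):
    """Load tokenizer and model with the Auto classes (bf16, automatic device placement)."""
    # Imported here so generate_reply can be used without importing torch/transformers.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
        device_map="auto",  # requires the accelerate package, as in the pipeline example
    )
    return model, tokenizer


def generate_reply(model, tokenizer, messages, max_new_tokens: int = 256) -> str:
    """Format messages with the chat template, generate, and decode only the new tokens."""
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,  # append the assistant header so the model answers
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # generate() returns prompt + continuation; drop the prompt tokens before decoding.
    prompt_len = inputs["input_ids"].shape[-1]
    return tokenizer.decode(output_ids[0][prompt_len:], skip_special_tokens=True)


# Usage (downloads the weights on first call):
# model, tokenizer = load_model()
# print(generate_reply(model, tokenizer, [{"role": "user", "content": "Who are you?"}]))
```

Decoding only the tokens after the prompt returns just the assistant's answer, which is roughly what the pipeline's last generated_text entry contains.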

Intended Use

  1. Multilingual Dialogue:

    • Designed for high-quality, multilingual conversations, making it suitable for applications requiring natural, fluid dialogue across languages.
  2. Agentic Retrieval:

    • Optimized for retrieval-based tasks where reasoning and contextual chaining are crucial for extracting and summarizing relevant information.
  3. Summarization Tasks:

    • Effective in generating concise and accurate summaries from complex and lengthy texts, suitable for academic, professional, and casual use cases.
  4. Instruction-Following Applications:

    • Fine-tuned for tasks requiring adherence to user-provided instructions, making it ideal for automation workflows, content creation, and virtual assistant integrations.
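For the retrieval and summarization uses listed above, the retrieved text has to be placed in the prompt by the caller. A minimal sketch, assuming a hypothetical helper build_retrieval_messages and an illustrative prompt wording (neither comes from the model card), that numbers each passage so the answer can cite its sources:

```python
from typing import Dict, List


def build_retrieval_messages(
    question: str, passages: List[str], max_summary_words: int = 100
) -> List[Dict[str, str]]:
    """Pack retrieved passages into a system + user message pair (hypothetical prompt format)."""
    # Number each passage so the model can refer to it as [1], [2], ...
    context = "\n\n".join(f"[{i}] {p.strip()}" for i, p in enumerate(passages, start=1))
    system = (
        "Answer using only the numbered passages. "
        f"Summarize the relevant evidence in at most {max_summary_words} words "
        "and cite passages like [1]."
    )
    user = f"Passages:\n{context}\n\nQuestion: {question}"
    return [{"role": "system", "content": system}, {"role": "user", "content": user}]
```

The returned list has the same shape as messages in the pipeline example, so it can be passed to pipe in its place.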

Limitations

  1. Monomodal Focus:

    • As a text-only model, it cannot process multimodal inputs like images, audio, or videos, limiting its versatility in multimedia applications.
  2. Context Length Constraints:

    • While optimized for long chain-of-thought reasoning, extreme cases with very large contexts may still lead to degraded performance or truncation issues.
  3. Bias and Ethics:

    • The model might reflect biases present in the training datasets, potentially resulting in outputs that could be culturally insensitive or inappropriate.
  4. Performance in Low-Resource Languages:

    • While multilingual, its effectiveness may vary across languages, with possible performance drops in underrepresented or low-resource languages.
  5. Dependency on Input Quality:

    • The model's output is heavily influenced by the clarity and specificity of the input instructions. Ambiguous or vague prompts may lead to suboptimal results.
  6. Lack of Real-Time Internet Access:

    • Without real-time retrieval capabilities, it cannot provide up-to-date information or verify facts against the latest data.
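One common way to stay within the context-length constraint is to split long input into overlapping token windows and process each window separately, for example by summarizing every chunk and then summarizing the summaries. A minimal sketch; the function name chunk_token_ids and the budget values in the usage comment are illustrative assumptions, not documented limits of this model:

```python
from typing import List, Sequence


def chunk_token_ids(token_ids: Sequence[int], max_tokens: int, overlap: int = 0) -> List[List[int]]:
    """Split a token sequence into windows of at most max_tokens, sharing overlap tokens between neighbours."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must satisfy 0 <= overlap < max_tokens")
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(token_ids), step):
        chunks.append(list(token_ids[start:start + max_tokens]))
        if start + max_tokens >= len(token_ids):
            break  # this window already reaches the end of the sequence
    return chunks


# Usage with a Hugging Face tokenizer (budget values are arbitrary examples):
# ids = tokenizer.encode(long_text, add_special_tokens=False)
# pieces = [tokenizer.decode(c) for c in chunk_token_ids(ids, max_tokens=2048, overlap=128)]
```

The overlap keeps sentences that straddle a boundary visible in both neighbouring chunks, at the cost of processing a few tokens twice.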

Model Details

  • Model size: 1.24B params
  • Tensor type: BF16
  • Weights format: Safetensors