---
language:
- en
license: apache-2.0
tags:
- NLP
pipeline_tag: summarization
---

# Topic Change Point Detection Model

## Model Details

- **Model Name:** Falconsai/topic_change_point
- **Model Type:** Fine-tuned `google/t5-small`
- **Language:** English
- **License:** MIT

## Overview

The Topic Change Point Detection model identifies topics and tracks how they change within a block of text. It is based on `google/t5-small`, fine-tuned on a custom dataset that maps texts to their respective topic changes. The model can be used to analyze and categorize texts according to their topics and the transitions between them.

### Model Architecture

The base architecture is T5 (Text-To-Text Transfer Transformer), which casts every NLP problem as a text-to-text problem. The specific version used here is `google/t5-small`, fine-tuned to detect and predict topic changes.

### Fine-Tuning Data

The model was fine-tuned on a dataset of texts and their corresponding topic changes. The dataset should be formatted as a file with two columns: `text` and `topic_changes`.

### Intended Use

The model is intended for identifying topics and detecting changes in topics across a block of text. It can be useful in various fields: psychology/psychiatry for session assessment (the initial use case), content analysis, document insights, conversational analysis, and other areas where understanding the flow of topics is important.

## How to Use

### Inference

To use this model for inference, load the fine-tuned model and tokenizer with the `transformers` library, as shown in the examples below.

Running with a pipeline:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

text_block = 'Your block of text here.'
pipe = pipeline("summarization", model="Falconsai/topic_change_point")

res1 = pipe(text_block, max_length=1024, min_length=512, do_sample=False)
print(res1)
```

Running on CPU:

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("Falconsai/topic_change_point")
model = AutoModelForSeq2SeqLM.from_pretrained("Falconsai/topic_change_point")

input_text = 'Your block of text here.'

input_ids = tokenizer(input_text, return_tensors="pt").input_ids
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Running on GPU:

```python
# pip install accelerate
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("Falconsai/topic_change_point")
model = AutoModelForSeq2SeqLM.from_pretrained("Falconsai/topic_change_point", device_map="auto")

input_text = 'Your block of text here.'

input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Training

The training process involves the following steps:

1. **Load and Explore Data:** Load the dataset and perform initial exploration to understand the data distribution.
2. **Preprocess Data:** Tokenize the text blocks and prepare them for the T5 model.
3. **Fine-Tune Model:** Fine-tune the `google/t5-small` model using the preprocessed data.
4. **Evaluate Model:** Evaluate the model's performance on a validation set to ensure it is learning correctly.
5. **Save Model:** Save the fine-tuned model for future use.

## Evaluation

The model's performance should be evaluated on a separate validation set to ensure it accurately predicts topic changes. Metrics such as accuracy, precision, recall, and F1 score can be used to assess performance.

## Limitations

- **Data Dependency:** The model's performance is highly dependent on the quality and representativeness of the training data.
- **Generalization:** The model may not generalize well to conversation texts that are significantly different from the training data.

## Ethical Considerations

When deploying the model, be mindful of the ethical implications, including but not limited to:

- **Privacy:** Ensure that text data used for training and inference does not contain sensitive or personally identifiable information.
- **Bias:** Be aware of potential biases in the training data that could affect the model's predictions.

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.

## Citation

If you use this model in your research, please cite it as follows:

```
@misc{topic_change_point,
  author = {Michael Stattelman},
  title = {Topic Change Point Detection},
  year = {2024},
  publisher = {Falcons.ai},
}
```

---
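
## Appendix: Evaluation Sketch

The Evaluation section above suggests precision, recall, and F1 score for assessing predictions. The sketch below is illustrative only: it assumes change points are represented as sets of position indices and that an exact index match counts as correct, which is an assumption of this example, not the card's actual output format. The `change_point_scores` helper is hypothetical.

```python
def change_point_scores(predicted, reference):
    """Precision, recall, and F1 for predicted vs. reference change-point indices.

    Assumes change points are position indices; an exact match counts as a
    true positive (an illustrative convention, not the model's own format).
    """
    predicted, reference = set(predicted), set(reference)
    tp = len(predicted & reference)  # correctly predicted change points
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(reference) if reference else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

p, r, f1 = change_point_scores(predicted=[3, 7, 12], reference=[3, 8, 12])
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
# → precision=0.67 recall=0.67 f1=0.67
```

In practice the predicted indices would be parsed from the model's generated text before scoring; a tolerance window around each reference index is a common relaxation of the exact-match convention used here.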