---
language:
- en
license: apache-2.0
tags:
- NLP
pipeline_tag: summarization
---

# Topic Change Point Detection Model

## Model Details

- **Model Name:** Falconsai/topic_change_point
- **Model Type:** Fine-tuned `google/t5-small`
- **Language:** English
- **License:** MIT

## Overview

The Topic Change Point Detection model identifies topics and tracks how they change within a block of text. It is based on `google/t5-small`, fine-tuned on a custom dataset that maps texts to their respective topic changes. The model can be used to analyze and categorize texts according to their topics and the transitions between them.

### Model Architecture

The base architecture is T5 (Text-To-Text Transfer Transformer), which casts every NLP problem as a text-to-text problem. The specific version used here is `google/t5-small`, fine-tuned to detect and predict topic changes.

### Fine-Tuning Data

The model was fine-tuned on a dataset of texts and their corresponding topic changes. The dataset should be formatted as a file with two columns: `text` and `topic_changes`.

### Intended Use

The model is intended for identifying topics and detecting changes in topics across a block of text. It can be useful in various fields: psychology/psychiatry for session assessment (the initial use case), content analysis, document insights, conversational analysis, and other areas where understanding the flow of topics is important.

## How to Use

### Inference

To use this model for inference, load the fine-tuned model and tokenizer with the `transformers` library, as shown in the examples below.

Running with a pipeline:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

text_block = 'Your block of text here.'
pipe = pipeline("summarization", model="Falconsai/topic_change_point")

res1 = pipe(text_block, max_length=1024, min_length=512, do_sample=False)
print(res1)
```

Running on CPU:

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("Falconsai/topic_change_point")
model = AutoModelForSeq2SeqLM.from_pretrained("Falconsai/topic_change_point")

input_text = 'Your block of text here.'

input_ids = tokenizer(input_text, return_tensors="pt").input_ids
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Running on GPU:

```python
# pip install accelerate
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("Falconsai/topic_change_point")
model = AutoModelForSeq2SeqLM.from_pretrained("Falconsai/topic_change_point", device_map="auto")

input_text = 'Your block of text here.'

input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Training

The training process involves the following steps:

1. **Load and Explore Data:** Load the dataset and perform initial exploration to understand the data distribution.
2. **Preprocess Data:** Tokenize the text blocks and prepare them for the T5 model.
3. **Fine-Tune Model:** Fine-tune the `google/t5-small` model using the preprocessed data.
4. **Evaluate Model:** Evaluate the model's performance on a validation set to ensure it is learning correctly.
5. **Save Model:** Save the fine-tuned model for future use.

## Evaluation

The model's performance should be evaluated on a separate validation set to ensure it accurately predicts topic changes. Metrics such as accuracy, precision, recall, and F1 score can be used to assess performance.

## Limitations

- **Data Dependency:** The model's performance is highly dependent on the quality and representativeness of the training data.
- **Generalization:** The model may not generalize well to conversation texts that are significantly different from the training data.

## Ethical Considerations

When deploying the model, be mindful of the ethical implications, including but not limited to:

- **Privacy:** Ensure that text data used for training and inference does not contain sensitive or personally identifiable information.
- **Bias:** Be aware of potential biases in the training data that could affect the model's predictions.

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.

## Citation

If you use this model in your research, please cite it as follows:

```
@misc{topic_change_point,
  author = {Michael Stattelman},
  title = {Topic Change Point Detection},
  year = {2024},
  publisher = {Falcons.ai},
}
```

---
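
## Appendix: Evaluation Sketch

The Evaluation section above suggests precision, recall, and F1 score for assessing predictions. The sketch below is illustrative only: it assumes change points are represented as sets of position indices and that an exact index match counts as correct, which is an assumption of this example, not the card's actual output format. The `change_point_scores` helper is hypothetical.

```python
def change_point_scores(predicted, reference):
    """Precision, recall, and F1 for predicted vs. reference change-point indices.

    Assumes change points are position indices; an exact match counts as a
    true positive (an illustrative convention, not the model's own format).
    """
    predicted, reference = set(predicted), set(reference)
    tp = len(predicted & reference)  # correctly predicted change points
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(reference) if reference else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

p, r, f1 = change_point_scores(predicted=[3, 7, 12], reference=[3, 8, 12])
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
# → precision=0.67 recall=0.67 f1=0.67
```

In practice the predicted indices would be parsed from the model's generated text before scoring; a tolerance window around each reference index is a common relaxation of the exact-match convention used here.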