---
title: TherapyNote
app_file: app.py
sdk: gradio
sdk_version: 5.9.0
organization: pxpab
---
# Therapy Session Analysis Pipeline

A Python project that downloads YouTube therapy session captions and extracts structured information using LLMs, LangChain, and LangGraph.

## Features

- Downloads captions from YouTube therapy sessions
- Extracts structured information using LLMs and LangChain
- Supports multiple note formats (SOAP, DAP, BIRP, etc.)
- Uses LangGraph for data extraction workflows
- Manages prompts in a dedicated "langhub" directory
- Integrates with LangSmith for conversation and run logging
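
Downloading captions starts from a video identifier rather than a full URL. The sketch below shows one way to pull that identifier out of common YouTube URL shapes using only the standard library. The function name `extract_video_id` is hypothetical and is not taken from `utils/youtube.py`; the accepted hosts are an assumption.

```python
from urllib.parse import parse_qs, urlparse

# Hosts treated as YouTube watch pages (an assumption for this sketch).
WATCH_HOSTS = ("youtube.com", "www.youtube.com", "m.youtube.com")


def extract_video_id(url):
    """Return the video ID from a YouTube URL, or None if it is not recognized."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()

    # Short links carry the ID as the path: https://youtu.be/<id>
    if host == "youtu.be":
        return parsed.path.lstrip("/") or None

    # Watch pages carry the ID in the "v" query parameter.
    if host in WATCH_HOSTS and parsed.path == "/watch":
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None

    return None
```

Returning `None` for unrecognized URLs lets the caller decide whether to skip the input or report an error.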

## Prerequisites

- Python 3.9+
- uv package manager
- OpenAI API key
- LangSmith API key, exported as `LANGCHAIN_API_KEY` (for logging)

## Installation

1. Clone the repository:
```bash
git clone https://github.com/yourusername/therapy-session-analysis.git
cd therapy-session-analysis
```

2. Install dependencies using uv:
```bash
uv pip install -r requirements.txt
```

3. Set up environment variables:
```bash
export OPENAI_API_KEY="your-openai-key"
export LANGCHAIN_API_KEY="your-langchain-key"
export LANGCHAIN_TRACING_V2="true"
```
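
Before running the pipeline, it can help to confirm that these variables are visible to Python. The following is a minimal sketch; the helper name `missing_env_vars` is hypothetical and not part of this project, and the variable names are the ones exported above.

```python
import os

# Variable names taken from the export commands in this README.
REQUIRED_VARS = ("OPENAI_API_KEY", "LANGCHAIN_API_KEY", "LANGCHAIN_TRACING_V2")


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```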

## Project Structure

```
project/
├── config/
│   ├── __init__.py
│   └── settings.py
├── langhub/
│   ├── __init__.py
│   └── prompts/
│       ├── __init__.py
│       └── therapy_extraction_prompt.yaml
├── forms/
│   ├── __init__.py
│   └── schemas.py
├── utils/
│   ├── __init__.py
│   ├── youtube.py
│   └── text_processing.py
├── models/
│   ├── __init__.py
│   └── llm_provider.py
├── main.py
├── requirements.txt
└── README.md
```

## Usage

Run the main script:
```bash
python main.py
```

## Note Formats

The system supports multiple therapy note formats:
- SOAP (Subjective, Objective, Assessment, Plan)
- DAP (Data, Assessment, Plan)
- BIRP (Behavior, Intervention, Response, Plan)
- And more...
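
As an illustration of how these formats map to structured output, the sketch below represents each listed format as an ordered set of sections and builds an empty note from it. The names `NOTE_FORMATS` and `empty_note` are hypothetical; the actual schemas in `forms/schemas.py` may be organized differently.

```python
# Section names for the formats listed above, in their conventional order.
NOTE_FORMATS = {
    "SOAP": ("Subjective", "Objective", "Assessment", "Plan"),
    "DAP": ("Data", "Assessment", "Plan"),
    "BIRP": ("Behavior", "Intervention", "Response", "Plan"),
}


def empty_note(note_format):
    """Return a dict with one empty string per section of the given format."""
    key = note_format.upper()
    if key not in NOTE_FORMATS:
        raise ValueError(f"Unsupported note format: {note_format!r}")
    return {section: "" for section in NOTE_FORMATS[key]}


if __name__ == "__main__":
    # Show the section layout for each format.
    for name in NOTE_FORMATS:
        print(name, list(empty_note(name)))
```

An extraction step could then fill each section from the transcript and reject output that is missing a section.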

## Contributing

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

## License

This project is licensed under the MIT License - see the LICENSE file for details.