Update README.md
README.md
@@ -9,4 +9,139 @@ app_file: app.py
pinned: false
---

# Mistral RAG Chat - Document Question Answering

**Chat with your documents using Mistral AI's language models.**

Upload a plain-text document and ask questions about its content. The app uses Retrieval-Augmented Generation (RAG): it retrieves the passages most relevant to each question and passes them to the model, so answers are grounded in the uploaded document.

## Features

- **Document Upload**: Supports `.txt` files
- **Smart Retrieval**: Uses FAISS vector search to find relevant content
- **Mistral AI**: Powered by Mistral's large language model
- **Chat Interface**: Conversation-style interaction
- **Fast Processing**: Efficient document chunking and embedding

## How It Works

1. **Upload** your text document (`.txt` format)
2. **Process** the document (creates searchable embeddings)
3. **Ask** questions about the content
4. **Get** answers based on the document context
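
The processing and retrieval steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the app's code: `embed` is a hypothetical stand-in (simple word counts) for a call to an embedding model, the search is a linear scan rather than a FAISS index, and the final answer-generation step is left out.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, chunk_size: int = 2048) -> list[str]:
    """Split text into consecutive chunks of at most chunk_size characters."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


# Toy document; a small chunk size is used so that it splits into two chunks.
document = "Cats sleep most of the day. " * 3 + "Paris is the capital of France. " * 3
chunks = chunk_text(document, chunk_size=90)
top_chunks = retrieve("What is the capital of France?", chunks, k=1)
print(top_chunks[0])
```

In a real deployment the retrieved chunks would then be placed in the prompt sent to the language model.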

## Use Cases

- **Research Papers**: Ask questions about academic papers
- **Company Documents**: Query policy manuals, reports, handbooks
- **Educational Content**: Study materials, textbooks, lecture notes
- **News Articles**: Analyze and understand news content
- **Legal Documents**: Extract key information from contracts, agreements

## Example Queries

After uploading a document, try asking:
- "What is the main topic of this document?"
- "Summarize the key points"
- "What does the author say about [specific topic]?"
- "Are there any statistics or numbers mentioned?"
- "What conclusions does the document reach?"

## Technical Details

- **Embedding Model**: `mistral-embed` for document vectorization
- **LLM**: `mistral-large-latest` for answer generation
- **Vector Database**: FAISS for similarity search
- **Chunk Size**: 2048 characters per chunk
- **Retrieval**: The 2 most relevant chunks are retrieved per query
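
To illustrate what "top-2 retrieval" means, here is a hedged sketch of an exhaustive nearest-neighbour search by squared L2 distance. A flat index such as FAISS's `IndexFlatL2` performs this kind of comparison, but which index type the app actually uses is not stated here, so treat the function name `l2_top_k` and the toy 2-dimensional vectors as assumptions made for illustration.

```python
TOP_K = 2  # matches the "Retrieval" setting listed above


def l2_top_k(query: list[float], vectors: list[list[float]], k: int = TOP_K) -> list[int]:
    """Return the indices of the k vectors closest to query (squared L2 distance)."""
    dists = [sum((q - v) ** 2 for q, v in zip(query, vec)) for vec in vectors]
    return sorted(range(len(vectors)), key=lambda i: dists[i])[:k]


# Toy 2-dimensional "embeddings"; real embeddings have far more dimensions.
chunk_vectors = [[0.0, 1.0], [1.0, 0.0], [0.9, 0.1], [0.5, 0.5]]
query_vector = [1.0, 0.0]

nearest = l2_top_k(query_vector, chunk_vectors)
print(nearest)  # indices of the two closest chunks
```

The returned indices identify which chunks would be passed to the language model as context.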

## Supported Formats

Currently supports:
- `.txt` files (UTF-8 encoded)

*More formats coming soon!*
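
As a rough sketch of what accepting only UTF-8 `.txt` files can look like, the helper below validates the extension, size, and encoding before returning the text. The function name `load_text_file` and the hard 10 MB cap are assumptions for illustration (the README only gives 10 MB as guidance); the app's own validation may differ.

```python
from pathlib import Path

MAX_BYTES = 10 * 1024 * 1024  # assumption: turns the README's 10 MB guidance into a hard cap


def load_text_file(path: str, max_bytes: int = MAX_BYTES) -> str:
    """Read a UTF-8 .txt file, raising ValueError with a readable reason on failure."""
    p = Path(path)
    if p.suffix.lower() != ".txt":
        raise ValueError(f"Unsupported file type: {p.suffix or '(none)'}")
    if p.stat().st_size > max_bytes:
        raise ValueError("File is too large")
    try:
        text = p.read_text(encoding="utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError("File is not valid UTF-8 text") from exc
    if not text.strip():
        raise ValueError("File contains no readable text")
    return text
```

Raising a single exception type with a descriptive message makes it straightforward to show the reason in a status field.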

## Getting Started

1. Click **"Upload Text File"** and select your document
2. Click **"Process Document"** and wait for confirmation
3. Start asking questions in the chat interface
4. Get context-aware answers!

## Important Notes

- **File Size**: Keep documents under 10 MB for best performance
- **Language**: Works best with English text
- **Context**: The AI only knows what's in your uploaded document
- **Privacy**: Documents are processed temporarily and not stored permanently
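
The "Context" note follows from how RAG prompts are usually assembled: the retrieved chunks are pasted into the prompt and the model is asked to answer from them. The sketch below shows one common template. The wording of the template and the function name `build_prompt` are assumptions for illustration, not the prompt this app necessarily uses.

```python
def build_prompt(question: str, chunks: list[str]) -> str:
    """Combine retrieved chunks and the user's question into a single prompt."""
    context = "\n\n".join(chunks)
    return (
        "Context information is below.\n"
        "---------------------\n"
        f"{context}\n"
        "---------------------\n"
        "Given the context information and not prior knowledge, answer the query.\n"
        f"Query: {question}\n"
        "Answer:"
    )


prompt = build_prompt(
    "What is the capital of France?",
    ["Paris is the capital of France."],
)
print(prompt)
```

Because only the retrieved chunks appear in the prompt, questions about material outside the uploaded document have nothing to draw on.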

## Privacy & Security

- Your documents are processed in real time
- No permanent storage of uploaded files
- Conversations are not logged or saved
- API calls to Mistral AI are made securely

## Troubleshooting

**Document won't process?**
- Ensure your file is in `.txt` format
- Check that the file contains readable text
- Try a smaller file if you're having issues

**Getting irrelevant answers?**
- Make sure your question relates to the document content
- Try rephrasing your question more specifically
- Check that the document was processed successfully

**Error messages?**
- Refresh the page and try again
- Ensure your document is properly formatted
- Contact support if issues persist

## Built With

- **[Gradio](https://gradio.app/)**: Web interface framework
- **[Mistral AI](https://mistral.ai/)**: Language model and embeddings
- **[FAISS](https://faiss.ai/)**: Vector similarity search
- **[NumPy](https://numpy.org/)**: Numerical computing
- **[Hugging Face Spaces](https://huggingface.co/spaces)**: Hosting platform

## Model Information

- **Base Model**: Mistral Large (`mistral-large-latest`)
- **Embedding Dimension**: 1024
- **Context Window**: 32k tokens
- **Languages**: Optimized for English, supports multiple languages
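
A quick back-of-the-envelope calculation shows how these numbers relate to the chunking settings. It assumes embeddings are stored as 32-bit floats, that "32k" means 32,000 tokens, and that English text averages roughly 4 characters per token; all three are assumptions, and the helper names are hypothetical.

```python
import math

EMBEDDING_DIM = 1024
BYTES_PER_FLOAT32 = 4      # assumption: vectors stored as 32-bit floats
CHUNK_SIZE = 2048          # characters per chunk
TOP_K = 2                  # chunks retrieved per query
CONTEXT_WINDOW = 32_000    # assumption: "32k tokens" read as 32,000
CHARS_PER_TOKEN = 4        # assumption: rough average for English text


def num_chunks(doc_chars: int) -> int:
    """Number of fixed-size chunks needed for a document of doc_chars characters."""
    return math.ceil(doc_chars / CHUNK_SIZE)


def index_bytes(n_chunks: int) -> int:
    """Memory needed to hold one uncompressed embedding per chunk."""
    return n_chunks * EMBEDDING_DIM * BYTES_PER_FLOAT32


def retrieved_tokens() -> int:
    """Approximate token count of the retrieved context."""
    return TOP_K * CHUNK_SIZE // CHARS_PER_TOKEN


n = num_chunks(30_000)
print(n, index_bytes(n), retrieved_tokens())  # 15 61440 1024
```

Under these assumptions a 30,000-character document yields 15 chunks and about 60 KB of embeddings, and the two retrieved chunks use roughly 1,000 tokens, a small fraction of the context window.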

## Interface Preview

```
RAG Chat Interface
Upload a text file and chat with its content!

[Upload Text File] [Process Document]

Processing Status: Document processed successfully! Split into 15 chunks.

Chat:
User: What is this document about?
Assistant: Based on the uploaded document, this appears to be about...

[Your Message: Ask questions about the uploaded document...]
[Send] [Clear Chat]
```

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.

---

**Made with ❤️ using Mistral AI and Gradio**

*Try it now - upload a document and start chatting!*