Update README.md
README.md CHANGED
@@ -74,6 +74,25 @@ Stride: 255 tokens (50% overlap)
 - **Fallback mechanisms**: Intelligent splitting when no semantic boundaries found
 - **Combined limits**: Supports both token AND character limits simultaneously
 
+
+# Use Cases
+
+## Perfect for RAG Systems
+- **Vector Databases**: Ensure chunks fit embedding model limits
+- **Search Applications**: Optimal chunk sizes for retrieval
+- **Question Answering**: Maintain semantic coherence
+
+## Document Processing
+- **Academic Papers**: Respect section and paragraph boundaries
+- **Legal Documents**: Maintain clause integrity
+- **News Articles**: Preserve story flow and context
+
+## Content Management
+- **CMS Integration**: Automatic content segmentation
+- **API Limits**: Respect external service constraints
+- **Storage Optimization**: Consistent chunk sizes for databases
+
+
 ## Quick Start
 
 ### Installation
@@ -963,25 +982,6 @@ Average tokens per chunk: 236.9
 - Semantic boundaries preserved
 - No text loss or duplication
 
-
-
-# Use Cases
-
-## Perfect for RAG Systems
-- **Vector Databases**: Ensure chunks fit embedding model limits
-- **Search Applications**: Optimal chunk sizes for retrieval
-- **Question Answering**: Maintain semantic coherence
-
-## Document Processing
-- **Academic Papers**: Respect section and paragraph boundaries
-- **Legal Documents**: Maintain clause integrity
-- **News Articles**: Preserve story flow and context
-
-## Content Management
-- **CMS Integration**: Automatic content segmentation
-- **API Limits**: Respect external service constraints
-- **Storage Optimization**: Consistent chunk sizes for databases
-
 ---
 
 # Chunking Strategies