redactable-dolphin-mixtral / README.md
	---
	datasets:
	- cognitivecomputations/dolphin
	- cognitivecomputations/dolphin-coder
	- Open-Orca/OpenOrca
	language:
	- en
	library_name: transformers
	tags:
	- legal
	---
	# Redactable-LLM
	High-level overview of integrating multiple open-source large language models (LLMs) within the AutoGen framework:

	### Development of Custom Agents
	- Agent Design: Agent tasks include NLP/NER/PII identification, interpreting natural language commands, executing document redaction, and final verification.
	- Customization: Train custom agents on tasks specific to each stage of the redaction process.
	- Human Interaction (optional): Implement features for seamless human-agent interaction, so that users can enter commands and queries in natural language.
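
The agent tasks above (identification, redaction, final verification) can be sketched as three small functions. This is a minimal sketch, not the project's implementation: the regular expressions stand in for a trained NER/PII model, and the names `PII_PATTERNS`, `Span`, `detect_pii`, `redact`, and `verify` are hypothetical.

```python
import re
from dataclasses import dataclass

# Assumption: simple regexes stand in for a trained NER/PII model.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


@dataclass(frozen=True)
class Span:
    label: str
    start: int
    end: int


def detect_pii(text: str) -> list[Span]:
    """Identification step: return labeled spans, ordered by position."""
    spans = [
        Span(label, match.start(), match.end())
        for label, pattern in PII_PATTERNS.items()
        for match in pattern.finditer(text)
    ]
    return sorted(spans, key=lambda span: span.start)


def redact(text: str, spans: list[Span]) -> str:
    """Redaction step: replace each span with a bracketed label."""
    pieces = []
    cursor = 0
    for span in spans:
        if span.start < cursor:
            continue  # skip spans that overlap one already redacted
        pieces.append(text[cursor:span.start])
        pieces.append(f"[{span.label}]")
        cursor = span.end
    pieces.append(text[cursor:])
    return "".join(pieces)


def verify(redacted: str) -> bool:
    """Verification step: pass only if no PII is detected in the output."""
    return not detect_pii(redacted)
```

In a multi-agent setup each function would sit behind its own agent; keeping them as plain functions makes each step testable in isolation.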

	### LLM & VLLM AutoGen Integration

	- Model Selection: Automatic, task-dependent agent selection.
	- Enhanced Inference: Enhanced LLM inference features, including tuning, caching, error handling, and templating, to improve performance.
	- Quality Control: Vision agents analyze redacted documents using Set-of-Mark (SoM) prompting. Rejected documents are reprocessed and reviewed.
	![AutoGen Agents](https://i.imgur.com/aFgV7yd.png)
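
Task-dependent selection, templating, caching, and error handling can be illustrated without any framework. The sketch below does not use AutoGen's API. The routing table `AGENT_FOR_TASK`, the agent names, the `PROMPT` template, and the `CachedInference` wrapper are all hypothetical, and `backend` is any callable that sends a prompt to a model.

```python
import time
from string import Template

# Assumption: illustrative task-to-agent routing; names are not from the project.
AGENT_FOR_TASK = {
    "pii_detection": "ner_agent",
    "command_parsing": "llm_agent",
    "redaction": "redaction_agent",
    "verification": "vision_agent",
}

# Templating: one prompt shape shared by all tasks.
PROMPT = Template("Task: $task\nDocument:\n$document")


def select_agent(task: str) -> str:
    """Pick the agent registered for a task."""
    if task not in AGENT_FOR_TASK:
        raise ValueError(f"no agent registered for task {task!r}")
    return AGENT_FOR_TASK[task]


class CachedInference:
    """Wrap a backend call with an in-memory cache and retries."""

    def __init__(self, backend, max_retries=3, backoff_s=0.5):
        self.backend = backend          # callable: (agent, prompt) -> reply
        self.max_retries = max_retries
        self.backoff_s = backoff_s
        self._cache = {}

    def __call__(self, task: str, document: str) -> str:
        agent = select_agent(task)
        prompt = PROMPT.substitute(task=task, document=document)
        key = (agent, prompt)
        if key in self._cache:
            return self._cache[key]     # cache hit: no backend call
        last_error = None
        for attempt in range(self.max_retries):
            try:
                reply = self.backend(agent, prompt)
            except RuntimeError as exc:
                last_error = exc
                # Exponential backoff before the next attempt.
                time.sleep(self.backoff_s * 2 ** attempt)
            else:
                self._cache[key] = reply
                return reply
        raise RuntimeError(
            f"inference failed after {self.max_retries} attempts"
        ) from last_error
```

Identical requests are answered from the cache, so a document that is reprocessed after a rejection only pays for the steps whose input changed.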

	### System Optimization
	- Workflow Automation: Automate the redaction workflow by combining LLMs, custom agents, and human input to detect and redact sensitive information efficiently.
	- Performance Maximization: Optimize the system for both efficiency and accuracy, using AutoGen's support for complex workflows.
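
The automated workflow, including reprocessing of rejected documents, can be sketched as a bounded loop. This is one possible shape under stated assumptions: `run_redaction_workflow`, `WorkflowResult`, and the `max_rounds` limit are hypothetical, and `redact_step` and `review_step` stand in for the redaction and quality-control agents.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class WorkflowResult:
    document: str
    approved: bool
    rounds: int
    needs_human_review: bool = False


def run_redaction_workflow(
    document: str,
    redact_step: Callable[[str], str],
    review_step: Callable[[str], bool],
    max_rounds: int = 3,
) -> WorkflowResult:
    """Redact, review, and reprocess until approved or out of rounds."""
    current = document
    for round_no in range(1, max_rounds + 1):
        current = redact_step(current)
        if review_step(current):
            return WorkflowResult(current, approved=True, rounds=round_no)
    # Assumption: documents still rejected after max_rounds go to a person.
    return WorkflowResult(
        current, approved=False, rounds=max_rounds, needs_human_review=True
    )
```

Bounding the loop keeps a document that the reviewer never accepts from cycling indefinitely and gives the human input a defined entry point.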

	### User Interface Development
	- Interface Design: Develop a user-friendly interface that enables non-technical users to interact with the system via natural language prompts.
	- Feedback Integration: Implement a feedback loop to continuously refine the system's accuracy and user-friendliness based on user inputs.
	- User Knowledge Base (optional): The `Research` agent can access user account, profile, and domain knowledge to personalize interactions and results.
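
One way a feedback loop could work is to collect terms that users report as missed and add a term to a deny-list once it has been reported often enough. The sketch below is an assumption about how such a loop might look, not a description of the project: `FeedbackStore`, `promote_after`, and `redact_terms` are hypothetical names.

```python
import re
from collections import Counter


class FeedbackStore:
    """Collect user reports of sensitive terms the system failed to redact."""

    def __init__(self, promote_after: int = 2):
        self.reports = Counter()
        self.promote_after = promote_after  # reports needed before a term is used

    def report_missed(self, term: str) -> None:
        normalized = term.strip().lower()
        if normalized:                      # ignore empty reports
            self.reports[normalized] += 1

    def deny_list(self) -> list[str]:
        """Terms reported at least `promote_after` times."""
        return sorted(
            term for term, count in self.reports.items()
            if count >= self.promote_after
        )


def redact_terms(text: str, terms: list[str]) -> str:
    """Replace every deny-listed term, ignoring case."""
    for term in terms:
        text = re.sub(re.escape(term), "[REDACTED]", text, flags=re.IGNORECASE)
    return text
```

Requiring more than one report before a term takes effect is a design choice that limits the impact of a single mistaken report.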

	### Training, Testing and Validation
	- Model Training: Develop new datasets for model training, focused on document understanding for redaction.
	- Unit Testing: Conduct extensive unit tests to ensure individual system components function correctly.
	- System Testing: Perform comprehensive end-to-end testing to validate the entire redaction process, from user input to output.
	- User Trials: Facilitate user trials to gather feedback and make necessary system adjustments.
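
A unit test for a single component might look like the following, using Python's standard `unittest` module. The function under test, `redact_ssn`, is a hypothetical stand-in for one redaction component, and `run_tests` is a small helper added for this sketch.

```python
import re
import unittest

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact_ssn(text: str) -> str:
    """Hypothetical component under test: mask US Social Security numbers."""
    return SSN.sub("[SSN]", text)


class RedactSsnTest(unittest.TestCase):
    def test_replaces_ssn(self):
        self.assertEqual(redact_ssn("SSN 123-45-6789."), "SSN [SSN].")

    def test_leaves_clean_text_unchanged(self):
        self.assertEqual(redact_ssn("No identifiers here."), "No identifiers here.")

    def test_is_idempotent(self):
        once = redact_ssn("ID: 123-45-6789")
        self.assertEqual(redact_ssn(once), once)


def run_tests() -> unittest.TestResult:
    """Run the test case programmatically and return the result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(RedactSsnTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

End-to-end system tests would follow the same pattern, with a full document as input and the verified output as the assertion target.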
	---

	- #### Mistral AI (LLM)
	[Announcement](https://mistral.ai/news/mixtral-of-experts/) \| [Model](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1)

	- #### QwenLM (VLLM)
	[Paper](https://arxiv.org/abs/2308.12966) \| [Code](https://github.com/QwenLM/Qwen-VL?tab=readme-ov-file) \| [Paper: Set-of-Mark Prompting](https://arxiv.org/abs/2310.11441)

	- #### AutoGen
	[Paper](https://arxiv.org/abs/2308.08155) \| [Code](https://github.com/microsoft/autogen/tree/main)

	- #### Gretel AI (Synthetic Dataset Generation)
	[Model Page](https://gretel.ai/solutions/public-sector) \| [Code](https://github.com/gretelai) \| [Paper: Textbooks Are All You Need II](https://arxiv.org/abs/2309.05463)