fadliaulawi

Update README

28b6169 5 months ago

	---
	title: NutriGenMe PaperExtractor
	emoji: 📄
	colorFrom: green
	colorTo: blue
	sdk: docker
	pinned: false
	license: apache-2.0
	app_port: 8501
	---

	# NutriGenMe Paper Extractor

	## Overview
	The NutriGenMe Paper Extractor extracts relevant information from genomic papers for the NutriGenMe project. It uses natural language processing to parse documents and pull out key data points, so that researchers and practitioners can gather insights from a large body of literature efficiently.

	## Features
	- Automated Extraction: Extracts various entities, such as title, authors, and conclusion of the study, from academic papers automatically.
	- Fast Extraction: Extracts information from complex papers in under 10 minutes.
	- Table Extraction: Extracts values from tables, particularly focusing on gene names, SNPs, and associated diseases.
	- Export to Excel: Exports extraction results to Excel format for easy integration and further analysis.
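To make the table extraction feature more concrete, here is a minimal sketch of how SNP identifiers could be picked out of already-parsed table rows. It is not the project's implementation: the function name `rows_with_snps`, the list-of-lists row layout, and the example rows are all assumptions for illustration. The only fixed fact it relies on is that dbSNP reference IDs are written as `rs` followed by digits.

```python
import re

# dbSNP reference SNP identifiers are written as "rs" followed by digits.
RSID_PATTERN = re.compile(r"\brs\d+\b", re.IGNORECASE)


def rows_with_snps(rows):
    """Return (row, rsids) pairs for rows that mention at least one rsID.

    `rows` is a list of rows, each row a list of cell strings. This layout is
    an assumption for illustration, not the tool's internal format.
    """
    matches = []
    for row in rows:
        rsids = []
        for cell in row:
            # Normalize to lowercase so "RS123" and "rs123" compare equal.
            rsids.extend(m.lower() for m in RSID_PATTERN.findall(cell))
        if rsids:
            matches.append((row, rsids))
    return matches


# Illustrative rows (gene, SNP, trait) used only to exercise the function.
example_rows = [
    ["Gene", "SNP", "Trait"],
    ["FTO", "rs9939609", "Obesity"],
    ["MTHFR", "rs1801133", "Folate metabolism"],
    ["n/a", "not reported", "n/a"],
]

for row, rsids in rows_with_snps(example_rows):
    print(row[0], rsids)
```

The header row and the row without an rsID are skipped, so only rows that carry a SNP would be passed on to later steps such as the Excel export.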

	## Usage
	1. Clone this repository:
	```bash
	git clone https://github.com/KalbeDigitalLab/nutrigenme-paper-extractor
	```

	2. Install dependencies:
	```bash
	pip install -r requirements.txt
	```

	3. Prepare environment keys:
	```dosini
	# Credentials for LLM Models
	OPENAI_API_KEY=<api_key>
	GOOGLE_API_KEY=<api_key>
	PERPLEXITY_API_KEY=<api_key>

	# (Optional) Tracking your extraction process with LangSmith
	LANGCHAIN_TRACING_V2='true'
	LANGCHAIN_API_KEY=<langchain_api_key>
	LANGCHAIN_ENDPOINT='https://api.smith.langchain.com'
	LANGCHAIN_PROJECT=<project_name>
	```

	4. Run the application with `streamlit`:
	```bash
	streamlit run app.py
	```

	The application is also deployed as a 🤗 Hugging Face [Space](https://huggingface.co/spaces/KalbeDigitalLab/nutrigenme-paper-extractor/).
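Before starting the app locally, it can help to confirm that the keys from step 3 are visible to the Python process. The sketch below is a hypothetical helper, not part of the repository. It assumes all three LLM keys are required, which the README does not state, and it treats the LangSmith variables as optional and does not check them.

```python
import os

# Variable names taken from step 3; treating all three as required is an assumption.
REQUIRED_KEYS = ["OPENAI_API_KEY", "GOOGLE_API_KEY", "PERPLEXITY_API_KEY"]


def missing_keys(env=None):
    """Return the required keys that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required API keys are set.")
```

Passing a plain dictionary as `env` makes the check easy to test without touching the real environment.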

	## Documentation
	- `app.py`: Defines the user interface and drives the application flow, calling the other scripts for specific tasks.
	- `process.py`: Orchestrates information extraction by delegating tasks to the other scripts and managing the overall workflow.
	- `prompt.py`: Stores the prompts written for Large Language Models (LLMs) to target specific information during extraction.
	- `table_detector.py`: Detects tables and extracts their contents using Optical Character Recognition (OCR).
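The split between stored prompts and an orchestrating function can be sketched as follows. This is only an illustration of the pattern described above, not the repository's code: `PROMPTS`, `extract_fields`, and `fake_llm` are hypothetical names, and the prompt texts are invented. The model is passed in as a plain callable, so a stub can stand in for a real LLM client.

```python
from typing import Callable, Dict

# Hypothetical prompt templates, standing in for what a prompt module might hold.
PROMPTS: Dict[str, str] = {
    "title": "Return the title of this paper:\n{text}",
    "conclusion": "Summarize the conclusion of this paper:\n{text}",
}


def extract_fields(text: str, llm: Callable[[str], str]) -> Dict[str, str]:
    """Fill each prompt with the paper text and collect the model's answers.

    `llm` is any callable mapping a prompt string to a response string, so a
    real client or a test stub can be plugged in.
    """
    return {
        field: llm(template.format(text=text))
        for field, template in PROMPTS.items()
    }


def fake_llm(prompt: str) -> str:
    """Stub model for demonstration: echoes the first line of the prompt."""
    return prompt.splitlines()[0]


results = extract_fields("Example paper body.", fake_llm)
print(results)
```

Keeping prompts in one mapping means a new field can be added without changing the orchestration function.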

	## Contributing
	Contributions are welcome! To contribute to this project, open a pull request.