title: Dish Decode 2
emoji: πŸƒ
colorFrom: purple
colorTo: pink
sdk: docker
pinned: false
license: mit
short_description: Structure recipe information from videos

🍽️ Recipe Extraction API

This project is a Flask-based API that extracts structured recipe information from cooking tutorial videos! It uses the Deepgram API for audio transcription, Tesseract OCR for text extraction from video frames, and the Gemini API to generate a well-structured recipe document. 🚀


📦 Project Setup

Follow these steps to set up and run the project on your local machine.

1️⃣ Clone the Repository

git clone <your-repo-url>
cd <your-repo-folder>

2️⃣ Install Dependencies

Make sure you have Python installed (Python 3.8 or above is recommended). Install the required libraries using pip:

pip install -r requirements.txt

3️⃣ Install Tesseract OCR

Ensure Tesseract OCR is installed on your system. You can download it from the Tesseract GitHub repository.

Add Tesseract to your system path and make sure to note its installation location.

On Windows:

Add the directory that contains tesseract.exe to your PATH environment variable, e.g.:

C:\Program Files\Tesseract-OCR

On macOS (using Homebrew):

brew install tesseract

On Ubuntu:

sudo apt-get install tesseract-ocr
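
To confirm that Tesseract is reachable after installation, a quick check along these lines may help. This is a minimal sketch using only the Python standard library; the function name `find_tesseract` is hypothetical and not part of this project, and the fallback directory is simply the Windows example path shown above.

```python
import os
import shutil
from typing import Optional, Sequence

# Windows example directory from this README; adjust if your install differs.
WINDOWS_DEFAULT_DIR = r"C:\Program Files\Tesseract-OCR"


def find_tesseract(extra_dirs: Sequence[str] = (WINDOWS_DEFAULT_DIR,)) -> Optional[str]:
    """Return the full path to the tesseract executable, or None if not found."""
    # First look on the regular PATH.
    found = shutil.which("tesseract")
    if found:
        return found
    # Otherwise search only the extra directories (hypothetical fallback).
    return shutil.which("tesseract", path=os.pathsep.join(extra_dirs))


if __name__ == "__main__":
    location = find_tesseract()
    print(location or "tesseract not found; check your PATH")
```

If the script prints a path, the PATH setup from this step is most likely working.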

4️⃣ Set Up Environment Variables

Create a .env file in the root directory and add your API keys:

FIRST_API_KEY=<Your Gemini API Key>
SECOND_API_KEY=<Your Deepgram API Key>
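
Before starting the server, it can be useful to verify that both keys are actually present. The sketch below parses simple KEY=VALUE lines and reports which of the two variable names above are missing. The helper names `parse_env_text` and `missing_keys` are hypothetical, and this README does not say how app.py itself loads the .env file, so treat this as a standalone check rather than the project's own loader.

```python
from typing import Dict, List, Mapping

# Variable names taken from the .env example above.
REQUIRED_KEYS = ("FIRST_API_KEY", "SECOND_API_KEY")


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and '#' comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Remove optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: Mapping[str, str]) -> List[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]


if __name__ == "__main__":
    with open(".env", encoding="utf-8") as handle:
        missing = missing_keys(parse_env_text(handle.read()))
    print("Missing keys:", missing if missing else "none")
```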

5️⃣ Install FFmpeg

This project uses FFmpeg for converting MP4 videos to WAV audio. Install it as follows:

On macOS (using Homebrew):

brew install ffmpeg

On Ubuntu:

sudo apt-get install ffmpeg

On Windows:

Download FFmpeg from FFmpeg.org and add it to your system path.
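
For reference, an MP4-to-WAV conversion with FFmpeg can be driven from Python roughly as follows. This is an illustrative sketch, not necessarily the command app.py runs: the function names are hypothetical, and the 16 kHz mono 16-bit PCM settings are an assumption chosen because they are common for speech transcription.

```python
import subprocess
from typing import List


def build_ffmpeg_command(mp4_path: str, wav_path: str, sample_rate: int = 16000) -> List[str]:
    """Build an ffmpeg argument list that extracts audio from a video as WAV."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-i", mp4_path,           # input video
        "-vn",                    # ignore the video stream
        "-acodec", "pcm_s16le",   # 16-bit PCM audio (assumed setting)
        "-ar", str(sample_rate),  # sample rate in Hz (assumed setting)
        "-ac", "1",               # mono (assumed setting)
        wav_path,
    ]


def convert_to_wav(mp4_path: str, wav_path: str) -> None:
    """Run ffmpeg; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_ffmpeg_command(mp4_path, wav_path), check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.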


🚀 Running the Project

Start the Flask server with the following command:

python app.py

If everything is set up correctly, you should see:

 * Running on http://127.0.0.1:5000/

📡 API Endpoints

✅ Health Check

Endpoint: GET /

Check if the API is running.

curl http://127.0.0.1:5000/

Response:

{
    "status": "success",
    "message": "API is running successfully!"
}

🍲 Recipe Extraction

Endpoint: POST /process-video

Request Body:

Send a JSON payload with a video URL:

{
    "videoUrl": "<URL-of-the-cooking-video>"
}

Example Using curl:

curl -X POST http://127.0.0.1:5000/process-video \
-H "Content-Type: application/json" \
-d '{"videoUrl": "https://example.com/video.mp4"}'
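
The same request can be sent from Python with the standard library. The sketch below mirrors the curl call above; the function names are hypothetical, and the 600-second timeout is an assumption, since this README does not say how long processing takes.

```python
import json
import urllib.request


def build_request(video_url: str, base_url: str = "http://127.0.0.1:5000") -> urllib.request.Request:
    """Build the POST /process-video request with a JSON body."""
    payload = json.dumps({"videoUrl": video_url}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/process-video",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_recipe(video_url: str, base_url: str = "http://127.0.0.1:5000", timeout: float = 600) -> dict:
    """Send the request and decode the JSON response (timeout is an assumed value)."""
    request = build_request(video_url, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires the Flask server to be running locally.
    print(extract_recipe("https://example.com/video.mp4"))
```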

Sample Response:

{
    "**1. Recipe Name:**": "Beef Wellington",
    "**2. Ingredients List:**": "* Fillet of beef\n* Olive oil\n* Salt\n* Pepper",
    "**3. Steps for Preparation:**": "1. Sear the beef fillet\n2. Brush with mustard",
    "**4. Cooking Techniques Used:**": "* Searing\n* Wrapping",
    "**5. Equipment Needed:**": "* Hot pan\n* Blender",
    "**6. Nutritional Information:**": "High in protein and fat",
    "**7. Serving size:**": "2-4 people",
    "**8. Special Notes or Variations:**": "Use horseradish instead of mustard",
    "**9. Festive or Thematic Relevance:**": "Christmas alternative to roast turkey"
}
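
The keys in the sample response carry Markdown emphasis and numbering, and list values are newline-separated strings. If plain keys and Python lists are easier to work with, a client could post-process the response along these lines. This is a sketch based only on the sample shown above; the helper names are hypothetical, and real responses may differ in shape.

```python
import re
from typing import Dict, List


def clean_key(raw_key: str) -> str:
    """Turn a key such as '**1. Recipe Name:**' into 'Recipe Name'."""
    key = raw_key.strip().strip("*").strip()
    key = re.sub(r"^\d+\.\s*", "", key)  # drop the leading "1. " numbering
    return key.rstrip(":").strip()


def split_items(value: str) -> List[str]:
    """Split '* a\\n* b' or '1. a\\n2. b' into a list of item strings."""
    items = []
    for line in value.splitlines():
        # Remove a leading bullet ("*", "-") or step number ("1.").
        item = re.sub(r"^\s*(?:[*\-]|\d+\.)\s*", "", line).strip()
        if item:
            items.append(item)
    return items


def normalize(response: Dict[str, str]) -> Dict[str, List[str]]:
    """Apply both helpers to every field of the response."""
    return {clean_key(key): split_items(value) for key, value in response.items()}
```

With the sample above, `clean_key("**1. Recipe Name:**")` yields `"Recipe Name"`, and the ingredients string becomes a four-element list. Single-line values such as the serving size come back as one-element lists.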

🛠️ Key Features

  • Deepgram API for accurate audio transcription.
  • Tesseract OCR for extracting text from video frames.
  • Gemini API for generating structured recipe information.
  • FFmpeg for seamless MP4-to-WAV conversion.
  • Combines the audio transcript with on-screen text from video frames for enhanced accuracy. 🎯

🧪 Testing

Use tools like Postman or curl to test the API endpoints.


🤝 Contributions

Contributions are welcome! Feel free to submit a pull request or open an issue for any enhancements or bug fixes.


📄 License

This project is licensed under the MIT License.


🌟 Happy Coding and Bon Appétit! 👨‍🍳👩‍🍳

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference