
Image Analysis with InternVL2

This project uses the InternVL2-40B-AWQ model for high-quality image analysis, description, and understanding. It provides a Gradio web interface where users can upload images and receive a detailed analysis.

Features

  • High-Quality Image Analysis: Uses InternVL2-40B (4-bit quantized) for state-of-the-art image understanding
  • Multiple Analysis Types: General description, text extraction, chart analysis, people description, and technical analysis
  • Simple UI: User-friendly Gradio interface for easy image uploading and analysis
  • Efficient Resource Usage: 4-bit quantized model (AWQ) for reduced memory footprint and faster inference

Requirements

The application requires:

  • Python 3.9+
  • CUDA-compatible GPU (24 GB+ VRAM recommended)
  • Transformers 4.37.2+
  • lmdeploy 0.5.3+
  • Gradio 3.38.0
  • Other dependencies in requirements.txt
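
For reference, a requirements.txt consistent with the versions listed above might contain the lines below. The actual file likely pins additional packages (for example torch or Pillow), so treat this as a sketch only.

    # Sketch based only on the versions listed above -- not the complete file
    transformers>=4.37.2
    lmdeploy>=0.5.3
    gradio==3.38.0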

Setup

Docker Setup (Recommended)

  1. Build the Docker image:

    docker build -t internvl2-image-analysis .
    
  2. Run the Docker container:

    docker run --gpus all -p 7860:7860 internvl2-image-analysis
    

Local Setup

  1. Create a virtual environment:

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    
  2. Install dependencies:

    pip install -r requirements.txt
    
  3. Run the application:

    python app_internvl2.py
    

Usage

  1. Open your browser and navigate to http://localhost:7860
  2. Upload an image using the upload box
  3. Choose an analysis type from the options
  4. Click "Analyze Image" and wait for the results

Analysis Types

  • General: Provides a comprehensive description of the image content
  • Text: Focuses on identifying and extracting text from the image
  • Chart: Analyzes charts, graphs, and diagrams in detail
  • People: Describes people in the image - appearance, actions, and expressions
  • Technical: Provides technical analysis of objects and their relationships
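
Each analysis type corresponds to a different prompt sent to the model. The exact wording lives in the application code; the mapping below is a hypothetical illustration of what such a table might look like, not the actual strings.

    # Hypothetical prompt mapping -- the real prompts are defined in app_internvl2.py
    ANALYSIS_PROMPTS = {
        "General": "Describe this image in detail.",
        "Text": "Read and transcribe any text visible in this image.",
        "Chart": "Analyze any charts, graphs, or diagrams in this image and summarize their data.",
        "People": "Describe the people in this image: their appearance, actions, and expressions.",
        "Technical": "Give a technical analysis of the objects in this image and how they relate to each other.",
    }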

Testing

To test the model directly from the command line:

python test_internvl2.py --image path/to/your/image.jpg --prompt "Describe this image in detail."
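
A script like this usually needs only a few lines on top of lmdeploy's pipeline API. The sketch below shows one plausible shape for test_internvl2.py, not its actual contents; only the --image and --prompt flags are taken from the command above, and the model repository ID is an assumption.

    # Sketch of a command-line test harness around lmdeploy (assumed model ID)
    import argparse
    from lmdeploy import pipeline, TurbomindEngineConfig
    from lmdeploy.vl import load_image

    parser = argparse.ArgumentParser()
    parser.add_argument("--image", required=True, help="Path or URL of the image to analyze")
    parser.add_argument("--prompt", default="Describe this image in detail.")
    args = parser.parse_args()

    # Load the AWQ-quantized weights and run a single (prompt, image) query
    pipe = pipeline(
        "OpenGVLab/InternVL2-40B-AWQ",
        backend_config=TurbomindEngineConfig(model_format="awq"),
    )
    response = pipe((args.prompt, load_image(args.image)))
    print(response.text)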

Deployment to Hugging Face

To deploy to Hugging Face Spaces:

python upload_internvl2_to_hf.py
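
The upload script is expected to push the project files to a Hugging Face Space using the huggingface_hub client. A minimal sketch of that idea follows; the repository ID is a placeholder and the real script may create or update the Space differently.

    # Sketch: create a Gradio Space and upload the project folder (placeholder repo ID)
    from huggingface_hub import HfApi

    api = HfApi()  # uses the token from `huggingface-cli login` or the HF_TOKEN env var
    repo_id = "your-username/internvl2-image-analysis"  # placeholder
    api.create_repo(repo_id, repo_type="space", space_sdk="gradio", exist_ok=True)
    api.upload_folder(folder_path=".", repo_id=repo_id, repo_type="space")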

Model Details

This application uses InternVL2-40B-AWQ, a 4-bit quantized version of InternVL2-40B. The original model consists of:

  • Vision Component: InternViT-6B-448px-V1-5
  • Language Component: Nous-Hermes-2-Yi-34B
  • Total Parameters: ~40B (6B vision + 34B language)

License

This project is released under the same license as the InternVL2 model, the MIT License.

Acknowledgements