---
license: apache-2.0
---

# HYPERVIEW Vision Transformer Model (https://ai4eo.eu/challenge/hyperview-challenge/)

This repository is based on the original code from:

https://github.com/ridvansalihkuzu/hyperview_eagleeyes/tree/master/experimental_1

Below are the instructions to set up the environment and run the code.

## Table of Contents

- [Setup and Usage](#setup-and-usage)
  - [Loading the Pre-Trained Model](#loading-the-pre-trained-model)
  - [Loading the Training Data](#loading-the-training-data)
- [Code Modifications](#code-modifications)
- [Environment Setup](#environment-setup)

## Setup and Usage

### Loading the Pre-Trained Model

To load the pre-trained model (`VisionTransformer.pt`), use the following code snippet:

```python
import clip
import torch

from clip.downstream_task import TaskType

device = "cpu"    # Change to "cuda" if you have a GPU
num_classes = 4   # Number of prediction targets in the HYPERVIEW dataset

# Load the CLIP model with the downstream task configuration
model, _ = clip.load(
    "ViT-L/14", device, downstream_task=TaskType.HYPERVIEW,
    class_num=num_classes
)

# Load the pre-trained weights (map_location keeps this working on CPU-only machines)
model.load_state_dict(torch.load("VisionTransformer.pt", map_location=device))
model.eval()  # Set the model to evaluation mode
```
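
After loading, it can help to confirm that the checkpoint was restored and to run a single forward pass. The sketch below is illustrative only: the parameter count works for any PyTorch module, while the forward call assumes the downstream-task model accepts a single preprocessed image batch of shape `(N, 3, 224, 224)` (the size used by the transform in the next section) and returns one prediction per target; adjust it to the fork's actual inference interface.

```python
# Sanity check: report the number of parameters in the restored model.
num_params = sum(p.numel() for p in model.parameters())
print(f"Loaded {num_params / 1e6:.1f}M parameters")

# Illustrative forward pass (assumption only: the downstream-task model accepts a
# single (N, 3, 224, 224) image batch and returns one value per target).
with torch.no_grad():
    dummy_batch = torch.randn(1, 3, 224, 224, device=device)  # placeholder input
    predictions = model(dummy_batch)
print(predictions.shape)  # expected: (1, num_classes)
```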

### Loading the Training Data

```python
import numpy as np

from clip.hyperview_data_loader import HyperDataloader, DataReader

im_size = 224     # Input image size expected by the transform
num_classes = 4   # Number of prediction targets in the HYPERVIEW dataset

# Paths to the training data and ground-truth labels
train_path = "<TRAIN_PATH>"
train_gt_path = "<TRAIN_PATH>/train_gt.csv"

# Initialize the dataset reader and transformations
target_index = list(np.arange(num_classes))
trans_tr, _ = HyperDataloader._init_transform(im_size)
train_dataset = DataReader(
    database_dir=train_path, label_paths=train_gt_path,
    transform=trans_tr, target_index=target_index
)
```
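
To iterate over the samples during training or evaluation, the reader can be wrapped in a standard PyTorch `DataLoader`. This is a minimal sketch that assumes `DataReader` behaves as a map-style `torch.utils.data.Dataset` yielding `(image, target)` pairs; the batch size and worker count are placeholders.

```python
from torch.utils.data import DataLoader

# Minimal sketch: wrap the reader in a standard PyTorch DataLoader
# (assumption: DataReader is a map-style Dataset yielding (image, target) pairs).
train_loader = DataLoader(train_dataset, batch_size=8, shuffle=True, num_workers=2)

# Inspect a single batch to verify shapes before training.
images, targets = next(iter(train_loader))
print(images.shape, targets.shape)
```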

## Acknowledgment

This model is based on the Vision Transformer architecture developed by Google Research, as detailed in their repository [Vision Transformer](https://github.com/google-research/vision_transformer). The original models were trained on the ImageNet and ImageNet-21k datasets and are licensed under the Apache License, Version 2.0.

We would like to express our gratitude to the authors and contributors of the Vision Transformer project for their valuable work, which has significantly influenced our model's development. For more information on the license and usage, please refer to the [Apache License 2.0](http://www.apache.org/licenses/LICENSE-2.0).

**Citations:**

1. [Acknowledgment Note](https://github.com/google-research/vision_transformer?tab=Apache-2.0-1-ov-file#readme)
2. [Vision Transformer GitHub Repository](https://github.com/google-research/vision_transformer)

## Citation Request

If you find this model useful in your research, we would appreciate it if you could cite the following paper:

[1] J. Nalepa et al., "Estimating Soil Parameters From Hyperspectral Images: A Benchmark Dataset and the Outcome of the HYPERVIEW Challenge," in IEEE Geoscience and Remote Sensing Magazine, doi: 10.1109/MGRS.2024.3394040. (https://ieeexplore.ieee.org/document/10526314)