wenkai committed on
Commit 33bd00f · verified · 1 Parent(s): 4474095

Update README.md

Files changed (1)
  1. README.md +8 -88
README.md CHANGED
@@ -1,88 +1,8 @@
- ## Introduction
- <p align="center">
- <br>
- <img src="assets/FAPM.png"/>
- <br>
- </p>
-
- Hugging Face repo: *https://huggingface.co/wenkai/FAPM/*
-
- ## Installation
-
- 1. (Optional) Create a conda environment:
-
- ```bash
- conda create -n lavis python=3.8
- conda activate lavis
- ```
-
- 2. For development, build from source:
-
- ```bash
- git clone https://github.com/xiangwenkai/FAPM.git
- cd FAPM
- pip install -e .
-
- # if needed
- # pip install Biopython
- # pip install fair-esm
- ```
-
- ### Datasets
- #### 1. Raw dataset
- Raw data are available at *https://ftp.uniprot.org/pub/databases/uniprot/previous_releases/release-2023_04/knowledgebase/*. This file is very large and needs to be processed to extract each protein's name, sequence, GO labels, function description, and prompt.
- The domain-level protein dataset we used is available at *https://ftp.ebi.ac.uk/pub/databases/interpro/releases/95.0/protein2ipr.dat.gz*.
- In this repository, we provide the experimental train/val/test sets of Swiss-Prot, available at data/swissprot_exp.
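The repository's actual preprocessing code is not shown in this README; as a rough sketch of the extraction step, the Swiss-Prot flat-file format can be parsed with plain Python (the sample record and field handling below are illustrative assumptions, not FAPM's pipeline):

```python
# Minimal sketch: pull name, sequence, GO labels, and function text out of one
# Swiss-Prot flat-file record. Illustrative only; not FAPM's preprocessing code.
SAMPLE = """ID   001R_FRG3G              Reviewed;         10 AA.
AC   Q6GZX4;
DR   GO; GO:0046782; P:regulation of viral transcription; IEA:InterPro.
CC   -!- FUNCTION: Transcription activation.
SQ   SEQUENCE   10 AA;  1000 MW;  ABCDEF CRC64;
     MAFSAEDVLK
//
"""

def parse_record(text):
    entry = {"name": None, "sequence": "", "go": [], "function": ""}
    in_seq = False
    for line in text.splitlines():
        tag, _, rest = line.partition("   ")  # two-letter line code, then content
        if tag == "ID":
            entry["name"] = rest.split()[0]
        elif tag == "DR" and rest.startswith("GO;"):
            entry["go"].append(rest.split(";")[1].strip())
        elif tag == "CC" and "-!- FUNCTION:" in line:
            entry["function"] = line.split("FUNCTION:", 1)[1].strip()
        elif tag == "SQ":
            in_seq = True        # sequence block follows until the // terminator
        elif line.startswith("//"):
            in_seq = False
        elif in_seq and line.startswith("     "):
            entry["sequence"] += line.strip().replace(" ", "")
    return entry

record = parse_record(SAMPLE)
print(record["name"], record["go"], record["sequence"])
```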
- #### 2. ESM2 embeddings
- Source code for ESM2 embedding generation: *https://github.com/facebookresearch/esm*
- The generation command:
- ```bash
- conda activate FAPM
- python esm_scripts/extract.py esm2_t36_3B_UR50D your_path/protein.fasta your_path_to_save_embedding_files --repr_layers 36 --truncation_seq_length 1024 --include per_tok
- ```
- Example:
- ```bash
- conda activate FAPM
- python esm_scripts/extract.py esm2_t36_3B_UR50D data/fasta/example.fasta data/emb_esm2_3b --repr_layers 36 --truncation_seq_length 1024 --include per_tok
- ```
- The default path for saving embedding files is **data/emb_esm2_3b**.
- You can refer to *data/fasta/prepare_custom_fasta.py* to prepare your custom FASTA data.
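The prepare_custom_fasta.py script is not reproduced here; a minimal sketch of what FASTA preparation involves, written from scratch (the helper name and the placeholder sequence are assumptions, not the repository's code):

```python
# Sketch of writing a custom FASTA file for ESM2 embedding extraction.
# The ID and sequence are placeholders; see data/fasta/prepare_custom_fasta.py
# in the repository for the actual script.
def write_fasta(records, path, width=60):
    """records: dict mapping protein ID -> amino-acid sequence."""
    with open(path, "w") as f:
        for name, seq in records.items():
            f.write(f">{name}\n")
            # wrap the sequence at `width` characters per line
            for i in range(0, len(seq), width):
                f.write(seq[i:i + width] + "\n")

write_fasta({"P18281": "MASGVAVSDGVIKVFNDMKVRKSST"}, "protein.fasta")
```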
-
-
- ## Pretrained language model
- Source: *https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B*
-
- ## Training
- Data config: lavis/configs/datasets/protein/GO_defaults_cap.yaml
- Stage 1 config: lavis/projects/blip2/train/protein_pretrain_stage1.yaml
- Stage 1 training command: run_scripts/blip2/train/protein_pretrain_domain_stage1.sh
- Stage 2 config: lavis/projects/blip2/train/protein_pretrain_stage2.yaml
- Stage 2 training/finetuning command: run_scripts/blip2/train/protein_pretrain_domain_stage2.sh
-
- ## Trained models
- The models are available at **https://huggingface.co/wenkai/FAPM/tree/main/model**.
- You can also download our trained models from Google Drive: *https://drive.google.com/drive/folders/1aA0eSYxNw3DvrU5GU1Cu-4q2kIxxAGSE?usp=drive_link*
-
- ## Testing
- Config: lavis/projects/blip2/eval/caption_protein_eval.yaml
- Command: run_scripts/blip2/eval/eval_cap_protein.sh
-
- ## Inference example
- ```bash
- python FAPM_inference.py \
-     --model_path model/checkpoint_mf2.pth \
-     --example_path data/emb_esm2_3b/P18281.pt \
-     --device cuda \
-     --prompt Acanthamoeba \
-     --prop True
- ```
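Note that on the command line `--prop True` arrives as the string `'True'`, and `bool("False")` is truthy in Python. If you wrap or adapt the inference entry point, boolean flags need explicit conversion; a minimal argparse sketch (the flag names mirror the command above, but this is not the actual argument code from FAPM_inference.py):

```python
import argparse

def str2bool(v):
    # argparse passes the flag's value as a string; convert it explicitly
    # rather than relying on bool(), which is True for any non-empty string.
    return str(v).lower() in ("true", "1", "yes")

parser = argparse.ArgumentParser()
parser.add_argument("--model_path", default="model/checkpoint_mf2.pth")
parser.add_argument("--example_path", default="data/emb_esm2_3b/P18281.pt")
parser.add_argument("--device", default="cuda")
parser.add_argument("--prompt", default="none")
parser.add_argument("--prop", type=str2bool, default=False)

args = parser.parse_args(["--prompt", "Acanthamoeba", "--prop", "True"])
print(args.prompt, args.prop)
```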
 
+ title: FAPM demo
+ emoji: 📊
+ colorFrom: green
+ colorTo: purple
+ sdk: gradio
+ sdk_version: 4.12.0
+ app_file: app.py
+ pinned: false