Yin Fang committed on
Commit a2293d2 · verified · 1 Parent(s): 6985076

Update README.md

Files changed (1)
  1. README.md +58 -1
README.md CHANGED
@@ -1,3 +1,60 @@
  ---
  license: apache-2.0
- ---
+ ---
+
+ ## 🗞️ Model description
+ **InstructCell** is a multi-modal AI copilot that integrates natural language with single-cell RNA sequencing data, letting researchers perform tasks such as cell type annotation, pseudo-cell generation, and drug sensitivity prediction through plain-text commands.
+ By combining a specialized multi-modal architecture with our multi-modal single-cell instruction dataset, InstructCell lowers the technical barrier to single-cell analysis.
+
+ **Instruct Version**: Generates concise answers only, without extra text.
+
+
+ ### 🚀 How to use
+
+ The following example demonstrates a basic **cell type annotation** workflow.
+
+ Set `H5AD_PATH` and `GENE_VOCAB_PATH` before running it:
+ - `H5AD_PATH`: Path to your `.h5ad` single-cell data file (e.g., `H5AD_PATH = "path/to/your/data.h5ad"`).
+ - `GENE_VOCAB_PATH`: Path to your gene vocabulary file (e.g., `GENE_VOCAB_PATH = "path/to/your/gene_vocab.npy"`).
+
+ ```python
+ from mmllm.module import InstructCell
+ import anndata
+ import numpy as np
+ from utils import unify_gene_features
+
+ # Load the pre-trained InstructCell model from HuggingFace
+ model = InstructCell.from_pretrained("zjunlp/InstructCell-instruct")
+
+ # Load the single-cell data (H5AD format) and gene vocabulary file (numpy format)
+ adata = anndata.read_h5ad(H5AD_PATH)
+ gene_vocab = np.load(GENE_VOCAB_PATH)
+ adata = unify_gene_features(adata, gene_vocab, force_gene_symbol_uppercase=False)
+
+ # Select a random single-cell sample and extract its gene counts and metadata
+ k = np.random.randint(0, len(adata))
+ gene_counts = adata[k, :].X.toarray()
+ sc_metadata = adata[k, :].obs.iloc[0].to_dict()
+
+ # Define the model prompt with placeholders for metadata and gene expression profile
+ prompt = (
+     "Can you help me annotate this single cell from a {species}? "
+     "It was sequenced using {sequencing_method} and is derived from {tissue}. "
+     "The gene expression profile is {input}. Thanks!"
+ )
+
+ # Use the model to generate predictions
+ for key, value in model.predict(
+     prompt,
+     gene_counts=gene_counts,
+     sc_metadata=sc_metadata,
+     do_sample=True,
+     top_p=0.95,
+     top_k=50,
+     max_new_tokens=256,
+ ).items():
+     # Print each key-value pair
+     print(f"{key}: {value}")
+ ```
+
+ For more detail and additional examples, see the Jupyter notebook [demo.ipynb](https://github.com/zjunlp/InstructCell/blob/main/demo.ipynb).
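
The prompt in the example names several metadata placeholders (`{species}`, `{sequencing_method}`, `{tissue}`) plus `{input}` for the expression profile. The README does not show how `model.predict` substitutes them, so the following is only a minimal sketch. It assumes the placeholders follow Python `str.format` naming and that `{input}` is reserved for the gene expression profile. Under those assumptions, it checks ahead of time that a metadata dictionary supplies every placeholder the prompt uses. The helper names `placeholder_names` and `missing_metadata_keys` and the sample metadata values are hypothetical and are not part of InstructCell.

```python
from string import Formatter


def placeholder_names(template):
    """Return the named placeholders of a str.format-style template, in order."""
    # Formatter.parse yields (literal_text, field_name, format_spec, conversion);
    # field_name is None for trailing literal text, so those entries are skipped.
    return [name for _, name, _, _ in Formatter().parse(template) if name]


def missing_metadata_keys(template, metadata, reserved=("input",)):
    """List placeholders that are neither reserved nor present in the metadata."""
    return [
        name
        for name in placeholder_names(template)
        if name not in reserved and name not in metadata
    ]


prompt = (
    "Can you help me annotate this single cell from a {species}? "
    "It was sequenced using {sequencing_method} and is derived from {tissue}. "
    "The gene expression profile is {input}. Thanks!"
)

# Made-up metadata for illustration; in the example above this would come from
# adata[k, :].obs.iloc[0].to_dict().
sc_metadata = {"species": "human", "tissue": "liver"}

missing = missing_metadata_keys(prompt, sc_metadata)
if missing:
    print(f"Metadata is missing keys used by the prompt: {missing}")
```

Running a check like this before calling the model can surface a mismatch between the `.obs` column names of a dataset and the placeholder names in a prompt. Whether the model tolerates missing keys is not stated in the README.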