Yin Fang committed on
Commit
7f4e057
·
verified ·
1 Parent(s): 35108e1

Create README.md

Files changed (1)
  1. README.md +54 -0
README.md ADDED
@@ -0,0 +1,54 @@
+ ---
+ license: apache-2.0
+ ---
+ ## 🗞️ Model description
+ **InstructCell** is a multi-modal AI copilot that integrates natural language with single-cell RNA sequencing data, enabling researchers to perform tasks like cell type annotation, pseudo-cell generation, and drug sensitivity prediction through intuitive text commands.
+ By leveraging a specialized multi-modal architecture and our multi-modal single-cell instruction dataset, InstructCell reduces technical barriers and enhances accessibility for single-cell analysis.
+
+ **Chat Version**: Supports generating both textual answers and single-cell data, providing a comprehensive output.
+
+
+ ### 🚀 How to use
+
+ We provide a simple example for quick reference. This demonstrates a basic **cell type annotation** workflow.
+
+ Make sure to specify the paths for `H5AD_PATH` and `GENE_VOCAB_PATH` appropriately:
+ - `H5AD_PATH`: Path to your `.h5ad` single-cell data file (e.g., `H5AD_PATH = "path/to/your/data.h5ad"`).
+ - `GENE_VOCAB_PATH`: Path to your gene vocabulary file (e.g., `GENE_VOCAB_PATH = "path/to/your/gene_vocab.npy"`).
+
+ ```python
+ from mmllm.module import InstructCell
+ import anndata
+ import numpy as np
+ from utils import unify_gene_features
+ # Load the pre-trained InstructCell model from HuggingFace
+ model = InstructCell.from_pretrained("zjunlp/InstructCell-chat")
+ # Load the single-cell data (H5AD format) and gene vocabulary file (numpy format)
+ adata = anndata.read_h5ad(H5AD_PATH)
+ gene_vocab = np.load(GENE_VOCAB_PATH)
+ adata = unify_gene_features(adata, gene_vocab, force_gene_symbol_uppercase=False)
+ # Select a random single-cell sample and extract its gene counts and metadata
+ k = np.random.randint(0, len(adata))
+ gene_counts = adata[k, :].X.toarray()
+ sc_metadata = adata[k, :].obs.iloc[0].to_dict()
+ # Define the model prompt with placeholders for metadata and gene expression profile
+ prompt = (
+     "Can you help me annotate this single cell from a {species}? "
+     "It was sequenced using {sequencing_method} and is derived from {tissue}. "
+     "The gene expression profile is {input}. Thanks!"
+ )
+ # Use the model to generate predictions
+ for key, value in model.predict(
+     prompt,
+     gene_counts=gene_counts,
+     sc_metadata=sc_metadata,
+     do_sample=True,
+     top_p=0.95,
+     top_k=50,
+     max_new_tokens=256,
+ ).items():
+     # Print each key-value pair
+     print(f"{key}: {value}")
+ ```
+
+ For more detailed explanations and additional examples, please refer to the Jupyter notebook [demo.ipynb](https://github.com/zjunlp/InstructCell/blob/main/demo.ipynb).
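
Note on the example prompt: it names three metadata placeholders (`{species}`, `{sequencing_method}`, `{tissue}`) plus `{input}` for the gene expression profile. The README does not say how InstructCell fills them, so the following is only a minimal sketch of a pre-flight check, assuming the placeholder names are expected to match keys of `sc_metadata` and that `{input}` is reserved for the expression profile. The helper `missing_metadata_fields` and the sample metadata are hypothetical and not part of InstructCell.

```python
from string import Formatter

# Assumption: "input" is reserved for the gene expression profile and is
# therefore not expected to appear in the metadata dictionary.
RESERVED_FIELDS = {"input"}


def missing_metadata_fields(prompt, sc_metadata):
    """Return placeholder names in `prompt` that `sc_metadata` cannot fill."""
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion);
    # field_name is None for a trailing literal with no placeholder.
    fields = [name for _, name, _, _ in Formatter().parse(prompt) if name]
    missing = []
    for name in fields:
        if name in RESERVED_FIELDS or name in sc_metadata or name in missing:
            continue
        missing.append(name)
    return missing


prompt = (
    "Can you help me annotate this single cell from a {species}? "
    "It was sequenced using {sequencing_method} and is derived from {tissue}. "
    "The gene expression profile is {input}. Thanks!"
)

# Hypothetical metadata row, standing in for adata[k, :].obs.iloc[0].to_dict()
sc_metadata = {"species": "human", "tissue": "liver"}

print(missing_metadata_fields(prompt, sc_metadata))
```

Running a check like this before `model.predict` would surface a missing `obs` column early, rather than leaving it to be discovered from the model's output.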