khanfs committed on
Commit c389fa0 · verified · 1 Parent(s): 1622e5a

Update README.md

Files changed (1)
  1. README.md +10 -5
README.md CHANGED
@@ -16,11 +16,10 @@ base_model:
 ---
 
 # ChemSolubilityBERTa
-**ChemSolubilityBERTa** is a fine-tuned version of the ChemBERTa model, a prototype designed to predict the aqueous solubility of chemical compounds based on their SMILES representations. Based on ChemBERTa, a BERT-like transformer-based architecture, ChemBERTa pre-trained on 77M SMILES strings for molecular property prediction. We adapted ChemBERTa to predict solubility values by fine-tuning ChemBERTa with the ESOL (Estimated SOLubility) dataset, a water solubility prediction dataset of 1,128 samples. A user inputs a SMILES string, and the model outputs a log solubility value (log mol/L).
-You can read the full paper [here](./01_ChemSolubilityBERTa.pdf).
-
 ## Model Description
-This model was fine-tuned using the ESOL dataset, which contains experimental solubility data for various chemical compounds. ChemBERTa, based on BERT architecture, was adapted to perform regression tasks, outputting a predicted log solubility value for any given SMILES string.
+ChemSolubilityBERTa is a prototype designed to predict the aqueous solubility of chemical compounds from their SMILES representations. Based on ChemBERTa, a BERT-like transformer-based architecture, ChemBERTa pre-trained on 77M SMILES strings for molecular property prediction. We adapted ChemBERTa to predict solubility values by fine-tuning ChemBERTa with the ESOL (Estimated SOLubility) dataset, a water solubility prediction dataset of 1,128 samples. A user inputs a SMILES string, and the model outputs a log solubility value (log mol/L).
+
+You can read the full paper [here](./01_ChemSolubilityBERTa.pdf).
 
 ## Fine-Tuning Details
 - Pretrained model: `seyonec/ChemBERTa-zinc-base-v1`
@@ -42,4 +41,10 @@ smiles_string = "CCO" # Example for ethanol
 inputs = tokenizer(smiles_string, return_tensors='pt')
 outputs = model(**inputs)
 solubility = outputs.logits.item()
-print(f"Predicted solubility: {solubility}")
+print(f"Predicted solubility: {solubility}")
+
+##How to Use
+
+This model is licensed under the [MIT License](https://opensource.org/licenses/MIT).
+
+
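The updated Model Description says the model outputs a log solubility value in log mol/L. The sketch below is a minimal illustration of that unit. It converts such a value back to mol/L, and to g/L when a molar mass is supplied. It assumes a base-10 logarithm, which is the ESOL convention. The function names and example numbers are hypothetical and are not part of the README or the model's API.

```python
import math

# Assumption: "log solubility (log mol/L)" is the base-10 logarithm of the
# molar solubility, following the ESOL dataset convention.


def molar_solubility(log_s: float) -> float:
    """Convert a log10 solubility value (log mol/L) to mol/L."""
    return 10.0 ** log_s


def mass_solubility(log_s: float, molar_mass_g_per_mol: float) -> float:
    """Convert a log10 solubility value (log mol/L) to g/L for a given molar mass."""
    if molar_mass_g_per_mol <= 0:
        raise ValueError("molar mass must be positive")
    return molar_solubility(log_s) * molar_mass_g_per_mol


def to_log_solubility(mol_per_l: float) -> float:
    """Convert a measured molar solubility (mol/L) to log10 units for comparison."""
    if mol_per_l <= 0:
        raise ValueError("solubility must be positive to take a logarithm")
    return math.log10(mol_per_l)


if __name__ == "__main__":
    # Hypothetical numbers for illustration only; not an actual model prediction.
    predicted_log_s = -2.0
    hypothetical_molar_mass = 100.0  # g/mol
    print(f"{molar_solubility(predicted_log_s):.4g} mol/L")
    print(f"{mass_solubility(predicted_log_s, hypothetical_molar_mass):.4g} g/L")
```

Each unit of log solubility is a factor of ten. A predicted value of -2.0 therefore corresponds to 0.01 mol/L, and -3.0 corresponds to 0.001 mol/L.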