Di Zhang


AI & ML interests

AI4Chem, LLM, Green LLM


qq8933's activity

posted an update about 1 month ago
🚀 Exciting News! At the end of September, we're releasing an upgraded version of the MCTSr algorithm, LLaMA-O1, as an open-source fork of the OpenAI Strawberry project. This enhanced tool specializes in tackling the most challenging Mathematical Olympiad tasks, including:

A remarkable score of 8/30 in AIME 2024 🏆
An impressive 26.7% in GPQA Diamond 💎
A solid 58% in AMC 2023 📊
Stay tuned for the release and get ready to explore the cutting edge of mathematical problem-solving! #OpenSource #MathOlympiad #Innovation

Previous version:
Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B (2406.07394)
posted an update 4 months ago
We will open source the 2.5B ChemVLM and the tool-enhanced ChemLLM-7B in the near future
posted an update 4 months ago
Tools are ready!
Thanks to ChemCrow's great work, ChemLLM now supports a set of professional tools, including:
Molecule name retrieval
Molecule property query
Patent check
Molecule safety query
Try it at chemllm.org
posted an update 4 months ago
A new look, built on Ollama Open WebUI!
Also new: web search, real-time talking, and file RAG!

posted an update 5 months ago
A multimodal version of ChemLLM is coming soon!
Weights and datasets will follow too!
posted an update 6 months ago
chemllm.org has now switched to ChemLLM-20B-DPO. Give it a try!🤗
replied to their post 6 months ago

Sorry for network issues, still uploading...

posted an update 6 months ago
ChemLLM-20B SFT and DPO are coming!🤗
posted an update 6 months ago
The ChemLLM datasets are all open source now!
ChemLLM: A Chemical Large Language Model (2402.06852)
ChemData700K: a 700K SFT dataset for chemistry LLMs!
ChemPref-10K: a 10K DPO dataset, in both English and Chinese!
ChemBench-4K: a benchmark of 4,100 high-quality single-choice questions covering nine core chemistry tasks!
C-MHChem: 600 real exam questions, written and checked manually, drawn from 25 years of Chinese national middle school chemistry tests!
All hail the open-source community!🤗
replied to their post 6 months ago

No, it is a Gradio app running on my notebook, shared to my domain name through a Cloudflare Tunnel.

posted an update 6 months ago
The first part of the ChemLLM training dataset is now open source on Hugging Face🤗!

You can use this curated DPO dataset to enhance your own model's performance on Chemistry or Materials tasks.

English version
Chinese Version

Keep following our [work](https://huggingface.co/AI4Chem) and our paper, ChemLLM: A Chemical Large Language Model (2402.06852).

More data will be available in the coming weeks. All hail the open-source community!
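
For readers wondering how a preference dataset like this is consumed, here is a minimal sketch of the standard DPO objective for a single example. It is not the ChemLLM training code. The record layout (`prompt`, `chosen`, `rejected`) is an assumed convention and may not match the released files, the example text is made up, and the log-probabilities would normally come from your policy model and a frozen reference model.

```python
import math

# Illustrative record only: the field names are an assumption about the layout,
# and the text is made up rather than taken from the released dataset.
example_record = {
    "prompt": "What is the molar mass of water?",
    "chosen": "About 18.02 g/mol.",
    "rejected": "About 20 g/mol.",
}


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log(sigmoid(beta * margin)).

    Each argument is the summed log-probability of a full response under the
    policy model or the frozen reference model. beta=0.1 is a placeholder value.
    """
    # How much more the policy prefers "chosen" over "rejected",
    # relative to the reference model.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    z = beta * margin
    # -log(sigmoid(z)) equals softplus(-z); this form avoids overflow.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))
```

The loss equals log 2 when the policy and the reference agree, and it falls as the policy assigns relatively more probability to the chosen answer.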
replied to their post 6 months ago

Sorry, I forgot to apply the new data cleaning pipeline to the non-chemistry part of the datasets. Re-training now.

replied to their post 7 months ago

I haven't tried latex code. My queries were very simple e.g. Explain reaction between Sodium Polyacrylate and Water. You can test it.

While processing the DPO data, we did not completely remove the LaTeX tags. This was intended to help the model understand LaTeX documents better, but it clearly did not work as we hoped. In this version, we completely removed the LaTeX tags and used HTML rich text and Markdown instead.
Thanks for your feedback!
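
As a rough illustration of the kind of cleaning step described above, the sketch below strips a few common LaTeX constructs from a string. It is not the actual ChemLLM pipeline: the function name `strip_latex`, the list of handled commands, and the symbol replacements are all assumptions chosen for the example.

```python
import re

# Hypothetical helper, not the ChemLLM pipeline's real cleaning function.
_FONT_GROUP = re.compile(r"\{\\(?:rm|bf|it)\s+([^{}]*)\}")             # {\rm X} -> X
_ARG_COMMAND = re.compile(r"\\(?:mathrm|text|textbf|textit)\{([^{}]*)\}")  # \text{X} -> X
_SYMBOLS = {r"\rightarrow": "→", r"\leftarrow": "←", r"\times": "×"}
_MATH_DELIMS = re.compile(r"\$\$|\$|\\\(|\\\)|\\\[|\\\]")
_LEFTOVER_COMMAND = re.compile(r"\\[A-Za-z]+")


def strip_latex(text: str) -> str:
    """Remove simple LaTeX markup and keep the readable content."""
    # Unwrap font groups and one-argument text commands, keeping their content.
    for pattern in (_FONT_GROUP, _ARG_COMMAND):
        text = pattern.sub(r"\1", text)
    # Replace a few symbol commands with Unicode equivalents.
    for command, symbol in _SYMBOLS.items():
        text = text.replace(command, symbol)
    # Drop math delimiters, then any command that is still left.
    text = _MATH_DELIMS.sub("", text)
    text = _LEFTOVER_COMMAND.sub("", text)
    # Collapse the whitespace left behind by the removals.
    return re.sub(r"[ \t]+", " ", text).strip()


print(strip_latex(r"$${\rm Na-PAA} + {\rm H2O} \rightarrow {\rm Na-PAA.xH2O}$$"))
```

This sketch does not handle nested braces, escaped dollar signs, or full environments; a production pipeline would need a real LaTeX parser or a much longer rule list.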

replied to their post 7 months ago

I haven't tried latex code. My queries were very simple e.g. Explain reaction between Sodium Polyacrylate and Water. You can test it.

Glad to share the response from the hotfix version, ChemLLM-1.5-DPO-hotfix:

 User: Explain reaction between Sodium Polyacrylate and Water.
Assistant: The reaction between sodium polyacrylate (polymerized sodium acrylate) and water is a process called hydration. Hydration refers to the addition of water molecules to a compound, resulting in the formation of hydrates. In the case of sodium polyacrylate, it hydrates by interacting with water molecules to form hydrated sodium polyacrylate. This process is reversible, and the hydrated sodium polyacrylate can be dehydrated by removing water molecules from it.

Sodium polyacrylate is a polymer made from the monomer sodium acrylate. Acrylate is a polar, unsaturated compound, and as such, it can interact with water molecules through various types of intermolecular forces, such as hydrogen bonding and electrostatic interactions.

The reaction can be described as follows:

Sodium polyacrylate (Na-PAA) + Water (H2O) → Hydrated Sodium Polyacrylate (Na-PAA.xH2O)

Here, x is the number of water molecules coordinated to each sodium polyacrylate molecule, which can vary depending on the relative humidity and temperature. For example, in relatively dry conditions, x might be quite low (e.g., x=1 or 2), while in more humid conditions, x could be significantly higher.

Sodium polyacrylate is commonly used as a superabsorbent polymer in applications such as diapers, hygiene products, and as a water retention agent in soil. The hydration process is crucial in these applications, as it enables the polymer to absorb a large amount of water without dissolving. The hydrated polymer can then hold onto the water and slowly release it as needed, maintaining a moist environment for plants in the case of soil applications, or absorbing liquid bodily waste in diapers and hygiene products. 

This reaction is generally not thermodynamically favored due to the increase in entropy associated with water molecules interacting with the polymer. However, it can be achieved kinetically by exposing the sodium polyacrylate to water vapor, which allows for water molecules to interact with the polymer surface and form hydration layers. The hydration process is reversible, and the hydrated polymer can be dehydrated by removing water molecules through processes such as drying. 

The thermodynamic and kinetic aspects of this reaction are influenced by factors such as humidity, temperature, and the nature of the polymer surface. For instance, polymers with higher hydrophilic character will generally hydr

User: translate it to chinese
Assistant: 1. "钠聚丙烯酸盐与水之间的反应"

2. "反应过程描述"

3. "反应式:"

$${\rm Na-PAA} + {\rm H2O} \rightarrow {\rm Na-PAA.xH2O}$$

4. "钠聚丙烯酸盐与水之间的反应是水合反应。水合反应指的是向化合物中加入水分子,形成水合物。对于钠聚丙烯酸盐,它通过与水分子相互作用形成水合的钠聚丙烯酸盐。这个过程是可逆的,水合的钠聚丙烯酸盐可以通过去除水分子脱水。"

5. "钠聚丙烯酸盐是聚丙烯酸钠单体的聚合物。丙烯酸钠是极性、不饱和化合物,可以通过各种类型的分子间力,如氢键和静电相互作用,与水分子相互作用。"

6. "反应可以描述为:"

$${\rm 钠聚丙烯酸盐(Na-PAA)} + {\rm 水(H2O)} \rightarrow {\rm 水合的钠聚丙烯酸盐(Na-PAA.xH2O)}$$

7. "x是每个钠聚丙烯酸盐分子协调的水分子数量,它可以根据相对湿度和温度而变化。例如,在相对干燥的条件下,x可能相当低(例如,x=1或2),而在更潮湿的条件下,x可以显著更高。"

8. "钠聚丙烯酸盐在应用中,如尿布、卫生产品以及作为土壤中的保水剂,其水合过程至关重要,因为它可以使聚合物吸收大量的水而不溶解。水合聚合物可以保持水分,并在需要时慢慢释放水分,在土壤
replied to their post 7 months ago

I haven't tried latex code. My queries were very simple e.g. Explain reaction between Sodium Polyacrylate and Water. You can test it.

Thanks for your comment!
I checked, and the problem was caused by the LaTeX cleaning function in our data processing pipeline. I have fixed it, and a new version of the DPO model will be trained and uploaded within days.

replied to their post 7 months ago

The front-end website may not render LaTeX properly for some LaTeX environments; you can ask the model to avoid them in your prompt.

posted an update 7 months ago
ChemLLM version 1.5 is now open source. We have also released a version trained with reinforcement learning from human feedback. The front-end and back-end code is open source as well, to make deployment easier. Follow along for updates.
The evaluation set and part of the training set will also be contributed to the community within days, so stay tuned ✧٩(ˊωˋ*)و✧


ChemLLM-1.5-dpo https://huggingface.co/AI4Chem/ChemLLM-7B-Chat-1.5-SFT

Front-end https://github.com/AI4Chem/NextChat
RAG application example https://github.com/AI4Chem/NextChat/blob/main/chatpdf.py
Backend https://github.com/AI4Chem/api_backend

Front-end Demo https://chemllm.org/
RAG application example Demo https://chatpdf.chemllm.org/
Backend Demo https://api.chemllm.org/