lamm-mit
/

Cephalo-Idefics-2-vision-10b-alpha

@@ -36,11 +36,19 @@ The model is developed to process diverse inputs, including images and text, fac
 Cephalo provides a robust framework for multimodal interaction and understanding, including the development of complex generative pipelines to create 2D and 3D renderings of material microstructures as input for additive manufacturing methods.
-This version of Cephalo, lamm-mit/Cephalo-Idefics-2-vision-8b-beta, is based on the HuggingFaceM4/idefics2-8b-chatty model. The model was trained on a combination of scientific text-image data extracted from Wikipedia and scientific papers. For further details on the base model, see: https://huggingface.co/HuggingFaceM4/idefics2-8b-chatty. More details about technical aspects of the model, training and example applications to materials science problems are provided in the paper (reference at the bottom).
 ### Chat Format
-The lamm-mit/Cephalo-Idefics-2-vision-8b-beta is suiteable for one or more image inputs, wih prompts using the chat format as follows:
 ```raw
 User: You carefully study the image, and respond accurately, but succinctly. Think step-by-step.
@@ -76,7 +84,7 @@ DEVICE='cuda:0'
 from transformers import AutoProcessor, Idefics2ForConditionalGeneration
 from tqdm.notebook import tqdm
-model_id='lamm-mit/Cephalo-Idefics-2-vision-8b-beta'
 model = Idefics2ForConditionalGeneration.from_pretrained(  model_id,
                                                            torch_dtype=torch.bfloat16, #if your GPU allows
@@ -224,7 +232,7 @@ url1 = "https://d2r55xnwy6nx47.cloudfront.net/uploads/2018/02/Ants_Lede1300.jpg"
 response, messages,images= ask_about_image ( model, processor, question,
                                              images_input=[url1,],
                                              temperature=0.1,
-                                             system= '', init_instr='You carefully study the image, and respond accurately, but succinctly. Think step-by-step.\n\n',
                                              show_conversation=True,
                                              max_new_tokens=512, messages=[], images=[])
 ```
@@ -235,11 +243,11 @@ Sample output:
 <small>Image by [Vaishakh Manohar](https://www.quantamagazine.org/the-simple-algorithm-that-ants-use-to-build-bridges-20180226/)</small>
 <pre style="white-space: pre-wrap;">
-The image depicts a group of ants moving in a coordinated manner, demonstrating their ability to navigate complex environments and adapt to changing conditions. This behavior is relevant for materials design because it highlights the potential of multi-agent AI systems to mimic natural systems and develop new materials with enhanced properties.
-Multi-agent AI refers to the use of multiple autonomous agents working together to solve complex problems. These agents can learn from each other and adapt to new situations, similar to how ants can navigate their environment and communicate with one another. By applying these principles to materials design, researchers can develop new materials that exhibit improved performance, such as enhanced strength, flexibility, and adaptability.
-The relevance of this image for materials design lies in the inspiration it provides for developing new materials that can mimic the natural efficiency and adaptability of ants. By studying the behavior of ants, researchers can gain insights into how to design materials that can respond dynamically to changes in their environment, leading to improved performance and functionality.
 </pre>
 ## Dataset generation

 Cephalo provides a robust framework for multimodal interaction and understanding, including the development of complex generative pipelines to create 2D and 3D renderings of material microstructures as input for additive manufacturing methods.
+This version of Cephalo, lamm-mit/Cephalo-Idefics-2-vision-10b-alpha, is based on a merged expansion of the https://huggingface.co/lamm-mit/Cephalo-Idefics-2-vision-8b-beta and the HuggingFaceM4/idefics2-8b-chatty model. This method allows us to increase the depth of the model and focus on learning more complex representations and associations in deeper layers of the network.
+The model was trained in several stages:
+Step 1: Train https://huggingface.co/lamm-mit/Cephalo-Idefics-2-vision-8b-beta by fine-tuning the HuggingFaceM4/idefics2-8b-chatty model.
+Step 2: Combine the https://huggingface.co/lamm-mit/Cephalo-Idefics-2-vision-8b-beta decoder with the last 8 layers of the HuggingFaceM4/idefics2-8b-chatty decoder.
+Step 3: Fine-tune the merged model, which now has 40 decoder layers and a total of 10b parameters.
+The model was trained on a combination of scientific text-image data extracted from Wikipedia and scientific papers. For further details on the base model, see: https://huggingface.co/HuggingFaceM4/idefics2-8b-chatty. More details about technical aspects of the model, training and example applications to materials science problems are provided in the paper (reference at the bottom).
 ### Chat Format
+The lamm-mit/Cephalo-Idefics-2-vision-10b-alpha model is suitable for one or more image inputs, wih prompts using the chat format as follows:
 ```raw
 User: You carefully study the image, and respond accurately, but succinctly. Think step-by-step.
 from transformers import AutoProcessor, Idefics2ForConditionalGeneration
 from tqdm.notebook import tqdm
+model_id='lamm-mit/Cephalo-Idefics-2-vision-10b-alpha'
 model = Idefics2ForConditionalGeneration.from_pretrained(  model_id,
                                                            torch_dtype=torch.bfloat16, #if your GPU allows
 response, messages,images= ask_about_image ( model, processor, question,
                                              images_input=[url1,],
                                              temperature=0.1,
+                                             system= '', init_instr='You carefully study the image and provide detailed answers. Think step-by-step.\n\n',
                                              show_conversation=True,
                                              max_new_tokens=512, messages=[], images=[])
 ```
 <small>Image by [Vaishakh Manohar](https://www.quantamagazine.org/the-simple-algorithm-that-ants-use-to-build-bridges-20180226/)</small>
 <pre style="white-space: pre-wrap;">
+The image shows a group of ants moving in coordinated patterns on a surface. This illustrates the concept of multi-agent AI, which involves the study and simulation of complex systems involving multiple agents (in this case, ants) interacting with each other and their environment.
+The relevance for materials design is in understanding how these natural systems exhibit emergent behaviors such as self-organization, which can inspire the development of new materials and systems that mimic these natural processes. By studying the movement patterns of ants, researchers can gain insights into how to design materials that exhibit similar emergent properties, leading to improved performance in various applications.
+Multi-agent AI involves creating models that describe the interactions between individual agents and their environment, allowing for the simulation of complex systems with multiple interacting components. This approach can be applied to various fields, including materials science, where understanding emergent behaviors at the microscopic level can lead to the design of new materials with enhanced properties.
 </pre>
 ## Dataset generation