liamcripwell committed on
Commit
f9aa6b5
·
verified ·
1 Parent(s): 8be86db

Update README.md

Files changed (1)
  1. README.md +612 -196
README.md CHANGED
@@ -1,199 +1,615 @@
  ---
- library_name: transformers
- tags: []
  ---

- # Model Card for Model ID
-
- <!-- Provide a quick summary of what the model is/does. -->
-
-
-
- ## Model Details
-
- ### Model Description
-
- <!-- Provide a longer summary of what this model is. -->
-
- This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
-
- - **Developed by:** [More Information Needed]
- - **Funded by [optional]:** [More Information Needed]
- - **Shared by [optional]:** [More Information Needed]
- - **Model type:** [More Information Needed]
- - **Language(s) (NLP):** [More Information Needed]
- - **License:** [More Information Needed]
- - **Finetuned from model [optional]:** [More Information Needed]
-
- ### Model Sources [optional]
-
- <!-- Provide the basic links for the model. -->
-
- - **Repository:** [More Information Needed]
- - **Paper [optional]:** [More Information Needed]
- - **Demo [optional]:** [More Information Needed]
-
- ## Uses
-
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-
- ### Direct Use
-
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-
- [More Information Needed]
-
- ### Downstream Use [optional]
-
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-
- [More Information Needed]
-
- ### Out-of-Scope Use
-
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-
- [More Information Needed]
-
- ## Bias, Risks, and Limitations
-
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
-
- [More Information Needed]
-
- ### Recommendations
-
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-
- ## How to Get Started with the Model
-
- Use the code below to get started with the model.
-
- [More Information Needed]
-
- ## Training Details
-
- ### Training Data
-
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-
- [More Information Needed]
-
- ### Training Procedure
-
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-
- #### Preprocessing [optional]
-
- [More Information Needed]
-
-
- #### Training Hyperparameters
-
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-
- #### Speeds, Sizes, Times [optional]
-
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-
- [More Information Needed]
-
- ## Evaluation
-
- <!-- This section describes the evaluation protocols and provides the results. -->
-
- ### Testing Data, Factors & Metrics
-
- #### Testing Data
-
- <!-- This should link to a Dataset Card if possible. -->
-
- [More Information Needed]
-
- #### Factors
-
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-
- [More Information Needed]
-
- #### Metrics
-
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
-
- [More Information Needed]
-
- ### Results
-
- [More Information Needed]
-
- #### Summary
-
-
-
- ## Model Examination [optional]
-
- <!-- Relevant interpretability work for the model goes here -->
-
- [More Information Needed]
-
- ## Environmental Impact
-
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-
- - **Hardware Type:** [More Information Needed]
- - **Hours used:** [More Information Needed]
- - **Cloud Provider:** [More Information Needed]
- - **Compute Region:** [More Information Needed]
- - **Carbon Emitted:** [More Information Needed]
-
- ## Technical Specifications [optional]
-
- ### Model Architecture and Objective
-
- [More Information Needed]
-
- ### Compute Infrastructure
-
- [More Information Needed]
-
- #### Hardware
-
- [More Information Needed]
-
- #### Software
-
- [More Information Needed]
-
- ## Citation [optional]
-
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-
- **BibTeX:**
-
- [More Information Needed]
-
- **APA:**
-
- [More Information Needed]
-
- ## Glossary [optional]
-
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-
- [More Information Needed]
-
- ## More Information [optional]
-
- [More Information Needed]
-
- ## Model Card Authors [optional]
-
- [More Information Needed]
-
- ## Model Card Contact
-
- [More Information Needed]
  ---
+ license: mit
+ language:
+ - multilingual
+ tags:
+ - nlp
+ base_model: OpenGVLab/InternVL2_5-1B
+ pipeline_tag: text-generation
+ inference: true
  ---

+ # NuExtract-2-2B by NuMind 🔥
+
+ NuExtract 2.0 is a family of models trained specifically for structured information extraction tasks. It supports both text and image inputs and is multilingual.
+
+ We provide several versions of different sizes, all based on the InternVL2.5 family.
+ | Model Size | Model Name | Base Model | Huggingface Link |
+ |------------|------------|------------|------------------|
+ | 2B | NuExtract-2.0-2B | [InternVL2_5-2B](https://huggingface.co/OpenGVLab/InternVL2_5-2B) | [NuExtract-2-2B](https://huggingface.co/numind/NuExtract-2-2B) |
+ | 4B | NuExtract-2.0-4B | [InternVL2_5-4B](https://huggingface.co/OpenGVLab/InternVL2_5-4B) | [NuExtract-2-4B](https://huggingface.co/numind/NuExtract-2-4B) |
+ | 8B | NuExtract-2.0-8B | [InternVL2_5-8B](https://huggingface.co/OpenGVLab/InternVL2_5-8B) | [NuExtract-2-8B](https://huggingface.co/numind/NuExtract-2-8B) |
+
+ ## Overview
+
+ To use the model, provide an input text/image and a JSON template describing the information you need to extract. The template should be a JSON object specifying field names and their expected types.
+
+ Supported types include:
+ * `verbatim-string` - instructs the model to extract text that is present verbatim in the input.
+ * `string` - a generic string field that can incorporate paraphrasing/abstraction.
+ * `integer` - a whole number.
+ * `number` - a whole or decimal number.
+ * `date-time` - an ISO-formatted date.
+ * Array of any of the above types (e.g. `["string"]`)
+ * `enum` - a choice from a set of possible answers (represented in the template as an array of options, e.g. `["yes", "no", "maybe"]`).
+ * `multi-label` - an enum that can have multiple possible answers (represented in the template as a double-wrapped array, e.g. `[["A", "B", "C"]]`).
+
+ If the model does not identify relevant information for a field, it will return `null` or `[]` (for arrays and multi-labels).
+
+ The following is an example template:
+ ```json
+ {
+   "first_name": "verbatim-string",
+   "last_name": "verbatim-string",
+   "description": "string",
+   "age": "integer",
+   "gpa": "number",
+   "birth_date": "date-time",
+   "nationality": ["France", "England", "Japan", "USA", "China"],
+   "languages_spoken": [["English", "French", "Japanese", "Mandarin", "Spanish"]]
+ }
+ ```
+ An example output:
+ ```json
+ {
+   "first_name": "Susan",
+   "last_name": "Smith",
+   "description": "A student studying computer science.",
+   "age": 20,
+   "gpa": 3.7,
+   "birth_date": "2005-03-01",
+   "nationality": "England",
+   "languages_spoken": ["English", "French"]
+ }
+ ```
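The template conventions described in the README can be checked mechanically before a template is sent to the model. The following is a minimal sketch and not part of the NuExtract code: `field_kind` is a hypothetical helper, and its rules are an assumption derived only from the type list in the README (a flat list of options is an enum, a double-wrapped list is a multi-label, and a one-element list holding a type name is an array).

```python
import json

# Primitive type names taken from the README's list of supported types.
PRIMITIVES = {"verbatim-string", "string", "integer", "number", "date-time"}

def field_kind(spec):
    """Classify one template value (hypothetical helper, not a NuExtract API)."""
    if isinstance(spec, str):
        return spec if spec in PRIMITIVES else "unknown"
    if isinstance(spec, dict):
        return "object"
    if isinstance(spec, list) and spec:
        inner = spec[0]
        # Double-wrapped list of option strings -> multi-label.
        if len(spec) == 1 and isinstance(inner, list) and all(isinstance(o, str) for o in inner):
            return "multi-label"
        # One-element list holding a nested object or a primitive type name -> array.
        if len(spec) == 1 and (isinstance(inner, dict) or (isinstance(inner, str) and inner in PRIMITIVES)):
            return "array"
        # Flat list of option strings -> enum.
        if all(isinstance(o, str) for o in spec):
            return "enum"
    return "unknown"

template = json.loads("""{
  "first_name": "verbatim-string",
  "age": "integer",
  "nationality": ["France", "England", "Japan", "USA", "China"],
  "languages_spoken": [["English", "French", "Japanese", "Mandarin", "Spanish"]]
}""")
kinds = {name: field_kind(spec) for name, spec in template.items()}
print(kinds)
```

One ambiguity this sketch does not resolve: a single-option enum whose only option is itself a type name (e.g. `["string"]`) is indistinguishable from an array of strings.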
+
+ ⚠️ We recommend using NuExtract with a temperature at or very close to 0. Some inference frameworks, such as Ollama, use a default of 0.7, which is not well suited to many extraction tasks.
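As a rough illustration of the temperature recommendation, the sketch below builds keyword arguments in the style accepted by Hugging Face `generate`. `extraction_generation_config` is a hypothetical helper, and treating any temperature at or below a tiny threshold as plain greedy decoding is an assumption of this sketch; the `max_new_tokens` default simply mirrors the value used in the README's own examples.

```python
def extraction_generation_config(temperature=0.0, max_new_tokens=2048):
    """Build generation kwargs (hypothetical helper, not a NuExtract API)."""
    if temperature <= 1e-6:
        # Greedy decoding: deterministic, the practical equivalent of temperature 0.
        return {"do_sample": False, "num_beams": 1, "max_new_tokens": max_new_tokens}
    # Sampling is only enabled when a non-zero temperature is requested explicitly.
    return {"do_sample": True, "temperature": temperature, "max_new_tokens": max_new_tokens}

print(extraction_generation_config())
print(extraction_generation_config(temperature=0.4, max_new_tokens=256))
```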
+
+ ## Inference
+
+ Use the following code to handle loading and preprocessing of input data:
+
+ ```python
+ import torch
+ import torchvision.transforms as T
+ from PIL import Image
+ from torchvision.transforms.functional import InterpolationMode
+
+ IMAGENET_MEAN = (0.485, 0.456, 0.406)
+ IMAGENET_STD = (0.229, 0.224, 0.225)
+
+ def build_transform(input_size):
+     MEAN, STD = IMAGENET_MEAN, IMAGENET_STD
+     transform = T.Compose([
+         T.Lambda(lambda img: img.convert('RGB') if img.mode != 'RGB' else img),
+         T.Resize((input_size, input_size), interpolation=InterpolationMode.BICUBIC),
+         T.ToTensor(),
+         T.Normalize(mean=MEAN, std=STD)
+     ])
+     return transform
+
+ def find_closest_aspect_ratio(aspect_ratio, target_ratios, width, height, image_size):
+     best_ratio_diff = float('inf')
+     best_ratio = (1, 1)
+     area = width * height
+     for ratio in target_ratios:
+         target_aspect_ratio = ratio[0] / ratio[1]
+         ratio_diff = abs(aspect_ratio - target_aspect_ratio)
+         if ratio_diff < best_ratio_diff:
+             best_ratio_diff = ratio_diff
+             best_ratio = ratio
+         elif ratio_diff == best_ratio_diff:
+             if area > 0.5 * image_size * image_size * ratio[0] * ratio[1]:
+                 best_ratio = ratio
+     return best_ratio
+
+ def dynamic_preprocess(image, min_num=1, max_num=12, image_size=448, use_thumbnail=False):
+     orig_width, orig_height = image.size
+     aspect_ratio = orig_width / orig_height
+
+     # enumerate the candidate tile grids (columns, rows)
+     target_ratios = set(
+         (i, j) for n in range(min_num, max_num + 1) for i in range(1, n + 1) for j in range(1, n + 1) if
+         i * j <= max_num and i * j >= min_num)
+     target_ratios = sorted(target_ratios, key=lambda x: x[0] * x[1])
+
+     # find the grid whose aspect ratio is closest to the image's
+     target_aspect_ratio = find_closest_aspect_ratio(
+         aspect_ratio, target_ratios, orig_width, orig_height, image_size)
+
+     # calculate the target width and height
+     target_width = image_size * target_aspect_ratio[0]
+     target_height = image_size * target_aspect_ratio[1]
+     blocks = target_aspect_ratio[0] * target_aspect_ratio[1]
+
+     # resize the image
+     resized_img = image.resize((target_width, target_height))
+     processed_images = []
+     for i in range(blocks):
+         box = (
+             (i % (target_width // image_size)) * image_size,
+             (i // (target_width // image_size)) * image_size,
+             ((i % (target_width // image_size)) + 1) * image_size,
+             ((i // (target_width // image_size)) + 1) * image_size
+         )
+         # split the image
+         split_img = resized_img.crop(box)
+         processed_images.append(split_img)
+     assert len(processed_images) == blocks
+     if use_thumbnail and len(processed_images) != 1:
+         thumbnail_img = image.resize((image_size, image_size))
+         processed_images.append(thumbnail_img)
+     return processed_images
+
+ def load_image(image_file, input_size=448, max_num=12):
+     image = Image.open(image_file).convert('RGB')
+     transform = build_transform(input_size=input_size)
+     images = dynamic_preprocess(image, image_size=input_size, use_thumbnail=True, max_num=max_num)
+     pixel_values = [transform(image) for image in images]
+     pixel_values = torch.stack(pixel_values)
+     return pixel_values
+
+ def prepare_inputs(messages, image_paths, tokenizer, device='cuda', dtype=torch.bfloat16):
+     """
+     Prepares multi-modal input components (supports multiple images per prompt).
+
+     Args:
+         messages: List of input messages/prompts (strings or dicts with 'role' and 'content')
+         image_paths: List where each element is either None (for text-only) or a list of image paths
+         tokenizer: The tokenizer to use for applying chat templates
+         device: Device to place tensors on ('cuda', 'cpu', etc.)
+         dtype: Data type for image tensors (default: torch.bfloat16)
+
+     Returns:
+         dict: Contains 'prompts', 'pixel_values_list', and 'num_patches_list' ready for the model
+     """
+     # Make sure image_paths list is at least as long as messages
+     if len(image_paths) < len(messages):
+         # Pad with None for text-only messages
+         image_paths = image_paths + [None] * (len(messages) - len(image_paths))
+
+     # Process images and collect patch information
+     loaded_images = []
+     num_patches_list = []
+     for paths in image_paths:
+         if paths and isinstance(paths, list) and len(paths) > 0:
+             # Load each image in this prompt
+             prompt_images = []
+             prompt_patches = []
+
+             for path in paths:
+                 # Load the image
+                 img = load_image(path).to(dtype=dtype, device=device)
+
+                 # Ensure img has correct shape [patches, C, H, W]
+                 if len(img.shape) == 3:  # [C, H, W] -> [1, C, H, W]
+                     img = img.unsqueeze(0)
+
+                 prompt_images.append(img)
+                 # Record the number of patches for this image
+                 prompt_patches.append(img.shape[0])
+
+             loaded_images.append(prompt_images)
+             num_patches_list.append(prompt_patches)
+         else:
+             # Text-only prompt
+             loaded_images.append(None)
+             num_patches_list.append([])
+
+     # Create the concatenated pixel_values_list
+     pixel_values_list = []
+     for prompt_images in loaded_images:
+         if prompt_images:
+             # Concatenate all images for this prompt
+             pixel_values_list.append(torch.cat(prompt_images, dim=0))
+         else:
+             # Text-only prompt
+             pixel_values_list.append(None)
+
+     # Format messages for the model
+     if all(isinstance(m, str) for m in messages):
+         # Simple string messages: convert to chat format
+         batch_messages = [
+             [{"role": "user", "content": message}]
+             for message in messages
+         ]
+     else:
+         # Assume messages are already in the right format
+         batch_messages = messages
+
+     # Apply chat template
+     prompts = tokenizer.apply_chat_template(
+         batch_messages,
+         tokenize=False,
+         add_generation_prompt=True
+     )
+
+     return {
+         'prompts': prompts,
+         'pixel_values_list': pixel_values_list,
+         'num_patches_list': num_patches_list
+     }
+
+ def construct_message(text, template, examples=None):
+     """
+     Construct the individual NuExtract message texts, prior to chat template formatting.
+     """
+     # add few-shot examples if needed
+     if examples is not None and len(examples) > 0:
+         icl = "# Examples:\n"
+         for row in examples:
+             icl += f"## Input:\n{row['input']}\n## Output:\n{row['output']}\n"
+     else:
+         icl = ""
+
+     return f"""# Template:\n{template}\n{icl}# Context:\n{text}"""
+ ```
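To see how the tiling in `dynamic_preprocess` behaves without loading any images, the following standalone sketch re-implements only the grid selection step in pure Python. `choose_tile_grid` is a hypothetical name; the logic is meant to mirror `find_closest_aspect_ratio` from the README, except that candidates of equal area are given a fixed order here so the result is deterministic (an assumption of this sketch).

```python
def choose_tile_grid(width, height, image_size=448, min_num=1, max_num=12):
    """Return the (columns, rows) tile grid chosen for an image of the given size."""
    aspect_ratio = width / height
    # All grids whose tile count lies within [min_num, max_num], smallest grids first.
    candidates = sorted(
        {(i, j)
         for i in range(1, max_num + 1)
         for j in range(1, max_num + 1)
         if min_num <= i * j <= max_num},
        key=lambda g: (g[0] * g[1], g),
    )
    best, best_diff = (1, 1), float("inf")
    for cols, rows in candidates:
        diff = abs(aspect_ratio - cols / rows)
        if diff < best_diff:
            best, best_diff = (cols, rows), diff
        elif diff == best_diff and width * height > 0.5 * image_size * image_size * cols * rows:
            # Same aspect ratio: prefer the larger grid only if the image is big enough.
            best = (cols, rows)
    return best

for size in [(448, 448), (896, 448), (2000, 2000)]:
    print(size, "->", choose_tile_grid(*size))
# (448, 448) -> (1, 1)
# (896, 448) -> (2, 1)
# (2000, 2000) -> (3, 3)
```

Because `load_image` calls `dynamic_preprocess` with `use_thumbnail=True`, any grid with more than one tile yields one extra thumbnail patch, so the 3x3 case above would produce 10 patches in total.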
+
+ To handle inference:
+
+ ```python
+ IMG_START_TOKEN='<img>'
+ IMG_END_TOKEN='</img>'
+ IMG_CONTEXT_TOKEN='<IMG_CONTEXT>'
+
+ def nuextract_generate(model, tokenizer, prompts, generation_config, pixel_values_list=None, num_patches_list=None):
+     """
+     Generate responses for a batch of NuExtract inputs.
+     Supports multiple and varying numbers of images per prompt.
+
+     Args:
+         model: The vision-language model
+         tokenizer: The tokenizer for the model
+         pixel_values_list: List of tensors, one per prompt
+             Each tensor has shape [total_patches, channels, height, width], or is None for text-only prompts
+         prompts: List of text prompts
+         generation_config: Configuration for text generation
+         num_patches_list: List of lists, each containing patch counts for images in a prompt
+
+     Returns:
+         List of generated responses
+     """
+     img_context_token_id = tokenizer.convert_tokens_to_ids(IMG_CONTEXT_TOKEN)
+     model.img_context_token_id = img_context_token_id
+
+     # Replace all image placeholders with appropriate tokens
+     modified_prompts = []
+     total_image_files = 0
+     total_patches = 0
+     image_containing_prompts = []
+     for idx, prompt in enumerate(prompts):
+         # check if this prompt has images
+         has_images = (pixel_values_list and
+                       idx < len(pixel_values_list) and
+                       pixel_values_list[idx] is not None and
+                       isinstance(pixel_values_list[idx], torch.Tensor) and
+                       pixel_values_list[idx].shape[0] > 0)
+
+         if has_images:
+             # prompt with image placeholders
+             image_containing_prompts.append(idx)
+             modified_prompt = prompt
+
+             patches = num_patches_list[idx] if (num_patches_list and idx < len(num_patches_list)) else []
+             num_images = len(patches)
+             total_image_files += num_images
+             total_patches += sum(patches)
+
+             # replace each <image> placeholder with image tokens
+             for i, num_patches in enumerate(patches):
+                 image_tokens = IMG_START_TOKEN + IMG_CONTEXT_TOKEN * model.num_image_token * num_patches + IMG_END_TOKEN
+                 modified_prompt = modified_prompt.replace('<image>', image_tokens, 1)
+         else:
+             # text-only prompt
+             modified_prompt = prompt
+
+         modified_prompts.append(modified_prompt)
+
+     # process all prompts in a single batch
+     tokenizer.padding_side = 'left'
+     model_inputs = tokenizer(modified_prompts, return_tensors='pt', padding=True)
+     input_ids = model_inputs['input_ids'].to(model.device)
+     attention_mask = model_inputs['attention_mask'].to(model.device)
+
+     eos_token_id = tokenizer.convert_tokens_to_ids("<|im_end|>\n".strip())
+     generation_config['eos_token_id'] = eos_token_id
+
+     # prepare pixel values
+     flattened_pixel_values = None
+     if image_containing_prompts:
+         # collect and concatenate all image tensors
+         all_pixel_values = []
+         for idx in image_containing_prompts:
+             all_pixel_values.append(pixel_values_list[idx])
+
+         flattened_pixel_values = torch.cat(all_pixel_values, dim=0)
+         print(f"Processing batch with {len(prompts)} prompts, {total_image_files} actual images, and {total_patches} total patches")
+     else:
+         print(f"Processing text-only batch with {len(prompts)} prompts")
+
+     # generate outputs
+     outputs = model.generate(
+         pixel_values=flattened_pixel_values,  # will be None for text-only prompts
+         input_ids=input_ids,
+         attention_mask=attention_mask,
+         **generation_config
+     )
+
+     # Decode responses
+     responses = tokenizer.batch_decode(outputs, skip_special_tokens=True)
+
+     return responses
+ ```
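The placeholder expansion inside `nuextract_generate` can be illustrated in isolation. The sketch below copies that one step into a standalone function; `expand_image_placeholders` is a hypothetical name, and `num_image_token=4` is a toy value chosen for readability (in the README's code the real value is read from `model.num_image_token`).

```python
IMG_START_TOKEN = "<img>"
IMG_END_TOKEN = "</img>"
IMG_CONTEXT_TOKEN = "<IMG_CONTEXT>"

def expand_image_placeholders(prompt, patches, num_image_token):
    """Replace each <image> placeholder, in order, with its run of context tokens."""
    for num_patches in patches:
        image_tokens = (
            IMG_START_TOKEN
            + IMG_CONTEXT_TOKEN * (num_image_token * num_patches)
            + IMG_END_TOKEN
        )
        # Only the first remaining placeholder is replaced on each pass.
        prompt = prompt.replace("<image>", image_tokens, 1)
    return prompt

# Two images: the first was split into 1 patch, the second into 2 patches.
expanded = expand_image_placeholders("A: <image> B: <image>", patches=[1, 2], num_image_token=4)
print(expanded.count(IMG_CONTEXT_TOKEN))  # 12 context tokens: 4 * (1 + 2)
```

Each image therefore occupies `num_image_token * num_patches` positions in the prompt, which is why the patch counts have to be tracked per image rather than per prompt.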
+
+ To load the model:
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_name = ""  # set this to the Hub ID or local path of the model to load
+
+ tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True, padding_side='left')
+ model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True,
+                                              torch_dtype=torch.bfloat16,
+                                              attn_implementation="flash_attention_2"  # we recommend using flash attention
+                                              ).to("cuda")
+ ```
+
+ Simple 0-shot text-only example:
+ ```python
+ template = """{"names": ["verbatim-string"]}"""
+ text = "John went to the restaurant with Mary. James went to the cinema."
+
+ input_messages = [construct_message(text, template)]
+
+ input_content = prepare_inputs(
+     messages=input_messages,
+     image_paths=[],
+     tokenizer=tokenizer,
+ )
+
+ generation_config = {"do_sample": False, "num_beams": 1, "max_new_tokens": 2048}
+
+ with torch.no_grad():
+     result = nuextract_generate(
+         model=model,
+         tokenizer=tokenizer,
+         prompts=input_content['prompts'],
+         pixel_values_list=input_content['pixel_values_list'],
+         num_patches_list=input_content['num_patches_list'],
+         generation_config=generation_config
+     )
+ for y in result:
+     print(y)
+ # {"names": ["John", "Mary", "James"]}
+ ```
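To make the prompt layout concrete, the sketch below repeats the body of `construct_message` from the README's preprocessing code (so that it runs on its own, without a model or tokenizer) and prints the message built for the 0-shot example, before any chat template is applied.

```python
def construct_message(text, template, examples=None):
    """Same layout as the README's construct_message, repeated here to be self-contained."""
    if examples:
        icl = "# Examples:\n"
        for row in examples:
            icl += f"## Input:\n{row['input']}\n## Output:\n{row['output']}\n"
    else:
        icl = ""
    return f"# Template:\n{template}\n{icl}# Context:\n{text}"

message = construct_message(
    text="John went to the restaurant with Mary. James went to the cinema.",
    template='{"names": ["verbatim-string"]}',
)
print(message)
# # Template:
# {"names": ["verbatim-string"]}
# # Context:
# John went to the restaurant with Mary. James went to the cinema.
```

When in-context examples are supplied, an `# Examples:` section with `## Input:` / `## Output:` pairs is placed between the template and the context.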
+
+ Text-only input with an in-context example:
+ ```python
+ template = """{"names": ["verbatim-string"], "female_names": ["verbatim-string"]}"""
+ text = "John went to the restaurant with Mary. James went to the cinema."
+ examples = [
+     {
+         "input": "Stephen is the manager at Susan's store.",
+         "output": """{"names": ["STEPHEN", "SUSAN"], "female_names": ["SUSAN"]}"""
+     }
+ ]
+
+ input_messages = [construct_message(text, template, examples)]
+
+ input_content = prepare_inputs(
+     messages=input_messages,
+     image_paths=[],
+     tokenizer=tokenizer,
+ )
+
+ generation_config = {"do_sample": False, "num_beams": 1, "max_new_tokens": 2048}
+
+ with torch.no_grad():
+     result = nuextract_generate(
+         model=model,
+         tokenizer=tokenizer,
+         prompts=input_content['prompts'],
+         pixel_values_list=input_content['pixel_values_list'],
+         num_patches_list=input_content['num_patches_list'],
+         generation_config=generation_config
+     )
+ for y in result:
+     print(y)
+ # {"names": ["JOHN", "MARY", "JAMES"], "female_names": ["MARY"]}
+ ```
+
+ Example with image input and an in-context example. Image inputs should use an `<image>` placeholder in place of text, and image paths should be provided in a list, in order of appearance in the prompt (in this example, `0.jpg` is used for the in-context example and `1.jpg` for the true input).
+ ```python
+ template = """{"store": "verbatim-string"}"""
+ text = "<image>"
+ examples = [
+     {
+         "input": "<image>",
+         "output": """{"store": "Walmart"}"""
+     }
+ ]
+
+ input_messages = [construct_message(text, template, examples)]
+
+ images = [
+     ["0.jpg", "1.jpg"]
+ ]
+
+ input_content = prepare_inputs(
+     messages=input_messages,
+     image_paths=images,
+     tokenizer=tokenizer,
+ )
+
+ generation_config = {"do_sample": False, "num_beams": 1, "max_new_tokens": 2048}
+
+ with torch.no_grad():
+     result = nuextract_generate(
+         model=model,
+         tokenizer=tokenizer,
+         prompts=input_content['prompts'],
+         pixel_values_list=input_content['pixel_values_list'],
+         num_patches_list=input_content['num_patches_list'],
+         generation_config=generation_config
+     )
+ for y in result:
+     print(y)
+ # {"store": "Trader Joe's"}
+ ```
462
+ Multi-modal batched input:
463
+ ```python
464
+ inputs = [
465
+ # image input with no ICL examples
466
+ {
467
+ "text": "<image>",
468
+ "template": """{"store_name": "verbatim-string"}""",
469
+ "examples": None,
470
+ },
471
+ # image input with 1 ICL example
472
+ {
473
+ "text": "<image>",
474
+ "template": """{"store_name": "verbatim-string"}""",
475
+ "examples": [
476
+ {
477
+ "input": "<image>",
478
+ "output": """{"store_name": "Walmart"}""",
479
+ }
480
+ ],
481
+ },
482
+ # text input with no ICL examples
483
+ {
484
+ "text": "John went to the restaurant with Mary. James went to the cinema.",
485
+ "template": """{"names": ["verbatim-string"]}""",
486
+ "examples": None,
487
+ },
488
+ # text input with ICL example
489
+ {
490
+ "text": "John went to the restaurant with Mary. James went to the cinema.",
491
+ "template": """{"names": ["verbatim-string"], "female_names": ["verbatim-string"]}""",
492
+ "examples": [
493
+ {
494
+ "input": "Stephen is the manager at Susan's store.",
495
+ "output": """{"names": ["STEPHEN", "SUSAN"], "female_names": ["SUSAN"]}"""
496
+ }
497
+ ],
498
+ },
499
+ ]
500
+
501
+ input_messages = [
502
+ construct_message(
503
+ x["text"],
504
+ x["template"],
505
+ x["examples"]
506
+ ) for x in inputs
507
+ ]
508
+
509
+ images = [
510
+ ["0.jpg"],
511
+ ["0.jpg", "1.jpg"],
512
+ None,
513
+ None
514
+ ]
515
+
516
+ input_content = prepare_inputs(
517
+ messages=input_messages,
518
+ image_paths=images,
519
+ tokenizer=tokenizer,
520
+ )
521
+
522
+ generation_config = {"do_sample": False, "num_beams": 1, "max_new_tokens": 2048}
523
+
524
+ with torch.no_grad():
525
+ result = nuextract_generate(
526
+ model=model,
527
+ tokenizer=tokenizer,
528
+ prompts=input_content['prompts'],
529
+ pixel_values_list=input_content['pixel_values_list'],
530
+ num_patches_list=input_content['num_patches_list'],
531
+ generation_config=generation_config
532
+ )
533
+ for y in result:
534
+ print(y)
535
+ # {"store_name": "WAL*MART"}
536
+ # {"store_name": "Trader Joe's"}
537
+ # {"names": ["John", "Mary", "James"]}
538
+ # {"names": ["JOHN", "MARY", "JAMES"], "female_names": ["MARY"]}
539
+ ```
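The generation examples print raw strings; downstream code will usually want Python objects. The sketch below shows one way to parse a batch of responses with the standard `json` module. `parse_outputs` is a hypothetical helper, and mapping unparseable responses to `None` (rather than raising) is a design choice of this sketch so that one bad response does not discard the rest of the batch.

```python
import json

def parse_outputs(responses):
    """Parse each response as JSON; unparseable responses become None."""
    parsed = []
    for response in responses:
        try:
            parsed.append(json.loads(response))
        except json.JSONDecodeError:
            parsed.append(None)
    return parsed

# Sample strings in the shape of the outputs shown in the README, plus one truncated response.
responses = [
    '{"store_name": "WAL*MART"}',
    '{"names": ["John", "Mary", "James"]}',
    '{"names": ["John", "Mary"',
]
records = parse_outputs(responses)
print(records)
```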
+
+ ## Template Generation
+ If you want to convert existing schema files you have in other formats (e.g. XML, YAML, etc.) or start from an example, NuExtract 2 models can automatically generate a template for you.
+
+ E.g. convert XML into a NuExtract template:
+ ```python
+ def generate_template(description):
+     input_messages = [description]
+     input_content = prepare_inputs(
+         messages=input_messages,
+         image_paths=[],
+         tokenizer=tokenizer,
+     )
+     generation_config = {"do_sample": True, "temperature": 0.4, "max_new_tokens": 256}
+     with torch.no_grad():
+         result = nuextract_generate(
+             model=model,
+             tokenizer=tokenizer,
+             prompts=input_content['prompts'],
+             pixel_values_list=input_content['pixel_values_list'],
+             num_patches_list=input_content['num_patches_list'],
+             generation_config=generation_config
+         )
+     return result[0]
+ xml_template = """<SportResult>
+     <Date></Date>
+     <Sport></Sport>
+     <Venue></Venue>
+     <HomeTeam></HomeTeam>
+     <AwayTeam></AwayTeam>
+     <HomeScore></HomeScore>
+     <AwayScore></AwayScore>
+     <TopScorer></TopScorer>
+ </SportResult>"""
+ result = generate_template(xml_template)
+
+ print(result)
+ # {
+ #     "SportResult": {
+ #         "Date": "date-time",
+ #         "Sport": "verbatim-string",
+ #         "Venue": "verbatim-string",
+ #         "HomeTeam": "verbatim-string",
+ #         "AwayTeam": "verbatim-string",
+ #         "HomeScore": "integer",
+ #         "AwayScore": "integer",
+ #         "TopScorer": "verbatim-string"
+ #     }
+ # }
+ ```
+
+ E.g. generate a template from a natural language description:
+ ```python
+ text = """Give me relevant info about startup companies mentioned."""
+ result = generate_template(text)
+
+ print(result)
+ # {
+ #     "Startup_Companies": [
+ #         {
+ #             "Name": "verbatim-string",
+ #             "Products": [
+ #                 "string"
+ #             ],
+ #             "Location": "verbatim-string",
+ #             "Company_Type": [
+ #                 "Technology",
+ #                 "Finance",
+ #                 "Health",
+ #                 "Education",
+ #                 "Other"
+ #             ]
+ #         }
+ #     ]
+ # }
+ ```
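Because `generate_template` samples with a non-zero temperature, a generated template may occasionally be truncated or otherwise not valid JSON. The sketch below wraps any template generator in a simple retry loop. `generate_valid_template` is a hypothetical helper; `generate_fn` stands in for a function such as `generate_template`, and the retry count is an arbitrary assumption.

```python
import json

def generate_valid_template(generate_fn, description, max_attempts=3):
    """Call generate_fn until its output parses as JSON, then return the parsed template."""
    for _ in range(max_attempts):
        candidate = generate_fn(description)
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            continue  # sampled output was not valid JSON; try again
    raise ValueError(f"no valid JSON template after {max_attempts} attempts")

# Stand-in generator for demonstration: first a truncated output, then a valid one.
fake_outputs = iter(['{"Name": ', '{"Name": "verbatim-string"}'])
template = generate_valid_template(lambda description: next(fake_outputs), "startup companies")
print(template)  # {'Name': 'verbatim-string'}
```

With a real model, `generate_fn` would be the README's `generate_template`, and the parsed result can be serialized with `json.dumps` before being passed to `construct_message`.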