Space: numind/NuExtract-2.0

Alexandre-Numind committed 5 days ago

Commit 35ae9b2 (verified) · 1 parent: 9134640

Update app.py

Files changed (1)

app.py +83 -60

app.py CHANGED

Old version of the changed hunks:

@@ -8,6 +8,7 @@ model = AutoModelForVision2Seq.from_pretrained(
     model_name,
     trust_remote_code=True,
     torch_dtype=torch.bfloat16,
     device_map="auto",
 )
 processor = AutoProcessor.from_pretrained(
@@ -91,6 +92,7 @@ with gr.Blocks(title="NuExtract – zero-shot structured extraction") as demo:
   <meta charset="UTF-8" />
   <meta name="viewport" content="width=device-width, initial-scale=1.0" />
   <title>NuExtract-2 Overview</title>
   <style>
     img   { display:block; margin-bottom:1rem; }
     ul    { margin:1rem 0; padding-left:1.5rem; }
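
In the hunk that follows, the example template's `classes` entry is rewritten. The removed line `"gpa": number",` is missing the opening quote of its value and leaves a trailing comma before `}`, so the old example was not valid JSON, while the added `"gpa": "number"` line is. A quick check with Python's standard `json` module illustrates the difference; the two fragments are copied from the old and new versions of that entry, and `is_valid_json` is a small illustrative helper, not part of app.py:

```python
import json

# The "classes" entry as written in the old example template.
old_fragment = """{
    "name" : "verbatim-string",
    "professors" : ["verbatim-string"],
    "gpa": number",
}"""

# The same entry as written in the new example template.
new_fragment = """{
  "name":       "verbatim-string",
  "professors": ["verbatim-string"],
  "gpa":        "number"
}"""


def is_valid_json(text: str) -> bool:
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(is_valid_json(old_fragment))  # False
print(is_valid_json(new_fragment))  # True
```
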
@@ -98,83 +100,104 @@ with gr.Blocks(title="NuExtract – zero-shot structured extraction") as demo:
     a:hover { text-decoration:underline; }
     h1,h2 { margin:0 0 .5rem 0; font-weight:600; }
     pre   { overflow-x:auto; border-radius:6px; padding:1rem; }
-    code  { border-radius:4px; padding:.2em .4em; font-family:monospace; }
     html[data-theme="dark"],
     @media (prefers-color-scheme: dark) {
-      body { background-color:#1e1e1e;}
-      code { background-color: #2d2d2d;}
-      pre  { background-color:#2a2a2a;}
     }
     html[data-theme="light"],
     @media (prefers-color-scheme: light) {
-      body { background-color#ffffff;}
-      code { background-color#f5f5f5;}
-      pre  { background-color#f5f5f5;}
     }
-</style>
 </head>
 <body>
-<p align="center">
     <a href="https://nuextract.ai/">
-        <img src="https://cdn.prod.website-files.com/638364a4e52e440048a9529c/64188f405afcf42d0b85b926_logo_numind_final.png"
-             alt="NuMind Logo" style="width: 200px; height: 50px;" />
     </a>
-</p>
-<p align="center">
-    🖥️ <a href="https://nuextract.ai/">API / Platform</a>&nbsp&nbsp | &nbsp&nbsp📑 <a href="https://numind.ai/blog">Blog</a>&nbsp&nbsp | &nbsp&nbsp🗣️ <a href="https://discord.gg/3tsEtJNCDe">Discord</a> &nbsp&nbsp | &nbsp&nbsp🛠️  <a href="https://github.com/numindai/nuextract">Github</a>
-</p>
-<section>
-  <h3> This space is a demo for <a href="https://huggingface.co/numind/NuExtract-2.0-4B" target="_blank">NuExtract-2.0-4B</a> </h3>
-  <h3> You can also check: <a href="https://huggingface.co/numind/NuExtract-2.0-2B" target="_blank">NuExtract-2.0-2B</a> and <a href="https://huggingface.co/numind/NuExtract-2.0-8B" target="_blank">NuExtract-2.0-8B</a> and our top performing model via the <a href="https://nuextract.ai/">API / Platform</a> </h3>
-    <h1>NuExtreact-2.0</h1>
-  <p>NuExtract 2.0 is a family of models trained specifically for structured information extraction tasks. It supports both multimodal inputs and is multilingual.</p>
-<p> To use the model, provide an input text/image and a JSON template describing the information you need to extract. The template should be a JSON object, specifying field names and their expected type. </p>
-  <article>
-    <h3>Supported Template Types</h3>
-    <ul>
-      <li><code>verbatim-string</code> — extract text exactly as it appears.</li>
-      <li><code>string</code> — generic text, with possible paraphrasing.</li>
-      <li><code>integer</code> — whole number.</li>
-      <li><code>number</code> — decimal or whole number.</li>
-      <li><code>date-time</code> — ISO 8601 date format.</li>
-      <li><code>boolean</code> — True or False.</li>
-      <li>Array of any type above (e.g. <code>["string"]</code>).</li>
-      <li><code>enum</code> — one value from a predefined list (e.g. <code>["yes", "no", "maybe"]</code>).</li>
-      <li><code>multi-label</code> — multiple values from a list (e.g. <code>[["A", "B", "C"]]</code>).</li>
-    </ul>
-    <p> You can specify any nested strucure, such as object inside object or list of object </p>
-    <p>If no relevant information is found, the model returns <code>null</code> or <code>[]</code>.</p>
-  </article>
-  <article>
-    <h3>Example Template</h3>
-    <pre><code>{
   "first_name": "verbatim-string",
-  "last_name": "verbatim-string",
   "description": "string",
-  "age": "integer",
   "classes": [
-      {
-          "name" : "verbatim-string",
-          "professors" : ["verbatim-string"],
-          "gpa": number",
-      }
   ],
   "average_gpa": "number",
-  "birth_date": "date-time",
   "nationality": ["France", "England", "Japan", "USA", "China"],
   "languages_spoken": [["English", "French", "Japanese", "Mandarin", "Spanish"]]
 }</code></pre>
-  </article>
-  <strong>You can also provide a description of what you want to extract, use a non-JSON format (e.g. YAML, Pydantic) or even an example of input text. The model will automatically update the template field and generate a compatible JSON template based on our typing system.</strong>
-</section>
-<br>
-<section>
-  <ul><h4><strong>Model used in this demo:</strong> <a href="https://huggingface.co/numind/NuExtract-2.0-4B" target="_blank">NuExtract-2.0-4B</a></h4></ul>
-  <i>⚠️ This demo restricts inputs to 10,000 tokens</i>
-</section>
 </body>
 </html>
 """)
@@ -191,7 +214,7 @@ with gr.Blocks(title="NuExtract – zero-shot structured extraction") as demo:
     example_data = [
         [
-            "data/affiche.jpg",      # image file
             "",                          # no text
             """{
     "movie_name": "verbatim-string",

New version of the same hunks:

     model_name,
     trust_remote_code=True,
     torch_dtype=torch.bfloat16,
+    attn_implementation="flash_attention_2",
     device_map="auto",
 )
 processor = AutoProcessor.from_pretrained(
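
The first hunk switches the model to FlashAttention 2 via `attn_implementation="flash_attention_2"`. In `transformers` this setting needs the `flash-attn` package and a supported CUDA GPU, and loading typically fails with an ImportError when the package is missing. A minimal sketch of a guarded choice, assuming that falling back to PyTorch's scaled-dot-product attention (`"sdpa"`) is acceptable for this model; `pick_attn_implementation` is a hypothetical helper, not something app.py defines:

```python
import importlib.util


def pick_attn_implementation(cuda_available: bool) -> str:
    """Hypothetical helper: choose an attn_implementation value for from_pretrained.

    "flash_attention_2" is only returned when a CUDA GPU is reported and the
    flash_attn package is importable; otherwise fall back to "sdpa".
    """
    has_flash_attn = importlib.util.find_spec("flash_attn") is not None
    if cuda_available and has_flash_attn:
        return "flash_attention_2"
    return "sdpa"


# Possible use in app.py (assumes torch and AutoModelForVision2Seq are imported there):
# model = AutoModelForVision2Seq.from_pretrained(
#     model_name,
#     trust_remote_code=True,
#     torch_dtype=torch.bfloat16,
#     attn_implementation=pick_attn_implementation(torch.cuda.is_available()),
#     device_map="auto",
# )

print(pick_attn_implementation(cuda_available=False))  # sdpa
```

The check only looks for the package, not for GPU compatibility, so an unsupported GPU could still fail at load time.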
   <meta charset="UTF-8" />
   <meta name="viewport" content="width=device-width, initial-scale=1.0" />
   <title>NuExtract-2 Overview</title>
   <style>
     img   { display:block; margin-bottom:1rem; }
     ul    { margin:1rem 0; padding-left:1.5rem; }
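
The "Supported Template Types" list in the overview HTML defines a small typing system. As an illustration only, its rules can be turned into a checker in plain Python. `template_errors`, `is_array` and `empty_result` are hypothetical helpers, not part of NuExtract or `transformers`, and two points are assumptions: a one-element list holding a type name or an object is read as an array (so `["string"]` is an array rather than a one-option enum), and the "nothing found" value is taken to be `null` for scalars and enums and `[]` for arrays and multi-label fields.

```python
import json
from typing import Any

# Scalar type names from the "Supported Template Types" list.
SCALAR_TYPES = {"verbatim-string", "string", "integer", "number", "date-time", "boolean"}


def template_errors(node: Any, path: str = "$") -> list[str]:
    """Hypothetical checker: return problems found in a template ([] means it looks valid)."""
    if isinstance(node, dict):  # nested object: check every field
        errors: list[str] = []
        for key, value in node.items():
            errors += template_errors(value, f"{path}.{key}")
        return errors
    if isinstance(node, str):  # scalar type name
        return [] if node in SCALAR_TYPES else [f"{path}: unknown type {node!r}"]
    if isinstance(node, list) and node:
        first = node[0]
        if len(node) == 1 and isinstance(first, list):  # multi-label: [["A", "B", "C"]]
            if first and all(isinstance(option, str) for option in first):
                return []
            return [f"{path}: multi-label options must be strings"]
        # Assumption: a one-element list holding a type name or an object is an array.
        if len(node) == 1 and (isinstance(first, dict) or first in SCALAR_TYPES):
            return template_errors(first, f"{path}[]")
        if all(isinstance(option, str) for option in node):  # enum: ["yes", "no", "maybe"]
            return []
    return [f"{path}: unsupported template value {node!r}"]


def is_array(node: Any) -> bool:
    """True for array and multi-label fields under the assumption above."""
    return (
        isinstance(node, list)
        and len(node) == 1
        and (isinstance(node[0], (list, dict)) or node[0] in SCALAR_TYPES)
    )


def empty_result(node: Any) -> Any:
    """Assumed 'nothing found' output: [] for arrays/multi-label, None (null) otherwise."""
    if isinstance(node, dict):
        return {key: empty_result(value) for key, value in node.items()}
    return [] if is_array(node) else None


# The example template from the new version of the overview.
example = json.loads("""{
  "first_name": "verbatim-string",
  "last_name":  "verbatim-string",
  "description": "string",
  "age":        "integer",
  "classes": [
    {
      "name":       "verbatim-string",
      "professors": ["verbatim-string"],
      "gpa":        "number"
    }
  ],
  "average_gpa": "number",
  "birth_date":  "date-time",
  "nationality": ["France", "England", "Japan", "USA", "China"],
  "languages_spoken": [["English", "French", "Japanese", "Mandarin", "Spanish"]]
}""")

print(template_errors(example))          # []
print(template_errors({"age": "int"}))   # ["$.age: unknown type 'int'"]
print(empty_result(example)["classes"])  # []
```

The overview also says the demo accepts a plain description, YAML, Pydantic or an example input in place of a JSON template; a check like this only covers the JSON form.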
     a:hover { text-decoration:underline; }
     h1,h2 { margin:0 0 .5rem 0; font-weight:600; }
     pre   { overflow-x:auto; border-radius:6px; padding:1rem; }
+    code  { border-radius:1px; padding:.1em .1em; font-family:monospace; }
+    /* ——— Dark / light themes ——— */
     html[data-theme="dark"],
     @media (prefers-color-scheme: dark) {
+      body { background-color:#1e1e1e; }
+      code { background-color:#2d2d2d; }
+      pre  { background-color:#2a2a2a; }
     }
     html[data-theme="light"],
     @media (prefers-color-scheme: light) {
+      body { background-color:#ffffff; }
+      code { background-color:#f5f5f5; }
+      pre  { background-color:#f5f5f5; }
+    }
+    /* ——— NEW: put the two articles side-by-side ——— */
+    .template-container {
+      display: flex;
+      flex-wrap: wrap;           /* stacks on small screens */
+      gap: 2rem;
+      margin-top: 1rem;
+    }
+    .template-container article {
+      flex: 1 1 320px;           /* grow / shrink with a sensible min width */
+      min-width: 280px;
     }
+  </style>
 </head>
 <body>
+  <p align="center">
     <a href="https://nuextract.ai/">
+      <img src="https://cdn.prod.website-files.com/638364a4e52e440048a9529c/64188f405afcf42d0b85b926_logo_numind_final.png"
+           alt="NuMind Logo" style="width:200px;height:50px;" />
     </a>
+  </p>
+  <p align="center">
+    🖥️ <a href="https://nuextract.ai/">API / Platform</a>&nbsp;|&nbsp;📑 <a href="https://numind.ai/blog">Blog</a>&nbsp;|&nbsp;🗣️ <a href="https://discord.gg/3tsEtJNCDe">Discord</a>&nbsp;|&nbsp;🛠️ <a href="https://github.com/numindai/nuextract">Github</a>
+  </p>
+  <section>
+    <h3>This space is a demo for <a href="https://huggingface.co/numind/NuExtract-2.0-4B" target="_blank">NuExtract-2.0-4B</a></h3>
+    <h3>You can also check: <a href="https://huggingface.co/numind/NuExtract-2.0-2B" target="_blank">NuExtract-2.0-2B</a> and <a href="https://huggingface.co/numind/NuExtract-2.0-8B" target="_blank">NuExtract-2.0-8B</a> and our top-performing model via the <a href="https://nuextract.ai/">API / Platform</a></h3>
+    <h1>NuExtract-2.0</h1>
+    <p>NuExtract 2.0 is a family of models trained specifically for structured information extraction tasks. It supports both multimodal inputs and is multilingual.</p>
+    <p>To use the model, provide an input text/image and a JSON template describing the information you need to extract. The template should be a JSON object, specifying field names and their expected type.</p>
+    <!-- ------------- SIDE-BY-SIDE CONTAINER ------------- -->
+    <div class="template-container">
+      <!-- Supported Template Types -->
+      <article>
+        <h3>Supported Template Types</h3>
+        <ul>
+          <li><code>verbatim-string</code> — extract text exactly as it appears.</li>
+          <li><code>string</code> — generic text, with possible paraphrasing.</li>
+          <li><code>integer</code> — whole number.</li>
+          <li><code>number</code> — decimal or whole number.</li>
+          <li><code>date-time</code> — ISO 8601 date format.</li>
+          <li><code>boolean</code> — True or False.</li>
+          <li>Array of any type above (e.g. <code>["string"]</code>).</li>
+          <li><code>enum</code> — one value from a predefined list (e.g. <code>["yes", "no", "maybe"]</code>).</li>
+          <li><code>multi-label</code> — multiple values from a list (e.g. <code>[["A", "B", "C"]]</code>).</li>
+        </ul>
+        <p>You can specify any nested structure, such as an object inside an object or a list of objects. If no relevant information is found, the model returns <code>null</code> or <code>[]</code>.</p>
+      </article>
+      <!-- Example Template -->
+      <article>
+        <h3>Example Template</h3>
+<pre><code>{
   "first_name": "verbatim-string",
+  "last_name":  "verbatim-string",
   "description": "string",
+  "age":        "integer",
   "classes": [
+    {
+      "name":       "verbatim-string",
+      "professors": ["verbatim-string"],
+      "gpa":        "number"
+    }
   ],
   "average_gpa": "number",
+  "birth_date":  "date-time",
   "nationality": ["France", "England", "Japan", "USA", "China"],
   "languages_spoken": [["English", "French", "Japanese", "Mandarin", "Spanish"]]
 }</code></pre>
+      </article>
+    </div><!-- /.template-container -->
+    <br>
+    <strong>You can also provide a description of what you want to extract, use a non-JSON format (e.g. YAML, Pydantic) or even an example of input text. The model will automatically update the template field and generate a compatible JSON template based on our typing system.</strong>
+  </section>
+  <br>
+  <section>
+    <ul><h4><strong>Model used in this demo:</strong> <a href="https://huggingface.co/numind/NuExtract-2.0-4B" target="_blank">NuExtract-2.0-4B</a></h4></ul>
+    <i>⚠️ This demo restricts inputs to 10,000 tokens</i>
+  </section>
 </body>
 </html>
 """)
     example_data = [
         [
+            "examples/affiche.jpg",      # image file
             "",                          # no text
             """{
     "movie_name": "verbatim-string",