Spaces: Nitzz4952/TextSummarizer

Nitzz4952 committed on Nov 25, 2024

Commit c41a775 (verified) · 1 Parent(s): 2b7c14a

Update app.py

Files changed (1)

app.py +95 -79

@@ -7,104 +7,120 @@ import math
 # Constants
 TOKEN_LIMIT = 1024  # Maximum tokens the model can handle
-# Path to the model
 model_path = "sshleifer/distilbart-cnn-12-6"
 text_summary = pipeline("summarization", model=model_path, torch_dtype=torch.bfloat16)
 tokenizer = AutoTokenizer.from_pretrained(model_path)
 # Function to summarize text
-def summarize_text(input_text, min_length=50, max_length=150):
     summary_output = text_summary(
         input_text,
-        min_length=min_length,
-        max_length=max_length
     )
     return summary_output[0]['summary_text']
-# Article Example
-article_example = """Niteesh Nigam: A Visionary in Robotics, Machine Vision, and AI
-Niteesh Nigam, a forward-thinking robotics engineer and AI developer, has consistently demonstrated his passion for innovation and his ability to transform complex technological concepts into impactful real-world solutions. With a Master’s degree in Robotics and Autonomous Systems from Arizona State University (ASU) and a Bachelor’s degree in Mechanical Engineering from the Birla Institute of Technology and Science, Pilani Dubai, Niteesh has cultivated a robust foundation in engineering, computer vision, and artificial intelligence. His interdisciplinary expertise and hands-on experience have made him a standout professional in fields such as robotics, machine vision, deep learning, and control systems.
-### Educational and Technical Expertise
-Niteesh’s academic journey is marked by excellence in courses that form the backbone of modern robotics and AI, including Machine Vision and Pattern Recognition, Applied Machine Learning, and Artificial Neural Computation. These courses have equipped him with an advanced understanding of statistical methods, machine learning algorithms, and their applications in robotics and autonomous systems.
-His technical arsenal includes a diverse array of tools and frameworks like ROS2, Gazebo, Docker, PyTorch, TensorFlow, OpenCV, and AWS. Proficient in Python, MATLAB, and C++, Niteesh is skilled in designing and deploying solutions for robotics, automation, and data-driven systems. His hands-on approach is complemented by certifications, such as the ROS2 Level 2 certification and Stanford’s Machine Learning Specialization, which highlight his commitment to staying at the cutting edge of his field.
-### Expertise in Robotics and Control Systems
-Niteesh’s work in robotics exemplifies his ability to design, develop, and deploy innovative systems. His Autonomous Warehouse System Project is a testament to his expertise in robotic manipulation and motion planning. In this project, he integrated 3D point cloud-based object localization and barcode tracking with ROS2 and OpenCV to automate package management tasks. Using custom YAML and SDF models in Gazebo, Niteesh designed precise robotic motion with suction grippers, achieving seamless pick-and-place operations.
-Additionally, his contributions to line-follower drone systems underscore his proficiency in control systems. Niteesh developed a wind-resistance module capable of handling up to 30 m/s winds, achieving 82% path accuracy. This work demonstrates his mastery of trajectory planning and his ability to merge simulation fidelity with real-world performance, enhancing stability and efficiency.
-### Pioneering Work in Machine Vision and Deep Learning
-Niteesh’s expertise in machine vision and deep learning is highlighted through groundbreaking projects like the YOLO-Based Unified Framework for Real-Time Vehicle Profiling. By leveraging CUDA during training, he accelerated model convergence, achieving an 83% profiling accuracy and 94% license plate recognition. His innovative approach combined EasyOCR, Deep SORT, and advanced color-matching algorithms to ensure a 90% tracking accuracy for vehicle profiling.
-In the field of semantic segmentation, his project Evaluation of Encoder-Decoder Strategies explored architectures like ResNet50-PPM and EfficientNet-DeepLabV3. By optimizing neural networks for feature extraction, Niteesh ensured real-time segmentation speeds up to 15 fps with a mean IoU of 42.14%, setting benchmarks for accuracy and efficiency.
-### Advancements in NLP and Large Language Models
-Niteesh’s role as an AI Developer at YourBeat Inc. showcases his expertise in NLP and large language models (LLMs). He spearheaded the design of a PostgreSQL database strategy, consolidating multi-source data for accurate analysis and fine-tuning chatbot models. His work involved preprocessing over a million Reddit posts and YouTube transcripts using Python scripts in Dockerized environments, ensuring robust and scalable data pipelines.
-Additionally, his research on chatbot platforms like Rasa and Microsoft Bot Framework has informed strategic AI development, enabling more intuitive and flexible conversational systems tailored to user needs.
-### Noteworthy Projects and Patents
-Niteesh’s innovation extends beyond academic and professional achievements. His Panorama Image Stitching Tool, developed using OpenCV, SIFT, and RANSAC, achieved a remarkable 95% stitching accuracy, enhancing image processing capabilities. By containerizing the tool with Docker and integrating a Flask web interface, he showcased his ability to combine backend efficiency with user-friendly interfaces.
-Furthermore, Niteesh is a co-inventor of a Hand Grasp Stimulating Device, provisionally patented to assist patients with spinal cord injuries. This groundbreaking neuro-rehabilitation glove reflects his commitment to leveraging technology for improving lives.
-### Professional Experience and Industry Contributions
-Niteesh’s professional journey reflects his ability to drive impactful change. At YourBeat Inc., he managed end-to-end data acquisition, preprocessing, and analysis, building the foundation for advanced AI systems. His expertise in AWS, PostgreSQL, and GitHub ensured the seamless integration of tools and frameworks for efficient workflow management.
-His tenure at KHK Scaffolding and Formwork Ltd. demonstrated his versatility as an engineer. By integrating robotic arms into welding automation, he improved weld quality by 15% and increased production output by 10%, saving over 200 man-hours monthly. His ability to lead cross-departmental initiatives highlights his collaborative and leadership skills.
-### Vision for the Future
-Niteesh Nigam’s work embodies a perfect blend of technical mastery, innovation, and a relentless drive to solve complex problems. Whether it’s optimizing robotic systems, advancing machine vision techniques, or harnessing the power of LLMs for conversational AI, Niteesh continues to push the boundaries of what’s possible in technology.
-As he looks to the future, Niteesh remains committed to making meaningful contributions in robotics, AI, and automation, with a focus on scalable solutions that benefit industries and communities alike. His journey is a testament to the transformative potential of technology when guided by a visionary like him.
-"""
-# Precomputed Summary
-summary_example = summarize_text(article_example, min_length=50, max_length=150)
 # Create Gradio interface
 with gr.Blocks() as demo:
-    gr.Markdown("## **Niteesh Nigam Portfolio: Summarization Showcase**")
-    gr.Markdown(
-        "This app showcases the ability to summarize complex content into concise information. Below is an example:"
-    )
-    # Input Text
-    input_text = gr.Textbox(label="Input Text", lines=10)
-    # Button to Summarize
     summarize_button = gr.Button("Summarize")
-    # Output Summary
     summary_output = gr.Textbox(label="Summary Output", lines=10)
-    # Add examples using Gradio's example feature
-    gr.Examples(
-        examples=[
-            [article_example],  # Input text example
-        ],
-        inputs=input_text,
-        outputs=summary_output,
-        fn=summarize_text,
-        label="Example: Summarizing a Detailed Article"
-    )
-    # Button click to summarize user-provided input
-    summarize_button.click(
-        summarize_text,
-        inputs=[input_text],
-        outputs=[summary_output]
     )
 # Launch the Gradio app
 demo.launch(share=False)

 # Constants
 TOKEN_LIMIT = 1024  # Maximum tokens the model can handle
+# Model path and pipeline initialization
 model_path = "sshleifer/distilbart-cnn-12-6"
 text_summary = pipeline("summarization", model=model_path, torch_dtype=torch.bfloat16)
 tokenizer = AutoTokenizer.from_pretrained(model_path)
+# Function to process and tokenize text
+def tokenize_text(input_text):
+    return tokenizer.encode(input_text, truncation=False)
 # Function to summarize text
+def process_batches(input_text):
+    # Tokenize the input text
+    tokens = tokenize_text(input_text)
+    text_tokens = len(tokens)
+    if text_tokens < 0.05 * TOKEN_LIMIT:
+        return "Error: Text too small to summarize."
+    # Batch calculation
+    batches = math.ceil(text_tokens / TOKEN_LIMIT)
+    summaries = []
+    while batches > 1:
+        # Split text into approximately equal batches using sent_tokenize
+        sentences = sent_tokenize(input_text)
+        avg_batch_size = math.ceil(len(sentences) / batches)
+        text_batches = [
+            " ".join(sentences[i:i+avg_batch_size])
+            for i in range(0, len(sentences), avg_batch_size)
+        ]
+        # Process each batch
+        batch_summaries = []
+        for batch in text_batches:
+            max_length = int((TOKEN_LIMIT / batches) * 0.9)
+            min_length = int((TOKEN_LIMIT / batches) * 0.5)
+            summary_output = text_summary(
+                batch,
+                min_length=min_length,
+                max_length=max_length
+            )
+            batch_summaries.append(summary_output[0]['summary_text'])
+        # Stitch all batch summaries
+        input_text = " ".join(batch_summaries)
+        tokens = tokenize_text(input_text)
+        text_tokens = len(tokens)
+        batches = math.ceil(text_tokens / TOKEN_LIMIT)
+    # Final check for short text
+    if text_tokens < 0.05 * TOKEN_LIMIT:
+        return "Error: Text too small to summarize."
+    return input_text
+# Gradio button to set max/min lengths
+user_lengths = {"min_length": None, "max_length": None}
+def set_max_min_lengths(latest_text):
+    tokens = tokenize_text(latest_text)
+    text_tokens = len(tokens)
+    max_range = int(0.08 * text_tokens)
+    min_range = int(0.02 * text_tokens)
+    return f"Set max_length between {min_range} and {max_range} tokens."
+def validate_lengths(min_length, max_length, text):
+    if not (isinstance(min_length, int) and isinstance(max_length, int)):
+        return "Error: Length values must be integers."
+    if min_length <= 0 or max_length <= 0 or min_length >= max_length:
+        return "Error: Invalid length values. Ensure min_length < max_length and both are positive."
+    user_lengths["min_length"] = min_length
+    user_lengths["max_length"] = max_length
+    return "Done"
+# Function to summarize final text
+def summarize_text(input_text):
+    if user_lengths["min_length"] is None or user_lengths["max_length"] is None:
+        return "Error: Please set valid max and min lengths first."
     summary_output = text_summary(
         input_text,
+        min_length=user_lengths["min_length"],
+        max_length=user_lengths["max_length"]
     )
     return summary_output[0]['summary_text']
+# Close any existing Gradio interfaces
+gr.close_all()
 # Create Gradio interface
 with gr.Blocks() as demo:
+    text_input = gr.Textbox(label="Input Text", lines=10)
+    analyze_button = gr.Button("Analyze Text")
+    analysis_output = gr.Textbox(label="Analysis Result")
+    max_length_input = gr.Number(label="Max Length")
+    min_length_input = gr.Number(label="Min Length")
+    set_lengths_button = gr.Button("Set Max and Min Lengths")
+    length_result = gr.Textbox(label="Length Validation Result")
     summarize_button = gr.Button("Summarize")
     summary_output = gr.Textbox(label="Summary Output", lines=10)
+    def analyze_and_set_text(input_text):
+        latest_text = process_batches(input_text)
+        if latest_text.startswith("Error"):
+            return latest_text, ""
+        return latest_text, set_max_min_lengths(latest_text)
+    analyze_button.click(analyze_and_set_text, inputs=text_input, outputs=[analysis_output, length_result])
+    set_lengths_button.click(
+        validate_lengths,
+        inputs=[min_length_input, max_length_input, analysis_output],
+        outputs=length_result
     )
+    summarize_button.click(summarize_text, inputs=analysis_output, outputs=summary_output)
 # Launch the Gradio app
 demo.launch(share=False)
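
Review note on `validate_lengths`: the new check accepts only values for which `isinstance(..., int)` holds. Depending on the Gradio version and the component's `precision` setting, `gr.Number` may hand the callback a `float` such as `50.0`, in which case a whole-number input would be rejected. The following is a minimal sketch of a more tolerant check under that assumption. `coerce_length` and `check_lengths` are hypothetical helper names and are not part of this commit.

```python
from typing import Optional, Tuple


def coerce_length(value) -> Optional[int]:
    """Return value as an int if it is a whole number, else None.

    Hypothetical helper: accepts ints and integral floats such as 50.0,
    and rejects None, booleans, and fractional values such as 50.5.
    """
    if isinstance(value, bool):  # bool is a subclass of int, so exclude it first
        return None
    if isinstance(value, int):
        return value
    if isinstance(value, float) and value.is_integer():
        return int(value)
    return None


def check_lengths(min_length, max_length) -> Tuple[Optional[Tuple[int, int]], str]:
    """Validate the pair; return ((min, max), "Done") or (None, error message)."""
    lo = coerce_length(min_length)
    hi = coerce_length(max_length)
    if lo is None or hi is None:
        return None, "Error: Length values must be whole numbers."
    if lo <= 0 or hi <= 0 or lo >= hi:
        return None, (
            "Error: Invalid length values. "
            "Ensure min_length < max_length and both are positive."
        )
    return (lo, hi), "Done"
```

Returning the validated pair instead of writing to the module-level `user_lengths` dict would also let the app keep the values per session (for example in a `gr.State`), rather than sharing one pair across all visitors. Setting `precision=0` on the two `gr.Number` inputs may be a simpler alternative if the installed Gradio version supports it.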