Added support for video processing via link
- README.md +76 -123
- analyzer.py +9 -0
- detector.py +129 -0
- downloader_manager.py +50 -0
- flie_processor.py +23 -0
- handler.py +23 -11
- imgcomparison.py +106 -0
- mediaoutput.py +196 -0
- requirements.txt +0 -0
- slides.py +177 -0
- sorter.py +109 -0
- sources.py +32 -0
- timeline.py +166 -0
README.md
CHANGED
@@ -1,140 +1,93 @@
The old model card's YAML front matter and body were removed. The front matter listed the supported languages:

    es, ru, ko, fr, ja, pt, tr, pl, ca, nl, ar, sv, it, id, hi, fi, vi, he,
    uk, el, ms, cs, ro, da, hu, ta, 'no', th, ur, hr, bg, lt, la, mi, ml, cy,
    sk, te, fa, lv, bn, sr, az, sl, kn, et, mk, br, eu, is, hy, ne, mn, bs,
    kk, sq, sw, gl, mr, pa, si, km, sn, yo, so, af, oc, ka, be, tg, sd, gu,
    am, yi, lo, uz, fo, ht, ps, tk, nn, mt, sa, lb, my, bo, tl, mg, as, tt,
    haw, ln, ha, ba, jw, su

along with:

    tags:
    - audio
    - automatic-speech-recognition
    license: mit
    library_name: ctranslate2
    ---

The removed body ended with:

**For more information about the original model, see its [model card](https://huggingface.co/openai/whisper-large-v2).**
# Advanced Speech Processing with faster-whisper

Welcome to the advanced speech processing utility leveraging the powerful Whisper large-v2 model on the CTranslate2 framework. This tool is designed for high-performance speech recognition and processing, supporting a wide array of languages and capable of handling video inputs for slide detection and audio transcription.

## Features

- **Language Support**: Extensive language support covering major global languages for speech recognition tasks.
- **Video Processing**: Download MP4 files from links and extract audio content for transcription.
- **Slide Detection**: Detect and sort presentation slides from video lectures or meetings.
- **Audio Transcription**: Leverage the Whisper large-v2 model to transcribe audio content with high accuracy.

## Getting Started

To begin using this utility, set up the `WhisperModel` from the `faster_whisper` package with the provided language configurations. The `EndpointHandler` class is your main interface for processing the data.

### Example Usage

```python
import requests
import os

# Sample data dict with the link to the video file and the desired language for transcription
DATA = {
    "inputs": "<base64_encoded_audio_string>",
    "link": "<your_mp4_video_link>",
    "language": "en",  # Choose from the supported languages
    "task": "transcribe",
    "type": "audio"  # Use "link" for video files
}

HF_ACCESS_TOKEN = os.environ.get("HF_TRANSCRIPTION_ACCESS_TOKEN")
API_URL = os.environ.get("HF_TRANSCRIPTION_ENDPOINT")

HEADERS = {
    "Authorization": HF_ACCESS_TOKEN,
    "Content-Type": "application/json"
}

response = requests.post(API_URL, headers=HEADERS, json=DATA)
print(response)

# The response will contain the transcribed audio, and the detected slides if a video link was provided
```
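When a video link was processed, each entry in the returned `slides` list is the dictionary produced by `Slide.to_dict()` (see `slides.py` below), with the slide image base64-encoded as JPEG under the `img` key. A minimal sketch for saving the returned slides to disk, assuming the request above succeeded and returned JSON:

```python
import base64

result = response.json()

# Each slide dict carries the timestamp of its first appearance and the encoded image
for slide in result.get("slides", []):
    filename = f"slide_{slide['page_number']}.jpg"
    with open(filename, "wb") as f:
        f.write(base64.b64decode(slide["img"]))  # decode the base64 JPEG payload
    print(slide["time"], "->", filename)
```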
### Processing Video Files

To process video files, the `process_video` function downloads the MP4 file, extracts the audio, and passes it to the Whisper model for transcription. It also uses the `Detector` and `SlideSorter` classes to identify and sort presentation slides within the video, as sketched below.
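A minimal sketch of that flow when run locally from the repository root (the URL is a placeholder):

```python
from flie_processor import process_video

slides, audio_bytes = process_video("https://example.com/lecture.mp4")

# `slides` are deduplicated Slide objects; `audio_bytes` is the extracted AAC audio
for slide in slides:
    print(slide.page_number, slide.time)
```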
### Error Handling

Comprehensive logging and error handling are in place to keep you informed of each step's success or failure.

## Installation

Ensure that you have the following dependencies installed:

```plaintext
opencv-python~=4.8.1.78
numpy~=1.26.1
Pillow~=10.0.1
tqdm~=4.66.1
requests~=2.31.0
moviepy~=1.0.3
scipy~=1.11.3
```

Install them using pip with the provided `requirements.txt` file:

```bash
pip install -r requirements.txt
```

## Languages Supported

This tool supports a wide range of languages, making it highly versatile for global applications. The full list of supported languages can be found in the `language` section of the previous README revision.

## License

This project is available under the MIT license.

## More Information

For more information about the original Whisper large-v2 model, please refer to its [model card on Hugging Face](https://huggingface.co/openai/whisper-large-v2).

---
analyzer.py
ADDED
@@ -0,0 +1,9 @@

```python
from abc import ABCMeta, abstractmethod


class Analyzer(metaclass=ABCMeta):
    """Abstract base class for pipeline stages that produce content."""

    @abstractmethod
    def analyze(self):
        pass
```
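For orientation, a minimal hypothetical subclass (`ListAnalyzer` is not part of this commit) showing the generator contract that the concrete analyzers `Detector` and `SlideSorter` follow:

```python
from analyzer import Analyzer


class ListAnalyzer(Analyzer):
    """Hypothetical example: streams items from a list, mimicking how
    Detector.analyze() and SlideSorter.analyze() yield their results."""

    def __init__(self, items):
        self.items = items

    def analyze(self):
        for item in self.items:
            yield item


print(list(ListAnalyzer([1, 2, 3]).analyze()))  # [1, 2, 3]
```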
detector.py
ADDED
@@ -0,0 +1,129 @@

```python
# -*- coding: utf-8 -*-

import argparse
import cProfile
import pstats

import cv2
from tqdm import tqdm

import imgcomparison
import mediaoutput
import timeline
from analyzer import Analyzer
from slides import Slide


class InfiniteCounter(object):
    """
    InfiniteCounter represents a counter that returns the next number
    indefinitely: each call to count() yields the current number and
    then increments it by the specified step.
    """

    def __init__(self, start=0, step=1):
        """
        Default initializer
        :param start: the starting value of the counter
        :param step: the amount that should be added at each step
        """
        self.current = start
        self.step = step

    def increment(self):
        self.current += self.step

    def count(self):
        """
        Yields the current number, then increments it by the step
        specified in the initializer.
        :return: the successor of the previous number
        """
        while True:
            yield self.current
            self.current += self.step


class Detector(Analyzer):

    def __init__(self, device, outpath=None, fileformat=".png"):
        cap = cv2.VideoCapture(sanitize_device(device))
        self.sequence = timeline.Timeline(cap)
        self.writer = mediaoutput.NullWriter()
        if outpath is not None:
            self.writer = mediaoutput.TimestampImageWriter(self.sequence.fps, outpath, fileformat)
        self.comparator = imgcomparison.AbsDiffHistComparator(0.97)

    def detect_slides(self):
        frames = []
        name_getter = mediaoutput.TimestampImageWriter(self.sequence.fps)
        with tqdm(total=self.sequence.len, desc='Detecting Slides: ') as pbar:
            for i, frame in self.check_transition():
                if frame is not None:
                    frames.append(Slide(name_getter.next_name([i]), frame))
                pbar.update(1)

        self.sequence.release_stream()
        return frames

    def check_transition(self):
        prev_frame = self.sequence.next_frame()
        self.writer.write(prev_frame, 0)
        yield 0, prev_frame

        frame_counter = InfiniteCounter()
        for frame_count in frame_counter.count():

            frame = self.sequence.next_frame()

            if frame is None:
                break
            elif not self.comparator.are_same(prev_frame, frame):
                # A transition started: scan forward until two consecutive
                # frames match again, i.e. the image has stabilized
                while True:
                    if self.comparator.are_same(prev_frame, frame):
                        break
                    prev_frame = frame
                    frame = self.sequence.next_frame()
                    frame_counter.increment()
                self.writer.write(frame, frame_count)
                yield frame_count, frame

            prev_frame = frame

        yield frame_count, None

    def analyze(self):
        for i, frame in self.check_transition():
            if frame is None:
                # check_transition ends with a (count, None) sentinel; skip it
                continue
            time = mediaoutput.TimestampImageWriter(self.sequence.fps).next_name([i])
            yield Slide(time, frame)


def sanitize_device(device):
    """Returns the device id if the device can be converted to an integer."""
    try:
        return int(device)
    except (TypeError, ValueError):
        return device


if __name__ == "__main__":
    Parser = argparse.ArgumentParser(description="Slide Detector")
    Parser.add_argument("-d", "--device", help="video device number or path to video file")
    Parser.add_argument("-o", "--outpath", help="path to output video file", default="slides/", nargs='?')
    Parser.add_argument("-f", "--fileformat", help="file format of the output images e.g. '.jpg'",
                        default=".jpg", nargs='?')
    Args = Parser.parse_args()


    def run():
        detector = Detector(Args.device, Args.outpath, Args.fileformat)
        detector.detect_slides()


    cProfile.run('run()', 'profiling_stats.prof')

    p = pstats.Stats('profiling_stats.prof')
    p.sort_stats('cumulative').print_stats(10)
```
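Beyond the profiling `__main__` block above, the detector can be driven programmatically; a short sketch (the video path is a placeholder):

```python
from detector import Detector

detector = Detector("lecture.mp4", outpath="slides/", fileformat=".png")
slides = detector.detect_slides()  # list of Slide objects, timestamped by first appearance
print(f"detected {len(slides)} candidate slides")
```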
downloader_manager.py
ADDED
@@ -0,0 +1,50 @@

```python
import logging
import os
import tempfile
from io import BytesIO

import requests
from moviepy.editor import VideoFileClip
from tqdm import tqdm


def download_mp4_and_extract_audio(link: str):
    """Download an MP4 file from the given link and return the video and audio content as bytes."""
    logging.info("Starting the download of the MP4 file...")
    try:
        r = requests.get(link, stream=True)
        r.raise_for_status()

        total_size = int(r.headers.get('content-length', 0))
        video_content = BytesIO()

        with tqdm(total=total_size, unit='B', unit_scale=True, desc="Downloading...") as bar:
            for data in r.iter_content(chunk_size=1024):
                bar.update(len(data))
                video_content.write(data)

        with tempfile.NamedTemporaryFile(suffix=".mp4", delete=False) as temp_video_file:
            temp_video_file.write(video_content.getvalue())
            temp_video_file_path = temp_video_file.name

        logging.info("Extracting audio from video...")
        with VideoFileClip(temp_video_file_path) as video:
            audio = video.audio

            with tempfile.NamedTemporaryFile(suffix=".aac", delete=False) as temp_audio_file:
                audio.write_audiofile(temp_audio_file.name, codec='aac')
                temp_audio_file_path = temp_audio_file.name

        with open(temp_audio_file_path, 'rb') as f:
            audio_content = BytesIO(f.read())

        os.remove(temp_video_file_path)
        os.remove(temp_audio_file_path)

        logging.info("Download and audio extraction completed")
        return video_content.getvalue(), audio_content.getvalue()
    except requests.exceptions.HTTPError as e:
        logging.error(f"HTTP Error: {e}")
    except Exception as e:
        logging.error(f"Failed to download MP4 and extract audio: {e}")
    # Both error paths fall through here so callers can always unpack a 2-tuple
    return None, None
```
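A short sketch of using the downloader on its own and persisting the extracted audio (the URL is a placeholder):

```python
from downloader_manager import download_mp4_and_extract_audio

video_bytes, audio_bytes = download_mp4_and_extract_audio("https://example.com/talk.mp4")
if audio_bytes is None:
    raise RuntimeError("download or extraction failed; see the log for details")

with open("talk.aac", "wb") as f:
    f.write(audio_bytes)  # AAC audio as extracted by moviepy
```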
flie_processor.py
ADDED
@@ -0,0 +1,23 @@

```python
import logging
import tempfile

import sources
from detector import Detector
from downloader_manager import download_mp4_and_extract_audio
from sorter import SlideSorter


def process_video(link):
    try:
        video_bytes, audio_bytes = download_mp4_and_extract_audio(link)

        with tempfile.NamedTemporaryFile(suffix=".mp4", delete=False) as temp_video:
            temp_video.write(video_bytes)
            temp_video_path = temp_video.name

        detector = Detector(temp_video_path)
        sorter = SlideSorter(sources.ListSource(detector.detect_slides()), outpath="sorted_slides/")
        slides = sorter.sort()
        return slides, audio_bytes
    except Exception as e:
        logging.exception("Failed to execute sorter: %s", e)
        # Return an empty result so callers that unpack a 2-tuple do not crash
        return [], None
```
handler.py
CHANGED
@@ -1,9 +1,12 @@
@@ -11,21 +14,28 @@ class EndpointHandler:
@@ -40,4 +50,6 @@ class EndpointHandler:

The updated handler (lines the diff leaves unchanged and does not show are marked with `# ...`):

```python
import base64
import io
import logging

from faster_whisper import WhisperModel

from flie_processor import process_video

logging.basicConfig(level=logging.DEBUG, format='%(asctime)s - %(levelname)s - %(message)s')


class EndpointHandler:
    # ... (__init__ is unchanged; its shown context line is:
    #      self.model = WhisperModel("large-v2", num_workers=30))

    def __call__(self, data: dict[str, str]):
        inputs = data.pop("inputs")
        link = data.pop("link")

        language = data.pop("language", "de")
        task = data.pop("task", "transcribe")
        processing_type = data.pop("type", "audio")

        response = {}

        if processing_type == "link":
            slides, audio_bytes = process_video(link)
            slides_list = [slide.to_dict() for slide in slides]
            response.update({"slides": slides_list})
        else:
            audio_bytes_decoded = base64.b64decode(inputs)
            logging.debug(f"Decoded Bytes Length: {len(audio_bytes_decoded)}")
            audio_bytes = io.BytesIO(audio_bytes_decoded)

        # run inference pipeline
        logging.info("Running inference...")
        segments, info = self.model.transcribe(audio_bytes, language=language, task=task)

        full_text = []
        for segment in segments:
            full_text.append({"segmentId": segment.id,
                              # ... (remaining segment fields not shown by the diff)
                              })
            logging.info("segment " + str(segment.id) + " transcribed")
        logging.info("Inference completed.")

        response.update({"audios": full_text})
        logging.debug(response)
        return response
```
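For local testing outside the deployed endpoint, the handler can be invoked directly. A minimal sketch, assuming the constructor takes no required arguments (its `__init__` signature is elided in the diff), that `sample.aac` exists, and that the hardware can load large-v2:

```python
import base64

from handler import EndpointHandler

handler = EndpointHandler()  # assumption: no required constructor arguments

with open("sample.aac", "rb") as f:
    payload = {
        "inputs": base64.b64encode(f.read()).decode("utf-8"),
        "link": "",            # required key, unused when type is "audio"
        "language": "en",
        "task": "transcribe",
        "type": "audio",
    }

result = handler(payload)
print(result["audios"])
```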
imgcomparison.py
ADDED
@@ -0,0 +1,106 @@

```python
import operator
from abc import ABCMeta, abstractmethod

import cv2
import numpy as np
import scipy.spatial.distance as dist


class ImageComparator(metaclass=ABCMeta):

    def __init__(self, threshold):
        self.threshold = threshold

    @abstractmethod
    def are_similar(self, first, second):
        pass

    def are_same(self, first, second, op=operator.ge):
        return op(self.are_similar(first, second), self.threshold)


class AbsDiffHistComparator(ImageComparator):

    def __init__(self, threshold):
        super(AbsDiffHistComparator, self).__init__(threshold)

    def are_similar(self, first, second):
        # Fraction of absolute-difference pixels below intensity 15:
        # close to 1.0 means the two frames are nearly identical
        res = cv2.absdiff(first, second)
        hist = cv2.calcHist([res], [0], None, [256], [0, 256])
        return 1 - np.sum(hist[15::]) / np.sum(hist)


class EuclideanComparator(ImageComparator):

    def __init__(self, threshold):
        super(EuclideanComparator, self).__init__(threshold)

    def are_similar(self, first, second):
        return dist.euclidean(first, second)


class ChebysevComparator(ImageComparator):

    def __init__(self, threshold):
        super(ChebysevComparator, self).__init__(threshold)

    def are_similar(self, first, second):
        return dist.chebyshev(first, second)


class OpenCVComparator(ImageComparator):

    def __init__(self, threshold):
        super(OpenCVComparator, self).__init__(threshold)

    @abstractmethod
    def get_technique(self):
        pass

    def are_similar(self, first, second):
        # Compare per-channel histograms and average over the three channels
        result = 0
        for i in range(3):
            hist1 = cv2.calcHist([first], [i], None, [256], [0, 256])
            hist2 = cv2.calcHist([second], [i], None, [256], [0, 256])
            result += cv2.compareHist(hist1, hist2, self.get_technique())

        return result / 3


class CorrelationOpenCVComparator(OpenCVComparator):

    def __init__(self, threshold):
        super(CorrelationOpenCVComparator, self).__init__(threshold)

    def get_technique(self):
        return cv2.HISTCMP_CORREL


class ChiOpenCVComparator(OpenCVComparator):

    def __init__(self, threshold):
        super(ChiOpenCVComparator, self).__init__(threshold)

    def get_technique(self):
        return cv2.HISTCMP_CHISQR


class IntersectionOpenCVComparator(OpenCVComparator):

    def __init__(self, threshold):
        super(IntersectionOpenCVComparator, self).__init__(threshold)

    def get_technique(self):
        return cv2.HISTCMP_INTERSECT


class BhattacharyyaOpenCVComparator(OpenCVComparator):

    def __init__(self, threshold):
        super(BhattacharyyaOpenCVComparator, self).__init__(threshold)

    def get_technique(self):
        return cv2.HISTCMP_BHATTACHARYYA
```
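A short sketch of how the comparators are used (the image paths are placeholders; both frames must share dimensions, and 0.97 mirrors the threshold `Detector` uses):

```python
import cv2

from imgcomparison import AbsDiffHistComparator

comparator = AbsDiffHistComparator(0.97)

frame_a = cv2.imread("frame_a.png")
frame_b = cv2.imread("frame_b.png")

# are_same() applies the threshold to the similarity score from are_similar()
if comparator.are_same(frame_a, frame_b):
    print("frames show the same slide")
```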
mediaoutput.py
ADDED
@@ -0,0 +1,196 @@

```python
import datetime
import errno
import math
import os
from abc import ABCMeta, abstractmethod

import cv2


class MediaWriter(metaclass=ABCMeta):
    """
    Abstract class for all media outputs, forcing each subclass
    to provide a write method.
    """

    @abstractmethod
    def write(self, content, *args):
        """
        Writes media to disk
        :param content: the media to be written
        :param args: additional arguments that may be helpful
        """
        pass

    def close(self):
        """Releases any resources held by the writer; a no-op by default."""
        pass


class NullWriter(MediaWriter):
    def write(self, content, *args):
        pass


class ImageWriter(MediaWriter):
    """
    The ImageWriter writes an image to disk.
    """

    def __init__(self, prefix, file_format):
        """
        Default initializer
        :param prefix: the filename prefix; a counter is added after this
        string and incremented after each write to disk
        :param file_format: the file format for the images
        """
        if not file_format.startswith('.'):
            file_format = '.' + file_format
        if prefix is not None:
            setup_dirs(prefix)
            self.name = prefix + file_format

    def write(self, img, *args):
        """
        Writes the given image to the location specified through the
        initializer
        :param img: the image that will be written to disk
        """
        cv2.imwrite(self.name % self.next_name(args), img)

    @abstractmethod
    def next_name(self, *args):
        """
        Returns the object that should be inserted into the filename.
        :param args: the args that were passed to write()
        :return: the object that will be inserted into the filename
        """


class CustomImageWriter(ImageWriter):
    """
    Image writer that uses a custom name, taken as the first
    argument in *args of the write method.
    """
    def __init__(self, prefix=None, file_format='.jpg'):
        """
        Default initializer
        :param prefix: the file location and file name prefix
        :param file_format: the file format e.g. .jpg, .png
        """
        super(CustomImageWriter, self).__init__(prefix + '%s', file_format)

    def next_name(self, *args):
        return args[0]


class IncrementalImageWriter(ImageWriter):
    """
    The IncrementalImageWriter writes an image to disk and appends a
    number to the file name. This number is auto-incremented by the
    specified step size after each write.
    """

    def __init__(self, prefix=None, file_format='.jpg', start=0, step=1):
        """
        Default initializer
        :param prefix: the file location and file name
        :param file_format: the file format e.g. .jpg, .png
        :param start: the starting number for the incremental count
        :param step: the step by which the count should increment
        """
        self.count = start - step
        self.step = step
        if prefix is not None:
            prefix += '%d'
        super(IncrementalImageWriter, self).__init__(prefix, file_format)

    def next_name(self, *args):
        self.count += self.step
        return self.count


class TimestampImageWriter(ImageWriter):
    """
    TimestampImageWriter is an ImageWriter that names each image after
    the timestamp at which it was first shown in the original stream.
    """

    def __init__(self, fps, prefix=None, file_format='.jpg'):
        """
        Default initializer
        :param fps: the number of frames per second in the original stream
        :param prefix: the prefix of the path to the output location
        :param file_format: the file format of the output image
        """
        self.fps = fps

        if prefix is not None:
            prefix += '%s'
        super(TimestampImageWriter, self).__init__(prefix, file_format)

    def next_name(self, args):
        # Convert the frame index in args[0] into an H:MM:SS.mmm timestamp
        current_frame = args[0]
        seconds = current_frame / self.fps
        milliseconds = seconds - math.floor(seconds)
        if milliseconds == 0:
            milliseconds = '000'
        else:
            milliseconds = str(int(milliseconds * (10 ** 3)))
        return str(datetime.timedelta(seconds=int(seconds))) + '.' + milliseconds.zfill(3)


class TimetableWriter(MediaWriter):
    """
    The TimetableWriter outputs each slide iteratively using
    the IncrementalImageWriter. Additionally it outputs a ".txt"
    document containing the slide names and their appearances.
    """
    def __init__(self, output_dir, timetable_loc, file_format):
        """
        Default initializer
        :param output_dir: the output directory for the sorted slides
        :param timetable_loc: where the timetable file should be stored
        :param file_format: the file format of the output images
        """
        setup_dirs(timetable_loc)
        self.timetable = open(timetable_loc, 'w')
        self.img_writer = IncrementalImageWriter(prefix=output_dir, start=1, file_format=file_format)
        self.txt_writer = TextWriter(self.timetable)

    def write(self, slides, *args):
        i = 1
        for slide in slides:
            if slide.marked:
                continue
            self.img_writer.write(slide.img)
            appearances = slide.time
            for com in slide.times:
                appearances += " " + com
            self.txt_writer.write("Slide %d: %s\n" % (i, appearances))
            i += 1

    def close(self):
        self.timetable.close()


class TextWriter(MediaWriter):
    def __init__(self, output_file):
        self.output_file = output_file

    def write(self, content, *args):
        self.output_file.write(content)


def setup_dirs(path):
    """
    Ensures that the directories along the given path exist,
    creating them when necessary.
    :param path: the path to the file
    """
    path = os.path.dirname(path)
    if path == '':
        return
    if not os.path.exists(path):
        try:
            os.makedirs(path)
        except OSError as exc:  # guard against a race condition
            if exc.errno != errno.EEXIST:
                raise
```
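A small sketch of the timestamp naming scheme, assuming 30 fps for illustration:

```python
from mediaoutput import TimestampImageWriter

namer = TimestampImageWriter(30.0)  # no prefix: used only to format names

# Frame 4500 at 30 fps is 150 s into the stream -> "0:02:30.000"
print(namer.next_name([4500]))
```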
requirements.txt
CHANGED
Binary files a/requirements.txt and b/requirements.txt differ
slides.py
ADDED
@@ -0,0 +1,177 @@

```python
import base64
import json
import os
import re
from abc import ABCMeta, abstractmethod

import cv2
import numpy as np
from PIL import Image


def numerical_sort(value):
    numbers = re.compile(r'(\d+)')
    parts = numbers.split(value)
    parts[1::2] = map(int, parts[1::2])
    return parts


class Slide(object):
    """
    Represents a slide.
    """

    def __init__(self, time, img):
        """
        Default initializer for a slide representation
        :param time: the time when the slide appears
        :param img: the image representing the slide
        """
        self.time = time
        self.img = img
        self.marked = False
        self.times = []
        self.reference = None
        self.page_number = 0

    def add_time(self, time):
        """
        Adds an additional instant in time at which the slide
        is displayed.
        :param time: the time when the slide is displayed
        """
        self.times.append(time)

    def to_dict(self):
        """
        Converts the Slide object to a dictionary, handling image serialization.
        """
        # Encode the image as JPEG bytes, then as a base64 string
        _, buffer = cv2.imencode('.jpg', self.img)
        img_encoded = base64.b64encode(buffer).decode('utf-8')

        return {
            'time': self.time,
            'img': img_encoded,  # the base64-encoded image
            'marked': self.marked,
            'times': self.times,
            'reference': self.reference,
            'page_number': self.page_number
        }

    @classmethod
    def from_dict(cls, data):
        """
        Creates a Slide object from a dictionary, handling image deserialization.
        """
        # Decode the base64 string back into bytes, then into a numpy image
        img_decoded = base64.b64decode(data['img'])
        img = cv2.imdecode(np.frombuffer(img_decoded, np.uint8), cv2.IMREAD_COLOR)

        slide = cls(data['time'], img)
        slide.marked = data.get('marked', False)
        slide.times = data.get('times', [])
        slide.reference = data.get('reference')
        slide.page_number = data.get('page_number', 0)
        return slide

    def to_json(self):
        """
        Converts the Slide object to a JSON string.
        """
        return json.dumps(self.to_dict())

    @classmethod
    def from_json(cls, json_str):
        """
        Creates a Slide object from a JSON string.
        """
        data = json.loads(json_str)
        return cls.from_dict(data)


def slides_to_json(slides):
    """
    Converts a list of Slide objects to a JSON string.
    """
    return json.dumps([slide.to_dict() for slide in slides])


def slides_from_json(json_str):
    """
    Creates a list of Slide objects from a JSON string.
    """
    slides_data = json.loads(json_str)
    return [Slide.from_dict(slide_data) for slide_data in slides_data]


class SlideDataHelper(object):
    """
    Helper that loads slides from disk.
    """

    def __init__(self, path, image_type="opencv"):
        """
        Default initializer
        :param path: the path where the slides are stored on disk
        :param image_type: the image representation, either "opencv" or "pil"
        """
        self.path = path
        if image_type == "pil":
            self.imgreader = PILReader()
        else:
            self.imgreader = OpenCVReader()

    def get_slides(self):
        """
        Reads the slides from disk and returns them as a list of "Slide"
        objects.
        :return: the slides stored on disk as a list of "Slide" objects
        """
        slides = []
        for filename in sorted(os.listdir(self.path), key=numerical_sort):
            file_path = os.path.join(self.path, filename)
            _, ext = os.path.splitext(file_path)
            if not is_image(ext):
                continue
            time, _ = os.path.splitext(filename)
            slide = Slide(time, self.imgreader.get_img(file_path))
            slides.append(slide)

        return slides


class ImageReader(metaclass=ABCMeta):

    @abstractmethod
    def get_img(self, file_path):
        pass


class PILReader(ImageReader):
    def get_img(self, file_path):
        return Image.open(file_path)


class OpenCVReader(ImageReader):
    def get_img(self, file_path):
        return cv2.imread(file_path)


def convert_to_opencv(img):
    return cv2.cvtColor(np.array(img.convert('RGB')), cv2.COLOR_RGB2BGR)


def convert_to_PIL(img):
    return Image.fromarray(img)


def is_image(ext):
    """
    Checks whether the extension belongs to a supported image format.
    :param ext: the extension of a file
    :return: whether or not the file is an image
    """
    return ext in ('.jpeg', '.png', '.jpg', '.bmp')
```
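A short sketch of the serialization round trip, using a synthetic image:

```python
import numpy as np

from slides import Slide

img = np.zeros((10, 10, 3), dtype=np.uint8)  # synthetic black image
slide = Slide("0:00:05.000", img)

payload = slide.to_json()           # JSON string with a base64-encoded JPEG
restored = Slide.from_json(payload)

print(restored.time, restored.img.shape)  # 0:00:05.000 (10, 10, 3)
```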
sorter.py
ADDED
@@ -0,0 +1,109 @@

```python
import argparse
import os

from tqdm import tqdm

import imgcomparison as ic
import mediaoutput
import sources
from analyzer import Analyzer
from slides import SlideDataHelper


class SlideSorter(Analyzer):
    """
    Sorts the slides according to their timestamp and eliminates duplicates.
    """

    def __init__(self, source, outpath=None, timetable_loc=None, file_format=".png",
                 comparator=ic.AbsDiffHistComparator(0.98)):
        """
        Default initializer
        :param source: the source of the slides (e.g. a ListSource or AnalyzerSource)
        :param outpath: where the sorted slides should be written, if anywhere
        :param timetable_loc: where the timetable file should be written
        :param file_format: the file format of the output images
        :param comparator: the comparator to determine whether two slides
        are duplicates
        """
        self.comparator = comparator
        self.writer = mediaoutput.NullWriter()
        if outpath is not None:
            if timetable_loc is None:
                timetable_loc = os.path.join(outpath, 'timetable.txt')
            self.file_format = file_format
            self.writer = mediaoutput.TimetableWriter(outpath, timetable_loc, self.file_format)
        self.source = source

    def sort(self):
        """
        Sorts the slides and writes the deduplicated slides, together
        with a timetable, to disk.
        """
        slides = []
        with tqdm(total=len(self.source), desc="Sorting Slides: ") as pbar:
            for i, slide in self.group_slides():
                pbar.update(1)  # one source slide processed per iteration
                if slide is not None:
                    slides.append(slide)

        return slides

    def group_slides(self):
        """
        Groups the slides by eliminating duplicates.
        Yields (counter, slide) pairs, where slide is None for a detected
        duplicate and the unique slide otherwise.
        """
        slides = []
        sorted_slides = []
        loop_counter = 0
        page_counter = 1
        for slide in self.source.contents():
            slides.append(slide)
            if slide.marked:
                continue
            found = False
            for other in slides[:-1]:
                if self.comparator.are_same(slide.img, other.img):
                    found = True
                    if other.marked:
                        # `other` is itself a duplicate: attach to its reference
                        other.reference.add_time(slide.time)
                        slide.reference = other.reference
                        slide.marked = True
                    else:
                        slide.reference = other
                        other.add_time(slide.time)
                        slide.marked = True
                    yield loop_counter, None

            if not found:
                slide.page_number = page_counter
                yield loop_counter, slide
                sorted_slides.append(slide)
                page_counter += 1
            loop_counter += 1
        self.writer.write(sorted_slides)
        self.writer.close()

    def analyze(self):
        for _, slide in self.group_slides():
            if slide is None:
                continue
            yield slide


if __name__ == '__main__':
    Parser = argparse.ArgumentParser(description="Slide Sorter")
    Parser.add_argument("-d", "--inputslides", help="path of the sequentially sorted slides", default="slides/")
    Parser.add_argument("-o", "--outpath", help="path to output slides", default="unique/", nargs='?')
    Parser.add_argument("-f", "--fileformat", help="file format of the output images e.g. '.jpg'",
                        default=".jpg", nargs='?')
    Parser.add_argument("-t", "--timetable",
                        help="path where the timetable should be written (default is the outpath+'timetable.txt')",
                        nargs='?', default=None)
    Args = Parser.parse_args()
    if Args.timetable is None:
        Args.timetable = os.path.join(Args.outpath, "timetable.txt")

    sorter = SlideSorter(sources.ListSource(SlideDataHelper(Args.inputslides).get_slides()), Args.outpath,
                         Args.timetable, Args.fileformat)
    sorter.sort()
```
sources.py
ADDED
@@ -0,0 +1,32 @@

```python
import sys
from abc import ABCMeta, abstractmethod


class Source(metaclass=ABCMeta):

    @abstractmethod
    def contents(self):
        pass

    def __len__(self):
        # Streaming sources have no known length; report a huge upper bound
        return sys.maxsize


class ListSource(Source):
    def __init__(self, items):
        self.items = items

    def contents(self):
        return self.items

    def __len__(self):
        return len(self.contents())


class AnalyzerSource(Source):
    def __init__(self, analyzer):
        self.analyzer = analyzer

    def contents(self):
        for content in self.analyzer.analyze():
            yield content
```
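The two concrete sources let `SlideSorter` consume either a materialized list or a streaming analyzer; a sketch of the streaming variant (the video path is a placeholder):

```python
from detector import Detector
from sorter import SlideSorter
from sources import AnalyzerSource

# Slides flow from Detector.analyze() into the sorter without being
# collected into a list first
detector = Detector("lecture.mp4")
sorter = SlideSorter(AnalyzerSource(detector), outpath="sorted_slides/")
unique_slides = sorter.sort()
```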
timeline.py
ADDED
@@ -0,0 +1,166 @@

```python
import cv2


class Timeline(object):
    """
    The Timeline represents a logical sequence of frames, where frames
    from the video stream are rendered through lazy evaluation.
    """
    reader_head = 0

    def __init__(self, stream):
        """
        Default Initializer
        :param stream: the video stream from OpenCV
        """
        self.stream = stream
        self.len = stream.get(cv2.CAP_PROP_FRAME_COUNT)
        self.fps = stream.get(cv2.CAP_PROP_FPS)

    def next_frame(self):
        """
        Reads the next frame from the video stream and increments the
        reader_head by 1.
        :return: the next frame, or None once the video stream has been
        completely read
        """
        ret, frame = self.stream.read()
        self.reader_head += 1

        if not ret:
            return None

        return frame

    def get_frame(self, pos):
        """
        Returns the frame at the given position of the frame sequence
        :param pos: the position of the frame in the sequence
        :return: the frame at the specified position
        """
        assert pos >= 0
        # Seek to the requested frame before reading
        self.stream.set(cv2.CAP_PROP_POS_FRAMES, pos)
        _, frame = self.stream.read()
        self.reader_head = pos + 1
        return frame

    def get_frames(self, start, end):
        """
        Returns the list of frames between the specified start and
        end positions in the frame sequence.
        :param start: where the frame sequence should start
        :param end: where the frame sequence should end
        :return: the frame sequence from start to end
        """
        assert end >= start
        assert start >= 0

        result = []
        for i in range(start, end, 1):
            result.append(self.get_frame(i))
        return result

    def release_stream(self):
        self.stream.release()


class SlidingWindow(object):
    """
    This class represents an adaptive sliding window: it keeps a pointer
    to the start position of the window and the window's size. The size
    of the window can be changed at any time. Move, shrink, and expand
    operations are included.
    """

    def __init__(self, timeline, pos=0, size=2):
        """
        Default initializer for the sliding window
        :param timeline: the timeline that the sliding window
        should be applied to
        :param pos: the position that the beginning of the
        window points to
        :param size: the size of the window
        """
        self.timeline = timeline
        self.pos = pos
        self.size = size

    def move_right(self):
        """
        This method does this:
        .|#|#|.|.|.  =>  .|.|#|#|.|.
         1 2 3 4 5 6      1 2 3 4 5 6
        """
        self.pos += 1

    def move_left(self):
        """
        This method does this:
        .|.|#|#|.|.  =>  .|#|#|.|.|.
         1 2 3 4 5 6      1 2 3 4 5 6
        """
        self.pos -= 1

    def shrink_from_left(self):
        """
        This method does this:
        .|#|#|#|.|.  =>  .|.|#|#|.|.
         1 2 3 4 5 6      1 2 3 4 5 6
        """
        self.pos += 1
        self.size -= 1

    def shrink_from_right(self):
        """
        This method does this:
        .|#|#|#|.|.  =>  .|#|#|.|.|.
         1 2 3 4 5 6      1 2 3 4 5 6
        """
        self.size -= 1

    def expand_to_left(self):
        """
        This method does this:
        .|.|#|#|.|.  =>  .|#|#|#|.|.
         1 2 3 4 5 6      1 2 3 4 5 6
        """
        self.pos -= 1
        self.size += 1

    def expand_to_right(self):
        """
        This method does this:
        .|#|#|.|.|.  =>  .|#|#|#|.|.
         1 2 3 4 5 6      1 2 3 4 5 6
        """
        self.size += 1

    def get_frames(self):
        """
        Retrieves all the frames that are currently in this adaptive
        sliding window.
        :return: the frames in the sliding window
        """
        return self.timeline.get_frames(self.pos, self.pos + self.size)

    def get_frame(self, pos):
        return self.timeline.get_frame(self.pos)

    def get_start_frame(self):
        return self.timeline.get_frame(self.pos)

    def get_end_frame(self):
        return self.timeline.get_frame(self.pos + self.size - 1)

    def at_end(self):
        return self.pos + self.size == self.timeline.len
```
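A brief sketch of driving the timeline and sliding window directly (the video path is a placeholder):

```python
import cv2

from timeline import Timeline, SlidingWindow

timeline = Timeline(cv2.VideoCapture("lecture.mp4"))
print(f"{timeline.len:.0f} frames at {timeline.fps:.2f} fps")

window = SlidingWindow(timeline, pos=0, size=2)
window.expand_to_right()           # window now covers frames 0..2
first, *_, last = window.get_frames()

timeline.release_stream()
```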