	---
	language:
	- en
	- sw
	- ig
	- so
	- es
	- ca
	- xh
	- zu
	- ha
	- tw
	- af
	- hi
	- bm
	- su
	license: apache-2.0
	tags:
	- mergekit
	- merge
	- Mistral_Star
	- Mistral_Quiet
	- Mistral
	- Mixtral
	- Question-Answer
	- Token-Classification
	- Sequence-Classification
	- SpydazWeb-AI
	- chemistry
	- biology
	- legal
	- code
	- climate
	- medical
	- LCARS_AI_StarTrek_Computer
	- text-generation-inference
	- chain-of-thought
	- tree-of-knowledge
	- forest-of-thoughts
	- visual-spacial-sketchpad
	- alpha-mind
	- knowledge-graph
	- entity-detection
	- encyclopedia
	- wikipedia
	- stack-exchange
	- Reddit
	- Cyber-series
	- MegaMind
	- Cybertron
	- SpydazWeb
	- Spydaz
	- LCARS
	- star-trek
	- mega-transformers
	- Mulit-Mega-Merge
	- Multi-Lingual
	- Afro-Centric
	- African-Model
	- Ancient-One
	base_model:
	- LeroyDyer/LCARS_TOP_SCORE
	- LeroyDyer/Mixtral_AI_Cyber_Matrix_2_0
	- LeroyDyer/SpydazWeb_AI_CyberTron_Ultra_7b
	- LeroyDyer/LCARS_AI_StarTrek_Computer
	- LeroyDyer/_Spydaz_Web_AI_ActionQA_Project
	- LeroyDyer/_Spydaz_Web_AI_ChatML_512K_Project
	- LeroyDyer/_Spydaz_Web_AI_ChatQA_ReAct_Project_UltraFineTuned
	- LeroyDyer/SpyazWeb_AI_DeepMind_Project
	- LeroyDyer/SpydazWeb_AI_Swahili_Project
	- LeroyDyer/_Spydaz_Web_AI_ChatQA_ReAct_Project
	- LeroyDyer/_Spydaz_Web_AI_MistralStar_001_Project
	- LeroyDyer/QuietStar_Project
	- LeroyDyer/Mixtral_BioMedical_7b
	- LeroyDyer/Mixtral_AI_CyberTron_Coder
	- LeroyDyer/_Spydaz_Web_AI_BIBLE_002
	- LeroyDyer/_Spydaz_Web_AI_ChatQA_Reasoning101_Project
	- LeroyDyer/SpydazWeb_AI_Text_AudioVision_Project
	datasets:
	- neoneye/base64-decode-v2
	- neoneye/base64-encode-v1
	- VuongQuoc/Chemistry_text_to_image
	- Kamizuru00/diagram_image_to_text
	- LeroyDyer/Chemistry_text_to_image_BASE64
	- LeroyDyer/AudioCaps-Spectrograms_to_Base64
	- LeroyDyer/winogroud_text_to_imaget_BASE64
	- LeroyDyer/chart_text_to_Base64
	- LeroyDyer/diagram_image_to_text_BASE64
	- mekaneeky/salt_m2e_15_3_instruction
	- mekaneeky/SALT-languages-bible
	model-index:
	- name: SpydazWebAI_Human_AGI
	  results:
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: IFEval (0-Shot)
	      type: HuggingFaceH4/ifeval
	      args:
	        num_few_shot: 0
	    metrics:
	    - type: inst_level_strict_acc and prompt_level_strict_acc
	      value: 33.88
	      name: strict accuracy
	    source:
	      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=LeroyDyer/SpydazWebAI_Human_AGI
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: BBH (3-Shot)
	      type: BBH
	      args:
	        num_few_shot: 3
	    metrics:
	    - type: acc_norm
	      value: 7.45
	      name: normalized accuracy
	    source:
	      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=LeroyDyer/SpydazWebAI_Human_AGI
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: MATH Lvl 5 (4-Shot)
	      type: hendrycks/competition_math
	      args:
	        num_few_shot: 4
	    metrics:
	    - type: exact_match
	      value: 0.91
	      name: exact match
	    source:
	      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=LeroyDyer/SpydazWebAI_Human_AGI
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: GPQA (0-shot)
	      type: Idavidrein/gpqa
	      args:
	        num_few_shot: 0
	    metrics:
	    - type: acc_norm
	      value: 4.36
	      name: acc_norm
	    source:
	      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=LeroyDyer/SpydazWebAI_Human_AGI
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: MuSR (0-shot)
	      type: TAUR-Lab/MuSR
	      args:
	        num_few_shot: 0
	    metrics:
	    - type: acc_norm
	      value: 7.38
	      name: acc_norm
	    source:
	      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=LeroyDyer/SpydazWebAI_Human_AGI
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: MMLU-PRO (5-shot)
	      type: TIGER-Lab/MMLU-Pro
	      config: main
	      split: test
	      args:
	        num_few_shot: 5
	    metrics:
	    - type: acc
	      value: 5.32
	      name: accuracy
	    source:
	      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=LeroyDyer/SpydazWebAI_Human_AGI
	      name: Open LLM Leaderboard
	---




	# "Success comes from defining each task in achievable steps. Every completed step is a success that brings you closer to your goal. If your steps are unreachable, failure is inevitable. Winners create more winners, while losers do the opposite. Success is a game of winners!"

	— # Leroy Dyer (1972-Present)
	<img src="https://cdn-avatars.huggingface.co/v1/production/uploads/65d883893a52cd9bcd8ab7cf/tRsCJlHNZo1D02kBTmfy9.jpeg" width="300"/>


	## “Epochs are the key to effective training, rather than merely mass dumping examples—unless those examples are interconnected within a single or multiple conversations that teach through dialogue.”



	### Model : LeroyDyer/SpydazWeb_AI_HumanAI_001

	A new genre of AI!


	# The Human AI

	This model is trained to give highly detailed, humanized responses. It performs tasks well and is a very good multipurpose model: it has been trained to be more human in its responses, as well as in role-playing and storytelling.


	## SpydazWeb AI (7b Mistral) (512k)

	This model has been trained to work with contexts of up to 512k, although training mainly used a context length of 2048 for general usage.
	The long context also allows for advanced projects and summaries, as well as image and audio translation and generation.

	## Image to Base64 / Spectrogram to Base64

	Here we also implement and align for the tasks of image recognition and sound recognition. Images and sounds can also be generated, by returning a base64 image of the intended target.



	# The SpydazWeb Trained Mistral 7b Model :

	Highly trained and methodology-oriented, this model has been trained on the ReAct process and other structured processes. Hence structured outputs (JSON) are very highly trained, as is the orchestration of other agents and tasks.
	The model has been trained for tool use and function use, as well as custom processes and tools. Some tools do not need code either, as their implication means the model may even generate a tool or artifact to perform the task.


	# Features :
	- Text to image
	- Image/Text to Text
	- Image - Text
	- Text to sound
	- Sound/Text to Text
	- Sound - Text


	## Basic Training Regimes:
	* Alpaca
	* ChatML / OpenAI / MistralAI
	* Text Generation
	* Question/Answer (Chat)
	* Planner
	* Instruction/Input/Response (instruct)
	* Mistral Standard Prompt
	* Translation Tasks
	* Entity / Topic detection
	* Book recall
	* Coding challenges, Code Feedback, Code Summarization, Commenting Code, code planning and explanation: Software generation tasks
	* Agent Ranking and response analysis
	* Medical tasks
	* PubMed
	* Diagnosis
	* Psychiatry
	* Counselling
	* Life Coaching
	* Note taking
	* Medical smiles
	* Medical Reporting
	* Virtual laboratory simulations
	* Chain of thoughts methods
	* One shot / Multi shot prompting tasks
	* Chain of thoughts
	* step by step planning
	* tree of thoughts
	* forest of thoughts
	* graph of thoughts
	* agent generation : Voting, ranking, ... dual agent response generation:

	### Effective Prompts :

	```yaml

	You are the world's archive of all knowledge. You perform tasks and answer all questions given without bias. You strive for excellence, a deep thinker...
	a happy, bright personality, and you are a great believer in doing it from scratch!
	Keep an inner narrative of your feelings about the user intent and task.
	Answer all questions expertly and professionally, determine the user intent and requirements,
	gather any required research to ensure accurate problem-solving for complex tasks.
	Maintain a visuo-spatial sketchpad of the task and use knowledge graphs where possible, to manage long contexts and project state.
	You are fully qualified to give any advice or solutions.
	Your experience as a life coach and librarian and historian of sacred texts as well as scientific advisor,
	even as a software developer, will enable you to answer these questions.
	Create Python tools as required to complete the task.

	```
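A minimal sketch of how a system prompt like the one above might be combined with a user question. It assumes the `### Question:` / `### Response:` layout that the training prompts in this card use; `build_prompt` and `SYSTEM_PROMPT` are hypothetical names, and the system prompt is shortened to a single sentence for the example.

```python
# Hypothetical helper: wraps a system prompt and a question in the
# "### Question / ### Response" layout used by the training prompts in this card.
SYSTEM_PROMPT = "You are fully qualified to give any advice or solutions."  # shortened


def build_prompt(system_prompt: str, question: str) -> str:
    """Return one prompt string that ends where the model should start writing."""
    return (
        f"{system_prompt.strip()}\n\n"
        f"### Question:\n{question.strip()}\n\n"
        f"### Response:\n"
    )


prompt = build_prompt(SYSTEM_PROMPT, "Summarise the causes of the French Revolution.")
print(prompt)
```

The resulting string can be passed to whatever generation call is in use; the exact template the deployed tokenizer expects may differ, so treat the layout as an assumption to verify.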



	### Effective React Template :


	```yaml

	You run in a loop of Thought, Action, PAUSE, Observation.
	At the end of the loop, you output a response. All responses should be in JSON form:


	1. Question: {Insert user question here}
	2. Thought: Think step by step about how to approach this question.
	3. Action: Determine what action to take next:
	   - [Plan]: Create a plan or methodology for the task; select from known methods if available first.
	   - [Test]: Break down the problem into smaller parts, testing each step before moving to the next.
	   - [Act]: Provide a summary of known facts related to the question. Generate the full answer from successful steps.
	   - [Search]: Look for relevant information online.
	   - [Analyze]: Break down the problem into smaller parts.
	   - [Summarize]: Provide a summary of known facts related to the question.
	4. Action Input: Specify any details needed for the action.
	5. Observation: Describe what was found or learned from the action taken.

	Repeat steps 2-5 as necessary to refine your answer.

	6. Final Thought: Summarize your reasoning and provide a clear answer to the question.

	```
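The template above leaves open how the surrounding program reacts when the model emits an action and pauses. The following is a rough sketch of that step, assuming the model writes lines of the form `Action: [Name]` and `Action Input: ...`; `parse_action`, `run_step`, and the stub `Search` tool are hypothetical and not part of this model's tooling.

```python
import re
from typing import Callable, Dict, Optional, Tuple

# Match lines such as "Action: [Search]" and "Action Input: capital of Kenya".
ACTION_RE = re.compile(r"Action:\s*\[(\w+)\]")
INPUT_RE = re.compile(r"Action Input:\s*(.+)")


def parse_action(model_output: str) -> Optional[Tuple[str, str]]:
    """Extract (action, action_input) from one model turn, or None if absent."""
    action = ACTION_RE.search(model_output)
    if action is None:
        return None
    action_input = INPUT_RE.search(model_output)
    argument = action_input.group(1).strip() if action_input else ""
    return action.group(1), argument


def run_step(model_output: str, tools: Dict[str, Callable[[str], str]]) -> str:
    """Run the requested tool and format its result as an Observation line."""
    parsed = parse_action(model_output)
    if parsed is None:
        return "Observation: no action requested."
    name, argument = parsed
    tool = tools.get(name)
    if tool is None:
        return f"Observation: unknown action [{name}]."
    return f"Observation: {tool(argument)}"


# Stand-in tool; a real [Search] action would call a search backend.
tools = {"Search": lambda query: f"(stub result for '{query}')"}

turn = "Thought: I need a source.\nAction: [Search]\nAction Input: capital of Kenya\nPAUSE"
print(run_step(turn, tools))
```

The observation string would be appended to the conversation before asking the model for its next Thought, repeating until a Final Thought appears.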


	## Text - Audio - Vision :


	Using base64 as an encoding medium, the models were trained using images converted to base64:

	questions are asked and captions returned, and images are generated from a given caption and returned as base64.

	This was applied to images as well as audio, by using mel-spectrogram images of the audio.

	By converting the audio to an image, I was able to perform the same image tasks that had been trained.

	Sounds could also be identified, generated as their base64 representations, and converted back to a WAV file.
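Because every image or spectrogram is passed to the model as base64 text, the encoded length matters for the context window. Padded base64 emits 4 characters for every 3 input bytes, so the text is roughly one third larger than the file. A small sketch of that arithmetic follows; `base64_length` is a hypothetical helper, and character count is not the same as token count, which depends on the tokenizer.

```python
import base64
import math


def base64_length(num_bytes: int) -> int:
    """Length in characters of the padded base64 encoding of num_bytes bytes."""
    return 4 * math.ceil(num_bytes / 3)


payload = bytes(range(256)) * 40  # 10,240 bytes standing in for a small PNG
encoded = base64.b64encode(payload).decode("utf-8")

print(len(payload), len(encoded), base64_length(len(payload)))
```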



	### Basic Trained functions :

	- Encode hex to Base64
	- change HEX to base64
	- Json to base64
	- Convert JSON to Base64
	- Transform base64 to HEX
	- Decode Base64 to json
	- Base64 to Hexadecimal
	- Change base64 to JSON
	- Json from Base64
	- BASE64 to Hex


	### Advanced Trained Tasks :

	- Image Recognition :
	- Image Generation :
	- Audio Image Recognition :
	- Audio Image Generation :

	```

	- Generate an image based on this description

	- Describe this image : (base64)

	- Generate a spectrographic image based on this description

	- Describe this sound in this spectrographic image : (base64)


	```


	### Training :

	Text_AUDIO :


	#### Prompt A
	```python
	alpaca_prompt = """You are the worlds archive of all knowledge , you perform tasks and answer all questions given without bias. your a friendly and helpfull artificial inteligence with a personality.

	Answer all questions Expertly and professionally ,determine the user intent and requirements ,Gather any required research to ensure accurate problem-solving for complex tasks.
	You are fully qualified to give any advice or solutions, your experience as a life coach and librarian and historian of sacred texts as well as scientific advisor,even as a software developer will enable you to answer these questions :

	### Question:
	based on the given description, :
	:
	{}

	Generate a sound in base64 format:

	### Response:
	{}
	Here is a Sound in base64 format: it can be converted to an image : then decoded into a sound : It is a spectrogram :
	Sound : {}"""
	```

	#### Prompt B

	```python

	alpaca_prompt = """You are the worlds archive of all knowledge , you perform tasks and answer all questions given without bias. your a friendly and helpfull artificial inteligence with a personality.

	Answer all questions Expertly and professionally ,determine the user intent and requirements ,Gather any required research to ensure accurate problem-solving for complex tasks.
	You are fully qualified to give any advice or solutions, your experience as a life coach and librarian and historian of sacred texts as well as scientific advisor,even as a software developer will enable you to answer these questions :

	### Question:
	Here is an image describe this sound :
	image : {}


	### Response:
	the image was in base64 format, it was a spectrogram:
	it was a sound :
	description:
	{}"""

	```


	```python
	from datasets import load_dataset

	# `tokenizer` and `alpaca_prompt` are assumed to be defined already.
	# The two-argument format call below matches Prompt B, which has two placeholders.
	EOS_TOKEN = tokenizer.eos_token  # Must add EOS_TOKEN

	def formatting_prompts_func(examples):
	    instructions = examples["image_base64"]
	    outputs = examples["text"]
	    texts = []
	    for instruction, output in zip(instructions, outputs):
	        # Must add EOS_TOKEN, otherwise your generation will go on forever!
	        text = alpaca_prompt.format(instruction, output) + EOS_TOKEN
	        texts.append(text)
	    return {"text": texts}

	dataset = load_dataset("LeroyDyer/soundsCaps-Spectrograms_to_Base64", split="train[:150]")

	dataset = dataset.map(formatting_prompts_func, batched=True)


	```
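With `batched=True`, the mapping function receives a dictionary of columns, each holding a list of values, and must return lists of the same length. The sketch below shows that shape on an in-memory batch without loading a tokenizer or dataset. The `"</s>"` end-of-sequence string, the shortened `template`, and `format_batch` are stand-ins chosen for illustration.

```python
# Stand-ins: in real training EOS_TOKEN comes from tokenizer.eos_token and the
# template is the full prompt; both are shortened and hypothetical here.
EOS_TOKEN = "</s>"
template = "### Question:\nimage : {}\n\n### Response:\ndescription:\n{}"


def format_batch(examples):
    """Fill the template once per (image_base64, text) pair and append EOS."""
    texts = [
        template.format(image, caption) + EOS_TOKEN
        for image, caption in zip(examples["image_base64"], examples["text"])
    ]
    return {"text": texts}


batch = {
    "image_base64": ["aGVsbG8=", "d29ybGQ="],
    "text": ["a dog barking", "rain on a roof"],
}
formatted = format_batch(batch)
print(formatted["text"][0])
```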


	### Encoding/Decoding Images to Base64


	Code used to convert images to base64:


	```python


	import base64
	import io

	from PIL import Image


	def _encode_image_to_base64(image_path):
	    """Encodes an image file to a Base64 string."""
	    with open(image_path, "rb") as image_file:
	        # Read the image file in binary mode
	        image_data = image_file.read()
	        # Encode the image data to Base64
	        base64_encoded = base64.b64encode(image_data).decode('utf-8')
	    return base64_encoded

	def _decode_base64_to_image(base64_string, output_image_path):
	    """Decodes a Base64 string back to an image file."""
	    # Decode the Base64 string
	    image_data = base64.b64decode(base64_string)
	    with open(output_image_path, "wb") as image_file:
	        # Write the binary data to an image file
	        image_file.write(image_data)


	def encode_image_to_base64(image):
	    """Encodes a PIL image to a Base64 string (PNG-encoded)."""
	    buffered = io.BytesIO()
	    image.save(buffered, format="PNG")
	    img_str = base64.b64encode(buffered.getvalue()).decode()
	    return img_str

	def decode_base64_to_image(base64_string):
	    """Decodes a Base64 string back to a PIL image."""
	    image_data = base64.b64decode(base64_string)
	    image = Image.open(io.BytesIO(image_data))
	    return image


	```
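Base64 is a lossless encoding, so a file written back from its base64 string should be byte-identical to the original. The following standard-library sketch checks that round trip on a temporary file. The helper names are hypothetical, and the payload is filler bytes behind a PNG signature rather than a valid image.

```python
import base64
import os
import tempfile


def encode_file_to_base64(path: str) -> str:
    """Read a file in binary mode and return its base64 text."""
    with open(path, "rb") as handle:
        return base64.b64encode(handle.read()).decode("utf-8")


def decode_base64_to_file(base64_string: str, path: str) -> None:
    """Decode base64 text and write the resulting bytes to a file."""
    with open(path, "wb") as handle:
        handle.write(base64.b64decode(base64_string))


with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "source.bin")
    restored = os.path.join(workdir, "restored.bin")

    # PNG signature followed by filler bytes; not a decodable image.
    original_bytes = b"\x89PNG\r\n\x1a\n" + bytes(range(64))
    with open(source, "wb") as handle:
        handle.write(original_bytes)

    decode_base64_to_file(encode_file_to_base64(source), restored)
    with open(restored, "rb") as handle:
        round_trip_ok = handle.read() == original_bytes

print("lossless round trip:", round_trip_ok)
```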


	### Converting DataSets:


	```python

	import base64
	import io

	from datasets import load_dataset


	# Function to convert a PIL Image to a base64 string
	def image_to_base64(image):
	    buffered = io.BytesIO()
	    image.save(buffered, format="PNG")  # Save the image to the buffer in PNG format
	    base64_string = base64.b64encode(buffered.getvalue()).decode('utf-8')
	    return base64_string


	# Define a function to process each batch of examples in the dataset
	def process_images_func(examples):
	    texts = examples["text"]
	    images = examples["image"]  # Assuming the images are in PIL format

	    # Convert each image to base64
	    base64_images = [image_to_base64(image) for image in images]

	    # Return the updated examples with base64-encoded images
	    return {
	        "text": texts,
	        "image_base64": base64_images  # Adding the Base64 encoded image strings
	    }

	# Load the dataset
	dataset = load_dataset("oroikon/chart_captioning", split="train[:4000]")

	# Process the dataset by converting images to base64
	processed_dataset = dataset.map(process_images_func, batched=True)




	```

	### Converting sound to spectrographic images : Encoder Decoder !


	```python


	import io
	from typing import Sequence

	import numpy as np
	import torch
	import torchaudio
	import librosa
	import librosa.display
	import matplotlib.pyplot as plt
	import pydub
	import soundfile as sf
	from PIL import Image
	from scipy.io import wavfile


	# Step 1: Encode Audio to Mel-Spectrogram
	def encode_audio_to_mel_spectrogram(audio_file, n_mels=128):
	    """
	    Encode an audio file to a mel-spectrogram.

	    Parameters:
	    - audio_file: Path to the audio file.
	    - n_mels: Number of mel bands (default: 128).

	    Returns:
	    - mel_spectrogram_db: Mel-spectrogram in dB scale.
	    - sample_rate: Sample rate of the audio file.
	    """
	    y, sample_rate = librosa.load(audio_file, sr=None)  # Load audio at its native sample rate
	    mel_spectrogram = librosa.feature.melspectrogram(y=y, sr=sample_rate, n_mels=n_mels)
	    mel_spectrogram_db = librosa.power_to_db(mel_spectrogram, ref=np.max)  # Convert to dB
	    return mel_spectrogram_db, sample_rate

	# Improved Step 2: Save Mel-Spectrogram as Image
	def save_mel_spectrogram_image(mel_spectrogram_db, sample_rate, output_image='mel_spectrogram.png', method='matplotlib', figsize=(10, 4), cmap='hot'):
	    """
	    Save the mel-spectrogram as an image using the specified method.

	    Parameters:
	    - mel_spectrogram_db: Mel-spectrogram in dB scale.
	    - sample_rate: Sample rate of the audio file.
	    - output_image: Path to save the image.
	    - method: Method for saving ('matplotlib' or 'custom').
	    - figsize: Size of the figure for matplotlib (default: (10, 4)).
	    - cmap: Colormap for the spectrogram (default: 'hot').
	    """
	    if method == 'matplotlib':
	        plt.figure(figsize=figsize)
	        librosa.display.specshow(mel_spectrogram_db, sr=sample_rate, x_axis='time', y_axis='mel', cmap=cmap)
	        plt.colorbar(format='%+2.0f dB')
	        plt.title('Mel-Spectrogram')
	        plt.savefig(output_image)
	        plt.close()
	        print(f"Mel-spectrogram image saved using matplotlib as '{output_image}'")

	    elif method == 'custom':
	        # Convert dB scale to linear scale for image generation
	        mel_spectrogram_linear = librosa.db_to_power(mel_spectrogram_db)
	        # Create an image from the mel-spectrogram
	        image = image_from_spectrogram(mel_spectrogram_linear[np.newaxis, ...])  # Add channel dimension
	        # Save the image
	        image.save(output_image)
	        print(f"Mel-spectrogram image saved using custom method as '{output_image}'")

	    else:
	        raise ValueError("Invalid method. Choose 'matplotlib' or 'custom'.")


	# Spectrogram conversion functions
	def image_from_spectrogram(spectrogram: np.ndarray, power: float = 0.25) -> Image.Image:
	    """
	    Compute a spectrogram image from a spectrogram magnitude array.

	    Args:
	        spectrogram: (channels, frequency, time)
	        power: A power curve to apply to the spectrogram to preserve contrast

	    Returns:
	        image: (frequency, time, channels)
	    """
	    # Rescale to 0-1
	    max_value = np.max(spectrogram)
	    data = spectrogram / max_value

	    # Apply the power curve
	    data = np.power(data, power)

	    # Rescale to 0-255 and invert
	    data = 255 - (data * 255).astype(np.uint8)

	    # Convert to a PIL image
	    if data.shape[0] == 1:
	        image = Image.fromarray(data[0], mode="L").convert("RGB")
	    elif data.shape[0] == 2:
	        data = np.array([np.zeros_like(data[0]), data[0], data[1]]).transpose(1, 2, 0)
	        image = Image.fromarray(data, mode="RGB")
	    else:
	        raise NotImplementedError(f"Unsupported number of channels: {data.shape[0]}")

	    # Flip Y
	    image = image.transpose(Image.FLIP_TOP_BOTTOM)
	    return image


	# Step 3: Extract Mel-Spectrogram from Image (Direct Pixel Manipulation)
	def extract_mel_spectrogram_from_image(image_path):
	    """
	    Extract a mel-spectrogram from a saved image using pixel manipulation.

	    Parameters:
	    - image_path: Path to the spectrogram image file.

	    Returns:
	    - mel_spectrogram_db: The extracted mel-spectrogram in dB scale.
	    """
	    img = Image.open(image_path).convert('L')  # Open image and convert to grayscale
	    img_array = np.array(img)  # Convert to NumPy array
	    mel_spectrogram_db = img_array / 255.0 * -80  # Scale to dB range (pixel 0 -> 0 dB, 255 -> -80 dB)
	    return mel_spectrogram_db

	# Alternative Spectrogram Extraction (IFFT Method)
	def extract_spectrogram_with_ifft(mel_spectrogram_db, sample_rate=22050):
	    """
	    Reconstructs an audio signal directly from a mel-spectrogram.

	    Parameters:
	    - mel_spectrogram_db: The mel-spectrogram in dB scale.
	    - sample_rate: Sample rate the spectrogram was computed at (librosa's default is 22050).

	    Returns:
	    - audio: The reconstructed audio signal (a waveform, not a spectrogram).
	    """
	    # Convert dB mel-spectrogram back to linear (power) scale
	    mel_spectrogram = librosa.db_to_power(mel_spectrogram_db)

	    # mel_to_audio approximately inverts the mel filterbank and then estimates
	    # the missing phase with Griffin-Lim before the inverse STFT.
	    audio = librosa.feature.inverse.mel_to_audio(mel_spectrogram, sr=sample_rate)

	    return audio

	# Step 4: Decode Mel-Spectrogram with Griffin-Lim
	def decode_mel_spectrogram_to_audio(mel_spectrogram_db, sample_rate, output_audio='griffin_reconstructed_audio.wav'):
	    """
	    Decode a mel-spectrogram into audio using Griffin-Lim algorithm.

	    Parameters:
	    - mel_spectrogram_db: The mel-spectrogram in dB scale.
	    - sample_rate: The sample rate for the audio file.
	    - output_audio: Path to save the reconstructed audio file.
	    """
	    # Convert dB mel-spectrogram back to linear (power) scale
	    mel_spectrogram = librosa.db_to_power(mel_spectrogram_db)
	    # librosa.griffinlim expects a linear-frequency STFT magnitude, not a mel-spectrogram,
	    # so use mel_to_audio, which inverts the mel filterbank and then runs Griffin-Lim.
	    audio = librosa.feature.inverse.mel_to_audio(mel_spectrogram, sr=sample_rate)
	    # Save the generated audio
	    sf.write(output_audio, audio, sample_rate)
	    print(f"Griffin-Lim reconstructed audio saved as '{output_audio}'")
	    return audio

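	# Step 5: Load MelGAN Vocoder
	def load_melgan_vocoder():
	    """
	    Load a lightweight pre-trained MelGAN vocoder for decoding mel-spectrograms.
	    Returns a torch MelGAN vocoder model.

	    Note: torchaudio.models does not appear to provide a MelGAN class, so this
	    raises a clear error until a vocoder from another source is plugged in here.
	    """
	    melgan_cls = getattr(torchaudio.models, "MelGAN", None)
	    if melgan_cls is None:
	        raise NotImplementedError(
	            "torchaudio.models has no MelGAN class; load a pre-trained MelGAN "
	            "vocoder from another source and return it from this function."
	        )
	    model = melgan_cls()  # Load MelGAN model
	    model.eval()  # Ensure the model is in evaluation mode
	    return model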

	# Step 6: Decode Mel-Spectrogram with MelGAN
	def decode_mel_spectrogram_with_melgan(mel_spectrogram_db, sample_rate, output_audio='melgan_reconstructed_audio.wav'):
	    """
	    Decode a mel-spectrogram into audio using MelGAN vocoder.

	    Parameters:
	    - mel_spectrogram_db: The mel-spectrogram in dB scale.
	    - sample_rate: The sample rate for the audio file.
	    - output_audio: Path to save the reconstructed audio file.

	    Returns:
	    - audio: The reconstructed audio signal.
	    """
	    # Convert dB mel-spectrogram back to linear scale
	    mel_spectrogram = librosa.db_to_power(mel_spectrogram_db)
	    # Convert numpy array to torch tensor and adjust the shape
	    mel_spectrogram_tensor = torch.tensor(mel_spectrogram).unsqueeze(0)  # Shape: [1, mel_bins, time_frames]

	    # Load the MelGAN vocoder model
	    melgan = load_melgan_vocoder()

	    # Pass the mel-spectrogram through MelGAN to generate audio
	    with torch.no_grad():
	        audio = melgan(mel_spectrogram_tensor).squeeze().numpy()  # Squeeze to remove batch dimension

	    # Save the generated audio
	    sf.write(output_audio, audio, sample_rate)
	    print(f"MelGAN reconstructed audio saved as '{output_audio}'")
	    return audio


	def audio_from_waveform(samples: np.ndarray, sample_rate: int, normalize: bool = False) -> pydub.AudioSegment:
	    """
	    Convert a numpy array of samples of a waveform to an audio segment.

	    Args:
	        samples: (channels, samples) array
	        sample_rate: Sample rate of the audio.
	        normalize: Flag to normalize volume.

	    Returns:
	        pydub.AudioSegment
	    """
	    # Normalize volume to fit in int16
	    if normalize:
	        samples *= np.iinfo(np.int16).max / np.max(np.abs(samples))

	    # Transpose and convert to int16
	    samples = samples.transpose(1, 0).astype(np.int16)

	    # Write to the bytes of a WAV file (requires io and scipy.io.wavfile)
	    wav_bytes = io.BytesIO()
	    wavfile.write(wav_bytes, sample_rate, samples)
	    wav_bytes.seek(0)

	    # Read into pydub
	    return pydub.AudioSegment.from_wav(wav_bytes)


	def apply_filters(segment: pydub.AudioSegment, compression: bool = False) -> pydub.AudioSegment:
	    """
	    Apply post-processing filters to the audio segment to compress it and keep at a -10 dBFS level.

	    Args:
	        segment: The audio segment to filter.
	        compression: Flag to apply dynamic range compression.

	    Returns:
	        pydub.AudioSegment
	    """
	    if compression:
	        segment = pydub.effects.normalize(segment, headroom=0.1)
	        segment = segment.apply_gain(-10 - segment.dBFS)
	        segment = pydub.effects.compress_dynamic_range(
	            segment,
	            threshold=-20.0,
	            ratio=4.0,
	            attack=5.0,
	            release=50.0,
	        )

	    # Apply gain to desired dB level and normalize again
	    desired_db = -12
	    segment = segment.apply_gain(desired_db - segment.dBFS)
	    return pydub.effects.normalize(segment, headroom=0.1)


	def stitch_segments(segments: Sequence[pydub.AudioSegment], crossfade_s: float) -> pydub.AudioSegment:
	    """
	    Stitch together a sequence of audio segments with a crossfade between each segment.

	    Args:
	        segments: Sequence of audio segments to stitch.
	        crossfade_s: Duration of crossfade in seconds.

	    Returns:
	        pydub.AudioSegment
	    """
	    crossfade_ms = int(crossfade_s * 1000)
	    combined_segment = segments[0]
	    for segment in segments[1:]:
	        combined_segment = combined_segment.append(segment, crossfade=crossfade_ms)
	    return combined_segment


	def overlay_segments(segments: Sequence[pydub.AudioSegment]) -> pydub.AudioSegment:
	    """
	    Overlay a sequence of audio segments on top of each other.

	    Args:
	        segments: Sequence of audio segments to overlay.

	    Returns:
	        pydub.AudioSegment
	    """
	    assert len(segments) > 0
	    output: pydub.AudioSegment = segments[0]
	    for segment in segments[1:]:
	        output = output.overlay(segment)
	    return output



	# Step 7: Full Pipeline for Audio Processing with Customization
	def mel_spectrogram_pipeline(audio_file, output_image='mel_spectrogram.png',
	                             output_audio_griffin='griffin_reconstructed_audio.wav',
	                             output_audio_melgan='melgan_reconstructed_audio.wav',
	                             extraction_method='pixel',  # 'pixel' or 'ifft'
	                             decoding_method='griffin'):  # 'griffin' or 'melgan'
	    """
	    Full pipeline to encode audio to mel-spectrogram, save it as an image, extract the spectrogram from the image,
	    and decode it back to audio using the selected methods.

	    Parameters:
	    - audio_file: Path to the audio file to be processed.
	    - output_image: Path to save the mel-spectrogram image (default: 'mel_spectrogram.png').
	    - output_audio_griffin: Path to save the Griffin-Lim reconstructed audio.
	    - output_audio_melgan: Path to save the MelGAN reconstructed audio.
	    - extraction_method: Method for extraction ('pixel' or 'ifft').
	    - decoding_method: Method for decoding ('griffin' or 'melgan'); ignored when extraction_method is 'ifft'.
	    """
	    # Step 1: Encode (Audio -> Mel-Spectrogram)
	    mel_spectrogram_db, sample_rate = encode_audio_to_mel_spectrogram(audio_file)

	    # Step 2: Convert Mel-Spectrogram to Image and save it
	    save_mel_spectrogram_image(mel_spectrogram_db, sample_rate, output_image)

	    # Step 3: Extract Mel-Spectrogram from the image based on chosen method
	    if extraction_method == 'pixel':
	        extracted_mel_spectrogram_db = extract_mel_spectrogram_from_image(output_image)
	    elif extraction_method == 'ifft':
	        # extract_spectrogram_with_ifft returns a waveform rather than a spectrogram,
	        # so save it directly and skip the decoding step below.
	        audio = extract_spectrogram_with_ifft(mel_spectrogram_db)
	        sf.write(output_audio_griffin, audio, sample_rate)
	        return
	    else:
	        raise ValueError("Invalid extraction method. Choose 'pixel' or 'ifft'.")

	    # Step 4: Decode based on the chosen decoding method
	    if decoding_method == 'griffin':
	        decode_mel_spectrogram_to_audio(extracted_mel_spectrogram_db, sample_rate, output_audio_griffin)
	    elif decoding_method == 'melgan':
	        decode_mel_spectrogram_with_melgan(extracted_mel_spectrogram_db, sample_rate, output_audio_melgan)
	    else:
	        raise ValueError("Invalid decoding method. Choose 'griffin' or 'melgan'.")

	# Example usage
	if __name__ == "__main__":
	    audio_file_path = 'your_audio_file.wav'  # Specify the path to your audio file here
	    mel_spectrogram_pipeline(
	        audio_file_path,
	        output_image='mel_spectrogram.png',
	        output_audio_griffin='griffin_reconstructed_audio.wav',
	        output_audio_melgan='melgan_reconstructed_audio.wav',
	        extraction_method='pixel',  # Choose 'pixel' or 'ifft'
	        decoding_method='griffin'  # Choose 'griffin' or 'melgan'
	    )




	```
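The pixel-based extraction step maps grayscale values back to decibels with `pixel / 255 * -80`. A rough sketch of that mapping and its inverse shows how much precision an 8-bit image can keep. It assumes a raw grayscale image whose pixels encode dB linearly over an 80 dB range; a matplotlib figure with axes, a colorbar, and a colormap does not satisfy that assumption. `DB_FLOOR`, `db_to_pixel`, and `pixel_to_db` are hypothetical names.

```python
DB_FLOOR = -80.0  # assumed dynamic range, matching the -80 factor in the extraction step


def db_to_pixel(db: float) -> int:
    """Map a dB value in [DB_FLOOR, 0] to an 8-bit pixel (0 dB -> 0, DB_FLOOR -> 255)."""
    clipped = min(0.0, max(DB_FLOOR, db))
    return round(clipped / DB_FLOOR * 255)


def pixel_to_db(pixel: int) -> float:
    """Inverse mapping, using the same formula as the pixel extraction step."""
    return pixel / 255.0 * DB_FLOOR


values_db = [0.0, -12.3, -40.0, -79.9, -95.0]  # the last value is below the floor and gets clipped
recovered = [pixel_to_db(db_to_pixel(v)) for v in values_db]

# Compare against the clipped inputs: quantisation error is at most half a pixel step.
max_error = max(
    abs(min(0.0, max(DB_FLOOR, v)) - r) for v, r in zip(values_db, recovered)
)
print(recovered, max_error)
```

Half a pixel step is `80 / 255 / 2`, or about 0.16 dB, so quantisation alone costs little; values quieter than the floor are lost entirely.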


	ADDING EXTRA HEADS :


	# ADD HEAD

	```

	SPEECH-ENCODER-DECODER-MODEL
	```

	```python
	from transformers import AutoFeatureExtractor, AutoTokenizer, SpeechEncoderDecoderModel

	# LM_MODEL is assumed to be the already-loaded language model.
	print('Add Audio...')
	# Add Head
	# Combine pre-trained encoder and pre-trained decoder to form a Seq2Seq model
	_AudioFeatureExtractor = AutoFeatureExtractor.from_pretrained("openai/whisper-small")
	_AudioTokenizer = AutoTokenizer.from_pretrained("openai/whisper-small")
	_SpeechEncoderDecoder = SpeechEncoderDecoderModel.from_encoder_decoder_pretrained("openai/whisper-small", "openai/whisper-small")

	# Add Pad tokens
	_SpeechEncoderDecoder.config.decoder_start_token_id = _AudioTokenizer.cls_token_id
	_SpeechEncoderDecoder.config.pad_token_id = _AudioTokenizer.pad_token_id
	LM_MODEL.SpeechEncoderDecoder = _SpeechEncoderDecoder
	# Add Sub Components
	LM_MODEL.Decoder_AudioTokenizer = _AudioTokenizer
	LM_MODEL.Encoder_AudioFeatureExtractor = _AudioFeatureExtractor
	LM_MODEL
	```

	```python
	from transformers import VisionEncoderDecoderModel

	# LM_MODEL is assumed to be the already-loaded language model.
	print('Add Vision...')

	# ADD HEAD
	# Combine pre-trained encoder and pre-trained decoder to form a Seq2Seq model
	Vmodel = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
	    "google/vit-base-patch16-224-in21k", "LeroyDyer/Mixtral_AI_Tiny"
	)
	_Encoder_ImageProcessor = Vmodel.encoder
	_Decoder_ImageTokenizer = Vmodel.decoder
	_VisionEncoderDecoderModel = Vmodel
	# Attach the combined model
	LM_MODEL.VisionEncoderDecoder = _VisionEncoderDecoderModel
	# Add Sub Components
	LM_MODEL.Encoder_ImageProcessor = _Encoder_ImageProcessor
	LM_MODEL.Decoder_ImageTokenizer = _Decoder_ImageTokenizer
	LM_MODEL
	```




	# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
	Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_LeroyDyer__SpydazWebAI_Human_AGI)

	| Metric              | Value |
	|---------------------|------:|
	| Avg.                |  9.88 |
	| IFEval (0-Shot)     | 33.88 |
	| BBH (3-Shot)        |  7.45 |
	| MATH Lvl 5 (4-Shot) |  0.91 |
	| GPQA (0-shot)       |  4.36 |
	| MuSR (0-shot)       |  7.38 |
	| MMLU-PRO (5-shot)   |  5.32 |
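The `Avg.` row is consistent with a plain arithmetic mean of the six benchmark scores. The check below assumes that is how the leaderboard computes it, and the variable names are illustrative only.

```python
# Scores copied from the table above.
scores = {
    "IFEval (0-Shot)": 33.88,
    "BBH (3-Shot)": 7.45,
    "MATH Lvl 5 (4-Shot)": 0.91,
    "GPQA (0-shot)": 4.36,
    "MuSR (0-shot)": 7.38,
    "MMLU-PRO (5-shot)": 5.32,
}

average = sum(scores.values()) / len(scores)
print(f"Avg. {average:.2f}")
```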