|
--- |
|
license: apache-2.0 |
|
datasets: |
|
- dmitva/human_ai_generated_text |
|
--- |
|
|
|
## 0xnu/AGTD-v0.1 |
|
|
|
The `0xnu/AGTD-v0.1` model distinguishes human-written text from text generated by Artificial Intelligence (AI). Built on a hybrid deep learning architecture, it delivers accurate and efficient text analysis and classification. The full methodology is detailed in the accompanying study, available [here](https://arxiv.org/abs/2311.15565).
|
|
|
### Evaluation Metrics
|
|
|
```text
Precision: 0.6269
Recall:    1.0000
F1-score:  0.7707
Accuracy:  0.7028

Confusion Matrix (rows: true class, columns: predicted class; order: human, AI):
[[197 288]
 [  0 484]]
```
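As a quick sanity check, the four reported scores follow directly from the confusion matrix. The sketch below assumes the common scikit-learn layout (rows are true labels, columns are predictions) with AI-generated text as the positive class, which is the only reading consistent with all four numbers:

```python
# Confusion matrix as reported above, assuming rows = true class,
# columns = predicted class, with "AI-generated" as the positive class:
# [[TN FP]
#  [FN TP]]
tn, fp = 197, 288
fn, tp = 0, 484

precision = tp / (tp + fp)                  # 484 / 772
recall = tp / (tp + fn)                     # 484 / 484
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tn + tp) / (tn + fp + fn + tp)  # 681 / 969

print(f"Precision: {precision:.4f}")  # 0.6269
print(f"Recall: {recall:.4f}")        # 1.0000
print(f"F1-score: {f1:.4f}")          # 0.7707
print(f"Accuracy: {accuracy:.4f}")    # 0.7028
```

In practical terms, a recall of 1.0000 with a precision of 0.6269 means that on this test split no AI-generated sample was missed, but 288 human-written samples were flagged as AI.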
|
|
|
 |
|
|
|
### Run the model |
|
|
|
```python
import os
os.environ["KERAS_BACKEND"] = "tensorflow"

import pickle

import keras
import tensorflow as tf
from huggingface_hub import hf_hub_download

# Hugging Face repository details
REPO_ID = "0xnu/AGTD-v0.1"
MODEL_FILENAME = "human_ai_text_classification_model.keras"
TOKENIZER_FILENAME = "tokenizer.pkl"

# Download the model and tokenizer
model_path = hf_hub_download(repo_id=REPO_ID, filename=MODEL_FILENAME)
tokenizer_path = hf_hub_download(repo_id=REPO_ID, filename=TOKENIZER_FILENAME)

# Load the model
model = keras.models.load_model(model_path)

# Load the tokenizer
with open(tokenizer_path, 'rb') as tokenizer_file:
    tokenizer = pickle.load(tokenizer_file)

# Input text
text = "This model trains on a diverse dataset and serves functions in applications requiring a mechanism for distinguishing between human and AI-generated text."

# Parameters (must match the values used during training)
MAX_LENGTH = 100000

# Tokenisation function: convert text to a padded integer sequence
def tokenize_text(text, tokenizer, max_length):
    sequences = tokenizer.texts_to_sequences([text])
    padded_sequence = tf.keras.preprocessing.sequence.pad_sequences(
        sequences, maxlen=max_length, padding='post', truncating='post'
    )
    return padded_sequence

# Prediction function: return the model's raw sigmoid score
def predict_text(text, model, tokenizer, max_length):
    processed_text = tokenize_text(text, tokenizer, max_length)
    prediction = model.predict(processed_text)[0][0]
    return prediction

# Make prediction
prediction = predict_text(text, model, tokenizer, MAX_LENGTH)

# Interpret results
if prediction >= 0.5:
    print(f"The text is likely AI-generated (confidence: {prediction:.2f})")
else:
    print(f"The text is likely human-written (confidence: {1 - prediction:.2f})")

print(f"Raw prediction value: {prediction}")
```
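The 0.5 decision threshold in the script above is adjustable. Given the evaluation numbers (perfect recall, lower precision), raising the threshold is one way to trade fewer false AI flags for the risk of missing some AI text. The helper below is an illustrative sketch, not part of the repository; `label_prediction` is a hypothetical name:

```python
def label_prediction(score, threshold=0.5):
    """Map the model's raw sigmoid output to a label and a confidence value.

    score: model output in [0, 1]; threshold: cut-off for the AI class.
    """
    if score >= threshold:
        return "AI-generated", score
    return "human-written", 1.0 - score

# Example: the same raw score under two thresholds.
print(label_prediction(0.75))                 # ('AI-generated', 0.75)
print(label_prediction(0.75, threshold=0.8))  # ('human-written', 0.25)
```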
|
|
|
### Citation |
|
|
|
```tex |
|
@misc{agtd2024, |
|
author = {Oketunji, A.F.}, |
|
title = {Evaluating the Efficacy of Hybrid Deep Learning Models in Distinguishing AI-Generated Text}, |
|
year = 2023, |
|
version = {v3}, |
|
publisher = {arXiv}, |
|
  doi = {10.48550/arXiv.2311.15565},
|
url = {https://arxiv.org/abs/2311.15565} |
|
} |
|
``` |
|
|
|
### Copyright |
|
|
|
(c) 2024 [Finbarrs Oketunji](https://finbarrs.eu). All Rights Reserved. |
|
|