---
title: FAKE NEWS DETECTION
emoji: π
colorFrom: '#FF69B4'
colorTo: '#FF1493'
sdk: gradio
sdk_version: 5.8.0
app_file: application.py
pinned: false
---
[Text] SimLLM: Detecting Sentences Generated by Large Language Models Using Similarity between the Generation and its Re-Generation
Getting Started
- Clone the repository:
git clone https://github.com/Tokyo-Techies/prj-nict-ai-content-detection
- Set up the environment using a virtual environment:
python -m venv .venv
source .venv/bin/activate
- Install dependencies:
- Torch: follow the installation instructions at https://pytorch.org/get-started/locally/
- Other packages:
pip install -r requirements.txt
API Keys (optional)
- Obtain API keys for the corresponding models and insert them into the SimLLM.py file:
- ChatGPT: OpenAI API
- Gemini: Google Gemini API
- Other LLMs: Together API
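The exact variable names inside SimLLM.py are not listed here, so the following is only a hypothetical sketch of how the keys could be wired up, reading them from environment variables rather than hard-coding them:

```python
# Hypothetical sketch only: the real variable names in SimLLM.py may differ.
import os

# Reading keys from the environment avoids committing secrets to the repository.
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY", "")      # ChatGPT (OpenAI API)
GEMINI_API_KEY = os.getenv("GEMINI_API_KEY", "")      # Gemini (Google Gemini API)
TOGETHER_API_KEY = os.getenv("TOGETHER_API_KEY", "")  # other LLMs (Together API)
```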
Run the project:
- Text only:
python SimLLM.py
Parameters
- LLMs: List of large language models to use. Available models include 'ChatGPT', 'Yi', 'OpenChat', 'Gemini', 'LLaMa', 'Phi', 'Mixtral', 'QWen', 'OLMO', 'WizardLM', and 'Vicuna'. Default is ['ChatGPT', 'Yi', 'OpenChat'].
- train_indexes: List of LLM indexes for training. Default is [0, 1, 2].
- test_indexes: List of LLM indexes for testing. Default is [0].
- num_samples: Number of samples. Default is 5000.
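For reference, here is a minimal sketch, not the repository's actual code, of how these parameters could be declared with argparse, assuming the list-valued arguments take space-separated values:

```python
# Hedged sketch of the command-line interface described above;
# the actual parser in SimLLM.py may differ.
import argparse

parser = argparse.ArgumentParser(description="SimLLM parameters (sketch)")
parser.add_argument("--LLMs", nargs="+", default=["ChatGPT", "Yi", "OpenChat"],
                    help="Large language models to use")
parser.add_argument("--train_indexes", nargs="+", type=int, default=[0, 1, 2],
                    help="LLM indexes used for training")
parser.add_argument("--test_indexes", nargs="+", type=int, default=[0],
                    help="LLM indexes used for testing")
parser.add_argument("--num_samples", type=int, default=5000,
                    help="Number of samples to use")
args = parser.parse_args()
print(args)
```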
Examples
Running with default parameters:
python SimLLM.py
Running with customized parameters:
python SimLLM.py --LLMs ChatGPT --train_indexes 0 --test_indexes 0
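Assuming the list-valued arguments accept space-separated values (as in the sketch above), running with several models at once might look like:
python SimLLM.py --LLMs ChatGPT Yi OpenChat --train_indexes 0 1 2 --test_indexes 0 --num_samples 5000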
Dataset
The dataset.csv file contains human-written texts alongside texts generated by 12 large language models:
ChatGPT, GPT-4o, Yi, OpenChat, Gemini, LLaMa, Phi, Mixtral, QWen, OLMO, WizardLM, and Vicuna.
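The column layout of dataset.csv is not documented here, so as a quick way to inspect the data, the sketch below (assuming pandas is installed) only prints what the file actually contains:

```python
# Minimal inspection sketch for dataset.csv.
import pandas as pd

df = pd.read_csv("dataset.csv")
print(df.shape)             # number of rows and columns
print(df.columns.tolist())  # actual column names in the file
print(df.head())            # first few rows
```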
Citation
@inproceedings{nguyen2024SimLLM,
  title={SimLLM: Detecting Sentences Generated by Large Language Models Using Similarity between the Generation and its Re-generation},
  author={Nguyen-Son, Hoang-Quoc and Dao, Minh-Son and Zettsu, Koji},
  booktitle={The Conference on Empirical Methods in Natural Language Processing},
  year={2024}
}
Acknowledgements
- BARTScore: BARTScore GitHub Repository