---
title: FAKE NEWS DETECTION
emoji: 🚀
colorFrom: '#FF69B4'
colorTo: '#FF1493'
sdk: gradio
sdk_version: 5.8.0
app_file: application.py
pinned: false
---

# [Text] SimLLM: Detecting Sentences Generated by Large Language Models Using Similarity between the Generation and its Re-Generation

## Getting Started

1. Clone the repository:

       git clone https://github.com/Tokyo-Techies/prj-nict-ai-content-detection

2. Set up a virtual environment:

       python -m venv .venv
       source .venv/bin/activate

3. Install dependencies:

       pip install -r requirements.txt

4. Set up API keys (optional).

5. Run the project (text only):

       python SimLLM.py

## Parameters

- `LLMs`: list of large language models to use. Available models: `ChatGPT`, `Yi`, `OpenChat`, `Gemini`, `LLaMa`, `Phi`, `Mixtral`, `QWen`, `OLMO`, `WizardLM`, and `Vicuna`. Default: `['ChatGPT', 'Yi', 'OpenChat']`.
- `train_indexes`: list of LLM indexes used for training. Default: `[0, 1, 2]`.
- `test_indexes`: list of LLM indexes used for testing. Default: `[0]`.
- `num_samples`: number of samples. Default: `5000`.
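The command-line interface these parameters describe can be sketched with `argparse`. This is a minimal illustration, not the actual `SimLLM.py` implementation: the flag names and defaults come from the documentation above, while the parser structure itself is an assumption.

```python
import argparse

# Model names listed in the documentation above.
AVAILABLE_LLMS = ['ChatGPT', 'Yi', 'OpenChat', 'Gemini', 'LLaMa', 'Phi',
                  'Mixtral', 'QWen', 'OLMO', 'WizardLM', 'Vicuna']

def build_parser():
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(description='SimLLM (illustrative CLI sketch)')
    parser.add_argument('--LLMs', nargs='+', choices=AVAILABLE_LLMS,
                        default=['ChatGPT', 'Yi', 'OpenChat'],
                        help='Large language models to use')
    parser.add_argument('--train_indexes', nargs='+', type=int, default=[0, 1, 2],
                        help='LLM indexes used for training')
    parser.add_argument('--test_indexes', nargs='+', type=int, default=[0],
                        help='LLM indexes used for testing')
    parser.add_argument('--num_samples', type=int, default=5000,
                        help='Number of samples')
    return parser

# Example: the customized invocation from the Examples section below.
args = build_parser().parse_args(['--LLMs', 'ChatGPT',
                                  '--train_indexes', '0',
                                  '--test_indexes', '0'])
```

Note that `train_indexes` and `test_indexes` index into the `LLMs` list, so `--LLMs ChatGPT --train_indexes 0` trains on ChatGPT-generated text only.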

## Examples

- Running with default parameters:

      python SimLLM.py

- Running with customized parameters:

      python SimLLM.py --LLMs ChatGPT --train_indexes 0 --test_indexes 0

## Dataset

The `dataset.csv` file contains both human-written texts and texts generated by 12 large language models: ChatGPT, GPT-4o, Yi, OpenChat, Gemini, LLaMa, Phi, Mixtral, QWen, OLMO, WizardLM, and Vicuna.
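A dataset of this shape can be read with Python's standard `csv` module. The miniature CSV below is purely illustrative: the real `dataset.csv` may use different column names and layout, so treat the `text`/`source` schema here as an assumption.

```python
import csv
import io

# Hypothetical two-row miniature of dataset.csv; the actual file's
# columns may differ. 'source' marks who produced the sentence.
SAMPLE = '''text,source
"The mayor announced a new transit plan.",human
"The mayor unveiled a new plan for transit.",ChatGPT
'''

rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# Split human-written rows from model-generated ones.
human_rows = [r for r in rows if r["source"] == "human"]
generated_rows = [r for r in rows if r["source"] != "human"]
```

For the real file, replace `io.StringIO(SAMPLE)` with `open("dataset.csv", newline="")` and adjust the column names to match.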

## Citation

    @inproceedings{nguyen2024SimLLM,
      title={SimLLM: Detecting Sentences Generated by Large Language Models Using Similarity between the Generation and its Re-generation},
      author={Nguyen-Son, Hoang-Quoc and Dao, Minh-Son and Zettsu, Koji},
      booktitle={The Conference on Empirical Methods in Natural Language Processing},
      year={2024}
    }

## Acknowledgements