BibTeX classification using RoBERTa

Model Description

This model is a text classifier that predicts whether a given context paper is likely to be cited by a query paper. It takes the concatenated titles of the context and query papers as input and outputs a binary label: 1 indicates a potential citation relationship (the query paper may cite the context paper, though it need not), and 0 indicates no such relationship.

Intended Use

  • Primary Use: To extract a subset of BibTeX entries from the ACL Anthology so that the resulting BibTeX is under 50 MB.
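
The filtering step could be sketched roughly as follows. This is a minimal illustration, not the author's pipeline: the entry layout (dicts with "title" and "bibtex" keys), the `select_entries` helper, the stop-when-full budget policy, and the example titles are all assumptions made for the sketch. The `is_relevant` argument stands in for any predicate, for example a thin wrapper that returns True when `predict_citation` from the How to Use section returns "include".

```python
from typing import Callable, Dict, Iterable, List

MAX_BYTES = 50 * 1024 * 1024  # assumed reading of the "< 50 MB" target


def select_entries(
    entries: Iterable[Dict[str, str]],
    query_title: str,
    is_relevant: Callable[[str, str], bool],
    max_bytes: int = MAX_BYTES,
) -> List[Dict[str, str]]:
    """Keep entries the predicate marks as relevant, within a byte budget."""
    kept: List[Dict[str, str]] = []
    used = 0
    for entry in entries:
        # Argument order follows the model card: context title, then query title.
        if not is_relevant(entry["title"], query_title):
            continue
        size = len(entry["bibtex"].encode("utf-8"))
        if used + size > max_bytes:
            break  # hypothetical policy: stop once the budget would be exceeded
        kept.append(entry)
        used += size
    return kept


# Stand-in predicate so the sketch runs without downloading the model.
def keyword_stub(context_title: str, query_title: str) -> bool:
    return "robustness" in context_title.lower()


# Made-up entries, for illustration only.
entries = [
    {"title": "Robustness of Dependency Parsers",
     "bibtex": "@inproceedings{a, title={Robustness of Dependency Parsers}}"},
    {"title": "A Survey of Morphology",
     "bibtex": "@article{b, title={A Survey of Morphology}}"},
]
subset = select_entries(entries, "Assessing Hidden Risks of LLMs", keyword_stub)
print([e["title"] for e in subset])
```

Passing the predicate in as an argument keeps the selection logic testable without loading the model.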

Model Training

  • Data Description: The model was trained on an ACL Anthology dataset, cestwc/anthology, comprising pairs of paper titles. Each pair was annotated to indicate whether the context paper could potentially be cited by the query paper.

Performance

  • Metrics: Not yet reported (accuracy, precision, recall, F1-score, etc.).
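
For reference, the metrics named above could be computed from binary labels and predictions along these lines. This is a sketch under stated assumptions: `binary_metrics` is a hypothetical helper, and the labels below are toy values made up for illustration, not results for this model.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for the positive class (label 1)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    total = len(pairs)
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels, made up for illustration only; not results for this model.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Here label 1 is treated as the positive class, matching the model's convention that 1 marks a potential citation relationship.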

How to Use

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "cestwc/roberta-base-bib"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # inference mode: disables dropout

def predict_citation(context_title, query_title):
    # Join the two titles with RoBERTa's separator token, context title first.
    text = f"{context_title} </s> {query_title}"
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():  # gradients are not needed for prediction
        outputs = model(**inputs)
    prediction = outputs.logits.argmax(-1).item()
    # Label 1 means the context paper may be cited by the query paper.
    return "include" if prediction == 1 else "not include"

# Example
context_title = "Evaluating and Enhancing the Robustness of Neural Network-based Dependency Parsing Models with Adversarial Examples"
query_title = "Assessing Hidden Risks of LLMs: An Empirical Study on Robustness, Consistency, and Credibility"
print(predict_citation(context_title, query_title))