---
library_name: transformers
license: apache-2.0
base_model: answerdotai/ModernBERT-base
tags:
- generated_from_trainer
model-index:
- name: bin
results: []
---
# modernbert-content
This model is a fine-tuned version of [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base), trained on a corpus of expert-graded summaries (see the Corpus section below).
It achieves the following results on the evaluation set:
- Loss: 0.1729
- MSE: 0.1729
## Model description
This is a ModernBERT model with a regression head designed to predict the Content score of a summary.
The input is the summary and its source text joined by the tokenizer's `[SEP]` token (`summary + [SEP] + source`), as in the example below.
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# num_labels=1 gives a single-output regression head
model = AutoModelForSequenceClassification.from_pretrained(
    "wesleymorris/modernbert-content", num_labels=1
)
tokenizer = AutoTokenizer.from_pretrained("wesleymorris/modernbert-content")

def get_score(summary: str, source: str) -> float:
    # The model expects the summary and the source joined by the [SEP] token
    text = summary + tokenizer.sep_token + source
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        return float(model(**inputs).logits[0])
```
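For example (the texts here are placeholders, not real corpus items):

```python
score = get_score(
    summary="The article explains how bees communicate through dance.",
    source="Full text of the source passage goes here...",
)
print(f"Content score: {score:.2f}")  # 0 = training-set mean, 1 = +1 SD
```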
### Corpus
It was trained on a corpus of 4,233 summaries of 101 sources compiled by Botarleanu et al. (2022).
The summaries were graded by expert raters on six criteria: Details, Main Point, Cohesion, Paraphrasing, Objective Language, and Language Beyond the Text.
A principal component analysis was used to reduce the dimensionality of these outcome variables to two composite scores.
The Content score combines Details, Main Point, Paraphrasing, and Cohesion.
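A minimal sketch of that dimensionality reduction, assuming the six expert ratings form the columns of a matrix (the data below are random placeholders, not the actual corpus):

```python
import numpy as np
from sklearn.decomposition import PCA

# Illustrative stand-in for the 4,233 x 6 matrix of expert ratings
rng = np.random.default_rng(0)
ratings = rng.normal(size=(4233, 6))

# Project the six criteria onto two composite scores,
# one of which is the Content score predicted by this model
composites = PCA(n_components=2).fit_transform(ratings)
print(composites.shape)  # (4233, 2)
```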
### Contact
This model was developed by the LEAR Lab at Vanderbilt University. For questions or comments about this model, please contact [email protected].
## Intended uses & limitations
This model can be used to predict human Content scores for a summary of a source text.
Scores are normalized (z-scored) against the training data: 0 corresponds to the training-set mean, and each unit corresponds to one standard deviation from that mean.
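Under a normality assumption, a z-scored prediction can be read as an approximate percentile; a minimal sketch (this helper is illustrative, not part of the model):

```python
from statistics import NormalDist

def score_to_percentile(z: float) -> float:
    # Approximate percentile of a z-scored Content prediction,
    # assuming scores are roughly normally distributed (an assumption)
    return NormalDist().cdf(z) * 100

print(score_to_percentile(1.0))  # ~84.1: one SD above the training mean
```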
## Training and evaluation data
Before the fine-tuning step, the model was pretrained on a very large synthetic dataset.
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 16
- seed: 42
- optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 100
- num_epochs: 10
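A minimal sketch reconstructing these settings with `TrainingArguments` (the `output_dir` and any data handling are assumptions, not taken from the original training script):

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the reported hyperparameters;
# output_dir is a placeholder, not the original training path
training_args = TrainingArguments(
    output_dir="bin",
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=42,
    optim="adamw_torch",          # AdamW with betas=(0.9, 0.999), eps=1e-8
    lr_scheduler_type="linear",
    warmup_steps=100,
    num_train_epochs=10,
)
```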
### Training results
| Training Loss | Epoch | Step | Validation Loss | MSE    |
|:-------------:|:-----:|:----:|:---------------:|:------:|
| No log | 1.0 | 411 | 0.3181 | 0.3181 |
| 0.5319 | 2.0 | 822 | 0.2884 | 0.2884 |
| 0.2343 | 3.0 | 1233 | 0.2395 | 0.2395 |
| 0.1366 | 4.0 | 1644 | 0.1885 | 0.1885 |
| 0.0688 | 5.0 | 2055 | 0.1896 | 0.1896 |
| 0.0688 | 6.0 | 2466 | 0.1854 | 0.1854 |
| 0.0417 | 7.0 | 2877 | 0.1738 | 0.1738 |
| 0.0201 | 8.0 | 3288 | 0.1759 | 0.1759 |
| 0.0086 | 9.0 | 3699 | 0.1800 | 0.1800 |
| 0.0037 | 10.0 | 4110 | 0.1729 | 0.1729 |
### Framework versions
- Transformers 4.48.3
- Pytorch 2.6.0+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0