--- |
|
tags: |
|
- bertopic |
|
library_name: bertopic |
|
pipeline_tag: text-classification |
|
--- |
|
|
|
# transformers_issues_topics |
|
|
|
This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model. |
|
BERTopic is a flexible and modular topic modeling framework for generating easily interpretable topics from large datasets.
|
|
|
## Usage |
|
|
|
To use this model, please install BERTopic: |
|
|
|
```
pip install -U bertopic
```
|
|
|
You can use the model as follows: |
|
|
|
```python
from bertopic import BERTopic

# Load the pretrained topic model from the Hugging Face Hub
topic_model = BERTopic.load("ruanwz/transformers_issues_topics")

# Overview of all topics, their sizes, and representative keywords
topic_model.get_topic_info()
```
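
The loaded model can also assign topics to new documents. A minimal sketch, assuming the embedding model stored with the checkpoint is available; the example issue titles are invented for illustration:

```python
# Example issue titles (illustrative only, not from the training data)
docs = [
    "Tokenizer returns different ids after save and reload",
    "Trainer crashes with CUDA out of memory during evaluation",
]

# Predict a topic id for each document
topics, probs = topic_model.transform(docs)

# Show the top keywords of each predicted topic
for doc, topic_id in zip(docs, topics):
    print(topic_id, topic_model.get_topic(topic_id))
```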
|
|
|
## Topic overview |
|
|
|
* Number of topics: 30 |
|
* Number of training documents: 9000 |
|
|
|
<details> |
|
<summary>Click here for an overview of all topics.</summary> |
|
|
|
| Topic ID | Topic Keywords | Topic Frequency | Label | |
|
|----------|----------------|-----------------|-------| |
|
| -1 | tensorflow - pytorch - tf - pretrained - gpu | 11 | -1_tensorflow_pytorch_tf_pretrained | |
|
| 0 | tokenizer - tokenizers - tokenize - tokenization - token | 2089 | 0_tokenizer_tokenizers_tokenize_tokenization | |
|
| 1 | gpt2 - gpt - gpt2doubleheadsmodel - gpt2lmheadmodel - distilgpt2 | 1471 | 1_gpt2_gpt_gpt2doubleheadsmodel_gpt2lmheadmodel | |
|
| 2 | ner - seq2seqtrainer - seq2seq - runseq2seqpy - valueerror | 856 | 2_ner_seq2seqtrainer_seq2seq_runseq2seqpy | |
|
| 3 | modelcard - modelcards - card - model - cards | 601 | 3_modelcard_modelcards_card_model | |
|
| 4 | trainer - trainertrain - trainers - training - evaluateduringtraining | 500 | 4_trainer_trainertrain_trainers_training | |
|
| 5 | longformer - longformers - longformerformultiplechoice - tf - longformertokenizerfast | 455 | 5_longformer_longformers_longformerformultiplechoice_tf | |
|
| 6 | typos - typo - fix - correction - fixed | 439 | 6_typos_typo_fix_correction | |
|
| 7 | albertbasev2 - albertforpretraining - albert - albertformaskedlm - xlnet | 407 | 7_albertbasev2_albertforpretraining_albert_albertformaskedlm | |
|
| 8 | summarization - summaries - summary - text - nlp | 351 | 8_summarization_summaries_summary_text | |
|
| 9 | readmemd - readmetxt - readme - modelcard - file | 333 | 9_readmemd_readmetxt_readme_modelcard | |
|
| 10 | transformerscli - transformers - transformer - transformerxl - importerror | 259 | 10_transformerscli_transformers_transformer_transformerxl | |
|
| 11 | ci - testing - tests - test - slow | 228 | 11_ci_testing_tests_test | |
|
| 12 | questionansweringpipeline - questionanswering - answering - tfalbertforquestionanswering - questionasnwering | 156 | 12_questionansweringpipeline_questionanswering_answering_tfalbertforquestionanswering | |
|
| 13 | pipeline - pipelines - pipelinespy - pipelineexception - fixpipeline | 137 | 13_pipeline_pipelines_pipelinespy_pipelineexception | |
|
| 14 | onnxonnxruntime - onnx - onnxexport - 04onnxexport - 04onnxexportipynb | 113 | 14_onnxonnxruntime_onnx_onnxexport_04onnxexport | |
|
| 15 | benchmark - benchmarks - accuracy - evaluation - metrics | 98 | 15_benchmark_benchmarks_accuracy_evaluation | |
|
| 16 | huggingfacemaster - huggingfacetokenizers297 - huggingface - huggingfaces - huggingfacetransformers | 81 | 16_huggingfacemaster_huggingfacetokenizers297_huggingface_huggingfaces | |
|
| 17 | generationbeamsearchpy - generatebeamsearch - generatebeamsearchoutputs - beamsearch - nonbeamsearch | 69 | 17_generationbeamsearchpy_generatebeamsearch_generatebeamsearchoutputs_beamsearch | |
|
| 18 | wav2vec2 - wav2vec - wav2vec20 - wav2vec2forctc - wav2vec2xlrswav2vec2 | 56 | 18_wav2vec2_wav2vec_wav2vec20_wav2vec2forctc | |
|
| 19 | flax - flaxelectraformaskedlm - flaxelectraforpretraining - flaxjax - flaxelectramodel | 53 | 19_flax_flaxelectraformaskedlm_flaxelectraforpretraining_flaxjax | |
|
| 20 | cachedir - cache - cachedpath - cached - caching | 43 | 20_cachedir_cache_cachedpath_cached | |
|
| 21 | notebook - notebooks - colab - community - t5 | 33 | 21_notebook_notebooks_colab_community | |
|
| 22 | wandbproject - wandb - sagemaker - sagemakertrainer - wandbcallback | 32 | 22_wandbproject_wandb_sagemaker_sagemakertrainer | |
|
| 23 | bigbird - py7zr - tapas - tres - v4 | 32 | 23_bigbird_py7zr_tapas_tres | |
|
| 24 | electra - electrapretrainedmodel - electraformaskedlm - electraformultiplechoice - electrafortokenclassification | 28 | 24_electra_electrapretrainedmodel_electraformaskedlm_electraformultiplechoice | |
|
| 25 | layoutlm - layout - layoutlmtokenizer - layoutlmbaseuncased - tf | 24 | 25_layoutlm_layout_layoutlmtokenizer_layoutlmbaseuncased | |
|
| 26 | isort - blackisortflake8 - github - repo - version | 18 | 26_isort_blackisortflake8_github_repo | |
|
| 27 | pplm - pr - deprecated - variable - ppl | 14 | 27_pplm_pr_deprecated_variable | |
|
| 28 | blenderbot - blenderbot3b - blenderbotforcausallm - chatbot - boto3 | 13 | 28_blenderbot_blenderbot3b_blenderbotforcausallm_chatbot | |
|
|
|
</details> |
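
To drill into a single topic from the table above, `get_topic` returns its top keywords with their c-TF-IDF scores, and `find_topics` searches topics with a free-text query. A short sketch, using topic 0 (the tokenizer topic) and an illustrative query:

```python
# Top keywords and c-TF-IDF scores for the tokenizer topic (topic 0 above)
print(topic_model.get_topic(0))

# Find the topics most similar to a free-text query (query is illustrative)
topic_ids, similarities = topic_model.find_topics("tokenizer padding bug", top_n=3)
print(topic_ids, similarities)
```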
|
|
|
## Training hyperparameters |
|
|
|
* calculate_probabilities: False |
|
* language: english |
|
* low_memory: False |
|
* min_topic_size: 10 |
|
* n_gram_range: (1, 1) |
|
* nr_topics: 30 |
|
* seed_topic_list: None |
|
* top_n_words: 10 |
|
* verbose: True |
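
These values correspond to arguments of the BERTopic constructor. A hedged sketch of an equivalent configuration for retraining, where `docs` is a placeholder corpus rather than the original training data:

```python
from bertopic import BERTopic

# Reconstruct the configuration listed above; `docs` is a placeholder corpus
topic_model = BERTopic(
    calculate_probabilities=False,
    language="english",
    low_memory=False,
    min_topic_size=10,
    n_gram_range=(1, 1),
    nr_topics=30,
    seed_topic_list=None,
    top_n_words=10,
    verbose=True,
)
# topics, probs = topic_model.fit_transform(docs)
```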
|
|
|
## Framework versions |
|
|
|
* Numpy: 1.23.5 |
|
* HDBSCAN: 0.8.33 |
|
* UMAP: 0.5.3 |
|
* Pandas: 1.5.3 |
|
* Scikit-Learn: 1.2.2 |
|
* Sentence-transformers: 2.2.2 |
|
* Transformers: 4.31.0 |
|
* Numba: 0.56.4 |
|
* Plotly: 5.15.0 |
|
* Python: 3.10.12 |
|
|