Burc Gokden committed on
Commit 5ef08e7
1 Parent(s): 409ca8b

Initial commit

.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+*.keras filter=lfs diff=lfs merge=lfs -text
+*.data-* filter=lfs diff=lfs merge=lfs -text
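The two patterns added to .gitattributes route the Keras model file and the TensorFlow checkpoint data shard through Git LFS. The sketch below checks which of the files added in this commit match those two new patterns. It is an approximation, assuming that a pattern without a slash is matched against the file's basename at any directory depth (as in gitignore-style matching); it ignores the earlier .gitattributes lines not shown in this hunk and does not reproduce every detail of Git's matcher.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# The two patterns added to .gitattributes in this commit.
NEW_LFS_PATTERNS = ["*.keras", "*.data-*"]


def matches_lfs_pattern(path: str, patterns=NEW_LFS_PATTERNS) -> bool:
    """Approximate gitattributes matching: a pattern without a slash is
    compared against the file's basename, whatever directory it is in."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pat) for pat in patterns)


files = [
    "pldrllmv5-DAG-2-110M.keras",
    "tf-checkpoint/pldrllmv5-DAG-2-110M.data-00000-of-00001",
    "tf-checkpoint/pldrllmv5-DAG-2-110M.index",
    "README.md",
]
for f in files:
    print(f, matches_lfs_pattern(f))
```

Only the .keras file and the .data-00000-of-00001 shard match; the .index file is not covered by either new pattern, which fits it being committed as a small regular binary file rather than an LFS pointer.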
README.md CHANGED
@@ -1,3 +1,63 @@
----
-license: apache-2.0
----
+---
+language:
+- en
+tags:
+- text-generation
+- large-language-model
+- power-law-decoder-representations
+- pldr-llm
+- tensorflow
+license: apache-2.0
+datasets:
+- tiiuae/falcon-refinedweb
+---
+
+# PLDR-LLM-v5-DAG-2-110M
+
+## Model Description
+
+PLDR-LLM-v5-DAG-2-110M is a large language model from power law decoder representations (PLDR-LLM), a new language model architecture that uses power law graph attention to generate deductive and inductive outputs. This model has 110M parameters. It corresponds to the PLDRv5-DAG-2 model, whose architecture and training details are given in Tables 1 and 2 of the research paper [PLDR-LLM: Large Language Model from Power Law Decoder Representations](https://arxiv.org/abs/2410.16703).
+
+## Training data
+
+PLDR-LLM-v5-DAG-2-110M was pretrained on [RefinedWeb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb), a publicly available English web dataset built with extensive filtering and deduplication.
+
+## Training procedure
+
+This model was trained on ~8B tokens of RefinedWeb over 250k steps per rank. It was trained autoregressively with a cross-entropy loss and with DAG (directed acyclic graph) regularization on the deductive outputs.
+
+## Intended Use and Limitations
+
+This model is intended for research purposes. Given a text prompt as input, it performs next-token prediction to generate a continuation. The context length for this model is 1024 tokens.
+
+### How to use
+
+- The TensorFlow model checkpoint and tokenizer can be loaded into the PLDR-LLM framework to generate text, as described in the code repository used to train this model: [LLM-from-Power-Law-Decoder-Representations](https://github.com/burcgokden/LLM-from-Power-Law-Decoder-Representations).
+
+### LM Evaluation Harness Support
+
+- The Keras model can be used with a fork of the LM Evaluation Harness suite that adds PLDR-LLM support: [lm-evaluation-harness-with-PLDR-LLM](https://github.com/burcgokden/lm-evaluation-harness-with-PLDR-LLM).
+
+### Limitations and Biases
+
+Large language models may generate text that is profane, lewd, socially unacceptable or offensive, depending on the contents of the dataset they were pretrained on. RefinedWeb is a dataset that is as toxic and biased as the Pile. Please see the papers for [RefinedWeb](https://arxiv.org/abs/2306.01116) and [the Pile](https://arxiv.org/pdf/2101.00027) for more information. Moreover, large language models are susceptible to hallucinations and may generate text that contains incorrect, irrelevant or misleading information. Since the contents of generated text are very hard to predict ahead of time, the output of large language models needs to be heavily moderated and curated to prevent undesired content from appearing without warning.
+
+## Eval results
+
+Evaluation results on benchmarks in zero-shot and few-shot settings, and their comparison to LLMs of similar size reported in the literature, can be found in Tables 3 and 4 of the [PLDR-LLM paper](https://arxiv.org/abs/2410.16703).
+
+### BibTeX entry and citation info
+
+Please cite this model as:
+
+```bibtex
+@misc{gokden2024pldrllm,
+title={PLDR-LLM: Large Language Model from Power Law Decoder Representations},
+author={Burc Gokden},
+year={2024},
+eprint={2410.16703},
+archivePrefix={arXiv},
+primaryClass={cs.CL},
+url={https://arxiv.org/abs/2410.16703},
+}
+```
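The training procedure in the model card quotes ~8B tokens over 250k steps per rank, with a 1024-token context. The sketch below works out what those figures imply per optimizer step. It assumes that the ~8B total counts tokens across all ranks combined and that every training sequence fills the full 1024-token context; the card states neither the batch size nor the number of ranks, so these are back-of-the-envelope estimates, not reported values.

```python
# Figures quoted in the model card: ~8B training tokens, 250k steps per rank,
# and a 1024-token context window.
TOTAL_TOKENS = 8_000_000_000  # "~8B", so everything derived below is approximate
STEPS = 250_000
CONTEXT_LENGTH = 1024

# Assumption: TOTAL_TOKENS is the global count over all ranks.
tokens_per_step = TOTAL_TOKENS / STEPS
# Assumption: every sequence is packed to the full context length.
sequences_per_step = tokens_per_step / CONTEXT_LENGTH

print(f"tokens/step ~ {tokens_per_step:,.0f}")
print(f"{CONTEXT_LENGTH}-token sequences/step ~ {sequences_per_step:.2f}")
```

Under these assumptions this prints roughly 32,000 tokens, or about 31 full-length sequences, per step summed over all ranks. If the ~8B figure were instead counted per rank, the same numbers would apply to each rank individually.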
pldrllmv5-DAG-2-110M.keras ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:75feb9f72fafad063cc448d2b6c35f21ab54cdb5d1a1a7bea1b832783d8bace6
+size 440056632
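Each file marked ADDED here is committed as a three-line Git LFS pointer (spec version, SHA-256 object id, and size in bytes); the actual bytes live in LFS storage. A minimal sketch for parsing such a pointer and checking a downloaded copy against it follows. The local filename in the commented-out call is an assumption about where the downloaded file was saved.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS v1 pointer ("key value" per line) into its fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return {"version": fields["version"], "sha256": digest, "size": int(fields["size"])}


def verify_download(path: Path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Check a downloaded file against the pointer's size and SHA-256 digest,
    hashing in chunks so a ~440 MB file is never fully held in memory."""
    if path.stat().st_size != pointer["size"]:
        return False
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["sha256"]


KERAS_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:75feb9f72fafad063cc448d2b6c35f21ab54cdb5d1a1a7bea1b832783d8bace6
size 440056632
"""

pointer = parse_lfs_pointer(KERAS_POINTER)
print(pointer["size"])
# After downloading the real file (assumed local path):
# verify_download(Path("pldrllmv5-DAG-2-110M.keras"), pointer)
```

Checking the size first is a cheap way to catch a truncated download before spending time hashing the whole file.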
refinedweb-tokenizer-pldr-llm-paper.tar.gz ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:64ad7731741e37d2df354c9827e24524a18da91f3ac06f214368a5c7331f7097
+size 1842758
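The tokenizer ships as a gzipped tarball whose contents are not listed in this commit; how the extracted files are loaded is described in the PLDR-LLM code repository. The sketch below only unpacks the archive and returns its member names, refusing absolute paths, `..` components, and links. The output directory name `tokenizer` is an assumption.

```python
import tarfile
from pathlib import Path


def safe_extract(archive: str, dest: str) -> list[str]:
    """Extract a .tar.gz archive into dest, rejecting members that would land
    outside dest or that are symbolic/hard links. Returns the member names."""
    dest_path = Path(dest).resolve()
    with tarfile.open(archive, "r:gz") as tar:
        members = tar.getmembers()
        for m in members:
            target = (dest_path / m.name).resolve()
            if target != dest_path and dest_path not in target.parents:
                raise ValueError(f"unsafe path in archive: {m.name}")
            if m.issym() or m.islnk():
                raise ValueError(f"links are not expected: {m.name}")
        tar.extractall(dest_path, members=members)
    return [m.name for m in members]


archive = Path("refinedweb-tokenizer-pldr-llm-paper.tar.gz")
if archive.exists():
    print(safe_extract(str(archive), "tokenizer"))
```

The explicit path check keeps the sketch safe on Python versions whose `tarfile` does not apply an extraction filter by default.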
tf-checkpoint/pldrllmv5-DAG-2-110M.data-00000-of-00001 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:c896d4c8d4610d38a98cd75e3f9d6c9a53beaad5fbac73fcd3afb5f0133f0197
+size 438938348
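The sizes recorded in the LFS pointers can be cross-checked against the stated 110M parameters. The sketch assumes 4 bytes per parameter (float32) and attributes every byte to weights, so it gives an upper bound that ignores the Keras archive's config and metadata overhead.

```python
# Byte sizes reported in the LFS pointers of this commit.
KERAS_BYTES = 440_056_632      # pldrllmv5-DAG-2-110M.keras
CKPT_DATA_BYTES = 438_938_348  # tf-checkpoint/pldrllmv5-DAG-2-110M.data-00000-of-00001
BYTES_PER_FLOAT32 = 4


def approx_params(num_bytes: int, bytes_per_param: int = BYTES_PER_FLOAT32) -> float:
    """Upper-bound parameter estimate, assuming every byte belongs to
    float32 weights (ignores config, metadata and container overhead)."""
    return num_bytes / bytes_per_param


print(f".keras:     ~{approx_params(KERAS_BYTES) / 1e6:.1f}M float32 values")
print(f"checkpoint: ~{approx_params(CKPT_DATA_BYTES) / 1e6:.1f}M float32 values")
```

Both files come out at about 110M float32 values. This is consistent with, though not proof of, float32 weights and a checkpoint that stores no optimizer slot variables, which would otherwise multiply its size.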
tf-checkpoint/pldrllmv5-DAG-2-110M.index ADDED
Binary file (46.6 kB)
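The TensorFlow checkpoint is addressed by its prefix `tf-checkpoint/pldrllmv5-DAG-2-110M`, which needs both the `.index` file and every `.data-NNNNN-of-MMMMM` shard to be present. A small sketch, assuming only this standard TensorFlow naming scheme, that checks a file listing for a complete checkpoint before loading it:

```python
import re

SHARD_RE = re.compile(r"^(?P<prefix>.+)\.data-(?P<idx>\d{5})-of-(?P<total>\d{5})$")


def checkpoint_complete(filenames: list[str]) -> dict[str, bool]:
    """Group checkpoint files by prefix and report whether each prefix has
    its .index file and all of its .data shards."""
    shards: dict[str, set[int]] = {}
    totals: dict[str, int] = {}
    for name in filenames:
        m = SHARD_RE.match(name)
        if m:
            prefix = m["prefix"]
            shards.setdefault(prefix, set()).add(int(m["idx"]))
            totals[prefix] = int(m["total"])
    names = set(filenames)
    return {
        p: f"{p}.index" in names and shards[p] == set(range(totals[p]))
        for p in shards
    }


files = [
    "tf-checkpoint/pldrllmv5-DAG-2-110M.data-00000-of-00001",
    "tf-checkpoint/pldrllmv5-DAG-2-110M.index",
]
print(checkpoint_complete(files))  # {'tf-checkpoint/pldrllmv5-DAG-2-110M': True}
```

With TensorFlow installed, `tf.train.list_variables("tf-checkpoint/pldrllmv5-DAG-2-110M")` can then list the stored variable names and shapes; rebuilding the model itself is done through the PLDR-LLM framework.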