Here's a list of simple tasks potentially suitable for small NNs and later tokenizable for LLMs, inspired by Schmidhuber's "parity" example:

Bitwise/Logical Operations:
- Parity: Determine if the number of 1s in a binary sequence is even or odd.
- Majority: Determine if more than half of the bits in a binary sequence are 1s.
- AND/OR/XOR/NOT: Perform basic logical operations on two binary sequences (from two-bit inputs upwards).
- Bit Counting: Count the number of 1s in a binary sequence.
- Binary Addition/Subtraction: Add or subtract two binary numbers.

Sequence Manipulation:
- Reversal: Reverse the order of elements in a sequence. (a known capability of large LLMs)
- Duplication: Duplicate a given sequence. (a known capability of large LLMs)
- Position Detection: Detect which NN-input position contains a given value (a binary 1, or some other number).
- Sorting (Simple): Sort a short sequence of numbers or characters in ascending or descending order. (probably hard for a small NN to accomplish, but ...)
- Membership: Determine if a specific element is present in a sequence.
- n-gram Detection: Identify specific subsequences (n-grams) within a longer sequence.

Arithmetic and Basic Math:
- Addition/Subtraction: Add or subtract small integers (scaling up to larger integers).
- Comparison (greater than, less than, equal to): Compare two small numbers (scaling up to larger integers).
- Single-Digit Multiplication/Division: Perform these operations on single-digit numbers.
- Rounding: Round a number to the nearest integer.

String Manipulation (Tokenized):
- Palindrome Detection: Determine if a short string is a palindrome.
- Character Counting: Count the occurrences of a specific character in a short string (e.g., how many r's are in "strawberry").
- Anagram Detection (simple cases): Determine if two short strings are anagrams of each other.

Tokenization Strategies:
- For binary sequences, a direct binary representation can be used.
- For numerical sequences, each number can be a token.
- For strings, character-level or subword tokenization can be employed.

Example: Parity Task Tokenization
- Input: 10110
- Tokenized Input: [1, 0, 1, 1, 0] or ["1", "0", "1", "1", "0"]
- Output: 1 (odd parity) or 0 (even parity), or even ["Odd"] / ["Even"]

Why These Tasks Are Suitable:
- Simplicity: These tasks can (probably) be learned by relatively small NNs with limited computational resources.
- Clear Evaluation: Success or failure is easily measurable.
- Tokenizability: They can be readily converted into a tokenized format suitable for LLM/Transformer input.
- Gradual Complexity: You can increase the complexity (e.g., longer sequences, larger numbers) to test the limits of the evolving model.

By starting with a small NN and progressively incorporating Transformer components while preserving functionality on these simple tasks, you can gain valuable insights into how the architectural changes affect learning and performance, and potentially discover ways to enhance LLM capabilities in these fundamental areas.
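
As a concrete illustration of the parity tokenization example above, here is a minimal sketch (pure Python, no dependencies) that generates labeled parity examples and produces both the numeric and string token formats. The function names (make_parity_example, tokenize_numeric, tokenize_string) are illustrative placeholders, not part of any existing codebase:

```python
import random

def make_parity_example(length=5):
    """Generate one parity example: a random bit sequence and its parity label."""
    bits = [random.randint(0, 1) for _ in range(length)]
    label = sum(bits) % 2          # 1 = odd parity, 0 = even parity
    return bits, label

def tokenize_numeric(bits):
    """Integer tokens, e.g. [1, 0, 1, 1, 0] -- suitable as direct input to a small NN."""
    return list(bits)

def tokenize_string(bits, label):
    """String tokens, e.g. ["1", "0", "1", "1", "0"] -> ["Odd"] -- closer to an LLM-style format."""
    inputs = [str(b) for b in bits]
    target = ["Odd"] if label == 1 else ["Even"]
    return inputs, target

if __name__ == "__main__":
    bits, label = make_parity_example()
    print("raw input:      ", "".join(str(b) for b in bits))   # e.g. 10110
    print("numeric tokens: ", tokenize_numeric(bits), "->", label)
    print("string tokens:  ", *tokenize_string(bits, label))
```

The other bitwise tasks (majority, bit counting) would differ only in the label function, e.g. `int(sum(bits) > len(bits) / 2)` for majority, which is part of what makes this family of tasks convenient for comparing architectures.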