GreaterThan_Detector_NN: A Challenge in Numerical Reasoning
This repository, part of the Neural_Nets_Doing_Simple_Tasks collection, explores a fundamental question: can a general-purpose neural network learn a task that is trivial for humans but requires symbolic reasoning? The specific task is to compare two numbers presented in a natural language format and identify the greater or lesser one.
The Objective
The goal is to create a model that can reliably process a text-based prompt, parse two numbers, understand the question ("Which is Greater?" or "Which is Lesser?"), and generate the correct numerical answer.
The input-output format is a single, continuous text sequence. For example:
Prompt:
10.00 , 09.21 Which is Greater ?
Expected Completion:
10.00!
This makes the full, correct sequence: 10.00 , 09.21 Which is Greater ? 10.00!
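For reference, the ground-truth completion for any prompt in this format can be computed in a few lines of Python. This is a minimal sketch; the function name `solve_prompt` is ours and not part of the repository:

```python
def solve_prompt(prompt: str) -> str:
    """Compute the correct completion for a 'Which is Greater/Lesser ?' prompt."""
    # Prompt shape: "AA.AA , BB.BB Which is Greater ?" (fixed-width numbers)
    a_str, rest = prompt.split(" , ")
    b_str = rest.split(" ")[0]
    a, b = float(a_str), float(b_str)
    if "Greater" in prompt:
        winner = a_str if a > b else b_str
    else:
        winner = a_str if a < b else b_str
    return winner + "!"

print(solve_prompt("10.00 , 09.21 Which is Greater ?"))  # 10.00!
```

A helper like this is also handy for scoring model outputs automatically during evaluation.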
The Dataset
The training data is synthetically generated. This ensures an endless supply of examples but also presents a challenge: is the generated data diverse enough to teach true reasoning, or does it just encourage brittle pattern matching?
You can generate your own dataset using the Python function below. This is the exact generator used for our baseline model, ensuring a level playing field.
Dataset Generator Code
```python
import random

def generate_synthetic_data(num_samples_per_task=4000):
    """
    Generates a dataset for the numerical comparison task.

    Args:
        num_samples_per_task (int): The number of samples to generate for each
            task ('Greater' and 'Lesser').

    Returns:
        list: A list of text sequences, e.g.,
            ["12.34 , 56.78 Which is Greater ? 56.78!", ...]
    """
    print("Generating synthetic data...")
    all_sequences = []
    tasks = ['Greater', 'Lesser']
    for task in tasks:
        for _ in range(num_samples_per_task):
            a = round(random.uniform(0, 99.99), 2)
            b = round(random.uniform(0, 99.99), 2)
            # Ensure the numbers are not identical to avoid ambiguity
            while a == b:
                b = round(random.uniform(0, 99.99), 2)
            # Use fixed-width format to normalize number representation
            a_str = f"{a:05.2f}"  # e.g., 01.23
            b_str = f"{b:05.2f}"
            if task == 'Greater':
                correct_num_str = a_str if a > b else b_str
            else:  # 'Lesser'
                correct_num_str = a_str if a < b else b_str
            seq = f"{a_str} , {b_str} Which is {task} ? {correct_num_str}!"
            all_sequences.append(seq)
    random.shuffle(all_sequences)
    print(f"Generated {len(all_sequences)} total sequences.")
    return all_sequences


if __name__ == '__main__':
    # To generate a dataset file (a larger size is recommended):
    dataset = generate_synthetic_data(num_samples_per_task=50000)
    with open("greater_lesser_dataset.txt", "w") as f:
        for line in dataset:
            f.write(line + "\n")
    print("Dataset saved to greater_lesser_dataset.txt")
```
Baseline Model Performance
A baseline sequence model was trained on a dataset of 8,000 examples generated by the function above. While it learned the general format and solved many cases correctly, its performance on tricky edge cases reveals a critical weakness.
| Prompt | Model's Completion | Result |
| --- | --- | --- |
| `10.00 , 09.21 Which is Greater ?` | `10.00!` | ✅ Correct |
| `10.00 , 09.31 Which is Lesser ?` | `09.31!` | ✅ Correct |
| `54.12 , 54.13 Which is Greater ?` | `54.13!` | ✅ Correct |
| `99.98 , 99.99 Which is Lesser ?` | `99.99!` | ❌ Incorrect |
| `00.01 , 10.00 Which is Lesser ?` | `00.00!` | ❌ Incorrect |
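Results like these can be scored automatically by comparing each completion against its expected answer. A minimal sketch, using the five test cases above (the `accuracy` helper is our own, not part of the repository):

```python
def accuracy(results):
    """results: list of (prompt, model_completion, expected_completion) tuples."""
    correct = sum(got.strip() == want for _, got, want in results)
    return correct / len(results)

# The baseline's five probe cases: (prompt, model output, ground truth)
baseline = [
    ("10.00 , 09.21 Which is Greater ?", "10.00!", "10.00!"),
    ("10.00 , 09.31 Which is Lesser ?",  "09.31!", "09.31!"),
    ("54.12 , 54.13 Which is Greater ?", "54.13!", "54.13!"),
    ("99.98 , 99.99 Which is Lesser ?",  "99.99!", "99.98!"),
    ("00.01 , 10.00 Which is Lesser ?",  "00.00!", "00.01!"),
]
print(accuracy(baseline))  # 0.6
```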
Analysis & The Challenge
The baseline model demonstrates a classic problem in machine learning: it has learned to be a good pattern matcher but has not acquired a robust, generalizable algorithm for numerical comparison.
The failures are illuminating:
Edge Case Failure (99.98 vs 99.99): The model failed when the numbers were very close. It may have learned a flawed heuristic (e.g., "pick the second number if they look similar") instead of the actual rule of "lesser than."
Context Collapse (00.01 vs 10.00): The model correctly identified that 00.01 was the smaller number but failed to generate it accurately, outputting 00.00 instead. This suggests it lost the precise details of the number during generation, defaulting to a common token (0).
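The second failure is a single-token error: at the character level, the correct answer and the model's output differ only in the final digit, which is consistent with the context-collapse hypothesis. A quick check:

```python
# Compare the expected and generated answers character by character.
want, got = "00.01", "00.00"
diff_positions = [i for i, (w, g) in enumerate(zip(want, got)) if w != g]
print(diff_positions)  # [4] -- only the last digit differs
```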
Your challenge is to design and train a model that overcomes these limitations. The goal is to achieve near-perfect accuracy, especially on the tricky edge cases where the baseline fails.
Key Problems to Solve:
Robustness to Edge Cases: The model must be able to correctly distinguish between numbers that differ by a very small amount.
True Numerical Abstraction: The model should move beyond surface-level pattern matching to develop an internal representation that respects the actual magnitude of numbers.
Structural Generalization: A truly successful model should be robust even if the prompt phrasing or number formatting is slightly altered (a potential future test).
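One way to probe the first two failure modes directly is a dedicated edge-case test set in which the two numbers differ by exactly 0.01, the regime where the baseline breaks. A sketch under the same fixed-width format as the training data (`generate_edge_cases` is our name, not part of the repository):

```python
import random

def generate_edge_cases(n=1000, seed=0):
    """Generate prompts whose two numbers differ by exactly 0.01."""
    rng = random.Random(seed)
    prompts = []
    for _ in range(n):
        a = rng.randrange(0, 9999) / 100       # 0.00 .. 99.98
        b = round(a + 0.01, 2)                 # the adjacent value
        task = rng.choice(["Greater", "Lesser"])
        prompts.append(f"{a:05.2f} , {b:05.2f} Which is {task} ?")
    return prompts

for p in generate_edge_cases(3):
    print(p)
```

Scoring a model on a held-out set like this gives a sharper signal than overall accuracy, since near-ties are rare under uniform sampling.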
How to Contribute
Create a new directory for your solution (e.g., My_GreaterThan_Detector_NN_model/).
Inside your directory, add your notebook (.ipynb) or scripts (.py) that define, train, and evaluate your model.
Please include a GreaterThan_Detector_NN_ReadMe.md in your directory explaining your approach, architecture, and results.
Submit a comment in the repository's "community" section linking your solution to this repository.
Let's see who can build and train the most reliable numerical "Greater Than" reasoner!
License
The `generate_synthetic_data()` function is open-sourced under the MIT License. See the LICENSE file for more details.
Notes: raw console output from the baseline training run.

```
Generating synthetic data for tasks: ['4digit_Greater', '4digit_Lesser']...
Generated 8000 total sequences.
Model initialized with 0.20M parameters.
Starting training...
step 0: train loss 2.8690, val loss 2.8744
step 250: train loss 1.6123, val loss 1.6120
step 500: train loss 1.5058, val loss 1.5070
step 750: train loss 1.3708, val loss 1.3657
step 1000: train loss 1.2398, val loss 1.2389
step 1250: train loss 1.2230, val loss 1.2221
step 1500: train loss 1.2152, val loss 1.2153
step 1750: train loss 1.2106, val loss 1.2110
step 2000: train loss 1.2063, val loss 1.2070
step 2250: train loss 1.1900, val loss 1.1901
step 2500: train loss 1.1651, val loss 1.1653
step 2750: train loss 1.1584, val loss 1.1581
step 3000: train loss 1.1555, val loss 1.1556
step 3250: train loss 1.1531, val loss 1.1543
step 3500: train loss 1.1520, val loss 1.1545
step 3750: train loss 1.1511, val loss 1.1530
step 4000: train loss 1.1506, val loss 1.1522
step 4250: train loss 1.1501, val loss 1.1518
step 4500: train loss 1.1492, val loss 1.1516
step 4750: train loss 1.1484, val loss 1.1506
step 5000: train loss 1.1484, val loss 1.1504
step 5250: train loss 1.1468, val loss 1.1496
...
step 98500: train loss 0.6261, val loss 0.8897
step 98750: train loss 0.6267, val loss 0.8889
step 99000: train loss 0.6245, val loss 0.8924
step 99250: train loss 0.6214, val loss 0.8931
step 99500: train loss 0.6215, val loss 0.8943
step 99750: train loss 0.6239, val loss 0.8877
step 99999: train loss 0.6224, val loss 0.8901
Training finished!

--- Inference Demonstration ---
Prompt: 10.00 , 09.21 Which is Greater ?
Completion: 1 0. 0 0, 0 9. 2 1 Which is Greater? 1 0. 0 0!

Prompt: 10.00 , 09.31 Which is Lesser ?
Completion: 1 0. 0 0, 0 9. 3 1 Which is Lesser? 0 9. 3 1!

Prompt: 54.12 , 54.13 Which is Greater ?
Completion: 5 4. 1 2, 5 4. 1 3 Which is Greater? 5 4. 1 3!

Prompt: 99.98 , 99.99 Which is Lesser ?
Completion: 9 9. 9 8, 9 9. 9 9 Which is Lesser? 9 9. 9 9!

Prompt: 00.01 , 10.00 Which is Lesser ?
Completion: 0 0. 0 1, 1 0. 0 0 Which is Lesser? 0 0. 0 0!

Detected Google Colab environment. Zipping and downloading files...
Downloaded greater_than_outputs.zip successfully.
```