# GreaterThan_Detector_NN: A Challenge in Numerical Reasoning
This repository, part of the Neural_Nets_Doing_Simple_Tasks collection, explores a fundamental question: can a general-purpose neural network learn a task that is trivial for humans but requires symbolic reasoning? The specific task is to compare two numbers presented in a natural language format and identify the greater or lesser one.

## The Objective

The goal is to create a model that can reliably process a text-based prompt, parse two numbers, understand the question ("Which is Greater?" or "Which is Lesser?"), and generate the correct numerical answer.

The input-output format is a single, continuous text sequence. For example:

**Prompt:**

```
10.00 , 09.21 Which is Greater ?
```

**Expected Completion:**

```
10.00!
```

This makes the full, correct sequence:

```
10.00 , 09.21 Which is Greater ? 10.00!
```
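Because the sequence format is fully deterministic, the correctness of any completion can be verified mechanically. Below is a minimal sketch of such a checker; the `check_completion` helper is hypothetical and not part of this repository.

```python
import re

def check_completion(sequence):
    """Hypothetical checker: given a full sequence like
    "10.00 , 09.21 Which is Greater ? 10.00!", verify that the answer
    after '?' really is the greater/lesser of the two numbers."""
    m = re.fullmatch(
        r"(\d{2}\.\d{2}) , (\d{2}\.\d{2}) Which is (Greater|Lesser) \? (\d{2}\.\d{2})!",
        sequence.strip(),
    )
    if not m:
        return False  # malformed sequence
    a, b, task, answer = m.groups()
    # Fixed-width, zero-padded numbers mean lexicographic comparison
    # coincides with numeric comparison, so no float parsing is needed.
    expected = max(a, b) if task == "Greater" else min(a, b)
    return answer == expected
```

The zero-padded `dd.dd` format is what makes the plain string `max`/`min` safe here; with variable-width numbers you would have to parse floats first.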

## The Dataset

The training data is synthetically generated. This ensures an endless supply of examples but also presents a challenge: is the generated data diverse enough to teach true reasoning, or does it just encourage brittle pattern matching?

You can generate your own dataset using the Python function below. This is the exact generator used for our baseline model, ensuring a level playing field.

### Dataset Generator Code

```python
import random

def generate_synthetic_data(num_samples_per_task=4000):
    """
    Generates a dataset for the numerical comparison task.

    Args:
        num_samples_per_task (int): The number of samples to generate for each
            task ('Greater' and 'Lesser').

    Returns:
        list: A list of text sequences, e.g.,
            ["12.34 , 56.78 Which is Greater ? 56.78!", ...]
    """
    print("Generating synthetic data...")
    all_sequences = []
    tasks = ['Greater', 'Lesser']

    for task in tasks:
        for _ in range(num_samples_per_task):
            a = round(random.uniform(0, 99.99), 2)
            b = round(random.uniform(0, 99.99), 2)
            # Ensure the numbers are not identical to avoid ambiguity
            while a == b:
                b = round(random.uniform(0, 99.99), 2)

            # Use fixed-width format to normalize number representation
            a_str = f"{a:05.2f}"  # e.g., 01.23
            b_str = f"{b:05.2f}"

            if task == 'Greater':
                correct_num_str = a_str if a > b else b_str
                seq = f"{a_str} , {b_str} Which is Greater ? {correct_num_str}!"
                all_sequences.append(seq)

            elif task == 'Lesser':
                correct_num_str = a_str if a < b else b_str
                seq = f"{a_str} , {b_str} Which is Lesser ? {correct_num_str}!"
                all_sequences.append(seq)

    random.shuffle(all_sequences)
    print(f"Generated {len(all_sequences)} total sequences.")
    return all_sequences


if __name__ == '__main__':
    # To generate a dataset file:
    dataset = generate_synthetic_data(num_samples_per_task=50000)  # Recommend a larger size
    with open("greater_lesser_dataset.txt", "w") as f:
        for line in dataset:
            f.write(line + "\n")
    print("Dataset saved to greater_lesser_dataset.txt")
```
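The inference logs later in this README print completions with spaces between every character, which suggests the baseline tokenizes at the character level. As a hedged sketch (an assumption about the setup, not the repository's actual code), a character-level vocabulary for these sequences could be built like this:

```python
# Assumed character-level tokenization; the baseline's real tokenizer
# is not published, so treat this as an illustrative sketch only.
def build_char_vocab(sequences):
    """Map every distinct character in the corpus to an integer id."""
    chars = sorted(set("".join(sequences)))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos

def encode(text, stoi):
    return [stoi[ch] for ch in text]

def decode(ids, itos):
    return "".join(itos[i] for i in ids)
```

With the fixed prompt template, the vocabulary stays tiny (digits, punctuation, and the letters of "Which is Greater/Lesser"), which is part of what makes this task tractable for very small models.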
## Baseline Model Performance

A baseline sequence model was trained on a dataset of 8,000 examples generated by the function above. While it learned the general format and solved many cases correctly, its performance on tricky edge cases reveals a critical weakness.

| Prompt | Model's Completion | Result |
|---|---|---|
| `10.00 , 09.21 Which is Greater ?` | `10.00!` | ✅ Correct |
| `10.00 , 09.31 Which is Lesser ?` | `09.31!` | ✅ Correct |
| `54.12 , 54.13 Which is Greater ?` | `54.13!` | ✅ Correct |
| `99.98 , 99.99 Which is Lesser ?` | `99.99!` | ❌ Incorrect |
| `00.01 , 10.00 Which is Lesser ?` | `00.00!` | ❌ Incorrect |
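A table like the one above can be produced mechanically once you have a trained model. Here is a hedged sketch of a scoring loop; `model_generate` is an assumed callable (not defined in this repository) that maps a prompt string to its completion string.

```python
def evaluate(model_generate, prompts_with_answers):
    """Hypothetical harness: score a model on (prompt, expected_answer)
    pairs and return the fraction answered correctly."""
    correct = 0
    for prompt, expected in prompts_with_answers:
        completion = model_generate(prompt).strip()
        # Count it correct only if the completion begins with the
        # expected answer token, e.g. "10.00!".
        if completion.startswith(expected):
            correct += 1
    return correct / len(prompts_with_answers)
```

Exact-match scoring is deliberately strict: the "Context Collapse" failure above shows the baseline can pick the right number yet still emit wrong digits, and a looser metric would hide that.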
## Analysis & The Challenge

The baseline model demonstrates a classic problem in machine learning: it has learned to be a good pattern matcher but has not acquired a robust, generalizable algorithm for numerical comparison.

The failures are illuminating:

- **Edge Case Failure (`99.98` vs `99.99`):** The model failed when the numbers were very close. It may have learned a flawed heuristic (e.g., "pick the second number if they look similar") instead of the actual rule of "lesser than."
- **Context Collapse (`00.01` vs `10.00`):** The model correctly identified that `00.01` was the smaller number but failed to generate it accurately, outputting `00.00` instead. This suggests it lost the precise details of the number during generation, defaulting to a common token (`0`).

Your challenge is to design and train a model that overcomes these limitations. The goal is to achieve near-perfect accuracy, especially on the tricky edge cases where the baseline fails.

**Key Problems to Solve:**

- **Robustness to Edge Cases:** The model must correctly distinguish between numbers that differ by a very small amount.
- **True Numerical Abstraction:** The model should move beyond surface-level pattern matching to develop an internal representation that respects the actual magnitude of numbers.
- **Structural Generalization:** A truly successful model should remain robust even if the prompt phrasing or number formatting is slightly altered (a potential future test).
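One way to measure progress on the first problem is a dedicated adversarial test set. The sketch below is a hypothetical helper (not part of the baseline) that generates prompts whose two numbers differ by exactly 0.01, the regime where the baseline fails.

```python
import random

def generate_edge_cases(n=100, seed=0):
    """Hypothetical stress-test generator: build (prompt, answer) pairs
    whose numbers differ by exactly 0.01."""
    rng = random.Random(seed)
    cases = []
    for _ in range(n):
        a = round(rng.uniform(0.0, 99.97), 2)
        b = round(a + 0.01, 2)  # adjacent two-decimal value, so b > a
        a_str, b_str = f"{a:05.2f}", f"{b:05.2f}"
        task = rng.choice(["Greater", "Lesser"])
        answer = b_str if task == "Greater" else a_str
        cases.append((f"{a_str} , {b_str} Which is {task} ?", f"{answer}!"))
    return cases
```

Reporting accuracy on this adversarial slice separately from the random-pair distribution would make it obvious whether a submission has actually fixed the near-tie weakness or merely averaged it away.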
## How to Contribute

1. Create a new directory for your solution (e.g., `My_GreaterThan_Detector_NN_model/`).
2. Inside your directory, add your notebook (`.ipynb`) or scripts (`.py`) that define, train, and evaluate your model.
3. Please include a `GreaterThan_Detector_NN_ReadMe.md` in your directory explaining your approach, architecture, and results.
4. Submit a "community" comment to merge your solution into this repository.

Let's see who can build and train the most reliable numerical reasoner!

## License

The `generate_synthetic_data()` function is open-sourced under the MIT License. See the LICENSE file for more details.
## Notes

```
Generating synthetic data for tasks: ['4digit_Greater', '4digit_Lesser']...
Generated 8000 total sequences.

Model initialized with 0.20M parameters.

Starting training...
step 0: train loss 2.8690, val loss 2.8744
step 250: train loss 1.6123, val loss 1.6120
step 500: train loss 1.5058, val loss 1.5070
step 750: train loss 1.3708, val loss 1.3657
step 1000: train loss 1.2398, val loss 1.2389
step 1250: train loss 1.2230, val loss 1.2221
step 1500: train loss 1.2152, val loss 1.2153
step 1750: train loss 1.2106, val loss 1.2110
step 2000: train loss 1.2063, val loss 1.2070
step 2250: train loss 1.1900, val loss 1.1901
step 2500: train loss 1.1651, val loss 1.1653
step 2750: train loss 1.1584, val loss 1.1581
step 3000: train loss 1.1555, val loss 1.1556
step 3250: train loss 1.1531, val loss 1.1543
step 3500: train loss 1.1520, val loss 1.1545
step 3750: train loss 1.1511, val loss 1.1530
step 4000: train loss 1.1506, val loss 1.1522
step 4250: train loss 1.1501, val loss 1.1518
step 4500: train loss 1.1492, val loss 1.1516
step 4750: train loss 1.1484, val loss 1.1506
step 5000: train loss 1.1484, val loss 1.1504
step 5250: train loss 1.1468, val loss 1.1496
...
step 98500: train loss 0.6261, val loss 0.8897
step 98750: train loss 0.6267, val loss 0.8889
step 99000: train loss 0.6245, val loss 0.8924
step 99250: train loss 0.6214, val loss 0.8931
step 99500: train loss 0.6215, val loss 0.8943
step 99750: train loss 0.6239, val loss 0.8877
step 99999: train loss 0.6224, val loss 0.8901
Training finished!

--- Inference Demonstration ---
Prompt: 10.00 , 09.21 Which is Greater ?
Completion: 1 0. 0 0, 0 9. 2 1 Which is Greater? 1 0. 0 0!
-------------------------
Prompt: 10.00 , 09.31 Which is Lesser ?
Completion: 1 0. 0 0, 0 9. 3 1 Which is Lesser? 0 9. 3 1!
-------------------------
Prompt: 54.12 , 54.13 Which is Greater ?
Completion: 5 4. 1 2, 5 4. 1 3 Which is Greater? 5 4. 1 3!
-------------------------
Prompt: 99.98 , 99.99 Which is Lesser ?
Completion: 9 9. 9 8, 9 9. 9 9 Which is Lesser? 9 9. 9 9!
-------------------------
Prompt: 00.01 , 10.00 Which is Lesser ?
Completion: 0 0. 0 1, 1 0. 0 0 Which is Lesser? 0 0. 0 0!
-------------------------

Detected Google Colab environment. Zipping and downloading files...
Downloaded greater_than_outputs.zip successfully.
```