|
|
|
|
|
N = 5 |
|
L = 2 |
|
hidden_layer_size = 10 |
|
epochs = 10000 |
|
learning_rate = 0.003 |
|
min_loss_threshold = 0.01 |
|
|
|
|
|
|
|
|
|
|
|
|
|
""" |
|
Python script to compute parity of an N-bit binary number using an N-input neural network with L hidden layers, and one output neuron, in PyTorch. |
|
|
|
The Python script calculates even parity. The code that determines the parity is within the generate_data function: |
|
|
|
parity = sum(bits) % 2 # Even parity |
|
|
|
sum(bits) counts the number of 1s in the input bit sequence. The modulo operator % 2 returns 0 if that count is even and 1 if it is odd, and the result is assigned to the parity variable. Because the parity bit is 1 exactly when the number of 1s in the data is odd, appending it would make the total number of 1s (data bits plus parity bit) even. That is the definition of an even parity calculation.[1]
|
|
|
Even parity is not about having an odd number of data bits set to one; it is about ensuring that the total number of 1s (data bits plus parity bit) is even. This is how error detection with parity bits works: the receiver expects to see an even number of 1s, and an odd count flags a transmission error.[1] The script generates random data of bit-width N and computes this parity bit to use as the label; the combined sequence of length N + 1 is never actually constructed in the script. It is the job of the neural network to discover the parity calculation on its own during training, where only the N input bits (without the parity bit) are presented as inputs and the computed parity bit is presented as the target.
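
For illustration only (this snippet is not part of the script, and the bit values are made up), appending the computed parity bit makes the total count of 1s even:

bits = [1, 0, 1, 1, 0]           # three 1s (odd count)
parity = sum(bits) % 2           # -> 1
codeword = bits + [parity]       # [1, 0, 1, 1, 0, 1]
assert sum(codeword) % 2 == 0    # total number of 1s is now even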
|
|
|
Includes both training (with gradient computation and backpropagation) and inference, and displays the training loss in real time in a popup window.
|
|
|
Inspired by: [1] Aug 28, 2024 YouTube interview of Juergen Schmidhuber: www.youtube.com/watch?v=DP454c1K_vQ

See also (seven years earlier): True Artificial Intelligence will change everything | Juergen Schmidhuber | TEDxLakeComo: www.youtube.com/watch?v=-Y7PLaxXUrs
|
|
|
[Jürgen Schmidhuber, the father of generative AI shares his groundbreaking work in deep learning and artificial intelligence. In this exclusive interview, he discusses the history of AI, some of his contributions to the field, and his vision for the future of intelligent machines. Schmidhuber offers unique insights into the exponential growth of technology and the potential impact of AI on humanity and the universe.] |
|
|
|
In this interview, Schmidhuber stated that LLMs cannot compute the "parity" of the bits in a binary sequence, but that recurrent NNs (RNNs) can. I wanted to know whether a simple feed-forward NN can compute "parity". (If so, perhaps LLMs actually could compute "parity" if specifically trained to do so.)
|
|
|
Method of the Python script informed by:
|
Create a Basic Neural Network Model - Deep Learning with PyTorch 5 - YouTube: www.youtube.com/watch?v=JHWqWIoac2I (2023-06-05)
|
and |
|
Building a Neural Network with PyTorch in 15 Minutes - YouTube: www.youtube.com/watch?v=mozBidd58VQ
|
|
|
Loss, as defined, drops smoothly, with some bumps visible in the graph, depending on the "random" weights the model is initialized with on each run.
|
Example results: |
|
Epoch [1000/1000], Loss: 0.0395 |
|
Test Accuracy: 1.0000 |
|
[In one run, the loss fell to 0.0642 after 2000 epochs and did not fall below 0.0637 even after 9999 epochs; the final sample-tested prediction accuracy was 0.9700.]
|
This shows that the model typically does not need to drive the loss all the way below 0.01: well before that point, the output neuron's values already lie on the correct side of the 0.5 decision threshold. The script is not specifically written to optimize these margins; they simply emerged.
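
The evaluation code further below quantifies these margins as the distance of each test output from the 0.5 threshold:

margins_ones = test_outputs[predicted == 1] - 0.5
margins_zeros = 0.5 - test_outputs[predicted == 0]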
|
|
|
done loading libraries |
|
Epoch [100/1000], Loss: 0.6247 |
|
Epoch [200/1000], Loss: 0.4112 |
|
Epoch [300/1000], Loss: 0.1990 |
|
Epoch [400/1000], Loss: 0.0849 |
|
Epoch [500/1000], Loss: 0.0413 |
|
Epoch [600/1000], Loss: 0.0250 |
|
Epoch [700/1000], Loss: 0.0172 |
|
Epoch [800/1000], Loss: 0.0128 |
|
Epoch [900/1000], Loss: 0.0100 |
|
Epoch [1000/1000], Loss: 0.0081 |
|
|
|
Test Accuracy: 1.0000 |
|
Predictions tensor([1., 1., 1., 0., 0., 1., 1., 0., 1., 1., 1., 0., 0., 0., 0., 1., 1., 0., |
|
0., 1., 0., 1., 1., 1., 1., 1., 1., 1., 0., 1., 1., 0., 1., 1., 0., 0., |
|
0., 0., 1., 0., 0., 1., 1., 1., 1., 0., 1., 1., 1., 1., 0., 0., 0., 1., |
|
0., 1., 0., 0., 1., 1., 0., 1., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., |
|
0., 1., 0., 1., 0., 1., 1., 0., 1., 0., 1., 0., 0., 1., 0., 0., 0., 1., |
|
0., 1., 0., 1., 0., 0., 0., 1., 0., 0.]) |
|
Labels tensor([1., 1., 1., 0., 0., 1., 1., 0., 1., 1., 1., 0., 0., 0., 0., 1., 1., 0., |
|
0., 1., 0., 1., 1., 1., 1., 1., 1., 1., 0., 1., 1., 0., 1., 1., 0., 0., |
|
0., 0., 1., 0., 0., 1., 1., 1., 1., 0., 1., 1., 1., 1., 0., 0., 0., 1., |
|
0., 1., 0., 0., 1., 1., 0., 1., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., |
|
0., 1., 0., 1., 0., 1., 1., 0., 1., 0., 1., 0., 0., 1., 0., 0., 0., 1., |
|
0., 1., 0., 1., 0., 0., 0., 1., 0., 0.]) |
|
|
|
|
|
The line that calculates test_outputs is: |
|
test_outputs = model(test_data) |
|
|
|
This line uses the model object (an instance of the ParityNet class) and calls its forward method; it is equivalent to test_outputs = model.forward(test_data). Here test_data is a tensor holding all of the test input bit sequences whose parity is to be predicted.
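
For example (illustrative only; assumes the default N = 5 and a trained model), a single 5-bit pattern can be scored the same way:

x = torch.tensor([[1., 0., 1., 1., 0.]])  # one sample with N = 5 bits, three of them set
p = model(x)                              # sigmoid output between 0 and 1
predicted_parity = (p > 0.5).float()      # expected to be 1.0 for a well-trained model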
|
|
|
The number of neurons and activations that directly contribute to generating test_outputs depends on the network architecture defined by N, L, and hidden_layer_size. |
|
|
|
Output Layer: The final layer has one output neuron (because we are predicting a single binary value - the parity). This neuron uses a Sigmoid activation function. |
|
|
|
Last Hidden Layer: This layer has hidden_layer_size neurons, each using a ReLU activation function. The output of each neuron in this layer feeds directly into the single output neuron. |
|
|
|
Previous Hidden Layers: If L > 1, there are L-1 previous hidden layers, each also with hidden_layer_size neurons and ReLU activations. The activations of each layer feed into the next. |
|
|
|
Input Layer: The input layer consists of N nodes representing the input bits, connected directly to the first hidden layer. These nodes can be regarded as having a linear (identity) activation.
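
In terms of the nn.Sequential stack built in the ParityNet class below, this architecture is, schematically:

Linear(N, hidden_layer_size) -> ReLU -> [Linear(hidden_layer_size, hidden_layer_size) -> ReLU] x (L - 1) -> Linear(hidden_layer_size, 1) -> Sigmoid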
|
|
|
So, to generate test_outputs, you have the following activations: |
|
|
|
hidden_layer_size * L ReLU activations in the hidden layers. |
|
|
|
1 Sigmoid activation in the output layer. |
|
|
|
N linear (identity) activations in the input layer, if one chooses to count them.
|
|
|
In summary, the hidden_layer_size neurons with ReLU activations in the last hidden layer and the single neuron with a Sigmoid activation in the output layer directly generate the test_outputs values. The other (L-1) * hidden_layer_size neurons with ReLU activations in the preceding hidden layers contribute indirectly by feeding into the last hidden layer. Each input bit enters through one of the N nodes of the input layer, which may be regarded as having a linear activation or no activation at all.
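
For the defaults at the top of this file (N = 5, L = 2, hidden_layer_size = 10), that amounts to 10 * 2 = 20 ReLU activations in the hidden layers, 1 Sigmoid activation in the output layer, and 5 input nodes.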
|
|
|
""" |
|
|
|
|
|
print("load libraries") |
|
print("import torch") |
|
import torch |
|
print("import torch.nn as nn") |
|
import torch.nn as nn |
|
print("import numpy as np") |
|
import numpy as np |
|
print("import matplotlib.pyplot as plt # For the popup error plot") |
|
import matplotlib.pyplot as plt |
|
print("import random") |
|
import random |
|
print("done loading libraries") |
|
|
|
|
|
# Generate num_samples random bit vectors of length num_bits and their parity labels (sum of bits mod 2).
def generate_data(num_samples, num_bits):
|
data = [] |
|
labels = [] |
|
for _ in range(num_samples): |
|
bits = [random.randint(0, 1) for _ in range(num_bits)] |
|
parity = sum(bits) % 2 |
|
data.append(bits) |
|
labels.append(parity) |
|
|
|
return torch.tensor(data, dtype=torch.float32), torch.tensor(labels, dtype=torch.float32).reshape(-1, 1) |
|
|
|
train_data, train_labels = generate_data(1000, N) |
|
test_data, test_labels = generate_data(100, N) |
|
|
|
|
|
|
|
# Feed-forward network: input_size inputs, num_hidden_layers hidden layers of hidden_size ReLU neurons, and a Sigmoid output layer of output_size neurons.
class ParityNet(nn.Module):
|
def __init__(self, input_size, hidden_size, num_hidden_layers, output_size): |
|
super(ParityNet, self).__init__() |
|
layers = [] |
|
layers.append(nn.Linear(input_size, hidden_size)) |
|
layers.append(nn.ReLU()) |
|
for _ in range(num_hidden_layers - 1): |
|
layers.append(nn.Linear(hidden_size, hidden_size)) |
|
layers.append(nn.ReLU()) |
|
layers.append(nn.Linear(hidden_size, output_size)) |
|
layers.append(nn.Sigmoid()) |
|
self.layers = nn.Sequential(*layers) |
|
|
|
|
|
def forward(self, x): |
|
return self.layers(x) |
|
|
|
|
|
|
|
# Build the model: N input bits, L hidden layers of hidden_layer_size neurons, one sigmoid output.
model = ParityNet(N, hidden_layer_size, L, 1)
|
|
|
|
|
# Binary cross-entropy loss on the sigmoid output; Adam optimizer.
criterion = nn.BCELoss()
|
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate) |
|
|
|
|
|
# Record the per-epoch loss and enable interactive plotting for the live loss display.
losses = []
|
plt.ion() |
|
fig, ax = plt.subplots() |
|
|
|
|
|
# Training loop: full-batch forward pass, loss, backpropagation, and weight update each epoch.
for epoch in range(epochs):
|
|
|
outputs = model(train_data) |
|
loss = criterion(outputs, train_labels) |
|
|
|
|
|
optimizer.zero_grad() |
|
loss.backward() |
|
optimizer.step() |
|
|
|
|
|
losses.append(loss.item()) |
|
|
|
|
|
|
|
if loss.item() < min_loss_threshold: |
|
print(f"Reached minimum loss threshold of {min_loss_threshold} at epoch {epoch+1}. Stopping training.") |
|
break |
|
|
|
if (epoch + 1) % 100 == 0: |
|
print(f'Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.4f}') |
|
|
|
|
|
ax.clear() |
|
ax.plot(losses) |
|
ax.set_title("Training Loss") |
|
ax.set_xlabel("Epoch (x 100)") |
|
ax.set_ylabel("Loss") |
|
plt.draw() |
|
plt.pause(0.01) |
|
|
|
|
|
plt.ioff() |
|
plt.show() |
|
|
|
|
|
|
|
# Inference on the test set without tracking gradients; threshold sigmoid outputs at 0.5.
with torch.no_grad():
|
test_outputs = model(test_data) |
|
predicted = (test_outputs > 0.5).float() |
|
|
|
accuracy = (predicted == test_labels).sum() / len(test_labels) |
|
print(f'Test Accuracy: {accuracy:.4f}') |
|
print("Predictions", predicted.flatten()) |
|
|
|
print("Labels ", test_labels.flatten()) |
|
|
|
|
|
margins_ones = test_outputs[predicted == 1] - 0.5 |
|
margins_zeros = 0.5 - test_outputs[predicted == 0] |
|
|
|
|
|
|
|
if margins_ones.numel() > 0: |
|
min_margin_ones = margins_ones.min().item() |
|
max_margin_ones = margins_ones.max().item() |
|
avg_margin_ones = margins_ones.mean().item() |
|
|
|
print(f"Min Margin (Ones): {min_margin_ones:.2f}") |
|
print(f"Max Margin (Ones): {max_margin_ones:.2f}") |
|
print(f"Avg Margin (Ones): {avg_margin_ones:.2f}") |
|
print("Margins (Ones):", margins_ones.flatten().numpy()) |
|
else: |
|
print("No predictions of 1 in the test dataset.") |
|
|
|
|
|
if margins_zeros.numel() > 0: |
|
|
|
min_margin_zeros = margins_zeros.min().item() |
|
max_margin_zeros = margins_zeros.max().item() |
|
avg_margin_zeros = margins_zeros.mean().item() |
|
|
|
print(f"Min Margin (Zeros): {min_margin_zeros:.2f}") |
|
print(f"Max Margin (Zeros): {max_margin_zeros:.2f}") |
|
print(f"Avg Margin (Zeros): {avg_margin_zeros:.2f}") |
|
print("Margins (Zeros):", margins_zeros.flatten().numpy()) |
|
|
|
else: |
|
print("No predictions of 0 in the test dataset.") |
|
|
|
|
|
|