|
## Metrics to track
|
- loss per epoch per model boost layer

- number of errors per epoch per model boost layer

- number of resolved puzzles per epoch

- threshold per epoch per model layer

- number of filled digits per model boost layer per epoch, for both pis and abs
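
A minimal sketch of how these could be accumulated, assuming a single nested dict keyed by metric name and (epoch, layer). The helper name `log_metric` and the placeholder values are made up here; `pis` and `abs` are kept as named in the notes.

```python
from collections import defaultdict

# Hypothetical container: metrics[metric_name][(epoch, layer)] -> value.
metrics = defaultdict(dict)

def log_metric(name, epoch, layer, value):
    """Record one value for a given epoch and model boost layer."""
    metrics[name][(epoch, layer)] = value

# Example usage during one epoch (values are placeholders):
log_metric("loss", epoch=0, layer=0, value=0.42)
log_metric("errors", epoch=0, layer=0, value=3)
log_metric("resolved_puzzles", epoch=0, layer=0, value=118)
log_metric("threshold", epoch=0, layer=0, value=-10.0)
log_metric("filled_digits_pis", epoch=0, layer=0, value=5120)
log_metric("filled_digits_abs", epoch=0, layer=0, value=4870)
```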
|
|
|
## TODO |
|
- convert the Jupyter notebook to a Python file
|
- compute the threshold on the test set (adding a gap) each epoch, and initialise the training threshold with the test threshold, letting it evolve at each error during training.
|
|
|
## Possible approach
|
- it might be smart to store the intermediate states in per-boost-layer "buffers"; at the end, the first X go to model layer 0. Let's write it as pseudo-code.
|
|
|
### Method threshold |
|
```
global init
    th = -10

training step
    init
        pass
    training loop
        keep th just behind the error limit

validation step
    init
        compute_th = -10
    validation loop
        keep compute_th behind the error limit + margin
        but make the decisions with th
    end of epoch
        th = compute_th
```
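
A minimal Python sketch of this scheme, assuming `scores` are the model's confidence scores and an "error" is a wrong prediction whose score would pass the threshold. The helper name `keep_behind_errors` and the `GAP` value are assumptions of the sketch.

```python
import numpy as np

GAP = 0.5  # extra margin kept on the validation side (assumption)

def keep_behind_errors(threshold, scores, correct, margin=0.0):
    """Raise the threshold just above the highest-scoring wrong prediction,
    so that accepted predictions (score >= threshold) stay error-free."""
    wrong = scores[~correct]
    if wrong.size:
        threshold = max(threshold, float(wrong.max()) + margin)
    return threshold

# global init
th = -10.0

# --- training step ---
# for each training batch: th = keep_behind_errors(th, scores, correct)

# --- validation step ---
compute_th = -10.0
# for each validation batch:
#     decisions use the frozen th: accepted = scores >= th
#     compute_th = keep_behind_errors(compute_th, scores, correct, margin=GAP)

# --- end of epoch ---
# th = compute_th  # the training threshold restarts from the validation estimate
```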
|
|
|
### Method training |
|
```
Xs -> the initial x batch vector
Y  -> the y batch vector

Xs' = M0(Xs)

filter Xs' == Y  -> resolved sudokus, removed from the flow
filter Xs' == Xs -> no progress, add these rows to the X1 buffer
the remaining rows of Xs' are added back to the X0 buffer
```
|
|
|
Then we look at each buffer X0 to Xn and process every one whose size is >= the batch size.
|
|
|
When every buffer is smaller than the batch size, the process is finished.
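
A minimal sketch of this routing, assuming each boost layer returns a grid array with the same shape as its input (grids flattened to `(batch, 81)`). For simplicity it keeps plain per-layer Python lists of `(X, Y)` rows instead of the `Buffers` object sketched in the next block; `models`, `batch_size`, and carrying `Y` alongside `X` are assumptions of the sketch.

```python
import numpy as np

# buffers = [[] for _ in range(len(models) + 1)]
# (the last list catches grids that even the final layer cannot progress on)

def route_batch(buffers, layer, Xs, Y, model):
    """Run one boost layer on a batch and dispatch each grid."""
    Xp = model(Xs)                                # Xs' = M_layer(Xs)
    resolved = np.all(Xp == Y, axis=1)            # fully solved grids drop out
    stuck = np.all(Xp == Xs, axis=1) & ~resolved  # no new digit: escalate to layer+1
    progressed = ~resolved & ~stuck               # progress made: retry this layer

    buffers[layer + 1].extend(zip(Xp[stuck], Y[stuck]))
    buffers[layer].extend(zip(Xp[progressed], Y[progressed]))
    return int(resolved.sum())

def run(buffers, models, batch_size):
    """Keep draining any buffer that still holds at least one full batch."""
    solved = 0
    while True:
        ready = [i for i, b in enumerate(buffers[:len(models)])
                 if len(b) >= batch_size]
        if not ready:                 # every buffer smaller than batch_size
            return solved
        layer = ready[0]
        rows = [buffers[layer].pop() for _ in range(batch_size)]
        Xb = np.stack([x for x, _ in rows])
        Yb = np.stack([y for _, y in rows])
        solved += route_batch(buffers, layer, Xb, Yb, models[layer])
```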
|
|
|
Buffers object interface:
|
```
Buffers
    get_batch(limit_batch_size=True) -> idx, Xb
        # Xb can be None; otherwise it is a shuffled sample drawn from buffer idx
    add_batch(Xp, idx)
        # append the rows of Xp to buffer idx
```
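
A minimal sketch of this interface, assuming one row buffer per model boost layer. The internal layout (a list of Python lists) and the removal of sampled rows are assumptions of the sketch.

```python
import numpy as np

class Buffers:
    def __init__(self, n_layers, batch_size):
        self.batch_size = batch_size
        self._rows = [[] for _ in range(n_layers)]

    def add_batch(self, Xp, idx):
        """Append the rows of Xp to buffer idx."""
        self._rows[idx].extend(np.asarray(Xp))

    def get_batch(self, limit_batch_size=True):
        """Return (idx, Xb) for the first buffer holding enough rows.

        Xb is a shuffled sample of at most batch_size rows, removed from the
        buffer. Returns (None, None) when no buffer qualifies."""
        for idx, rows in enumerate(self._rows):
            if not rows:
                continue
            if limit_batch_size and len(rows) < self.batch_size:
                continue
            order = np.random.permutation(len(rows))[: self.batch_size]
            Xb = np.stack([rows[i] for i in order])
            taken = set(order.tolist())
            self._rows[idx] = [r for i, r in enumerate(rows) if i not in taken]
            return idx, Xb
        return None, None
```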
|
|
|
|
|
### Loss optimisation |
|
The 0 and 1 targets should be treated differently by gradient descent.
|
A y==0 point is easy: its score should be as low as possible, and I think we can use the usual log-loss on it.
|
y==1 is different: several cases are possible:
|
- the point could be "unpredictable" in that case the gradient descend should be tuned to low, we expect the predictive function to have a low score. |
|
- the point could be well predicted: in that case we hope the score is pretty high and we would like to weight the gradient step more heavily.
|
This could be implemented by weighting the loss with a sigmoid centered on the threshold, as in the sketch below.
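
A minimal PyTorch sketch of this weighting, assuming raw logits, a binary target vector, and the current threshold `th`; the temperature `tau` and the exact form of the weight are assumptions of the sketch.

```python
import torch
import torch.nn.functional as F

def boost_loss(logits, targets, th, tau=1.0):
    """BCE loss where the y==1 terms are weighted by a sigmoid centered on th."""
    per_point = F.binary_cross_entropy_with_logits(
        logits, targets, reduction="none"
    )
    # y==0: usual log-loss, full weight.
    # y==1: weight grows from ~0 (far below the threshold, "unpredictable")
    #       to ~1 (well above the threshold, worth pushing harder).
    weight = torch.sigmoid((logits.detach() - th) / tau)
    weight = torch.where(targets > 0.5, weight, torch.ones_like(weight))
    return (weight * per_point).mean()
```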
|
|
|
|
|
### Paper writing |
|
|
|
The higher levels use various types of chains:
|
|
|
11.6 Dynamic + Dynamic Forcing Chains (145-192 nodes): Cell Forcing Chains

11.7 Dynamic + Dynamic Forcing Chains (193-288 nodes): Double Forcing Chains

These Dynamic Forcing Chains are a form of trial and error.
|
|
|
### Trial and error solving technique |
|
We apply a trial-and-error solving technique to reach 100% accuracy on Sudoku. The reasoning is simple: we find the best digit/position to test and produce two child grids, one with the digit placed and one without it. We then process each grid until one of them breaks Sudoku's rules.
|
|
|
The V1 of this algorithm should stop after a single trial-and-error test (no binary tree search); it should be simpler and feasible, and if not, we should still see an improvement and can try the next step.
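
A minimal sketch of this V1 step, assuming a grid is represented as a dict mapping cell -> set of candidate digits, and that `solve()` (the normal, non-branching solver) and `violates_rules()` already exist; these representations and names are assumptions of the sketch.

```python
import copy

def pick_guess(grid):
    """Choose the cell with the fewest remaining candidates (>1) and a digit to test."""
    cell = min((c for c, cands in grid.items() if len(cands) > 1),
               key=lambda c: len(grid[c]))
    return cell, next(iter(grid[cell]))

def trial_and_error_step(grid, solve, violates_rules):
    """Branch once: one child asserts the guess, the other forbids it."""
    cell, digit = pick_guess(grid)

    with_digit = copy.deepcopy(grid)
    with_digit[cell] = {digit}                    # child 1: place the digit

    without_digit = copy.deepcopy(grid)
    without_digit[cell] = grid[cell] - {digit}    # child 2: remove the candidate

    for child in (with_digit, without_digit):
        result = solve(child)                     # run the normal solver on the child
        if not violates_rules(result):
            return result                         # keep the surviving branch
    raise ValueError("both branches violate the rules: inconsistent input grid")
```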
|
|
|
|