## Metrics to track
- loss per epoch per model boost layer
- number of errors per epoch per model boost layer
- number of resolved puzzles per epoch
- threshold per epoch per model layer
- number of filled digits per model boost layer per epoch, for both pis and abs
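A possible way to log these, as one record per (epoch, boost layer); the key names below are illustrative, not taken from the code:
```
from collections import defaultdict

# hypothetical metric store: metric name -> list of (epoch, layer, value) tuples
metrics = defaultdict(list)

def log_layer_epoch(epoch, layer, loss, n_errors, n_resolved, threshold, n_filled):
    metrics["loss"].append((epoch, layer, loss))
    metrics["errors"].append((epoch, layer, n_errors))
    metrics["resolved_puzzles"].append((epoch, layer, n_resolved))
    metrics["threshold"].append((epoch, layer, threshold))
    metrics["filled_digits"].append((epoch, layer, n_filled))
```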
## TODO
- convert the Jupyter notebook to a Python file
- compute the threshold on the test set (with an added gap) each epoch, and initialise the training threshold with the test threshold; it then evolves with each error during training.
## Possible way
- it might be smart to store the intermediate states in per-boost-layer "buffers"; at the end, the first X go to model layer 0. Let's write it as pseudocode:
### Method threshold
```
global init:
    th = -10
training step:
    init:
        pass
    training loop:
        keep th behind the error limit
validation step:
    init:
        compute_th = -10
    validation loop:
        keep compute_th behind the error limit + margin
        but use th for decisions
    end:
        th = compute_th
```
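A minimal Python sketch of this method, assuming `scores` are the per-digit confidences and `is_error` is a NumPy boolean mask of wrongly filled digits; the helper names and the default margin value are illustrative:
```
def keep_behind_error_limit(th, scores, is_error):
    """Raise th up to the highest score seen on a wrongly filled digit, so that
    digits accepted above th have never been observed to be errors."""
    error_scores = scores[is_error]
    if error_scores.size:
        th = max(th, float(error_scores.max()))
    return th

class ThresholdTracker:
    """Sketch of the pseudocode above: th is the threshold actually used, while a
    fresh compute_th is rebuilt on the validation set and adopted at the end."""
    def __init__(self):
        self.th = -10.0  # global init

    def training_step(self, scores, is_error):
        # training loop: keep th behind the error limit
        self.th = keep_behind_error_limit(self.th, scores, is_error)

    def validation_epoch(self, batches, margin=0.5):
        compute_th = -10.0  # validation init
        for scores, is_error in batches:
            # decisions still use self.th; compute_th only tracks the new limit
            compute_th = keep_behind_error_limit(compute_th, scores, is_error)
        self.th = compute_th + margin  # end: th = compute_th (+ margin)
        return self.th
```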
### Method training
```
Xs  -> the initial x batch
Y   -> the y batch
Xs' = M0(Xs)
then we filter Xs' == Y  -> resolved sudokus
Xs' == Xs -> no progress, we add those rows to the X1 buffer
and the remaining Xs' rows are added back to the X0 buffer
```
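A sketch of this routing step, assuming grids are flattened NumPy row vectors, `model_0` is a callable M0, and `buffers` is the object sketched below; none of these names come from the repo:
```
import numpy as np

def route_batch(Xs, Y, model_0, buffers):
    """One routing step as in the pseudocode above: drop the solved grids, send
    no-progress grids to the next layer's buffer, and send partially filled
    grids back to layer 0's buffer."""
    Xp = model_0(Xs)                              # Xs' = M0(Xs)
    solved = np.all(Xp == Y, axis=1)              # Xs' == Y  -> resolved sudokus
    stuck = np.all(Xp == Xs, axis=1) & ~solved    # Xs' == Xs -> no progress
    partial = ~solved & ~stuck                    # some digits filled, not resolved

    buffers.add_batch(Xp[stuck], 1)               # no-progress rows go to the X1 buffer
    buffers.add_batch(Xp[partial], 0)             # remaining rows go back to the X0 buffer
    return int(solved.sum())                      # number of resolved puzzles
```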
Then we look at each buffer X0 to Xn and process each one whose size is >= the batch size.
When every buffer is smaller than the batch size, the process is finished.
The buffer object interface:
```
Buffers
    get_batch(limit_batch_size=True) -> idx, Xb  # Xb can be None; Xb should be a shuffled sample of the buffer
    add_batch(Xp, idx)
```
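A possible Python sketch of this object, with the stop condition as a commented driver loop; the serving order and the shuffle policy are assumptions:
```
import random

class Buffers:
    """Sketch of the interface above: one list of grids per boost layer."""
    def __init__(self, n_layers, batch_size):
        self.batch_size = batch_size
        self.buffers = [[] for _ in range(n_layers)]

    def add_batch(self, Xp, idx):
        self.buffers[idx].extend(Xp)

    def get_batch(self, limit_batch_size=True):
        # serve the first buffer that is ready (>= batch_size rows if limited)
        for idx, buf in enumerate(self.buffers):
            if not buf:
                continue
            if limit_batch_size and len(buf) < self.batch_size:
                continue
            random.shuffle(buf)  # Xb is a shuffled sample of the buffer
            Xb, self.buffers[idx] = buf[:self.batch_size], buf[self.batch_size:]
            return idx, Xb
        return None, None        # Xb can be None: nothing ready, the process is finished

# driver loop: stop once every buffer is smaller than the batch size
# while True:
#     idx, Xb = buffers.get_batch()
#     if Xb is None:
#         break
#     # run the idx-th model layer on Xb and route the outputs, as in route_batch above
```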
### Loss optimisation
The 0 and 1 targets should be handled differently by gradient descent.
A y == 0 point is easy: its prediction should be as low as possible, and I think we can use the usual log-loss on it.
y == 1 is different: several cases are possible:
- the point could be "unpredictable": in that case the gradient descent should be tuned down, since we expect the predictive function to give it a low score.
- the point could be well predicted: in that case we expect the value to be pretty high and we would like the gradient descent to weight it more heavily.
This could be implemented by weighting the loss with a sigmoid centered on the threshold.
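A sketch of such a loss in PyTorch, assuming the model outputs logits and `th` is the current threshold; the weighting scheme below is one possible reading of the idea, not the repo's implementation:
```
import torch
import torch.nn.functional as F

def threshold_weighted_loss(logits, y, th, temperature=1.0):
    """Illustrative loss: plain log-loss on y == 0 points, and a log-loss on
    y == 1 points that is down-weighted when the logit sits well below the
    threshold (the "unpredictable" case). th and temperature are assumptions.
    y is a float tensor of 0s and 1s with the same shape as logits."""
    per_point = F.binary_cross_entropy_with_logits(logits, y, reduction="none")
    # sigmoid centered on the threshold: ~1 above th, ~0 far below it;
    # detached so the weight itself does not receive gradients
    w = torch.sigmoid((logits.detach() - th) / temperature)
    weight = torch.where(y > 0.5, w, torch.ones_like(w))
    return (weight * per_point).mean()
```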
### Paper writing
The higher levels rely on various types of chains:
- 11.6 Dynamic + Dynamic Forcing Chains (145-192 nodes), Cell Forcing Chains
- 11.7 Dynamic + Dynamic Forcing Chains (193-288 nodes), Double Forcing Chains
These Dynamic Forcing Chains are a form of trial and error.
### Trial and error solving technique
We applied a trial-and-error solving technique to reach 100% accuracy on Sudoku. The reasoning is simple: we find the best digit/position to test and produce 2 child grids, one with the number placed and one without, then we process each grid until one of them breaks Sudoku's rules.
The V1 of this algorithm should stop at a single trial-and-error test (no binary tree search); it should be simpler and feasible, and if not, we will still see an improvement and can try the next step.
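A sketch of a single trial-and-error step in Python; the grid interface (`place`, `remove_candidate`) and the helpers `pick_best_guess`, `solver`, `breaks_rules` are all hypothetical, used only to illustrate the branching:
```
import copy

def trial_and_error_step(grid, solver, pick_best_guess, breaks_rules):
    """One trial-and-error test (V1: a single branch, no binary tree search)."""
    row, col, digit = pick_best_guess(grid)      # best digit/position to test

    with_digit = copy.deepcopy(grid)
    with_digit.place(row, col, digit)            # child 1: the number is placed

    without_digit = copy.deepcopy(grid)
    without_digit.remove_candidate(row, col, digit)  # child 2: the number is ruled out

    # process each grid until one of them breaks Sudoku's rules
    for child in (with_digit, without_digit):
        result = solver(child)
        if not breaks_rules(result):
            return result                        # keep the consistent branch
    raise ValueError("both branches break the rules: the input grid was invalid")
```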