|
## Metrics to track
|
- loss per epoch per model boost layer

- number of errors per epoch per model boost layer

- number of resolved puzzles per epoch

- threshold per epoch per model layer

- number of filled digits per model boost layer per epoch, for both pis and abs
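
A minimal sketch of how these could be accumulated, assuming a single nested dict keyed by metric name and (epoch, layer). The helper name `log_metric` and the placeholder values are made up here; `pis` and `abs` are kept as named in the notes.

```python
from collections import defaultdict

# Hypothetical container: metrics[metric_name][(epoch, layer)] -> value.
metrics = defaultdict(dict)

def log_metric(name, epoch, layer, value):
    """Record one value for a given epoch and model boost layer."""
    metrics[name][(epoch, layer)] = value

# Example usage during one epoch (values are placeholders):
log_metric("loss", epoch=0, layer=0, value=0.42)
log_metric("errors", epoch=0, layer=0, value=3)
log_metric("resolved_puzzles", epoch=0, layer=0, value=118)
log_metric("threshold", epoch=0, layer=0, value=-10.0)
log_metric("filled_digits_pis", epoch=0, layer=0, value=5120)
log_metric("filled_digits_abs", epoch=0, layer=0, value=4870)
```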
|
|
|
## TODO |
|
- convert the Jupyter notebook to a Python file
|
- compute the threshold on the test set (adding a gap) each epoch, and initialise the training threshold with the test threshold, letting it evolve at each error during training.
|
|
|
## Possible approach
|
- it might be smart to store the intermediate states in per-boost-layer "buffers"; at the end, the first X go to model layer 0. Let's write it as pseudo-code.
|
|
|
### Method threshold |
|
```
global init
    th = -10

training step
    init
        pass
    training loop
        keep th just behind the error limit

validation step
    init
        compute_th = -10
    validation loop
        keep compute_th behind the error limit + margin
        but make the decisions with th
    end of epoch
        th = compute_th
```
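
A minimal Python sketch of this scheme, assuming `scores` are the model's confidence scores and an "error" is a wrong prediction whose score would pass the threshold. The helper name `keep_behind_errors` and the `GAP` value are assumptions of the sketch.

```python
import numpy as np

GAP = 0.5  # extra margin kept on the validation side (assumption)

def keep_behind_errors(threshold, scores, correct, margin=0.0):
    """Raise the threshold just above the highest-scoring wrong prediction,
    so that accepted predictions (score >= threshold) stay error-free."""
    wrong = scores[~correct]
    if wrong.size:
        threshold = max(threshold, float(wrong.max()) + margin)
    return threshold

# global init
th = -10.0

# --- training step ---
# for each training batch: th = keep_behind_errors(th, scores, correct)

# --- validation step ---
compute_th = -10.0
# for each validation batch:
#     decisions use the frozen th: accepted = scores >= th
#     compute_th = keep_behind_errors(compute_th, scores, correct, margin=GAP)

# --- end of epoch ---
# th = compute_th  # the training threshold restarts from the validation estimate
```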
|
|
|
### Method training |
|
```
Xs -> the initial x batch vector
Y  -> the y batch vector

Xs' = M0(Xs)

filter Xs' == Y  -> resolved sudokus, removed from the flow
filter Xs' == Xs -> no progress, add these rows to the X1 buffer
the remaining rows of Xs' are added back to the X0 buffer
```
|
|
|
Then we look at each buffer X0 to Xn and process every one whose size is >= the batch size.
|
|
|
When every buffer is smaller than the batch size, the process is finished.
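
A minimal sketch of this routing, assuming each boost layer returns a grid array with the same shape as its input (grids flattened to `(batch, 81)`). For simplicity it keeps plain per-layer Python lists of `(X, Y)` rows instead of the `Buffers` object sketched in the next block; `models`, `batch_size`, and carrying `Y` alongside `X` are assumptions of the sketch.

```python
import numpy as np

# buffers = [[] for _ in range(len(models) + 1)]
# (the last list catches grids that even the final layer cannot progress on)

def route_batch(buffers, layer, Xs, Y, model):
    """Run one boost layer on a batch and dispatch each grid."""
    Xp = model(Xs)                                # Xs' = M_layer(Xs)
    resolved = np.all(Xp == Y, axis=1)            # fully solved grids drop out
    stuck = np.all(Xp == Xs, axis=1) & ~resolved  # no new digit: escalate to layer+1
    progressed = ~resolved & ~stuck               # progress made: retry this layer

    buffers[layer + 1].extend(zip(Xp[stuck], Y[stuck]))
    buffers[layer].extend(zip(Xp[progressed], Y[progressed]))
    return int(resolved.sum())

def run(buffers, models, batch_size):
    """Keep draining any buffer that still holds at least one full batch."""
    solved = 0
    while True:
        ready = [i for i, b in enumerate(buffers[:len(models)])
                 if len(b) >= batch_size]
        if not ready:                 # every buffer smaller than batch_size
            return solved
        layer = ready[0]
        rows = [buffers[layer].pop() for _ in range(batch_size)]
        Xb = np.stack([x for x, _ in rows])
        Yb = np.stack([y for _, y in rows])
        solved += route_batch(buffers, layer, Xb, Yb, models[layer])
```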
|
|
|
Buffers object interface:
|
```
Buffers
    get_batch(limit_batch_size=True) -> idx, Xb
        # Xb can be None; otherwise it is a shuffled sample drawn from buffer idx
    add_batch(Xp, idx)
        # append the rows of Xp to buffer idx
```
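
A minimal sketch of this interface, assuming one row buffer per model boost layer. The internal layout (a list of Python lists) and the removal of sampled rows are assumptions of the sketch.

```python
import numpy as np

class Buffers:
    def __init__(self, n_layers, batch_size):
        self.batch_size = batch_size
        self._rows = [[] for _ in range(n_layers)]

    def add_batch(self, Xp, idx):
        """Append the rows of Xp to buffer idx."""
        self._rows[idx].extend(np.asarray(Xp))

    def get_batch(self, limit_batch_size=True):
        """Return (idx, Xb) for the first buffer holding enough rows.

        Xb is a shuffled sample of at most batch_size rows, removed from the
        buffer. Returns (None, None) when no buffer qualifies."""
        for idx, rows in enumerate(self._rows):
            if not rows:
                continue
            if limit_batch_size and len(rows) < self.batch_size:
                continue
            order = np.random.permutation(len(rows))[: self.batch_size]
            Xb = np.stack([rows[i] for i in order])
            taken = set(order.tolist())
            self._rows[idx] = [r for i, r in enumerate(rows) if i not in taken]
            return idx, Xb
        return None, None
```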
|
|
|
|
|
### Loss optimisation |
|
The 0 and 1 targets should be treated differently by gradient descent.
|
A y==0 point is easy: its score should be as low as possible, and I think we can use the usual log-loss on it.
|
y==1 is different: several cases are possible:
|
- the point could be "unpredictable" in that case the gradient descend should be tuned to low, we expect the predictive function to have a low score. |
|
- the point could be well predicted: in that case we hope the score is pretty high and we would like to weight the gradient step more heavily.
|
This could be implemented by weighting the loss with a sigmoid centered on the threshold, as in the sketch below.
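
A minimal PyTorch sketch of this weighting, assuming raw logits, a binary target vector, and the current threshold `th`; the temperature `tau` and the exact form of the weight are assumptions of the sketch.

```python
import torch
import torch.nn.functional as F

def boost_loss(logits, targets, th, tau=1.0):
    """BCE loss where the y==1 terms are weighted by a sigmoid centered on th."""
    per_point = F.binary_cross_entropy_with_logits(
        logits, targets, reduction="none"
    )
    # y==0: usual log-loss, full weight.
    # y==1: weight grows from ~0 (far below the threshold, "unpredictable")
    #       to ~1 (well above the threshold, worth pushing harder).
    weight = torch.sigmoid((logits.detach() - th) / tau)
    weight = torch.where(targets > 0.5, weight, torch.ones_like(weight))
    return (weight * per_point).mean()
```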
|
|
|
|
|
### Paper writing |
|
|
|
The higher levels use various types of chains:
|
|
|
11.6 Dynamic + Dynamic Forcing Chains (145-192 nodes): Cell Forcing Chains

11.7 Dynamic + Dynamic Forcing Chains (193-288 nodes): Double Forcing Chains

These Dynamic Forcing Chains are a form of trial and error.
|
|
|
### Trial and error solving technique |
|
We apply a trial-and-error solving technique to reach 100% accuracy on Sudoku. The reasoning is simple: we find the best digit/position to test and produce two child grids, one with the digit placed and one without it. We then process each grid until one of them breaks Sudoku's rules.
|
|
|
The V1 of this algorithm should stop after a single trial-and-error test (no binary tree search); it should be simpler and feasible, and if not, we should still see an improvement and can try the next step.
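
A minimal sketch of this V1 step, assuming a grid is represented as a dict mapping cell -> set of candidate digits, and that `solve()` (the normal, non-branching solver) and `violates_rules()` already exist; these representations and names are assumptions of the sketch.

```python
import copy

def pick_guess(grid):
    """Choose the cell with the fewest remaining candidates (>1) and a digit to test."""
    cell = min((c for c, cands in grid.items() if len(cands) > 1),
               key=lambda c: len(grid[c]))
    return cell, next(iter(grid[cell]))

def trial_and_error_step(grid, solve, violates_rules):
    """Branch once: one child asserts the guess, the other forbids it."""
    cell, digit = pick_guess(grid)

    with_digit = copy.deepcopy(grid)
    with_digit[cell] = {digit}                    # child 1: place the digit

    without_digit = copy.deepcopy(grid)
    without_digit[cell] = grid[cell] - {digit}    # child 2: remove the candidate

    for child in (with_digit, without_digit):
        result = solve(child)                     # run the normal solver on the child
        if not violates_rules(result):
            return result                         # keep the surviving branch
    raise ValueError("both branches violate the rules: inconsistent input grid")
```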
|
|
|
|