Scratch Chess Engine - Game of Kings

waabooboo

Arno, what is your experience with pruning mechanisms that skip moves entirely near the leaves (late move pruning, history leaf pruning, things of that nature)?

I tried several of these pruning methods, and in every case it harmed Wolverine's play – node counts increased on most positions. I imagine GoK's excellent evaluation allows for more aggressive pruning of this type, since GoK will be more accurate with its evals near the leaves. Still, I don't understand why Wolverine can't use even conservative late move pruning without completely destabilizing the search… I would greatly appreciate any wisdom you have to share on this subject

ArnoHu

I once tried razoring and SEE-based late pruning, but they didn't work well. I have not implemented late move pruning yet. In general, not all pruning techniques pay off on Scratch (or TW), because the overhead ratios differ. E.g. I try to avoid running the evaluation close to the leaves for Classic (which some approaches require), as it is too expensive.

Last edited by ArnoHu (Nov. 23, 2025 09:09:06)

ArnoHu

November 2025 Tournament standings before final round:

1. GoK NNUE       6.0 - 0.0
2. GoK Classic    5.0 - 1.0
3. Black Crow     4.5 - 1.5
4. White Dove     3.0 - 3.0
5. Wolverine      2.0 - 4.0
6. Delta          1.5 - 4.5
7. Shallow Blue   1.0 - 5.0
7. TurboKnight    1.0 - 5.0

https://lichess.org/broadcast/november-2025-scratch-chess-engine-tournament/112025-scet-round-6/bbTCH8zY#players

Final results (BH tie-breaker):

1. GoK NNUE       7.0 - 0.0
2. Black Crow     5.5 - 1.5
3. GoK Classic    5.0 - 2.0
4. White Dove     4.0 - 3.0
5. Shallow Blue   2.0 - 5.0
6. Wolverine      2.0 - 5.0
7. Delta          1.5 - 5.5
8. TurboKnight    1.0 - 6.0

Last edited by ArnoHu (Nov. 23, 2025 15:15:05)

waabooboo

ArnoHu wrote:
I once tried razoring and SEE-based late pruning, but they didn't work well. I have not implemented late move pruning yet. In general, not all pruning techniques pay off on Scratch (or TW), because the overhead ratios differ. E.g. I try to avoid running the evaluation close to the leaves for Classic (which some approaches require), as it is too expensive.

Thanks, that gives me some great ideas! LMP often made Wolverine's search tree smaller for a while, but errors accumulate – eventually I would end up with a garbage tree, and the branching factor would explode.

Maybe razoring can help me, since in Wolverine's case eval is cheap but move generation is expensive. I will give it a shot.
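
For anyone following along, here is a minimal Python sketch of what late move pruning does to a search, and how it can go wrong when move ordering is poor. It is not Wolverine's or GoK's code: the nested-list toy tree, the `LMP_LIMIT` table and the `search` helper are all made up for illustration, and a real engine would normally restrict this pruning to quiet moves and skip it when in check.

```python
INF = 10**9

# Hypothetical limit table: at 1 ply of remaining depth, search only the first 2 moves.
LMP_LIMIT = {1: 2}


def negamax(node, depth, alpha, beta, use_lmp, stats):
    """Alpha-beta negamax over a toy tree.

    A node is either an int (static score for the side to move there)
    or a list of child nodes, assumed to be ordered best-first.
    """
    stats["nodes"] += 1
    if isinstance(node, int):
        return node
    best = -INF
    for index, child in enumerate(node):
        # Late move pruning: near the leaves, skip everything after the first few moves.
        if use_lmp and depth in LMP_LIMIT and index >= LMP_LIMIT[depth]:
            break
        score = -negamax(child, depth - 1, -beta, -alpha, use_lmp, stats)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # ordinary beta cutoff
    return best


def search(tree, depth, use_lmp):
    """Return (score, nodes visited) for the root position."""
    stats = {"nodes": 0}
    score = negamax(tree, depth, -INF, INF, use_lmp, stats)
    return score, stats["nodes"]


# Opponent replies sorted best-first (lowest leaf score first): pruning is harmless.
ORDERED = [[1, 3, 4, 5], [2, 6, 7, 8]]
# The opponent's best replies come late: pruning hides them.
MISORDERED = [[3, 5, 1, 4], [6, 2, 7, 8]]

for name, tree in (("ordered", ORDERED), ("misordered", MISORDERED)):
    print(name, "full:", search(tree, 2, False), "lmp:", search(tree, 2, True))
```

On the well-ordered tree the pruned search returns the same score (2) from 7 nodes instead of 11. On the misordered tree it also visits only 7 nodes but returns 3 instead of 2, because the opponent's best replies were among the skipped moves. That is the kind of error that can accumulate in a real search.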

ArnoHu

GoK Classic vs. Wolverine 2, 97% vs. 91%:
https://lichess.org/study/SaWdnTo2/6HrmARiT

GoK Classic vs. Wolverine 2, 91% vs. 72%:
https://lichess.org/study/SaWdnTo2/iPvyQPVp

GoK Classic vs. Wolverine 2 (15sec), 96% vs. 92%:
https://lichess.org/study/SaWdnTo2/9hYyzSwE

GoK Classic vs. White Dove, 91% vs. 91%:
https://lichess.org/study/SaWdnTo2/QR08vJWr

GoK Classic vs. Wolverine 2, 94% vs. 94%:
https://lichess.org/study/SaWdnTo2/KBnHbupw

GoK Classic vs. Wolverine 2, 88% vs. 86%:
https://lichess.org/study/SaWdnTo2/W1gORcVB

GoK NNUE vs. Black Crow, 96% vs. 90%:
https://lichess.org/study/SaWdnTo2/kk0gYXln

GoK Classic vs. Wolverine 2, 93% vs. 91%:
https://lichess.org/study/SaWdnTo2/9JQSSekd

GoK Classic vs. Shallow Blue 3 NNUE, 92% vs. 81%:
https://lichess.org/study/SaWdnTo2/35xi9xQh

GoK Classic vs. White Dove, 90% vs. 91%:
https://lichess.org/study/SaWdnTo2/GDhfiZQn

GoK Classic vs. TurboKnight 4, 86% vs. 81%:
https://lichess.org/study/SaWdnTo2/qCHBxV93

GoK Classic vs. White Dove, 95% vs. 90%:
https://lichess.org/study/SaWdnTo2/IkY4FY8T

GoK Classic vs. Wolverine 2, 88% vs. 85%:
https://lichess.org/study/SaWdnTo2/Ao7WhLve

GoK NNUE vs. Black Crow, 97% vs. 87%:
https://lichess.org/study/SaWdnTo2/hMcw0CkK

GoK Classic vs. Wolverine 2, 88% vs. 82%:
https://lichess.org/study/SaWdnTo2/He0543eP

GoK Classic vs. Wolverine 2, 85% vs. 78% (GoK had regression, fixed now):
https://lichess.org/study/SaWdnTo2/YUgusS4Z

GoK NNUE vs. Black Crow, 97% vs. 88%:
https://lichess.org/study/SaWdnTo2/RMYdgjvz

GoK Classic vs. Wolverine 2, 93% vs. 89%:
https://lichess.org/study/SaWdnTo2/0AmUDPOI

GoK NNUE vs. Black Crow, 97% vs. 82%:
https://lichess.org/study/SaWdnTo2/BPPbpzQf

GoK Classic vs. Wolverine 2, 98% vs. 92%:
https://lichess.org/study/SaWdnTo2/nkjR0ZCV

Last edited by ArnoHu (Nov. 23, 2025 18:25:56)

Destructor_chess

TurboKnight's bug is really boring… As I said, I will switch to beta 50 or 52 and stop updating this engine.

iceysnowman

Hi everybody!
I’m really confused about how depth-first search and iterative deepening work. Take a negamax framework, for example. We have all of the moves and so on, but how exactly do we traverse depth-first as opposed to best-first? Doesn’t it just keep going through the moves of one line, then backtrack and go through more lines, and so on? How do you make it go depth-first in the first place? I’ve read the CPW and other articles many times and I’m still confused, so if anybody has a good explanation of how all of this works, it would be very much appreciated.

ArnoHu

I tried to find a YT animation, but the quality of the explanations and visuals is really low on whatever I saw there. The best result for me was this SO posting with a page copied from a book: https://stackoverflow.com/a/52530436/31939130
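
In case code helps more than a picture, here is a small Python sketch. It uses a toy game rather than chess (an assumption made purely to keep it self-contained): players alternately take 1 to 3 stones, and whoever takes the last stone wins. The names `legal_moves`, `negamax` and `iterative_deepening` are only illustrative. The point is that plain recursion already is depth-first: each call follows one line down to the horizon, returns, and only then does the loop try the next sibling move. Iterative deepening is just an outer loop that repeats the same search with depth 1, 2, 3, and so on.

```python
def legal_moves(stones):
    """A player may take 1, 2 or 3 stones, but not more than are left."""
    return [take for take in (1, 2, 3) if take <= stones]


def negamax(stones, depth, trace=None, line=()):
    """Score the position for the side to move: +1 win, -1 loss, 0 unknown."""
    if trace is not None:
        trace.append(line)  # record the order in which lines are visited
    if stones == 0:
        return -1  # the opponent just took the last stone, so we lost
    if depth == 0:
        return 0   # horizon reached: no real evaluation in this toy, call it even
    best = -2
    for take in legal_moves(stones):
        # Recursing here is what makes the traversal depth-first: this call only
        # returns after the whole subtree below this move has been searched.
        score = -negamax(stones - take, depth - 1, trace, line + (take,))
        best = max(best, score)
    return best


def iterative_deepening(stones, max_depth):
    """Repeat the depth-first search with growing depth; keep the latest best move."""
    best_move, best_score = None, 0
    for depth in range(1, max_depth + 1):
        best_score = -2
        for take in legal_moves(stones):
            score = -negamax(stones - take, depth - 1)
            if score > best_score:
                best_move, best_score = take, score
        # A real engine would check the clock here and stop when time runs out.
    return best_move, best_score


trace = []
negamax(3, 2, trace)
print(trace)                      # visiting order of the move sequences
print(iterative_deepening(5, 3))  # best move and score after the depth-3 iteration
```

The printed trace is `[(), (1,), (1, 1), (1, 2), (2,), (2, 1), (3,)]`: everything below the first move is finished before the second move is even looked at. A best-first search would instead keep a list of open nodes and always expand the most promising one, jumping around the tree.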

Last edited by ArnoHu (Nov. 24, 2025 21:04:55)

ArnoHu

GoK Classic vs. Wolverine 2, 92% vs. 89%:
https://lichess.org/study/SaWdnTo2/dknBzU2B

GoK Classic (dev version) vs. Wolverine 2, 95% vs. 93%:
https://lichess.org/study/SaWdnTo2/aZEPMqU6

Last edited by ArnoHu (Nov. 25, 2025 17:09:25)

waabooboo

https://www.chessprogramming.org/Kaufman_Test

Have you guys tried this? I am curious how the other engines do. It's a good test and shouldn't be too much effort (took me maybe 20 minutes to set things up so Wolverine could automatically run through all the positions and get a score at the end).

Wolverine's results:

100k nodes per move – 5/25
1M nodes per move (similar to the 5 sec/move time control on my system) – 14/25
5M nodes per move – 17/25
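
For anyone who wants to set up the same kind of run, here is a rough Python sketch of an EPD test harness. It is not Wolverine's actual setup: `parse_epd`, `score_suite` and the `pick_move` callback are hypothetical names, the two EPD lines are made-up toy records (not positions from the Kaufman suite), and the parser takes the shortcut of splitting operations on `;`, which would break on a quoted operand that contains a semicolon.

```python
def parse_epd(line):
    """Split an EPD record into its 4-field position and a dict of operations."""
    parts = line.strip().split(maxsplit=4)
    fen = " ".join(parts[:4])
    operations = {}
    if len(parts) == 5:
        for chunk in parts[4].split(";"):
            chunk = chunk.strip()
            if chunk:
                opcode, _, operand = chunk.partition(" ")
                operations[opcode] = operand.strip().strip('"')
    return fen, operations


def normalize(san):
    """Drop check, mate and annotation suffixes so 'Ra8#' and 'Ra8' compare equal."""
    return san.rstrip("+#!?")


def score_suite(epd_lines, pick_move):
    """Count the positions where the engine's move is one of the 'bm' moves.

    pick_move is a stand-in for the engine: it takes the position string and
    returns the chosen move in SAN after searching to the node limit.
    """
    solved = 0
    for line in epd_lines:
        fen, operations = parse_epd(line)
        best_moves = {normalize(move) for move in operations.get("bm", "").split()}
        if normalize(pick_move(fen)) in best_moves:
            solved += 1
    return solved, len(epd_lines)


# Toy records that only exercise the parser; they are not real test positions.
TOY_SUITE = [
    '6k1/5ppp/8/8/8/8/8/R6K w - - bm Ra8#; id "toy.1";',
    '7k/8/8/8/8/8/8/K6R b - - bm Kg8 Kg7; id "toy.2";',
]

# Stub "engine" with canned answers, standing in for a real search.
CANNED = {
    "6k1/5ppp/8/8/8/8/8/R6K w - -": "Ra8",
    "7k/8/8/8/8/8/8/K6R b - -": "Kh7",
}

print(score_suite(TOY_SUITE, CANNED.get))
```

With the stub above, the harness reports one solved position out of two. Swapping `CANNED.get` for a function that runs the real search with a node limit gives the x/25 style score.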

S_P_A_R_T_Test

The next time I find time to work on WD / BC, I'll add support for EPD testing and let you all know the WD and BC scores.

internet44

interesting! I have never seen that before, but that is for sure worth it as a means to compare our engines. I'll cobble something together real quick to get testing. just, well sb isn't set up for node limits, I'm counting nodes while undoing so it will always slightly overshoot the limit, I hope that won't skew results. first test looks acceptable tho, I set the limiter to 10000 and the search stopped at 10039. hopefully that's close enough to have a valid comparison

I suspect sb will do quite poorly on this lol

will report back with results in a couple minutes

ArnoHu

https://scratch.mit.edu/discuss/post/7720081/

internet44

ah, this has been discussed before? that was way before my time
anyway, here are SB's results (not great as I thought, I might try switching back to the old LMR config to see if that does better)

100k - 1/25
1m - 9/25
5m - 12/25

internet44

a little better:

100k - 2/25
1m - 11/25
5m - 12/25 (interestingly not the same 12 it found last time)

ArnoHu

GoK Classic:

100k:  8 / 25
  1m: 16 / 25
  5m: 19 / 25

Although I must admit that two of the correct 100k findings were dropped again before reaching 1m. Do they still count for 100k? I thought yes; after all, they were found at 100k.

Last edited by ArnoHu (Nov. 27, 2025 21:24:02)

waabooboo

Yes, that's how I scored things as well. I just gathered the moves suggested after 100k, 1M, and 5M. I'm not sure how often Wolverine changed its mind between those three checks, but usually once it finds the right move it doesn't look back…
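
To spell out that scoring rule, here is a small Python sketch. It assumes the engine logs a `(nodes searched, best move)` pair every time its best move changes. That log format, the function names `move_at` and `score_checkpoints`, and the sample data are all invented for illustration. Each checkpoint is scored on its own, so a move that is found early and dropped later still counts for the earlier checkpoint.

```python
import bisect


def move_at(history, node_limit):
    """Return the best move known once node_limit nodes have been searched.

    history is a list of (nodes_searched, best_move) pairs sorted by node count.
    """
    counts = [nodes for nodes, _ in history]
    index = bisect.bisect_right(counts, node_limit)
    return history[index - 1][1] if index else None


def score_checkpoints(results, checkpoints=(100_000, 1_000_000, 5_000_000)):
    """Count solved positions separately for every node checkpoint.

    results is a list of (history, best_moves) pairs, one per test position.
    """
    totals = {limit: 0 for limit in checkpoints}
    for history, best_moves in results:
        for limit in checkpoints:
            if move_at(history, limit) in best_moves:
                totals[limit] += 1
    return totals


# Invented sample data: the first position is solved at 7k nodes and lost again
# at 101k; the second one is only solved after 500k nodes.
SAMPLE = [
    ([(1_000, "a"), (7_000, "X"), (101_000, "b")], {"X"}),
    ([(2_000, "c"), (500_000, "Y")], {"Y"}),
]

print(score_checkpoints(SAMPLE))
```

In this sample every checkpoint scores 1 out of 2, but for different reasons: the first position only counts at 100k, the second only at 1M and 5M.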

internet44

I just found a fairly major bug related to TT horizon flags. here's the results of the test with this taken care of:

100k - 4
1m - 15
5m - 16

I'll do some more testing and then push that to main later probably. I doubt I would have noticed that if it wasn't for you bringing up the kaufman test @waabooboo, so thanks for that. this explains the weirdly inconsistent results of the last 2 tests and probably some of the blunders from recent games

Destructor_chess

Node counts are for the listed depth only; times are measured from the start of the search.
1. depth 12, 6.6Mnodes, not found (55 seconds)
2. depth 5, 31Knodes, found (0.45 seconds)
3. depth 8, 896Knodes, found (stopped to think about the bad move at depth 8 and found the 1st bm at depth 8, according to stockfish)
4. depth 9, 706Knodes, found (4.1 seconds)
5. depth 8, 213Knodes, found (1.5 seconds)
6. depth 8, 201Knodes, found (stopped to think about the bad move at depth 8 and found the 2nd bm at depth 8, according to stockfish)
7. depth 4, 8.2Knodes, found (0.16 seconds)
8. depth 13, 12.8Mnodes, not found (53 seconds)
9. unfinished depth 13, found (1 minute)
10. depth 8, 105Knodes, found (1.7 seconds)
11. unfinished depth 14, found (1 minute)
12. depth 13, 2.3Mnodes, not found (31.5 seconds)
13. depth 11, 1.64Mnodes, found (7.4 seconds)
14. unfinished depth 12, found (1 minute)
15. depth 4, 24Knodes, found (0.2 seconds), stopped to see, depth 7, 246Knodes (1.2 seconds), re-found, depth 10, 8.5Mnodes (42 seconds)
16. unfinished depth 23, found (1 minute)
17. depth 7, 183Knodes, found (1.1 seconds)
18. depth 14, 8.1Mnodes, not found (59 seconds)
19. depth 5, 74Knodes, found (0.5 seconds)
20. depth 18, 11.3Mnodes, found (40 seconds)
21. depth 18, 10.4Mnodes, found (35 seconds)
22. depth 21, 4.1Mnodes, not found (56.5 seconds)
23. depth 10, 2Mnodes, not found (16.5 seconds)
24. depth 12, 353Knodes, found (3 seconds)
25. depth 17, 5.3Mnodes, not found (31 seconds)

that's for turboknight 5 of course

Last edited by Destructor_chess (Nov. 28, 2025 18:12:31)

ArnoHu

Kaufman Test with current GoK Classic:

Problem    Found    Node count (k)    Comment
 1:        Y                 4,154
 2:        Y                   155
 3:        Y                     7    Lost again at 101k
 4:        Y                 1,127
 5:        Y                   248
 6:        Y                     0    Lost again at 22k
 7:        Y                   253
 8:        Y                 1,341
 9:        Y                   397
10:        Y                    40
11:        Y                   131
12:
13:        Y                   618
14:        Y                 1,859
15:        Y                   285
16:        Y                   106
17:        Y                   167
18:
19:        Y                    59
20:        Y                   565
21:
22:        Y                     2
23:        Y                     1    Lost again at 647k
24:        Y                    52
25:

Last edited by ArnoHu (Nov. 28, 2025 20:08:00)
