Scratch Chess Engine - Game of Kings

birdracerthree

S_P_A_R_T wrote:

ArnoHu wrote:

Scratch Chess Engine Ranking (Scratch 3 Runtime)

Rank	Name		Elo	+	-	games	score	oppo.	draws
1	GoK		1713	160	124	25	100%	28	0%
2	Element		1542	210	246	6	33%	283	0%
3	Bonsai		1524	180	168	10	60%	125	0%
4	White Dove	1422	180	185	9	44%	122	0%
5	Archimedes	1400	167	165	10	50%	87	20%
6	HarleyK		1340	260	311	4	25%	184	0%
7	The Turk	1335	204	238	7	29%	128	0%
8	Shallow Blue	1331	204	204	5	50%	9	20%
9	Frenchgamerlol	1317	242	251	4	38%	51	25%
10	LowDoor		1315	220	228	5	40%	46	0%
11	Chip		1307	196	229	6	25%	96	17%
12	Scurious	1299	190	190	5	50%	-49	60%
13	Wolverine	1275	305	470	3	0%	231	0%
14	Pseudo		1271	331	479	2	0%	172	0%
15	U0		1237	342	481	2	0%	164	0%
16	Mystery		1185	273	402	3	0%	107	0%
17	Midecah		1136	253	410	4	0%	87	0%

Scratch Chess Engine Ranking (TurboWarp Runtime)

Rank	Name		Elo	+	-	games	score	oppo.	draws
1	GoK		2114	118	91	52	90%	59	4%
2	White Dove	1796	64	66	72	40%	184	17%
3	Element		1766	64	66	72	38%	172	15%
4	Bonsai		1649	156	144	11	59%	-53	27%
5	Thundershark	1543	146	163	12	25%	50	17%
6	Shallow Blue	1529	258	298	3	17%	69	33%
7	Scurious	1502	143	164	10	25%	-59	30%

Interesting stuff! I wonder how WD vs Bonsai games will go, especially considering that I've fixed a few S3 WD related issues.

We’ll see, but those WD 6.5 games against Element really lowered its rating (pre 6+7 WD didn’t have limited quiescence on S3 runtime, causing losing captures). I’m running some S3 runtime games right now. I think that some S3 vs TurboWarp games might help to adjust GoK's rating on S3.

Last edited by birdracerthree (March 26, 2024 20:59:38)

ArnoHu

birdracerthree wrote:
S_P_A_R_T wrote:
ArnoHu wrote:
Scratch Chess Engine Ranking (Scratch 3 Runtime)
Rank	Name		Elo	+	-	games	score	oppo.	draws
1	GoK		1713	160	124	25	100%	28	0%
2	Element		1542	210	246	6	33%	283	0%
3	Bonsai		1524	180	168	10	60%	125	0%
4	White Dove	1422	180	185	9	44%	122	0%
5	Archimedes	1400	167	165	10	50%	87	20%
6	HarleyK		1340	260	311	4	25%	184	0%
7	The Turk	1335	204	238	7	29%	128	0%
8	Shallow Blue	1331	204	204	5	50%	9	20%
9	Frenchgamerlol	1317	242	251	4	38%	51	25%
10	LowDoor		1315	220	228	5	40%	46	0%
11	Chip		1307	196	229	6	25%	96	17%
12	Scurious	1299	190	190	5	50%	-49	60%
13	Wolverine	1275	305	470	3	0%	231	0%
14	Pseudo		1271	331	479	2	0%	172	0%
15	U0		1237	342	481	2	0%	164	0%
16	Mystery		1185	273	402	3	0%	107	0%
17	Midecah		1136	253	410	4	0%	87	0%
Scratch Chess Engine Ranking (TurboWarp Runtime)
Rank	Name		Elo	+	-	games	score	oppo.	draws
1	GoK		2114	118	91	52	90%	59	4%
2	White Dove	1796	64	66	72	40%	184	17%
3	Element		1766	64	66	72	38%	172	15%
4	Bonsai		1649	156	144	11	59%	-53	27%
5	Thundershark	1543	146	163	12	25%	50	17%
6	Shallow Blue	1529	258	298	3	17%	69	33%
7	Scurious	1502	143	164	10	25%	-59	30%
Interesting stuff! I wonder how WD vs Bonsai games will go, especially considering that I've fixed a few S3 WD related issues.
We’ll see, but those WD 6.5 games against Element really lowered its rating (pre 6+7 WD didn’t have limited quiescence on S3 runtime, causing losing captures). I’m currently running some S3 runtime games right now. I think that some S3 vs Turbowarp games might help to adjust the ratings on GoK on S3

I will apply a sliding-window approach: as more games come in over time, I will age out the older ones. GoK also has some draws in there that were caused by a regression.
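
For illustration, a minimal sketch of how aging out older games could work, assuming a fixed-size window over the most recent results. The class name, the window size, and the idea of counting games rather than dates are all assumptions; the actual ranking tool may do this differently.

```python
from collections import deque


class SlidingWindowResults:
    """Keep only the most recent `max_games` results; older games age out."""

    def __init__(self, max_games):
        # A deque with maxlen drops the oldest entry automatically on append.
        self.games = deque(maxlen=max_games)

    def add(self, white, black, score_white):
        # score_white: 1.0 = White wins, 0.5 = draw, 0.0 = Black wins
        self.games.append((white, black, score_white))

    def score_percent(self, engine):
        """Score percentage of `engine` over the games still in the window."""
        points = 0.0
        played = 0
        for white, black, score_white in self.games:
            if engine == white:
                points += score_white
                played += 1
            elif engine == black:
                points += 1.0 - score_white
                played += 1
        return None if played == 0 else 100.0 * points / played


# Hypothetical results: with a window of 2, the first game (a draw) ages out.
window = SlidingWindowResults(max_games=2)
window.add("GoK", "Element", 0.5)
window.add("GoK", "Element", 1.0)
window.add("Element", "GoK", 0.0)
```

With these made-up results, only the last two games count, so an early draw caused by a regression would no longer affect the score once enough newer games exist.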

ArnoHu

I constantly run into the missing-a1-rook-on-startup problem with Element. It is a race condition during startup (when-flag-clicked). As a quick fix I added a wait(1) there in the Element v1.4862 sprite, which is ugly but works. Maybe you want to consider introducing structured startup handling by one central controller that broadcasts startup messages in a fixed order?
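
As a rough sketch of the central-controller idea, here is a Python model of what a chain of "broadcast and wait" blocks would do in Scratch. The phase names and the class are hypothetical and are not taken from Element; the point is only that each startup phase finishes before the next one begins, regardless of the order in which the scripts were registered.

```python
class StartupController:
    """Runs startup phases in a fixed order, like successive 'broadcast and wait' blocks."""

    def __init__(self, phases):
        self.phases = list(phases)
        self.handlers = {phase: [] for phase in self.phases}

    def when_i_receive(self, phase, handler):
        # Register a script for a phase; registration order across phases does not matter.
        self.handlers[phase].append(handler)

    def green_flag(self):
        # Only the controller reacts to the green flag; everything else waits for its phase.
        for phase in self.phases:
            for handler in self.handlers[phase]:
                handler()  # every handler of a phase completes before the next phase starts


log = []
controller = StartupController(["init board", "init evaluation", "draw pieces"])
# Handlers are registered in a scrambled order on purpose.
controller.when_i_receive("draw pieces", lambda: log.append("pieces drawn"))
controller.when_i_receive("init board", lambda: log.append("board ready"))
controller.when_i_receive("init evaluation", lambda: log.append("evaluation ready"))
controller.green_flag()
print(log)  # ['board ready', 'evaluation ready', 'pieces drawn']
```

Compared with a wait(1), this removes the dependency on timing: the pieces are never drawn before the board exists.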

Two good games by both engines, GoK and Element:

Game #1 (Scratch 3): GoK (Medium) vs. Element (3+8), GoK wins in 58 moves, 96% vs. 89% accuracy: https://lichess.org/study/v3EKTlR2/lxH2g7Gh
Game #2 (TurboWarp): GoK (Medium) vs. Element (5+8), GoK wins in 37 moves, 98% vs. 88% accuracy: https://lichess.org/study/oWyPldeN/F482kGTb

Last edited by ArnoHu (March 27, 2024 01:59:18)

birdracerthree

ArnoHu wrote:
I constantly run into the missing-a1-rook-on-startup problem with Element. It is a race condition during startup (when-flag-clicked). As a quick fix I added a wait(1) there in the Element v1.4862 sprite, which is ugly but works. Maybe you want to consider to introduce a structured startup handling by one central controller, broadcasting a certain order of startup messages?

Two good games by both engines, GoK and Element:

Game #1 (Scratch 3): GoK (Medium) vs. Element (3+8), GoK wins in 58 moves, 96% vs. 89% accuracy: https://lichess.org/study/v3EKTlR2/lxH2g7Gh
Game #2 (TurboWarp): GoK (Medium) vs. Element (5+8), GoK wins in 37 moves, 98% vs. 88% accuracy: https://lichess.org/study/oWyPldeN/F482kGTb

I’ll fix it tomorrow. I should be able to generate the evaluation board first on startup; that will be enough to stop the race condition.
As promised, I have fixed the issue.

Why Element 5+8 over 6+8?

I have put 3 Element vs WD games into the Element vs Engines study (S3 runtime). The result is 1.5-1.5, a lot better for WD than last time. 6.93 helped WD a lot.

Last edited by birdracerthree (March 27, 2024 13:15:57)

ArnoHu

birdracerthree wrote:
ArnoHu wrote:
I constantly run into the missing-a1-rook-on-startup problem with Element. It is a race condition during startup (when-flag-clicked). As a quick fix I added a wait(1) there in the Element v1.4862 sprite, which is ugly but works. Maybe you want to consider to introduce a structured startup handling by one central controller, broadcasting a certain order of startup messages?

Two good games by both engines, GoK and Element:

Game #1 (Scratch 3): GoK (Medium) vs. Element (3+8), GoK wins in 58 moves, 96% vs. 89% accuracy: https://lichess.org/study/v3EKTlR2/lxH2g7Gh
Game #2 (TurboWarp): GoK (Medium) vs. Element (5+8), GoK wins in 37 moves, 98% vs. 88% accuracy: https://lichess.org/study/oWyPldeN/F482kGTb
I’ll fix it tomorrow. I should be able to generate the evaluation board first on startup; that will be enough to stop the condition.

Why Element 5+8 over 6+8?

Because GoK was also on Medium, hence similar think time. 6+8 frequently runs into browser timeouts (although it recovers). I will let it play against GoK Difficult next.

Here it is:

Game #3 (TurboWarp): GoK (Difficult) vs. Element (6+8), GoK wins in 44 moves, 95% vs. 87% accuracy: https://lichess.org/study/oWyPldeN/gDiUUu4Z

I paid closer attention: Element is faster now than it used to be.

Last edited by ArnoHu (March 27, 2024 02:57:10)

ArnoHu

Hi all,

I was finally able to address the issue with dynamic evaluations and the way they bubble up the node evaluation tree and are stored in the transposition table. Dynamic evaluations (AKA eval fixups) mainly compensate for limited search depth, in cases like throwing bishops and knights against pawn shelters (maybe capturing pawn + rook), with short-term positional gains but long-term material loss. Any evaluation containing such dynamic components cannot be re-used in the next search run (e.g. because the capture sequence might already have started). The main issue was with standing pat, which by definition is often dynamic.
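
To make the idea concrete, here is a minimal sketch of one possible way to keep dynamic evaluations out of a transposition table. It is not GoK's actual implementation: the `Eval` type, the `dynamic` flag, and the choice to skip storing (rather than, say, storing with a marker) are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Eval:
    score: int     # centipawns, from the point of view of the side to move
    dynamic: bool  # True if an eval fixup contributed to this score


def negate(e):
    # Negamax sign flip; the dynamic flag travels with the score.
    return Eval(-e.score, e.dynamic)


def best_of(child_evals):
    """Negamax step: pick the best child, so its dynamic flag bubbles up to the parent."""
    return max((negate(c) for c in child_evals), key=lambda e: e.score)


class TranspositionTable:
    def __init__(self):
        self.entries = {}

    def store(self, key, depth, e):
        if e.dynamic:
            return False  # not reusable in a later search run, so do not cache it
        self.entries[key] = (depth, e.score)
        return True

    def probe(self, key, depth):
        entry = self.entries.get(key)
        if entry is not None and entry[0] >= depth:
            return entry[1]
        return None


tt = TranspositionTable()
# Hypothetical children: a static score and a score that includes a fixup.
parent = best_of([Eval(30, False), Eval(-50, True)])
stored_dynamic = tt.store("position A", 4, parent)
stored_static = tt.store("position B", 4, Eval(10, False))
```

In this sketch the parent inherits the dynamic flag from the child it selected, so "position A" is never cached, while the purely static "position B" can be reused by any search of depth 4 or less.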

I fixed that now in the GoK Dev version at https://scratch.mit.edu/projects/828094886 - the change was not trivial, and it is still undergoing testing. So I would appreciate data on any test games you might be running in the meantime, or on any mistakes you encounter.

The gain so far seems to be a ~30% speedup during midgame on TurboWarp (mainly when there are enough capture sequences at some point during search), which translates to 0.5 plies of search depth gained (on average), thanks to improved transposition table node eval cache hits.
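
As a back-of-the-envelope check of how a speedup relates to depth, the sketch below assumes that search time grows roughly like b^d for an effective branching factor b and depth d, and reads "~30% speedup" as a factor of 1.3. Both the model and the function names are assumptions, not measurements from GoK.

```python
import math


def depth_gain(speedup, branching_factor):
    """Extra plies reachable in the same time, assuming time ~ branching_factor ** depth."""
    return math.log(speedup) / math.log(branching_factor)


def implied_branching_factor(speedup, plies_gained):
    """Effective branching factor for which `speedup` buys exactly `plies_gained` plies."""
    return speedup ** (1.0 / plies_gained)


b = implied_branching_factor(1.3, 0.5)  # about 1.69
gain = depth_gain(1.3, b)               # 0.5 plies, up to rounding
```

Under this simple model, a 1.3x speedup matching 0.5 plies corresponds to an effective branching factor of about 1.69; with a larger branching factor the same speedup would buy less depth.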

Thank you!

HasiLover

Scurious 2.1 may just beat GoK Difficult; it's up a queen against rook, bishop and pawn. Stockfish says it's -5 for GoK.
Sadly, it blundered a pawn, and GoK will have 3 strong passed pawns 3 squares away from promotion.

The new GoK version beats Scurious 2.1 in 50 moves without a queen: https://lichess.org/SSI22lCx#99

Last edited by HasiLover (March 27, 2024 10:02:21)

ArnoHu

HasiLover wrote:
Scurious 2.1 May just beat GoK Difficult, its up a Queen against Rook and Bishop and Pawn. Stockfish says its -5 for GoK
It sadly blundered a Pawn and GoK will have 3 strong Passed Pawns 3 Squares away from Promotion

Oh no! I am very interested in PGN data. Scratch 3 I suppose? Scurious on 4 or 5 plies?

Material-wise, RBP > Q. Stockfish likely sees something far beyond the search horizon of both engines.

Last edited by ArnoHu (March 27, 2024 10:00:31)

HasiLover

ArnoHu wrote:
HasiLover wrote:
Scurious 2.1 May just beat GoK Difficult, its up a Queen against Rook and Bishop and Pawn. Stockfish says its -5 for GoK
It sadly blundered a Pawn and GoK will have 3 strong Passed Pawns 3 Squares away from Promotion

Oh no! I am very interested in PGN data. Scratch 3 I suppose? Scurious on 4 or 5 plies?

Material-wise RBP > Q. Stockfish likely sees something far beyond their search horizon.

I ran it on TW, but on my PC that's about the same thing as S3.

ArnoHu

HasiLover wrote:
ArnoHu wrote:
HasiLover wrote:
Scurious 2.1 May just beat GoK Difficult, its up a Queen against Rook and Bishop and Pawn. Stockfish says its -5 for GoK
It sadly blundered a Pawn and GoK will have 3 strong Passed Pawns 3 Squares away from Promotion

Oh no! I am very interested in PGN data. Scratch 3 I suppose? Scurious on 4 or 5 plies?

Material-wise RBP > Q. Stockfish likely sees something far beyond their search horizon.
I ran it on TW, but with my PC thats like the same thing as S3

Do you have PGN export?

HasiLover

ArnoHu wrote:
HasiLover wrote:
ArnoHu wrote:
HasiLover wrote:
Scurious 2.1 May just beat GoK Difficult, its up a Queen against Rook and Bishop and Pawn. Stockfish says its -5 for GoK
It sadly blundered a Pawn and GoK will have 3 strong Passed Pawns 3 Squares away from Promotion

Oh no! I am very interested in PGN data. Scratch 3 I suppose? Scurious on 4 or 5 plies?

Material-wise RBP > Q. Stockfish likely sees something far beyond their search horizon.
I ran it on TW, but with my PC thats like the same thing as S3

Do you have PGN export?

What? The link is in the old message: https://lichess.org/SSI22lCx#99

birdracerthree

ArnoHu wrote:
birdracerthree wrote:
ArnoHu wrote:
I constantly run into the missing-a1-rook-on-startup problem with Element. It is a race condition during startup (when-flag-clicked). As a quick fix I added a wait(1) there in the Element v1.4862 sprite, which is ugly but works. Maybe you want to consider to introduce a structured startup handling by one central controller, broadcasting a certain order of startup messages?

Two good games by both engines, GoK and Element:

Game #1 (Scratch 3): GoK (Medium) vs. Element (3+8), GoK wins in 58 moves, 96% vs. 89% accuracy: https://lichess.org/study/v3EKTlR2/lxH2g7Gh
Game #2 (TurboWarp): GoK (Medium) vs. Element (5+8), GoK wins in 37 moves, 98% vs. 88% accuracy: https://lichess.org/study/oWyPldeN/F482kGTb
I’ll fix it tomorrow. I should be able to generate the evaluation board first on startup; that will be enough to stop the condition.

Why Element 5+8 over 6+8?

Because GoK was also on Medium, hence similar think time. 6+8 frequently runs into browser timeouts (although it recovers), I will let it play against GoK Difficult next.

Here it is:

Game #3 (TurboWarp): GoK (Difficult) vs. Element (6+8), GoK wins in 44 moves, 95% vs. 87% accuracy: https://lichess.org/study/oWyPldeN/gDiUUu4Z

I paid closer attention, Element is faster now than it used to be.

That’s strange… I checked my notes and it looks like the last speed upgrade to Element was v1.48 Full Release (that was months ago). I’ll have to look into better King’s Gambit lines in the meantime.

ArnoHu

HasiLover wrote:
ArnoHu wrote:
HasiLover wrote:
ArnoHu wrote:
HasiLover wrote:
Scurious 2.1 May just beat GoK Difficult, its up a Queen against Rook and Bishop and Pawn. Stockfish says its -5 for GoK
It sadly blundered a Pawn and GoK will have 3 strong Passed Pawns 3 Squares away from Promotion

Oh no! I am very interested in PGN data. Scratch 3 I suppose? Scurious on 4 or 5 plies?

Material-wise RBP > Q. Stockfish likely sees something far beyond their search horizon.
I ran it on TW, but with my PC thats like the same thing as S3

Do you have PGN export?
What? The Link is in the old Message.https://lichess.org/SSI22lCx#99

Thanks, I didn't re-read.

GoK's blunder at move 9 would have taken another 19 plies to lead to any material loss, far beyond the search horizon of any Scratch chess engine. The worst deficit was 3.2; Element and WD have held similar or larger advantages during the midgame.

Last edited by ArnoHu (March 27, 2024 21:21:55)

ArnoHu

ArnoHu wrote:
Hi all,

I finally was able to address the issue with dynamic evaluations (AKA eval fixups, that mainly compensate for limited search depth, like throwing bishops and knights against pawn shelters (maybe capturing pawn + rook), with short-term positional gains but long-term material loss), and the way they bubble up the node evaluation tree and are stored in the transposition table. Any evaluation containing such dynamic components cannot be re-used in the next search run (e.g. as the capture sequence might have started already). The main issue was with standing pat, which by definition often is dynamic.

I fixed that now in the GoK Dev version at https://scratch.mit.edu/projects/828094886 - the change was not trivial, and it is still undergoing testing. So I would appreciated data on any test games you might be doing in the meantime, resp. mistakes you might encounter

The gain so far seems to be ~30% speedup during midgame on TurboWarp (mainly when there are enough capture-sequences at some point during search), which translate to 0,5 plies of search depth gained (in average), thanks to improved transposition table node eval cache hits.

Thank you!

Game #1: GoK 6.405 (TW, Medium) wins against GoK 6.404 in 31 (!) moves, 96% vs. 86% accuracy: https://lichess.org/UPPIDAq5#62
Game #2: GoK 6.405 (S3, Medium) wins against GoK 6.404 in 62 moves, 93% vs. 89% accuracy: https://lichess.org/N9hmF029#123

Last edited by ArnoHu (March 27, 2024 20:52:28)

S_P_A_R_T

White Dove v7.0 has been released!

This version tuned LMR (now it's only forced to search the first 3 moves at full depth), and also fixed the NMP by changing it to fail-soft, instead of fail-hard, because WD itself is fail-soft.
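
For readers unfamiliar with the terms, here is a small sketch of the two ideas. Only the "first 3 moves at full depth" threshold comes from the post; the minimum depth, the exemption for tactical moves, and the one-ply reduction are common choices assumed for illustration, not WD's actual values.

```python
def lmr_reduction(move_index, depth, is_tactical, full_depth_moves=3):
    """Plies to reduce a move by under late move reductions (LMR).

    The first `full_depth_moves` moves in the ordered list are always searched
    at full depth; later quiet moves are searched shallower.
    """
    if move_index < full_depth_moves or is_tactical or depth < 3:
        return 0
    return 1


def null_move_cutoff(null_score, beta, fail_soft=True):
    """Value to return after a null-move search, or None if there is no cutoff.

    Fail-hard clamps the returned value to beta; fail-soft returns the actual
    score, which may be higher than beta.
    """
    if null_score < beta:
        return None
    return null_score if fail_soft else beta


soft = null_move_cutoff(120, 100)                   # 120
hard = null_move_cutoff(120, 100, fail_soft=False)  # 100
```

Mixing a fail-hard null-move return into an otherwise fail-soft search discards the information about how far above beta the score was, which is presumably the inconsistency this release fixes.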

This should hopefully make WD stronger on both S3 and TW!

Check out Space Program Simulator!

In it, you can build your own rockets from a variety of parts!
Then fly it with realistic orbital mechanics.

Go to orbit, explore different planets, share your save codes, and do so much more!

If you would like to help out on the project or chat about space or really anything else, check out the official SPS Studio!

For more information & tutorials, check out the official forum post!

birdracerthree

S_P_A_R_T wrote:
White Dove v7.0 has been released!

This version tuned LMR (now it's only forced to search the first 3 moves at full depth), and also fixed the NMP by changing it to fail-soft, instead of fail-hard, because WD itself is fail-soft.

This should hopefully make WD stronger on both S3 and TW!

You don't think that the LMR is too aggressive with WD's (relatively) poor move ordering?

Flashback : 1r1k2r1/p2b2b1/Q1np3p/1ppN1p1B/2P1P2P/1P4P1/Pq1N1P2/3KRR2 b - - 2 35

Last edited by birdracerthree (March 27, 2024 22:11:52)

S_P_A_R_T

birdracerthree wrote:
S_P_A_R_T wrote:
White Dove v7.0 has been released!

This version tuned LMR (now it's only forced to search the first 3 moves at full depth), and also fixed the NMP by changing it to fail-soft, instead of fail-hard, because WD itself is fail-soft.

This should hopefully make WD stronger on both S3 and TW!
You don't think that the LMR is too aggressive with WD's (relatively) poor move ordering?

Flashback : 1r1k2r1/p2b2b1/Q1np3p/1ppN1p1B/2P1P2P/1P4P1/Pq1N1P2/3KRR2 b - - 2 35

WD never really was able to solve this position, so I'm not super concerned. I also think that this more aggressive LMR (hopefully) shouldn't have that much of an impact on tactical positions, but I guess only time will tell…

ArnoHu

S_P_A_R_T wrote:
White Dove v7.0 has been released!

This version tuned LMR (now it's only forced to search the first 3 moves at full depth), and also fixed the NMP by changing it to fail-soft, instead of fail-hard, because WD itself is fail-soft.

This should hopefully make WD stronger on both S3 and TW!

Congrats, zero-mistake game by WD (black) until move 51: https://lichess.org/study/oWyPldeN/adThgqCX . GoK won it in the end, at 96% vs. 92% accuracy.

S_P_A_R_T

ArnoHu wrote:
S_P_A_R_T wrote:
White Dove v7.0 has been released!

This version tuned LMR (now it's only forced to search the first 3 moves at full depth), and also fixed the NMP by changing it to fail-soft, instead of fail-hard, because WD itself is fail-soft.

This should hopefully make WD stronger on both S3 and TW!

Congrats, zero-mistake game by WD (black) until move 51: https://lichess.org/study/oWyPldeN/adThgqCX . GoK won it at the end at 96% vs. 92% accuracy.

Cool game!

(Also, could you turn on analysis & export on this study too? Thx!)

ArnoHu

S_P_A_R_T wrote:
ArnoHu wrote:
S_P_A_R_T wrote:
White Dove v7.0 has been released!

This version tuned LMR (now it's only forced to search the first 3 moves at full depth), and also fixed the NMP by changing it to fail-soft, instead of fail-hard, because WD itself is fail-soft.

This should hopefully make WD stronger on both S3 and TW!

Congrats, zero-mistake game by WD (black) until move 51: https://lichess.org/study/oWyPldeN/adThgqCX . GoK won it at the end at 96% vs. 92% accuracy.

Cool game!

(Also, could you turn on analysis & export on this study too? Thx!)

Done!
