Scratch Chess Engine - Game of Kings

ArnoHu

S_P_A_R_T wrote:
ArnoHu wrote:
birdracerthree wrote:
S_P_A_R_T wrote:
What NPS should WD be aiming for on S3?

Because right now, even though it's search depth can get pretty far, it's mostly because of the aggressive LMR. I'm working on an update to hopefully add specialized extensions to help out in this area, but at the end of the day, NPS / Move Gen Speed, is going to be more important.
I would guess double of what it is now to start. WD's NPS is already poor (<20k NPS on TW is not good, 100 NPS on S3 isn't any better), maybe you should look into the double move generation. That should not be necessary.

Update on mate eval TT bug : The incorrect TT value is coming from the same depth as the correct one. No more info is known.

Good question, as mentioned before, NPS must be taken with a grain of salt. The whole goal of pruning is to have as little nodes visited as necessary for reaching a high search depth. An engine with none or very limited pruning will have high NPS without getting to any significant search depth. When you compare NPS of professional engines, you know they have great pruning in place, so no need to discuss.

E.g. I just imported “r3kb1r/pp1n1ppp/q1p1p3/3pPn2/3P2P1/1QN2N2/PPPB1P1P/2KR3R b kq - 0 11” to our engines on TurboWarp. Scurious is known to have the highest NPS. It was done searching ply 5 in 3.2 seconds for that board, that is without quiescence. It was “only” running on 175k NPS here. GoK also had 175k NPS, but was done with ply 5 search including quiescence in 0.3 seconds. And that gap will widen with any additional ply. Element was done in 2.2 seconds at 85k NPS, WD in 0.8 seconds at 16k NPS (although I doubt our NPS calculation is equivalent).

If you refer to improving move generator, moves-per-second is an important metric. As a baseline I can tell you that GoK's core move generator, when running standalone (no other code executed, no staged generation, no hash moves, etc), will be at 800,000 generated moves per second for the start board. So MPS in the 100,000s would be a good starting point.

When it comes to pruning via good move ordering, I would look at the average move list index of the best move found. I think you have that already in WD, right? And if there is a staged move generator, you can compare NPS and MPS. You want NPS to be close to MPS, because then you did not generate a lot of moves that were never applied.

With all that in place, just verify at the search depth you reach. You are right that LMR can be misleading, depth might be great, but it could prune things it should not. For that we can compare depth with LMR disabled, or having the same LMR configuration in place.

We must also keep in mind that system speed and current board do have a lot of impact on NPS, and so does quiescence search. In any way, based on where it is now, I guess 50k NPS might a be a good initial goal for WD. Why do you think NPS is low now? How can WD reach its search depth now? Are we sure it is not also about the way how NPS is calculated?

@birdracerthree, why don't you simply disable checkmate evals in TT for the time being? I don't think they have such a high impact. And I even have seen professional engines disabling them for quiescence search at least, and so does GoK.

Alright, thanks!

WD on the starting board has an average-move-list-index of around 1.5-1.6.

WD also calculates the NPS as the total “normal nodes” + the q-search nodes, and then divide that by the time used. (Normal nodes is increased by one at the start of every “minmax”, and the q-search nodes counter is increased at the start of every “q-search”.)

I'm not totally sure why the NPS is just so low, (even doubling the move gen speed would put it at less than half of Element's), but I have a feeling it's just some really poor code somewhere (similar to the progress/eval bar fix on S3).

(Also, how do you calculate Moves Per Second on the starting board?)

(Also also, here's a really cool 99% accuracy game I played yesterday IRL https://lichess.org/nwfYC30o#65 )

Do you increment node count on every evaluation? On every applied move? Or something else?

birdracerthree

Beating Chess Engines as fast as possible
White Dove v7.06 (black) - 1. e4 e5 2. Nf3 Nc6 3. Bc4 Nf6 4. Ng5 d5 5. exd5 Nxd5 6. Nxf7 Kxf7 7. Qf3+ Ke6 8. Nc3 Nb4 9. O-O c6 10. d4 Nxc2 11. dxe5 Nxa1 12. Rd1 Nc2 13. Nxd5 cxd5 14. Rxd5 Qf6 15. Rd4+ Kxe5 16. Qd5#

GoK 6.405 (black) - 1. e4 e6 2. d4 d5 3. Nc3 Bb4 4. e5 c5 5. a3 Bxc3+ 6. bxc3 Ne7 7. Qg4 O-O 8. Nf3 Qc7 9. Bd3 cxd4 10. O-O dxc3 11. Bxh7+ Kxh7 12. Ng5+ Kg8 13. Qh5 Rd8 14. Qxf7+ Kh8 15. Re1 Qxe5 16. Rxe5 Nf5 17. Rxf5 exf5 18. Bf4 Nc6 19. Bd6 d4 20. Bf8 Rxf8 21. Qxf8#

TShark (black) - 1. e4 e5 2. Nf3 Nc6 3. Bc4 f5 4. d4 exd4 5. e5 d6 6. g3 dxe5 7. Nh4 d3 8. Qh5+ g6 9. Nxg6 Bb4+ 10. c3 d2+ 11. Bxd2 Nf6 12. Bf7+ Kxf7 13. Nxe5+ Ke6 14. Qf7+ Kxe5 15. Bf4+ Ke4 16. Qc4+ Kf3 17. Qe2+ Kg2 18. f3+ Kxh1 19. Qf1+ Kxh2 20. g4#

Scurious 2.1 (black) - 1. Nf3 d5 2. c4 dxc4 3. Nc3 Nc6 4. e3 Bg4 5. Bxc4 Ne5 6. Nxe5 Bxd1 7. Bxf7#

Scurious 2.2 (black) - 1. Nf3 d5 2. c4 dxc4 3. Nc3 Nc6 4. e3 Bg4 5. Bxc4 g6 6. Bxf7+ Kxf7 7. Ng5+ Kf6 8. Qxg4 Nh6 9. Qf4+ Nf5 10. g4 e5 11. Qf3 Kxg5 12. gxf5 gxf5 13. Ne4+ fxe4 14. Rg1+ Kh6 15. Qh3+

I had one for Element, but I can't find the PGN.

Last edited by birdracerthree (April 4, 2024 19:40:17)

birdracerthree

ArnoHu wrote:
S_P_A_R_T wrote:
ArnoHu wrote:
birdracerthree wrote:
S_P_A_R_T wrote:
What NPS should WD be aiming for on S3?

*snip*
I would guess double of what it is now to start. WD's NPS is already poor (<20k NPS on TW is not good, 100 NPS on S3 isn't any better), maybe you should look into the double move generation. That should not be necessary.

Update on mate eval TT bug : The incorrect TT value is coming from the same depth as the correct one. No more info is known.

Good question, as mentioned before, NPS must be taken with a grain of salt. The whole goal of pruning is to have as little nodes visited as necessary for reaching a high search depth. An engine with none or very limited pruning will have high NPS without getting to any significant search depth. When you compare NPS of professional engines, you know they have great pruning in place, so no need to discuss.

E.g. I just imported “r3kb1r/pp1n1ppp/q1p1p3/3pPn2/3P2P1/1QN2N2/PPPB1P1P/2KR3R b kq - 0 11” to our engines on TurboWarp. Scurious is known to have the highest NPS. It was done searching ply 5 in 3.2 seconds for that board, that is without quiescence. It was “only” running on 175k NPS here. GoK also had 175k NPS, but was done with ply 5 search including quiescence in 0.3 seconds. And that gap will widen with any additional ply. Element was done in 2.2 seconds at 85k NPS, WD in 0.8 seconds at 16k NPS (although I doubt our NPS calculation is equivalent).

*snip*

We must also keep in mind that system speed and current board do have a lot of impact on NPS, and so does quiescence search. In any way, based on where it is now, I guess 50k NPS might a be a good initial goal for WD. Why do you think NPS is low now? How can WD reach its search depth now? Are we sure it is not also about the way how NPS is calculated?

@birdracerthree, why don't you simply disable checkmate evals in TT for the time being? I don't think they have such a high impact. And I even have seen professional engines disabling them for quiescence search at least, and so does GoK.

Alright, thanks!

WD on the starting board has an average-move-list-index of around 1.5-1.6.

WD also calculates the NPS as the total “normal nodes” + the q-search nodes, and then divide that by the time used. (Normal nodes is increased by one at the start of every “minmax”, and the q-search nodes counter is increased at the start of every “q-search”.)

I'm not totally sure why the NPS is just so low, (even doubling the move gen speed would put it at less than half of Element's), but I have a feeling it's just some really poor code somewhere (similar to the progress/eval bar fix on S3).

(Also, how do you calculate Moves Per Second on the starting board?)

(Also also, here's a really cool 99% accuracy game I played yesterday IRL https://lichess.org/nwfYC30o#65 )

Do you increment node count on every evaluation? On every applied move? Or something else?

"WD also calculates the NPS as the total “normal nodes” + the q-search nodes, and then divide that by the time used. (Normal nodes is increased by one at the start of every “minmax”, and the q-search nodes counter is increased at the start of every “q-search”.)"

I tried to do what you suggested, but it didn't work. Element Dev plays Rc8 in this position 4r1k1/6r1/7R/6pQ/p4p2/q7/6PP/6RK b - - 1 39, hanging mate in 1.

Where did 85k NPS come from? My NPS is under 60k.

Last edited by birdracerthree (April 5, 2024 02:56:16)

S_P_A_R_T

ArnoHu wrote:
S_P_A_R_T wrote:
ArnoHu wrote:
birdracerthree wrote:
S_P_A_R_T wrote:
What NPS should WD be aiming for on S3?

Because right now, even though it's search depth can get pretty far, it's mostly because of the aggressive LMR. I'm working on an update to hopefully add specialized extensions to help out in this area, but at the end of the day, NPS / Move Gen Speed, is going to be more important.
I would guess double of what it is now to start. WD's NPS is already poor (<20k NPS on TW is not good, 100 NPS on S3 isn't any better), maybe you should look into the double move generation. That should not be necessary.

Update on mate eval TT bug : The incorrect TT value is coming from the same depth as the correct one. No more info is known.

Good question, as mentioned before, NPS must be taken with a grain of salt. The whole goal of pruning is to have as little nodes visited as necessary for reaching a high search depth. An engine with none or very limited pruning will have high NPS without getting to any significant search depth. When you compare NPS of professional engines, you know they have great pruning in place, so no need to discuss.

E.g. I just imported “r3kb1r/pp1n1ppp/q1p1p3/3pPn2/3P2P1/1QN2N2/PPPB1P1P/2KR3R b kq - 0 11” to our engines on TurboWarp. Scurious is known to have the highest NPS. It was done searching ply 5 in 3.2 seconds for that board, that is without quiescence. It was “only” running on 175k NPS here. GoK also had 175k NPS, but was done with ply 5 search including quiescence in 0.3 seconds. And that gap will widen with any additional ply. Element was done in 2.2 seconds at 85k NPS, WD in 0.8 seconds at 16k NPS (although I doubt our NPS calculation is equivalent).

If you refer to improving move generator, moves-per-second is an important metric. As a baseline I can tell you that GoK's core move generator, when running standalone (no other code executed, no staged generation, no hash moves, etc), will be at 800,000 generated moves per second for the start board. So MPS in the 100,000s would be a good starting point.

When it comes to pruning via good move ordering, I would look at the average move list index of the best move found. I think you have that already in WD, right? And if there is a staged move generator, you can compare NPS and MPS. You want NPS to be close to MPS, because then you did not generate a lot of moves that were never applied.

With all that in place, just verify at the search depth you reach. You are right that LMR can be misleading, depth might be great, but it could prune things it should not. For that we can compare depth with LMR disabled, or having the same LMR configuration in place.

We must also keep in mind that system speed and current board do have a lot of impact on NPS, and so does quiescence search. In any way, based on where it is now, I guess 50k NPS might a be a good initial goal for WD. Why do you think NPS is low now? How can WD reach its search depth now? Are we sure it is not also about the way how NPS is calculated?

@birdracerthree, why don't you simply disable checkmate evals in TT for the time being? I don't think they have such a high impact. And I even have seen professional engines disabling them for quiescence search at least, and so does GoK.

Alright, thanks!

WD on the starting board has an average-move-list-index of around 1.5-1.6.

WD also calculates the NPS as the total “normal nodes” + the q-search nodes, and then divide that by the time used. (Normal nodes is increased by one at the start of every “minmax”, and the q-search nodes counter is increased at the start of every “q-search”.)

I'm not totally sure why the NPS is just so low, (even doubling the move gen speed would put it at less than half of Element's), but I have a feeling it's just some really poor code somewhere (similar to the progress/eval bar fix on S3).

(Also, how do you calculate Moves Per Second on the starting board?)

(Also also, here's a really cool 99% accuracy game I played yesterday IRL https://lichess.org/nwfYC30o#65 )

Do you increment node count on every evaluation? On every applied move? Or something else?

I only increment node count at the start of the “minmax” my-block. (And there's a similar story for the q-search side too.)

Also, how should the moves per second on the starting board be calculated? Do you just keep on generating moves until 1 sec is up? And do you apply / revert moves on the board?

Check out Space Program Simulator!

In it, you can build your own rockets from a variety of parts!
Then fly it with realistic orbital mechanics.

Go to orbit, explore different planets, share your save codes, and do so much more!

If you would like to help out on the project or chat about space or really anything else, check out the offical SPS Studio!

For more information & tutorials, check out the offical forum post!

ArnoHu

S_P_A_R_T wrote:
ArnoHu wrote:
S_P_A_R_T wrote:
ArnoHu wrote:
birdracerthree wrote:
S_P_A_R_T wrote:
What NPS should WD be aiming for on S3?

Because right now, even though it's search depth can get pretty far, it's mostly because of the aggressive LMR. I'm working on an update to hopefully add specialized extensions to help out in this area, but at the end of the day, NPS / Move Gen Speed, is going to be more important.
I would guess double of what it is now to start. WD's NPS is already poor (<20k NPS on TW is not good, 100 NPS on S3 isn't any better), maybe you should look into the double move generation. That should not be necessary.

Update on mate eval TT bug : The incorrect TT value is coming from the same depth as the correct one. No more info is known.

Good question, as mentioned before, NPS must be taken with a grain of salt. The whole goal of pruning is to have as little nodes visited as necessary for reaching a high search depth. An engine with none or very limited pruning will have high NPS without getting to any significant search depth. When you compare NPS of professional engines, you know they have great pruning in place, so no need to discuss.

E.g. I just imported “r3kb1r/pp1n1ppp/q1p1p3/3pPn2/3P2P1/1QN2N2/PPPB1P1P/2KR3R b kq - 0 11” to our engines on TurboWarp. Scurious is known to have the highest NPS. It was done searching ply 5 in 3.2 seconds for that board, that is without quiescence. It was “only” running on 175k NPS here. GoK also had 175k NPS, but was done with ply 5 search including quiescence in 0.3 seconds. And that gap will widen with any additional ply. Element was done in 2.2 seconds at 85k NPS, WD in 0.8 seconds at 16k NPS (although I doubt our NPS calculation is equivalent).

If you refer to improving move generator, moves-per-second is an important metric. As a baseline I can tell you that GoK's core move generator, when running standalone (no other code executed, no staged generation, no hash moves, etc), will be at 800,000 generated moves per second for the start board. So MPS in the 100,000s would be a good starting point.

When it comes to pruning via good move ordering, I would look at the average move list index of the best move found. I think you have that already in WD, right? And if there is a staged move generator, you can compare NPS and MPS. You want NPS to be close to MPS, because then you did not generate a lot of moves that were never applied.

With all that in place, just verify at the search depth you reach. You are right that LMR can be misleading, depth might be great, but it could prune things it should not. For that we can compare depth with LMR disabled, or having the same LMR configuration in place.

We must also keep in mind that system speed and current board do have a lot of impact on NPS, and so does quiescence search. In any way, based on where it is now, I guess 50k NPS might a be a good initial goal for WD. Why do you think NPS is low now? How can WD reach its search depth now? Are we sure it is not also about the way how NPS is calculated?

@birdracerthree, why don't you simply disable checkmate evals in TT for the time being? I don't think they have such a high impact. And I even have seen professional engines disabling them for quiescence search at least, and so does GoK.

Alright, thanks!

WD on the starting board has an average-move-list-index of around 1.5-1.6.

WD also calculates the NPS as the total “normal nodes” + the q-search nodes, and then divide that by the time used. (Normal nodes is increased by one at the start of every “minmax”, and the q-search nodes counter is increased at the start of every “q-search”.)

I'm not totally sure why the NPS is just so low, (even doubling the move gen speed would put it at less than half of Element's), but I have a feeling it's just some really poor code somewhere (similar to the progress/eval bar fix on S3).

(Also, how do you calculate Moves Per Second on the starting board?)

(Also also, here's a really cool 99% accuracy game I played yesterday IRL https://lichess.org/nwfYC30o#65 )

Do you increment node count on every evaluation? On every applied move? Or something else?

I only increment node count at the start of the “minmax” my-block. (And there's a similar story for the q-search side too.)

Also, how should the moves per second on the starting board be calculated? Do you just keep on generating moves until 1 sec is up? And do you apply / revert moves on the board?

I checked the code, and it all makes sense, but I simply don't understand why this results in 18k NPS, when at the same time WD generates 200k moves per seconds (unless I measured that in a wrong way, I counted invocations to “add move to legal moves”). I then added up the number of moves iterated over (move count if no cutoff, move index if cutoff), it was 160k. Then I counted applied moves, it was 16k again. I must be missing something… I disabled LMR, same result. I only looked at non-quiescence numbers.

Move generator: No apply(), no revert(), no TT caching, nothing but “GenerateMoves” over and over again for starting board (including move ordering).

BTW, reading WD code in more detail for the first time, I could not avoid noticing it seems a bit related to GoK :-)

About performance, one thing I noticed was that “add move to legal moves…” is the most expensive custom block, although it is only about adding one item to a list. It contains a cascade of if/then/else blocks, to map to 20-something moves-<1-N> lists. Even an if statement is an expensive operation, on Scratch 3 very much so, on TW still a conditional jump. You can flatten that, e.g. to chunks of 5 (<6, <11, …), And in there the most likely match first (== 6, == 10, ..). No else block, instead just break when found. Only to levels of ifs, with that, you will end up with 2 ifs executed most of the time, not 15. A better approach would be to have only one move list, and an offset of probably 300 for each ply. Much faster, much less code, and increasing search depth won't require any code change. This is what GoK does:

https://github.com/ArnoHue/scratch/blob/fedda69407c0d5ccb7eaf269791b103f125ae861/chess/Engine.scratch#L1019

Last edited by ArnoHu (April 6, 2024 03:40:11)

S_P_A_R_T

ArnoHu wrote:
S_P_A_R_T wrote:
ArnoHu wrote:
S_P_A_R_T wrote:
ArnoHu wrote:
birdracerthree wrote:
S_P_A_R_T wrote:
What NPS should WD be aiming for on S3?

Because right now, even though it's search depth can get pretty far, it's mostly because of the aggressive LMR. I'm working on an update to hopefully add specialized extensions to help out in this area, but at the end of the day, NPS / Move Gen Speed, is going to be more important.
I would guess double of what it is now to start. WD's NPS is already poor (<20k NPS on TW is not good, 100 NPS on S3 isn't any better), maybe you should look into the double move generation. That should not be necessary.

Update on mate eval TT bug : The incorrect TT value is coming from the same depth as the correct one. No more info is known.

Good question, as mentioned before, NPS must be taken with a grain of salt. The whole goal of pruning is to have as little nodes visited as necessary for reaching a high search depth. An engine with none or very limited pruning will have high NPS without getting to any significant search depth. When you compare NPS of professional engines, you know they have great pruning in place, so no need to discuss.

E.g. I just imported “r3kb1r/pp1n1ppp/q1p1p3/3pPn2/3P2P1/1QN2N2/PPPB1P1P/2KR3R b kq - 0 11” to our engines on TurboWarp. Scurious is known to have the highest NPS. It was done searching ply 5 in 3.2 seconds for that board, that is without quiescence. It was “only” running on 175k NPS here. GoK also had 175k NPS, but was done with ply 5 search including quiescence in 0.3 seconds. And that gap will widen with any additional ply. Element was done in 2.2 seconds at 85k NPS, WD in 0.8 seconds at 16k NPS (although I doubt our NPS calculation is equivalent).

If you refer to improving move generator, moves-per-second is an important metric. As a baseline I can tell you that GoK's core move generator, when running standalone (no other code executed, no staged generation, no hash moves, etc), will be at 800,000 generated moves per second for the start board. So MPS in the 100,000s would be a good starting point.

When it comes to pruning via good move ordering, I would look at the average move list index of the best move found. I think you have that already in WD, right? And if there is a staged move generator, you can compare NPS and MPS. You want NPS to be close to MPS, because then you did not generate a lot of moves that were never applied.

With all that in place, just verify at the search depth you reach. You are right that LMR can be misleading, depth might be great, but it could prune things it should not. For that we can compare depth with LMR disabled, or having the same LMR configuration in place.

We must also keep in mind that system speed and current board do have a lot of impact on NPS, and so does quiescence search. In any way, based on where it is now, I guess 50k NPS might a be a good initial goal for WD. Why do you think NPS is low now? How can WD reach its search depth now? Are we sure it is not also about the way how NPS is calculated?

@birdracerthree, why don't you simply disable checkmate evals in TT for the time being? I don't think they have such a high impact. And I even have seen professional engines disabling them for quiescence search at least, and so does GoK.

Alright, thanks!

WD on the starting board has an average-move-list-index of around 1.5-1.6.

WD also calculates the NPS as the total “normal nodes” + the q-search nodes, and then divide that by the time used. (Normal nodes is increased by one at the start of every “minmax”, and the q-search nodes counter is increased at the start of every “q-search”.)

I'm not totally sure why the NPS is just so low, (even doubling the move gen speed would put it at less than half of Element's), but I have a feeling it's just some really poor code somewhere (similar to the progress/eval bar fix on S3).

(Also, how do you calculate Moves Per Second on the starting board?)

(Also also, here's a really cool 99% accuracy game I played yesterday IRL https://lichess.org/nwfYC30o#65 )

Do you increment node count on every evaluation? On every applied move? Or something else?

I only increment node count at the start of the “minmax” my-block. (And there's a similar story for the q-search side too.)

Also, how should the moves per second on the starting board be calculated? Do you just keep on generating moves until 1 sec is up? And do you apply / revert moves on the board?

I checked the code, and it all makes sense, but I simply don't understand why this results in 18k NPS, when at the same time WD generates 200k moves per seconds (unless I measured that in a wrong way, I counted invocations to “add move to legal moves”). I then added up the number of moves iterated over (move count if no cutoff, move index if cutoff), it was 160k. Then I counted applied moves, it was 16k again. I must be missing something… I disabled LMR, same result. I only looked at non-quiescence numbers.

Move generator: No apply(), no revert(), no TT caching, nothing but “GenerateMoves” over and over again for starting board (including move ordering).

BTW, reading WD code in more detail for the first time, I could not avoid noticing it seems a bit related to GoK :-)

About performance, one thing I noticed was that “add move to pseudo legal…” is the most expensive custom block, although it is only about adding one item to a list. It contains a cascade of if/then/else blocks, to map to 20-something moves-<1-N> lists. Even an if statement is an expensive operation, on Scratch 3 very much so, on TW still a conditional jump. You can flatten that, e.g. to chunks of 5 (<6, <11, …), And in there the most likely match first (== 6, == 10, ..). No else block, instead just break when found. Only to levels of ifs, with that, you will end up with 2 ifs executed most of the time, not 15. A better approach would be to have only one move list, and an offset of probably 300 for each ply. Much faster, much less code, and increasing search depth won't require any code change. This is what GoK does:

https://github.com/ArnoHue/scratch/blob/fedda69407c0d5ccb7eaf269791b103f125ae861/chess/Engine.scratch#L1019

Alright, thanks!

What do you mean by “add move to pseudo legal…” has a cascade of if/then/else blocks? Are you referring to a different custom block?

(Also, WDs code is quite similar to GoK's as I used the coding dojo tutorial as a guide, and I frequently went into GoK to find optimizations, like finding the first 2 digits of a 4 digit number, etc. Not to mention the functions that are completely the same, like the killer moves & TT table.)

Check out Space Program Simulator!

In it, you can build your own rockets from a variety of parts!
Then fly it with realistic orbital mechanics.

Go to orbit, explore different planets, share your save codes, and do so much more!

If you would like to help out on the project or chat about space or really anything else, check out the offical SPS Studio!

For more information & tutorials, check out the offical forum post!

ArnoHu

S_P_A_R_T wrote:
ArnoHu wrote:
S_P_A_R_T wrote:
ArnoHu wrote:
S_P_A_R_T wrote:
ArnoHu wrote:
birdracerthree wrote:
S_P_A_R_T wrote:
What NPS should WD be aiming for on S3?

Because right now, even though it's search depth can get pretty far, it's mostly because of the aggressive LMR. I'm working on an update to hopefully add specialized extensions to help out in this area, but at the end of the day, NPS / Move Gen Speed, is going to be more important.
I would guess double of what it is now to start. WD's NPS is already poor (<20k NPS on TW is not good, 100 NPS on S3 isn't any better), maybe you should look into the double move generation. That should not be necessary.

Update on mate eval TT bug : The incorrect TT value is coming from the same depth as the correct one. No more info is known.

Good question, as mentioned before, NPS must be taken with a grain of salt. The whole goal of pruning is to have as little nodes visited as necessary for reaching a high search depth. An engine with none or very limited pruning will have high NPS without getting to any significant search depth. When you compare NPS of professional engines, you know they have great pruning in place, so no need to discuss.

E.g. I just imported “r3kb1r/pp1n1ppp/q1p1p3/3pPn2/3P2P1/1QN2N2/PPPB1P1P/2KR3R b kq - 0 11” to our engines on TurboWarp. Scurious is known to have the highest NPS. It was done searching ply 5 in 3.2 seconds for that board, that is without quiescence. It was “only” running on 175k NPS here. GoK also had 175k NPS, but was done with ply 5 search including quiescence in 0.3 seconds. And that gap will widen with any additional ply. Element was done in 2.2 seconds at 85k NPS, WD in 0.8 seconds at 16k NPS (although I doubt our NPS calculation is equivalent).

If you refer to improving move generator, moves-per-second is an important metric. As a baseline I can tell you that GoK's core move generator, when running standalone (no other code executed, no staged generation, no hash moves, etc), will be at 800,000 generated moves per second for the start board. So MPS in the 100,000s would be a good starting point.

When it comes to pruning via good move ordering, I would look at the average move list index of the best move found. I think you have that already in WD, right? And if there is a staged move generator, you can compare NPS and MPS. You want NPS to be close to MPS, because then you did not generate a lot of moves that were never applied.

With all that in place, just verify at the search depth you reach. You are right that LMR can be misleading, depth might be great, but it could prune things it should not. For that we can compare depth with LMR disabled, or having the same LMR configuration in place.

We must also keep in mind that system speed and current board do have a lot of impact on NPS, and so does quiescence search. In any way, based on where it is now, I guess 50k NPS might a be a good initial goal for WD. Why do you think NPS is low now? How can WD reach its search depth now? Are we sure it is not also about the way how NPS is calculated?

@birdracerthree, why don't you simply disable checkmate evals in TT for the time being? I don't think they have such a high impact. And I even have seen professional engines disabling them for quiescence search at least, and so does GoK.

Alright, thanks!

WD on the starting board has an average-move-list-index of around 1.5-1.6.

WD also calculates the NPS as the total “normal nodes” + the q-search nodes, and then divide that by the time used. (Normal nodes is increased by one at the start of every “minmax”, and the q-search nodes counter is increased at the start of every “q-search”.)

I'm not totally sure why the NPS is just so low, (even doubling the move gen speed would put it at less than half of Element's), but I have a feeling it's just some really poor code somewhere (similar to the progress/eval bar fix on S3).

(Also, how do you calculate Moves Per Second on the starting board?)

(Also also, here's a really cool 99% accuracy game I played yesterday IRL https://lichess.org/nwfYC30o#65 )

Do you increment node count on every evaluation? On every applied move? Or something else?

I only increment node count at the start of the “minmax” my-block. (And there's a similar story for the q-search side too.)

Also, how should the moves per second on the starting board be calculated? Do you just keep on generating moves until 1 sec is up? And do you apply / revert moves on the board?

I checked the code, and it all makes sense, but I simply don't understand why this results in 18k NPS, when at the same time WD generates 200k moves per seconds (unless I measured that in a wrong way, I counted invocations to “add move to legal moves”). I then added up the number of moves iterated over (move count if no cutoff, move index if cutoff), it was 160k. Then I counted applied moves, it was 16k again. I must be missing something… I disabled LMR, same result. I only looked at non-quiescence numbers.

Move generator: No apply(), no revert(), no TT caching, nothing but “GenerateMoves” over and over again for starting board (including move ordering).

BTW, reading WD code in more detail for the first time, I could not avoid noticing it seems a bit related to GoK :-)

About performance, one thing I noticed was that “add move to pseudo legal…” is the most expensive custom block, although it is only about adding one item to a list. It contains a cascade of if/then/else blocks, to map to 20-something moves-<1-N> lists. Even an if statement is an expensive operation, on Scratch 3 very much so, on TW still a conditional jump. You can flatten that, e.g. to chunks of 5 (<6, <11, …), And in there the most likely match first (== 6, == 10, ..). No else block, instead just break when found. Only to levels of ifs, with that, you will end up with 2 ifs executed most of the time, not 15. A better approach would be to have only one move list, and an offset of probably 300 for each ply. Much faster, much less code, and increasing search depth won't require any code change. This is what GoK does:

https://github.com/ArnoHue/scratch/blob/fedda69407c0d5ccb7eaf269791b103f125ae861/chess/Engine.scratch#L1019

Alright, thanks!

What do you mean by “add move to pseudo legal…” has a cascade of if/then/else blocks? Are you referring to a different custom block?

(Also, WDs code is quite similar to GoK's as I used the coding dojo tutorial as a guide, and I frequently went into GoK to find optimizations, like finding the first 2 digits of a 4 digit number, etc. Not to mention the functions that are completely the same, like the killer moves & TT table.)

Correction, the slow custom block is “add move to legal moves with start square”

ArnoHu

Quite a bad game between GoK (Medium, white) and White Dove (P3) on TW; which no side deserved to win anyway: https://lichess.org/d23bKl6U#192

HasiLover

ArnoHu wrote:
Quite a bad game between GoK (Medium, white) and White Dove (P3) on TW; which no side deserved to win anyway: https://lichess.org/d23bKl6U#192

That surely was an Interesting Game.

ArnoHu

HasiLover wrote:
ArnoHu wrote:
Quite a bad game between GoK (Medium, white) and White Dove (P3) on TW; which no side deserved to win anyway: https://lichess.org/d23bKl6U#192
That surely was an Interesting Game.

Rematch, similar story (GoK white): https://lichess.org/VF6nVkyE#149

Game #3, much better accuracy by both, being up a pawn during endgame again was not enough for GoK (it is running at lower think time than WD, but still should win those): https://lichess.org/74E292nL#149

Decider, GoK (black) wins it, this time entering endgame one pawn up was enough (OK, it was a passed pawn): https://lichess.org/5VBSL2cj#148

After that, I found and fixed an evaluation bug (saw just by accident a draw evaluation flying by, which should not have been there), re-activated some more extension types, and next game looked a lot better (GoK white), 93% vs 89%: https://lichess.org/RtyATKo8#113

One more game for confirmation, also promising, 96% vs. 90% accuracy: https://lichess.org/52gxzcYV#133

GoK and WD both on 10 seconds think time in a high accuracy game, 96% vs 92%, nice wo watch: https://lichess.org/V3BMtP7T#117

Switching to S3, confirmed improvement, 96% vs. 89% accuracy: https://lichess.org/YfgtHp2M#107

And against Element (3+8) on S3, GoK Medium wins in 24 moves, 93% vs. 83% accuracy: https://lichess.org/pUQBonXM#47

93% accuracy minimum in 5 games, 2 of which on S3, that bugfix certainly paid off.

Last edited by ArnoHu (April 7, 2024 06:38:13)

ArnoHu

GoK (Blitz 2) vs. Shallow Blue V2 (Depth 3) on TurboWarp, GoK wins in 31 moves, 90% vs. 83% accuracy: https://lichess.org/0H5vqms5#61

ArnoHu

Fixed a bug that could lead to “no move found” results during KP/K endgames (draw detection was invoked one ply to early), fine-tuned move ordering, let GoK (white) play a game against White Dove on TW, both on 10 seconds think time: https://lichess.org/sL2EQOJP#81

Rematch, GoK as usual does not know how to play King's Gambit, blunders at move 8 (would have seen the mistake at 14 seconds), survival mode for many moves (down -5,2), slowly crawls back and wins at the end: https://lichess.org/39OkV7J2#155

Last edited by ArnoHu (April 8, 2024 05:28:45)

AZURUS41

https://lichess.org/mBvzDqi4
Just tried the new performance mode (5) on WD, pretty good game when I should've lost at move 6

Anyways, a fun attacking game…

HasiLover_Test

Quick question, how does the Standing Pat in Quiescence Search work exactly?

birdracerthree

AZURUS41 wrote:
https://lichess.org/mBvzDqi4
Just tried the new performance mode (5) on WD, pretty good game when I should've lost at move 6
Anyways, a fun attacking game…

Thoughts on the game :
Missed 6. Nxe5! is pretty bad (for both you because of your strength and White Dove). Element plays Nxe5 on every depth (depth 6 has an evaluation of -0.1 because of heavy pawn shield evaluation).
I guess 18. axb4 was to open the a-file? I would have played cxb4 to remove the doubled pawns.

The final checkmate sequence is really clean. Great game overall! Pushing the h-pawn was really nice.

birdracerthree

HasiLover_Test wrote:
Quick question, how does the Standing Pat in Quiescence Search work exactly?

CPW Quote : “In order to allow the quiescence search to stabilize, we need to be able to stop searching without necessarily searching all available captures. In addition, we need a score to return in case there are no captures available to be played. This is done by a using the static evaluation as a “stand-pat” score…”

The rest of the information is on the CPW including pseudocode (although the code is negamax).

ArnoHu

ArnoHu wrote:
Fixed a bug that could lead to “no move found” results during KP/K endgames (draw detection was invoked one ply to early), fine-tuned move ordering, let GoK (white) play a game against White Dove on TW, both on 10 seconds think time: https://lichess.org/sL2EQOJP#81

Rematch, GoK as usual does not know how to play King's Gambit, blunders at move 8 (would have seen the mistake at 14 seconds), survival mode for many moves (down -5,2), slowly crawls back and wins at the end: https://lichess.org/39OkV7J2#155

Replayed last game after move 7 with current versions at 10 seconds think time, goes a completely different way, GoK wins in 25 moves: https://lichess.org/YpxYtakf#49

ArnoHu

birdracerthree wrote:
HasiLover_Test wrote:
Quick question, how does the Standing Pat in Quiescence Search work exactly?
CPW Quote : “In order to allow the quiescence search to stabilize, we need to be able to stop searching without necessarily searching all available captures. In addition, we need a score to return in case there are no captures available to be played. This is done by a using the static evaluation as a “stand-pat” score…”

The rest of the information is on the CPW including pseudocode (although the code is negamax).

Implementation:
https://github.com/ArnoHue/scratch/blob/a75448ea3de049e5a717c6e27fd15c1f2208c846/chess/Engine.scratch#L1477

HasiLover_Test

ArnoHu wrote:
birdracerthree wrote:
HasiLover_Test wrote:
Quick question, how does the Standing Pat in Quiescence Search work exactly?
CPW Quote : “In order to allow the quiescence search to stabilize, we need to be able to stop searching without necessarily searching all available captures. In addition, we need a score to return in case there are no captures available to be played. This is done by a using the static evaluation as a “stand-pat” score…”

The rest of the information is on the CPW including pseudocode (although the code is negamax).

Implementation:
https://github.com/ArnoHue/scratch/blob/a75448ea3de049e5a717c6e27fd15c1f2208c846/chess/Engine.scratch#L1477

Scurious is pretty Incompatible with other Code of Scratch Chess Engines, since I didnt use any Help or references for its code.

ArnoHu

Congratulations to White Dove (P2, black) to this win against GoK (Medium) on S3: https://lichess.org/6DP6CaPj#98

I increased passed pawn bonus just slightly, and GoK won the replay of the same crucial situation at move 33: 6k1/2R2p1p/6p1/pp1P4/6r1/8/5PPP/5K2 w - - 0 33

This is the changeset. Small change, huge impact:
https://github.com/ArnoHue/scratch/commit/a529d04bf8bac931f10cfc79bd7f0425119b7334

Discuss Scratch