Scratch Chess Engine - Game of Kings

HasiLover_Test

GoK logfile: already sent

Last edited by HasiLover_Test (March 15, 2024 15:22:41)

I deeply regret naming my chess engine SCURIOUS????? The name is so bad, and I am forced to stare at it every time I try to make progress. What was I thinking a few months ago?

ArnoHu

HasiLover_Test wrote:
GoK logfile: already sent

Thanks for the logfile. Wow, it finished ply 9 in 22 seconds. That takes 3.5 seconds on my system. Are you sure you don't have other stuff running? Another parallel user session, maybe? High memory/CPU usage by other processes? Or is the engine's browser window covered by other windows during execution? It must be in the foreground with nothing else running: no other windows, no other tabs. Is the power cable plugged in? Which browser are you using?

About the blunder: the move is not considered that bad at ply 9, and internal caching might lead to differences of +/- 10 centipawns, which is why I cannot reproduce it even on ply 9.

Last edited by ArnoHu (March 15, 2024 15:52:37)

HasiLover_Test

ArnoHu wrote:
Thanks for the logfile. Wow, it finished ply 9 in 22 seconds. That takes 3.5 seconds on my system. Are you sure you don't have other stuff running? Another parallel user session, maybe? High memory/CPU usage by other processes? Or is the engine's browser window covered by other windows during execution? It must be in the foreground with nothing else running: no other windows, no other tabs. Is the power cable plugged in? Which browser are you using?

About the blunder: the move is not considered that bad at ply 9, and internal caching might lead to differences of +/- 10 centipawns, which is why I cannot reproduce it even on ply 9.

I still use an old laptop; it sometimes even takes a few minutes to open a browser tab. My Wi-Fi also isn't the best.

ArnoHu

HasiLover_Test wrote:
I still use an old laptop; it sometimes even takes a few minutes to open a browser tab. My Wi-Fi also isn't the best.

Well that's OK, engines should also work fine on a slow system, but doesn't it take a long time for projects that have hardcoded depth instead of think time management?
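To make the fixed-depth vs. think-time point concrete, here is a minimal Python sketch of time-managed iterative deepening. `search_fn(depth)` is a hypothetical hook standing in for an engine's fixed-depth search, and the `growth` factor (how much longer each ply takes than the previous one) is an assumed rule of thumb, not something taken from GoK or Scurious. A fixed-depth engine calls the search once with its hardcoded depth, so on a slow machine it simply takes longer; the iterative version stops early instead and returns the move from the last completed ply.

```python
import time
from typing import Callable, Optional, Tuple

# Hypothetical engine hook: search_fn(depth) -> (best_move, score_in_centipawns).
SearchFn = Callable[[int], Tuple[str, int]]


def fixed_depth(search_fn: SearchFn, depth: int) -> Tuple[str, int]:
    """Hardcoded depth: always search exactly `depth` plies, however long that takes."""
    return search_fn(depth)


def timed_iterative_deepening(
    search_fn: SearchFn,
    budget_s: float,
    growth: float = 3.0,
    max_depth: int = 64,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[Optional[Tuple[str, int]], int]:
    """Deepen one ply at a time and stop when the next ply probably won't fit.

    `growth` is an assumed ratio between the durations of consecutive
    iterations (a rough effective-branching-factor guess), not a measured value.
    """
    start = clock()
    best: Optional[Tuple[str, int]] = None
    depth_reached = 0
    for depth in range(1, max_depth + 1):
        t0 = clock()
        best = search_fn(depth)  # each completed iteration gives a usable move
        depth_reached = depth
        last_iter = clock() - t0
        elapsed = clock() - start
        # Predict the cost of depth + 1 and skip it if it would exceed the budget.
        if elapsed + last_iter * growth > budget_s:
            break
    return best, depth_reached
```

A real engine would usually also abort an iteration that runs past a hard limit; that part is left out to keep the sketch short.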

Nice tournament / games BTW!

And: How is your engine-ELO-rating project going?

Last edited by ArnoHu (March 17, 2024 14:05:58)

HasiLover_Test

ArnoHu wrote:
Well that's OK, engines should also work fine on a small system, but doesn't it take a long time for projects that have hardcoded depth instead of think time management?

Nice tournament / games BTW!

And: How is your engine-ELO-rating project going?

I will need more games; that's why I'm going to be hosting more tournaments.

birdracerthree

ArnoHu wrote:
Thanks for the logfile. Wow, it finished ply 9 in 22 seconds. That takes 3.5 seconds on my system. Are you sure you don't have other stuff running? Another parallel user session, maybe? High memory/CPU usage by other processes? Or is the engine's browser window covered by other windows during execution? It must be in the foreground with nothing else running: no other windows, no other tabs. Is the power cable plugged in? Which browser are you using?

About the blunder: the move is not considered that bad at ply 9, and internal caching might lead to differences of +/- 10 centipawns, which is why I cannot reproduce it even on ply 9.

22 seconds is really bad. The first time I ran GoK, it did consider Qb5 midway through ply 9, but it switched to Kf8 on ply 10 in the final few seconds (23.932).

The second time, I played music on YT and re-ran the position on a new GoK instance. This time, it switched to Rg7 during ply 9 instead of Qb5 and it stayed that way. GoK reached ply 11 “21.891: 11 : Search start, depth = 11”. The second instance started ply 9 faster than the first (3.681 instead of 3.472).

Quick note: I opened a new tab and went to GoK's URL immediately to minimize the memory footprint from the saved history.

Last edited by birdracerthree (March 15, 2024 22:25:17)

Hello, I am @birdracerthree, the creator of the fourth strongest chess engine on Scratch, Element.
Element's approximate rating is 1800 FIDE or 2050 chess.com.

ArnoHu

birdracerthree wrote:
22 seconds is really bad. The first time I ran GoK, it did consider Qb5 midway through ply 9, but it switched to Kf8 on ply 10 in the final few seconds (23.932).

The second time, I played music on YT and re-ran the position on a new GoK instance. This time, it switched to Rg7 during ply 9 instead of Qb5 and it stayed that way. GoK reached ply 11 “21.891: 11 : Search start, depth = 11”. The second instance started ply 9 faster than the first (3.681 instead of 3.472).

Quick note: I opened a new tab and went to GoK's URL immediately to minimize the memory footprint from the saved history.

Thanks, I also get different results; that is explainable within small bounds given GoK's incremental/dynamic evaluation approach and caching. One of the fastest runs on my system was this:

2.057: 9 : Search start, depth = 9
2.306: 9 : 2734 : 121
5.250: 9 : 0715 : 127
5.536: 10 : Search start, depth = 10
10.087: 10 : 0715 : 142
11.440: 11 : Search start, depth = 11
14.829: 11 : 0715 : 136
18.826: 12 : Search start, depth = 12
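For comparing runs like this one, a small log parser helps. This sketch assumes each line is either `<seconds>: <depth> : Search start, depth = <n>` or `<seconds>: <depth> : <move> : <score>`, a format inferred only from the excerpt above; what the four-digit move code (e.g. `0715`) encodes and whether the score is in centipawns are assumptions. It records the start time and last reported move/score per depth and computes how long each iteration took.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

# Assumed line formats, inferred from the GoK log excerpt:
#   "<seconds>: <depth> : Search start, depth = <depth>"
#   "<seconds>: <depth> : <move> : <score>"
START_RE = re.compile(r"^\s*([\d.]+):\s*(\d+)\s*:\s*Search start, depth = (\d+)\s*$")
RESULT_RE = re.compile(r"^\s*([\d.]+):\s*(\d+)\s*:\s*(\w+)\s*:\s*(-?\d+)\s*$")


@dataclass
class Iteration:
    depth: int
    start: float
    best_move: Optional[str] = None  # last move reported at this depth
    score: Optional[int] = None      # assumed to be centipawns


def parse_log(text: str) -> List[Iteration]:
    iters: Dict[int, Iteration] = {}
    for line in text.splitlines():
        if m := START_RE.match(line):
            d = int(m.group(3))
            iters[d] = Iteration(depth=d, start=float(m.group(1)))
        elif m := RESULT_RE.match(line):
            d = int(m.group(2))
            if d in iters:  # later reports at the same depth overwrite earlier ones
                iters[d].best_move = m.group(3)
                iters[d].score = int(m.group(4))
    return [iters[d] for d in sorted(iters)]


def iteration_times(iters: List[Iteration]) -> Dict[int, float]:
    """Seconds from the start of depth d to the start of depth d + 1."""
    return {a.depth: b.start - a.start for a, b in zip(iters, iters[1:])}
```

On the excerpt above this gives roughly 3.5 s for ply 9, 5.9 s for ply 10 and 7.4 s for ply 11.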

ArnoHu

Looking into NPS (nodes per second) rates, I discovered something strange when comparing Scurious and GoK. GoK turned out to run fairly steadily at 200k NPS during all game stages, while on the same boards Scurious ran at 200k to 500k NPS on opening boards and 100k during the middlegame and endgame. Checking 1-second windows, GoK was between 150k and 300k.

One example is r3kb1r/pp1n1ppp/q1p1p3/3pPn2/3P2P1/1QN2N2/PPPB1P1P/2KR3R b kq - 0 11, which GoK handles at 255k NPS, Scurious at 80k. Maybe the comparison is difficult to make, given Scurious only searches 5 plies.
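NPS is just nodes searched divided by elapsed time, but the 1-second-window view needs periodic samples of a cumulative node counter. A minimal sketch, assuming the engine can report `(time_s, cumulative_nodes)` pairs (how Scurious or GoK expose this is not known here). Engines may also count nodes differently, for example whether quiescence nodes or cache hits are included, which is another reason cross-engine NPS comparisons are tricky.

```python
from typing import List, Tuple


def nps(nodes: int, seconds: float) -> float:
    """Nodes per second over one interval (0.0 if no time has elapsed)."""
    return nodes / seconds if seconds > 0 else 0.0


def windowed_nps(samples: List[Tuple[float, int]]) -> List[float]:
    """NPS for each interval between consecutive (time_s, cumulative_nodes) samples.

    Sampling about once per second gives the per-second view; the overall rate
    is nps(last_nodes - first_nodes, last_time - first_time).
    """
    return [nps(n1 - n0, t1 - t0) for (t0, n0), (t1, n1) in zip(samples, samples[1:])]
```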

Last edited by ArnoHu (March 16, 2024 04:10:58)

ArnoHu

Scratch Chess Engine ELO Ratings

As I was curious about that subject, I started to play around with BayesElo. I fed it 81 games, mostly from birdracerthree's excellent lichess.org study, plus games from this forum since the beginning of the year: Element, White Dove and GoK running on TurboWarp at their highest level, against opponents also at their highest level. This was the result:

Rank Name         Elo    +    - games score oppo. draws
   1 GoK          189  100   78    34   85%   -93    6%
   2 White Dove   -75   51   52    65   45%   -30   18%
   3 Element     -113   52   54    63   37%    -8   16%

If I apply the average rating from ScratchChessChampion's Rating project (2220) as baseline, we get:

Rank Name         Elo    +    - games score oppo. draws
   1 GoK         2409  100   78    34   85%   -93    6%
   2 White Dove  2145   51   52    65   45%   -30   18%
   3 Element     2107   52   54    63   37%    -8   16%

If we talk about FIDE ratings, I consider a baseline of 2000 more realistic:

Rank Name         Elo    +    - games score oppo. draws
   1 GoK         2189  100   78    34   85%   -93    6%
   2 White Dove  1925   51   52    65   45%   -30   18%
   3 Element     1887   52   54    63   37%    -8   16%
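Since a baseline only adds a constant to every rating, it changes the absolute numbers but not the differences between engines, and expected scores depend only on those differences. The relative BayesElo numbers above appear to be centered near zero (189 - 75 - 113 = 1). A small sketch of both steps, using the standard logistic Elo expectation (BayesElo's own model has extra parameters, e.g. for draws and White's advantage, so this is illustrative only):

```python
from typing import Dict


def apply_baseline(relative: Dict[str, int], baseline: int) -> Dict[str, int]:
    """Shift relative ratings (centered near 0) onto an absolute scale."""
    return {name: elo + baseline for name, elo in relative.items()}


def expected_score(elo_diff: float) -> float:
    """Standard logistic Elo expectation for the side that is `elo_diff` points stronger."""
    return 1.0 / (1.0 + 10 ** (-elo_diff / 400.0))


relative = {"GoK": 189, "White Dove": -75, "Element": -113}
print(apply_baseline(relative, 2220))  # {'GoK': 2409, 'White Dove': 2145, 'Element': 2107}
print(apply_baseline(relative, 2000))  # {'GoK': 2189, 'White Dove': 1925, 'Element': 1887}
```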

Disclaimer: I had to run some semi-automated search/replace to unify the engine names, derive from the forum posting text which side was Black/White, manually enter draw results, then concatenate everything into one file and so on => error-prone. I will try to publish it as a study later and would be glad if someone could verify it.
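The name-unification step could be scripted roughly like this. A Python sketch: the alias table entries are hypothetical examples (the real list would come from the spellings actually found in the studies and posts), and it only rewrites the standard PGN `[White "..."]` / `[Black "..."]` tag pairs before joining the games into one file.

```python
import re
from typing import Dict, Iterable

# Hypothetical alias table: spellings seen in posts/studies -> one canonical name.
# These entries are examples only, not the list actually used for the ratings.
ALIASES: Dict[str, str] = {
    "GoK Chess": "GoK",
    "Game of Kings": "GoK",
    "WhiteDove Chess Engine": "White Dove",
    "Element Chess Engine": "Element",
}

# Matches a whole White/Black tag pair at the start of a line.
TAG_RE = re.compile(r'^\[(White|Black) "([^"]*)"\]', re.MULTILINE)


def normalize_names(pgn: str, aliases: Dict[str, str] = ALIASES) -> str:
    """Rewrite [White "..."] / [Black "..."] tags to canonical engine names."""
    def repl(m: re.Match) -> str:
        name = m.group(2).strip()
        return f'[{m.group(1)} "{aliases.get(name, name)}"]'
    return TAG_RE.sub(repl, pgn)


def concat_pgn(games: Iterable[str]) -> str:
    """Normalize each game and join them into one PGN text, separated by blank lines."""
    return "\n\n".join(normalize_names(g).strip() for g in games) + "\n"
```

Names missing from the table pass through unchanged, so a quick scan of the output for unexpected spellings is still worthwhile.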

I plan to continue creating ratings from an updated study in the future. Please feel free to submit additional games. Preconditions: average or above-average hardware, TurboWarp, the engines at their highest level (unless the lower level wins anyway), an average think time of less than one minute, and the same engine names as in the main study. No pre-selection of games: either submit all or none. I will also add other engines.

Update: I uploaded the games to lichess.org and learned that there is a limit of 64 chapters per study, so we are now at two studies:
https://lichess.org/study/O3sgGLnq
https://lichess.org/study/oWyPldeN

I added some Scurious, Thundershark and Bonsai games, but for a solid baseline we would need several more games between those three and White Dove/Element. birdracerthree's study has some, but there Element runs at low search depth (it still won, though).
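Keeping a growing game collection under lichess.org's 64-chapter limit is simple batching. A sketch, assuming one game per chapter:

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

MAX_CHAPTERS = 64  # lichess.org per-study chapter limit


def split_into_studies(games: Sequence[T], limit: int = MAX_CHAPTERS) -> List[List[T]]:
    """Split games into consecutive batches of at most `limit` chapters each."""
    return [list(games[i:i + limit]) for i in range(0, len(games), limit)]
```

For example, 81 games end up as one full study of 64 chapters plus a second study of 17.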

Last edited by ArnoHu (March 16, 2024 13:15:18)

ArnoHu

ArnoHu wrote:
Scratch Chess Engine ELO Ratings

Update: I uploaded the games to lichess.org and learned that there is a limit of 64 chapters per study, so we are now at two studies:
https://lichess.org/study/O3sgGLnq
https://lichess.org/study/oWyPldeN

I added some Scurious, Thundershark and Bonsai games, but for a solid baseline we would need several more games between those three and White Dove/Element. birdracerthree's study has some, but there Element runs at low search depth (it still won, though).

Updated rankings after adding Scurious, Thundershark, Bonsai:

Rank Name           Elo    +    - games score oppo. draws
   1 GoK           2201  119   93    43   88%    30    5%
   2 White Dove    1903   63   65    68   41%   129   18%
   3 Element       1872   62   64    69   38%   123   16%
   4 Bonsai        1740  171  163     6   58%  -136   50%
   5 Thundershark  1709  193  218     6   25%    44   17%
   6 Scurious      1675  143  157     8   31%   -85   38%

With so few games played by Bonsai, Thundershark and Scurious, the numbers don't mean a lot yet. I expect the gap between them and White Dove/Element to be larger.
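To see why 6 to 8 games say so little, here is a rough back-of-the-envelope sketch: convert an average score into an Elo difference with the logistic formula and put a normal-approximation interval around the score. This is not BayesElo's method (it fits all games jointly with a Bayesian model), and treating every game as win/loss overstates the spread when there are draws, but it shows the same trend: the interval only narrows roughly with the square root of the number of games.

```python
import math
from typing import Tuple


def elo_from_score(p: float) -> float:
    """Elo difference implied by an average score p (0 < p < 1), logistic model."""
    return -400.0 * math.log10(1.0 / p - 1.0)


def rough_interval(p: float, games: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% Elo interval from a normal approximation on the score.

    Treats every game as win/loss (variance p * (1 - p)); this is NOT how
    BayesElo computes its +/- columns. Scores are clamped away from 0 and 1
    so the logarithm stays finite.
    """
    se = math.sqrt(p * (1.0 - p) / games)
    lo = min(max(p - z * se, 1e-6), 1 - 1e-6)
    hi = min(max(p + z * se, 1e-6), 1 - 1e-6)
    return elo_from_score(lo), elo_from_score(hi)
```

For the same 31% score, `rough_interval(0.31, 8)` is far wider than `rough_interval(0.31, 68)`.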

Last edited by ArnoHu (March 16, 2024 13:15:31)

HasiLover_Test

I have released a version of Scurious 1.2 without iterative deepening. Could someone give me feedback on whether it's faster?

ArnoHu

HasiLover_Test wrote:
I have released a version of Scurious 1.2 without iterative deepening. Could someone give me feedback on whether it's faster?

I just ran 1.2 against Thundershark, can't say if faster than before (I usually let old and new version play the same boards and compare), but certainly fast enough for me. I thought Scurious would improve its rating against Thundershark because it had more difficult opponents so far in the study. It was clearly ahead, but allowed a KQ fork to happen, did not see a pinned queen it could have taken right after that, also did not care about trapped rooks, and decided to get its king out of shelter prematurely, which resulted in a loss: https://lichess.org/study/oWyPldeN/tphwThXS

Rank Name           Elo    +    - games score oppo. draws
   1 GoK           2206  118   92    45   89%    29    4%
   2 White Dove    1904   64   65    68   41%   131   18%
   3 Element       1873   62   64    70   38%   128   16%
   4 Thundershark  1735  184  193     7   36%     3   14%
   5 Bonsai        1732  169  165     7   50%   -76   43%
   6 Scurious      1653  138  155     9   28%   -86   33%

Last edited by ArnoHu (March 16, 2024 14:39:13)

HasiLover_Test

ArnoHu wrote:
I just ran 1.2 against Thundershark, can't say if faster than before (I usually let old and new version play the same boards and compare), but certainly fast enough for me. I thought Scurious would improve its rating against Thundershark because it had more difficult opponents so far in the study. It was clearly ahead, but allowed a KQ fork to happen, did not see a pinned queen it could have taken right after that, also did not care about trapped rooks, and decided to get its king out of shelter prematurely, which resulted in a loss: https://lichess.org/study/oWyPldeN/tphwThXS
Rank Name           Elo    +    - games score oppo. draws
   1 GoK           2206  118   92    45   89%    29    4%
   2 White Dove    1904   64   65    68   41%   131   18%
   3 Element       1873   62   64    70   38%   128   16%
   4 Thundershark  1735  184  193     7   36%     3   14%
   5 Bonsai        1732  169  165     7   50%   -76   43%
   6 Scurious      1653  138  155     9   28%   -86   33%

The version without iterative deepening is not the main dev project. You just tested the normal version.
Also, there is no way Thundershark is that good.

Last edited by HasiLover_Test (March 16, 2024 15:21:29)

ArnoHu

HasiLover_Test wrote:
The version without iterative deepening is not the main dev project. You just tested the normal version.
Also, there is no way Thundershark is that good.

True, but you saw the last game, and as mentioned, we need more games.

HasiLover_Test

ArnoHu wrote:
True, but you saw the last game, and as mentioned, we need more games.

Scurious isn't normally made to play as White, as that messes up its piece-square tables.
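A common way to let one set of piece-square tables serve both colors is to write the tables from White's point of view and mirror the square vertically for Black. A minimal sketch, assuming a 0..63 square index with a1 = 0 and a8 = 56 (Scurious's actual layout is not known here); with that layout, flipping the rank is just `sq ^ 56`. The pawn table values are made up for illustration.

```python
from typing import Sequence


def square_index(file: int, rank: int) -> int:
    """0..63 with a1 = 0, h1 = 7, a8 = 56 (an assumed convention)."""
    return rank * 8 + file


def mirror(sq: int) -> int:
    """Flip a square vertically (a1 <-> a8, e2 <-> e7) by inverting the 3 rank bits."""
    return sq ^ 56


def pst_score(table: Sequence[int], sq: int, white: bool) -> int:
    """Piece-square bonus from a table written from White's point of view.

    White reads the table directly; Black reads the mirrored square,
    so the same table works for both colors.
    """
    return table[sq] if white else table[mirror(sq)]


# Toy pawn table from White's view: +10 per rank advanced (made-up values).
PAWN_TABLE = [10 * (sq // 8) for sq in range(64)]
e2, e7 = square_index(4, 1), square_index(4, 6)
print(mirror(e2) == e7)  # True
print(pst_score(PAWN_TABLE, e7, True), pst_score(PAWN_TABLE, e2, False))  # 60 60
```

A White pawn on e7 and a Black pawn on e2 are equally advanced, so they get the same bonus.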

ArnoHu

HasiLover_Test wrote:
Scurious isn't normally made to play as White, as that messes up its piece-square tables.

OK, please let me know when this is fixed, I can then schedule a re-match.

ArnoHu

Rank Name           Elo    +    - games score oppo. draws
   1 GoK           2235  118   91    47   89%    45    4%
   2 White Dove    1934   63   64    69   42%   154   17%
   3 Element       1901   62   64    71   39%   150   15%
   4 Bonsai        1753  160  149    10   55%   -73   30%
   5 Thundershark  1661  156  182    11   23%    40    9%
   6 Scurious      1616  139  159    10   25%   -95   30%

Calculated using the BayesElo tool, based on these lichess.org studies:

Last edited by ArnoHu (March 16, 2024 22:41:55)

ArnoHu

ArnoHu wrote:

Scratch Chess Engine ELO Ranking

Rank Name           Elo    +    - games score oppo. draws
   1 GoK           2235  118   91    47   89%    45    4%
   2 White Dove    1934   63   64    69   42%   154   17%
   3 Element       1901   62   64    71   39%   150   15%
   4 Bonsai        1753  160  149    10   55%   -73   30%
   5 Thundershark  1661  156  182    11   23%    40    9%
   6 Scurious      1616  139  159    10   25%   -95   30%

Calculated using the BayesElo tool, based on these lichess.org studies:

Just for the fun of it, I merged ScratchChessChampion's SCF 2023 and 2024 tournament studies, did some cleanup, and fed them into BayesElo, which produced the following result (the Elos are relative numbers; I did not provide an Elo baseline in this case):

SCF 2023 + 2024 Combined Tournament Ranking

Rank Name                             Elo    +    - games score oppo. draws
   1 GoK Chess (Medium)               479  202  176     9   72%   324   11%
   2 GoK Chess (Difficult)            468  163  143    19   68%   350   21%
   3 WhiteDove Chess Engine (P3)      437  212  191     7   71%   289   29%
   4 WhiteDove Chess Engine (P4)      362  176  170     9   56%   302   22%
   5 GoK Chess (Blitz 1)              310  147  143    17   59%   233    0%
   6 Element Chess Engine (Depth 4)   244  178  191     9   39%   315   11%
   7 Element Chess Engine (Depth 5)   139  160  168    13   42%   188   23%
   8 Bonsai Chess (Blue Belt)          23  158  184    11   18%   266   18%
   9 Element Chess Engine (Depth 3)   -65  275  322     2   25%    23   50%
  10 Bonsai Chess (Green Belt)       -190  153  151    11   45%   -92   18%

Data was limited, but the result is not completely off IMHO. Funny that the lower-depth engine versions are ahead, but you might remember that GoK Difficult lost one decisive game against GoK Medium. Blitz 1 searches 4 plies + quiescence on TurboWarp, which explains why, with some luck of the draw and in the games, it might show up relatively high. BTW, GoK had a severe regression at the time of SCF 2024.
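For readers unfamiliar with the term: a quiescence search keeps following captures at the leaves of the main search, so the static evaluation is not taken in the middle of an exchange, and the side to move may "stand pat" on the static score instead of making a bad capture. A minimal fail-hard negamax sketch over a hypothetical toy tree (not GoK's code; real engines add capture ordering, delta pruning and so on):

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Toy position: static eval from the side to move's view, plus capture replies."""
    static_eval: int
    captures: List[Node] = field(default_factory=list)


def quiescence(node: Node, alpha: int, beta: int) -> int:
    # Stand pat: the side to move may decline all captures and keep the static score.
    stand_pat = node.static_eval
    if stand_pat >= beta:
        return beta
    alpha = max(alpha, stand_pat)
    # Only captures are searched, so the recursion ends once the position is quiet.
    for child in node.captures:
        score = -quiescence(child, -beta, -alpha)  # negamax: flip perspective and window
        if score >= beta:
            return beta  # fail-hard beta cutoff
        alpha = max(alpha, score)
    return alpha


INF = 10**9
# Queen takes a defended knight (+300), then the queen is recaptured (-900).
defended = Node(0, [Node(-300, [Node(-600)])])
# Taking an undefended knight: no recapture available.
hanging = Node(0, [Node(-300)])
print(quiescence(defended, -INF, INF))  # 0: standing pat beats the losing capture
print(quiescence(hanging, -INF, INF))   # 300: the free knight is taken
```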

Last edited by ArnoHu (March 17, 2024 13:48:45)

HasiLover_Test

Check out the Shallow Blue chess engine (https://scratch.mit.edu/projects/958201361/). Even though its creator describes it as bad, it plays very good chess, and most of its games against Scurious are borderline winning for it at ply-2 depth.

HasiLover_Test

Shallow Blue (ply 2) draws Scurious 1.3 (ply 5), with Shallow Blue as White: https://lichess.org/V5UYtRzx#46
