Sedat Canbaz

Sedat Canbaz
Posts: 594
Joined: Mon Jun 19, 2023 8:49 am
Has thanked: 682 times
Been thanked: 2070 times

Re: Sedat Canbaz

Post by Sedat Canbaz »

One more thing, once again:
Thanks to all engine authors as well, otherwise I would not be running...
Special thanks to the authors of Cfish, Berserk, Spectral, and ShashChess!

One more note:
Unfortunately, every engine of the Eman series that I tested crashed
and was buggy on my tournament machine when using an Eman exp file (1.5+ GB).
It seems Eman needs serious optimization for modern machines,
at least when using HUGE exp files, and certainly in Gauntlet mode etc.
The crashes appear at the beginning of the test, and Cutechess
terminates directly. Sad, really. We are in 2024, but you see...

But the good news is that
Spectral is Spectral: this great engine is stable with the Eman exp file as well!!
It does not matter whether it is Gauntlet, Round-Robin, etc. Well done to Mr. Anton,

and keep up the great work!

Greetings )

Re: Sedat Canbaz

Post by Sedat Canbaz »

Hello there,

SCCT - Unofficial test with OrgZ's latest top engines of 2024!
The target is simply to check which OrgZ engines are better.

First of all, let's start with the good news:
A great performance comes from one of my favorite engines: SF-POLY!
Well done to Mr. Tanick Ramz!

The bad news is that
NONE of them can use CTG books, so I hope the next SF-POLY version will be able to.
At least one of them should; if not, it will be a BIG miss. Thanks in advance.

Code:

#   Engine             vs 1           vs 2           vs 3           vs 4           Total
1   SF-POLY2 10924a    **             102.0 - 98.0   102.5 - 97.5   103.5 - 96.5   308.0/600
2   JigSaw 6.0         98.0 - 102.0   **             102.0 - 98.0   98.5 - 101.5   298.5/600
3   Private 19Sf VICE  97.5 - 102.5   98.0 - 102.0   **             101.5 - 98.5   297.0/600
4   Sailfish 3         96.5 - 103.5   101.5 - 98.5   98.5 - 101.5   **             296.5/600
GAMES:
https://mega.nz/file/TwxmgI5Q#NTYlZRhwp ... DiRS0bdpG0


Conditions:
2x Epyc 7B12, CuteChess, 1 Core, Ponder OFF, Balsa/Unique, 30s+0.6s, 64 MB Hash, 4-MEN

More Details:
Sailfish seems to be not so stable: it lost 1 game on time,
while all the rest have been stable so far, with no games lost on time!

Greetings

Re: Sedat Canbaz

Post by Sedat Canbaz »

UPDATE

First of all, I was wondering: is up to 600 games
enough data (per player, under these bullet conditions)?

Here are the details, with facts of course (not papers):
From my experience, I strongly believe that
a small number of games is sometimes OK, but usually not enough!!
It is much better to play at least 1000-1500 games (per player).
Here, again and again, I am referring to strong opening suites; otherwise
many more thousands of games (per player) are required, such as 5000+.
Of course, running even more games is better still. Actually, my
fingers are tired of explaining, but what is changing? Not so much.
Yes, it will be the same story as in the past: we'll see some engines that
forfeit on time, and we will also see testers who run a small
number of games. Anyhow, maybe this time I will be successful.
I know it's hard, but nothing is impossible, right? )
Moreover, you know my target is very simple: I'm only trying to help!

This is another question, of course, but
how many of us (TDs / testers) prefer thousands of games (per player)?
Not many, unfortunately, but that's all right. Besides, no one is
forcing us toward more accurate metrics. Still, I think we should
do our best for more valid data, such as by increasing the number of games!
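To put rough numbers on "how many games are enough", here is a minimal sketch. It is not part of my test setup; it assumes two equally strong engines, independent games, and a normal approximation, so treat the output as a ballpark only. The function names are made up for this illustration.

```python
import math

# Slope of the Elo curve at a 50% score:
# d(Elo)/d(score) = 400 / (ln(10) * s * (1 - s)) with s = 0.5
ELO_PER_SCORE = 400.0 / (math.log(10) * 0.25)


def elo_margin(games: int, draw_ratio: float, z: float = 1.96) -> float:
    """Approximate error margin (in Elo) of a match between equal engines.

    Assumption: wins and losses are equally likely, so the per-game
    score variance is 0.25 * (1 - draw_ratio).
    """
    variance = 0.25 * (1.0 - draw_ratio)
    return z * ELO_PER_SCORE * math.sqrt(variance / games)


def games_needed(target_margin: float, draw_ratio: float, z: float = 1.96) -> int:
    """Smallest number of games whose margin is at most target_margin Elo."""
    variance = 0.25 * (1.0 - draw_ratio)
    return math.ceil(variance * (z * ELO_PER_SCORE / target_margin) ** 2)


if __name__ == "__main__":
    # Draw ratio of 95% is in the range seen in the tests in this thread.
    for n in (600, 1500, 5000):
        print(f"{n} games: about +/- {elo_margin(n, 0.95):.1f} Elo")
    print("games for +/- 3 Elo:", games_needed(3.0, 0.95))
```

Under these assumptions, 600 games at a 95% draw ratio leave a margin of several Elo, which is larger than most of the differences being measured here. Paired openings make the games not fully independent, so real margins can differ.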

And here are the latest tests; see what is going on:

Code:

1st test: here all seems right: 1 Elo difference,
i.e. very accurate, since both are copies of the same engine.

1   SF-POLY2 (Copy)  +11/-10/=579 50.08%  300.5/600
2   SF-POLY2 10924a  +10/-11/=579 49.92%  299.5/600

------------------------------------------------------

2nd test: all right again: 2 Elo difference,
i.e. not bad at all, since both are copies of the same engine!
                   
1   SF-CTG (Copy)   +10/-8/=319 50.30%  169.5/337
2   SF-CTG  150724  +8/-10/=319 49.70%  167.5/337

------------------------------------------------------

3rd test: NOT all right!! 7 Elo difference,
i.e. bad, since both are copies of the same engine!

1   Incognito(Copy)  +23/-11/=566 51.00%  306.0/600
2   Incognito5 pro   +11/-23/=566 49.00%  294.0/600

------------------------------------------------------

4th test: not much to say here, but 0 Elo difference.
As we see, the results are identical in strength.
Note that here the two are NOT copies of the same engine!
                                    
1   Rems MPV Sep24   +20/-20/=560 50.00%  300.0/600 
2   Rems EXP 160824  +20/-20/=560 50.00%  300.0/600 

------------------------------------------------------

And here is one of the last tests: SF-POLY2 vs Rems MPV

Test via Balsa suite: SF-POLY2 10924a performed better!
                     
1   SF-POLY2 10924a  +15/-13/=518 50.18%  274.0/546
2   Rems MPV Sep24   +13/-15/=518 49.82%  272.0/546

Test via Unique suite: Rems MPV performed better!
                     
1   Rems MPV Sep24   +29/-25/=478 50.38%  268.0/532
2   SF-POLY2 10924a  +25/-29/=478 49.62%  264.0/532

More tests are included in the database. Anyhow,
here are the overall results: identical in performance!
                 
1   SF-POLY2 10924a  +59/-58/=1615 50.03%  866.5/1732
2   Rems MPV Sep24   +58/-59/=1615 49.97%  865.5/1732

Conditions:
2x Epyc 7B12, CuteChess, 1 Core, Ponder OFF, 30s+0.6s, 64 MB Hash, 4-MEN
Note: The Balsa and Unique suites were used as openings.
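For anyone who wants to recheck the Elo differences above, here is a small sketch that converts a +W/-L/=D record into an Elo difference with an approximate 95% margin. It uses the standard logistic Elo formula and a normal approximation on the observed results; the helper name `elo_from_wld` is made up for this example, and this is not the tool used to produce my tables.

```python
import math


def score_to_elo(score: float) -> float:
    """Logistic Elo difference for a score fraction strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_from_wld(wins: int, losses: int, draws: int, z: float = 1.96):
    """Return (elo_difference, approximate_margin) from the first player's view."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the observed results around the mean score.
    variance = (
        wins * (1.0 - score) ** 2
        + losses * (0.0 - score) ** 2
        + draws * (0.5 - score) ** 2
    ) / games
    std_error = math.sqrt(variance / games)
    low = score_to_elo(score - z * std_error)
    high = score_to_elo(score + z * std_error)
    return score_to_elo(score), (high - low) / 2.0


if __name__ == "__main__":
    # Records copied from the tests in this post.
    tests = {
        "SF-POLY2 (Copy) vs SF-POLY2 10924a": (11, 10, 579),
        "SF-CTG (Copy) vs SF-CTG 150724": (10, 8, 319),
        "Incognito (Copy) vs Incognito5 pro": (23, 11, 566),
        "Rems MPV Sep24 vs Rems EXP 160824": (20, 20, 560),
    }
    for name, (w, l, d) in tests.items():
        elo, margin = elo_from_wld(w, l, d)
        print(f"{name}: {elo:+.1f} Elo, about +/- {margin:.1f}")
```

The margin assumes independent games, which is only roughly true when openings are played with colors reversed, so read it as an estimate.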

GAMES:
https://mega.nz/file/XxhEURCK#buTFaVhRP ... Z-FDoKO7U4

As final words,
here is what I have been trying to tell all of you over the past years:

In short, if I were a newcomer/amateur/beginner in computer chess
and checked only a small number of games, then I would say something like:
Oh yes... SF POLY is stronger, or just the opposite, Rems is stronger ))

And I hope, again and again, that
all my data will be useful, purely for the progress of computer chess!


Greetings

Re: Sedat Canbaz

Post by Sedat Canbaz »

UPDATE 2

Really sad... for example:
After checking the latest error-margin test more closely,
I found several games lost on time by SF-POLY 210924a.

Note also, regarding the former champion SF-Poly 220723:
I never saw or noticed any game lost on time.
What does it mean? Why have some of the latest engines
become worse (less stable)?
Dual nets cannot be the reason, because some
SF-based engines (with dual nets) never lose on time.
I really wonder. Any opinions on these issues?

By the way, the good news is that
so far Mr. Eduard's engines seem to be very stable. Great!
E.g. so far not a single game has been recorded as lost on time!!

OK, that's all for now, except to note that all engines were played with Move Overhead: 400.
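For those who want to check time losses in a PGN file themselves, here is a minimal sketch that counts time forfeits per engine. It assumes the PGN carries a Termination tag with the value "time forfeit" and that every game starts with an Event tag; please verify both against your own files. The engine names in the sample are made up.

```python
import re
from collections import Counter

# Matches a PGN tag pair such as: [White "EngineA"]
TAG_RE = re.compile(r'^\[(\w+)\s+"(.*)"\]$')


def _record(tags: dict, forfeits: Counter) -> None:
    """Add one game to the tally if it ended by time forfeit."""
    # Assumption: the GUI marks time losses as [Termination "time forfeit"].
    if tags.get("Termination", "").lower() != "time forfeit":
        return
    if tags.get("Result") == "1-0":
        forfeits[tags.get("Black", "?")] += 1  # Black lost on time
    elif tags.get("Result") == "0-1":
        forfeits[tags.get("White", "?")] += 1  # White lost on time


def count_time_forfeits(pgn_text: str) -> Counter:
    """Return a Counter mapping engine name to games lost on time."""
    forfeits = Counter()
    tags = {}
    for line in pgn_text.splitlines():
        match = TAG_RE.match(line.strip())
        if not match:
            continue  # movetext or blank line
        name, value = match.groups()
        if name == "Event" and tags:
            # A new game begins: settle the previous one first.
            _record(tags, forfeits)
            tags = {}
        tags[name] = value
    if tags:
        _record(tags, forfeits)
    return forfeits


# Hypothetical two-game sample, only to show the expected input shape.
SAMPLE_PGN = '''[Event "Test"]
[White "EngineA"]
[Black "EngineB"]
[Result "1-0"]
[Termination "time forfeit"]

1. e4 e5 1-0

[Event "Test"]
[White "EngineB"]
[Black "EngineA"]
[Result "1/2-1/2"]

1. d4 d5 1/2-1/2
'''

if __name__ == "__main__":
    print(count_time_forfeits(SAMPLE_PGN))
```

To use it on a real file, read the file into a string and pass it to `count_time_forfeits`.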

Greetings

Re: Sedat Canbaz

Post by Sedat Canbaz »

Meanwhile,
I decided to quote one of my old postings/rankings:
https://open-chess.org/viewtopic.php?f=4&p=34207#p34207

Who knows? Older stats may help here. Back then,
0 (zero) games were recorded as time losses (based on 27750 games).
Sedat Canbaz wrote: Thu Mar 07, 2024 7:27 am

Code:

Rank Name              Elo    +    - games score oppo. draws 
   1 Brainlearn 27    3780    2    2  2800   51%  3776   95% 
   2 CoolIris 11.80   3779    2    3  2700   50%  3776   96% 
   3 RapTora 2.3      3779    3    3  2400   50%  3776   96% 
   4 SF-PB 080124     3778    2    2  2800   50%  3776   96% 
   5 Raid v3.4        3778    2    2  2800   50%  3777   95% 
   6 Brainlearn 26.5  3778    3    3  2200   50%  3776   96% 
   7 Polyfish 140124  3778    2    3  2700   50%  3776   97% 
   8 CoolIris 11.90   3778    3    3  2100   50%  3777   96% 
   9 SF-PB 051123     3778    3    3  2200   50%  3776   96% 
  10 Patzer AI X256   3778    3    3  2100   50%  3776   96% 
  11 Eman 9.90        3778    2    2  2800   50%  3777   96% 
  12 DarkSisTer 8.50  3777    3    3  2000   50%  3776   97% 
  13 SF POLY 261123   3777    2    2  2700   50%  3776   96% 
  14 Killfish 231123  3776    3    3  2000   50%  3776   97% 
  15 Incognito 5 Pro  3776    2    2  2700   50%  3776   95% 
  16 Hazard 3.78      3775    3    3  2000   50%  3776   96% 
  17 Tactical 281023  3775    3    3  2000   50%  3776   97% 
  18 SF POLY 220723   3775    3    2  2700   50%  3777   95% 
  19 SunLight 3       3773    3    3  2100   50%  3776   95% 
  20 XTD 010723       3773    3    3  2100   50%  3776   96% 
  21 SpecTral 5.50    3773    3    3  2100   49%  3776   95% 
  22 AWOL Z11         3773    3    3  2100   49%  3776   95% 
  23 ShashChess 34.6  3771    3    3  1900   49%  3778   94% 
  24 Sawfish 2TC      3768    3    3  1500   49%  3776   94% 
Conditions:
2x Epyc 7B12, CuteChess, 1 Core, Ponder OFF, 30s+0.6s, Balsa, 64 MB Hash, 4-MEN
Note: Testing started at 30s+0.5s but later switched to 30s+0.6s.
In other words, most of the current games were played at TC: 30sec + 0.6sec

GAMES:
https://mega.nz/file/D5ginASb#r0qRGjiBY ... mnuNINhpmk
Btw, if you need more data (as evidence that all were stable), just let me know, please.

For these reasons, I wish to say again and again:
newer is not always better, and not everything is as it seems!

Best,
Sedat

Re: Sedat Canbaz

Post by Sedat Canbaz »

Hello Chess Friends,

As usual, I'm very pleased to announce that
I managed to organize another new championship!
And what's new: each book contains 5600 games !!
In other words, I think they deserve more, but
that is the best I can do, at least for now!

Some notes about the top books currently participating:
They are the winners of the SIZE tours: Small / Medium / Large / Giant.
I know that this is not entirely fair, but anyhow, I think
it's not such a bad idea to have them fight each other, right? ) If nothing
else, mainly for fun. I could add a lot more, but I have no
free time for everything, except to say that Messi's older book (by Mr. Angel) proves
once again to all of us to be the strongest under these conditions!!
Of course, I'm very impressed by the rest of the top books too. For example:
a super strong performance by SENTINEL 2409 despite its very
small size; plus, its draw ratio is the lowest, just 89%.
Geralt is the only public one, and it is small and old, so it is nothing
strange that it ranked in last place. But in the 7th tour (via Cfish..)
Geralt is Geralt: it managed to take 3rd place. Really good!

Another very important point is that
I decided to run many separate tours, played by various engines!
With this testing method, influences such as the error margin
are now much clearer, and that is not all: we can compare engine/book draw
records as well. For more notes, I suggest reading 'More Details'.

XXXVI's GRAND Champion: Chucaro - Congrats to Angel Morano!!
My congratulations to the authors of all the other former champions as well!

For More Details, Full Standings etc:
https://sites.google.com/site/computers ... k-nn-cs-36

GAMES:
https://mega.nz/file/OppjxDTI#ZusW5Fi7K ... 8T3g-ho9io

That's all for now...thanks for your interest...

Best Regards,
Sedat Canbaz

Re: Sedat Canbaz

Post by Sedat Canbaz »

UPDATE

A new STAR is born, and it belongs among the brightest ones!
The name of this great star is SF-PB 220324 SC.
It is a super strong engine and, so far, less drawish
than all other tested engines that are close to 3800+.
That really means a lot, especially for book tours!
One thing is missing in SF-PB: it cannot use CTG.
But no work is perfect, and we have to be satisfied.

Meanwhile, just to be clear:
SF-PB 220324 SC = SF-PB via nn-b1a57edbea57.nnue

And here are the latest NN strength results:

Code:

SF-PB 220324 SC vs SF-CTG 150724: 9 Elo difference
Here we need more games for accurate metrics.
                     
1   SF-PB 220324 SC  +31/-16/=553 51.25%  307.5/600
2   SF-CTG 150724    +16/-31/=553 48.75%  292.5/600

The draw ratio is normal: 92%, since strong lines were played
-------------------------------------------------------

Default (nn-1ceb1ade0001.nnue) vs SC (nn-b1a57edbea57.nnue)

SF-PB 220324 SC vs SF-PB 220324 Def: 0 Elo difference
In short: just great, as we see they are identical (in strength)

1   SF-PB 220324 Def  +22/-21/=879 50.05%  461.5/922
2   SF-PB 220324 SC   +21/-22/=879 49.95%  460.5/922


The draw ratio is high: 95%, but here it seems nn-1ceb1ade0001
played a serious role in producing more draws, because
in SC's own self-play test the draws were 92%.

And here is the mentioned SF-PB 220324 SC draw test:
Note: Played against each other, of course with 2 other SF-PB engines
                     
1   SF-PB 220324 SC  +23/-20/=557 50.25%  301.5/600
2   SF-PB SC (Copy)  +20/-23/=557 49.75%  298.5/600

----------------------------------------------------

Last test: vs Brainlearn 28.1, which has the CTG feature!
Their strength is almost the same, which is not bad!
That means that, just in case, CTG books will be played under
fairer conditions, since strength matters a lot!
                     
1   SF-PB 220324 SC  +19/-17/=756 50.13%  397.0/792
2   Brainlearn 28.1  +17/-19/=756 49.87%  395.0/792

Btw, here the draw ratio is high: 95%. Sometimes
not everything is in my hands, but I will see what I can do
to get lower draw percentages. Note, however, that
when running SF-PB 220324 SC (for all books), as in the
recent XXXVI CS, it already proved less drawish than
all top engines that are close to 3800 Elo points!!

Conditions:
2x Epyc 7B12, CuteChess, 1 Core, Ponder OFF, 30s+0.6s, Balsa/Unique, 64 MB Hash, 4-MEN
GAMES:
https://mega.nz/file/OlB0HDxS#_PWsDKR9Z ... uXCyorOcdw

Meantime, I'd also be happy if the programmers did their
best too, to produce fewer draws as well. Reminder:
I am just a simple tester/TD here, no more, no less! )

And as a last note,
I've tested many more engines, but they were left out. The
reason: they are more drawish, and that is not all.
Some are not so stable, e.g. time forfeits (rarely, but...)
or crashes in Gauntlet. It seems they need some
optimization for fast, modern hardware, if nothing
else on 2x EPYC 7B12 (with 256 threads / 128 cores).
But the good news is that the currently tested top engines
are stable, at least so far!! You know, it is not easy, e.g.
playing at bullet (30s+0.6s) with high concurrency (64)!

Thanks for reading and have a nice weekend )

Greetings

Re: Sedat Canbaz

Post by Sedat Canbaz »

UPDATE 2

Just one more test...

SF-PB 220324 SC vs Rems EXP 160824: +6 Elo (in favor of SC)

Code:

1   SF-PB 220324 SC  +50/-34/=916 50.80%  508.0/1000
2   Rems EXP 160824  +34/-50/=916 49.20%  492.0/1000

On the other hand, I am slightly surprised here: normally the newer
should be better (I mean the newer ones should be stronger...)

Btw, as you can see, this time
both top engines produced the lowest draw value: 91%. Great!!

Note also that
Rems EXP played without engine learning (as did all other engines),
and the same conditions were used for all (such as 30s+0.6s etc.).
Be aware that all played games are included in the previous post.
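As a rough check on how much a +50/-34 result really says, here is a small sketch of the "likelihood of superiority" (LOS) estimate commonly used in engine testing. It ignores draws and uses a normal approximation, so it is only a guide; the function name `los` is chosen for this example.

```python
import math


def los(wins: int, losses: int) -> float:
    """Approximate probability that the first engine is the stronger one.

    Draws carry no information about which side is stronger, so only
    decisive games enter the formula.
    """
    decisive = wins + losses
    if decisive == 0:
        return 0.5  # no decisive games, no evidence either way
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * decisive)))


if __name__ == "__main__":
    # Records copied from the posts above.
    print(f"SF-PB 220324 SC vs Rems EXP 160824: {los(50, 34):.1%}")
    print(f"SF-PB 220324 SC vs SF-CTG 150724:   {los(31, 16):.1%}")
    print(f"Rems MPV Sep24 vs Rems EXP 160824:  {los(20, 20):.1%}")
```

A value near 50% means the match cannot separate the two engines, while values above roughly 95% are usually taken as a fairly clear sign, though still not proof.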

Best,
Sedat