Re: Sedat Canbaz

Posted: Fri Sep 27, 2024 8:48 pm
by Sedat Canbaz
One more thing, and once more:
Thanks to all engine authors as well; without them I would have nothing to run...
Special thanks to authors: Cfish, Berserk, Spectral, Shashchess!

One more note:
Unfortunately, every engine of the Eman series that I tested crashed
Or was buggy on my tournament machine when using an Eman exp file (1.5+ GB).
It seems Eman needs serious optimization for modern machines,
At least when using HUGE exp files, e.g. in Gauntlet mode.
These crashes appear at the beginning of the test, and Cutechess
Terminates immediately. Sad, really: we are in 2024, but you see...

But the good news is that
Spectral is Spectral... this great engine is stable with the Eman exp file as well!!
It does not matter whether it is Gauntlet, Round-Robin, etc. Well done, Mr. Anton

And keep up the great work!

Greetings )

Re: Sedat Canbaz

Posted: Sat Sep 28, 2024 12:41 pm
by Sedat Canbaz
Hello there,

SCCT - Unofficial Test with OrgZ's latest Top engines 2024!
The target is simply to check which OrgZ engines are better!

First of all, let's start with the good news:
A great performance by one of my favorite engines: SF-POLY!
Well done, Mr. Tanick Ramz!

The bad news is that
NONE of them can use CTG books, so I hope the next SF-POLY version will support them.
At least one of them should; if not, it will be a BIG miss. Thanks in advance.

#   Engine              1             2             3             4             Score
1   SF-POLY2 10924a     **            102.0 - 98.0  102.5 - 97.5  103.5 - 96.5  308.0/600
2   JigSaw 6.0          98.0 - 102.0  **            102.0 - 98.0  98.5 - 101.5  298.5/600
3   Private 19Sf VICE   97.5 - 102.5  98.0 - 102.0  **            101.5 - 98.5  297.0/600
4   Sailfish 3          96.5 - 103.5  101.5 - 98.5  98.5 - 101.5  **            296.5/600
GAMES:
https://mega.nz/file/TwxmgI5Q#NTYlZRhwp ... DiRS0bdpG0


Conditions:
2x Epyc 7B12, CuteChess, 1 Core, Ponder OFF, Balsa/Unique, 30s+0.6s, 64 MB Hash, 4-MEN

More Details:
Sailfish seems to be not so stable: it lost 1 game on time,
Whereas all the rest are stable so far: no games lost on time!

Greetings

Re: Sedat Canbaz

Posted: Sun Sep 29, 2024 2:01 pm
by Sedat Canbaz
UPDATE

First of all, I was wondering: are up to 600 games
Enough data (per player, under these Bullet conditions)?

And here are the details, with facts of course (not papers):
From my experience, I strongly believe that
A small number of games is sometimes OK, but usually not enough!!
It is much better to play at least 1000-1500 games (per player).
Here, again and again, I am referring to strong opening suites; otherwise
Many more thousands of games (per player) are required, such as 5000+.
Of course, running even more games is better still... actually my
Fingers are tired of explaining, but what is changing? Not much.
Yes, it will be the same story as in the past: we'll see some engines that
Forfeit on time, and we'll also see testers who run a small
Number of games... anyhow, maybe this time I will be successful.
I know it's hard, but nothing is impossible, right? )
Moreover, you know my target is very simple: I'm only trying to help!

This is another question of course, but
How many of us (TDs / testers) prefer thousands of games (per player)?
Not many, unfortunately, but that's all right... and nobody is
Forcing us towards more accurate metrics. Still, I think we should
Do our best to get more valid data, e.g. by increasing the number of games!
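
To put rough numbers on the "how many games" question, here is a minimal back-of-envelope sketch. It is not taken from the SCCT data: it assumes independent games between near-equal engines (opening pairs played with reversed colors are not fully independent, so treat the result as approximate), and the names elo_margin and games_needed are made up for illustration.

```python
import math

# Slope of the Elo curve at a 50% score: d(Elo)/d(score) = 400 / (ln(10) * 0.25)
ELO_PER_SCORE_UNIT = 400.0 / (math.log(10) * 0.25)


def elo_margin(games: int, draw_ratio: float, z: float = 1.96) -> float:
    """Approximate +/- error margin in Elo for a near-equal match.

    Assumes independent games and equal win/loss chances, so the
    per-game score variance is 0.25 * (1 - draw_ratio).
    """
    per_game_var = 0.25 * (1.0 - draw_ratio)
    return z * ELO_PER_SCORE_UNIT * math.sqrt(per_game_var / games)


def games_needed(target_margin: float, draw_ratio: float, z: float = 1.96) -> int:
    """Smallest number of games whose margin is at or below target_margin."""
    per_game_var = 0.25 * (1.0 - draw_ratio)
    return math.ceil(per_game_var * (z * ELO_PER_SCORE_UNIT / target_margin) ** 2)


if __name__ == "__main__":
    for n in (600, 1000, 1500, 5000):
        print(f"{n:5d} games, 95% draws: +/- {elo_margin(n, 0.95):.1f} Elo")
    print("games for +/- 2 Elo at 95% draws:", games_needed(2.0, 0.95))
```

Under these assumptions, 600 games at a 95% draw ratio leave a margin of roughly 6 Elo, which is larger than most of the differences being measured here, and halving the margin needs about four times as many games.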

And here are the latest tests; see what is going on:

1st test: all seems right here: 1 Elo difference, 
Which is very accurate, since both are copies of the same engine...
                     
1   SF-POLY2 (Copy)  +11/-10/=579 50.08%  300.5/600
2   SF-POLY2 10924a  +10/-11/=579 49.92%  299.5/600

------------------------------------------------------

2nd test: all right again: 2 Elo difference, 
Not bad at all, since both are copies of the same engine!
                   
1   SF-CTG (Copy)   +10/-8/=319 50.30%  169.5/337
2   SF-CTG  150724  +8/-10/=319 49.70%  167.5/337

------------------------------------------------------

3rd test: NOT all right!!  7 Elo difference, 
Which is bad, since both are copies of the same engine!
                   
                     
1   Incognito(Copy)  +23/-11/=566 51.00%  306.0/600
2   Incognito5 pro   +11/-23/=566 49.00%  294.0/600

------------------------------------------------------

4th test: not much to say here, but 0 Elo difference. 
As we see, the results are identical in strength. 
And note that here the two are NOT copies of the same engine!
                                    
1   Rems MPV Sep24   +20/-20/=560 50.00%  300.0/600 
2   Rems EXP 160824  +20/-20/=560 50.00%  300.0/600 

------------------------------------------------------

And here is one of the last tests: SF-POLY2 vs Rems MPV


Test via Balsa suite: SF-POLY2 10924a performed better!
                     
1   SF-POLY2 10924a  +15/-13/=518 50.18%  274.0/546
2   Rems MPV Sep24   +13/-15/=518 49.82%  272.0/546

Test via Unique suite: Rems MPV performed better!
                     
1   Rems MPV Sep24   +29/-25/=478 50.38%  268.0/532
2   SF-POLY2 10924a  +25/-29/=478 49.62%  264.0/532

More tests are included in the database as well... anyhow, 
Here are the overall results: identical in performance!
                 
1   SF-POLY2 10924a  +59/-58/=1615 50.03%  866.5/1732
2   Rems MPV Sep24   +58/-59/=1615 49.97%  865.5/1732

Conditions:
2x Epyc 7B12, CuteChess, 1 Core, Ponder OFF, 30s+0.6s, 64 MB Hash, 4-MEN
Note: As openings, the Balsa plus Unique suites are used...

GAMES:
https://mega.nz/file/XxhEURCK#buTFaVhRP ... Z-FDoKO7U4
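
The Elo differences quoted for these tests can be reproduced from the +W/-L/=D counts. The following is only a sketch under stated assumptions: the usual logistic Elo formula, independent games, and a normal approximation for the 95% interval. The function names are hypothetical and this is not necessarily how the figures above were produced.

```python
import math


def elo_from_score(score: float) -> float:
    """Elo difference implied by an expected score (logistic model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def match_summary(wins: int, losses: int, draws: int, z: float = 1.96):
    """Return (elo, elo_low, elo_high) for the first player of a match."""
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + losses * (0.0 - score) ** 2
           + draws * (0.5 - score) ** 2) / n
    half_width = z * math.sqrt(var / n)
    return (elo_from_score(score),
            elo_from_score(score - half_width),
            elo_from_score(score + half_width))


if __name__ == "__main__":
    tests = {
        "Incognito (Copy) vs Incognito5 pro": (23, 11, 566),
        "Rems MPV vs Rems EXP": (20, 20, 560),
        "SF-POLY2 vs Rems MPV (overall)": (59, 58, 1615),
    }
    for name, (w, l, d) in tests.items():
        elo, lo, hi = match_summary(w, l, d)
        print(f"{name}: {elo:+.1f} Elo (95% interval {lo:+.1f} .. {hi:+.1f})")
```

With this approximation, the 7 Elo gap between the two Incognito copies comes with an interval whose lower end is close to zero, so a result like that is unusual but not impossible for identical engines at 600 games.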

As final words,
What I have been trying to tell all of you over the past years:

In short, if I were new/an amateur/a beginner in computer chess
And looked at only a small number of games, then I'd say something like
'Oh yes... SF-POLY is stronger' or, just the opposite, 'Rems is stronger' ))

And I hope, again and again,
That all my data will be useful, just for the progress of computer chess!


Greetings

Re: Sedat Canbaz

Posted: Sun Sep 29, 2024 6:03 pm
by Sedat Canbaz
UPDATE 2

Really sad... for example:
After checking the latest error-margin test more closely,
I found several games lost on time by SF-POLY 210924a.

Note also, regarding the former Champion, SF-Poly 220723:
I have never noticed any game lost on time...
What does it mean? Why have some of the latest engines
Become worse (less stable)?
Dual nets cannot be the reason, because some
SF-based engines (with dual nets) never lose on time...
So I wonder a lot. Any opinions on these issues?

By the way, the good news is that
So far Mr. Eduard's engines seem to be very stable. Great!
E.g. so far no game has been recorded as lost on time!!

OK, that's all for now... except that all engines were played with Move Overhead: 400
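
For anyone who wants to count time losses in the shared PGN files themselves, here is a small sketch. It assumes the GUI writes a [Termination "time forfeit"] tag for such games, which should be checked against your own files, and the helper names are invented for this example.

```python
import re
from collections import Counter

# A PGN tag pair such as: [White "Some Engine 1.0"]
TAG_RE = re.compile(r'^\[(\w+)\s+"(.*)"\]\s*$')


def iter_headers(pgn_text: str):
    """Yield one dict of tag pairs per game in a multi-game PGN string."""
    tags = {}
    in_moves = False
    for raw in pgn_text.splitlines():
        line = raw.strip()
        match = TAG_RE.match(line)
        if match:
            if in_moves and tags:
                # A tag line after movetext means a new game has started.
                yield tags
                tags = {}
            in_moves = False
            tags[match.group(1)] = match.group(2)
        elif line:
            in_moves = True
    if tags:
        yield tags


def time_forfeits(pgn_text: str) -> Counter:
    """Count games lost on time per engine (assumed Termination tag value)."""
    losses = Counter()
    for tags in iter_headers(pgn_text):
        if tags.get("Termination", "").lower() != "time forfeit":
            continue
        result = tags.get("Result")
        if result == "1-0":
            losses[tags.get("Black", "?")] += 1
        elif result == "0-1":
            losses[tags.get("White", "?")] += 1
    return losses


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for engine, count in time_forfeits(handle.read()).most_common():
            print(f"{count:4d}  {engine}")
```

Run it with the path of a PGN file as the only argument; engines with no time losses simply do not appear in the output.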

Greetings

Re: Sedat Canbaz

Posted: Sun Sep 29, 2024 6:41 pm
by Sedat Canbaz
Meanwhile,
I decided to quote one of my old postings/rankings...
https://open-chess.org/viewtopic.php?f=4&p=34207#p34207

Who knows? Older stats may help here. In those days,
0 (zero) games were recorded as time losses (based on 27750 games)
Sedat Canbaz wrote: Thu Mar 07, 2024 7:27 am

Rank Name              Elo    +    - games score oppo. draws 
   1 Brainlearn 27    3780    2    2  2800   51%  3776   95% 
   2 CoolIris 11.80   3779    2    3  2700   50%  3776   96% 
   3 RapTora 2.3      3779    3    3  2400   50%  3776   96% 
   4 SF-PB 080124     3778    2    2  2800   50%  3776   96% 
   5 Raid v3.4        3778    2    2  2800   50%  3777   95% 
   6 Brainlearn 26.5  3778    3    3  2200   50%  3776   96% 
   7 Polyfish 140124  3778    2    3  2700   50%  3776   97% 
   8 CoolIris 11.90   3778    3    3  2100   50%  3777   96% 
   9 SF-PB 051123     3778    3    3  2200   50%  3776   96% 
  10 Patzer AI X256   3778    3    3  2100   50%  3776   96% 
  11 Eman 9.90        3778    2    2  2800   50%  3777   96% 
  12 DarkSisTer 8.50  3777    3    3  2000   50%  3776   97% 
  13 SF POLY 261123   3777    2    2  2700   50%  3776   96% 
  14 Killfish 231123  3776    3    3  2000   50%  3776   97% 
  15 Incognito 5 Pro  3776    2    2  2700   50%  3776   95% 
  16 Hazard 3.78      3775    3    3  2000   50%  3776   96% 
  17 Tactical 281023  3775    3    3  2000   50%  3776   97% 
  18 SF POLY 220723   3775    3    2  2700   50%  3777   95% 
  19 SunLight 3       3773    3    3  2100   50%  3776   95% 
  20 XTD 010723       3773    3    3  2100   50%  3776   96% 
  21 SpecTral 5.50    3773    3    3  2100   49%  3776   95% 
  22 AWOL Z11         3773    3    3  2100   49%  3776   95% 
  23 ShashChess 34.6  3771    3    3  1900   49%  3778   94% 
  24 Sawfish 2TC      3768    3    3  1500   49%  3776   94% 
Conditions:
2x Epyc 7B12, CuteChess, 1 Core, Ponder OFF, 30s+0.6s, Balsa, 64 MB Hash, 4-MEN
Note: The test started at 30s+0.5s but later switched to 30s+0.6s
In other words, most of the current games were played at TC: 30sec + 0.6sec

GAMES:
https://mega.nz/file/D5ginASb#r0qRGjiBY ... mnuNINhpmk
Btw, if you need more data (as proof that all were stable...), just let me know, please...

For these reasons, again and again, I wish to say:
Newer is not always better, and not everything is as it seems!

Best,
Sedat

Re: Sedat Canbaz

Posted: Fri Oct 04, 2024 9:36 am
by Sedat Canbaz
Hello Chess Friends,

As usual, I'm very pleased to announce that
I have managed to organize another new championship!
And what's new: each book gets 5600 games!!
In other words, I think they deserve more, but
That is the best I can do, at least for now!

Some notes about the Top books currently participating:
They are the winners of the SIZE tours: Small / Medium / Large / Giant
I know it is not entirely fair, but anyhow, I think
It's not a bad idea to have them fight each other, right? ) If nothing
Else, mainly for fun. I could add a lot more, but there is no
Free time for everything, except that Messi's old-dated one (by Mr. Angel) proves
Again to all of us that it is the strongest under these conditions!!
Of course I'm very impressed by the rest of the Top books too, for example:
A super strong performance by SENTINEL 2409 despite its very
Small size; plus, the DrawRatio it produced is the lowest, just 89%
Geralt is the only public one, and also small and old-dated, so it is nothing
Strange that it ranked in last place. But in the 7th tour (via Cfish):
Geralt is Geralt, and it managed to take 3rd place... really good!

Another very important point:
I decided to run many separate tours, played by various engines!
With this testing method, the influence of things such as the
Error margin is now much clearer, and that is not all: we can compare engine/book draw
Records as well. For more notes, I suggest reading 'More Details'

XXXVI's GRAND Champion: Chucaro - Congrats to Angel Morano!!
My congratulations to all the other Former Champions' authors as well!

For More Details, Full Standings etc:
https://sites.google.com/site/computers ... k-nn-cs-36

GAMES:
https://mega.nz/file/OppjxDTI#ZusW5Fi7K ... 8T3g-ho9io

That's all for now...thanks for your interest...

Best Regards,
Sedat Canbaz

Re: Sedat Canbaz

Posted: Fri Oct 04, 2024 1:49 pm
by Sedat Canbaz
UPDATE

A new STAR is born, and it belongs among the brightest ones!
The name of this great star is SF-PB 220324 SC
A super strong engine, and so far less drawish
Than all other tested engines that are close to 3800+
That really means a lot, especially for book tours!
One thing is missing in SF-PB: it cannot use CTG..
But no work is perfect, and we have to be satisfied..

Meanwhile, and just to be clear,
SF-PB 220324 SC = SF-PB with nn-b1a57edbea57.nnue

And here are the latest NN strength results:

SF-PB 220324 SC Vs SF-CTG 150724: 9 Elo difference
Here we need more games, sure for accurate metrics..
                     
1   SF-PB 220324 SC  +31/-16/=553 51.25%  307.5/600
2   SF-CTG 150724    +16/-31/=553 48.75%  292.5/600

DrawRatio is normal: 92%, since strong lines were played
-------------------------------------------------------

Default (nn-1ceb1ade0001.nnue) Vs SC (nn-b1a57edbea57.nnue)

SF-PB 220324 SC Vs SF-PB 220324 Def: 0 Elo difference
In short: just great as we see identical (in strength)
                   
           
1   SF-PB 220324 Def  +22/-21/=879 50.05%  461.5/922
2   SF-PB 220324 SC   +21/-22/=879 49.95%  460.5/922


DrawRatio is high: 95%, but here it seems nn-1ceb1ade0001 
Played a serious role in producing more draws, because
In SC's test against its own copy the draws were 92%

And here is the mentioned SF-PB 220324 SC Draw Test:
Note: Played each other, sure with 2 other SF-PB eng
                     
1   SF-PB 220324 SC  +23/-20/=557 50.25%  301.5/600
2   SF-PB SC (Copy)  +20/-23/=557 49.75%  298.5/600

----------------------------------------------------

Last test: vs Brainlearn 28.1, which has the CTG feature!
Their Elo is almost the same, not bad!
That means that, if needed, CTG books can be played under 
Fairer conditions, since strength matters a lot!
                     
1   SF-PB 220324 SC  +19/-17/=756 50.13%  397.0/792
2   Brainlearn 28.1  +17/-19/=756 49.87%  395.0/792

Btw, here the draw ratio is high: 95%, but sometimes
Not everything is in my hands. I will see what I can do
To get lower draw percentages. But 
When running SF-PB 220324 SC (for all books), it was
Already proven in the recent XXXVI CS to be less drawish than
All Top engines that are close to 3800 Elo points!!

Conditions:
2x Epyc 7B12, CuteChess, 1 Core, Ponder OFF, 30s+0.6s, Balsa/Unique, 64 MB Hash, 4-MEN
GAMES:
https://mega.nz/file/OlB0HDxS#_PWsDKR9Z ... uXCyorOcdw
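
Whether a 95% draw ratio really differs from a 92% one can be checked roughly with a two-proportion z-test. The sketch below uses the draw counts quoted above (879 of 922 with the default net, 557 of 600 in the SC copy test). It assumes independent games, which is only approximately true, and the function names are illustrative.

```python
import math


def draw_ratio_z(draws_a: int, games_a: int, draws_b: int, games_b: int) -> float:
    """z statistic for the difference between two draw ratios."""
    p_a = draws_a / games_a
    p_b = draws_b / games_b
    pooled = (draws_a + draws_b) / (games_a + games_b)
    std_err = math.sqrt(pooled * (1.0 - pooled) * (1.0 / games_a + 1.0 / games_b))
    return (p_a - p_b) / std_err


def two_sided_p(z: float) -> float:
    """Two-sided p-value of a z statistic under the normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    z = draw_ratio_z(879, 922, 557, 600)
    print(f"z = {z:.2f}, two-sided p = {two_sided_p(z):.3f}")
```

A z value near 2 means the gap is on the border of what chance alone would produce, so more games would be needed before blaming the net with confidence.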

Meantime, I'd also be happy if the programmers did their
Best too, to produce fewer draws as well. Reminder:
I am just a simple tester/TD here... no more, no less! )

And as a last note,
I have tested many more engines, but they were left out.
Reason: they are more drawish, and that is not all.
Some are not so stable, e.g. time forfeits (rarely, but..)
Or crashes in Gauntlet, and it seems they need some
Optimization for fast, modern hardware, if nothing
Else for 2x EPYC 7B12 (with 256 Threads / 128 Cores)
But the good news is that the currently tested Top engines
Are stable, at least so far!! You know, it is not easy, e.g.
Playing at Bullet (30s+0.6s) + High Concurrency (64)!

Thanks for reading and have a nice weekend )

Greetings

Re: Sedat Canbaz

Posted: Fri Oct 04, 2024 2:30 pm
by Sedat Canbaz
UPDATE 2

Just one more testing...

SF-PB 220324 SC vs Rems EXP 160824: +6 Elo (in favor of SC)

1   SF-PB 220324 SC  +50/-34/=916 50.80%  508.0/1000
2   RemsEXP 160824   +34/-50/=916 49.20%  492.0/1000

On the other hand, I am slightly surprised here: normally the newer
Engine should be better (I mean the newer ones ought to be stronger...)

Btw, as you may also see, this time
Both Top engines produced the lowest draw value: 91%. Great!!

Note also that
Rems EXP played without engine learning (like all other engines),
And the same conditions were used for all (such as 30s+0.6s etc.)
Be aware that all played games are included in the previous post..

Best,
Sedat

Re: Sedat Canbaz

Posted: Sat Oct 05, 2024 11:25 am
by Sedat Canbaz
Hello there,

First of all, I'd just like to inform you that
Cutechess 1.2.0 is one of the best/most stable GUIs!
At least, it is more stable than Cutechess 1.3.1

Why do I say that? According to my tests, of course!
At least, all the latest results indicate this!
Depending on the chess GUI, we may see instability
Issues... and of course I am not going to re-test all GUIs
Or list them all, but at least I wish to say that
Cutechess 1.3.1 is quite sensitive, not so stable!
At the same time, I am not going to share all the previous
SCCT GUI tests, but I have some experience here too,
Not much, but some, and we have also noticed that
Not all tested engines or tested GUIs are very stable!
It all depends, of course, and GUIs are not always the reason!
Sometimes the engines can be the reason too. You know,
There is no fixed formula for these stability issues!

And to be clearer (about all the recent time forfeits),
Especially with High Concurrency games + Bullet TC,
I started to be very afraid that they would appear again.. )
Because the latest SCCT tours were played under Cutechess 1.3.1
But before (e.g. several months ago) I used to play
Mainly under the Cutechess 1.2.0 GUI, and in those days the games
Were much more stable; at least I can't remember frequent
Time losses by the latest Top engines. In the coming days,
Weeks, months, the picture will be clearer. In short,
Time will tell...

Btw, after re-testing some Top engines that lost on time
Under Cutechess 1.3.1, this time under the Cutechess 1.2.0 GUI:
All of these time forfeits disappeared... really good news!
So much trouble, sorry... trying a newer GUI version sounds
Good, but it is not always so, at least not with the newer Cutechess series!
And now I wonder too: who will pay my electricity bills? )
Just joking...))

As final words,
Not all, but most engines are stable under 1.3.1 too!
But it is also true that some of the Top chess engines
Suffer under Cutechess 1.3.1, at least on my tournament
Machine, because all these time forfeits should not be
Blamed only on engine bugs. In other words,
What about GUI bugs? Are all GUIs really that stable?
I seriously doubt it, because according to many GUI tests
(I refer to past and present) I am perfectly aware that
Some chess GUIs can play a seriously bad role as well..

Note also that
I tried some of the latest Cutechess pre-releases too,
But they are not so stable either. I mean, a similar story (
Rarely, but the same engines produce time losses etc.,
Whereas under Cutechess 1.2.0 everything seems stable so far!!
If nothing else, with the Top engines I have tested so far...

And please stay tuned... as soon as possible I hope to
Share new tests, this time played under Cutechess 1.2.0

Greetings

Re: Sedat Canbaz

Posted: Sat Oct 05, 2024 1:49 pm
by Sedat Canbaz
UPDATE

First of all,
No game was lost on time. In short, just great!
The reason: very likely Cutechess 1.2.0 had a BIG
Influence... there is no other explanation for that...

And for anyone who missed it, once more:
All previous tests and tours were played under Cutechess 1.3.1,
But now under Cutechess 1.2.0, so the latest tests were done
Only for comparison. You know, I'm an engine/GUI doctor too )

On the other hand,
Sorry to say, but it's a pity that, besides engines and books,
I have nowadays started testing GUI stability and influence too.
Actually it is nothing new: I sometimes run GUI tests,
And in this way we can compare the influence of GUIs too!

Yes, we are in 2024, but unfortunately there are still many bugs!
I don't know whether all the recent ones are engine or GUI bugs,
But one thing is true: under these newly tested conditions
Everything worked flawlessly, so it seems chess GUIs can
Play a serious and important role in the results!

1st test, this time played under CuteChess 1.2.0:

Rems EXP vs SF-PB SC: 1 Elo difference (almost identical)

1   SF-PB 220324 SC  +42/-40/=918 50.10%  501.0/1000
2   RemsEXP 160824   +40/-42/=918 49.90%  499.0/1000

Draw ratio is the lowest again, just 91% (as in the previous test)
But it is also true that this test is more reliable!!
I refer to their strength, performance, etc.
Btw, very likely Cutechess 1.3.1 did not like Rems very much ))
As you may know, in the previous test there was a 6 Elo difference.
---------------------------------------------------------

2nd test: It's played under CuteChess 1.2.0 too

TR vs SC: 5 Elo difference (in favor of SC)

1   SF-PB 220324 SC  +43/-29/=928 50.70%  507.0/1000
2   Artemis TR       +29/-43/=928 49.30%  493.0/1000

Draw ratio is a little high: 93%, but acceptable,
Since there are engines that produce many more draws!
But again a surprise: the older one beat the newer one.. strange..

----------------------------------------------------------

3rd test: also played under CuteChess 1.2.0
Actually, since today everything is played under v1.2.0

SF-POLY vs SF-PB SC: 3 Elo diff (in favor of SF-POLY)
Oh... finally I feel much better; otherwise I'd switch
To some other, older GUI )), because until this test SC
Managed to beat almost all newer releases...!)

1   SF-POLY 210924a  +37/-29/=934 50.40%  504.0/1000
2   SF-PB 220324 SC  +29/-37/=934 49.60%  496.0/1000
------------------------------------------------------

SF-PB SC DRAW Test (sure under CuteChess 1.2.0 too)

High draw values this time... as we see, a 95% draw ratio

1   SF-PB SC (Copy)  +16/-14/=570 50.17%  301.0/600
2   SF-PB 220324 SC  +14/-16/=570 49.83%  299.0/600
GAMES:
https://mega.nz/file/uhhkkBLZ#NEB14lQYg ... FDhp6IhStA

And as last notes,
I have run many more tests, but there is no need to
Share them... at least now we know that all is stable so far!

On the other hand, as I stated earlier:
Especially since the NN era, we see many more bugs..
Of course I admit that we see more engine strength too, and
Who knows about tomorrow? What is waiting for us?
But after all,
Let's hope only for good news; otherwise:
Be ready and face your future with no fear!
And just do not make the same mistake twice...

That's why, once more,
I have a very small request to all engine/GUI programmers:
Please run serious beta tests (before final releases)!
Thanks in advance. If not, why do I have to check each time? )
It is your turn... and I wish you good luck...

Best,
Sedat