Galaxy S9 Exynos 9810 Hands-On – Awkward First Results

Following our launch article I promised an update on the performance scores of the Exynos 9810 variant of the Galaxy S9. I was able to spend some time with one of the demo devices at the launch event and thoroughly benchmark it with a few of our common tests.

As a refresher, earlier in the year Samsung LSI dropped a bombshell in claiming an astounding 2x single-thread performance improvement with the new Exynos 9810. While this initially caused a lot of controversy and discussion about the validity of the claim, we exclusively covered the high-level microarchitectural features of the new Exynos M3 core earlier this year, and by then it was clear that the performance claims were not just marketing. The new Samsung CPU core is the first “very wide” CPU microarchitecture to power Android SoCs and the first to finally follow in Apple’s footsteps in the direction of maximising single-thread performance.

As a result it stands to be a very interesting – and ideally very powerful – SoC for the Android market.

Determining clock speeds

Firstly, one of the biggest questions for me was confirming the final clock that Samsung would use on the Galaxy S9. We detected the clock as 2704 MHz, which is roughly 200 MHz less than the 2.9 GHz that Samsung’s LSI division advertises for the chipset. What makes the story more compelling is that the 2.7 GHz clock is only achievable when a single core in the cluster is active, meaning Samsung scales the maximum frequency depending on the number of active cores in the big cluster. With two active cores the frequency drops down to 2314 MHz, while with three or four active cores the cores clock down to only 1794 MHz.
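The scaling scheme described above can be summarised as a small lookup table. This is only an illustrative sketch: the frequencies are the ones we read out on the demo unit, while the names `MAX_FREQ_MHZ`, `ADVERTISED_MHZ` and `max_big_core_freq` are hypothetical and do not correspond to anything in Samsung’s firmware.

```python
# Observed maximum big-core clock (MHz) keyed by the number of active
# cores in the big cluster. The values come from the text above; the
# table and function names are made up for illustration.
MAX_FREQ_MHZ = {1: 2704, 2: 2314, 3: 1794, 4: 1794}

ADVERTISED_MHZ = 2900  # the 2.9 GHz figure advertised by Samsung LSI


def max_big_core_freq(active_cores: int) -> int:
    """Return the observed frequency cap for a given active core count."""
    if active_cores not in MAX_FREQ_MHZ:
        raise ValueError("the big cluster has between 1 and 4 active cores")
    return MAX_FREQ_MHZ[active_cores]


for n in sorted(MAX_FREQ_MHZ):
    cap = max_big_core_freq(n)
    print(f"{n} active core(s): {cap} MHz, "
          f"{ADVERTISED_MHZ - cap} MHz below the advertised clock")
```

Even the single-core cap sits 196 MHz under the advertised figure, and the three- and four-core cap is more than 1.1 GHz under it.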

We can also confirm that the Mali G72MP18 GPU is running at a very conservative 572 MHz. This is not what we had expected – the previous-generation Exynos 8895 had a larger MP20 configuration running at a similar 546 MHz. The resulting performance gains for the GPU thus seem to be even lower than we had expected, as I was betting on a ~650-700 MHz clock for the graphics.

Memory latency

I was also able to confirm the cache configurations of the CPUs with the help of our latency test. The L1D cache of the M3 cores is 64KB, up from the 32KB of the previous generation. The M3 cores also come with 512KB of private L2 cache each, and a shared 4MB L3 cache.

The little A55 cores came as a surprise, as they look to be in a separate cluster rather than in a single DynamIQ cluster with the big cores. This creates something similar to a big.LITTLE design, but each part of the 4+4 is its own DynamIQ cluster. Here it looks like Samsung has decided not to employ the optional L2 caches for the Cortex-A55s, and instead the cluster relies solely on the shared 512KB L3 cache of the DSU. The latency scores to DRAM are outlandishly good and the best we’ve ever seen among current Android SoCs, so Samsung has definitely introduced a new generation of interconnect or memory controllers.

Parsing the benchmark results: Geekbench looks good

In our testing we were able to confirm the Geekbench 4 scores that had already leaked, where we saw the Exynos 9810 achieving excellent performance gains, vastly outpacing the Snapdragon 845 and coming into the territory of the Apple A10 and A11. Meanwhile, versus the last-generation Exynos 8895, the floating point performance increases handily exceed Samsung’s projected gains of 2x, as we see a 114% improvement even at the lowered 2.7 GHz frequency.
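To make the comparison with Samsung’s claim explicit, an “X% improvement” figure converts to a multiplier by adding one. The short sketch below does just that with the 114% figure from the text; the helper name `speedup_from_improvement` is hypothetical.

```python
def speedup_from_improvement(percent: float) -> float:
    """Convert an 'X% improvement' figure into a speedup multiplier."""
    return 1.0 + percent / 100.0


measured = speedup_from_improvement(114)  # floating point gain vs. Exynos 8895
claimed = 2.0                             # Samsung's projected 2x

print(f"measured {measured:.2f}x vs. claimed {claimed:.1f}x")
print(f"margin over the claim: {(measured / claimed - 1) * 100:.0f}%")
```

A 114% improvement is a 2.14x speedup, which is about 7% above the 2x projection.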

When looking at the performance per clock it is clear how the Exynos M3 distinguishes itself as a much wider microarchitecture compared to any other existing CPU that powers Android SoCs.

Parsing the benchmark results: PCMark and web tests

Finally I stumbled upon some very questionable performance figures when testing system performance. I’m not going to go into the details for every benchmark as they are generally all painting the same picture:

What seems clear is that something is very, very wrong with the Exynos 9810 S9+ that I tested. It was barely able to distinguish itself from last year’s Exynos 8895, let alone the Snapdragon 845 in the Qualcomm reference device which we previewed earlier this month. I looked through the system and monitored frequencies, and indeed the big cores were reaching the maximum 2.7 GHz core frequency. The only explanation I have right now is that the DVFS configuration, as well as the scheduler, may currently be so conservatively tuned that there is barely any activity on the big cores.

I dug a bit more through the system and found out that Samsung uses a new scheduler called “ehmp”. I’m not sure if this is something based on EAS, but the system did use schedutil as its frequency governor.

One of the Samsung spokesmen confirmed to me that the demo units were running special firmware for MWC and that they might not be optimized. I’m having a bit of a hard time believing they would so drastically limit the performance of the device for the show demo units, and even less so that they would mess around with the scheduler settings. I did get confirmation that Samsung is planning to “tune down” the Exynos variant to match the Snapdragon performance – however the current scores which I got on these devices make absolutely no sense, so I do hope this is just a mistake that will be resolved in shipping firmware and that we see the full potential of the SoC.

Parsing the benchmark results: graphics

On the GPU side, the lower cluster count of the new Mali G72MP18 is a surprise, as the minor clock bump is negated by the fact that the new SoC has two fewer GPU cores than the 8895. If the performance per clock per core between the G71 and G72 were the same, this would actually mean a downgrade in raw GPU power from the Exynos 8895, so any increase should come solely from the architectural changes of the new G72 GPU, power efficiency improvements, and possibly SoC memory subsystem improvements.
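The “downgrade in raw GPU power” can be checked with a back-of-the-envelope calculation. The sketch below uses cores multiplied by clock as a crude throughput proxy, under the stated assumption that per-core, per-clock performance is identical between the G71 and G72 (which is exactly what the architectural changes are meant to improve on). Core counts and clocks are the ones given earlier in the article; the function name is hypothetical.

```python
def raw_throughput_proxy(cores: int, clock_mhz: int) -> int:
    """Cores x clock, a rough proxy that ignores architectural differences."""
    return cores * clock_mhz


exynos_8895 = raw_throughput_proxy(20, 546)  # Mali G71MP20 at 546 MHz
exynos_9810 = raw_throughput_proxy(18, 572)  # Mali G72MP18 at 572 MHz

clock_change = (572 / 546 - 1) * 100
core_change = (18 / 20 - 1) * 100
total_change = (exynos_9810 / exynos_8895 - 1) * 100

print(f"clock: {clock_change:+.1f}%, cores: {core_change:+.1f}%, "
      f"cores x clock: {total_change:+.1f}%")
```

Under that assumption the roughly 5% clock bump does not make up for the 10% reduction in cores, leaving the proxy a little under 6% below the Exynos 8895.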

In T-Rex the increase is 18%, which might be one of the benchmarks that Samsung sourced their 20% improvement figure from. Here the Exynos is closer to the performance of the Snapdragon 845.

Measuring power

I wasn’t able to properly measure power on the event demo devices, as they had different interface settings than my tool had been programmed for, so I was only able to make some inaccurate estimates based on the coarse current readout from the system.

For CPU workloads, our usual CPU power virus used up 3.1W at a 1-core 2.7 GHz load. A 2-core load at 2.3 GHz seemed to float around 3.1-3.5W, and a 4-core load at 1.8 GHz maintained this power consumption.
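Dividing those totals by the number of active cores gives a rough feel for how steeply power rises with frequency. This is only a sketch built on the coarse estimates above: it reads “maintained this power consumption” as the same 3.1-3.5W range for the 4-core case, it ignores power shared across the cluster, and the names `ESTIMATES` and `per_core_power` are hypothetical.

```python
# (active cores, clock in GHz, (low W, high W)) from the coarse estimates.
# The 4-core range is an interpretation of "maintained this power
# consumption", not a separately measured figure.
ESTIMATES = [
    (1, 2.7, (3.1, 3.1)),
    (2, 2.3, (3.1, 3.5)),
    (4, 1.8, (3.1, 3.5)),
]


def per_core_power(cores: int, watts_range: tuple) -> tuple:
    """Split a total power range evenly across the active cores."""
    low, high = watts_range
    return low / cores, high / cores


for cores, ghz, watts in ESTIMATES:
    low, high = per_core_power(cores, watts)
    print(f"{cores} core(s) at {ghz} GHz: {low:.2f}-{high:.2f} W per core")
```

On these numbers a core at 2.7 GHz draws several times what a core at 1.8 GHz does, for a clock that is only 50% higher.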

Over the following days I will need more time, and will hopefully get some SPEC figures to paint a more accurate picture. For now the results could swing either way and be either positive or negative for the M3 cores. It’s clear that the higher frequencies carry a very large power penalty and that Samsung should want to operate more in the low-to-mid frequencies, hence the current frequency scheme.

On the GPU side, power in Manhattan fluctuated between 4.5 and 5.2W, which is an improvement over the Exynos 8895. But again, this is still at a disadvantage compared to the Snapdragon 845.

Quick thoughts

Overall, today’s quick benchmarking session opened up more questions than it managed to answer. Hopefully with more time we will be able to investigate the workings of the new SoC and, fingers crossed, today’s results are not representative of the shipping product, as that would otherwise be an utterly massive disappointment.

• lilmoe – Sunday, February 25, 2018: I wasn’t really surprised. Samsung has always used a relatively conservative DVFS/scheduler compared to others for various reasons. I wouldn’t bet on this being "fixed" in production units, unless Android gets a major overhaul in UI rendering efficiency and touch latency.

What this has confirmed to me, though, is that all the benchmarks where the M3s are showing "odd" results employ burst loads that are too short to draw any tangible conclusions about platform performance.
