At the recently concluded Asia Student Cluster Challenge, Tsinghua University of China posted a competition-record LINPACK score of 7.579 Tflop/s, narrowly edging out home team Shanghai Jiao Tong University’s 7.430 mark. Taiwan’s National Tsing Hua University (NTHU) took third place with 6.519 Tflop/s. NTHU also finished second for the Overall Award, winning the Silver Prize and a cool 50k RMB ($8,000).
This new record is roughly 2.5 times the 3.014 Tflop/s LINPACK record set by China’s National University of Defense Technology (NUDT) at SC12 last November. It’s a much larger performance delta than we’ve seen before and quite a bit higher than what you’d expect to see from hardware that’s only six months newer.
We don’t have the hardware configurations for each team as we usually do, but we know that the Tsinghua config utilized GPUs (I’d assume Keplers, given the performance), and the team certainly knew how to use them. Inspur, a major sponsor of the event, offered dual-socket Xeon E5-2650 nodes to competitors for use in the tourney. They also made available Intel Xeon Phi coprocessors, Mellanox ConnectX-3 HCA cards, and InfiniBand and/or GbE switches.
Teams could use the Inspur-provided hardware or bring their own components – or entire systems. Tsinghua added GPUs to their system and, knowing the teams, I would bet that NTHU and NUDT did as well. But the big leap forward in LINPACK can’t be explained by faster hardware alone. Team Tsinghua must have heavily optimized the LINPACK routines in order to get such a big performance bump.