The first task that students have to perform at the ASC16 Student Cluster Competition is to run the venerable HPL and newish HPCG benchmarks. HPL, also known as LINPACK, is a routine that measures floating point performance and is the basis for the TOP500 list. HPCG solves a 3D sparse matrix linear system using the conjugate gradient method. It’s much more of a real-world test of the demands that today’s applications put on systems.
Together, LINPACK and HPCG are performance bookends. LINPACK shows maximum performance of a system performing only floating point operations, and HPCG displays what is most likely the lowest performance of the system on very demanding tasks. Real-world mileage will probably be somewhere in between.
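For the curious, here is a rough idea of what "conjugate gradient on a 3D sparse system" means in code. This is only a minimal sketch, not the HPCG reference code: it assumes a plain 7-point Laplacian on a tiny grid with no preconditioner, whereas the real benchmark uses a 27-point stencil and a multigrid preconditioner. The function names and the grid size are made up for illustration.

```python
def apply_laplacian(x, n):
    """Multiply x by a 7-point 3D Laplacian on an n*n*n grid (zero boundary)."""
    y = [0.0] * (n * n * n)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                idx = (i * n + j) * n + k
                total = 6.0 * x[idx]
                # Subtract each neighbor that lies inside the grid.
                if i > 0:
                    total -= x[idx - n * n]
                if i < n - 1:
                    total -= x[idx + n * n]
                if j > 0:
                    total -= x[idx - n]
                if j < n - 1:
                    total -= x[idx + n]
                if k > 0:
                    total -= x[idx - 1]
                if k < n - 1:
                    total -= x[idx + 1]
                y[idx] = total
    return y


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def conjugate_gradient(b, n, tol=1e-10, max_iter=500):
    """Solve A x = b with unpreconditioned CG, A being the Laplacian above."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x, with x starting at zero
    p = list(r)            # first search direction
    rs_old = dot(r, r)
    if rs_old == 0.0:
        return x, 0
    for iteration in range(1, max_iter + 1):
        ap = apply_laplacian(p, n)
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        if rs_new ** 0.5 < tol:
            return x, iteration
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x, max_iter


# Build a right-hand side whose exact solution is all ones, then solve.
n = 6
x_true = [1.0] * (n * n * n)
b = apply_laplacian(x_true, n)
x, iters = conjugate_gradient(b, n)
error = max(abs(xi - ti) for xi, ti in zip(x, x_true))
print(f"converged in {iters} iterations, max error {error:.2e}")
```

Notice how little arithmetic happens per value fetched from memory in `apply_laplacian`. That memory-bound, neighbor-chasing access pattern is a big part of why HPCG numbers land so far below LINPACK numbers on the same hardware.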
That said, we want to check in with the teams to see how they’re doing with these two benchmarks. First up, the kids running the smaller clusters…
Team Boston ran into some trouble getting their cluster running at full steam, so they ended up running LINPACK on only five out of their six nodes. The sixth node looks to be down for the count, as the team can’t seem to get it working with MPI and is still having problems with InfiniBand. They also gave up on using their Phi accelerators in the interest of simplifying their configuration.
Northwestern Polytechnic doesn’t know a whole lot of English, but they seem to have mastered getting LINPACK and HPCG running just fine. Not a lot to report from them; all seems to be going along as it should.
The interview with Team Zhejiang once again proves that I can’t pronounce the word “Zhejiang.” This team has been diligently working to max out their LINPACK score, running different versions, different compilers, and the like. They’re going at it with four nodes and eight NVIDIA K80 accelerators. They’ve already beaten their record from last year but are looking to wring out even more performance from their gear.
We talk again with our old buddy Freeman from Hong Kong Baptist University. They’ve also been pounding the hell out of LINPACK in an effort to take home the highest LINPACK award. At filming time, they had topped 10 TFlops, which was the highest score we had heard so far through back channels. In fact, that score is in the neighborhood of the Student Cluster Competition record set at last year’s ISC’15 competition.
It looks like Team Hong Kong and Team Zhejiang are both poised to take the LINPACK crown and potentially a win on HPCG (although there’s no award for it).
Let’s talk to the students with the mid-sized clusters next…
Team Hungary had a good LINPACK run but is seeing a bit more trouble with HPCG – mainly the code sucking more power than anticipated. The team is running eight dual-processor nodes: a traditional cluster that, like we say in the video, is like a good Grandma-made goulash.
Team Shanghai thinks they got the most out of their gear when it comes to LINPACK and HPCG. They ended up running six nodes with six K80 accelerators, which is a pretty beefy configuration for 3,000 watts. But maybe that’s the right config to nail down the prize.
For Team Dalian, it was the first time running these benchmarks in a real competition. They pulled a solid 9.54 TFlops LINPACK score. They ran LINPACK first thing in the morning when the temperature was low, then rode the 3,000-watt line at an astounding 2,995 watts. Great work for a first-time team.
Now the heavyweights weigh in…
All is well with Beihang University when we check in with them. They have LINPACK and HPCG in the bag and are starting to run the MASNUM_WAM application – which has a pretty big data set from what the cluster warriors report.
EAFIT, or Team Colombia, is riding the line with their nine-node cluster. They figure that they’ll be in the 5-ish TFlops range for HPL, which isn’t bad but won’t be enough to put them in the winner’s circle. They’re big fans of the Intel compiler, having used it for both of their benchmarks.
Sun Yat-Sen knows something about HPL, having won the award at least once at ASC competitions. At ASC14 they did a magnificent job of gaming the LINPACK benchmark and topped the field by a generous margin. They’re taking a ‘high risk, high reward’ approach to their benchmarks, riding the line by staying within 10-15 watts of the 3,000 watt cap. They don’t really have a system configuration that can challenge for a win in HPCG or the LINPACK award, but they’re putting their all into it.
Tsinghua University did their LINPACK run in one take, finding that they got everything they were going to get out of their ten-node, five-GPU configuration. They didn’t get over 10 TFlops, which looks to be the number to beat for the LINPACK award, but are happy with their result. They’re now on to MASNUM_WAM, working on the smaller data sets first, then the large ones – which is the right way to approach them.
Ural Federal University, or Team Russia, was in fine fettle – or at least middling fettle – on the first day of the ASC16 competition. They had just finished up their LINPACK run on their Phi-powered cluster. They had also finished HPCG, or at least the first run of it.
Team Taiwan, from NTHU in Taiwan, was still tuning HPL when we talked to them on the first day of the actual competition. Preliminary results looked good – just like in their lab back home. According to their spokesperson, both the HPL and HPCG results are very good for the team. Their final configuration consisted of nine nodes with four NVIDIA K40 accelerators.
Next up, we’ll look at the winner of the highest LINPACK award and the team that’s going to take home the 10,000 RMB prize. Stay tuned…