You know the saying “better late than never”? That’s how my cluster competition coverage is faring this year. With SC19 coming late in November, quickly followed by my annual trip to South Africa to cover their cluster competition, I’ve been running behind. But I’m back and I’m going to provide all of the deep analysis and competition coverage that you’ve all become accustomed to over the years.
Now let’s take an up close and personal look at our SC19 teams. Using the miracle of video, we’ve interviewed as many teams as we could given the accessibility constraints. We apologize to the teams that we couldn’t get to, but we were under the gun to get as many teams as we could during our limited access time. We managed to snare 12 out of 16, which isn’t too bad, I guess, but far from our usual 100% coverage, damn it.
Team Washington: Representing the great Pacific Northwest, we have Team Washington or Team Husky or Team Udub. This team is driving a slim configuration with two nodes, but they’re also packing eight NVIDIA V100 GPUs, so they have plenty of processing power. This is a team that can adapt on the fly. For example: for some reason, teams have to house their clusters in official data center racks or else they’re disqualified. Back in the day, before we had all of these nitpicky rules, you could use just about anything to hold your cluster. But today, you have to have an expensive rack to house your couple of nodes.
Anyway, the Udub students weren’t provided a rack by their sponsor and thus had to scramble to find one by Monday morning at 9:30 am or else face expulsion. They combed Craigslist and Facebook Marketplace and came up with a $100 42U rack. But it was in Boulder, not Denver. So they had to rent a truck, head to Boulder to pick it up, return the truck, and get it all set up by early Monday morning. Nice work, guys, great job.
Watch the video to see and hear more about the Washington team; both my cluster competition color commentator Jessi Lanum and I were highly impressed by this first-time team. Let’s see how they do.
Team Warsaw: Jessi and I interview Team Warsaw to see how this now-veteran team is handling the pressure of the SC19 cluster competition. The students from Warsaw have one of their best configurations yet, with five nodes, eight GPUs, and a beefy Mellanox EDR interconnect. The team this year is very solid and experienced, with great skills. Could this be the year that Team Warsaw breaks out of the pack?
It’s also a close-knit team. When we were interviewing them, one of their team members was off sleeping, so they showed her picture to the camera just to make sure that she was included in the video.
Wake Forest: When Jessi and I check in on them, Wake Forest seems to be happy with their performance so far in the competition. They’ve established a good division of labor and are using their machine well. We also run into an anomaly on the team: a finance major! Well, a finance and computer science major, but it’s the first one we’ve run into in ten years of covering competitions.
On the reproducibility challenge, the Daemon Deacons found that the paper is valid. One of the students working on this app might be the most chilled-out competitor we’ve ever seen. Kicked back, easy going, relaxed, he’s the picture of happiness, which is nice to see. Check out the video to see for yourself.
One of the team’s network cards went out, which is unfortunate. Under the rules, the team can’t do a restart without taking a penalty, which, to me, is sort of unfair when it’s a hardware problem that is clearly outside of student control. But rules are rules, right?
University of Illinois Urbana-Champaign: Team UIUC is doing well when we catch up with them, with some caveats. They’re driving an older cluster that seems like it’s become a bit crotchety in its old age. As the team captain said to us, if they’re not on top of it all the time, it tends to get out of hand and overheat. To me, this sounds a bit like a nuclear pile back in the old days.
The team has two NVMe drives on each of their four nodes, plus a grand total of eight NVIDIA V100 GPUs. They’re also using IBM’s Spectrum Scale (formerly GPFS) file system and tossed out some love to IBM by mentioning it.
Check out the video to get details on their various challenges and how they got over them.
UIUC had a $700 Azure Cloud budget that they managed to blow through pretty quickly. When we talked to them, they only had $6 left in their budget. Jessi and I offered to toss in $10 each to help them get a little breathing room, but that’s against the rules. Plus, I didn’t have the sawbuck on me anyway, so it all worked out well.
Team Tennessee: This team is an amalgamation of students from the University of Tennessee, Maryville College, and Pellissippi State Community College. These are all first-time participants, so they have their work cut out for them. I give them a bit of grief over the unsuccessful Tennessee Volunteer football team, which was kind of fun.
While we’re interviewing the team, both Shanghai teams went over the power limit, causing sirens and lights to go off, which was also fun.
The team is realistic about their chances to take home the Championship Trophy (unfortunately, there is no real trophy). While they’re doing well, they know that it’s an uphill climb and that the most important thing about the competition is how much they’re learning. They hope to come back in subsequent years and mount another quest for cluster competition glory.
ETH Zurich: This is the second outing for the Swiss team. Backed by CSCS, this is a team that has proven they can compete with the top-tier competitors. How? In their first competition, they took home third place and the Highest LINPACK award at ISC19 – which is an almost unprecedented level of success for first-timers. We hadn’t seen that kind of debut since the South African CHPC team won the whole ISC shooting match in their first year back at ISC13.
The team is making good progress with the applications, with no apparent problems, when we find them on the competition floor. The stupid video drifts in and out of focus, though, as the camera struggles to figure out what to lock onto.
During our conversation we discuss the differences between the ISC and SC competitions. More rules at SC, plus plenty of sleep deprivation, which is a marked difference from ISC. One of the team members said that the SC competition was “more competitive” than the ISC competition, which raised the question (and yes, I asked it): “how can you say it’s more competitive when you didn’t actually win the ISC19 competition?” Mean question? Yeah, it was, but I hadn’t slept much either.
The team had a bit of a letdown on their LINPACK score, which was slightly lower than their championship LINPACK at ISC, but there’s a good explanation for the discrepancy; check out the video for the details.
ShanghaiTech: This is the third competition for a new university, ShanghaiTech. They were a powerful new competitor at ASC18, finishing in second place and punching their ticket for ISC18. They had a bit of a sophomore slump at ISC18, doing well, but not taking home any major prizes, although they were first in HPCG.
The first team member we interviewed talked about his past experience in FPGA design and claimed that his youth (he’s the youngest on the team) gives him an edge in productivity and creativity. The team has a solid complement of skills, ranging from traditional HPC drivers to computer architecture and AI specialists.
ShanghaiTech is pushing a large-ish cluster with six nodes and a whopping 16 NVIDIA V100 GPUs. That’s a whole hell of a lot of computing power, but it requires rigorous control and power throttling in order to keep it within the 3,000 watt limit. Can ShanghaiTech control this beast and get the most out of it? We’ll find out.
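If you’re wondering what “power throttling” actually looks like in practice, here’s a minimal sketch of how a team might cap each card and watch their aggregate GPU draw, using NVIDIA’s NVML Python bindings (pynvml). To be clear, this is my illustration, not ShanghaiTech’s actual tooling, and since the 3,000 watt limit covers the entire cluster, the GPU_BUDGET_WATTS figure below is a made-up slice of that budget:

```python
# Minimal sketch, not ShanghaiTech's actual tooling: cap each GPU's power
# limit and monitor the aggregate draw. Assumes the nvidia-ml-py (pynvml)
# bindings and root privileges for setting limits.
import time
import pynvml

GPU_BUDGET_WATTS = 2000   # hypothetical GPU share of the 3,000 W cluster cap
POLL_SECONDS = 1.0

pynvml.nvmlInit()
handles = [pynvml.nvmlDeviceGetHandleByIndex(i)
           for i in range(pynvml.nvmlDeviceGetCount())]

# Split the GPU budget evenly, clamped to each card's supported range.
per_gpu_mw = int(GPU_BUDGET_WATTS * 1000 / len(handles))
for h in handles:
    lo, hi = pynvml.nvmlDeviceGetPowerManagementLimitConstraints(h)
    pynvml.nvmlDeviceSetPowerManagementLimit(h, max(lo, min(hi, per_gpu_mw)))

try:
    while True:
        # nvmlDeviceGetPowerUsage reports milliwatts; sum across all GPUs.
        total_w = sum(pynvml.nvmlDeviceGetPowerUsage(h) for h in handles) / 1000
        print(f"aggregate GPU draw: {total_w:.0f} W of {GPU_BUDGET_WATTS} W budget")
        time.sleep(POLL_SECONDS)
finally:
    pynvml.nvmlShutdown()
```

In real life, teams often just script nvidia-smi -pl from the shell instead, but the idea is the same: leave enough headroom that a LINPACK power spike doesn’t set off the sirens.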
Purdue: As an institution, Purdue has sponsored 14 cluster teams in major competitions worldwide. While they haven’t come home with any trophies, they’ve gained a lot of knowledge and have even built a curriculum around the events – which is a very good thing.
They’re running a system with very sporty AMD 32-core Rome processors arranged in five single-node systems. Unfortunately, their motherboards don’t support GPUs, which is a huge disadvantage in modern cluster competitions. It was unclear whether this configuration was intentional, with the team thinking it could win, or a technical oversight. But either way, they’re trying their best and giving it the old Purdue try – which is what you do when you’re in a cluster competition.
Team NTHU: This is another team that has been around the block in Student Cluster Competitions, logging an astounding 17 major events over the last 12 years. They’ve amassed an enviable record of Gold Medals and LINPACK Awards, with their most recent win coming at ASC19 in Dalian, China.
They’re in a bit of trouble when we catch up to them. They have a GPU down and they can’t fix it due to cabling problems. They do have seven other GPUs, but that might not be enough to get them over the hump.
Nanyang Technological University: Team Nanyang, the pride of Singapore, has become a top echelon team over the past few years and is always a threat to walk away with multiple trophies. They’re a pioneer in the “small is beautiful” cluster movement and are at it again with a two node, 16 GPU system. As we heard in the interview, the team has notched another LINPACK award. We’ll have details on that in our next story.
As we meet with the team, they’re coming down to the wire on turning in applications – but, as we note in the video, Nanyang has never not turned in a result, which is an incredible feat in modern competition history.
Team FAU: This German team has had a long and storied history. They’ve won two LINPACK awards along with a Bronze Medal in their seven-year history. This year, they’re driving a NEC Aurora vector machine, which is a whole different deal for the team, who are used to driving more conventional clusters.
One problem: their vector engines broke down during the benchmarking phase of the competition. They had to pull them from the cluster, which means they can only run on CPU power. This won’t give them enough processing power to compete with the other teams, unfortunately. But the plucky Germans are continuing to push and will certainly finish the competition. There just isn’t any quit in this team.
Shanghai Jiao Tong: This is one of my favorite teams. Their coach was a long-time competitor for the school, and I must have interviewed him ten times over the years at multiple venues. He’s a hard charger, highly competitive, but more interested in the knowledge his team can take away from the competition than in taking home trophies.
Jessi and I catch up with Shanghai Jiao Tong and ask them about their competition so far. While Shanghai has had some hardware problems in the past, everything is running at 100% today. The team is driving one of the larger clusters in the competition with six nodes, eight V100 GPUs, and some of the fastest CPUs in the field at 2.6 GHz. To me, this team has been poised on the edge of moving into the top tier of cluster competition teams but hasn’t quite gotten into the groove yet. This could be their year.
Next up, we’re going to take an in-depth look at the LINPACK and HPCG results, then reveal the detailed overall scoring. Following that, we’ll provide our patent-pending “Power Ranking Analysis,” which shows which teams are getting the most performance out of their systems. Stay tuned….