Through the miracle of video, we now have the chance to meet each of the ASC19 university teams up close and personal. This is a great way to get a feel for these amazing students and see how driven they are to succeed in this competition.
Team Warsaw has quickly become one of the more experienced teams in these international competitions. In the video, we meet the team and talk about some of the problems they’re having on set up day. My thoughtful suggestion to the team concerning their problems? “Fix them!” I also tell them the story about the 2018 ASC “Ice Station Nanchang” incident.
Team UESTC is sporting the longest university name in the competition this year and maybe in any competition ever. In the video, we talk about how the team is getting along and how they’ve divided up the work. According to the team, they’re all getting along fine, which is what we like to hear. HPC and AI are very new topics for these students, so they’re understandably going through some deep learning of their own. We complain about CESM and its archaic code. I try to give them a bit of a pep talk, but since it was early and I hadn’t built up my caffeine load, it wasn’t a great talk.
Team Tsinghua is their usual quietly confident selves in this video. However, Student Cluster Competition aficionados will notice perhaps a bit of tension when I ask them if upholding the dominant legacy of Tsinghua has become a bit of a burden. This school takes these competitions very seriously and they’re here to add another trophy to their crowded trophy shelf at school.
The Chinese University of Hong Kong is another newbie to big league student cluster play. In the video, the team talks about some of the problems they’ve had in getting their cluster stood up. At this point in time, the system is running well, they have their Mellanox InfiniBand working, and all seems good. We start to hear more talk about just how difficult CESM is to optimize – or just to compile. The team coaches also give us their perspective on the team and the competition.
Team Taiyuan is next up. With the help of our trusty interpreter Steve, we talk about their node count (which is high). The team expects to run eight nodes in the competition along with the four GPUs provided by Inspur. From the translation, it seems like the team is referring to themselves as “Team Comeback” which could be a reference to their desire to join the company of the elite teams. There is also some discussion of how the team would love to get some more GPUs and their hopes to get some more from their fellow competitors. Good luck with that.
Team Tartu is the pride of Estonia and is participating in its third cluster competition. When we catch up to the team on set up day, it seems like everything is going well. In the video, we discuss the difficulty of the applications. I learn that the AI application – the one where they need to decode blurry images into crisp clear pictures – isn’t all that hard. Like most teams, they are having a hard time getting the genomic application, wtdbg2, to scale to more than one node. One quick note that we didn’t discuss in the video is that the Tartu team advisor was the Tartu team member at ISC15 – also known as “Sunday, Bloody Sunday” due to his slip up with a razor knife when he was modifying power supply cables. More on this later.
Team Sungkyunkwan (or Team Korea for short) is from one of the most prestigious universities in the world. Although there’s a big language barrier in our short video interview, it’s easy to see that this is a smart team and highly motivated. Take a smart team, give them eight nodes and 10 GPUs, and you have a contender. Let’s see what they show us.
Team Sun Yat-Sen mostly spoke through our substitute interpreter Steve, who did yeoman’s work while our #1 interpreter Jenson was in class. When we catch up to Sun Yat-Sen, the cluster was up and running well, with the team concentrating on power control. On the translation front, we get an assist from Inspur PR maven Tracy Wang, which was welcome.
Team SUSTech (aka Southern University of Science & Technology) is having a bit of a problem getting their cluster standing tall. Some nodes are working, others are not, which is sort of typical at this stage of the set up. What’s interesting is that they’re having some software license problems, which I assume means transferring their licenses to these new Inspur systems. For a new team, SUSTech is packing some serious hardware accelerators – 12 in all. But in order to take advantage, they need to get all the nodes working together. We also take a look at a lavishly decorated laptop which, according to its owner, will make all the difference in the world for the team.
Team Shanxi is another newcomer to these competitions. Our pal Steve relays the questions for us, which is quite helpful. When we catch up to the team, they’re still trying to get their cluster working harmoniously, which is always tough for a new team, but particularly so for Team Shanxi because this is the biggest cluster they’ve ever seen. One of the students is sporting a highly respectable MSI gaming laptop, which I give him props for.
Team Shanghai is in good shape. They know what they’re doing and they’re doing it well. While the other teams are mostly just getting set up, Team Shanghai has already tuned LINPACK and is working on HPCG. The team has a unique configuration this year. While all of the other teams are going with Mellanox FDR InfiniBand interconnects, Team Shanghai is utilizing Intel’s OmniPath interconnect with two cards per node. They believe this will give them an advantage when it comes to the HPC applications they’ll be running. Side note: the coach of Team Shanghai was a four-year member of the same team in past competitions. He’s a relentless competitor and never gives up. I think he’ll infuse his team with the same spirit.
Team Peking was a fun interview. They’re obviously very intelligent, since they didn’t rise to my attempts to bait them (such as “Are you folks as smart as you think you are?”). The team is having a few problems with power control, which is typical at this stage of the competition. The team was planning to use 16 GPUs, but only came up with 12, which should be enough. I tried to stir the pot with the team, suggesting that they might have some internal dissension or outright fighting among team members. But no dice, no story here.
Team NTHU is their usual modest selves in this video interview. When we find them, they’re already testing applications. The team does have one potential problem: one of the disks they’re using has failed a couple of times during testing. If it crashes during the competition, it could be a costly failure. They’re still working on getting wtdbg2 to scale on more than a single node. One team member manages to sleep during our entire interview – I think he was up all night working on his part of the applications.
Team Jinan is competing in their first Student Cluster Competition and they know they’re in for a fight. The major focus of their computer science program is IoT and using HPC, not designing systems, administering systems, or performance tuning. It’s a tough job learning all of these new skills in the space of several months. In the video, we talk with the team advisor and meet the members of the team. We also learn that the team has some innovative ideas when it comes to what is being called the “Face SR” application. We’ll have to wait to see how they did on the application to learn more.
Team Huazhong is preparing for their HPL, HPCG, and CESM runs when we catch up to them. They’ve gotten most of the bugs out of their cluster and associated software and have moved on to power control – a critical aspect of the competition. In the video, we meet the team captain and the rest of the team. Along the way we discuss some of the challenges inherent in the facial super resolution (Face SR) problem. It turns out that the old training set isn’t the same as the new training set. We’ll discuss more about this later.
Team Fuzhou is driving an eight-node system during testing, but is figuring that they’ll probably slim that config down to six nodes. Each node will have three NVIDIA V100 GPUs to give them a big kick in numerical processing. The team is fighting some power control problems at the time of this interview. Some of the applications are bursty when it comes to consuming power. We ask the team how they’re doing on WTDBG2. At this point in the competition, most teams have gotten it to run on one node but are having problems scaling it up to the rest of their cluster. Team Fuzhou is also in this boat, but they think they’ll figure it out soon.
Team FAU is well ahead of where they were in their last ASC competition back in 2018. Last year, they had some problems getting their InfiniBand network up and running, which severely limited their progress and results. This year? Different story. Their network is up and running like a champ. As we interview the team, you can hear the screeching yowl of other clusters coming up to speed. However, the team is not without problems. They need to download some libraries and are having a tough time getting them off the internet. Team FAU is driving only four GPUs, rather than their typical 12 or more. They didn’t bring them on this trip due to well-founded customs concerns. I know that I’d have some questions for five students carrying more than $180,000 worth of high tech in their carry on bags.
Team EAFIT is a great interview. We had a lot of fun just discussing their names, for instance, and with me trying to probe for team dissension. I’m happy to report that they’re a pretty tight bunch. The team is still trying to get their interconnect up and running at the time of the interview. When I asked, “Why five nodes?” I heard the answer “Why not?” from the team – which cracked me up. We then discussed whether they’re planning to use prime numbers when adding components. I give the team a short pep talk and we part the best of friends.
Team Dalian is the home team at ASC19, which is a special kind of pressure in a Student Cluster Competition. Although the team has competed twice before at ASC, they still have to be feeling the strain of heightened expectations. Aided by our interpreter Steve, the team gives us a brief update on their progress so far. At this point, their four-node, eight-GPU cluster is working well and the team is working on optimizing their applications. Communication problems ensue when trying to figure out exactly what they’re running, but we finally get it right. They are having some early problems with PyTorch, but have plenty of time to resolve the issue or issues.
Team Beihang is competing in their sixth Student Cluster Competition and looks to be doing pretty well. We start the conversation with a brief observation about one of the team members using an ultra-small laptop. Continuing on, we find that the team is still working to get WTDBG2 to scale across their three-node cluster. We also find that the team is driving a total of sixteen GPUs, which, along with their small node count, gives them an excellent chance to win the LINPACK competition. We’ll see what happens.
Now that we’ve introduced the teams, examined their configurations, and completed our team interviews, it’s time to talk results. In our next stories, we’ll be looking at how the competition turned out. Stay tuned…
Posted In: Latest News, ASC 2019 Dalian