HPC Birds of a Feather RSECon 2023

Introduction

After the successful HPC and RSE workshop “Excalibur RSEs meet HPC Champions”, which ran as a satellite event on the Friday after RSECon22 in Newcastle, the organisers decided to run a similar event this year. The idea was to integrate a 1-2h Birds-of-a-Feather (BoF) session into RSECon23 in Swansea, rather than having a satellite event. However, as it did not fit any of the conference’s submission formats, we talked to the conference committee and they generously offered us a half-day slot on the workshop day of RSECon. This was to be run as a trial session, with a view to potentially offering BoFs as a submission format for next year’s conference.

BoFs are conference sessions that bring people with a common interest together (after the saying “birds of a feather flock together”) for discussions and networking. Depending on the conference there might be an agenda or not. If there is an agenda, it can involve short talks or lightning talks, panel discussions and/or discussion groups. Often one of the aims is to talk about what the community in this field needs and what future plans could be.

Our HPC RSE BoF at RSECon23 had short talks and discussions on three topics: Technology, Training and Community. We planned it as a hybrid event with joint talks, and in-person as well as online discussion groups. On the day, we had about 60 people in the room and 10 online. The talks were a mix, ranging from technologies and updates from sites, through a general overview of reproducible software practices, to various training and community initiatives. The slides from the presentations at the BoF are publicly available on GitHub Pages.

The discussion sessions in the room ran well, although they were dominated by the more senior participants. The online discussion did not work well, as people did not engage for various reasons, so after a while we decided to join the in-person discussion and let online participants ask their questions through the chat.

The general feedback was very positive, and we hope to run a similar session next year. There are a couple of lessons learned, though. For example, we got the feedback that the community would have valued a more open call for talks, rather than invited talks, and smaller discussion groups to lower the hurdle for early career people to take part in the discussions. For that, a different room arrangement (rather than one big lecture hall) would work better. Given that our initial plan had been for a significantly shorter session, in future we hope to better use the time we have, and to have a larger variety of formats in the session. We also hope to increase the size of the organising committee, which will help to make sure that we cover the interests of more of the HPC RSE community, and bring new ideas into the session planning.

Another important lesson learned was that we need to disseminate knowledge about community spaces and networks for HPC RSEs. Many of the junior, and actually not-so-junior, people did not know of networks which others took for granted. Regular BoF sessions might be one step forward, but each of us should make sure to let others know about resources and spaces such as

Topic Discussions

Our goal for the first session of the BoF was to lead a discussion around emerging HPC technologies, with a particular focus on exascale computing and projects funded by the UK’s ExCALIBUR research programme. This session began with a presentation on the UK’s exascale programme (presented by Andy Turner standing in for Mark Parsons). This was followed by several lightning talks, covering the Met Office’s next-generation workflow engine (Cylc 8), engineering reproducible HPC code and an update from Cambridge’s Open Zettascale Lab.

The discussion kicked off with data transfer for exascale systems and whether this problem was being given as much attention (in the UK) as the race to break the exaflop barrier. Several points were made across the room about new investments in this technology as part of the ExCALIBUR programme, along with the suitability of existing technologies for problems of this scale. The discussion extended to cover software solutions for data storage and management, highlighting the importance of software keeping up with the capability provided by any new hardware solutions. The prospect of reducing data on the fly was also discussed, as is the practice for data coming from the LHC or large telescopes and interferometers.

The impact of industry access to supercomputers was discussed, particularly from a data security standpoint and especially where TREs (Trusted Research Environments) are concerned. The requirements of TREs are very difficult to meet in a traditional HPC environment, so this is a problem that still needs to be explored further as these use-cases become more common, and it is certainly something that will be interesting to hear about at a future BoF session.

Future usage of MPI was another topic of discussion: as a ubiquitous solution for large-scale parallelism it will certainly be used more in the future, but its level of abstraction is difficult for new users to get used to. Natively parallel programming languages (e.g. Co-array Fortran, Julia, etc.) are an interesting proposition, but it was agreed that MPI is here to stay.

The discussion turned to different choices of machine architectures for the next generation of the UK’s regional HPC resource. Many more machines are moving towards GPUs and ARM, and a question was raised about whether different combinations of these chips would make portability an even more difficult proposition. A point was made that the community should strive to use openly available programming standards wherever possible (e.g. OpenMP or OpenACC rather than CUDA for accelerated code) so as not to compound this issue.

The final discussion before the break was around bit-reproducibility, but the broad consensus was that in the majority of cases it is not particularly useful (an exception being the Met Office’s next-generation weather and climate model, where the science is already well established).
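As background on why bit-reproducibility is hard to guarantee on parallel machines: floating-point addition is not associative, so a sum whose order of operations changes (for example when a reduction is split across a different number of processes) can differ in its last bits. The short Python sketch below is our own illustration rather than something presented at the session. The helper name `chunked_sum` and the chunk counts are hypothetical, and the partial sums only mimic a parallel reduction in a single process.

```python
import math
import random


def chunked_sum(values, n_chunks):
    """Sum `values` as n_chunks partial sums, mimicking a parallel reduction."""
    size = math.ceil(len(values) / n_chunks)
    partials = [sum(values[i:i + size]) for i in range(0, len(values), size)]
    return sum(partials)


# Floating-point addition is not associative: grouping changes the result.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False

# The same data summed with different (hypothetical) "process counts".
random.seed(0)
data = [random.uniform(-1e6, 1e6) for _ in range(100_000)]
results = {n: chunked_sum(data, n) for n in (1, 4, 16, 64)}
for n, total in results.items():
    print(n, repr(total))
```

The totals agree to many significant figures but are not guaranteed to match bit for bit, which is why strict bit-reproducibility usually means fixing the reduction order, typically at some cost in performance.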

After a short break, the group’s discussion turned to training and the wider HPC community in the UK and further afield. Once again we had three excellent lightning talks, covering the UniverseHPC initiative for education in HPC, the Durham performance analysis workshop and an overview of the previous day’s Fortran carpentries panel.

The initial part of the discussion covered the increased use of HPC resources by researchers in Medical Science and specific training methods for this group. This was largely seen as a great problem to have, as it means a lot more users and researchers joining the HPC community. The need for the HPC community to become more comfortable with tools such as Snakemake and Nextflow was highlighted, as was improving the accessibility of HPC systems by incorporating notebook-style interfaces. The presence of the ARCHER2 ‘driving-test’ was also noted as a way to ensure competence before new users are let loose on the system.

The role of the HPC community in training new HPC users and RSEs was widely discussed. Many people in the room were unfamiliar with existing HPC community initiatives, and the group considered ways to make sure that these communities are more widely advertised. One idea was an “HPC community welcome” for researchers and RSEs getting started in the space. The HPC carpentries and ARCHER2 training programme were represented at the discussion, each boasting a wide user base and plenty of engagement. The discussion finally turned to career progression, with a range of different experiences being represented. This topic felt like part of a wider issue under discussion within the RSE community, but with the introduction of new systems and ExCALIBUR-based projects there was a hope that there would be plenty of job opportunities in the HPC space in the near future.

Where next for the community?

As mentioned in the introduction, one issue that arose in the BoF discussions was understanding how the UK HPC community comes together and interacts with the worldwide HPC community. It was clear that, particularly for new people coming into HPC, it is not obvious what community groups and resources exist and how they overlap and relate to each other.

For example, there are many conversations going on in both the UK HPC-SIG (https://hpc-sig.org.uk) and in the HPC and Cluster Computing channels in the RSE Slack (https://ukrse.slack.com) which have a lot of overlap in both topics and people involved. However, there are also people present in the HPC-SIG discussions who are not present in the RSE Slack discussions and vice versa. When events are organised or discussions happen that might be of cross-cutting interest across the different community channels, they are often not disseminated systematically across the multiple communities, leading to avoidable exclusion of groups of people. At the bottom of this blog post we provide a list of the communities and initiatives that were mentioned during the BoF session, along with a list of events with meetups of the RSE HPC community that were mentioned.

What can we do, as a community, to make it easier for people to access the information, access peer support and make the connections they want when they start work in the HPC area in the UK? Although this type of resource would benefit everyone in the community, it is particularly important for embedded RSEs, who are less likely to be part of a larger RSE group within their organisation. Embedded RSEs need to be able to access community knowledge and support directly from their peers, as they often do not have the same local RSE community to support them. Some suggestions for how to address this that came out of this BoF session were:

1. An open source UK HPC community cookbook with links to useful resources and information maintained by the community. Some ideas for initial sections of the cookbook could be:

  • Communities
    • HPC communities
    • Communication channels
  • General learning resources and podcasts
  • HPC skills
    • Parallel programming
    • Testing, profiling and benchmarking
  • HPC technologies
    • Schedulers
    • Processors
    • Memory
    • I/O and file systems
    • Interconnects

2. Regular online community meetups to discuss topics of interest that complement the in-person events that already happen. Would these be monthly or quarterly?

3. More systematic notification across different channels of events and opportunities – the cookbook described above may help with this.

Having a single, community-owned location that people can go to or be pointed at for information seems the obvious first step, so creating and promoting the cookbook should be the initial priority for the community. We will be taking this forward over the next few months and will provide an update once the initial version is available. We will try not to create yet another place that needs to be found, though; instead we will evaluate whether and how we can integrate it with existing resources.

Other community aspects that were discussed at the session included career progression and mentoring. Both of these aspects of RSE careers are common to many RSEs, not just those working in HPC, and have been long-standing issues for the community. Indeed, you could argue that these issues are the core reason for the creation of the RSE movement in the first place! It is difficult to capture all the myriad problems and potential solutions in this area in this blog, but there are plenty of resources to explore on the Society of Research Software Engineering website:

Final Words

The HPC BoF session at RSECon23 was very successful and was well received by the community. As this was the first time this type of session had been held at RSECon, there was a lot of constructive feedback that we hope to address when (fingers crossed!) a similar session is run at RSECon24 in Newcastle. However, there is a lot we can be doing as a community before RSECon24 to bring RSEs with an interest in HPC together and better support each other.

The BoF organising team is planning an additional event, probably fully online, in spring 2024. We want to continue the discussions, keep the community momentum going, and gather input for the BoF application for RSECon24. Hopefully we will be able to describe some of the community successes and improvements in Newcastle in September 2024. Please contact us if you are interested in contributing and becoming part of the HPC-RSE BoF organising team!

Useful Links

Existing HPC communities

Existing HPC meetups

About the author: Peter Schmidt