Getting to know your 2021 RSE Fellows — Peter Hill

‘Why do some researchers not make their software and data available?’

This week we’re introducing Peter Hill from the University of York. Read his blog post below to learn about his goals for his EPSRC RSE Fellowship.

Why do some researchers not make their software and data available? After all, one of the foundational principles of science is sharing results, and publishing papers has a very long tradition in academia, so why do the same scientists not also share code and data? And is there anything we can do as RSEs to improve the situation?

First things first: is this premise correct, or do most researchers in fact share their code? Wilhelm Hasselbring looks at the research areas of publications with associated GitHub repositories (and vice versa), and we can see that the absolute number of software citations varies a lot across disciplines. To say how the rates vary, we need some demographic information, but as a first-order approximation, we can look at the HESA (Higher Education Statistics Agency) data for the UK. From this, we can see that there are about twice as many physicists in the UK as there are Earth, marine, and environmental scientists, and yet Earth Science is about twice as well represented in Hasselbring's data. Per researcher, that works out at roughly four times the rate (twice the representation from half the population). This is, at the very least, suggestive that Earth scientists are much happier to cite and share their software than physicists are.

Let's now look at why people share their software and data. A very informative study of this is by Victoria Stodden. She breaks down the potential reasons into private incentives, communitarian ideals, or both (she also reviews the history of ideas about why people share results, which is enlightening!). By far the most important reasons are communitarian ideals: things like "encouraging scientific advancement", "being a good community member", and "encouraging sharing and others to share with you". These reasons for sharing were expressed by 80–90% of those surveyed, followed by more private incentives like "increase in publicity" and "opportunity to get feedback on your work". All this suggests that researchers do have a strong appetite for sharing software and data, but they often feel there are barriers to doing so.

Some barriers, such as legal or copyright restrictions, can be tricky to overcome, but the most common reason people give for not sharing is "the time it takes to clean up and document for release". This matters, because this barrier is, in principle, easily overcome: just throw RSE resources at it! OK, so RSEs don't grow on trees, but funding bodies such as EPSRC are keen to see RSEs funded, and making sure your grant application is properly resourced (for example, by including RSEs to support any software development and release) is a good way to strengthen it. RSEs can also have an important impact here by joining the peer review college. As part of the grant review process, we can ensure that grant applications include sufficient support and resources for the software they use and develop, and that they include plans to share that software and data.

Over the next few years of my Fellowship, I will be working with a network of RSEs across the UK to try to tackle some of these issues in the plasma science community. One of my aims is to focus on usability and sustainability improvements: testing and continuous integration, documentation, and user interfaces. These tend not to be terribly interesting to most researchers, but they are the bread and butter of RSEs. In my opinion, they are low-hanging fruit that can make a project feel a lot more polished and, hopefully, give its developers the confidence to make their code open and FAIR. My goal is to do this work for as much plasma science software as I can get my hands on. All of this will help advocate the important work of RSEs directly to the people who stand to benefit most: researchers.

About the author: cwyatt