Luc Betbeder-Matibet, Director of Research Technology Services at the University of New South Wales (UNSW), Australia, talks about High Performance Computing resources and a facility for 3D visualisation of large data sets.
Can you tell us about your role?
I am Director of Research Technology Services at UNSW. It is a new role, established this year, responsible for High Performance Computing (HPC), Research Data Storage and for building up the university's support services in these areas.
Can you tell us more about your work in HPC?
We are developing a strategy to better leverage the national infrastructure while, where appropriate, continuing to run our own systems. At the moment, universities in Australia and internationally are still purchasing their own HPC systems. We are not sure that is what we should do, so we are taking the opportunity this year to pause and think about it.
So, we are procuring some additional capacity from the national infrastructure during this quarter and into next year. To put this in context, we already consume about 50 million compute hours through the various components of the national HPC infrastructure (e.g. NCI and Pawsey). We also have about 45 million compute hours of shared HPC capacity on-premise at UNSW, but much of this is now ageing.
The idea would be to move the overall university capacity beyond this combined level of roughly 100 million hours. To do that, we are going to have to source more capacity from the market, both in traditional HPC and in cloud capabilities.
For traditional HPC, we are initially looking to the National Computational Infrastructure (NCI) facility in Canberra. One of the drivers here is that a lot of the data sets our researchers work with are already in Canberra. We see in this an opportunity for the research communities to more easily share and access larger data sets, and to work with new data sets that they haven't had access to. There is also the economic benefit of leveraging the significant investments made by the Federal Government in these large computational and data storage systems.
Who maintains the national infrastructure?
NCI is supported through the Australian Government's National Collaborative Research Infrastructure Strategy (NCRIS), with operational funding provided through a formal collaboration arrangement. This partnership includes The Australian National University, Geoscience Australia, the Australian Research Council, and a number of research-intensive universities and medical research institutes, including UNSW.
In July this year, the federal government invested AU$7 million into NCI through the NCRIS Agility Fund. This has been matched dollar-for-dollar by the NCI collaborating partners to enable NCI to acquire new capacity. We are going to both support this and take advantage of some of that growth. This is a good example of federal funding attracting institutional funding, and of co-investments leading to high-quality science being carried out.
At the moment, do most of the Australian universities have their own infrastructure?
A lot of Australian universities have a blend of smaller but significant shared institutional HPC systems, what we might call "shoulder" systems. Even then, however, the larger research groups with more significant requirements avail themselves of the peak nationally-supported infrastructure through the National Computational Merit Allocation Scheme. It is these researchers, with the big multi-core jobs, especially within materials sciences, mathematics, climate and genomics, that have really been the regular users of the peak facilities. The shared university systems, by comparison, have been useful training and prototyping facilities.
We have certainly seen that with our two local systems on campus, which have provided excellent support and capacity within the Science and Engineering Faculties over many years. We are wondering, however, whether it is possible to do more of these early-phase and training activities through the national facilities, instead of replacing the campus systems.
Are you working on anything specifically related to data-sharing?
Data moving and re-use is particularly important to us because we are at a distance from Canberra, and because moving data, even within a campus, is a pain. Part of our strategy is to minimise data moves where possible: we want to make it fast and easy to move data down to Canberra, but also to provide researchers with storage there so that they don't have to bring it back and forth all the time.
The idea would be to keep working with NCI and add to the existing national data collections, so that painful data moves are minimised and exciting new data mashups are made possible. Just as a network gains value as you add nodes, I think you add value to a computational environment by adding high-quality data sets.
Can you tell us about other projects you are working on?
At the moment, we are looking at commissioning a large data visualisation facility at UNSW. This facility, called EPICentre (Expanded Perception and Interaction Centre), will have one of the highest-resolution 3D displays in the world right now. It is a seven-metre-diameter data cylinder you can stand in the middle of, with 56 panels (14 × 4), 340-degree panoramic projection and 33 speakers in a surround audio system. The screen resolution is 26,880 × 4,320, so about 120 megapixels.
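The display arithmetic above can be checked with a short sketch. It assumes each of the 56 panels is a standard 1920 × 1080 (Full HD) screen, which is an assumption consistent with the quoted total resolution rather than a confirmed specification:

```python
# Sketch: verify the EPICentre display arithmetic, assuming (not confirmed)
# that each of the 56 panels is a standard 1920 x 1080 Full HD screen.
PANELS_WIDE, PANELS_HIGH = 14, 4   # the 14 x 4 panel grid
PANEL_W, PANEL_H = 1920, 1080      # assumed per-panel resolution

total_w = PANELS_WIDE * PANEL_W    # 26,880 pixels across
total_h = PANELS_HIGH * PANEL_H    # 4,320 pixels high
megapixels = total_w * total_h / 1e6

print(f"{total_w} x {total_h} = {megapixels:.1f} megapixels")
# -> 26880 x 4320 = 116.1 megapixels, i.e. roughly the quoted 120 MP
```

Under that assumption, the grid reproduces the quoted 26,880 × 4,320 resolution exactly, which is why "about 120 megapixels" is a reasonable round figure.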
The EPICentre is initially going to be targeted at medical imaging. It takes imagery straight off our next-generation, high-resolution imaging tools, like light sheet microscopes, and moves it from the raw data sets through a pipeline to visualisation. We are really interested in bringing in multidisciplinary teams of people from inside and outside the university to interact with the data and to ask different questions of it. That's going to create opportunities for collaboration. This is why, from the start, we are linking this into the Faculty of Art and Design and seeing it as a way to have new conversations around data.
With high-resolution imaging systems in the Medical and Materials Sciences, you often don't have an easy way, through current desktop displays, to show all of the data you are generating. You need very specific environments to show lots of data at ultra-high resolutions, and then to provide ways to interact and ask new questions of these data sets. That's a problem we haven't had to deal with before: how do we visualise these really big data sets? It is at the edge of data visualisation research.
Have you faced any conflict between the new technologies and legacy architecture?
Yes, we have. In research computing, for example, you are often working with 20-year-old code. The 20-year-old code might be perfectly reasonable, but does it take advantage of, say, the multi-processor or GPU capabilities of our new environments?
So, we have these inefficiencies in how we carry out some of our research. Our approach here is to invest in the people and code side of things, to get generational improvements in the code and not just in the hardware. It's a lot better for us if we can spend a few months improving somebody's code so that they can be more productive, instead of trapping them for another year in using old code. It is also more efficient for the machines, and we get better utilisation of the shared systems.
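A minimal sketch of the kind of generational code improvement described: a legacy serial loop rewritten to fan the same work out across a node's cores. The `simulate` function here is a hypothetical stand-in for a researcher's per-parameter computation, not code from the interview:

```python
from concurrent.futures import ProcessPoolExecutor

def simulate(param):
    """Hypothetical stand-in for a legacy per-parameter computation."""
    return sum(param * i for i in range(100_000))

def run_serial(params):
    # Legacy style: one core does all the work, one parameter at a time.
    return [simulate(p) for p in params]

def run_parallel(params):
    # Modernised: the same work spread across all available cores.
    with ProcessPoolExecutor() as pool:
        return list(pool.map(simulate, params))

if __name__ == "__main__":
    params = [0.1 * i for i in range(8)]
    # Identical results, but the parallel version actually uses the node.
    assert run_serial(params) == run_parallel(params)
```

The point of the investment is exactly this gap: the results are unchanged, but the modernised version keeps a shared multi-core system busy instead of leaving most of it idle.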
And with cloud-style architectures we have other opportunities. For many researchers, being in a queue is a barrier to adoption. For them, having rapid access to an on-demand HPC-like service is the goal. If they need access to 1,000 cores for an hour right now, that's what we need to provide.
What are the outcomes you are looking to achieve during the next 2-3 years?
One objective is simply to expand the HPC capacity of the university. But we don't just want to grow capacity in existing areas of maturity; we also want to make HPC available to more disciplines and grow the overall base of users. We also want to share some of our best practices: our researchers have developed some really mature toolkits and approaches that work well in some disciplines. By using a facility and communities-of-practice support model, we hope to share these lessons more broadly.