The national science agency has developed a computer model that can help government agencies minimise the risk of releasing sensitive personal information in open datasets. The Personal Information Factor (PIF) uses a data analytics algorithm to identify the risk of de-identified personal information contained in a dataset being matched to its owner. CSIRO says this means that data and privacy experts who traditionally do the analysis can now rely on a computer model to validate their work.
Tracking COVID-19
An early version of the tool is currently being used by the NSW government to analyse datasets tracking the spread of COVID-19, and it’s also being used in areas like domestic violence data and public transport use. CSIRO has been working with the Cyber Security CRC to enhance the tool since 2020.
“Every day, it helps us analyse the security and privacy risks of releasing de-identified datasets of people infected with COVID-19 in NSW and the testing cases for COVID-19, allowing us to minimise the re-identification risk before releasing to the public,” the Chief NSW Data Scientist said.
Given the very strong community interest in growing COVID-19 cases, there was a need to release critical and timely information at a fine-grained level, detailing when and where COVID-19 cases were identified. This also included information such as the likely cause of infection and, earlier in the pandemic, the age range of people confirmed to be infected.
The government wanted the data to be as detailed and granular as possible, but they also needed to protect the privacy and identity of the individuals associated with those datasets, he added.
Attack scenarios
The project’s lead researcher, a Senior Research Scientist at CSIRO’s Data61, said that the PIF takes a tailored approach to each dataset by considering the various ‘attack scenarios’ that could be used to re-identify information.
“The tool then assigns a PIF score to each set,” she said. “If the PIF is higher than a desired threshold, the program makes recommendations on how to design a more secure and safe framework.”
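The article does not describe how a PIF score is actually computed. As a rough illustration of the general idea of scoring a dataset and comparing the score against a threshold, the sketch below uses a simple k-anonymity-style measure: a record's risk is assumed to be one divided by the number of records sharing the same quasi-identifier values. This is a stand-in, not the PIF algorithm, and every name in it (`reidentification_risk`, `exceeds_threshold`, the field names and the threshold value) is hypothetical.

```python
from collections import Counter


def reidentification_risk(records, quasi_identifiers):
    """Per-record risk: 1 / (number of records with the same quasi-identifier values).

    Illustrative k-anonymity-style measure only; NOT the PIF algorithm.
    """
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    class_sizes = Counter(keys)
    return [1.0 / class_sizes[k] for k in keys]


def exceeds_threshold(records, quasi_identifiers, threshold):
    """True if the riskiest record is above the chosen (hypothetical) threshold."""
    return max(reidentification_risk(records, quasi_identifiers)) > threshold


# Made-up example records; the field names are assumptions for illustration.
records = [
    {"age_range": "30-39", "postcode": "2000"},
    {"age_range": "30-39", "postcode": "2000"},
    {"age_range": "70-79", "postcode": "2480"},
]

risks = reidentification_risk(records, ["age_range", "postcode"])
flagged = exceeds_threshold(records, ["age_range", "postcode"], threshold=0.5)
```

In this toy dataset the third record is unique on its quasi-identifiers, so it would be flagged. A data custodian might respond by coarsening or suppressing fields before release, which is the kind of recommendation the researcher describes.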
The CCRC’s Research Director noted that PIF is unique because it provides a scale by which risk can be assessed. PIF is hugely valuable in achieving the ethical and responsible sharing of critical data because it allows data owners to fully assess the risks and residual impacts associated with data sharing, she said.
The PIF was developed by CSIRO’s digital unit Data61 in collaboration with the state and Commonwealth governments, the Australian Computer Society (ACS) and industry groups. It’s expected to be made available for wider public use by June 2022.
In its press release, CSIRO acknowledged and thanked the Government of New South Wales, the Government of Western Australia and the Australian Computer Society (ACS) for providing the datasets needed to test PIF and for supporting the research, along with their partners in advancing the Cyber Security Cooperative Research Centre.
About Data61
Coronavirus (COVID-19) has disrupted economies, markets, supply chains, industries, different parts of the workforce and many aspects of society.
Data61 is working with government and industry partners to help understand how the virus behaves to inform vaccine development, model scenarios to inform decision making, understand sentiment, safeguard privacy and security and automate for greater efficiencies.
These are their capabilities and some key projects.
Capabilities
- Broad and deep analytics capabilities (for example, analysing large unstructured data sets)
- Risk modelling (modelling millions of scenarios and making decisions based on risk)
- Social media analysis (NLP) and sentiment analysis (tracking the spread of the virus, understanding community sentiment)
- Privacy-preserving technologies (enhancing data sharing between entities, such as different jurisdictions and between agencies)
- Data sharing (including blockchain, geospatial mapping)
- AI/Machine learning (including reviewing lung images to detect COVID-19 in asymptomatic people)
- Computer vision/AR/VR (3D heat-mapping of virus contamination on surfaces)