Fujitsu Develops Novel Technology To Automatically Assess Personal Data Privacy Risks

Accelerating sharing of de-identified data in accordance with Japan’s Amended Act on the Protection of Personal Information

Fujitsu Laboratories Ltd. today announced the development of a unique new technology to automatically assess the privacy risk of personal data. Under Japan’s Amended Act on the Protection of Personal Information, which goes into effect in 2017, it will become permissible to provide third parties with personal data that has been processed to prevent the identification of a specific individual, or “de-identified,” even without the individual’s consent. Before providing de-identified data, the provider must first ensure that it complies with guidelines and evaluate the risk that specific individuals could be recognized, a task that in cases outside Japan has taken experts many days of investigation.

This is why Fujitsu Laboratories developed novel technology to automatically evaluate the risk that an individual could be recognized from personal data. This technology enables data to be quickly and safely shared across multiple organizations, and can be expected to lead to improvements in the quality of products and services in a variety of fields, as well as to the resolution of social problems through co-creation between different industries.

Details of this technology have been announced at the Information Processing Society of Japan’s Computer Security Group (CSEC) meeting, held July 14-15 in Yamaguchi Prefecture.

Development Background

Under Japan’s Amended Act on the Protection of Personal Information, which goes into effect in 2017, it will become permissible to provide third parties with personal data that has been de-identified, even without the individual’s consent. This makes it possible to use data safely across different organizations, and is expected to improve the quality of new products and services, as inter-organizational connections help resolve societal problems and kick-start co-creation. There are a variety of de-identification methods, and the appropriate one must be chosen according to the field and the applicable guidelines.

For example, once guidelines based on the Amended Act on the Protection of Personal Information are established for the healthcare field, it is conceivable that data such as examination results held by healthcare institutions will be de-identified and used by research institutions or pharmaceutical companies (Figure 1). With such uses in mind, Fujitsu Laboratories developed de-identification technology focused on “k-anonymization,” which processes data so that at least k people share the same combination of attribute values, and has been moving forward on research to apply the technology to healthcare and other fields.
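
To make the k-anonymization condition concrete, the following is a minimal Python sketch of how one might check whether a data set already satisfies it. It is an illustration only and not Fujitsu Laboratories' implementation; the function names, the attribute names (age_range, gender, postcode_prefix) and the sample records are all hypothetical.

```python
from collections import Counter


def smallest_group_size(records, quasi_identifiers):
    """Size of the smallest group of records that share identical values
    for the given attributes. Returns 0 for an empty data set."""
    if not records:
        return 0
    groups = Counter(
        tuple(record[attr] for attr in quasi_identifiers) for record in records
    )
    return min(groups.values())


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of attribute values is shared by at least k records."""
    return smallest_group_size(records, quasi_identifiers) >= k


# Hypothetical examination records whose attributes were already generalized.
records = [
    {"age_range": "30-39", "gender": "F", "postcode_prefix": "211", "result": "normal"},
    {"age_range": "30-39", "gender": "F", "postcode_prefix": "211", "result": "follow-up"},
    {"age_range": "40-49", "gender": "M", "postcode_prefix": "105", "result": "normal"},
    {"age_range": "40-49", "gender": "M", "postcode_prefix": "105", "result": "normal"},
    {"age_range": "40-49", "gender": "M", "postcode_prefix": "105", "result": "follow-up"},
]
attrs = ["age_range", "gender", "postcode_prefix"]

print(smallest_group_size(records, attrs))  # size of the rarest group
print(is_k_anonymous(records, attrs, 2))
print(is_k_anonymous(records, attrs, 3))
```

In this sample the rarest combination of values is shared by two records, so the data satisfies the condition for k = 2 but not for k = 3. Further generalizing or suppressing values would be needed to reach a larger k.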

Issues

Providers of personal data must address the risks associated with de-identification processing, for example by checking whether they have met the guidelines for each industry and whether privacy could be violated using the de-identified data. It is not easy, however, for data providers to evaluate the risk that an individual could be identified from de-identified data and to take countermeasures, so evaluation and confirmation have been left to experts, and the time required has become an issue. There are reports, for example, of cases where healthcare institutions outside Japan de-identified data they held for use in medical research, and the process took more than half a year.

For these reasons, Fujitsu Laboratories concluded that, to quickly evaluate the risk that an individual might be identified from de-identified data and to take countermeasures, it is important to analyze which attributes make it easiest to identify an individual and then apply appropriate de-identification methods. An individual can, however, be identified from a combination of multiple attributes (such as gender, telephone number, and address) (Figure 2), so the number of attribute combinations to examine became so large that searching for the riskiest ones in a realistic amount of time was difficult.
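
The scale of this search can be illustrated with a naive brute-force sketch in Python. It is not the technology described in this announcement, only a baseline that shows why exhaustive search becomes impractical: with n attributes there are 2^n - 1 non-empty combinations to check. The function names and the sample data (with area_code and city standing in for coarsened telephone number and address) are hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_unique_records(records, attrs):
    """Number of records whose values for `attrs` match no other record,
    meaning those attributes alone single the person out."""
    groups = Counter(tuple(r[a] for a in attrs) for r in records)
    return sum(1 for size in groups.values() if size == 1)


def rank_attribute_combinations(records, attributes):
    """Brute force: score every non-empty attribute combination and list the
    riskiest first (most unique records, then fewest attributes)."""
    results = []
    for r in range(1, len(attributes) + 1):
        for combo in combinations(attributes, r):
            results.append((combo, count_unique_records(records, combo)))
    results.sort(key=lambda item: (-item[1], len(item[0])))
    return results


# Hypothetical records for illustration only.
records = [
    {"gender": "F", "area_code": "03", "city": "Tokyo"},
    {"gender": "F", "area_code": "03", "city": "Tokyo"},
    {"gender": "M", "area_code": "03", "city": "Tokyo"},
    {"gender": "M", "area_code": "045", "city": "Yokohama"},
    {"gender": "F", "area_code": "045", "city": "Yokohama"},
]
attributes = ["gender", "area_code", "city"]

ranking = rank_attribute_combinations(records, attributes)
for combo, unique in ranking:
    print(combo, unique)

# The number of combinations doubles with each added attribute.
for n in (3, 10, 20, 30):
    print(n, 2**n - 1)
```

In this sample no single attribute isolates anyone, but gender combined with area_code singles out three of the five records. Three attributes give only 7 combinations, whereas 30 attributes would give more than a billion, each requiring a pass over the data.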