February 11, 2013
Anonymizing medical data
Just a few thoughts copped from a recent email exchange on the Research Dataman list:
It's one thing to be aware of the risks - it's another to decide how to
manage them. Refusing to disclose *any* data except under very carefully
controlled circumstances is one approach, and it's probably valid for data
where the reuse potential is likely to be limited to a few instances at most.
For data with greater reuse potential, techniques adopted for some government
datasets may be appropriate. These include perturbation of some of the numbers,
or suppression of numbers that might lead to disclosure even in aggregated
data. Both need expert statistical advice to ensure that the resulting data
remains useful without being disclosive.
An example of perturbation is varying a subject's age by a few years in
either direction. An example of suppression I am aware of comes from the Schools
Census: in any school where fewer than 5 pupils receive free school meals,
the exact total is redacted from the published data.
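
As a rough illustration of the two techniques, here is a minimal Python sketch. The function names, the 2-year maximum shift, and the use of None as the redaction marker are assumptions made for this example, not anything specified on the list or by the Schools Census; only the "below 5" threshold comes from the example above. Real disclosure control should be designed with statistical advice.

```python
import random


def perturb_age(age, max_shift=2, rng=random):
    """Shift an age by a random whole number of years in [-max_shift, max_shift].

    max_shift=2 is an illustrative choice for "a few years"; the result is
    clamped at 0 so that very young subjects never get a negative age.
    """
    return max(0, age + rng.randint(-max_shift, max_shift))


def suppress_small_counts(counts, threshold=5, marker=None):
    """Replace every count below threshold with marker (i.e. redact it).

    counts maps a group label (e.g. a school) to a number of individuals.
    """
    return {
        label: (count if count >= threshold else marker)
        for label, count in counts.items()
    }


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only so that repeated runs give the same output
    ages = [34, 61, 7]
    print([perturb_age(a, rng=rng) for a in ages])

    # Hypothetical free-school-meal counts per school.
    fsm_counts = {"School A": 3, "School B": 27, "School C": 5}
    print(suppress_small_counts(fsm_counts))
```

Note that the perturbed ages are individually wrong by design, so any analysis that depends on exact ages (rather than approximate age bands) would no longer be reliable. That trade-off is the reason for taking expert advice.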
Ultimately, the only way to prevent individuals from being identified by combining datasets (i.e. datasets that include data items sensitive enough to permit identification when linked, but no actual confidential, directly identifying data) is through the Data Sharing or Re-Use Agreements between data controllers and data processors.
Websites that were mentioned:
Anonymisation of data from UK Data Archive
Posted by kkwaiser at February 11, 2013 10:48 AM