
How can fraud examiners and legal professionals meet compliance standards while keeping individuals’ and organizations’ data private? “Differential privacy” is a data-protection approach that proponents claim can safeguard data far better than traditional sanitizing or anonymizing methods.
Over the past two years as a columnist for Fraud Magazine, it’s been a pleasure introducing new ideas, innovations and technologies to you — my colleagues. That’s why I’m excited to devote this edition of “Innovation Update” to the concept of “differential privacy.”
Differential privacy constrains the algorithms used to publish aggregate information so organizations can securely share private, sensitive data internally or with third parties. The concept isn’t new. Mathematicians, cryptographers and academics have been discussing it for more than a decade. However, companies are now commercializing it for global fraud examinations and proactive compliance monitoring.
In September 2019, Google released the open-source version of the differential privacy library it uses in some of its products, such as Maps, according to Emil Protalinski in “Google open-sources its differential privacy library,” VentureBeat, Sept. 5, 2019.
“Differential privacy limits the algorithms used to publish aggregate information about a statistical database,” Protalinski writes. “Whether you are a city planner, small business owner or software developer, chances are you want to gain insights from the data of your citizens, customers or users. But you don’t want to lose their trust in the process. Differentially private data analysis enables organizations to learn from the majority of their data without allowing any single individual’s data to be distinguished or re-identified.”
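To make that concrete, here’s a minimal sketch of the Laplace mechanism, one common way to achieve differential privacy. (This is an illustrative Python example with assumed function names and parameters, not Google’s library.) The idea: answer an aggregate query with a small amount of calibrated random noise so the published statistic remains useful, but no single individual’s presence or absence in the data can be confidently inferred.

```python
import numpy as np

def dp_count(records, predicate, epsilon=0.1):
    """Differentially private count via the Laplace mechanism (illustrative sketch).

    A count query has sensitivity 1 (adding or removing one record changes the
    true answer by at most 1), so Laplace noise with scale 1/epsilon masks any
    single individual's contribution.
    """
    true_count = sum(1 for r in records if predicate(r))
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Example: publish roughly how many customers are over 65 without exposing anyone.
ages = [34, 71, 68, 25, 90, 41, 66]
print(dp_count(ages, lambda age: age > 65, epsilon=0.5))
```

The epsilon parameter controls the trade-off: smaller values mean stronger privacy and noisier answers.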
Here’s a business case to help clarify a challenge that differential privacy might meet. A company manages a database containing sensitive personally identifiable information (PII), such as customer credit card numbers, demographic data and personal health information, as well as corporate product formulas and other forms of company intellectual property.
The company would like to release some statistics from this data to the public, a third-party vendor or joint-venture partner. However, the company has to ensure it’s impossible for outsiders to reverse-engineer the released sensitive data. An outsider, in this example, would be an entity intending to reveal, or learn, at least some of the company’s sensitive data elements.
Traditional approaches would most likely seek to simply anonymize the data (e.g., swap out customer names with random numbers) or even redact or delete sensitive fields. However, when auxiliary information from other data sources is available, anonymization isn’t sufficient because outsiders can cross-reference data sets to derive or recover the masked data.
For example, in 2007, Netflix released a dataset of its user ratings as part of a competition to see if anyone could outperform its collaborative filtering algorithm. The dataset didn’t contain PII, but researchers were still able to breach privacy by cross-referencing other data sources to derive individual customer data.
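To see how that kind of cross-referencing works, here’s a toy sketch using hypothetical data and the pandas library: an “anonymized” release still carries quasi-identifiers such as ZIP code and birth year, and joining it with auxiliary data an outsider already holds re-identifies the supposedly anonymous rows.

```python
import pandas as pd

# "Anonymized" release: names replaced with random IDs, but quasi-identifiers remain.
released = pd.DataFrame({
    "user_id": [101, 102, 103],
    "zip": ["30301", "30302", "30303"],
    "birth_year": [1980, 1975, 1990],
    "rating": [5, 3, 4],
})

# Auxiliary data the outsider already has (e.g., a public profile or another breach).
auxiliary = pd.DataFrame({
    "name": ["A. Smith", "B. Jones"],
    "zip": ["30301", "30303"],
    "birth_year": [1980, 1990],
})

# Cross-referencing on the shared quasi-identifiers re-identifies the "anonymous" rows.
reidentified = released.merge(auxiliary, on=["zip", "birth_year"])
print(reidentified[["name", "user_id", "rating"]])
```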
On my podcast, “The Walden Pond,” I recently interviewed Ishaan Nerurkar, CEO of LeapYear Technologies Inc., a company that has applied differential privacy research to develop a commercial platform for privacy-preserving computations on sensitive data. (See “Insights Without Exposure with Ishaan Nerurkar,” The Walden Pond.)
“Every regulated industry,” Ishaan says, “whether it be in financial services, health care, telecom, aerospace and defense, government or industrial manufacturing — just to name a few — faces significant challenges using and sharing sensitive data.
“While there are some techniques for sanitizing data such as masking [or redacting] certain sensitive data fields, anonymizing the data or simply deleting key information, these techniques don’t really lend [themselves] for today’s level of analytics that require such valuable information in order to enhance predictive models or extract key insights required for effective decision-making,” Ishaan says. “These old techniques either could reduce the value of the data, or worse — allow end users to perhaps even reverse-engineer the masked data and thus exposing the company to risk.”
Ishaan further describes differential privacy as a technology that seeks to learn statistical patterns about the data without exposing underlying information. “It lets you run a statistic and build a model,” Ishaan says. “But it won’t allow for the exposure of a single underlying record that helped generate that model. … Think of it as a layer that sits on your database that allows the user to only gain access to the sensitive data through the differential privacy platform layers,” he says. “Users are able to gain access to the database fields and select which fields need to be hidden in an easy-to-use interface.”
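As an illustration of that “layer” concept, here’s a simplified sketch (my own illustration, not LeapYear’s product or API) of a query interface that exposes only noisy aggregate results: the raw records never leave the object, and each answer is perturbed in proportion to the query’s sensitivity and a privacy budget, epsilon.

```python
import numpy as np

class PrivateQueryLayer:
    """Hypothetical privacy layer over a sensitive table (illustrative sketch).

    Analysts query aggregates through this object; raw rows are never returned,
    and every answer is perturbed with Laplace noise calibrated to the query's
    sensitivity and the privacy budget epsilon.
    """

    def __init__(self, rows, epsilon=0.5):
        self._rows = rows        # sensitive records stay internal
        self._epsilon = epsilon  # simplified per-query privacy budget

    def count(self, predicate):
        true_count = sum(1 for r in self._rows if predicate(r))
        return true_count + np.random.laplace(0.0, 1.0 / self._epsilon)

    def mean(self, field, lower, upper):
        # Clamp values to a known range so one record's influence is bounded.
        vals = [min(max(r[field], lower), upper) for r in self._rows]
        sensitivity = (upper - lower) / len(vals)
        true_mean = sum(vals) / len(vals)
        return true_mean + np.random.laplace(0.0, sensitivity / self._epsilon)

# Example: an analyst sees an approximate average claim amount, never a single claim.
claims = [{"amount": 1200}, {"amount": 800}, {"amount": 15000}, {"amount": 430}]
layer = PrivateQueryLayer(claims, epsilon=0.5)
print(layer.mean("amount", lower=0, upper=20000))
```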
Applications in which differential privacy algorithms can benefit an organization might include:

- Personal health care data, in which global patient data in clinical research needs to be analyzed to find a life-saving drug without violating individuals’ specific medical information and data privacy details.
- Global investigations and litigation. “There have been significant advancements in differential privacy techniques” in this area “that can also apply to the identification of relevant information before a document production [in the context of a litigation],” according to the article “Global Privacy Rules Intersect with Discovery Obligations,” by Andy G. Gandhi, Mauricio Paez and Mark Kindy, New York Law Journal, Jan. 31.
The technically and mathematically gifted have open-source options for using differential privacy. In addition to Google’s library mentioned at the beginning of this column, academics frequently reference the Pufferfish development framework. (See “Pufferfish: A Framework for Mathematical Privacy Definitions,” by Daniel Kifer, of Penn State University, and Ashwin Machanavajhala, of Duke University, among several other publicly available articles on the topic.)
According to the paper, organizations can use the Pufferfish framework to create new privacy definitions that are customized to the needs of a given application. The goal of Pufferfish is to allow experts in a particular knowledge domain, who frequently don’t have proficiency in privacy conventions, to develop rigorous privacy definitions for their data-sharing needs. Be forewarned: It’s very technical reading.
Is your curiosity piqued like mine? I hope so. In this small space, I can only give the rudiments of differential privacy. I encourage you to Google the topic and research it further to see how organizations are applying it. I anticipate many more entities and sectors will adopt differential privacy technologies as governments enact more data privacy regulations, such as the EU’s General Data Protection Regulation (GDPR) and the California Consumer Privacy Act.
Vincent M. Walden, CFE, CPA, is a managing director with Alvarez & Marsal’s Disputes and Investigations Practice and is host of “The Walden Pond,” a compliance podcast series. He welcomes your feedback. Contact him at vwalden@alvarezandmarsal.com.