De-anonymization of Insurance Applicants' Sensitive Information

Project Details


Fall 2017


Rosalie Dolor, Maxim Castañeda, Jay Lee





Over the years, as the life insurance industry has expanded, the need for clear regulatory laws has grown as well, giving rise to organizations such as the National Association of Insurance Commissioners (NAIC). Through the NAIC, insurance regulators establish national standards and best practices, conduct peer reviews, and coordinate their regulatory oversight to better protect consumers' interests while ensuring a strong, viable insurance marketplace. American insurance regulators also take part in the International Association of Insurance Supervisors (IAIS), participating in all of its major standard-setting initiatives: working with fellow regulators from around the world to better supervise cross-border insurers, identifying systemic risk in the insurance sector, and developing international best practices.
The NAIC is not the industry regulator itself; rather, it acts as an intermediary between state insurance commissioners and federal bodies. As stated in its general mission, the NAIC seeks to: 1. Protect the public interest; 2. Promote competitive markets; 3. Facilitate fair and equitable treatment of insurance consumers; 4. Promote the reliability, solvency, and financial soundness of insurance institutions; and 5. Support and improve state regulation of insurance. Hence, this research is a collaboration to enhance insurance companies' best practices by safeguarding clients' privacy rights in the information-gathering process.
Prudential, one of the largest issuers of life insurance in the USA, held a Kaggle competition to predict the risk level of its applicants. The researchers used the data provided in the competition to build models capable of predicting applicants' sensitive attributes. The dataset contains 128 attributes (or variables) for 59,381 applicants, covering information that applicants provided to Prudential when applying for the company's products.
The researchers applied data mining methods in a series of steps to de-anonymize sensitive attributes in the given dataset: identifying sensitive variables, building predictive models for de-anonymization, and excluding variables related to the sensitive variables. Given the anonymized dataset, the researchers then checked whether removing the sensitive variables affected risk-level prediction. The flowchart in Figure 1 summarizes the methodology followed in this research.
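The steps above can be sketched in code. The following is a minimal illustration on synthetic data only: the actual Prudential dataset, its column names, and the models the researchers used are not reproduced here, so the column names ("feat_a", "feat_b", "sensitive_attr", "risk_level"), the choice of random forests, and the importance threshold are all hypothetical stand-ins for the real pipeline.

```python
# Hypothetical sketch of the de-anonymization workflow on synthetic data.
# Column names, model choice, and thresholds are illustrative assumptions,
# not the researchers' actual setup.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000

# Synthetic stand-in for the anonymized application data.
df = pd.DataFrame({
    "feat_a": rng.normal(size=n),
    "feat_b": rng.normal(size=n),
})
# A sensitive attribute that "leaks" through feat_a, and a risk label.
df["sensitive_attr"] = (df["feat_a"] + rng.normal(scale=0.5, size=n) > 0).astype(int)
df["risk_level"] = (df["feat_b"] + 0.3 * df["feat_a"] > 0).astype(int)

features = ["feat_a", "feat_b"]

# Steps 1-2: build a model that tries to predict the sensitive attribute
# from the remaining variables; high accuracy signals de-anonymizability.
X_tr, X_te, y_tr, y_te = train_test_split(
    df[features], df["sensitive_attr"], random_state=0)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)
deanon_acc = clf.score(X_te, y_te)

# Step 3: exclude the variables most predictive of the sensitive attribute
# (0.5 is an arbitrary illustrative importance threshold).
importances = pd.Series(clf.feature_importances_, index=features)
to_drop = importances[importances > 0.5].index.tolist()
reduced = [f for f in features if f not in to_drop]

# Step 4: check whether risk-level prediction survives the removal.
def risk_score(cols):
    Xr_tr, Xr_te, yr_tr, yr_te = train_test_split(
        df[cols], df["risk_level"], random_state=0)
    model = RandomForestClassifier(n_estimators=50, random_state=0)
    return model.fit(Xr_tr, yr_tr).score(Xr_te, yr_te)

print(f"de-anonymization accuracy:  {deanon_acc:.2f}")
print(f"risk accuracy (all vars):   {risk_score(features):.2f}")
print(f"risk accuracy (after drop): {risk_score(reduced):.2f}")
```

Comparing the risk-prediction accuracy before and after the drop mirrors the final check in the methodology: if the two scores are close, the sensitive variables (and their correlates) can be removed without materially degrading risk-level prediction.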
Having applied the de-anonymization process, the researchers found that dropping the identified sensitive variables (and the important variables related to them) did not significantly affect risk-level prediction. However, the assumptions made by the researchers about some of the attributes should be verified with Prudential, and the performance metrics used to evaluate the predictive models should be discussed with the NAIC. Finally, it is recommended that the process be repeated whenever new sensitive attributes are identified, with risk-level prediction reevaluated accordingly.

Application Area: