02 May 2025 / 10:54 AM

Snowflake: Security Using Differential Privacy Policy

SDG Blog

Written by César Segura, Subject Matter Expert at SDG Group.

Data Governance in the current Data Platform is a mandatory concept we have to take into consideration together with other ones when you have choosen the best features and pieces for your architecture like Ingestion, Transformation and Delivering.

As a part of the Data Governance, there is a important step that is the Security Privacy in your data. In other Security articles series, I extend on the Accesibility Layer part, talking about the different layers. But in this one I will deep into one specific feature like an Accessbility Privacy Policy that Snowflake provides.

The Differential Privacy Policy (DPP) is mainly focused on providing noise to a table/view that contains PII sensitive information, in order to the restricted access users can continue querying information but they won’t be able to retrieve sensible information. They will only be able to get agregatted data applying restrictions.

On this article we will see basically three topics:

  • What is the process to create a DPP? We will provide an overview about a simple process to create a DPP.
  • How does it work with examples? We will show with an simple case and a few examples, how a DPP works injecting noise to the information.
  • Why it is named a Differential Privacy Policy? We will understand the reason of that name, based on the previous topics.

1. What is the process to create a DPP?

When you want to create a DPP, you must define what conditions, when a role is trying to access to a table, applies a policy or not. When you define that applies, Snowflake on Service Layer will assign charges to one specific budget is being affected. That Budget will be automatically generated inside the DPP, it will be used to limit the use done by a specific role to that affected table. You will be able to extend that budget capacity in; number of queries executed (Nbr_Agg), units (the units are consumed depending on each query required to apply more or less noise to the table) or window time (weekly, monthly…). The below Budget values are by default, but it can be modified in any time.

 

 

An example of the previous, could be the below Snowflake script:

CREATE OR REPLACE PRIVACY POLICY
security_db.policies_schema.customers_policy AS () RETURNS privacy_budget ->
CASE
   WHEN CURRENT_ROLE() = 'ACCOUNTADMIN' THEN no_privacy_policy()
   WHEN CURRENT_ROLE() IN ('ROLE_1')
     THEN privacy_budget(budget_name => 'Budget_1')
   WHEN CURRENT_ROLE() IN ('ROLE_2')
     THEN privacy_budget(budget_name => 'Budget_2')
   ELSE privacy_budget(budget_name => 'Budget_Rest')
END;

In the process when you apply your policy to your table you must define what are the fields that identifies an entity inside your table information. An example could be the below:

 

 

An example of the previous, could be the below Snowflake script:

-- Assign the privacy policy to the CUSTOMERS_TABLE table.
ALTER TABLE dp_db.dp_schema.customers_table
ADD PRIVACY POLICY policy_db.diff_priv_policies.customers_policy
ENTITY KEY (id);

Afterthat, you specify the privacy domains for the field(s) that you want manage the sensitive information, these are the ones can identify individually or in combination the identity / sensitive information. That privacy domains will allow inject noise information with meaning, for every query done to the data later.

 

 

An example of the previous, could be the below Snowflake script:

-- Define privacy domains on CUSTOMERS_TABLE table fields

ALTER TABLE dp_db.dp_schema.customers_table ALTER (
   COLUMN gender
       SET PRIVACY DOMAIN IN ('Female', 'Male'),
   COLUMN Address
       SET PRIVACY DOMAIN IN ('BCN', 'SAB', 'MAD', 'TAR'),
   COLUMN Birth_date
       SET PRIVACY DOMAIN
           BETWEEN (to_date('01/01/1954'), to_date('12/31/2007'))
);

2. How does it work with examples?

Now, imagine the below scenario where different roles (ADMIN and ROLE_1) wants to execute the same query over the table affected by a DPP. In that example, we will use the below one:

 

 

  1. Query 1 is executed by ADMIN role at time 0.
  2. The Service layer will check if the query is using a table affected by a DPP. It’s not affected by any policy due to there is no Budget applied, remember role ADMIN has been specifically no assigned any policy.
  3. The Noise injection doesn’t activate it.
  4. The Q1 is execute directly on the CUSTOMERS_TABLE, so the results will be always deterministic.
  5. The results of 1 row is returned to the user with ADMIN role.

 

  1. Query 1 is executed by ROLE_1 role at time 0.
  2. The Service layer will check if the query is using a table affected by a DPP. It’s affected by policy due to there is a condition satisfied: “ROLE_1 is trying to access the data”, so Budget_1 will be affected.
  3. The Noise injection activates on table CUSTOMERS_TABLE, so it will be generated an underlined information to be accessed later. It will be based on the PRIVACY DOMAINs based above.
  4. The Budget_1 will decrease their expense in both cases: Nbr_agg will be reduced in one, and in addition the units needed to apply noise to the underlined information will be increased the Accumulated Unit Spent (in that case +3,6 has been used). Maxim limit will be reached when it arrives to the Unit Limit. If you use the below query, with ESTIMATE_REMAINING_DP_AGGREGATES, you will see an estimation of your current Budget capacity:



  5. The Q1 is executed through the Noised underlined information derived from CUSTOMERS_TABLE, so the results will be always NOT deterministic. Each time ROLE_1 executes Q1, it will generate new underlined noise information.
  6. The results of “3” is returned to the user with ROLE_1 role.

Case 3 — ROLE_1 role executes Q1 query AGAIN later:

 

 

  1. Query 1 is executed by ROLE_1 role at time 1 (later the latest execution).
  2. The Service layer will check if the query is using a table affected by a DPP. It’s affected by policy due to there is a condition satisfied: “ROLE_1 is trying to access the data”, so Budget_1 will be affected.
  3. The Noise injection activates on table CUSTOMERS_TABLE, so it will be generated an underlined information to be accessed later. It will be based on the PRIVACY DOMAINs based above.
  4. The Budget_1 will decrease their expense in both cases: Nbr_agg will be reduced in one, and in addition the units needed to apply noise to the underlined information will be increased the Accumulated Unit Spent up to 4.2 (in that case +0,6 has been used, in addition to 3,6 previously accumulated). Maxim limit is not reached yet to the Unit Limit 233.
  5. The Q1 is executed through the Noised underlined information derived from CUSTOMERS_TABLE, so the results will be always NOT deterministic. Each time ROLE_1 executes Q1, it will generate new underlined noise information.
  6. The results of “5” is returned to the user with ROLE_1 role.

3. Why it is named a Differential Privacy Policy?

It’s based on when you execute your query, the output won’t be deterministic when a DPP affects you. In that case, it will apply noise to your data affecting to the results every time. The differences respecting to the correct value (deterministic one), it applies to the name of that security policy.

So you can use the functions DP_INTERVAL_LOW and DP_INTERVAL_HIGH, to determine the differential margins applied to the output result, with a 95% of trustibility.

An example of use with a query:

Conclusions

The DPP policy is one of most complex privacy policy to understand. Many people have been asked me about it, and I thought that it would be a good idea writting this article to help Snowflake members to get a comprehensive way with an easy and quick overview.

We have to take into consideration that this type of policy is very useful for environments when you share sensible information, and you want to highly restrict the use of this data. Snowflake has a full list of other policies, you will be sure to better understand your use case prior to apply this or other ones features.

Original article published on Medium here