Article originally posted on Medium.
Written by Cesar Segura, SME at SDG Group.
Welcome to the next episode of the Snowflake Intelligence and AI series.
In the last episode, I showed how to implement a real use case using Snowflake Intelligence to provide access to structured and unstructured data for your stakeholders.
In this episode, our objective is to build a RAG application for Data Governance purposes with Cortex Agents powered by Cortex Analyst, Custom Tools & CKE, and finally to expose it through Snowflake Intelligence. This is the functionality of this AI application:
Friendly reminder: Snowflake Intelligence is the new AI feature that provides a UI on top of Data Agents, so you can deliver a corporate RAG application to your stakeholders.

Snowflake Intelligence will be the UI in charge of passing all the user requests to the Cortex Data Agent.
The CDA (Cortex Data Agent) is the powerful feature in charge of orchestrating all the requests made by the user: analyzing them, breaking them into multiple tasks, giving a correct answer, and finally monitoring the conversations and iterating over them to provide a better user experience.
The layer where the different tools used by the Cortex Data Agent live.
The place where all the data objects needed by the tools are stored.
The AI_ENGINE_DB.AI_ENGINE_SCHEMA schema will contain all the AI engine procedure tools.
The AI_ENGINE_DB.AI_METADATA_SCHEMA schema will contain all the metadata information, semantic models and views deployed (by default).
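As a minimal setup sketch (the stage name is an assumption of mine, not part of the original setup), the base containers could be created like this:

-- Hypothetical setup sketch: engine database plus the two schemas described above
CREATE DATABASE IF NOT EXISTS AI_ENGINE_DB;
CREATE SCHEMA IF NOT EXISTS AI_ENGINE_DB.AI_ENGINE_SCHEMA;    -- AI engine procedure tools
CREATE SCHEMA IF NOT EXISTS AI_ENGINE_DB.AI_METADATA_SCHEMA;  -- metadata, semantic models and views
-- Assumed stage for the semantic model YAML files deployed later in this article
CREATE STAGE IF NOT EXISTS AI_ENGINE_DB.AI_METADATA_SCHEMA.SEMANTIC_MODELS;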
It will be in charge of orchestrating, planning and executing the different tools in order to provide feedback and a response to the user. In the same way we specified our setup in the prior article, we will configure the proper description, instructions, tools, orchestration and permissions.
We will access Data Agents, which you can find here:

Once we click on Create Agent, we will proceed through the different options below.


Here we differentiate the different aspects we will have to apply in the Setup specification; they are reflected on each part of the architecture diagram:


Here, you will see that there are 3 options we can use, split between built-in ones and custom ones. We are going to use all of them.

We will specify a semantic model YAML file that contains information about the metadata. The configuration value is the full path to the file in the stage.
- Metadata_Columns View: Information about the database objects to be discovered and made available for access, in order to generate metadata.
- Data_Object_Descriptions Table: Current information generated through the AI DOC tool process.
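As a sketch of these two metadata objects (the table columns are taken from the INSERT used by the procedures further below; the view definition over ACCOUNT_USAGE is an assumption, your discovery rules may differ):

-- Descriptions table used by the AI DOC tools (name as referenced in the procedure code below)
CREATE TABLE IF NOT EXISTS AI_ENGINE_DB.METADATA_SCHEMA.DATA_OBJECT_DESCRIPTIONS (
    run_id        TIMESTAMP_NTZ,
    model_name    STRING,
    database_name STRING,
    schema_name   STRING,
    table_name    STRING,
    column_name   STRING,
    description   STRING
);

-- Hypothetical discovery view listing the objects and columns available for documentation
CREATE OR REPLACE VIEW AI_ENGINE_DB.AI_METADATA_SCHEMA.METADATA_COLUMNS AS
SELECT table_catalog AS database_name,
       table_schema  AS schema_name,
       table_name,
       column_name,
       data_type,
       comment
FROM SNOWFLAKE.ACCOUNT_USAGE.COLUMNS
WHERE deleted IS NULL;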


Integrate the official Snowflake Documentation content delivering real-time answers for AI applications and teams working with Snowflake using Cortex Knowledge Extensions | Snowflake Documentation
So you only have to GET the Snowflake: Snowflake Documentation | Snowflake Marketplace CKE from the Marketplace.

This will install a shared Cortex Search service that contains the Snowflake Documentation, updated automatically. Zero maintenance on your side; you only pay for compute usage.
In custom tools, you can specify both Snowflake function and Snowflake procedure objects. In our case, we have specified procedures only.
Take into consideration that specifying a good description will allow the Cortex Data Agent to easily identify which tool it must use for which specific task, avoiding useless iterations and bad actions.
2.3.3.1 <Tool 1: AI DOCument>

This tool comes from another article I wrote some time ago. It allows you to generate AI descriptions for every object in your databases and, using an intermediate metadata table (the one specified in the AI semantic metadata model), apply those descriptions to the corresponding objects.

The initial code is below:
CREATE OR REPLACE PROCEDURE initiate_generate_ai_description(
    model_name STRING,
    database_name STRING,
    schema_name STRING,
    table_name STRING,
    column_name STRING,
    action STRING,
    parallel_degree INT,
    parallel_degree_table INT,
    max_words_length_description INT,
    show_sensitive_info BOOLEAN,
    sample_rate INT,
    type_of_object STRING,
    initialization BOOLEAN
)
RETURNS STRING
LANGUAGE SQL
EXECUTE AS CALLER
AS
DECLARE
    sampled_values STRING;
    generated_description STRING;
    full_table_name STRING;
    list_sql_execute ARRAY;
    column_name_individual STRING;
    table_name_individual STRING;
    schema_name_individual STRING;
    run_id DATETIME DEFAULT CURRENT_TIMESTAMP();
    desc_show_sensitive_info STRING DEFAULT 'Don''t show sensitive information in the description.';
    type_of_object_individual STRING;
    view_of_columns STRING;
    view_of_tables STRING;
    view_of_schemas STRING;
    query STRING;
    rs RESULTSET;
BEGIN
    -- Optionally wipe out any previously generated descriptions
    IF (:initialization) THEN
        TRUNCATE TABLE AI_ENGINE_DB.metadata_schema.data_object_descriptions;
    END IF;

    -- Translate the literal 'NULL' parameter values into real NULLs
    IF (UPPER(:schema_name) = 'NULL') THEN
        schema_name_individual := NULL;
    ELSE
        schema_name_individual := :schema_name;
    END IF;

    IF (UPPER(:table_name) = 'NULL') THEN
        table_name_individual := NULL;
    ELSE
        table_name_individual := :table_name;
    END IF;

    IF (UPPER(:column_name) = 'NULL') THEN
        column_name_individual := NULL;
    ELSE
        column_name_individual := :column_name;
    END IF;

    -- Delegate to the main AI description generation procedure
    CALL AI_ENGINE_DB.AI_ENGINE_SCHEMA.generate_ai_description(
        :model_name, :database_name, :schema_name_individual, :table_name_individual,
        :column_name_individual, :action, :parallel_degree, :parallel_degree_table,
        :max_words_length_description, :show_sensitive_info, :sample_rate, NULL);

    RETURN 'AI Main Documentation Process completed successfully';
END;
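For reference, this is how the agent (or you) could invoke it directly. The database, schema, model and parameter values below are purely illustrative, not taken from the article:

-- Hypothetical call: document every table and column of SALES_DB.SALES_SCHEMA
CALL initiate_generate_ai_description(
    'mistral-large2',    -- model_name
    'SALES_DB',          -- database_name (illustrative)
    'SALES_SCHEMA',      -- schema_name ('NULL' would mean all schemas)
    'NULL',              -- table_name  ('NULL' = all tables)
    'NULL',              -- column_name ('NULL' = all columns)
    'GENERATE',          -- action (assumed value)
    4,                   -- parallel_degree
    2,                   -- parallel_degree_table
    25,                  -- max_words_length_description
    FALSE,               -- show_sensitive_info
    10,                  -- sample_rate
    'TABLE',             -- type_of_object (assumed value)
    TRUE                 -- initialization: wipe previous descriptions first
);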
You can check the full article here to get the code of the tool.
2.3.3.2 <Tool 2: AI DOC Modify Description>

This tool is used to modify, in the metadata object description table, a description previously generated by AI, by inserting a new version of the description specified by the user.
The code is below:
CREATE OR REPLACE PROCEDURE modify_ai_description(
    database_name STRING,
    schema_name STRING,
    table_name STRING,
    column_name STRING,
    generated_description STRING
)
RETURNS STRING
LANGUAGE SQL
EXECUTE AS CALLER
AS
DECLARE
    run_id DATETIME DEFAULT CURRENT_TIMESTAMP();
    model_name STRING DEFAULT 'N/A';
BEGIN
    -- Store the generated description in metadata
    INSERT INTO AI_ENGINE_DB.metadata_schema.data_object_descriptions
        (run_id, model_name, database_name, schema_name, table_name, column_name, description)
    VALUES
        (:run_id, :model_name, :database_name, :schema_name, :table_name, :column_name, :generated_description);

    RETURN 'Modification AI Metadata Description Process completed successfully';
END;
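And a usage example (object names and the new description are hypothetical):

-- Hypothetical call: register a user-provided version of a column description
CALL modify_ai_description(
    'SALES_DB',        -- database_name
    'SALES_SCHEMA',    -- schema_name
    'ORDERS',          -- table_name
    'ORDER_AMOUNT',    -- column_name
    'Total order amount in EUR, taxes included.'  -- generated_description
);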
2.3.3.3 <Tool 3: Generate Semantic Model File>

This tool is used to generate a Semantic Model file based on the YAML content provided by the RAG, which is passed in one of the procedure parameters. The parameters are the file name, the full stage path and the file content. For security reasons, the stage path is restricted in the tool to the working ones.
If you want to know more about this procedure code, check my article below.
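Since the full code lives in that article, here is only a minimal sketch of what such a tool can look like, assuming the working stage AI_ENGINE_DB.AI_METADATA_SCHEMA.SEMANTIC_MODELS (the real procedure may differ):

-- Sketch only: write a YAML semantic model file to a restricted stage
CREATE OR REPLACE PROCEDURE AI_ENGINE_SCHEMA.generate_semantic_model_file(
    file_name STRING,
    stage_path STRING,
    file_content STRING
)
RETURNS STRING
LANGUAGE SQL
EXECUTE AS CALLER
AS
DECLARE
    copy_stmt STRING;
BEGIN
    -- Guardrail: only allow the working stage used by this solution (assumed name)
    IF (NOT STARTSWITH(:stage_path, '@AI_ENGINE_DB.AI_METADATA_SCHEMA.SEMANTIC_MODELS')) THEN
        RETURN 'Stage path not allowed';
    END IF;

    -- Unload the YAML content as a single uncompressed file in the stage
    copy_stmt := 'COPY INTO ' || :stage_path || '/' || :file_name ||
                 ' FROM (SELECT ''' || REPLACE(:file_content, '''', '''''') || ''')' ||
                 ' FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = NONE COMPRESSION = NONE ESCAPE_UNENCLOSED_FIELD = NONE)' ||
                 ' SINGLE = TRUE OVERWRITE = TRUE HEADER = FALSE';
    EXECUTE IMMEDIATE :copy_stmt;

    RETURN 'Semantic model file deployed to ' || :stage_path || '/' || :file_name;
END;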
2.3.3.4 <Tool 4: Create Semantic View>

This tool is used to generate a Semantic View based on the CREATE SQL statement provided by the RAG, which is passed in the procedure parameter. The parameter is SQL_STATEMENT. For security reasons, the tool restricts the types of SQL that can be used, only allowing the creation or dropping of semantic views. The location where the semantic views are created is also restricted, but you can change it.
The code is below:
CREATE OR REPLACE PROCEDURE AI_ENGINE_SCHEMA.run_dynamic_sql(sql_command STRING)
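The full body is not reproduced here; as a minimal sketch, assuming the guardrails described above (only creating or dropping semantic views), it could look like this:

-- Sketch only: run a dynamic statement restricted to semantic view DDL
CREATE OR REPLACE PROCEDURE AI_ENGINE_SCHEMA.run_dynamic_sql(sql_command STRING)
RETURNS STRING
LANGUAGE SQL
EXECUTE AS CALLER
AS
DECLARE
    normalized STRING;
BEGIN
    normalized := UPPER(TRIM(:sql_command));
    -- Guardrail: only CREATE / DROP of semantic views is accepted
    IF (NOT (:normalized LIKE 'CREATE%SEMANTIC VIEW%' OR :normalized LIKE 'DROP SEMANTIC VIEW%')) THEN
        RETURN 'Rejected: only CREATE or DROP SEMANTIC VIEW statements are allowed';
    END IF;
    EXECUTE IMMEDIATE :sql_command;
    RETURN 'Statement executed successfully';
END;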
2.3.3.5 <Tool 5: Validation Semantic Model>

This tool is used to validate a Semantic Model based on the value provided by the RAG, which will be a SQL statement. The Data Agent will check the Snowflake Cortex CKE Search Service to learn how semantic models should be checked, and will use that knowledge to verify the integrity of your Semantic Layer.
The magic is here: no manual work, no hardcoding, just apply best practices easily by following the online Snowflake documentation. The parameter is SQL_STATEMENT. For security reasons, the tool restricts the types of SQL that can be used, never allowing existing database objects to be ALTERed.
The SQL code of the procedure is the same as the Create Semantic View tool, I have to admit :) I am working on an improvement of this tool; once I have finished, I will publish it and update this article with the changes!
Your goal is to apply the ORCHESTRATION part of the architecture diagram, indicating the flow functionalities and specifications.

This is one of the most important parts of the Data Agent, as it defines how to orchestrate the interactions between the different tools you are using. It's recommended to write clear, organized and contradiction-free instructions to specify how the Data Agent must work. Not applying these rules could cause misbehaviours in your SI RAG application.
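As an illustration only (not the exact instructions used in this setup), orchestration instructions can be as simple and explicit as:
- Always ask the user for confirmation before applying comments to database objects or creating/dropping semantic views.
- Use the Metadata semantic model to answer discovery questions; use the AI DOC tool only when the user asks to generate or apply documentation.
- When building or validating Semantic Layer objects, first consult the Snowflake Documentation CKE search service, then call the corresponding custom tool.
- Never run SQL other than what the custom tools allow.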
For the Orchestration model management, in my case, I will set the selection to “auto”. This option allows the Data Agent to automatically choose among the different available LLM models, making it fluent and performant and providing better results depending on the question and context. But if you are more comfortable with one specific model, these are the ones available at the time of writing this article:

Now we are going to explore, document and provide data contracts (Semantic Layer objects) to our users using Snowflake Intelligence with our Cortex AI Data Agent.
Start by opening Snowflake Intelligence.
Be sure to select the Data Agent that has just been created:

First, we are going to ask what information is available, in order to discover it:

SI shows that it is checking the Semantic Model to look up this information:

The response is:

But our purpose now will be to AI-document our databases. We are going to start with one of these schemas. But first, we would like to check what this schema contains:

Response shown by SI:

So far so good; now we are comfortable with the information for which we want to generate AI documentation. We could also navigate and discover further, showing what columns are available, filtering them, etc., but let's imagine that all this work is already done.

The Data Agent knows it has to use the AI Doc Tool to document the desired objects, but it needs to ask the user for some specifications previously defined and needed in the AI documentation process:

In this case, different parameters are needed; you can use the defaults, but you have to pay attention to this. This behavior is described in the AI Doc Tool article mentioned earlier.
Our intention now is only to generate descriptions, so we can check and iterate over the information before applying the descriptions. We will specify that we are going to start, and that any descriptions currently present in the AI metadata description table must be wiped out.
Step by step now…

It takes no more than a few seconds; there are not too many objects to generate, and the AI Doc process tool uses good techniques to achieve good performance.
The response from SI shows us that the process has finished, and it asks whether we want to review the information generated.

So, as part of the DG process, we are going to check the information. We confirm to SI, so it will again use the Semantic Model about Metadata to show us the information that has just been generated:

And finally the summary:

<Iterate over more info>
Now, I am interested in checking some tables and columns, so I will ask it for a further review. But in any case, it can be used as you wish, to search for specific information, etc., so it will be able to interact with the user.

Unfortunately, I am not comfortable with some of the specific content, so this is where the different interactions with the user come into play; or you can delegate to the users themselves to review/modify their own descriptions.
I want to change some information:

The Cortex Data Agent detects that it has to use the AI Doc Modify Tool, and it will apply the changes I have suggested.
The response given by the Data Agent is that the action has been completed successfully, and it asks how to proceed next.

Once we are done with all the changes to our AI documentation, we are going to proceed to review the information generated. Again, note that many other interactions can take place before the information is finally reviewed, such as going back to re-generate all or part of the AI documentation, changing some features, changing some descriptions, etc.

Once it confirms that all changes have been made, and we are finally comfortable with the information we get, we can go on to apply the changes. The CDA (Cortex Data Agent) suggests the next steps following the indications and guardrails we specified. But there are different approaches you can take here; the limit is your imagination :)

The CDA (Cortex Data Agent) asks for confirmation about the action to be taken, since this will modify the data object comments in the database. We are going to confirm, but you can also ask it to generate all the needed SQL statements for further use in your CI/CD pipelines.
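For reference, the statements that end up applying the descriptions (or that you could hand off to a CI/CD pipeline) are standard comment DDL of this shape, with hypothetical object names:

-- Illustrative examples of applying AI-generated descriptions as comments
COMMENT ON TABLE SALES_DB.SALES_SCHEMA.ORDERS IS 'Sales orders placed by customers, one row per order.';
COMMENT ON COLUMN SALES_DB.SALES_SCHEMA.ORDERS.ORDER_AMOUNT IS 'Total order amount in EUR, taxes included.';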
Again, the CDA goes on using the AI DOC Tool:

And finally, the response, confirmation and suggestion for next steps given by the CDA:

Now, we are going to check that the changes have been correctly applied using the Snowsight UI.

As confirmed, the tool has documented all the objects in the entire schema; in addition, I am paying special attention to the specific change we made in our Data Governance interactions.
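Besides Snowsight, you can also verify the applied comments with a quick query (database and schema names here are hypothetical):

-- Check the comments applied to all columns of the documented schema
SELECT table_name, column_name, comment
FROM SALES_DB.INFORMATION_SCHEMA.COLUMNS
WHERE table_schema = 'SALES_SCHEMA'
ORDER BY table_name, ordinal_position;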
Remember that data that is not used has no value; your data platform does not make sense if it is not achieving your objective of providing benefit to your business through your data.
So our intention in this part is to easily generate semantic views/models in order to let other applications access the data through them. We won't need to know any context about how to generate them; the CDA will have enough intelligence to do it by itself.
In the same chat (you can also open another one), we will ask it to generate a Semantic Layer, but only with the information that really interests us and that has been AI generated; in this case, only the SALES-related part. Remember, here your imagination is your limit.

In the previous image, you can see the different planned steps the CDA will perform, so you don't have to worry about how to build it; the CDA will check how to do it against the official documentation and will use your data.

Semantic View
We are going to confirm the type of Semantic Layer we want to generate, i.e. the Semantic View.

It shows the different instructions followed in the steps planned by the CDA, and it finally provides the CREATE SEMANTIC VIEW statement. Here, you could iterate: change the name, incorporate/remove tables, columns, etc.
In any case, at the end the CDA suggests the next steps, in case you want to take some further actions.
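For context, a CREATE SEMANTIC VIEW statement generally follows this shape (table, column and metric names below are purely illustrative, not the statement the agent generated):

-- Illustrative skeleton only; clause contents depend on your actual model
CREATE OR REPLACE SEMANTIC VIEW AI_ENGINE_DB.AI_METADATA_SCHEMA.SALES_SEMANTIC_VIEW
  TABLES (
    orders    AS SALES_DB.SALES_SCHEMA.ORDERS    PRIMARY KEY (ORDER_ID),
    customers AS SALES_DB.SALES_SCHEMA.CUSTOMERS PRIMARY KEY (CUSTOMER_ID)
  )
  RELATIONSHIPS (
    orders_to_customers AS orders (CUSTOMER_ID) REFERENCES customers
  )
  FACTS (
    orders.order_amount AS ORDER_AMOUNT
  )
  DIMENSIONS (
    customers.region AS REGION,
    orders.order_date AS ORDER_DATE
  )
  METRICS (
    orders.total_sales AS SUM(orders.order_amount)
  )
  COMMENT = 'SALES semantic view built from the AI-documented objects';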

We are going to ask it to create the semantic view!

The CDA directly detects which tool it has to use in order to create the Semantic View. And finally, it creates it!

Now that it has been created, the CDA suggests the different next steps we can take:

So, in that case, we are going to give it a try using Cortex Analyst:

Verifying the Semantic View and asking some questions:

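Besides Cortex Analyst, you can also query a semantic view directly in SQL with the SEMANTIC_VIEW syntax (the names here follow the illustrative skeleton above, not the actual generated view):

-- Query the semantic view directly: total sales by customer region
SELECT * FROM SEMANTIC_VIEW(
    AI_ENGINE_DB.AI_METADATA_SCHEMA.SALES_SEMANTIC_VIEW
    METRICS orders.total_sales
    DIMENSIONS customers.region
);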
Semantic Model
In case we want to build a Semantic Model, we can proceed from the beginning, the same way we started to build the Semantic View; but we can also ask it to build the model from this Semantic View, since our use case is to provide access not only to BI applications (Semantic View) but also to a specific CDA for one department through a Semantic Model.
We are going to proceed with the Data Governance CDA:

Here are the different steps that the CDA will take; but first it needs to know whether you want to check and visualize the model before deploying it, or deploy it directly with the instructions provided.
In this case, we are interested in checking it first:

In the previous image, the CDA has detected a method to convert directly from a Semantic View to a Semantic Model using the CKE Cortex Search Service for Snowflake Documentation. After that, it describes again all the actions it will take as part of the checking and validation process.
After the process finishes, it provides the Semantic Model summary report:

But we will pay attention to the FINAL recommendations and next steps given by the CDA:

Since a semantic model can provide additional information that could be useful for the Data Agent using Cortex Analyst, we are going to ask the CDA to enhance the existing Semantic Model:

Finally, we get our Semantic Model validated and enhanced with new information. The CDA shows the next steps that can be taken:

In that case, we are going to proceed with the deployment of this semantic model, so that other tools such as Cortex Analyst can use it.

The CDA knows it has to use the Generate Semantic Model File Tool in order to deploy the YAML that contains the Semantic Model. It mentions the parameters that it needs the user to confirm.

For the confirmation parameters above (model file name and readiness to deploy it), we could change them, but we will proceed with the default values. Take into consideration that you can restrict those parameters: file name, stage path, etc.; but remember, the limits are your imagination and your use case needs :)

The deployment has been completed successfully, and upon confirmation you can use it!

At this step, you could also iterate again if you want, but for now we have finished.
So we are going to verify that it has been created, and check it!

And finally, you get your Semantic Model enhanced, tested and ready to use!

We have implemented a CDA (Cortex Data Agent) that allows us to benefit from multiple Data Governance tasks. Maybe you thought you would need to implement this with other third-party tools or Streamlit, but not in this case. This CDA uses tools with different purposes, which are in charge of exploring your available data objects, using AI to document them, iterating over them as per your Data Governance process, applying your changes if required, and finally providing semantic access through the AI-documented information by building the appropriate Semantic Layer objects for your BI and AI applications, such as metrics, business concepts, custom instructions (ontology), etc.
The UI used has been Snowflake Intelligence; once we have selected our CDA in the chat, we are able to interact in any direction. But take into consideration that you could also use the Cortex Agents REST API | Snowflake Documentation in order to implement and use it programmatically.
The role of CKE (Cortex Knowledge Extensions) is important, providing the capacity to generate/validate/correct semantic layers, together with the ability of the agent to incorporate new functionalities and add any new features requested through user instructions, without the users needing to know how semantic layers are built.
Next, take into consideration that the effectiveness of your CDA will depend on different setup aspects: description, instructions, orchestration details and model, tool definitions and descriptions.
Finally, you must ensure that your application can't be used in a malicious way, so establish guardrails in your orchestration instructions and configuration.
In the end, experience with these types of AI methodologies, knowledge and testing are the base concepts you will need to succeed in your AI application implementation.
As a Subject Matter Expert on different Data Technologies, with 20+ years of experience in data adventures, I lead and advise teams to find the best approaches and apply best practices.
I am a Snowflake DataHero 2025, Snowflake Certified in Advanced Architecture + GenAI Specialty, a Snowflake Spotlight Squad Team Member 2024 and the Snowflake Barcelona User Group Chapter Lead. I am working as a Snowflake Principal Architect at SDG Group.
As a Data Vault Certified Practitioner, I have been leading Data Vault Architectures using Metadata Driven methodologies.
If you want to know more in detail about the aspects seen here, or other ones, you can follow me here on Medium || LinkedIn here.