Protecting sensitive business data in preparation for the organization's Gen AI

Published by: Conor Sherman
Published: January 21, 2026

As organizations prepare to deploy generative AI tools like Microsoft Copilot and Google Gemini, those responsible for safeguarding sensitive data face the near-impossible task of inventorying risks and presenting actionable recommendations. A key reason this task is difficult is that the most important business data is often unstructured, while most of the security industry is positioned to protect structured data and infrastructure. As an industry, we have prioritized protecting the technology that houses the data, “the bucket,” rather than protecting the data itself, “the water.”

The value of unstructured data lies in its content and context, not in its form. This makes discovery, inventory, and protection of that data difficult. Since enterprise GenAI, such as Microsoft Copilot and Google Gemini, is primarily focused on providing answers rather than protecting data, those responsible for data protection are at odds with generative AI technology.

I will share my perspective on why the industry’s historical measures are not sufficient to withstand the impacts of GenAI, and the problems we must overcome to align defenders and GenAI so the organization is comfortable with a responsible deployment of AI (“Responsible AI”).

To align the defenders of sensitive data with the GenAI technologies available to enterprises today, namely Microsoft Copilot and Google Gemini, we must first acknowledge the key attributes that make this technology so different and powerful.

The rollout can bypass traditional gatekeepers. Since the software is made by the same vendors who make the productivity software, it can easily be enabled globally for all employees.

There is no limitation on data consumption. Since the vendor already has the data, the GenAI features can sit on top of the datastore and consume as much data as is made available.

The priority is the answer. The success of a GenAI tool depends on the accuracy of its answers to given questions. These tools do not consider whether the requester “should” get the answer; they consider only whether the requester “can” see it.
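The gap between “can” and “should” can be illustrated with a minimal Python sketch. Everything here is hypothetical: the `Document` class, the `need_to_know` field, and the `answer_sources` helper are invented for illustration and do not reflect how Microsoft Copilot or Google Gemini are implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    name: str
    readers: set[str]  # who the datastore permits to read the file ("can")
    need_to_know: set[str] = field(default_factory=set)  # who has a business reason ("should")


def can_see(user: str, doc: Document) -> bool:
    """Permission check only: is the user on the access list?"""
    return user in doc.readers


def should_see(user: str, doc: Document) -> bool:
    """Stricter check: the user has access and a recorded need to know."""
    return can_see(user, doc) and user in doc.need_to_know


def answer_sources(user: str, docs: list[Document], check) -> list[str]:
    """Names of the documents an answer for this user could draw on."""
    return [d.name for d in docs if check(user, d)]


# An over-shared file: both users can open it, but only alice needs it.
forecast = Document(
    name="fy-forecast.xlsx",
    readers={"alice", "bob"},
    need_to_know={"alice"},
)

print(answer_sources("bob", [forecast], can_see))
print(answer_sources("bob", [forecast], should_see))
```

Under the “can” check, the over-shared forecast is available to inform bob's answers; under the “should” check, it is not. The sketch assumes that a need-to-know list exists at all, which is exactly the kind of inventory that is hard to build for unstructured data.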

Because answers are derived from data, GenAI tools are ultimately an “answer engine.” Organizations need to answer the following questions before they can responsibly roll out GenAI, and here lies the challenge: our historical approaches are not well-suited to unstructured data.

Sensitive business data

  • Do we know what it is? Is there a systematic way of identifying sensitive business data?
  • Do we know where it is? Once we have the means to identify the data, can we review all possible locations of that data?
  • Do we know it can be trusted? Once we have an inventory of all the data, can we ensure that it is accurate and has not been tampered with?
  • Do we know who has access to it? Once we know where the trustworthy data resides, do we know who has access to that data and, more importantly, to the answers that can arise from it?
  • Do we know if it is being handled properly? Once we know who has access to it, is it being handled with due care?

Challenges of unstructured data

A key reason that answering the questions above is difficult is that there is no “bright line” test to determine whether unstructured data is sensitive. You can’t regex your way out of the problem. Compounding the issue is that sensitivity is an emergent property, not an inherent attribute. When components are separate, they are not sensitive, but when they coalesce in the form of an answer, insight, or single data artifact, that artifact becomes sensitive.
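A short Python sketch may make both points concrete. The Social Security number pattern, the sample sentences, and the `SENSITIVE_COMBINATION` rule are assumptions chosen for illustration; in particular, the hard-coded combination is a toy stand-in, since real sensitivity depends on content and context rather than on a fixed list.

```python
import re

# A pattern for a U.S. Social Security number, a classic structured identifier.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def regex_flags(text: str) -> bool:
    """Return True if the text contains something shaped like an SSN."""
    return bool(SSN_PATTERN.search(text))


structured = "Employee SSN: 123-45-6789"
unstructured = "We plan to acquire our largest supplier before the Q3 earnings call."

print(regex_flags(structured))    # the pattern matches structured data
print(regex_flags(unstructured))  # the sensitive sentence has no pattern to match

# Emergent sensitivity (hypothetical rule): each fact is harmless alone,
# but an artifact that contains all three is treated as sensitive.
SENSITIVE_COMBINATION = {"customer name", "contract value", "renewal date"}


def is_sensitive_combination(facts: set[str]) -> bool:
    """Return True only when every fact in the combination is present."""
    return SENSITIVE_COMBINATION <= facts


print(is_sensitive_combination({"customer name"}))
print(is_sensitive_combination({"customer name", "contract value", "renewal date"}))
```

The regex catches the structured identifier and misses the acquisition sentence entirely, and no single fact trips the combination check until all of them appear in one artifact.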

CUI as an example

Controlled Unclassified Information (CUI) is an example of unstructured, sensitive business data. CUI is information that the U.S. government creates or possesses that requires safeguarding or dissemination controls. It is not classified as national security or atomic energy information, but it must be protected in accordance with applicable laws, regulations, and government-wide policies. CUI can also include information that an entity creates or possesses for or on behalf of the government.

In summary, CUI is any data that the U.S. government has designated as CUI; there is no regex that the data either passes or fails.


CUI and general business data are both unstructured, and we have mirrored the approach to safeguarding both. For example:

 

Problem | Requirement | Solution
--- | --- | ---
Do we know what it is? | Must be inventoried | Data markings (“CUI//Privacy”, “Proprietary / Confidential”)
Do we know where it is? | Must reside on an authorized system | Geo-tracking (U.S. residency, data residency)
Do we know it can be trusted? | Must have integrity | Logging (NIST 800-171 AU), access logs
Do we know who has access to it? | Must be restricted | Access control (NIST 800-171 AC), “need to know”
Do we know if it is being handled properly? | Data handling | Training, 33-page document, acceptable use

 

A perspective on what we need to solve for:

  • Inventory: We need a way to identify sensitive data that does not rely on humans applying data markings.
  • Defense: We need a way to protect the data, not just the data store—water versus bucket.
  • Integrity: We need to know if the data can be trusted.
  • Integrity: We need the ability to correlate sensitive queries with the data that informed the answer.
  • Access: We need to know who has and has had direct and indirect access to the data.
  • Access: We need to know how the data and the answers derived from that data are being used.
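
As one illustration of the integrity and access items above, the Python sketch below records which source documents informed each answer, so that a query can be traced back to its data and indirect access can be listed. The `ProvenanceLog` class, its methods, and the document identifiers are hypothetical; the sketch assumes the GenAI tool can report which sources it drew on, which the tools discussed here may or may not expose.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AnswerRecord:
    user: str
    query: str
    source_ids: tuple[str, ...]  # documents that informed the answer
    asked_at: datetime


class ProvenanceLog:
    """In-memory log linking each query to the data behind its answer."""

    def __init__(self) -> None:
        self._records: list[AnswerRecord] = []

    def record(self, user: str, query: str, source_ids: list[str]) -> AnswerRecord:
        rec = AnswerRecord(user, query, tuple(source_ids), datetime.now(timezone.utc))
        self._records.append(rec)
        return rec

    def queries_informed_by(self, source_id: str) -> list[AnswerRecord]:
        """Every recorded answer that drew on the given document."""
        return [r for r in self._records if source_id in r.source_ids]

    def indirect_readers(self, source_id: str) -> set[str]:
        """Users who received an answer derived from the document."""
        return {r.user for r in self.queries_informed_by(source_id)}


log = ProvenanceLog()
log.record("bob", "What is our acquisition budget?", ["doc-17", "doc-42"])
log.record("alice", "Summarize the onboarding guide", ["doc-03"])

print(log.indirect_readers("doc-42"))
```

Here bob never opened doc-42 directly, yet the log shows that an answer he received was derived from it, which is the indirect access the list above asks us to account for.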
