11% of AI Inputs Are Confidential — What Studies Really Show About the AI Data Leak Problem
AI tools like ChatGPT, Claude, and Gemini measurably boost productivity — the research is clear on that. But those same studies reveal an alarming flipside: employees are regularly sharing significant volumes of sensitive company data with external AI services, often without realising it. Here are the key numbers from the research.
11% of all AI inputs are confidential (Cyberhaven, 2023)
Data security company Cyberhaven analysed the behaviour of 1.6 million employees across thousands of companies and published one of the most comprehensive reports on the topic in February 2023.
The finding: 11% of the data employees paste into ChatGPT is classified as confidential. In a single week, companies logged 199 instances of confidential data entered into ChatGPT per 100,000 employees, including customer data, source code, and internal strategy documents.
The most exposed groups: employees in legal, finance, and HR — precisely the functions where data sensitivity is highest.
Source: Cyberhaven Research, "The Data Security Risks of ChatGPT" (February 2023)
The Samsung incident: three leaks in 20 days (2023)
In March 2023, it emerged that engineers at Samsung Semiconductor had accidentally shared proprietary source code and confidential meeting notes via ChatGPT in three separate incidents within less than three weeks. In one case, source code from a measurement programme was pasted in to debug errors. In another, internal meeting notes were submitted for summarisation.
Samsung responded by immediately banning ChatGPT and similar tools on company devices. The incident illustrates in concrete terms what the Cyberhaven data shows in aggregate: this is not a fringe problem — it is everyday behaviour.
Sources: Bloomberg, The Verge (March 2023); confirmed by Samsung press statement
AI boosts productivity by 37% — but creates new risks (MIT, 2023)
A widely cited MIT study, published in Science in 2023 (Noy & Zhang, "Experimental Evidence on the Productivity Effects of Generative AI"), examined the use of ChatGPT in professional writing tasks.
Result: access to ChatGPT reduced task completion time by an average of 37% and improved graded output quality by 18%. No wonder employees actively use AI tools — including for sensitive work.
The dilemma: the greater the productivity gain, the stronger the temptation to feed confidential documents into these tools. The study itself notes the need for clear usage policies.
Source: Noy, S. & Zhang, W. (2023). "Experimental Evidence on the Productivity Effects of Generative AI." Science, 381(6654), 187–192.
Average cost of a data breach: $4.88 million (IBM, 2024)
IBM's annual Cost of a Data Breach Report, produced with the Ponemon Institute, is one of the most reliable sources on breach costs. The 2024 report analysed 604 real breaches across 16 countries and regions and 17 industries.
Key figures:
- $4.88 million — average global cost of a data breach, a record high and a 10% increase year-on-year.
- $9.77 million — average cost in healthcare, the most expensive sector for the 14th consecutive year.
- 46% of breaches involved customer personal data, including names, email addresses, and other personally identifiable information.
- Companies using AI-powered security tools saved an average of $2.22 million in breach costs.
Source: IBM & Ponemon Institute, "Cost of a Data Breach Report 2024"
GDPR fines have exceeded €4 billion (DLA Piper, 2024)
Since GDPR came into force in May 2018, European supervisory authorities have imposed record-level fines. According to the GDPR fines tracker maintained by law firm DLA Piper, cumulative penalties have exceeded €4 billion.
Notable individual cases:
- Meta: €1.2 billion (2023) — for unlawful data transfers to the US. The largest GDPR fine ever at the time of issue.
- Amazon: €746 million (2021) — for privacy violations in ad targeting.
- WhatsApp: €225 million (2021) — for insufficient transparency around data processing.
For smaller businesses and freelancers, fines are typically lower — but even five- or six-figure penalties can be existential.
Source: DLA Piper, GDPR Fines Tracker (2024); European Data Protection Board (EDPB)
55% of companies use generative AI — but few have safety policies (McKinsey, 2023)
The McKinsey Global Survey on AI (2023) surveyed more than 1,600 executives worldwide. Key findings:
- 55% of companies are using generative AI in at least one business function.
- Only 21% have introduced formal policies for the safe use of generative AI.
- 38% cite data privacy and security as their biggest AI risk — more frequently than errors, bias, or regulation.
That means the vast majority of companies using generative AI are doing so without clear safeguards for sensitive data.
Source: McKinsey & Company, "The State of AI in 2023: Generative AI's Breakout Year" (August 2023)
What do these numbers mean in practice?
The research is clear: AI tools are productivity-enhancing and widely adopted — but without protective measures, they represent a significant data privacy risk. For professionals with access to especially sensitive data (clients, patients, employees), uncontrolled AI use simply is not an option.
The practical solution is local anonymisation: documents are cleaned before being entered into any AI tool. No cloud upload, no GDPR violations, no breach of client trust. MaskBase automates exactly this step — directly on your device.
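To make the idea of local anonymisation concrete, here is a minimal, illustrative sketch in Python using simple regular expressions. The patterns, labels, and `anonymise` function are assumptions for demonstration only — they are not MaskBase's implementation, and real-world redaction needs far more robust detection than a few regexes.

```python
import re

# Illustrative patterns only: real anonymisation tools detect many more
# categories (names, addresses, IDs) with much higher accuracy.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "IBAN": re.compile(r"\b[A-Z]{2}\d{2}(?:\s?[A-Z0-9]{4}){3,7}\b"),
}

def anonymise(text: str) -> str:
    """Replace each pattern match with a [LABEL] placeholder,
    so the cleaned text can be pasted into an AI tool."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

prompt = "Contact Jane at jane.doe@example.com or +44 20 7946 0958."
print(anonymise(prompt))
# → Contact Jane at [EMAIL] or [PHONE].
```

The key property is that the substitution runs entirely on the local machine: only the placeholder version ever leaves the device, which is the step that removes the GDPR exposure described above.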
Sources
- Cyberhaven Research: "The Data Security Risks of ChatGPT" (Feb. 2023)
- Noy, S. & Zhang, W.: "Experimental Evidence on the Productivity Effects of Generative AI." Science 381(6654), 2023
- IBM & Ponemon Institute: "Cost of a Data Breach Report 2024"
- DLA Piper: GDPR Fines Tracker (2024)
- McKinsey & Company: "The State of AI in 2023" (Aug. 2023)
- Bloomberg / The Verge: Samsung ChatGPT data leak (March 2023)
Try MaskBase for free
Automatic redaction of sensitive data — locally on your device, before it reaches any AI service.
Go to homepage →