The DeepSeek AI database leaks over a million chat records

February 3, 2025

The Chinese AI startup DeepSeek publicly exposed two databases that allegedly contained user and operational information.

Reports revealed that the platform’s unsecured ClickHouse instances had over a million log entries containing plaintext user conversation history, API keys, backend details, and operational metadata.

Researchers uncovered the exposure while running a security evaluation of the China-based AI company’s external infrastructure.

The security researchers found that two publicly accessible database instances, at oauth2callback.deepseek.com:9000 and dev.deepseek.com:9000, permitted arbitrary SQL queries over a web interface without authentication.
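For administrators who want to check their own ClickHouse deployments for the same kind of exposure, the following is a minimal sketch in Python. It assumes ClickHouse's HTTP interface, which by default listens on port 8123 and accepts SQL in a `query` URL parameter. The host name and the helper names (`probe_url`, `answers_without_auth`, `is_exposed`) are hypothetical and do not come from the researchers' report. The probe sends only a harmless `SELECT 1` and should be run solely against systems you are authorized to test.

```python
from urllib.error import URLError
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def probe_url(host: str, port: int = 8123) -> str:
    """Build a harmless 'SELECT 1' request for ClickHouse's HTTP interface."""
    # quote_via=quote encodes the space as %20 rather than '+'.
    params = urlencode({"query": "SELECT 1"}, quote_via=quote)
    return f"http://{host}:{port}/?{params}"


def answers_without_auth(status: int, body: str) -> bool:
    """True if the server ran the query although no credentials were sent."""
    return status == 200 and body.strip() == "1"


def is_exposed(host: str, port: int = 8123, timeout: float = 3.0) -> bool:
    """Probe one host; any error (refused, timeout, 401/403) counts as not exposed."""
    try:
        with urlopen(probe_url(host, port), timeout=timeout) as resp:
            body = resp.read().decode("utf-8", "replace")
            return answers_without_auth(resp.status, body)
    except (URLError, OSError):
        return False
```

If `is_exposed("db.example.internal")` returned `True` for a host reachable from the internet, that would suggest the instance is answering SQL without credentials and needs authentication and network restrictions.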

The databases contained a ‘log_stream’ table holding confidential internal logs dating from January 6, 2025.

According to the investigation, the exposed DeepSeek databases included user prompts submitted to the chatbot, keys that backend systems use to authenticate API calls, internal infrastructure and service information, and other operational details.
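One common safeguard against secrets ending up in log tables in plaintext is to redact them before a line is written. The sketch below illustrates the idea in Python; it does not describe DeepSeek's logging pipeline. The secret shapes it looks for (an `sk-` prefixed token and `api_key=`, `password=`, or `token=` pairs) and the `redact` helper are assumptions made for this example, and real key formats vary by provider.

```python
import re

# Hypothetical secret shapes, chosen for this example only.
_TOKEN = re.compile(r"\bsk-[A-Za-z0-9]{16,}\b")
_KEY_VALUE = re.compile(
    r"(?i)\b(api[_-]?key|password|token)(\s*[=:]\s*)([^\s,;\"']+)"
)


def redact(line: str) -> str:
    """Replace anything that looks like a secret before the line is stored."""
    # Standalone tokens such as sk-... are masked wherever they appear.
    line = _TOKEN.sub("[REDACTED]", line)
    # For key=value pairs, keep the field name and mask only the value.
    return _KEY_VALUE.sub(
        lambda m: f"{m.group(1)}{m.group(2)}[REDACTED]", line
    )
```

For example, `redact("auth ok api_key=abc123 user=7")` yields `auth ok api_key=[REDACTED] user=7`, so the log still shows that a key was presented without recording its value.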

The researchers emphasise that this level of access posed a significant danger to the security of both DeepSeek and its end users.

Depending on the ClickHouse configuration, an attacker could have retrieved sensitive logs and plaintext chat messages, and potentially exfiltrated plaintext passwords, local files, and proprietary information directly from the server using SQL queries.

It is unclear whether the researchers were the first to find the exposure or whether malicious actors had already exploited the misconfiguration. The investigators notified DeepSeek, and the company swiftly fixed the issue, removing the databases from public access.

Beyond the concerns raised by DeepSeek’s status as a China-based technology company, which requires it to comply with aggressive data access requests from the country’s government, the company does not appear to have established a strong security posture, leaving sensitive information at risk.

Exposed user prompts are a privacy breach of great concern to enterprises that use the AI model for sensitive business activities. Furthermore, exposed backend data and API keys could let attackers access DeepSeek’s internal networks, enabling privilege escalation and potentially larger-scale intrusions.

Therefore, this discovery could have significant implications for users if threat actors have noticed and exploited the exposed databases.
