By Sonia Valeja and Michal Nosek, Percona
In our increasingly digitized world, data reigns supreme. Alongside traditionally valuable information like customer records and bank details, data on interactions and activity has become more valuable to companies. As data has become critical, it is also more at risk from theft or attacks like ransomware. According to IBM, the average cost of a data breach worldwide is now more than US $4.4M.
While it might be simple to advocate for more security around data, actually delivering this in practice is hard. In this article, we’ll look at some techniques that can be used with open-source databases to support systems providing better data security.
Attacks on the systems that store data have existed for years. One of the earliest is SQL injection, which is used to read or modify data without the expected authorization. It has been a known concern since 1998, yet threat actors still commonly use it to steal data or run their own code. Injection is not limited to relational databases either: it can also affect databases that use other query languages, such as MongoDB's.
Since this problem is well known, there are standard approaches to avoiding it. To start with, applications should validate and escape all external input, and use prepared statements with parameterized queries or database stored procedures, so that user input is always treated as data rather than as part of the query text.
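To make the difference concrete, here is a minimal sketch that uses Python's built-in sqlite3 module and an in-memory database. The users table, the function names, and the payload are made up for illustration; the same pattern applies to other drivers, although their placeholder syntax may differ.

```python
import sqlite3

# Hypothetical attacker-controlled input used for the demonstration.
INJECTION_PAYLOAD = "nobody' OR '1'='1"


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with an illustrative users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    return conn


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # Vulnerable: the input is spliced into the SQL text, so quotes in
    # the input can change the structure of the query.
    query = f"SELECT id, name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    # Parameterized: the driver passes the input as a bound value, so it
    # is only ever compared against the name column.
    query = "SELECT id, name FROM users WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


if __name__ == "__main__":
    db = make_demo_db()
    print("unsafe:", find_user_unsafe(db, INJECTION_PAYLOAD))
    print("safe:  ", find_user_safe(db, INJECTION_PAYLOAD))
```

With the payload above, the string-built query returns every row in the table, while the parameterized version returns nothing because no user has that literal name.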
Alongside protecting against SQL injection, ensuring that reliable database encryption functions are in place will keep the most sensitive information secure. If someone gains access to the entire database, your data stays safe as long as you keep the encryption key secure. Such an approach is also helpful in staying in compliance with standards like PCI-DSS.
As an example, the PostgreSQL database ecosystem has a powerful extension providing cryptographic functions: pgcrypto. Similarly, if you use MySQL, have a look at its built-in AES_ENCRYPT and AES_DECRYPT functions. Encryption brings more security at the cost of performance overhead and more complex data retrieval.
In the ideal scenario, development teams want to have data as close to production as possible to test performance and implement new functionality efficiently. Production databases, however, must not be copied and given to developers without any modifications. Development environments are not as secure as production ones, and by default, developers should have restricted access to the production data.
Data masking is a technique that helps to resolve this challenge. It redacts or replaces sensitive values so that the resulting data can be used safely in test environments. Open-source tools like PostgreSQL Anonymizer provide different data masking options, so you can create a copy of your production database without sharing sensitive data.
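As a rough illustration of the idea, the sketch below masks a single record in application code using only the Python standard library. It is not the PostgreSQL Anonymizer API; the field names, the masking key, and the helper functions are assumptions made up for this example. Keyed hashing is used for the email so that the same input always maps to the same pseudonym, which keeps joins between masked tables consistent.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would be kept outside the test
# environment so pseudonyms cannot be recomputed there.
MASKING_KEY = b"replace-with-a-real-secret"


def pseudonymize(value: str, key: bytes = MASKING_KEY) -> str:
    """Return a stable, keyed pseudonym for a sensitive value."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:12]


def mask_email(email: str) -> str:
    # Replace the whole address with a pseudonym on a reserved test domain.
    return f"user_{pseudonymize(email.lower())}@example.com"


def mask_card(number: str) -> str:
    # Keep only the last four digits, a common partial-masking convention.
    digits = [c for c in number if c.isdigit()]
    if len(digits) <= 4:
        return "*" * len(digits)
    return "*" * (len(digits) - 4) + "".join(digits[-4:])


def mask_row(row: dict) -> dict:
    """Return a copy of the row with the sensitive fields masked."""
    masked = dict(row)
    masked["email"] = mask_email(row["email"])
    masked["card_number"] = mask_card(row["card_number"])
    return masked


if __name__ == "__main__":
    production_row = {
        "id": 7,
        "email": "jane.doe@corp.example",
        "card_number": "4111 1111 1111 1111",
    }
    print(mask_row(production_row))
```

Non-sensitive columns such as the id pass through unchanged, so the masked copy keeps the shape and volume of production data.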
Each of these approaches allows developers to harden their systems and use data more securely within their applications. The struggle here is enforcing these practices. Areas like data masking and encryption add to the development process time, while protecting against SQL injection involves deliberate decisions during the code development process.
Database Configuration and Usage
Alongside improving your application code around accessing data, you must secure the data once it’s stored in the production database system to avoid unauthorized access. The least-privilege principle, storage encryption, encryption of data in transit, and putting your database in a DMZ are among the most popular ways to make your database more secure in production.
“Data at rest encryption” is often understood as encryption of the disk used by a database to store its files. For example, AWS EBS volumes can be encrypted independently. In fact, encryption can be done at different levels, including the storage device, block, and filesystem levels. The database engine can be responsible for the encryption of its data files as well.
Database-level data-at-rest encryption does not necessarily replace other methods but complements them. That way, no other application or user running on the host can read the data, only the database itself. Storing the encryption key in a secure location (such as an external KMS system) adds one more layer of security, helping to safeguard the database from unauthorized access. This functionality is typically available in proprietary solutions, but it can be found in some open-source distributions like Percona Server for MongoDB as well.
Another element that should be considered is how your database system stores diagnostic data about its own activities. For instance, your database may have performance issues, and this may have a significant impact on the application and business it serves. The ability to analyze potential issues quickly and efficiently makes a huge difference to your team's ability to solve problems. This information will come from logs. Allowing a database to write a lot of information has a side effect, however: some sensitive information may end up in those logs.
Log redaction is the process of removing sensitive information from logs, including any login credentials, contact details, credit card numbers, or other forms of Personally Identifiable Information (PII). This approach helps protect confidential data while ensuring that essential information remains available for debugging, auditing, and analysis purposes. Some databases support this by providing extra features to remove sensitive data from the logs, while others like MySQL support different verbosity levels for logging.
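The same principle can be applied to application logs that sit in front of the database. The following is a minimal sketch built on Python's standard logging module; the regular expressions and replacement tokens are illustrative assumptions rather than a complete PII detector, and a real deployment would need patterns tuned to its own data.

```python
import logging
import re

# Illustrative patterns only; they will not catch every form of PII.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PASSWORD_RE = re.compile(r"(password\s*=\s*)\S+", re.IGNORECASE)


def redact(text: str) -> str:
    """Replace card numbers, email addresses, and password values."""
    text = CARD_RE.sub("[REDACTED_CARD]", text)
    text = EMAIL_RE.sub("[REDACTED_EMAIL]", text)
    text = PASSWORD_RE.sub(r"\1[REDACTED]", text)
    return text


class RedactingFilter(logging.Filter):
    """Logging filter that redacts the fully formatted message."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format first so values passed as arguments are redacted too,
        # then clear args so the message is not formatted a second time.
        record.msg = redact(record.getMessage())
        record.args = None
        return True  # keep the record, only its content changes


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
    logger = logging.getLogger("app.db")
    logger.addFilter(RedactingFilter())
    logger.info("payment failed for %s card 4111 1111 1111 1111", "bob@example.org")
```

Because the filter rewrites the record rather than dropping it, the surrounding context stays available for debugging while the sensitive values are removed before any handler writes them out.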
Logs are also often targeted during attacks as they may contain additional information, and they provide evidence for investigation during incident response. Editing or altering logs is therefore a common threat actor behavior. Controlling access rights and activities around logs is consequently important.
Hardening Best Practices
Like many software and infrastructure components, the default configurations for many popular databases are not focused on security, but rather on making the initial installation of the database quick and easy.
Attackers often target default configurations when breaking into a system, because many organizations never change them. To protect against these kinds of attempts, you can update these settings to make the system more secure. One common change is to alter the default database port numbers to prevent easy access, while avoiding the default sample tables for storing actual data also makes the system less vulnerable. Alongside this, it is essential to deploy role-based access control (RBAC) to stop any abuse of the system by unauthorized users. While this should be an obvious step that makes security-by-design implementations easier, many default installations allow you to access the database without any credentials in place.
Some databases, such as MySQL with its mysql_secure_installation tool, provide a dedicated utility to harden your installation. Others typically provide a handy checklist of things to deal with before going to production. Whether you use utilities or checklists, the most important element is to implement a security-by-design methodology overall. This should ensure that the installation, and the application on top of it, is more secure by default and that typical attack approaches are less likely to succeed.
Understanding security challenges in the context of your specific application and organizational needs is key to making informed decisions about your data security. It will influence areas like database selection, configuration, and management. Tailoring solutions to address these challenges ensures that database deployments support your business goals effectively and reliably.
Addressing data and database challenges requires a multifaceted approach. As part of your strategy, choose trusted open-source databases with thriving communities, apply patches regularly, and implement secure configurations. In your deployments, anonymize sensitive data with data masking, employ continuous monitoring, and establish automated backup and recovery processes. Around this, you should also enable auditing for compliance, tap into community support, and conduct security testing to ensure your applications are secure over time. These measures collectively fortify data and database security, ensuring robust protection and regulatory compliance and delivering a secure by design approach.
About the Authors
Sonia Valeja is a PostgreSQL Database Administrator at Percona. She has over ten years of experience in database management and its operations. Sonia has been an active participant at pgConf and has given a lightning talk at the event. She is also the author of “PostgreSQL for Jobseekers” alongside David Gonzalez Milan.
Michal Nosek is a Senior Enterprise Architect at Percona, where his objective is to bridge the gap between database technologies and business outcomes by providing customers with appropriate strategies and open-source database solutions. He lives in Gdansk, Poland. In his spare time, Michal enjoys traveling and windsurfing.