An Open Source Approach to Threat Mitigation in AWS

By Nigel Douglas and Igor Eulalio

The security of cloud environments is a top priority for organisations worldwide. According to research by Omdia, supporting cloud and digital transformation projects is one of the top three priorities for cyber security teams, alongside skills development and protecting against ransomware. From a security perspective, getting the right skills around cloud environments so they can be managed and maintained securely is critical. At the same time, having the budget to cover these needs is also a massive challenge.

Amazon Web Services (AWS), one of the most popular cloud service providers, is often a key consideration in discussions about securing cloud infrastructure. Leveraging open source tools for security monitoring and response can significantly enhance threat mitigation strategies, as you can access the latest projects based on industry collaboration and improve your ability to spot issues across cloud, container and Linux deployments. In this piece, we’ll look at how Falco provides insight into cloud security in combination with other open source security tools.

Understanding Falco and Falco Talon

Falco is a recent graduate of the Cloud Native Computing Foundation (CNCF). The open source project was designed for the security monitoring of containerised applications, often referred to as cloud-native workloads. Integrating the open source Falco engine with the Falco Talon response engine provides a powerful and flexible way to respond to security threats while remaining fully open source. You can then combine these tools with your cloud service provider’s offerings. In the case of AWS, we’ll look at AWS Lambda as an on-demand serverless compute platform.

Falco allows users to define custom rules to detect suspicious activity within their AWS environments through CloudTrail logs. Falco can be deployed on any Linux machine, from robust servers to lightweight devices like a Raspberry Pi, ensuring versatility and broad accessibility.

Falco Talon extends Falco’s capabilities by automating responses to the threats it detects. Falco Talon particularly shines through its integration with AWS Lambda, which executes code in response to events and automatically manages the underlying compute resources.

Workflow Integration with AWS Lambda

The integration process is straightforward yet powerful:

Falco checks for specific events defined by custom rules analysing AWS CloudTrail logs.
Upon detection of a threat, Falco Talon triggers an AWS Lambda function.
This function can be specified with parameters such as the function name, the alias or version, and the invocation type (e.g., RequestResponse, Event, DryRun).
The Lambda function then executes, performing actions ranging from notifications to active threat mitigation steps. This part is completely up to you, the author!
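As a rough illustration of the second and third steps, the sketch below shows how the three parameters could map onto a Lambda invocation from Python. It is not Talon's own implementation. The helper names `build_invoke_request` and `invoke` are hypothetical, and the client is assumed to be a boto3 Lambda client (or anything exposing a compatible `invoke` method).

```python
import json

# The three invocation types accepted by the AWS Lambda Invoke API.
VALID_INVOCATION_TYPES = {"RequestResponse", "Event", "DryRun"}


def build_invoke_request(name, alias_or_version="$LATEST",
                         invocation_type="RequestResponse", event=None):
    """Build keyword arguments for a boto3-style ``invoke`` call (hypothetical helper)."""
    if invocation_type not in VALID_INVOCATION_TYPES:
        raise ValueError(f"unsupported invocation type: {invocation_type}")
    return {
        "FunctionName": name,
        "Qualifier": alias_or_version,       # alias or version to run
        "InvocationType": invocation_type,   # sync, async, or validation only
        "Payload": json.dumps(event or {}).encode("utf-8"),
    }


def invoke(lambda_client, **params):
    """Invoke the function through any client that exposes ``invoke``,
    for example ``boto3.client("lambda")``."""
    return lambda_client.invoke(**build_invoke_request(**params))
```

With `RequestResponse` the caller waits for the function's result, `Event` queues the invocation asynchronously, and `DryRun` only validates parameters and permissions.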

Configuration and Security Requirements

For Falco Talon to interact effectively with AWS Lambda, certain AWS permissions are necessary:

sts:GetCallerIdentity
lambda:InvokeFunction
lambda:GetFunction
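One way to grant exactly these permissions is an IAM policy scoped to the response function. The sketch below builds such a policy document in Python. The function name `talon_lambda_policy`, the statement IDs and the placeholder ARN are illustrative assumptions, not something prescribed by Falco Talon.

```python
import json


def talon_lambda_policy(function_arn):
    """Return an IAM policy document covering the three permissions (illustrative)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "IdentifyCaller",
                "Effect": "Allow",
                "Action": "sts:GetCallerIdentity",
                "Resource": "*",
            },
            {
                "Sid": "InvokeResponseFunction",
                "Effect": "Allow",
                "Action": ["lambda:InvokeFunction", "lambda:GetFunction"],
                # Scope the Lambda permissions to the single response function.
                "Resource": function_arn,
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder ARN; substitute your own region, account and function name.
    arn = "arn:aws:lambda:<region>:<account_number>:function:sample-function"
    print(json.dumps(talon_lambda_policy(arn), indent=2))
```

Limiting the Lambda actions to one function ARN keeps the credentials used by Talon from invoking anything else in the account.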

These permissions ensure that Falco Talon can securely invoke the Lambda function based on the detection rules configured in Falco. You will need to update the Falco Talon configuration file with the relevant credentials.

aws:
  role_arn: arn:aws:iam::<account_number>:role/<role_name>
  external_id: <external_id>
  region: <region>
  access_key: <access_key>
  secret_key: <secret_key>

The Lambda action looks up the function by its ARN, which is composed of your account ID and region. Both are fetched from your credentials provider. The account is always set to the provider account ID, but the region can be overridden via the config file.
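For reference, a Lambda function ARN has the form `arn:aws:lambda:<region>:<account_id>:function:<name>`. The sketch below shows how such an ARN could be assembled, with an optional region override taking precedence over the provider's region. The function `lambda_function_arn` and its parameter names are hypothetical and are not taken from Talon's source.

```python
def lambda_function_arn(account_id, provider_region, name,
                        region_override=None, partition="aws"):
    """Compose a Lambda function ARN (hypothetical helper).

    The account ID always comes from the credentials provider; the region
    does too, unless an override (for example from a config file) is set.
    """
    region = region_override or provider_region
    return f"arn:{partition}:lambda:{region}:{account_id}:function:{name}"


if __name__ == "__main__":
    print(lambda_function_arn("123456789012", "eu-west-1", "sample-function"))
    print(lambda_function_arn("123456789012", "eu-west-1", "sample-function",
                              region_override="us-east-1"))
```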

Practical Example

Consider a scenario where Falco detects an unauthorised access attempt via CloudTrail logs. Falco Talon can automatically invoke a pre-configured Lambda function that may, for instance, disable a compromised user account or isolate suspicious resources. The Talon action that invokes the function is defined as follows:

action: Invoke Lambda function
actionner: aws:lambda
parameters:
  aws_lambda_name: sample-function
  aws_lambda_alias_or_version: $LATEST
  aws_lambda_invocation_type: RequestResponse

This automated response is crucial for mitigating potential damage quickly and efficiently, without human delay. As stated previously, the outcome depends on what the Lambda function is configured to do. Essentially, Talon’s responsibility is to call the function: it invokes the Lambda response action immediately, without waiting for the result. To test the action, you can post a sample Falco event to a locally running Talon instance:

curl --location 'http://localhost:2803/' \
--header 'Content-Type: application/json' \
--data '{
    "output": "Test invoke lambda",
    "priority": "Warning",
    "rule": "Test invoke lambda",
    "time": "2019-05-17T15:31:56.746609046Z",
    "output_fields": {
        "test": "true",
        "shoudFail": "false"
    }
}'

The rule mentioned above is just an example; the action could be triggered by any Falco rule. Consider a user who wants to respond to identity-specific threats, such as unknown logins to AWS from a particular geolocation, detected from Okta audit logs monitored by Falco. In such a scenario, an AWS Lambda action could be invoked to perform various tasks, such as running an automated Lambda script to limit IAM access for the affected user account.
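To make the IAM scenario concrete, here is a minimal sketch of what such a Lambda function could look like in Python. It rests on several assumptions: that the Falco event arrives as the Lambda payload, that the affected user is carried in a hypothetical `iam_user` output field, and that attaching an inline deny-all policy is an acceptable way to limit access. The names `quarantine_user` and `falco-talon-quarantine` are invented for illustration.

```python
import json

# Inline policy that denies every action; an explicit Deny overrides any Allow.
DENY_ALL_POLICY = json.dumps({
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Deny", "Action": "*", "Resource": "*"}],
})


def quarantine_user(iam_client, event):
    """Attach a deny-all inline policy to the user named in the event."""
    fields = event.get("output_fields", {})
    user = fields.get("iam_user")  # hypothetical field name, adjust to your rule
    if not user:
        return {"status": "skipped", "reason": "no user in event"}
    iam_client.put_user_policy(
        UserName=user,
        PolicyName="falco-talon-quarantine",  # hypothetical policy name
        PolicyDocument=DENY_ALL_POLICY,
    )
    return {"status": "quarantined", "user": user}


def handler(event, context):
    """Lambda entry point; boto3 ships with the AWS Lambda Python runtime."""
    import boto3
    return quarantine_user(boto3.client("iam"), event)
```

Keeping the logic in `quarantine_user` and passing the client in makes the function easy to exercise locally with a stub before it is deployed.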

Until now, Falco Talon response actions have been limited to Kubernetes-specific actions such as enforcing network policies in response to suspicious network indicators of compromise (IoCs). With the introduction of AWS Lambda response actions, Falco Talon can now deliver response capabilities at the cloud tenancy level.

Why This Approach Matters

The integration of Falco and AWS Lambda via Falco Talon offers a compelling example of how open source tools can provide sophisticated, scalable solutions to cloud security problems.

This approach allows for:

Customisation: Users can define the specific security rules and responses that matter to them.
Flexibility: It works across various Linux VMs, and can even be run on smaller hardware, down to a Raspberry Pi.
Speed: Automated responses in the cloud ensure rapid mitigation in line with the 5/5/5 benchmark.

The 5/5/5 benchmark is a measure of how quickly a security team can detect, understand and respond to potential cloud security issues. In essence, teams should aim for five seconds to detect a potential issue in the cloud environment, five minutes to understand it, and five minutes to remediate or remove it. This gives security teams a benchmark against which to measure their performance in the cloud. It is particularly critical because cloud attacks are carried out in real time, and the average window from gaining an initial foothold to delivering the attack payload is around ten minutes.
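As a small worked example, the sketch below compares measured times for an incident against the three 5/5/5 thresholds. The stage names and the function `meets_555` are illustrative choices, not part of any Falco tooling.

```python
from datetime import timedelta

# The three thresholds of the 5/5/5 benchmark described above.
BENCHMARK = {
    "detect": timedelta(seconds=5),
    "understand": timedelta(minutes=5),
    "remediate": timedelta(minutes=5),
}


def meets_555(detect, understand, remediate):
    """Return, per stage, whether the measured duration is within the benchmark."""
    measured = {"detect": detect, "understand": understand, "remediate": remediate}
    return {stage: measured[stage] <= limit for stage, limit in BENCHMARK.items()}


if __name__ == "__main__":
    # Detection and triage are on target here; remediation overruns by a minute.
    print(meets_555(timedelta(seconds=3), timedelta(minutes=4), timedelta(minutes=6)))
```

The three thresholds add up to ten minutes and five seconds, which is roughly the attack window mentioned above.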

A Real World Solution Using AWS Lambda

When an attacker gains access to your AWS infrastructure, their first objective is typically to escalate their privileges. They will attempt to access components with higher permissions, such as containers with elevated privileges, hosts with permissive IAM roles, or even hardcoded user credentials. This tactic is known as privilege escalation.

In a Kubernetes context, the attacker aims to move to a higher privilege level, which often means gaining control of the host running a specific container. The host has more extensive access rights, and from there, the attacker can potentially move laterally to other containers on the same host, which might also have elevated permissions.

To mitigate such attacks, one effective strategy is to recycle the entire worker node, thereby cutting off the attacker’s access if they have already escalated their privileges. However, this approach is complex. Terminating or removing a container can disrupt the workload it is running, and doing this for an entire host can affect multiple workloads simultaneously, which is generally undesirable.

The Solution

Using Falco Talon, we can configure a series of actions to respond to incidents with minimal impact. By properly removing workloads from the compromised worker node and rescheduling them to another node, we can safely recycle the node.

This process leverages existing out-of-the-box “actionners” such as:

kubernetes:cordon
kubernetes:drain
aws:lambda

Here’s a high-level view of how it happens. Let’s dig into it!

Talon can react to all Falco events, so we configure Falco and Talon to work together. For example, when a “Container Escalation Discovered” event is detected, Talon triggers a series of actions:

Cordon the Node: The kubernetes:cordon action prevents any new pods from being scheduled on the compromised node. This prepares the node for recycling and prevents the attacker from deploying additional workloads that could exploit the node further.
Drain the Node: Using the kubernetes:drain action, we remove all pods running on the node. This forces Kubernetes to reschedule these pods on other nodes, as the cordoned node can no longer accept new pods.
Terminate the Node: With the node now devoid of pods and unable to receive new ones, we can safely terminate it. This allows the AWS Auto Scaling Group (ASG) to provision a fresh instance, ensuring the node is secure and free from any compromised elements.
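The termination step is the part that would run inside the Lambda function. Below is a minimal sketch of such a function in Python. It assumes that the event passed to the function carries the Kubernetes node's provider ID in a hypothetical `node_provider_id` output field, and that the ID uses the usual AWS form `aws:///<availability-zone>/<instance-id>`. The helper names are invented for illustration.

```python
import re

# Kubernetes provider IDs on AWS look like: aws:///us-east-1a/i-0123456789abcdef0
_PROVIDER_ID = re.compile(r"^aws:///[^/]+/(i-[0-9a-f]+)$")


def instance_id_from_provider_id(provider_id):
    """Extract the EC2 instance ID from a Kubernetes node provider ID."""
    match = _PROVIDER_ID.match(provider_id)
    if match is None:
        raise ValueError(f"unexpected provider ID: {provider_id!r}")
    return match.group(1)


def terminate_node(ec2_client, event):
    """Terminate the EC2 instance backing the compromised node."""
    # Hypothetical field name; adapt it to what your rule actually emits.
    provider_id = event["output_fields"]["node_provider_id"]
    instance_id = instance_id_from_provider_id(provider_id)
    ec2_client.terminate_instances(InstanceIds=[instance_id])
    return instance_id


def handler(event, context):
    """Lambda entry point; boto3 ships with the AWS Lambda Python runtime."""
    import boto3
    return {"terminated": terminate_node(boto3.client("ec2"), event)}
```

Once the instance is terminated, the Auto Scaling Group notices the missing capacity and launches a replacement, which is what gives the cluster a clean node.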

By following these steps, we can effectively minimise the impact of an attack while ensuring our infrastructure remains secure and resilient. Because these steps are automated and can be carried out in real time, the potential security impact is reduced while the effect on services for users should be minimal.

Let’s Recap

The use of open source tools like Falco in conjunction with Kubernetes and AWS Lambda API responses represents a novel and effective approach to cloud security. This method not only leverages the power of community-driven security intelligence but also integrates seamlessly with cloud-native technologies, offering a robust defence mechanism against a wide array of security threats. By adopting such interconnected solutions, organisations can enhance their threat detection and response capabilities, ultimately securing their cloud environments more effectively.

About the Authors

Nigel Douglas plays a key role in driving education for the open source detection and response segment of cloud and container security at Sysdig. He spends his time drafting articles, blogs, and taking the stage to help bring awareness to how security needs to change in the cloud. Prior to his current role at Sysdig, he held similar positions at software security vendors such as Tigera, Malwarebytes, Solarwinds, and Google. He completed a Master of Science in Cybersecurity, Privacy, and Trust at South East Technological University in Ireland.

Igor Eulalio is a software engineer and fully certified Kubernetes engineer working at Sysdig. He is an open source contributor and certified by AWS across multiple projects. Prior to Sysdig, Igor was a senior solutions architect for containers at AWS, and a senior software engineer at Itau Unibanco. He is also a Kubernetes Container Days organizer for 2024.