Tuesday, May 21, 2019

Kubernetes Hardening

  • Kubernetes Hardening
I’m running on AWS using Kops to create and manage my Kubernetes cluster.

Private topology with Calico
Calico is an open-source project to manage and enforce network policy in the cluster, and it comes built in with the latest Google container (GKE) releases.

Network Policies
If you created your cluster with a private topology you can use Network Policies. Set up network policies to explicitly allow or deny connections between elements in the cluster.
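As an illustration, a minimal sketch of a default-deny policy plus one explicit allow rule (the namespace and labels are placeholders):

    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: default-deny-ingress
      namespace: my-app          # hypothetical namespace
    spec:
      podSelector: {}            # applies to every pod in the namespace
      policyTypes:
      - Ingress                  # no ingress rules listed = deny all inbound traffic
    ---
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: allow-frontend-to-backend
      namespace: my-app
    spec:
      podSelector:
        matchLabels:
          app: backend           # pods this policy protects
      ingress:
      - from:
        - podSelector:
            matchLabels:
              app: frontend      # only frontend pods may connect
        ports:
        - protocol: TCP
          port: 8080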

Bastion
Access the nodes over SSH through a single point of contact: a bastion. By default, all nodes have a public IP and are reachable over SSH from the outside world. With a bastion you can reduce the attack surface of your cluster.

Default Authorization with RBAC
Add the option --authorization=RBAC to the kops create cluster command when creating the cluster for the first time.
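Putting the pieces above together, a hedged sketch of what the kops create cluster invocation might look like with private topology, Calico, a bastion and RBAC (the cluster name, zones and state bucket are placeholders):

    kops create cluster \
      --name=k8s.example.com \
      --state=s3://example-kops-state \
      --zones=eu-west-1a,eu-west-1b \
      --topology=private \
      --networking=calico \
      --bastion \
      --authorization=RBAC \
      --yes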

Dangerous pod IAM credentials
By default, every pod has the powers of its hosting node in terms of AWS access (IAM). To fix this, install kube2iam, a DaemonSet that runs on each instance and acts as a firewall for the IAM credential requests coming from the containers on those instances.

We are going to install the kube2iam DaemonSet with Helm (the package manager for Kubernetes).
Assuming you created your cluster with RBAC enabled, we first need to give the Helm Tiller (the pod issuing the requests we make to Helm on our cluster) the appropriate RBAC credentials to operate; a sketch follows.
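A sketch of those two steps with Helm v2; the chart values shown (host.iptables, host.interface for Calico, the base role ARN) are assumptions taken from the stable/kube2iam chart and may differ in your setup:

    # Give Tiller a service account with cluster-admin (Helm v2)
    kubectl create serviceaccount tiller --namespace kube-system
    kubectl create clusterrolebinding tiller-cluster-rule \
      --clusterrole=cluster-admin \
      --serviceaccount=kube-system:tiller
    helm init --service-account tiller

    # Install the kube2iam DaemonSet (role ARN prefix is a placeholder)
    helm install stable/kube2iam --name kube2iam --namespace kube-system \
      --set host.iptables=true \
      --set host.interface=cali+ \
      --set extraArgs.base-role-arn=arn:aws:iam::123456789012:role/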

https://itnext.io/kubernetes-hardening-d24bdf7adc25

Launching a Kubernetes cluster hosted on AWS, GCE or DigitalOcean
https://github.com/kubernetes/kops

A bastion provides an external-facing point of entry into a network containing private instances. This host can serve as a single point of fortification or audit and can be started and stopped to enable or disable inbound SSH communication from the Internet; some call the bastion a "jump server".
https://github.com/kubernetes/kops/blob/master/docs/bastion.md

kube2iam
Provide IAM credentials to containers running inside a kubernetes cluster based on annotations.
https://github.com/jtblin/kube2iam
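For example, once kube2iam is running, a pod requests a specific IAM role through the iam.amazonaws.com/role annotation (the role name and image below are hypothetical):

    apiVersion: v1
    kind: Pod
    metadata:
      name: aws-client
      annotations:
        iam.amazonaws.com/role: my-app-s3-readonly   # hypothetical role
    spec:
      containers:
      - name: app
        image: example/aws-client:1.0                # hypothetical image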

  • Generating the Data Encryption Config and Key

Kubernetes stores a variety of data including cluster state, application configurations, and secrets. Kubernetes supports the ability to encrypt cluster data at rest.
https://github.com/kelseyhightower/kubernetes-the-hard-way/blob/master/docs/06-data-encryption-keys.md
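The lab generates a random key and renders an encryption config along these lines (the aescbc provider encrypts Secrets; identity leaves everything else in plaintext):

    ENCRYPTION_KEY=$(head -c 32 /dev/urandom | base64)

    cat > encryption-config.yaml <<EOF
    kind: EncryptionConfig
    apiVersion: v1
    resources:
      - resources:
          - secrets
        providers:
          - aescbc:
              keys:
                - name: key1
                  secret: ${ENCRYPTION_KEY}
          - identity: {}
    EOF

The file is then passed to the kube-apiserver via --encryption-provider-config (older releases used --experimental-encryption-provider-config).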


Provisioning a CA and Generating TLS Certificates
In this lab you will provision a PKI Infrastructure using CloudFlare's PKI toolkit, cfssl, then use it to bootstrap a Certificate Authority, and generate TLS certificates for the following components: etcd, kube-apiserver, kube-controller-manager, kube-scheduler, kubelet, and kube-proxy.
https://github.com/kelseyhightower/kubernetes-the-hard-way/blob/master/docs/04-certificate-authority.md
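The flow in that lab boils down to commands like these (the CSR JSON files and the "kubernetes" signing profile come from the lab's config; the hostname list is abbreviated):

    # Bootstrap the Certificate Authority
    cfssl gencert -initca ca-csr.json | cfssljson -bare ca

    # Issue a component certificate signed by that CA (API server shown)
    cfssl gencert \
      -ca=ca.pem -ca-key=ca-key.pem \
      -config=ca-config.json \
      -hostname=10.32.0.1,127.0.0.1,kubernetes.default \
      -profile=kubernetes \
      kubernetes-csr.json | cfssljson -bare kubernetes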

  • Hardening Kubernetes from Scratch


Level 0 Security
The following items are to be deployed to fulfill basic Kubernetes cluster functionality. The steps purposefully omit any security-related configuration/hardening.

Level 0 Attacks
At this most basic level, "Level 0", the current configuration offers very little (if any) protection from attacks that can take complete control of the cluster and its nodes.

    Enumerate exposed ports on the nodes and identify their corresponding services
    Probe etcd to compromise the data store
    Probe the controller to access the API and other control plane services
    Probe the worker to access the Kubelet and other worker services

Level 1 Hardening
Let's do the very basic steps to prevent the "Level 0" attacks from being so straightforward.
    Improve the security group configuration
    Enable TLS on the externally exposed Kubernetes API

Level 1 Attacks
Without any boundaries in place, deploying too many pods, or pods that consume too many CPU/RAM shares, can cause serious cluster availability/Denial of Service issues. When the cluster is "full", any new pods will not be scheduled.
    Launch too many pods
    Launch pods that consume too many CPU/RAM shares
    Launch pods that consume all available disk space and/or inodes.

Level 2 Hardening
To provide proper boundaries around workloads and their resources, separating workloads into namespaces with corresponding resource quotas can prevent the "Level 1" issues (see the quota sketch after the list below).

    Separate workloads using Namespaces
    Set specific Request/Limits on Pods
    Enforce Namespace Resource Quotas
    Discuss multi-etcd, multi-controller
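A minimal sketch of a per-namespace quota (the namespace name and numbers are arbitrary):

    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: team-a-quota
      namespace: team-a        # hypothetical tenant namespace
    spec:
      hard:
        pods: "20"             # cap the number of pods
        requests.cpu: "4"      # total CPU requests across the namespace
        requests.memory: 8Gi
        limits.cpu: "8"
        limits.memory: 16Gi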

Level 2 Attacks
Malicious Image, Compromised Container, Multi-tenant Misuse


    Service Account Tokens
    Dashboard Access
    Direct Etcd Access
    Tiller Access
    Kubelet Exploit
    Application Tampering
    Metrics Scraping
    Metadata API
    Outbound Scanning/pivoting

Level 3 Hardening
    RBAC
    Etcd TLS
    New Dashboard
    Separate Kubeconfigs per user
    Tiller TLS
    Kubelet Authn/z
    Network Policy/CNI
    Admission Controllers
    Logging?

Level 3 Attacks
    Malicious Image, Compromised Container, Multi-tenant Misuse
    Escape the container

Level 4 Hardening
    Advanced admission controllers
    Restrict images/sources
    Network Egress filtering
    Vuln scan images
    Pod Security Policy
    Encrypted etcd
    Sysdig Falco


https://github.com/hardening-kubernetes/from-scratch

  • Securing a Cluster


Kubernetes playgrounds:
Minikube
Katacoda
Play with Kubernetes

As Kubernetes is entirely API driven, controlling and limiting who can access the cluster and what actions they are allowed to perform is the first line of defense

Use Transport Level Security (TLS) for all API traffic
Kubernetes expects that all API communication in the cluster is encrypted by default with TLS, and the majority of installation methods will allow the necessary certificates to be created and distributed to the cluster components.

API Authentication
All API clients must be authenticated, even those that are part of the infrastructure like nodes, proxies, the scheduler, and volume plugins. These clients are typically service accounts or use x509 client certificates, and they are created automatically at cluster startup or are set up as part of the cluster installation.

API Authorization
Once authenticated, every API call is also expected to pass an authorization check. Kubernetes ships an integrated Role-Based Access Control (RBAC) component that matches an incoming user or group to a set of permissions bundled into roles. It is recommended that you use the Node and RBAC authorizers together, in combination with the NodeRestriction admission plugin.
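In practice that means starting the kube-apiserver with the Node and RBAC authorizers plus the NodeRestriction admission plugin, then granting users narrowly scoped roles; a minimal sketch (namespace, names and subject are placeholders):

    # kube-apiserver flags:
    #   --authorization-mode=Node,RBAC
    #   --enable-admission-plugins=NodeRestriction
    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      namespace: team-a
      name: pod-reader
    rules:
    - apiGroups: [""]
      resources: ["pods"]
      verbs: ["get", "list", "watch"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      namespace: team-a
      name: read-pods
    subjects:
    - kind: User
      name: jane               # hypothetical user
      apiGroup: rbac.authorization.k8s.io
    roleRef:
      kind: Role
      name: pod-reader
      apiGroup: rbac.authorization.k8s.io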

Controlling access to the Kubelet
Kubelets expose HTTPS endpoints which grant powerful control over the node and containers. By default, Kubelets allow unauthenticated access to this API.
Production clusters should enable Kubelet authentication and authorization.
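A sketch of the relevant settings (file paths are placeholders); the API server also needs client credentials to authenticate to the kubelet:

    # kubelet
    --anonymous-auth=false                          # reject unauthenticated requests
    --authorization-mode=Webhook                    # delegate authorization to the API server
    --client-ca-file=/etc/kubernetes/pki/ca.crt     # verify client certificates
    --read-only-port=0                              # disable the unauthenticated read-only port

    # kube-apiserver (so it can reach the kubelet API)
    --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt
    --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key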

Controlling the capabilities of a workload or user at runtime
Limiting resource usage on a cluster
Resource quota limits the number or capacity of resources granted to a namespace
Limit ranges restrict the maximum or minimum size of some of the resources
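For example, a LimitRange can give containers default requests/limits and cap the maximum per container (the namespace and values are arbitrary):

    apiVersion: v1
    kind: LimitRange
    metadata:
      name: container-limits
      namespace: team-a          # hypothetical namespace
    spec:
      limits:
      - type: Container
        defaultRequest:          # applied when a container sets no request
          cpu: 100m
          memory: 128Mi
        default:                 # applied when a container sets no limit
          cpu: 500m
          memory: 256Mi
        max:                     # hard ceiling per container
          cpu: "2"
          memory: 1Gi
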
Controlling what privileges containers run with
Pod security policies can limit which users or service accounts can provide dangerous security context settings.
Generally, most application workloads need limited access to host resources so they can successfully run as a root process (uid 0) without access to host information. However, considering the privileges associated with the root user, you should write application containers to run as a non-root user. Similarly, administrators who wish to prevent client applications from escaping their containers should use a restrictive pod security policy
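A minimal sketch of a pod spec that follows that advice (the image and UID are placeholders):

    apiVersion: v1
    kind: Pod
    metadata:
      name: nonroot-app
    spec:
      securityContext:
        runAsNonRoot: true           # refuse to start containers that run as uid 0
        runAsUser: 10001             # hypothetical non-root UID
      containers:
      - name: app
        image: example/app:1.0       # hypothetical image
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true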
 
Restricting network access
The network policies for a namespace allow application authors to restrict which pods in other namespaces may access pods and ports within their namespaces.
 
Restricting cloud metadata API access
When running Kubernetes on a cloud platform limit permissions given to instance credentials, use network policies to restrict pod access to the metadata API, and avoid using provisioning data to deliver secrets.
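Assuming your CNI supports egress policies (Calico and Cilium do), a sketch of a NetworkPolicy that blocks the cloud metadata endpoint while allowing other outbound traffic:

    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: deny-metadata-api
      namespace: team-a              # hypothetical namespace
    spec:
      podSelector: {}                # all pods in the namespace
      policyTypes:
      - Egress
      egress:
      - to:
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
            - 169.254.169.254/32     # cloud metadata API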
 
Controlling which nodes pods may access
As an administrator, a beta admission plugin PodNodeSelector can be used to force pods within a namespace to default or require a specific node selector, and if end users cannot alter namespaces, this can strongly limit the placement of all of the pods in a specific workload
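With the PodNodeSelector admission plugin enabled on the API server, the namespace annotation could look like this (the label key/value are placeholders):

    # kube-apiserver: --enable-admission-plugins=...,PodNodeSelector
    apiVersion: v1
    kind: Namespace
    metadata:
      name: team-a
      annotations:
        scheduler.alpha.kubernetes.io/node-selector: "env=restricted"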

Protecting cluster components from compromise
Restrict access to etcd
Write access to the etcd backend for the API is equivalent to gaining root on the entire cluster, and read access can be used to escalate fairly quickly. Administrators should always use strong credentials from the API servers to their etcd server, such as mutual auth via TLS client certificates, and it is often recommended to isolate the etcd servers behind a firewall that only the API servers may access
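A sketch of the flags involved (paths and addresses are placeholders): etcd enforces client certificates, and the API server presents its own.

    # etcd
    --client-cert-auth=true
    --trusted-ca-file=/etc/etcd/ca.pem
    --cert-file=/etc/etcd/etcd.pem
    --key-file=/etc/etcd/etcd-key.pem

    # kube-apiserver
    --etcd-servers=https://10.0.0.10:2379
    --etcd-cafile=/etc/kubernetes/pki/etcd-ca.pem
    --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.pem
    --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client-key.pem
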
Enable audit logging
The audit logger is a beta feature that records actions taken by the API for later analysis in the event of a compromise. It is recommended to enable audit logging and archive the audit file on a secure server
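A minimal sketch: point the API server at an audit policy and a log file (paths are placeholders), with a catch-all policy that records request metadata:

    # kube-apiserver flags:
    #   --audit-policy-file=/etc/kubernetes/audit-policy.yaml
    #   --audit-log-path=/var/log/kube-apiserver-audit.log
    apiVersion: audit.k8s.io/v1
    kind: Policy
    rules:
    - level: Metadata              # log who did what, without request/response bodies
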
Rotate infrastructure credentials frequently
The shorter the lifetime of a secret or credential the harder it is for an attacker to make use of that credential. Set short lifetimes on certificates and automate their rotation. Use an authentication provider that can control how long issued tokens are available and use short lifetimes where possible. If you use service account tokens in external integrations, plan to rotate those tokens frequently.
Review third party integrations before enabling them
Many third party integrations to Kubernetes may alter the security profile of your cluster.
Encrypt secrets at rest
In general, the etcd database will contain any information accessible via the Kubernetes API and may grant an attacker significant visibility into the state of your cluster. Always encrypt your backups using a well reviewed backup and encryption solution, and consider using full disk encryption where possible.
Receiving alerts for security updates and reporting vulnerabilities
 

https://kubernetes.io/docs/tasks/administer-cluster/securing-a-cluster/

  • Free and open source, Project Calico is designed to simplify, scale, and secure cloud networks

Unlike SDNs that require a central controller, limiting scalability, Calico is built on a fully distributed, scale-out architecture. So it scales smoothly from a single developer laptop to large enterprise deployments.
https://www.projectcalico.org/


HTTP, gRPC, and Kafka Aware Security and Networking for Containers with BPF and XDP

Cilium is open source software for providing and transparently securing network connectivity and loadbalancing between application workloads such as application containers or processes. Cilium operates at Layer 3/4 to provide traditional networking and security services as well as Layer 7 to protect and secure use of modern application protocols such as HTTP, gRPC and Kafka. Cilium is integrated into common orchestration frameworks such as Kubernetes and Mesos
https://github.com/cilium/cilium
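As an illustration of that L7 capability, a CiliumNetworkPolicy can allow only specific HTTP methods and paths between workloads; a sketch (labels, port and path are placeholders):

    apiVersion: "cilium.io/v2"
    kind: CiliumNetworkPolicy
    metadata:
      name: allow-get-public
    spec:
      endpointSelector:
        matchLabels:
          app: backend             # pods being protected
      ingress:
      - fromEndpoints:
        - matchLabels:
            app: frontend          # only frontend pods may connect
        toPorts:
        - ports:
          - port: "80"
            protocol: TCP
          rules:
            http:
            - method: "GET"
              path: "/public/.*"   # L7 rule: only GETs under /public/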



  • Benchmark results of Kubernetes network plugins (CNI) over 10Gbit/s network (Updated: April 2019)


Flannel is still one of the fastest and leanest CNIs in the competition, but it still supports neither NetworkPolicies nor encryption.
Calico announced support of Application Layer Policy on top of Istio, bringing security to the application layer.
Cilium now supports encryption! Cilium provides encryption with IPsec tunnels and offers an alternative to WeaveNet for encrypted networking. However, WeaveNet is faster than Cilium with encryption enabled, because Cilium 1.4.2 only supports CBC encryption; GCM would be better since it can be hardware-offloaded by network adapters, and it will be part of Cilium 1.5.


Here is the list of CNIs we will compare :

    Calico v3.6
    Canal v3.6 (which is, in fact, Flannel for network + Calico for firewalling)
    Cilium 1.4.2
    Flannel 0.11.0
    Kube-router 0.2.5
    WeaveNet 2.5.1

Security

When comparing the security of these CNIs, we are talking about two things: their ability to encrypt communications, and their implementation of Kubernetes Network Policies (according to real tests, not from their documentation).
There are only two CNIs that can encrypt communications: Cilium and WeaveNet.
When it comes to the Network Policy implementation, Calico, Canal, Cilium, and WeaveNet are the best of the panel, by implementing both Ingress and Egress rules.

https://itnext.io/benchmark-results-of-kubernetes-network-plugins-cni-over-10gbit-s-network-updated-april-2019-4a9886efe9c4

  • Kubernetes networks solutions comparison


Kubernetes requires networks to follow these rules:

    All pods can communicate with each other without NAT
    All nodes can communicate with pods without NAT, in both directions
    The IP seen by a container is the same as the IP seen by external components

There are two types of network setup:

    Default k8s network,
    Or CNI with its plugins – most frequently used, and which will be the base of our comparison.

CNI with a network plugin
The second solution to setup Kubernetes network is to use the Container Network Interface (CNI) and a network plugin

Linux networking can be defined in two ways: underlay or overlay. Kubernetes makes it possible to use both.

    Underlay is defined at the physical level (switches, routers…)
    Overlay is a virtual network, composed of VLANs, veth (virtual interfaces) and VxLAN; it encapsulates the network traffic. Overlay is a bit slower than underlay, as it creates tunnels between hosts, which reduces the available MTU.

Comparison of different CNI + plugin solutions on k8s
Three solutions are mainly used: Calico, Flannel and WeaveNet
two others, Cilium and Contiv, provide interesting features too.

The deployment tests have been done with Kubespray.
In more detail: Calico works in L2 mode by default. It is possible to configure it to use IPinIP (L3). IPinIP is tunnelled IP: an IP packet encapsulates another IP packet, adding a "Source IP" header field which is the entry point of the tunnel and a "Destination" field which is used as the endpoint.

Calico offers two configurations for IPinIP:

    always: all the traffic is encapsulated
    crossSubnet: only traffic that crosses subnet boundaries is encapsulated
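With the Calico v3 API this is configured on the IPPool; a sketch applied with calicoctl (the CIDR is a placeholder):

    # calicoctl apply -f ippool.yaml
    apiVersion: projectcalico.org/v3
    kind: IPPool
    metadata:
      name: default-ipv4-ippool
    spec:
      cidr: 192.168.0.0/16
      ipipMode: CrossSubnet        # or "Always" to encapsulate all traffic
      natOutgoing: true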

Cilium
It uses L3/L4 for the network part and L7 for the application part.
The L7 support allows adding high-level filter rules for web applications. It supports IPv4 and IPv6. Cilium is the only solution to offer BPF filtering.
BPF – Berkeley Packet Filter
BPF is a packet-filtering solution which can replace iptables. The filtering isn't performed at the application level but at the kernel level: it's more efficient and more secure.
Cilium uses BPF to create and apply filter rules on packets; no iptables rules are created. Filters are more effective and more flexible.

Contiv
Presentation
Contiv is a network solution for Kubernetes distributed by Cisco and using the VxLAN and BGP protocols.
It supports IPv4 and IPv6. It offers Cisco ACI (Application Centric Infrastructure) integration as well, but Cisco offers a specific ACI network solution for Kubernetes. It is based on Open vSwitch for pipelines and uses etcd for key-value storage.

Flannel
Flannel can run using several encapsulation backends, VxLAN being the recommended one (others are more experimental). Only IPv4 is supported.

WeaveNet
Weave net provides VxLAN on layer 2 networking for Kubernetes. It uses kube-proxy and kube-dns. It supports IPv4 and IPv6.

Conclusion
For a POC or if we want to quickly setup the network, it is best to use Flannel or WeaveNet.
Calico, Contiv, and Cilium offer to use an underlay network (including BGP) directly, and avoid VxLAN encapsulation.
Several solutions (Calico, Contiv) offer to add multiple virtual networks for the whole cluster, so pods on different nodes can connect to the same network.
Cilium is more security focused and offers application layer filtering. It uses BPF to filter at the kernel level. BPF filtering offers better performance than iptables filtering.


https://www.objectif-libre.com/en/blog/2018/07/05/k8s-network-solutions-comparison/

  • Kubernetes secret
Kubernetes secret objects let you store and manage sensitive information, such as passwords, OAuth tokens, and ssh keys.
A Secret is an object that contains a small amount of sensitive data such as a password, a token, or a key. Such information might otherwise be put in a Pod specification or in an image; putting it in a Secret object allows for more control over how it is used, and reduces the risk of accidental exposure.
https://kubernetes.io/docs/concepts/configuration/secret/
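For example (values are placeholders; with the data field the values must be base64-encoded, while stringData or kubectl create secret avoid encoding by hand):

    apiVersion: v1
    kind: Secret
    metadata:
      name: db-credentials
    type: Opaque
    stringData:                    # plain text here; stored base64-encoded by the API server
      username: admin
      password: s3cr3t-placeholder

    # equivalent imperative form:
    # kubectl create secret generic db-credentials \
    #   --from-literal=username=admin --from-literal=password=s3cr3t-placeholder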


  • Running Cilium in Azure

Why do we need another firewall anyway?

Traditional firewalls will inspect traffic on (usually) two layers: 3 and 4, that is, networking and transport. A few add-ons can make firewalling layer 7 traffic possible, which the industry calls a "WAF" (Web Application Firewall).
How would you filter external traffic hitting a single public IP but being load balanced to various pods running different microservices?
And how would you filter the traffic internal to the cluster so that only certain microservices can reach specific resources?
Cilium relies on BPF (Berkeley Packet Filter).

BPF is a highly flexible and efficient virtual machine-like construct in the Linux kernel allowing to execute bytecode at various hook points in a safe manner. It is used in a number of Linux kernel subsystems, most prominently networking, tracing and security (e.g. sandboxing).
https://medium.com/@dcasati/running-cilium-in-azure-c5a9626d8595

  • Ingress vs. Ingress Controller
Kubernetes provides three service types:

    ClusterIP: adds internal endpoints for in-cluster communication
    NodePort: exposes a static port on each of the Nodes to route external calls to internal services
    LoadBalancer: creates an external load balancer to route external requests to internal services

While an Ingress is not a Kubernetes Service, it can also be used to expose services to external requests
The advantage of an Ingress over a LoadBalancer or NodePort is that an Ingress can consolidate routing rules in a single resource to expose multiple services.
An Ingress is an API object that defines the traffic routing rules (e.g. load balancing, SSL termination, path-based routing, protocol), whereas the Ingress Controller is the component responsible for fulfilling those requests.
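A sketch of an Ingress that fans one host out to two services (uses the networking.k8s.io/v1 API, i.e. Kubernetes v1.19+; the host and service names are placeholders):

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: example-ingress
    spec:
      ingressClassName: nginx        # which controller should fulfil this Ingress
      rules:
      - host: app.example.com
        http:
          paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: api-service    # hypothetical backend service
                port:
                  number: 80
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web-service    # hypothetical backend service
                port:
                  number: 80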

With the exception of GKE, which includes GLBC by default, ingress controllers must be installed separately prior to usage.

Cloud-Specific Ingress Controllers
The key advantage of using a cloud provider-specific Ingress Controller is native integration with other cloud services.
On the other hand, if you are going for a hybrid or multi-cloud strategy, using an open-source option listed below will be easier than maintaining multiple solutions per cloud provider.

Open-Source Ingress Controllers
Kubernetes website maintains a list of popular third-party solutions
NGINX-Based Ingress Controllers

ingress-nginx
This is the only open-source Ingress Controller maintained by the Kubernetes team, built on top of NGINX reverse proxy.

NGINX & NGINX Plus Ingress Controller
This is the official Ingress Controller from NGINX Inc (now owned by F5) supporting both the open-source and commercial (NGINX Plus) products.

Kong
Unlike ingress-nginx, Kong insists on not implementing a cross-namespace Ingress Controller, citing privilege escalation as a critical attack vector in those scenarios.


HAProxy-Based Ingress Controllers

HAProxy Ingress
As an Ingress Controller, HAProxy Ingress offers dynamic configuration update via API to address reliance on static configuration files with HAProxy.

Voyager
Voyager highlights both L4 and L7 load balancing for HTTP/TCP as well as seamless SSL integration with LetsEncrypt and AWS Certificate Manager on its website.


Envoy-Based Ingress Controllers

Istio Ingress
Istio makes heavy use of Envoy proxies to mediate all traffic within the service mesh.
If you are already using Istio as the service mesh solution in your cluster, using the default Istio Ingress/Gateway makes the most sense.

Ambassador
Technically, Ambassador is an API Gateway and L7 load balancer with Kubernetes Ingress support.
Although it’s based on Envoy, it connects nicely with other service mesh solutions besides Istio (e.g. Consul, Linkerd).

Contour
Contour was one of the first Ingress Controllers to make use of Custom Resource Definitions (CRDs) to extend the functionality of the Kubernetes Ingress API
The CRD (HTTPProxy — renamed from IngressRoute) primarily addresses the limitations of the native Kubernetes Ingress API in multi-tenant environments. Now that IngressRoute is officially defined in Kubernetes v1.18+

Gloo
Gloo differentiates from other Envoy-based Ingress Controllers by offering what it calls “function-level routing”. This means that Gloo can act as an Ingress and API Gateway to route traffic to not only microservices, but also to serverless functions (e.g. AWS Lambda, Google Cloud Functions, OpenFaaS, Knative).

Others

Skipper
https://medium.com/swlh/kubernetes-ingress-controller-overview-81abbaca19ec

  • Unlike other types of controllers which run as part of the kube-controller-manager binary, Ingress controllers are not started automatically with a cluster.
https://kubernetes.io/docs/concepts/services-networking/ingress-controllers/


  • Differences Between nginxinc/kubernetes-ingress and kubernetes/ingress-nginx Ingress Controllers
Which One Am I Using?
If you are unsure about which implementation you are using, check the container image of the Ingress controller that is running. 
https://github.com/nginxinc/kubernetes-ingress/blob/master/docs/nginx-ingress-controllers.md
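A quick way to check (the namespace is an assumption; ingress-nginx typically installs into ingress-nginx, while the NGINX Inc controller often lands in nginx-ingress):

    kubectl get pods --all-namespaces -o wide | grep -i ingress      # find the controller pod
    kubectl -n ingress-nginx get pods -o jsonpath='{.items[*].spec.containers[*].image}'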
