# Considerations

## Why Kubernetes?

[Kubernetes](https://kubernetes.io/) is a container orchestration tool that helps you manage deployments using declarative configuration files called manifests. Kubernetes provides a standardized way of achieving the following:

* High availability

* Disaster recovery

* Scalability

### Relevant Kubernetes Resources

If you're new to Kubernetes, here are some starting resources to get you up to speed:

* [Kubernetes Documentation](https://kubernetes.io/docs/home/)

* [Kubernetes Basics](https://kubernetes.io/docs/tutorials/kubernetes-basics/)

* [Kubernetes Networking Concepts](https://kubernetes.io/docs/concepts/services-networking/)

* [Persistent Volumes in Kubernetes](https://kubernetes.io/docs/concepts/storage/persistent-volumes/)

* [Kubernetes from Zero to Hero](https://www.youtube.com/watch?v=X48VuDVv0do)

Binarly On-Prem can be deployed on both managed Kubernetes services and bare-metal Kubernetes environments. Each option has its own advantages and disadvantages:

## Managed Kubernetes Services

Managed Kubernetes services, offered by major cloud providers, can simplify deployment and maintenance. Benefits include:

* Easier deployment and scaling

* Automated control plane maintenance

* Built-in health monitoring and repairs

However, you retain responsibility for deploying and maintaining Binarly On-Prem on the worker nodes. Some popular options include:

* Amazon Elastic Kubernetes Service (EKS)

* Google Kubernetes Engine (GKE)

* Microsoft Azure Kubernetes Service (AKS)

When using cloud providers, ensure that your cluster configuration meets or exceeds the hardware requirements specified below.

## Bare-Metal Kubernetes

Deploying on bare-metal environments offers:

* Complete control over the entire stack

* Ability to fine-tune configurations

* Potential cost savings for large-scale deployments

Consider your specific needs, expertise, and resources when choosing between these options.

### Kubernetes Distributions

There are [many distributions available to install Kubernetes](https://nubenetes.com/matrix-table/). For self-hosted options, we recommend two for their simplicity and sane default settings:

* [k3s](https://k3s.io/)

* [rke2](https://docs.rke2.io/)

More enterprise-ready options are also suitable:

* [VMware Tanzu](https://tanzu.vmware.com/platform)

* [Red Hat OpenShift](https://www.redhat.com/en/technologies/cloud-computing/openshift)

* [Rancher](https://www.rancher.com/)

## Kubernetes Cluster Considerations

While this guide focuses on worker node specifications, it's important to note that a functional Kubernetes cluster also requires properly configured master nodes (known as the Control Plane). The requirements for master nodes can vary significantly depending on:

* The size of your cluster

* Your chosen Kubernetes distribution

* High availability requirements

* The specific needs of your environment

We recommend consulting the documentation of your chosen Kubernetes distribution for guidance on sizing master nodes appropriately for your use case.

## Local Testing

For local testing purposes, we recommend using:

* [minikube](https://minikube.sigs.k8s.io/)

* [kind](https://kind.sigs.k8s.io/)

Both tools allow you to run Kubernetes clusters on your local machine, which is perfect for testing and familiarizing yourself with Binarly On-Prem before deploying to a production environment.
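As a minimal sketch, a kind configuration for a small multi-node test cluster might look like the following. The one-control-plane, two-worker layout is an assumption for illustration only. Keep in mind that the scanner's default job requests (2 vCPUs and 8 GiB of memory per job, with several jobs per scan) may not fit on a typical workstation, so you will likely need to lower them for local testing.

```yaml theme={null}
# kind-config.yaml: a hypothetical local test layout (one control plane, two workers).
# Create the cluster with: kind create cluster --config kind-config.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
  - role: worker
  - role: worker
```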

For reference, our test cluster in Google Cloud Platform (GCP) uses c3-standard-8 instances (8 vCPUs, 32 GB memory) for worker nodes, which has shown good performance for testing purposes. Your specific requirements may vary based on your workload and scale.

## Hardware Requirements

For Binarly On-Prem, here are our recommended specifications for Kubernetes Worker Nodes:

| Node Type   | Quantity | CPU       | Memory   | Storage    | Network |
| ----------- | -------- | --------- | -------- | ---------- | ------- |
| Worker Node | 1-3      | 4-8 vCPUs | 16-32 GB | 100 GB SSD | 1 Gbps  |
| Tools Node  | 1-3      | 64 vCPUs  | 512 GB   | 100 GB SSD | 1 Gbps  |

The Binarly Scanner is the component with the highest memory and CPU demands. We recommend allocating the resources shown in the table above, but scanner requirements can vary considerably between different image types.

## Kubernetes Requirements

Binarly On-Prem requires a Kubernetes cluster with the following components:

* A Storage Class for Persistent Volumes
* An Ingress Controller
* A network route to the cluster
* A domain name
* Three subdomains for the components (the names can be customised):
  * Dashboard (Main application)
  * Keycloak (Authentication)
  * Minio (Object Storage)
* Certificates for the domain names
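To illustrate how these requirements fit together, the following standard Kubernetes Ingress routes three subdomains of `example.com` to the Dashboard, Keycloak, and MinIO, and terminates TLS using a certificate stored in a Secret. The hostnames, Service names, ports, namespace, and Secret name are all hypothetical; the Binarly charts may create their own Ingress resources, so check the chart values before creating one by hand.

```yaml theme={null}
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: binarly-example          # hypothetical name
  namespace: binarly             # hypothetical namespace
spec:
  ingressClassName: nginx        # must match the IngressClass of your Ingress Controller
  tls:
    - hosts:                     # the certificate must cover all three names
        - dashboard.example.com
        - keycloak.example.com
        - minio.example.com
      secretName: binarly-tls    # hypothetical Secret of type kubernetes.io/tls
  rules:
    - host: dashboard.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: dashboard  # hypothetical Service name and port
                port:
                  number: 80
    - host: keycloak.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: keycloak   # hypothetical Service name and port
                port:
                  number: 80
    - host: minio.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: minio      # hypothetical Service name and port
                port:
                  number: 80
```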

## Scanner Requirements

The scanning tools run as Kubernetes Jobs and should ideally run on a separate node group. Because these jobs run in parallel, they can be resource-intensive, depending on the image being scanned.

### Parallel Scans

The Scanner deployment will run as many scans in parallel as there are Scanner pods. This is controlled using `replicas` in the values file:

```yaml theme={null}
server:
  scanner:
    replicas: 4 
```

### Scan Resource Requests

A Binarly scan is made up of multiple separate jobs that run in parallel. Their resources are set in the values file and are shown here with the default values:

```yaml theme={null}
server:
  scanner:
    jobs:
      resources:
        requests:
          cpu: 2000m
          memory: 8Gi
        limits:
          cpu: 64000m
          memory: 512Gi
```

Because the scans are complex, the resource requests and limits are set to high values so that scans run as quickly as possible. You can adjust these values to suit your needs, but we recommend keeping the requests and limits as high as your nodes allow and deploying these jobs on a separate node group. The actual resource requirements vary greatly from scan to scan.
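As a rough sizing aid, the total requests scale with both the number of Scanner replicas and the number of jobs each scan runs at once. Let $R$ be `server.scanner.replicas` and $N$ the number of tool jobs a single scan runs concurrently. $N$ is an assumption here: the storage comment in the values file (roughly 1.25 TB per scan at 80Gi per tool PVC) suggests $N \approx 16$, but the real number depends on the scan.

$$
\begin{aligned}
\text{CPU requested} &= R \times N \times 2\ \text{vCPU} \\
\text{Memory requested} &= R \times N \times 8\ \text{GiB}
\end{aligned}
$$

For example, with $R = 4$ and $N = 16$, the scan jobs would request $4 \times 16 \times 2 = 128$ vCPUs and $4 \times 16 \times 8 = 512$ GiB of memory in total. Only requests are used for scheduling; the limits (64 vCPUs and 512 GiB per job) cap how much a single job may consume once it is running.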

### Setting Up Job Distribution

The Jobs accept common Kubernetes configuration to spread the load across the cluster:

```yaml theme={null}
server:
  scanner:
    jobs:
      # Tolerations: see https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/ for setup
      toleration:
      # Required pod anti-affinity: see https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#affinity-and-anti-affinity for setup
      requiredPodAntiAffinity:
      # Node selector: see https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodeselector for setup
      nodeSelector:
```
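As a sketch, pinning scan jobs to a dedicated, tainted node group might look like this. It assumes the chart passes these values through to the Job pod spec in the standard Kubernetes `nodeSelector` and toleration formats; confirm the expected shape against the chart's default values, particularly for the singular `toleration` key. The label and taint `dedicated=binarly-tools` are hypothetical.

```yaml theme={null}
# Hypothetical node preparation (run once per tools node):
#   kubectl label nodes <node-name> dedicated=binarly-tools
#   kubectl taint nodes <node-name> dedicated=binarly-tools:NoSchedule
server:
  scanner:
    jobs:
      # Only schedule scan jobs on nodes carrying this label
      nodeSelector:
        dedicated: binarly-tools
      # Allow scan jobs onto nodes with the matching NoSchedule taint.
      # Assumption: the chart expects a list of standard Kubernetes tolerations here.
      toleration:
        - key: dedicated
          operator: Equal
          value: binarly-tools
          effect: NoSchedule
```

The taint keeps other workloads off the tools nodes, while the node selector keeps scan jobs off the general worker nodes; you need both to fully dedicate the node group.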

### Scanner Storage Requirements

By default, each scan job requests 80 GiB of storage. This is configurable in the values file:

```yaml theme={null}
server:
  scanner:
    jobs:
      storageClassName: # The storage class available on your system. We suggest using a storageClass with `reclaimPolicy` set to `Delete`.
      tempPVCForToolsStorageRequest: 80Gi # The size of the PVC for each tool. One PVC is created per tool, which comes to roughly 1.25TB per scan with the default values.
      tempPVCForSymbolsStorageRequest: 1Gi # The size of the PVC for symbols jobs.
```
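For the scanner's temporary volumes, a StorageClass with `reclaimPolicy: Delete` removes the backing volume when its PVC is deleted. The following is a minimal sketch assuming the AWS EBS CSI driver (`ebs.csi.aws.com`); the class name is hypothetical, and the provisioner must be replaced with the one your cluster uses.

```yaml theme={null}
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: binarly-scan-temp                  # hypothetical name
provisioner: ebs.csi.aws.com               # replace with your cluster's CSI provisioner
reclaimPolicy: Delete                      # delete the backing volume when the PVC is deleted
volumeBindingMode: WaitForFirstConsumer    # bind only once a pod using the PVC is scheduled
```

You would then set `server.scanner.jobs.storageClassName: binarly-scan-temp`. As a rough capacity check, 1.25 TB per scan at the default 80Gi per tool PVC corresponds to about 16 volumes (16 × 80 GiB = 1,280 GiB = 1.25 TiB), so with `replicas: 4` the scanners could claim up to about 5 TiB of temporary storage at once.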

## Data Requirements

Binarly On-Prem requires a persistent storage backend consisting of PostgreSQL databases and object storage. We recommend deploying these outside the Binarly On-Prem cluster for better performance and reliability, but they can also be deployed as part of the installation.

For object storage, we support:

* Amazon S3

* Google Cloud Storage

* MinIO

For PostgreSQL, we support version 16 and above.

### Using the Built-in Data System

Binarly On-Prem includes a built-in data plane for small-scale deployments. This data plane is suitable for testing and evaluation purposes, but we recommend using external storage for production deployments.

| Component                   | Storage Type      | Default Storage Size | Number of Volumes |
| --------------------------- | ----------------- | -------------------- | ----------------- |
| VDB and Keycloak PostgreSQL | Persistent Volume | 20 GB                | 1                 |
| Server PostgreSQL           | Persistent Volume | 100 GB               | 1                 |
| MinIO                       | Persistent Volume | 100 GB               | 6                 |

<Note>
  The required storage size depends on the number of scans and the size of the images being scanned. The values above are a starting point and should be adjusted to your specific requirements.
</Note>

<Warning>
  We recommend using a Storage Class that retains the underlying volume when its PersistentVolumeClaim is deleted (`reclaimPolicy: Retain`).
</Warning>
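A minimal sketch of such a Storage Class, assuming the AWS EBS CSI driver; the name is hypothetical and the provisioner should match your cluster. With `reclaimPolicy: Retain`, deleting a PersistentVolumeClaim leaves the PersistentVolume and its data in place, to be cleaned up or re-bound manually.

```yaml theme={null}
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: binarly-data-retain                # hypothetical name
provisioner: ebs.csi.aws.com               # replace with your cluster's CSI provisioner
reclaimPolicy: Retain                      # keep the volume (and its data) after the PVC is deleted
allowVolumeExpansion: true                 # allows growing volumes later if the driver supports it
volumeBindingMode: WaitForFirstConsumer
```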

### Using External Data Systems

Connection details can be injected into the Binarly deployments using Secrets in the deployment namespace. Each component references these Secrets through the following values:

#### Databases

* Server:

  ```yaml theme={null}
  server:
    postgresql:
      useExternalDatabase: true # Set to true to use an external database
      connection:
        passwordSecretName: server-database-connection # The name of the secret
        passwordSecretKey: password # The key that contains the information required
        usernameSecretName: server-database-connection
        usernameSecretKey: username
        hostSecretName: server-database-connection
        hostSecretKey: host
        databaseSecretName: server-database-connection
        databaseSecretKey: database
  ```

* VDB:

  ```yaml theme={null}
  vdb:
    postgresql:
      connection:
        passwordSecretName: vdb-database-connection # The name of the secret
        passwordSecretKey: password # The key that contains the information required
        usernameSecretName: vdb-database-connection
        usernameSecretKey: username
        hostname: my-host.com
        database: my-database
  ```

* Keycloak:

  ```yaml theme={null}
  externalDatabase:
    existingSecret: vdb-database-connection
    existingSecretHostKey: host
    existingSecretPortKey: port
    existingSecretUserKey: username
    existingSecretDatabaseKey: database
    existingSecretPasswordKey: password
  ```
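For reference, the Secrets referenced above could be created with manifests like the following. The namespace and all values are placeholders; only the Secret names and key names come from the values shown above. Because the Keycloak values read `host`, `port`, `username`, `database`, and `password` from `vdb-database-connection`, that Secret needs all five keys, even though the VDB values themselves only read `username` and `password`.

```yaml theme={null}
apiVersion: v1
kind: Secret
metadata:
  name: server-database-connection
  namespace: binarly                  # hypothetical: use your deployment namespace
type: Opaque
stringData:                           # plain-text values; the API server stores them in `data`
  host: postgres.example.internal     # placeholder hostname
  database: binarly_server            # placeholder database name
  username: binarly                   # placeholder credentials
  password: change-me
---
apiVersion: v1
kind: Secret
metadata:
  name: vdb-database-connection
  namespace: binarly                  # hypothetical: use your deployment namespace
type: Opaque
stringData:
  host: postgres.example.internal
  port: "5432"                        # stringData values must be strings, so quote the port
  database: binarly_vdb
  username: binarly
  password: change-me
```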

#### Object Storage

Object storage is used to:

* Host the files used for vulnerability discovery
* Store images and other artifacts

#### AWS S3

Create a Secret named `artefacts-bucket-credentials` with the following keys:

```yaml theme={null}
  AWS_ACCESS_KEY_ID: my-access
  AWS_SECRET_ACCESS_KEY: my-secret
```
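As a minimal sketch, the Secret can be created from a manifest like the one below. The namespace is an assumption (use the namespace Binarly is deployed into), and the credential values are placeholders.

```yaml theme={null}
apiVersion: v1
kind: Secret
metadata:
  name: artefacts-bucket-credentials
  namespace: binarly              # hypothetical: your Binarly deployment namespace
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: my-access    # placeholder access key ID
  AWS_SECRET_ACCESS_KEY: my-secret  # placeholder secret access key
```

Equivalently, `kubectl create secret generic artefacts-bucket-credentials --from-literal=AWS_ACCESS_KEY_ID=... --from-literal=AWS_SECRET_ACCESS_KEY=... -n <namespace>` creates the same Secret without writing credentials to a file.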

The values config for VDB:

```yaml theme={null}
  vdb:
    artefactsBucket: my-bucket
    artefactsBucketConfig:
      type: s3
      region: us-east-1
      endpoint: s3.amazonaws.com
```

For the other buckets (images, store, and symbols), the server's service account must be mapped to an IAM role with the following permissions:

```json theme={null}
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::my-bucket",
        "arn:aws:s3:::my-bucket/*"
      ]
    }
  ]
}
```

Then set the following in the server values:

```yaml theme={null}
  server:
    buckets:
      images: "my-images-bucket" # These can also be one bucket with different paths
      store: "my-store-bucket"
      symbols: "my-symbols-bucket"
    serviceAccount:
      annotations:
        eks.amazonaws.com/role-arn: my-role-arn
```

#### GCS

Create a Secret named `artefacts-bucket-credentials` with the following keys:

```yaml theme={null}
  SERVICE_ACCOUNT_KEY: my-credentials
```
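A corresponding manifest for GCS might look like the following. It assumes `SERVICE_ACCOUNT_KEY` holds the full contents of a Google service account JSON key file; the namespace is hypothetical and the key content shown is a truncated placeholder.

```yaml theme={null}
apiVersion: v1
kind: Secret
metadata:
  name: artefacts-bucket-credentials
  namespace: binarly              # hypothetical: your Binarly deployment namespace
type: Opaque
stringData:
  # Assumption: the value is the JSON key file of a Google service account.
  # Paste the whole file here, indented under the block scalar.
  SERVICE_ACCOUNT_KEY: |
    { "type": "service_account", "project_id": "my-project" }
```

If you have the key file locally, `kubectl create secret generic artefacts-bucket-credentials --from-file=SERVICE_ACCOUNT_KEY=key.json -n <namespace>` stores it under the same key name.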

The values config for VDB:

```yaml theme={null}
  vdb:
    artefactsBucket: my-bucket
    artefactsBucketConfig:
      type: gcs
```

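For the other buckets (images, store, and symbols), the Google service account used by the server needs the following roles:

* `roles/storage.objectViewer`
* `roles/storage.objectUser`
* `roles/storage.objectCreator`

Then set the following in the server values: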

```yaml theme={null}
  server:
    buckets:
      images: "my-images-bucket" # These can also be one bucket with different paths
      store: "my-store-bucket"
      symbols: "my-symbols-bucket"
    serviceAccount:
      annotations:
        iam.gke.io/gcp-service-account: my-service-account
```

## Third-Party Charts

The Binarly installation comes with a set of third-party charts that support the application. Most of them are optional, but the installation automates their set-up. These charts are:

### ArgoCD (Semi-Optional)

[ArgoCD](https://argoproj.github.io/argo-cd/) is a declarative, GitOps continuous delivery tool for Kubernetes. It allows you to deploy applications to your Kubernetes cluster using Git and Helm (among other things). The Binarly application is delivered as an app-of-apps.
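In the app-of-apps pattern, a parent Argo CD `Application` points at a Git directory or chart that itself contains child `Application` resources, one per component. The sketch below uses a hypothetical repository URL, path, and application name, and may differ from what the Binarly installation actually creates.

```yaml theme={null}
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: binarly-apps                       # hypothetical parent application
  namespace: argocd                        # namespace where Argo CD is installed
spec:
  project: default
  source:
    repoURL: https://git.example.com/platform/binarly-apps.git  # hypothetical repo
    targetRevision: main
    path: apps                             # directory holding the child Application manifests
  destination:
    server: https://kubernetes.default.svc # the cluster Argo CD runs in
    namespace: argocd                      # child Applications live in Argo CD's namespace
  syncPolicy:
    automated:
      prune: true                          # remove child apps deleted from Git
      selfHeal: true                       # revert manual drift
```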

### Secretgen Controller (Semi-Optional)

[Secretgen Controller](https://github.com/carvel-dev/secretgen-controller) generates secrets from a template. This is used to generate the secrets required for the Binarly application.
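As an illustration, the controller's `Password` resource asks it to generate a random password. The name and namespace below are hypothetical; by default the controller writes the result to a Secret with the same name, under a `password` key.

```yaml theme={null}
apiVersion: secretgen.k14s.io/v1alpha1
kind: Password
metadata:
  name: example-db-password      # hypothetical name; also the name of the generated Secret
  namespace: binarly             # hypothetical namespace
spec:
  length: 32                     # number of characters to generate
```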

### Keycloak (Required)

[Keycloak](https://www.keycloak.org/) is an open-source identity and access management solution. This is used to manage the authentication for the Binarly application.

### Zalando Postgres Operator (Optional)

[Zalando Postgres Operator](https://postgres-operator.readthedocs.io/en/latest/) is a Kubernetes operator for managing PostgreSQL clusters. If you are not using external databases, it manages the PostgreSQL databases required by the Binarly application.

### MinIO Operator (Optional)

[MinIO Operator](https://min.io/docs/minio/kubernetes/upstream/operations/installation.html) is a Kubernetes operator for managing MinIO clusters, which provide S3-compatible object storage. This is used to manage the MinIO cluster if required.

### Nginx Ingress Controller (Optional)

[Nginx Ingress Controller](https://kubernetes.github.io/ingress-nginx/) is an Ingress controller that uses a ConfigMap to store the NGINX configuration. It can be used to manage ingress to the cluster.

### Cert Manager (Optional)

[Cert Manager](https://cert-manager.io/docs/) is a Kubernetes operator for managing TLS certificates. This is used to manage the certificates for ingress if required.
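As a sketch, a cert-manager `Certificate` covering the three component subdomains could look like this. The hostnames, namespace, Secret name, and issuer are hypothetical; the issuer must already exist in your cluster.

```yaml theme={null}
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: binarly-tls               # hypothetical name
  namespace: binarly              # hypothetical namespace
spec:
  secretName: binarly-tls         # TLS Secret that cert-manager creates and renews
  dnsNames:                       # hypothetical subdomains for Dashboard, Keycloak, and MinIO
    - dashboard.example.com
    - keycloak.example.com
    - minio.example.com
  issuerRef:
    name: letsencrypt-prod        # hypothetical ClusterIssuer you have already created
    kind: ClusterIssuer
```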
