Why Kubernetes?
Kubernetes is a container orchestration tool that helps you manage deployments using declarative configuration files called manifests. Kubernetes provides a standardized way of achieving the following:
- High availability
- Disaster recovery
- Scalability
Relevant Kubernetes Resources
If you're new to Kubernetes, here are some starting resources to get you up to speed:
Binarly On-Prem can be deployed on both managed Kubernetes services and bare-metal Kubernetes environments. Each option has its own advantages and disadvantages:
Managed Kubernetes Services
Managed Kubernetes services, offered by major cloud providers, can simplify deployment and maintenance. Benefits include:
- Easier deployment and scaling
- Automated control plane maintenance
- Built-in health monitoring and repairs
However, you retain responsibility for deploying and maintaining Binarly On-Prem on the worker nodes. Some popular options include:
- Amazon Elastic Kubernetes Service (EKS)
- Google Kubernetes Engine (GKE) [^1]
- Microsoft Azure Kubernetes Service (AKS)
When using cloud providers, ensure that your cluster configuration meets or exceeds the hardware requirements specified below.
Bare-Metal Environments
Deploying on bare-metal environments offers:
- Complete control over the entire stack
- Ability to fine-tune configurations
- Potential cost savings for large-scale deployments
Consider your specific needs, expertise, and resources when choosing between these options.
Kubernetes Distributions
There are many distributions available for installing Kubernetes. For self-hosted options, we recommend two for their simplicity and sane default settings:
More enterprise-ready options are also suitable:
Kubernetes Cluster Considerations
While this guide focuses on worker node specifications, it's important to note that a functional Kubernetes cluster also requires properly configured master nodes (known as the Control Plane). The requirements for master nodes can vary significantly depending on:
- The size of your cluster
- Your chosen Kubernetes distribution
- High availability requirements
- The specific needs of your environment
We recommend consulting the documentation of your chosen Kubernetes distribution for guidance on sizing master nodes appropriately for your use case.
Local Testing
For local testing purposes, we recommend using:
Both tools allow you to run Kubernetes clusters on your local machine, which is perfect for testing and familiarizing yourself with Binarly On-Prem before deploying to a production environment.
[^1]: Our test cluster in Google Cloud Platform (GCP) uses c3-standard-8 instances (8 vCPUs, 32 GB memory) for worker nodes, which has shown good performance for testing purposes. Your specific requirements may vary based on your workload and scale.
Hardware Requirements
| Node groups | Quantity | CPU | Memory | Storage | Network |
| --- | --- | --- | --- | --- | --- |
| Default nodes | 1–3 | 4–8 vCPUs | 16–32 GB | 100 GB SSD | 1 Gbps |
| Scanning nodes | 1–3 | 64 vCPUs | 512 GB | 100 GB SSD | 1 Gbps |
The Binarly Scanner is made up of many components with high resource requirements. The scanner must be deployed on a dedicated node group to prevent resource contention, out-of-memory (OOM) failures, and degraded performance.
Binarly strongly recommends monitoring the resource usage of the scanner pods and using this information to right-size the scanning node group.
The scanning node group should be configured to scale down to zero nodes when no scans are being run by the cluster operator.
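How scale-to-zero is configured depends on your platform. As a rough sketch, on EKS with eksctl a scanning node group sized to the table above might be declared like this (the cluster name, region, and instance type are assumptions; adjust to your environment):
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: binarly              # example cluster name
  region: us-east-1          # example region
managedNodeGroups:
  - name: scanning
    instanceType: r6i.16xlarge   # example: 64 vCPUs, 512 GiB memory
    minSize: 0                   # allows the autoscaler to scale the group to zero
    maxSize: 3
    desiredCapacity: 0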
Kubernetes Requirements
Binarly On-Prem requires a Kubernetes cluster with the following components:
- A Storage Class for Persistent Volumes with reclaimPolicy set to Delete. This is used during scans.
- If using MinIO and Postgres, a Storage Class for Persistent Volumes with reclaimPolicy set to Retain. (Example StorageClass manifests for both follow this list.)
- An Ingress Controller
- A route to the cluster
- A domain
- Three subdomain names for the components (The names can be customised):
- Dashboard (Main application)
- Keycloak (Authentication)
- MinIO (Object Storage)
- Certificates for the domain names
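For illustration, minimal StorageClass manifests satisfying the two reclaim-policy requirements above could look like this (the names are examples and the EBS CSI provisioner is an assumption; substitute the provisioner for your environment):
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: scan-scratch            # example name; used for scan-time volumes
provisioner: ebs.csi.aws.com    # assumption: AWS EBS CSI driver
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: binarly-data            # example name; used by the built-in MinIO and Postgres
provisioner: ebs.csi.aws.com    # assumption: AWS EBS CSI driver
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer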
Scanner Requirements
The scanning tools run as Kubernetes Jobs on the system and will ideally be run on a separate node group. These jobs run in parallel and therefore can be resource intensive, depending on the subject of the scan.
Parallel Scans
The Scanner deployment will run as many scans in parallel as defined in the values file:
server:
scanner:
maxConcurrentFullScans: 4
Scan Resource Requests
The Binarly scan is made up of multiple separate jobs that run in parallel. The resources are set in the values file and are shown here with the default values:
scan-workflow:
workflow:
resources:
limits:
cpu: 64000m
memory: 512Gi
requests:
cpu: 2000m
memory: 8Gi
Due to the complexity of the scans, the resource requests and limits are set to high values to ensure that scans run as quickly as possible. The values can be adjusted to suit your needs, but we recommend keeping the requests and limits as high as possible and deploying these jobs on a separate node group. The actual resource requirement varies greatly on a per-scan basis.
Setting Up Job Distribution
The Jobs accept common Kubernetes configuration to spread the load across the cluster:
scan-workflow:
workflow:
nodeSelector: # The node selector to use for the scanner jobs
workload: tools
tolerations: # The tolerations to use for the scanner jobs
- effect: NoSchedule
key: workload
operator: Equal
value: tools
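For these values to take effect, the scanning nodes must carry the matching label and taint. On managed services this is usually configured at the node-group level; expressed directly on a Node object, it would look roughly like this (the node name is an example):
apiVersion: v1
kind: Node
metadata:
  name: scan-node-1        # example node name
  labels:
    workload: tools        # matches the nodeSelector above
spec:
  taints:
    - key: workload
      value: tools
      effect: NoSchedule   # matches the toleration above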
Scanner Storage Requirements
By default, each scan job requests 80 GB of storage. This is configurable in the values file:
scan-workflow:
workflow:
toolPvcSize: 80Gi
Data Requirements
Binarly On-Prem requires a persistent storage backend consisting of PostgreSQL databases and object storage. We recommend deploying these outside of the Binarly On-Prem cluster for better performance and reliability, but they can also be deployed as part of the installation.
For object storage, we support:
- Amazon S3
- Google Cloud Storage
- MinIO
For PostgreSQL we support version 16 and above.
Using the Built-in Data System
Binarly On-Prem includes a built-in data plane for small-scale deployments. This data plane is suitable for testing and evaluation purposes, but we recommend using external storage for production deployments.
| Component | Storage Type | Default Storage Size | Number of Volumes |
| --- | --- | --- | --- |
| VDB and Keycloak PostgreSQL | Persistent Volume | 20 GB | 1 |
| Server PostgreSQL | Persistent Volume | 100 GB | 1 |
| MinIO | Persistent Volume | 100 GB | 6 |
The required storage size depends on the number of scans and the size of the images being scanned. The above values are a starting point and should be adjusted based on your specific requirements.
For the built-in data system, we recommend using a Storage Class that retains the underlying volume in case of deletion.
Using External Data Systems
Connection details can be injected into the Binarly deployments using secrets in the deployment namespace.
Databases
Binarly requires a PostgreSQL instance (version 16 or above) and connection details for that instance.
The secrets are passed to each component using the following values:
- Server:
server:
postgresql:
useExternalDatabase: true # Set to true to use an external database
connection:
passwordSecretName: server-database-connection # The name of the secret
passwordSecretKey: password # The key that contains the information required
usernameSecretName: server-database-connection
usernameSecretKey: username
hostSecretName: server-database-connection
      hostSecretKey: host
databaseSecretName: server-database-connection
databaseSecretKey: database
- VDB:
vdb:
postgresql:
connection:
passwordSecretName: vdb-database-connection # The name of the secret
passwordSecretKey: password # The key that contains the information required
usernameSecretName: vdb-database-connection
usernameSecretKey: username
hostname: my-host.com
database: my-database
- Keycloak:
keycloak:
externalDatabase:
existingSecret: vdb-database-connection
existingSecretHostKey: host
existingSecretPortKey: port
existingSecretUserKey: username
existingSecretDatabaseKey: database
existingSecretPasswordKey: password
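For reference, a Secret matching the Server connection values above might look like this (all values are illustrative; the VDB and Keycloak secrets follow the same pattern with their respective keys):
apiVersion: v1
kind: Secret
metadata:
  name: server-database-connection   # referenced by the values above
type: Opaque
stringData:
  username: binarly                  # example credentials; use your own
  password: change-me
  host: postgres.example.com
  database: binarly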
Object Storage
Object storage is used to:
- Host the files used for vulnerability discovery
- Store images and other artifacts
Authentication to S3 can be done using IRSA (IAM Roles for Service Accounts), Pod Identity, or access keys. Please see the AWS documentation for more details on how to set this up on your AWS cluster.
IRSA Using a Role Annotation
The values config for external buckets:
global:
buckets:
images: "my-images-bucket"
bucketsConfig:
type: s3
region: us-east-1
endpoint: s3.amazonaws.com
publicEndpoint: https://s3.amazonaws.com # The public endpoint for the S3 bucket
useIAM: true
artefactsBucketConfig:
type: s3
region: us-east-1
endpoint: s3.amazonaws.com
useIAM: true
bucketName: my-artefacts-bucket
The service accounts using the role need to be annotated using the following values:
server:
serviceAccount:
annotations:
eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/my-role
vulnerability-database:
serviceAccount:
annotations:
eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/my-role
Pod Identity
This is similar to IRSA except the annotation is not required. The service account just needs to be linked to the role using the Pod Identity mechanism detailed here: https://docs.aws.amazon.com/eks/latest/userguide/pod-identities.html
Access Keys
The access key and secret key need to be stored in a secret in the namespace where Binarly is deployed; this secret is not managed by the BTP chart. The following example uses two secrets: bucket-credentials for the images bucket and artefacts-bucket-credentials for the artefacts bucket.
global:
buckets:
images: "my-images-bucket"
bucketsConfig:
type: s3
region: us-east-1
endpoint: s3.amazonaws.com
publicEndpoint: https://s3.amazonaws.com # The public endpoint for the S3 bucket
accessKeySecret:
name: "bucket-credentials"
key: "AWS_ACCESS_KEY_ID"
secretKeySecret:
name: "bucket-credentials"
key: "AWS_SECRET_ACCESS_KEY"
artefactsBucketConfig:
type: s3
region: us-east-1
endpoint: s3.amazonaws.com
bucketName: my-artefacts-bucket
accessKeySecret:
name: "artefacts-bucket-credentials"
key: "AWS_ACCESS_KEY_ID"
secretKeySecret:
name: "artefacts-bucket-credentials"
key: "AWS_SECRET_ACCESS_KEY"
Permissions and CORS
The Role or User used to access the buckets needs the following permissions:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "BinarlyPolicy",
"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:GetObjectAcl",
"s3:GetObject",
"s3:ListBucket",
"s3:DeleteObject",
"s3:PutObjectAcl"
],
"Resource": [
"arn:aws:s3:::my-images-bucket",
"arn:aws:s3:::my-artefacts-bucket",
"arn:aws:s3:::my-images-bucket/*",
"arn:aws:s3:::my-artefacts-bucket/*"
]
}
]
}
A CORS policy must be set on the Images bucket because the application presents pre-signed URLs to the user. This policy should allow the necessary origins and methods for the application to function correctly.
[
{
"AllowedHeaders": [
"*"
],
"AllowedMethods": [
"GET",
"PUT",
"POST",
"HEAD"
],
"AllowedOrigins": [
"http://my.domain.com" // Change to your domain
],
"ExposeHeaders": [
"ETag"
],
"MaxAgeSeconds": 3600
}
]
Google Cloud Storage (GCS)
GCS Access can be managed using Workload Identity.
Workload Identity
The values config for external buckets:
global:
buckets:
images: "my-images-bucket"
bucketsConfig:
type: gcs
useWorkloadIdentity: true
artefactsBucketConfig:
type: gcs
bucketName: my-artefacts-bucket
useWorkloadIdentity: true
The Kubernetes service accounts need to be annotated with the GCP service account using the following values:
server:
  serviceAccount:
    annotations:
      iam.gke.io/gcp-service-account: my-service-account@my-project.iam.gserviceaccount.com
vulnerability-database:
  serviceAccount:
    annotations:
      iam.gke.io/gcp-service-account: my-service-account@my-project.iam.gserviceaccount.com
Permissions and CORS
The Service Account used to access the buckets needs the following permissions:
- Storage Object Creator
- Storage Object User
- Storage Object Viewer
A CORS policy must be set on the Images bucket because the application presents signed URLs to the user. See the following document for how to set up CORS on GCS: https://cloud.google.com/storage/docs/using-cors
This policy should allow the necessary origins and methods for the application to function correctly:
[
{
"origin": ["http://my.domain.com"], // Change to your domain
"method": ["GET", "PUT", "POST", "HEAD"],
"responseHeader": ["ETag"],
"maxAgeSeconds": 3600
}
]
Third-Party Charts
The Binarly installation comes with a set of third-party charts that are used to support the platform. The installation of these charts is automated by default for convenience, but ideally these components should be installed and managed outside of the Binarly installation and disabled in the BTP chart values.
Argo Workflows (Required)
Argo Workflows is an open-source container-native workflow engine for Kubernetes. It allows you to define and manage complex workflows using a simple YAML syntax. The Binarly application can leverage Argo Workflows for advanced orchestration and automation tasks.
If this is managed outside of the BTP application, please ensure that the following are set in the values file:
"argo-workflows":
enabled: false
Secretgen Controller (Semi-Optional)
Secretgen Controller generates secrets from templates. This is used to generate the secrets required for the Binarly application.
This can be disabled if the secrets are managed outside the BTP application, or installed separately. To install separately, you can use the chart from charts/secretgen-controller in the BTP chart, and adjust the values to disable the bundled deployment:
"secretsgen-controller":
enabled: false
Keycloak (Required)
Keycloak is an open-source identity and access management solution. This is used to manage the authentication for the Binarly application.
If installing this manually, please ensure that the following are set in the values file:
"keycloak":
enabled: false
Additionally, please set up a secret in the BTP namespace called keycloak with the following keys:
admin-password: (The password for the Keycloak admin user)
and pass the correct configuration to the BTP chart:
keycloak:
adminUser: (The username for the password in the secret)
server:
keycloak:
internalHost: keycloak.my.domain.com (The internal host for Keycloak)
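For reference, the keycloak secret described above might look like this (the password value is a placeholder):
apiVersion: v1
kind: Secret
metadata:
  name: keycloak                # in the BTP namespace
type: Opaque
stringData:
  admin-password: change-me     # placeholder; the Keycloak admin password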
Zalando Postgres Operator (Optional)
Zalando Postgres Operator is a Kubernetes operator for managing PostgreSQL clusters. This is used to manage the Binarly application's PostgreSQL databases if you opt for in-cluster databases.
If installing this manually, please ensure that the following are set in the values file:
"postgres-operator":
enabled: false
In addition to this, the operator requires network access to the PostgreSQL instances. If you are using the built-in BTP network policy, please ensure the operator namespace is whitelisted.
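The exact mechanism depends on how the BTP network policy is configured, but as a sketch, a standalone allowance for the operator namespace could look like this (the namespace names are assumptions):
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-postgres-operator
  namespace: binarly              # assumption: the BTP deployment namespace
spec:
  podSelector: {}                 # applies to all pods in the namespace
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: postgres-operator   # assumption: operator namespace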
MinIO Operator (Optional)
MinIO Operator is a Kubernetes operator for managing MinIO clusters, which provide S3-compatible object storage. This is used to manage the MinIO cluster if required.
If installing this manually, please ensure that the following are set in the values file:
"operator":
enabled: false