The Binarly Transparency Platform must be deployed on a Kubernetes cluster. No other installation method is supported.

Hardware Requirements

The following hardware requirements assume the worst-case scenario: a single instance of the platform scanning multiple large images in parallel.

Node groups	Quantity	CPU	Memory	Storage	Network
Default nodes	1-3	2-4 vCPUs	16-32 GB	100 GB SSD	1 Gbps
Scanning nodes	1-3	64 vCPUs	512 GB	100 GB SSD	1 Gbps

Depending on the usage pattern, the scanning nodes can be made smaller and parts of each scan run in parallel across them. See the Scanner Requirements section for more detail on scan resourcing.

The Binarly Scanner is made up of many components that have high resource requirements. The scanner should be deployed on a dedicated node group to prevent resource contention, Out Of Memory failures, and degraded performance. Additionally, a low priorityClass should be assigned to the scanner pods.
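
As an illustration of the low priorityClass mentioned above, the following is a minimal sketch of a standard Kubernetes PriorityClass. The name and the value are assumptions chosen for the example, and the values key that attaches the class to the scanner pods is not shown because it depends on the BTP chart.

```yaml
# Hypothetical PriorityClass for scanner pods; the name and value are illustrative.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: binarly-scanner-low
# Pods without a priorityClass default to priority 0 (unless a globalDefault
# class exists), so a negative value ranks scanner pods below them.
value: -1
globalDefault: false
# Scanner pods wait for capacity instead of preempting other workloads.
preemptionPolicy: Never
description: "Low priority for Binarly scanner pods (example only)."
```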

Binarly strongly recommends monitoring the resource usage of the scanner pods and using this information to right-size the scanning node group.

The cluster operator should configure the scanning node group to scale down to zero nodes when no scans are running.

Kubernetes Requirements

Binarly On-Prem requires a Kubernetes cluster with the following components:

- A Storage Class for Persistent Volumes with reclaimPolicy set to Delete. This is used during scans.
- If using MinIO and Postgres, a Storage Class for Persistent Volumes with reclaimPolicy set to Retain.
- An Ingress Controller or a Gateway
- A route to the cluster
- A domain
- Three subdomain names for the components (the names can be customised):
  - Dashboard (Main application)
  - Keycloak (Authentication)
  - MinIO (Object Storage)
- Certificates for the domain names
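
The two Storage Classes described above might look like the following sketch. The class names are hypothetical, and the provisioner shown (the AWS EBS CSI driver) is only an assumption; substitute the CSI driver your cluster actually uses.

```yaml
# Scratch storage for scans: volumes are deleted with their claims.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: binarly-scan-scratch      # hypothetical name
provisioner: ebs.csi.aws.com      # assumption: replace with your CSI driver
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
---
# Data storage for MinIO and Postgres: volumes are kept when claims are deleted.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: binarly-data-retain       # hypothetical name
provisioner: ebs.csi.aws.com      # assumption: replace with your CSI driver
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
```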

Scanner Requirements

The Binarly scanner runs in two distinct phases: the normalisation phase and the scan phase. During normalisation the input file is normalised into a format that can be processed during the scan phase. This phase requires a PVC that is discarded when normalisation is complete. The scanning phase takes the output of normalisation and processes it with several tools in parallel, using the node’s ephemeral storage to store files during this process.

Resourcing

Resource usage by the scanner components is entirely dependent on the size of the input file, number of components, and size of individual components. Generally, smaller files will use fewer resources and the size of the scanning node group can be made dramatically smaller. Larger files with large components will use more memory during a scan.

Collecting the memory and CPU metrics from scan pods will allow continuous tuning of the resources required by the scanner. It is better to overprovision and adjust down once a profile has been established. Depending on the input files, some tools may have no components to process, while others may have many large components to process.

Resource Requests and Limits

The scan pods can have a global or individual Request and Limit for memory and CPU:

# The global requests and limits, applied to all scan pods unless explicitly overridden
scan-workflow:
  workflow:
    resources:
      limits:
        cpu: 64000m
        memory: 512Gi
      requests:
        cpu: 2000m
        memory: 8Gi
# Explicitly overriding the requests and limits for the detect tool
global:
  scanToolsConfiguration:
    detect:
      resources:
        limits:
          cpu: 5000m
          memory: 16Gi
        requests:
          cpu: 1000m
          memory: 4Gi

Partitioned Scans

If the usage pattern requires large files to be scanned, or a mixture of large and small files, the scanner can be configured to partition each normalised input and run a job in parallel per partition:

scan-workflow:
  workflow:
    scanPartitions:
      partitionSize: '50000000' # Roughly 50MB, in bytes

Using the above configuration a 150 MB binary would be split into three partitions which are scanned in parallel. This allows the jobs that run these processes to be split across several smaller nodes.

Setting a small partition size while processing large binaries can create a very large number of pods, which can seriously impact the Kubernetes API and workflow performance. The following value can be passed to Argo Workflows to limit parallelism:

argo-workflows: # Key used by the all-in-one chart; omit this key if Argo Workflows is deployed separately
  controller:
    resourceRateLimit:
      limit: 10 # limit to ten pods per creation phase
      burst: 1

Parallel Scans

The Scanner deployment will run as many scans in parallel as defined in the values file:

server:
  scanner:
    maxConcurrentFullScans: 4

Scan Resource Requests

The Binarly scan is made up of multiple separate jobs that run in parallel. The resources are set in the values file and are shown here with the default values:

scan-workflow:
  workflow:
    resources:
      limits:
        cpu: 64000m
        memory: 512Gi
      requests:
        cpu: 2000m
        memory: 8Gi

Due to the complexity of the scans, the resource requests and limits are set to high values so that scans run as quickly as possible. The values can be adjusted to suit your needs, but we recommend keeping the requests and limits as high as possible and deploying these jobs on a dedicated node group. The actual resource requirement varies greatly from scan to scan.

Setting Up Job Distribution

The Jobs accept common Kubernetes configuration to spread the load across the cluster:

scan-workflow:
  workflow:
    nodeSelector: # The node selector to use for the scanner jobs
      workload: tools
    tolerations: # The tolerations to use for the scanner jobs
      - effect: NoSchedule
        key: workload
        operator: Equal
        value: tools
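
For the selector and toleration above to take effect, the scanning nodes need a matching label and taint. The fragment below is a sketch of the relevant fields of a Node object as Kubernetes reports them (for example via kubectl get node -o yaml); how the label and taint are applied depends on how your node groups are provisioned, which is not covered here.

```yaml
# Relevant fields of a scanning node that matches the values above.
metadata:
  labels:
    workload: tools          # matched by nodeSelector
spec:
  taints:
    - key: workload          # tolerated by the scanner jobs
      value: tools
      effect: NoSchedule     # keeps pods without the toleration off these nodes
```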

Scanner Storage Requirements

By default, the initial normalisation phase requests 80 GiB of storage. This is configurable in the values file:

scan-workflow:
  workflow:
    toolPvcSize: 80Gi

Data Requirements

Binarly On-Prem requires a persistent storage backend comprising PostgreSQL databases and object storage. We recommend deploying these outside of the Binarly On-Prem cluster for better performance and reliability, but they can also be deployed as part of the installation. For object storage, we support:

Amazon S3
Google Cloud Storage
MinIO

For PostgreSQL we support version 16 and above.

Using the Built-in Data System

Binarly On-Prem includes a built-in data plane for small-scale deployments. This data plane is suitable for testing and evaluation purposes, but we recommend using external storage for production deployments.

Component	Storage Type	Default Storage Size	Number of Volumes
VDB and Keycloak PostgreSQL	Persistent Volume	20 GB	1
Server PostgreSQL	Persistent Volume	100 GB	1
MinIO	Persistent Volume	100 GB	6

The Storage Size is dependent on the number of scans and the size of the images being scanned. The above values are a starting point and should be adjusted based on your specific requirements.

For the built-in data system, we recommend using a Storage Class with reclaimPolicy set to Retain, so that the underlying volumes are kept if their claims are deleted.

Using External Data Systems

Connection details for external data systems are passed to the Binarly deployments through secrets in the deployment namespace.

Databases

Binarly requires a Postgres instance version 16 or above, and connection details to that instance. The secrets are passed to each component using the following values:

Server:

server:
  postgresql:
    useExternalDatabase: true # Set to true to use an external database
    connection:
      passwordSecretName: server-database-connection # The name of the secret
      passwordSecretKey: password # The key that contains the information required
      usernameSecretName: server-database-connection
      usernameSecretKey: username
      hostSecretName: server-database-connection
      hostSecretkey: host
      databaseSecretName: server-database-connection
      databaseSecretKey: database
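
A secret matching the server values above could be created from a manifest like the following sketch. The secret name and key names are taken from the values shown; the namespace and every credential value are placeholders, and the expected format of the host value (for example, whether it may include a port) is not specified here.

```yaml
# Illustrative secret for the server database connection.
apiVersion: v1
kind: Secret
metadata:
  name: server-database-connection
  namespace: binarly               # assumption: use your deployment namespace
type: Opaque
stringData:                        # stringData accepts plain text; Kubernetes encodes it
  username: binarly_server         # placeholder
  password: change-me              # placeholder
  host: postgres.example.com       # placeholder
  database: binarly_server         # placeholder
```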

VDB:

vdb:
  postgresql:
    connection:
      passwordSecretName: vdb-database-connection # The name of the secret
      passwordSecretKey: password # The key that contains the information required
      usernameSecretName: vdb-database-connection
      usernameSecretKey: username
      hostname: my-host.com
      database: my-database

Keycloak:

keycloak:
  externalDatabase:
    existingSecret: vdb-database-connection
    existingSecretHostKey: host
    existingSecretPortKey: port
    existingSecretUserKey: username
    existingSecretDatabaseKey: database
    existingSecretPasswordKey: password

Object Storage

Object storage is used to:

Host the files used for vulnerability discovery
Store images and other artifacts

AWS S3

Authentication to S3 can be done using IRSA, PodIdentity, or access keys. Please see the AWS documentation for more details on how to set this up on your AWS cluster.

IRSA Using a Role Annotation

The values config for external buckets:

  global:
    buckets:
      images: "my-images-bucket"
    bucketsConfig:
      type: s3
      region: us-east-1
      endpoint: s3.amazonaws.com
      publicEndpoint: https://s3.amazonaws.com # The public endpoint for the S3 bucket
      useIAM: true
    artefactsBucketConfig:
      type: s3
      region: us-east-1
      endpoint: s3.amazonaws.com
      useIAM: true
      bucketName: my-artefacts-bucket

The service accounts using the role need to be annotated using the following values:

  server:
    serviceAccount:
      annotations:
        eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/my-role
  vulnerability-database:
    serviceAccount:
      annotations:
        eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/my-role

Pod Identity

This is similar to IRSA except the annotation is not required. The service account just needs to be linked to the role using the Pod Identity mechanism detailed here: https://docs.aws.amazon.com/eks/latest/userguide/pod-identities.html

Access Keys

The access key and secret key must be stored in a secret in the namespace where Binarly is deployed. This secret is not managed by the BTP chart. The following example uses two secrets: bucket-credentials for the images bucket and artefacts-bucket-credentials for the artefacts bucket.

  global:
    buckets:
      images: "my-images-bucket"
    bucketsConfig:
      type: s3
      region: us-east-1
      endpoint: s3.amazonaws.com
      publicEndpoint: https://s3.amazonaws.com # The public endpoint for the S3 bucket
      accessKeySecret:
        name: "bucket-credentials"
        key: "AWS_ACCESS_KEY_ID"
      secretKeySecret:
        name: "bucket-credentials"
        key: "AWS_SECRET_ACCESS_KEY"
    artefactsBucketConfig:
      type: s3
      region: us-east-1
      endpoint: s3.amazonaws.com
      bucketName: my-artefacts-bucket
      accessKeySecret:
        name: "artefacts-bucket-credentials"
        key: "AWS_ACCESS_KEY_ID"
      secretKeySecret:
        name: "artefacts-bucket-credentials"
        key: "AWS_SECRET_ACCESS_KEY"
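
The two secrets referenced by these values could be defined as in the sketch below. The secret names and key names come from the example above; the namespace and the credential values are placeholders that you would replace with your own.

```yaml
# Credentials for the images bucket.
apiVersion: v1
kind: Secret
metadata:
  name: bucket-credentials
  namespace: binarly                      # assumption: use your deployment namespace
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: REPLACE_ME           # placeholder
  AWS_SECRET_ACCESS_KEY: REPLACE_ME       # placeholder
---
# Credentials for the artefacts bucket.
apiVersion: v1
kind: Secret
metadata:
  name: artefacts-bucket-credentials
  namespace: binarly                      # assumption: use your deployment namespace
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: REPLACE_ME           # placeholder
  AWS_SECRET_ACCESS_KEY: REPLACE_ME       # placeholder
```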

Permissions and CORS

The Role or User used to access the buckets needs the following permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "BinarlyPolicy",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObjectAcl",
                "s3:GetObject",
                "s3:ListBucket",
                "s3:DeleteObject",
                "s3:PutObjectAcl"
            ],
            "Resource": [
                "arn:aws:s3:::my-images-bucket",
                "arn:aws:s3:::my-artefacts-bucket",
                "arn:aws:s3:::my-images-bucket/*",
                "arn:aws:s3:::my-artefacts-bucket/*"
            ]
        }
    ]
}

A CORS policy must be set on the images bucket because the application presents pre-signed URLs to the user. The policy should allow the origins and methods the application needs to function correctly. Replace the example origin below with your own domain.

[
  {
    "AllowedHeaders": [
      "*"
    ],
    "AllowedMethods": [
      "GET",
      "PUT",
      "POST",
      "HEAD"
    ],
    "AllowedOrigins": [
      "http://my.domain.com"
    ],
    "ExposeHeaders": [
      "ETag"
    ],
    "MaxAgeSeconds": 3600
  }
]

GCP GCS

GCS Access can be managed using Workload Identity.

Workload Identity

The values config for external buckets:

  global:
    buckets:
      images: "my-images-bucket"
    bucketsConfig:
      type: gcs
      useWorkloadIdentity: true
    artefactsBucketConfig:
      type: gcs
      bucketName: my-artefacts-bucket
      useWorkloadIdentity: true

The Kubernetes service accounts need to be annotated with the email address of the Google service account, using the following values:

  server:
    serviceAccount:
      annotations:
        iam.gke.io/gcp-service-account: my-service-account@my-project.iam.gserviceaccount.com
  vulnerability-database:
    serviceAccount:
      annotations:
        iam.gke.io/gcp-service-account: my-service-account@my-project.iam.gserviceaccount.com

Permissions and CORS

The Google service account used to access the buckets needs the following roles:

Storage Object Creator
Storage Object User
Storage Object Viewer

A CORS policy must be set on the images bucket because the application presents signed URLs to the user. See the following document for how to set up CORS on GCS: https://cloud.google.com/storage/docs/using-cors. The policy should allow the origins and methods the application needs to function correctly. Replace the example origin below with your own domain.

[
  {
    "origin": ["http://my.domain.com"],
    "method": ["GET", "PUT", "POST", "HEAD"],
    "responseHeader": ["ETag"],
    "maxAgeSeconds": 3600
  }
]

Third-Party Charts

The Binarly Installation comes with a set of third-party charts that are used to support the platform. The installation of these charts is automated by default for ease, but ideally these components should be installed and managed outside of the Binarly installation and disabled in the BTP chart values.

Argo Workflows (Required)

Argo Workflows is an open-source container-native workflow engine for Kubernetes. It allows you to define and manage complex workflows using a simple YAML syntax. The Binarly application can leverage Argo Workflows for advanced orchestration and automation tasks. If this is managed outside of the BTP application, please ensure that the following are set in the values file:

  "argo-workflows":
    enabled: false

Secretgen Controller (Semi-Optional)

Secretgen Controller generates secrets from a template. This is used to generate the secrets required for the Binarly application. This can be disabled if the secrets are managed outside the BTP application, or installed separately. To install separately, you can use the chart from charts/secretgen-controller in the BTP chart, and adjust the values to disable the bundled deployment:

  "secretsgen-controller":
    enabled: false

Keycloak (Required)

Keycloak is an open-source identity and access management solution. This is used to manage the authentication for the Binarly application. If installing this manually, please ensure that the following are set in the values file:

  "keycloak":
    enabled: false

Additionally, please set up a secret in the BTP namespace called keycloak with the following keys:

  admin-password: (The password for the Keycloak admin user)

and pass the correct configuration to the BTP chart:

  keycloak:
    adminUser: <admin-username> # The username that the password in the secret belongs to
  server:
    keycloak:
      internalHost: keycloak.my.domain.com # The internal host for Keycloak

Zalando Postgres Operator (Optional)

Zalando Postgres Operator is a Kubernetes operator for managing PostgreSQL clusters. This is used to manage the PostgreSQL databases required for the Binarly application if required. If installing this manually, please ensure that the following are set in the values file:

  "postgres-operator":
    enabled: false

In addition to this, the operator requires network access to the PostgreSQL instances. If you are using the built-in BTP network policy please ensure the operator namespace is whitelisted.

MinIO Operator (Optional)

MinIO-Operator is a Kubernetes operator for managing MinIO clusters that mimic AWS S3 object storage. This is used to manage the MinIO cluster if required. If installing this manually, please ensure that the following are set in the values file:

  "operator":
    enabled: false
