
This section describes the minimum requirements for a Corridor installation.

Broadly, the components involved are:

  • Web Application & Worker
  • Spark Worker
  • Jupyter Notebook
  • File Management
  • Metadata Database (SQL RDBMS)

For very simple installations, all of these can be installed on the same machine, but we recommend keeping them separate to simplify scaling.

The Web Application & Worker component is a Flask application that serves the User Interface and Web APIs, which users access through the browser. It also includes a worker process for long-running API tasks. This component has 2 processes: corridor-app and corridor-worker.

  • RAM: 4 GB
  • Processor: 4 CPU
  • Installation storage space: 20 GB
  • Python 3.11+

Optional:

  • Web Server - Example: Nginx
  • Process Management - Example: Supervisor or Systemd
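The minimums for the Web Application & Worker host can be verified with a short preflight script before installing. The following is a minimal sketch and is not part of Corridor: the names `check_web_host` and `measure_host` are hypothetical, and the RAM lookup relies on `os.sysconf`, so it assumes a Linux host.

```python
import os
import shutil
import sys

# Minimums for the Web Application & Worker host, copied from the list above.
MIN_PYTHON = (3, 11)
MIN_CPUS = 4
MIN_RAM_GB = 4
MIN_DISK_GB = 20


def check_web_host(python_version, cpus, ram_gb, free_disk_gb):
    """Return a list of problems; an empty list means the host passes."""
    problems = []
    major, minor = python_version[0], python_version[1]
    if (major, minor) < MIN_PYTHON:
        problems.append(f"Python {major}.{minor} found, 3.11+ required")
    if cpus < MIN_CPUS:
        problems.append(f"{cpus} CPUs found, {MIN_CPUS} required")
    if ram_gb < MIN_RAM_GB:
        problems.append(f"{ram_gb:.1f} GB RAM found, {MIN_RAM_GB} GB required")
    if free_disk_gb < MIN_DISK_GB:
        problems.append(
            f"{free_disk_gb:.1f} GB free disk found, {MIN_DISK_GB} GB required"
        )
    return problems


def measure_host(install_path="."):
    """Collect the current host's values (hypothetical helper, Linux only)."""
    # Physical RAM = page size * number of physical pages.
    # A machine sold as "4 GB" may report slightly less than 4.0 here.
    ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    return {
        "python_version": sys.version_info,
        "cpus": os.cpu_count() or 0,
        "ram_gb": ram_bytes / 1024**3,
        "free_disk_gb": shutil.disk_usage(install_path).free / 1024**3,
    }
```

On the target machine, `check_web_host(**measure_host())` returns the list of unmet requirements. Keeping the comparison separate from the measurement makes the thresholds easy to reuse for the other components, whose minimums differ.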

The Spark Worker handles jobs that users trigger and that run asynchronously. We recommend at least 2 workers, with concurrency increased as required.

  • RAM: 16 GB
  • Processor: 8 CPU
  • HDFS storage space: 500 GB (depends on the data being processed; HDFS space to handle shuffles needs to be considered too)
  • Python 3.11+
  • Java 8+
  • Spark 3.3+

Optional:

  • Process Management - Example: Supervisor or Systemd
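Checking the Spark Worker's software versions is slightly subtle, because Java 8 and earlier report themselves as "1.x" (for example "1.8.0_392"), while later releases put the major version first. The sketch below shows one way to compare version strings numerically. The function names are hypothetical, and the version strings are assumed to have been collected already (for example from `java -version` and `spark-submit --version`).

```python
import re


def java_major(version: str) -> int:
    """Return the Java major version from a version string."""
    parts = re.findall(r"\d+", version)
    if not parts:
        raise ValueError(f"no version number in {version!r}")
    major = int(parts[0])
    # Java 8 and earlier use the legacy "1.x" scheme, e.g. "1.8.0_392".
    if major == 1 and len(parts) > 1:
        return int(parts[1])
    return major


def version_tuple(version: str) -> tuple:
    """Turn "3.3.1" into (3, 3, 1) so versions compare numerically."""
    return tuple(int(p) for p in re.findall(r"\d+", version))


def spark_worker_ok(java_version: str, spark_version: str, python_version: str) -> bool:
    """Check the Java 8+, Spark 3.3+ and Python 3.11+ minimums listed above."""
    return (
        java_major(java_version) >= 8
        and version_tuple(spark_version)[:2] >= (3, 3)
        and version_tuple(python_version)[:2] >= (3, 11)
    )
```

Comparing tuples of integers rather than raw strings matters here: as strings, "3.9" sorts after "3.11", which would wrongly accept Python 3.9.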

The Jupyter Notebook component provides a notebook for free-form analytical usage. We provide Jupyter Notebooks out of the box, but can also integrate with existing notebook solutions.

  • RAM: 4 GB for base services and more as per usage by users
  • Processor: 4 CPU and more as per usage by users
  • Installation storage space: 10 GB
  • Python 3.11+
  • Spark 3.3+

Optional:

  • Process Management - Example: Supervisor or Systemd

File Management is a file system used to store and retrieve files. NAS storage that can be mounted on all servers and is accessible by all services is ideal.

  • File storage space: 50 GB
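Because the shared storage has to be reachable from every server, it can help to run the same check on each one. The following is a minimal sketch under stated assumptions: `check_shared_storage` is a hypothetical helper, and the mount path is whatever location the NAS is mounted at on that server.

```python
import os
import shutil
import tempfile


def check_shared_storage(mount_path, min_free_gb=50):
    """Return (ok, message) for a shared file-storage mount."""
    if not os.path.isdir(mount_path):
        return False, f"{mount_path} is not a directory"
    try:
        # Create and remove a temporary file to prove the mount is writable.
        with tempfile.NamedTemporaryFile(dir=mount_path):
            pass
    except OSError as exc:
        return False, f"{mount_path} is not writable: {exc}"
    free_gb = shutil.disk_usage(mount_path).free / 1024**3
    if free_gb < min_free_gb:
        return False, f"only {free_gb:.1f} GB free, {min_free_gb} GB required"
    return True, f"{free_gb:.1f} GB free"
```

The write test is included because a mount can be visible and still be read-only for the account that runs the services.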

The Metadata Database serves as an internal RDBMS that stores the application state and user information.

  • RAM: 2 GB
  • Processor: 2 CPU
  • Database storage space: 5 GB
  • SQL Databases supported:
    • Oracle 19+
    • MSSQL 2016+
    • Postgres 11.7+
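The supported-version list above can be expressed as a small lookup table. This is an illustrative sketch only: `db_supported` and the vendor keys are hypothetical, and the version is assumed to be given in the same scheme as the list (for MSSQL, the product year such as "2016", not the server's internal build number).

```python
# Minimum supported versions, copied from the list above.
MIN_DB_VERSIONS = {
    "oracle": (19,),
    "mssql": (2016,),
    "postgres": (11, 7),
}


def db_supported(vendor: str, version: str) -> bool:
    """Return True if the vendor is listed and the version meets its minimum."""
    minimum = MIN_DB_VERSIONS.get(vendor.lower())
    if minimum is None:
        return False  # vendor is not in the supported list
    # "11.7" -> (11, 7); non-numeric components are ignored.
    found = tuple(int(p) for p in version.split(".") if p.isdigit())
    return found >= minimum
```

For example, `db_supported("postgres", "11.6")` is rejected while `db_supported("postgres", "15.2")` is accepted, since the tuples are compared component by component.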