Minimum Requirements
This section describes the minimum requirements for a Corridor installation.
Broadly, the components involved are:
- Web Application & Worker
- Spark Worker
- Jupyter Notebook
- File Management
- Metadata Database (SQL RDBMS)
For very simple installations, all of these components can be installed on the same machine. However, we recommend keeping them separate so that each can be scaled independently.
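If all components share one machine, a rough floor for that host is the sum of the per-component minimums listed in the sections below. The sketch makes three assumptions. It counts one Spark Worker by default. For each extra worker it adds only RAM and CPUs, treating the 500 GB of HDFS space as a shared pool counted once. It ignores the extra Jupyter capacity that depends on user workload. The names `MINIMUMS` and `single_host_floor` are illustrative only.

```python
# Per-component minimums copied from the requirement lists in this section.
# Storage mixes local install space, HDFS, shared file storage and database
# space, so the storage total is only a rough sanity check, not a disk layout.
MINIMUMS = {
    "web_application": {"ram_gb": 4, "cpus": 4, "storage_gb": 20},
    "spark_worker": {"ram_gb": 16, "cpus": 8, "storage_gb": 500},
    "jupyter_notebook": {"ram_gb": 4, "cpus": 4, "storage_gb": 10},
    "file_management": {"ram_gb": 0, "cpus": 0, "storage_gb": 50},
    "metadata_database": {"ram_gb": 2, "cpus": 2, "storage_gb": 5},
}

# Assumption: each extra Spark Worker needs its own RAM and CPUs,
# while HDFS space is a shared pool that is counted only once.
PER_WORKER_KEYS = {"ram_gb", "cpus"}


def single_host_floor(spark_workers: int = 1) -> dict:
    """Sum the per-component minimums for a single-machine installation."""
    totals = {"ram_gb": 0, "cpus": 0, "storage_gb": 0}
    for component, needs in MINIMUMS.items():
        for key, value in needs.items():
            is_per_worker = component == "spark_worker" and key in PER_WORKER_KEYS
            count = spark_workers if is_per_worker else 1
            totals[key] += value * count
    return totals


print(single_host_floor())   # {'ram_gb': 26, 'cpus': 18, 'storage_gb': 585}
print(single_host_floor(2))  # {'ram_gb': 42, 'cpus': 26, 'storage_gb': 585}
```

With the recommended two Spark Workers, the floor rises to 42 GB of RAM and 26 CPUs before any user workload is added.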
Web Application
A Flask application that serves the user interface and the web APIs that users access through the browser.
It also includes a worker process for long-running API tasks.
This component runs two processes: corridor-app and corridor-worker.
Requirements
- RAM: 4 GB
- Processor: 4 CPU
- Installation storage space: 20 GB
- Python 3.11+
Optional:
- Web Server - Example: Nginx
- Process Management - Example: Supervisor or Systemd
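A quick pre-flight check can confirm that a host meets the Web Application minimums before installation. This is a hypothetical helper, not part of Corridor. It assumes a Linux host, because the RAM probe uses `os.sysconf`. It also measures free space on the filesystem that will hold the installation; the `install_dir` default of `/` is a placeholder.

```python
import os
import shutil
import sys

# Web Application minimums from the list above.
REQUIRED = {"ram_gb": 4, "cpus": 4, "disk_gb": 20, "python": (3, 11)}

GIB = 1024 ** 3


def host_facts(install_dir: str = "/") -> dict:
    """Collect RAM, CPU count, free disk space and Python version (Linux)."""
    ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    return {
        # Rounded to one decimal place so the messages stay readable.
        "ram_gb": round(ram_bytes / GIB, 1),
        "cpus": os.cpu_count() or 0,
        "disk_gb": round(shutil.disk_usage(install_dir).free / GIB, 1),
        "python": tuple(sys.version_info[:2]),
    }


def shortfalls(facts: dict, required: dict = REQUIRED) -> list:
    """Return one message for every requirement the host does not meet."""
    return [
        f"{key}: have {facts[key]}, need {needed}"
        for key, needed in required.items()
        if facts[key] < needed
    ]


if __name__ == "__main__":
    problems = shortfalls(host_facts())
    print("OK" if not problems else "\n".join(problems))
```

The operating system reserves some memory, so a host sold as 4 GB may report slightly less than 4 GiB here. Treat a near miss on RAM as a judgment call rather than a hard failure.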
Spark Worker
A worker that handles jobs triggered by users, which run asynchronously. We recommend at least 2 workers, with concurrency increased as required.
Requirements
- RAM: 16 GB
- Processor: 8 CPU
- HDFS storage space: 500 GB (depends on the data being processed; HDFS space for shuffles also needs to be considered)
- Python 3.11+
- Java 8+
- Spark 3.3+
Optional:
- Process Management - Example: Supervisor or Systemd
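The Spark Worker has three version minimums. The sketch below checks version strings such as those printed by `python --version`, `java -version` (which writes to standard error) and `spark-submit --version`. The function names are hypothetical. The one subtlety is Java. Releases up to Java 8 report themselves as `1.8.0_...`, while Java 9 and later report the major version first (for example `17.0.8`).

```python
import re


def parse_version(text: str) -> tuple:
    """Extract the first dotted number, e.g. (3, 5, 1) from 'version 3.5.1'."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def java_major(text: str) -> int:
    """Map '1.8.0_392' to 8 (old scheme) and '17.0.8' to 17 (Java 9+ scheme)."""
    version = parse_version(text)
    if version[0] == 1 and len(version) > 1:
        return version[1]
    return version[0]


def spark_worker_ok(python: str, java: str, spark: str) -> bool:
    """Check the Spark Worker minimums: Python 3.11+, Java 8+, Spark 3.3+."""
    return (
        parse_version(python)[:2] >= (3, 11)
        and java_major(java) >= 8
        and parse_version(spark)[:2] >= (3, 3)
    )
```

For example, `spark_worker_ok("Python 3.11.4", 'openjdk version "1.8.0_392"', "version 3.5.1")` returns `True`.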
Jupyter Notebook
A notebook for free-form analytical work. We provide Jupyter Notebook out of the box but can also integrate with existing notebook solutions.
Requirements
- RAM: 4 GB for base services, plus more depending on user workload
- Processor: 4 CPU for base services, plus more depending on user workload
- Installation storage space: 10 GB
- Python 3.11+
- Spark 3.3+
Optional:
- Process Management - Example: Supervisor or Systemd
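Because notebooks run inside a kernel, the Python and Spark minimums apply to the kernel's environment, not just to the host. Below is a sketch to run in a notebook cell. It assumes Spark is installed as the `pyspark` pip package; a Spark distribution made available only through `SPARK_HOME` would not be detected this way. The function names are hypothetical.

```python
import re
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[tuple]:
    """Return the installed version of a package as a tuple, or None if absent."""
    try:
        raw = version(package)
    except PackageNotFoundError:
        return None
    # Keep only the leading numeric part, e.g. '3.5.1' from '3.5.1.dev0'.
    match = re.match(r"\d+(?:\.\d+)*", raw)
    return tuple(int(part) for part in match.group().split(".")) if match else None


def notebook_kernel_ok() -> bool:
    """Check the kernel's Python (3.11+) and PySpark (3.3+) versions."""
    spark = installed_version("pyspark")
    return sys.version_info[:2] >= (3, 11) and spark is not None and spark[:2] >= (3, 3)


print(notebook_kernel_ok())
```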
File Management
A file storage system used to store and retrieve files. Ideally, this is NAS storage that can be mounted on all servers and is accessible to all services.
Requirements
- File storage space: 50 GB
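Because every service must see the same files, it is worth checking the shared mount from each server. Below is a minimal sketch. The mount point `/mnt/corridor-shared` is a hypothetical placeholder, and the 50 GB default mirrors the minimum above.

```python
import os
import shutil
import tempfile

GIB = 1024 ** 3


def shared_storage_ok(mount_point: str, required_gb: float = 50) -> bool:
    """Check that a mount exists, is writable here, and has enough free space."""
    if not os.path.isdir(mount_point):
        return False
    # Creating (and automatically deleting) a temporary file proves the
    # mount is actually writable by the current user on this server.
    try:
        with tempfile.NamedTemporaryFile(dir=mount_point):
            pass
    except OSError:
        return False
    return shutil.disk_usage(mount_point).free >= required_gb * GIB


# Hypothetical mount point; run this on every server that hosts a Corridor service.
print(shared_storage_ok("/mnt/corridor-shared"))
```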
Metadata Database
This serves as an internal RDBMS that stores the application state and user information.
Requirements
- RAM: 2 GB
- Processor: 2 CPU
- Database storage space: 5 GB
- SQL databases supported:
  - Oracle 19+
  - MSSQL 2016+
  - Postgres 11.7+
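Below is a sketch for checking a database server against the supported versions. It takes the version string the server reports, for example from `SHOW server_version` on Postgres, `SERVERPROPERTY('ProductVersion')` on MSSQL, or `SELECT version FROM v$instance` on Oracle. The function name and engine keys are hypothetical. MSSQL is listed by product year, but SQL Server 2016 reports its version as 13.x, so that minimum is expressed as major version 13.

```python
import re

# Supported minimums from the list above (SQL Server 2016 == version 13.x).
MINIMUM_DB_VERSIONS = {
    "oracle": (19,),
    "mssql": (13,),
    "postgres": (11, 7),
}


def db_supported(engine: str, server_version: str) -> bool:
    """Compare a server-reported version string with the supported minimum."""
    match = re.search(r"\d+(?:\.\d+)*", server_version)
    if match is None:
        raise ValueError(f"no version number in {server_version!r}")
    found = tuple(int(part) for part in match.group().split("."))
    # Tuple comparison is element-wise, so (19, 0, 0, 0, 0) >= (19,) holds.
    return found >= MINIMUM_DB_VERSIONS[engine.lower()]
```

For example, `db_supported("postgres", "11.6")` returns `False`, while `db_supported("mssql", "13.0.5026.0")` returns `True`.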