Data Assets
The Table Registry records the location, content, and structure of source data tables used for analytics.
A table can be registered using one of the following methods:
- Connection to a Data Lake: A direct link to the data lake specifies the location of the data file. The link must point to a valid PySpark data file.
- Table Upload: Smaller datasets can be uploaded directly in CSV or Excel format.
Note: Large or production-scale datasets should be registered through the Data Lake connection rather than uploaded through the browser. The data is then read directly from its source location (for example a cloud bucket or a PySpark/Parquet file), avoiding slow uploads while remaining viewable as a table.
Managing Tables on the Platform
The Table Registry organizes all registered tables into customized groups in one centralized location, making it easier to track and monitor existing tables and to create new ones.
Registering a Table:
- Click the Create button in the Table Registry.
- Fill in the key details, such as Name and Attributes (Alias, Group, Input Type, Location, Description).
- Select an Input Type (Data Lake or Upload Data) and provide a data link or upload files accordingly.
- Finally, click on the Save button to complete the registration.
Note: After registering the table, users can edit it and click the Fetch Columns button to automatically load the table's columns and their types.
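To give a feel for what loading columns and their types involves, here is a minimal sketch for a small uploaded CSV. It is not the platform's implementation: `infer_type`, `fetch_columns`, the type names, and the sample data are all hypothetical, and the inference rule (integer, then float, then string) is an assumption chosen for illustration.

```python
import csv
import io


def infer_type(values):
    """Return the narrowest type name that fits every non-empty value."""
    non_empty = [v for v in values if v.strip() != ""]
    if not non_empty:
        return "string"  # nothing to infer from, so fall back to string
    # Try the stricter type first; move on as soon as one value fails.
    for type_name, caster in (("integer", int), ("float", float)):
        try:
            for v in non_empty:
                caster(v)
            return type_name
        except ValueError:
            continue
    return "string"


def fetch_columns(csv_text):
    """Map each column name in the CSV header to an inferred type name."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    return {
        name: infer_type([row[name] or "" for row in rows])
        for name in (reader.fieldnames or [])
    }


# Hypothetical sample data: one integer, one float, and one text column.
sample = "id,amount,region\n1,10.5,north\n2,,south\n3,7,east\n"
columns = fetch_columns(sample)
print(columns)
```

Empty cells are skipped during inference, so a column with missing values can still be reported as numeric.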
Once the table is registered, its data quality can be evaluated through registered Quality Checks, or the table can be used for validation and testing.
Benefits of Table Registration:
- Automated change history that tracks all modifications to the tables.
- Track the lineage of table usage in downstream applications.
- Run Quality Checks on the tables.
- Use tables for validation and testing in a fully auditable manner.
- Export tables outside the platform when required with a single click.
What is a Quality Check?
A Quality Check analyzes the data in a table registered in the Table Registry and produces standard or custom reports. It supports profiling metrics, descriptive statistics, invalid entry detection, outlier analysis, and other custom reports and metrics, so that data quality can be assessed before the data is used for downstream tasks such as running jobs.
Note: The Quality Check object currently supports linking only one table at a time, enabling the generation of multiple metrics and reports for a single table per analysis.
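The kinds of metrics listed above can be illustrated with a small, self-contained sketch for a single column. This is an assumption-laden example, not the platform's Quality Check logic: `quality_report`, the report keys, and the sample values are hypothetical, and the 1.5 × IQR rule is just one common choice for flagging outliers.

```python
import statistics


def is_blank(value):
    """Treat None and empty or whitespace-only strings as missing."""
    return value is None or str(value).strip() == ""


def quality_report(raw_values):
    """Summarise one column: profiling, statistics, invalid entries, outliers."""
    numeric, invalid = [], []
    for value in raw_values:
        if is_blank(value):
            continue
        try:
            numeric.append(float(value))
        except (TypeError, ValueError):
            invalid.append(value)  # present but not parseable as a number

    report = {
        "row_count": len(raw_values),
        "missing_count": sum(1 for v in raw_values if is_blank(v)),
        "invalid_entries": invalid,
    }
    if len(numeric) >= 2:
        # Quartiles from the standard library; flag values beyond 1.5 * IQR.
        q1, _, q3 = statistics.quantiles(numeric, n=4)
        iqr = q3 - q1
        low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
        report.update(
            mean=statistics.mean(numeric),
            median=statistics.median(numeric),
            stdev=statistics.stdev(numeric),
            outliers=[x for x in numeric if x < low or x > high],
        )
    return report


# Hypothetical column with one blank, one invalid entry, and one extreme value.
amounts = ["10", "12", "11", "13", "12", "", "abc", "250"]
report = quality_report(amounts)
print(report)
```

Keeping the report scoped to one column of one table mirrors the single-table constraint noted above; a multi-column report would simply call the function once per column.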
Managing Quality Checks on the Platform:
The Quality Check Registry organizes all registered quality checks into customized groups in one centralized location, making it easier to track and monitor existing checks and to create new ones.
Registering a Quality Check:
- Click the Create button in the Quality Check Registry.
- Fill in the key details, such as Name and Attributes (Data-Table, Group, Descriptions, Select Data-Columns).
- Add notes and attach documentation, if available, in the Additional Information section.
- Lastly, click on the Save button to complete the registration process.
Benefits of Quality Check Registration:
- Analyze and monitor data using standard and custom reports.
- Share data analysis and evaluations with other team members.