📝 Kedro Preflight Notes
This is a very rough idea for a Kedro package that prevents the time lost when you get partway through a pipeline run only to realize that you don't have access to the data or resources it needs.
Must Haves
- check that inputs exist, or are of a type that should be skipped (e.g. SQL)
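A minimal sketch of that check, written against plain callables rather than Kedro itself. The function name `find_missing_inputs`, the `should_skip` callback, and the `fake_catalog` dictionary are all hypothetical, made up for illustration. In a real project the names and the existence test would presumably come from the pipeline and the catalog.

```python
from typing import Callable, Iterable, List


def find_missing_inputs(
    input_names: Iterable[str],
    exists: Callable[[str], bool],
    should_skip: Callable[[str], bool] = lambda name: False,
) -> List[str]:
    """Return the sorted names of inputs that are neither skipped nor present."""
    missing = []
    for name in sorted(input_names):
        if should_skip(name):
            continue  # e.g. SQL-backed datasets, which are not checked here
        if not exists(name):
            missing.append(name)
    return missing


# Hypothetical stand-in for a catalog: dataset name -> (kind, present?)
fake_catalog = {
    "raw_sales": ("csv", True),
    "raw_customers": ("csv", False),
    "warehouse_orders": ("sql", False),
}

missing = find_missing_inputs(
    fake_catalog,
    exists=lambda name: fake_catalog[name][1],
    should_skip=lambda name: fake_catalog[name][0] == "sql",
)
print(missing)  # ['raw_customers']
```

Returning the full list, rather than stopping at the first missing input, means one preflight run reports everything that needs fixing.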
Good to haves
- check that all input and output databases are accessible with valid credentials
- check for S3 bucket access
- check for a Spark install
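One possible shape for these environment checks is a dictionary of named zero-argument functions that each return True or False. The sketch below uses only the standard library. `spark_available` and `run_checks` are hypothetical names, and treating "pyspark is importable or `spark-submit` is on PATH" as evidence of a Spark install is an assumption. Database and S3 checks would need their own client libraries and are left out.

```python
import importlib.util
import shutil
from typing import Callable, Dict, List


def spark_available() -> bool:
    """True if pyspark is importable or spark-submit is on PATH (assumed heuristic)."""
    return (
        importlib.util.find_spec("pyspark") is not None
        or shutil.which("spark-submit") is not None
    )


def run_checks(checks: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run every check and return the names of those that failed or raised."""
    failed = []
    for name, check in checks.items():
        try:
            ok = bool(check())
        except Exception:
            ok = False  # a check that blows up counts as a failure
        if not ok:
            failed.append(name)
    return failed


failed = run_checks(
    {
        "spark": spark_available,
        "always_ok": lambda: True,   # placeholder check that passes
        "broken": lambda: 1 / 0,     # placeholder check that raises
    }
)
print(failed)
```

Whether "spark" appears in the printed list depends on the machine the sketch runs on.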
Implementation
@hook_spec
def before_pipeline_run(run_params, pipeline, catalog):
    ...  # preflight checks would run here, before any node executes
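A rough sketch of how a preflight hook might implement that spec. `PreflightHooks` is a hypothetical class name. The sketch assumes that `pipeline.inputs()` returns the pipeline's free input names and that `catalog.exists(name)` reports whether a dataset is available. Skipping `parameters` and `params:` entries is also an assumption. The `try`/`except` around the import is only there so the sketch runs without Kedro installed.

```python
try:
    from kedro.framework.hooks import hook_impl
except ImportError:
    def hook_impl(func):  # no-op stand-in so the sketch runs without Kedro
        return func


class PreflightHooks:
    """Hypothetical hook class: fail before the first node runs."""

    @hook_impl
    def before_pipeline_run(self, run_params, pipeline, catalog):
        # Assumes pipeline.inputs() yields the free input names and
        # catalog.exists(name) reports whether a dataset can be found.
        missing = sorted(
            name
            for name in pipeline.inputs()
            if name != "parameters"
            and not name.startswith("params:")
            and not catalog.exists(name)
        )
        if missing:
            raise RuntimeError(f"Preflight failed, missing inputs: {missing}")
```

In a Kedro project this would presumably be registered through the `HOOKS` tuple in `settings.py`, e.g. `HOOKS = (PreflightHooks(),)`.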
run params
{
    "run_id": str,
    "project_path": str,
    "env": str,
    "kedro_version": str,
    "tags": Optional[List[str]],
    "from_nodes": Optional[List[str]],
    "to_nodes": Optional[List[str]],
    "node_names": Optional[List[str]],
    "from_inputs": Optional[List[str]],
    "load_versions": Optional[List[str]],
    "pipeline_name": str,
    "extra_params": Optional[Dict[str, Any]]
}