Kedro Preflight


🎄 This post has fully grown


This is a very rough idea for a Kedro package that prevents the time lost when you get partway through a pipeline run only to realize that you don't have access to the data or resources it needs.

Must Haves

  • check that inputs exist, or are of a type whose existence check should be skipped (e.g. SQL)
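Here is a minimal sketch of the must-have check. It is written as plain Python so the logic is easy to test. It assumes you pass in the pipeline's input names plus two callables: one that reports whether a dataset exists (in Kedro, catalog.exists is a natural fit) and one that returns a dataset's class name. The name find_missing_inputs and the DEFAULT_SKIP_TYPES set are hypothetical. The SQL class names are a guess at kedro-datasets naming and may differ in your version.

```python
from typing import Callable, FrozenSet, Iterable, List

# Assumed class names for SQL datasets (kedro-datasets naming); adjust as needed.
DEFAULT_SKIP_TYPES: FrozenSet[str] = frozenset({"SQLTableDataset", "SQLQueryDataset"})


def find_missing_inputs(
    inputs: Iterable[str],
    exists: Callable[[str], bool],
    type_name: Callable[[str], str],
    skip_types: FrozenSet[str] = DEFAULT_SKIP_TYPES,
) -> List[str]:
    """Return the sorted names of inputs that do not exist.

    Inputs whose dataset class name is in ``skip_types`` are never checked,
    so a dataset that cannot report existence is not flagged as missing.
    """
    missing = []
    for name in sorted(inputs):
        if type_name(name) in skip_types:
            continue
        if not exists(name):
            missing.append(name)
    return missing


# Usage with stand-in data instead of a real catalog:
available = {"raw_customers"}
types = {
    "raw_customers": "CSVDataset",
    "raw_orders": "CSVDataset",
    "orders_query": "SQLQueryDataset",
}
print(find_missing_inputs(types, available.__contains__, types.__getitem__))
# ['raw_orders']
```

Collecting every missing input instead of stopping at the first one means a single preflight run tells you everything that needs fixing.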

Good to Haves

  • check that all input and output databases are accessible with valid credentials
  • check for S3 bucket access
  • check that Spark is installed
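Each good-to-have could be a small function that raises on failure, paired with a runner that collects every failure. This sketch assumes a boto3 S3 client for the bucket check, where head_bucket raises a botocore ClientError when the bucket is missing or access is denied. It also assumes a SQLAlchemy engine for the database check. Both are passed in rather than built here, and every function name is hypothetical.

```python
import importlib.util
from typing import Callable, Dict, List


def check_spark_installed() -> None:
    """Raise if pyspark cannot be found in the current environment."""
    if importlib.util.find_spec("pyspark") is None:
        raise RuntimeError("pyspark is not installed")


def check_s3_bucket(client, bucket: str) -> None:
    """Raise if the bucket is unreachable with the current credentials.

    ``client`` is assumed to be a boto3 S3 client.
    """
    client.head_bucket(Bucket=bucket)


def check_database(engine) -> None:
    """Raise if a connection cannot be opened.

    ``engine`` is assumed to be a SQLAlchemy engine built from the same
    credentials the catalog uses.
    """
    with engine.connect():
        pass


def run_checks(checks: Dict[str, Callable[[], None]]) -> List[str]:
    """Run every check and collect failures instead of stopping at the first."""
    failures = []
    for name, check in checks.items():
        try:
            check()
        except Exception as exc:  # any exception counts as a failed check
            failures.append(f"{name}: {exc}")
    return failures


if __name__ == "__main__":
    for failure in run_checks({"spark": check_spark_installed}):
        print(failure)
```

Passing the client and engine in keeps each check cheap to test with fakes and lets the caller decide which credentials to use.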

Implementation

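Kedro's before_pipeline_run hook runs before any node executes, which makes it a natural place for these checks. Its spec looks like this:

from kedro.framework.hooks.markers import hook_spec
@hook_spec
def before_pipeline_run(run_params, pipeline, catalog):
    """Hook to be invoked before a pipeline runs."""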
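Wiring the must-have check into that hook could look like the class below. PreflightHooks and SKIP_TYPES are hypothetical names. The dataset lookup uses DataCatalog._get_dataset, a private method in Kedro 0.18/0.19 that may not exist in other versions, so treat it as an assumption. The try/except around the hook_impl import is only there so the sketch can run without Kedro installed.

```python
from typing import Any, Dict

try:
    from kedro.framework.hooks import hook_impl
except ImportError:  # fallback so the sketch runs without Kedro installed
    def hook_impl(func):
        return func

# Hypothetical: dataset class names whose existence is not checked.
SKIP_TYPES = {"SQLTableDataset", "SQLQueryDataset"}


class PreflightHooks:
    """Hypothetical hook class that fails fast before any node runs."""

    @hook_impl
    def before_pipeline_run(
        self, run_params: Dict[str, Any], pipeline, catalog
    ) -> None:
        missing = []
        for name in sorted(pipeline.inputs()):
            # Assumption: _get_dataset is a private DataCatalog method in
            # Kedro 0.18/0.19 and may differ in other versions.
            dataset = catalog._get_dataset(name)
            if type(dataset).__name__ in SKIP_TYPES:
                continue
            if not catalog.exists(name):
                missing.append(name)
        if missing:
            raise RuntimeError(
                "Preflight failed, missing inputs: " + ", ".join(missing)
            )
```

To enable it, you would register an instance in the project's settings.py, e.g. HOOKS = (PreflightHooks(),), which is how Kedro picks up project hooks.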

run_params is a dict with the following keys (value types shown in place of values):

{
  "run_id": str,
  "project_path": str,
  "env": str,
  "kedro_version": str,
  "tags": Optional[List[str]],
  "from_nodes": Optional[List[str]],
  "to_nodes": Optional[List[str]],
  "node_names": Optional[List[str]],
  "from_inputs": Optional[List[str]],
  "load_versions": Optional[List[str]],
  "pipeline_name": str,
  "extra_params": Optional[Dict[str, Any]]
}
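A few of these keys are useful in error messages, since the same pipeline can run against different environments with different credentials. Here is a minimal sketch, assuming failures is a list of human-readable strings with one entry per failed check. The name format_preflight_report is hypothetical, as is treating a None pipeline_name as Kedro's default pipeline.

```python
from typing import Any, Dict, List


def format_preflight_report(run_params: Dict[str, Any], failures: List[str]) -> str:
    """Prefix the list of failures with the pipeline and env they came from."""
    # Assumed: a None pipeline_name means Kedro's "__default__" pipeline.
    pipeline_name = run_params.get("pipeline_name") or "__default__"
    env = run_params.get("env")
    lines = [f"Preflight failed for pipeline '{pipeline_name}' in env '{env}':"]
    lines += [f"  - {failure}" for failure in failures]
    return "\n".join(lines)
```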

