Brainstorming Kedro Hooks


This post is a 🧠 brainstorming work in progress.

What is Kedro 🤔

If you are completely unsure what Kedro is, be sure to check out my "What is Kedro" post.

after_catalog_created

  • filepath replacer
  • bucket replacer
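A minimal sketch of the bucket replacer idea, assuming it means pointing every S3 filepath at a different bucket (for example a sandbox bucket). The names `swap_bucket` and `replace_buckets` are hypothetical, and the sketch works on a plain catalog configuration dict; wiring it into an after_catalog_created hook is left open here.

```python
from urllib.parse import urlsplit, urlunsplit

S3_SCHEMES = ("s3", "s3a", "s3n")


def swap_bucket(filepath: str, new_bucket: str) -> str:
    """Return filepath with its S3 bucket replaced; other paths pass through."""
    parts = urlsplit(filepath)
    if parts.scheme not in S3_SCHEMES:
        return filepath
    # For s3://bucket/key the bucket is the netloc part of the URL.
    return urlunsplit(parts._replace(netloc=new_bucket))


def replace_buckets(conf_catalog: dict, new_bucket: str) -> dict:
    """Return a copy of a catalog config dict with every S3 bucket swapped."""
    replaced = {}
    for name, entry in conf_catalog.items():
        entry = dict(entry)  # shallow copy so the original config is untouched
        if "filepath" in entry:
            entry["filepath"] = swap_bucket(entry["filepath"], new_bucket)
        replaced[name] = entry
    return replaced


# Hypothetical catalog entries, shaped like a loaded catalog.yml
conf_catalog = {
    "cars": {"type": "pandas.CSVDataSet", "filepath": "s3://prod-bucket/data/cars.csv"},
    "local_cars": {"type": "pandas.CSVDataSet", "filepath": "data/01_raw/cars.csv"},
}
sandbox_catalog = replace_buckets(conf_catalog, "dev-bucket")
```

A filepath replacer could follow the same shape, with a different function applied to each `filepath` entry.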

before_pipeline_run

  • preflight
  • check that data exists
  • run kedro_static_viz
  • run mypy
  • run interrogate
  • run flake8

after_pipeline_run

  • Great Expectations
  • send email
  • send slack

before_node_run

after_node_run

  • Great Expectations
  • save stats/meta data

Execution Order

Hooks are executed in reverse order of the hooks list.

Hooks with tryfirst are moved to the end of the list, so they run first. Hooks with trylast are moved to the start of the list, so they run last.
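To make the ordering concrete, here is a small pure-Python model of how I understand pluggy (the plugin library Kedro hooks build on) to order implementations. It is not pluggy's code: `register` and `call_order` are hypothetical helpers, and the model assumes tryfirst hooks go to the end of the list, trylast hooks go to the start, and the list is called in reverse.

```python
def register(impls, name, tryfirst=False, trylast=False):
    """Add a hook to the list according to its ordering hint."""
    entry = (name, tryfirst)
    if trylast:
        impls.insert(0, entry)  # start of the list, so it is called last
    elif tryfirst:
        impls.append(entry)  # end of the list, so it is called first
    else:
        # Plain hooks go after earlier plain hooks but before tryfirst hooks
        i = len(impls)
        while i > 0 and impls[i - 1][1]:
            i -= 1
        impls.insert(i, entry)


def call_order(impls):
    """Hooks are called in reverse order of the list."""
    return [name for name, _ in reversed(impls)]


impls = []
register(impls, "a")
register(impls, "b")
register(impls, "first", tryfirst=True)
register(impls, "last", trylast=True)
register(impls, "c")
print(call_order(impls))  # ['first', 'c', 'b', 'a', 'last']
```

Among plain hooks, the one registered last ("c") runs before the ones registered earlier.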

  1. after_catalog_created
  2. before_pipeline_run, called with args:

    • run_params = {'run_id': '2020-05-23T15.24.23.958Z', 'project_path': '/mnt/c/temp/kedro0160', 'env': 'local', 'kedro_version': '0.15.9', 'tags': (), 'from_nodes': [], 'to_nodes': [], 'node_names': (), 'from_inputs': [], 'load_versions': {}, 'pipeline_name': None, 'extra_params': {}, 'git_sha': None}
    • pipeline
    • catalog
  3. before_node_run
  4. after_node_run
  5. after_pipeline_run

When does data get saved???

  • before or after node hook?

??Unsure??

  • does before catalog load have access to parameters?

    • Yes

[steel toes](https://github.com/waylonwalker/steel-toes/)

I was way too excited about this one and already created it

Prevents the pain of stepping on your teammates' toes.

Kedro is so amazing at promoting collaboration between team members. Each team member can check out the code, branch, and start work on their own section of the pipeline. Issues can arise if team members' sections of the pipeline happen to cross. Breaking changes happen, BREAKS during development happen, and they can completely kill a teammate's workflow.

  • is there a way to prevent toe stepping?
  • try to load filepath_<branch>
  • if load fails try filepath
  • save data to filepath_<branch>

how

  • on after_catalog_created check for existing "branch" data
  • if "branch" data exists load that
  • otherwise keep default
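The load-and-save rule above could be sketched as plain path logic. This is not the steel-toes implementation: `branch_filepath`, `resolve_load_path`, and `current_branch` are hypothetical names, the sketch assumes local files, and it assumes the branch suffix goes before the file extension.

```python
import subprocess
from pathlib import Path


def current_branch() -> str:
    """Ask git for the checked-out branch name."""
    result = subprocess.run(
        ["git", "rev-parse", "--abbrev-ref", "HEAD"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def branch_filepath(filepath, branch: str) -> Path:
    """Return the branch-specific sibling, e.g. cars.csv -> cars_my-branch.csv."""
    path = Path(filepath)
    safe_branch = branch.replace("/", "-")  # slashes are not valid in a file name
    return path.with_name(f"{path.stem}_{safe_branch}{path.suffix}")


def resolve_load_path(filepath, branch: str) -> Path:
    """Load from the branch file when it exists, otherwise keep the default."""
    candidate = branch_filepath(filepath, branch)
    return candidate if candidate.exists() else Path(filepath)
```

Saving would always go to `branch_filepath(filepath, branch)`, so work on a branch never overwrites the shared default file.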

Run only nodes that have changed

  • store a deephash of the function's code
  • store a hash of the inputs
  • if neither the code nor the inputs changed, skip the node; otherwise run it.

    • How could a hook choose to skip the node?
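The change detection half of this idea could look roughly like the sketch below. It does not answer how a hook would skip a node. All names are hypothetical, the "deephash" is replaced by a simple hash of the function's own bytecode and constants (it does not follow into functions it calls), and the inputs are assumed to be picklable.

```python
import hashlib
import pickle


def code_fingerprint(func) -> str:
    """Hash a function's bytecode, constants, and referenced names."""
    code = func.__code__
    # Skip nested code objects: their repr is not stable
    consts = [c for c in code.co_consts if not hasattr(c, "co_code")]
    payload = code.co_code + repr((consts, code.co_names)).encode()
    return hashlib.sha256(payload).hexdigest()


def inputs_fingerprint(inputs) -> str:
    """Hash the pickled inputs."""
    return hashlib.sha256(pickle.dumps(inputs)).hexdigest()


def needs_run(name, func, inputs, cache) -> bool:
    """Return True when code or inputs differ from the last recorded run."""
    current = (code_fingerprint(func), inputs_fingerprint(inputs))
    changed = cache.get(name) != current
    cache[name] = current  # remember this run for the next comparison
    return changed


def add_one(x):
    return x + 1


cache = {}
first = needs_run("add_one_node", add_one, {"x": 1}, cache)   # nothing recorded yet
second = needs_run("add_one_node", add_one, {"x": 1}, cache)  # same code, same inputs
print(first, second)  # True False
```

In a real project the cache would need to persist between runs, for example as a small file next to the data.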

Static viz hook

Before pipeline run

  • make site
  • Set node status to queued

Before node run

  • Set running status

After node run

  • Set complete status

On pipeline error

  • Set run status to error

On node error

  • Set error status

After pipeline run

  • Set complete status
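The status transitions listed above could be tracked in one hooks class like this. It is only a sketch: `StatusHooks` is a hypothetical name, the statuses live in an in-memory dict, and building or updating the static viz site is left out. The `hook_impl` fallback is only there so the sketch also runs without Kedro installed.

```python
try:
    from kedro.framework.hooks import hook_impl
except ImportError:  # fallback so the sketch runs without Kedro installed
    def hook_impl(func):
        return func


class StatusHooks:
    """Keep one status per node, plus a status for the whole run."""

    def __init__(self):
        self.run_status = "pending"
        self.node_status = {}

    @hook_impl
    def before_pipeline_run(self, pipeline):
        self.run_status = "running"
        # Every node starts out queued
        self.node_status = {node.name: "queued" for node in pipeline.nodes}

    @hook_impl
    def before_node_run(self, node):
        self.node_status[node.name] = "running"

    @hook_impl
    def after_node_run(self, node):
        self.node_status[node.name] = "complete"

    @hook_impl
    def on_node_error(self, node):
        self.node_status[node.name] = "error"

    @hook_impl
    def on_pipeline_error(self):
        self.run_status = "error"

    @hook_impl
    def after_pipeline_run(self):
        self.run_status = "complete"
```

Each method only asks for the hook arguments it needs. A site generator could read `node_status` after every change to recolor the nodes.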

