What's New in Kedro 0.16.4

Waylon Walker

Think pytest #

As mentioned in #435 this is the model that pytest uses. Not all plugins automatically start doing things right out of the box but require a CLI argument.

simplicity #

It feels a bit crazy that simply installing a package will change the way that your pipeline gets executed. I do like that it requires just a bit less reaching into the framework stuff for the average user. Most folks will be able to write in the catalog and nodes without much change to the rest of the project.

Implementation #

Reading through the docs, they show us that we can make our hooks automatically register by adding a kedro.hooks endpoint that points to a singleton instance of our hook.

from the docs






        
setup(
    ...
    entry_points={"kedro.hooks": ["plugin_name = plugin_name.plugin:hooks"]},
)

import logging

from kedro.framework.hooks import hook_impl

class MyHooks:
    @hook_impl
    def after_catalog_created(self, catalog): # pylint: disable=unused-argument
        logging.info("Reached after_catalog_created hook")

hooks = MyHooks()

Careful with the singletons #

hook authors beware

I will be a bit cautious before installing a plugin that is automatically registered. I know its not a common pattern, but if you were to leverage any part of two kedro projects at the same time, and project-specific data was stored in the instance of the hook it will likely be broken.

As long as the hook doesn't store data on the instance you will be ok. Hooks like what they have in the examples will be ok. They generally just take some information from the lifecycle arguments and do something at their prescribed lifecycle point.

Many of the hooks I am seeing in the wild are already more complicated and require the hooks author to utilize an __init__ method and store data on the instance. If you were to do this on two pipelines simultaneously it would break.

Can my hook be auto-discovered #

If your hook doesn't include a __init__ method its a fairly easy yes, otherwise be aware of the potential dangers of passing singleton on to your users.

Use Virtual environments #

Whatever virtual environment manager you use, it is more important than ever to make sure you DO NOT install plugins in your global environment. Generally, you should always run projects even toys or tests in a virtual environemnt.

I use conda






        
conda create -n my-sample-env python=3.8 -y

Overall #

I think this is a really interesting direction for the project to go to. Hooks are still really early. The implementation is good, but I foresee us getting some more functionality that may require us to rely on the __init__ method a little less. I think there are going to be some really cool hooks that can leverage the simplicity of auto-discoverability.

Reply by email

What's New in Kedro 0.16.4

Tags