📢 Announcing find-kedro

Waylon Walker Author

May 4, 2020 3 min read

find-kedro is a small library to enhance your kedro experience. It looks through your modules to find kedro pipelines, nodes, and iterables (lists, sets, tuples) of nodes. It then assembles them into a dictionary of pipelines: each module becomes a separate pipeline, and __default__ is a combination of all pipelines. This format is compatible with kedro's _create_pipelines format.
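To make the shape of that dictionary concrete, here is a minimal sketch that builds one by hand. Plain strings stand in for kedro nodes and the module names are hypothetical; it illustrates the output format only and is not find-kedro's actual implementation.

```python
# Stand-in data: each key is a module path, each value the nodes found there.
# Real values would be kedro Pipeline objects; strings keep the sketch
# runnable without kedro installed.
found = {
    "my_proj.pipelines.data_engineering.pipeline": ["split_data"],
    "my_proj.pipelines.data_science.pipeline": ["train_model", "predict"],
}

# "__default__" combines the nodes of every module into one pipeline.
pipelines = dict(found)
pipelines["__default__"] = [n for nodes in found.values() for n in nodes]

print(pipelines["__default__"])
```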

Motivation #

kedro is a ✨ fantastic project that allows for super-fast prototyping of data pipelines while still yielding production-ready pipelines. find-kedro enhances this experience by adding pytest-like node/pipeline discovery, eliminating the need to bubble up pipelines through modules.

When working on larger pipeline projects, it is advisable to break the project down into sub-modules, which requires knowing how to build Python libraries and how to import each module correctly. While this is not too difficult, it can trip up even the most senior engineers, losing precious feature development time to debugging a library.

Installation #

find-kedro is published on PyPI and can be installed with pip.

pip install find-kedro

Usage #

The recommended usage of find-kedro is to implement it directly in your project's run.py module.

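from typing import Dict

from kedro.context import KedroContext
from kedro.pipeline import Pipeline
from find_kedro import find_kedro

class ProjectContext(KedroContext):
    def _get_pipelines(self) -> Dict[str, Pipeline]:
        # find_kedro returns a dictionary of pipelines, one per module plus __default__
        return find_kedro()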

Creating nodes #

find-kedro will not execute any functions. It simply looks for variables that match the pattern and checks whether they are a kedro.pipeline.Pipeline, a kedro.pipeline.node.Node, or a list of kedro.pipeline.node.Node objects. If so, it collects them into the dictionary of pipelines.
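The idea of discovery by variable name can be sketched with the standard library alone. Everything here is an assumption made for illustration: `FakeNode` stands in for kedro's Node class, `discover` is a hypothetical helper, and the patterns are example values rather than find-kedro's defaults.

```python
import fnmatch
import types
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeNode:
    """Stand-in for kedro.pipeline.node.Node so the sketch runs without kedro."""
    name: str


def discover(module: types.ModuleType, patterns: list) -> list:
    """Collect module-level variables whose names match a glob pattern and
    whose values are nodes or lists/sets/tuples of nodes. Nothing is called."""
    found = []
    for var_name, value in vars(module).items():
        if not any(fnmatch.fnmatch(var_name, p) for p in patterns):
            continue
        if isinstance(value, FakeNode):
            found.append(value)
        elif isinstance(value, (list, set, tuple)):
            found.extend(v for v in value if isinstance(v, FakeNode))
    return found


# Build a throwaway module to search.
demo = types.ModuleType("demo")
demo.split_node = FakeNode("split_data")
demo.nodes = [FakeNode("train_model")]
demo.helper = FakeNode("not_collected")  # name matches no pattern

result = discover(demo, patterns=["*node*", "pipeline*"])
print([n.name for n in result])
```

Because the search only inspects names and types, importing the module is the only code that runs; the node functions themselves are never invoked.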

There are typically three ways to construct pipelines with find-kedro: lists, single nodes, and pipelines.

Lists #

Any pattern-matched list will be flattened and collected into the pipeline. Nodes can be created all at once in the list definition.

# my-proj/pipelines/data_engineering/pipeline
from kedro.pipeline import node
from .nodes import split_data

pipeline = [
    node(
        split_data,
        ["example_iris_data", "params:example_test_data_ratio"],
        dict(
            train_x="example_train_x",
            train_y="example_train_y",
            test_x="example_test_x",
            test_y="example_test_y",
        ),
    )
]

It is often convenient to keep the node definition close to the function definition. I frequently define the list at the top of the file, then append to it as I go.

# my-proj/pipelines/data_engineering/pipeline
from kedro.pipeline import node
from .nodes import split_data

nodes = []
nodes.append(
    node(
        split_data,
        ["example_iris_data", "params:example_test_data_ratio"],
        dict(
            train_x="example_train_x",
            train_y="example_train_y",
            test_x="example_test_x",
            test_y="example_test_y",
        ),
    )
)
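Flattening means that nested lists end up as a single flat sequence of nodes. The sketch below shows one way this could work; `flatten` is a hypothetical helper, and strings stand in for kedro nodes so the example runs without kedro.

```python
def flatten(items):
    """Yield leaf items from arbitrarily nested lists, sets, and tuples."""
    for item in items:
        if isinstance(item, (list, set, tuple)):
            yield from flatten(item)
        else:
            yield item


# Strings stand in for kedro nodes.
nested = ["split_data", ["train_model", ("predict", ["report_accuracy"])]]
flat = list(flatten(nested))
print(flat)
```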

Nodes #

All pattern-matched kedro.pipeline.node.Node objects will be collected into the pipeline.

# my-proj/pipelines/data_engineering/pipeline
from kedro.pipeline import node
from .nodes import split_data

split_node = node(
    split_data,
    ["example_iris_data", "params:example_test_data_ratio"],
    dict(
        train_x="example_train_x",
        train_y="example_train_y",
        test_x="example_test_x",
        test_y="example_test_y",
    ),
)

Pipeline #

All pattern-matched kedro.pipeline.Pipeline objects will be collected into the pipeline.

# my-proj/pipelines/data_engineering/pipeline
from kedro.pipeline import node, Pipeline
from .nodes import split_data

split_node = Pipeline(
    [
        node(
            split_data,
            ["example_iris_data", "params:example_test_data_ratio"],
            dict(
                train_x="example_train_x",
                train_y="example_train_y",
                test_x="example_test_x",
                test_y="example_test_y",
            ),
        )
    ]
)

Fully Qualified imports #

When using fully qualified imports (from my_proj.pipelines.data_science.nodes import split_data) instead of relative imports (from .nodes import split_data), you will need to make sure that your project is installed or in your current path, or that you set the directory.
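If the project is not installed, one way to put it on the import path is to add its source directory to sys.path before discovery runs. This is a minimal sketch under an assumed layout (the package living in a src/ folder under the current working directory); adjust the path to match your project.

```python
import sys
from pathlib import Path

# Hypothetical layout: the package lives in <current directory>/src/my_proj.
src_dir = Path.cwd() / "src"

# Put src/ at the front of the import path so that
# `from my_proj.pipelines.data_science.nodes import split_data` can resolve.
if str(src_dir) not in sys.path:
    sys.path.insert(0, str(src_dir))
```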

CLI #

The CLI provides a handy interface to search your project for nodes.

Usage: find-kedro [OPTIONS]

Options:
  --file-patterns TEXT       glob-style file patterns for Python node module
                             discovery

  --patterns TEXT            prefixes or glob names for Python pipeline, node,
                             or list object discovery

  -d, --directory DIRECTORY  Path to save the static site to
  --version                  Prints version and exits
  -v, --verbose              Prints extra information for debugging
  --help                     Show this message and exit.

Example run against a slightly modified default kedro new project:

❯ find-kedro
{
  "__default__": [
    "split_data([example_iris_data,params:example_test_data_ratio]) -> [example_test_x,example_test_y,example_train_x,example_train_y]",
    "train_model([example_train_x,example_train_y,parameters]) -> [example_model]",
    "predict([example_model,example_test_x]) -> [example_predictions]",
    "report_accuracy([example_predictions,example_test_y]) -> None"
  ],
  "src.default_kedro_159.pipelines.data_engineering.pipeline": [
    "split_data([example_iris_data,params:example_test_data_ratio]) -> [example_test_x,example_test_y,example_train_x,example_train_y]"
  ],
  "src.default_kedro_159.pipelines.data_science.pipeline": [
    "train_model([example_train_x,example_train_y,parameters]) -> [example_model]",
    "predict([example_model,example_test_x]) -> [example_predictions]",
    "report_accuracy([example_predictions,example_test_y]) -> None"
  ]
}
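The output above looks like JSON, so other tools can consume it. The sketch below assumes the CLI prints valid JSON, as the example suggests, and parses a shortened copy of that output to list the function name behind each node.

```python
import json

# A shortened copy of the CLI output shown above.
cli_output = """
{
  "__default__": [
    "split_data([example_iris_data,params:example_test_data_ratio]) -> [example_test_x,example_test_y,example_train_x,example_train_y]",
    "train_model([example_train_x,example_train_y,parameters]) -> [example_model]"
  ],
  "src.default_kedro_159.pipelines.data_engineering.pipeline": [
    "split_data([example_iris_data,params:example_test_data_ratio]) -> [example_test_x,example_test_y,example_train_x,example_train_y]"
  ]
}
"""

pipelines = json.loads(cli_output)

# Map each pipeline name to the function names of its nodes.
summary = {
    name: [node.split("(", 1)[0] for node in nodes]
    for name, nodes in pipelines.items()
}
print(summary)
```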

Contributing #

You’re awesome for considering a contribution! Contributions are welcome; please check out the Contributing Guide for more information. Please be a positive member of the community and embrace feedback.

Versioning #

We use SemVer for versioning. For the versions available, see the tags on this repository.

Authors #

- Waylon Walker - Original Author

- Zain Patel

License #

This project is licensed under the MIT License; see the LICENSE.md file for details.
