Posts tagged: python

All posts with the tag "python"

275 posts latest post 2026-03-31
Publishing rhythm
Feb 2026 | 1 posts

Kedro pipeline_registry.py

With the realease of kedro==0.17.2 came a new module in the project template pipeline_registry.py. Here are some notes that I learned while playing with this new module.

You should now have something that looks like this in your src/<package-name>/pipeline_registry.py.

"""Project pipelines.""" from typing import Dict from kedro.pipeline import Pipeline def register_pipelines() -> Dict[str, Pipeline]: """Register the project's pipelines. Returns: A mapping from a pipeline name to a ``Pipeline`` object. """ return {"__default__": Pipeline([])}

pipeline_registry only works in kedro>=0.17.2

...

🐍 Pluggable Architecture with Python

pytest has open sourced their amazing plugin framework pluggy, it allows library authors to give their users a way to modify the libaries behavior without needing to submit a change that may not make sense to the entire library.

My experience so far as a plugin user, and plugin author has been great. Building and using plugins are incredibly intuitive. I wanted to dive a bit deeper and see how they are implemented inside of a library and its a bit of a mind bend the first time you try to do it.

A hook is a single function that has a specific place that it is ran by the PluginManager.

...

4 min read

⚙ How Python Tools Are Configured

There are various ways to configure python tools, config files, code, or environment variables. Let’s look at a few projects that allow users to configure them through the use of config files and how they do it.

This will not include how they are implemented, I’ve looked at a few and its not simple. This will focus on where config is placed and the order in which duplicates are resolved.

The motivation of this article is to serve as a bit of a reference guide for those who may want to create their own package that needs configuration.

...

5 min read

Minimal Kedro Pipeline

How small can a minimum kedro pipeline ready to package be? I made one within 4 files that you can pip install. It’s only a total of 35 lines of python, 8 in setup.py and 27 in mini_kedro_pipeline.py.

📝 Note this is only a composable pipeline, not a full project, it does not contain a catalog or runner.

I have everything for this post hosted in this gihub repo, you can fork it, clone it, or just follow along.

...

Markdown Cli

This is a post that may be a work in progress for awhile, Its a collections of thoughts on managing my blog, but could be translated into anythiung that is just a collection of markdown.

Kedro - My Data Is Not A Table

In python data science/engineering most of our data is in the form of some sort of table, typically a DataFrame from a library like pandas, spark, or dask.

These containers for data contain many convenient methods to manipulate table like data structures. Sometimes we leverage other data types, namely vanilla types like lists and dicts, or even numpy data types.

What is Kedro

...

Quickly Change Conda Env With Fzf

Changing conda environments is a bit verbose, I use a function with fzf that both lists environments and selects the one I want in one go.

I have used conda as a virtual environment tool for years now. I started using conda for its simplicity to install packages on windows, but now that has gotten so much better and it’s been years since I have run a conda install command. I’m sure that I could use a different environment manager, but it works for me and makes sense.

What environment manager do you use for python?

...

3 min read

Minimal Python Package

What does it take to create an installable python package that can be hosted on pypi?

This post is somewhat inspired by the bottle framework, which is famously created as a single python module. Yes, a whole web framework is written in one file.

. ├── setup.py └── my_pipeline.py

setup.py #

from setuptools import setup setup( name="", version="0.1.0", py_modules=["my_pipeline", ], install_requires=["kedro"], ) 

name #

The name of the package can contain any letters, numbers, “_”, or “-”. Even if it’s for internal/personal consumption only I usually check for discrepancy with pypi so that you don’t run into conflicts.

...

2 min read

Ipython-Config

I use my ipython terminal daily. It’s my go to way of running python most of the time. After you use it for a little bit you will probably want to setup a bit of your own configuration.

Activate your virtual environment of choice and pip install it. Any time you are running your project in a virtual environment, you will need to install ipython inside it to access those packages from ipython.

pip install ipython

You are using a virtual environment right? Virtual environments like venv or conda can save you a ton of pain down the road.

...

2 min read

Custom Ipython Prompt

I’ve grown tired of the standard ipython prompt as it doesn’t do much to give me any useful information. The default one gives out a line number that only seems to add anxiety as I am working on a simple problem and see that number grow to several hundred. I start to question my ability 🤦‍♂️.

If you already have an ipython config you can move on otherwise check out this post on creating an ipython config.

Ipython-Config

...

3 min read

Ipython Ninjitsu

Stop going to google everytime your stuck and stay in your workflow. The ipython ? is a superhero for productivity and staying on task.

from kedro.pipeline import Pipeline Pipeline? Init signature: Pipeline( nodes: Iterable[Union[kedro.pipeline.node.Node, ForwardRef('Pipeline')]], *, tags: Union[str, Iterable[str]] = None, ) Docstring: A ``Pipeline`` defined as a collection of ``Node`` objects. This class treats nodes as part of a graph representation and provides inputs, outputs and execution order. Init docstring: Initialise ``Pipeline`` with a list of ``Node`` instances. Args: nodes: The iterable of nodes the ``Pipeline`` will be made of. If you provide pipelines among the list of nodes, those pipelines will be expanded and all their nodes will become part of this new pipeline. tags: Optional set of tags to be applied to all the pipeline nodes. Raises: ValueError: When an empty list of nodes is provided, or when not all nodes have unique names....

...

Automating my Post Starter

One thing we all dread is mundane work of getting started, and all the hoops it takes to get going. This year I want to post more often and I am taking some steps towards making it easier for myself to just get started.

When I start a new post I need to cd into my blog directory, start neovim in a markdown file with a clever name, copy some frontmatter boilerplate, update the post date, add tags, a description, and a cover.

hot and fast

...

6 min read 💬 1

Windowing Python Lists

In python data science we often will reach for pandas a bit more than necessary. While pandas can save us so much there are times where there are alternatives that are much simpler. The itertoolsandmore-itertools` are full of cases of this.

This post is a walkthrough of me solving a problem with more-itertools rather than reaching for a for loop, or pandas.

I am working on a one-line-link expander for my blog. I ended up doing it, just by modifying the markdown with python. I first split the post into lines with content.split('\n'), then look to see if the line appears to be just a link. One more safety net that I wanted to add was to check if there was whitespace around the line, this could not simply be done in a list comprehension by itself. I need just a bit of knowledge of the surrounding lines, enter more-itertools.

...

1 min read

Testing Data Pipelines

Lint/Format/Doc black flake8 interrogate mypy Pipeline Assertions pipeline constructs pipeline as expected nodes pipeline has minimum nodes test minimum tags test alternate tags Catalog Assertions test catalog follows naming structure Node Tests test function does the correct operations on test data Great Expectations

reasons-to-kedro

There are many reasons that you should be using kedro. If you are on a team of Data Scientists/Data Engineers processing DataFrames from many data sources should be considering a pipeline framework. Kedro is a great option that provides many benefits for teams to collaborate, develop, and deploy data pipelines

What is Kedro

Kedro makes it super easy to get started with their cli that utilizes cookiecutter under the hood.

...

What's New in Kedro 0.16.6

Kedro 0.16.6 is out! Let’s take a look through the release notes

This is really exciting to see more deployment options coming from the kedro team. It really shows the power of the framework. The power of some of these orchestrations options is incredible.

Most of them hinge on a sweet combination of the kedro cli, docker image, and the pipeline knowing your nodes dependencies.

...

Designing a "Router" for kedro

I released a router-like plugin for kedro back in April 2020. This was not the first design, the idea actually came from one of the QB folks who taught me kedro nearly a year before. We were assembling our pipelines with something called nodes_global. It worked fairly well but did have some issues around being set as a global variable.

But…

One thing in particular that it did not lend itself well to was being able to create a packagable pipeline that I could pip install and append into any of my existing pipelines. Something I am still trying to work out, maybe I don’t need this. I think I have it working for our internal pipelines and it seems like the way to go, but we don’t necessarily end up using it.

...

4 min read

Reclaim memory usage in Jupyter

Today I ran into an issue where we had a one-off script that just needed to work, but it was just chewing threw memory like nothing.

It started with a colleague asking me How do I clear the memory in a Jupyter notebook, these are the steps we took to debug the issue and free up some memory in their notebook.

How do I clear the memory in a Jupyter notebook?

...

3 min read