Posts tagged: kedro

All posts with the tag "kedro"

40 posts latest post 2025-02-05

Kedro rich is a very new and unstable (it’s good, just not ready) plugin for kedro to make the command line prettier.

There is no pypi package yet, but it’s on github. You can pip install it with the git url.

pip install git+https://github.com/datajoely/kedro-rich

Kedro run

You can run your pipeline just as you normally would, except you get progress bars and pretty prints.

kedro run

kedro rich pretty run

Kedro catalog

Listing catalog entries from the command line now prints a nice, pretty table.

...

I keep my nodes short and sweet. They do one thing and do it well. I turn almost every DataFrame transformation into its own node. It makes it much easier to pull catalog entries than firing up the pipeline, running it, and starting a debugger. For this reason many of my nodes can be built from inline lambdas.

Here are two examples. The first one, lambda x: x, is sometimes referred to as an identity function. This is super common to use in the early phases of a project. It lets you follow standard layering conventions without skipping a layer or overthinking whether you should have the layer, and it leaves a good placeholder to fill in later when you need it.

Many times I just want to get the data in as fast as possible, learn about it, then go back and tidy it up.

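from kedro.pipeline import node

# Identity node: passes raw_cars through unchanged as a placeholder for the int layer.
my_first_node = node(
    func=lambda x: x,
    inputs='raw_cars',
    outputs='int_cars',
    tags=['int'],
)

# Keep three columns and only the rows where disp is greater than 200.
my_second_node = node(
    func=lambda cars: cars[['mpg', 'cyl', 'disp']].query('disp>200'),
    inputs='int_cars',
    outputs='pri_cars',
    tags=['pri'],
)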
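One payoff of keeping node functions this small is that I can try them on a tiny DataFrame without running the pipeline at all. Here is a minimal sketch, assuming pandas is installed. The sample rows and the extra hp column are made up for illustration and do not come from any real catalog entry.

```python
import pandas as pd

# Hypothetical stand-in for a cars dataset; the values are made up.
raw_cars = pd.DataFrame(
    {
        "mpg": [21.0, 22.8, 18.7, 14.3],
        "cyl": [6, 4, 8, 8],
        "disp": [160.0, 108.0, 360.0, 360.0],
        "hp": [110, 93, 175, 245],
    }
)

# The same two lambdas that the nodes wrap.
identity = lambda x: x
select_large_disp = lambda cars: cars[["mpg", "cyl", "disp"]].query("disp>200")

# The identity function hands back the very same object.
int_cars = identity(raw_cars)

# The second lambda drops the hp column and the rows with disp <= 200.
pri_cars = select_large_disp(int_cars)

print(pri_cars)
```

Because the functions are plain callables, nothing here depends on kedro itself, which is what makes them quick to poke at in a REPL or a notebook.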

Note: try not to take the idea...

As you work on your kedro projects you are bound to need to add more dependencies to the project eventually. Kedro uses a fantastic command, pip-compile, under the hood to ensure that everyone is on the same version of packages at all times and can easily upgrade them. It might be a bit different from the workflow you have seen, so let’s take a look at it.

Before you start mucking around with any changes to dependencies, make sure that your git status is clean. I’d even recommend starting a new branch for this, and if you are working on a team, potentially submitting it as its own PR for clarity.

git status
git checkout main
git checkout -b add-rich-dependency

requirements.in

New requirements get added to a requirements.in file. If you need to specify an exact version, or a minimum version you can do that, but if all versions generally work you can leave it open.

# requirements.in
rich

Here I added the popular rich package to my requirements.in file. Since I am ok with the latest version I am not going to pin anything,...

...

I am a huge believer in practicing your craft. Professional athletes spend most of their time honing their skills and making themselves better. In engineering, many spend nearly zero time practicing. I am not saying that you need to spend all your free time practicing, but a few minutes trying new things can go a long way in how you understand what you are doing and make a huge impact on your long-term productivity.

What is Kedro

practice building pipelines with #kedro today

Go to your playground directory, and if you don’t have one, make one.

...

I just installed a brand new Ubuntu 21.10 Impish Indri and wanted a kedro project to play with, so I did what any good kedroid would do: I went to my command line and ran

pipx run kedro new --starter spaceflights

But what I got back was not what I expected!

Fatal error from pip prevented installation. Full pip output in file:
    /home/walkers/.local/pipx/logs/cmd_2022-01-01_20.42.16_pip_errors.log

Some possibly relevant errors from pip install:
    ERROR: Could not find a version that satisfies the requirement kedro (from versions: none)
    ERROR: No matching distribution found for kedro

Error installing kedro.

This is weird. Why can’t I run kedro new with pipx? Let’s try pip.

pip install kedro

Same issue.

...

Kedro-Broken-Urls

Broken Urls

https://github.com/josephhaaga)
[ ] https://example.com/file.h5
https://raw.githubusercontent.com/kedro-org/kedro/develop/static/img/pipeline_visualisation.png
https://example.com/file.txt
https://github.com/jmespath/jmespath.py.
https://github.com/tsanikgr)
https://example.com/file.csv
https://kedro.readthedocs.io/en/latest/04_user_guide/15_hooks.html
https://kedro.readthedocs.io/en/stable/07_extend_kedro/04_hooks.html
https://github.com/EbookFoundation/free-programming-books/blob/master/books/free-programming-books.md#python
https://github.com/quantumblacklabs/private-kedro/blob/develop/docs/source/04_user_guide/04_data_catalog.md
http://example.com/api/test
https://example.com/file.parquet
https://kedro.readthedocs.io/en/stable/11_faq/01_faq.html#how-do-i-upgrade-kedro
https://example.com/file.xlsx
https://www.datacamp.com/community/tutorials/docstrings-python
https://github.com/mmchougule)
https://example.com/file.tf...

kedro Virtual Environment

Avoid serious version conflicts by using a virtual environment anytime you are running python. Here are three ways you can set up a kedro virtual environment.

https://youtu.be/ZSxc5VVCBhM

I prefer conda as my virtual environment manager because it gives me both the interpreter and the packages I install. I don’t have to rely on the system version of python or another tool to maintain python versions at all; I get everything in one tool.

...

Kedro Pipeline Create

Kedro pipeline create is a command that makes creating new pipelines much easier. There is much less boilerplate that you need to write yourself.

https://youtu.be/HtyIKqlEoNw

The kedro cli comes with the following command to scaffold out new pipelines. Note that it will not add the new pipeline to your pipeline_registry (to be covered later); you will need to add it yourself.

...

Kedro Install

Kedro comes with an install command to install and manage all of your project’s dependencies.

https://youtu.be/IWimEs-hHQg

You must start by having your kedro project either cloned down from an existing project or created from kedro new. Then activate your environment.

...

Kedro Git Init

Immediately after kedro new, before you run kedro install or write your first line of code, the first thing you should always do is git init.

https://youtu.be/IGba3ytf_6U

It’s as simple as these three commands to get started.

...

Kedro New

https://youtu.be/uqiv5LAiJe0

Kedro new is simply a wrapper around the cookiecutter templating library. The kedro team maintains a ready-made template that has everything you need for a kedro project. They also maintain a few kedro starters, which are very similar to the base template.

What is Kedro

...

What is Kedro

Kedro is an unopinionated Data Engineering framework that comes with a somewhat opinionated template. It gives the user a way to build pipelines that automatically take care of io through the use of abstract DataSets that the user specifies through Catalog entries. These Catalog entries are loaded, run through a function, and saved by Nodes. The order in which these Nodes are executed is determined by the Pipeline, which is a DAG. It’s the runner’s job to manage the execution of the Nodes.
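To make that flow concrete, here is a toy sketch in plain Python. It is not Kedro's API and not how Kedro is implemented: the dict catalog, the (function, input, output) tuples, and the run helper are all hypothetical stand-ins, and I am assuming Python 3.9+ so that graphlib is available in the standard library.

```python
from graphlib import TopologicalSorter

# Toy catalog: dataset name -> data. A stand-in, not Kedro's DataCatalog.
catalog = {"raw_numbers": [1, 2, 3, 4]}


def double(xs):
    return [x * 2 for x in xs]


def total(xs):
    return sum(xs)


# Toy nodes as (function, input name, output name), listed out of order on purpose.
nodes = [
    (total, "doubled", "total"),
    (double, "raw_numbers", "doubled"),
]


def run(nodes, catalog):
    """Toy runner: order the nodes as a DAG, then load, run, and save each one."""
    # Which node produces each dataset.
    producer = {out: i for i, (_, _, out) in enumerate(nodes)}
    # Map each node to the set of nodes that must run before it.
    graph = {
        i: {producer[inp]} if inp in producer else set()
        for i, (_, inp, _) in enumerate(nodes)
    }
    order = list(TopologicalSorter(graph).static_order())
    for i in order:
        func, inp, out = nodes[i]
        catalog[out] = func(catalog[inp])  # load the input, run the function, save the output
    return order


order = run(nodes, catalog)
print(order, catalog["total"])
```

Even though total is listed first, the runner executes double before it, because the ordering comes from the data dependencies rather than from the order the nodes were written in.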

https://youtu.be/Wf4rnFsaFFU

...

How I Kedro

https://youtu.be/bw5_FWDVRpU

I recently switched over to using Ubuntu, and it works well pretty much out of the box for me. I am using gnome with a dark theme.

I am still using the built-in default gnome terminal; it just works. It does all the things that I need it to do. It supports transparency, renders my fonts, and allows me to highlight things well.

...


Incremental Versioned Datasets in Kedro

Kedro versioned datasets can be mixed with incremental and partitioned datasets to do some timeseries analysis on how our dataset changes over time. Kedro is a very extensible and composable framework that allows us to build solutions from the individual components that it provides. This article is a great example of how you can combine these components in unique ways to achieve some powerful results with very little work.

What is Kedro

👆 Unsure what kedro is? Check out this post.

...