Posts tagged: data

All posts with the tag "data"

70 posts, latest post 2025-06-09

Kedro Install

Kedro comes with an install command to install and manage all of your project's dependencies.

https://youtu.be/IWimEs-hHQg

You must start with a kedro project, either cloned from an existing project or created with kedro new. Then activate your environment.

...

Kedro Git Init

Immediately after kedro new, before you run kedro install or write your first line of code, the first thing you should always do is git init.

https://youtu.be/IGba3ytf_6U

It's as simple as these three commands to get started.

...

Kedro New

https://youtu.be/uqiv5LAiJe0

Kedro new is simply a wrapper around the cookiecutter templating library. The kedro team maintains a ready-made template that has everything you need for a kedro project. They also maintain a few kedro starters, which are very similar to the base template.

What is Kedro

...

What is Kedro

Kedro is an unopinionated Data Engineering framework that comes with a somewhat opinionated template. It gives the user a way to build pipelines that automatically take care of I/O through the use of abstract DataSets that the user specifies through Catalog entries. These Catalog entries are loaded, run through a function, and saved by Nodes. The order in which these Nodes are executed is determined by the Pipeline, which is a DAG. It’s the runner’s job to manage the execution of the Nodes.
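To make the moving parts a bit more concrete, here is a toy sketch in plain Python. It is not kedro's API: the Node class, the run function, and the dict standing in for the Catalog are all hypothetical names made up for illustration. It only shows the idea of nodes that load inputs, run a function, and save an output, with the execution order worked out from a DAG.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List, NamedTuple


class Node(NamedTuple):
    """Hypothetical stand-in: a function plus the dataset names it reads and writes."""

    func: Callable[..., Any]
    inputs: List[str]
    output: str


def run(nodes: List[Node], catalog: Dict[str, Any]) -> Dict[str, Any]:
    """Toy sequential runner: order nodes by data dependencies, then execute them."""
    producer = {n.output: n for n in nodes}
    # Each node depends on whichever nodes produce its inputs.
    graph = {
        n.output: {name for name in n.inputs if name in producer}
        for n in nodes
    }
    for output in TopologicalSorter(graph).static_order():
        n = producer[output]
        args = [catalog[name] for name in n.inputs]  # "load" from the catalog
        catalog[output] = n.func(*args)  # run the function, then "save"
    return catalog


def clean(raw: List[int]) -> List[int]:
    return [x for x in raw if x >= 0]


def total(cleaned: List[int]) -> int:
    return sum(cleaned)


# Nodes are listed out of order on purpose; the DAG decides the execution order.
nodes = [
    Node(total, ["cleaned"], "total"),
    Node(clean, ["raw"], "cleaned"),
]
catalog = run(nodes, {"raw": [3, -1, 4]})
print(catalog["total"])  # prints 7
```

In real kedro the catalog entries would point at files or databases rather than values held in a dict, but the division of labour is the same.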

https://youtu.be/Wf4rnFsaFFU

...

How I Kedro

https://youtu.be/bw5_FWDVRpU

I recently switched over to using Ubuntu, and it works well pretty much out of the box for me. I am using gnome with a dark theme.

I am still using the built-in default gnome terminal because it just works. It does all the things that I need it to do. It supports transparency, renders my fonts, and allows me to highlight things well.

...

Incremental Versioned Datasets in Kedro

Kedro versioned datasets can be mixed with incremental and partitioned datasets to do some time-series analysis on how our dataset changes over time. Kedro is a very extensible and composable framework that allows us to build solutions from the individual components that it provides. This article is a great example of how you can combine these components in unique ways to achieve some powerful results with very little work.

What is Kedro

👆 Unsure what kedro is? Check out this post.

...

Creating pypi-list with kedro

I had an idea come to me via Twitter. Short, one-word package names are becoming hard to find on PyPI. Short, one-word, readable package names that are not a play on words are easy to remember, easy to spell correctly, and quick to type out.

I started with the simple index. PyPI provides a single-page listing of every package it hosts via the simple index.
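As a rough sketch of how that listing could be mined, the snippet below pulls the link text out of a simple-index style page and keeps only short, purely alphabetic names. The SAMPLE page is a made-up stand-in for the real listing, and the filter rule (letters only, at most max_len characters) is my own assumption about what "short one word" might mean, not necessarily what the project uses.

```python
from html.parser import HTMLParser
from typing import List


class AnchorTextParser(HTMLParser):
    """Collect the text of every <a> element on the page."""

    def __init__(self) -> None:
        super().__init__()
        self._in_anchor = False
        self.names: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_anchor = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_anchor = False

    def handle_data(self, data):
        if self._in_anchor and data.strip():
            self.names.append(data.strip())


def short_word_names(html: str, max_len: int = 6) -> List[str]:
    """Return package names that are letters only and at most max_len long."""
    parser = AnchorTextParser()
    parser.feed(html)
    return [n for n in parser.names if n.isalpha() and len(n) <= max_len]


# Hypothetical sample mimicking the one-anchor-per-package layout of the index.
SAMPLE = """
<html><body>
<a href="/simple/kedro/">kedro</a>
<a href="/simple/python-dateutil/">python-dateutil</a>
<a href="/simple/requests/">requests</a>
<a href="/simple/py2neo/">py2neo</a>
</body></html>
"""
print(short_word_names(SAMPLE))  # prints ['kedro']
```

Working from a saved copy of the page rather than fetching it on every run keeps the experiment fast and avoids hammering the index.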

Using Kedro In Scripts

With the latest releases of kedro 0.17.x, it is now possible to run kedro pipelines from within scripts. While I would not start a project with this technique, it will be a good tool to keep in my back pocket when I want to sprinkle in a bit of kedro goodness in existing projects.

What is Kedro

If you're just learning about kedro, check out this post walking through it.

...

Silence Kedro Logs

Kedro can have a chatty logger. While this is super nice in production to see everything that happened during a pipeline run, it can be troublesome while trying to implement a CLI extension with clean output.

First, how does one silence a python log? Python loggers can be retrieved by the logging module’s getLogger function. Then their log level can be changed. Much of kedro’s chattiness comes from INFO level logs. I don’t want to hear about anything for my current use case unless it’s essential, i.e., a failure. In this case, I set the log levels to ERROR as most errors should stop execution anyways.

Getting a python logger is straightforward if we know the name of the logger. The following block will grab the logger object for the logger currently registered under the name passed in.
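A minimal sketch of what such a block might look like, using only the standard logging module. The helper name silence_logger is hypothetical, and the logger name "kedro" is an assumption: it relies on kedro's modules creating their loggers under the "kedro." namespace so that they inherit the parent's level.

```python
import logging


def silence_logger(name: str, level: int = logging.ERROR) -> logging.Logger:
    """Grab the logger registered under `name` and raise its log level."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    return logger


# Assumption: kedro's loggers are children of the "kedro" logger.
kedro_logger = silence_logger("kedro")

# Child loggers with no level of their own inherit the parent's level.
child = logging.getLogger("kedro.io")
print(child.isEnabledFor(logging.INFO))   # prints False
print(child.isEnabledFor(logging.ERROR))  # prints True
```

If a project's logging config sets an explicit level on a child logger, that child would need to be silenced by name as well.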

...

Vim Fugitive

    :G
    :G status
    :G commit
    :G add %
    :Gdiff
    :G push
    :Glog

Add current file and commit with diff in a split

    function! s:GitAdd()
        exe "G add %"
        exe "G diff --staged"
        exe "only"
        exe "G commit"
    endfunction
    :command! GitAdd :call s:GitAdd()
    nnoremap gic :GitAdd<CR>

:on[ly]

C-W o

:on[ly] will make the current buffer the only one on the screen. This is super helpful as many of fugitive's commands will open in a split by default.

cycle through the jumplist

...

Zev Averbach Interview

Zev Averbach, Frustrated spreadsheet jockey to software developer at 36

Q: Tell me about your journey from spreadsheet jockey into Data Engineering?

A: First of all, it’s hilarious that I accidentally found your questions for this interview by Googling myself. 😊

...

Kedro pipeline_registry.py

With the release of kedro==0.17.2 came a new module in the project template, pipeline_registry.py. Here are some things I learned while playing with this new module.

You should now have something that looks like this in your src/<package-name>/pipeline_registry.py.

    """Project pipelines."""
    from typing import Dict

    from kedro.pipeline import Pipeline


    def register_pipelines() -> Dict[str, Pipeline]:
        """Register the project's pipelines.

        Returns:
            A mapping from a pipeline name to a ``Pipeline`` object.
        """
        return {"__default__": Pipeline([])}

pipeline_registry only works in kedro>=0.17.2

...

Minimal Kedro Pipeline

How small can a minimal, ready-to-package kedro pipeline be? I made one in 4 files that you can pip install. It’s only 35 lines of Python in total: 8 in setup.py and 27 in mini_kedro_pipeline.py.

📝 Note: this is only a composable pipeline, not a full project; it does not contain a catalog or runner.

I have everything for this post hosted in this GitHub repo; you can fork it, clone it, or just follow along.

...