mingrammer [1] has done a fantastic job with diagrams [2]. Highly recommend taking a look.
🎨 Diagram as Code for prototyping cloud system architectures
References:
[1]: https://github.com/mingrammer
[2]: https://github.com/mingrammer/diagrams
Archive
All published posts
2469 posts
latest post 2026-05-08
Just starred svelte-actions [1] by swyxio [2]. It’s an exciting project with a lot to offer.
prototype official actions for Svelte
References:
[1]: https://github.com/swyxio/svelte-actions
[2]: https://github.com/swyxio
Codeit Bro Interview
[1]
use this profile image
Please share your professional role as a data scientist? [Also feel free to
share about your personal projects, publications, etc.]
I graduated with a Mechanical Engineering Degree 8 years ago. Much of my work
early in my career [2] centered
on analyzing large datasets for my group to understand quality, drive
changes to improve quality, or prove that quality was already good.
[3]
Three years ago I made the switch to Data Science and have loved every minute of
it. It is a very dynamic field that is continually changing, and there is
always a new set of skills to learn and hone. I talk a lot about the
mindset of always learning, sharing knowledge, and communicating in my
newsletter [4].
What are the most difficult challenges you faced as a data scientist and how
you resolved them?
Deployment is a high bar to enter. Jupyter notebooks provide a suspiciously simple start into Data Science. Folks with very little coding experience can easily ...
reasons-to-kedro
There are many reasons that you should be using kedro. If you are on a team of
Data Scientists/Data Engineers processing DataFrames from many data sources,
you should be considering a pipeline framework. Kedro is a great option that
provides many benefits for teams to collaborate, develop, and deploy data
pipelines.
What is Kedro [1]
Starter Template # [2]
Kedro makes it super easy to get started with their cli that utilizes
cookiecutter under the hood.
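conda create -n my-new-project -y python=3.8
conda activate my-new-project  # switch into the new environment
pip install kedro              # the kedro cli is not available until kedro is installed
kedro new                      # prompts for project details, then scaffolds the project
# cd into the generated project directory before the next two commands
kedro install
kedro run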
Create New Kedro Project [3]
read more about how to start your first kedro project here
Collaboration # [4]
Kedro provides many tools that help teams collaborate on a single codebase.
When writing monolithic scripts it can be easy to paint yourself into a corner
where it is difficult to have multiple people making changes to the
notebook/script at the same time. Kedro helps guide your team to break your
project down into small pieces that different members o...
Reasons to Kedro
Reasons to Kedro # [1]
- collaboration
- Sharable catalog
- small nodes over monolithic notebooks
- catalog
- easily load anything without needing to run
- No need to write read/write code
- pipeline
- No need to keep execution order in your head
- easily run a slice of a pipeline
- plugins
- pip install
- make your own
- hooks
- flexible expandable cli
Reasons Not to Kedro # [2]
- Already utilizing another DAG framework
- Data is not in a widely supported format
- Micro short-lived project
- Large Project / Deadline
- Use a lower profile project to learn first
- Team not willing to change
- Need minimal dependencies
- God Project - kedro owns everything??
References:
[1]: #reasons-to-kedro
[2]: #reasons-not-to-kedro
Just starred Second-Brain [1] by KasperZutterman [2]. It’s an exciting project with a lot to offer.
A curated list of awesome Public Zettelkastens 🗄️ / Second Brains 🧠 / Digital Gardens 🌱
References:
[1]: https://github.com/KasperZutterman/Second-Brain
[2]: https://github.com/KasperZutterman
Reading List
Latest Post # [1]
latest [2]
STOP LEAVING Browser Tabs open and save them here!
- https://nesbitt.io/2026/03/04/package-managers-need-to-cool-down.html
- https://mariozechner.at/posts/2026-03-25-thoughts-on-slowing-the-fuck-down/
- https://danielmiessler.com/blog/ai-stops-being-artificially-cheap
---
- jbrancha til [3]
- The Video Course Launch that Made Me Think [4]
- photo prism [5]
- box python library [6]
- kedro on hn [7]
- How can a Data Scientist refactor Jupyter notebooks towards production-quality code? [8]
- Sourcing vs executing in Bash [9]
- Should We Follow The Open-Closed Principle? [10]
- Create multi-dimensional arrays in pure Python: The Correct Way [11]
- Beware of These 9 Red Flags in a Developer Interview [12]
- How to Overcome Impostor Syndrome as a Developer [13]
- lazy load youtube videos [14]
- lite youtube embeds [15]
- full subtitle youtube search [16]
---
- Jungle Scout - Kedro Case Study [17]
- Kedro Sessions [18]
- Julia Evans - A...
Just starred Repo-Roster [1] by nastyox [2]. It’s an exciting project with a lot to offer.
Shout-out supporters in your GitHub README file.
References:
[1]: https://github.com/nastyox/Repo-Roster
[2]: https://github.com/nastyox
What's New in Kedro 0.16.6
Kedro 0.16.6 [1] is out! Let’s take a look through the release notes.
Deployment Docs # [2]
It is really exciting to see more deployment options coming from the kedro team. It really shows the power of the framework, and the power of some of these orchestration options is incredible.
- Argo [3]
- Prefect [4]
- Kubeflow [5]
- Batch [6]
- SageMaker [7]
Most of them hinge on a sweet combination of the kedro cli, a docker image, and the pipeline knowing your nodes' dependencies.
Argo, Prefect, and Kubeflow have an interesting technique where they translate the pipeline and its dependencies from kedro into their own language.
Batch uses the aws cli to submit jobs, one node per job, and listens for them to complete. It will submit all nodes with completed dependencies at once, meaning that we can get some massive parallelization.
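To make the "submit everything whose dependencies are complete" idea concrete, here is a minimal Python sketch. It is not the code from the kedro deployment docs and it does not call AWS; it only groups a hypothetical dependency graph into waves that could each be submitted at once. The node names and the `submission_waves` helper are made up for illustration, and it assumes Python 3.9+ for the standard library `graphlib` module.

```python
from graphlib import TopologicalSorter


def submission_waves(dependencies):
    """Group nodes into waves where every node's dependencies are already done."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        # Every node returned here has no unfinished dependencies,
        # so the whole group could be submitted in parallel.
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        # Simplification: pretend the entire wave finished before moving on.
        sorter.done(*ready)
    return waves


# Hypothetical pipeline: node name -> set of nodes it depends on
example = {
    "clean_a": {"load_a"},
    "clean_b": {"load_b"},
    "join": {"clean_a", "clean_b"},
    "report": {"join"},
}

waves = submission_waves(example)
print(waves)
```

A real scheduler would likely mark nodes done one at a time as each job completes rather than waiting for a full wave, which unlocks downstream nodes sooner; the wave version just keeps the sketch short.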
I did a quick and dirty test of one of these by simulating the technique in a bash script and saw a 40 hr pipeline finish in about 1 hour. I am excited to get thi...
mkdocs [1] by mkdocs [2] is a game-changer in its space. Excited to see how it evolves.
Project documentation with Markdown.
References:
[1]: https://github.com/mkdocs/mkdocs
[2]: https://github.com/mkdocs
A brain dump of stories
I started making stories as kind of a brain dump a few times per day and
posting them to
[LinkedIn](https://www.linkedin.com/in/waylonwalker/).
Here are the last 11 days of stories.
I store all the stories on my website with the hopes of doing something with
them on my own platform eventually. For now it makes it easy to make these
posts.
cd static/stories
ls | xargs -I {} echo ''
Stories 10-10-2020 - 10-21-2020 # [1]
[2]
[3]
[4]
[5]
[6]
[7]
[8]
[9]
[10]
[11]
[12]
[13]
[14]
[15]
References:
[1]: #stories-10-10-2020---10-21-2020
[2]: https://waylonwalker.com/stories/TIL-kedro-sorts-nodes.png
[3]: https://waylonwalker.com/stories/disable-base-pip.png
[4]: https://waylonwalker.com/stories/discovered-social-cards.png
[5]: https://waylonwalker.com/stories/find-kedro-de1-contributor.png
[6]: https://waylonwalker.com/stories/hacktoberfest-2020-kedro-538-tests-pass.png
[7]: https://waylonwalk...
Check out mmchougule [1] and their project kedro-grpc-server [2].
Kedro gRPC Server is a Kedro plugin that creates a gRPC server for triggering and monitoring pipeline runs using a general-purpose RPC framework gRPC
References:
[1]: https://github.com/mmchougule
[2]: https://github.com/mmchougule/kedro-grpc-server
Check out yetudada [1] and their project kedro-user-testing [2].
Discovery prototypes for user testing
References:
[1]: https://github.com/yetudada
[2]: https://github.com/yetudada/kedro-user-testing
The work on flynt [1] by ikamensh [2].
A tool to automatically convert old string literal formatting to f-strings
References:
[1]: https://github.com/ikamensh/flynt
[2]: https://github.com/ikamensh
charmbracelet [1] has done a fantastic job with glow [2]. Highly recommend taking a look.
Render markdown on the CLI, with pizzazz! 💅🏻
References:
[1]: https://github.com/charmbracelet
[2]: https://github.com/charmbracelet/glow
Check out mytechnotalent [1] and their project Python-For-Kids [2].
A FREE comprehensive online Python development tutorial FOR KIDS utilizing an official BBC micro:bit Development Board going step-by-step into the world of Python for microcontrollers.
References:
[1]: https://github.com/mytechnotalent
[2]: https://github.com/mytechnotalent/Python-For-Kids
I’m impressed by pycon_pybadge_2020 [1] from nnja [2].
Initial code for Microsoft’s PyBadge at PyCon 2020
References:
[1]: https://github.com/nnja/pycon_pybadge_2020
[2]: https://github.com/nnja
Fix git commit author
I was 20 commits into a Hacktoberfest PR when I suddenly realized that they all had my work email on them instead of my personal email 😱. This is the story of how I corrected my email address on 19 individual commits after already submitting the PR.
- Change the email for this repo [1]
- Prepare for rebasing [2]
- start the rebase [3]
- 🛠 Fix First wrong Commit [4]
- Fix all commits [5]
- Done [6]
- ReCap [7]
Change the email for this repo # [1]
stop the bleeding
Before anything else set the email correctly!
cd kedro
git config user.name "Waylon Walker"
git config user.email [email protected]
Prepare for rebasing # [2]
First thing is to find how many commits back this mistake goes. I opened up the git [8] log, and saw mine went back 19 commits. I rolled back 20 just to be sure.
$ git log
...
commit a355926b9d7ec4c05659adaa254beefbdb036332
Author: WaylonWalker <[email protected]>
Date: Sat Oct 17 10:28:59 2020 -0500
give name of function inside incorrect parameters erro...
I like muesli’s [1] project duf [2].
Disk Usage/Free Utility - a better ‘df’ alternative
References:
[1]: https://github.com/muesli
[2]: https://github.com/muesli/duf
Designing a "Router" for kedro
nodes_global # [1]
I released a router-like plugin for kedro back in April 2020. This was not the first design; the idea actually came from one of the QB folks who taught me kedro nearly a year before. We were assembling our pipelines with something called nodes_global. It worked fairly well but did have some issues around being set as a global variable.
But…
One thing in particular that it did not lend itself well to was creating a packageable pipeline that I could pip install and append to any of my existing pipelines. This is something I am still trying to work out; maybe I don’t need it. I think I have it working for our internal pipelines and it seems like the way to go, but we don’t necessarily end up using it.
Also…
With this pattern all of the nodes needed to be importable by the module containing nodes_global. I find that this becomes a big hurdle for new pipelines coming from jupyter to overcome and can be most infuriating when their nodes aren’t getting ran af...
I came across python_training [1] from AnkurDedania [2], and it’s packed with great features and ideas.
Intro to Python
References:
[1]: https://github.com/AnkurDedania/python_training
[2]: https://github.com/AnkurDedania
github [1] has done a fantastic job with renaming [2]. Highly recommend taking a look.
Guidance for changing the default branch name for GitHub repositories
References:
[1]: https://github.com/github
[2]: https://github.com/github/renaming
Reclaim memory usage in Jupyter
Today I ran into an issue where we had a one-off script that just needed to
work, but it was chewing through memory like nothing.
It started with a colleague asking me, "How do I clear the memory in a Jupyter
notebook?" These are the steps we took to debug the issue and free up some
memory in their notebook.
How do I clear the memory in a Jupyter notebook?
Pre check the status of memory # [1]
There are a number of ways that you can check the amount of memory on your
system. The easiest, though not necessarily my first go-to, is free… literally
free.
check for free space
$ free -h
             total       used       free     shared    buffers     cached
Mem:           15G        15G       150M         0B        59M       8.7G
Generally my first go-to is a bit more graphical and not available on a stock
system, but far more useful: htop. htop [2] is a
terminal process explorer that shows cpu usage, mem usage, and running
processes.
htop
sudo apt-get install htop # install it from your package repo
htop
[3]
First step throw more swap at it # [4]
Often be...
Strip Trailing Whitespace from Git projects
A common linting error thrown by various linters is for trailing whitespace. I
most often use flake8. I generally have
[pre-commit](https://waylonwalker.com/pre-commit-is-awesome)
hooks set up to strip this,
but sometimes I jump into a project without it, and
my editor lights up with errors. A simple fix is to run this one-liner.
One-Liner to strip whitespace # [1]
bash
git grep -I --name-only -z -e '' | xargs -0 sed -i -e 's/[ \t]\+\(\r\?\)$/\1/'
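For anyone who finds the sed expression hard to read, here is a rough Python sketch of the same substitution. It only covers the text transformation on a single string, not the `git grep` step that lists the non-binary tracked files, and the function name `strip_trailing_whitespace` is made up for illustration. Like the one-liner, it removes trailing spaces and tabs and keeps a carriage return if the line ends with one.

```python
import re

# Spaces/tabs at the end of a line, optionally followed by a carriage return.
# re.MULTILINE makes "$" match at the end of every line, not just the string.
_TRAILING = re.compile(r"[ \t]+(\r?)$", re.MULTILINE)


def strip_trailing_whitespace(text: str) -> str:
    """Drop trailing spaces and tabs on every line, preserving any \\r."""
    return _TRAILING.sub(r"\1", text)


sample = "a  \r\nb\t\nc"
print(repr(strip_trailing_whitespace(sample)))
```

Applying it to a project would mean reading each file, passing its contents through the function, and writing the result back, which is what `sed -i` does in place.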
[2]
read more about how pre-commit is awesome [3]
References:
[1]: #one-liner-to-strip-whitespace
[2]: https://waylonwalker.com/pre-commit-is-awesome
[3]: /pre-commit-is-awesome/
tpope [1] has done a fantastic job with vim-sleuth [2]. Highly recommend taking a look.
sleuth.vim: Heuristically set buffer options
References:
[1]: https://github.com/tpope
[2]: https://github.com/tpope/vim-sleuth
actions [1] has done a fantastic job with setup-python [2]. Highly recommend taking a look.
Set up your GitHub Actions workflow with a specific version of Python
References:
[1]: https://github.com/actions
[2]: https://github.com/actions/setup-python
I came across starter-workflows [1] from actions [2], and it’s packed with great features and ideas.
Accelerating new GitHub Actions workflows
References:
[1]: https://github.com/actions/starter-workflows
[2]: https://github.com/actions
checkout [1] by actions [2] is a game-changer in its space. Excited to see how it evolves.
Action for checking out a repo
References:
[1]: https://github.com/actions/checkout
[2]: https://github.com/actions
Looking for inspiration? dotfiles [1] by nicknisi [2].
vim, zsh, git [3], homebrew, neovim - my whole world
References:
[1]: https://github.com/nicknisi/dotfiles
[2]: https://github.com/nicknisi
[3]: /glossary/git/
Just starred zk [1] by sirupsen [2]. It’s an exciting project with a lot to offer.
Zettelkasten on the command-line 📚 🔍
References:
[1]: https://github.com/sirupsen/zk
[2]: https://github.com/sirupsen
The work on napkin-math [1] by sirupsen [2].
Techniques and numbers for estimating system’s performance from first-principles
References:
[1]: https://github.com/sirupsen/napkin-math
[2]: https://github.com/sirupsen
deepyaman [1] has done a fantastic job with kedro-accelerator [2]. Highly recommend taking a look.
Kedro-Accelerator speeds up pipelines by parallelizing I/O in the background.
References:
[1]: https://github.com/deepyaman
[2]: https://github.com/deepyaman/kedro-accelerator
Chrome Extensions I use
There are many useful chrome extensions out there. I probably have way too many installed; here are four that I am currently using.
This post was inspired by Chris over at daily-dev-tips [1]
- LastPass [2]
- Stylus [3]
- Vimium [4]
- hypothesis [5]
---
LastPass [6] # [7]
Love it or hate it, passwords are hard to manage. Everyone needs a password manager to avoid the dreaded password reuse, and to be able to quickly rotate them with a service. I use lastpass, thus its browser extension is my most used extension.
[6]
---
Stylus [8] # [9]
Stylus is an extension that allows you to add your own CSS to style pages how you want. There seems to be a full community of folks that really use this to the nth degree to style all of their commonly used sites somewhat similarly or add dark mode to sites without it.
Personally I mostly use it to add my favorite syntax highlighting theme to jupyter, onedark. I’ve long lost the original author, but have posted the CSS I use in this gi...
The work on find-kedro [1] by WaylonWalker [2].
kedro plugin to automatically construct pipelines using pytest style pattern matching
References:
[1]: https://github.com/WaylonWalker/find-kedro
[2]: https://github.com/WaylonWalker
Looking for inspiration? steel-toes [1] by WaylonWalker [2].
a kedro hook to protect against breaking changes to data
References:
[1]: https://github.com/WaylonWalker/steel-toes
[2]: https://github.com/WaylonWalker
I like htop-dev’s [1] project htop [2].
htop - an interactive process viewer
References:
[1]: https://github.com/htop-dev
[2]: https://github.com/htop-dev/htop
Creating Reusable Bash Scripts
Bash is a language that is quite useful for automation no matter what language
you write in. Bash can do so many powerful system-level tasks. Even if you are
on Windows, these days you are likely to come across bash inside a cloud VM,
Continuous Integration, or even inside of Docker.
I have three techniques that help me write more composable bash scripts.
- functions [1]
- Arguments [2]
- positional arguments [3]
- All Arguments [4]
- Error Handling [5]
- main script [6]
---
Functions # [1]
Break scripts down into reusable components
Functions in bash are quite simple. They are something that I wish I would have
started using long ago. They make your code much more reusable. I often use
them in my aliases as well since they can simplify the process and allow more
flexibility.
syntax
#!/bin/sh
# hello_world
hello_world () {
    echo "hello world"
}
Source the file to load the function and run it from the terminal.
run it
source hello_world
hello_world
outputs
hello world
...
Three things to Automate with Python using Pandas
Here are three things that I see my non-programming counterparts doing every single day. These really sum up so much of what folks do within an office. So many of us dabble in or become power users of spreadsheets without knowing there is an alternative out there that can save us time, automate boring things, and free our minds for the part where we add value: thinking about the data.
Focus on Value Add Operations # [1]
Let’s face it, stitching together spreadsheets adds zero value by itself, but if you can see something in the data and take action on it, that can be a huge value add to your company. Learning just a bit of python will help focus more of your attention on “value add operations” and leave the mundane stuff to your computer.
Merge a directory full of spreadsheets into one # [2]
I see this one all the time. One team gets a spreadsheet from another team once per month and they need to stitch all the pieces together. Excel really opens the door for some na...
How to Install miniconda on linux (from the command line only)
miniconda is a python distribution from continuum. It’s a slimmed-down version of their very popular anaconda distribution. It comes with its own environment manager and has eased the install process for many who do not have a way to compile c-extensions. It made it much easier to install the data science stack on Windows a few years ago. These days Windows is much better at compiling c-extensions than it was back then. I still like its environment manager, which installs to a global directory rather than a local directory for your project.
Installing miniconda on Linux # [1]
Installing miniconda on Linux can be a bit tricky the first time you do it completely from the terminal. The following snippet will create a directory to install miniconda into, download the latest python 3 based install script for Linux 64 bit, run the install script, delete the install script, then add a conda initialize to your bash or zsh shell. After doing this you can restart your shell and conda will...
How to crush amazing posts on DEV
This post was inspired by a comment I left on @dsteenman’s post.
{% post dsteenman/how-long-should-a-blogpost-be-2k6n %}
Most of the time I prefer short, as I am more likely to read the whole thing. If it’s set up as a series I am more likely to work my way through the whole series in a matter of a few sessions. Just my preference.
I will say though there are certain articles that fit well to the long format. They are articles that folks tend to come back to often as a reference again and again.
Sections # [1]
- layout is key [2]
- Break it up [3]
- Article types [4]
- superpost [5]
- single post [6]
- series [7]
- discussion [8]
- Post what you want to read [9]
layout is key # [2]
Either way you go, layout is key. You are not Stephen King; no matter how great a writer you are, you are unlikely to hold attention like he can. Most folks reading blogs scan articles first. I often scan, then read. If the article is really good or pertains well to me I will read everything, ...
I like RanaEmad’s [1] project metrics-of-awesome-api [2].
A Node.js API with the main purpose of acting as a backend for practicing authentication in React. It enables the user to sign up, sign in and view a dashboard with his metrics of awesome through different endpoints.
References:
[1]: https://github.com/RanaEmad
[2]: https://github.com/RanaEmad/metrics-of-awesome-api
If you’re into interesting projects, don’t miss out on awesome-gpt3 [1], created by elyase [2].
No description available.
References:
[1]: https://github.com/elyase/awesome-gpt3
[2]: https://github.com/elyase
shreyashankar [1] has done a fantastic job with gpt3-sandbox [2]. Highly recommend taking a look.
The goal of this project is to enable users to create cool web demos using the newly released OpenAI GPT-3 API with just a few lines of Python.
References:
[1]: https://github.com/shreyashankar
[2]: https://github.com/shreyashankar/gpt3-sandbox
Black Tech Pipeline
I was particularly inspired by @chantastic episode 103 of the react podcast with @ParissAthena. They spoke about the black tech pipeline as well as Diversity, Equity, and Inclusion. Pariss is quite an inspiration. She has done so much work to create a better place for POC in tech. I like that not only is she helping them get jobs but acting as a mentor for their first few months on the job to make sure that they are able to find their place and fit in.
Based on an episode of react podcast.
🎙 Listen to the full episode [1].
So Inspirational # [2]
I was particularly inspired by @chantastic [3] episode 103 of the react podcast with @ParissAthena [4]. They spoke about the black tech pipeline as well as Diversity, Equity, and Inclusion. Pariss is quite an inspiration. She has done so much work to create a better place for POC in tech. I like that not only is she helping them get jobs but acting as a mentor for their first few months on the job to make sure that they are able to find ...
Review of the git-auto-commit-action
It’s a really cool GitHub action that will automatically commit files changed
during the action. I was using this to render a new readme based on a template.
Check out the repo for git-auto-commit-action [1].
This has been by far the easiest way to commit back to a repo that I have seen. Other patterns often require fully setting up the git [2] user and everything. While it’s not all that hard, this action already has all of that covered.
You must give it a commit message and that’s it. Optionally you can configure a number of things. It’s possible to configure the commit_user_name, commit_user_email, and commit_author. I often also scope the file_pattern to a certain subset of files.
---
[3]
If you’re new to actions check out this article on using actions.
What's New in Kedro 0.16.4
If we take a look at the release notes [1] I see one major feature improvement on the list, auto-discovery of hooks.
## Major features and improvements
* Enabled auto-discovery of hooks implementations coming from installed plugins.
This one comes as a bit of a surprise, as it was just casually mentioned in #435 [2]
[2]
Think pytest # [3]
As mentioned in #435 [2] this is the model that pytest uses. Not all plugins automatically start doing things right out of the box but require a CLI argument.
simplicity # [4]
It feels a bit crazy that simply installing a package will change the way that your pipeline gets executed. I do like that it requires just a bit less reaching into the framework stuff for the average user. Most folks will be able to write in the catalog and nodes without much change to the rest of the project.
Implementation # [5]
Reading through the docs [6], they show us that we can make our hooks automatically register by adding a kedro.hooks endpoint that points to a ...
I’m impressed by gitActionTraction [1] from bdougie [2].
📹 Home video of GitHub Actions tips for better traction.
References:
[1]: https://github.com/bdougie/gitActionTraction
[2]: https://github.com/bdougie
If you’re into interesting projects, don’t miss out on awesome-README-templates [1], created by elangosundar [2].
A collection of awesome readme templates to display on your github profile.
References:
[1]: https://github.com/elangosundar/awesome-README-templates
[2]: https://github.com/elangosundar
I’m really excited about pandoc [1], an amazing project by jgm [2]. It’s worth exploring!
Universal markup converter
References:
[1]: https://github.com/jgm/pandoc
[2]: https://github.com/jgm
I’m really excited about github-readme-stats [1], an amazing project by anuraghazra [2]. It’s worth exploring!
⚡ Dynamically generated stats for your github readmes
References:
[1]: https://github.com/anuraghazra/github-readme-stats
[2]: https://github.com/anuraghazra