Blog

Full Blog Posts

341 posts latest post 2026-05-11
Publishing rhythm
Feb 2026 | 6 posts

Large Refactor At The Command Line

As projects grow patterns that worked early on break and we need to change things to make the project easier to work with, and more welcoming to new developers. git # [2] Before you start mucking up a project with wild commands at the terminal check that you have a super clean git status. We may make some mistakes and need a way to undo 100’s files and git makes it really easy if you start with a clean history. git status If we are ready to begin work we should see a response like this. On branch main nothing to commit, working tree clean It would also be wise to do this inside of a branch. The minute you try to do something wild in your working branch someone will walk by and ask you to do a five-minute task, but your deep in refactoring and haven’t left yourself a clean way back. git branch my-big-refactor grepr # [3] Time for the meat of this refactor replacing text across our project. I often will pop this bash function into my terminal session and tweak it as needed. This...
4 min read

Ipython-Config

I use my ipython terminal daily. It’s my go to way of running python most of the time. After you use it for a little bit you will probably want to setup a bit of your own configuration. install ipython # [1] Activate your virtual environment [2] of choice and pip install it. Any time you are running your project in a virtual environment, you will need to install ipython inside it to access those packages from ipython. pip install ipython You are using a virtual environment right? Virtual environments like venv or conda can save you a ton of pain down the road. profile_default # [3] When you install ipython you start out with no config at all. Runnign ipython profile create will start a new profile called profile_default that contains all of the default configuration. ipython profile create This command will create a directory ~/.ipython/profile_default multiple configurations # [4] You can run multiple configurations by naming them with ipython profile create [profile_name...
2 min read

Custom Ipython Prompt

I’ve grown tired of the standard ipython prompt as it doesn’t do much to give me any useful information. The default one gives out a line number that only seems to add anxiety as I am working on a simple problem and see that number grow to several hundred. I start to question my ability 🤦‍♂️. Configuration # [1] If you already have an ipython config you can move on otherwise check out this post on creating an ipython config. Ipython-Config [2] The Dream Prompt # [3] I want something similar to the starship prompt I am using in the shell. I want to be able to quickly see my python version, environment name, and git [4] branch. - python version - active environment - git branch [5] This is my zsh prompt I am using for inspiration Basic Prompt # [6] This is mostly boilerplate that I found from various google searches, but this gets me a basic green chevron as my prompt. from IPython.terminal.prompts import Prompts, Token class MyPrompt(Prompts): def in_prompt_tokens(self...
3 min read

Automating my Post Starter

One thing we all dread is mundane work of getting started, and all the hoops it takes to get going. This year I want to post more often and I am taking some steps towards making it easier for myself to just get started. When I start a new post I need to cd into my blog directory, start neovim in a markdown file with a clever name, copy some frontmatter boilerplate, update the post date, add tags, a description, and a cover. Todo List for starting a post # [1] - frontmatter template - Title - slug - tags - date - cover - description - create markdown file - open in neovim Lets Automate this # [2] This aint no proper cli # [3] hot and fast As with many thing running behind the scenes on this site, I am the one and only user, I have limited time, so this is going to be a bit hot and fast. Let’s create a file called new-post. start the script new-post #!python # new-post 👆 Works on my machine If this were something that had more users than me I would probably use some...

Windowing Python Lists

In python data science we often will reach for pandas a bit more than necessary. While pandas can save us so much there are times where there are alternatives that are much simpler. The itertoolsandmore-itertools` are full of cases of this. This post is a walkthrough of me solving a problem with more-itertools rather than reaching for a for loop, or pandas. I am working on a one-line-link expander for my blog. I ended up doing it, just by modifying the markdown with python. I first split the post into lines with content.split('\n'), then look to see if the line appears to be just a link. One more safety net that I wanted to add was to check if there was whitespace around the line, this could not simply be done in a list comprehension by itself. I need just a bit of knowledge of the surrounding lines, enter more-itertools. simplified rendering function # [1] I have a function that will check to see if the line should be expanded, then render the correct template. Fist step is to ...
1 min read

Adding Audio to my blog posts

This is episode 1 of the Waylon Walker Audio experience, posts from waylonwalker.com [1]{.hoverlink} in audio form. So I have had this idea for awhile to add audio to my blog posts. The idea partly comes from the aws blog, if you have ever been on their blog you will have noticed that they have a voiced by amazon polly section. What to Expect # [2] Honestly I don’t know this is all new to me and I dont have much to go off of. For now its a test that may or may not work out. I will say that the time that I have available for clean audio is a bit limited so expect these to come out in batches as I get time to go back and record. What Not to Expect # [3] One thing that makes the aws blog really hard to listen to is the robotic voice, I definitely don’t want that. This will be voiced by a real human, Me. At the same time written text doesn’t translate directly to audio well so don’t necessarily expect the audio to be word for word. Code blocks # [4] There are a lot of code block...

gatsby-remark-embedder

Inspired by discourse’s link expansion I am rolling out expansions for one line links on the blog waylonwalker [1]. I was able to find a gatsby plugin gatsby-remark-embedder [2] that expands one line links for social cards for popular platforms like twitter and YouTube through a repose from Kyle Mathews to my tweet. https://twitter.com/kylemathews/status/1329817928666005504 Use Cases # [3] This covers a couple of use cases I have with very little effort. - Twitter - YouTube install # [4] npm i gatsby-remark-embedder gatsby-plugin-twitter This was super quick and simple to setup, the only thing that was extra was to install the gatsby-plugin-twitter plugin as well as the gatsby-remark-embedder. enable # [5] // In your gatsby-config.js module.exports = { // Find the 'plugins' array plugins: [ `gatsby-plugin-twitter`, { resolve: `gatsby-transformer-remark`, options: { plugins: [ { resolve: `gatsby-remark-embedder`, options: { customTransformers: [ // Your custom t...

Expand One Line Links

I wanted a super simple way to cross-link blog posts that require as little effort as possible, yet still looks good in vanilla markdown in GitHub. I have been using a snippet that puts HTML [1] into the markdown. While this works, it’s more manual/difficult for me does not look the best, and does not read well as Goals for new card # [2] The new card should be fully automated to expand with title, description, and cover image. Bonus if I am able to attach a comment behind it. - fully automated - card expansion - Title - description - cover image Old Card # [3] If you can call it a card 🤣. This card was just an image wrapped in an anchor tag and a paragraph tag. I found this was the most consistent way to get an image narrower and centered in both GitHub and dev.to. <p style='text-align: center'> <a href='https://waylonwalker.com/notes/eight-years-cat/'> <img style='width:500px; max-width:80%; margin: auto;' src="https://images.waylonwalker.com/eight-years-cat.png" al...

Find and Replace in the Terminal.

grepr # [1] grepr() {grep -iRl "$1" | xargs sed -i "s/$1/$2/g"} ```bash grepr() {grep -iRl "$1" | xargs sed -i "s/$1/$2/g"} grepd # [2] grepd() {grep -iRl "$1" | xargs sed -i "/^$1/d"} CocSearch # [3] :CocSearch published: false -g *.md References: [1]: #grepr [2]: #grepd [3]: #cocsearch

Resume Tips

- customize for the job - Why are you a good fit? - What will you bring to the role? - Give real outcomes - give real experience - Stop tech vomiting - if you link to GitHub - Make a profile readme - Guide me to your best work - have some activity - if you link to LinkedIn - Provide some benefit that is not on your resume - Have a logical flow of experience (dont make me hunt for past experience) - Keep it under 2 pages - Who you know. - Reference real experience - Deployed 12 data pipelines with over 500 nodes to process 200GB of data at a Fortune 100 company - vs - Knowledge of Data Engineering methodology with python EC2 - Dont be so fluffy
1 min read
Codeit Bro Interview

Codeit Bro Interview

[1] use this profile image Please share your professional role as a data scientist? [Also feel free to share about your personal projects, publications, etc.] I graduated with a Mechanical Engineering Degree 8 years ago. Much of my work early in my career [2] was wrapped around analyzing larger datasets for my group to understand quality, drive changes to improve quality or prove that quality was already good. [3] Three years ago I made the switch to Data Science and have loved every minute of it. It is a very dynamic field that is continually changing and there are always a new set of skills to learn and hone in on. I talk a lot about the mindset of always learning, sharing knowledge, and communicating in my newsletter [4] What are the most difficult challenges you faced as a data scientist and how you resolved them? Deployment is a high bar to enter. Jupyter notebooks provide a suspiciously simple start into Data Science. Folks with very little coding experience can easily ...
7 min read

reasons-to-kedro

There are many reasons that you should be using kedro. If you are on a team of Data Scientists/Data Engineers processing DataFrames from many data sources should be considering a pipeline framework. Kedro is a great option that provides many benefits for teams to collaborate, develop, and deploy data pipelines What is Kedro [1] Starter Template # [2] Kedro makes it super easy to get started with their cli that utilizes cookiecutter under the hood. conda create -n my-new-project -y python=3.8 kedro new kedro install kedro run Create New Kedro Project [3] read more about how to start your first kedro project here Collaboration # [4] Kedro provides many tools that help teams collaborate on a single codebase. While writing monolithic scripts it can be easy to pin yourself in a corner where it is difficult to have multiple people making changes to the notebook/script at the same time. Kedro helps guide your team to break your project down into small pieces that different members o...

Reasons to Kedro

Reasons to Kedro # [1] - collaboration - Sharable catalog - small nodes over monolithic notebooks - catalog - easily load anything without needing to run - No need to write read/write code - pipeline - No need to keep execution order in your head - easily run a slice of a pipeline - plugins - pip install - make your own - hooks - flexible expandable cli Reasons Not to Kedro # [2] - Already utilizing another DAG framework - Data is not in a widely supported format - Micro short-lived project - Large Project / Deadline - Use a lower profile project to learn first - Team not willing to change - Need minimal dependencies - God Project - kedro owns everything?? References: [1]: #reasons-to-kedro [2]: #reasons-not-to-kedro

Reading List

Latest Post # [1] latest [2] STOP LEAVING Browser Tabs open and save them here! - https://nesbitt.io/2026/03/04/package-managers-need-to-cool-down.html - https://mariozechner.at/posts/2026-03-25-thoughts-on-slowing-the-fuck-down/ - https://danielmiessler.com/blog/ai-stops-being-artificially-cheap --- - jbrancha til [3] - The Video Course Launch that Made Me Think [4] - photo prism [5] - box python library [6] - kedro on hn [7] - How can a Data Scientist refactor Jupyter notebooks towards production-quality code? [8] - Sourcing vs executing in Bash [9] - Should We Follow The Open-Closed Principle? [10] - Create multi-dimensional arrays in pure Python: The Correct Way [11] - Beware of These 9 Red Flags in a Developer Interview [12] - How to Overcome Impostor Syndrome as a Developer [13] - lazy load youtube videos [14] - lite youtube embeds [15] - full subtitle youtube search [16] --- - Jungle Scout - Kedro Case Study [17] - Kedro Sessions [18] - Julia Evans - A...
1 min read

What's New in Kedro 0.16.6

Kedro 0.16.6 [1] is out! Let’s take a look through the release notes Deployment Docs # [2] This is really exciting to see more deployment options coming from the kedro team. It really shows the power of the framework. The power of some of these orchestrations options is incredible. - Argo [3] - Prefect [4] - Kubeflow [5] - Batch [6] - SageMaker [7] Most of them hinge on a sweet combination of the kedro cli, docker image, and the pipeline knowing your nodes dependencies. Argo, Prefect, and Kubeflow have an interesting technique where they translate the pipeline and its dependencies from kedro to their language. Batch uses the aws cli to submit jobs, one node per job, and listen for them to complete. It will submit all nodes with completed dependencies at once, meaning that we can get some massive parallelization. I did a quick and dirty test of one of these by simulating the technique in a bash script and saw a 40 hr pipeline finish in about 1 hour. I am excited to get thi...

A brain dump of stories

I started making stories as kind of a brain dump a few times per day and posting them to [LinkedIn](https://www.linkedin.com/in/waylonwalker/(https://www.linkedin.com/in/waylonwalker/). Here are the last 11 days of stories. I store all the stories on my website with the hopes of doing something with them on my own platform eventually. For now it makes it easy to make these posts. cd static/stories ls | xargs -I {} echo '![](https://waylonwalker.com/stories/{})' Stories 10-10-2020 - 10-21-2020 # [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] References: [1]: #stories-10-10-2020---10-21-2020 [2]: https://waylonwalker.com/stories/TIL-kedro-sorts-nodes.png [3]: https://waylonwalker.com/stories/disable-base-pip.png [4]: https://waylonwalker.com/stories/discovered-social-cards.png [5]: https://waylonwalker.com/stories/find-kedro-de1-contributor.png [6]: https://waylonwalker.com/stories/hacktoberfest-2020-kedro-538-tests-pass.png [7]: https://waylonwalk...

Fix git commit author

I was 20 commits into a hackoberfest PR when I suddenly realized they they all had my work email on them instead of my personal email 😱. This is the story of how I corrected my email address on 19 individual commits after already submitting for a PR. - Change the email for this repo [1] - Prepare for rebasing [2] - start the rebase [3] - 🛠 Fix First wrong Commit [4] - Fix all commits [5] - Done [6] - ReCap [7] Change the email for this repo # [1] stop the bleeding Before anything else set the email correctly! cd kedro git config user.name "Waylon Walker" git config user.email [email protected] Prepare for rebasing # [2] First thing is to find how many commits back this mistake goes. I opened up the git [8] log, and saw mine went back 19 commits. I rolled back 20 just to be sure. $ git log ... commit a355926b9d7ec4c05659adaa254beefbdb036332 Author: WaylonWalker <[email protected]> Date: Sat Oct 17 10:28:59 2020 -0500 give name of function inside incorrect parameters erro...
3 min read

Designing a "Router" for kedro

nodes_global # [1] I released a router-like plugin for kedro back in April 2020. This was not the first design, the idea actually came from one of the QB folks who taught me kedro nearly a year before. We were assembling our pipelines with something called nodes_global. It worked fairly well but did have some issues around being set as a global variable. But… One thing in particular that it did not lend itself well to was being able to create a packagable pipeline that I could pip install and append into any of my existing pipelines. Something I am still trying to work out, maybe I don’t need this. I think I have it working for our internal pipelines and it seems like the way to go, but we don’t necessarily end up using it. Also… With this pattern all of the nodes needed to be importable by the module containing nodes_global. I find that this becomes a big hurdle for new pipelines coming from jupyter to overcome and can be most infuriating when their nodes aren’t getting ran af...
4 min read

Reclaim memory usage in Jupyter

Today I ran into an issue where we had a one-off script that just needed to work, but it was just chewing threw memory like nothing. It started with a colleague asking me How do I clear the memory in a Jupyter notebook, these are the steps we took to debug the issue and free up some memory in their notebook. How do I clear the memory in a Jupyter notebook? Pre check the status of memory # [1] There are a number of ways that you can check the amount of memory on your system. The easiest is not necessarily my first go to is free… literally free. check for free space $ free -h total used free shared buffers cached Mem: 15G 15G 150M 0B 59M 8.7G Generally my first go to is a bit more graphical, and not available on a stock stystem, but far more useful…. htop. htop [2] is a terminal process explorer that shows cpu usage, mem usage, and running processes. htop sudo apt-get install htop # install it from your package repo htop [3] First step throw more swap at it # [4] Often be...
3 min read

Strip Trailing Whitespace from Git projects

A common linting error thrown by various linters is for trailing whitespace. I most often use flake8. I generally have [pre-commit](https://waylonwalker.com/pre-commit-is-awesome hooks setup to strip this, but sometimes I run into situations where I jump into a project without it, and my editor lights up with errors. A simple fix is to run this one-liner. One-Liner to strip whitespace # [1] bash git grep -I --name-only -z -e '' | xargs -0 sed -i -e 's/[ \t]\+\(\r\?\)$/\1/' [2] read more about how pre-commit is awesome [3] References: [1]: #one-liner-to-strip-whitespace [2]: https://waylonwalker.com/pre-commit-is-awesome [3]: /pre-commit-is-awesome/