Blog

Full Blog Posts

341 posts latest post 2026-05-11
Publishing rhythm
Feb 2026 | 6 posts

📝 Kedro Preflight Notes

This is a very rough idea for a kedro package to prevent time lost to get partway through a pipeline run only to realize that you dont have access to data or resources. Must Haves # [1] - check that inputs exist or are of a type to skip (sql) Good to haves - check that all input and output databases are accessible with good credentials - check for s3 bucket access - check for spark install Implementation # [2] @hook_spec def before_pipeline_run(run_params, pipeline, catalog): run params # [3] { "run_id": str "project_path": str, "env": str, "kedro_version": str, "tags": Optional[List[str]], "from_nodes": Optional[List[str]], "to_nodes": Optional[List[str]], "node_names": Optional[List[str]], "from_inputs": Optional[List[str]], "load_versions": Optional[List[str]], "pipeline_name": str, "extra_params": Optional[Dict[str, Any]] } References: [1]: #must-haves [2]: #implementation [3]: #run-params

📢 Announcing find-kedro

find-kedro is a small library to enhance your kedro experience. It looks through your modules to find kedro pipelines, nodes, and iterables (lists, sets, tuples) of nodes. It then assembles them into a dictionary of pipelines, each module will create a separate pipeline, and __default__ being a combination of all pipelines. This format is compatible with the kedro _create_pipelines format. [1] [2] [3] [4] # [5] kedro is a ✨ fantastic project that allows for super-fast prototyping of data pipelines, while yielding production-ready pipelines. find-kedro enhances this experience by adding a pytest like node/pipeline discovery eliminating the need to bubble up pipelines through modules. When working on larger pipeline projects, it is advisable to break your project down into different sub-modules which requires knowledge of building python libraries, and knowing how to import each module correctly. While this is not too difficult, in some cases, it can trip up even the most se...

Explicit vs Implicit Returns in Javascript

Often when reading through javascript examples you will find some arrow functions use parentheses () while others use braces {}. This key difference is that parentheses will implicitly return the last statement while braces require an explicit return statement. It is important to understand the difference between them because it is likely that you will find code examples of both and trying to edit code written differently than you’re used to may have unintended consequences. [1] # [2] Arrow functions are one-liner functions in javascript that have two main syntactical ways to create the code block. with parentheses and braces. Let’s take a look at both ways of creating arrow functions so that when we come accross them in the wild it will all make sense. [3] # [4] Here is an example of an arrow function that will implicitly return the last statement without the return keyword. I believe that these are a bit more restricted in that you cannot set variables inside them. They are ...

Twitter deepdives

Inspired by Chris Achard My ideas # [1] Python # [2] - List comps - Classes - Inheritance - Background - Click - Lambdas Kedro # [3] - Cataloging - Custom datasets - Reusable pipelines - find-kedro Learn kedro in 5 days # [4] Email course inspired by learn d3 in 5 days Mail # [5] - Share your knowledge - Practice - Practice in public - Make practice easy - Share your notes - Digital Gardening - Own your content - Build your audience - Be nice - Have empathy - Learn your way - Continuous learning References: [1]: #my-ideas [2]: #python [3]: #kedro [4]: #learn-kedro-in-5-days [5]: #mail
1 min read

TIL: Bind arguments to dynamically generated lambdas in python

This past week I had a really weird bug in my kedro [1] pipeline. For some reason data running through my pipeline was coming out completely made no sense, but if I manually request raw data outside of the pipeline it matched expectations. NOTE While this story is about a kedro pipeline, it can be applied anywhere closures are put into an iterable. [2] # [3] After a few days of looking at it off and on, I pinpointed that it was all the way down in the raw layer. Right as data is coming off of the database. For this I already had existing sql files stored and a read_sql function to get the data so I opted to just set up the pipeline to utilize the existing code as much as possible, leaning on the kedro [1] framework a bit less. I have dynamically created lists of pipeline nodes many times in the past, but typically I take data from kedro [1] input and use it in the lambda. I prefer the simplicity of using lambdas over functools.partial. It typically looks something like this. #...
2 min read

python-deepwatch

Is it possible to deep watch a single python function for changes? Shallow Watch # [1] keeping track of a python functions hash is quite simple. There is a__hash__ method attached to every python function. Calling it will return a hash of the function. If the function changes the hash will change. [ins] In [1]: def test(): ...: return "hello" [ins] In [2]: test.__hash__() Out[2]: 8760526380347 [ins] In [3]: test.__hash__() Out[3]: 8760526380347 [ins] In [4]: def test(): ...: return "hello world" [ins] In [5]: test.__hash__() Out[5]: 8760525617988 [ins] In [6]: def test(): ...: return "hello" [ins] In [7]: test.__hash__() Out[7]: 8760526380491 Using hashlib provides a consistent hash. import inspect import hashlib def test(): return "hello" [ins] In [17]: m.update(inspect.getsource(test).encode()) [ins] In [18]: m Out[18]: <sha256 HASH object @ 0x7f7b7b70fde0> [ins] In [19]: m.hexdigest() Out[19]: '1f2ff4c69eb69b545469686edd6f849136e104cd535785891586d90620328757' [i...
1 min read

Four Github Actions for Python

If you are developing python packages and using GitHub here are four actions that you can use today to automate your release workflow. Since python tools generally have such a simple cli I have opted to use the cli for most of these, that way I know exactly what is happening and have more control over it if I need. h2 img { width: 100%; box-shadow: .5rem .5rem 3rem #141F2D, -.5rem -.5rem 3rem rgba(255,255,255,.1);} img{ max-width: 100% !important;} If you are developing python packages and using GitHub here are four actions that you can use today to automate your release workflow. Since python tools generally have such a simple cli I have opted to use the cli for most of these, that way I know exactly what is happening and have more control over it if I need. - Lint - Test - Package - Upload to PyPi Lint With flake8 # [1] flake8 is pythons quintessential linting tool to ensure that your code is up to the standards that you have set for the project, and to help prevent hidden...

Variables names don't need their type

So often I see a variables type() inside of its name and it hurts me a little inside. Tell me I’m right or prove me wrong below. Examples # [1] Pandas DataFrames are probably the worst offender that I see # bad sales_df = get_sales() # good sales = get_sales() Sometimes vanilla structures too! # bad items_list = ['sneakers', 'pencils', 'paper', ] # good items = ['sneakers', 'pencils', 'paper', ] Edge Cases? # [2] It’s so common when you need to get inside a data structure in a special way that itsn’t provided by the library…. I am not exactly sure of a good way around it. # bad ?? sales = get_sales() sales_dict = sales.to_dict() # good 🤷‍♀️ Containers are plural # [3] Always name your containers plural, so that naming while iterating is simple. prices = {} items = ['sneakers', 'pencils', 'paper', ] for item in items: prices[item] = get_price(item) Before I start fights 🥊 in code review, am I inline here or just being pedantic? References: [1]: #examples [2]: #edge-cases...

Send Emails with GitHub Actions

Here is one useful thing that you can do with GitHub actions no matter what language you use, send email. You might want to know right away when your ci passes. You might want to give your team a nice pat on the back when a new release is deployed. There might be subscribers wanting to see the latest release notes in their inbox as soon as the latest version is deployed. Whatever it is, its pretty easy to do with an action right out of the actions marketplace. Mail on Star # [1] Here is a silly example that sends an email to yourself anytime someone stars your repo. name: Mail on Star on: watch: types: [ started ] # A workflow run is made up of one or more jobs that can run sequentially or in parallel jobs: # This workflow contains a single job called "email" email: # The type of runner that the job will run on runs-on: ubuntu-latest # Steps represent a sequence of tasks that will be executed as part of the job steps: - name: ✨ Send email, you star uses: dawidd6/acti...

What Are GitHub Actions

GitHub actions are an amazing tool that allows us to run code based on triggers inside of our repo. Their is a large and growing community of actions inside the marketplace to use with very little effort. Best of all they are free for public repositories, and private repos have a very generous free tier. h2 img { width: 100%; box-shadow: .5rem .5rem 3rem #141F2D, -.5rem -.5rem 3rem rgba(255,255,255,.1);} img{ max-width: 100% !important;} I have been diving deep into Github actions for about a month now and they are wicked good! They allow you to run any sort of arbitrary code based on events in your repo, webhooks, or schedules. They are very reasonably priced. The interface that GitHub hs developed for them is top-notch! It’s so good I have done 90% of my editing of them right from github.com. TLDR # [1] some interaction to your repository triggers code to run. [2] # [3] The online editor for actions is pretty amazing. When creating a new workflow it automatically sets up a ...

Getting Started with GitHub Actions

Github actions are written in configuration files using the YAML syntax. YAML is a superset of JSON. Most YAML can be expressed inline with JSON syntax. Similar to python YAML is whitespace driven by whitespace rather than brackets tags. The argument for using YAML for configuration files such as actions is that it is more human-readable and editable. It’s much easier to see the whitespace layout than it is to get closing brackets correct. For actions, I believe this is mostly true. I don’t see any use case to get past 3-5 indents, which is completely manageable. Can I just say that I learned more than I realized about YAML by writing this article Arrays and Objects # [1] In YAML or JSON, the most basic containers for data are arrays, a 1D list of things, and objects, for key-value pairs. Arrays # [2] The start of an array container is signified with a leading -. This is probably one of the big things I didn’t understand about YAML before writing this post, but hats off to the ...

Today I learned `git diff feature..main`

Today I learned how to diff between two branches. git diff feature..main Sometimes we get a little git add . && git commit -m "WIP" happy and mistakenly commit something that we just can’t figure out. This is a good way to figure out what the heck has changed on the current branch compared to any other branch. Example # [1] Let’s create a new directory, initialize git [2] and toss some content into a readme. mkdir git-diff git init echo "hello there" > readme.md git add . && git commit -m "hello there" cat readme.md After all of that, we have a git repository on our local machine with a single file readme.md that contains the following. hello there Create a branch and ✍ edit # [3] Let’s checkout a new branch called Waylon and change the word there to Waylon in our readme.md file, then diff it. git checkout -b Waylon echo "hello Waylon" > readme.md git add . && git commit -m "hello Waylon" git diff - hello there + hello waylon At this point we have one commit. Things are real...
2 min read

Create New Kedro Project

This is a quickstart to getting a new kedro [1] pipeline up and running. After this article you should be able to understand how to get started with kedro [1]. You can learn more about this Hello World Example [2] in the docs [2] 🧹 Install Kedro [1] 🛢 Create the Example Pipeline 💨 Run the example 📉 Show the pipeline visualization Create a Virtual Environment [3] # [4] I use conda to control my virtual environments and will create a new environment called kedro_iris with the following command. note the latest compatible version of python is 3.7. EDIT: as of kedro 0.16.0 kedro supports up to 3.8 conda create -n kedro_iris python=3.8 -y [5] Options Activate your conda environment # [6] I try to keep my base environment as clean as possible. I have ran into too many issues installing things in the base environment. Almost always its some dependency that starts causing issues making it even harder to realize where its coming from as I never even installed it in base. source...

What is YOUR Advice for New Data Scientists

- Learn the business - Learn Git [1] - Your code does not need to be amazing - Keep Learning Learn Git # [2] You dont have to start out as a git wizard with the cleanest possible commit history. At first dont let yourself get too wrapped up in it, the most important part is that you make commits. You will find needs for more advanced stuff later. git add . git commit -m "FEAT added new function to calculate revenue by product family" git push Get comfortable with this, then learn how to branch, rebase, stash, etc… Your code does not need to be amazing # [3] Get the job done. Keep it in small bite size pieces. Make readable function definitions and variable names. You will thank yourself for naming things well later. Readability counts more than performance in most cases of data science. If it gets the job done try not to over worry about things like performance. A few extra seconds to clean a dataset or build a model is not worth hours of your time. As you go you will have c...

Do You Hoist

I am working through Wes Bos’s beginnerjavascript.com/ [1] I just hit module 18 on hoisting. It’s something that I always knew was there, Its not something I typically see used or use myself. Do you Hoist? # [2] Do you have any use cases that you use hoising? Why? It seems like a really cool feature in any language that uses it, but I dont really notice it in use. What is Hoising # [3] There are many articles that cover this in far more depth, but its the idea that variable declarations and functions are defined before they are executed. This means that it doesnt matter if you call a function before or after it is defined. Hoisting # [4] console.log(`Hello ${getUser()}`) function getUser() { return 'Waylon' } Running this code will log out “Waylon” What about variable hoisting # [5] I am most familiar with python which does not variable hoist so this one kinda confused me at first. It only hoists the variable declaration not the value of the variable. It defines whether th...

What is Kedro

What is Kedro [1] This is my original what-is-kedro article. There is a brand new one --- Kedro is an open source data pipeline framework. It provides guardrails to set your project up right from the start without needing to know deeply how to setup your own python library for data pipelining. It includes really great ways to manipulate catalogs and pipelines. This article will cover the 10K view of kedro, future articles will dive deper into each one. kedro [2] is an open-source data pipeline framework. It provides guardrails to set your project up right from the start without needing to know deeply how to set up your own python library for data pipelining. It includes great ways to manipulate catalogs and pipelines. This article will cover the 10K view of kedro [2], future articles will dive deeper into each one. Libraries # [3] Currently, kedro [2] is broken down into 3 different libraries. 💎 kedro [2] 📉 kedro-viz [4] 🏗 kedro-docker [5] kedro [2] # [6] [7] kedro [2] ...

Custom Scrollbar Design

Getting a custom scrollbar on your site makes it stand out a bit compared to the very plain stock one that are on most sites. This is how I set mine up on my gatsby site. Inspired by Wes Bos’s new uses.tech [1] I wanted a custom scrollbar on my personal site. I had tried to do it in the past, but gave up after it was not working. Looking at the Source # [2] Since uses.tech [1] is open source I jumped on github, searched for scroll and found this layout.js [3]. Copy it to my own component # [4] My first step was to take his css and copy it into a styled component for my entire layout, but it failed. I do not fully understand why. None of the custom style came through at all. If you know please leave me a comment. [5] I suspect for some reason it has to do with attatching to the html [6] element inside of a styled-component. I think wes was able to get around this by using createGlobalStyle. But I was still using much of the default gatsby template, so I did not have a createG...

Don’t waste your time learning everything

“Don’t waste your time learning everything.” [1] Inspired by this linkedIn post [2] I felt that this comment was very powerful. Here are my 2 cents. Be Productive # [3] Stick to what you know, and learn a little bit of something new every day. If what you know is how to use Excel like a boss, don’t fee ashamed that you are missing something. Be proud and use what you know. Don’t Stagnate # [4] Take small steps enhance what you know now with something new that you get you closer to where you want to be. If you need something that sci-py offers learn how to load in data and use that part. If your sick of waiting for IT to pull data out of the database so you can use it, learn that. Dont Overwhelm Yourself # [5] If you try to drop everything you know now and jump whole hog into these new flashy things its not going to work. Learn what you need to know. New things crop up very often. They will come and go. Some things will get traction, some will never get much traction past an...
2 min read ↺ 7

2020 waylonwalker.com rebrand

Moving into 2020 I have been really leaning on using purple as my theme color everywhere more and more. Its time for an update to my personal site, not just because it feels plain, not just because the cover art I am using for dev.to doesn’t fit my current card layout, but because I feel inspired and I want to. Starting point # [1] ![This is what we are working with.](https://images.waylonwalker.com/2020-02-10 12-17-43_Start.png) This is what we are working with. It has been my card design for at least a year now. Its not bad but, its a bit play, doesnt fit my new cover art style, and that date is not working over top of the cover art text. - plain - cover art does not fit - I am not digging the date on cover art that also has text Colors # [2] I have been really into using a deep purple lately. It is a neutral color that does not get enough respect, i.e. it’s not used as frequently and kinda stands out when used. How I pick colors # [3] I am really bad at picking colors t...