Drafts

Draft and unpublished posts

0 posts

fix crlf for entire git repo

Final Result

git checkout main
git reset --hard
git rm -rf --cached .
echo "* text=auto" > .gitattributes
git add .

1 min read

Automatic Conda Environments

I have automated the process of creating virtual environments in my Python projects. Here is how I did it.

I’ve really been digging my new tmux session management setup. Now I have leveled it up by adding direnv to my workflow, which executes a shell script whenever I cd into a directory. One thing I wanted to add was automatic activation of Python environments whenever I cd into a directory, creating a new environment if one does not exist.

https://waylonwalker.com/tmux-nav-2021/

...

3 min read

How I Review Pipeline Code

I have started doing more regular PR reviews on my team’s Kedro pipelines. I generally take a two-phase approach to the review in order to give the reviewee both quick and detailed feedback.

What is Kedro

Phase 1 is typically a quick scan over the PR right within the PR window in my browser.

...

2 min read

Sample

There is a glossary item in vibe coding here and clippy no simpy.

Now you don’t have to manually link to how to create a

4 min read

🐍 Pluggable Architecture with Python

pytest has open sourced its amazing plugin framework, pluggy. It allows library authors to give their users a way to modify the library’s behavior without needing to submit a change that may not make sense for the entire library.

My experience so far as a plugin user and plugin author has been great. Building and using plugins is incredibly intuitive. I wanted to dive a bit deeper and see how they are implemented inside a library, and it’s a bit of a mind bend the first time you try to do it.

A hook is a single function that is run by the PluginManager at a specific place.
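To make the idea of a hook concrete, here is a minimal sketch in plain Python. It is a toy stand-in, not pluggy's implementation or API: the TinyPluginManager class, the transform_text hook name, and the render function are all hypothetical names made up for illustration.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any, Callable


class TinyPluginManager:
    """Toy stand-in for a plugin manager; this is NOT how pluggy is written."""

    def __init__(self) -> None:
        # Maps a hook name to the plugin functions registered for it.
        self._hooks: dict[str, list[Callable[..., Any]]] = defaultdict(list)

    def register(self, hook_name: str, func: Callable[..., Any]) -> None:
        """Attach a plugin function to a named hook."""
        self._hooks[hook_name].append(func)

    def call(self, hook_name: str, **kwargs: Any) -> list[Any]:
        """Run every function registered for the hook, in registration order."""
        return [func(**kwargs) for func in self._hooks[hook_name]]


# Two hypothetical plugins that implement the same hook.
def shout(text: str) -> str:
    return text.upper()


def reverse(text: str) -> str:
    return text[::-1]


manager = TinyPluginManager()
manager.register("transform_text", shout)
manager.register("transform_text", reverse)


def render(text: str) -> list[str]:
    # This is the "specific place" in the library where the hook is run.
    return manager.call("transform_text", text=text)


results = render("hello")
```

The library only decides where the hook is called; what happens there is left to whichever plugins the user has registered.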

...

4 min read

⚙ How Python Tools Are Configured

There are various ways to configure Python tools: config files, code, or environment variables. Let’s look at a few projects that allow users to configure them through config files and how they do it.

This will not include how they are implemented; I’ve looked at a few and it’s not simple. This will focus on where config is placed and the order in which duplicates are resolved.

The motivation of this article is to serve as a bit of a reference guide for those who may want to create their own package that needs configuration.
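As a rough illustration of what resolving duplicates in order can look like, here is a small sketch using only the standard library. The precedence shown (environment variables over config file over defaults), the MYTOOL_ prefix, and the setting names are assumptions for the example; real tools each choose their own order.

```python
from __future__ import annotations

import os
from collections import ChainMap
from typing import Mapping

# Hypothetical built-in defaults for an imaginary tool.
DEFAULTS = {"line_length": "88", "verbose": "false"}


def env_overrides(prefix: str, environ: Mapping[str, str]) -> dict[str, str]:
    """Collect PREFIX_* variables as lower-cased config keys."""
    return {
        key[len(prefix):].lower(): value
        for key, value in environ.items()
        if key.startswith(prefix)
    }


def resolve_config(
    file_config: Mapping[str, str],
    environ: Mapping[str, str] | None = None,
    prefix: str = "MYTOOL_",
) -> dict[str, str]:
    """Merge config sources; on duplicate keys the earlier source wins."""
    environ = os.environ if environ is None else environ
    # ChainMap looks keys up left to right, so the first mapping has priority.
    layered = ChainMap(env_overrides(prefix, environ), dict(file_config), DEFAULTS)
    return dict(layered)


config = resolve_config(
    {"line_length": "100"},
    environ={"MYTOOL_VERBOSE": "true", "HOME": "/tmp"},
)
```

Here the file value replaces the default line_length, and the environment variable replaces the default verbose, while unrelated variables such as HOME are ignored.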

...

5 min read

Todo

Overrated underrated command line applications

Get started on daily kedro shorts. These are intended to be short clips in a playlist that people can watch to learn kedro concepts at their own pace. This is meant to be a low barrier to entry for me to create. Like the tmux series, I hope to make each sub-three-minute video within one or two takes, with no edits, all straight from OBS.

making your first nodes in kedro 8/23

...

2 min read

Ipython Ninjitsu

Stop going to Google every time you’re stuck and stay in your workflow. The IPython ? is a superhero for productivity and staying on task.

from kedro.pipeline import Pipeline
Pipeline?

Init signature:
Pipeline(
    nodes: Iterable[Union[kedro.pipeline.node.Node, ForwardRef('Pipeline')]],
    *,
    tags: Union[str, Iterable[str]] = None,
)
Docstring:
A ``Pipeline`` defined as a collection of ``Node`` objects. This class
treats nodes as part of a graph representation and provides inputs,
outputs and execution order.
Init docstring:
Initialise ``Pipeline`` with a list of ``Node`` instances.

Args:
    nodes: The iterable of nodes the ``Pipeline`` will be made of. If you
        provide pipelines among the list of nodes, those pipelines will
        be expanded and all their nodes will become part of this new
        pipeline.
    tags: Optional set of tags to be applied to all the pipeline nodes.

Raises:
    ValueError: When an empty list of nodes is provided, or when not all
        nodes have unique names....

...

Compare Directories In Bash

Today I needed to check for articles that used the same slug across two directories. Bash made it super simple.
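The post itself uses Bash. As a rough equivalent in Python, the same check can be sketched as a set intersection. This assumes a slug is simply the file name without its extension, and the shared_slugs helper and the sample file names are hypothetical.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def shared_slugs(dir_a: Path, dir_b: Path) -> set[str]:
    """Return the slugs (file stems) that appear in both directories."""
    slugs_a = {path.stem for path in Path(dir_a).iterdir() if path.is_file()}
    slugs_b = {path.stem for path in Path(dir_b).iterdir() if path.is_file()}
    return slugs_a & slugs_b


# Build two throwaway directories so the example runs anywhere.
with tempfile.TemporaryDirectory() as root:
    blog = Path(root, "blog")
    notes = Path(root, "notes")
    blog.mkdir()
    notes.mkdir()
    for name in ("intro.md", "tmux.md"):
        (blog / name).touch()
    for name in ("tmux.md", "kedro.md"):
        (notes / name).touch()

    duplicates = shared_slugs(blog, notes)
```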

1 min

Testing Data Pipelines

Lint/Format/Doc
  black
  flake8
  interrogate
  mypy
Pipeline Assertions
  pipeline constructs
  pipeline has expected nodes
  pipeline has minimum nodes
  test minimum tags
  test alternate tags
Catalog Assertions
  test catalog follows naming structure
Node Tests
  test function does the correct operations on test data
Great Expectations

Kedro Factory

Dynamically generate kedro pipelines with yaml or script

Inspiration

1 min read

rebrand

simple landing page
https://swyx.io
joel on software
recent reading lists
More from waylon just above footer
4x2 grid
link strategy
latest post
next/prev
similar tags
search in nav
tag stickers
simple cards?
bookmarks?
nav style stinks
single post template
flat routes no need to /blog /notes
post types
🌳 full
🌱 budding
🖊 Note
💻 hot tip
usage of tags
MDX
stories
slides
⚠ ❌ ✔
kedro viz
charts
inlink component
https://joshwcomeau.com/
auto-card
oneline links
meta posts
about
uses
how site is built
how to search
stories

TODO
review package.json
update package.json

Done
ahrefs
fix canonical urls
fix broken inlinks
convert to one post template

1 min read

Avoid Nesting Loops in Python

Nesting loops inside each other in Python makes code much harder to understand. It takes more brain power to follow and is thus more error prone, so it is worth avoiding where possible. One issue with this complexity is that toy examples may make sense, but most real examples will grow and become more deeply nested over time. Avoiding this complexity from the start can help simplify the project in the future.

Let’s take a pretty simple example where we are using a fictitious library to get some sales data for our transportation company. The API allows us to fetch the sales data for one class of vehicle and one region at a time.

import pandas as pd
from datastore import get_sales  # fictitious library

cars = ['sedan', 'coupe', 'hatchback']
regions = ['US', 'CA', 'MX']

❌ Nesting Loops

We have set up to fetch our data with two lists that...

...
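One common way to flatten two nested loops like these is itertools.product, sketched below. This is not necessarily the approach the full post takes. Because datastore is fictitious, get_sales here is a hypothetical stand-in that returns made-up values, and pandas is left out to keep the sketch standard-library only.

```python
from itertools import product

cars = ["sedan", "coupe", "hatchback"]
regions = ["US", "CA", "MX"]


def get_sales(car: str, region: str) -> dict:
    """Hypothetical stand-in for the fictitious datastore.get_sales."""
    # The "units" value is invented so the example produces some data.
    return {"car": car, "region": region, "units": len(car) + len(region)}


# product yields every (car, region) pair, so one flat loop replaces two.
sales = [get_sales(car, region) for car, region in product(cars, regions)]
```

Adding a third dimension later, such as a list of years, means passing one more argument to product instead of adding another level of indentation.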

2 min read

List the latest files to change in a git repo

while read file; do echo $(git log --pretty=format:%ad -n 1 --date=raw -- $file) $file; done < <(git ls-tree -r --name-only HEAD | grep static/stories) | sort -r | head -n 3 | cut -d " " -f 3