Archive

All published posts

2469 posts latest post 2026-05-08
Publishing rhythm
Apr 2026 | 47 posts
The work on desert [1] by python-desert [2]. Deserialize to objects while staying DRY References: [1]: https://github.com/python-desert/desert [2]: https://github.com/python-desert
I recently discovered kedro-wings [1] by tamsanh [2], and it’s truly impressive. Kedro Wings automatically creates catalog entries to simplify Kedro pipeline writing. See the video here: https://www.youtube.com/watch?v=p4ELo1tqbYY References: [1]: https://github.com/tamsanh/kedro-wings [2]: https://github.com/tamsanh
Check out kedro-streaming-twitter-pipeline [1] by dataengineerone [2]. It’s a well-crafted project with great potential. No description available. References: [1]: https://github.com/dataengineerone/kedro-streaming-twitter-pipeline [2]: https://github.com/dataengineerone
junegunn [1] has done a fantastic job with fzf.vim [2]. Highly recommend taking a look. fzf ❤️ vim References: [1]: https://github.com/junegunn [2]: https://github.com/junegunn/fzf.vim

Kedro Static Viz 0.3.0 is out with Hooks Support

kedro-static-viz [1] is out with support for the newly released hooks feature. This means that you can have kedro-static-viz automatically deploy a full gatsby site before_pipeline_run keeping your visualization always up to date. Even though it is a static site there is no functionality lost. The only thing that’s missing is the flask server. With kedro-static-viz [1] you can deploy your visualization to a number of static hosting providers such as GitHub pages free of charge with wicked fast performance ⚡ It’s Fast # [2] Even though it’s built on gatsbyjs the full site builds in under 2s even on slower hardware. This is because the site is already pre-rendered and stripped of any excess. It’s zipped up right into the python package and is typically used with the cli, but now can be used with python, or as a hook as well. What is kedro-viz [3] 🤔 # [4] Kedro viz is a fantastic kedro plugin that allows you to visualize your data pipeline. Kedro allows you to quickly build produc...
I’m really excited about pytest-watch [1], an amazing project by joeyespo [2]. It’s worth exploring! Local continuous test runner with pytest and watchdog. References: [1]: https://github.com/joeyespo/pytest-watch [2]: https://github.com/joeyespo
Check out aws [1] and their project aws-cli [2]. Universal Command Line Interface for Amazon Web Services References: [1]: https://github.com/aws [2]: https://github.com/aws/aws-cli

Create Configurable Kedro Hooks

There are two main ways to create kedro hooks: with modules and with classes. Each one uses the same verbiage for the function/method names. Class hooks are a bit special, as they give you a way to configure them so that they are more generally useful. What is Kedro [1] If you are completely unsure what kedro is be sure to check out my what is kedro [2] post Installation # [3] Create a new environment with your environment manager of choice. Here I will use conda. Then we will install kedro from pypi. conda create -n kedro_class_hooks -y conda activate kedro_class_hooks # may also be source activate kedro_class_hooks or activate kedro_class_hooks pip install kedro Create a sample project # [4] Kedro new # [5] For more details check out my full post on kedro new [6] For this post I really just want a working pipeline as fast as possible. For this I am going to use the iris pipeline that is generated from the kedro new command in the cli. It’s important that you answer y to create an example pi...
3 min read ↺ 1

Brainstorming Kedro Hooks

This post is a 🧠 brainstorming work in progress. I will likely use it as a storage location/brain dump of hook ideas. What is Kedro 🤔 # [1] If you are completely unsure what kedro is be sure to check out my what is kedro [2] post after_catalog_created # [3] - filepath replacer - bucket replacer before_pipeline_run # [4] - preflight - check that data exists - run kedro_static_viz - run mypy - run interrogate - run flake8 after_pipeline_run # [5] - Great Expectations - send email - send slack before_node_run # [6] after_node_run # [7] - Great Expectations - save stats/meta data - Execution Order # [8] hooks are executed in reverse order of the hooks list. hooks with tryfirst will be moved to the end of the list hooks with trylast will be moved to the start of the list - after_catalog_created - before_pipeline_run - args - run_params = {'run_id': '2020-05-23T15.24.23.958Z', 'project_path': '/mnt/c/temp/kedro0160', 'env': 'local', 'kedro_version': '...

How to get Dev Comments from an article Url

I want to incorporate some of the wonderful comments, 💕, 🦄, and 🔖’s that I have been getting on dev.to on my website. I have dabbled once or twice to no avail; this time I am taking notes on my journey, so follow along and let’s get there together. By the end of this post, I will have a way to get comments from posts on the client-side thanks to the wonderfully open dev.to API. The API # [1] dev.to has an open API that allows us to easily get comments as HTML [2]. The API is documented at https://docs.forem.com/api/#tag/comments, let’s take a look at it. [3] Here we can...

Four github actions for your website

GitHub Actions are a new GitHub feature that will trigger GitHub to spin up a virtual machine and run some tasks with some special access to your repo. It can interact with comments/issues, it can clone your repo, and you can explicitly pass in secrets so that it can commit back to the repo or deploy to another service. The environment may be a Linux, Windows, or even a Mac machine. I believe this is wildly incredible for the open-source community; putting these tools in the same place that we are already collaborating is so convenient. What can they do for my personal website? 🤔 # [1] GitHub actions can give you confidence that your site is up and running, with the latest JavaScript packages, does not have broken links, and can even take screenshots of what your website looks like on different screen sizes and operating systems. - periodically check that the website is up - update npm - url checker - screenshot website srt32/uptime [2] # [3] srt32/uptime [2] is an action that...

Create Custom Kedro Dataset

Kedro provides an efficient way to build out data catalogs with their yaml api. It allows you to be very declarative about loading and saving your data. For the most part you just need to tell Kedro what connector to use and its filepath. When running, Kedro takes care of all of the read/write; you just reference the catalog key. But what is happening behind the scenes # [1] Under the hood there is an AbstractDataSet that each connector inherits from. It sets up a lot of the behind the scenes structure for us so that we don’t have to. For the most part kedro has connectors for about anything that you want to load (csv, parquet, sql, json) from about anywhere; http, s3, and the local file system are just some of the examples. Here is a barebones DataSet implementation straight from their docs. Parameters from the yaml catalog will get passed in. from pathlib import Path import pandas as pd from kedro.io import AbstractDataSet class MyOwnDataSet(...

Interrogate is a pretty awesome, brand new, cli for Python packages

As usual while listening to python bytes 181 [1] I heard of a tool that I had to try out right away! This thing is 🔥 hot off the press folks, we’re talking the first release only 3 weeks ago. It’s something that the python community needed years ago, and it belongs in your CI today. I had tried several tools that tried to do docstring coverage in the past but they were a bit cumbersome and were quickly forgotten about. Not interrogate, it’s dead simple! Nothing I have tried has come close to being this good. Interrogate # [2] It runs documentation coverage for your python project. It allows you to set the minimum amount of docstring coverage for your project and has some great setup instructions right in the readme. Install it # [3] Interrogate is on pypi so it is super simple to install with pip pip install interrogate run it # [4] This is the best part, it’s super easy to run right from the command line! Just call it, and give it a path to run. interrogate -v <path> 😲 I hav...
2 min read
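To make "docstring coverage" concrete, here is a rough sketch of the idea using only the standard library `ast` module. This is not how interrogate is implemented; the `docstring_coverage` function and the `SAMPLE` source are hypothetical names made up for illustration, and the sketch only counts modules, classes, and functions.

```python
import ast


def docstring_coverage(source: str) -> float:
    """Return the percentage of modules, classes, and functions with a docstring."""
    tree = ast.parse(source)
    documentable = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
    # ast.walk yields the module node itself, so `nodes` is never empty.
    nodes = [node for node in ast.walk(tree) if isinstance(node, documentable)]
    documented = [node for node in nodes if ast.get_docstring(node) is not None]
    return 100 * len(documented) / len(nodes)


# Hypothetical sample module: the module and one function are documented,
# one function is not.
SAMPLE = '''"""Module docstring."""

def documented():
    """Say hello."""
    return "hello"

def undocumented():
    return "hello world"
'''

coverage = docstring_coverage(SAMPLE)
print(f"{coverage:.1f}% documented")
```

A real tool adds the parts that make this useful in CI, such as walking a directory and failing below a minimum threshold.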
Just starred pyp [1] by hauntsaninja [2]. It’s an exciting project with a lot to offer. Easily run Python at the shell! Magical, but never mysterious. References: [1]: https://github.com/hauntsaninja/pyp [2]: https://github.com/hauntsaninja
I like econchick’s [1] project interrogate [2]. Explain yourself! Interrogate a codebase for docstring coverage. References: [1]: https://github.com/econchick [2]: https://github.com/econchick/interrogate

drawing ascii boxes

When creating CLIs I often want some nice full-width characters. I find it tough to find them, and when I do half the time it is an image or something that cannot be copied 👿. I rarely get very complex with my semi-manual ASCII art. I can do 98% of what I need with bars and corners. Using some simple full-width characters can really give your cli a nice clean look. Example # [1] I’d say 50% of what I need is just a full-width horizontal bar to give some visual flair or separation. [2] Bars # [3] ― ⍽ ⎸ ⎹ ␣ ─ ━ │ ┃ Square Corners # [4] ┌ ┍ ┎ ┏ ┐ ┑ ┒ ┓ └ ┕ ┖ ┗ ┘ ┙ ┚ ┛ Round Corners # [5] ╭ ╮ ╯ ╰ ╱ ╲ ╳ Harpoons # [6] ⃑ ⃬ ⃭ ↼ ↽ ↾ ↿ ⇀ ⇁ ⇂ ⇃ ⇋ ⇌ ⥊ ⥋ ⥌ ⥍ ⥎ ⥏ ⥐ ⥑ ⥒ ⥓ ⥔ ⥕ ⥖ ⥗ ⥘ ⥙ ⥚ ⥛ ⥜ ⥝ ⥞ ⥟ ⥠ ⥡ ⥢ ⥣ ⥤ ⥥ ⥦ ⥧ ⥨ ⥩ ⥪ ⥫ ⥬ ⥭ ⥮ ⥯ Double Boxes # [7] ═ ║ ╒ ╓ ╔ ╕ ╖ ╗ ╘ ╙ ╚ ╛ ╜ ╝ ╞ ╟ ╠ ╡ ╢ ╣ ╤ ╥ ╦ ╧ ╨ ╩ ╪ ╫ ╬ Dashed Boxes # [8] ┄ ┅ ┆ ┇ ┈ ┉ ┊ ┋╌ ╍ ╎ ╏ Connectors # [9] ├ ┝ ┞ ┟ ┠ ┡ ┢ ┣ ┤ ┥ ┦ ┧ ┨ ┩ ┪ ┫ ┬ ┭ ┮ ┯ ┰ ┱ ┲ ┳ ┴ ┵ ┶ ┷ ┸ ┹ ┺ ┻ ┼ ┽ ┾ ┿ ╀ ╁ ╂ ╃ ╄ ╅ ╆ ╇ ╈ ╉ ╊ ╋ Others # [10] ☐ ☑ ☒ ⫍ ⫎ ⮹ ⮽...
1 min read
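As a small sketch of how the bars and square corners from the lists above fit together in a cli, the helpers below print a full-width bar and wrap one line of text in a box. The function names `bar` and `draw_box` are hypothetical, and the sketch assumes every character in the text is one terminal column wide.

```python
import shutil
from typing import Optional


def bar(width: Optional[int] = None) -> str:
    """Return a heavy horizontal bar, as wide as the terminal by default."""
    if width is None:
        # Falls back to 80 columns when no terminal size can be detected.
        width = shutil.get_terminal_size().columns
    return "━" * width


def draw_box(text: str) -> str:
    """Wrap a single line of text in a box made of square corners and bars."""
    horizontal = "─" * (len(text) + 2)  # one space of padding on each side
    top = f"┌{horizontal}┐"
    middle = f"│ {text} │"
    bottom = f"└{horizontal}┘"
    return "\n".join([top, middle, bottom])


print(bar(20))
print(draw_box("kedro"))
```

Swapping in the round corners (╭ ╮ ╯ ╰) or the double-box characters only means changing the four corner characters and the two bar characters.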
Check out rec [1] and their project safer [2]. 🧷 A safer writer 🧷 References: [1]: https://github.com/rec [2]: https://github.com/rec/safer

creating the kedro-preflight hook

Kedro Hooks Intro - kedro hooks are an exciting upcoming feature of kedro 0.16.0. They allow you to hook into catalog_created, pipeline_run, and node_run (nouns), with a before or after (adjective). This really reminds me of React’s lifecycle hooks, which let you hook into various states of React web components. This is going to make kedro so extendable by the community. I am super pumped to see what the community is able to do with this ability. What is Kedro [1] If you are completely unsure what kedro is be sure to check out my what is kedro post Docs # [2] a w...

📝 Kedro Preflight Notes

This is a very rough idea for a kedro package to prevent time lost by getting partway through a pipeline run only to realize that you don’t have access to data or resources. Must Haves # [1] - check that inputs exist or are of a type to skip (sql) Good to haves - check that all input and output databases are accessible with good credentials - check for s3 bucket access - check for spark install Implementation # [2] @hook_spec def before_pipeline_run(run_params, pipeline, catalog): run params # [3] { "run_id": str, "project_path": str, "env": str, "kedro_version": str, "tags": Optional[List[str]], "from_nodes": Optional[List[str]], "to_nodes": Optional[List[str]], "node_names": Optional[List[str]], "from_inputs": Optional[List[str]], "load_versions": Optional[List[str]], "pipeline_name": str, "extra_params": Optional[Dict[str, Any]] } References: [1]: #must-haves [2]: #implementation [3]: #run-params
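The must-have check from these notes could be sketched roughly like this, without any kedro at all. Everything here is an assumption for illustration: `find_missing_inputs` is a hypothetical helper, inputs are modelled as a plain dict of name to file path, and a path of `None` stands in for a dataset type that should be skipped (such as sql). Wiring it into a real `before_pipeline_run` hook and reading paths out of the catalog is left out.

```python
import tempfile
from pathlib import Path
from typing import Dict, List, Optional


def find_missing_inputs(input_paths: Dict[str, Optional[str]]) -> List[str]:
    """Return the names of file-based inputs whose path does not exist.

    Inputs with a path of None are treated as non-file datasets and skipped.
    """
    return [
        name
        for name, path in input_paths.items()
        if path is not None and not Path(path).exists()
    ]


# Hypothetical inputs: one file that exists, one that does not, one to skip.
with tempfile.TemporaryDirectory() as tmp:
    existing = Path(tmp) / "iris.csv"
    existing.write_text("sepal_length\n5.1\n")
    inputs = {
        "iris": str(existing),
        "sales": str(Path(tmp) / "sales.csv"),
        "warehouse_table": None,
    }
    missing = find_missing_inputs(inputs)

print(missing)
```

A preflight hook would raise an error when this list is not empty, so the run fails before any node starts.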
The work on autoflake [1] by fake-name [2]. Removes unused imports and unused variables as reported by pyflakes References: [1]: https://github.com/fake-name/autoflake [2]: https://github.com/fake-name
Check out trys [1] and their project sergey [2]. A tiny lil’ static site generator References: [1]: https://github.com/trys [2]: https://github.com/trys/sergey

📢 Announcing find-kedro

find-kedro is a small library to enhance your kedro experience. It looks through your modules to find kedro pipelines, nodes, and iterables (lists, sets, tuples) of nodes. It then assembles them into a dictionary of pipelines, each module will create a separate pipeline, and __default__ being a combination of all pipelines. This format is compatible with the kedro _create_pipelines format. [1] [2] [3] [4] # [5] kedro is a ✨ fantastic project that allows for super-fast prototyping of data pipelines, while yielding production-ready pipelines. find-kedro enhances this experience by adding a pytest like node/pipeline discovery eliminating the need to bubble up pipelines through modules. When working on larger pipeline projects, it is advisable to break your project down into different sub-modules which requires knowledge of building python libraries, and knowing how to import each module correctly. While this is not too difficult, in some cases, it can trip up even the most se...

Explicit vs Implicit Returns in Javascript

Often when reading through javascript examples you will find some arrow functions use parentheses () while others use braces {}. The key difference is that parentheses will implicitly return the last statement while braces require an explicit return statement. It is important to understand the difference between them because it is likely that you will find code examples of both and trying to edit code written differently than you’re used to may have unintended consequences. [1] # [2] Arrow functions are one-liner functions in javascript that have two main syntactical ways to create the code block: with parentheses and with braces. Let’s take a look at both ways of creating arrow functions so that when we come across them in the wild it will all make sense. [3] # [4] Here is an example of an arrow function that will implicitly return the last statement without the return keyword. I believe that these are a bit more restricted in that you cannot set variables inside them. They are ...

Twitter deepdives

Inspired by Chris Achard My ideas # [1] Python # [2] - List comps - Classes - Inheritance - Background - Click - Lambdas Kedro # [3] - Cataloging - Custom datasets - Reusable pipelines - find-kedro Learn kedro in 5 days # [4] Email course inspired by learn d3 in 5 days Mail # [5] - Share your knowledge - Practice - Practice in public - Make practice easy - Share your notes - Digital Gardening - Own your content - Build your audience - Be nice - Have empathy - Learn your way - Continuous learning References: [1]: #my-ideas [2]: #python [3]: #kedro [4]: #learn-kedro-in-5-days [5]: #mail
1 min read
I’m impressed by alive-progress [1] from rsalmei [2]. A new kind of Progress Bar, with real-time throughput, ETA, and very cool animations! References: [1]: https://github.com/rsalmei/alive-progress [2]: https://github.com/rsalmei
I’m really excited about vim-quickui [1], an amazing project by skywind3000 [2]. It’s worth exploring! The missing UI extensions for Vim 9 (and NeoVim) !! 😎 References: [1]: https://github.com/skywind3000/vim-quickui [2]: https://github.com/skywind3000
If you’re into interesting projects, don’t miss out on bashtop [1], created by aristocratos [2]. Linux/OSX/FreeBSD resource monitor References: [1]: https://github.com/aristocratos/bashtop [2]: https://github.com/aristocratos

TIL: Bind arguments to dynamically generated lambdas in python

This past week I had a really weird bug in my kedro [1] pipeline. For some reason the data coming out of my pipeline made no sense at all, but if I manually requested raw data outside of the pipeline it matched expectations. NOTE While this story is about a kedro pipeline, it can be applied anywhere closures are put into an iterable. [2] # [3] After a few days of looking at it off and on, I pinpointed that it was all the way down in the raw layer, right as data is coming off of the database. For this I already had existing sql files stored and a read_sql function to get the data so I opted to just set up the pipeline to utilize the existing code as much as possible, leaning on the kedro [1] framework a bit less. I have dynamically created lists of pipeline nodes many times in the past, but typically I take data from kedro [1] input and use it in the lambda. I prefer the simplicity of using lambdas over functools.partial. It typically looks something like this. #...
2 min read
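The pitfall named in the title can be shown with a generic sketch. This is not the code from the post, which is cut off above; the `names` list and `fake_read_sql` function are hypothetical stand-ins for the stored sql files and the read_sql function.

```python
def fake_read_sql(name: str) -> str:
    """Hypothetical stand-in for a function that loads data for one sql file."""
    return f"data from {name}.sql"


names = ["sales", "items", "prices"]

# Late binding: each lambda looks up `name` when it is called, which is
# after the loop has finished, so every lambda sees the last value.
late_bound = [lambda: fake_read_sql(name) for name in names]

# Binding `name` as a default argument captures the current value
# at the moment each lambda is created.
bound = [lambda name=name: fake_read_sql(name) for name in names]

late_results = [load() for load in late_bound]
bound_results = [load() for load in bound]

print(late_results)
print(bound_results)
```

`functools.partial(fake_read_sql, name)` binds the argument at creation time as well, so it avoids the same problem.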

python-deepwatch

Is it possible to deep watch a single python function for changes? Shallow Watch # [1] Keeping track of a python function’s hash is quite simple. There is a __hash__ method attached to every python function. Calling it will return a hash of the function. If the function changes the hash will change. [ins] In [1]: def test(): ...: return "hello" [ins] In [2]: test.__hash__() Out[2]: 8760526380347 [ins] In [3]: test.__hash__() Out[3]: 8760526380347 [ins] In [4]: def test(): ...: return "hello world" [ins] In [5]: test.__hash__() Out[5]: 8760525617988 [ins] In [6]: def test(): ...: return "hello" [ins] In [7]: test.__hash__() Out[7]: 8760526380491 Using hashlib provides a consistent hash. import inspect import hashlib def test(): return "hello" [ins] In [17]: m.update(inspect.getsource(test).encode()) [ins] In [18]: m Out[18]: <sha256 HASH object @ 0x7f7b7b70fde0> [ins] In [19]: m.hexdigest() Out[19]: '1f2ff4c69eb69b545469686edd6f849136e104cd535785891586d90620328757' [i...
1 min read
Looking for inspiration? autoreload [1] by stevekrenzel [2]. A small python script to watch a directory for changes and reload a process when a change is detected. References: [1]: https://github.com/stevekrenzel/autoreload [2]: https://github.com/stevekrenzel
Check out madzak [1] and their project python-json-logger [2]. Json Formatter for the standard python logger References: [1]: https://github.com/madzak [2]: https://github.com/madzak/python-json-logger

Four Github Actions for Python

If you are developing python packages and using GitHub here are four actions that you can use today to automate your release workflow. Since python tools generally have such a simple cli I have opted to use the cli for most of these, that way I know exactly what is happening and have more control over it if I need. - Lint - Test - Package - Upload to PyPi Lint With flake8 # [1] flake8 is python’s quintessential linting tool to ensure that your code is up to the standards that you have set for the project, and to help prevent hidden...
Check out justmarkham [1] and their project scikit-learn-tips [2]. 🤖⚡ 50 scikit-learn tips References: [1]: https://github.com/justmarkham [2]: https://github.com/justmarkham/scikit-learn-tips
Check out Textualize [1] and their project rich [2]. Rich is a Python library for rich text and beautiful formatting in the terminal. References: [1]: https://github.com/Textualize [2]: https://github.com/Textualize/rich

Variables names don't need their type

So often I see a variable’s type() inside of its name and it hurts me a little inside. Tell me I’m right or prove me wrong below. Examples # [1] Pandas DataFrames are probably the worst offender that I see # bad sales_df = get_sales() # good sales = get_sales() Sometimes vanilla structures too! # bad items_list = ['sneakers', 'pencils', 'paper', ] # good items = ['sneakers', 'pencils', 'paper', ] Edge Cases? # [2] It’s so common when you need to get inside a data structure in a special way that isn’t provided by the library…. I am not exactly sure of a good way around it. # bad ?? sales = get_sales() sales_dict = sales.to_dict() # good 🤷‍♀️ Containers are plural # [3] Always name your containers plural, so that naming while iterating is simple. prices = {} items = ['sneakers', 'pencils', 'paper', ] for item in items: prices[item] = get_price(item) Before I start fights 🥊 in code review, am I in line here or just being pedantic? References: [1]: #examples [2]: #edge-cases...
I’m really excited about cpython [1], an amazing project by python [2]. It’s worth exploring! The Python programming language References: [1]: https://github.com/python/cpython [2]: https://github.com/python
I recently discovered scully [1] by scullyio [2], and it’s truly impressive. The Static Site Generator for Angular apps References: [1]: https://github.com/scullyio/scully [2]: https://github.com/scullyio
I came across pydevto [1] from lpellis [2], and it’s packed with great features and ideas. Unofficial dev.to api References: [1]: https://github.com/lpellis/pydevto [2]: https://github.com/lpellis
I’m really excited about kedro-pandas-profiling [1], an amazing project by brickfrog [2]. It’s worth exploring! A simple wrapper to use Pandas Profiling easily in Kedro References: [1]: https://github.com/brickfrog/kedro-pandas-profiling [2]: https://github.com/brickfrog
The work on gregives.co.uk [1] by gregives [2]. Personal site and portfolio of software engineer Greg Ives References: [1]: https://github.com/gregives/gregives.co.uk [2]: https://github.com/gregives
I’m impressed by act [1] from nektos [2]. Run your GitHub Actions locally 🚀 References: [1]: https://github.com/nektos/act [2]: https://github.com/nektos

Send Emails with GitHub Actions

Here is one useful thing that you can do with GitHub actions no matter what language you use: send email. You might want to know right away when your ci passes. You might want to give your team a nice pat on the back when a new release is deployed. There might be subscribers wanting to see the latest release notes in their inbox as soon as the latest version is deployed. Whatever it is, it’s pretty easy to do with an action right out of the actions marketplace. Mail on Star # [1] Here is a silly example that sends an email to yourself anytime someone stars your repo. name: Mail on Star on: watch: types: [ started ] # A workflow run is made up of one or more jobs that can run sequentially or in parallel jobs: # This workflow contains a single job called "email" email: # The type of runner that the job will run on runs-on: ubuntu-latest # Steps represent a sequence of tasks that will be executed as part of the job steps: - name: ✨ Send email, you star uses: dawidd6/acti...
I’m really excited about awesome-python-bytes [1], an amazing project by JackMcKew [2]. It’s worth exploring! 😎 🐍 Awesome lists about Python Bytes https://pythonbytes.fm/ References: [1]: https://github.com/JackMcKew/awesome-python-bytes [2]: https://github.com/JackMcKew
Check out get-diff-action [1] by technote-space [2]. It’s a well-crafted project with great potential. GitHub Actions to get git [3] diff References: [1]: https://github.com/technote-space/get-diff-action [2]: https://github.com/technote-space [3]: /glossary/git/
If you’re into interesting projects, don’t miss out on react-toastify [1], created by fkhadra [2]. React notification made easy 🚀 ! References: [1]: https://github.com/fkhadra/react-toastify [2]: https://github.com/fkhadra

What Are GitHub Actions

GitHub actions are an amazing tool that allows us to run code based on triggers inside of our repo. There is a large and growing community of actions inside the marketplace to use with very little effort. Best of all they are free for public repositories, and private repos have a very generous free tier. I have been diving deep into GitHub actions for about a month now and they are wicked good! They allow you to run any sort of arbitrary code based on events in your repo, webhooks, or schedules. They are very reasonably priced. The interface that GitHub has developed for them is top-notch! It’s so good I have done 90% of my editing of them right from github.com. TLDR # [1] some interaction to your repository triggers code to run. [2] # [3] The online editor for actions is pretty amazing. When creating a new workflow it automatically sets up a ...

Getting Started with GitHub Actions

GitHub actions are written in configuration files using the YAML syntax. YAML is a superset of JSON. Most YAML can be expressed inline with JSON syntax. Similar to python, YAML is driven by whitespace rather than brackets or tags. The argument for using YAML for configuration files such as actions is that it is more human-readable and editable. It’s much easier to see the whitespace layout than it is to get closing brackets correct. For actions, I believe this is mostly true. I don’t see any use case to get past 3-5 indents, which is completely manageable. Can I just say that I learned more than I realized about YAML by writing this article. Arrays and Objects # [1] In YAML or JSON, the most basic containers for data are arrays, a 1D list of things, and objects, for key-value pairs. Arrays # [2] The start of an array container is signified with a leading -. This is probably one of the big things I didn’t understand about YAML before writing this post, but hats off to the ...
Check out poke95 [1] by wobsoriano [2]. It’s a well-crafted project with great potential. 🚀 A Windows 95 style Pokédex built with React. References: [1]: https://github.com/wobsoriano/poke95 [2]: https://github.com/wobsoriano