Published

All published posts

2540 posts, latest post 2026-06-16
Publishing rhythm
May 2026 | 58 posts
Check out csurfer [1] and their project pypette [2]. Ridiculously simple flow controller for building complex pipelines References: [1]: https://github.com/csurfer [2]: https://github.com/csurfer/pypette

Kedro

See all of my kedro related posts in [[ tag/kedro ]].

#kedrotips [1]

I am tweeting out most of these snippets as I add them; you can find them all here: #kedrotips [3].

🗣 Heads up

Below are some quick snippets and notes for using kedro to build data pipelines. So far I am just compiling snippets; eventually I will create several posts on kedro. These are mostly things that I use in my everyday work with kedro. Some are a bit more esoteric. Some are helpful when writing production code, and some are more useful for exploration.

📚 Catalog

Photo by jesse orrico on Unsplash

CSVLocalDataSet

```python
import pandas as pd
from kedro.io import CSVLocalDataSet

iris = pd.read_csv('https://raw.githubusercontent.com/kedro-org/kedro/d3218bd89ce8d1148b1f79dfe589065f47037be6/kedro/template/%7B%7B%20cookiecutter.repo_name%20%7D%7D/data/01_raw/iris.csv')
# save_args={"index": False} keeps the DataFrame index out of the csv
iris_data_set = CSVLocalDataSet(filepath="test.csv", load_args=None, save_args={"index": False})
iris_data_set.save(iris)
reloaded_iris = iris_data_se...
```
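Here is a minimal pandas-only sketch of what `save_args={"index": False}` buys you. It assumes, as I understand the older kedro API, that `CSVLocalDataSet` forwards `save_args` to `DataFrame.to_csv` and `load_args` to `pd.read_csv`. The tiny frame is a made-up stand-in for the iris data so the sketch runs offline.

```python
import tempfile
from pathlib import Path

import pandas as pd

# small stand-in for the iris data, so the sketch runs without a network call
df = pd.DataFrame({"sepal_length": [5.1, 4.9], "species": ["setosa", "setosa"]})

with tempfile.TemporaryDirectory() as tmp:
    with_index = Path(tmp) / "with_index.csv"
    without_index = Path(tmp) / "without_index.csv"

    # the default to_csv writes the index as an unnamed first column
    df.to_csv(with_index)
    # index=False mirrors save_args={"index": False}
    df.to_csv(without_index, index=False)

    reloaded_with_index = pd.read_csv(with_index)
    reloaded_without_index = pd.read_csv(without_index)

print(list(reloaded_with_index.columns))     # ['Unnamed: 0', 'sepal_length', 'species']
print(list(reloaded_without_index.columns))  # ['sepal_length', 'species']
```

Without `index=False`, every save/load round trip picks up an extra `Unnamed: 0` column.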
Check out requests [1] by psf [2]. It’s a well-crafted project with great potential. A simple, yet elegant, HTTP library. References: [1]: https://github.com/psf/requests [2]: https://github.com/psf
Check out vscode-git-semantic-commit [1] by nitayneeman [2]. It’s a well-crafted project with great potential. 💬 A Visual Studio Code extension which enables to commit simply by the semantic message conventions References: [1]: https://github.com/nitayneeman/vscode-git-semantic-commit [2]: https://github.com/nitayneeman
awesome-streamlit [1] by MarcSkovMadsen [2] is a game-changer in its space. Excited to see how it evolves. The purpose of this project is to share knowledge on how awesome Streamlit is and can be References: [1]: https://github.com/MarcSkovMadsen/awesome-streamlit [2]: https://github.com/MarcSkovMadsen
I’m impressed by js13k-2019 [1] from bencoder [2]. xx142-b2.exe. An entry for js13kgames 2019 References: [1]: https://github.com/bencoder/js13k-2019 [2]: https://github.com/bencoder
Just starred death-to-ie11 [1] by gabLaroche [2]. It’s an exciting project with a lot to offer. Countdown for IE11 end of support References: [1]: https://github.com/gabLaroche/death-to-ie11 [2]: https://github.com/gabLaroche

📝 Packages to Investigate Notes

- jmespath
- Tabnine

Bulwark

github: https://github.com/zaxr/bulwark

I definitely want to try this out with kedro. Bulwark is a package for convenient property-based testing of pandas dataframes, supported for Python 3.5+.

Example

```python
import bulwark.decorators as dc

@dc.IsShape((-1, 10))
@dc.IsMonotonic(strict=True)
@dc.HasNoNans()
def compute(df):
    # complex operations to determine result
    ...
    return result_df
```
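To get a feel for how these decorators work before installing bulwark, here is a hypothetical plain-pandas check in the same spirit. `has_no_nans` and `add_total` are names made up for this sketch; this is not bulwark's implementation. It only illustrates the pattern of wrapping a function and validating the DataFrame it returns.

```python
import functools

import pandas as pd


def has_no_nans(func):
    """Hypothetical check in the spirit of bulwark's HasNoNans (not bulwark's code)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        # fail loudly if the returned frame contains any missing values
        if result.isna().any().any():
            raise AssertionError(f"{func.__name__} returned NaNs")
        return result
    return wrapper


@has_no_nans
def add_total(df):
    # hypothetical transformation step being guarded
    return df.assign(total=df["price"] * df["qty"])


print(add_total(pd.DataFrame({"price": [1.0, 2.0], "qty": [3, 4]})))
```

Calling `add_total` with a missing price raises `AssertionError` instead of quietly passing NaNs downstream.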
1 min read
I came across awesome-data-engineering [1] from igorbarinov [2], and it’s packed with great features and ideas. A curated list of data engineering tools for software developers References: [1]: https://github.com/igorbarinov/awesome-data-engineering [2]: https://github.com/igorbarinov
I’m really excited about vscode-python [1], an amazing project by microsoft [2]. It’s worth exploring! Python extension for Visual Studio Code References: [1]: https://github.com/microsoft/vscode-python [2]: https://github.com/microsoft

Just Use Pathlib

Pathlib is an amazing cross-platform path tool.

Import

```python
from pathlib import Path
```

Create path object

Current directory

```python
cwd = Path('.').absolute()
```

User's home directory

```python
home = Path.home()
```

Module directory

```python
# __file__ is the module itself; .parent is the directory that contains it
module_path = Path(__file__).parent
```

Others

Let's create a path relative to our current module.

```python
data_path = Path(__file__).parent / 'data'
```

Check if files exist

Make directories

```python
data_path.mkdir(parents=True, exist_ok=True)
```

Rename files

```python
# give rename a full target path, otherwise the file moves to the current working directory
(data_path / 'example.csv').rename(data_path / 'real.csv')
```

List files

Glob files

```python
data_path.glob('*.csv')
```

Recursively

```python
data_path.rglob('*.csv')
```

Write

```python
import datetime

(data_path / 'meta.txt').write_text(f'created on {datetime.datetime.today()}')
```
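Here is a self-contained sketch of checking for files and listing them with pathlib. It runs against a throwaway temporary directory rather than the module-relative `data` folder, and the file names are invented for the example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    data_path = Path(tmp) / 'data'
    data_path.mkdir(parents=True, exist_ok=True)
    (data_path / 'example.csv').write_text('a,b\n1,2\n')
    (data_path / 'nested').mkdir()
    (data_path / 'nested' / 'more.csv').write_text('a,b\n3,4\n')

    # check if files exist
    csv_exists = (data_path / 'example.csv').exists()
    missing_exists = (data_path / 'missing.csv').exists()

    # list everything directly inside the folder (files and sub-directories)
    entries = sorted(p.name for p in data_path.iterdir())
    # keep only the regular files
    files = sorted(p.name for p in data_path.iterdir() if p.is_file())
    # glob and rglob return generators; sorted() materializes them
    top_csvs = sorted(p.name for p in data_path.glob('*.csv'))
    all_csvs = sorted(p.relative_to(data_path).as_posix() for p in data_path.rglob('*.csv'))

print(csv_exists, missing_exists)  # True False
print(entries)                     # ['example.csv', 'nested']
print(all_csvs)                    # ['example.csv', 'nested/more.csv']
```

Because `glob` and `rglob` are lazy, a bare `data_path.glob('*.csv')` in a REPL shows a generator object, not the file names.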
1 min read

Custom Python Exceptions

Custom Exceptions

```python
class ProjectNameError(NameError):
    pass


class UserNameError(NameError):
    pass


class CondaEnvironmentError(RuntimeError):
    pass


class BucketNotDefinedError(NameError):
    pass
```
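One way these might be used is sketched below. `get_bucket` and its config dict are hypothetical, and the class is re-declared so the snippet runs on its own. Subclassing a built-in like `NameError` means code that already catches `NameError` keeps working, while callers who care can catch the more specific error.

```python
class BucketNotDefinedError(NameError):
    pass


def get_bucket(config):
    """Hypothetical helper: return the configured bucket or fail with a specific error."""
    bucket = config.get("bucket")
    if not bucket:
        raise BucketNotDefinedError("no 'bucket' key set in config")
    return bucket


try:
    get_bucket({})
except NameError as err:
    # the custom error is still a NameError, so broad handlers keep working
    print(type(err).__name__, err)
```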
1 min read

Filtering Pandas

query

Good for method chaining, i.e. adding more methods or filters without assigning a new variable.

```python
# is
skus.query('AVAILABILITY == "AVAILABLE"')
# is not
skus.query('AVAILABILITY != "AVAILABLE"')
```

masking

General purpose; this is probably the most common method you will see in training material and examples.

```python
# is
skus[skus['AVAILABILITY'] == 'AVAILABLE']
# is not (the parentheses matter: ~ binds tighter than ==)
skus[~(skus['AVAILABILITY'] == 'AVAILABLE')]
```

isin

Capable of matching against multiple values at once.

```python
# is in
df[df.AVAILABILITY.isin(['AVAILABLE', 'AVL'])]
# is not in
df[~df.AVAILABILITY.isin(['AVAILABLE', 'AVL'])]
```

contains

Good for partial matches.

```python
# contains
df[df.AVAILABILITY.str.contains('AVA')]
# not contains
df[~df.AVAILABILITY.str.contains('AVA')]
```

MASKS

Anything that we put inside of square brackets can be set as a variable and then passed in.

```python
service_mask = skus['AVAILABILITY'] == 'AVAILABLE'
name_mask = skus['NAME'] == 'Dell chromebook 11'
```

Operators

& - and
~ - not
| - or

AVAILABLE and ...
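Here is a runnable sketch that puts the filters and operators together. The `skus` rows are made-up sample data that reuse the column names from the snippets. When comparisons are combined inline, each one needs its own parentheses because `&` and `|` bind tighter than `==`.

```python
import pandas as pd

# made-up sample data using the AVAILABILITY and NAME columns
skus = pd.DataFrame({
    "NAME": ["Dell chromebook 11", "Dell chromebook 11", "HP stream", "Lenovo ideapad"],
    "AVAILABILITY": ["AVAILABLE", "BACKORDER", "AVL", "AVAILABLE"],
})

service_mask = skus["AVAILABILITY"] == "AVAILABLE"
name_mask = skus["NAME"] == "Dell chromebook 11"

# & - and: both conditions must hold
available_dells = skus[service_mask & name_mask]
# the same filter written inline; each comparison wrapped in parentheses
available_dells_inline = skus[
    (skus["AVAILABILITY"] == "AVAILABLE") & (skus["NAME"] == "Dell chromebook 11")
]
# | - or: either condition holds
either = skus[service_mask | name_mask]
# ~ - not: invert a whole mask
not_available = skus[~service_mask]
# isin matches several values; str.contains does partial matches
available_like = skus[skus["AVAILABILITY"].isin(["AVAILABLE", "AVL"])]
contains_ava = skus[skus["AVAILABILITY"].str.contains("AVA")]

print(available_dells)
```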

Digital Ocean

I love Digital Ocean for its simplicity and its commitment to open source.
1 min read
If you’re into interesting projects, don’t miss out on Recreation-of-Nature [1], created by Kashu7100 [2]. ALife simulation with Python: patterns, behavior, and cognition. References: [1]: https://github.com/Kashu7100/Recreation-of-Nature [2]: https://github.com/Kashu7100

Quick Progress Bars in python using TQDM

tqdm is one of my favorite general-purpose utility libraries in Python. It lets me see the progress of multipart processes as they happen. I really like this when I am developing something that takes some time and I am unsure of its performance. It lets me be patient when the process is going well and will finish in reasonable time, and it lets me 💥 kill it and find a way to make it perform better when it will not.

For more gifs like these, follow me on Twitter @waylonwalker [2].

Add a simple progress bar!

```python
from tqdm import tqdm
from time import sleep

for i in tqdm(range(10)):
    sleep(1)
```

convenience

tqdm also has a convenience function called trange that wraps the range function with a tqdm progress bar automatically.

```python
from tqdm import trange
from time import sleep

# trange(10) is shorthand for tqdm(range(10)); pass the count, not a range object
for i in trange(10):
    sleep(1)
```

notebook support

There is also notebook support. If you are bouncing between ipython and jupyter I recommend importing from the auto ...
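tqdm can only draw a full bar with a percentage and ETA when it knows how many items are coming. `range` and lists have a length, but generators do not, so this sketch passes `total=` explicitly. `slow_squares` is a made-up stand-in for a slow processing step.

```python
from time import sleep

from tqdm import tqdm


def slow_squares(n):
    """Hypothetical slow generator: yields n values and has no __len__ for tqdm to read."""
    for i in range(n):
        sleep(0.01)
        yield i * i


# without total=, tqdm could only show a running count and rate, not a bar
results = [x for x in tqdm(slow_squares(20), total=20, desc="squaring")]
print(sum(results))  # 2470
```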
1 min read
I’m impressed by bake [1] from kennethreitz [2]. Bake — the strangely familiar workflow utility. References: [1]: https://github.com/kennethreitz/bake [2]: https://github.com/kennethreitz
Check out terminal [1] by microsoft [2]. It’s a well-crafted project with great potential. The new Windows Terminal and the original Windows console host, all in the same place! References: [1]: https://github.com/microsoft/terminal [2]: https://github.com/microsoft

Clean up Your Data Science with Named Tuples

If you are a regular listener of TalkPython [1] or PythonBytes you have heard Michael Kennedy talk about Named Tuples many times, but what are they and how do they fit into my data science workflow?

Example

As you graduate your scripts into modules and libraries you might start to notice that you need to pass a lot of data around to all of the functions that you have created. For example, if you are running some analysis using sales, inventory, and pricing data, you may need to calculate total revenue and inventory on hand. You may also need to pass these data sets into various models to drive production or pricing based on predicted volumes.

Load data

Here we set up functions that load data from the sales database. Assume that we also have similar functions called get_inventory and get_pricing.

```python
from sqlalchemy import create_engine


def get_engine():
    # return the engine so callers can use it
    return create_engine('postgresql://scott:tiger@localhost:5432/mydatabase')


def get_sales():
    ''' gets sales history from the sales database '''
    engine = ge...
```
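Here is one way the three data sets could travel together as a single named tuple. This is a sketch under assumptions, not necessarily the approach the full post takes. The `Data` class, its field names, the helper functions, and the sample frames are all hypothetical stand-ins for whatever `get_sales`, `get_inventory`, and `get_pricing` return.

```python
from typing import NamedTuple

import pandas as pd


class Data(NamedTuple):
    """Hypothetical bundle of the three data sets the analysis passes around."""
    sales: pd.DataFrame
    inventory: pd.DataFrame
    pricing: pd.DataFrame


def total_revenue(data: Data) -> float:
    # attribute access reads better than data[0] or a dict key lookup
    merged = data.sales.merge(data.pricing, on="sku")
    return float((merged["units"] * merged["price"]).sum())


def inventory_on_hand(data: Data) -> int:
    return int(data.inventory["on_hand"].sum())


# made-up frames standing in for get_sales(), get_inventory(), get_pricing()
data = Data(
    sales=pd.DataFrame({"sku": ["a", "b"], "units": [3, 5]}),
    inventory=pd.DataFrame({"sku": ["a", "b"], "on_hand": [10, 2]}),
    pricing=pd.DataFrame({"sku": ["a", "b"], "price": [2.0, 4.0]}),
)

print(total_revenue(data))      # 26.0
print(inventory_on_hand(data))  # 12
```

Every function takes one `data` argument instead of three, and adding a fourth data set later only changes the `Data` definition.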