How small can a minimal kedro pipeline that is ready to package be? I made one in 4 files that you can pip install. It's only 35 lines of Python in total: 8 in setup.py and 27 in mini_kedro_pipeline.py. Minimal Kedro Pipeline I have everything for th...
2021-1-20
This is a post that may be a work in progress for a while. It's a collection of thoughts on managing my blog, but it could apply to anything that is just a collection of markdown files. Listing things posts tags draft posts data frontmatter filep...
2021-1-20
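Listing posts, tags, and drafts mostly comes down to reading each file's frontmatter. Here is a minimal sketch using only the standard library. It assumes flat `key: value` frontmatter between `---` lines and a `draft: true` flag; `read_frontmatter` and `list_drafts` are hypothetical names, and real YAML frontmatter would need a proper YAML parser.

```python
from pathlib import Path
import tempfile


def read_frontmatter(text):
    """Parse flat `key: value` frontmatter between --- lines (hypothetical helper).

    Assumes plain string values only; nested YAML is not supported.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the frontmatter block
        if not line.strip():
            continue  # skip blank lines inside the block
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta


def list_drafts(posts_dir):
    """Return names of markdown files whose frontmatter sets draft: true."""
    return sorted(
        path.name
        for path in Path(posts_dir).glob("**/*.md")
        if read_frontmatter(path.read_text()).get("draft", "").lower() == "true"
    )


# try it on two throwaway posts in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.md").write_text("---\ntitle: A\ndraft: true\n---\nbody\n")
    Path(tmp, "b.md").write_text("---\ntitle: B\ndraft: false\n---\nbody\n")
    drafts = list_drafts(tmp)

print(drafts)  # ['a.md']
```

The same `read_frontmatter` result could be grouped by a `tags` key to build tag listings.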
In Python data science/engineering, most of our data is in some sort of table, typically a DataFrame from a library like pandas, spark, or dask. DataFrames are the heart of most pipelines. These containers for data contain many convenient m...
2021-1-14
Changing conda environments is a bit verbose, so I use a function with fzf that both lists environments and selects the one I want in one go. Conda I have used conda as a virtual environment tool for years now. I started using conda for its simplicity...
2021-1-11
What does it take to create an installable Python package that can be hosted on PyPI? What is the minimal Python package? setup.py my_module.py This post is somewhat inspired by the bottle framework, which is famously written as a single Python modu...
2021-1-10
I've grown tired of the standard IPython prompt, as it doesn't give me much useful information. The default one shows a line number that only seems to add anxiety when I am working on a simple problem and see that number grow to several h...
2020-12-20
I use my IPython terminal daily. It's my go-to way of running Python most of the time. After you use it for a little bit, you will probably want to set up a bit of your own configuration. install ipython Activate your virtual environment of choice an...
2020-12-20
One thing we all dread is the mundane work of getting started and all the hoops we have to jump through to get going. This year I want to post more often, and I am taking some steps toward making it easier for myself to just get started. When I start a new post I nee...
2020-12-11
In Python data science we often reach for pandas a bit more than necessary. While pandas can save us so much work, there are times when much simpler alternatives exist. The `itertools` and `more-itertools` libraries are full of cases like this. This pos...
2020-12-10
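As one small example of skipping pandas, here is a sketch of a group-and-sum using `itertools.groupby` from the standard library. The `sales` records are made-up sample data. `groupby` only groups consecutive items, which is why the records are sorted by the same key first.

```python
from itertools import groupby
from operator import itemgetter

# hypothetical sample records, e.g. rows read from a CSV with csv.DictReader
sales = [
    {"region": "east", "units": 3},
    {"region": "west", "units": 5},
    {"region": "east", "units": 2},
]

key = itemgetter("region")

# sort by the grouping key so that matching regions sit next to each other,
# then sum the units within each group
totals = {
    region: sum(row["units"] for row in rows)
    for region, rows in groupby(sorted(sales, key=key), key=key)
}

print(totals)  # {'east': 5, 'west': 5}
```

For a few thousand rows of dicts, this avoids importing pandas entirely while staying readable.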
There are many reasons that you should be using kedro. If you are on a team of Data Scientists/Data Engineers processing DataFrames from many data sources, you should be considering a pipeline framework. Kedro is a great option that provides many benef...
2020-11-1
Kedro 0.16.6 is out! Let's take a look through the release notes. Deployment Docs It is really exciting to see more deployment options coming from the kedro team. It really shows the power of the framework. The power of some of these orchestrations...
2020-10-25
nodes_global I released a router-like plugin for kedro back in April 2020. This was not the first design; the idea actually came from one of the QB folks who taught me kedro nearly a year before. We were assembling our pipelines with something call...
2020-10-8
Today we ran into an issue where we had a one-off script that just needed to work, but it was chewing through memory like nothing. Pre check the status of memory. There are a number of ways that you can check the amount of memory on your system....
2020-10-1
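One of those ways, on Linux only, is to read the `/proc/meminfo` pseudo file directly. Here is a minimal sketch assuming a Linux host; `parse_meminfo` is a hypothetical helper, and on other platforms the third-party `psutil.virtual_memory()` is a common cross-platform alternative.

```python
from pathlib import Path


def parse_meminfo(text):
    """Parse the contents of /proc/meminfo into {field: value}.

    Most values are reported in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if rest.strip():
            # lines look like "MemAvailable:    8123456 kB"
            info[name.strip()] = int(rest.split()[0])
    return info


meminfo_path = Path("/proc/meminfo")  # only exists on Linux
if meminfo_path.exists():
    mem = parse_meminfo(meminfo_path.read_text())
    # MemAvailable estimates memory available for new programs without swapping
    print(f"{mem.get('MemAvailable', 0) // 1024} MiB available of {mem['MemTotal'] // 1024} MiB")
```

Checking this before and during a heavy one-off script gives a quick sense of how close it is to running out.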
Here are three things that I see my non-programming counterparts doing every single day. These really sum up so much of what folks do within an office. So many of us dabble in or become power users of spreadsheets without knowing there is an altern...
2020-8-11
Miniconda is a Python distribution from Continuum. It's a slimmed-down version of their very popular Anaconda distribution. It comes with its own environment manager and has eased the install process for many who do not have a way to compile C-exten...
2020-8-10
Looking at the release notes, I see one major feature improvement on the list: auto-discovery of hooks. ``` markdown Major features and improvements Enabled auto-discovery of hooks implementations coming from installed plugins. ``` This on...
2020-8-1
As I continue to build out waylonwalker.com, I sometimes run into errors that are not caught because I do not have good testing in place. I want to explore some integration testing options using GitHub Actions. Running integration tests wil...
2020-7-27
When learning a new skill, it's important to practice along the way. In order for me to show up to practice, I need to make it easy to show up. An easy way to show up to practice with Python is to use an online REPL. With these, you can try out somethi...
2020-7-25
mypy Mypy's config parser seems to be one of the most complex. This is likely due in part to it having to maintain the most backwards compatibility of all the projects that I looked at. mypy/config_parser flake8 options/config.py black black portray only uses pypro...
2020-7-21
I am looking into a way to replace the Google Reader experience that I had back in 2013 before Google took it from us. I am starting by learning how to parse feeds with Python, and even without much previous knowledge, it proved to be much easier than ant...
2020-7-13
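To get a feel for how little it takes, here is a minimal sketch that parses an RSS 2.0 feed with the standard library's `xml.etree.ElementTree`. This swaps in ElementTree so it runs without installs and is not necessarily the approach the post takes; the inline feed and `parse_items` are hypothetical. It only handles RSS `<item>` elements; Atom feeds use namespaced `<entry>` elements and would need different lookups.

```python
import xml.etree.ElementTree as ET

# a tiny hypothetical RSS 2.0 document; a real reader would download this text
RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item><title>First post</title><link>https://example.com/first</link></item>
    <item><title>Second post</title><link>https://example.com/second</link></item>
  </channel>
</rss>"""


def parse_items(xml_text):
    """Return (title, link) pairs for every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    # iter("item") walks the whole tree, so the channel nesting doesn't matter
    return [(item.findtext("title"), item.findtext("link")) for item in root.iter("item")]


for title, link in parse_items(RSS):
    print(title, link)
```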
Why use the kedro catalog? Using the catalog alone will not reap all of the benefits of the framework, but it does get you and your project ready to adopt the full framework eventually. For me the full benefit of the catalog comes when you combine it with...
2020-6-29
kedro 0.16.2 just dropped last week with a long-awaited feature... catalog search! I went as far as monkey patching this into each of my projects. I jump between a few really big projects that have tons of datasets. Being able to quickly sear...
2020-6-22
Passing inputs into kedro is a key concept. Accepting a single catalog key as input is trivial and easily makes sense, but passing a list or dictionary of catalog entries can be a bit confusing. args/*args review Check out...
2020-6-19
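In kedro, a list of inputs is handed to the node function positionally, in order, and a dict maps function argument names to catalog entries as keyword arguments. The underlying mechanics are plain `*args`/`**kwargs` unpacking, sketched here without kedro at all. `run_node` is a hypothetical helper (not kedro's API) and the `catalog` dict stands in for datasets the real catalog would load.

```python
def run_node(func, inputs, catalog):
    """Hypothetical helper: load inputs from `catalog` and call `func`,
    mimicking how a str / list / dict of dataset names maps onto arguments."""
    if isinstance(inputs, str):
        # a single catalog key -> one positional argument
        return func(catalog[inputs])
    if isinstance(inputs, dict):
        # {argument_name: catalog_key} -> keyword arguments via **
        return func(**{arg: catalog[key] for arg, key in inputs.items()})
    # a list of catalog keys -> positional arguments, in order, via *
    return func(*(catalog[key] for key in inputs))


# stand-in for datasets that a real catalog would load from disk
catalog = {"cars": [1, 2, 3], "trucks": [10, 20]}


def combine(left, right):
    return left + right


by_position = run_node(combine, ["cars", "trucks"], catalog)  # combine(cars, trucks)
by_name = run_node(combine, {"right": "cars", "left": "trucks"}, catalog)  # combine(left=trucks, right=cars)
single = run_node(len, "cars", catalog)  # len(cars)

print(by_position)  # [1, 2, 3, 10, 20]
print(by_name)      # [10, 20, 1, 2, 3]
print(single)       # 3
```

The dict form is handy when a function's argument order doesn't match the order you think about the datasets in, since names rather than positions do the matching.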
Pathlib is an amazing cross-platform path tool. Import `from pathlib import Path` Create path object Current Directory `cwd = Path('.').absolute()` User's Home Directory `home = Path.home()` Module Directory module_path = Path(__...
2019-9-26
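Building on `Path('.').absolute()` and `Path.home()`, here are a few more everyday pathlib operations. This is a sketch: the `settings.toml` path is hypothetical and never created, and the file writes happen in a throwaway temporary directory so nothing on disk is touched.

```python
from pathlib import Path
import tempfile

cwd = Path('.').absolute()
home = Path.home()

# the / operator joins path segments in a cross-platform way
config = home / ".config" / "myapp" / "settings.toml"  # hypothetical path, not created
print(config.name)    # settings.toml
print(config.suffix)  # .toml
print(config.stem)    # settings

# create, write, and search for files inside a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "posts").mkdir()
    (root / "posts" / "hello.md").write_text("# hello\n")
    # "**" matches the directory itself and every subdirectory recursively
    markdown_files = sorted(p.name for p in root.glob("**/*.md"))

print(markdown_files)  # ['hello.md']
```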
query Good for method chaining, i.e. adding more methods or filters without assigning a new variable. ```python is skus.query('AVAILABILITY == " AVAILABLE"') is not skus.query('AVAILABILITY != " AVAILABLE"') ``` masking general purpose, this is proba...
2019-9-24
tqdm is one of my favorite general-purpose utility libraries in Python. It lets me see the progress of multipart processes as they happen. I really like it when I am developing something that takes some amount of time and I am unsure of perf...
2019-9-18
If you are a regular listener of TalkPython or PythonBytes, you have heard Michael Kennedy talk about Named Tuples many times, but what are they and how do they fit into my data science workflow? Example As you graduate your scripts into modules and li...
2019-9-11
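Here is a small sketch of what a named tuple looks like, using `typing.NamedTuple` (the class-based spelling of `collections.namedtuple`). `Run` and its fields are made-up names for illustration. It keeps tuple behavior (unpacking, immutability) while adding readable attribute access and an optional default.

```python
from typing import NamedTuple


class Run(NamedTuple):
    """Hypothetical record describing one model run."""
    model: str
    rows: int
    score: float = 0.0  # a field with a default value


run = Run("baseline", rows=1_000, score=0.82)
print(run.model, run.score)  # baseline 0.82  (instead of run[0], run[2])

model, rows, score = run  # still unpacks like a plain tuple
print(run._asdict())      # {'model': 'baseline', 'rows': 1000, 'score': 0.82}

# named tuples are immutable; _replace returns a new Run with one field changed
tuned = run._replace(score=0.9)
print(run.score, tuned.score)  # 0.82 0.9
```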
This post is intended as an extension of/update to background tasks in python. I started using background the week that Kenneth Reitz released it. It takes away so much boilerplate from running background tasks that I use it in more places than I pr...
2019-9-10
Autoreload in IPython I have used %autoreload for several years now with great success and 🔥 rapid reloads. It allows me to move super fast when developing libraries and modules. They have made some great updates this year that allow class modules...
2019-9-8
Bash Notes Bash is super powerful. File System Full Show Remaining Space on Drives `df -h` Show the disk usage of each directory in the current directory `du . -h --max-depth=1` Move files then symlink them `mkdir /mnt/mounted_drive` mv ~/bigdir /mnt/mounted_drive...
2019-9-8