In Python data science and engineering, most of our data comes in the form of a table, typically a DataFrame from a library like pandas, Spark, or Dask.

DataFrames are the heart of most pipelines

These containers provide many convenient methods for manipulating table-like data structures. Sometimes we reach for other data types, namely vanilla Python types like lists and dicts, or even NumPy data types.

What is Kedro

If you are unfamiliar with Kedro, check out this post.

Sometimes datasets are not tables

There are times when our data doesn't fit nicely into a DataFrame. Lucky for us, Kedro has pickle support out of the box. Pickle is a way to store almost any Python object to disk. Beware that pickle files from an unknown source can run malicious code when loaded and are considered unsafe. For the most part, though, when you read and write your own pickle files they are a good tool to consider.

See more about pickle from python.org.
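As a quick refresher, here is a minimal sketch of how pickle round-trips an arbitrary Python object to and from disk (the dictionary and filename are just placeholders):

import pickle

cars = {'truck-012-abc': {'type': 'truck', 'sales': [12, 2, 3, 4, 8]}}

# write the object to disk
with open('cars.pkl', 'wb') as f:
    pickle.dump(cars, f)

# read it back into memory
with open('cars.pkl', 'rb') as f:
    loaded = pickle.load(f)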

Cataloging Pickle

I may have a dictionary that describes some cars.


{
  'truck-012-abc': {
    'type': 'truck',
    'sales': [12, 2, 3, 4, 8],
    'weight': 9024,
    'accessories': ['leather', 'audio-1'],
  }
}

In the catalog we simply set the type to pickle.PickleDataSet and give it a filepath.


cars:
  filepath: data/cars.pkl
  type: pickle.PickleDataSet
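Under the hood, this catalog entry constructs a PickleDataSet for us. If it helps to see what that maps to in Python, here is a rough equivalent; note that the import path shown is the kedro.extras.datasets location and may differ by Kedro version:

from kedro.extras.datasets.pickle import PickleDataSet

cars = {'truck-012-abc': {'type': 'truck', 'weight': 9024}}  # the dict from above, abbreviated

cars_dataset = PickleDataSet(filepath="data/cars.pkl")
cars_dataset.save(cars)          # pickles the dict to data/cars.pkl
reloaded = cars_dataset.load()   # unpickles it back into memory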

This filepath does not have to be on the local filesystem; it can point to cloud storage, thanks to how Kedro utilizes fsspec for each of its datasets.
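For example, the same entry could point at an S3 bucket (the bucket name here is just a placeholder) and Kedro will handle the remote I/O through fsspec:

cars:
  type: pickle.PickleDataSet
  filepath: s3://my-example-bucket/data/cars.pkl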

Loading the dataset

The benefit of cataloging this dataset, compared to leaving it as a MemoryDataSet, is that you can easily load the data back into memory for further development or debugging without running any of the pipeline.


catalog.load('cars')
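For instance, inside a kedro ipython or Jupyter session, where the catalog object is already created for you, you can pull the dictionary straight back and poke at it; the keys below just mirror the example data above:

cars = catalog.load('cars')
cars['truck-012-abc']['weight']        # 9024
cars['truck-012-abc']['accessories']   # ['leather', 'audio-1']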