Tags
Kedro versioned datasets can be mixed with incremental and partitioned datasets to do some timeseries analysis on how our dataset changes over time. Kedro is a very extensible and composible framework, that allows us to build solutions from the individual components that it provides. This article is a great example of how you can combine these components in unique ways to achieve some powerful results with very little work.
๐ Unsure what kedro is? Check out this post.
How does our dataset change over time??
This was a question presented to me at work. We had some plots being produces as the output of our pipeline and the user wanted the ability to compare results over time. Luckily this was asked early in the project so we were able to proactively setup versioning on the right datasets.
To enable this all we needed to do now was to add versioned: true
and we will
be able to compare results over time. Yes kedro makes it that easy to setup.
set up a project
Set up a new project just as usual. note I like using pipx for global cli packages. You can pick a specific version of kedro or opt for the latest while simply globally installing kedro and running kedro new is purely dependent on the last time you chose to update kedro.
pip install pipx pipx run kedro new cd versioned-partitioned-kedro-example conda create -n versioned-partitioned-kedro-example python=3.8 -y conda activate versioned-partitioned-kedro-example pip install kedro kedro install git init git add . git commit -m "init project from pipx run kedro new"
I called my project versioned-partitioned-kedro-example. You can call your
project whatever you like. If you try to use some special characters where
they don't belong, kedro will catch you. Under the hood, kedro is using a
library called cookiecutter
โ ๏ธ Please do not skip out on using a virtual environment. You may use whichever virtual environment tool you prefer, but please do not skip out. Wrecking a running project for learning is not fun.
update dependencies
I popped open my dependencies, added kedro[pandas]
and find-kedro
. Since those are extra packages our example will require.
aiohttp black==21.5b1 find-kedro flake8>=3.7.9, <4.0 ipython isort~=5.0 jupyter_client>=5.1, <7.0 jupyterlab~=3.0 jupyter~=1.0 kedro-telemetry~=0.1.0 kedro==0.17.4 kedro[pandas] nbstripout~=0.4 pytest-cov~=2.5 pytest-mock>=1.7.1, <2.0 pytest~=6.2 requests wheel>=0.35, <0.37
note I created find-kedro
, and I like using it to create my
pipeline object. Think of how pytest automatically picks up everything named
test
, find-kedro
does the same thing for kedro. It picks up everything
with node
or pipeline
in the name and creates pipelines out of it.
Install new dependencies
After adding our additional dependencies to the requirements.in
, we
can tell kedro to install everything and compile the dependencies. Behind
the scenes --build-reqs
uses a library called pip-compile
to create a
requirements.txt
file with hard pinned dependencies, which is ideal for
creating reproducible projects. You and your future colleagues may not thank
you for this, but they sure as heck won't be cursing your name when they can't
get the project to run.
kedro install --build-reqs git add . git commit -m "added additional dependencies"
create a node
For this example, we need a node to do much. This node will
pass the cars.csv
from a URL to a parquet
file. I am going to use a
lambda to build my identity function inline.
# pipelines/cars_nodes.py from kedro.pipeline import node nodes = [] nodes.append( node( func=lambda x:x, inputs='raw_cars', outputs='int_cars', name='create_int_cars', ) )
๐๏ธ note
find-kedro' will automatically pick up these nodes for us after we set up our
pipeline_registry.py`.
git add . git commit -m "add create_int_cars node"
implement find-kedro
Next, we need to tell kedro where our nodes are. This is where find-kedro
comes in. Once we point to the directory where our modules of
nodes/pipelines are, it creates the pipelines dictionary for us automatically. It will even separate each module into a pipeline and stitch them all into one default pipeline.
# pipeline_registry.py """Project pipelines.""" from typing import Dict from pathlib import Path from kedro.pipeline import Pipeline from find_kedro import find_kedro def register_pipelines() -> Dict[str, Pipeline]: """Register the project's pipelines. Returns: A mapping from a pipeline name to a "Pipeline "object. """ pipeline_dir = Path(__file__).parent / 'pipelines' return find_kedro(directory= pipeline_dir)
๐๏ธ This is very similar to the default ` pipeline_registry'except the last two lines.
git add . git commit -m "implement find-kedro"
create a baseline catalog
Once we have a pipeline setup, the kedro cli can automatically fill in missing
catalog entries with MemoryDataSet
's. Thus, using the cli helps consistently scaffold the
catalog and ensure we don't end up with a typo in our
dataset name.
kedro catalog create --pipeline cars_nodes
Kedro will kick out the following catalog file to base/catalog/cars_nodes.yml
for us to get started with.
raw_cars: type: MemoryDataSet int_cars: type: MemoryDataSet
๐ฅ use the kedro cli to fill in any missing datasets from the automatically catalog.
make a versioned dataset
Kedro has scaffolded MemoryDataSet
's for us. We will convert them to the
appropriate dataset type and turn on versioning for our int
layer, which is
the first point we save in our environment.
raw_cars: type: pandas.CSVDataSet filepath: https://waylonwalker.com/cars.csv int_cars: type: pandas.ParquetDataSet filepath: data/int_cars.parquet versioned: true
Commit your changes to the catalog.
git add . git commit -m "create catalog"
run the pipeline
Once we have the nodes and catalog setup, we can run the pipeline a few times
to get some versioned data. Each time we run, it will save a new version inside
the int_cars.parquet
directory.
kedro run kedro run kedro run kedro run kedro run
๐๏ธ we put our data in the data directory. By default, this directory is included in the
.gitignore
and will not be picked up by git.
inspect the data
Listing the files in data/int_cars.parquet
shows that I now have five different
datasets available. I can load old ones, but by default, kedro will load the
latest one.
ls data/int_cars.parquet 2021-07-05T15.24.53.164Z 2021-07-05T15.29.56.144Z 2021-07-05T15.30.23.101Z 2021-07-05T15.30.26.555Z 2021-07-05T15.31.12.688Z
๐๏ธ kedro sets the version at the timestamp that the session starts. All datasets created within the same run will have the same version.
stack on an incremental dataset
This is where things get interesting. Kedro comes with an incremental dataset
that will load all of the files from a particular directory into a dictionary
where the keys are the filename of the dataset. To load up all datasets into
this dictionary all we need to do is add a new catalog entry that is a type: PartitionedDataSet
, with a path
pointing to the same place as the original,
and a dataset
type the same as the original.
int_cars_partitioned: type: PartitionedDataSet dataset: pandas.ParquetDataSet path: data/int_cars.parquet
catalog list
Listing the catalog entries confirms that we have successfully added our new PartitionedDataSet
.
In [17]: context.catalog.list() Out[17]: ['raw_cars', 'int_cars', 'int_cars_partitioned', 'parameters']
loading an incremental dataset
Now we can easily load the datasets from every run we just did into a single
dictionary, simply by running context.catalog.load('int_cars_incremental')
.
In [19]: context.catalog.load('int_cars_incremental') 2021-07-05 11:32:40,534 - kedro.io.data_catalog - INFO - Loading data from `int_cars_incremental` (IncrementalDataSet)... Out[19]: {'2021-07-05T15.29.56.144Z/int_cars.parquet': Unnamed: 0 mpg cyl disp hp drat wt qsec vs am gear carb 0 Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 1 Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 2 Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 3 Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 4 Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 5 Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 6 Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 7 Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 8 Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 9 Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 10 Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 11 Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 12 Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 13 Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 14 Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 15 Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 16 Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 17 Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1 18 Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2 19 Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1 20 Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 21 Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2 22 AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2 23 Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4 24 Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2 25 Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1 26 Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2 27 Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2 28 Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4 29 Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6 30 Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8 31 Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2, '2021-07-05T15.30.23.101Z/int_cars.parquet': Unnamed: 0 mpg cyl disp hp drat wt qsec vs am gear carb 0 Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 1 Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 2 Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 3 Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 4 Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 5 Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 6 Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 7 Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 8 Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 9 Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 10 Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 11 Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 12 Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 13 Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 14 Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 15 Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 16 Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 17 Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1 18 Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2 19 Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1 20 Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 21 Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2 22 AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2 23 Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4 24 Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2 25 Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1 26 Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2 27 Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2 28 Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4 29 Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6 30 Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8 31 Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2, '2021-07-05T15.30.26.555Z/int_cars.parquet': Unnamed: 0 mpg cyl disp hp drat wt qsec vs am gear carb 0 Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 1 Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 2 Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 3 Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 4 Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 5 Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 6 Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 7 Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 8 Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 9 Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 10 Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 11 Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 12 Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 13 Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 14 Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 15 Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 16 Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 17 Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1 18 Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2 19 Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1 20 Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 21 Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2 22 AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2 23 Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4 24 Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2 25 Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1 26 Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2 27 Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2 28 Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4 29 Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6 30 Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8 31 Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2, '2021-07-05T15.31.12.688Z/int_cars.parquet': Unnamed: 0 mpg cyl disp hp drat wt qsec vs am gear carb 0 Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 1 Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 2 Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 3 Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 4 Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 5 Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 6 Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 7 Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 8 Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 9 Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 10 Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 11 Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 12 Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 13 Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 14 Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 15 Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 16 Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 17 Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1 18 Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2 19 Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1 20 Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 21 Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2 22 AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2 23 Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4 24 Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2 25 Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1 26 Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2 27 Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2 28 Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4 29 Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6 30 Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8 31 Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2}
๐ notice that incremental datasets are all loaded for you, its a dict of
filepath:dataset
stack on a partitioned dataset
Let's take a look at a similar type of dataset called PartitionedDataSet
. We
can add it to the catalog in a very similar way to how we added the
IncrementalDataSet
.
int_cars_incremental: type: IncrementalDataSet dataset: pandas.ParquetDataSet path: data/int_cars.parquet
loading a partitioned dataset
Note that we get a dict with the same keys as before, but this time the
values are a load function rather than loaded data. Partitioned datasets can be helpful if
you are operating on datasets that take up more memory than you have available.
In our case of coupling this with versioned datasets, its likely to grow quite
large, so PartitionedDataSet
's are likely a better option for this use.
In [18]: context.catalog.load('int_cars_partitioned') 2021-07-05 11:31:11,253 - kedro.io.data_catalog - INFO - Loading data from `int_cars_partitioned` (PartitionedDataSet)... Out[18]: {'2021-07-05T15.29.56.144Z/int_cars.parquet': <bound method AbstractVersionedDataSet.load of <kedro.extras.datasets.pandas.parquet_dataset.ParquetDataSet object at 0x7f4bb1570820>>, '2021-07-05T15.30.23.101Z/int_cars.parquet': <bound method AbstractVersionedDataSet.load of <kedro.extras.datasets.pandas.parquet_dataset.ParquetDataSet object at 0x7f4bb1570850>>, '2021-07-05T15.30.26.555Z/int_cars.parquet': <bound method AbstractVersionedDataSet.load of <kedro.extras.datasets.pandas.parquet_dataset.ParquetDataSet object at 0x7f4bb1570910>>, '2021-07-05T15.31.12.688Z/int_cars.parquet': <bound method AbstractVersionedDataSet.load of <kedro.extras.datasets.pandas.parquet_dataset.ParquetDataSet object at 0x7f4bb15709a0>>}
incremental vs. partitioned
IncrementalDataSet
's and PartitionedDataSet
's are very similar as they give
you access to a whole directory of data that uses the same underlying dataset
loader. The significant difference is whether you want your data pre-loaded or if
you want to load and dispose of it as you iterate over it.
- incremental loads the data
- partitioned give a load function
creating nodes with partitioned datasets
Let's create a node with this PartitionedDataSet
to collect stats on our
dataset over time. This node does a dict comprehension to get the length of
each version that we pulled.
def timeseries_partitioned(cars: Dict): return {k:len(car()) for k, car in cars.items()} nodes.append( node( func=timeseries_partitioned, inputs='int_cars_partitioned', outputs='int_cars_timeseries_partitioned', name='create_int_cars_timeseries_partitioned', ) )
๐๏ธ note that inside of the dict comprehension car is a load function that we need to call.
creating nodes with incremental datasets
Doing the same node with our IncrementalDataSet
looks very similar, except
this time car is loaded data inside of the dict comprehension, not a function
that we need to call.
def timeseries_incremental(cars: Dict): return {k:len(car) for k, car in cars.items()} nodes.append( node( func=timeseries_incremental, inputs='int_cars_incremental', outputs='int_cars_timeseries_incremental', name='create_int_cars_timeseries_incremental', ) )
More catalog entries
After adding those nodes, we can add the catalog entries again with the command line. This will not overwrite any of the datasets we just created. It will only add to it.
kedro catalog create --pipeline cars_nodes
int_cars_timeseries_partitioned: type: MemoryDataSet int_cars_timeseries_incremental: type: MemoryDataSet
int_cars_timeseries_partitioned: type: pickle.PickleDataSet filepath: data/int_cars_timeseries_partitioned.parquet int_cars_timeseries_incremental: type: pickle.PickleDataSet filepath: data/int_cars_timeseries_incremental.parquet
Loading the new datasets
Loading the two dtasets that we just created show that we have the ended up with the same result using both incremental and partitioned datasets. This result is a dictionary of filepaths mapped to the size of the dataset. Since the default filepaths are timestamps we could start doing some time series analysis to see how our dataset is changing over time.
In [32]: context.catalog.load('int_cars_timeseries_incremental') 2021-07-05 12:00:55,014 - kedro.io.data_catalog - INFO - Loading data from `int_cars_timeseries_incremental` (PickleDataSet)... Out[32]: {'2021-07-05T15.29.56.144Z/int_cars.parquet': 32, '2021-07-05T15.30.23.101Z/int_cars.parquet': 32, '2021-07-05T15.30.26.555Z/int_cars.parquet': 32, '2021-07-05T15.31.12.688Z/int_cars.parquet': 32, '2021-07-05T16.43.43.088Z/int_cars.parquet': 32} In [33]: context.catalog.load('int_cars_timeseries_partitioned') 2021-07-05 12:01:03,223 - kedro.io.data_catalog - INFO - Loading data from `int_cars_timeseries_partitioned` (PickleDataSet)... Out[33]: {'2021-07-05T15.29.56.144Z/int_cars.parquet': 32, '2021-07-05T15.30.23.101Z/int_cars.parquet': 32, '2021-07-05T15.30.26.555Z/int_cars.parquet': 32, '2021-07-05T15.31.12.688Z/int_cars.parquet': 32, '2021-07-05T16.43.43.088Z/int_cars.parquet': 32, '2021-07-05T16.50.46.686Z/int_cars.parquet': 32}
โ๏ธ I have a full article on creating datasets that are not tabular datasets using pickle.
cross posted to dev
If you like this article give it some โค๏ธ๐ฆ๐ท๏ธ on dev https://dev.to/waylonwalker/incremental-versioned-datasets-in-kedro-3ajn
This post was primarily built live on https://twitch.tv/waylonwalker, give me a follow and join in the live show if that is something that interests you.