Kedro Catalog
━━━━━━━━━━━━━

I am exploring a Kedro catalog metadata hook.

Date: July 24, 2020

I am exploring a Kedro catalog metadata hook. These are some notes on what I am thinking.

[1m[38;2;167;192;128mProcess[0m
[38;2;71;82;88m───────[0m

- metadata will be attached to the dataset object under a .metadata attribute
- metadata will be updated in the after_node_run hook
- metadata will be empty until a pipeline is run with the hook enabled
- optionally, a function for adding extra metadata will be provided
- metadata will be stored in a file next to the dataset's filepath
- meta
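A minimal sketch of the process above, written with simplified stand-ins so it runs without Kedro installed. FileDataSet and CatalogMetadataHook are hypothetical names, and this after_node_run takes only a node name and a dict of outputs. Kedro's real after_node_run hook is marked with @hook_impl and also receives the node and the catalog, with exact arguments that vary between Kedro versions. The .metadata.json sidecar suffix and the optional extra_metadata callable are assumptions as well.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


class FileDataSet:
    """Hypothetical stand-in for a file-based Kedro dataset (only what the hook needs)."""

    def __init__(self, filepath):
        self.filepath = filepath
        self.metadata = {}  # stays empty until a pipeline runs with the hook enabled


class CatalogMetadataHook:
    """Sketch of the metadata hook; after_node_run is simplified, not Kedro's real signature."""

    def __init__(self, datasets, extra_metadata=None):
        self.datasets = datasets  # dataset name -> dataset object, standing in for the catalog
        self.extra_metadata = extra_metadata  # hypothetical optional callable(name, data) -> dict

    def after_node_run(self, node_name, outputs):
        # outputs: dataset name -> data the node just produced
        for name, data in outputs.items():
            dataset = self.datasets.get(name)
            if dataset is None:
                continue  # output is not in the catalog, nothing to attach to
            metadata = {
                "created_by": node_name,
                "updated_at": datetime.now(timezone.utc).isoformat(),
                "length": len(data) if hasattr(data, "__len__") else None,
            }
            if self.extra_metadata is not None:
                metadata.update(self.extra_metadata(name, data))
            dataset.metadata = metadata
            self._write_sidecar(dataset)

    @staticmethod
    def _write_sidecar(dataset):
        # store the metadata in a JSON file next to the dataset's filepath
        sidecar = Path(f"{dataset.filepath}.metadata.json")
        sidecar.write_text(json.dumps(dataset.metadata, indent=2))
```

Keeping the metadata in a small sidecar file means it can be read back without loading the dataset itself.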

[1m[38;2;167;192;128mProblems This Hook Should solve[0m
[38;2;71;82;88m───────────────────────────────[0m

- which datasets have a column with sales in its name
- which datasets were updated after last Tuesday
- which pipeline node created this dataset
- how many rows are in this dataset (without reloading all datasets)

[1m[38;2;167;192;128mimplementation details[0m
[38;2;71;82;88m──────────────────────[0m

- metadata will be attached to each dataset as a dictionary
- list/dict comprehensions can be used to make queries
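A sketch of comprehension queries answering the questions from the previous section, run against a hypothetical metadata dictionary keyed by dataset name. The dataset names, node names, columns and timestamps are invented for illustration, updated_at is assumed to be an ISO 8601 string, and "last Tuesday" is computed relative to the post date.

```python
from datetime import date, datetime, timedelta

# Hypothetical metadata in the shape the hook might produce, keyed by dataset name.
# raw_sales has no creating node because it is a raw input.
metadata = {
    "raw_sales": {"columns": ["date", "sales_usd"], "length": 1200,
                  "created_by": None, "updated_at": "2020-07-20T09:00:00"},
    "clean_sales": {"columns": ["date", "total_sales"], "length": 1150,
                    "created_by": "clean_sales_node", "updated_at": "2020-07-23T14:30:00"},
    "customers": {"columns": ["id", "name"], "length": 300,
                  "created_by": "load_customers_node", "updated_at": "2020-07-22T08:15:00"},
}


def last_tuesday(today):
    """Most recent Tuesday strictly before `today` (date.weekday(): Monday=0, Tuesday=1)."""
    days_back = (today.weekday() - 1) % 7 or 7
    return today - timedelta(days=days_back)


# which datasets have a column with "sales" in its name
with_sales = [name for name, meta in metadata.items()
              if any("sales" in col for col in meta["columns"])]
# -> ["raw_sales", "clean_sales"]

# which datasets were updated on or after last Tuesday (2020-07-21 for the post date)
cutoff = last_tuesday(date(2020, 7, 24))
recent = [name for name, meta in metadata.items()
          if datetime.fromisoformat(meta["updated_at"]).date() >= cutoff]
# -> ["clean_sales", "customers"]

# which node created each dataset
created_by = {name: meta["created_by"] for name, meta in metadata.items()}

# row counts straight from the metadata, without loading any dataset
lengths = {name: meta["length"] for name, meta in metadata.items()}
```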

[1m[38;2;167;192;128mMetadata to Capture[0m
[38;2;71;82;88m───────────────────[0m

Each item is collected by trying the pandas method first, then Spark, then plain dict/list, and falling back to none if nothing fits:

- column names
- length
- null count
- created_by node name
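The fallback order above can be sketched as a chain of collectors, each tried in turn until one fits the data. This is a sketch under assumptions: it uses duck typing (attribute access and len()) instead of importing pandas or PySpark, treats dict keys as column names, and skips the Spark null count. The helper names are hypothetical.

```python
def _pandas_meta(data):
    # pandas DataFrame: column labels, len() gives rows, isnull() marks missing cells
    return {
        "columns": [str(col) for col in data.columns],
        "length": len(data),
        "null_count": int(data.isnull().sum().sum()),
    }


def _spark_meta(data):
    # Spark DataFrame: .columns is a list of names, .count() runs a job to count rows;
    # the null count is skipped here to avoid importing pyspark.sql.functions
    return {"columns": list(data.columns), "length": data.count(), "null_count": None}


def _collection_meta(data):
    # plain Python data: dict keys are treated as "columns" (an assumption)
    if isinstance(data, dict):
        return {"columns": [str(key) for key in data], "length": len(data),
                "null_count": sum(value is None for value in data.values())}
    if isinstance(data, (list, tuple)):
        return {"columns": None, "length": len(data),
                "null_count": sum(value is None for value in data)}
    raise TypeError(f"unsupported type: {type(data).__name__}")


def capture_metadata(data, node_name=None):
    """Try pandas -> Spark -> dict/list -> none, keeping the first collector that works."""
    for collect in (_pandas_meta, _spark_meta, _collection_meta):
        try:
            meta = collect(data)
            break
        except (AttributeError, TypeError):
            continue  # this collector does not fit the data, try the next one
    else:
        meta = {"columns": None, "length": None, "null_count": None}
    meta["created_by"] = node_name
    return meta
```

A Spark DataFrame also has .columns, but len() raises TypeError on it, so it falls through the pandas collector to the Spark one.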

[1m[38;2;167;192;128mDatabase?[0m
[38;2;71;82;88m─────────[0m

Is there an easy way to create a NoSQL database in memory from a list of dictionaries?

- list-dict-DB
- dataset
- TinyDB