Reader-2020
━━━━━━━━━━━

Notes about my reader idea

Date: July 12, 2020


Inputs
──────

The input will be a YAML file containing a list of items you want to stay up to date with. Each item has a weight and a source, usually a url key (the Twitter entry uses a twitter key instead). The same file also configures each output format.

```
email:
  max-entries: 10
  recipients:
    - waylon@waylonwalker.com
markdown:
  max-entries: 100
  output:
    - README.md
json:
  max-entries: 1000
  output:
    - feeds/feed.json
rss:
  max-entries: 1000
  output:
    - feeds/feed.xml
html:
  max-entries: 100
  output:
    - index.html
items:
  Waylon Walker:
    weight: 5
    url: https://waylonwalker.com/rss.xml
  "@_WaylonWalker":  # quoted: "@" cannot start a plain YAML key
    weight: 3
    twitter: https://twitter.com/_waylonwalker
  DEV Waylon Walker:
    weight: 8
    url: https://dev.to/waylonwalker
  Stack Overflow Kedro:
    weight: 5
    url: https://stackoverflow.com/questions/tagged/kedro
  Kedro GitHub:
    weight: 4
    url: https://github.com/kedro-org/kedro
  Kedro Pypi:
    weight: 10
    url: https://pypi.org/project/kedro/
```


Types
─────

- RSS feed (primary source)
- YouTube feed
- Stack Overflow tags
- GitHub repo activity
- PyPI release
- dev.to post
- Twitter search (the user will need an API key)


Methodology
━━━━━━━━━━━

Each URL will be fetched and parsed into a standard data structure. Some sources may yield special features, so a schemaless/NoSQL data structure may be best.

The pipeline will decide how to weight posts based on the user's weight for each item and the post's recent position in its feed.

```
a_raw (raw plain text / json items)
  -> b_int (cleaned items)
  -> c_pri (single feed of items)
  -> d_fea (weighted feed of items)
  -> e_out (requested output formats)
```


Output
──────

Pipeline outputs will be email, JSON, Markdown, XML (RSS), and HTML. Each will be configurable in the config file (max-entries, plus the output location or, for email, the recipients).


Running
───────

Users will be able to create their own reader without keeping the pipeline inside their repo; the pipeline will be pulled from a package repository such as PyPI, Docker Hub, or GitHub. Possible setups:
- fork a template repo (might be cumbersome to update)
- use a GitHub Action from the Marketplace (easier to update)
- GitHub Actions will run the pipeline on a schedule
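For the first step of the pipeline (a_raw -> b_int), an RSS source could be turned into the standard data structure roughly like this. This is a minimal sketch assuming RSS 2.0 input (items under channel, with title, link and an RFC 822 pubDate); the function name parse_rss and the entry keys (source, title, link, published, position) are hypothetical choices, not a settled schema.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def parse_rss(source: str, xml_text: str) -> list[dict]:
    """Turn a raw RSS 2.0 document (a_raw) into cleaned entry dicts (b_int)."""
    root = ET.fromstring(xml_text)
    entries = []
    # RSS 2.0 nests <item> elements under <channel>; iter() finds them at any depth.
    for position, item in enumerate(root.iter("item")):
        pub_date = item.findtext("pubDate")
        entries.append(
            {
                "source": source,
                "title": (item.findtext("title") or "").strip(),
                "link": (item.findtext("link") or "").strip(),
                # Store the date as an ISO string so the entry stays JSON-friendly.
                "published": parsedate_to_datetime(pub_date).isoformat() if pub_date else None,
                # Position within this source's feed (0 = first item, assumed newest).
                "position": position,
            }
        )
    return entries
```

Plain dicts keep the structure schemaless, so other source types (Stack Overflow, GitHub, PyPI) could add their own extra keys without changing this parser.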
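The later pipeline steps (c_pri -> d_fea -> e_out) could be sketched as below, assuming the YAML config from the Inputs section has already been loaded into a dict (for example with PyYAML's yaml.safe_load) and that each cleaned entry carries a position key. The scoring rule (item weight divided by 1 + position) is a hypothetical stand-in for "the user's weight and the post's recent position in its feed"; the helper names score, build_feed and render_json are also illustrative.

```python
import json


def score(entry: dict, weight: float) -> float:
    """Hypothetical rule: the item's weight, discounted by how far down its own feed the entry sits."""
    return weight / (1 + entry["position"])


def build_feed(entries_by_source: dict[str, list[dict]], config: dict) -> list[dict]:
    """Merge per-source entries into one feed (c_pri), then weight and sort it (d_fea)."""
    weights = {name: item["weight"] for name, item in config["items"].items()}
    feed = []
    for source, entries in entries_by_source.items():
        for entry in entries:
            # Copy each entry so the cleaned (b_int) data is left untouched.
            # Sources missing from the config fall back to weight 1 (an assumption).
            feed.append(dict(entry, score=score(entry, weights.get(source, 1))))
    # Highest score first; sorted() is stable, so ties keep their merge order.
    return sorted(feed, key=lambda e: e["score"], reverse=True)


def render_json(feed: list[dict], config: dict) -> str:
    """Produce the JSON output (e_out), truncated to that format's max-entries."""
    limit = config["json"]["max-entries"]
    return json.dumps(feed[:limit], indent=2)
```

With weights of 5 and 8, the top entry of the weight-8 source outranks the top entry of the weight-5 source, while the second entry of the weight-8 source (score 4) still falls below it. The other outputs (Markdown, RSS, HTML, email) would slice the same sorted feed with their own max-entries.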