🐍 Parsing RSS feeds with Python

Waylon Walker

Jul 13, 2020 2 min read

I am looking for a way to replace the Google Reader experience I had back in 2013, before Google took it from us. I am starting by learning how to parse feeds with Python, and even without much previous knowledge, it proved to be much easier than anticipated thanks to the feedparser library.

This is how I used Python to parse RSS and set up my own custom feed.

Install #

Install the feedparser library, along with python-dateutil, which is used later to sort entries by date.

conda create -n reader python=3.8 -y
conda activate reader
pip install feedparser python-dateutil

Get the content #

import feedparser
feed = feedparser.parse('https://waylonwalker.com/rss.xml')

The feed object #

The result is a feedparser.FeedParserDict. It is a dict subclass that also allows attribute access (feed.entries works as well as feed['entries']), so for all intents and purposes it behaves like a dict with the following keys.

feed.keys()
>>> dict_keys(['feed', 'entries', 'bozo', 'headers', 'etag', 'href', 'status', 'encoding', 'version', 'namespaces', 'content'])

The feed key holds general information about the RSS feed itself (its title, link, and so on), but the meat of it is in entries, with one item per post. The rest of the keys weren’t all that useful for me at the moment.
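Since most of what matters lives under entries, a small helper can pull them out while still surfacing problems. This is a minimal sketch, not part of the original post: get_entries and feed_title are hypothetical names, and the helper relies on feedparser setting bozo to a truthy value (with the error under bozo_exception) when a feed is not well-formed. It only needs a dict-like object, so it works on a FeedParserDict or a plain dict.

```python
import warnings
from typing import Any, List, Mapping


def get_entries(parsed: Mapping[str, Any]) -> List[Any]:
    """Return the entries of a feedparser result, warning if the feed was malformed."""
    # feedparser sets 'bozo' when the feed was not well-formed and stores
    # the underlying error under 'bozo_exception'
    if parsed.get("bozo"):
        warnings.warn(f"malformed feed: {parsed.get('bozo_exception')!r}")
    return list(parsed.get("entries", []))


def feed_title(parsed: Mapping[str, Any]) -> str:
    """Return the channel-level title stored under the 'feed' key, if any."""
    return parsed.get("feed", {}).get("title", "")
```

Usage would look like entries = get_entries(feedparser.parse('https://waylonwalker.com/rss.xml')). Feedparser still returns whatever entries it could recover from a bozo feed, so warning rather than raising keeps those items.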

Pulling multiple feeds #

I grabbed a few popular RSS feeds that I was familiar with to get started.

urls = ['https://waylonwalker.com/rss.xml',
        'https://joelhooks.com/rss.xml',
        'https://swyx.io/rss.xml',
    ]
feeds = [feedparser.parse(url)['entries'] for url in urls]
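One thing the list comprehension above drops is which feed each entry came from, and that becomes hard to recover once everything is merged into one list. A minimal sketch of an alternative that flattens and tags in one pass; fetch_tagged_entries and the added "source" key are hypothetical, and the parse function is passed in (feedparser.parse in practice) so it can be swapped for a stub when testing.

```python
from typing import Any, Callable, Dict, Iterable, List, Mapping


def fetch_tagged_entries(
    urls: Iterable[str],
    parse: Callable[[str], Mapping[str, Any]],
) -> List[Dict[str, Any]]:
    """Parse each URL and return one flat list of entries, each tagged with its source."""
    tagged = []
    for url in urls:
        parsed = parse(url)
        for entry in parsed.get("entries", []):
            # copy into a plain dict and remember which feed the entry came from
            tagged.append({**entry, "source": url})
    return tagged
```

Called as fetch_tagged_entries(urls, feedparser.parse), this returns an already flattened list, so the later flattening step is no longer needed.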

I checked the keys, and all three feeds had the following keys. Mine also had the full post under 'content'; this is because I added an extra custom_element for publishing to dev.to from an RSS feed.

feeds[1][0].keys()
>>> dict_keys(['title', 'title_detail', 'summary', 'summary_detail', 'links', 'link', 'id', 'guidislink', 'published', 'published_parsed'])
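The 'content' key mentioned above is worth a closer look: feedparser exposes it as a list of dicts, each holding the body text under 'value' and its MIME type under 'type'. A minimal sketch (entry_body is a hypothetical helper) that prefers the full content and falls back to the summary, which is all most feeds provide:

```python
from typing import Any, Mapping


def entry_body(entry: Mapping[str, Any]) -> str:
    """Return the fullest body available for a feedparser entry."""
    # full post bodies live under 'content' as a list of dicts,
    # each with the text in 'value'
    for block in entry.get("content", []):
        value = block.get("value")
        if value:
            return value
    # most feeds only ship a summary/description, so fall back to that
    return entry.get("summary", "")
```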

NOTE: dev.to/feed #

I also pulled dev.to/feed. Since it is set up for multiple authors, it had a few extra keys.

feedparser.parse('https://dev.to/feed')['entries'][0].keys()
>>> dict_keys(['title', 'title_detail', 'authors', 'author', 'author_detail', 'published', 'published_parsed', 'links', 'link', 'id', 'guidislink', 'summary', 'summary_detail', 'tags'])
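Because the feeds don't all expose the same keys (dev.to adds author and tags, the blog feeds above don't), one option is to copy just the fields you care about into a uniform shape with .get defaults, so a missing key can't raise a KeyError later. A minimal sketch; normalize_entry and the chosen fields are assumptions, and it relies on feedparser storing each tag as a dict with the tag text under 'term'.

```python
from typing import Any, Dict, Mapping


def normalize_entry(entry: Mapping[str, Any]) -> Dict[str, Any]:
    """Reduce a feedparser entry to a fixed set of keys, with defaults for missing ones."""
    return {
        "title": entry.get("title", ""),
        "link": entry.get("link", ""),
        "published": entry.get("published", ""),
        # only some feeds (e.g. dev.to) include an author
        "author": entry.get("author", ""),
        # each tag is a dict; keep only the non-empty 'term' values
        "tags": [tag["term"] for tag in entry.get("tags", []) if tag.get("term")],
    }
```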

Combining Feeds #

Now that I have a list of feeds, I can flatten them into a single feed with a list comprehension and sort it by date. Note that I did need to pull in dateutil.parser (from the python-dateutil package) to convert the date strings into datetime objects that can be sorted.

import dateutil.parser

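# flatten the per-feed lists of entries into one list
feed = [entry for entries in feeds for entry in entries]
# newest first; 'published' is a date string, so parse it into a datetime
feed.sort(key=lambda entry: dateutil.parser.parse(entry['published']), reverse=True)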

[{'title': i['title'], 'date': i['published'], 'link': i['link']} for i in feed[:10]]
>>>
[{'title': '🙋\u200d♂️ Can Anyone Explain Twitter Cards to me?',
  'date': 'Sat, 11 Jul 2020 03:00:00 GMT',
  'link': 'https://waylonwalker.com/explain-twitter-cards/'},
 {'title': 'How I Built My GitHub Profile',
  'date': 'Fri, 10 Jul 2020 03:00:00 GMT',
  'link': 'https://waylonwalker.com/my-github-profile/'},
 {'title': 'Lessons and Regrets from My $25000 Launch',
  'date': 'Fri, 03 Jul 2020 04:06:47 GMT',
  'link': 'https://swyx.io/writing/coding-career-launch'},
 {'title': 'SLIDES - understanding python *args and **kwargs',
  'date': 'Thu, 02 Jul 2020 05:00:00 GMT',
  'link': 'https://waylonwalker.com/python-args-kwargs-slides/'},
 {'title': 'Launching the Coding Career Handbook!',
  'date': 'Wed, 01 Jul 2020 13:08:37 GMT',
  'link': 'https://swyx.io/writing/launching-coding-career'},
 {'title': 'Gracefully adopt kedro, the catalog',
  'date': 'Mon, 29 Jun 2020 03:00:00 GMT',
  'link': 'https://waylonwalker.com/graceful-kedro-catalog/'},
 {'title': "🤓 What's on your GitHub Profile",
  'date': 'Mon, 29 Jun 2020 03:00:00 GMT',
  'link': 'https://waylonwalker.com/whats-on-your-github-profile/'},
 {'title': "Versioned Docs in 30 Seconds with Amplify Console's Branch Subdomains",
  'date': 'Fri, 26 Jun 2020 16:34:09 GMT',
  'link': 'https://swyx.io/writing/amplify-console-branch-subdomains'},
 {'title': "What's New in React",
  'date': 'Wed, 24 Jun 2020 00:00:00 GMT',
  'link': 'https://swyx.io/speaking/react-whats-new'},
 {'title': 'Coding Careers - Vincit',
  'date': 'Wed, 24 Jun 2020 00:00:00 GMT',
  'link': 'https://swyx.io/speaking/coding-careers-vincit'}]
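If you'd rather skip the extra dateutil dependency, feedparser already parses most dates for you: it stores a UTC time.struct_time under published_parsed, and the standard library's email.utils.parsedate_to_datetime understands RFC 822 strings like the ones above. A sketch of a sort key built on those two (published_timestamp is a hypothetical name); entries with no usable date get 0.0, so they sink to the bottom of a newest-first sort instead of crashing it.

```python
import calendar
from datetime import timezone
from email.utils import parsedate_to_datetime
from typing import Any, Mapping


def published_timestamp(entry: Mapping[str, Any]) -> float:
    """Sort key: seconds since the epoch for an entry, or 0.0 if it has no usable date."""
    # feedparser pre-parses dates into a UTC time.struct_time
    parsed = entry.get("published_parsed")
    if parsed:
        return float(calendar.timegm(parsed))
    # otherwise parse the raw RFC 822 string, e.g. 'Sat, 11 Jul 2020 03:00:00 GMT'
    published = entry.get("published")
    if published:
        try:
            dt = parsedate_to_datetime(published)
        except (TypeError, ValueError):
            # unparseable string (TypeError on older Pythons, ValueError on newer)
            return 0.0
        if dt.tzinfo is None:
            # the string carried no usable offset; assume UTC
            dt = dt.replace(tzinfo=timezone.utc)
        return dt.timestamp()
    return 0.0
```

With this in place, the sort becomes feed.sort(key=published_timestamp, reverse=True). Returning a plain float keeps every key comparable, which avoids mixing timezone-aware and naive datetimes.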

Decentralized Feed #

I think the idea of RSS is super cool, especially that I could potentially create my own custom, platform-agnostic, decentralized feed. I would love to have a Google Reader-like experience back.

This post was super fun to explore. I used a couple of external libraries (feedparser to pull in the feeds and dateutil to sort them), but other than that it was all vanilla Python 3.8. In data science we tend to get very DataFrame heavy, and I miss working with vanilla data types sometimes.

Trying to step up your Python game #

While trying to step up your skills, you will need lots of practice. It's good to have several options for trying out ideas quickly. I often use https://replit.com; check out this post to see how I use it.

🐍 Practice Python Online

Not a sponsor. REPL.it is a great way to practice.


