Using a Python Markdown AST to Find All Paragraphs

Waylon Walker Author

Feb 5, 2022 1 min read

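While looking for a way to automatically generate descriptions for pages, I stumbled onto a Markdown AST in Python. It lets me walk a Markdown page and collect only the paragraph text, skipping headings and code fences. Note that text inside blockquotes and list items is stored in nested paragraph nodes, so excluding it takes an extra check on the node's parent.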

import commonmark
import frontmatter

post = frontmatter.load("post.md")
parser = commonmark.Parser()
ast = parser.parse(post.content)

paragraphs = ''
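for node in ast.walker():
    # each item from walker() is a tuple; node[0] is the AST node itself
    if node[0].t == "paragraph" and node[0].first_child.literal is not None:
        # only the paragraph's first inline child is read here
        paragraphs += " "
        paragraphs += node[0].first_child.literal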

The AST approach is also super fast. Previously I was rendering to HTML and using BeautifulSoup to get only the paragraphs. Using the commonmark AST was about 5x faster on my site.

Duplicate Paragraphs

When I originally wrote this post, I did not realize that commonmark duplicates nodes. I still do not understand why, but I have had success deduplicating them based on the source position of the node with the snippet below.

from itertools import compress

import commonmark
import frontmatter

post = frontmatter.load("post.md")
parser = commonmark.Parser()
ast = parser.parse(post.content)

# find all paragraph nodes
paragraph_nodes = [
    n[0]
    for n in ast.walker()
    if n[0].t == "paragraph" and n[0].first_child.literal is not None
]
# for reasons unknown to me commonmark duplicates nodes, dedupe based on sourcepos
sourcepos = [p.sourcepos for p in paragraph_nodes]
# find first occurrence of node based on source position
unique_mask = [sourcepos.index(s) == i for i, s in enumerate(sourcepos)]
# deduplicate paragraph_nodes based on unique source position
unique_paragraph_nodes = list(compress(paragraph_nodes, unique_mask))
paragraphs = " ".join([p.first_child.literal for p in unique_paragraph_nodes])
