Find all Headings with BeautifulSoup
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Date: February 1, 2022

BeautifulSoup is a DOM-like library for Python. It's quite useful for manipulating html </html/>. Here is an example that uses find_all to find html headings. I stole the regex from Stack Overflow, but who doesn't.

Make an example
───────────────

sample.html

Let's make a sample.html file with the following contents. It mainly has some headings, <h1> and <h2> tags, that I want to be able to find.

    <h1>hello</h1>
    <p>this is a paragraph</p>
    <h2>second heading</h2>
    <p>this is also a paragraph</p>
    <h2>third heading</h2>
    <p>this is the last paragraph</p>

Get the headings with BeautifulSoup
───────────────────────────────────

Let's import our packages, read in our sample.html using pathlib, and find all headings using BeautifulSoup.

    import re
    from pathlib import Path

    from bs4 import BeautifulSoup

    # Parse the file with the lxml parser (needs the lxml package installed).
    soup = BeautifulSoup(Path('sample.html').read_text(), features="lxml")
    # Keep every tag whose name is h1 through h6.
    headings = soup.find_all(re.compile("^h[1-6]$"))

And what we get is a list of bs4.element.Tag objects.

    >> print(headings)
    [<h1>hello</h1>, <h2>second heading</h2>, <h2>third heading</h2>]

I recently added a heading_link plugin to markata. You might notice the 🔗's next to each heading on this page; that is powered by this exact technique.
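
If you want to convince yourself what that regex actually matches, here is a small sketch that uses only the re module. The tag names in the list are made up for illustration (h7 and h10 are not real HTML tags). It also assumes BeautifulSoup tests each tag name with the pattern's search() method, which is how its documentation describes regular expression filters.

```python
import re

# Same pattern that is passed to find_all in the post.
heading_pattern = re.compile("^h[1-6]$")

# Hypothetical tag names, chosen only to exercise the pattern.
tag_names = ["h1", "h2", "h6", "h7", "h10", "p", "th", "head", "html"]

# Keep the names the pattern accepts; ^ and $ force the whole name to match.
matches = [name for name in tag_names if heading_pattern.search(name)]

print(matches)  # ['h1', 'h2', 'h6']
```

The anchors are doing the real work here: without ^ and $, a name like th1 or h10 would also be accepted, since search() looks for the pattern anywhere in the string.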

[0m and [38;2;167;192;128m

[0m tags that I want to be able to find. [38;2;122;132;120m[code][0m

hello

second heading

third heading

hello

second heading

third heading

[0m and [38;2;167;192;128m

[0m tags that I want to be able to find. [38;2;122;132;120m[code][0m