Find all Headings with BeautifulSoup
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Date: February 1, 2022

BeautifulSoup is a DOM-like library for Python. It's quite useful for manipulating HTML </html/>. Here is an example that uses find_all to collect every HTML heading. I stole the regex from Stack Overflow, but who doesn't.

Make an example
───────────────

sample.html

Let's make a sample.html file with the following contents. It mainly has some headings, plus a few paragraphs:

this is a paragraph
this is also a paragraph
this is the last paragraph
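
One way to create the file straight from Python is sketched below. The heading text and the nesting of the tags are assumptions made up for illustration; only the three paragraphs are taken from the sample above.

```python
from pathlib import Path

# Hypothetical contents: the heading text is invented for illustration,
# only the three paragraphs come from the post.
SAMPLE_HTML = """\
<html>
  <body>
    <h1>Hello World</h1>
    <p>this is a paragraph</p>
    <h2>Second Heading</h2>
    <p>this is also a paragraph</p>
    <h3>Third Heading</h3>
    <p>this is the last paragraph</p>
  </body>
</html>
"""

# Write the file next to the script so Path('sample.html') can find it later.
sample_path = Path("sample.html")
sample_path.write_text(SAMPLE_HTML, encoding="utf-8")
```

Any mix of h1 to h6 tags would do here; the point is only to have a few headings of different levels to search for.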

Get the headings with BeautifulSoup
───────────────────────────────────

Let's import our packages, read in our sample.html using pathlib, and find all headings using BeautifulSoup. The regex ^h[1-6]$ matches the tag names h1 through h6; note that re has to be imported for re.compile to work.

[code]
    import re
    from pathlib import Path

    from bs4 import BeautifulSoup

    # Parse the file with the lxml parser (requires the lxml package).
    soup = BeautifulSoup(Path('sample.html').read_text(), features="lxml")
    # A compiled regex passed to find_all is matched against tag names.
    headings = soup.find_all(re.compile("^h[1-6]$"))

And what we get is a list of bs4.element.Tag objects.

[code]
    >> print(headings)
    [