Find all Headings with BeautifulSoup

Waylon Walker

Make an example #

sample.html

Let's make a sample.html file with the following contents. It mainly has some headings, <h1> and <h2> tags, that I want to be able to find.






        
<!DOCTYPE html>
<html lang="en">
  <body>
    <h1>hello</h1>
    <p>this is a paragraph</p>
    <h2>second heading</h2>
    <p>this is also a paragraph</p>
    <h2>third heading</h2>
    <p>this is the last paragraph</p>

  </body>
</html>

Get the headings with BeautifulSoup #

Let's import our packages, read in our sample.html using pathlib, and find all headings using BeautifulSoup and a regular expression that matches the tag names h1 through h6.






        
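import re
from pathlib import Path

from bs4 import BeautifulSoup

# Parse the file; features="lxml" requires the lxml package to be installed.
soup = BeautifulSoup(Path('sample.html').read_text(), features="lxml")
# A compiled regex is matched against tag names, so this finds h1 through h6.
headings = soup.find_all(re.compile("^h[1-6]$"))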

And what we get is a list of bs4.element.Tag objects.






        
>>> print(headings)
[<h1>hello</h1>, <h2>second heading</h2>, <h2>third heading</h2>]
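Because each item is a Tag, the heading level and text can be read from `.name` and `.get_text()`. Here is a minimal sketch of that. It inlines the sample markup as a string and uses Python's built-in `html.parser` instead of lxml so it runs without extra parsers. Both choices are assumptions made for the sake of a self-contained example.

```python
import re

from bs4 import BeautifulSoup

# Same markup as sample.html, inlined so the snippet is self-contained.
html = """
<!DOCTYPE html>
<html lang="en">
  <body>
    <h1>hello</h1>
    <p>this is a paragraph</p>
    <h2>second heading</h2>
    <p>this is also a paragraph</p>
    <h2>third heading</h2>
    <p>this is the last paragraph</p>
  </body>
</html>
"""

# Assumption: html.parser (standard library) stands in for lxml here.
soup = BeautifulSoup(html, features="html.parser")
headings = soup.find_all(re.compile("^h[1-6]$"))

# Tag.name is the tag name ("h1", "h2", ...); get_text() is the inner text.
outline = [(int(h.name[1]), h.get_text()) for h in headings]
print(outline)
# → [(1, 'hello'), (2, 'second heading'), (2, 'third heading')]
```

The paragraphs are skipped because the pattern is anchored with `^` and `$`, so only tag names that are exactly h1 through h6 match.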

I recently added a heading_link plugin to markata. You might notice the 🔗 next to each heading on this page; those links are powered by this exact technique.
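To give a rough idea of how heading links could be built on top of `find_all`, here is a small sketch. It is not markata's actual implementation. The `slugify` helper, the inline HTML string, and the use of `html.parser` are all hypothetical choices made for illustration.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical input; a real plugin would receive rendered article HTML.
html = "<h1>hello</h1><p>text</p><h2>second heading</h2>"
soup = BeautifulSoup(html, features="html.parser")


def slugify(text):
    # Hypothetical helper: lowercase, collapse non-alphanumerics into "-".
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


for heading in soup.find_all(re.compile("^h[1-6]$")):
    # Compute the slug before appending the link so the emoji is not included.
    slug = slugify(heading.get_text())
    heading["id"] = slug

    # Build <a href="#slug">🔗</a> and append it inside the heading.
    link = soup.new_tag("a", href=f"#{slug}")
    link.string = "🔗"
    heading.append(link)

result = str(soup)
print(result)
```

Each heading ends up with an `id` and a link pointing at that `id`, so clicking the 🔗 jumps to that heading.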
