---
title: "Find all Headings with BeautifulSoup"
description: "BeautifulSoup is a DOM like library for python. It's quite useful to manipulate html. Here is an example to find_all html headings. I stole the regex from..."
date: 2022-02-01
published: false
tags:
  - dev
  - python
  - webdev
template: til
---

BeautifulSoup is a DOM-like library for Python. It's quite useful for manipulating HTML. Here is an example that uses `find_all` to grab every HTML heading. I stole the regex from Stack Overflow, but who doesn't.

## Make an example _sample.html_

Let's make a `sample.html` file with the following contents. It mainly has some headings, `<h1>` and `<h2>` tags, that I want to be able to find.

```html
<h1>hello</h1>
<p>this is a paragraph</p>
<h2>second heading</h2>
<p>this is also a paragraph</p>
<h2>third heading</h2>
<p>this is the last paragraph</p>
```

## Get the headings with BeautifulSoup

Let's import our packages, read in our `sample.html` using pathlib, and find all headings using BeautifulSoup.

```python
import re
from pathlib import Path

from bs4 import BeautifulSoup

soup = BeautifulSoup(Path('sample.html').read_text(), features="lxml")
# the regex matches any tag named h1 through h6
headings = soup.find_all(re.compile("^h[1-6]$"))
```

And what we get is a list of `bs4.element.Tag`'s.

```python
>>> print(headings)
[<h1>hello</h1>, <h2>second heading</h2>, <h2>third heading</h2>]
```

I recently added a heading_link plugin to markata. You might notice the 🔗's next to each heading on this page; that is powered by this exact technique.
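
The heading-link idea can be sketched with the same `find_all` call. This is only a rough illustration, not markata's actual plugin: `slugify`, `add_heading_links`, and the inline sample markup are hypothetical names made up for the example, and it uses Python's built-in `html.parser` instead of `lxml` so it runs without the extra dependency.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical sample markup, standing in for sample.html.
SAMPLE_HTML = """
<h1>hello</h1>
<p>this is a paragraph</p>
<h2>second heading</h2>
<p>this is also a paragraph</p>
"""

# Same regex as above: matches tag names h1 through h6.
HEADING_PATTERN = re.compile("^h[1-6]$")


def slugify(text):
    """Turn heading text into a lowercase, hyphen-separated id (illustrative helper)."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def add_heading_links(html):
    """Give every heading an id and append a link that points at it."""
    # Assumption: html.parser is swapped in here; the post itself uses lxml.
    soup = BeautifulSoup(html, "html.parser")
    for heading in soup.find_all(HEADING_PATTERN):
        slug = slugify(heading.get_text())
        heading["id"] = slug
        link = soup.new_tag("a", href=f"#{slug}")
        link.string = "🔗"
        heading.append(link)
    return soup


linked = add_heading_links(SAMPLE_HTML)
print([h["id"] for h in linked.find_all(HEADING_PATTERN)])
# → ['hello', 'second-heading']
```

Paragraphs are left alone because the regex only matches heading tag names, so the link is appended to headings and nothing else.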