Set User Agent on pandas read_csv

Waylon Walker

Hosts switched #

I recently switched hosting from Netlify over to Cloudflare. Cloudflare does some work to block requests that it does not think come from a real user. One of these checks is to ensure there is a real user agent on the request.

Not my go-to dataset 😭 #

This breaks my go-to example dataset.






        
import pandas as pd

pd.read_csv("https://waylonwalker.com/cars.csv")

# HTTPError: HTTP Error 403: Forbidden

But requests works??? #

What's weird is that requests still works just fine! I'm not sure why using urllib the way pandas does breaks the request, but it does.






        
import requests

requests.get("https://waylonwalker.com/cars.csv")

# <Response [200]>
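
One possible explanation, which I have not confirmed is what Cloudflare actually keys on, is the default User-Agent each client sends. Here is a minimal sketch using only the standard library to print the header urllib adds when you do not set one. The function name `default_urllib_user_agent` is made up for this example.

```python
import urllib.request


def default_urllib_user_agent() -> str:
    """Return the User-Agent urllib sends when the caller sets none."""
    opener = urllib.request.build_opener()
    # addheaders is a list of (name, value) tuples added to every request
    headers = dict(opener.addheaders)
    return headers["User-agent"]


# Prints a value starting with "Python-urllib/" followed by the Python version
print(default_urllib_user_agent())
```

requests sends its own default that starts with `python-requests/`, so the two clients do look different to a server even when the URL is the same.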

Setting the User Agent in pandas.read_csv #

This fixed the issue for me!

After a bit of googling I realized that this is a common thing, and that setting the User-Agent fixes it. This is the point where I remembered seeing in the Cloudflare dashboard that they protect against a lot of different attacks. Apparently it treats pd.read_csv as an attack on my Cloudflare Pages site.






        
import pandas as pd

pd.read_csv("https://waylonwalker.com/cars.csv", storage_options={"User-Agent": "Mozilla/5.0"})

# success
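
As far as I understand it, for http(s) URLs pandas forwards the `storage_options` key-value pairs to urllib as request headers. The sketch below shows roughly the same thing done by hand with the standard library. `build_request` and `fetch_text` are hypothetical helper names for illustration, not anything pandas exposes.

```python
import urllib.request

CSV_URL = "https://waylonwalker.com/cars.csv"


def build_request(url: str, user_agent: str = "Mozilla/5.0") -> urllib.request.Request:
    """Build a request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_text(url: str) -> str:
    """Download the URL and return the body as text (does a real network call)."""
    with urllib.request.urlopen(build_request(url)) as response:
        return response.read().decode("utf-8")


request = build_request(CSV_URL)
# urllib stores header names capitalized, so the lookup key is "User-agent"
print(request.get_header("User-agent"))  # Mozilla/5.0
```

Calling `fetch_text(CSV_URL)` would then make the request with that header attached, which is the part Cloudflare seems to care about.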

Now my data is back #

Now this works again, but it feels like a bit more effort than I want to type out by hand every time. I might need to look into my Cloudflare settings to see if I can allow this dataset to be accessed by pd.read_csv.
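
Until then, one way to cut down on the typing is a small wrapper that fills in the header. This is only a sketch: `DEFAULT_HEADERS`, `with_user_agent`, and `read_csv_with_user_agent` are names I made up here, and it assumes a pandas version where `storage_options` works for http(s) URLs.

```python
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0"}


def with_user_agent(storage_options=None):
    """Merge caller-supplied options over the default User-Agent header."""
    return {**DEFAULT_HEADERS, **(storage_options or {})}


def read_csv_with_user_agent(url, **kwargs):
    """Call pd.read_csv with a User-Agent header filled in by default."""
    # Imported here so the helper can be defined even where pandas is missing
    import pandas as pd

    kwargs["storage_options"] = with_user_agent(kwargs.get("storage_options"))
    return pd.read_csv(url, **kwargs)
```

With that in place, `read_csv_with_user_agent("https://waylonwalker.com/cars.csv")` should behave like the longer call above, and any other `read_csv` keyword arguments pass straight through.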
