Making good documentation in Python
I just started using portray and it is amazingly simple to use!
Draft and unpublished posts
I started using AWS in March 2019. Here are some of my notes.
Use gitignore.io, and consider adding an alias to your terminal to quickly add a .gitignore to any project that is missing one.
alias gitignore='curl https://www.gitignore.io/api/vim,emacs,python,pycharm,sublimetext,visualstudio,visualstudiocode,data > .gitignore'
Add a minimal setup.py to the root of your project, and use the following command to install it.
pip install -e .
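A minimal sketch of what that setup.py might contain, assuming a hypothetical package directory named `my_package` at the project root. It only uses `setuptools.setup` and `setuptools.find_packages`; swap in your own name and version.

```python
# setup.py: minimal packaging config so that `pip install -e .` works
from setuptools import find_packages, setup

setup(
    name="my_package",  # hypothetical project name, replace with yours
    version="0.1.0",
    packages=find_packages(),  # picks up every directory with an __init__.py
)
```

Because `-e` installs the project in editable mode, edits to the source are picked up on the next import without reinstalling.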
Consider using cookiecutter to generate new projects from a template.
My original inspiration for this post came from Steven Ostermiller's blog post, which, as of my last check in May 2024, no longer exists.
https://blog.ostermiller.org/removing-and-purging-files-from-git-history/
I was able to find it on the Wayback Machine, though.
...
My journey into data science
In January 2018 I started work as a full-time data scientist, turning my passion into a career. It is something I didn't see myself doing five years ago, but it is something I love to do. It combines my love of data, visualization, storytelling, software development, and writing code. Most of all, it allows me to work in a space that promotes learning and creativity. As a mechanical engineer at a company that has been building equipment for nearly a century, where the mechanical engineering is very well established, I felt there was not a lot of room for creativity.
When I first started as a full-time mechanical engineer
I commonly need to paste the first few lines of a dataset into a markdown file. I use two handy packages to do this: tabulate and pyperclip. Let's say I already have a pandas DataFrame in memory as df. All I need to do to convert the first 5 rows to markdown and copy it to the clipboard is the following.
from tabulate import tabulate
import pyperclip

md = tabulate(df.head(), headers="keys", tablefmt="pipe")  # first 5 rows as a markdown pipe table
pyperclip.copy(md)  # put the markdown table on the clipboard
This is a super handy snippet that I use a lot. Folks really appreciate it when they can see a sample of the data without opening the entire file.
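To get a feel for what `tablefmt="pipe"` produces, here is a dependency-free sketch using a hypothetical `to_pipe_table` helper. It is a simplified stand-in, not tabulate itself: every column is left-aligned, whereas tabulate right-aligns numbers and adds `:` alignment markers to the separator row. (If you are on pandas 1.0 or newer, `df.head().to_markdown()` wraps tabulate for you.)

```python
def to_pipe_table(headers, rows, n=5):
    """Render the first n rows as a simplified markdown pipe table (hypothetical helper)."""
    headers = [str(h) for h in headers]
    body = [[str(cell) for cell in row] for row in rows[:n]]
    # Each column is as wide as its longest cell, and at least 3 for the --- rule.
    widths = [
        max([3, len(h)] + [len(row[i]) for row in body])
        for i, h in enumerate(headers)
    ]

    def fmt(cells):
        # Pad every cell to its column width and join with pipes.
        return "| " + " | ".join(c.ljust(w) for c, w in zip(cells, widths)) + " |"

    lines = [fmt(headers), "|" + "|".join("-" * (w + 2) for w in widths) + "|"]
    lines += [fmt(row) for row in body]
    return "\n".join(lines)


print(to_pipe_table(["name", "qty"], [("apple", 3), ("kiwi", 12)]))
```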
These are my notes from the PyCon 2018 videos. I love the Python community, and especially the conference talks. This year I am going to take some notes from my favorite talks and post them here.
This is an incomplete, work-in-progress post.
https://www.youtube.com/watch?v=zQeYx87mfyw
...
In 2018 I transitioned from a product engineering (mechanical) role to a data scientist role. I entered this space with strong subject matter expertise in our products and our data, and with experience munging data and visualizing it in Python. My SQL skills were lacking, to say the least. I had learned just enough to get data out of our relational databases, then used pandas for any further analysis. Just run something like the following and you have data.
SELECT * FROM Table WHERE col_1 = 'col_1_filter'
This technique works great for small datasets that you only need to query once. There is no shame in pulling in a big dataset and munging it in pandas to get some results and make decisions. The problem comes when your dataset gets too big or you need to run the query frequently. Doing the aggregations on the server runs much quicker, since less data has to travel over the wire and less time is spent on I/O. My longest-running steps are currently I/O related...
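A small sketch of the difference, using an in-memory SQLite database and plain Python as stand-ins for a real server and pandas. The `sales` table and its `category`/`value` columns are hypothetical. Both approaches give the same totals, but the `GROUP BY` version returns one row per category instead of every transaction.

```python
import sqlite3
from collections import defaultdict

# Hypothetical transactional table, built in memory for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (category TEXT, value REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("a", 1.0), ("b", 2.0), ("a", 3.0), ("b", 4.0), ("c", 5.0)],
)

# Client side: pull every row back, then aggregate locally.
client_totals = defaultdict(float)
for category, value in conn.execute("SELECT category, value FROM sales"):
    client_totals[category] += value

# Server side: let the database aggregate, so only one row per category comes back.
server_totals = dict(
    conn.execute(
        "SELECT category, SUM(value) FROM sales GROUP BY category ORDER BY category"
    )
)
print(server_totals)  # {'a': 4.0, 'b': 6.0, 'c': 5.0}
```

With five rows the difference is invisible; it starts to matter when the raw table has millions of rows and the grouped result has a handful.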
...
I work with a lot of transactional time-series data that includes categories. I often want to create time-series plots with each category as its own line. This is the method I use almost daily to achieve this result. Typically the data I am working with changes very slowly, and trends happen over years, not days or weeks. Plotting daily or weekly data tends to be noisy and hides the trend. I use this pattern because it works well with my data and is easy to explain to my stakeholders.
import pandas as pd
import numpy as np
%matplotlib inline
Here I am trying to simulate a subset of a large transactional dataset. This could be something like sales data, production data, or hourly billing: anything that has a date, category, and value. Since we generated this data, we know that it is clean. I am still going to assume that it contains some nulls, and an...
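The full pandas version of this post is cut off here, so this is only a sketch of the general idea in standard-library Python: simulate daily rows with a date, a category, and a value (the category names and the ~1% null rate are made up), drop the nulls, and roll the rows up to monthly totals so each category becomes one smooth series.

```python
import random
from collections import defaultdict
from datetime import date, timedelta

random.seed(42)  # reproducible fake data

CATEGORIES = ("widgets", "gadgets", "gizmos")  # hypothetical category names
start = date(2017, 1, 1)

# Simulated transactional rows: (date, category, value), with roughly 1% nulls.
records = [
    (start + timedelta(days=d), cat,
     random.gauss(100, 25) if random.random() > 0.01 else None)
    for d in range(730)  # two years of daily rows
    for cat in CATEGORIES
]

# Roll daily rows up to monthly totals per category, skipping nulls,
# so slow multi-year trends are not buried in day-to-day noise.
monthly = defaultdict(lambda: defaultdict(float))
for day, cat, value in records:
    if value is None:
        continue
    monthly[cat][(day.year, day.month)] += value

# One sorted series of ((year, month), total) points per category,
# i.e. one line per category on a time-series plot.
series = {cat: sorted(totals.items()) for cat, totals in monthly.items()}

for cat, points in series.items():
    print(cat, len(points), "months")
```

In pandas, the same roll-up can be written as `df.pivot_table(index=pd.Grouper(key="date", freq="MS"), columns="category", values="value", aggfunc="sum")` followed by `.plot()`, assuming columns named `date`, `category`, and `value`.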
...
I recently gave a presentation at the Big Brothers and Big Sisters Data Challenge. I wanted to use reveal.js to create my slides. I have used it before, and it is a really nice package. Compared to PowerPoint, it makes it much easier to incorporate interactive visualizations right into the presentation, and the slides are easy to refactor and maintain. Since you are just working with text, you can easily turn a list of items on one slide into a set of slides.
If you have not seen David JP Phillips' TEDx talk on death by PowerPoint, stop now and watch it. You will never look at slides the same way again. Watching this video ruined me for sitting through presentations with these issues. Reveal.js is a tool that makes it very easy to follow these principles.
...
I, Waylon S. Walker, vow that from this point forward I will no longer create PowerPoints that could be considered DEATH BY POWERPOINT.
I currently work at a company with over 100K employees, and to this day I cannot recall a single presentation where the slides did not violate the rules stated in David's talk. This year I am putting a stop to this, starting with myself. I am starting a new job role in 2018, and there is no better time to make some...
...
I have tried most of the different methods in the past and found copying and pasting the ThreadPoolExecutor or ProcessPoolExecutor example from the standard library documentation to be the most reliable. Since this is often something I stuff in the back of a utility module of a library, I don't write it often enough to be familiar with it, which makes it both hard to write and hard to read and debug. If you are looking for a good overview of the different concurrency methods, Raymond Hettinger has a great talk about the differences between them, when to use each, and why.
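Here is a sketch of the shape of that documentation example: submit every job, keep a dict from each future back to its input, and collect results with `as_completed`. The docs example downloads URLs; `slow_square` is a hypothetical stand-in for whatever I/O-bound work you actually have.

```python
import concurrent.futures


def slow_square(n):
    """Hypothetical stand-in for an I/O-bound task such as a download."""
    return n * n


inputs = [1, 2, 3, 4, 5]
results = {}

with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    # Map each submitted future back to the input that produced it.
    future_to_input = {executor.submit(slow_square, n): n for n in inputs}
    # as_completed yields futures in whatever order they finish.
    for future in concurrent.futures.as_completed(future_to_input):
        n = future_to_input[future]
        try:
            results[n] = future.result()
        except Exception as exc:
            print(f"{n!r} generated an exception: {exc}")

print(results)
```

Swapping in `ProcessPoolExecutor` suits CPU-bound work, but the worker function must be defined at module level so it can be pickled, and on platforms that start workers with spawn (Windows, and macOS by default) the executor code should sit under `if __name__ == "__main__":`.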
...
Good afternoon, fellow data geeks. Last week PyCon released 141 videos of greatness. Here are my top picks from the event.
https://www.youtube.com/watch?v=u_iAXzy3xBA&t=1795s
https://www.youtube.com/watch?v=abrcJ9MpF60
...
GitHub Actions Delete all Workflow Runs
TIL, 2022-09-01, tags: bash