Posts tagged: python

Generating Readme Tables From Pandas

I often need to paste the first few lines of a dataset into a markdown file. I use two handy packages to do this, tabulate and pyperclip. Let's say I already have a Pandas DataFrame in memory as df. All I need to do to convert the first 5 rows to markdown and copy them to the clipboard is the following.

    from tabulate import tabulate
    import pyperclip

    # tabulate is imported as a function here, so call it directly
    md = tabulate(df.head(), df.columns, tablefmt='pipe')
    pyperclip.copy(md)

This is a super handy snippet that I use a lot. Folks really appreciate it when they can see a sample of the data without opening the entire file.
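For readers who have not seen the pipe format, here is a small dependency-free sketch of the kind of table it produces. The helper name `to_pipe_table` and the sample rows are made up for illustration. This is not how tabulate is implemented, and tabulate's real output can differ in details such as alignment markers and number formatting.

```python
def to_pipe_table(headers, rows):
    """Render headers and rows as a markdown pipe table (hypothetical helper)."""
    # Convert every cell to text so column widths can be measured.
    table = [[str(h) for h in headers]] + [[str(c) for c in row] for row in rows]
    # Each column is as wide as its longest cell, with a minimum of 3.
    widths = [max(3, *(len(r[i]) for r in table)) for i in range(len(headers))]

    def fmt(cells):
        # Left-justify each cell and wrap the row in pipes.
        return "| " + " | ".join(c.ljust(w) for c, w in zip(cells, widths)) + " |"

    # Dashed row that separates the header from the body.
    separator = "|" + "|".join("-" * (w + 2) for w in widths) + "|"
    lines = [fmt(table[0]), separator] + [fmt(r) for r in table[1:]]
    return "\n".join(lines)


sample = to_pipe_table(["item", "qty"], [["apple", 3], ["pear", 12]])
print(sample)
```

Pasting the printed text into a README renders it as a table on sites that support pipe tables.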

Pycon 2018 Roundup

These are my notes from PyCon 2018 videos. I love the Python community and especially the conference talks. This year I am going to take some notes from my favorite talks and post them here. This is an incomplete working post.

Jake VanderPlas - Performance Python: Seven Strategies for Optimizing Your Numerical Code [1]

- Always profile before making any optimizations.
- Vectorize with NumPy
  - Looping in Python can be slow
- Use specialized data structures.
  - scipy.spatial
  - pandas
  - xarray
  - scipy.sparse
  - sparse package
  - scipy.sparse.csgraph
- Cython
  - Add types
- Numba
  - jit
  - Fortran-like speed
  - heavy dependencies
- Dask
  - distributed tasks
  - Can be executed locally or on a cluster
- Look for an existing package
  - resist the urge to reinvent the wheel

https://www.youtube.com/watch?v=zQeYx87mfyw

Justin Crown - “WHAT IS THIS MESS?” - Writing tests for pre-existing code bases - PyCon 2018 [3]

This was a great talk about not only test driven de...
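The first strategy in the Jake VanderPlas notes above, profiling before optimizing, can be tried with nothing but the standard library. The following is a minimal sketch, not code from the talk. The function `slow_sum_of_squares` is a made-up stand-in for whatever numerical code you suspect is slow.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Deliberately naive Python loop, used only as something to profile."""
    total = 0
    for i in range(n):
        total += i * i
    return total


profiler = cProfile.Profile()
# runcall() profiles a single call and passes its return value through.
result = profiler.runcall(slow_sum_of_squares, 100_000)

# Collect the report in a string, sorted by cumulative time, top 5 entries.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

The report shows where the time actually goes, which tells you whether vectorizing or any other strategy from the list is worth the effort.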

Stepping Up My SQL Game

In 2018 I transitioned from a Product Engineering (Mechanical) role to a Data Scientist role. I entered this space with strong subject matter expertise in our products, our data, munging through data in Python, and data visualization in Python. My SQL skills were lacking, to say the least. I had learned what I needed to know to get data from our relational databases, then used pandas to do any further analysis. Just run something like the following and you have data.

    SELECT * FROM Table WHERE col_1 = 'col_1_filter'

This technique works great for small data sets that you only need to query once. There is no shame in pulling in a big dataset and munging it in pandas to get some results and make decisions. The problem comes when your dataset becomes too big or you need to run the query on a frequent basis. Doing the aggregations on the server runs much quicker, as it reduces the time spent in I/O. My longest running steps are currently I/O related. Reducing these steps have im...
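To make the difference concrete, here is a small sketch that compares pulling every row and aggregating on the client with letting the database aggregate first. It uses an in-memory SQLite database as a stand-in for a real relational server. The `sales` table, its columns, and the rows are invented for illustration and are not from the original post.

```python
import sqlite3
from collections import defaultdict

# In-memory SQLite stands in for the real database server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10), ("west", 5), ("east", 7), ("west", 1), ("north", 4)],
)

# Approach 1: transfer every row, then aggregate on the client.
client_totals = defaultdict(int)
for region, amount in conn.execute("SELECT * FROM sales"):
    client_totals[region] += amount

# Approach 2: aggregate in SQL so only one row per group is transferred.
server_totals = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
)

print(dict(client_totals))
print(server_totals)
conn.close()
```

Both approaches give the same totals, but the second moves one row per region across the connection instead of one row per transaction, which is where the I/O savings come from on large tables.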

My favorite pandas pattern

I work with a lot of transactional timeseries data that includes categories. I often want to create timeseries plots with each category as its own line. This is the method that I use almost daily to achieve this result. Typically the data that I am working with changes very slowly, and trends happen over years, not days or weeks. Plotting daily/weekly data tends to be noisy and hides the trend. I use this pattern because it works well with my data and is easy to explain to my stakeholders.

    import pandas as pd
    import numpy as np
    %matplotlib inline

Let's fake some data

Here I am trying to simulate a subset of a large transactional data set. This could be something like sales data, production data, hourly billing, anything that has a date, category, and value. Since we generated this data we know that it is clean. I am still going to assume that it contains some nulls and an irregular date range.

    n = 365*5
    cols = {'level_0': 'date', 'level_1': 'item', ...

background tasks in python

I have tried most of the different methods in the past and found copying and pasting the ThreadPoolExecutor example [1] or the ProcessPoolExecutor example [2] from the standard library documentation to be the most reliable. Since this is often something that I stuff in the back of a utility module of a library, it is not something that I write often enough to be familiar with, which makes it both hard to write and hard to read and debug. If you are looking for a good overview of the different concurrency methods, Raymond Hettinger [3] has a great talk about the differences between them, when to use them, and why. Recently a new Python library was released to make running tasks in the background very simple. The background [4] project by Kenneth Reitz is a high level implementation of Python 3’s ThreadPoolExecutor. I have been playing around with this project over the last week and I will say that this is definitely the simplest way to run background tasks in pyth...
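For reference, this is roughly what the standard library approach looks like. It is a minimal sketch using `concurrent.futures` directly, not the background project's API. The `slow_square` function is a made-up stand-in for a slow, I/O-bound task.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_square(x):
    """Hypothetical slow task; the sleep imitates waiting on I/O."""
    time.sleep(0.1)
    return x * x


with ThreadPoolExecutor(max_workers=4) as executor:
    # submit() returns a Future right away; the work runs in a worker thread.
    futures = [executor.submit(slow_square, n) for n in range(5)]
    # The main thread is free to do other work here.
    # result() blocks until that particular task has finished.
    results = [f.result() for f in futures]

print(results)
```

Collecting results in submission order keeps the output predictable. `concurrent.futures.as_completed` is the usual alternative when you want each result as soon as it is ready.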

Pycon 2017 Roundup

Good afternoon fellow Data Geeks. Last week PyCon [1] released 141 videos of greatness. Here are my top picks from the event.

#3 Kelsey Hightower - Keynote - Pycon 2017
https://www.youtube.com/watch?v=u_iAXzy3xBA&t=1795s

#2 Al Sweigart Yes, It’s Time to Learn Regular Expressions PyCon 2017
https://www.youtube.com/watch?v=abrcJ9MpF60

#1 Trey Hunner Readability Counts PyCon 2017
https://www.youtube.com/watch?v=knMg6G9_XCg

What’s on Tap
This afternoon we have a cup from one of my favorite roasters, Thirty Thirty Coffee [2].

References:
[1]: https://www.youtube.com/channel/UCrJhliKNQ8g0qoE_zvL8eVg
[2]: https://www.thirty-thirtycoffee.com/
