Posts tagged: python

All posts with the tag "python"

312 posts latest post 2026-05-06
Publishing rhythm
Jan 2026 | 3 posts

Long variable names are good

🏷️ Long variable names are a good thing. Self documenting code is more important than poorly documented code. Simply adding a few characters to your variable names can go a long ways. Containers are plural # [1] Aliases are welcome # [2] Scope is important References: [1]: #containers-are-plural [2]: #aliases-are-welcome
1 min read

simple click

cli tools are super handy and easy to add to your python libraries to supercharge them. Even if your library is not a cli tool there are a number of things that a cli can do to your library. Example Ideas # [1] Things a cli can do to enhance your library. πŸ†š print version πŸ•Ά print readme πŸ“ print changelog πŸ“ƒ print config ✏ change config πŸ‘©β€πŸŽ“ run a tutorial πŸ— scaffold a project with cookiecutter πŸ–± Click [2] # [3] Click [2] is the most popular python cli tool framework for python. There are others, some old, some new comers that make take the crown. For now Click [2] is the gold standard if you want to make a powerful cli quickly. If you are dependency conscious and dont need a lot of tooling, use argparse [4]. Project Structure # [5] . β”œβ”€β”€ setup.py └── simple_click β”œβ”€β”€ cli.py └── __init__.py ❯ cli.py # [6] # simple_click/cli.py import click __version__ = "1.0.0" @click.group() def cli(): pass @cli.command() def version(): """prints project version""" click.echo(__...

SqlAlchemy Models

Make a connection # [1] from sqlalchemy import create_engine def get_engine(): return create_engine("sqlite:///mode_examples.sqlite") Make a session # [2] from sqlalchemy.orm import sessionmaker def get_session(): con = get_engine() Base.bind = con Base.metadata.create_all() Session = sessionmaker(bind=con) session = Session() return session Make a Base Class # [3] from sqlalchemy.ext.declarative import declarative_base Base = declarative_base() Base.metadata.bind = get_engine() Make your First Model # [4] class User(Base): __tablename__ = "users" username = Column('username', Text()) firstname = Column('firstname', Text()) lastname = Column('lastname', Text()) Make your own Base Class to inherit From # [5] class MyBaseHelper: def to_dict(self): return {k: v for k, v in self.__dict__.items() if k[0] != "_"} def update(self, **attrs): for key, value in attrs.items(): if hasattr(self, key): setattr(self, key, value) Use the Custom Base Class # [6] class User(Ba...
1 min read

Building Cli apps in Python

Packages # [1] Click [2] # [3] Inputs # [4] Click primarily takes two forms of inputs Options and arguments. I think of options as keyword argument and arguments as regular positional arguments. Option # [5] - typically aliased with a shorthand (’-v’, β€˜β€“verbose’) --- **From the Docs [6] To get the Python argument name, the chosen name is converted to lower case, up to two dashes are removed as the prefix, and other dashes are converted to underscores. @click.command() @click.option('-s', '--string-to-echo') def echo(string_to_echo): click.echo(string_to_echo) @click.command() @click.option('-s', '--string-to-echo', 'string') def echo(string): click.echo(string) --- Argument # [7] - positional - required - no help text supplied by click Yaspin [8] # [9] [10] Click Help Colors [11] # [12] [13] # [14] Colorama [15] # [16] Colorama Example [17] Click DidYouMean [18] # [19] References: [1]: #packages [2]: https://click.palletsprojects.com/en/7.x/ [3]: #clickhttp...
1 min read

Kedro

See all of my kedro related posts in [[ tag/kedro ]]. #kedrotips [1] # [2] I am tweeting out most of these snippets as I add them, you can find them all here #kedrotips [3]. πŸ—£ Heads up # [4] Below are some quick snippets/notes for when using kedro to build data pipelines. So far I am just compiling snippets. Eventually I will create several posts on kedro. These are mostly things that I use In my everyday with kedro. Some are a bit more essoteric. Some are helpful when writing production code, some are useful more usefule for exploration. πŸ“š Catalog # [5] [6] Photo by jesse orrico on Unsplash CSVLocalDataSet # [7] python import pandas as pd iris = pd.read_csv('https://raw.githubusercontent.com/kedro-org/kedro/d3218bd89ce8d1148b1f79dfe589065f47037be6/kedro/template/%7B%7B%20cookiecutter.repo_name%20%7D%7D/data/01_raw/iris.csv') data_set = CSVLocalDataSet(filepath="test.csv", load_args=None, save_args={"index": False}) iris_data_set.save(iris) reloaded_iris = iris_data_se...

πŸ“ Packages to Investigate Notes

- jmespath - Tabnine Bulwark # [1] |-|-| |github: |https://github.com/zaxr/bulwark| I definitely want to try this out with kedro. Bulwark is a package for convenient property-based testing of pandas dataframes, supported for Python 3.5+. Example # [2] import bulwark.decorators as dc @dc.IsShape((-1, 10)) @dc.IsMonotonic(strict=True) @dc.HasNoNans() def compute(df): # complex operations to determine result ... return result_df References: [1]: #bulwark [2]: #example
1 min read

Just Use Pathlib

Pathlib is an amazing cross-platform path tool. Import # [1] from pathlib import Path Create path object # [2] Current Directory cwd = Path('.').absolute() Users Home Directory home = Path.home() module directory module_path = Path(__file__) Others Let’s create a path relative to our current module. data_path = Path(__file__) / 'data' Check if files exist # [3] Make Directories # [4] data_path.mkdir(parents=True, exists_ok=True) rename files # [5] Path(data_path /'example.csv').rename('real.csv') List files # [6] Glob Files # [7] data_path.glob('*.csv') recursively data_path.rglob('*.csv') Write # [8] Path(data_path / 'meta.txt').write_text(f'created on {datetime.datetime.today()}) References: [1]: #import [2]: #create-path-object [3]: #check-if-files-exist [4]: #make-directories [5]: #rename-files [6]: #list-files [7]: #glob-files [8]: #write
1 min read

Filtering Pandas

query # [1] Good for method chaining, i.e. adding more methods or filters without assigning a new variable. # is skus.query('AVAILABILITY == " AVAILABLE"') # is not skus.query('AVAILABILITY != " AVAILABLE"') masking # [2] general purpose, this is probably the most common method you see in training/examples # is skus[skus['AVAILABILITY'] == 'AVAILABLE'] # is not skus[~skus['AVAILABILITY'] == 'AVAILABLE'] isin # [3] capable of including multiple strings to include # is in df[df.AVAILABILITY.isin(['AVAILABLE', 'AVL'])] # is not in df[~df.AVAILABILITY.isin(['AVAILABLE', 'AVL'])] contains # [4] Good For partial matches # contains df[df.AVAILABILITY.str.contains('AVA')] # not contains df[~df.AVAILABILITY.str.contains('AVA')] MASKS # [5] anything that we put inside of square brackets can be set as a variable then passed in. service_mask = skus['AVAILABILITY'] == 'AVAILABLE' name_mask = skus['NAME'] == 'Dell chromebook 11' Operators # [6] & - and ~ - not | - or AVAILABLE and ...

Pyspark

I have been using pyspark since March 2019, here are my thoughts.
1 min read

Making good documentation in python

Tools Sphinx # [1] Portray # [2] I just started using portray and it is amazingly simple to use! Methodology References: [1]: #sphinx [2]: #portray
1 min read

Quick Progress Bars in python using TQDM

tqdm is one of my favorite general purpose utility libraries in python. It allows me to see progress of multipart processes as they happen. I really like this for when I am developing something that takes some amount of time and I am unsure of performance. It allows me to be patient when the process is going well and will finish in sufficient time, and allows me to πŸ’₯ kill it and find a way to make it perform better if it will not finish in sufficient time. [1] for more gifs like these follow me on twitter @waylonwalker [2] Add a simple Progress bar! from tqdm import tqdm from time import sleep for i in tqdm(range(10)): sleep(1) convenience TQDM also has a convenience function called trange that wraps the range function with a tqdm progress bar automatically. from tqdm import trange from time import sleep for i in trange(range(10)): sleep(1) notebook support There is also notebook support. If you are bouncing between ipython and jupyter I recomend importing from the auto m...
1 min read

Clean up Your Data Science with Named Tuples

If you are a regular listener of TalkPython [1] or PythonBytes you have hear Michael Kennedy talk about Named Tuples many times, but what are they and how do they fit into my data science workflow. Example # [2] As you graduate your scripts into modules and libraries you might start to notice that you need to pass a lot of data around to all of the functions that you have created. For example if you are running some analysis utilizing sales, inventory, and pricing data. You may need to calculate total revenue, inventory on hand. You may need to pass these data sets into various models to drive production or pricing based on predicted volumes. Load data # [3] Here we setup functions that can load data from the sales database. Assume that we also have similar functions to get_inventory and get_pricing. def get_engine(): engine = create_engine('postgresql://scott:tiger@localhost:5432/mydatabase') def get_sales(): ''' gets sales history from the sales database ''' engine = ge...

Background Tasks in Python for Data Science

This post is intended as an extension/update from background tasks in python [1]. I started using background the week that Kenneth Reitz released it. It takes away so much boilerplate from running background tasks that I use it in more places than I probably should. After taking a look at that post today, I wanted to put a better data science example in here to help folks get started. This post is intended as an extension/update from background tasks in python [1]. I started using background the week that Kenneth Reitz released it. It takes away so much boilerplate from running background tasks that I use it in more places than I probably should. After taking a look at that post today, I wanted to put a better data science example in here to help folks get started. I use it in more places than I probably should Before we get into it, I want to make a shout out to Kenneth Reitz for making this so easy. Kenneth is a python God for all that he has given to the community in so many w...

πŸ“ Bash Notes

Bash is super powerful. File System Full # [1] Show Remaining Space on Drives df -h show largest files in current directory du . -h --max-depth=1 Move files then symlink them mkdir /mnt/mounted_drive mv ~/bigdir /mnt/mounted_drive ln -s /mnt/mounted_drive/bigdir ~/bigdir Fuzzy One Liners # [2] a() {source activate "$(conda info --envs | fzf | awk '{print $ edit in vim vf() { fzf | xargs -r -I % $EDITOR % ;} cat a file vf() { fzf | xargs -r -I % $EDITOR % ;} bash execute bf() { bash "$(fzf)" } git [3] add gadd() { git status -s | fzf -m | awk '{print $2}' | xargs git add && git status -s} git reset greset() { git status -s | fzf -m | awk '{print $2}' |xargs git reset && git status -s} Kill a process fkill() {kill $(ps aux | fzf | awk '{print($2)}')} Finding things # [4] Files # [5] fd-find [6] is amazing for finding files, it even respects your .gitignore file 😲. Install with apt install fd-find. fd md ag -g python find . -n "*.md" ++Vanilla Bonus Content # [7] ** sh...

Autoreload in Ipython

I have used %autoreload for several years now with great success and πŸ”₯ rapid reloads. It allows me to move super fast when developing libraries and modules. They have made some great updates this year that allows class modules to be automatically be updated. What I like about autoreload # [1] πŸ”₯ Blazing Fast πŸ’₯ Keeps me in the comfort of my text editor πŸ‘ Allows me to use Jupyter when I need πŸ‘Ÿ Extremely Reliable One of the biggest benefits that I find is that it shortens the distance between my module/library code and test code inside of a terminal/notebook. Now I primarily use jupyter notebooks for the presentation aspect. I develop code from the comfort of my editor with all of the tools I have setup, and run the functions in a notebook to get the output. From there I might do some aggregations or plots, but the πŸ₯© meat of development is done outside of jupyter. Now I primarily use jupyter notebooks for the presentation aspect. Enabling Autoreload # [2] πŸ“ config This is a sh...
3 min read

Python Tips

Dictionaries # [1] Unpacking # [2] - **kwargs - func(**input) - locals().update(d) # [3] References: [1]: #dictionaries [2]: #unpacking [3]: #heading
1 min read

Generating Readme Tables From Pandas

Generating Readme Tables From Pandas # [1] I commonly have a need to paste the first few lines of a dataset into a markdown file. I use two handy packages to do this, tabulate and pyperclip. Lets say I have a Pandas DataFrame in memory as df already. All I would need to do to convert the first 5 rows to markdown and copy it to the clipboard is the following. from tabulate import tabulate import pyperclip md = tabulate.tabulate(df.head(), df.columns, tablefmt='pipe') pyperclip.copy(md) This is a super handy snippet that I use a lot. Folks really appreciate it when they can see a sample of the data without opening the entire file. References: [1]: #generating-readme-tables-from-pandas
Pycon 2018 Roundup

Pycon 2018 Roundup

These are my notes from pycon 2018 videos. I love the python community and especially the conference talks. This year I am going to take some notes from my favorite talks and post them here. This is an Incomplete working post. Jake VanderPlas - Performance Python: Seven Strategies for Optimizing Your Numerical Code [1] # [2] - Always profile before making any optimizations. - Vectorize with Numpy - Looping in python can be slow - Use specialized data structures. - scipy.spacial - pandas - xarray - scipy.sparse - sparse package - scipy.sparce.csgraph - Cython - Add types - Numba - jit - Fortran Like Speed - heavy dependencies - Dask - distributed tasks - Can be executed locally or on a cluster - Look for an existing package - resist the urge to reinvent the wheel https://www.youtube.com/watch?v=zQeYx87mfyw Justin Crown - β€œWHAT IS THIS MESS?” - Writing tests for pre-existing code bases - PyCon 2018 [3] # [4] This was a great talk about not only test driven de...
6 min read
Stepping Up My SQL Game

Stepping Up My SQL Game

In 2018 I transitioned from a Product Engineering (Mechanical) role to a Data Scientist Role. I entered this space with strong subject matter expertise with our products, our data, munging through data in pyhon, and data visualization in python. My sql skills were lacking to say the least. I had learned what I needed to know to get data from our relational databases, then use pandas to do any further analysis. Just run something like the following and you have data. SELECT * FROM Table Where col_1 = 'col_1_filter' This technique works great for small data sets that you only need to run once. There is no shame to pull in a big dataset and start munging with it in pandas to get some results, and make decisions. The problem becomes when your dataset becomes too big or you need to run the query on a frequent basis. Doing the aggregations on the server run much quicker, as it reduces the time spent in io. My longest running steps are currently io related. Reducing these steps have im...