Filtering Pandas

query

Good for method chaining, i.e. adding more methods or filters without assigning a new variable.

# is
skus.query('AVAILABILITY == "AVAILABLE"')
# is not
skus.query('AVAILABILITY != "AVAILABLE"')
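To illustrate the chaining point, here is a minimal sketch. The sample rows, the PRICE column, and the `wanted` variable are made up for illustration; only the AVAILABILITY and NAME column names come from these notes. Inside `query`, a local Python variable can be referenced with an `@` prefix.

```python
import pandas as pd

# Hypothetical sample data; PRICE is an assumed column, not from the notes.
skus = pd.DataFrame({
    "NAME": ["Dell chromebook 11", "HP stream 14", "Dell chromebook 11"],
    "AVAILABILITY": ["AVAILABLE", "AVAILABLE", "SOLD OUT"],
    "PRICE": [199, 249, 189],
})

wanted = "AVAILABLE"  # local variable, referenced inside query() with @

result = (
    skus
    .query("AVAILABILITY == @wanted")  # first filter
    .query("PRICE < 200")              # second filter, no intermediate variable
    .sort_values("PRICE")              # any other DataFrame method can follow
)
print(result)
```

Each step returns a new DataFrame, so filters and other methods can be stacked in one expression.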

masking

General purpose. This is probably the most common method you will see in training material and examples.

# is
skus[skus['AVAILABILITY'] == 'AVAILABLE']
# is not
skus[skus['AVAILABILITY'] != 'AVAILABLE']
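A note on negating an equality mask: in Python, `~` binds more tightly than `==`, so a negated comparison needs parentheses around the whole comparison. The sketch below uses made-up sample rows and shows two equivalent ways to write "is not".

```python
import pandas as pd

# Hypothetical sample data for illustration.
skus = pd.DataFrame({"AVAILABILITY": ["AVAILABLE", "SOLD OUT", "AVAILABLE"]})

# Option 1: use != directly.
not_available = skus[skus["AVAILABILITY"] != "AVAILABLE"]

# Option 2: negate the whole comparison; the parentheses are required.
also_not_available = skus[~(skus["AVAILABILITY"] == "AVAILABLE")]

# Without the parentheses, ~ would be applied to the string column itself
# before the comparison runs, which is not what is intended here.
print(not_available)
```

Both options select the same rows; `!=` is usually the easier one to read.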

isin

Matches against a list of values, so several strings can be checked at once.

# is in
df[df.AVAILABILITY.isin(['AVAILABLE', 'AVL'])]
# is not in
df[~df.AVAILABILITY.isin(['AVAILABLE', 'AVL'])]
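As a rough sketch of what `isin` saves you from writing, the example below compares it with the equivalent chain of `==` masks joined by `|`. The sample rows and the `accepted` list name are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data, including one missing value.
df = pd.DataFrame({"AVAILABILITY": ["AVAILABLE", "AVL", "SOLD OUT", None]})

accepted = ["AVAILABLE", "AVL"]  # values to keep

# is in
in_list = df[df["AVAILABILITY"].isin(accepted)]

# The same filter written out by hand with == and |
long_form = df[(df["AVAILABILITY"] == "AVAILABLE") | (df["AVAILABILITY"] == "AVL")]

# is not in; note that the row with the missing value ends up here
not_in_list = df[~df["AVAILABILITY"].isin(accepted)]
print(not_in_list)
```

Missing values never match the list, so they land on the "is not in" side.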

contains

Good for partial matches.

# contains
df[df.AVAILABILITY.str.contains('AVA')]
# not contains
df[~df.AVAILABILITY.str.contains('AVA')]
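`str.contains` has a few optional parameters that are worth knowing about: `na`, `case`, and `regex`. The sketch below uses made-up sample rows. With missing values in the column, the mask can contain NaN and indexing with it may fail, so passing `na=False` is a common safeguard.

```python
import pandas as pd

# Hypothetical sample data, including one missing value.
df = pd.DataFrame(
    {"AVAILABILITY": ["AVAILABLE", "available soon", "SOLD OUT", None]}
)

# na=False treats missing values as "no match" so the mask stays boolean.
matches = df[df["AVAILABILITY"].str.contains("AVA", na=False)]

# case=False ignores upper/lower case.
any_case = df[df["AVAILABILITY"].str.contains("ava", case=False, na=False)]

# regex=False matches the text literally; as a regex, "." would match
# any character, but as a literal it only matches an actual dot.
literal_dot = df[df["AVAILABILITY"].str.contains(".", regex=False, na=False)]
print(any_case)
```

By default the pattern is treated as a regular expression and the match is case-sensitive.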

MASKS

Anything that goes inside the square brackets can be assigned to a variable first and then passed in.

service_mask = skus['AVAILABILITY'] == 'AVAILABLE'
name_mask = skus['NAME'] == 'Dell chromebook 11'
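A mask is just a Series of True/False values aligned to the rows of the DataFrame, so it can be inspected before it is used. A small sketch with made-up sample rows:

```python
import pandas as pd

# Hypothetical sample data for illustration.
skus = pd.DataFrame({
    "NAME": ["Dell chromebook 11", "HP stream 14", "Dell chromebook 11"],
    "AVAILABILITY": ["AVAILABLE", "AVAILABLE", "SOLD OUT"],
})

service_mask = skus["AVAILABILITY"] == "AVAILABLE"

# True counts as 1, so sum() gives the number of matching rows.
match_count = int(service_mask.sum())

# .loc accepts a mask plus a column selection.
available_names = skus.loc[service_mask, "NAME"]
print(match_count)
print(available_names)
```

Counting matches first is a cheap sanity check before filtering a large table.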

Operators

& - and
| - or
~ - not

AVAILABLE and NAME

skus[service_mask & name_mask]

AVAILABLE or NAME

skus[service_mask | name_mask]

AVAILABLE and not NAME

skus[service_mask & ~name_mask]
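Putting the combinations together, here is a minimal runnable sketch with made-up sample rows. It also shows the inline form, where each comparison needs its own parentheses because `&` and `|` bind more tightly than `==`.

```python
import pandas as pd

# Hypothetical sample data for illustration.
skus = pd.DataFrame({
    "NAME": ["Dell chromebook 11", "Dell chromebook 11",
             "HP stream 14", "HP stream 14"],
    "AVAILABILITY": ["AVAILABLE", "SOLD OUT", "AVAILABLE", "SOLD OUT"],
})

service_mask = skus["AVAILABILITY"] == "AVAILABLE"
name_mask = skus["NAME"] == "Dell chromebook 11"

both = skus[service_mask & name_mask]                # AVAILABLE and NAME
either = skus[service_mask | name_mask]              # AVAILABLE or NAME
available_other = skus[service_mask & ~name_mask]    # AVAILABLE and not NAME

# Inline version of "AVAILABLE and NAME": parenthesize each comparison.
inline = skus[
    (skus["AVAILABILITY"] == "AVAILABLE")
    & (skus["NAME"] == "Dell chromebook 11")
]
print(both)
```

Named masks keep long conditions readable and can be reused across several filters.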

