Stepping Up My SQL Game

Waylon Walker Author

Mar 25, 2018 5 min read

In 2018 I transitioned from a Product Engineering (Mechanical) role to a Data Scientist role. I entered this space with strong subject matter expertise in our products and our data, and with experience munging and visualizing data in Python. My SQL skills were lacking, to say the least. I had learned just enough to get data out of our relational databases, and then used pandas for any further analysis. Just run something like the following and you have data.

SELECT
    *
FROM
    Table
WHERE
    col_1 = 'col_1_filter'

This technique works great for small data sets that you only need to query once. There is no shame in pulling in a big dataset and munging it in pandas to get some results and make decisions. The problem comes when your dataset becomes too big or you need to run the query on a frequent basis. Doing the aggregations on the server runs much quicker, as it reduces the time spent in IO. My longest running steps are currently IO related, and reducing them has improved my workflow. Whenever I hit server timeout errors, or found myself using the same long-running query in many places, I would go searching for examples online, because I just did not have experience with more advanced techniques. I decided it was time to put away the cheat sheets, step away from Stack Overflow, and improve my speed.
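To make the IO point concrete, here is a minimal sketch comparing the two approaches. It assumes an in-memory SQLite database as a stand-in for a real database server, and the `readings` table and its columns are hypothetical names made up for the illustration.

```python
import sqlite3

# In-memory SQLite stands in for the remote database server (an assumption
# for this sketch; the table and column names are hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (machine TEXT, hours REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("A", 1.5), ("A", 2.5), ("B", 4.0), ("B", 6.0), ("B", 2.0)],
)

# Approach 1: pull every row to the client, then aggregate in Python.
all_rows = conn.execute("SELECT machine, hours FROM readings").fetchall()
client_totals = {}
for machine, hours in all_rows:
    client_totals[machine] = client_totals.get(machine, 0.0) + hours

# Approach 2: let the database aggregate and return one row per machine.
server_rows = conn.execute(
    """
    SELECT machine, SUM(hours)
    FROM readings
    GROUP BY machine
    ORDER BY machine
    """
).fetchall()
server_totals = dict(server_rows)

# Same answer either way, but far fewer rows cross the wire in approach 2.
print(len(all_rows), "rows transferred vs", len(server_rows))
print(client_totals == server_totals)
```

With five rows the difference is trivial, but the number of rows transferred in the second approach grows with the number of groups rather than with the size of the table.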

Why Learn SQL in 2018?? #

SQL is far from the hot topic in 2018; AI, Deep Learning, BIG data, Machine Learning, and Natural Language Processing take the win here. SQL is so simple, why would anyone want to spend time learning it? The reason… all of those hot topics in 2018 require data. My data mostly comes from relational databases, which require SQL to get data out of them. Without the ability to efficiently get the data I need for an analysis, I cannot even start. Sure, I could use an ORM, but I found that to be a bit unwieldy with the thousands of tables we have, in formats that were determined many years ago. Plus, raw SQL is more transportable. I commonly collaborate with other folks who do not use Python. I am proud that I am able to point them to the SQL I use rather than telling them to suck it up and learn Python. I truly believe that people are most effective when they are able to choose their own stack of tools. Taking some time to focus on the basics of Data Science will help me build a strong foundation for my career.

Joining Data in Postgres #

Below are my notes from the Joining Data in Postgres course on DataCamp. I will use these notes as a refresher when I need a quick reference.

Using() #

When joining two tables on a column that has the same name in both tables, the USING clause can be used as a shorthand.

without using

SELECT *

FROM
    Table1 as t1

LEFT JOIN
    Table2 as t2
    ON t1.id = t2.id

with using

SELECT
    *

FROM
    Table1 as t1

LEFT JOIN
    Table2 as t2
    USING (id)

Join Types #

For joining columns of data together into a single table.

INNER: Includes only records contained in both tables.

RIGHT: Includes all records from the right table. Records from the left table with no match in the right are dropped, and the left table's columns are left null where a right record has no match in the left.

LEFT: Includes all records from the left table. Records from the right table with no match in the left are dropped, and the right table's columns are left null where a left record has no match in the right.

FULL: Combination of a LEFT and a RIGHT join, leaving nulls where data is missing in one table and not dropping any data.

CROSS: Returns all pairs of records from the two tables. It does not have an ON or USING clause.
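The join types above can be compared side by side on a tiny dataset. This is a sketch only: it uses SQLite from Python rather than Postgres, the tables `left_t` and `right_t` are hypothetical, and it sticks to INNER, LEFT, and CROSS because older SQLite versions do not support RIGHT and FULL joins. A RIGHT join gives the same rows as a LEFT join with the two tables swapped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE left_t  (id INTEGER, l_val TEXT);
    CREATE TABLE right_t (id INTEGER, r_val TEXT);
    INSERT INTO left_t  VALUES (1, 'L1'), (2, 'L2');
    INSERT INTO right_t VALUES (2, 'R2'), (3, 'R3');
    """
)


def run(sql):
    """Run a query and return all rows as a list of tuples."""
    return conn.execute(sql).fetchall()


# INNER: only id 2 exists in both tables.
inner = run(
    """
    SELECT l.id, l_val, r_val
    FROM left_t AS l
    INNER JOIN right_t AS r ON l.id = r.id
    ORDER BY l.id
    """
)

# LEFT: every left row is kept; id 1 has no match, so r_val is NULL (None).
left = run(
    """
    SELECT l.id, l_val, r_val
    FROM left_t AS l
    LEFT JOIN right_t AS r ON l.id = r.id
    ORDER BY l.id
    """
)

# CROSS: every pairing of a left row with a right row, no ON clause.
cross = run("SELECT l.id, r.id FROM left_t AS l CROSS JOIN right_t AS r")

print(inner)       # [(2, 'L2', 'R2')]
print(left)        # [(1, 'L1', None), (2, 'L2', 'R2')]
print(len(cross))  # 4
```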

Union #

For concatenating rows of data with the same columns.

Union: returns only unique records, does not include duplicates.

Union All: returns all records (including duplicates).
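A small sketch of the difference, run through SQLite from Python as a stand-in for Postgres. The tables `a` and `b` and their country codes are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE a (code TEXT);
    CREATE TABLE b (code TEXT);
    INSERT INTO a VALUES ('USA'), ('CAN');
    INSERT INTO b VALUES ('CAN'), ('MEX');
    """
)

# UNION stacks the rows and removes duplicates: 'CAN' appears once.
union = conn.execute(
    "SELECT code FROM a UNION SELECT code FROM b ORDER BY code"
).fetchall()

# UNION ALL stacks the rows and keeps duplicates: 'CAN' appears twice.
union_all = conn.execute(
    "SELECT code FROM a UNION ALL SELECT code FROM b ORDER BY code"
).fetchall()

print(union)      # [('CAN',), ('MEX',), ('USA',)]
print(union_all)  # [('CAN',), ('CAN',), ('MEX',), ('USA',)]
```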

Intersect #

Intersect: returns only records appearing in both tables

Except #

Except: returns only records from the first table that do not appear in the second table
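Intersect and Except can be sketched on the same kind of toy data. As before, this assumes SQLite from Python as a stand-in for Postgres, and the tables `a` and `b` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE a (code TEXT);
    CREATE TABLE b (code TEXT);
    INSERT INTO a VALUES ('USA'), ('CAN');
    INSERT INTO b VALUES ('CAN'), ('MEX');
    """
)

# INTERSECT: records that appear in both queries.
intersect = conn.execute(
    "SELECT code FROM a INTERSECT SELECT code FROM b"
).fetchall()

# EXCEPT: records from the first query that are missing from the second.
# Order matters: a EXCEPT b is not the same as b EXCEPT a.
a_except_b = conn.execute(
    "SELECT code FROM a EXCEPT SELECT code FROM b"
).fetchall()
b_except_a = conn.execute(
    "SELECT code FROM b EXCEPT SELECT code FROM a"
).fetchall()

print(intersect)   # [('CAN',)]
print(a_except_b)  # [('USA',)]
print(b_except_a)  # [('MEX',)]
```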

Semi-Joins and Anti-Joins #

Semi-Join: Filters the first table based on the results of a subquery. It does not have direct SQL syntax. This type of join is achieved through a subquery in the WHERE clause.

Anti-Join: Similar to the semi-join, but using a NOT modifier. This is particularly useful for debugging situations.
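Since neither join has its own keyword, an example helps. The sketch below writes both with IN and NOT IN subqueries, run through SQLite from Python as a stand-in for Postgres. The `countries` and `languages` tables here are small hypothetical versions with made-up rows, not the course data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE countries (code TEXT, name TEXT);
    CREATE TABLE languages (code TEXT, lang TEXT);
    INSERT INTO countries VALUES
        ('USA', 'United States'), ('CAN', 'Canada'), ('ATA', 'Antarctica');
    INSERT INTO languages VALUES
        ('USA', 'English'), ('CAN', 'English'), ('CAN', 'French');
    """
)

# Semi-join: keep countries that have at least one row in languages.
# Unlike a regular join, Canada is returned once even though it matches twice.
semi = conn.execute(
    """
    SELECT name
    FROM countries
    WHERE code IN (SELECT code FROM languages)
    ORDER BY name
    """
).fetchall()

# Anti-join: keep countries with no row in languages. Handy for finding
# records that silently drop out of an inner join.
# Caveat: NOT IN returns no rows at all if the subquery yields a NULL.
anti = conn.execute(
    """
    SELECT name
    FROM countries
    WHERE code NOT IN (SELECT code FROM languages)
    ORDER BY name
    """
).fetchall()

print(semi)  # [('Canada',), ('United States',)]
print(anti)  # [('Antarctica',)]
```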

Subqueries #

This is where I have really stepped up my SQL game. I was able to get practice writing more complex queries. I also learned about different methods of joining tables together.

WHERE #

Subqueries are commonly found in the where clause to filter data. Below is an example given in the course to select only the Asian countries with below average fertility rate from the states table.

SELECT
   name,
   fert_rate
FROM
    states
WHERE
    continent = 'Asia'
AND fert_rate <
        (SELECT AVG(fert_rate)
         FROM states);

SELECT #

Subqueries can be found in the SELECT clause to create new columns of data. This is a different technique than I have used in the past; previously I had only used GROUP BY statements to get this effect. I can see where this can be really useful: because it is not constrained by aggregations, any data point can be pulled in with this technique.

SELECT DISTINCT
    continent,
    (SELECT
        COUNT(*)
     FROM
        states
     WHERE
        prime_ministers.continent = states.continent
    ) AS countries_num

FROM
    prime_ministers;

FROM #

Subqueries found in the FROM clause can be very helpful for creating a new dataset from an existing table. I find these the easiest to read, as it is not much different from creating a new table. Again, this can be very powerful for creating new columns that were not easily available otherwise.

SELECT DISTINCT
    monarchs.continent,
    subquery.max_perc

FROM
    monarchs,
    (SELECT
        continent,
        MAX(women_parli_perc) AS max_perc

    FROM
        states

    GROUP BY
        continent
    ) as subquery

WHERE
    monarchs.continent = subquery.continent

ORDER BY
    continent;

ON #

Challenge Problem 1

This problem had me more stumped than any other problem in the course. I found the subquery inside the ON clause very confusing to understand. In this question we are joining the countries table to the economies table, with an extra condition in the ON clause: a subquery that yields the country codes of countries with official languages from the languages table.

SELECT DISTINCT
    c.name,
    e.total_investment,
    e.imports

FROM
    countries as c
LEFT JOIN
    economies as e
    ON c.code = e.code

    AND c.code in (
    SELECT
        l.code
    FROM
        languages as l
    WHERE
        official = true
    )

WHERE
    c.region = 'Central America'
AND e.year = 2015

ORDER BY
    c.name asc;
