The ability to query s3 buckets so seamless looks like such a pleasure to work with if you have a use case for that. Kedro catalog takes care of this most of the time for me, but I wonder if there are some cross project searching use cases I might find for this.
Posts tagged: data
All posts with the tag "data"
nice dataset to use for example / test projects. I’m using it to play with duckdb currently.
Today I learned how to VACUUM a sqlite database and cut its size in about half. It’s a database that I have had running for quite awhile and has some decent traffic on it.
Why is it important to do a VACUUM? In short its becuase the file system gets fragmented with as data is updated. On delete the files are removed from the database and marked as available for reuse in the filesystem, but the space is not reclaimed.
To VACUUM a database, run the following sql command. You can do it right form the sqlite shell by running sqlite3.
You will need about double the current size of the database as free space to do the VACUUM, you need space for a full copy, journaling or write ahead logs, and the existing database.
...
![[None]]
First I need to fetch my thoughts from the api, and put it in a local sqlite database using sqlite-utils.
fthoughts () { # fetch thoughts curl 'https://thoughts.waylonwalker.com/posts/waylonwalker/?page_size=9999999999' | sqlite-utils insert ~/.config/thoughts/database2.db post --pk=id --alter --ignore - }
Now that I have my posts in a local sqlite database I can use sqlite-utils to enable full text search and populate the full text search on the post table using the title message and tags columns as search.
sthoughts () { # search thoughts # sqlite-utils enable-fts ~/.config/thoughts/database2.db post title message tags # sqlite-utils populate-fts ~/.config/thoughts/database2.db post title message tags sqlite-utils search ~/.config/thoughts/database2.db post "$*" | ~/git/thoughts/format_thought.py | bat --style=plain --color=always --language=markdown } alias st=sthoughts
Now I am ready to search my thoughts, which is a tiny blog format that I created mostly for leaving my own personal comment on web pages, so most of them have a link to some other online content, and their title is based...
Ben Johnson was on the Changelog a few years back covering his work on litestream, and talks about why he chose to go open source, but not open contribution.
You should have a good reason to move off of sqlite.
Very inspiring talk, TLDR, you probably don’t need a database server. sqlite will probably be faster, simpler to maintain, and simpler to test your application.
I recently se tup minio object storage in my homelab for litestream sqlite backups. The litestream quickstart made it easy to get everything up and running on localhost, but I hit a wall when dns was involved to pull it from a different machine.
First I had to configure the Key ID and Secret Access Key generated in the minio ui.
❯ aws configure AWS Access Key ID [****************VZnD]: AWS Secret Access Key [****************xAm8]: Default region name [us-east-1]: Default output format [None]:
Then set the the s3 signature_version to s3v4.
aws configure set default.s3.signature_version s3v4
Now when I have minio running on https://my-minio-endpoint.com I can use the aws cli to access the bucket.
...
`litestream` is a sick cli tool for steaming replicas of sqlite. It automatically does daily snapshots, and streams all of the writes to the replica live.
install #
Install is fast using installer, no compilation, just copy the binary and run.
why-is-postgres-default
Serious question.
But, why. It’s the most loved db, right? Right? Maybe it’s time to rethink it.
Don’t get me wrong, if I need a relational db as a service, PostgreSQL is going to be my first choice, but why do I need to run a separate application for it?
...
As applications scale to the edge, to put compute as close to the user as possible, database queries back to the master node get slower and slower. Enter sqlite replication, put the database wtih the application code and replicate from master.
SQLite is the next big database trend. with more horizontal scaling, close to user read heavy applications, having your database in the same application stack makes a lot of sense. Tools like litestream are going to enable global distribution in an impressive way.
Fly.io’s solution to sqlite managed backups.I definitely want to look into this a bit, but moreso the tech under the hook litestream.
sqlite has 3 different tokenizers, porter, ascii, trigram.
These can be used with sqlite-utils.
sqlite-utils enable-fts --tokenize porter database.db post title message tags
And with the python api.
datasette really does everything doesn’t it!
After a nasty time with alembic upgrades, thoughts is about to get a new users table. This may have came from incorrectly setting up alembic for sqlite from the start, but I was able to fix the issue with this GitHub issue.
alembic sqlite ValueError: Constraint must have a name
The change I needed to make to get my migration to run.
![[None]]
Since using alembic I have been just running out a new revision checking its content and deleting it if its empty, today I learned there is an alembic check command to check for operations that need to be created.
duckdb is a new in process database that has been making its rounds in analytics for its high performance in those applications.
Mother duck is a centeralized server that brings manages storage, data sharing and an ide to duckdb.
Super useful way to show a tree view of an s3 bucket’s structure!
A neovim database client that I need to check out.
As the title states sqlite-utils now supports plugins. I dug in just a bit and Simon implemented this completely with entrypoints, no framework or library at all.