Pragmatic CS #3: Database Decisions, Chromium Memory Safety, Web Frameworks Benchmarked
Choose the best web framework and database for your use case
Hey there,
Hope you had a fulfilling week of personal growth! I took part in Complexity Weekend last week (it continues beyond the weekend!) and was blown away by how eye-opening it was to see a diverse group of talented people from various disciplines come together to address problems brought about by COVID-19.
If you are interested in Complexity Science (Computer Scientists have much to offer!), you can check out this Complexity Explained page and the free online courses by Santa Fe Institute. You can also join the Complexity Weekend public chat on Keybase to network and grow together with other like-minded folks.
Complexity science studies how a large collection of components – locally interacting with each other at small scales – can spontaneously self-organize to exhibit non-trivial global structures and behaviors at larger scales, often without external intervention, central authorities or leaders. The properties of the collection may not be understood or predicted from the full knowledge of its constituents alone.
"There's no love in a carbon atom, No hurricane in a water molecule, No financial collapse in a dollar bill."
– Peter Dodds
In this week’s issue, we look at a recent post from the Chromium team on how they are addressing the finding that around 70% of high-severity security bugs in Chromium are memory safety problems. We also explore the considerations behind choosing a suitable database for your project, and see how web frameworks rank in a composite performance benchmark.
Dealing with C++ Memory-safety Problems
Around 70% of high severity security bugs in Chromium are memory safety problems (mistakes with C/C++ pointers). Half of those are use-after-free bugs. Chromium’s approach has been to address memory-safety problems through the use of sandboxing and site isolation.
The process is the smallest unit of isolation. Yet, Chromium still has processes sharing information about multiple sites – like in its network service. Having many network service processes, each tied to a site or (preferably) an origin, would hugely reduce the severity of a network service compromise. However, that would pose efficiency concerns.
Spectrum of options for a solution:
These considerations and options are relevant for any software engineer using memory-unsafe languages like C and C++, especially if you are writing security-sensitive software such as operating systems, network servers, and desktop software.
How To Decide What Database to Use
This Hacker News thread on MongoDB blew up into a discussion on choice of databases. On when to use MongoDB:
Basically when speed and horizontal scalability are very important, and consistency/durability are less important. It’s also pretty good for unstructured or irregularly structured data that’s hard to write a schema for.
Web scraping or data ingestion from APIs might be a reasonable use case. Or maybe consumer apps/games where occasional data loss or inconsistency isn’t a big deal.
It can also be used effectively as a kind of durable cache (with a nice query language) in place of Redis/Memcached if you give it plenty of RAM. While its guarantees or lack thereof aren’t great for a database, they’re pretty good for a cache.
Emerging practice for using PostgreSQL:
All new features use a JSONB (or hstore) field instead of a "real" one until they're bedded in and stop changing. Then convert the field + data to a real field with a NOT NULL constraint in one easy migration.
Because PostgreSQL can query inside JSONB values, you can still join on those fields right in the SQL.
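To make the JSONB-first practice above more concrete, here is a minimal sketch of what the "one easy migration" might look like. It only builds the PostgreSQL statements as strings and does not connect to a database. The table and column names (`users`, `orders`, `extras`, `nickname`, `user_id`) and both helper functions are hypothetical, invented for illustration, and are not from the Hacker News thread.

```python
def promote_jsonb_key(table, json_column, key, sql_type):
    """Return PostgreSQL statements that turn one JSONB key into a real NOT NULL column.

    Identifiers are interpolated directly, so pass only trusted, hard-coded names.
    """
    return [
        "BEGIN;",
        # 1. Add the real column as nullable so existing rows remain valid.
        f"ALTER TABLE {table} ADD COLUMN {key} {sql_type};",
        # 2. Backfill: ->> extracts the JSONB value as text, then cast it.
        f"UPDATE {table} SET {key} = ({json_column}->>'{key}')::{sql_type};",
        # 3. Enforce the constraint once every row has a value.
        f"ALTER TABLE {table} ALTER COLUMN {key} SET NOT NULL;",
        "COMMIT;",
    ]


def join_on_jsonb_key(left, right, json_column, key, right_column, sql_type):
    """Return a SELECT that joins `left` to `right` on a value stored inside JSONB."""
    return (
        f"SELECT * FROM {left} "
        f"JOIN {right} ON {right}.{right_column} = "
        f"({left}.{json_column}->>'{key}')::{sql_type};"
    )


if __name__ == "__main__":
    # Hypothetical schema: users.extras is a JSONB column holding a 'nickname' key.
    for statement in promote_jsonb_key("users", "extras", "nickname", "text"):
        print(statement)
    # Hypothetical join: orders.extras holds a 'user_id' key pointing at users.id.
    print(join_on_jsonb_key("orders", "users", "extras", "user_id", "id", "integer"))
```

Note that step 3 fails if any row is missing the key, since the backfill leaves that row's new column NULL. That acts as a useful check that the field really has "bedded in" before you promote it.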
Also, if you haven’t, this paper on DynamoDB is a must-read.
2020 Web Framework Benchmark Rankings
Never heard of Drogon? It is a C++ web framework that came out on top in this year’s newly released composite scores. Just in case you want to go all out on performance while sacrificing some developer efficiency…
The benchmarking scores are based on fundamental tasks like JSON serialization, database access, and server-side template composition. Popular frameworks like Rails, Django and Laravel are seen at the bottom of the rankings.
Application performance can be directly mapped to hosting dollars, and for companies both large and small, hosting costs can be a pain point. Weak performance can also cause premature and costly scale pain by requiring earlier optimization efforts and increased architectural complexity. Finally, slow applications yield poor user experience and may suffer penalties levied by search engines.
What if building an application on one framework meant that at the very best your hardware is suitable for one tenth as much load as it would be had you chosen a different framework? The differences aren't always that extreme, but in some cases, they might be. Especially with several modern high-performance frameworks offering respectable developer efficiency, it's worth knowing what you're getting into.
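To put the "one tenth as much load" scenario into numbers, here is a small back-of-the-envelope sketch. Every figure in it (peak traffic, per-server throughput, price per server) is a made-up assumption for illustration and not a benchmark result. The two framework labels are hypothetical as well.

```python
import math


def servers_needed(peak_rps, rps_per_server):
    """Smallest number of servers that can absorb the peak requests per second."""
    return math.ceil(peak_rps / rps_per_server)


def monthly_cost(peak_rps, rps_per_server, cost_per_server):
    """Hosting bill if every server costs the same flat monthly amount."""
    return servers_needed(peak_rps, rps_per_server) * cost_per_server


if __name__ == "__main__":
    PEAK_RPS = 25_000          # assumed peak load
    COST_PER_SERVER = 50       # assumed monthly price per server, in dollars

    # Hypothetical frameworks: "fast" handles 10x the requests of "slow" per server.
    fast = monthly_cost(PEAK_RPS, 10_000, COST_PER_SERVER)
    slow = monthly_cost(PEAK_RPS, 1_000, COST_PER_SERVER)

    print(f"fast framework: {servers_needed(PEAK_RPS, 10_000)} servers, ${fast}/month")
    print(f"slow framework: {servers_needed(PEAK_RPS, 1_000)} servers, ${slow}/month")
    # fast framework: 3 servers, $150/month
    # slow framework: 25 servers, $1250/month
```

Real deployments rarely scale this linearly (databases, caches and network limits get in the way), so treat this as a rough way to reason about the gap rather than a cost model.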
Benefited from this edition? Forward it to a friend or share it!
As usual, feel free to reply to me or leave a comment with your thoughts! Let’s grow together as a community.
Cheers,
Zac