Pragmatic CS #1: Software Architecture, Large-scale Image Retrieval and Interview With Alex Pareto from Brex
Alex is the former Head of Engineering at video-first e-commerce app NTWRK and co-founder of a Y Combinator-backed startup.
Key takeaways:
Tradeoffs across the spectrum of server-side and client-side rendering approaches include performance, overhead costs and SEO. Hybrid approaches that use prefetching, code splitting and lazy loading of JavaScript do help.
To scale: cache liberally, use a CDN, index your database and add a load balancer
Use compact representations to allow indexing and quick search for images at scale
Articles
Computer Vision & Image Retrieval
Facebook’s Large-Scale AI Similarity Search To Detect COVID-19 Misinformation
SimSearchNet is a convolutional neural network that creates compact representations to allow indexing and quick search of photos at scale (the tool is open-sourced!)
This is particularly important because for each piece of misinformation identified by fact checkers, there may be thousands or millions of copies.
Using AI to detect these matches enables fact-checkers to focus on catching new instances of misinformation rather than near-identical variations of content they’ve already seen.
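The idea behind embedding-based similarity search can be sketched in a few lines. This toy TypeScript example is not SimSearchNet: the vectors are invented stand-ins for the compact representations a CNN would produce, and a real system would use an approximate nearest-neighbour index rather than a linear scan.

```typescript
// Toy similarity search: each image is reduced to a short numeric vector and
// a query finds the closest stored vector. Not SimSearchNet itself - the
// vectors here are invented, and real systems use approximate
// nearest-neighbour indexes instead of a linear scan.
type Embedding = { id: string; vec: number[] };

function dist(a: number[], b: number[]): number {
  // Euclidean distance between two equal-length vectors
  return Math.sqrt(a.reduce((sum, x, i) => sum + (x - b[i]) ** 2, 0));
}

function nearest(index: Embedding[], query: number[]): string {
  return index.reduce((best, e) =>
    dist(e.vec, query) < dist(best.vec, query) ? e : best
  ).id;
}

const index: Embedding[] = [
  { id: "known-misinfo-photo", vec: [0.9, 0.1, 0.4] },
  { id: "unrelated-photo", vec: [0.1, 0.8, 0.2] },
];

// A near-duplicate of a flagged photo maps to a nearby vector, so it matches.
console.log(nearest(index, [0.88, 0.12, 0.41])); // prints known-misinfo-photo
```

This is why near-identical variations can be caught automatically: small edits to an image move its vector only slightly.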
Software Architecture
Where To Implement Application Logic and Rendering?
From Google Developer Update Feb. 2019
Server rendering generates the full HTML for a page on the server, which avoids additional round-trips for data fetching and templating on the client. The drawback is a slower Time To First Byte, since generating pages on the server takes time.
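A minimal sketch of what "generating the full HTML on the server" means. `renderPage` is a hypothetical helper, not a real framework API; real server-rendering frameworks handle templating, data fetching and escaping for you.

```typescript
// A minimal server-rendering sketch: the full HTML for the page is built on
// the server, so the browser gets meaningful content in the first response.
// `renderPage` is a hypothetical helper, not a real framework API.
function renderPage(title: string, items: string[]): string {
  const list = items.map((i) => `<li>${i}</li>`).join("");
  return (
    `<!DOCTYPE html><html><head><title>${title}</title></head>` +
    `<body><h1>${title}</h1><ul>${list}</ul></body></html>`
  );
}

// A Node HTTP handler would send this string as the response body.
console.log(renderPage("New Drops", ["Sneakers", "Hoodies"]));
```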
Netflix’s hybrid approach is worth considering: they server-render static landing pages while prefetching the JavaScript for interaction-heavy pages.
If you are doing client-side rendering, make sure you implement aggressive code splitting and lazy-load JavaScript.
Also watch out for developments in making streaming server rendering and progressive rehydration work.
“Trisomorphic” rendering makes a rare appearance:
Use streaming server rendering for initial/non-JS navigations. Then have your service worker take on rendering of HTML for navigations.
This keeps cached components and templates up to date and enables SPA-style navigations to render new views in the same session.
Works best when you share the same templating and routing code between the server, client page, and service worker.
Other techniques worth noting: code splitting for incremental code download; feeding a dependency graph of A/B test decisions through a declarative API to minimize the dependencies delivered; innovative data-fetching approaches; and Atomic CSS with colocation of styles and build-time handling.
Prevent flickering of icons which come in later than the rest of the content:
Inline SVGs into the HTML using React instead of passing SVG files to <img> tags. Inlined SVGs are effectively JavaScript, so they can be bundled and delivered together with surrounding components for a clean one-pass render.
The upside of loading these at the same time as the JavaScript outweighs the cost in SVG painting performance. It also allows smooth colour changes of icons at runtime without requiring further downloads.
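A rough sketch of the idea with the React layer stripped away: the icon lives as a string in the same bundle as the component that uses it, so both arrive and render together. `checkIcon` and `renderButton` are hypothetical names.

```typescript
// Hypothetical sketch of inlining an icon: the SVG is a string constant in
// the same bundle as the component that uses it, so icon and markup arrive
// and render together in one pass - no separate <img> request to wait on.
const checkIcon =
  `<svg width="16" height="16" viewBox="0 0 16 16">` +
  `<path d="M2 8l4 4 8-8" stroke="currentColor" fill="none"/></svg>`;

function renderButton(label: string): string {
  // stroke="currentColor" lets CSS recolour the icon at runtime
  // without downloading anything further.
  return `<button>${checkIcon}<span>${label}</span></button>`;
}

console.log(renderButton("Done"));
```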
Scaling
Khan Academy’s Rapid Scaling For Increased Demand From COVID-19
Khan Academy scaled to 2.5x traffic in a week using a serverless architecture and CDN caching of all static data. They also extensively cached common queries, user preferences and session data.
First, check your resource monitoring to identify the bottleneck. It is usually the database, but bottlenecks can also be memory, CPU, network I/O or disk I/O.
As a principle, make the web stack do less work for the most common requests.
Some ideas:
Cache database queries
Index the database
Move session storage to an in-memory caching tool
HTML fragment caching
Use queues and more workers
Use HTTP caching headers
Add a Content Delivery Network in front of a static file host
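The first idea, caching database queries, can be sketched as a small TTL cache wrapped around a hypothetical `fetchFromDb` call. The in-process Map is for illustration only; production setups typically use Redis or Memcached.

```typescript
// A small TTL cache around a hypothetical `fetchFromDb` call, sketching the
// "cache database queries" idea. In-process Map for illustration only;
// production setups typically use Redis or Memcached.
type Entry<T> = { value: T; expires: number };

class TtlCache<T> {
  private store = new Map<string, Entry<T>>();
  constructor(private ttlMs: number) {}

  async get(key: string, compute: () => Promise<T>): Promise<T> {
    const hit = this.store.get(key);
    if (hit && hit.expires > Date.now()) return hit.value; // fresh hit
    const value = await compute(); // miss or expired: do the real work
    this.store.set(key, { value, expires: Date.now() + this.ttlMs });
    return value;
  }
}

let dbCalls = 0;
async function fetchFromDb(id: string): Promise<string> {
  dbCalls++; // stand-in for an expensive database query
  return `row-${id}`;
}

const cache = new TtlCache<string>(60_000);
(async () => {
  await cache.get("42", () => fetchFromDb("42"));
  await cache.get("42", () => fetchFromDb("42")); // served from cache
  console.log(dbCalls); // prints 1
})();
```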
Progressively Scaling To 11M Users On AWS
Use vertical scaling early on, but note that a single instance has no failover or redundancy.
At >1,000 users: Add availability zones, load balancers, and a replica database in RDS.
At 10K-100Ks: Scale instances horizontally. Move static content to S3 and even some dynamic content to the CloudFront CDN. Add more read replicas to the database in RDS. Shift session state off your web tier and store it in ElastiCache or DynamoDB.
At >500K: Add automation tools and decouple infrastructure. Add monitoring, metrics and logging.
At >10M: Use federation and sharding, and explore other types of databases.
“In The Trenches” Interview With Alex Pareto
Who are you and what’s your backstory?
Hey, I’m Alex! I work on engineering at Brex. We’re reinventing financial systems to help ambitious companies scale. I work on building out the Card product.
Before Brex, I led the engineering team at NTWRK, a video-first e-commerce app. NTWRK releases new goods in collaboration with popular brands and celebrities. During my time there we worked with Drake, LeBron James, Nike, and many others. We often dealt with interesting technical challenges around scaling and thundering herds.
Before NTWRK, I co-founded Demeanor.co, a Y Combinator-backed startup. Demeanor was a platform for celebrities and internet creators to create custom merchandise. During my time at Demeanor, I focused a lot on how to build software in a way that let us iterate and ship quickly.
I've also spent time working on software at Facebook and a few small startups. Back at USC, I started a CS organization for making things with other students called Scope.
I write (occasionally) at alexpareto.com, about my thoughts on software and startups.
You mentioned scaling and thundering herds challenges back during your time at NTWRK. Can you talk us through how you dealt with it?
Definitely! At NTWRK every show is live. We send a push notification to interested users just before the show starts, and everyone watches the show concurrently. This means the traffic patterns are very spiky - traffic can increase by 100x in a matter of seconds. At that rate of increase, autoscaling can't keep up.
Addressing spiky traffic takes a few steps. The first is to cache liberally (preferably on the edge using a CDN like Cloudflare or CloudFront). This eases the load on the servers and database quite a bit.
A massive number of users online concurrently means that every time the cache expires, a host of requests cascades onto the servers requesting the same data.
All these requests hitting the database can cause a big slow down. The trick here is to note that all the requests are asking for the same data. So instead of letting all the requests cascade to the database to recalculate the same data thousands of times - the solution is to recalculate the data once and share it with all the requests.
Redis works well for this - to ensure that only one request goes to the database when a complex query needs to be recalculated and that all other requests read the result from Redis.
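The request-coalescing idea can be sketched in-process: all concurrent requests for the same key await a single in-flight promise, so the expensive recomputation happens once. This is a simplified, single-server stand-in for the Redis-based locking Alex describes; names like `coalesced` and `slowQuery` are hypothetical.

```typescript
// In-process request coalescing: concurrent requests for the same key share
// one in-flight promise, so the expensive recomputation runs once. The
// Redis-based version in the interview applies the same idea across servers.
const inFlight = new Map<string, Promise<string>>();

async function coalesced(key: string, recompute: () => Promise<string>): Promise<string> {
  const pending = inFlight.get(key);
  if (pending) return pending; // piggyback on the recomputation already running
  const p = recompute().finally(() => inFlight.delete(key));
  inFlight.set(key, p);
  return p;
}

let recomputes = 0;
const slowQuery = async (): Promise<string> => {
  recomputes++; // counts how many times the "database" is actually hit
  await new Promise((resolve) => setTimeout(resolve, 50));
  return "result";
};

(async () => {
  // 1000 concurrent requests for the same key -> a single recomputation.
  await Promise.all(Array.from({ length: 1000 }, () => coalesced("feed", slowQuery)));
  console.log(recomputes); // prints 1
})();
```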
If you're interested, this article on High Scalability does a nice job describing various caching techniques and this video from Facebook does a nice job of describing the thundering herd problem.
Editor’s Note: Alex also wrote an extremely helpful post for scaling stage-by-stage up to 100k users.
You said that back at Demeanor you focused on building software in a way that lets you iterate and ship quickly. What’s the difference between that and the approach you have to take now, and the implications of the different approaches?
There are a few different stages of companies, and at each stage shipping fast and iterating is important. With that said, there are different goals at different stages.
An early-stage, zero-to-one company like Demeanor is about iterating fast to find product-market fit and make something people want. We sacrificed certain things like stability and broad feature sets to get something into people's hands quickly. For example, during the early stages of the company we had no staging environment, feature flags, or QA testing.
Growth stage companies like NTWRK operate at a larger scale - both in users and engineers. Shipping fast is important, but so is stability and maintainability. This means having feature flags, interfaces for third party libraries, proper integration + unit testing, and other practices. I think these practices take a bit more time up front, but help teams ship faster once the team size grows. New engineers can onboard faster, people have confidence to ship changes, libraries can be swapped out fast, and new features can be added quickly.
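A feature flag, in its simplest form, is just a lookup with a safe default. This sketch hard-codes the flag values; a real system would read them from a config service and support gradual, per-user rollouts. All names here are hypothetical.

```typescript
// A feature flag in its simplest form: a lookup with a safe default. The
// flag values are hard-coded for illustration; a real system reads them
// from a config service and supports gradual, per-user rollouts.
const flags: Record<string, boolean> = {
  newCheckout: true, // fully rolled out
  videoChat: false,  // still behind the flag
};

function isEnabled(flag: string): boolean {
  return flags[flag] ?? false; // unknown flags default to off
}

// Ship the code dark, then flip the flag when it is ready.
if (isEnabled("newCheckout")) {
  console.log("rendering new checkout flow");
}
```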
Different tools and techniques for different stages and goals!
What are you excited about right now in CS?
I’m really excited about the advancements in developer tooling made in recent years. There are some exciting companies in the space making it much easier to ship very scalable software. I’ve used both Vercel and Render in the past few months - they are a blast to work with.
Advancements like these lower the barrier to entry for people to start or build something new. With less capital needed - and no need to learn the nuances of AWS or a Linux box off the bat - we’ll start to see a lot more startups and products launch in the coming years.
What was the last thing you learned?
The most recent thing I’ve learned about is functional programming. We use Elixir at Brex which meant I had to break some old object-oriented instincts when I joined. Lots of the paradigms around object-oriented programming get thrown out the window with functional programming.
The key thing to keep in mind is that functional programming is all about taking data and transforming it.
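That mindset can be illustrated in TypeScript (the interview's examples are in Elixir, but the idea is language-agnostic): data flows through a pipeline of pure transformations instead of being mutated in place. The order data below is invented.

```typescript
// "Take data and transform it", sketched in TypeScript rather than Elixir:
// each step is a pure function and the data flows through the pipeline
// without being mutated in place. The order data is invented.
const orders = [
  { item: "shoes", qty: 2, price: 50 },
  { item: "hat", qty: 1, price: 20 },
  { item: "shoes", qty: 1, price: 50 },
];

const shoeRevenue = orders
  .filter((o) => o.item === "shoes") // keep only shoe orders
  .map((o) => o.qty * o.price) // transform each into a subtotal
  .reduce((sum, n) => sum + n, 0); // fold the subtotals into one value

console.log(shoeRevenue); // prints 150
```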
If you’re interested, Elixir is a fantastic language to get started with functional programming. Programming Elixir by Dave Thomas is a great introduction to the language.
Editor’s Note: Check out Why Brex Chose Elixir and also How Discord Scaled Elixir to 5 million Concurrent Users
For me, the quickest way to learn something is by doing (and deep work!). Reading a book like Programming Elixir is great for getting a background. After establishing a background, diving in and writing code forces one to start thinking and solving problems in a "functional" way. I would encourage anyone interested to build a side project (even if it's for personal use) in a functional language like Elixir to get started with it.
Who inspires you as a software engineer? Why?
I’m a big fan of Cal Newport’s writing on deep work (he also happens to be a CS professor).
Writing great software is a craft that requires intense focus.
In his writing, Cal talks a lot about how to cut distractions to open up periods of time for this intense focus. He argues that these periods of time are where we'll do our best work.
I've found this to be very true in practice - even outside of software engineering. Whether writing strategy plans or writing software, the way to do the task well is to set aside some time to focus on that and only that. I block out an hour or two each day to focus - and turn off Slack, messages, notifications, and e-mail for the whole block of time.
Over time, I've noticed this is a common habit among many people I know to get more high quality work done faster. If you're interested, Deep Work by Cal Newport is a great place to start!
Liked this issue? Help me get you better content and more interesting interviews by sharing it with your friends!
If you have any ideas on how to make this newsletter better, what I should write about, or people whom you think I should interview, please do hit reply to this email or DM me on Twitter.
Stay tuned for more next week, and subscribe to get it first!
Acknowledgements: Thank you to Andrew Kamphey and Gabriel Chuan from the Young Makers Mastermind Group for invaluable advice in starting this newsletter, as well as Sun Xiaofei for feedback on versions of the draft.