
The Canva outage: another tale of saturation and resilience

Today’s public incident writeup comes courtesy of Brendan Humphries, the CTO of Canva. Like so many other incidents that came before, this is another tale of saturation, where the failure mod…

Click to view the original at surfingcomplexity.blog

Hasnain says:

“We need to build in the ability to reconfigure our systems in advance, without knowing exactly what sorts of changes we’ll need to make. The Canva engineers had some powerful operational knobs at their disposal through the Cloudflare firewall configuration. This allowed them to make changes. The more powerful and generic these sorts of dynamic configuration features are, the more room for maneuver we have. Of course, dynamic configuration is also dangerous, and is itself a contributor to incidents. Too often we focus solely on the dangers of such functionality in creating incidents, without seeing its ability to help us reconfigure the system to mitigate incidents.

Finally, these sorts of operator interfaces are of no use if the responders aren’t familiar with them. Ultimately, the more your responders know about the system, the better position they’ll be in to implement these adaptations. Changing an unhealthy system is dangerous: no matter how bad things are, you can always accidentally make things worse. The more knowledge about the system you can bring to bear during an incident, the better position you’ll be in to adapt your system to extend that competence envelope.”

Posted on 2025-01-26T03:43:36+0000


Hasnain says:

This was a fun read, reminding me I need to go back and keep up with the latest in succinct data structure research.

Also kinda timely quote given the recent hoopla around DeepSeek:

“Even though modern spell checkers use different techniques like edit distance and language models, the engineering insights from Unix spell remain valuable. It shows how deep understanding of theoretical concepts combined with practical constraints can lead to efficient and elegant solutions.

Most importantly, it demonstrates that some of the best innovations happen when we are resource constrained, forcing us to think deeper about our problems rather than throwing more hardware at them.”

Posted on 2025-01-26T02:44:56+0000


The Jagged, Monstrous Function That Broke Calculus | Quanta Magazine

In the late 19th century, Karl Weierstrass invented a fractal-like function that was decried as nothing less than a “deplorable evil.” In time, it would transform the foundations of mathematics.

Click to view the original at quantamagazine.org

Hasnain says:

“In 1872, Weierstrass published a function that threatened everything mathematicians thought they understood about calculus. He was met with indifference, anger and fear, particularly from the mathematical giants of the French school of thought. Henri Poincaré condemned Weierstrass’ function as “an outrage against common sense.” Charles Hermite called it a “deplorable evil.””

Posted on 2025-01-25T22:17:26+0000


Hasnain says:

“The Real Culprit: setenv and getenv

setenv is not a safe function to call in a multithreaded environment. This is often a problem, and occasionally rediscovered as developers like us hit weird crashes in libc’s getenv [9], [10], [11], [12].”
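A minimal sketch of one common mitigation, assuming C++17 and a POSIX system: route every environment read and write in your own code through a single mutex, and copy the value out while the lock is held. The `safe_env` namespace and its `get`/`set` helpers are hypothetical names for illustration, not something from the quoted article. Note the limits of this approach: it only protects callers that go through the wrapper, so a `getenv` call made internally by libc or a third-party library can still race with `setenv`.

```cpp
#include <stdlib.h>   // POSIX setenv (not part of ISO C++)
#include <cstdlib>    // std::getenv
#include <mutex>
#include <optional>
#include <string>

namespace safe_env {  // hypothetical wrapper, for illustration only

// One process-wide lock shared by all wrapped reads and writes.
inline std::mutex& lock() {
    static std::mutex m;
    return m;
}

// Returns a copy of the value, so the caller never holds a pointer into
// the environment block after the lock is released.
std::optional<std::string> get(const std::string& name) {
    std::lock_guard<std::mutex> guard(lock());
    const char* value = std::getenv(name.c_str());
    if (value == nullptr) {
        return std::nullopt;
    }
    return std::string(value);
}

// Sets (or overwrites) a variable; returns true on success.
bool set(const std::string& name, const std::string& value) {
    std::lock_guard<std::mutex> guard(lock());
    return ::setenv(name.c_str(), value.c_str(), /*overwrite=*/1) == 0;
}

}  // namespace safe_env
```

Where possible, the simpler option is usually to read the environment once at startup, before any threads exist, and never call `setenv` afterwards.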

Posted on 2025-01-25T21:10:27+0000


Life Lessons from the First Half-Century of My Career – Communications of the ACM

Membership in ACM includes a subscription to Communications of the ACM (CACM), the computing industry's most trusted source for staying connected to the world of advanced computing.

Click to view the original at cacm.acm.org

Hasnain says:

This was chock full of great advice.

“Choose happiness. If you’re unhappy in life, success is much harder to achieve. When I was growing up, the American mantra was that happiness requires wealth. Wealth and happiness are two different goals; we have unhappy billionaires today! I always picked happiness over wealth when there was a choice, and I’m very glad that I did.”

Posted on 2025-01-25T20:16:27+0000


Did a Private Equity Fire Truck Roll-Up Worsen the L.A. Fires?

During the LA fires, dozens of fire trucks sat in the boneyard, waiting for repairs the city couldn't afford. Why? A private equity roll-up made replacing and repairing those trucks much pricier.

Click to view the original at thebignewsletter.com

Hasnain says:

TIL over half of LA’s fire trucks were out of commission during the recent fires and a nontrivial amount of the blame here goes to… private equity

“While AIP’s consolidation of economic power over fire truck manufacturing is appalling, it is not some unsolvable, intractable problem we just have to live with. State and federal antitrust laws already prohibit the kind of monopolistic roll-up that AIP perpetrated — they just need to be enforced. State AGs can bring lawsuits to force REV Group to divest the manufacturers it illegally acquired and to pay damages to fire departments for the harm that its (attempted) monopolization of the fire-truck industry has caused. Fire departments and other fire-apparatus purchasers can bring their own lawsuits to do the same. So can the FTC and the DOJ’s Antitrust Division. If state legislators or members of Congress want to pave the way for such lawsuits, they can launch their own investigations into the fire apparatus industry. And if anyone wants guidance on what a lawsuit against AIP could look like, Lina Khan left us a roadmap just before she stepped down from the FTC last week — when she sued private-equity giant Welsh Carson for rolling up Texas anesthesiology practices to drive up the price of anesthesia services to Texas patients.

We have all the tools we need to check AIP’s greed and abuse and restructure the fire-truck industry so it serves the public interest. The only question is whether our political leaders have the will.”

Posted on 2025-01-25T20:08:09+0000


Mostly civilians were killed in IDF attack on Lebanon village, BBC finds

The missile strike on a Lebanese apartment block targeting Hezbollah left mostly civilians dead, BBC finds.

Click to view the original at bbc.com

Hasnain says:

“The Israel Defense Forces (IDF) says the building was targeted because it was a Hezbollah "terrorist command centre" and it "eliminated" a Hezbollah commander. It added that "the overwhelming majority" of those killed in the strike were "confirmed to be terror operatives".
But a BBC Eye investigation verified the identity of 68 of the 73 people killed in the attack and uncovered evidence suggesting just six were linked to Hezbollah's military wing. None of those we identified appeared to hold a senior rank. The BBC's World Service also found that the other 62 were civilians - 23 of them children.”

Posted on 2025-01-25T09:29:54+0000


Strobelight: A profiling service built on open source technology

We’re sharing details about Strobelight, Meta’s profiling orchestrator. Strobelight combines several technologies, many open source, into a single service that helps engineers at Meta improve effic…

Click to view the original at engineering.fb.com

Hasnain says:

I am glad this is finally out, if only because I can finally reference Mark S's famous one ampersand commit and have people believe me and not think that I'm making shit up. Great read on profilers and also TIL the code is open source.

"A seasoned performance engineer was looking through Strobelight data and discovered that by filtering on a particular std::vector function call (using the symbolized file and line number) he could identify computationally expensive array copies that happen unintentionally with the ‘auto’ keyword in C++.

The engineer turned a few knobs, adjusted his Scuba query, and happened to notice one of these copies in a particularly hot call path in one of Meta’s largest ads services. He then cracked open his code editor to investigate whether this particular vector copy was intentional… it wasn’t.

It was a simple mistake that any engineer working in C++ has made a hundred times.

So, the engineer typed an “&” after the auto keyword to indicate we want a reference instead of a copy. It was a one-character commit, which, after it was shipped to production, equated to an estimated 15,000 servers in capacity savings per year!

Go back and re-read that sentence. One ampersand!"
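For anyone who hasn't been bitten by this yet, here is a small sketch of the kind of mistake the quote describes. It is not Meta's code: `Payload`, `Service`, and the global copy counter are hypothetical, made up purely to make the copy observable. The point is that plain `auto` deduces a value type even when the function returns a reference, so the vector gets copied, while `auto&` binds to the original.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical counter so the copy can be observed.
std::size_t g_copies = 0;

// Hypothetical type wrapping a vector; its copy constructor counts copies.
struct Payload {
    std::vector<int> data;
    Payload() : data(1024, 0) {}
    Payload(const Payload& other) : data(other.data) { ++g_copies; }
};

// Hypothetical owner whose accessor returns a reference, not a value.
struct Service {
    Payload payload;
    const Payload& get() const { return payload; }
};

// Plain `auto` drops the reference, so the whole vector is copied.
std::size_t copies_with_auto(const Service& s) {
    g_copies = 0;
    auto p = s.get();
    (void)p;
    return g_copies;
}

// `auto&` keeps the reference (deduced as const Payload&): no copy.
std::size_t copies_with_auto_ref(const Service& s) {
    g_copies = 0;
    auto& p = s.get();
    (void)p;
    return g_copies;
}
```

With a default-constructed `Service`, `copies_with_auto` reports one copy and `copies_with_auto_ref` reports none. In a hot call path, that one copy per call is where the capacity goes.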

Posted on 2025-01-24T05:28:29+0000


Hasnain says:

This is why I always name settings that have a time component with the unit included, e.g. “settingNameSeconds”, so there is no confusion, because what even is this:

“Which was what the setting value was changed to in the patch that was eventually accepted. This means that setting help.autocorrect to 1 logically means "wait 100ms (1 decisecond) before continuing".

Now, why Junio thought deciseconds was a reasonable unit of time measurement for this is never discussed, so I don't really know why that is. Perhaps 1 full second felt too long so he wanted to be able to set it to half a second? We may never know. All we truly know is that this has never made sense to anyone ever since.”
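A sketch of the same idea pushed one step further, assuming C++11 or later: put the unit in the type instead of only in the name. This is not how Git handles the setting; `deciseconds`, `AutocorrectSettings`, and `to_milliseconds` are hypothetical names for illustration. `std::chrono::duration` and `std::deci` are standard, and the conversion from deciseconds to milliseconds is lossless, so it happens implicitly.

```cpp
#include <chrono>
#include <ratio>

// A duration type counted in tenths of a second (std::deci is 1/10).
using deciseconds = std::chrono::duration<long long, std::deci>;

// Hypothetical settings struct: the unit is part of the field's type,
// so a bare integer cannot be assigned without naming the unit.
struct AutocorrectSettings {
    deciseconds delay;
};

// Deciseconds to milliseconds is an exact conversion (x100), so
// std::chrono allows it implicitly and does the arithmetic for us.
constexpr std::chrono::milliseconds to_milliseconds(deciseconds d) {
    return d;
}
```

With this, `AutocorrectSettings{deciseconds(1)}` reads unambiguously as one tenth of a second, and `to_milliseconds(deciseconds(1)).count()` is `100`, matching the "1 means 100ms" behaviour in the quote.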

Posted on 2025-01-23T07:19:50+0000


Hasnain says:

Lots to ponder from this rant. I do think that as a society (maybe I’m just grumpy) the value of artisanal, high-quality work has really gone by the wayside. It’s so magnificent when you get to see an expert at work, someone who really cares about their craft.

“When I joined my former Big Tech job, everyone cared. Over time, incentives attracted a different set of people who didn't care as much. Eventually those people became the majority. It's painful to work with people who don't care if you care a lot, and eventually I left because of it.

Now, I'm at a small startup full of people who care. Customer bug reports go right to our chatroom. We fix them immediately. I feel guilty I wrote the bugs at all. We reach out to users to see if we can make their lives better. We care.

I want to live in a community where everyone cares.

The one place in the world you get this vibe is probably Japan. Most people just really care. Patrick McKenzie refers to this as the will to have nice things. Japan has it, and the US mostly does not.

In Japan, you get the impression that everyone takes their job and role in society seriously. The median Japanese 7-11 clerk takes their job more seriously than the median US city bureaucrat. And the result is obvious if you visit both places.”

Posted on 2025-01-21T06:02:54+0000