o3 Beats a Master-Level Geoguessr Player—Even with Fake EXIF Data
In Which I Try to Maintain Human Supremacy for a Bit Longer
Hasnain says:
This is going to give me nightmares, because what the heck, man.
“So to put a bow on this:
The o3 model isn’t smoke and mirrors, tricking us by only using EXIF data. It’s at a comparable Geoguessr skill level to Master I or better players now (at least according to my own ~20 or so rounds of testing).
Humans still hold a big edge in decision time—most of my guesses were < 2 min, o3 often took > 4 min.
Spoofing EXIF data doesn’t throw off the model.
Whether you view this as dystopian or as a technological marvel - or both - you can’t claim it’s a parlor trick.”
Posted on 2025-04-30T06:32:53+0000
Introducing AutoPatchBench: A Benchmark for AI-Powered Security Fixes
We are introducing AutoPatchBench, a benchmark for the automated repair of vulnerabilities identified through fuzzing. By providing a standardized benchmark, AutoPatchBench enables researchers and …
Hasnain says:
Great work by some great folks here. Gonna bookmark this for later re-reading.
“In some instances, the LLM resorted to “cheating” by producing patches that superficially resolved the issue without addressing the underlying problem. This can occur when the generator modifies or removes code in a way that prevents the crash from occurring, but does not actually fix the root cause of the issue. We observed that cheating happens more frequently when we request the LLM to retry within the same trajectory. A potential solution to this could be to empower the LLM to say “I cannot fix it,” which may come with a tradeoff with success rate. However, note that most of the cheating was caught in the verification step, highlighting the utility of differential testing.”
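The differential-testing idea in that quote can be sketched in a few lines of Python. This is a hypothetical toy, much simpler than whatever AutoPatchBench's verification step actually does:

- `original_sum_prefix` stands in for a function that crashes on some inputs.
- `real_fix` addresses the root cause.
- `cheating_fix` makes the crash disappear by deleting the work.

Both patches stop the crash. Only comparing their outputs against the original, on inputs where the original did not crash, exposes the cheat.

```python
from typing import Callable, Sequence

# Hypothetical toy example; none of these names come from AutoPatchBench.


def original_sum_prefix(data: list[int], n: int) -> int:
    """Sum the first n elements. Raises IndexError when n > len(data) (the "crash")."""
    total = 0
    for i in range(n):
        total += data[i]
    return total


def real_fix(data: list[int], n: int) -> int:
    """Root-cause fix: clamp n so valid inputs behave exactly as before."""
    return original_sum_prefix(data, min(n, len(data)))


def cheating_fix(data: list[int], n: int) -> int:
    """'Cheating' patch: the crashing loop is deleted, so nothing can crash."""
    return 0


def differential_test(
    reference: Callable[..., int],
    candidate: Callable[..., int],
    inputs: Sequence[tuple],
) -> list[tuple]:
    """Return the inputs on which candidate disagrees with reference.

    Inputs that make the reference itself raise are skipped, because there is
    no trustworthy expected output to compare against.
    """
    mismatches = []
    for args in inputs:
        try:
            expected = reference(*args)
        except Exception:
            continue
        if candidate(*args) != expected:
            mismatches.append(args)
    return mismatches


if __name__ == "__main__":
    inputs = [([1, 2, 3], 2), ([5], 1), ([], 0), ([4, 4], 5)]
    # Both patches survive the crashing input ([4, 4], 5)...
    print(real_fix([4, 4], 5), cheating_fix([4, 4], 5))
    # ...but only the cheating patch changes behavior on valid inputs.
    print(differential_test(original_sum_prefix, real_fix, inputs))
    print(differential_test(original_sum_prefix, cheating_fix, inputs))
```

Both patches "pass" a crash-only check. The differential comparison flags `cheating_fix` on every valid input whose expected sum is non-zero, while `real_fix` produces no mismatches.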
Posted on 2025-04-30T06:23:32+0000
Why did Windows 7, for a few months, log on slower if you have a solid color background? - The Old New Thing
It's waiting for Godot and eventually gives up.
Hasnain says:
If I were still at Meta, this would probably go in the clowntown group.
"Personally, I use a solid color background. It was the default in Windows 95,¹ and I’ve stuck with that bluish-green background color ever since. It’s sort of like my comfort food.
Imagine my surprise when someone pointed me to a support article titled “The Welcome screen may be displayed for 30 seconds during the logon process after you set a solid color as the desktop background in Windows 7 or in Windows Server 2008 R2.” Why is logon slower with a solid background?"
Posted on 2025-04-30T05:16:34+0000
The Starvation of Gaza
Israel ramps up its terror campaign.
Hasnain says:
We must bear witness. I did not think I’d be agreeing with “The American Conservative,” but here I am.
“As the days roll on and the death toll piles up, how will future generations remember the role we played in this disaster? Will they view us, and Trump, as peacemakers and not the ones that looked away? It is not our responsibility or in our interest to save the world, true, but it’s also naive to believe that we do not play a sizable role in permitting Israel’s continued bombardment and starvation of the Palestinian people. Something must change and quickly or the blood of innocents will forever be stapled to the recorded rule of Trump and the MAGA right.”
Posted on 2025-04-29T19:28:34+0000
Top Biden aide: Israel missed opportunity for Saudi deal; hopefully it won’t do so again
In interviews with Israeli investigative TV program, nine senior officials from previous US administration vent their frustrations in dealing with Netanyahu during Gaza war
Hasnain says:
““God did the State of Israel a favor that Biden was the president during this period, because it could have been much worse. We fought [in Gaza] for over a year and the administration never came to us and said, ‘ceasefire now.’ It never did. And that’s not to be taken for granted,” the former Israeli ambassador said.”
Posted on 2025-04-28T15:02:37+0000
How a 20 year old bug in GTA San Andreas surfaced in Windows 11 24H2
After over two decades, players are now forbidden from flying a seaplane, all thanks to undefined code behavior.
Hasnain says:
Gotta love fun bugs and UB
“This was the most interesting bug I’ve encountered for a while. I initially had a hard time believing that a bug like this would directly tie to a specific OS release, but I was proven completely wrong. At the end of the day, it was a simple bug in San Andreas and this function should have never worked right, and yet, at least on PC it hid itself for two decades.
This is an interesting lesson in compatibility: even changes to the stack layout of the internal implementations can have compatibility implications if an application is bugged and unintentionally relies on a specific behavior. This is also not the first time I encountered issues like this: regular visitors might remember Bully: Scholarship Edition which famously broke on Windows 10, for very similar reasons. Just like in this case, Bully should have never worked properly to begin with, but instead, it got away with making incorrect assumptions for years, before changes in Windows 10 finally made it run out of luck.
Yet again, we are reminded to:
Validate your input data – San Andreas was notoriously bad at this, and ultimately this was the main reason why an incomplete config line remained unnoticed.
Not ignore the compilation warnings – this code most likely threw a warning in the original code that was either ignored or disabled!
In the end, the GTA players are lucky: in many other games, issues like this would’ve remained unfixed and they’d become a folk legend. Thankfully, GTAs are moddable and well understood, so we can act upon problems like this and ensure the game stays functional for many more years to come.”
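To make the "validate your input data" lesson concrete, here is a small Python sketch. It is not the game's code. Per the quote, the game kept going on an incomplete config line and ended up depending on undefined behavior.

The line format, `parse_vehicle_line`, and `ConfigError` are all hypothetical. The point is simply that an incomplete line gets rejected up front instead of being silently accepted.

```python
# Hypothetical config format, NOT San Andreas's real one: a vehicle name
# followed by exactly four comma-separated numbers.
EXPECTED_FIELDS = 4


class ConfigError(ValueError):
    """Raised when a config line is incomplete or malformed."""


def parse_vehicle_line(line: str) -> tuple[str, list[float]]:
    """Parse one config line, refusing to guess at missing values."""
    parts = [part.strip() for part in line.split(",")]
    name, raw_values = parts[0], parts[1:]
    if not name:
        raise ConfigError("missing vehicle name")
    if len(raw_values) != EXPECTED_FIELDS:
        # Fail loudly on an incomplete line instead of carrying on with
        # whatever defaults or leftover values happen to be around.
        raise ConfigError(
            f"{name}: expected {EXPECTED_FIELDS} values, got {len(raw_values)}"
        )
    try:
        values = [float(v) for v in raw_values]
    except ValueError as exc:
        raise ConfigError(f"{name}: non-numeric value in {raw_values}") from exc
    return name, values


if __name__ == "__main__":
    print(parse_vehicle_line("seaplane, 1.0, 2.5, 0.0, 4"))
    try:
        parse_vehicle_line("seaplane, 1.0, 2.5")  # two fields missing
    except ConfigError as err:
        print("rejected:", err)
```

The design choice here is to treat a short line as an error rather than padding it with defaults. Padding would reproduce the "incomplete config line remained unnoticed" failure mode in a milder form.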
The group chats that changed America
A loose private network on Signal and WhatsApp helped usher in the new alliance between Silicon Valley and Donald Trump’s new right.
Hasnain says:
Ooooof. I have heard people talk about these group chats and elite cabals before, but oof. Especially that screenshot at the end.
“The tone was jesting, but “Marc radicalized over time,” Hanania recalled. Hanania said he found himself increasingly alienated from the group and the shift toward partisan pro-Trump politics, and he came to see the chat he’d established as a “vehicle for groupthink.” (A friend of Andreessen’s said it was Hanania, not Andreessen, who had shifted his politics.) The group continues without him.
Hanania argued with the other members “about whether it’s a good idea to buy into Trump’s election denial stuff. I’d say, ‘That’s not true and that actually matters.’ I got the sense these guys didn’t want to hear it,” he said. “There’s an idea that you don’t criticize, because what really matters is defeating the left.” He left the group in June of 2023.”
Posted on 2025-04-28T06:12:45+0000
Meta’s ‘Digital Companions’ Will Talk Sex With Users—Even Children
Chatbots on Instagram, Facebook and WhatsApp are empowered to engage in ‘romantic role-play’ that can turn explicit. Some people inside the company are concerned.
Hasnain says:
Ugh. That line about these being fringe test cases was a bit much, though. Like, has that PR person never talked to the average AI chatbot user?
[ insert joke about how the propaganda was better in my day ]
“It’s not an accident that Meta’s chatbots can speak this way. Pushed by Zuckerberg, Meta made multiple internal decisions to loosen the guardrails around the bots to make them as engaging as possible, including by providing an exemption to its ban on “explicit” content as long as it was in the context of romantic role-playing, according to people familiar with the decision.”
Posted on 2025-04-27T23:38:12+0000
These Bay Area Chefs Are Preserving Palestinian Culture One Dish at a Time
In the aftermath of Oct. 7, chefs at Manakish, Shawarmaji and Azúkar are representing their Palestinian roots more than ever.
Hasnain says:
Now I want to drive down to Shawarmaji again.
“For Abutaha, keeping his food “authentic” isn’t just about holding on to traditions, but also using them as a way to spark conversation about each dish’s Palestinian origins. On the surface, Shawarmaji has a typical shawarma spot menu: falafel, chicken and beef shawarma, and a range of Levantine salads. “My path is more about recreating the food I grew up eating, preserving the culture and the original food,” Abutaha explains. His food reclaims the flavors of Palestine and Jordan, even if it’s just by simply preserving the original spices and cooking methods, resisting the need for it to be “whitewashed” or “catered to a certain audience.”
However, this approach isn’t always met with positive reviews. He acknowledges, “You know, people aren’t gonna like the garlic sauce — ‘it’s too garlicky, blah, blah, blah,’ — but that’s something I didn’t wanna compromise on because that’s how I ate it.” By keeping the garlic sauce authentic to how it’s served in Jordan, Abutaha hopes to preserve all the hard work that went into creating shawarma — the years of his ancestors’ labor that ought to be remembered.”
Posted on 2025-04-26T23:52:45+0000
Mathematicians Crack 125-Year-Old Problem, Unite Three Physics Theories
A breakthrough in Hilbert’s sixth problem is a major step in grounding physics in math
Hasnain says:
“Gluing together their long-timescale breakthrough with previous work on deriving the Euler and Navier-Stokes equations from the Boltzmann equation unifies three theories of fluid dynamics. The finding justifies taking different perspectives on fluids based on what’s most useful in context because mathematically they converge on one ultimate theory describing one reality. Assuming that the proof is correct, it breaks new ground in Hilbert’s program. We can only hope that with just such fresh approaches, the dam will burst on Hilbert’s challenges and more physics will flow downstream.”
Posted on 2025-04-26T23:46:44+0000