Humanity’s Last Exam
Benchmarks are interesting.
Here’s the deep thought – at what point in the overall benchmark process will AI inject bias into the benchmark test? And to what end? Maybe not so deep a thought.
Humanity’s Last Exam has been bandied about extensively. Here’s a great place to catch up on it: Humanity’s Last Exam
My musings: check out the crazy difficulty of the questions:
So that's 2,500 questions of this caliber of difficulty. The top AI models hit 20% accuracy in answering them.
I would also note the Calibration Error, which is described this way: “Given low performance on Humanity’s Last Exam, models should be calibrated, recognizing their uncertainty rather than confidently provide incorrect answers, indicative of confabulation/hallucination. To measure calibration, we prompt models to provide both an answer and their confidence from 0% to 100%.” The better-performing models – OpenAI o3 and o4-mini and Gemini 2.5 Pro – also have better Calibration Error numbers.
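To make the calibration idea concrete, here is a minimal sketch of how a calibration error could be computed from (confidence, correct/incorrect) pairs. The quote above only says that models report a confidence from 0% to 100%; the binning approach, the RMS aggregation, the bin count, and the function name `rms_calibration_error` are my assumptions for illustration, not the benchmark's published scoring code.

```python
from math import sqrt


def rms_calibration_error(confidences, correct, n_bins=10):
    """Binned RMS calibration error (illustrative sketch, not the official scorer).

    confidences: stated confidences as floats in [0, 1] (e.g. 90% -> 0.9)
    correct:     booleans, True where the answer was right
    n_bins:      number of equal-width confidence bins (assumed value)
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("need at least one answer")

    # Group answers into equal-width confidence bins.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, 1.0 if ok else 0.0))

    total = len(confidences)
    weighted_sq_gap = 0.0
    for items in bins:
        if not items:
            continue
        avg_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(a for _, a in items) / len(items)
        # Gap between what the model claimed and what it delivered,
        # weighted by the share of answers that fell in this bin.
        weighted_sq_gap += (len(items) / total) * (avg_conf - accuracy) ** 2

    return sqrt(weighted_sq_gap)


# A model that says "90% sure" every time but is right only 1 time in 5.
overconfident = rms_calibration_error([0.9] * 5, [True, False, False, False, False])

# A model that says "50% sure" and is right half the time.
well_calibrated = rms_calibration_error([0.5, 0.5], [True, False])
```

In this toy example the overconfident model lands at an error of roughly 0.7 (it claims 90% and delivers 20%), while the model that says 50% and is right half the time scores 0. Lower is better: a low-accuracy model can still be well calibrated if it admits it is unsure.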
Alexa, cook me some eggs!
NY Times / AMZN deal Here's an article from The Verge about The New York Times' recent deal with Amazon to deliver its “editorial content to a variety of Amazon customer experiences.” I shudder to think how little The Gray Lady is getting paid. However, I hope the AI...
Rollups best left to fruit
The Neuron newsletter served up this TechCrunch article: https://techcrunch.com/2025/06/01/early-ai-investor-elad-gil-finds-his-next-big-bet-ai-powered-rollups/. Here's a quote from the article: "The idea is to identify opportunities to buy...
Copilot Summarizes just the page
I saw a headline on the Windows dashboard about a popular internet provider that filed for Chapter 11. I was curious, so I clicked the story to check it out. The news item is from "TheStreet" and the initial page does not show the entire article, just the first...
Character.AI and possible suicide
I came across this news item from a while back: This mom believes an AI chatbot is responsible for her son’s suicide. Wow, this is just brutal and my heart goes out to his family and friends. Character.ai and the other named defendants tried to use free speech in...
SMH – AI-Driven Layoffs
Shaking my head in response to someone responding "We acted too quickly" with respect to downsizing as a result of AI implementation. Step 1 - Jump on AI bandwagon Step 2 - implement (pricey?) AI stuff hot off the presses Step 3 - Layoffs = more $$$ for us! Or more...
TEDs v 2.what?
I came across this TechCrunch article covering the recent news that Google is committing $150M to develop AI glasses with Warby Parker: https://techcrunch.com/2025/05/20/google-commits-150m-to-develop-ai-glasses-with-warby-parker/ My immediate thought was a new take on...