Humanity’s Last Exam

Benchmarks are interesting.

Here’s the deep thought – at what point in the overall benchmark process will AI inject bias into the benchmark test? And to what end? Maybe not so deep a thought.

Humanity’s Last Exam has been bandied about extensively. Here’s a great place to catch up on it: Humanity’s Last Exam

My musings: check out the crazy difficulty of the questions.

That's 2,500 questions of this caliber, and the top AI models hit only 20% accuracy in answering them.

I would also note the Calibration Error metric. As the benchmark puts it: “Given low performance on Humanity’s Last Exam, models should be calibrated, recognizing their uncertainty rather than confidently provide incorrect answers, indicative of confabulation/hallucination. To measure calibration, we prompt models to provide both an answer and their confidence from 0% to 100%.” The better-performing models (OpenAI o3 and o4-mini, and Gemini 2.5 Pro) also have better Calibration Error numbers.
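To make the calibration idea concrete, here's a rough sketch of how a number like that could be computed from (confidence, right-or-wrong) pairs. Fair warning: the quote above doesn't spell out the exact formula the benchmark uses, so the binned RMS approach, the function name `rms_calibration_error`, and the 10-bin default are all my assumptions, not the official method.

```python
import math


def rms_calibration_error(confidences, correct, n_bins=10):
    """Binned RMS calibration error (an assumed formulation, not the official one).

    confidences: stated confidences as fractions in [0, 1] (so 90% -> 0.9)
    correct:     booleans, True where the answer was right
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("need at least one answer")

    # Group answers into equal-width confidence bins.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, ok))

    total = len(confidences)
    weighted_sq_gap = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        # Weight each bin's squared gap by its share of all answers.
        weighted_sq_gap += (len(members) / total) * (mean_conf - accuracy) ** 2
    return math.sqrt(weighted_sq_gap)


# Made-up data: a model that claims 90% confidence but is right 2 times in 10,
# versus one that claims 20% confidence with the same hit rate.
outcomes = [True, True] + [False] * 8
overconfident = rms_calibration_error([0.9] * 10, outcomes)
honest = rms_calibration_error([0.2] * 10, outcomes)
print(overconfident, honest)
```

The overconfident model gets a large error because its stated confidence sits far above its actual hit rate, while the model that admits it's usually wrong scores near zero. Both have the same accuracy, which is the point: calibration is a separate axis from getting the answer right.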

AI Action plan, and stuff

by Chris Rufe | Aug 3, 2025 | AI, News

AI Action plan Here ya go folks - this is the current administration's AI Action Plan: https://www.ai.gov/action-plan Here are some words from the current administration about preventing "woke AI" in the federal government...

CMS sites revisited

Applications

Recent research for work includes a CMS review. I haven't looked into what is out there in a long time. The quick search hits showed lots of familiar faces and a couple new ones. I found this post informative:...

Congratulations Impact Makers

News

Impact Makers was awarded "Best for the World" by B Lab. Here's a Richmond Times-Dispatch mention of the award. I met Michael Pirron shortly after moving to Richmond in 2004, and have watched him methodically and conscientiously build Impact Makers into a...

GOOG perspective on links

Analytics

Analytics is an area that I will be focusing on quite a bit in the coming months. I read this article about linking and Penguin 2.0 changes on a website called "Search Engine Watch." There are many pieces to the content puzzle that websites have to face, and...

Google Apps v. Office 365

Applications

The TechRepublic newsletter is worth a scan every day. I found this gem recently and it's worth a read if you use, or might use in the future, either Google Apps or Microsoft Office. Here it is: Google Apps v. Office 365: Head-to-head comparison of features...

UltraEdit

Applications

I first used UltraEdit sometime in the late 90s. I loved it back then. When I went to work for TQuist in 2003 I purchased another copy. I just now retired my last XP system, and so I've decided to get the latest copy. I hope it's aged well. They've changed...
