Humanity’s Last Exam
Benchmarks are interesting.
Here’s the deep thought – at what point in the overall benchmark process will AI inject bias into the benchmark test? And to what end? Maybe not so deep a thought.
Humanity’s Last Exam has been bandied about extensively. Here’s a great place to catch up on it: Humanity’s Last Exam
My musings: check out the crazy difficulty of the questions:
So that's 2,500 questions of this caliber. The top AI models hit 20% accuracy in answering them.
I would also note the Calibration Error, which the benchmark explains this way: “Given low performance on Humanity’s Last Exam, models should be calibrated, recognizing their uncertainty rather than confidently provide incorrect answers, indicative of confabulation/hallucination. To measure calibration, we prompt models to provide both an answer and their confidence from 0% to 100%.” The better-performing models – OpenAI o3 and o4-mini and Gemini 2.5 Pro – also have better Calibration Error numbers.
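To make the calibration idea concrete, here is a minimal sketch of one common way to score it: group answers by stated confidence, then compare each group's average confidence with how often it was actually right. This is a generic binned calibration error, not necessarily the exact formula the benchmark uses, and the function name, bin count, and sample numbers are my own assumptions for illustration.

```python
def binned_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between stated confidence and actual accuracy.

    confidences: floats in [0, 1] (a stated 90% becomes 0.9)
    correct:     booleans, True where the answer was right
    n_bins:      number of equal-width confidence buckets (an assumed default)
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    total = len(confidences)
    if total == 0:
        return 0.0

    # Sort each answer into a bucket by its stated confidence.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # 1.0 goes in the top bucket
        bins[index].append((conf, ok))

    # Sum each bucket's |accuracy - mean confidence|, weighted by bucket size.
    error = 0.0
    for bucket in bins:
        if not bucket:
            continue
        mean_conf = sum(conf for conf, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        error += (len(bucket) / total) * abs(accuracy - mean_conf)
    return error


# Made-up example: a model that says "90% sure" every time but is right 1 in 5.
overconfident = binned_calibration_error([0.9] * 5, [True, False, False, False, False])

# Made-up example: same accuracy, but the model admits it is only 20% sure.
honest = binned_calibration_error([0.2] * 5, [True, False, False, False, False])
```

In this toy case both models get the same one answer out of five right, but the overconfident one scores a large error and the honest one scores close to zero. That is the point of the metric: it rewards knowing what you don't know.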
AI Action plan, and stuff
Here ya go folks - this is the current administration's AI Action Plan: https://www.ai.gov/action-plan Here are some words from the current administration about preventing "woke AI" in the federal government...
CMS sites revisited
Recent work research includes a CMS review. I haven't looked into what is out there in a long time. The quick search hits showed lots of familiar faces and a couple of new ones. I found this post informative:...
Congratulations Impact Makers
Impact Makers was awarded "Best for the World" by B Lab. Here's a Richmond Times-Dispatch mention of the award. I met Michael Pirron shortly after moving to Richmond in 2004, and have watched him methodically and conscientiously build Impact Makers into a...
DropBox Security
Oh, the line between security and convenience is harsh. While reading TechRepublic I found this interesting article on DropBox security. I love the convenience of cloud technologies, and use DropBox like lots of people, including the article author (Michael Kassner)....
GOOG perspective on links
Analytics is an area that I will be focusing on quite a bit in the coming months. I read this article about linking and Penguin 2.0 changes on a website called "Search Engine Watch." There are many pieces to the content puzzle that websites have to face, and...
Google Apps v. Office 365
The TechRepublic newsletter is worth a scan every day. I found this gem recently and it's worth a read if you use, or might use in the future, either Google Apps or Microsoft Office. Here it is: Google Apps v. Office 365: Head-to-head comparison of features...
UltraEdit
I first used UltraEdit sometime in the late 90s. I loved it back then. When I went to work for TQuist in 2003 I purchased another copy. I just now retired my last XP system, and so I've decided to get the latest copy. I hope it's aged well. They've changed...

