Humanity’s Last Exam
Benchmarks are interesting.
Here’s the deep thought – at what point in the overall benchmark process will AI inject bias into the benchmark test? And to what end? Maybe not so deep a thought.
Humanity’s Last Exam has been bandied about extensively. Here’s a great place to catch up on it: Humanity’s Last Exam
My musings: check out the crazy difficulty of the questions:
So that's 2,500 questions of this caliber. The top AI models hit 20% accuracy in answering them.
I would also note the Calibration Error. The site explains: “Given low performance on Humanity’s Last Exam, models should be calibrated, recognizing their uncertainty rather than confidently provide incorrect answers, indicative of confabulation/hallucination. To measure calibration, we prompt models to provide both an answer and their confidence from 0% to 100%.” The better performing models – OpenAI o3 and o4-mini and Gemini 2.5 Pro – also have better Calibration Error numbers.
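To make the calibration idea concrete, here is a minimal sketch of one common way to score it: a binned expected calibration error. This is my own illustration, not the benchmark's scoring code, and the exact formula Humanity's Last Exam uses may differ. The function name `calibration_error`, the ten-bin default, and the sample numbers are all assumptions made up for the example.

```python
def calibration_error(confidences, correct, n_bins=10):
    """Binned expected calibration error (an illustrative sketch).

    confidences: stated confidence per answer, as floats in [0, 1].
    correct: whether each answer was right, as booleans.
    Returns the weighted average gap between confidence and accuracy.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must be the same length")
    if not confidences:
        raise ValueError("need at least one answer")

    # Group answers into equal-width confidence bins.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, ok))

    total = len(confidences)
    error = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        # Weight each bin's gap by its share of all answers.
        error += (len(bucket) / total) * abs(avg_conf - accuracy)
    return error


# Hypothetical model A: 90% confident every time, right 1 time in 5.
overconfident = calibration_error([0.9] * 5, [True, False, False, False, False])

# Hypothetical model B: 20% confident every time, right 1 time in 5.
calibrated = calibration_error([0.2] * 5, [True, False, False, False, False])
```

Both made-up models answer the same share of questions correctly, but the first claims far more confidence than its results support, so it scores a much larger error. That gap between stated confidence and actual accuracy is what a calibration number is trying to capture.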
Copilot Summarizes just the page
I saw a headline on the Windows dashboard about a popular internet provider that filed for Chapter 11. I was curious and decided to check it out and clicked the story. The news item is from "TheStreet" and the initial page does not show the entire article, just the first...
I am your Master of AI for Everyone Basics
I am your Master. Why? Because I've audited and completed the IBM "AI for Everyone: Master the Basics" course offered on EdX. My knowledge of AI was minuscule a few weeks ago. Now it is very slightly less minuscule. I think the most important takeaway from the...
AI to the rescue? Categories and Tags in WordPress
I realized when writing the first blog post that I needed to give some consideration to this site's taxonomy of Categories and Tags. As it has been a while (ahem) since I worked with such classification for my own site(s) I really could not remember how Categories and Tags...
First AI post
I was all excited to write my first blog post about the subject of Artificial Intelligence (not that any intelligence is conveyed in the post, I'm just trying to get some structure established) that I called the post "First AI Post" which I subsequently concluded is a...
Contact page on the home screen, SuperPWA FTW
Years have gone by since I've updated this blog. But, no time like the present to change that. The journey of a thousand miles begins with a single step or something like that. I thought I would outline yesterday's research project. A prospective customer with a...
Aussie Blockchain Cryptokitties
I was in this band called Aussie Blockchain Cryptokitties back in college. We were so /\/\etal. I have a philosophical post agitating in my mind about the private network applications of blockchain technology. I thought I would lead with "Bitcoin goes higher and...
“BLOCKCHAIN” by Artemis Caro
I decided recently to try a new tablet. I've had a 10" Samsung Galaxy Tab A for several years and it is slow, and not new, and the 16 GB memory is a limitation. I purchased an Amazon HD Fire 8 and decided to try their "Kindle unlimited" offer. I can then...