Humanity’s Last Exam
Benchmarks are interesting.
Here’s the deep thought – at what point in the overall benchmark process will AI inject bias into the benchmark test? And to what end? Maybe not so deep a thought.
Humanity’s Last Exam has been bandied about extensively. Here’s a great place to catch up on it: Humanity’s Last Exam
My musings: check out the crazy difficulty of the questions:
So that's 2,500 questions of this caliber of difficulty. The top AI models hit 20% accuracy in answering them.
I would also note the Calibration Error, described as follows: “Given low performance on Humanity’s Last Exam, models should be calibrated, recognizing their uncertainty rather than confidently provide incorrect answers, indicative of confabulation/hallucination. To measure calibration, we prompt models to provide both an answer and their confidence from 0% to 100%.” The better-performing models – OpenAI o3 and o4-mini and Gemini 2.5 Pro – also have better Calibration Error numbers.
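To make the calibration idea concrete, here is a minimal sketch of how a calibration error could be computed from stated confidences and graded answers. I have not checked exactly how the benchmark computes its number, so the binning scheme, the RMS aggregation, and the function name `rms_calibration_error` are all my own assumptions, not the benchmark's official method.

```python
import math


def rms_calibration_error(confidences, correct, n_bins=10):
    """Binned RMS calibration error (illustrative sketch, not the official metric).

    confidences: stated confidence per answer, as floats in [0, 1]
    correct:     whether each answer was right, as booleans
    n_bins:      number of equal-width confidence bins (an assumption)
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("need at least one answer")

    # Group answers into equal-width confidence bins.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, 1.0 if ok else 0.0))

    # Weight each bin's squared (confidence - accuracy) gap by its share of answers.
    total = len(confidences)
    weighted_sq_gap = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(o for _, o in members) / len(members)
        weighted_sq_gap += (len(members) / total) * (avg_conf - accuracy) ** 2

    return math.sqrt(weighted_sq_gap)


# An overconfident model: claims 90% on four answers, gets one right.
overconfident = rms_calibration_error([0.9, 0.9, 0.9, 0.9], [True, False, False, False])

# A well-calibrated model: claims 20% on five answers, gets one right.
calibrated = rms_calibration_error([0.2] * 5, [True, False, False, False, False])
```

The overconfident model scores far worse than the calibrated one even though both get only one answer right, which is the point of the metric: low accuracy is forgivable if the model knows it is guessing.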
Perplexed-ity?
I came across this blog post from Cloudflare: "Perplexity is using stealth, undeclared crawlers to evade website no-crawl directives." I've read and heard a lot of positive things about Perplexity's Comet browser. I want to like them and I want to cheer...
Advanced Google Analytics Academy, part 1
I've completed the first of four parts of the Google Analytics Academy - Advanced Google Analytics course. I've come to several conclusions: the information builds nicely upon material covered in the Basic Google Analytics course. at the bottom of the page of...
Google Analytics Academy
Oh, if only the Google Analytics Academy could be as cool as Starfleet Academy. I spent a couple hours off and on going through their beginner course and they gave me a certificate! Thanks Google! Seriously, though, the course is a nice primer into...
CodeVA
I attended RVATechJam's Tuesday luncheon that featured CodeVA as the program. I am a big proponent of Computer Science programming/literacy and applaud their efforts.
RVATechJam
I spent my lunch hour listening to the "Telling Stories with Data" technology panel / discussion hosted by RVATech. I enjoyed the panel discussion. Altria's use and command of big data should be telling to all businesses looking for an edge - big data can...
The secret language of Vote for Trump!
I recently launched my completely unheralded food blog mostly as a way to play around with Elegant Themes Divi theme. I do, however, have lots of various food pictures I'd like to categorize more for my own recorded history. If someone stumbles across...
Google Voice
I've been working a bit to update content on the TQuist website. I had gotten a Google Voice number years ago but then never used it, and after a time it was recycled. I signed up for a new one today and the process is very, very smooth. One has to bear in mind that...

