The Download: Rethinking AI benchmarks, and the ethics of AI agents

November 26, 2024

Every time a new AI model is released, it’s typically touted as acing a series of benchmarks. OpenAI’s GPT-4o, for example, was launched in May with a compilation of results that showed its performance topping every other AI company’s latest model in several tests.

The problem is that these benchmarks are poorly designed, the results are hard to replicate, and the metrics they use are frequently arbitrary, according to new research. That matters because AI models’ scores against these benchmarks determine the level of scrutiny they receive.

AI companies frequently cite benchmarks as testament to a new model’s success, and those benchmarks already form part of some governments’ plans for regulating AI. But right now, they might not be good enough to use that way, and researchers have some ideas for how they should be improved.

—Scott J Mulligan

We need to start wrestling with the ethics of AI agents

Generative AI models have become remarkably good at conversing with us, and creating images, videos, and music for us, but they’re not all that good at doing things for us.

AI agents promise to change that. Last week researchers published a new paper explaining how they trained simulation agents to replicate 1,000 people’s personalities with stunning accuracy.

AI models that mimic you could go out and act on your behalf in the near future. If such tools become cheap and easy to build, it will raise lots of new ethical concerns, but two in particular stand out. Read the full story.

—James O’Donnell
