Friday, November 22, 2024

The Download: GPT-4o’s polluted Chinese training data, and astronomy’s AI challenge

Soon after OpenAI released GPT-4o last Monday, some Chinese speakers started to notice that something seemed off about this newest version of the chatbot: the tokens it uses to parse text were full of spam and porn phrases.

Humans read in words, but LLMs read in tokens, which are distinct units in a sentence that have consistent and significant meanings. GPT-4o is supposed to be better than its predecessors at handling multi-language tasks, and many of those advances were achieved through a new tokenization tool that does a better job of compressing texts in non-English languages.

But at least when it comes to the Chinese language, the new tokenizer used by GPT-4o has introduced a disproportionate number of meaningless phrases, and experts say that’s likely due to insufficient data cleaning and filtering before the tokenizer was trained. If left unresolved, it could lead to hallucinations, poor performance, and misuse. Read the full story.

—Zeyi Yang
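
To make the mechanism a little more concrete, here is a toy sketch of byte-pair encoding (BPE), a widely used way of training tokenizers. The story doesn't spell out exactly how GPT-4o's tokenizer was built, so treat this as an illustration of the general idea, not OpenAI's method. BPE repeatedly merges the most frequent adjacent pair of symbols in its training text into a new token, so any string that is repeated often enough, spam included, can end up as a single entry in the vocabulary. The corpus, the placeholder spam string "WINBIGNOW", and the exact-duplicate filter below are all hypothetical.

```python
from collections import Counter


def train_bpe(corpus: list[str], num_merges: int) -> list[str]:
    """Learn up to `num_merges` character-level BPE merges from `corpus`.

    Returns the merged tokens in the order they were learned.
    """
    # Every document starts out as a sequence of single characters.
    seqs = [list(doc) for doc in corpus]
    learned = []
    for _ in range(num_merges):
        # Count how often each adjacent pair of symbols occurs in the corpus.
        pairs = Counter()
        for seq in seqs:
            pairs.update(zip(seq, seq[1:]))
        if not pairs:
            break  # every document is already a single token
        (a, b), _count = pairs.most_common(1)[0]
        merged = a + b
        learned.append(merged)
        # Rewrite every occurrence of the winning pair as one new symbol.
        new_seqs = []
        for seq in seqs:
            out, i = [], 0
            while i < len(seq):
                if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
                    out.append(merged)
                    i += 2
                else:
                    out.append(seq[i])
                    i += 1
            new_seqs.append(out)
        seqs = new_seqs
    return learned


# Hypothetical toy corpus: two ordinary sentences plus one made-up spam string
# copied 50 times, the way a scraped spam site might repeat the same phrase.
polluted = ["the cat sat", "a dog ran"] + ["WINBIGNOW"] * 50
# A deliberately simple cleaning step, assumed for illustration: drop exact duplicates.
cleaned = list(dict.fromkeys(polluted))

polluted_vocab = train_bpe(polluted, num_merges=8)
cleaned_vocab = train_bpe(cleaned, num_merges=8)
print(polluted_vocab[-1])            # → WINBIGNOW (the spam string became one token)
print("WINBIGNOW" in cleaned_vocab)  # → False
```

Production tokenizers work on bytes and learn tens or hundreds of thousands of merges from enormous corpora, and real cleaning pipelines do far more than exact deduplication, but the basic dynamic is the same: the learned vocabulary mirrors whatever text dominates the training data.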

Astronomers are enlisting AI to prepare for a data downpour

In deserts across Australia and South Africa, astronomers are planting forests of metallic detectors that will together scour the cosmos for radio signals. When it boots up in five years or so, the Square Kilometre Array Observatory will look for new information about the universe’s first stars and the different stages of galactic evolution.

But after syncing hundreds of thousands of dishes and antennas, astronomers will quickly face a new challenge: combing through some 300 petabytes of cosmological data a year, enough to fill a million laptops. So to prepare for the data deluge, astronomers are turning to AI for help. Read the full story.
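
As a rough sanity check on the laptop comparison, assuming decimal units (1 PB = 10^15 bytes, 1 GB = 10^9 bytes), a year of data divided across a million laptops works out to about 300 GB apiece, a plausible size for a laptop drive:

```latex
\frac{300\ \text{PB}}{10^{6}\ \text{laptops}}
  = \frac{300 \times 10^{15}\ \text{bytes}}{10^{6}\ \text{laptops}}
  = 3 \times 10^{11}\ \text{bytes per laptop}
  = 300\ \text{GB per laptop}
```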
