OpenAI unveils o3, beating humans on the ARC-AGI test, but it costs about 120,000 baht per task to run

By: lew

on 21 December 2024 - 01:46

Topics: OpenAI, LLM

OpenAI has unveiled o3, a model that tries to reason step by step and targets very hard problems, such as research-level mathematics that can take even mathematicians days to solve, or ARC-AGI, an IQ-style test suite. Since ARC-AGI launched in 2019, no AI system had scored as high as 50%, while humans score about 85%.

Previously, GPT-3 scored 0% on ARC-AGI, GPT-4o scored 5%, and o1 reached at most 32%. What makes ARC-AGI special is that each task has its own rule that must be worked out, and no rule is repeated from one task to the next. Yet o3 in its cost-is-no-object compute mode scores as high as 87.5%, surpassing the typical human, while the normal mode scores 75.7% at a running cost of $20 per task. OpenAI has not directly disclosed the running cost of the cost-is-no-object mode, but it is estimated at about 172 times that of the normal mode, which works out to $3,440, or roughly 120,000 baht, per task.
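The per-task estimate above is simple multiplication, sketched here in Python. The $20 cost and the 172x multiplier come from the article; the exchange rate of about 35 baht per dollar is an assumption chosen to match the article's rounded baht figure, not a quoted rate.

```python
# Figures taken from the article.
LOW_COMPUTE_COST_USD = 20      # per-task cost in the normal mode
HIGH_COMPUTE_MULTIPLIER = 172  # approximate ratio between the two modes

# Assumption: roughly 35 baht per US dollar (not stated in the article).
ASSUMED_THB_PER_USD = 35.0


def high_compute_cost_usd(low_cost_usd: float, multiplier: float) -> float:
    """Estimate the per-task cost of the high-compute mode."""
    return low_cost_usd * multiplier


cost_usd = high_compute_cost_usd(LOW_COMPUTE_COST_USD, HIGH_COMPUTE_MULTIPLIER)
cost_thb = cost_usd * ASSUMED_THB_PER_USD

print(f"High-compute estimate: {cost_usd:,.0f} USD per task")
print(f"About {cost_thb:,.0f} THB per task at the assumed rate")
```

At the assumed rate the result lands within a few hundred baht of the rounded 120,000 baht figure; a different exchange rate would shift it proportionally.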

ARC-AGI notes that people can be hired to solve the tasks for about $5 each, so for now running the AI is still more expensive than a human. The running cost is nevertheless expected to fall considerably in the future.
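To put the gap with human solvers in numbers, here is a minimal Python sketch that compares the per-task costs reported in the article. The variable and function names are illustrative, and the high-compute cost is the article's estimate (20 x 172 dollars), not a figure disclosed by OpenAI.

```python
# Per-task costs in US dollars, as reported in the article.
HUMAN_COST_USD = 5
O3_NORMAL_COST_USD = 20
O3_HIGH_COMPUTE_COST_USD = 20 * 172  # the article's estimate, not an official price


def cost_ratio(ai_cost_usd: float, human_cost_usd: float) -> float:
    """Return how many times more expensive the AI run is than a human solver."""
    return ai_cost_usd / human_cost_usd


normal_ratio = cost_ratio(O3_NORMAL_COST_USD, HUMAN_COST_USD)
high_ratio = cost_ratio(O3_HIGH_COMPUTE_COST_USD, HUMAN_COST_USD)

print(f"Normal mode: {normal_ratio:.0f}x the human cost per task")
print(f"High-compute mode: {high_ratio:.0f}x the human cost per task")
```

On these figures the normal mode costs 4 times as much as a human per task, and the high-compute estimate costs 688 times as much.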

OpenAI will accept applications to test the safety of o3. Once enough testing has been done, the model will likely be released soon, with o3-mini expected to become available first, within January 2025.

Source - OpenAI, ARC-PRIZE

Comments

By: au8ust

on 21 December 2024 - 08:44 #1329310

Another important part of the ARC-AGI blog post:

So is it AGI?

ARC-AGI serves as a critical benchmark for detecting such breakthroughs, highlighting generalization power in a way that saturated or less demanding benchmarks cannot. However, it is important to note that ARC-AGI is not an acid test for AGI – as we've repeated dozens of times this year. It's a research tool designed to focus attention on the most challenging unsolved problems in AI, a role it has fulfilled well over the past five years.

Passing ARC-AGI does not equate to achieving AGI, and, as a matter of fact, I don't think o3 is AGI yet. o3 still fails on some very easy tasks, indicating fundamental differences with human intelligence.

Furthermore, early data points suggest that the upcoming ARC-AGI-2 benchmark will still pose a significant challenge to o3, potentially reducing its score to under 30% even at high compute (while a smart human would still be able to score over 95% with no training). This demonstrates the continued possibility of creating challenging, unsaturated benchmarks without having to rely on expert domain knowledge. You'll know AGI is here when the exercise of creating tasks that are easy for regular humans but hard for AI becomes simply impossible.

We will still have to wait and see who reaches true AGI first.

By: zda98

on 21 December 2024 - 17:05 #1329324

o1 is already priced out of reach. How much is o3 going to cost?