Evaluating Large Language Models on the GMAT: Implications for the Future of Business Education
Abstract
The rapid evolution of artificial intelligence (AI), especially in the domain of Large Language Models (LLMs) and generative AI, has opened new avenues for application across various fields, yet its role in business education remains underexplored. This study introduces the first benchmark to assess the performance of seven major LLMs, OpenAI's models (GPT-3.5 Turbo, GPT-4, and GPT-4 Turbo), Google's models (PaLM 2 and Gemini 1.0 Pro), and Anthropic's models (Claude 2 and Claude 2.1), on the Graduate Management Admission Test (GMAT), a key exam in the admission process for graduate business programs. Our analysis shows that most LLMs outperform human candidates, with GPT-4 Turbo not only outperforming the other models but also surpassing the average scores of graduate students at top business schools. Through a case study, this research examines GPT-4 Turbo's ability to explain answers, evaluate responses, identify errors, tailor instructions, and generate alternative scenarios. The latest LLM versions, GPT-4 Turbo, Claude 2.1, and Gemini 1.0 Pro, show marked improvements in reasoning tasks compared to their predecessors, underscoring their potential for complex problem-solving. While AI's promise in education, assessment, and tutoring is clear, challenges remain. Our study not only sheds light on LLMs' academic potential but also emphasizes the need for careful development and application of AI in education. As AI technology advances, it is imperative to establish frameworks and protocols for AI interaction, verify the accuracy of AI-generated content, ensure worldwide access for diverse learners, and create an educational environment where AI supports human expertise. This research sets the stage for further exploration into the responsible use of AI to enrich educational experiences and improve exam preparation and assessment methods.
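To make the benchmark procedure concrete, the sketch below shows how an evaluation of this kind might query a model on a GMAT-style multiple-choice item and score the reply. This is a minimal illustration under stated assumptions, not the authors' harness: the sample question, answer key, and extract_choice helper are hypothetical, and it assumes the OpenAI Python client (v1.x) with an API key in the environment.

```python
# Minimal sketch of a multiple-choice benchmark loop (not the paper's harness).
# Assumes the OpenAI Python client v1.x; the sample item and answer key are
# hypothetical stand-ins for licensed GMAT questions.
import re
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical GMAT-style problem-solving item.
ITEMS = [
    {
        "question": (
            "A store sells a shirt at a 20% discount for $48. "
            "What was the original price?\n"
            "(A) $56  (B) $58  (C) $60  (D) $62  (E) $64"
        ),
        "answer": "C",  # 48 / 0.8 = 60
    },
]

def extract_choice(text: str) -> str | None:
    """Pull the first standalone answer letter A-E from the model's reply."""
    match = re.search(r"\b([A-E])\b", text)
    return match.group(1) if match else None

def run_benchmark(model: str) -> float:
    """Return accuracy of `model` over ITEMS."""
    correct = 0
    for item in ITEMS:
        response = client.chat.completions.create(
            model=model,
            temperature=0,  # deterministic decoding for reproducible scoring
            messages=[
                {"role": "system",
                 "content": "Answer the multiple-choice question. "
                            "Reply with a single letter A-E."},
                {"role": "user", "content": item["question"]},
            ],
        )
        reply = response.choices[0].message.content or ""
        if extract_choice(reply) == item["answer"]:
            correct += 1
    return correct / len(ITEMS)

if __name__ == "__main__":
    print(f"Accuracy: {run_benchmark('gpt-4-turbo'):.1%}")
```

In practice a harness like this would also need per-provider clients for the Google and Anthropic models, rate-limit handling, and a larger, held-out item set; the loop above only illustrates the query-and-score pattern.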