Large Language Models Pass CFA Level III Exam


In 2024, a study by J.P. Morgan AI Research and Queen’s University found that leading proprietary artificial intelligence models could pass CFA Level I and II mock exams but struggled with the essay portion of the Level III exam. A new research study has found that today’s leading large language models can now clear the CFA Level III exam, including the essay portion. The CFA Level III exam is widely regarded as one of the most difficult professional exams in the finance industry.

The new research was conducted by the NYU Stern School of Business and Goodfin, an AI wealth platform for exclusive private market investments. It set out to assess the capabilities of large language models in specialized domains like finance.

The study, Advanced Financial Reasoning at Scale: A Comprehensive Evaluation of Large Language Models on CFA Level III, benchmarked 23 leading AI models, including OpenAI’s GPT-4, Google’s Gemini 2.5 and Anthropic’s Claude Opus 4, against a CFA Level III mock exam. LLMs are a subset of generative AI models designed specifically to perform language-related tasks.

The study found that OpenAI’s o4-mini model had a composite score of 79.1%, while Google’s Gemini 2.5 Flash model scored 77.3%. While most models performed well on the multiple-choice questions, only a few excelled at the essay prompts, which require analysis, synthesis and strategic thinking.


That said, NYU Stern Professor Srikanth Jagabathula said recent reasoning-based LLMs have shown immense capability on tasks that demand heavy quantitative and critical thinking, such as the essay portion. These models can now think through a problem and provide reasoning for their responses.

To grade the essay portion, Jagabathula had another LLM act as a judge, giving it the essay response, the true response, some context about the question and a grading rubric. He also had the same set of responses graded by a certified human grader. The researchers found that the LLM judge was actually stricter than the human, assigning fewer overall points to the same answers.

“We thought they would be more lenient in assigning their grades, but we found in this case at least that it is the other way around,” he said.
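
As a rough illustration of what that kind of LLM-as-a-judge setup can look like in code, the sketch below uses the OpenAI Python client; the judge model, prompt wording, rubric format and point scale are assumptions made for the example rather than details from the study.

```python
# Minimal sketch of an LLM-as-a-judge grading loop.
# The model name, rubric text and 0-to-max_points scale are illustrative
# assumptions, not the study's actual configuration.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

GRADER_PROMPT = """You are grading a CFA Level III constructed-response answer.

Question context:
{context}

Grading rubric (maximum {max_points} points):
{rubric}

Guideline (reference) answer:
{reference_answer}

Candidate answer:
{candidate_answer}

Return only a number between 0 and {max_points} for the points awarded."""


def grade_essay(context: str, rubric: str, reference_answer: str,
                candidate_answer: str, max_points: int = 10) -> float:
    """Ask a judge model to score one essay response against the rubric."""
    response = client.chat.completions.create(
        model="gpt-4o",  # hypothetical choice of judge model
        temperature=0,   # deterministic grading
        messages=[{
            "role": "user",
            "content": GRADER_PROMPT.format(
                context=context,
                rubric=rubric,
                max_points=max_points,
                reference_answer=reference_answer,
                candidate_answer=candidate_answer,
            ),
        }],
    )
    # Assumes the judge complies and returns a bare number.
    return float(response.choices[0].message.content.strip())
```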

The study also found that how the models were prompted drove performance on the essay portion. Specifically, the researchers used chain-of-thought prompting, which involves asking the LLM to think through the problem and lay out its reasoning before answering. That process yielded better answers than prompting for a direct response, boosting essay accuracy by 15 percentage points.
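
The contrast between a direct prompt and a chain-of-thought prompt can be sketched in a few lines; the wording below is an illustration of the general technique, not the prompt the researchers actually used.

```python
# Two ways to pose the same essay question to a model: a direct prompt and a
# chain-of-thought prompt that asks for step-by-step reasoning first.
DIRECT_PROMPT = (
    "Answer the following CFA Level III essay question:\n\n{question}"
)

CHAIN_OF_THOUGHT_PROMPT = (
    "Answer the following CFA Level III essay question.\n"
    "First, think through the problem step by step: identify the relevant "
    "concepts, lay out your reasoning, and only then state your final answer "
    "with a brief justification.\n\n{question}"
)


def build_prompt(question: str, use_chain_of_thought: bool = True) -> str:
    """Return the prompt text sent to the model for one essay question."""
    template = CHAIN_OF_THOUGHT_PROMPT if use_chain_of_thought else DIRECT_PROMPT
    return template.format(question=question)
```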


In response to the study’s results, Chris Wiese, managing director of education at CFA Institute, pointed out that the qualifications for the CFA designation go beyond passing all three exams. Candidates must also complete 4,000 hours of qualifying work experience, provide a minimum of two references, attest to following the CFA Institute’s code of ethics and standards, and complete hands-on practical skills modules.

“Without knowing the details of how this study was conducted, we can only note that at CFA Institute, we continue to believe that a combination of trust, human relationships, sound ethical judgment and professionalism are as important as ever in financial markets,” Wiese said. “Our own research shows that AI will continue to grow in utility and efficacy for investment managers—just as it is across a range of disciplines and industries—and we are committed to keeping our members, candidates and the profession abreast of these opportunities.”

When asked whether an LLM could perform the job of a CFA professional, Jagabathula said it’s difficult to forecast what capabilities the models will develop. But he pointed to some preliminary results of a small-scale study he’s conducting, in which a set of users were asked to interact with both the AI model and a human for financial advice.


“What we found was, the LLM was often quite good at giving very precise answers to specific questions for which there was a precise answer, but they often struggled in capturing context that was not very explicitly stated by the user. And at least in some cases they could not,” he said. “The end user found it a little bit difficult to trust the system. So, as of now, it seems clear that LLMs can significantly augment the abilities of existing financial professionals. As to whether they can actually replace them, the jury is still out.”




