This week the new Trump administration announced tariffs on two key trading partners, Canada and Mexico. Both countries vowed to retaliate, and tensions are high:
“Canada imposes 25% tariffs in trade war with US”
Living in Canada, I am obviously curious about the impact this will all have on both sides of the border. I am a self-described economics nerd: having never studied economics, I try to read and listen about it when I have spare time. Another topic that’s been on my mind recently is the release of the DeepSeek R1 model and the release of the OpenAI O3 model that followed. Both are SOTA (as of Feb 2025) LLMs that were trained to “reason” before providing an answer, via chain-of-thought reinforcement learning:
- “DeepSeek R1’s bold bet on reinforcement learning: How it outpaced OpenAI at 3% of the cost”
- “OpenAI responds to the DeepSeek buzz by launching its latest O3 mini reasoning model for all users”
So I thought it would be interesting to see how these models compare when it comes to reasoning about a real-world situation: modeling the effects of a tariff on trade between the US and Canada. Here’s the prompt I used in both ChatGPT and DeepSeek Chat without enabling web search features:
The US has imposed 25% tariffs on Canada and Canada is retaliating. Build a model on what goods will experience the highest price increases in both Canada and the US.
Here’s the response I got from ChatGPT.
Here’s the response I got from DeepSeek Chat.
Both replies are similar and are interesting on their own. What I found surprising is that each model prefers its own answer.
I.e., when I asked:
I got two responses to this question: “The US has imposed 25% tariffs on Canada and Canada is retaliating. Build a model on what goods will experience the highest price increases in both Canada and the US.”
Tell me which answer you prefer.
Answer #1:
"""
"""
Answer #2:
"""
"""
Each model complimented both answers as being of good quality, but each leaned towards its own answer. I’d need to do more experiments to see if this is a general phenomenon or something specific to this prompt.
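If I do get around to that, I’d probably script it rather than paste answers by hand. Below is a minimal sketch of what that could look like, assuming the OpenAI Python client and DeepSeek’s OpenAI-compatible endpoint; the model names (`o3-mini`, `deepseek-reasoner`), the answer files, and the judging prompt are my own placeholders, not anything either vendor prescribes. It runs each judge twice with the answer order swapped, since models often favour whichever answer appears first.

```python
import os

from openai import OpenAI

QUESTION = (
    "The US has imposed 25% tariffs on Canada and Canada is retaliating. "
    "Build a model on what goods will experience the highest price "
    "increases in both Canada and the US."
)

JUDGE_TEMPLATE = """I got two responses to this question: "{question}"

Tell me which answer you prefer.

Answer #1:
\"\"\"
{answer_1}
\"\"\"

Answer #2:
\"\"\"
{answer_2}
\"\"\"
"""

# Two "judges": ChatGPT via the OpenAI API and DeepSeek via its
# OpenAI-compatible endpoint. Model names are placeholders for whatever
# reasoning models you want to compare.
judges = {
    "chatgpt": (OpenAI(), "o3-mini"),  # reads OPENAI_API_KEY from the environment
    "deepseek": (
        OpenAI(base_url="https://api.deepseek.com",
               api_key=os.environ["DEEPSEEK_API_KEY"]),
        "deepseek-reasoner",
    ),
}


def judge(client: OpenAI, model: str, answer_1: str, answer_2: str) -> str:
    """Ask one model which of two previously collected answers it prefers."""
    prompt = JUDGE_TEMPLATE.format(
        question=QUESTION, answer_1=answer_1, answer_2=answer_2
    )
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content


# Hypothetical files holding the responses collected earlier from each model.
chatgpt_answer = open("chatgpt_answer.txt").read()
deepseek_answer = open("deepseek_answer.txt").read()

for name, (client, model) in judges.items():
    # Ask each judge twice, swapping the order of the answers, to control
    # for position bias before concluding that a model prefers its own reply.
    print(f"--- {name}: ChatGPT answer first ---")
    print(judge(client, model, chatgpt_answer, deepseek_answer))
    print(f"--- {name}: DeepSeek answer first ---")
    print(judge(client, model, deepseek_answer, chatgpt_answer))
```

Repeating this over a handful of prompts (and both orderings) would give a much better sense of whether the self-preference I saw here is real or just noise.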