Comparing GPT-3.5-Turbo vs. GPT-4o-Mini (Is GPT-4o-Mini really a better model than GPT-3.5-Turbo?)
Nahid Hossain | July 29, 2024
Welcome to Nahid.org. Please note that the code examples and libraries used in these tutorials may be subject to their own licensing terms and restrictions. While I strive for accuracy, I cannot guarantee the complete correctness of the information presented. Any reliance you place on the tutorials or code examples is therefore strictly at your own discretion. You are not obligated to cite this website, but if you find it helpful, I would appreciate a reference to www.nahid.org/tutorials. If you have any questions, discover errors, or need further clarification, please don't hesitate to contact me. Email: nahid@cse.uiu.ac.bd
Today (July 19, 2024), I received an email from OpenAI about their latest model, GPT-4o Mini. According to OpenAI, GPT-4o Mini is significantly smarter, cheaper, and just as fast as GPT-3.5 Turbo. So, I decided to compare the two informally on a handful of prompts. First, let's see the specs of GPT-4o Mini:
- Intelligence: GPT-4o Mini outperforms GPT-3.5 Turbo in textual intelligence (scoring 82% on MMLU compared to 69.8%) and multimodal reasoning.
- Price: GPT-4o Mini is more than 60% cheaper than GPT-3.5 Turbo, priced at $0.15 per 1M input tokens and $0.60 per 1M output tokens (roughly the equivalent of 2,500 pages in a standard book).
- Modalities: GPT-4o Mini currently supports text and vision capabilities, with plans to add support for audio and video inputs and outputs in the future.
- Languages: GPT-4o Mini has improved multilingual understanding over GPT-3.5 Turbo across a wide range of non-English languages.
- Context Window: Like GPT-4o, GPT-4o Mini has a 128k context window and supports up to 16k output tokens per request.
- Cut-off Date: October 2023.
On paper, GPT-4o Mini is clearly the better option: it is more than 60% cheaper and more intelligent (scoring 82% on MMLU compared to 69.8%). Now, let's see how the two models perform on some basic response generation.
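To make the price gap concrete, here is a minimal sketch of the arithmetic. The GPT-4o Mini rates come from the spec list above. The GPT-3.5 Turbo rates ($0.50 input and $1.50 output per 1M tokens) are an assumption based on the published price of gpt-3.5-turbo-0125 at the time, so please check OpenAI's pricing page before relying on them. The names `PRICES`, `request_cost`, and `percent_cheaper` are illustrative only.

```python
# Prices in USD per 1M tokens.
PRICES = {
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},    # from the spec list above
    "gpt-3.5-turbo": {"input": 0.50, "output": 1.50},  # ASSUMPTION: gpt-3.5-turbo-0125 rates
}


def request_cost(model, input_tokens, output_tokens):
    """Return the cost in USD of one request for the given token counts."""
    price = PRICES[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


def percent_cheaper(kind):
    """Return how much cheaper GPT-4o Mini is for 'input' or 'output' tokens, in percent."""
    old = PRICES["gpt-3.5-turbo"][kind]
    new = PRICES["gpt-4o-mini"][kind]
    return (old - new) / old * 100


for kind in ("input", "output"):
    print(f"{kind} tokens: about {percent_cheaper(kind):.0f}% cheaper")
```

Under these assumed rates the saving is about 70% on input tokens and 60% on output tokens, which is in line with OpenAI's "more than 60% cheaper" claim.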
First, let's import the necessary libraries, tools, and packages.
import torch
import openai
import base64
import requests
from io import BytesIO
from PIL import Image
from transformers import AutoTokenizer, AutoModel
from tqdm import tqdm
from openai import OpenAI
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f">>We are using {device} device.<<")
>>We are using cuda device.<<
As I mentioned above, GPT-4o-mini currently supports text and vision capabilities. However, in vision, it focuses on understanding rather than generation. Therefore, for this entire blog/tutorial, I will primarily focus on text and investigate how GPT-4o-mini performs compared to GPT-3.5-Turbo. Let's begin with text generation. I have used the same method as in our previous tutorials, with slight modifications, since this time we don't have any contexts. Method, parameters, and API key details can be found in previous tutorials.
api_key="ENTER YOUR OPENAI API KEY HERE"
client = OpenAI(api_key=api_key)
def nahidOrg_answer_by_gpt_models(model, question, max_tokens):
    prompt = f"""Answer the following question. If you can't answer it, say 'Sorry Nahid! I can't answer that.'
**Question:** {question}
"""
    messages = [
        {"role": "system", "content": "You are an intelligent assistant who can answer any question."},
        {"role": "user", "content": prompt}
    ]
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        max_tokens=max_tokens,
    )
    return response.choices[0].message.content
Let's see how our models generate responses to mathematical questions. We will begin with gpt-3.5-turbo.
question_1 = "Evaluate the following definite integral: ∫[0 to π/2] x * sin(x) dx"
answer_gpt35_turbo_1 = nahidOrg_answer_by_gpt_models("gpt-3.5-turbo", question_1, 250)
print(f"Answer: {answer_gpt35_turbo_1}")
Answer: To evaluate the definite integral ∫[0, π/2] x * sin(x) dx, we need to use integration by parts. Let u = x and dv = sin(x) dx Then, du = dx and v = -cos(x) Using the integration by parts formula ∫u dv = uv - ∫v du, we can evaluate the integral: ∫ x * sin(x) dx = -x * cos(x) - ∫ (-cos(x)) dx = -x * cos(x) + ∫ cos(x) dx = -x * cos(x) + sin(x) + C Now we evaluate the definite integral from 0 to π/2: [-(π/2) * cos(π/2) + sin(π/2)] - [-(0) * cos(0) + sin(0)] [-(π/2) * 0 + 1] - [0 + 0] = 1 - 0 = 1 Therefore, the value of the definite integral ∫[0, π/2] x * sin(x) dx is 1.
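The value can also be checked independently of either model. The following is a minimal sketch that approximates the same integral numerically with composite Simpson's rule in plain Python. The helper name `simpson` and the number of intervals are my own choices for illustration, not part of the original notebook.

```python
import math


def simpson(f, a, b, n=1000):
    """Approximate the integral of f over [a, b] with composite Simpson's rule."""
    if n % 2:  # Simpson's rule needs an even number of intervals
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # Odd interior points get weight 4, even interior points get weight 2.
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


approx = simpson(lambda x: x * math.sin(x), 0.0, math.pi / 2)
print(round(approx, 6))  # → 1.0
```

The numerical result agrees with the closed form -x·cos(x) + sin(x) evaluated between 0 and π/2.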
Perfect. The answer is correct and well explained. Now, let's see the response generated by GPT-4o-mini.
answer_gpt4o_mini_1 = nahidOrg_answer_by_gpt_models("gpt-4o-mini", question_1, 250)
print(f"Answer: {answer_gpt4o_mini_1}")
Answer: To evaluate the definite integral \( \int_{0}^{\frac{\pi}{2}} x \sin(x) \, dx \), we can use integration by parts. Let: - \( u = x \) which gives \( du = dx \) - \( dv = \sin(x) \, dx \) which gives \( v = -\cos(x) \) Using the integration by parts formula \( \int u \, dv = uv - \int v \, du \), we have: \[ \int x \sin(x) \, dx = -x \cos(x) - \int -\cos(x) \, dx \] The integral of \( -\cos(x) \) is: \[ -\int -\cos(x) \, dx = -\sin(x) \] Thus, we can write the integral as: \[ \int x \sin(x) \, dx = -x \cos(x) + \sin(x) + C \] Now, we need to evaluate this from \( 0 \) to \( \frac{\pi}{2} \): \[ \int_{0}^{\frac{\pi}{2}} x \sin
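At first glance, this looks like a nonsensical answer full of LaTeX markup. Wait! The steps are correct if we ignore the LaTeX format; the reply is simply cut off because it hit the 250-token limit. Let's increase max_tokens to 1000 so that it can generate a full answer.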
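A cut-off reply like the one above can be detected programmatically rather than by eye. In a Chat Completions response, each choice carries a `finish_reason`, and the value `"length"` means generation stopped because it reached `max_tokens`. The sketch below is only an illustration: `was_truncated` and `fake_response` are hypothetical names, and the stand-in object mimics just the attributes needed here so the example runs without an API key.

```python
from types import SimpleNamespace


def was_truncated(response):
    """Return True if the first choice stopped because it hit the token limit."""
    return response.choices[0].finish_reason == "length"


def fake_response(content, finish_reason):
    """Hypothetical stand-in for an API response, used only for this demonstration."""
    message = SimpleNamespace(content=content)
    choice = SimpleNamespace(message=message, finish_reason=finish_reason)
    return SimpleNamespace(choices=[choice])


cut_off = fake_response("... x \\sin", "length")
complete = fake_response("The value of the integral is 1.", "stop")
print(was_truncated(cut_off), was_truncated(complete))  # → True False
```

With a real response object, the same check could run inside the answer helper before it returns, for example to retry with a larger `max_tokens`.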
answer_gpt4o_mini_1 = nahidOrg_answer_by_gpt_models("gpt-4o-mini", question_1, 1000)
print(f"Answer: {answer_gpt4o_mini_1}")
Answer: To evaluate the definite integral \(\int_{0}^{\frac{\pi}{2}} x \sin(x) \, dx\), we can use integration by parts. We choose: - \(u = x\) which gives \(du = dx\) - \(dv = \sin(x) \, dx\) which gives \(v = -\cos(x)\) Using the integration by parts formula \(\int u \, dv = uv - \int v \, du\), we have: \[ \int x \sin(x) \, dx = -x \cos(x) - \int -\cos(x) \, dx \] This simplifies to: \[ -x \cos(x) + \int \cos(x) \, dx = -x \cos(x) + \sin(x) \] Now we can evaluate the definite integral from \(0\) to \(\frac{\pi}{2}\): \[ \int_{0}^{\frac{\pi}{2}} x \sin(x) \, dx = \left[-x \cos(x) + \sin(x)\right]_{0}^{\frac{\pi}{2}} \] First, we calculate the upper limit \(x = \frac{\pi}{2}\): \[ -\left(\frac{\pi}{2} \cos\left(\frac{\pi}{2}\right)\right) + \sin\left(\frac{\pi}{2}\right) = -\left(\frac{\pi}{2} \cdot 0\right) + 1 = 1 \] Next, we calculate the lower limit \(x = 0\): \[ -0 \cdot \cos(0) + \sin(0) = 0 + 0 = 0 \] Finally, we subtract the two results: \[ \int_{0}^{\frac{\pi}{2}} x \sin(x) \, dx = 1 - 0 = 1 \] Thus, the value of the integral is \[ \boxed{1} \]
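The reply above is written in LaTeX markup, which a plain-text console does not render. As a rough sketch of the kind of post-processing this would need, the function below strips a few common LaTeX constructs with regular expressions. It is an assumption-laden illustration, not a real LaTeX parser: `latex_to_plain` is a hypothetical helper, it only knows the handful of commands that appear in this reply, and it does not handle nested braces.

```python
import re


def latex_to_plain(text):
    """Very rough LaTeX-to-text cleanup for simple, non-nested math markup."""
    # \frac{a}{b} -> (a)/(b), for arguments without nested braces
    text = re.sub(r"\\frac\{([^{}]*)\}\{([^{}]*)\}", r"(\1)/(\2)", text)
    # Drop sizing commands such as \left( and \right)
    text = re.sub(r"\\(?:left|right)", "", text)
    # Drop math delimiters \( \) \[ \]
    text = re.sub(r"\\[\(\)\[\]]", "", text)
    # Thin space \, -> ordinary space
    text = text.replace(r"\,", " ")
    # \boxed{x} -> x
    text = re.sub(r"\\boxed\{([^{}]*)\}", r"\1", text)
    # A few symbols and function names used in the reply
    text = text.replace(r"\pi", "π").replace(r"\int", "∫").replace(r"\cdot", "·")
    text = re.sub(r"\\(sin|cos)", r"\1", text)
    # Collapse the whitespace left behind
    return re.sub(r"\s+", " ", text).strip()


sample = r"\( \int_{0}^{\frac{\pi}{2}} x \sin(x) \, dx = \boxed{1} \)"
print(latex_to_plain(sample))
```

For anything beyond a quick console preview, rendering the markup in a notebook or with a proper LaTeX-aware library is likely the better choice.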
Right. The answer is ultimately correct. However, the LaTeX markup needs additional processing before it is readable, so I will give this point to GPT-3.5-Turbo. [Score: GPT-3.5 = 1 | 4o-Mini = 0] Now, let's test the models with a math word problem.
question_2 = """Suppose you have 50 Taka in your pocket. You went to a shop, and the shopkeeper told you the price of each banana is 10 Taka.
However, if you return 3 banana peels, you can get 1 banana for free. How many total bananas can you eat?"""
answer_gpt35_turbo_2 = nahidOrg_answer_by_gpt_models("gpt-3.5-turbo", question_2, 350)
print(f"Answer: {answer_gpt35_turbo_2}")
Answer: To solve this problem, let's break it down step by step: 1. You have 50 Taka in your pocket. 2. The price of each banana is 10 Taka. 3. If you return 3 banana peels, you can get 1 banana for free. Given these conditions: - You can initially purchase 50 Taka / 10 Taka per banana = 5 bananas. - For every 3 banana peels returned, you can get another free banana. So, out of the 5 bananas you initially purchased, after returning 3 peels, you get 1 free banana. - Therefore, in total, you can eat 5 initial bananas + 1 free banana = 6 bananas. So, you can eat a total of 6 bananas.
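The puzzle is easy to simulate directly, which gives a reference total to compare the models against. The sketch below is a minimal illustration; the function name `total_bananas` and its parameters are my own and assume at least 2 peels are needed per free banana, otherwise the loop would never end.

```python
def total_bananas(money, price, peels_per_free):
    """Count bananas eaten when peels can be traded back for free bananas.

    Assumes peels_per_free >= 2; with 1 the exchange would never stop.
    """
    bought = money // price
    eaten = bought
    peels = bought
    while peels >= peels_per_free:
        free = peels // peels_per_free
        eaten += free
        # Leftover peels plus the peels from the free bananas just eaten.
        peels = peels % peels_per_free + free
    return eaten


print(total_bananas(50, 10, 3))  # → 7
```

The key step is that the free banana also leaves a peel: 5 peels buy a sixth banana, and its peel plus the 2 left over buy a seventh.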
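Incorrect. The answer should be 7: the 5 bought bananas leave 5 peels, 3 of which buy a sixth banana, and its peel plus the 2 left over buy a seventh. Let's see how GPT-4o-Mini does.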
answer_gpt4o_mini_2 = nahidOrg_answer_by_gpt_models("gpt-4o-mini", question_2, 350)
print(f"Answer: {answer_gpt4o_mini_2}")
Answer: To find out how many total bananas you can eat, let's break it down step-by-step. 1. **Initial Purchase**: You have 50 Taka and each banana costs 10 Taka. So, you can buy: \[ \text{Number of bananas bought} = \frac{50 \text{ Taka}}{10 \text{ Taka/banana}} = 5 \text{ bananas} \] After this purchase, you'll have 5 banana peels. 2. **Free Bananas from Peels**: You can return 3 banana peels to get 1 free banana. With 5 peels, you can exchange: \[ \text{Free bananas} = \frac{5 \text{ peels}}{3 \text{ peels/banana}} = 1 \text{ free banana (with 2 peels remaining)} \] 3. **Total Bananas**: Adding the free banana to the initial purchase: \[ \text{Total bananas} = 5 \text{ (bought)} + 1 \text{ (free)} = 6 \text{ bananas} \] Therefore, the total number of bananas you can eat is **6**.
Incorrect answer again (ignoring the LaTeX format), so neither model gets a point. [Score: GPT-3.5 = 1 | 4o-Mini = 0] Now, let’s test the models with a basic clinical question.
question_3 = "What does low MCH in a complete blood count mean?"
answer_gpt35_turbo_3 = nahidOrg_answer_by_gpt_models("gpt-3.5-turbo", question_3, 250)
print(f"Answer: {answer_gpt35_turbo_3}")
Answer: **Answer:** A low Mean Corpuscular Hemoglobin (MCH) level in a complete blood count typically indicates that the red blood cells have lower than normal levels of hemoglobin, the protein that carries oxygen in the blood. This could be a sign of anemia or other underlying health conditions. It is important to consult with a healthcare provider for a proper diagnosis and treatment if you have low MCH levels in your blood test results.
answer_gpt4o_mini_3 = nahidOrg_answer_by_gpt_models("gpt-4o-mini", question_3, 250)
print(f"Answer: {answer_gpt4o_mini_3}")
Answer: Low MCH (mean corpuscular hemoglobin) in a complete blood count typically indicates that the red blood cells contain less hemoglobin than normal. This condition is often associated with types of anemia, such as iron deficiency anemia or thalassemia. It can suggest that the body is not producing enough hemoglobin to adequately transport oxygen throughout the body. It’s important for individuals with low MCH levels to consult a healthcare professional for further evaluation and appropriate management.
Not bad. Both models answered the question correctly. However, it seems GPT-4o-Mini provides a more detailed and structured answer. Therefore, I will give GPT-4o-Mini a point here. [Score: GPT-3.5 = 1 | 4o-Mini = 1] Now, let’s test the models with translation. I have used Bengali sentences from my personal horror story. Let’s see how the models translate these sentences into English.
question_4 = """Translate the following text to English: অনেকটা দৌড়ে বারান্দায় এলাম, নিচে তাকিয়ে দেখি ১০-১২ জন লোক বাড়ির এদিক ওদিক কি যেন খুঁজছে।
এদের মধ্যে একজন আমাকে দেখে বলে উঠলো, 'ওইতো আমাদের জামাই, বাড়ির দোতলায়।' """
answer_gpt35_turbo_4 = nahidOrg_answer_by_gpt_models("gpt-3.5-turbo", question_4, 350)
print(f"Answer: {answer_gpt35_turbo_4}")
Answer: I arrived at the veranda after running for a while, looked down and saw 10-12 people searching around the house from all directions. Among them, one person saw me and said, 'That's our son-in-law, on the house's veranda.' This is the translation of the text to English.
answer_gpt4o_mini_4 = nahidOrg_answer_by_gpt_models("gpt-4o-mini", question_4, 350)
print(f"Answer: {answer_gpt4o_mini_4}")
Answer: I rushed over to the balcony and looked down to see 10-12 people searching for something around the house. Among them, one saw me and exclaimed, "There is our son-in-law, on the second floor of the house."
Both translations are grammatically acceptable, but GPT-4o-Mini's is semantically more precise and stylistically more engaging. It provides clearer context, uses more natural phrasing, and creates a more accurate and vivid scene for the reader. Therefore, without hesitation, I will give the point to GPT-4o-Mini. [Score: GPT-3.5 = 1 | 4o-Mini = 2] Now, let’s test the models with a logical reasoning question.
question_5 = "If all birds can fly and penguins are birds, can penguins fly?"
answer_gpt35_turbo_5 = nahidOrg_answer_by_gpt_models("gpt-3.5-turbo", question_5, 350)
print(f"Answer: {answer_gpt35_turbo_5}")
Answer: Penguins are an exception to the general rule that all birds can fly. Penguins are flightless birds that have adapted to a swimming lifestyle instead. So, no, penguins cannot fly.
answer_gpt4o_mini_5 = nahidOrg_answer_by_gpt_models("gpt-4o-mini", question_5, 350)
print(f"Answer: {answer_gpt4o_mini_5}")
Answer: Sorry Nahid! I can't answer that.
Sigh! Without any doubt, I will give the point to GPT-3.5-Turbo. [Score: GPT-3.5 = 2 | 4o-Mini = 2] Now, let’s test the models with a trick/puzzle question.
question_6 = """If you have only one match and you walk into a dark room where there's an oil lamp, a newspaper, and some kindling wood,
which do you light first?"""
answer_gpt35_turbo_6 = nahidOrg_answer_by_gpt_models("gpt-3.5-turbo", question_6, 350)
print(f"Answer: {answer_gpt35_turbo_6}")
Answer: You would first light the match before lighting anything else.
answer_gpt4o_mini_6 = nahidOrg_answer_by_gpt_models("gpt-4o-mini", question_6, 350)
print(f"Answer: {answer_gpt4o_mini_6}")
Answer: You light the match first.
Even though both answers are correct, the answer by GPT-3.5-Turbo looks more natural (a bit wordy, though). I would give the point to GPT-3.5-Turbo. [Score: 3.5-Turbo = 3 | 4o-Mini = 2].
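The question-by-question comparison above repeats the same two calls each time, so it could also be written as a loop. The sketch below is one possible way to do that, under stated assumptions: `collect_answers` and `tally` are hypothetical helpers, the `ask` argument is any function with the same signature as `nahidOrg_answer_by_gpt_models`, and the winners are entered by hand because the judging in this post is manual.

```python
from collections import Counter


def collect_answers(ask, models, questions, max_tokens=350):
    """Ask every model every question; return one {model: answer} dict per question."""
    return [
        {model: ask(model, question, max_tokens) for model in models}
        for question in questions
    ]


def tally(winners):
    """Count points per model; None marks a round where nobody scored."""
    return Counter(winner for winner in winners if winner is not None)


# Winners of the six text rounds as judged in this post (None = no point awarded).
text_round_winners = [
    "gpt-3.5-turbo",  # integral
    None,             # banana word problem (both wrong)
    "gpt-4o-mini",    # clinical question
    "gpt-4o-mini",    # translation
    "gpt-3.5-turbo",  # penguin question
    "gpt-3.5-turbo",  # match puzzle
]
scores = tally(text_round_winners)
print(dict(scores))
```

With an API key in place, the answers could be gathered with `collect_answers(nahidOrg_answer_by_gpt_models, ["gpt-3.5-turbo", "gpt-4o-mini"], [question_1, question_2])` and so on. The tally reproduces the 3 to 2 score reached so far.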
Processing an image using GPT-4o-Mini
GPT-4o-Mini is not trained to generate images; it is designed to extract information from them. Therefore, the method `nahidOrg_image_process_gpt4o_Mini` is designed to extract information from an input image. First, you have to convert the regular image into a base64 string, then send the command together with the base64 string of the input image through the API. The API returns a JSON response that contains the model's answer.
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')
base64_image = encode_image("file/nahidOrg_image_for_gpt4oMini.jpg")
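The request in this tutorial labels the image as `image/jpeg`, which suits the `.jpg` file used here. If you want to send other formats, one option is to derive the MIME type from the file extension. The sketch below is a minimal illustration using Python's standard `mimetypes` module; `to_data_url` is a hypothetical helper and not part of the original notebook.

```python
import base64
import mimetypes


def to_data_url(image_path):
    """Build a base64 data URL whose MIME type is guessed from the file extension."""
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image type for {image_path!r}")
    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"
```

The returned string could then be used directly as the `url` value of the `image_url` entry in the request payload.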
def nahidOrg_image_process_gpt4o_Mini(model, command, image):
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}"
    }
    payload = {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": command
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/jpeg;base64,{image}"
                        }
                    }
                ]
            }
        ],
        "max_tokens": 350
    }
    response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, json=payload)
    return response
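The method above returns the raw `requests` response, so a failed call (for example, an invalid API key) only shows up later as a confusing `KeyError` on `"choices"`. The sketch below shows one way to read the reply more defensively. `extract_content` is a hypothetical helper, and it assumes the usual OpenAI error body of the form `{"error": {"message": ...}}`.

```python
def extract_content(status_code, body):
    """Return the model's text from a chat completions JSON body, or raise on failure."""
    if status_code != 200:
        # ASSUMPTION: error bodies look like {"error": {"message": "..."}}
        error = body.get("error", {})
        message = error.get("message", "unknown error")
        raise RuntimeError(f"API request failed ({status_code}): {message}")
    return body["choices"][0]["message"]["content"]


# Demonstration with hand-written dictionaries instead of a live API call.
ok_body = {"choices": [{"message": {"content": "A mushroom with a ladybug."}}]}
print(extract_content(200, ok_body))
```

With a real response object, the call would be `extract_content(result.status_code, result.json())`.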
Let's see what the base64 image (our local image at file/nahidOrg_image_for_gpt4oMini.jpg) actually looks like. Then, we will see how the GPT-4o-Mini model extracts information from the image according to the command.
image_data = base64.b64decode(base64_image)
image_main = Image.open(BytesIO(image_data))
print("The inputted photo:")
display(image_main)
command = "What’s in this image?"
result = nahidOrg_image_process_gpt4o_Mini("gpt-4o-mini", command, base64_image)
print(result.json()["choices"][0]["message"]["content"])
The inputted photo:
The image features a close-up view of a mushroom with a ladybug perched on top. The background is blurred but appears to be lush greenery, suggesting a natural setting. The mushroom is displaying a slender stalk, and the ladybug is bright red with black spots, which contrasts nicely with the earthy tones of the mushroom.
To be honest, this is impressive. The model described the image accurately and naturally. Since GPT-3.5-Turbo doesn't support vision, 4o-Mini gets the point uncontested. [Score: 3.5-Turbo = 3 | 4o-Mini = 3] It's a tie. In my opinion, GPT-4o-Mini is not a great improvement over GPT-3.5-Turbo on these prompts. However, since it is cheaper and has vision capabilities (with OpenAI planning to include video and audio soon as well), my support goes to GPT-4o-Mini. This is the end of this tutorial. Thanks for visiting nahid.org!