Generation comparison between Stable LM 2 Chat, GPT-3.5-Turbo, GPT-4o, and GPT-4-Turbo
Tutorial-6 is a continuation of Tutorial-4 and Tutorial-5. I suggest completing those tutorials first before proceeding with this one. However, since this tutorial (Tutorial-6) is self-contained and has a different focus, you can skip those tutorials if you prefer. In this tutorial, we will compare answer generation by Stable LM 2 Chat, GPT-3.5-Turbo, GPT-4o, and GPT-4-Turbo. Stable LM 2 Chat is developed by Stability AI, while the GPT models are developed by OpenAI. My plan is to use the same approach from Tutorial-4 and Tutorial-5 to prepare the questions and context for the models. Then, we will generate answers using these four models. Let's begin.
Once again, for this tutorial, we will demonstrate RAG on a news article dataset. I have downloaded a CNN news article dataset from Kaggle (https://www.kaggle.com/datasets/hadasu92/cnn-articles-after-basic-cleaning). The 23.1 MB CSV dataset contains 4,729 articles with columns 'Index', 'Author', 'Date published', 'Category', 'Section', 'Url', 'Headline', 'Description', 'Keywords', 'Second headline', and 'article'. For simplicity, in this tutorial, we will use only the main article body (the 'article' column).
First, let's import the necessary libraries, tools, and packages.
import pandas as pd
import torch
import faiss
import numpy as np
import pickle
from transformers import AutoTokenizer, AutoModel, AutoModelForCausalLM
from tqdm import tqdm
from openai import OpenAI
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f">>We are using {device} device.<<")
>>We are using cuda device.<<
Now, import the dataset. Since the dataset has already been cleaned by its author, I decided to use it as is. Due to computational limitations, I decided to use only the articles with a length of 512 words or less, creating a new dataset called df_filtered.
df = pd.read_csv('nahidOrg_files/news_articles.csv')
df['word_count'] = df['article'].astype(str).apply(lambda x: len(x.split()))  # whitespace word count per article
df_filtered = df[df['word_count'] <= 512].copy()
df_filtered = df_filtered.reset_index(drop=True)
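As a quick sanity check, you can confirm how many articles survive the 512-word filter (1,696 on my copy of the dataset, matching the embedding progress bar further below):

print(f"Articles with 512 words or less: {len(df_filtered)}")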
Let's take a look at a random article (maybe 57?) in our filtered dataset.
df_filtered['article'][57]
' (CNN)Joe Biden has been president for a little over a year. And that year was not kind to him. In a new NPR/PBS/Marist College poll, more than half -- 56% -- of Americans said that Biden\'s first year in office was a "failure," while just 39% described it as a success. The news doesn\'t get better the more you dig into the survey. Two-thirds of independents said Biden\'s first year was a failure, while more than 9 in 10 Republicans (91%) agreed with that assessment.Read More Biden\'s numbers are better among Democrats -- 80% called year one a success -- but 15% of members of his own party described his first year in office as a failure. Now, asking such a binary question -- either Biden\'s first year was a success or a failure, with no room in the middle -- does tend to strip any nuance from issue. There are incredible complexities that go into assessing how a president has done. Oftentimes, a president is judged in one way during his time in office and in another after he leaves, once the impacts of his policies come into clearer focus. That said, elections tend to force voters to think in this all-or-nothing way. Either you vote for a Democrat or for a Republican. Either you vote to re-elect your incumbent or you choose the challenger.Seen through that political lens, these poll numbers are extremely problematic for Democrats on the ballot this fall. We know that, historically, the first midterm election of a president\'s term is a referendum on his time in office up to that point. The Point: If the public\'s report card on Biden\'s second year in office is anything like the one for his first year, Democrats can kiss their House and Senate majorities goodbye.'
Looks good. Let's save the new dataset with the word counts to our local storage.
df_filtered.to_csv('nahidOrg_files/news_articles_filtered.csv')
Now, it's time for embeddings. Embeddings capture the semantic meaning of each article, allowing us to compare articles and find ones similar to the question. I have decided to use the all-MiniLM-L6-v2 (https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) pretrained sentence-transformer model, which is specifically designed to convert sentences (or whole articles) into numerical representations (embeddings). You can use any other model of your choice.
tokenizer_for_embeddings = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
model_for_embeddings = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2").to(device)
Now, the method nahidOrg_generate_embeddings uses the model to tokenize and generate an embedding for each article. The dataset's 'article' column is converted into a list, and a loop generates the embeddings one by one; they are all collected in article_embeddings.
def nahidOrg_generate_embeddings(text):
    inputs = tokenizer_for_embeddings(text, return_tensors="pt", padding=True, truncation=True, max_length=512).to(device)
    with torch.no_grad():  # no gradients needed for inference
        outputs = model_for_embeddings(**inputs)
    # Mean-pool the token embeddings into a single 384-dimensional article vector
    return outputs.last_hidden_state.mean(dim=1).squeeze().cpu().numpy()
articles = df_filtered['article'].tolist()
article_embeddings = []
for doc in tqdm(articles, desc="Generating Embeddings For All Articles"):
    article_embeddings.append(nahidOrg_generate_embeddings(doc))
Generating Embeddings For All Articles: 100%|██████████| 1696/1696 [00:15<00:00, 107.44it/s]
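One caveat: the simple mean above averages over every token position, which is fine when embedding one text at a time, but it also averages over padding tokens if you embed a batch. The model card for all-MiniLM-L6-v2 recommends attention-mask-aware mean pooling; here is a minimal batched sketch (the helper name is mine):

# Hypothetical batched variant: attention-mask-aware mean pooling, which
# ignores padding tokens when several texts are embedded together.
def nahidOrg_generate_embeddings_masked(texts):
    inputs = tokenizer_for_embeddings(texts, return_tensors="pt", padding=True,
                                      truncation=True, max_length=512).to(device)
    with torch.no_grad():
        outputs = model_for_embeddings(**inputs)
    mask = inputs["attention_mask"].unsqueeze(-1)           # (batch, seq, 1)
    summed = (outputs.last_hidden_state * mask).sum(dim=1)  # zero out padding positions
    counts = mask.sum(dim=1).clamp(min=1)                   # real tokens per text
    return (summed / counts).cpu().numpy()                  # (batch, 384)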
The all-MiniLM-L6-v2 model maps each article to a 384-dimensional dense vector space. Let's examine the embedding for the article with index 57 and confirm its dimensionality.
print(f"Dimension: {len(article_embeddings[57])} \nVectors: {article_embeddings[57]}" )
Dimension: 384 
Vectors: [-2.01892834e-02 -1.14912458e-01 1.79007471e-01 -4.92094830e-02 -5.30712046e-02 7.04352325e-03 -6.71370029e-02 2.57717613e-02 ... 2.42759418e-02 -2.44450830e-02 4.38334122e-02 -1.06747225e-01 1.07360572e-01]
Additionally, you can save the embeddings for later use to reduce computation time for future work with the same dataset. I used the pickle library to save the embeddings, but you can use other methods or libraries as well.
with open('nahidOrg_files/article_embeddings.pkl', 'wb') as f:
pickle.dump(article_embeddings, f)
Loading the embeddings:
with open('nahidOrg_files/article_embeddings.pkl', 'rb') as f:
article_embeddings = pickle.load(f)
For efficient similarity search, I have used a FAISS (Facebook AI Similarity Search) index. In my opinion, FAISS is excellent for similarity search. I used IndexFlatL2 indexing, which is exact but performs a brute-force scan, so it is a bit slow. It is perfect for small datasets like our news articles but not always suitable for larger ones. If you are working with very large datasets, you can use IndexIVFFlat indexing, which is faster but approximate, trading some accuracy for speed. First, we determine the dimension of the article embeddings (384). Next, we build the IndexFlatL2 FAISS index, which uses the L2 (Euclidean) distance metric to measure similarity between article vectors, and add all embeddings to it.
dimension = len(article_embeddings[0])
index = faiss.IndexFlatL2(dimension)
index.add(np.array(article_embeddings))
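For reference, if you later move to a much larger corpus, a minimal IndexIVFFlat sketch might look like the following (nlist and nprobe are illustrative values, not tuned; we keep using the flat index in this tutorial):

# Hypothetical alternative for larger datasets: an IVF index over 64 clusters.
emb_matrix = np.array(article_embeddings).astype('float32')
nlist = 64                                   # number of Voronoi cells (illustrative)
quantizer = faiss.IndexFlatL2(dimension)     # coarse quantizer for cluster assignment
index_ivf = faiss.IndexIVFFlat(quantizer, dimension, nlist)
index_ivf.train(emb_matrix)                  # IVF indexes must be trained before adding
index_ivf.add(emb_matrix)
index_ivf.nprobe = 8                         # cells searched per query (illustrative)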
Now, it's time to retrieve similar articles that might contain the answer to the user's question. The method nahidOrg_retrieve_documents takes the question's embedding and a value k, representing the number of articles you want to retrieve, and returns the indices of the articles that might contain the answer.
def nahidOrg_retrieve_documents(query_embedding, k):
    # FAISS returns the k nearest neighbours and their L2 distances
    distances, indices = index.search(np.array([query_embedding]), k)
    return indices[0]
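If you also want to inspect how close each match is, a small variant (the name is mine) can return the L2 distances alongside the indices:

def nahidOrg_retrieve_documents_with_scores(query_embedding, k):
    distances, indices = index.search(np.array([query_embedding]), k)
    return list(zip(indices[0], distances[0]))  # (article index, L2 distance) pairs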
Now, let's take the question and find its embeddings.
question = "What is the estimated financial impact on products due to the ban on Russian steel imports, according to the Commission?"
question_embeddings = nahidOrg_generate_embeddings(question)
Now, with the embedding of the question, we will search for and retrieve the k=3 closest candidate articles that might contain the answer.
retrieved_articles_indices = nahidOrg_retrieve_documents(question_embeddings, 3)
retrieved_articles_indices
array([0, 4, 2], dtype=int64)
For the question 'What is the estimated financial impact on products due to the ban on Russian steel imports, according to the Commission?', we got three article indices, [0, 4, 2], since k = 3. You can adjust this according to your needs. Now, we will merge the 3 most relevant articles together to create a single context that will be used with all four generative models. Just like in Tutorial-4 and Tutorial-5, the method nahidOrg_processing_context does this.
retrieved_articles_indices = pd.Series(retrieved_articles_indices)
retrieved_text_articles = retrieved_articles_indices.apply(lambda x: df_filtered['article'][x]).tolist()
def nahidOrg_processing_context(docs, max_tokens=1536):
    processed_docs = []
    total_tokens = 0
    for doc in docs:
        tokens = len(doc.split())  # whitespace word count as a rough proxy for tokens
        if total_tokens + tokens <= max_tokens:
            processed_docs.append(doc)
            total_tokens += tokens
        else:
            break  # stop once the budget would be exceeded
    return "\n".join(f"- {doc}" for doc in processed_docs)
final_retrieved_contexts = nahidOrg_processing_context(retrieved_text_articles)
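Note that nahidOrg_processing_context counts whitespace-separated words only as a rough proxy for tokens. If you want an exact token count for the OpenAI models, the tiktoken library can provide it (a small sketch, assuming tiktoken is installed):

import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
print(f"Context length: {len(enc.encode(final_retrieved_contexts))} tokens")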
Let's see our context:
final_retrieved_contexts
'- The European Union formally approved on Tuesday a new barrage of sanctions against Russia for its invasion of Ukraine, which include bans on investments in the Russian energy sector, luxury goods exports and imports of steel products from Russia.The sanctions, which come into effect after publication in the EU official journal later on Tuesday, also freeze the assets of more business leaders who support the Russian state, including Chelsea football club owner Roman Abramovich. The European Commission said in a statement on Tuesday that the sanctions included "a far-reaching ban on new investment across the Russian energy sector with limited exceptions for civil nuclear energy and the transport of certain energy products back to the EU."The measure will hit Russia\'s oil majors Rosneft, Transneft and Gazprom Neft (GZPFY), but EU members will be still able to buy oil and gas from them, an EU source told Reuters.There will also be a total ban on transactions with some Russian state-owned enterprises linked to the Kremlin\'s military-industrial complex, the EU executive said.Read MoreThe bloc reached a preliminary agreement on the new sanctions on Monday, and no objections were raised before an agreed deadline.The ban on Russian steel imports is estimated to affect 3.3 billion euros ($3.6 billion) worth of products, the Commission said.EU companies will also be no longer allowed to export any luxury goods worth more than 300 euros, including jewelry. Exports of cars costing more than 50,000 euros will also be banned, EU sources said.The package also prohibits EU credit rating agencies from issuing ratings for Russia and Russian companies, which the Commission says will further restrict their access to European financial markets.The latest sanctions follow three rounds of punitive measures which included freezing of assets of the Russian central bank and the exclusion from the SWIFT banking system of some Russian and Belarusian banks.The EU also agreed on Tuesday to strip Russia of its "most-favored nation" trade status, opening the door to punitive tariffs on Russian goods or outright import bans.UK slaps 35% tariff on Russian vodkaThe UK government also announced a fresh round of sanctions on Russia on Tuesday. It will ban the export of luxury goods to Russia and will introduce tariffs on Russian goods worth more than $1 billion. The additional 35% tariff will be applied to imports including vodka, steel, works of art and fur. The United Kingdom will also deny Russia and Belarus access to its most favoured nation trading tariff for hundreds of their exports, effectively depriving both countries of key benefits of their WTO membership."Our new tariffs will further isolate the Russian economy from global trade, ensuring it does not benefit from the rules-based international system it does not respect. These tariffs build on the UK\'s existing work to starve Russia\'s access to international finance, sanction Putin\'s cronies and exert maximum economic pressure on his regime," finance minister Rishi Sunak said in a statement.— CNN\'s Rob North contributed to this article. 
\n- New York (CNN Business)Citigroup, the major US bank with the biggest footprint in Russia, said Monday it will expand its exit from Russia to go beyond the long-planned sale of its consumer bank there.Citi said in a statement it has decided to "expand the scope" of its exit process to include "other lines of business and continue to reduce our remaining operations and exposure" in Russia."Due to the nature of banking and financial services operations, this decision will take time to execute," Citi said, adding that it is "moving with urgency" to complete the assessment of its Russia operations.Here are the companies pulling back from RussiaCiti did not detail specifically which operations will be unwound, but the bank provides investment and corporate banking services in Russia to institutions and high net-worth individuals. In April 2021, Citi disclosed its intent to exit its consumer business in Russia.Citi\'s latest announcement comes after Goldman Sachs and JPMorgan announced last week they will leave Russia. Deutsche Bank on Friday said it will get out of Russia, reversing its previous position.Read MoreCiti has nearly $10 billion worth of exposure to Russia as of the end of last year, according to regulatory filings. "We will continue to manage our existing regulatory commitments and our obligations to depositors, as well as support all of our employees during this very difficult time," Citi said Monday.\n- Japanese authorities ordered crypto exchanges on Monday not to process transactions involving crypto assets subject to asset-freeze sanctions against Russia and Belarus over the war in Ukraine.The step was taken after a Group of Seven (G7) statement on Friday that said Western nations "will impose costs on illicit Russian actors using digital assets to enhance and transfer their wealth."There are growing concerns among G7 advanced economies that cryptocurrencies are being used by Russian entities as a loophole for financial sanctions imposed upon the country for invading Ukraine.The US Treasury Department issued new guidance on Friday that required US-based cryptocurrency firms not to engage in transactions with sanction targets."We decided to make an announcement to keep the G7 momentum alive," said a senior official at Japan\'s Financial Services Agency. "The sooner the better."Read MoreCoinbase, Binance resist calls to kick Russians off crypto platformsThe Japanese government will strengthen measures against the transfer of funds using crypto assets that would violate the sanctions, the FSA and the Ministry of Finance said in a joint statement.Japan has lagged a global shift among financial regulators in setting stricter rules on private digital currencies, while the G7 rich powers and the Group of 20 powerhouses have all called for greater regulation of "stablecoins."Unauthorized payments to targets under sanctions, including through crypto assets, are subject to punishment of up to three years in prison or a 1 million yen ($8,487.52) fine, the FSA said on Monday.There were 31 crypto exchanges in Japan as of March 4, according to an industry association.Global regulators remain concerned about the safety of the new market for investors, given its surge in popularity. The U.S. Securities and Exchange Commission has cited the potential for market manipulation as one of the primary reasons for rejecting several applications for spot bitcoin exchange-traded funds.'
I have tweaked the method nahidOrg_answer_by_gpt35_turbo from Tutorial-4 and Tutorial-5 slightly for this tutorial so that I can reuse one method for all three GPT models. The method is the same except for the new model parameter, where I pass the name of the model I want to use to generate the answer. Let's see the method first.
client = OpenAI(api_key='sk-proj-XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX')  # replace with your own API key
def nahidOrg_answer_by_gpt_models(model, question, contexts, max_tokens):
prompt = f"""Answer the question based on the provided context. If the context doesn't contain the answer, say "Sorry Nahid! I don't have enough information to answer that."
**Question:** {question}
**Context:** {contexts}
"""
messages = [
{"role": "system", "content": "You are an intelligent assistant that answers questions based on the provided context."},
{"role": "user", "content": prompt}
]
response = client.chat.completions.create(
model=model,
messages=messages,
max_tokens=max_tokens,
temperature=0.7
)
return response.choices[0].message.content
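One practical note: chat-completion calls can fail transiently under rate limits or network hiccups. If you run many questions, a simple retry wrapper like the sketch below can help (the wrapper name and backoff values are mine, assuming the openai v1 client used above):

import time
from openai import APIConnectionError, RateLimitError

def nahidOrg_answer_with_retry(model, question, contexts, max_tokens, retries=3):
    for attempt in range(retries):
        try:
            return nahidOrg_answer_by_gpt_models(model, question, contexts, max_tokens)
        except (RateLimitError, APIConnectionError):
            if attempt == retries - 1:
                raise                 # give up after the final attempt
            time.sleep(2 ** attempt)  # simple exponential backoff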
Now, let's generate an answer using the gpt-3.5-turbo model and see how it performs.
answer_gpt35 = nahidOrg_answer_by_gpt_models("gpt-3.5-turbo", question, final_retrieved_contexts, 250)
answer_gpt35
'The estimated financial impact on products due to the ban on Russian steel imports is 3.3 billion euros ($3.6 billion), according to the Commission.'
Not bad: a highly accurate and satisfactory answer from the GPT-3.5 model. Now, let's generate an answer using the gpt-4o model and see how it performs.
answer_gpt4o = nahidOrg_answer_by_gpt_models("gpt-4o", question, final_retrieved_contexts, 250)
answer_gpt4o
'The ban on Russian steel imports is estimated to affect 3.3 billion euros ($3.6 billion) worth of products, according to the Commission.'
Semantically the same answer. Now, let's see how the most expensive model, gpt-4-turbo, generates the answer.
answer_gpt4 = nahidOrg_answer_by_gpt_models("gpt-4-turbo", question, final_retrieved_contexts, 250)
answer_gpt4
'The estimated financial impact on products due to the ban on Russian steel imports is 3.3 billion euros ($3.6 billion), according to the Commission.'
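Before judging the three answers by eye, you can quantify how close they are with the same MiniLM embeddings; cosine similarities near 1 confirm the answers are semantically almost identical (a quick sketch):

# Rough check: pairwise cosine similarity between the three GPT answers.
answers = [answer_gpt35, answer_gpt4o, answer_gpt4]
vecs = np.array([nahidOrg_generate_embeddings(a) for a in answers])
vecs = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)  # unit-normalize rows
print(np.round(vecs @ vecs.T, 3))                          # similarity matrix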
Answers from all three GPT models are almost the same (syntactically and semantically) at the same temperature (0.7). However, I personally did not like the answer generated by the gpt-4-turbo model: since it is several times more expensive than gpt-3.5-turbo, I was expecting a more robust answer. Now, let's see how the Stable LM 2 Chat 1.6B (stablelm-2-1_6b-chat) model, a 1.6-billion-parameter open-access model by Stability AI, performs. For further information, read the model documentation here: https://huggingface.co/stabilityai/stablelm-2-1_6b-chat. There is a 12-billion-parameter variant of the same model, but on my computer it takes considerably longer to generate an answer; comparing the generation times of the two, I find the 1.6B version sufficient for this QA task. Now, let's load the model and tokenizer of the stablelm-2-1_6b-chat model.
tokenizer_stablelm = AutoTokenizer.from_pretrained('stabilityai/stablelm-2-1_6b-chat')
model_stablelm = AutoModelForCausalLM.from_pretrained('stabilityai/stablelm-2-1_6b-chat',device_map="auto")
model_stablelm.generation_config.pad_token_id = tokenizer_stablelm.pad_token_id
The method nahidOrg_answer_by_stablelm below takes four parameters: temperature, question, contexts, and max_tokens. I added temperature because I have seen that the Stable LM 2 model generates highly diverse answers at different temperatures. In general, a lower temperature makes a model favour conservative, high-probability tokens, while a higher temperature flattens the sampling distribution, producing more diverse but less predictable output. I have tried several temperatures and was impressed by the model's performance; for this tutorial, I will demonstrate temperatures of 0.5 and 0.7. Let's see.
def nahidOrg_answer_by_stablelm(temperature, question, contexts, max_tokens):
    prompt_content = f"""Answer the question directly based on the provided context. If the context doesn't contain the answer, say "Sorry Nahid! I don't have enough information to answer that."
**Question:** {question}
**Context:** {contexts}
"""
    prompt = [{'role': 'user', 'content': prompt_content}]
    inputs = tokenizer_stablelm.apply_chat_template(
        prompt,
        add_generation_prompt=True,
        return_tensors='pt'
    )
    tokens = model_stablelm.generate(
        inputs.to(model_stablelm.device),
        max_new_tokens=max_tokens,  # cap on generated tokens
        temperature=temperature,
        do_sample=True
    )
    # Decode only the newly generated tokens and drop the end-of-turn marker
    output = tokenizer_stablelm.decode(tokens[:, inputs.shape[-1]:][0], skip_special_tokens=False)
    output = output.replace("<|im_end|>", "").strip()
    return output
Now, let's see how Stable LM 2 Chat performs. First, test it with a temperature of 0.5.
answer_stablelm_0_5 = nahidOrg_answer_by_stablelm(0.5, question, final_retrieved_contexts, 250)
answer_stablelm_0_5
'According to the European Commission, the ban on Russian steel imports, which is estimated to affect 3.3 billion euros ($3.6 billion) worth of products, will have an impact on products. However, EU companies will still be able to buy oil and gas from the affected oil majors.'
I would say this is a slightly better answer than those from the GPT models. Let's see the answer with a temperature of 0.7.
answer_stablelm_0_7 = nahidOrg_answer_by_stablelm(0.7, question, final_retrieved_contexts, 250)
answer_stablelm_0_7
'According to the European Commission, the ban on Russian steel imports, which is estimated to affect 3.3 billion euros ($3.6 billion) worth of products, is one of the measures included in the new sanctions against Russia for its invasion of Ukraine.'
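If you want to probe this temperature sensitivity yourself, a small sweep is easy to run (the temperature values are illustrative):

for temp in (0.3, 0.5, 0.7, 0.9):  # illustrative values
    ans = nahidOrg_answer_by_stablelm(temp, question, final_retrieved_contexts, 250)
    print(f"Temperature {temp}: {ans}\n")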
I know my final verdict is controversial and actually depends on many other factors. However, I would say Stable LM 2 Chat's answers are a bit more comprehensive than those of all three GPT models. This is the end of this tutorial. I encourage you to explore other pretrained LLMs to achieve even better answers, and let me know through the message field below if you find more robust open-access models for the same task. Thanks for visiting nahid.org.