Langchain-Evaluation Framework

https://medium.com/@rahulpant.me/langchain-evaluation-framework-8ac1a95c9050

In this blog, we will look at how Langchain can be used to evaluate LLM-generated responses. A few months ago Langchain released LangSmith, a platform that makes this much simpler for organizations by providing rich features for debugging and evaluation.

We will focus on the open-source Langchain functions in our notebook (not LangSmith). The overall workflow is as follows:

Load the data using the CSVLoader() function
Use OpenAI's gpt-3.5-turbo as the large language model for all steps
Vectorize and index our data in an in-memory DocArray vector store
Answer questions using the Langchain RetrievalQA chain
For evaluation, generate Q&A pairs using the Langchain QAGenerateChain
Finally, evaluate the model performance using the Langchain QAEvalChain

The OpenAI API key has already been set as an environment variable before running this notebook.
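A quick way to confirm that assumption is to check the environment before any OpenAI call is made. The sketch below assumes the key lives under the conventional OPENAI_API_KEY variable name; the helper has_openai_key is a made-up name for illustration, not part of Langchain.

```python
import os

def has_openai_key(env=os.environ):
    """Return True if a non-empty OPENAI_API_KEY is present in the given mapping."""
    return bool(env.get("OPENAI_API_KEY", "").strip())

# Warn early instead of failing later inside a chain call
if not has_openai_key():
    print("OPENAI_API_KEY is not set; the OpenAI calls below will fail.")
```

Passing the mapping as a parameter keeps the check easy to try with a plain dict instead of the real environment.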

1. Load the required Libraries

# For loading data and creating the vector store
from langchain.indexes import VectorstoreIndexCreator
from langchain.document_loaders import CSVLoader
from langchain_community.vectorstores import DocArrayInMemorySearch
from langchain_openai import OpenAIEmbeddings

# For displaying results in the notebook
from IPython.display import display, Markdown

# For the LLM, the retrieval chain and Q&A example generation
from langchain_openai import ChatOpenAI
from langchain.chains import RetrievalQA
from langchain.evaluation.qa import QAGenerateChain

2. Load the required dataset

file = 'OutdoorClothingCatalog_1000.csv'
loader = CSVLoader(file_path=file)
data = loader.load()

3. Create the Index (Vector Store)

We create an in-memory vector store (index) over the documents produced by the loader (the loader variable) and search it using their vector representations. This allows efficient retrieval of the documents most relevant to a search query.

index = VectorstoreIndexCreator(
    vectorstore_cls=DocArrayInMemorySearch,
    embedding=OpenAIEmbeddings(),  # passed explicitly rather than relying on a default
).from_loaders([loader])
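To make the idea of searching by vector representation concrete, here is a minimal, self-contained sketch. It is not how DocArrayInMemorySearch or the OpenAI embeddings work internally: the bag-of-words embed function, the cosine helper and the three sample strings are stand-ins chosen only for illustration.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: word counts (real embeddings are dense float vectors)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def search(query, docs, k=2):
    """Return the k documents whose vectors are closest to the query vector."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

docs = [
    "Women's Campside Oxfords canvas shoe",
    "Ultra-Lofty 850 Stretch Down Hooded Jacket",
    "Cozy Comfort Pullover Set with side pockets",
]
top = search("hooded down jacket", docs, k=1)
print(top)
```

The real index does the same thing in spirit: embed every document once, embed the query at question time, and return the nearest neighbours.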

4. Retrieval Chain

Create a RetrievalQA object that leverages the above vector store index for retrieval and the LLM for answering questions.

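# The model named in the workflow above
llm_model = "gpt-3.5-turbo"

# temperature=0 keeps the answers as deterministic as possible
llm = ChatOpenAI(temperature=0, model=llm_model)
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=index.vectorstore.as_retriever(),
    verbose=True,
    chain_type_kwargs={
        # string placed between the retrieved documents in the prompt
        "document_separator": "<<<<>>>>>"
    }
)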

We use the “stuff” method above, which stuffs all the retrieved documents into a single prompt and therefore needs only one call to the LLM. When the documents are too large to fit into one prompt, this approach may not work and we should explore other chain types such as map_reduce, refine and map_rerank.
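As a rough illustration of what "stuffing" means, the sketch below joins the retrieved documents with the same separator and builds one prompt from them. The function build_stuff_prompt is hypothetical and only mimics the layout visible in the debug trace later in this post; it is not Langchain's internal implementation.

```python
def build_stuff_prompt(question, docs, separator="<<<<>>>>>"):
    """Concatenate all documents into one context block and append the question."""
    context = separator.join(docs)
    return (
        "Use the following pieces of context to answer the user's question.\n"
        "----------------\n"
        f"{context}\n"
        f"Human: {question}"
    )

docs = [
    "name: Women's Campside Oxfords\nSpecs: Approx. weight: 1 lb.1 oz. per pair.",
    "name: Women's Trail Model 4 All-Weather Hiking Shoes\nApproximate weight: 1 lb. 13 oz.",
]
prompt = build_stuff_prompt("What is the weight of the Women's Campside Oxfords?", docs)
print(prompt)
```

Because every document ends up in the same prompt, the prompt length grows with the number and size of the retrieved documents, which is exactly why this method stops being practical for large inputs.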

5. Create some example data for Q&A and generate more examples in the same format using QAGenerateChain

### Hard Coded Examples
examples = [
    {'qa_pairs':
        {"query": "Do the Cozy Comfort Pullover Set\
        have side pockets?",
        "answer": "Yes"
        }
    },
    {'qa_pairs':
         {"query": "What collection is the Ultra-Lofty \
        850 Stretch Down Hooded Jacket from?",
        "answer": "The DownTek collection"
         }
    }
]


example_gen_chain = QAGenerateChain.from_llm(ChatOpenAI())

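# auto generated Q/A for the first three documents
new_examples = example_gen_chain.apply_and_parse(
    [{"doc": t} for t in data[:3]]
)

# Add the generated pairs to the hard-coded ones so that
# examples[2] onwards refer to the auto-generated questions
examples += new_examples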
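The "parse" half of apply_and_parse turns the model's free-text reply into a dictionary shaped like the hard-coded examples. A minimal sketch of that step follows, assuming the model replies in a QUESTION:/ANSWER: layout; the regular expression and the parse_qa helper are illustrative assumptions, not a copy of Langchain's parser.

```python
import re

# Assumed reply layout: "QUESTION: ...", then a newline, then "ANSWER: ..."
QA_PATTERN = re.compile(r"QUESTION:\s*(.*?)\n+ANSWER:\s*(.*)", re.DOTALL)

def parse_qa(text):
    """Extract one question/answer pair and wrap it in the 'qa_pairs' shape."""
    match = QA_PATTERN.search(text)
    if match is None:
        raise ValueError("no QUESTION/ANSWER pair found")
    return {
        "qa_pairs": {
            "query": match.group(1).strip(),
            "answer": match.group(2).strip(),
        }
    }

raw = (
    "QUESTION: What is the approximate weight of the Women's Campside Oxfords per pair?\n"
    "ANSWER: The approximate weight is 1 lb. 1 oz."
)
parsed = parse_qa(raw)
print(parsed["qa_pairs"]["query"])
```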

6. Manual Evaluation of LLM

import langchain
langchain.debug = True
qa.run(examples[2]['qa_pairs']["query"])

[chain/start] [1:chain:RetrievalQA] Entering Chain run with input:
{
  "query": "According to the document, what is the approximate weight of the Women's Campside Oxfords per pair?"
}
[chain/start] [1:chain:RetrievalQA > 3:chain:StuffDocumentsChain] Entering Chain run with input:
[inputs]
[chain/start] [1:chain:RetrievalQA > 3:chain:StuffDocumentsChain > 4:chain:LLMChain] Entering Chain run with input:
{
  "question": "According to the document, what is the approximate weight of the Women's Campside Oxfords per pair?",
  "context": ": 0\nname: Women's Campside Oxfords\ndescription: This ultracomfortable lace-to-toe Oxford boasts a super-soft canvas, thick cushioning, and quality construction for a broken-in feel from the first time you put them on. \n\nSize & Fit: Order regular shoe size. For half sizes not offered, order up to next whole size. \n\nSpecs: Approx. weight: 1 lb.1 oz. per pair. \n\nConstruction: Soft canvas material for a broken-in feel and look. Comfortable EVA innersole with Cleansport NXT® antimicrobial odor control. Vintage hunt, fish and camping motif on innersole. Moderate arch contour of innersole. EVA foam midsole for cushioning and support. Chain-tread-inspired molded rubber outsole with modified chain-tread pattern. Imported. \n\nQuestions? Please contact us for any inquiries.<<<<>>>>>: 299\nname: Women's Trail Model 4 All-Weather Hiking Shoes\ndescription: Supercomfortable lightweight hikers with a built-in waterproof membrane are versatile enough for casual wear and a wide variety of outdoor adventures. These shoes feature 's exclusive VertiGrip outsole which provides excellent traction on a variety of surfaces, as well as a suede-and-fabric upper with a waterproof TEK2.5® barrier to keep feet dry. Cushioned EVA midsole and removable footbed provide noticeable comfort right out of the box, and heel-and-toe bumpers add durability. Approximate weight: 1 lb. 13 oz. Imported.<<<<>>>>>: 846\nname: Women's Katahdin Hiking Sneakers, Nubuck Mesh\ndescription: These lightweight, breathable retro hikers are perfect for day hiking on gentle terrain or city streets. \n\nSize & Fit: Order regular shoe size. Size 10 1/2 wearers, order size 11.\n\nSpecs: Approx. Weight: 2 lb. 4 oz. per pair. Heel Height: 2½\".\n\nConstruction: Upper is full-grain nubuck leather and polyester mesh. Polyester mesh lining with odor and moisture control. Removable, mesh-covered EVA insole. Moderate arch contour provides excellent support. Euro-hiker-inspired stacked EVA midsole. 
Our exclusive VertaGrip rubber lugged outsole for traction. Padded collar and tongue for comfort. Perforations in tongue and sides add breathability. Nylon webbing pull-on loop at back heel. Thermoplastic toe box and heel counter. Imported.\n\nQuestions? Contact customer service for more information.<<<<>>>>>: 514\nname: Women's Trail Model 4 All-Weather Hiking Shoes\ndescription: Supercomfortable lightweight hikers with a built-in waterproof membrane.\n\nSpecs: Approx. weight: 1 lb. 13 oz.\n\nConstruction: 's exclusive VertiGrip outsole provides excellent traction on a variety of surfaces. Suede-and-fabric upper with a waterproof TEK2.5® barrier keeps feet dry. Cushioned EVA midsole and removable footbed provide noticeable comfort right out of the box. Heel-and-toe bumpers add durability.\n\nAdditional Features: Versatile enough for casual wear and a wide variety of outdoor adventures. Imported.\n\nQuestions? Please contact us for more information."
}
[llm/start] [1:chain:RetrievalQA > 3:chain:StuffDocumentsChain > 4:chain:LLMChain > 5:llm:ChatOpenAI] Entering LLM run with input:
{
  "prompts": [
    "System: Use the following pieces of context to answer the user's question. \nIf you don't know the answer, just say that you don't know, don't try to make up an answer.\n----------------\n: 0\nname: Women's Campside Oxfords\ndescription: This ultracomfortable lace-to-toe Oxford boasts a super-soft canvas, thick cushioning, and quality construction for a broken-in feel from the first time you put them on. \n\nSize & Fit: Order regular shoe size. For half sizes not offered, order up to next whole size. \n\nSpecs: Approx. weight: 1 lb.1 oz. per pair. \n\nConstruction: Soft canvas material for a broken-in feel and look. Comfortable EVA innersole with Cleansport NXT® antimicrobial odor control. Vintage hunt, fish and camping motif on innersole. Moderate arch contour of innersole. EVA foam midsole for cushioning and support. Chain-tread-inspired molded rubber outsole with modified chain-tread pattern. Imported. \n\nQuestions? Please contact us for any inquiries.<<<<>>>>>: 299\nname: Women's Trail Model 4 All-Weather Hiking Shoes\ndescription: Supercomfortable lightweight hikers with a built-in waterproof membrane are versatile enough for casual wear and a wide variety of outdoor adventures. These shoes feature 's exclusive VertiGrip outsole which provides excellent traction on a variety of surfaces, as well as a suede-and-fabric upper with a waterproof TEK2.5® barrier to keep feet dry. Cushioned EVA midsole and removable footbed provide noticeable comfort right out of the box, and heel-and-toe bumpers add durability. Approximate weight: 1 lb. 13 oz. Imported.<<<<>>>>>: 846\nname: Women's Katahdin Hiking Sneakers, Nubuck Mesh\ndescription: These lightweight, breathable retro hikers are perfect for day hiking on gentle terrain or city streets. \n\nSize & Fit: Order regular shoe size. Size 10 1/2 wearers, order size 11.\n\nSpecs: Approx. Weight: 2 lb. 4 oz. per pair. Heel Height: 2½\".\n\nConstruction: Upper is full-grain nubuck leather and polyester mesh. 
Polyester mesh lining with odor and moisture control. Removable, mesh-covered EVA insole. Moderate arch contour provides excellent support. Euro-hiker-inspired stacked EVA midsole. Our exclusive VertaGrip rubber lugged outsole for traction. Padded collar and tongue for comfort. Perforations in tongue and sides add breathability. Nylon webbing pull-on loop at back heel. Thermoplastic toe box and heel counter. Imported.\n\nQuestions? Contact customer service for more information.<<<<>>>>>: 514\nname: Women's Trail Model 4 All-Weather Hiking Shoes\ndescription: Supercomfortable lightweight hikers with a built-in waterproof membrane.\n\nSpecs: Approx. weight: 1 lb. 13 oz.\n\nConstruction: 's exclusive VertiGrip outsole provides excellent traction on a variety of surfaces. Suede-and-fabric upper with a waterproof TEK2.5® barrier keeps feet dry. Cushioned EVA midsole and removable footbed provide noticeable comfort right out of the box. Heel-and-toe bumpers add durability.\n\nAdditional Features: Versatile enough for casual wear and a wide variety of outdoor adventures. Imported.\n\nQuestions? Please contact us for more information.\nHuman: According to the document, what is the approximate weight of the Women's Campside Oxfords per pair?"
  ]
}
[llm/end] [1:chain:RetrievalQA > 3:chain:StuffDocumentsChain > 4:chain:LLMChain > 5:llm:ChatOpenAI] [1.06s] Exiting LLM run with output:
{
  "generations": [
    [
      {
        "text": "The approximate weight of the Women's Campside Oxfords per pair is 1 lb. 1 oz.",
        "generation_info": {
          "finish_reason": "stop",
          "logprobs": null
        },
        "type": "ChatGeneration",
        "message": {
          "lc": 1,
          "type": "constructor",
          "id": [
            "langchain",
            "schema",
            "messages",
            "AIMessage"
          ],
          "kwargs": {
            "content": "The approximate weight of the Women's Campside Oxfords per pair is 1 lb. 1 oz.",
            "additional_kwargs": {}
          }
        }
      }
    ]
  ],
  "llm_output": {
    "token_usage": {
      "completion_tokens": 23,
      "prompt_tokens": 755,
      "total_tokens": 778
    },
    "model_name": "gpt-3.5-turbo-0301",
    "system_fingerprint": null
  },
  "run": null
}
[chain/end] [1:chain:RetrievalQA > 3:chain:StuffDocumentsChain > 4:chain:LLMChain] [1.06s] Exiting Chain run with output:
{
  "text": "The approximate weight of the Women's Campside Oxfords per pair is 1 lb. 1 oz."
}
[chain/end] [1:chain:RetrievalQA > 3:chain:StuffDocumentsChain] [1.06s] Exiting Chain run with output:
{
  "output_text": "The approximate weight of the Women's Campside Oxfords per pair is 1 lb. 1 oz."
}
[chain/end] [1:chain:RetrievalQA] [1.46s] Exiting Chain run with output:
{
  "result": "The approximate weight of the Women's Campside Oxfords per pair is 1 lb. 1 oz."
}
"The approximate weight of the Women's Campside Oxfords per pair is 1 lb. 1 oz."

7. LLM Assisted Evaluation Step: QAEvalChain

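from langchain.evaluation.qa import QAEvalChain

# The QA chain and the evaluator expect flat {"query", "answer"} dicts,
# so unwrap the 'qa_pairs' nesting used when the examples were created
qa_examples = [ex["qa_pairs"] for ex in examples]

# Run the retrieval chain on every example; each prediction gains a "result" key
predictions = qa.apply(qa_examples)

# Grade each prediction against its reference answer using gpt-3.5-turbo
llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo")
eval_chain = QAEvalChain.from_llm(llm=llm)

graded_outputs = eval_chain.evaluate(qa_examples, predictions)

graded_outputs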
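Why hand the grading to an LLM at all? A plain string comparison is the obvious alternative, but it breaks as soon as the predicted answer is worded differently from the reference. The sketch below is a hypothetical baseline (exact_match is not a Langchain function) applied to the reference and predicted answers of the two hard-coded examples.

```python
def exact_match(reference, prediction):
    """Naive grader: compare answers after lowercasing and trimming whitespace/periods."""
    def normalize(s):
        return " ".join(s.lower().split()).rstrip(".")
    return normalize(reference) == normalize(prediction)

pairs = [
    ("Yes", "The Cozy Comfort Pullover Set, Stripe does have side pockets."),
    ("The DownTek collection",
     "The Ultra-Lofty 850 Stretch Down Hooded Jacket is from the DownTek collection."),
]
for reference, prediction in pairs:
    print(exact_match(reference, prediction))
```

Both comparisons come out False even though a human reader would accept both predictions, which is the gap that a semantic grader such as QAEvalChain is meant to close.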

8. Print the result

for i,eg in enumerate(examples):
    print(f"Example {i}:")
    print("Question: " + predictions[i]['query'])
    print("Real Answer: " + predictions[i]['answer'])
    print("Predicted Answer: " + predictions[i]['result'])
    print("Grade: " + graded_outputs[i]['results'])
    print()

OUTPUT

Example 0:
Question: Do the Cozy Comfort Pullover Set        have side pockets?
Real Answer: Yes
Predicted Answer: The Cozy Comfort Pullover Set, Stripe does have side pockets.
Grade: CORRECT

Example 1:
Question: What collection is the Ultra-Lofty         850 Stretch Down Hooded Jacket from?
Real Answer: The DownTek collection
Predicted Answer: The Ultra-Lofty 850 Stretch Down Hooded Jacket is from the DownTek collection.
Grade: CORRECT

Example 2:
Question: According to the document, what is the approximate weight of the Women's Campside Oxfords per pair?
Real Answer: The approximate weight of the Women's Campside Oxfords per pair is 1 lb. 1 oz.
Predicted Answer: The approximate weight of the Women's Campside Oxfords per pair is 1 lb. 1 oz.
Grade: CORRECT
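With more than a handful of examples it is handy to collapse the individual grades into a single score. The sketch below assumes each graded output is a dict whose "results" value starts with CORRECT or INCORRECT, as in the output above; the accuracy helper and the sample list are illustrative and simply mirror the three grades shown.

```python
def accuracy(graded_outputs, key="results"):
    """Fraction of graded outputs whose grade starts with CORRECT."""
    grades = [g[key].strip().upper() for g in graded_outputs]
    correct = sum(grade.startswith("CORRECT") for grade in grades)
    return correct / len(grades) if grades else 0.0

# Mirrors the three grades printed above
sample = [{"results": "CORRECT"}, {"results": "CORRECT"}, {"results": "CORRECT"}]
print(f"Accuracy: {accuracy(sample):.0%}")  # prints "Accuracy: 100%"
```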