Cricket
1
I need to write a LangChain agent to collect cricket team players from the ESPN website.
2
Here’s how you can create a second LangChain agent to fetch player statistics from your private SQLite database and save the results to a CSV file:
Steps to Implement the Second Agent
1. Set Up the Environment
Ensure you have the necessary Python packages installed:
pip install langchain pandas
(sqlite3 ships with Python's standard library, so it does not need a separate install.)
2. Define the SQLite DB Query Function
Write a function to query the SQLite database for player stats.
import sqlite3
import pandas as pd
def fetch_player_stats(player_name, db_path):
    """
    Fetch stats for a given player from the SQLite database.
    """
    conn = sqlite3.connect(db_path)
    query = "SELECT * FROM player_stats WHERE name = ?"
    stats = pd.read_sql_query(query, conn, params=(player_name,))
    conn.close()
    return stats
3. Create a LangChain Tool for Fetching Stats
Define a tool that integrates the above function.
4. Initialize the Second Agent
Set up another agent to handle database queries.
5. Fetch Stats for Each Player and Save to CSV
Use the agent to fetch stats for all players and save them to a CSV file.
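Stripped of the agent wrapper, the core of steps 2–5 looks like the sketch below. The player_stats schema (name, matches, runs) and the sample rows are assumptions made for the demo:

```python
import sqlite3
import pandas as pd

def fetch_player_stats(player_name, conn):
    """Return a (possibly empty) DataFrame of stats for one player."""
    query = "SELECT * FROM player_stats WHERE name = ?"
    return pd.read_sql_query(query, conn, params=(player_name,))

def save_all_stats(player_names, conn, csv_path):
    """Fetch stats for every player and write them to a single CSV."""
    frames = [fetch_player_stats(name, conn) for name in player_names]
    stats = pd.concat(frames, ignore_index=True)
    stats.to_csv(csv_path, index=False)
    return stats

# Demo with an in-memory database (hypothetical schema: name, matches, runs)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE player_stats (name TEXT, matches INTEGER, runs INTEGER)")
conn.executemany(
    "INSERT INTO player_stats VALUES (?, ?, ?)",
    [("V Kohli", 254, 12169), ("R Sharma", 243, 9825)],
)
conn.commit()

stats_df = save_all_stats(["V Kohli", "R Sharma"], conn, "player_stats.csv")
print(list(stats_df["name"]))  # ['V Kohli', 'R Sharma']
```

A player missing from the table simply contributes an empty frame, so the CSV still gets written for the players that were found.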
6. Combine Both Agents
First, use the first agent to collect player names.
Then, pass these names to the second agent to fetch stats and save to a CSV.
Key Considerations
Database Schema: Ensure the database schema matches the query structure in the fetch_player_stats function.
Error Handling: Handle cases where stats for a player might not exist in the database.
Optimization: If fetching stats for multiple players, consider batching queries to improve performance.
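On the batching point, a single IN query can replace N per-player queries. A minimal sketch, again assuming the hypothetical player_stats schema:

```python
import sqlite3
import pandas as pd

def fetch_stats_batched(player_names, conn):
    """Fetch stats for many players in one query instead of one query each."""
    placeholders = ", ".join("?" for _ in player_names)
    query = f"SELECT * FROM player_stats WHERE name IN ({placeholders})"
    return pd.read_sql_query(query, conn, params=list(player_names))

# Demo against an in-memory database (hypothetical schema)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE player_stats (name TEXT, runs INTEGER)")
conn.executemany("INSERT INTO player_stats VALUES (?, ?)",
                 [("A", 100), ("B", 200), ("C", 300)])
conn.commit()

df = fetch_stats_batched(["A", "C"], conn)
print(sorted(df["name"]))  # ['A', 'C']
```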
This setup integrates LangChain agents with your SQLite database and automates the process of retrieving and exporting player statistics.
3
Run these agents together
To run these agents together, you can design a pipeline where:
The First Agent scrapes player names from ESPN.
The Second Agent fetches statistics for each player from your SQLite database.
The results are saved into a CSV file.
Below is the combined implementation:
Complete Implementation
1. Import Required Libraries
2. First Agent: Scrape Player Names
Define a function and tool to scrape player names from ESPN.
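The scraper depends on ESPN's live markup, so here is only the parsing half of fetch_team_players, run on a hard-coded HTML snippet. The div.player / span.name / span.role selectors are placeholders that must be adapted to the real page structure:

```python
from bs4 import BeautifulSoup

def parse_team_players(html):
    """Extract {name, role} dicts from team-page HTML."""
    soup = BeautifulSoup(html, "html.parser")
    players = []
    for row in soup.select("div.player"):  # placeholder selector
        name = row.select_one("span.name")
        role = row.select_one("span.role")
        if name is not None:
            players.append({
                "name": name.get_text(strip=True),
                "role": role.get_text(strip=True) if role else "N/A",
            })
    return players

# Demo on a sample snippet (not ESPN's actual markup)
sample_html = """
<div class="player"><span class="name">V Kohli</span><span class="role">Batter</span></div>
<div class="player"><span class="name">J Bumrah</span><span class="role">Bowler</span></div>
"""
players = parse_team_players(sample_html)
print(players)
```

In the full tool, fetch_team_players would download the team page for a given URL and pass the response body to this parser.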
3. Second Agent: Fetch Player Stats from SQLite DB
Define the function and tool to fetch player statistics.
4. Initialize Agents
Set up both agents with their respective tools.
5. Pipeline Execution
Combine both agents in a single pipeline.
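The pipeline shape can be sketched with plain functions standing in for the two agent-wrapped tools; the stub scraper, schema, and sample rows below are all demo assumptions:

```python
import os
import sqlite3
import tempfile
import pandas as pd

def run_pipeline(team_url, db_path, csv_path, scrape_fn, stats_fn):
    """Agent 1 (scrape_fn) feeds names to Agent 2 (stats_fn); results go to CSV."""
    player_names = scrape_fn(team_url)
    conn = sqlite3.connect(db_path)
    frames = [stats_fn(name, conn) for name in player_names]
    conn.close()
    combined = pd.concat(frames, ignore_index=True)
    combined.to_csv(csv_path, index=False)
    return combined

# Demo: seed a temporary database (hypothetical schema)
workdir = tempfile.mkdtemp()
db_path = os.path.join(workdir, "stats.db")
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE player_stats (name TEXT, runs INTEGER)")
conn.executemany("INSERT INTO player_stats VALUES (?, ?)",
                 [("V Kohli", 12169), ("J Bumrah", 35)])
conn.commit()
conn.close()

def fake_scraper(url):
    # Stands in for the ESPN scraping tool
    return ["V Kohli", "J Bumrah"]

def fetch_stats(name, conn):
    return pd.read_sql_query("SELECT * FROM player_stats WHERE name = ?",
                             conn, params=(name,))

result = run_pipeline("https://example.com/team", db_path,
                      os.path.join(workdir, "players.csv"),
                      fake_scraper, fetch_stats)
print(len(result))  # 2
```

Swapping the stubs for the real agent calls leaves the pipeline structure unchanged.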
6. Run the Pipeline
Provide the ESPN team URL and the desired CSV file name.
Explanation
Agent 1 (Player Names):
Scrapes player names and roles from ESPN’s website using BeautifulSoup.
Provides the list of player names to the pipeline.
Agent 2 (Player Stats):
Fetches statistics for each player from the SQLite database using Pandas.
Collects all player stats into a DataFrame.
Pipeline:
Combines the output of both agents.
Saves the final player statistics into a CSV file.
Key Notes
Replace team_url and db_path with actual URLs and file paths.
Adjust the HTML selectors in fetch_team_players based on ESPN’s website structure.
Make sure your SQLite database has the required schema (player_stats table with relevant columns).
Handle large datasets efficiently by batching queries or optimizing database access.
4
Create a third agent to get the image URL for each player.
The API returns an image URL for a given player name.
While dumping into the CSV, add an extra column for the image URL.
Combine 3 in the pipeline again
To incorporate a third agent for fetching player image URLs into the pipeline, you can design it as follows:
Steps to Add the Third Agent
1. Define the Image URL Fetching Function
Assume there's an API endpoint that provides player image URLs based on player names. Here's a sample implementation:
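One possible shape for that function is below. The endpoint, the name parameter, and the {"image_url": ...} response format are all assumptions; the HTTP transport is injectable so the demo runs without a network call:

```python
import requests

def fetch_image_url(player_name,
                    api_base="https://api.example.com/player-image",  # hypothetical
                    get=requests.get):
    """Return the player's image URL, or 'N/A' on any failure."""
    try:
        resp = get(api_base, params={"name": player_name}, timeout=10)
        resp.raise_for_status()
        return resp.json().get("image_url", "N/A")
    except Exception:
        return "N/A"

# Demo with a stubbed transport (no network call)
class _FakeResponse:
    def raise_for_status(self):
        pass
    def json(self):
        return {"image_url": "https://img.example.com/kohli.png"}

url = fetch_image_url("V Kohli", get=lambda *a, **kw: _FakeResponse())
print(url)  # https://img.example.com/kohli.png
```

Falling back to "N/A" on any error keeps the CSV step from crashing when a player has no image.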
2. Create a LangChain Tool for Image URL Fetching
Integrate the above function into a LangChain tool.
3. Initialize the Third Agent
Create the agent for fetching player image URLs.
4. Update the Pipeline
Modify the pipeline to integrate all three agents.
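The merge step of the updated pipeline can be sketched with plain functions standing in for the three agents; the stat columns and stub values here are illustrative only:

```python
import csv
import os
import tempfile

def run_full_pipeline(players, fetch_stats, fetch_image, csv_path):
    """Combine agent outputs: one CSV row per player, plus an image_url column."""
    rows = []
    for player in players:
        row = {"name": player["name"], "role": player.get("role", "N/A")}
        row.update(fetch_stats(player["name"]) or {})            # agent 2
        row["image_url"] = fetch_image(player["name"]) or "N/A"  # agent 3
        rows.append(row)
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return rows

# Demo with stubbed agents
players = [{"name": "V Kohli", "role": "Batter"}]
stats = lambda name: {"matches": 254, "runs": 12169}
image = lambda name: "https://img.example.com/kohli.png"

out_path = os.path.join(tempfile.mkdtemp(), "players.csv")
rows = run_full_pipeline(players, stats, image, out_path)
print(rows[0]["image_url"])
```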
5. Run the Combined Pipeline
Provide the ESPN team URL and the desired CSV file name.
Final CSV Output
The CSV file will contain columns like:
name: Player's name.
role: Player's role (e.g., batsman, bowler).
Additional columns from the database (e.g., matches, runs).
image_url: URL to the player's image.
Full Flow
Agent 1: Scrapes player names and roles from ESPN.
Agent 2: Fetches player stats from the SQLite database.
Agent 3: Fetches player image URLs from the external API.
Pipeline: Combines the outputs of all three agents and dumps the data into a CSV file.
Key Considerations
Ensure the API for fetching image URLs is accessible and provides consistent data.
Handle missing data gracefully by adding fallback values (e.g., "N/A").
Optimize the pipeline for large datasets by batching or parallel processing if necessary.