A Comprehensive Tutorial on the Five Levels of Agentic AI Architectures: From Basic Prompt Responses to Fully Autonomous Code Generation and Execution

In this tutorial, we explore five levels of Agentic Architectures, from the simplest language model calls to a fully autonomous code-generating system. This tutorial is designed to run seamlessly on Google Colab. Starting with a basic “simple processor” that simply echoes the model’s output, you will progressively build routing logic, integrate external tools, orchestrate multi-step workflows, and ultimately empower the model to plan, validate, refine, and execute its own Python code. Throughout each section, you’ll find detailed explanations, self-contained demo functions, and clear prompts that illustrate how to balance human control and machine autonomy in real-world AI applications.

import os
import torch
from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM
import re
import json
import time
import random
from IPython.display import clear_output

We import the core Python and third-party libraries: os and time for environment and execution control, and torch together with Hugging Face’s transformers (pipeline, AutoTokenizer, AutoModelForCausalLM) for model loading and inference. We also use re and json to parse the LLM’s structured outputs, random to generate mock tool data, and clear_output to keep the Colab interface tidy.

MODEL_NAME = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
def get_model_and_tokenizer():
    if not hasattr(get_model_and_tokenizer, "model"):
        print(f"Loading model {MODEL_NAME}...")
        tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
        model = AutoModelForCausalLM.from_pretrained(
            MODEL_NAME,
            torch_dtype=torch.float16,
            device_map="auto",
            low_cpu_mem_usage=True
        )
        get_model_and_tokenizer.model = model
        get_model_and_tokenizer.tokenizer = tokenizer
        print("Model loaded successfully!")
   
    return get_model_and_tokenizer.model, get_model_and_tokenizer.tokenizer

Here, we define MODEL_NAME to point at the TinyLlama 1.1B chat model and implement a lazy‐loading helper get_model_and_tokenizer() that downloads and initializes the tokenizer and model only once, caching them on first call to minimize overhead, and then returns the cached instances for all subsequent inference calls.
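
Because the loaded objects are cached as attributes on the function itself, only the first call pays the download and initialization cost. A quick, purely illustrative way to confirm this in Colab (not part of the original tutorial) is to time two consecutive calls:

# Illustrative check: the second call should return almost instantly thanks to the cache
start = time.time()
get_model_and_tokenizer()
print(f"First call:  {time.time() - start:.1f}s")

start = time.time()
get_model_and_tokenizer()
print(f"Second call: {time.time() - start:.4f}s")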

def generate_text(prompt, max_length=512):
    model, tokenizer = get_model_and_tokenizer()
   
    messages = [{"role": "user", "content": prompt}]
    formatted_prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
   
    inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)
   
    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_new_tokens=max_length,
            do_sample=True,
            temperature=0.7,
            top_p=0.9,
        )
   
    # Decode only the newly generated tokens so the prompt is not echoed back in the reply
    prompt_length = inputs["input_ids"].shape[1]
    response = tokenizer.decode(output[0][prompt_length:], skip_special_tokens=True)
    return response.strip()

The generate_text function wraps the TinyLlama inference workflow: it retrieves the cached model and tokenizer, formats the user prompt with the chat template (adding the generation prompt for the assistant turn), tokenizes and moves the inputs to the model’s device, then samples a response with temperature and top-p settings. After generation, it decodes only the tokens produced after the prompt, so the returned string contains just the assistant’s reply rather than an echo of the input.
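
As a quick sanity check (assuming the model download succeeds and the runtime has enough memory), you can call the helper directly before moving on to the agent levels:

# Quick smoke test: the first call also triggers the one-time model load
print(generate_text("Explain what an AI agent is in one sentence.", max_length=64))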

Level 1: Simple Processor

At the simplest level, the code defines a straightforward text-generation pipeline that treats the model purely as a language processor. When the user provides a prompt, the `simple_processor` function invokes the `generate_text` helper, which is built on the TinyLlama 1.1B chat model, to produce a free-form response. It then displays that response directly. Under the hood, `generate_text` ensures the model and tokenizer are loaded just once by caching them inside the `get_model_and_tokenizer` function, formats the prompt for the chat model, runs generation with sampling parameters for diversity, and returns only the assistant’s reply by decoding the tokens generated after the prompt. This level demonstrates the most basic interaction pattern: input is received, output is generated, and program flow remains entirely under human control.

def simple_processor(prompt):
    """Level 1: Simple Processor - Model has no impact on program flow"""
    response = generate_text(prompt)
    return response


def demo_level1():
    print("\n" + "="*50)
    print("LEVEL 1: SIMPLE PROCESSOR DEMO")
    print("="*50)
    print("At this level, the AI has no control over program flow.")
    print("It simply takes input and produces output.\n")
   
    user_input = input("Enter your question or prompt: ") or "Write a short poem about artificial intelligence."
    print("\nProcessing your request...\n")
   
    output = simple_processor(user_input)
    print("OUTPUT:")
    print("-"*50)
    print(output)
    print("-"*50)

The simple_processor function embodies the Simple Processor of our agent hierarchy by treating the model purely as a text generator; it accepts a user-provided prompt and delegates to generate_text. It returns whatever the model produces without any branching or decision logic. The accompanying demo_level1 routine provides a minimal interactive loop, printing a clear header, soliciting user input (with a sensible default), invoking simple_processor, and then displaying the raw output, showcasing the most basic prompt-to-response workflow in which the AI exerts no influence over the program’s flow.

Level 2: Router

The second level introduces conditional routing based on the model’s classification of the user’s query. The `router_agent` function first asks the model to classify a query into “technical,” “creative,” or “factual,” then normalizes the model’s response into one of those categories. Depending on which category is detected, the query is dispatched to a specialized handler, either `handle_technical_query`, `handle_creative_query`, or `handle_factual_query`, each of which wraps the user’s query in a system-style prompt tailored to the chosen tone and purpose. This routing mechanism provides the model with partial control over program flow, enabling it to guide the subsequent interaction path while still relying on human-defined handlers to generate the final output.

def router_agent(user_query):
    """Level 2: Router - Model determines basic program flow"""
   
    category_prompt = f"""Classify the following query into one of these categories:
    'technical', 'creative', or 'factual'.
   
    Query: {user_query}
   
    Return ONLY the category name and nothing else."""
   
    category_response = generate_text(category_prompt)
   
    category = category_response.lower()
    if "technical" in category:
        category = "technical"
    elif "creative" in category:
        category = "creative"
    else:
        category = "factual"
   
    print(f"Query classified as: {category}")
   
    if category == "technical":
        return handle_technical_query(user_query)
    elif category == "creative":
        return handle_creative_query(user_query)
    else:  
        return handle_factual_query(user_query)


def handle_technical_query(query):
    system_prompt = f"""You are a technical assistant. Provide detailed technical explanations.
   
    User query: {query}"""
   
    response = generate_text(system_prompt)
    return f"[Technical Response]\n{response}"


def handle_creative_query(query):
    system_prompt = f"""You are a creative assistant. Be imaginative and inspiring.
   
    User query: {query}"""
   
    response = generate_text(system_prompt)
    return f"[Creative Response]\n{response}"


def handle_factual_query(query):
    system_prompt = f"""You are a factual assistant. Provide accurate information concisely.
   
    User query: {query}"""
   
    response = generate_text(system_prompt)
    return f"[Factual Response]\n{response}"


def demo_level2():
    print("\n" + "="*50)
    print("LEVEL 2: ROUTER DEMO")
    print("="*50)
    print("At this level, the AI determines basic program flow.")
    print("It decides which processing path to take.\n")
   
    user_query = input("Enter your question or prompt: ") or "How do neural networks work?"
    print("\nProcessing your request...\n")
   
    result = router_agent(user_query)
    print("OUTPUT:")
    print("-"*50)
    print(result)
    print("-"*50)

The router_agent function implements Router behavior by first asking the model to classify the user’s query as “technical,” “creative,” or “factual,” then normalizing that classification and dispatching the query to the corresponding handler (handle_technical_query, handle_creative_query, or handle_factual_query), each of which wraps the original query in an appropriate system‐style prompt before calling generate_text. The demo_level2 routine provides a clear CLI-style interface, printing headers, accepting input (with a default), invoking router_agent, and displaying the categorized response, showcasing how the model can take basic control over program flow by choosing which processing path to follow.
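
To see the routing decision without the interactive prompt, you can also call router_agent programmatically on one query of each flavor (an illustrative check, not part of the original demo):

# Illustrative routing checks: one query per expected category
for query in [
    "How do I fix a segmentation fault in C?",   # likely "technical"
    "Write a haiku about the ocean",             # likely "creative"
    "What is the capital of France?",            # likely "factual"
]:
    print(router_agent(query))
    print("-" * 50)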

Level 3: Tool Calling

At the third level, the code empowers the model to decide which of several external tools to invoke by embedding a JSON-based function selection protocol into the prompt. The `tool_calling_agent` presents the user’s question alongside a menu of potential tools, including weather lookup, web search simulation, current date and time retrieval, or direct response, and instructs the model to respond with a valid JSON message specifying the chosen tool and its parameters. A regex then extracts the first JSON object from the model’s output, and the code safely falls back to a direct response if parsing fails. Once the tool and arguments are identified, the corresponding Python function is executed, its result is captured, and a final model call integrates that result into a coherent answer. This pattern bridges LLM reasoning with concrete code execution by letting the model orchestrate which APIs or utilities to call.

def tool_calling_agent(user_query):
    """Level 3: Tool Calling - Model determines how functions are executed"""
   
    tool_selection_prompt = f"""Based on the user query, select the most appropriate tool from the following list:
    1. get_weather: Get the current weather for a location
    2. search_information: Search for specific information on a topic
    3. get_date_time: Get current date and time
    4. direct_response: Provide a direct response without using tools
   
    USER QUERY: {user_query}
   
    INSTRUCTIONS:
    - Return your response in valid JSON format
    - Include the tool name and any required parameters
    - For get_weather, include location parameter
    - For search_information, include query and depth parameter (basic or detailed)
    - For get_date_time, include timezone parameter (optional)
    - For direct_response, no parameters needed
   
    Example output format: {{"tool": "get_weather", "parameters": {{"location": "New York"}}}}"""
   
    tool_selection_response = generate_text(tool_selection_prompt)
   
    try:
        json_match = re.search(r'({.*})', tool_selection_response, re.DOTALL)
        if json_match:
            tool_selection = json.loads(json_match.group(1))
        else:
            print("Could not parse tool selection. Defaulting to direct response.")
            tool_selection = {"tool": "direct_response", "parameters": {}}
    except json.JSONDecodeError:
        print("Invalid JSON in tool selection. Defaulting to direct response.")
        tool_selection = {"tool": "direct_response", "parameters": {}}
   
    tool_name = tool_selection.get("tool", "direct_response")
    parameters = tool_selection.get("parameters", {})
   
    print(f"Selected tool: {tool_name}")
   
    if tool_name == "get_weather":
        location = parameters.get("location", "Unknown")
        tool_result = get_weather(location)
    elif tool_name == "search_information":
        query = parameters.get("query", user_query)
        depth = parameters.get("depth", "basic")
        tool_result = search_information(query, depth)
    elif tool_name == "get_date_time":
        timezone = parameters.get("timezone", "UTC")
        tool_result = get_date_time(timezone)
    else:
        return generate_text(f"Please provide a helpful response to: {user_query}")
   
    final_prompt = f"""User Query: {user_query}
    Tool Used: {tool_name}
    Tool Result: {json.dumps(tool_result)}
   
    Based on the user's query and the tool result above, provide a helpful response."""
   
    final_response = generate_text(final_prompt)
    return final_response


def get_weather(location):
    weather_conditions = ["Sunny", "Partly cloudy", "Overcast", "Light rain", "Heavy rain", "Thunderstorms", "Snowy", "Foggy"]
    temperatures = {
        "cold": list(range(-10, 10)),
        "mild": list(range(10, 25)),
        "hot": list(range(25, 40))
    }
   
    location_hash = sum(ord(c) for c in location)
    condition_index = location_hash % len(weather_conditions)
    season = ["winter", "spring", "summer", "fall"][location_hash % 4]
   
    temp_range = temperatures["cold"] if season in ["winter", "fall"] else temperatures["hot"] if season == "summer" else temperatures["mild"]
    temperature = random.choice(temp_range)
   
    return {
        "location": location,
        "temperature": f"{temperature}°C",
        "conditions": weather_conditions[condition_index],
        "humidity": f"{random.randint(30, 90)}%"
    }


def search_information(query, depth="basic"):
    mock_results = [
        f"First result about {query}",
        f"Second result discussing {query}",
        f"Third result analyzing {query}"
    ]
   
    if depth == "detailed":
        mock_results.extend([
            f"Fourth detailed analysis of {query}",
            f"Fifth comprehensive overview of {query}",
            f"Sixth academic paper on {query}"
        ])
   
    return {
        "query": query,
        "results": mock_results,
        "depth": depth,
        "sources": [f"source{i}.com" for i in range(1, len(mock_results) + 1)]
    }


def get_date_time(timezone="UTC"):
    current_time = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime())
    return {
        "current_datetime": current_time,
        "timezone": timezone
    }


def demo_level3():
    print("\n" + "="*50)
    print("LEVEL 3: TOOL CALLING DEMO")
    print("="*50)
    print("At this level, the AI selects which tools to use and with what parameters.")
    print("It can process the results from tools to create a final response.\n")
   
    user_query = input("Enter your question or prompt: ") or "What's the weather like in San Francisco?"
    print("\nProcessing your request...\n")
   
    result = tool_calling_agent(user_query)
    print("OUTPUT:")
    print("-"*50)
    print(result)
    print("-"*50)

In the Level 3 implementation, the tool_calling_agent function prompts the model to choose among a predefined set of utilities, such as weather lookup, mock web search, or date/time retrieval, by returning a JSON object with the selected tool name and its parameters. It then safely parses that JSON, invokes the corresponding Python function to obtain structured data, and makes a follow-up model call to integrate the tool’s output into a coherent, user-facing response.
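
The same agent can be exercised programmatically. The queries below (illustrative, not from the original demo) are phrased to nudge the model toward different tools, although the final choice depends on the JSON the model returns:

# Illustrative calls: each query hints at a different tool
print(tool_calling_agent("What's the weather like in Tokyo right now?"))    # likely get_weather
print(tool_calling_agent("Find detailed information about transformers"))   # likely search_information
print(tool_calling_agent("What is today's date and time in UTC?"))          # likely get_date_time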

Level 4: Multi-Step Agent

The fourth level extends the tool-calling pattern into a full multi-step agent that manages its workflow and state. The `MultiStepAgent` class maintains an internal memory of user inputs, tool outputs, and agent actions. Each iteration generates a planning prompt that summarizes the entire memory, asking the model to choose one of several tools, such as web search simulation, information extraction, text summarization, or report creation, or to conclude the task with a final output. After executing the selected tool and appending its results back to memory, the process repeats until either the model issues a “complete” action or the maximum number of steps is reached. Finally, the agent collates the memory into a cohesive final response. This structure shows how an LLM can orchestrate complex, multi-stage processes while consulting external functions and refining its plan based on previous results.

class MultiStepAgent:
    """Level 4: Multi-Step Agent - Model controls iteration and program continuation"""
   
    def __init__(self):
        self.tools = {
            "search_web": self.search_web,
            "extract_info": self.extract_info,
            "summarize_text": self.summarize_text,
            "create_report": self.create_report
        }
        self.memory = []
        self.max_steps = 5
   
    def run(self, user_task):
        self.memory.append({"role": "user", "content": user_task})
       
        steps_taken = 0
        while steps_taken < self.max_steps:
            next_action = self.determine_next_action()
           
            if next_action["action"] == "complete":
                return next_action["output"]
           
            tool_name = next_action["tool"]
            tool_args = next_action["args"]
           
            print(f"\nStep {steps_taken + 1}: using tool '{tool_name}' with args {tool_args}")
            # Execute the selected tool and fold its result back into memory (as described above)
            tool_result = self.tools[tool_name](**tool_args)
            self.memory.append({"role": "tool", "tool": tool_name, "content": json.dumps(tool_result)})
            steps_taken += 1
       
        # Step limit reached: collate the accumulated memory into a final response
        final_prompt = "Provide a cohesive final response based on this work log:\n" + json.dumps(self.memory)
        return generate_text(final_prompt)
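
The run loop calls a determine_next_action() helper that, per the description above, turns the accumulated memory into a planning prompt and asks the model either to pick a tool or to finish. A minimal sketch of such a method (placed inside MultiStepAgent and reusing the Level 3 JSON-extraction approach) might look like the following; the prompt wording and JSON field names here are assumptions, not the article’s original code:

    def determine_next_action(self):
        # Sketch: ask the model to plan the next step from the accumulated memory
        planning_prompt = f"""You are a multi-step agent working on a task.
Work so far (memory):
{json.dumps(self.memory, indent=2)}

Available tools: search_web, extract_info, summarize_text, create_report.

Respond with valid JSON in exactly ONE of these two forms:
{{"action": "use_tool", "tool": "<tool_name>", "args": {{"<param>": "<value>"}}}}
{{"action": "complete", "output": "<final answer>"}}"""

        response = generate_text(planning_prompt)
        json_match = re.search(r'({.*})', response, re.DOTALL)
        if json_match:
            try:
                return json.loads(json_match.group(1))
            except json.JSONDecodeError:
                pass
        # Fall back to treating the raw model response as the final output
        return {"action": "complete", "output": response}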