Prompting

How Daydreams structures prompts to guide LLM reasoning and actions.

The interaction between the Daydreams framework and the Large Language Model (LLM) is mediated through carefully structured prompts. These prompts provide the LLM with the necessary context, instructions, available tools (actions and outputs), and current state, guiding its reasoning process and constraining its output format.

The Main Prompt Template (mainPrompt)

The core prompt structure is defined in packages/core/src/prompts/main.ts within the mainPrompt configuration object. It uses a main template string (promptTemplate) composed of several sections identified by placeholders:

{{intro}}      # General instructions for the agent's task
{{instructions}} # Step-by-step guide on how to process updates
{{content}}    # Dynamically generated section with available tools and state
{{response}}   # Structure definition for the LLM's expected output
{{footer}}     # Final reminders and important notes

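The {{intro}}, {{instructions}}, {{response}}, and {{footer}} placeholders correspond to static text that tells the LLM how it should behave within the framework. The {{content}} placeholder is the exception: it is filled in dynamically at each step, as described in the next section.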
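As a rough illustration of how such a template could be assembled, the sketch below fills the five top-level placeholders from a map of section texts. The section wording, the fillSections helper, and the idea that {{content}} itself holds further placeholders are assumptions made for this example, not the actual contents of mainPrompt.

```typescript
// Hypothetical section texts; the real wording lives in the framework's mainPrompt config.
const sectionTexts: Record<string, string> = {
  intro: "You are an agent that reacts to updates.",
  instructions: "1. Read the updates. 2. Decide on actions and outputs.",
  // Assumption: the content section keeps its own placeholders for a later render pass.
  content: "{{actions}}\n{{outputs}}\n{{contexts}}\n{{workingMemory}}\n{{updates}}",
  response: "Reply inside <response> tags.",
  footer: "Only use the actions and outputs listed above.",
};

const promptTemplate =
  "{{intro}}\n{{instructions}}\n{{content}}\n{{response}}\n{{footer}}";

// Replace each {{name}} with its section text; unknown names are left untouched.
function fillSections(template: string, values: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (match: string, name: string) =>
    Object.prototype.hasOwnProperty.call(values, name) ? values[name] : match
  );
}

// The inserted text is not rescanned, so {{actions}} and friends survive this pass.
const expandedTemplate = fillSections(promptTemplate, sectionTexts);
```

After this pass, expandedTemplate still contains {{actions}}, {{outputs}}, and the other inner placeholders, which is where the per-step data would go.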

Dynamic Prompt Generation

At each step of the Agent Lifecycle, the framework dynamically generates the content for the {{content}} section of the promptTemplate. This ensures the LLM always has the most up-to-date information.

  1. Gathering Data (formatPromptSections): The formatPromptSections function (in packages/core/src/prompts/main.ts) collects the current state, including:
    • Available actions.
    • Available outputs.
    • Active contexts and their rendered state.
    • Recent WorkingMemory logs (both processed and unprocessed).
  2. Formatting to XML (packages/core/src/formatters.ts): Various helper functions (formatAction, formatContextState, formatOutputInterface, formatContextLog, formatValue, formatXml) convert these JavaScript objects and data structures into standardized XML strings. This XML format is designed to be clearly parsable by both the LLM and the framework's stream parser.
  3. Rendering (render): The render function (from packages/core/src/formatters.ts) injects these dynamically generated XML strings into the main promptTemplate, replacing placeholders like {{actions}}, {{outputs}}, {{contexts}}, {{workingMemory}}, and {{updates}}.
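The three steps above can be sketched roughly as follows. The helpers toXml, actionToXml, and renderTemplate are simplified stand-ins written for this example; they are not the framework's formatXml, formatAction, or render, whose signatures may differ.

```typescript
// Simplified stand-ins for the framework's helpers; names and shapes are assumptions.
interface XmlNode {
  tag: string;
  attrs?: Record<string, string>;
  children?: XmlNode[] | string;
}

function escapeAttr(value: string): string {
  return value.replace(/&/g, "&amp;").replace(/"/g, "&quot;").replace(/</g, "&lt;");
}

// Step 2: turn a node tree into an XML string (text bodies are not escaped here).
function toXml(node: XmlNode): string {
  const attrMap = node.attrs ?? {};
  const attrs = Object.keys(attrMap)
    .map((key) => ` ${key}="${escapeAttr(attrMap[key])}"`)
    .join("");
  const body =
    typeof node.children === "string"
      ? node.children
      : (node.children ?? []).map(toXml).join("\n");
  return `<${node.tag}${attrs}>${body}</${node.tag}>`;
}

// Hypothetical, reduced view of an action definition.
interface ActionInfo {
  name: string;
  description: string;
  schema: object;
}

function actionToXml(action: ActionInfo): XmlNode {
  return {
    tag: "action",
    attrs: { name: action.name },
    children: [
      { tag: "description", children: action.description },
      { tag: "schema", children: JSON.stringify(action.schema) },
    ],
  };
}

// Step 3: inject the generated strings; placeholders without data become empty.
function renderTemplate(template: string, data: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (_match: string, key: string) => data[key] ?? "");
}

// Step 1: the gathered state, reduced here to a single action.
const gatheredActions: ActionInfo[] = [
  { name: "getWeather", description: "Fetches the current weather", schema: { type: "object" } },
];

const actionsXml = toXml({
  tag: "available-actions",
  children: gatheredActions.map(actionToXml),
});
const contentSection = renderTemplate("{{actions}}\n{{updates}}", {
  actions: actionsXml,
  updates: "<updates></updates>",
});
```

Keeping formatting separate from rendering means each piece of state can be converted to XML on its own and then dropped into the template by name.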

Key XML Sections in the Prompt

The dynamically generated {{content}} section typically includes these crucial XML blocks:

  • <available-actions>: Lists all actions currently enabled for the agent. Each action includes its name, description, instructions, and argument schema (as JSON schema).
    <available-actions>
      <action name="getWeather">
        <description>Fetches the current weather...</description>
        <schema>{ "type": "object", "properties": { ... } }</schema>
      </action>
      ...
    </available-actions>
  • <available-outputs>: Lists all outputs the agent can generate. Each output includes its type, description, instructions, content schema (content_schema), attribute schema (attributes_schema), and examples.
    <available-outputs>
      <output type="discord:message">
        <description>Sends a message...</description>
        <attributes_schema>{ "type": "object", ... }</attributes_schema>
        <content_schema>{ "type": "string", ... }</content_schema>
        <examples>...</examples>
      </output>
      ...
    </available-outputs>
  • <contexts>: Displays the state of currently active contexts, as rendered by their respective render functions.
    <contexts>
      <context type="chat" key="session123">
        user1: Hi there!
        agent: Hello! How can I help?
      </context>
      ...
    </contexts>
  • <working-memory>: Shows processed logs (InputRef, OutputRef, ActionCall, ActionResult, Thought, EventRef) from previous steps within the current run.
    <working-memory>
      <input type="cli:message" timestamp="...">User message</input>
      <action_call name="lookupUser" id="..." timestamp="...">...</action_call>
      <action_result callId="..." name="lookupUser" timestamp="...">...</action_result>
      <output type="cli:message" timestamp="...">Agent response</output>
    </working-memory>
  • <updates>: Shows new, unprocessed logs (typically new InputRefs or ActionResults from the previous step) that the LLM needs to analyze and react to in the current step.
    <updates>
      <input type="discord:message" timestamp="...">A new message requiring a response</input>
      <action_result callId="..." name="complexCalculation" timestamp="...">Result is 42</action_result>
    </updates>
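The split between <working-memory> and <updates> can be pictured as a partition of the log list by whether each entry has already been processed. The sketch below assumes a reduced, hypothetical LogEntry shape with a boolean processed flag; the framework's real log types (InputRef, ActionResult, and so on) carry more fields, and its formatters emit richer XML.

```typescript
// Hypothetical, reduced log shape used only for this illustration.
interface LogEntry {
  ref: "input" | "output" | "action_call" | "action_result" | "thought";
  content: string;
  processed: boolean;
}

function logToXml(log: LogEntry): string {
  return `<${log.ref}>${log.content}</${log.ref}>`;
}

// Wrap a list of entries in a named block, one indented entry per line.
function wrapLogs(tag: string, entries: LogEntry[]): string {
  const lines = entries.map((entry) => "  " + logToXml(entry));
  return [`<${tag}>`].concat(lines, [`</${tag}>`]).join("\n");
}

// Processed entries become history; unprocessed entries are what the LLM must react to.
function splitLogs(logs: LogEntry[]): { workingMemory: string; updates: string } {
  return {
    workingMemory: wrapLogs("working-memory", logs.filter((log) => log.processed)),
    updates: wrapLogs("updates", logs.filter((log) => !log.processed)),
  };
}

const sampleLogs: LogEntry[] = [
  { ref: "input", content: "User message", processed: true },
  { ref: "output", content: "Agent response", processed: true },
  { ref: "input", content: "A new message requiring a response", processed: false },
];

const logSections = splitLogs(sampleLogs);
```

Here logSections.updates holds only the third entry, while the first two appear under <working-memory>.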

Expected LLM Response Structure

The framework instructs the LLM (via the {{response}} section of the template) to structure its output using specific XML tags:

<response>
  <reasoning>
    [LLM's thought process explaining its analysis and plan]
  </reasoning>
 
  [Optional: Action calls]
  <action_call name="actionName">[Arguments based on action schema and format (JSON/XML)]</action_call>
 
  [Optional: Outputs]
  <output type="outputType" attribute1="value1">[Content matching the output's schema]</output>
</response>

  • <response>: The root tag for the entire response.
  • <reasoning>: Contains the LLM's step-by-step thinking process. This is logged as a Thought.
  • <action_call>: Used to invoke an action. The name must match an available action. The content (the arguments) follows the action's defined schema and callFormat: JSON is the default when the schema is complex, and XML can be used instead. The framework parses the content accordingly.
  • <output>: Used to generate an output. The type must match an available output. Any required attributes must be included, and the content must match the output's content schema.

The framework parses this XML structure from the LLM's response stream to trigger the corresponding handlers for actions and outputs.
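To make the response structure concrete, here is a minimal, non-streaming sketch that extracts the reasoning, action calls, and outputs from a complete response string using regular expressions. It is not the framework's stream parser: parseResponse, parseAttributes, and the ParsedResponse shape are hypothetical, and the regex approach assumes that attribute values contain no ">" or double-quote characters and that tags are not nested.

```typescript
interface ParsedResponse {
  reasoning: string | null;
  actionCalls: { name: string; content: string }[];
  outputs: { type: string; attributes: Record<string, string>; content: string }[];
}

// Collect key="value" pairs from the raw attribute text of an opening tag.
function parseAttributes(raw: string): Record<string, string> {
  const attrs: Record<string, string> = {};
  const pattern = /([\w:-]+)="([^"]*)"/g;
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(raw)) !== null) {
    attrs[m[1]] = m[2];
  }
  return attrs;
}

function parseResponse(text: string): ParsedResponse {
  const reasoningMatch = /<reasoning>([\s\S]*?)<\/reasoning>/.exec(text);
  const result: ParsedResponse = {
    reasoning: reasoningMatch ? reasoningMatch[1].trim() : null,
    actionCalls: [],
    outputs: [],
  };
  // \1 makes the closing tag match whichever opening tag was found.
  const elementPattern = /<(action_call|output)\b([^>]*)>([\s\S]*?)<\/\1>/g;
  let match: RegExpExecArray | null;
  while ((match = elementPattern.exec(text)) !== null) {
    const attrs = parseAttributes(match[2]);
    const content = match[3].trim();
    if (match[1] === "action_call") {
      result.actionCalls.push({ name: attrs["name"] ?? "", content });
    } else {
      // Everything except "type" is treated as an output attribute.
      const { type = "", ...attributes } = attrs;
      result.outputs.push({ type, attributes, content });
    }
  }
  return result;
}

// Sample response; the "city" argument is invented for the example.
const sampleResponse = [
  "<response>",
  "  <reasoning>The user asked for the weather.</reasoning>",
  '  <action_call name="getWeather">{ "city": "Paris" }</action_call>',
  '  <output type="cli:message">Checking the weather now.</output>',
  "</response>",
].join("\n");

const parsedResponse = parseResponse(sampleResponse);
```

A real stream parser has to handle partial tags as tokens arrive, which is why this sketch should be read only as a picture of the target data shape.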

Template Engine ({{...}})

The prompt template also mentions a simple template engine that uses {{...}} syntax (e.g., {{calls[0].someValue}}, {{shortTermMemory.key}}). As noted in the prompt comments, it is intended primarily for intra-turn data referencing: an action call can reference the anticipated result of an earlier action call in the same LLM response.

Example:

<response>
  <reasoning>First, I need to create a file, then write to it.</reasoning>
  <action_call name="createFile">{ "directory": "/tmp" }</action_call>
  <action_call name="writeFile">{ "fileId": "{{calls[0].fileId}}", "content": "Hello!" }</action_call>
</response>

Here, the writeFile call references the fileId expected to be returned by the createFile action called just before it within the same LLM response turn. The framework resolves these templates before executing the action handlers (using resolveTemplates in packages/core/src/handlers.ts).
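The resolution step could look roughly like the sketch below, which walks the argument object and substitutes each {{path}} with the value found at that path in a scope object. This is an illustration only, not the framework's resolveTemplates: the function names, the scope shape (a calls array of earlier results), and the "file-123" value are assumptions.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Walk a path such as "calls[0].fileId" through plain objects and arrays.
function getByPath(root: unknown, path: string): unknown {
  const keys = path
    .replace(/\[(\d+)\]/g, ".$1")
    .split(".")
    .filter((key) => key.length > 0);
  let current: unknown = root;
  for (const key of keys) {
    if (current === null || typeof current !== "object") return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

// Recursively replace {{path}} in every string; unresolved templates are left as-is.
function resolveTemplatesSketch(value: Json, scope: Record<string, unknown>): Json {
  if (typeof value === "string") {
    return value.replace(/\{\{\s*([^}]+?)\s*\}\}/g, (match: string, path: string) => {
      const resolved = getByPath(scope, path);
      return resolved === undefined ? match : String(resolved);
    });
  }
  if (Array.isArray(value)) {
    return value.map((item) => resolveTemplatesSketch(item, scope));
  }
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const key of Object.keys(value)) {
      out[key] = resolveTemplatesSketch(value[key], scope);
    }
    return out;
  }
  return value;
}

// Assumed scope: results of earlier calls in the same response, in call order.
const previousResults = { calls: [{ fileId: "file-123" }] };
const writeFileArgs: Json = { fileId: "{{calls[0].fileId}}", content: "Hello!" };
const resolvedArgs = resolveTemplatesSketch(writeFileArgs, previousResults);
```

With this scope, resolvedArgs.fileId becomes "file-123" before the writeFile handler would run. Leaving unresolved templates untouched, rather than replacing them with an empty string, is a design choice of this sketch that makes missing results easier to spot.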

Because prompts are generated dynamically and responses follow a fixed XML structure, Daydreams can give the LLM the information and tools it needs for complex orchestration tasks while keeping its output reliably parsable.
