Galaxy: Why does the AI Agent on-chain repeatedly encounter difficulties?

Author: Zack Pokorny, Research Analyst at Galaxy Digital; Source: Galaxy Digital; Compiled by: Shaw Golden Finance

Introduction

The application scenarios and capabilities of AI agents have begun to evolve. They are gradually moving toward executing tasks autonomously, and related R&D is progressing—enabling agents to hold and allocate capital, discover trading and yield strategies, and more. Although this experimental shift is still at an extremely early stage, its direction has already diverged sharply from the past, when agents were mostly used only as social and analytics tools.

Blockchain provides a natural testing ground for this evolution. Blockchains have permissionless design, composability (meaning the same execution framework can host various financial infrastructure components), an open-source application ecosystem, data that is equally accessible to all participants, and—by default—programmability for all on-chain assets.

This raises a structural question: if blockchain is programmable and permissionless, why do autonomous agents still face execution obstacles? The answer is not whether execution is feasible, but how much semantic understanding and coordination-scheduling cost must be borne above the execution layer. Blockchains can ensure the correctness of state transitions, yet they typically do not provide abstract capabilities such as protocol-native economic interpretation, standardized identity, or goal-level scheduling.

Some of these frictions arise from the architectural characteristics of permissionless systems, while others stem from the current state of tools, information filtering, and market infrastructure. In real applications, many higher-level functions still rely on software and workflows designed with human operation at the core.

Blockchain architecture and AI agents

The core of blockchain design centers on the consensus mechanism and deterministic execution, not semantic interpretation. What it offers externally are underlying building blocks such as storage slots, event logs, and call tracing, not standardized economic objects. As a result, abstract concepts like positions, returns, health factors, liquidity depth, and so on usually must be reconstructed off-chain—where indexers, analytics layers, front-end interfaces, and application programming interfaces transform each protocol’s proprietary state into more usable forms.

Many mainstream decentralized finance operating workflows—especially those aimed at ordinary users and subjective decision-making—still use a core pattern: the user interacts via a front-end interface, and the user’s signature on each individual transaction confirms intent. This user-interface-centered model scales well as ordinary users become more widespread, even if a substantial portion of on-chain activity is now machine-driven. The dominant logic for ordinary user interactions today is still: intent → user interface → initiate transaction → confirm completion. While programmatic operations follow different paths, they also have their own limitations: developers must select the contract and asset scope during the build phase, and then run algorithms based on that fixed scope. Neither of these two models can adapt to systems that, during runtime, independently discover, evaluate, and combine operational behaviors based on dynamic goals.

When infrastructure optimized specifically for transaction verification is used by systems that also need to interpret economic state, assess credit, and optimize behavior around explicit goals, execution frictions begin to appear. Part of this gap comes from permissionless, heterogeneous blockchain design; the other part is that existing tools still wrap blockchain interaction flows around human review and front-end intermediaries.

Agent operating workflow vs. traditional algorithmic strategies

Before analyzing the gap between blockchain infrastructure and agent systems, it is necessary to clarify a higher-level agent operating workflow and the core differences from traditional on-chain algorithmic systems.

The difference between the two is not automation level, technical complexity, or parameterization—nor even the ability to adapt dynamically. Traditional algorithmic systems can achieve a high degree of parameterization: they can automatically discover new contracts and tokens, allocate capital across different strategy types, and rebalance based on performance. The key difference is whether the system can handle scenarios that were not predefined during the build phase.

No matter how complex traditional algorithmic systems are, they can only execute predetermined logic for patterns that were assumed during development. These systems must configure preset interface parsers for each type of protocol, predefined evaluation logic that converts contract state into economic meaning, explicit rules for credit and standard determinations, and all decision branches must be implemented as hard-coded rules (regardless of how dynamic or flexible the algorithm itself may be). If a situation arises that does not match the predefined pattern, the system either skips the scenario or simply fails to execute correctly. It cannot reason about unfamiliar scenarios; it can only check whether the current scenario matches a known template.

This kind of mechanical automaton, like the famous “Digesting Duck,” can mimic lifelike behavior, but every action is pre-set. (Popular Science, January 1899)
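The “known template” limitation can be made concrete with a small sketch. The event names and handler labels below are purely illustrative, not real protocol identifiers:

```python
# Sketch of hard-coded pattern matching: a traditional system routes only
# events it was built to recognize. All names here are illustrative.

KNOWN_TEMPLATES = {
    "UniswapV2PairCreated": "handle_v2_pool",
    "AaveReserveInitialized": "handle_lending_market",
}

def route_event(event_name):
    """Return the handler for a known pattern, or None (skip) otherwise.
    There is no branch for 'reason about this unfamiliar primitive'."""
    return KNOWN_TEMPLATES.get(event_name)

# route_event("UniswapV2PairCreated") -> "handle_v2_pool"
# route_event("BrandNewLendingPrimitive") -> None: silently skipped
```

An unrecognized event simply falls through; nothing in the system can ask what the unfamiliar contract *is*.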

When scanning new lending markets, traditional algorithms can recognize familiar events or match known factory deployment patterns. But if an entirely new type of lending infrastructure component appears with an unfamiliar interface, the system cannot evaluate it. Someone must manually inspect the contract, understand how it operates, determine whether it should be included in the opportunity pool, and write the integration logic. Only after these steps can the algorithm interact with it. Humans interpret; the algorithm executes. Agent systems built on foundation models break through this boundary by leveraging learned reasoning capabilities:

  • Interpreting ambiguous or unspecified goals. Instructions like “maximize returns but avoid excessive risk” require interpretation: to what extent counts as “excessive”? How should returns and risks be balanced? Traditional algorithms need to define these conditions precisely in advance, while foundation models can interpret intent, make judgments, and optimize their own understanding based on feedback.

  • Generalizing adaptation to new interfaces. Agents can read unfamiliar contract code, parse documentation, or inspect ABI interfaces they have never seen before, and infer the economic functions of that system. They do not need to build parsers for each protocol type in advance. This capability is not perfect yet; agents may misjudge, but they can attempt to interact with systems that were not predefined during development.

  • Reasoning about credibility and normative correctness under uncertainty. When trust signals are vague or incomplete, foundation models can weigh judgments probabilistically rather than simply applying binary rules. Is this smart contract the canonical deployment? Given the available evidence, is it likely that this token is legitimate? Traditional algorithms only have “rule exists” or “no rule exists,” while agents can reason based on confidence.

  • Interpreting errors and adaptively adjusting. When unexpected situations occur (for example, transaction reverts, outputs differ from expectations, or state changes between simulation and execution), agents can infer the root cause and decide how to respond. By contrast, traditional algorithms only run exception-handling code blocks and route or redirect errors, rather than interpreting the exceptions.
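The contrast between binary rules and confidence-weighted judgment can be sketched as follows. The signal names and weights are assumptions of this illustration, not a production scoring model:

```python
# Hypothetical sketch: combining soft trust signals into a confidence score,
# in contrast to a binary whitelist rule. Signal names and weights are
# illustrative assumptions only.

def legitimacy_score(signals: dict) -> float:
    """Weigh independent trust signals probabilistically (0.0 - 1.0)."""
    weights = {
        "verified_source_code": 0.30,   # published source matches bytecode
        "matches_known_factory": 0.25,  # deployed by a recognized factory
        "deep_liquidity": 0.25,         # liquidity above a threshold on major DEXes
        "listed_as_collateral": 0.20,   # accepted by established lending protocols
    }
    return sum(w for name, w in weights.items() if signals.get(name))

def binary_whitelist(address: str, whitelist: set) -> bool:
    """The traditional rule: either the address is known, or it is not."""
    return address in whitelist

# A contract missing from the whitelist is simply invisible to the old rule,
# while the scored approach still yields a graded judgment.
score = legitimacy_score({"verified_source_code": True, "deep_liquidity": True})
# score == 0.55 -> "uncertain, proceed cautiously"
```

The point is the shape of the decision, not the particular weights: the agent-style check degrades gracefully when evidence is partial, while the whitelist check cannot.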

These capabilities are real, but they are still not fully mature. Foundation models can hallucinate, misread information, and make confident-looking but incorrect judgments. In adversarial environments involving funds (i.e., where code can be controlled or assets can be received), “trying to interact with a system not predefined” may mean directly losing money. The core point of this article is not that agents can reliably perform these functions today, but that they can attempt in ways that traditional systems cannot—and future infrastructure may make such attempts safer and more reliable.

Viewing the two as a continuum rather than as absolute categories may make this easier to understand: some traditional systems have already integrated learning-based reasoning, and some agent systems may still rely on hard-coded rules on critical paths. The difference between them is directional, not a binary opposition. Agent systems push more interpretation, evaluation, and adaptive work into runtime reasoning rather than predefined assumptions at development time.

This is closely related to the earlier argument about frictions, because what agents are trying to do is exactly what traditional algorithms completely avoid. Traditional algorithms avoid discovery costs by having humans filter contract sets during development; avoid control-layer costs via operator-maintained whitelists; avoid data costs by predefining parsers for known protocols; and avoid execution costs by running within a predefined safe scope. Humans do the semantic, trust, and strategy-layer work in advance, while algorithms only execute within those boundaries. Early versions of on-chain agent workflows may continue this pattern, but their core logic lies in transferring discovery, trust, and strategy evaluation into runtime reasoning rather than predefined assumptions during development.

Agents may attempt to discover and assess unfamiliar opportunities, judge contract compliance without hard-coded rules, parse heterogeneous state without predefined parsers, and execute strategies for potentially vague goals. And it is precisely in these steps that infrastructure shortcomings start to become evident. Frictions exist not because agents are doing the same things as algorithms but in a harder way, but because they are attempting entirely different things: operating in an open, dynamically interpretable action space, rather than in a closed, pre-integrated fixed space.

Where the friction is

Structurally, this contradiction does not stem from a flaw in blockchain consensus itself, but from the way the overall interaction stack built around it has evolved.

Blockchain ensures deterministic state transitions, consensus on the resulting state, and finality. But it does not encode economic interpretation, intent validation, or goal tracking at the protocol layer. These responsibilities have always been handled by front ends, wallets, indexers, and other off-chain coordination layers—and humans have remained involved in the process.

Mainstream interaction patterns reflect this design, even for professional participants. Ordinary users interpret state via dashboards, select actions via interfaces, and sign transactions via wallets—without formally verifying the results. Algorithmic trading firms automate execution, but they still rely on humans to filter protocol sets, check for anomalies, and update integration logic when interfaces change. In both scenarios, protocols only guarantee correct execution; interpreting intent, handling exceptions, and adapting to new opportunities are still done by humans.

Agent systems compress—and even eliminate—this division of labor. They must reconstruct economically meaningful state programmatically, assess whether goals are being advanced, and validate results beyond merely getting a transaction written on-chain. These burdens stand out especially on blockchains, because agents operate in an open, adversarial, and fast-changing environment where new contracts, assets, and execution paths can emerge without centralized approval. Protocols guarantee transaction correctness, but they do not guarantee that economic state is easy to interpret, that contracts are canonical, that execution paths match the user’s intent, or that relevant opportunities can be discovered programmatically.

The sections that follow analyze these frictions across the stages of the agent execution loop: discovering existing contracts and opportunities, validating their legitimacy, obtaining economically meaningful state, and executing according to goals.

Discovery costs

Discovery costs arise because the DeFi action space continues to expand in an open, permissionless environment, while relevance and legitimacy must be filtered manually through on-chain social signals, markets, and tool layers. New protocols are filtered through signals such as announcements and research publications, and also through filter layers like front-end integrations, token lists, analytics platforms, and liquidity formation. Over time, these signals often form an actionable set of judgment criteria to identify the economically valuable and sufficiently trustworthy parts of the action space, even though the process is frequently informal, imbalanced, and partly relies on third parties and manual filtering.

While agents can access filtered data and trust signals, they do not have the natural shortcut that humans get when interpreting those signals. From an on-chain perspective, the discoverability of all deployed contracts is equal. Legitimate protocols, malicious forks, test deployments, and abandoned projects all exist in the form of callable bytecode. On-chain does not label which are important or secure.


Therefore, agents must build their own discovery mechanisms: scan deployment events, recognize interface patterns, track factory contracts (contracts that programmatically deploy other contracts), and monitor liquidity formation to determine which contracts should fall within the decision-making scope. This process is not only about finding contracts—it is also about judging whether they are worth bringing into the agent’s action space.
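The discovery pipeline described above can be sketched as a filter over raw deployments. The event shape, the curated factory set, and the liquidity threshold are all hypothetical; a real system would read these from node logs and indexers:

```python
# Illustrative discovery filter: reduce raw deployment events to candidates
# worth evaluating. All addresses, fields, and thresholds are hypothetical.

from dataclasses import dataclass

@dataclass
class Deployment:
    address: str
    deployer: str
    matches_known_interface: bool  # e.g., pool-like selectors found in bytecode
    liquidity_usd: float           # liquidity observed after deployment

KNOWN_FACTORIES = {"0xFactoryA"}   # hypothetical curated factory set

def discover(deployments: list, min_liquidity: float = 10_000.0) -> list:
    """Return addresses passing the cheap structural filters.
    Passing here only makes a contract a *candidate*; legitimacy
    verification is a separate, later step."""
    out = []
    for d in deployments:
        from_factory = d.deployer in KNOWN_FACTORIES
        if (from_factory or d.matches_known_interface) and d.liquidity_usd >= min_liquidity:
            out.append(d.address)
    return out
```

Note that a high-liquidity deployment from an unknown deployer with an unfamiliar interface is dropped entirely—exactly the case where an agent would want to reason rather than filter.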

Identifying candidate contracts is only the first step. After contracts pass the initial discovery filter, they must then undergo compliance and authenticity verification as described in the next section. Agents must confirm that the discovered contracts match their claims before they can include them in the decision space.

Strategy-constrained discovery differs from open discovery

Discovery costs do not mean only detecting new deployments. Mature algorithmic systems can already do this within their strategy scopes. For example, a searcher that monitors Uniswap factory events and automatically includes new liquidity pools is doing dynamic discovery. The frictions appear at higher levels: judging whether the discovered contract is legitimate (the compliance issue discussed in the next section); and determining whether it serves an open-ended goal rather than merely adapting to a predefined strategy type.

A searcher’s discovery logic is tightly bound to its strategy: it knows which interface patterns to look for because the strategy has been defined in advance. An agent tasked with a broader goal like “find the best risk-adjusted opportunity” cannot rely only on strategy-derived filters. It must evaluate new opportunities against the goal itself, which requires parsing unfamiliar interfaces, inferring economic functions, and deciding whether the opportunity should be included in the decision space. This part belongs to the problem of general autonomy, but blockchains make it harder: unknown code is directly executable, it can carry funds, and protocol-native signals alone are insufficient to categorize it.

Control-layer frictions

Control-layer costs arise because identity and legitimacy are typically determined outside the protocol through filtering, governance, documentation, interfaces, and operator judgment. In many current workflows, humans remain an important part of that determination process. Blockchains guarantee deterministic execution and finality, but they do not guarantee that the caller is interacting with the intended target contract. Intent determination is outsourced to social context, websites, and manual filtering.

In the current workflow, humans use the web trust layer as an informal verification tool: by finding the official domain through aggregators like DeFiLlama or through authenticated social accounts, and treating that website as the official mapping between human concepts and contract addresses. The front end then encodes a set of valid “truth sources,” indicating which addresses are official, which token representation is used, and which entry points are safe.

The Mechanical Turk was a chess-playing machine: on the surface it appeared to run autonomously, but in reality it relied on a hidden human operator. (1789 engraving, Humboldt University Library)

By default, agents do not interpret brand markers, authenticated social signals, or “officialness” through social context. We can provide agents with filtered inputs derived from these signals, but to convert them into stable, machine-executable trust assumptions, we need to specify a registry, policy rules, or verification logic. Operators can configure the agent with curated whitelists, verified addresses, and trust policies. The issue is not that social context is completely unavailable, but that continuously maintaining these protective barriers in a dynamically expanding action space creates a huge operational burden—and when these barriers are missing or incomplete, agents lack the backup verification mechanisms that humans instinctively use.

The real-world consequences of insufficient trust-assessment capability are already showing up in on-chain agent-driven systems. In one case, an agent run by YouTube crypto influencer Orangie put funds into a honeypot contract. In another incident, an agent named Lobstar Wilde, due to a state or context failure, misinterpreted an address’s state and transferred a large token balance to an online “begging address.” These examples are not the core argument of this article, but they vividly illustrate how mistakes in trust assessment, state interpretation, and execution strategy can directly lead to loss of funds.

Identifying official contracts is not a protocol-native function

The problem is not that contracts are hard to discover; it is that, on-chain, there is usually no native concept of “this is the official contract for application X.” This absence is to some extent a characteristic of permissionless systems rather than a design oversight, but it still creates coordination difficulties for autonomous systems. The problem comes partly from the open-system architecture’s weak normative identity, and partly from immature mechanisms for registries, standards, and trust distribution. If an agent wants to interact with Aave v3, it must determine which addresses are canonical (liquidity pools, configurators, data providers, etc.), and whether those addresses are immutable contracts, upgradeable proxies, or contracts awaiting execution of a pending governance change.

Humans solve this with documentation, front-end interfaces, and social media, while agents must determine this by verifying the following information:

  • proxy mode and implementation contract pointers

  • admin permissions and timelock contracts

  • governance-controlled parameter update modules

  • bytecode / ABI match against known deployments

In the absence of an official registry, “officialness” becomes a reasoning problem. This means agents cannot treat contract addresses as static configuration. They must either continuously maintain verified whitelists, or re-derive contract legitimacy via proxy checks and governance monitoring at runtime, or take on the risk of interacting with deprecated, attacked, or forged contracts. In traditional software and market infrastructure, service identity is usually anchored via namespaces, credentials, and access control maintained by institutions. On-chain, by contrast, a contract may be callable and function correctly, yet from the caller’s perspective it is not necessarily the canonical deployment at the economic or business layer.
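For upgradeable proxies, part of this verification is mechanical: EIP-1967 defines a fixed storage slot for the implementation address, so an agent can resolve it and compare against an expected value. The sketch below mocks the storage read that a real agent would perform via eth_getStorageAt:

```python
# Sketch: resolving a proxy's implementation via the EIP-1967 storage slot,
# then checking it against an expected address. The storage reader is mocked;
# a real agent would call eth_getStorageAt on a node.

# EIP-1967 implementation slot: keccak256("eip1967.proxy.implementation") - 1
EIP1967_IMPL_SLOT = "0x360894a13ba1a3210667c828492db98dca3e2076cc3735a920a3ca505d382bbc"

def resolve_implementation(read_storage, proxy: str) -> str:
    """Return the implementation address stored in the proxy's EIP-1967 slot."""
    word = read_storage(proxy, EIP1967_IMPL_SLOT)   # 32-byte word as hex
    word = word[2:] if word.startswith("0x") else word
    return "0x" + word[-40:]                        # low 20 bytes = address

def is_expected_implementation(read_storage, proxy: str, expected: str) -> bool:
    return resolve_implementation(read_storage, proxy).lower() == expected.lower()

# Mocked storage for illustration only; addresses are placeholders.
_fake_storage = {("0xProxy", EIP1967_IMPL_SLOT): "00" * 12 + "ab" * 20}

def fake_reader(addr, slot):
    return _fake_storage[(addr, slot)]
```

This answers only “which implementation is live,” not whether that implementation is the canonical one—that still requires a trusted registry or bytecode comparison against a known deployment.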

Token authenticity and metadata issues are fundamentally the same

Tokens may appear to self-describe, but their metadata is not authoritative—it is only byte data returned by contract code. A typical example is Wrapped Ether (WETH). The canonical WETH contract reports its name as “Wrapped Ether,” its symbol as “WETH,” and 18 decimals.

This information seems to represent identity, but it does not. Any contract can return exactly the same values:

  • symbol() = “WETH”

  • decimals() = 18

  • name() = “Wrapped Ether”

And implement the exact same ERC-20 token standard interface. name(), symbol(), and decimals() are just public read-only functions, and the returned content is entirely set by the deployer. In fact, on Ethereum there are nearly 200 tokens all named “Wrapped Ether,” all with symbol “WETH,” and all with 18 decimals. Without checking CoinGecko or Etherscan, can you tell which “WETH” is the real canonical version? (Answer: the 78th item in the list)


This is the situation agents face. Blockchains do not validate uniqueness, do not compare against any registry, and do not care about it at all. Today you can deploy 500 contracts, and they can all return identical metadata. There are some experiential ways to judge on-chain (for example, comparing ETH balance to total supply, checking liquidity depth on major decentralized exchanges, or verifying whether borrowing/lending protocols list it as collateral), but none of them provides absolute certainty. Each method either relies on threshold assumptions (nobody can fake a liquidity pair of billions in size), or recursively requires first verifying the compliance of other contracts.
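The point is easy to demonstrate: metadata calls are just return values, so two unrelated contracts can be indistinguishable by name()/symbol()/decimals(). The addresses below are placeholders, not real deployments:

```python
# Demonstration: ERC-20 metadata functions are plain return values, so an
# imitation can be byte-for-byte identical in metadata. Addresses are
# placeholders for illustration.

class Token:
    def __init__(self, address, name, symbol, decimals):
        self.address = address
        self._meta = (name, symbol, decimals)
    def name(self): return self._meta[0]
    def symbol(self): return self._meta[1]
    def decimals(self): return self._meta[2]

canonical = Token("0xC02a...", "Wrapped Ether", "WETH", 18)   # placeholder
imitation = Token("0xDEAD...", "Wrapped Ether", "WETH", 18)   # placeholder

same_metadata = (canonical.name(), canonical.symbol(), canonical.decimals()) == \
                (imitation.name(), imitation.symbol(), imitation.decimals())
# same_metadata is True; only the addresses differ, and nothing on-chain
# says which address is "the real WETH".
```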

Like a maze, identifying the “true” path on-chain requires external guidance; no standard signal is provided. (Birmingham Museum and Art Gallery)

That is why token lists and registries exist as off-chain filtering layers. They provide a mechanism to map the concept of “WETH” to specific addresses—also explaining why wallets and front ends maintain whitelists or rely on trusted aggregators. For agents, the core issue is not only that metadata is unreliable, but that normative identity is usually established by the social or institutional layer rather than natively by the protocol. The only reliable identifier on-chain is the contract address. However, mapping human-intelligible intent (like “swap into USDC”) to the correct address still heavily depends on non-protocol-native filtering, registries, whitelists, or other trust layers.

Data frictions

An agent that optimizes across multiple DeFi protocols needs to abstract each opportunity into an economic object: yield, liquidity depth, risk parameters, fee structure, oracle sources, and so on. In some sense, this is a common system integration problem. But on blockchain, due to protocol heterogeneity, direct capital risk, the need to compose multiple call states, and the lack of an underlying unified economic schema, this burden is greatly amplified. These are precisely the baseline information required to compare opportunities, simulate configurations, and monitor risk.

On-chain, protocols typically do not expose standardized economic objects at the protocol layer. They expose only storage slots, event logs, and function return values; economic objects must be derived or reconstructed from these. Protocols only guarantee that contract calls return correct state values, but they do not guarantee that those values clearly correspond to understandable economic concepts, nor that the same concept can be retrieved via consistent interfaces across different protocols.

Therefore, abstract concepts like market rates, positions, health factors, and so on are not protocol-native components. They are reconstructed off-chain by indexers, analytics platforms, front ends, and APIs—normalizing heterogeneous protocol state into usable abstraction layers. Human users typically only see this normalized layer of data, and agents can use it too, but doing so means inheriting third-party schemas, latency, and trust assumptions; otherwise, they must reconstruct these abstractions themselves.

This issue is further exacerbated across different types of protocols. Vault share prices, lending market collateral factors, liquidity depth in DEX pools, staking contract reward rates—each is an economically meaningful base metric. Yet none of them is exposed through standardized interfaces. Each protocol system has its own way of reading data, structure/format definitions, and unit conventions. Even for the same category, the implementations can differ.

Lending markets: a typical case of fragmented reads

Lending markets clearly illustrate the problem. Economic concepts are generally similar and common—such as supply and borrow liquidity, interest rates, collateral coefficients, borrowing limits, liquidation thresholds, and so on—but the data-reading paths are completely different.

Taking Aave v3 as an example, market enumeration and reserve-state reading are separate steps. A typical workflow first enumerates reserve assets via the Pool contract’s getReservesList() function, which returns an array of token addresses. Then, for each asset, it reads liquidity and interest-rate base data via getReserveData(asset). This single call returns a struct containing, among other things, total liquidity, interest-rate indexes, and a configuration bitmap.

In contrast, in Compound v3 (Comet), each deployment corresponds to a single market (e.g., USDC, USDT, ETH), and there is no unified reserve struct. Assembling a full market snapshot therefore requires multiple function calls, for example:

  • utilization and base data (getUtilization())

  • totals (totalSupply(), totalBorrow())

  • interest rates (getSupplyRate(utilization), getBorrowRate(utilization))

  • collateral asset configuration (getAssetInfo(i))

  • global configuration parameters

Each call returns only a different fragment of economic state. “Market” is not a first-class object; it is an inferred structure assembled by the caller.

From an agent’s perspective, both protocols are lending markets. But from an integration perspective, they are entirely different data-acquisition systems. There is no universal shared data structure—say, a single normalized “lending market” object—that both expose.

Instead, the agent must use different asset enumeration methods for different protocols, assemble state data through multiple calls, normalize measurement units and conversion rules, and coordinate how inferred values differ from directly exposed base data.
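A hypothetical sketch of what the missing abstraction and per-protocol adapters might look like. The read paths mirror the real interfaces (Aave v3’s getReserveData struct versus Comet’s multi-call snapshot), but both clients here are mocks, and the struct field names and unit handling are simplified assumptions:

```python
# Hypothetical normalized market object plus two protocol adapters.
# Mocked clients stand in for real contract bindings; field names and
# unit conventions are simplified for illustration.

from dataclasses import dataclass

@dataclass
class LendingMarket:            # the normalized object no protocol exposes natively
    protocol: str
    asset: str
    total_supplied: float
    total_borrowed: float
    supply_rate: float          # normalized to an APR fraction

def from_aave(pool, asset: str) -> LendingMarket:
    # Aave v3 style: one struct-returning call per reserve asset.
    r = pool.getReserveData(asset)
    return LendingMarket("aave-v3", asset, r["totalLiquidity"], r["totalDebt"],
                         r["liquidityRate"] / 1e27)          # ray -> fraction

SECONDS_PER_YEAR = 31_536_000

def from_comet(comet) -> LendingMarket:
    # Comet style: one deployment per market; snapshot built from several calls.
    u = comet.getUtilization()
    return LendingMarket("compound-v3", comet.baseToken(),
                         comet.totalSupply() / 1e6, comet.totalBorrow() / 1e6,
                         comet.getSupplyRate(u) * SECONDS_PER_YEAR / 1e18)

class FakeAavePool:             # mock client
    def getReserveData(self, asset):
        return {"totalLiquidity": 1_000_000.0, "totalDebt": 400_000.0,
                "liquidityRate": 3e25}                       # ~3% in ray units

class FakeComet:                # mock client (6-decimal base asset)
    def baseToken(self): return "USDC"
    def getUtilization(self): return 4 * 10**17               # 40%, 1e18 scale
    def totalSupply(self): return 1_000_000 * 10**6
    def totalBorrow(self): return 400_000 * 10**6
    def getSupplyRate(self, u): return 951_293_759            # per-second, 1e18 scale

markets = [from_aave(FakeAavePool(), "USDC"), from_comet(FakeComet())]
# Both now expose the same normalized fields despite disjoint read paths.
```

The adapters embody exactly the frictions the text describes: each one encodes its own enumeration method, call sequence, and unit conversions, and nothing checks that the two normalizations are mutually consistent.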

Fragmentation introduces latency and consistency risks

Beyond structural inconsistency, this fragmentation also creates latency and consistency risks. Because economic state is not exposed as a single atomic market object, the agent must make multiple remote procedure calls (RPCs) to multiple contracts to reconstruct a state snapshot. Each additional call adds latency, increases the probability of triggering interface rate limits, and introduces the risk of inconsistent blocks. In volatile market conditions, once supply interest rate calculations are completed, the utilization rate may have already changed. If block height is not explicitly locked, configuration parameters and total liquidity may come from different blocks. Human users implicitly mitigate this issue through front-end caching layers and aggregated backends, while agents that call raw RPC endpoints must explicitly handle data synchronization, batch requests, and time-consistency problems. As a result, non-standardized data acquisition is not just inconvenient for integration—it constrains performance, synchronization mechanisms, and correct execution.

A lack of unified economic data acquisition standards means that even if different protocols implement nearly the same underlying financial functions, the way they expose state remains contract-specific and depends on compositional logic. This structural difference is the core reason for data frictions.

Potential data flow mismatches

Accessing economic state on blockchain is fundamentally a pull model—even if execution signals can be streamed as push. External systems need to actively query nodes for required state, rather than receiving continuous, structured updates. This reflects the core role of blockchains—on-demand verification, not maintaining application-level continuous state views.

There are also push-mode primitive components on-chain. WebSocket subscriptions can push newly created blocks and event logs in real time, but these do not include the storage state that carries most economic meaning—unless protocols deliberately choose to redundantly record it as logs. Agents cannot directly subscribe to on-chain updates about lending market utilization, pool reserve amounts, or position health factors. These values are stored in contract storage, and most protocols do not provide native mechanisms to push storage changes to downstream users. The most feasible approach today is to subscribe to new block headers and re-query storage state for each block—meaning that even if triggered by streamed events, state access is still fundamentally a pull model. Logs can only indicate that data might have changed; they do not encode the final economic state. Reconstructing that state still requires explicit reading and access to historical state.
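The resulting hybrid pattern—push signal, pull read—can be sketched as follows, with the chain mocked. A real agent would hold a WebSocket subscription for new heads and re-read storage via eth_call or eth_getStorageAt:

```python
# Sketch of the push-triggered pull: new block headers signal that something
# *may* have changed; the economic value must still be re-read each block.
# The chain is mocked for illustration.

def watch_utilization(new_heads, read_utilization, on_change):
    """new_heads: iterable of block numbers (the push signal).
    read_utilization: callable(block) -> value (the pull).
    on_change: callback fired only when the value actually moved."""
    last = None
    for block in new_heads:
        value = read_utilization(block)   # explicit storage re-read
        if value != last:
            on_change(block, value)
            last = value

# Mocked chain state: utilization moves at block 102.
_util = {100: 0.50, 101: 0.50, 102: 0.61}
changes = []
watch_utilization([100, 101, 102], _util.__getitem__,
                  lambda b, v: changes.append((b, v)))
# changes == [(100, 0.5), (102, 0.61)]: block 101's header carried no
# economic information, but the read still had to happen to learn that.
```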

Agent systems are actually better suited to a reverse data flow. Agents do not need to poll hundreds of contracts for state changes. Instead, they can receive structured, precomputed state updates and push them directly into their execution environments (e.g., updated utilization rates, health factors, or position changes). A push-based architecture reduces redundant queries, lowers the delay from state change to agent awareness, and allows an intermediary layer to encapsulate state into semantically clear updates—rather than forcing the agent to interpret meanings from raw storage.

Making this mode switch is not easy. It requires subscription infrastructure, logic to filter relevance, and an economic event specification that converts storage changes into agent-executable actions. But as agents become continuous online participants rather than intermittent query clients, the inefficiency of pull mode—interface rate limiting, synchronization overhead, and repeated queries among different agents—will become increasingly severe. Treating agents as continuous consumers rather than intermittent clients might be more compatible with how autonomous systems operate.
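A push-oriented feed might deliver updates shaped roughly like the sketch below. The update schema and the relevance rule are assumptions of this illustration, not an existing standard:

```python
# Hypothetical shape of an intermediary's semantic update feed: precomputed,
# labeled economic events that the agent filters for relevance, instead of
# polling raw storage. Schema and thresholds are illustrative.

from dataclasses import dataclass

@dataclass
class EconomicUpdate:
    kind: str        # e.g., "utilization", "health_factor"
    market: str      # identifier assigned by the intermediary
    value: float
    block: int

def relevant(update: EconomicUpdate, watched_markets: set,
             hf_alert: float = 1.2) -> bool:
    """Drop updates outside the agent's scope; for health factors,
    only surface positions that have become risky."""
    if update.market not in watched_markets:
        return False
    if update.kind == "health_factor":
        return update.value < hf_alert
    return True
```

Note that the filtering step quietly reintroduces pull-style logic at another layer—exactly the open question the text raises about whether push infrastructure is strictly better.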

Whether push-based infrastructure is truly better is still undecided. Full-state-change data streams create filtering problems, and agents still must determine which information is relevant—reintroducing pull logic at another layer. The core issue is not that the pull model itself is wrong; it is that existing architectures were not designed with persistent machine users in mind. As agent usage scales up, alternative approaches are worth exploring.

Execution frictions

Execution frictions arise because many existing interaction layers convert intent, perform transaction review, and validate results, wrapping these steps inside workflows centered on front ends, wallets, and human oversight. In ordinary user and subjective decision scenarios, this oversight is usually performed by humans. For autonomous systems, these functions must be formalized and encoded directly. Blockchains can ensure deterministic execution according to contract logic, but they do not guarantee that the transaction matches the user’s intent, complies with risk constraints, or produces the intended economic outcome. In existing workflows, front-end interfaces and humans fill this gap.

Front ends orchestrate sequences of operations (swap, approve, deposit, borrow), wallets provide the final checkpoint of “review and send,” and users or operators typically make an informal strategy judgment at the last step. They often decide whether a transaction is safe and whether the quoted results are acceptable under incomplete information. If a transaction fails or yields unexpected results, users retry, adjust slippage, change routes, or simply abandon the operation. Agent systems remove humans from this execution loop, meaning the system must replace three categories of human functions with machine-native logic:

  1. Intent compilation. Human goals like “allocate my stablecoins to the best risk-adjusted return channel” must be compiled into a concrete action plan: which protocol to use, which market, which token route, position size, authorization method, and execution order. For humans, this happens implicitly via the front end; for agents, it must be formalized.
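The compilation step can be pictured as a function from a goal to an ordered action plan. Everything below (the goal phrasing, the protocol names, the amounts) is a hypothetical stand-in for a real strategy engine:

```python
from dataclasses import dataclass

@dataclass
class Action:
    protocol: str   # illustrative protocol identifier
    call: str       # "approve", "swap", "deposit", "borrow", ...
    params: dict

@dataclass
class Plan:
    goal: str
    actions: list   # ordered: execution order matters

def compile_intent(goal: str) -> Plan:
    """Hypothetical intent compiler: maps a human-level goal to a
    concrete, ordered action plan. A real system would query markets
    and rank opportunities; here a lookup table stands in for that."""
    if goal == "allocate stablecoins to best risk-adjusted yield":
        return Plan(goal=goal, actions=[
            Action("token", "approve", {"spender": "pool", "amount": 1_000}),
            Action("pool", "deposit", {"amount": 1_000}),
        ])
    raise ValueError(f"no compilation rule for goal: {goal}")
```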

  2. Policy enforcement. Clicking “send transaction” is not only a signing action; it also implicitly validates that the transaction conforms to constraints: slippage tolerances, leverage caps, minimum health factors, whitelisted contracts, or rules such as “do not allow upgradeable contracts.” Agents need to encode these policy constraints as explicit, machine-verifiable rules:
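A minimal sketch of what such a pre-broadcast check might look like; the plan and policy field names are invented for the example:

```python
def check_policy(tx_plan: dict, policy: dict) -> list:
    """Return the list of violated rules; an empty list means the plan
    may be broadcast. Field names are illustrative, not a standard."""
    violations = []
    if tx_plan["slippage_bps"] > policy["max_slippage_bps"]:
        violations.append("slippage above tolerance")
    if tx_plan["leverage"] > policy["max_leverage"]:
        violations.append("leverage cap exceeded")
    if tx_plan["min_health_factor"] < policy["min_health_factor"]:
        violations.append("health factor below floor")
    # Every contract in the planned call graph must be whitelisted.
    if not set(tx_plan["contracts"]) <= set(policy["whitelist"]):
        violations.append("non-whitelisted contract in call graph")
    return violations
```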

Before broadcasting a transaction, the execution system must verify that the planned call graph satisfies these rules.

  3. Result validation. Putting a transaction on-chain does not mean the task is complete. Even if a transaction executes successfully, the goal may not be met: slippage may exceed tolerance, limits may prevent reaching the target position size, or interest rates may change between simulation and on-chain execution. Humans perform informal validation of execution results through the front end after the transaction. Agents must instead programmatically evaluate post-execution state conditions:
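A hedged sketch of post-execution validation, comparing observed state against the goal. The field names and the default tolerance are illustrative assumptions:

```python
def validate_outcome(expected: dict, observed: dict,
                     slippage_tolerance_bps: int = 50) -> bool:
    """A transaction 'succeeding' on-chain is not enough; check the
    resulting state against the original goal."""
    quoted = expected["amount_out"]
    received = observed["amount_out"]
    # Realized slippage versus the quote, in basis points.
    slippage_bps = (quoted - received) / quoted * 10_000
    if slippage_bps > slippage_tolerance_bps:
        return False
    # Did we actually reach the target position size?
    if observed["position_size"] < expected["min_position_size"]:
        return False
    return True
```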

This raises the bar for completion validation: it cannot stop at confirming the transaction landed on-chain. Intent-centric architectures may offer a partial solution by shifting more of the “how to execute” burden from agents to specialized solvers. Instead of sending raw calldata, an agent broadcasts a signed execution intent with result-based constraints; the solver or protocol-layer mechanism must satisfy those constraints for the execution to count as valid.
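An execution intent with result-based constraints might be modeled as follows. The fields and the validity rule are assumptions made for illustration, not any specific intent protocol's format:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ExecutionIntent:
    """Hypothetical intent object: the agent states what outcome it
    requires, and a solver chooses how to achieve it."""
    sell_token: str
    buy_token: str
    sell_amount: int
    min_buy_amount: int   # result-based constraint the solver must meet
    deadline: int         # unix timestamp after which the intent lapses

def settlement_valid(intent: ExecutionIntent, bought: int, now: int) -> bool:
    # A solver's fill counts as valid execution only if it satisfies
    # the intent's constraints; otherwise the agent's goal was not met.
    return bought >= intent.min_buy_amount and now <= intent.deadline
```

Note how validation moves from "was this call data executed?" to "did the outcome satisfy the stated constraints?", which is the property the agent actually cares about.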

Multi-step workflows and failure modes

A large part of DeFi execution is naturally multi-step. A single yield configuration operation may require completing steps in sequence: approval → swap → deposit → borrow → staking. Some steps may be independent transactions, while others can be executed via batch calls or route contracts. Humans can tolerate a workflow stalling partway and resume it through the UI. Agents, however, must orchestrate deterministic flows: if any step fails, the agent must decide whether to retry, switch paths, roll back, or pause execution.
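The orchestration decision (retry a transient failure, or pause with partial progress recorded) can be sketched as a small state machine. The step interface below is hypothetical:

```python
from enum import Enum

class StepResult(Enum):
    OK = "ok"
    RETRYABLE = "retryable"   # e.g. a transient RPC or simulation failure
    FATAL = "fatal"           # e.g. a policy violation; do not retry

def run_workflow(steps, max_retries: int = 2):
    """Execute named steps in order (approve -> swap -> deposit -> ...).
    Each step is a callable returning a StepResult. Returns the list of
    completed steps plus the name of the failed step, or None if all
    steps succeeded."""
    completed = []
    for name, step in steps:
        attempts = 0
        while True:
            result = step()
            if result is StepResult.OK:
                completed.append(name)
                break
            attempts += 1
            if result is StepResult.FATAL or attempts > max_retries:
                # Pause: surface partial progress so the agent (or an
                # operator) can decide whether to roll back or resume.
                return completed, name
    return completed, None
```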

This creates new failure modes that are typically hidden in human operation workflows:

  • State drift between decision and on-chain execution: interest rates, utilization, or liquidity may change between the time a transaction is simulated and the time it lands on-chain. Humans may shrug off this volatility; agents must set acceptable ranges and enforce them strictly.

  • Non-atomic execution and partial fills: Some operations may execute across multiple transactions, or only produce partial results. Agents must track intermediate states and confirm the final state matches the goal.

  • Allowance and approval risk: humans typically sign approvals by habit through the interface. Agents must fold the authorization scope (amount, spender, validity period) into their security-policy reasoning rather than treating it as a mere interface step.

  • Route selection and implicit execution costs: humans rely on routing tools and UI defaults, while agents must model slippage, maximal extractable value (MEV) risk, gas fees, and price impact into the objective function.
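The first failure mode above, state drift, reduces to enforcing a bound between the value observed at simulation time and the value at execution time. A minimal sketch, with an assumed basis-point tolerance:

```python
def within_drift_bounds(simulated_rate: float, live_rate: float,
                        max_drift_bps: int = 25) -> bool:
    """Enforce an acceptable range between the rate the agent decided
    on during simulation and the rate observed at execution time.
    If the bound is exceeded, the agent should re-plan rather than
    execute against stale assumptions."""
    drift_bps = abs(live_rate - simulated_rate) / simulated_rate * 10_000
    return drift_bps <= max_drift_bps
```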

Execution: a machine-native control problem

The core insight of execution frictions is that DeFi interaction layers treat the human wallet signature as the final control point. In the current design, intent validation, risk-tolerance judgment, and informal “does this look reasonable” checks are all concentrated at that step. Once humans are removed, execution becomes a control problem: the agent must convert goals into a chain of actions, enforce policy constraints automatically, and validate results under uncertainty. This challenge exists in many autonomous systems, but it is especially acute in blockchain environments, because execution directly moves funds, can compose calls to unfamiliar contracts, and must cope with adversarial state changes. Humans decide from experience and correct errors through trial and error; agents must perform the same work programmatically, at machine speed, and often within dynamic action spaces. The view that agents “just need to submit transactions” therefore badly underestimates the difficulty. Submitting transactions is the easy part; what is truly missing is everything the UI and the human perform around it: intent compilation, safety checks, and goal-completion validation.

Conclusion

Blockchains do not natively provide the semantic and coordination layers that autonomous agents need. Their design goal is to ensure deterministic execution and consensus on state transitions in adversarial environments. The interaction layers built on top of that design have always revolved around human users: interpreting state via the UI, selecting actions via the front end, and manually verifying results.

Agent systems upend this architecture. They remove human interpreters, approvers, and verifiers from the workflow and require these functions to be machine-native. This shift reveals structural frictions along four dimensions: discovery, trust assessment, data acquisition, and execution orchestration. These frictions do not exist because execution is impossible, but because the infrastructure around blockchain in most scenarios still assumes human involvement in the step between state interpretation and transaction submission.

Bridging these gaps may require new infrastructure across multiple layers of the stack: middleware that normalizes economic state across protocols into machine-readable specifications; index services or RPC extensions that expose semantic primitives such as positions, health factors, and opportunity sets rather than raw storage data; registries that provide official contract mappings and token-authenticity verification; and execution frameworks that can encode policy constraints, handle multi-step workflows, and programmatically validate goal completion. Some gaps stem from structural characteristics of permissionless systems: open deployment, weak normative identity, and heterogeneous interfaces. Others reflect the current state of tools, standards, and incentive design. As agent usage scales, competition among protocols to make autonomous integration easier may help alleviate these issues.

As autonomous systems begin to manage funds, execute strategies, and directly interact with on-chain applications, the architectural assumptions embedded in today’s interaction layers become increasingly visible. Most of the frictions discussed in this article originate from blockchain tools and interaction patterns having been developed around human intermediary workflows; some are natural results of permissionless openness, heterogeneity, and adversarial environments; and others are common challenges autonomous systems face in complex environments.

The core challenge is not getting agents to sign transactions, but giving them a reliable path between raw blockchain state and actual operations: the semantic, trust, and policy functions currently performed jointly by software and human judgment.
