Inquir Compute
Use case

LLM pipelines and serverless AI workflows

Stage retrieval, moderation, tool calls, and summarization as separate functions so retries, traces, and cost control apply per step—not to one oversized prompt.

Last updated: 2026-04-20

Direct answer

Inquir Compute fits LLM pipelines and serverless AI workflows: each stage is independently deployable and loggable, and pipelines handle the async gaps between model calls.

When it fits

  • Multi-model flows
  • Human-in-the-loop handoffs
  • Long-running enrichment

Tradeoffs

  • Exploratory notebooks lack durable graphs, retries, and cost control for production AI workflows.

Why single-prompt LLM workflows are hard to retry

One mega-prompt cannot retry retrieval without re-running moderation or tool calls—LLM pipelines need boundaries.

Costs balloon when every branch re-embeds the same context instead of caching structured retrieval output.
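The caching point above can be sketched in a few lines: key the embedding result by a content hash so branches and retries reuse it instead of paying to re-embed. This is a minimal in-memory sketch; `embed` is a hypothetical stand-in for a real embedding call, and a production cache would live in Redis or a KV store rather than module state.

```javascript
import { createHash } from 'node:crypto';

// Stub standing in for a real embedding API call; counts invocations
// so the cost saving is visible.
let embedCalls = 0;
async function embed(text) {
  embedCalls += 1;
  return Array.from(text).map((c) => c.charCodeAt(0) / 255);
}

const cache = new Map();

// Key cached embeddings by a hash of the exact input text, so every
// branch (and every retry) that sees the same context reuses one result.
async function embedCached(text) {
  const key = createHash('sha256').update(text).digest('hex');
  if (!cache.has(key)) cache.set(key, await embed(text));
  return cache.get(key);
}
```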

Why notebook scripts do not replace LLM pipelines

Exploratory notebooks lack durable graphs, retries, and cost control for production AI workflows.

Stage LLM work as deployable functions

Each stage is independently deployable and loggable; compose stages with pipelines to bridge the async gaps between model calls.

Tool calls stay HTTP functions with explicit auth—consistent with serverless AI agents elsewhere.
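A minimal sketch of such a tool function, assuming a bearer token in a `TOOL_TOKEN` environment variable and a hypothetical `lookup` helper; the header and field names are illustrative, not an Inquir-specific contract:

```javascript
// A tool call as a plain HTTP function: explicit bearer auth and strict
// input validation before any side effect runs.
export async function handler(event) {
  const auth = event.headers?.authorization ?? '';
  if (auth !== `Bearer ${process.env.TOOL_TOKEN}`) {
    return { statusCode: 401, body: JSON.stringify({ error: 'unauthorized' }) };
  }

  let input;
  try {
    input = JSON.parse(event.body || '{}');
  } catch {
    return { statusCode: 400, body: JSON.stringify({ error: 'invalid JSON' }) };
  }
  if (typeof input.query !== 'string' || input.query.length > 500) {
    return { statusCode: 400, body: JSON.stringify({ error: 'bad query' }) };
  }

  const result = await lookup(input.query); // app-defined tool logic
  return { statusCode: 200, body: JSON.stringify({ result }) };
}

// Stub standing in for the actual tool backend.
async function lookup(query) {
  return `results for ${query}`;
}
```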

LLM pipeline stages to split for observability

  • Retrieve: isolate embedding and search calls.
  • Moderate: fail fast before expensive generation.
  • Call tools: validate inputs tightly before invoking external tools.
  • Summarize: compress for storage or user display.
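The moderation stage can short-circuit the whole pipeline by throwing before any generation runs. A sketch, with `moderateWithLLM` stubbed as a keyword check standing in for a real moderation model:

```javascript
// Stage: moderate.mjs — runs before any expensive generation.
// Throwing here stops the pipeline, so later stages never spend tokens
// on content that would be rejected anyway.
export async function handler(event) {
  const { text } = event.previousOutput;
  const verdict = await moderateWithLLM(text); // stubbed below
  if (verdict.flagged) {
    // Put the reason in the error so traces show why the run stopped.
    throw new Error(`moderation_failed: ${verdict.reason}`);
  }
  return event.previousOutput; // pass input through unchanged
}

// Stub standing in for a real moderation model call.
async function moderateWithLLM(text) {
  const banned = ['ssn:', 'credit card'];
  const hit = banned.find((b) => text.toLowerCase().includes(b));
  return hit ? { flagged: true, reason: `contains "${hit}"` } : { flagged: false };
}
```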

How to stage LLM work with Inquir pipelines

  1. Draw dataflow: name the inputs and outputs of each box.
  2. Codify: implement each box as a function or pipeline step.
  3. Measure cost: track tokens and wall time per stage.
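Step 3 can be a thin wrapper around each stage handler. This sketch records wall time and token counts into an in-memory array; a real setup would emit to the platform's observability instead:

```javascript
const metrics = [];

// Wrap a stage handler so every invocation records wall time and, when
// the stage reports a token count, token usage — one row per stage per run.
function withCostTracking(stageName, handler) {
  return async function (event) {
    const start = Date.now();
    const output = await handler(event);
    metrics.push({
      stage: stageName,
      wallMs: Date.now() - start,
      tokens: output?.tokens ?? 0,
    });
    return output;
  };
}
```

Wrapping a stage is then one line, e.g. `export const handler = withCostTracking('summarize', summarizeHandler);`.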

Document analysis LLM pipeline

Each stage is a separate serverless function: retry retrieval without repeating moderation, track token cost per step, cache intermediate outputs, and inspect traces when one model call fails.

pipeline: document-analysis
// Helpers (extractText, classifyWithLLM, vectorSearch, summarizeWithLLM,
// db, notify) are app-defined imports, omitted here for brevity.
// Stage 1 — extract-text.mjs
// Input: { documentUrl }
// Output: { text, charCount }
export async function handler(event) {
  const { documentUrl } = event.payload ?? JSON.parse(event.body || '{}');
  const text = await extractText(documentUrl); // PDF/HTML/DOCX → plain text
  return { text, charCount: text.length };
}

// Stage 2 — classify.mjs
// Input: previousOutput.text
// Output: { text, category } — 'invoice' | 'contract' | 'report'
export async function handler(event) {
  const { text } = event.previousOutput;
  const category = await classifyWithLLM(text);
  return { ...event.previousOutput, category };
}

// Stage 3 — retrieve.mjs
// Retrying this stage does NOT re-run moderation or classification
export async function handler(event) {
  const { text, category } = event.previousOutput;
  const related = await vectorSearch(text, { filter: { category } });
  return { ...event.previousOutput, related };
}

// Stage 4 — summarize.mjs
export async function handler(event) {
  const { text, related } = event.previousOutput;
  const summary = await summarizeWithLLM(text, related);
  const tokens = summary.usage.total_tokens;
  return { summary: summary.text, tokens }; // tokens tracked per stage
}

// Stage 5 — store-and-notify.mjs
export async function handler(event) {
  const { summary, tokens } = event.previousOutput;
  const doc = await db.documents.create({ summary, tokens });
  await notify(doc.id);
  return { docId: doc.id };
}

Use pipelines when…

When this works

  • Multi-model flows
  • Human-in-the-loop handoffs
  • Long-running enrichment

When to skip it

  • Single prompt demos

FAQ

Why split an LLM workflow into stages?

Retries, cost attribution, and debugging improve when retrieval, moderation, tool calls, and summarization are separate steps with their own logs.

How do I stream tokens to end users?

Keep user-visible streaming at the boundary; internal stages can use request/response for simpler failure handling and replays.
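A sketch of that split: the internal stage stays request/response (easy to retry and replay), and only the user-facing boundary turns the finished result into a token stream. Names are illustrative; the example uses the Web streams API available in modern runtimes:

```javascript
// Internal stage: plain request/response — simple to retry and replay.
async function summarizeStage(text) {
  return `summary of: ${text}`; // stands in for a full LLM response
}

// Boundary: re-expose the finished result as a stream for the UI.
// Only this outer layer deals with streaming; failures inside stay simple.
async function streamToUser(text) {
  const full = await summarizeStage(text);
  return new ReadableStream({
    start(controller) {
      for (const word of full.split(' ')) controller.enqueue(word + ' ');
      controller.close();
    },
  });
}
```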

How do I control cost across stages?

Measure tokens and wall time per stage in your observability tooling; cap expensive steps with budgets and short-circuit the run when moderation fails.
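One way to enforce a budget cap is to accumulate token spend in the payload that flows between stages and refuse to start an expensive stage once the cap is hit; the cap value and field names here are illustrative:

```javascript
const TOKEN_BUDGET = 20000; // illustrative per-run cap

// Guard an expensive stage: if earlier stages already spent the budget,
// fail the run with a clear error instead of making another model call.
function withBudget(handler) {
  return async function (event) {
    const spent = event.previousOutput?.tokensSpent ?? 0;
    if (spent >= TOKEN_BUDGET) {
      throw new Error(`budget_exceeded: ${spent} tokens already spent`);
    }
    const output = await handler(event);
    return { ...output, tokensSpent: spent + (output.tokens ?? 0) };
  };
}
```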

Inquir Compute

The simplest way to run AI agents and backend jobs without infrastructure.

Contact info@inquir.org

© 2025 Inquir Compute. All rights reserved.