Ibrahim Sowunmi

Stripe and ElevenLabs to Power Voice-Driven Merchant Operations - Proof of Concept

This writeup explains the Terminal management operating system powered by Stripe, ElevenLabs, and conversational AI agents.

It’s a project that I am no longer working on, and I figured I would share my notes :)


Why Terminal Management OS?

Imagine this: a merchant scales their sales efforts across multiple physical locations, supported by a worker that understands their products and story, can take payments and bookings, and can work in multiple languages. Zero ramp time compared to a human:

“Our product contains X allergens and is best for Y skin type. What’s your skin type?”

So, we asked ourselves: What if a knowledgeable, adaptable, voice-driven system was a first-class feature in the payment and shopping experience?

“When would be best to book X, ah sorry that time isn’t available we can book time B or C instead”

“That product is out of stock here, we have it available in our Sheffield warehouse. How about we get that delivered to you? Enter your address in the keypad below and make a payment to secure delivery.”

That’s the premise behind the Terminal Management Operating System.


System Architecture: Stripe + ElevenLabs + AI Agent

Our stack connects Stripe for payments, ElevenLabs for voice, and an AI agent that drives interactions. The pieces are very tightly coupled to increase ship speed.

Architecture diagram showing the core primitives used in the Temi Operating System


The image above describes a very high-level architecture for the project.

I will break down each segment, with corresponding links, images and documentation.

  • Stripe API
  • ElevenLabs API
  • Terminal Management Operating System API

1. Stripe API Integration (with Connect + Terminal)

We use Stripe as the financial and payment backbone of Temi OS. Here’s how:

Key Capabilities Used:
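To make the Terminal + Connect flow concrete, here is a sketch of how a PaymentIntent for an in-person payment might be parameterised. The helper name is mine, not Stripe's, and the commented-out SDK call mirrors Stripe's documented Node API: create the intent for `card_present`, on the merchant's connected account, then collect and capture on the reader.

```typescript
// Sketch: parameters for a Stripe Terminal payment on a Connect account.
// `buildTerminalPaymentParams` is an illustrative helper; the real call
// would go through the official `stripe` npm package (commented below).

interface TerminalPaymentParams {
  amount: number; // smallest currency unit, e.g. pence
  currency: string;
  payment_method_types: string[]; // Terminal requires 'card_present'
  capture_method: 'automatic' | 'manual';
}

function buildTerminalPaymentParams(
  amountPence: number,
  currency = 'gbp',
): TerminalPaymentParams {
  if (!Number.isInteger(amountPence) || amountPence <= 0) {
    throw new Error('amount must be a positive integer in the smallest unit');
  }
  return {
    amount: amountPence,
    currency,
    payment_method_types: ['card_present'],
    capture_method: 'manual', // capture only after the reader collects the card
  };
}

// With the real SDK (shape per Stripe's Node docs; account ID is hypothetical):
// const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!);
// const intent = await stripe.paymentIntents.create(
//   buildTerminalPaymentParams(1999),
//   { stripeAccount: merchantAccountId }, // Connect: act on the merchant's account
// );
```

Manual capture is the pattern Stripe documents for Terminal: the intent is created server-side, the reader collects the card, and the backend captures once collection succeeds.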


2. ElevenLabs Conversational AI Integration

We integrate with the ElevenLabs Conversational AI API to generate dynamic, real-time voice responses.

Highly recommend reading through:

https://elevenlabs.io/docs/conversational-ai/guides/quickstarts/next-js

to develop a high-level understanding.

Why Conversational AI?

ElevenLabs agents can make tool calls, meaning they can hit our internal APIs to retrieve or act on business data.
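For context, a server-side tool is registered with a name, a description the agent uses to decide when to call it, an endpoint, and JSON-Schema-style parameters. The object below is an illustrative sketch only: `check_inventory`, the URL, and the field names are hypothetical, and the exact schema should be taken from the ElevenLabs Conversational AI docs.

```typescript
// Illustrative sketch of a webhook-style tool definition for the agent.
// Tool name, URL, and parameter names are hypothetical; check the
// ElevenLabs Conversational AI documentation for the exact schema.
const checkInventoryTool = {
  name: 'check_inventory',
  description:
    'Look up whether a product is in stock at this location, and where ' +
    'else it is available if not.',
  method: 'POST',
  url: 'https://example.com/api/tools/inventory', // hypothetical endpoint
  parameters: {
    type: 'object',
    properties: {
      productId: {
        type: 'string',
        description: 'Internal ID of the product the customer asked about',
      },
    },
    required: ['productId'],
  },
} as const;
```

The description field does a lot of work here: it is what the agent reads when deciding whether a customer's question warrants a tool call.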

In our case, we defined a few tools that the agent can use:

I have defined how to access these tools in the system prompt as well.
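As a sketch of what one of these tools can look like on our side, here is a Next.js App Router route handler the agent could hit to check stock. The route path, payload shape, and in-memory inventory map are hypothetical stand-ins; the real version would query PostgreSQL via Prisma.

```typescript
// Sketch of an internal tool endpoint (e.g. app/api/tools/inventory/route.ts).
// The inventory data here is an illustrative in-memory stand-in for the
// real Prisma/PostgreSQL lookup.
const inventory: Record<string, { stock: number; location: string }> = {
  'moisturiser-01': { stock: 0, location: 'Sheffield warehouse' },
  'cleanser-02': { stock: 12, location: 'in store' },
};

// Next.js App Router convention: export an HTTP-method-named handler.
export async function POST(req: Request): Promise<Response> {
  const { productId } = (await req.json()) as { productId?: string };

  if (!productId || !(productId in inventory)) {
    return new Response(JSON.stringify({ found: false }), {
      status: 404,
      headers: { 'content-type': 'application/json' },
    });
  }

  const item = inventory[productId];
  // The agent reads this JSON and turns it into a spoken answer,
  // e.g. the "available in our Sheffield warehouse" line above.
  return new Response(
    JSON.stringify({
      found: true,
      inStock: item.stock > 0,
      stock: item.stock,
      location: item.location,
    }),
    { status: 200, headers: { 'content-type': 'application/json' } },
  );
}
```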

Example Conversation

Very Basic Example video

Tech Stack & Dependencies

| Tool | Role |
| --- | --- |
| Next.js (v15.1.7) | Fullstack framework |
| React (v19.0.0) | Frontend UI |
| TypeScript | Type safety |
| Prisma (v6.4.1) | ORM for PostgreSQL |
| Stripe (v13.4.0) | Payments, Terminal, Connect |
| NextAuth (v5.0.0-beta.25) | Authentication |
| Tailwind CSS | Styling |
| Fuse.js | Fuzzy product search |
| date-fns | Date utility helpers |
| ngrok | Local webhook testing |
| ElevenLabs | Voice generation + AI agent integration |