

Building Elaro: Turning Short-Form Videos Into Something You Can Actually Use


Published: 2025-12-18


Elaro came out of a simple thought: people save tons of short-form videos, but almost none of them are usable later. A recipe with no written steps. A skincare routine explained only through captions. A Costco haul where the creator fires off 20 products in 15 seconds. Saving a link doesn't help much.

So I built Elaro — an app that pulls the actual information out of these videos and turns it into structured content you can reference later. Here's how it works under the hood.

Try it: www.elaro.xyz

Tech Stack

I built the architecture end-to-end. Codex helped with some code-writing, but the design and system organization are mine.

High-Level Architecture

iOS App (user shares video)
→ Rails API (creates record)
→ Background Pipeline (Sidekiq + Redis)
→ Parsed + Hydrated (structured content)
→ iOS App (displays tutorials/items)

The Parsing Pipeline

This is the heart of Elaro — a chain of background jobs where each step works independently. If one step fails, I retry just that step instead of rerunning the whole process.

  1. Fetch video + metadata.
  2. Extract transcript (if the platform provides it).
  3. Speech-to-text as a fallback.
  4. AI parser to classify the video as a tutorial or a product haul.
  5. OCR + image parsing if there still isn't enough data.
  6. Hydration: OG tags, link cleanup, Amazon product detection.
  7. Finalize and save.
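
As a rough illustration of the hydration step (step 6), here is a minimal Python sketch of link cleanup and Amazon product detection. It is not Elaro's actual code, which runs on Rails. The tracking-parameter list, the function names, and the choice to match only amazon.com hosts are all assumptions made for the example.

```python
import re
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical tracking parameters to drop; the real list is not in this post.
TRACKING_PREFIXES = ("utm_",)
TRACKING_KEYS = {"fbclid", "gclid", "igshid"}

# Amazon product paths carry a 10-character ASIN after /dp/ or /gp/product/.
ASIN_PATTERN = re.compile(r"/(?:dp|gp/product)/([A-Z0-9]{10})(?:/|$)")


def clean_link(url: str) -> str:
    """Drop tracking query parameters and the fragment from a URL."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_KEYS and not key.startswith(TRACKING_PREFIXES)
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def amazon_asin(url: str) -> Optional[str]:
    """Return the ASIN if the URL looks like an amazon.com product page, else None."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # Assumption: only amazon.com and its subdomains count; other storefronts are ignored.
    if not (host == "amazon.com" or host.endswith(".amazon.com")):
        return None
    match = ASIN_PATTERN.search(parts.path)
    return match.group(1) if match else None
```

Cleaning links before storing them keeps duplicates of the same product from piling up under different tracking URLs.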

Each job reads the same database record and writes back what it found. If something goes wrong, the record knows the last successful step, and I can re-run from there in the admin panel.
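
The resume-from-last-step idea can be sketched in a few lines of Python. This is a simplified, in-process stand-in for the real setup (Sidekiq jobs writing to a database row). `VideoRecord`, `run_pipeline`, and the field names are hypothetical and chosen only for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class VideoRecord:
    """Stand-in for the database row that every job reads and writes."""
    url: str
    data: dict = field(default_factory=dict)
    last_completed_step: Optional[str] = None
    failed_step: Optional[str] = None


Step = Callable[[VideoRecord], None]


def run_pipeline(record: VideoRecord, steps: List[Tuple[str, Step]]) -> bool:
    """Run steps in order, skipping everything up to the last completed step.

    Returns True if the pipeline finished, False if a step failed.
    """
    names = [name for name, _ in steps]
    start = 0
    if record.last_completed_step is not None:
        start = names.index(record.last_completed_step) + 1

    record.failed_step = None
    for name, step in steps[start:]:
        try:
            step(record)
        except Exception:
            # Remember where it broke so a later run can pick up from here.
            record.failed_step = name
            return False
        record.last_completed_step = name
    return True
```

Calling `run_pipeline` again on a record that failed halfway re-runs only the failed step and the ones after it. The work already written to the record is left alone.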

Handling Videos That Don't Talk

I expected creators to narrate their videos. Turns out a huge chunk of tutorial-style content is just text on screen, music, and a hand pointing at things. Speech-to-text fails completely there, so I added an OCR + image parsing step.

This is why Elaro works on videos that look "impossible" to parse.
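
The fallback order (platform transcript, then speech-to-text, then OCR) can be sketched as a chain that stops at the first source with enough text. This Python sketch is a simplification: in the pipeline described above, the AI parser runs before the OCR step. The `MIN_USEFUL_CHARS` threshold and the function name are assumptions, since the post does not say how "enough data" is measured.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical threshold for "enough text to parse".
MIN_USEFUL_CHARS = 40

Extractor = Callable[[], Optional[str]]


def first_useful_text(
    sources: List[Tuple[str, Extractor]],
    min_chars: int = MIN_USEFUL_CHARS,
) -> Optional[Tuple[str, str]]:
    """Try each (name, extractor) in order and return the first useful result.

    Extractors are only called when the earlier ones came up short, so the
    expensive ones (speech-to-text, OCR) are skipped whenever possible.
    """
    for name, extract in sources:
        text = (extract() or "").strip()
        if len(text) >= min_chars:
            return name, text
    return None
```

A silent, text-on-screen video returns nothing from the first two sources and falls through to OCR. A narrated one never pays for the OCR pass.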

Products vs Tutorials (Data Modeling)

I kept these separate because they behave differently.

Tutorials

Products

Job Orchestration + Reliability

Each pipeline stage is its own Sidekiq job. Sidekiq handles retries, and the DB record stores which step failed. That lets me re-run only that step, not the whole pipeline, and debug partial results in the admin UI. It keeps compute costs predictable and failures easy to debug.
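
To show what per-step retries buy, here is a generic Python sketch of retrying a single job with exponential backoff. It is not Sidekiq's retry logic or Elaro's code. `run_with_retries` and its parameters are hypothetical, and the `sleep` argument is injectable only so the example can run without actually waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(
    job: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run job(); on failure wait base_delay * 2**attempt, then try again.

    The last failure is re-raised so the caller can record which step broke.
    """
    for attempt in range(max_attempts):
        try:
            return job()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
    raise ValueError("max_attempts must be at least 1")
```

Because each stage is retried on its own, a flaky speech-to-text call costs a few repeated requests rather than a second download and a second parse.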

The iOS App

I hadn't built for iOS in almost a decade, and SwiftUI was new to me. But describing UI with state clicked fast. A few things I'm proud of:

  1. No accounts needed: You open the app and immediately start saving. The app works with anonymous local identifiers, and the backend doesn't assume a user model upfront.
  2. Local-first shopping list: The shopping list lives entirely on-device, so it works even without service. Flow: tutorial → extracted ingredients → local store → editable list.
  3. UI that tolerates imperfect data: AI extraction is messy. The app renders incomplete steps, handles missing images, and shows placeholders while metadata loads. SwiftUI's layout + conditional rendering made this straightforward.

Hard Problems I Didn't See Coming

  1. Videos with no spoken words: Forced the OCR/image step, which became one of the most useful parts.
  2. Relearning iOS development: Moving from UIKit/Objective-C to SwiftUI meant thinking about data flow, not frames.
  3. Videos with 20–30 products: Costco hauls weren't in the original plan, but they drove the product model and UI.

What Elaro Doesn't Do (Yet)

Right now Elaro is great for tutorials, product discovery, and anything with steps or items. It doesn't yet handle locations, events, or other "real-world" content. I already know how I'll approach it; I just haven't built that pipeline.

Closing Thoughts

Elaro ended up being a bigger technical project than I expected. It pushed me into designing multi-step AI pipelines, handling failure gracefully, mixing OCR, speech-to-text, and LLMs in a usable way, relearning iOS from the ground up, and optimizing Rails jobs so they don't choke on real-world video data.

It's been one of the most fun things I've built — and more importantly, it actually solves a problem I run into every day.