Why We Built Our Own Markdown Engine (And Why It Matters)

When we started building Indilingo, we had a problem that seemed simple on paper: we needed our backend to control how text looked in the app. Sounds basic, right?

Turns out, it wasn't.

The Challenge

Building a language learning app requires extreme flexibility. When a user chats with our AI tutor, looks up a word, or gets feedback on a quiz, the content needs to adapt to where they are in their learning journey. Not just the words themselves, but how those words are presented.

We figured markdown would be the obvious choice. Except Jetpack Compose doesn't ship with markdown support. We looked at the open-source options out there, but none of them felt right. They either couldn't handle our use cases or would've become a maintenance nightmare down the road.

So we decided to build our own.

Building Something That Actually Works

This wasn't an easy call. Taking time to build infrastructure when you could be shipping features is always a tough sell. But we knew that if we got this right, it would unlock possibilities we hadn't even thought of yet.

We built a custom Kotlin Multiplatform markdown composable that converts markdown strings into Jetpack Compose's AnnotatedString format. The parser handles everything from headings (with level-specific font sizing from 30sp for H1 down to 16sp for H4+), to nested formatting (bold inside italic inside links), code blocks with monospace fonts, and even custom gesture handlers for taps, long-presses, and double-taps.

We tested it across our chat interface, dictionary, and quiz system - and it worked better than expected.

For example, when processing this simple markdown:

# Lesson: Greetings

Try saying **Namaste** or <u>Shukriya</u>

Our parser recognizes the heading (automatic bold + 30sp font size), applies bold styling to “Namaste,” and marks “Shukriya” as a tappable audio word - all from a simple string the backend sends. No client-side UI logic required.

Suddenly, we could render any composable we wanted. The backend could control the presentation. We could experiment with new formats without waiting for app updates. The whole team moved faster.

Under the Hood

The markdown parser handles everything from basic bold and italic text to nested formatting, code blocks, headings, and even HTML tags. Here's what makes it special:

Regex-Based Parsing with Performance in Mind

We compile all regex patterns once at initialization instead of on every parse. Patterns like BOLD_REGEX, ITALIC_REGEX, LINK_REGEX, and INLINE_CODE_REGEX are precompiled, so the parser stays fast even when rendering long conversations. These ~400 lines of Kotlin handle both markdown and HTML syntax without breaking a sweat.
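Here's a minimal sketch of the compile-once idea. The constant names are the ones mentioned above; the holder object, the pattern bodies, and the helper function are illustrative assumptions, not the exact production code.

```kotlin
// Hypothetical holder: each Regex is built once, when the object is first used,
// and then reused for every parse call.
object MarkdownPatterns {
    // Pattern bodies are assumptions chosen for illustration.
    val BOLD_REGEX = Regex("""\*\*(.+?)\*\*""")
    // A single '*' on each side, not part of a '**' pair.
    val ITALIC_REGEX = Regex("""(?<!\*)\*(?!\*)(.+?)(?<!\*)\*(?!\*)""")
    val LINK_REGEX = Regex("""\[([^\]]+)]\(([^)]+)\)""")
    val INLINE_CODE_REGEX = Regex("""`([^`]+)`""")
}

// Hypothetical helper: returns the text inside every bold span.
fun boldSegments(text: String): List<String> =
    MarkdownPatterns.BOLD_REGEX.findAll(text).map { it.groupValues[1] }.toList()
```

The point is that nothing inside `boldSegments` constructs a `Regex`, so a long chat history pays the compilation cost once rather than once per message.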

Smart Conflict Resolution

When multiple markdown syntaxes overlap (bold inside italic, links with bold text, etc.), the parser uses a priority-based matching system. It scans for all matches, sorts by position, and applies formatting in the correct order to avoid conflicts. This means developers can nest formatting however they want - it just works.
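The scan, sort, and apply steps can be sketched roughly like this. The rule list, the priorities, and the policy of dropping partially overlapping matches while keeping fully nested ones are assumptions made for the example; the real parser may resolve conflicts differently.

```kotlin
// One candidate formatting span found in the text.
data class StyleMatch(val style: String, val range: IntRange, val priority: Int)

// Hypothetical rules: (style name, pattern, priority). Lower priority wins ties.
val RULES = listOf(
    Triple("bold", Regex("""\*\*.+?\*\*"""), 0),
    Triple("italic", Regex("""_.+?_"""), 1),
)

fun resolveMatches(text: String): List<StyleMatch> {
    // 1. Scan for every match of every rule.
    val candidates = RULES.flatMap { (style, regex, priority) ->
        regex.findAll(text).map { StyleMatch(style, it.range, priority) }.toList()
    // 2. Sort by start position, then by priority.
    }.sortedWith(compareBy<StyleMatch>({ it.range.first }, { it.priority }))

    // 3. Accept matches in order; reject any that only partially overlap
    //    an already accepted match. Fully nested matches are kept.
    val accepted = mutableListOf<StyleMatch>()
    for (candidate in candidates) {
        val conflicts = accepted.any { kept ->
            val overlaps = candidate.range.first <= kept.range.last &&
                kept.range.first <= candidate.range.last
            val nested = candidate.range.first >= kept.range.first &&
                candidate.range.last <= kept.range.last
            overlaps && !nested
        }
        if (!conflicts) accepted.add(candidate)
    }
    return accepted
}
```

With this policy, `_a **b** c_` keeps both the italic span and the bold span nested inside it, while `**a _b** c_` keeps only the bold span because the italic one straddles its closing marker.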

Block and Inline Element Separation

We process markdown in two phases: first handling block-level elements (headings, code blocks, lists) by splitting into lines, then processing inline elements (bold, italic, links) within those blocks. This prevents edge cases where inline formatting might interfere with block structures. We support 6 heading levels, bullet lists, code blocks, and 8+ inline formatting styles.
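A stripped-down sketch of the two phases might look like the following. The `Block` types and function names are hypothetical, and code blocks and nested lists are left out to keep it short.

```kotlin
// Phase 1 output: one entry per block-level element.
sealed interface Block { val text: String }
data class Heading(val level: Int, override val text: String) : Block
data class Bullet(override val text: String) : Block
data class Paragraph(override val text: String) : Block

// Illustrative patterns, compiled once.
val HEADING_LINE = Regex("""(#{1,6})\s+(.*)""")
val BOLD_SPAN = Regex("""\*\*(.+?)\*\*""")

// Phase 1: split into lines and classify each line as a block.
fun parseBlocks(markdown: String): List<Block> {
    val blocks = mutableListOf<Block>()
    for (line in markdown.lines()) {
        val heading = HEADING_LINE.matchEntire(line)
        when {
            heading != null ->
                blocks.add(Heading(heading.groupValues[1].length, heading.groupValues[2]))
            line.startsWith("- ") -> blocks.add(Bullet(line.removePrefix("- ")))
            line.isNotBlank() -> blocks.add(Paragraph(line))
        }
    }
    return blocks
}

// Phase 2: inline parsing only ever sees the text of a single block.
fun boldSpans(block: Block): List<String> =
    BOLD_SPAN.findAll(block.text).map { it.groupValues[1] }.toList()
```

Because inline parsing runs per block, a stray `**` in one paragraph can never swallow the heading or list item that follows it.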

HTML Tag Support

Here's where it gets interesting for our learning use case: we support a hybrid of markdown and HTML tags. Need to underline a specific vocabulary word? Use <u>नमस्ते</u> (namaste). Want to bold something? Either **text** or <b>text</b> works. This gives the backend maximum flexibility to use whatever format makes sense for the content.
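One way to support both syntaxes is to normalize HTML tags into their markdown equivalents before inline parsing. The sketch below assumes `<b>`, `<strong>`, and `<u>` tags and uses hypothetical function names; it shows the idea rather than how our parser is actually structured.

```kotlin
// Illustrative patterns. \1 makes sure <b> closes with </b> and <strong> with </strong>.
val HTML_BOLD = Regex("""<(b|strong)>(.*?)</\1>""")
val HTML_UNDERLINE = Regex("""<u>(.*?)</u>""")

// Rewrite HTML bold into markdown bold so one inline rule can handle both.
fun normalizeBold(input: String): String =
    HTML_BOLD.replace(input) { match -> "**" + match.groupValues[2] + "**" }

// Collect the words the backend marked as underlined.
fun underlinedWords(input: String): List<String> =
    HTML_UNDERLINE.findAll(input).map { it.groupValues[1] }.toList()
```

After normalization, `<b>text</b>` and `**text**` reach the inline parser as the same string, so the styling code only needs one bold rule.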

Interactive Learning Features Built Right In

The real magic shows up in the interactive capabilities. The markdown component supports:

Tap-to-Play Audio

For beginners, we underline clickable words. When a user taps an underlined word, they hear its pronunciation. Using AnnotatedString annotations, the parser tracks exactly which word was tapped - even in complex formatted text. The UNDERLINED_WORD annotation stores the exact word for context-aware audio playback.
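To show how the tap lookup works without pulling in Compose, here's a plain-Kotlin model of the same idea. In the app the ranges live in an AnnotatedString; the `WordAnnotation` and `AnnotatedText` types, the `<u>` input syntax, and the function names below are stand-ins invented for this sketch.

```kotlin
const val UNDERLINED_WORD = "UNDERLINED_WORD"

// A tagged range over the displayed text; end is exclusive.
data class WordAnnotation(val tag: String, val word: String, val start: Int, val end: Int)
data class AnnotatedText(val text: String, val annotations: List<WordAnnotation>)

val UNDERLINE_TAG = Regex("""<u>(.*?)</u>""")

// Strip the <u> tags and remember where each underlined word lands in the output.
fun annotateUnderlined(source: String): AnnotatedText {
    val out = StringBuilder()
    val annotations = mutableListOf<WordAnnotation>()
    var cursor = 0
    for (match in UNDERLINE_TAG.findAll(source)) {
        out.append(source.substring(cursor, match.range.first))
        val word = match.groupValues[1]
        val start = out.length
        out.append(word)
        annotations.add(WordAnnotation(UNDERLINED_WORD, word, start, out.length))
        cursor = match.range.last + 1
    }
    out.append(source.substring(cursor))
    return AnnotatedText(out.toString(), annotations)
}

// Given the character offset of a tap, return the underlined word under it, if any.
fun tappedWord(text: AnnotatedText, offset: Int): String? =
    text.annotations.firstOrNull {
        it.tag == UNDERLINED_WORD && offset >= it.start && offset < it.end
    }?.word
```

The tap handler only needs the character offset of the touch; the annotation carries the word to play, so surrounding formatting doesn't matter.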

Long-Press Dictionary Lookup

Double-tap or long-press any word in a message and our custom word extraction finds the exact word boundaries using Unicode-aware regex patterns that support 50+ languages. This works seamlessly across formatted and unformatted text, automatically stripping HTML tags to find the clean word for dictionary lookup.
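A rough sketch of the word extraction, under a few assumptions: the offset refers to the text after tags are stripped, a "word" is a run of Unicode letters, combining marks, and digits, and the function names are made up for the example.

```kotlin
// Illustrative patterns. \p{M} keeps combining marks (such as Devanagari
// vowel signs) attached to the word instead of splitting it.
val HTML_TAG = Regex("""<[^>]+>""")
val WORD = Regex("""[\p{L}\p{M}\p{N}]+""")

fun stripTags(text: String): String = HTML_TAG.replace(text, "")

// Return the word covering the given offset in the tag-free text, or null
// if the offset falls on whitespace or punctuation.
fun wordAtOffset(rawText: String, offset: Int): String? {
    val clean = stripTags(rawText)
    return WORD.findAll(clean).firstOrNull { offset in it.range }?.value
}
```

Using Unicode categories instead of `[A-Za-z]` is what lets the same function handle Latin and Devanagari text alike.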

Where This Really Paid Off

The best decisions in product development are the ones that make future problems disappear. That's exactly what happened here.

When we added AI-powered learning features, the markdown system handled everything without any refactoring or architectural headaches. It was built for dynamic content from day one.

Want to try a new way of presenting vocabulary? Change it on the server. Want to experiment with how we show grammar corrections? Test them this afternoon. Want personalized content based on proficiency level? Already possible.

We aren't stuck in the cycle of "build UI for feature X, then build different UI for feature Y." We built the foundation once, and now we iterate on top of it.

The Real Takeaway

There’s this advice in startup culture about moving fast and not over-engineering. But there’s a difference between over-engineering and recognizing what's going to be foundational to your product.

Content presentation is everything in a learning app. It's not a nice-to-have feature, it's how learning happens. We saw that early and invested accordingly.

Now we can experiment freely. The AI generates personalized explanations without technical constraints. We prototype ideas in hours instead of weeks. The learning experience evolves as fast as our ideas do.

What's Next

Language learning is changing fast. The old model of static lessons and rigid curricula is dying. Adaptive, conversational, AI-driven experiences are taking over. Your infrastructure needs to keep up with this evolution.

Our markdown engine wasn't just a technical solution - it was a bet on flexibility, experimentation, and being able to move quickly as we figure out what actually helps people learn languages.

Turns out, that was the right bet.
