Building a SaaS product with HTMX

About once a month, some version of the following post appears on the HTMX subreddit:

User: Why would you use HTMX for a project with [some complex specification]?

In response, someone will likely reply:

If your project is really that complex, HTMX might not be for you.

But — is your project really that complex? Are those two pieces of client-side state you think you need really worth introducing React or Angular to your project?

Enter Chatterpulse AI

Chatterpulse AI is an intelligent classroom assistant that augments teachers' ability to orchestrate their classroom. What does that all mean? In short, the hardest and most important thing teachers do is teach.

Yet most of the AI tools coming out right now are focused on (1) helping teachers build resources (laudable, but I have some questions around the viability of the business models), (2) moving instruction online (valuable, but if we learned anything from COVID, it’s that online instruction is sub-optimal), or (3) analytics for teachers after they teach.

So what do you do when you’re standing in front of a class of 30 or 300 students? How do you keep track of what you’re saying, figure out if students are actually hearing what you’re saying, identify misconceptions, and accommodate students who are learning in their non-native language or are in need of other learning accommodations?

This is where Chatterpulse AI comes in. Chatterpulse AI has 7 basic actions:

Records what you say while you’re teaching
Allows students to view transcripts in real time in the language of their choosing
Allows students to ask and get answers to questions from you or an AI
Enables you to give flash quizzes that you write yourself OR have an AI write for you on the fly.
Enables you to lead discussions with students and see AI summaries of all the students' responses to these open-ended discussions in an instant (even in a college class of 700 students!)
Allows you to summarize the totality of your class session.
Allows you to export your sessions and save them as artifacts for later (student or parent upset about their grades on a test? If only you could prove you’d taught the concept…)

Building for simplicity

There has been interest recently in the idea that AI could enable a one-person unicorn (private company with a $1 billion valuation). I’m not delusional enough to think I’m going to be the person to do that, but the constraint of this idea is fascinating.

If one person were to build and maintain a code-base of sufficient value to power a billion-dollar company, what would that code-base look like? In general, I believe it would:

Be monolithic in nature.
Have a tight and limited set of dependencies.
Prefer locality of behavior over separation of concerns. It’s more important that you can hold everything in your head than that you can separate problems into chunks, because there is no way to distribute those chunks.
Limit and bet on a specific set of choke points for where scaling becomes a problem. I.e., pick your problems rather than allow them to pop up organically.
Centralize the value proposition of the product in a small domain/set of technologies that the founder can focus on, allowing other non-focal technologies to be either (a) outsourced or at least (b) limited in their ability to harm the value proposition. See: Focus on what makes your beer taste better

I won’t pretend that I had fully articulated these ideas when I started this process, but I had developed a broad intuition and business philosophy that favored simplicity and pragmatism in general. Let’s formalize these ideas in as simple a way as possible:

Axiom 0: Complexity scales exponentially over the number of tools you use, over time, and over an increase in the size of your business. Your intelligence and the number of hours in a day don’t scale at all. Complexity is your biggest enemy.

Finding a Tech Stack

So, I had a product idea; it played to my expertise. Now I just needed to build it.

The only problem was I had no web dev experience. I’m primarily a data scientist, focused on Edtech. Dangerous enough with Python, and I can get along with HTML and CSS, but I’m useless basically everywhere else. JavaScript? Gross. DNS, SSL, HTTPS? Things I have people for.

The value in this product was going to reside mostly in my knowledge of education and instruction and applying AI to that problem space. Everything else was a means to an end. With that in mind, I started building my stack.

Django + Postgres (Obviously)

There are other options, and I actually had more prior experience with Flask, but Django was an obvious choice. Beyond built-in templating and the batteries-included approach, I found that, included with the ‘batteries’ was an ‘opinion’. Just the fact that Django forces an ORM tool and a certain file structure on me meant that, as a less experienced dev, I could default to Django’s built-in opinion without shooting myself in the foot.

Sure, I’ll have my own opinions going forward, but having ‘default opinions’ built into the tooling was invaluable. For example, Postgres became pretty obvious, and I spent very little time on the decision.

Bootstrap

I know, I know, Bootstrap is out and Tailwind is in. But guess what? Bootstrap is easier. And if a product works, it works. Have you seen Basecamp, a no-build, raw JS + CSS app? If raw CSS is good enough for them, Bootstrap is good enough for me.

Heroku

Heroku allows me to limit my exposure to DevOps nonsense. Maybe at scale I’ll need to make a switch, but what a good problem that will be! And sure, AWS may be cheaper in theory, but I know I’m more than capable of an accidental $100k bill. To me, AWS is the antithesis of simplicity.

HTMX

Arriving at the above tools was pretty easy. But I had come to a crossroads; I didn’t want to become a JavaScript expert. I would need to write some, but I wanted to highly limit my exposure. I hear a lot of arguments in the HTMX community that “HTMX shouldn’t replace your knowledge of JS.”

You know what? It can, and it does. I know we like to think that more options = better product, but focusing on a server-side-first, hypermedia-driven application philosophy as a constraint of the project led to a better outcome. Forcing myself to figure out a way to build an app that worked within the hypermedia application philosophy created cohesion and simplicity.

Let me put a finer point on this:

Axiom 1: Picking a philosophy or paradigm that constrains your technical problem space is a better approach than attempting to find the ‘best solution’ to every problem as it arises. The first stands you on the shoulders of those who have come before you, the second leaves you with only the disadvantage of your caveman brain.

Picking HTMX allowed me to play to my strengths as a backend-focused dev. The combination of HTMX + Django and the paradigms built into and advocated by these tools and their creators allowed me to be my dumb self and default to the opinions of people much more expert in these areas than myself.

But Wait — You’re Building a SPA?

As my designs for Chatterpulse AI evolved, the interface became fairly simple:

A sidebar to load an existing session or start a new one.
A single button bar where you could record, start quizzes, and perform session actions.
A feed, much like a social feed, where the outputs from these actions were captured.
A student-facing app with a mirrored feed that the teacher controls, where students take quizzes, participate in discussions, view transcripts, and ask questions.

So, a static sidebar, a row of buttons that trigger actions, and a main body of an app that is updated dynamically.

In other words, a single-page application (SPA).

“DON’T USE HTMX! USE A CLIENT-SIDE STATE MANAGEMENT SYSTEM!” I hear you saying. In response, I wish I had some sort of principled, abstracted response. I don’t. All I can say is… it was fine. On the client side, I needed to track the microphone and send audio back to the server for transcription. I did this with one raw JS function.

Other than that, the state of the session isn’t dynamic to the client. The current session you’re in has a certain set of transcripts, quizzes, student responses etc. within it. So, as long as we track the current loaded session (by pushing to the URL), even a page refresh doesn’t break the state.
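To make the "push to the URL" idea concrete, here is a minimal, framework-agnostic sketch of one way a server could handle it. This is not Chatterpulse's actual code: the function name, the `/sessions/<id>` URL pattern, and the markup are all hypothetical. The only real pieces are the `HX-Request` request header that htmx sends and the `HX-Push-Url` response header it honors.

```python
from html import escape


def render_session(request_headers: dict, session_id: int) -> tuple[dict, str]:
    """Return (response_headers, body) for a request to load a session.

    Hypothetical helper: a real app would render templates instead of
    building strings by hand.
    """
    # Placeholder fragment standing in for the rendered feed.
    feed = f'<div id="feed" data-session="{escape(str(session_id))}">feed goes here</div>'
    url = f"/sessions/{session_id}"  # assumed URL scheme

    if request_headers.get("HX-Request") == "true":
        # htmx-initiated request: send only the fragment and ask htmx to
        # push the session URL into the browser's address bar and history.
        return {"HX-Push-Url": url}, feed

    # Plain navigation or a page refresh: the URL alone identifies the
    # session, so the server can rebuild the whole page around the fragment.
    page = f"<html><body><nav>sidebar</nav>{feed}</body></html>"
    return {}, page
```

Because the session id lives in the URL, the refresh path and the htmx path render the same fragment; the only difference is whether it arrives wrapped in the full page.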

“BUT YOUR SESSIONS ARE LIVE! NEW STUDENT QUIZ RESPONSES COME IN! HOW DO YOU DYNAMICALLY UPDATE THE PAGE?” I hear you screaming into the void.

Well, with one line of code:

<div hx-get="/poll_content_chunks?last_chunk_id={{chunk_id}}&session_id={{session_id}}&type={{chunk_type}}" hx-trigger="every 5s once" hx-swap="outerHTML settle:1s">

Note: close readers may be confused by the outerHTML swap here. By making use of OOB swapping I can do much more than an outerHTML swap, and the outerHTML swap acts as a sort of ‘default + reset.’ The swapped-in code includes this same div, meaning the poll stays live.

By using OOB swapping, I can use a single poll to update the page state. 5 seconds is plenty reactive for my use case, but this could be changed to a websocket for greater reactivity. The endpoint returns a 204 when no new content is found, so nothing changes and the poll continues.
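For readers who want to picture the server side of that poll, here is a rough sketch under stated assumptions, not the app's real endpoint. The `poll_response` function, the shape of `chunks`, and the `#feed` target id are hypothetical; the path and the `hx-trigger`/`hx-swap` values are copied from the line above, and `hx-swap-oob` is the standard htmx attribute for out-of-band swaps.

```python
from html import escape
from urllib.parse import urlencode

POLL_URL = "/poll_content_chunks"  # path taken from the snippet above


def poll_response(chunks, last_chunk_id, session_id, chunk_type):
    """Build (status, body) for one poll request.

    `chunks` is a hypothetical list of dicts with "id" and "html" keys,
    already filtered to this session and ordered by id.
    """
    new = [c for c in chunks if c["id"] > last_chunk_id]
    if not new:
        # 204 No Content: htmx swaps nothing, so the existing div keeps polling.
        return 204, ""

    # Primary content: the poll div itself, re-rendered with the new cursor,
    # which replaces the old div via the outerHTML swap.
    query = urlencode(
        {"last_chunk_id": new[-1]["id"], "session_id": session_id, "type": chunk_type}
    )
    poller = (
        f'<div hx-get="{escape(POLL_URL + "?" + query)}" '
        'hx-trigger="every 5s once" hx-swap="outerHTML settle:1s"></div>'
    )
    # Out-of-band content: append each new chunk to the (assumed) #feed element.
    oob = "".join(
        f'<div hx-swap-oob="beforeend:#feed">{c["html"]}</div>' for c in new
    )
    return 200, poller + oob
```

Each response carries its own replacement poller, so the "cursor" (`last_chunk_id`) advances without any client-side state.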

There are of course details here I’m skipping, but the general flow of the app is consistent:

→ User presses a button, triggering an action

→ If needed, a Bootstrap modal + form is loaded that allows the user to fill out details about that action (e.g., enter a quiz question to send to students)

→ Server computes logic and triggers action.

→ When action is complete, the data is stored in a Postgres database.

→ Students and teachers poll the server from the app, and data is rendered and sent as HTML chunks.
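The store-then-poll half of that flow can be sketched in a few lines. This is an illustration only: in-memory SQLite stands in for Postgres so the example is self-contained, and the `chunk` table, its columns, and both function names are assumptions rather than the app's real schema.

```python
import sqlite3

# In-memory SQLite stands in for Postgres so the sketch runs anywhere.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE chunk ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " session_id INTEGER NOT NULL,"
    " kind TEXT NOT NULL,"
    " html TEXT NOT NULL)"
)


def store_chunk(session_id, kind, html):
    """Persist the rendered output of a finished action and return its id."""
    cur = db.execute(
        "INSERT INTO chunk (session_id, kind, html) VALUES (?, ?, ?)",
        (session_id, kind, html),
    )
    db.commit()
    return cur.lastrowid


def chunks_since(session_id, last_chunk_id):
    """Return (id, html) rows a poller has not seen yet, oldest first."""
    return db.execute(
        "SELECT id, html FROM chunk WHERE session_id = ? AND id > ? ORDER BY id",
        (session_id, last_chunk_id),
    ).fetchall()
```

A teacher's action would call something like `store_chunk(1, "quiz", "<p>What is 2 + 2?</p>")`, and every poller, teacher or student, would then call `chunks_since(1, last_seen_id)` with the id of the last chunk it rendered. Since both sides read from the same table, the mirrored feed comes for free.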

“WELL IF YOU’D HAVE USED REACT/ANGULAR/VUE/SVELTE you could have done it [X] way” I hear you keening.

You’re probably right; I wouldn’t know. But the way I did it is good enough. I know where it breaks (polling may not scale well, and I’ll need to change to a websocket at some point). I don’t know enough to address every counterfactual, but I do know enough to know this was months faster than learning a new programming language.

Let’s formalize another axiom:

Axiom 2: You’re optimizing over a business, not over a codebase.

Takeaways: Are you sure you need that shiny thing?

Let’s review our axioms.

Axiom 0: Complexity scales exponentially over the number of tools you use, over time, and over an increase in the size of your business. Your intelligence and the number of hours in a day don’t scale at all. Complexity is your biggest enemy.
Axiom 1: Picking a philosophy or paradigm that constrains your technical problem space is a better approach than attempting to find the ‘best solution’ to every problem as it arises. The first stands you on the shoulders of those who have come before you, the second leaves you with only the disadvantage of your caveman brain.
Axiom 2: You’re optimizing over a business, not over a codebase.

Building a SPA with HTMX was a delight. It was intuitive and fast. Over time, I may introduce more build tools, but I’m sure glad I didn’t start with them. HTMX saved me months of getting started and getting to market.

Oh, and if you’re in education, give Chatterpulse AI a try!