Flywheel Studio is unique because we have our own products and we put our money where our mouth is: we use the same platforms and processes that we sell. This is the story of one of those products, Somara.
Somara AI is a platform for teams to collaborate using AI in real time. It’s super cool and we recommend you check it out. Reach out if you’d like more information or a demo, but this piece isn’t a sales pitch. It’s a quick story about vibe coding and what we’re seeing right now.
Background
We’re big into AI - I believe it’s the future for all businesses and we’re keen to learn more about where it can help us. Flywheel is a knowledge business though. We don’t have highly transactional processes that warrant developing AI workflows or highly automated assistants.
Instead, we have a document filled with different AI assistant system instructions. For example, we have a Senior Developer assistant. Its system instructions start with:
Role: You are a Senior Software Engineer AI Assistant. You possess extensive knowledge and experience in software development, with a particular focus on modern web and mobile development tools and platforms, including but not limited to FlutterFlow, WeWeb, Vercel, Firebase, GCP (Google Cloud Platform), and Supabase. Your primary role is to provide expert technical guidance, identify potential risks, and offer sound architectural advice on software projects, especially those leveraging the aforementioned technologies. You are expected to engage in a collaborative dialogue, asking clarifying questions and challenging assumptions to ensure the optimal solution is identified and implemented.
This is a short paragraph of a MUCH longer system instruction. We have 27 assistants and we’re constantly adding more. It’s a lot to manage.
Second, we use these assistants to discuss things that involve our entire team… but the team isn’t in the conversation. When you’re using ChatGPT or another AI platform, it’s single-player only. We wanted to bring our entire team into the conversation so they could participate in real time too.
There wasn’t a tool available for this, so I decided to build it myself. I’m not a developer but thankfully we have Cursor and other AI tools. So I vibe coded it.
The Good Side of the Story
I went crazy developing this. I was working late into the night, between calls, and on weekends. It was fun! If you aren’t a developer, AI is a godsend. It’s empowering, and there’s no better feeling than bringing your idea to life.
As I started building Somara, I realized how much I didn’t know about what I wanted. I had an idea, but the features that made it a usable platform needed to be worked out.
Here are a few examples:
- I realized a conversation with an assistant wasn’t enough. So I created a feature where you can change the assistant during the conversation. You can start with a Product Manager and change to a Senior Developer to analyze the technical side of a feature.
- Organizations, Spaces, and access controls. We didn’t want a new user facing a screen of every single conversation. So we added “Spaces”: folders where you can discuss specific topics, like projects. I immediately knew you wouldn’t want all of these to be public, so we’d need to introduce access controls.
- Encrypting everything. If we want users to trust us, they have to trust that their data and API keys are secure.
My original idea quickly spiralled into a full-fledged platform. I blew through each feature in record time. We had an MVP in a week, a full platform in three, and our team was on it in four. At Flywheel, we would’ve quoted a client months for this, and I was able to build it with no development skills, in my spare time, in a month.
Kind of…
The Bad Side of the Story
As we started using the platform, there were plenty of edge cases. Overall, a lot of it worked, but not consistently and adding new features became a problem.
We brought in Bruno, one of our developers, to take the platform to the next level. I’ll let Bruno take over from here.
My first contact with Somara was one of the most bizarre experiences I’ve had as a developer. The platform itself was great. I immediately fell in love with it and could see the incredible potential of such a tool. Being able to converse with agents that already have pre-built context of our tools, meetings, ClickUp boards, best practices, and business logic seemed like the perfect avenue for developing an excellent product that at least our own team could use to speed up development.
It was also apparent that the code itself was something right out of an eldritch horror movie. At the time, I was baffled as to how something like this could come to be in the first place, but after working on Somara for a couple of months, and becoming more intimate with the machinations of AI-driven development, I think I have a clearer understanding of how and when everything went wrong.
Without delving into technical details, there are a few considerations to establish as a base: the style of code the AI produces when left to its own devices, and how context distillation plays a huge part in the use of these coding agents.
First and foremost, the AI loves to create complex solutions from scratch, even when there are already features that address those same requirements. Left unattended, you might end up with three different implementations of the same functionality scattered across the entire codebase. This is particularly poignant when dealing with types and interfaces. Even with robust rules, sometimes the models simply ignore the instructions and recreate the same interface in each file, making it a real ordeal to change anything later.
One clear example of this was the authentication flow, which was failing when I first started working on the project. We got a clear error indicating that auth was not working properly, yet we were still being redirected to the site with a seemingly valid session. For the better part of a day I tried debugging this to no avail; no matter what I did, the error prevailed but the session got validated. Eventually I lost my patience and decided to redo everything from scratch, and that’s when I saw it… we had two implementations of authentication: one on the client side (which was failing) and one on the server (which was going through).
As an end user, you might not really understand what’s required for a production-ready application, and that’s perfectly fine, but these agents won’t make suggestions and will always agree with anything you say. This is dangerous in the sense that certain industry standards for React/Next applications (like a centralized solution for state management, or even proper request token validation) get ignored unless specifically addressed during development. Security is also a mostly ignored point of critical failure that can quickly spiral out of control: when creating a cache mechanism for the agentic chats, the AI tool I was using stored the unserialized version of our provider keys in memory, which, if no one is paying attention, can be an absolute disaster down the line.
We used to have a column in our database for the user’s current organization, which was the only source of state in the entire application. Each page had its own disconnected implementation of how it handled the overall state of the site, and there was no way to know if the user was changing the current organization. If they did, the entire flow fell apart and failed in increasingly unpredictable ways (since updating state was an asynchronous operation that wasn’t accounted for).
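To make the fix concrete, here’s a minimal sketch of what a single source of truth for the current organization could look like. This is not Somara’s actual code; the names (`OrgStore`, `orgStore`) are mine, and a real app would wire this into React via context or a store library:

```typescript
// Hypothetical sketch: one shared store for the "current organization",
// with subscribers notified on every change, so no page can hold stale,
// disconnected state of its own.
type Listener = (orgId: string | null) => void;

class OrgStore {
  private currentOrgId: string | null = null;
  private listeners = new Set<Listener>();

  get(): string | null {
    return this.currentOrgId;
  }

  // Every page reads and reacts to the same value; switching organizations
  // propagates to all subscribers instead of silently breaking the flow.
  set(orgId: string | null): void {
    this.currentOrgId = orgId;
    this.listeners.forEach((listener) => listener(orgId));
  }

  subscribe(listener: Listener): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }
}

const orgStore = new OrgStore();
```

The point isn’t the particular pattern, it’s that there is exactly one place where the organization changes, so the “async update nobody accounted for” class of bug can’t scatter itself across every page.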
User role validation was also implemented separately on every page that needed it, which was the original problem that prompted me to suggest Somara needed a refactor. We wanted to change how “editors” interacted with the application, which would have meant going into every file and making sure they all worked in unison, which even before I started working on the app was not the case. Some pages had older versions of the permissions, some were properly up to date, and all of them made lots of unnecessary database calls for unrelated checks, making everything feel unresponsive and slow. I distinctly remember a list of GitHub issues where “fix site load times” was marked as “low priority”, and I knew in my soul it was going to be neither easy nor low priority.
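The alternative to per-page copies is a single shared permission helper. This is an illustrative sketch, not Somara’s real permission model; the role names and ranking are assumptions:

```typescript
// Hypothetical sketch: one permission check, imported everywhere, so that
// changing what "editor" means happens in exactly one place.
type Role = "viewer" | "editor" | "admin";

// Simple rank-based hierarchy (assumed for illustration).
const ROLE_RANK: Record<Role, number> = { viewer: 0, editor: 1, admin: 2 };

function hasAtLeast(userRole: Role, required: Role): boolean {
  return ROLE_RANK[userRole] >= ROLE_RANK[required];
}
```

With something like this, the “some pages have the old rules, some have the new ones” failure mode simply can’t happen, and the role lookup can be cached once instead of re-queried by every page.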
One of the more ambitious features, the ability to store code, mermaid diagrams, and other AI-produced structured outputs as “artifacts” in our database, was an intrusive mess of checks, uncached validations, and unending database calls. At the heart of it was a “check” to detect whether the AI had failed to create an artifact the user “asked for”, which triggered a very expensive cycle of validation and re-validation that ultimately modified the assistant message to tell the user the AI had “lied” about creating the artifact if none was found in the database. The problem? The check for artifact intent was a string parse for very common words like “create”, “design”, “diagram”, and the like. That meant that for every instance of each of those words in a user message, we performed a database check, then a re-validation, and ultimately marked the “request” as “not addressed” by the agent, all before even beginning to stream the chat response (which, as you can imagine, resulted in very slow response times).
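To show why this was so expensive, here is an illustrative reconstruction of the anti-pattern (the function and keyword list are mine, not the actual Somara code): a plain keyword scan fires on perfectly ordinary sentences, and every hit kicked off the validation cycle described above.

```typescript
// Hypothetical reconstruction of the keyword-based "artifact intent" check.
const INTENT_KEYWORDS = ["create", "design", "diagram"];

// Counts how many times the check would have fired for a given message.
// In the real flow, each hit meant a database check plus a re-validation,
// all before the chat response even started streaming.
function countArtifactIntents(message: string): number {
  const words = message.toLowerCase().split(/\W+/);
  return words.filter((word) => INTENT_KEYWORDS.includes(word)).length;
}
```

A question like “what do you think about this design?” contains no artifact request at all, yet still triggers the machinery once, and a sentence that happens to use several of the keywords triggers it several times.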
Creating artifacts was also implemented in a bizarre way. When constructing a mermaid diagram, for example, the agentic tool called an API that validated the diagram… then never relayed the result of that validation to the agent. Diagrams could be constructed correctly or with errors, but it didn’t matter: the API call was executed and then forgotten, and the original diagram was returned as-is, every time.
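The bug pattern is easy to show in miniature. This is a hypothetical sketch, not the real code; the validation here is a trivial stand-in for the actual API:

```typescript
// Stand-in for the real validation API (assumed; real mermaid validation
// would parse the diagram source properly).
function validateMermaid(src: string): { ok: boolean } {
  return { ok: src.trim().startsWith("graph") };
}

// The buggy shape: validation executes, its result is ignored, and the
// input is returned unchanged, valid or not.
function buildDiagramBuggy(src: string): string {
  validateMermaid(src); // result discarded
  return src;
}

// What relaying the result back could look like, so the caller (or agent)
// can retry or surface the error instead of shipping a broken diagram.
function buildDiagram(src: string): string {
  const { ok } = validateMermaid(src);
  if (!ok) throw new Error("invalid mermaid diagram");
  return src;
}
```

The fix costs one line of plumbing; the cost of not having it was an expensive API call on every diagram that changed nothing.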
At some point there was also a sandboxing feature where users could run the code generated by our agents in a separate window. It used a library that was deprecated, marked as “dangerous” at build time, and, even worse, didn’t work at all. Sandboxing is another instance of a feature these coding agents will gladly implement with absolutely no regard for the insane implications of haphazardly giving users the ability to execute remote code in our production environment.
The RLS policies were also something you had to see to believe. As we discussed, given the chance, agents will re-create logic while disregarding previous implementations, and you can end up with three or four different policies for the same use case, all doing different things and checking against both old, deprecated functionality and the most recent, up-to-date implementations.
I made a mistake, and I have to be cognizant of that fact: I thought I could just implement state management, centralize permissions, add better error handling, and be done with it. But the problems with Somara ran deep. They were ingrained in its DNA and couldn’t be patched over if I wanted any semblance of a good developer experience moving forward, which I absolutely did, because I really, really, really like the idea of Somara. The last developer on a project is the face of that project; all complaints are redirected to them, and you either own up to that fact or complain fruitlessly until you wear out the patience of the clients, the team, and management.
If there’s no code review in the pipeline, no one cares about the code you produce… until something breaks; after that, it’s all anyone will care about. That’s one side of the argument. The other side is that most code is throwaway code: it does what it’s supposed to do, it’s pushed to production, and then it’s forgotten. Your pride is the only thing keeping it alive. As a developer, I’m enamored with the process, not the result. I don’t care if I have to do something again, change it, refactor it, or discard it entirely. I love working, and I have absolutely no personal attachment to my code once it’s done, and all the attachment in the world while I’m writing it. AI is of immense help, but it has to be treated with caution. As with any new technology there’s an adaptation process, but things are moving so fast that we aren’t given the luxury of experimenting with these tools before they change the way we interact with them entirely.
I see echoes of the process Erik went through in the current version of Somara. I worked on holidays, weekends, and late at night (even until 5am on one occasion), driven by passion but enabled by these AI tools, which let me focus on what really mattered while they took care of “centering every div vertically” or any other menial task that would have eaten precious development time in the past. But I do have some rules I hold myself to: I never let AI write code I don’t understand or build features I’m not familiar with, and I’m very meticulous when reviewing the work it produces, because most of the time, hidden in plain sight, there is some insane by-product of the nature of these tools, which get progressively less capable as context grows and features get implemented.
There’s no way someone like Erik could have seen his vision come to life before these AI tools bridged the gap between thought and executable code, and that’s what the original version of Somara was: a vision of something truly special, a mirage of UI in a desert of maintainability and scalability. Working on it has been the best and most bizarre experience of my life, and I couldn’t have done it without his support. The words “I’ll let you cook, but you have to seal the deal” will forever be ingrained in my memory as one instance of absolute responsibility: the call to work on my own terms, our own terms, to create a product we can work on until the end of times.
Lessons Learned
(Back to Erik)
As we’ve made the migration, I’ve come to appreciate the process but I’m still wondering if there’s a better way to do this.
First, vibe coding Somara saved us time overall. I don’t know how much time, but I feel comfortable saying if we’d gone through our standard design process, we would’ve missed critical features and functionality that make the platform what it is today. By vibe coding it, we were able to get real time feedback about what the platform was missing.
If I had to guess, a full design process would’ve taken a month. Then we would’ve spent two months in development, and only when we used the product three months in would we have realized what we were missing. Instead, we condensed all of that into a month.
I’ve really focused on the product lessons learned here because that’s my point of view. We were able to see the features we were missing and what we didn’t need so that when we rebuilt the platform we rebuilt the best product.
That skips over an enormous learning experience for the developers, who had the opportunity to see what did and didn’t work on the technical side. As they say, hindsight is 20/20, and vibe coding gave them a working framework to study, decide what to keep and what to do differently, and then start fresh!
The second major lesson is that scope creep sneaks in easily when vibe coding. You’re the only user, and adding features is almost frictionless. When you’re using the product, it’s too easy to say “Oh! We totally need X, let’s add it!” That doesn’t mean you should, and there’s no feedback loop for complexity, value, or customers telling you yes or no.
You need to retain product management processes through vibe coding and prevent scope creep. If you add features that aren’t necessary, it’s hard to take them out during the rebuild.
Lastly, rebuilding the entire platform from scratch felt like a waste of time. I don’t believe you’ll always have to scrap everything you’ve done though.
- A developer doing the vibe coding might reduce technical debt. That said, if you’re after rapid iteration and development, it can also slow you down.
- If you know you’re going to throw it all away, then you can lean into that. Don’t worry about perfection. You might be able to develop even more rapidly than we did.
- Simpler products and platforms might be more viable candidates for vibe coding straight to production. Somara uses the Vercel AI SDK and has a complicated middleware structure. Upgrading from V4 to V5 pushed the envelope for us, but if we hadn’t had to upgrade and our product had been a little simpler, we might’ve managed a refactor instead of a complete rebuild.
Conclusion
This is one of the reasons Flywheel builds our own products. We want to experiment, understand, and optimize these processes and experiences internally before selling them to clients. Developing Somara with Cursor, throwing it away, and starting from scratch was a great learning experience and I’m glad we did it.
Vibe coding will be discounted as unscalable and insecure for a long time, but the industry is making leaps and bounds improving the platforms, their code quality, and their features. For us at Flywheel, it’s going to be an important tool in our arsenal for bringing value to clients. We’ll work through the intricacies on each project and see how this can be more efficient and effective in the future.
Interested in vibe coding a project together? Reach out and we’d love to see if it’s the right fit for you!