Dataflow Apps

Just killed rohkun.

Here for the postmortem.

Rohkun was a codebase analysis app. That sentence alone should have set alarms ringing in my head. But it was still a good experience, because I had the idea and committed to the bit from start to finish. I coded it from the ground up and learned a lot. I integrated payments, authentication, and authorization for the first time. And I marketed it: it was the first time in my life I was in front of a camera pitching a product. It taught me a lot, technically and marketing-wise, and it definitely sharpened me as an entrepreneur. I'm grateful to my past self for undertaking this project, but it would be stupid to stick with it: the activation rate was low, usage was low, and I wasn't confident in how the product worked, so it's better to do something I am more comfortable with.

One of the greatest mindset shifts I had during this period was to use as much open-source code and as many existing APIs as possible. Do not create things from the ground up. My job is to patch together services that are robust, already proven, and have accounted for a lot of edge cases. On my own I might not even be able to create a proper UI for tasks on a calendar, while others have written research papers on it. It's better to lean on collective wisdom than on intuition at moments like this, where there can be millions of interactions with thousands of different variations among them.

I would label Rohkun as a processing app, and definitely not a data flow app. It analyzes your codebase: there is a lot of computation involved, but not much data retrieval, transformation, and storage flowing through it. For me, data flow applications range from something like Twitter all the way up to YouTube or TradingView. A data flow app basically means:

  • Someone creates something
  • They post it
  • We show it to you with some basic modifications or no modifications at all

It does not even have to do all of those things; as long as the last point, showing it to you with basic modifications or none at all, holds true, it is a data flow application to me. TradingView gets all its financial data from some provider and shows it to you. YouTube gets all its videos from its users and shows them to you. Twitter gets all its tweets and shows them to you.
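The three bullets above can be sketched as a toy in-memory feed. This is a minimal illustration, not a real product; all names here are mine:

```python
import itertools
from dataclasses import dataclass

@dataclass
class Post:
    seq: int       # monotonically increasing publish order
    author: str
    body: str

class Feed:
    """Toy data flow app: someone creates something, posts it,
    and we show it with only a basic modification (newest first)."""

    def __init__(self):
        self._posts = []
        self._counter = itertools.count()

    def publish(self, author, body):
        post = Post(next(self._counter), author, body)
        self._posts.append(post)
        return post

    def timeline(self):
        # The only "processing": reverse-chronological ordering.
        return sorted(self._posts, key=lambda p: p.seq, reverse=True)
```

Everything hard here (storage, scale, moderation) is deliberately absent; the point is how little transformation sits between creation and display.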

Applications like brokerage apps, on the other hand, carry out some kind of process: they take input from you and create external consequences or initiate something. RainBet, ticket-booking apps, trading apps, banking apps, and so on are what I would call process apps. They not only take data; they also trigger consequences. Same with games: they take minimal data from you, but they are extensive process machines that run simulations and compute frames based on your actions. And as you may have noticed, process applications are far more complicated than pure data flow. AI model providers are also processing applications: they give you access to a probabilistic machine that spits out some output.

And as you can see, process applications are expensive to develop, to maintain, and to sell, because you need to explain what they do. Data flow applications are relatively simpler: as long as people crave the data we have, they will come to us. The selling point is way simpler.

As an entrepreneur with a small team, your goal should be to build something that is purely data flow, or data flow with minimal processing. In its simplest form, that can be a community or forum where people post (or you post) and you monetize it.

What I aim to build are data flow applications that help in some work process. What we are trying to get paid for is compressing time-to-value and effort-to-value, and a position ripe for that is workflows. You find an annoying part of someone's workflow, abstract it away with your own software, and pitch it. A lot of the time, these workflows suffer from a lack of persistent information storage and from unsystematic information processing. The result is that delegating tasks is very hard, because there is no single source of truth saying this is done and this is pending. It is more of an art than a skill, because it has not been solidified into a system, so you spend more time telling someone how to do a task than it would take to do it yourself.

So you swoop into someone's workflow, surface the information needed to complete the task, and help them complete it. You create persistent information mechanisms so that everyone can see what needs to be done and how; that persistence enables delegation and tracking. You can also store whatever output they are producing, so it can be shared easily in a consistent format instead of scattered Excel files or Word documents.
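What I mean by a persistent single source of truth can be sketched in a few lines. A hypothetical task board, invented for illustration: every task has an owner and a status, so delegation and tracking stop being an art.

```python
import json

class TaskBoard:
    """Single source of truth: anyone can see what is done and what is pending."""

    STATUSES = ("pending", "in_progress", "done")

    def __init__(self):
        self._tasks = {}
        self._next_id = 1

    def add(self, title, owner):
        task_id = self._next_id
        self._next_id += 1
        self._tasks[task_id] = {"title": title, "owner": owner, "status": "pending"}
        return task_id

    def set_status(self, task_id, status):
        if status not in self.STATUSES:
            raise ValueError(f"unknown status: {status}")
        self._tasks[task_id]["status"] = status

    def pending(self):
        return [t for t in self._tasks.values() if t["status"] != "done"]

    def export(self):
        # A consistent, shareable format instead of ad-hoc Excel/Word files.
        return json.dumps(self._tasks, indent=2)
```

The real product would persist this to a database; the sketch just shows the shape of the information that replaces "ask the one person who knows".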

Plus, in such applications, you are generating a lot of useful data. And for an average entrepreneur, it is far easier to create value from the data you have accumulated than by inventing some novel process application or logic. You are unlikely to create something so novel that it beats the value of simply farming data. What I am essentially saying is: if you are some average guy, you will have a better time earning money or boosting your valuation because you have accumulated data than because you have created some groundbreaking algorithm.

But we also have to understand where the value comes from. Does it come from access to the data, or from the process? Of course, you can create value from pure data if you are a huge aggregator of it, like YouTube, Twitter, or some financial data provider. But if you don't have information at that scale, you have to lean more towards process. I think the more information you have, the less process-intensive you can afford to be. A video editing app on its own provides no data; it is just a process engine. You provide the data, you put it through some process, and value is created, so the value stems from the process. On the other hand, there are B-roll companies that give you stock footage, music, and so on but do no editing for you. Both create value by contributing to your media: one gives you the data to work on, and one processes it.

You won’t out-aggregate YouTube or Twitter. But you also probably can’t compete with Adobe or other pure process giants who’ve spent decades refining complex algorithms. So you need to focus on the needs that arise between the data-aggregation giants and the process giants. The data they missed and the process they missed is your opportunity. Data flow apps with light, workflow-specific processing are, I think, a good fit. You see a lot of web apps like iLovePDF, YouTube-to-MP3 converters, or YouTube transcript downloaders. These applications find usage on the internet because the giants don’t provide the service. iLovePDF just lets you upload a file and get what you want. YouTube doesn’t let you download the MP3 or the transcript of a video, so there are web apps that do. They are essentially data flow applications with a bit of processing, because they manipulate an original data source submitted either by the user or by some other provider, in this case YouTube. A lot of people have workflows that need quick PDF manipulation; a video editor might need to extract a bit of audio from a YouTube video; a scriptwriter might need to download a transcript; an AI model trainer might need huge amounts of YouTube audio and transcripts. These apps got big by being the bridge between what people are trying to do and what the giants are not doing. They don’t stand in isolation; they help in an already existing workflow.

This helps your valuation because you are creating a dataset on previously unstructured or unstandardized workflows, you are turning a scattered set of users into a proper audience or market, and you are embedding yourself into a workflow that isn’t going anywhere.

Even the big giants have to define their boundaries, because their scope cannot be infinite. Code cannot do everything that is possible; it can only do what is specified, and someone has to write that. No one can write everything, so companies decide to leave certain things out: things they deemed not important enough, or that go against their own interests. And sometimes they literally cannot add a feature because they decided to ship it in version 4 instead of now, and by the time version 4 comes around, the codebase is so convoluted that adding it would mean refactoring everything, so they keep postponing it. Maybe they are just waiting for someone else to build it so they can acquire them later. Of course, YouTube doesn’t want you downloading MP3 versions of its videos. They might deem a thing unimportant because the audience is too small, or because the complexity was too much for them at the time. Everyone has deadlines; even those companies had deadlines while they were building their apps.

One gap we can actively exploit is the one between the giant data applications and the giant processing applications: the gap we saw between YouTube videos and LLM trainers, between a LinkedIn profile and your outreach tool, between an existing PDF and the form your company workflow expects.

The lesson here for me is to not build applications that process too much, where too much logic is involved; and where logic is involved, can 90% of it be handled by open-source libraries? My value-add is supposed to be identifying the problem, confirming people need help, bringing in a coalition of open-source libraries, and banding them together to solve the problem. Another thing to keep in mind is to not ask for too much. We cannot demand too much information or input from the user; they would rather do the lengthy process themselves than cooperate with you. There are only two constraints:

  • Don’t ask for too much
  • Do not try to process too much

The input should be simple, and the app should compress time-to-value and effort-to-value until it resembles instant gratification. To create that compression, you should not have to do much novel processing.

This also means you have to show them value right at the start instead of asking them to create an account to access your service.

Get 50% of the work done, then ask them to create the account. Say you are building a transcription service: let them upload the video and see a bit of the transcript, and only when they try to export do you ask them to create an account. By then, they have invested too much time to quit.
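The gating logic is tiny. A sketch, using a hypothetical transcription service; the function and field names are mine:

```python
PREVIEW_SEGMENTS = 3  # how much of the transcript anonymous users can see

def export_transcript(segments, authenticated):
    """Full export for logged-in users; a truncated preview plus
    a signup prompt for everyone else."""
    if authenticated:
        return {"transcript": segments, "truncated": False}
    return {
        "transcript": segments[:PREVIEW_SEGMENTS],
        "truncated": len(segments) > PREVIEW_SEGMENTS,
        "message": "Create an account to export the full transcript.",
    }
```

The key design choice is that the expensive work (the transcription itself) already happened before this check, so the user has sunk their time in before hitting the wall.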

One good example of this data flow thing was a fishing info app I came across. The Canadian government, or some jurisdiction I don’t remember exactly, regulates how many fish you can catch, and publishes fish-related information: populations, catch limits, that kind of thing. To get that information, you had to go to the government website, open the PDFs, and read them. So what the guy did was build a simple Python scraper, pull that data, and put it in an iOS app where people could just see it. Nothing groundbreaking, but it created real value.

The guy created instant gratification by eliminating so many steps and just showing information right away when you open the app.
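I imagine the core of that scraper looks something like this: fetch the document (not shown), extract the text with an open-source library, and parse it with a couple of regexes. The text format below is invented for illustration; a real regulations PDF would need its own patterns.

```python
import re

# The kind of flattened text a regulations PDF might yield.
# Wording and numbers are made up for this sketch.
SAMPLE = """
Lake Trout: daily limit 2
Walleye: daily limit 4
Northern Pike: daily limit 3
"""

LIMIT_RE = re.compile(r"^(?P<species>[A-Za-z ]+): daily limit (?P<limit>\d+)$")

def parse_limits(text):
    """Turn flattened PDF text into structured data the app can display."""
    limits = {}
    for line in text.strip().splitlines():
        m = LIMIT_RE.match(line.strip())
        if m:
            limits[m.group("species")] = int(m.group("limit"))
    return limits
```

All the hard work (PDF text extraction, hosting) is done by existing libraries and platforms; the entrepreneur's contribution is noticing the gap and writing the glue.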


The Business Model This Implies

This framework naturally leads to:

  • High volume, low complexity transactions
  • Freemium (free tier with limits, paid for volume/speed)
  • API-first (developers will pay for programmatic access)
  • No onboarding (it just works)

Examples That Fit the Framework

CloudConvert (file conversion)

  • Input: Upload file
  • Process: 90% ffmpeg and other open-source converters
  • Output: Converted file

Remove.bg (background removal)

  • Input: Upload image
  • Process: Pre-trained ML model (not built by them)
  • Output: PNG with transparent background

TinyPNG (image compression)

  • Input: Upload image
  • Process: Existing compression algorithms
  • Output: Compressed image

PDF.co API (PDF manipulation)

  • Input: PDF file + simple parameters
  • Process: Open-source PDF libraries
  • Output: Modified PDF

Your Differentiator

You’re not building technology. You’re building convenience.

The giants have the data. The open-source community has the processing logic. You’re just making it accessible and frictionless for people in the middle of a workflow.

The Test Questions

Before building anything, ask:

  1. Input test: Can the user give me what I need in under 30 seconds?
  2. Process test: Can I solve 90%+ of the logic with existing libraries?
  3. Output test: Is the result immediately usable (no further steps)?
  4. Alternative test: What’s their current workaround, and is mine 10x easier?

If any answer is “no,” reconsider.