Building a Low-Code GPT-4 Workflow in Dify: A Fast Path from Dataset to Sentiment Insights
Many organisations sit on a pile of text data: product reviews, social media comments, survey responses, feedback from students or customers, and so on. The simple question is usually: “Are people mostly happy, unhappy, or neutral?” But the traditional way to answer that can be heavy: data cleaning, labelling, training a model, deploying it, maintaining infrastructure, and repeating the cycle whenever something changes.
With large language models like GPT-4, we suddenly have another option: instead of training a dedicated model, we can ask a general model to read our text (in Indonesian, English, or mixed) and return a sentiment judgement directly. When you combine that with a low-code platform like Dify, you get something very practical: a reproducible workflow for sentiment analysis without building a full ML pipeline from scratch.
This post walks through how I set up a low-code sentiment analysis workflow with GPT-4 in Dify, from dataset to insight. The focus is on the workflow and the lessons learned, not on the fact that it later became an academic paper.
From Text to a Simple Data Schema
To keep things practical, start with a simple table of data. For example, imagine a CSV of comments:
| id | text | created_at |
|---|---|---|
| 1 | “Pelayanannya ramah, tapi pengiriman agak lama.” | 2025-09-01 10:23:00 |
| 2 | “The app keeps crashing, super annoying.” | 2025-09-01 11:47:00 |
| 3 | “Biasa aja sih, not too good, not too bad.” | 2025-09-01 13:05:00 |
At minimum, the workflow needs a text field for the content you want to analyse. Additional columns (product category, channel, user segment) are helpful for later analysis, but they’re optional.
The idea is that for each row, we will attach sentiment information. A typical target schema might look like this:
- `sentiment` – one of `positive`, `negative`, `neutral`
- `score` – a numeric score from −1.0 to 1.0
- `positive_keywords` – key phrases that carry positive meaning
- `negative_keywords` – key phrases that carry negative meaning
So after running the workflow, we want each row to have something like:
```json
{
  "sentiment": "negative",
  "score": -0.7,
  "positive_keywords": [],
  "negative_keywords": ["keeps crashing", "annoying"]
}
```
That structure becomes the “contract” between your dataset and your AI workflow.
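One way to make that contract explicit is a small typed structure for an annotated row. This is just an illustration of the schema, not something the workflow itself requires:

```python
from typing import List, Literal, TypedDict

class AnnotatedComment(TypedDict):
    # Original dataset columns.
    id: int
    text: str
    created_at: str
    # Columns added by the GPT-4 workflow.
    sentiment: Literal["positive", "negative", "neutral"]
    score: float                     # in the range [-1.0, 1.0]
    positive_keywords: List[str]
    negative_keywords: List[str]
```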
Designing the Prompt for GPT-4
Before touching Dify, it’s worth investing a bit of thought in the prompt. The prompt defines how GPT-4 will behave and how the output is structured.
In my case, I wanted something that:
- works for Indonesian, English, or code-mixed text,
- produces a discrete label (`positive`, `negative`, `neutral`),
- gives a sentiment score in a fixed range,
- extracts key positive and negative expressions for later analysis.
Here’s a simplified version of the prompt I used:
```
You are a sentiment analysis assistant.

Task:
- Read the following text (it may be in Indonesian, English, or a mix).
- Decide the overall sentiment: "positive", "negative", or "neutral".
- Give a sentiment score between -1.0 (very negative) and 1.0 (very positive).
- Extract short phrases that represent positive and negative opinions, if any.

Return the result in JSON with this schema:
{
  "sentiment": "<positive|negative|neutral>",
  "score": <float between -1.0 and 1.0>,
  "positive_keywords": [list of strings],
  "negative_keywords": [list of strings]
}

Example:
Input: "Pelayanannya ramah, tapi pengirimannya lama."
Output:
{
  "sentiment": "neutral",
  "score": 0.1,
  "positive_keywords": ["pelayanannya ramah"],
  "negative_keywords": ["pengirimannya lama"]
}

Now analyse this text:
"{{ input_text }}"
```
The crucial bits are:
- a clear definition of labels and score range,
- one concrete example that shows both structure and style,
- strict instructions to respond in JSON, which makes it easy to parse later.
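In practice GPT-4 follows the JSON instruction most of the time, but it can occasionally wrap the object in markdown fences or drift outside the promised ranges, so it pays to parse defensively. Here is a minimal sketch; the helper name and fallback behaviour are my own additions, not part of the prompt or workflow:

```python
import json
import re

VALID_LABELS = {"positive", "negative", "neutral"}

def parse_sentiment_json(raw: str) -> dict:
    """Extract and sanity-check the JSON object returned by the model."""
    # Strip optional ```json ... ``` fences the model sometimes adds.
    cleaned = re.sub(r"^```(?:json)?|```$", "", raw.strip(), flags=re.MULTILINE).strip()
    result = json.loads(cleaned)  # raises if the model returned something unparseable

    if result.get("sentiment") not in VALID_LABELS:
        raise ValueError(f"Unexpected label: {result.get('sentiment')}")
    # Clamp the score into the promised range just in case.
    result["score"] = max(-1.0, min(1.0, float(result.get("score", 0.0))))
    result.setdefault("positive_keywords", [])
    result.setdefault("negative_keywords", [])
    return result
```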
This prompt will be plugged into a GPT-4 node inside a Dify workflow.
Building the Workflow in Dify
In Dify, the idea is to turn this logic into a reusable visual flow. The high-level pattern is:
- take input text,
- send it to GPT-4 with the prompt,
- capture the JSON result,
- export or store that result for analysis.
Concretely, the workflow looks like this:
- Start node: defines what inputs the workflow expects. At minimum: `input_text` (string) – the comment or review to analyse.
- LLM node (GPT-4): uses the prompt above and maps `{{ input_text }}` to the input field from the Start node. This is where the sentiment analysis happens.
- Output mapping: collects the model's output (the JSON string) and exposes it as an output field, for example `sentiment_result`.
In a minimal setup, one run of the workflow takes a single `input_text` and returns a single JSON object. Dify lets you test this interactively, so you can try a few examples, tweak wording, and make sure the JSON is stable before wiring it into anything else.
Running the Workflow on a Dataset
The next question is: how do we go from a single input to a whole CSV? There are several ways to do this; a common pattern is to call the Dify workflow via API from a small script. The script reads a CSV, sends each row’s text to the workflow, receives the JSON result, and writes a new CSV file.
Here is a simplified sketch of such a script.
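The version below is an illustration rather than the exact script from the project: it assumes Dify's standard workflow-run endpoint (`POST /v1/workflows/run` called with an `inputs` object in blocking mode), and the file name `comments.csv` and column name `text` are placeholders for your own dataset:

```python
import csv
import requests

DIFY_API_URL = "https://api.dify.ai/v1/workflows/run"   # or your self-hosted URL
DIFY_API_KEY = "app-..."                                 # the workflow's API key

def analyse_text(text: str) -> str:
    """Send one piece of text to the Dify workflow and return the raw JSON string."""
    resp = requests.post(
        DIFY_API_URL,
        headers={
            "Authorization": f"Bearer {DIFY_API_KEY}",
            "Content-Type": "application/json",
        },
        json={
            "inputs": {"input_text": text},   # matches the Start node variable
            "response_mode": "blocking",      # wait for the full result
            "user": "sentiment-batch",        # any stable identifier
        },
        timeout=60,
    )
    resp.raise_for_status()
    # The LLM node's output is exposed as `sentiment_result` in the workflow outputs.
    return resp.json()["data"]["outputs"]["sentiment_result"]

with open("comments.csv", newline="", encoding="utf-8") as f_in, \
     open("comments_with_sentiment.csv", "w", newline="", encoding="utf-8") as f_out:
    reader = csv.DictReader(f_in)
    writer = csv.DictWriter(f_out, fieldnames=reader.fieldnames + ["sentiment_result"])
    writer.writeheader()
    for row in reader:
        row["sentiment_result"] = analyse_text(row["text"])
        writer.writerow(row)
```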
In a production environment you would add retry logic, rate-limit handling, logging, and maybe batching. But conceptually, this is all that’s happening: each row gets passed through the same GPT-4 workflow, and the JSON result is attached as a new column.
Later on you can parse the JSON into separate columns like `sentiment`, `score`, `positive_keywords`, and `negative_keywords`.
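As a rough sketch of that step, assuming pandas and the `comments_with_sentiment.csv` file produced above (you could also swap `json.loads` for the defensive parser sketched earlier):

```python
import json
import pandas as pd

df = pd.read_csv("comments_with_sentiment.csv")

# Parse each JSON string into a dict, then spread the keys into separate columns.
parsed = df["sentiment_result"].apply(json.loads).apply(pd.Series)
df = pd.concat([df.drop(columns=["sentiment_result"]), parsed], axis=1)

df.to_csv("comments_annotated.csv", index=False)
```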
Turning Outputs into Insights
Once the workflow has processed your dataset, the fun part is exploring the results. With the sentiment annotations in place, a lot of analysis becomes straightforward:
- Sentiment distribution: how many comments are positive, negative, or neutral overall?
- Breakdown by segment: if you have product categories, channels (web vs app), or user segments, which ones attract the most negative feedback?
- Common pain points: by aggregating `negative_keywords`, you can quickly see which issues appear over and over (for example "slow", "error", "queue", "cancelled").
- Trend over time: plot average sentiment score by day or week to see whether things are improving or getting worse, especially before/after a policy or feature change.
You don’t need a fancy tech stack to do this. A simple notebook, a lightweight dashboard (Streamlit, for instance), or your BI tool of choice can be enough. The key point is that the heavy lifting—turning messy text into structured sentiment data—has already been delegated to the GPT-4 workflow.
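For instance, the distribution, pain-point, and trend questions from the list above take only a few lines in a notebook. This sketch assumes the `comments_annotated.csv` file produced earlier and the `created_at` column from the example dataset:

```python
import ast
import pandas as pd

df = pd.read_csv("comments_annotated.csv", parse_dates=["created_at"])

# 1. Overall sentiment distribution.
print(df["sentiment"].value_counts(normalize=True))

# 2. Most frequent negative key phrases (stored as list-like strings in the CSV).
negatives = df["negative_keywords"].apply(ast.literal_eval).explode().dropna()
print(negatives.value_counts().head(10))

# 3. Average sentiment score per week, to spot trends.
print(df.set_index("created_at")["score"].resample("W").mean())
```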
What Worked Well and What Didn’t
A few observations from building and using this workflow:
- The speed of iteration is a big win. You can move from “idea” to “working prototype” very quickly, especially compared to setting up a full supervised ML pipeline.
- Prompt design becomes the main lever. When the results don’t look right, the first instinct is no longer “collect more labelled data” but “tighten the instructions, definitions, and example in the prompt”.
- The approach is very language-friendly. Mixed-language text, slang, or informal phrasing are areas where traditional models often struggle unless they’ve been fine-tuned carefully. GPT-4 generally copes well.
There are also clear limitations:
- Cost and scale: if you want to process hundreds of thousands or millions of texts, API costs will matter. One natural next step is to use GPT-4 to label a subset of data and then train a cheaper local model on those pseudo-labels (see the sketch after this list).
- Latency: for offline analytics, a few seconds per request is fine. For real-time use cases (for example live moderation), you may need additional tricks such as caching, sampling, or using a smaller model.
- Fine-grained evaluation: if you need strict precision/recall metrics against a human-labelled gold standard, you still have to do annotation work and evaluation. The low-code GPT-4 workflow doesn’t remove that; it just makes the first working version much easier to build.
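On the cost point, the pseudo-label idea can be as simple as fitting a classical classifier on the GPT-4 labels. A minimal sketch with scikit-learn, assuming the `comments_annotated.csv` file from earlier (a real setup would also want a human-labelled test set rather than scoring only against the pseudo-labels themselves):

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# GPT-4-labelled subset produced by the workflow above.
df = pd.read_csv("comments_annotated.csv")

X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["sentiment"], test_size=0.2, random_state=42
)

# A cheap local model trained on the pseudo-labels.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)
print("Agreement with GPT-4 labels:", model.score(X_test, y_test))
```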
Closing Thoughts
A low-code GPT-4 workflow in Dify turns sentiment analysis from a heavyweight, model-training project into a relatively simple data pipeline: define your labels, craft a good prompt, wire the flow, and then focus on how to use the results.
It is not a silver bullet for every scenario, but for many practical cases—monitoring public feedback, exploring user sentiment, supporting decision-making—it offers a very fast path from raw text to actionable insight.
For those interested in a more formal, academic version of this work, the ideas in this post are based on a study that eventually became a conference paper published in the ACM International Conference Proceedings Series (ICPS). You can find the DOI here:



