I am in too many Slack channels. Most of them are useful occasionally and noisy continuously. The mental tax of trying to keep up with them is real, and the tax of not keeping up is the slow accumulation of guilt about all the channels I have stopped checking.

The pragmatic answer is a digest. Once a day, a small script reads the busy channels, summarises what happened, and posts the summary somewhere I will actually read. Done well, this turns a forty five minute evening catch up into a five minute morning skim, and it removes the temptation to refresh Slack during the day to see if I have missed anything.

Done poorly, it produces something that reads like a press release written by a consultant who has never met any of the people in the channel. I have been in the latter state for most of the last three months. This post is the writeup of getting it from “obviously machine generated waffle” to “actually useful summary that I read every morning”.

The pipeline

The shape is unsurprising.

Slack API  ->  Python normaliser  ->  Ollama (Gemma 3)  ->  Telegram bot

Slack provides the raw messages from the channels I care about. A small Python normaliser strips the message format down to something a language model can read without a long preamble. Ollama runs a local Gemma 3 model on the Mac Mini and produces the summary. A Telegram bot posts the result into a personal channel that I read on my phone in the morning.

The Slack to Python part uses the official Slack SDK. Nothing exciting.

from datetime import datetime, timedelta, timezone
from slack_sdk import WebClient

slack = WebClient(token=SLACK_BOT_TOKEN)

def fetch_channel_window(channel_id: str, hours: int = 24):
    # Everything posted in the last `hours`, as a Unix timestamp for
    # Slack's `oldest` parameter.
    oldest = (datetime.now(timezone.utc) - timedelta(hours=hours)).timestamp()
    cursor = None
    messages = []
    # Page through conversations.history until the cursor runs out.
    while True:
        resp = slack.conversations_history(
            channel=channel_id,
            oldest=str(oldest),
            limit=200,
            cursor=cursor,
        )
        messages.extend(resp["messages"])
        cursor = resp.get("response_metadata", {}).get("next_cursor")
        if not cursor:
            break
    # Slack returns newest first; reverse so the transcript reads chronologically.
    return list(reversed(messages))

The normaliser does three things. It resolves user IDs to display names with a small cache, since Slack messages carry mentions as raw IDs and the model can do nothing useful with <@U01ABC>. It collapses thread replies into the parent message, indented, because thread structure carries meaning that a flat dump loses. And it drops anything that is just a join, leave, channel rename, or pinned message notification, because those are noise in summary form.

The output is a plain text block per channel that looks roughly like a chat transcript, with one line per message, names in front, and threads indented. That format is much easier for a model to summarise than raw Slack JSON.
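
A sketch of that normaliser, reusing the slack client from the fetch above. The subtype list and the helper names are mine rather than lifted from the real script, and the mention regex ignores the less common <@ID|name> form.

import re

SKIP_SUBTYPES = {"channel_join", "channel_leave", "channel_name", "pinned_item"}
MENTION = re.compile(r"<@([A-Z0-9]+)>")
user_cache: dict[str, str] = {}

def display_name(user_id: str) -> str:
    # Resolve a Slack user ID to a display name, cached so users.info is
    # called at most once per person per run.
    if user_id not in user_cache:
        profile = slack.users_info(user=user_id)["user"]["profile"]
        user_cache[user_id] = profile.get("display_name") or profile.get("real_name") or user_id
    return user_cache[user_id]

def resolve_mentions(text: str) -> str:
    # Turn <@U01ABC> mentions into names the model can actually use.
    return MENTION.sub(lambda m: display_name(m.group(1)), text)

def normalise(channel_id: str, messages: list) -> str:
    lines = []
    for msg in messages:
        if msg.get("subtype") in SKIP_SUBTYPES or "user" not in msg:
            continue  # joins, leaves, renames, pins: noise in summary form
        lines.append(f"{display_name(msg['user'])}: {resolve_mentions(msg.get('text', ''))}")
        if msg.get("reply_count"):
            # Thread replies live behind a separate call; indent them under the parent.
            replies = slack.conversations_replies(channel=channel_id, ts=msg["ts"])["messages"][1:]
            for reply in replies:
                if "user" in reply:
                    lines.append(f"    {display_name(reply['user'])}: {resolve_mentions(reply.get('text', ''))}")
    return "\n".join(lines)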

The Ollama call is a single requests.post to the local API with a system prompt and the channel block as the user message. It runs once per channel, sequentially, on the Mac Mini. Total runtime for a normal day across roughly a dozen channels sits at three to five minutes.
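
The call looks roughly like this, assuming Ollama's default port and a gemma3 tag that has already been pulled; the exact tag depends on what is installed.

import requests

def summarise(system_prompt: str, channel_block: str) -> str:
    # One non-streaming chat call per channel against the local Ollama API.
    resp = requests.post(
        "http://localhost:11434/api/chat",
        json={
            "model": "gemma3:12b",  # assumed tag; use whatever `ollama list` shows
            "messages": [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": channel_block},
            ],
            "stream": False,
        },
        timeout=600,
    )
    resp.raise_for_status()
    return resp.json()["message"]["content"]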

The Telegram bot is one of the standard Python wrappers, posting into a personal “morning brief” channel with a fixed message format.
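
The wrapper is thin enough that the Bot API call underneath is worth showing; TELEGRAM_BOT_TOKEN and DIGEST_CHAT_ID stand in for values kept in config.

import requests

def post_digest(text: str) -> None:
    # Telegram's sendMessage endpoint; one call per morning digest.
    requests.post(
        f"https://api.telegram.org/bot{TELEGRAM_BOT_TOKEN}/sendMessage",
        json={"chat_id": DIGEST_CHAT_ID, "text": text},
        timeout=30,
    ).raise_for_status()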

Prompt design, the part that matters

The first version of this pipeline produced output that I genuinely could not read. Every summary opened with “In this channel, the team discussed several important topics”, contained at least one bullet starting with “Key takeaways”, and signed off with “Overall, this represents a productive day of collaboration”. That output is the AI equivalent of a corporate Christmas card.

Three changes fixed almost all of it.

The first was to ban the consultant register explicitly in the system prompt. I told the model, in plain English, what register I wanted, and what register to avoid, with examples. It is faintly ridiculous that this works. It does work.

The second was to ask for a strict structure that did not include a preamble or a conclusion. The summary now starts with the first item and ends with the last item. There is no “in summary” line. There is no opening paragraph that restates that this was a busy channel. The structure is a bullet list of the form “who did what” or “what was decided”, and that is it.

The third was to feed the model with the names of the people in the channel as part of the system prompt, so that it had a reasonable shot at attributing things correctly. Without that, the model would sometimes invent a paraphrase that put a particular point in the wrong mouth, which is the worst failure mode for this kind of tool.

Here is the actual system prompt, more or less:

You are summarising a Slack channel for one specific reader who is
catching up after a day away. The reader knows the channel and the
people. They want a list of what happened, not an executive summary.

Rules:
- Output a flat bullet list. Six to twelve bullets, depending on
  how busy the day was. No headings.
- Each bullet is one short, plain sentence. No preamble. No
  conclusion.
- Attribute decisions and questions to the named person where the
  attribution is unambiguous. Skip attribution where it is not.
- If a thread changes the answer, only summarise the final state.
- Avoid the consultant register entirely. No filler nouns
  borrowed from corporate decks. Plain working language only.
- Plain UK English. No emoji. No exclamation marks.
- If the day in the channel was quiet, output a single line that
  says "quiet day".

People in this channel: {{names}}.
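
The {{names}} slot is filled from the channel membership before each call. A sketch, reusing the display name cache from the normaliser; the template variable name is mine.

def channel_names(channel_id: str) -> str:
    # Channel members, resolved through the same users.info cache.
    member_ids = slack.conversations_members(channel=channel_id)["members"]
    return ", ".join(display_name(uid) for uid in member_ids)

# SYSTEM_PROMPT_TEMPLATE holds the prompt above, {{names}} placeholder included.
system_prompt = SYSTEM_PROMPT_TEMPLATE.replace("{{names}}", channel_names(channel_id))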

The “quiet day” rule is the one I am most pleased with. Without it, the model will invent a five point summary out of three messages, two of which are Slack’s automatic “X joined the channel” notifications. With it, a quiet channel produces a quiet line, and I can move on.

Cost and runtime numbers

Rough comparison with sending the same payload to OpenAI.

The local Gemma 3 12B run for the whole digest takes about four minutes wall time on the Mac Mini, drawing in the rough range of 50 watts during that period. That works out to roughly 0.003 kWh a day, or about a tenth of a penny at current UK electricity prices, so comfortably under half a penny. The depreciation on the hardware, amortised over a sensible useful life, is the dominant cost, and that is paid for by all the other things the Mac Mini does the rest of the day.

Sending the same payload to a hosted model would cost in the rough range of 5 to 20 pence per day depending on the model and the channel volume. Over a year, that is somewhere between £18 and £73. The local pipeline saves a small absolute amount of money. That alone would not justify the work.

What does justify the work is the privacy story, which is the next section.

Why local matters when the messages are real

The channels I am summarising contain real client work. They contain frank commercial opinions about specific people. They contain product decisions that have not been made public. They contain rough numbers that are not yet rounded for external consumption. None of that is dramatic, and none of it would be a disaster if it leaked, but all of it is the kind of material that I would prefer not to upload to a third party that I have no real recourse against.

Running the model locally collapses that whole question. The transcripts never leave the Mac. The summary is generated on the same machine. The only thing that goes out over the network is the final summary, sent to Telegram, which is itself a tradeoff but a smaller one than dispatching the raw transcript to a model vendor.

The same logic applies more broadly. Operational data, the stuff that lives in chat logs, support queues, ERP exports, and CRM events, is exactly where most of the genuinely useful AI work for a small or mid sized business is going to happen, and it is also exactly the data that people are most uncomfortable about handing to an external model vendor. For UK SMEs that want to look at their own operational data carefully, the same local first principles apply. I do process mining work through the consultancy that follows exactly this pattern, with the analysis kept on infrastructure that the client already trusts. The technology is not the hard part. The trust story is.

Where it gets things wrong

Three failure modes recur.

Sarcasm is not handled well. If a channel member writes “great, another deployment script that ignores the staging step”, the summary will sometimes record that as approval rather than complaint. The fix is partly a prompt tweak, partly an acceptance that this is the natural failure mode of any model trying to read tone in text without context. I now scan the digest for any bullet that looks like good news and read the original thread when something feels off.

Long threads with many participants get compressed into something blander than they should be. A heated discussion with a clear winning argument can become “the team discussed the migration plan and agreed on next steps”, which is not wrong in any specific way and is also not useful. The current mitigation is to detect threads above a length threshold and to ask the model for a slightly longer multi line summary for those, in exchange for the rest of the digest being more terse. It is a partial fix.
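
The detection side of that mitigation is small. A sketch, with an arbitrary cutoff; the marker text is just something the prompt can be told to look for.

LONG_THREAD_REPLIES = 15  # arbitrary cutoff, tuned by eye

def thread_marker(msg: dict) -> str:
    # Appended after a parent message in the channel block, so the prompt can
    # give flagged threads two or three bullets instead of one.
    if msg.get("reply_count", 0) >= LONG_THREAD_REPLIES:
        return "    [long thread: give this one more space, name the positions]"
    return ""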

Inside jokes confuse the model entirely. When a channel has a recurring running joke, the model will sometimes take it seriously and produce a bullet that reports the joke as if it were a real announcement. I have not solved this. I have started reading the original messages whenever a digest line sounds like a non sequitur.

What I would add next

A small “thread weight” heuristic that gives long, unresolved threads more space in the summary, and short closed threads less. The current pipeline treats every thread roughly equally on a per token basis, which is wrong.

A “people I mentioned” extractor, so that on days where a particular collaborator has come up several times across channels, I see that fact called out at the top of the digest. Most of my morning catch up is “did anything happen with X today”, and a per person rollup would address that directly.

An archive. The current pipeline writes the digest to Telegram and forgets about it. A simple SQLite append of every digest, indexed by date, would make it possible to ask the same model “what did we decide about the migration over the last fortnight” without having to scroll through several mornings of Telegram history. That is the project for the next quiet weekend.
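
The append itself would only be a few lines; the table and column names here are placeholders rather than anything that exists yet.

import sqlite3

def archive_digest(db_path: str, day: str, channel: str, digest: str) -> None:
    # One row per channel per day, written after the Telegram post goes out.
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS digests (day TEXT, channel TEXT, digest TEXT)"
    )
    con.execute(
        "INSERT INTO digests (day, channel, digest) VALUES (?, ?, ?)",
        (day, channel, digest),
    )
    con.commit()
    con.close()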