Macalinao Studio — AI Systems, Implemented and Operated

AI Stack

01 · Stack

Claude · Hermes Agent · Paperclip · GPT-5.5

The agentic stack I implement against — a Claude / GPT-5.5 reasoning layer fronted by Hermes (the agent runtime) and Paperclip (the gateway / cron / delegation tier). One default agent, many models, all calls audited at the edge.

What I'm learning

Centralizing access through one gateway makes cost, audit, and rate limits enforceable. Handing out direct keys per tool is a slow drift into chaos.
"Agent" is a verb (a thing that decides what to call) more than a noun. Picking one default keeps the verb consistent.
Tool inventory belongs in the agent layer, not in the prompt — keeps prompts small and tools swappable.
One agent, many models is easier to operate than many agents, one model.
Cron-driven agent delegation is dramatically simpler than event-driven for small teams. Most "real-time" requirements aren't.
Mass-update workflows require idempotency from day one — or the rollback story gets ugly fast.
Audit logging at the gateway saves you from arguing about what an agent actually called when something breaks at 2am.
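The gateway points above can be sketched as a single choke point that checks the model name, rate-limits per caller, and writes an audit entry for every attempt, including denied ones. This is a minimal Python sketch, not Hermes or Paperclip code. The `Gateway` and `TokenBucket` classes, the backend callables, and the in-memory audit list are all hypothetical stand-ins.

```python
import json
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical backend shape: a callable that takes a prompt and returns text.
Backend = Callable[[str], str]


@dataclass
class TokenBucket:
    """Per-caller limiter: refills `rate` tokens per second, holds at most `capacity`."""
    rate: float
    capacity: float
    tokens: float = field(init=False)
    updated: float = field(init=False)

    def __post_init__(self) -> None:
        self.tokens = self.capacity
        self.updated = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class Gateway:
    """Single choke point: every model call is checked, rate-limited, and audited here."""

    def __init__(self, backends: Dict[str, Backend], rate: float = 1.0, burst: float = 5.0):
        self.backends = backends
        self.rate = rate
        self.burst = burst
        self.buckets: Dict[str, TokenBucket] = {}
        self.audit_log: List[str] = []  # stand-in for append-only storage

    def call(self, caller: str, model: str, prompt: str) -> str:
        if model not in self.backends:
            status = "unknown_model"
        else:
            if caller not in self.buckets:
                self.buckets[caller] = TokenBucket(self.rate, self.burst)
            status = "ok" if self.buckets[caller].allow() else "rate_limited"
        # Audit every attempt, including denied ones, before returning or raising.
        self.audit_log.append(json.dumps({
            "ts": time.time(), "caller": caller, "model": model,
            "prompt_chars": len(prompt), "status": status,
        }))
        if status != "ok":
            raise RuntimeError(f"{status}: caller={caller} model={model}")
        return self.backends[model](prompt)
```

Routing by model name at one choke point is what keeps "one agent, many models" operable: swapping a model touches the `backends` map, not every tool that calls it.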

Tools

Claude · Hermes · Paperclip · GPT-5.5

Posture

One default agent
Many models
Audit at the edge

Generation Stack

02 · Stack

ComfyUI · GPT-Image2 · LTX 2.3 · AceStep 1.5 XL

The media-generation stack covering image, video, and audio. ComfyUI is the workflow editor; GPT-Image2 handles batch persona renders; LTX 2.3 is the video diffusion runner; AceStep 1.5 XL covers music. Each tool is wired to the same farm and gateway.

What I'm learning

A saved workflow .json is more reproducible than any prompt — treat it like code, commit it, diff it.
Custom nodes age fast; pin the ones you depend on or accept the breakage.
Splitting the workflow at the latent stage (load, sample, decode separately) is the cheapest debugging move.
Persona enrichment as structured JSON beats freeform prompt rewrites every time — and survives reruns.
Visual-DNA maps make characters consistent across hundreds of frames without re-prompting from scratch.
Video models reward thinking in shots, not seconds — pacing is a prompt input, not a render setting.
Music generation is most useful when the visual cut already exists — composition lives in the video, not the audio.
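Treating a saved workflow as code mostly means two things: a stable fingerprint so identical graphs compare equal, and a node-level diff so a change reads like a code review. This is a minimal sketch that assumes ComfyUI's API-format export, where each node id maps to a `class_type` and an `inputs` object. The function names are hypothetical, and real workflows carry more inputs (including links to other nodes) than the trimmed example in the tests.

```python
import hashlib
import json
from typing import Any, Dict, List


def canonical(workflow: Dict[str, Any]) -> str:
    """Stable serialization: sorted keys and no whitespace noise, so equal graphs hash equal."""
    return json.dumps(workflow, sort_keys=True, separators=(",", ":"))


def fingerprint(workflow: Dict[str, Any]) -> str:
    """Short content hash, handy for tagging renders with the exact graph that made them."""
    return hashlib.sha256(canonical(workflow).encode("utf-8")).hexdigest()[:12]


def diff_inputs(old: Dict[str, Any], new: Dict[str, Any]) -> List[str]:
    """List node-level changes between two API-format workflows."""
    changes: List[str] = []
    for node_id in sorted(set(old) | set(new)):
        if node_id not in new:
            changes.append(f"- node {node_id} ({old[node_id]['class_type']}) removed")
        elif node_id not in old:
            changes.append(f"+ node {node_id} ({new[node_id]['class_type']}) added")
        else:
            a = old[node_id].get("inputs", {})
            b = new[node_id].get("inputs", {})
            for key in sorted(set(a) | set(b)):
                if a.get(key) != b.get(key):
                    changes.append(f"~ node {node_id}.{key}: {a.get(key)!r} -> {b.get(key)!r}")
    return changes
```

Committing the canonical form rather than the editor's raw save keeps diffs limited to real changes such as a bumped `steps` value, not key reordering.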

Tools

ComfyUI · GPT-Image2 · LTX 2.3 · AceStep

Coverage

Image · Video
Audio · Music
LoRAs · Fine-tunes
Workflow as code

Hosted Stack

03 · Stack

Mail Operations · Web Operations

The self-hosted operations layer — managed mail with proper authentication and an outbound relay, plus a portfolio-wide edge / DNS / proxy setup with structured contact-form sweeping. Both managed as data, both built to roll back per zone.

What I'm learning

Reputation is the metric that matters; everything else is upstream of it.
An outbound relay for transactional mail is a much better story than sending direct from the mail server.
Authentication reports are noisy until you act on them — read them weekly or don't enable reporting at all.
The dry-run output IS the rollout plan. Anything you can't reproduce in dry-run won't behave the same on apply.
Parked domains are a contact-form attack surface most teams forget exists. Sweep them like any other input.
Edge rules outlive any single deployment script — write them as data, not as one-shot commands.
Phased rollout per zone is slower but spares you the "everything broke at once" debugging.
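The dry-run, rules-as-data, and per-zone points combine naturally. You compute a plan by diffing desired state against current state, print it in dry-run, and apply one zone per phase. This is a minimal sketch with hypothetical names (`plan`, `apply_zone`, `Change`). Rule bodies are opaque strings, and the mutation of `current` stands in for real edge or DNS provider calls, which are not shown.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical shape: zone -> {rule name -> rule expression}. Expressions are opaque strings.
State = Dict[str, Dict[str, str]]


@dataclass(frozen=True)
class Change:
    zone: str
    action: str  # "create" | "update" | "delete"
    rule: str


def plan(desired: State, current: State) -> List[Change]:
    """Diff desired against current. In dry-run, this list is the rollout plan."""
    changes: List[Change] = []
    for zone in sorted(set(desired) | set(current)):
        want, have = desired.get(zone, {}), current.get(zone, {})
        for rule in sorted(set(want) | set(have)):
            if rule not in have:
                changes.append(Change(zone, "create", rule))
            elif rule not in want:
                changes.append(Change(zone, "delete", rule))
            elif want[rule] != have[rule]:
                changes.append(Change(zone, "update", rule))
    return changes


def apply_zone(desired: State, current: State, zone: str, dry_run: bool = True) -> List[Change]:
    """One rollout phase: converge a single zone, leaving every other zone untouched."""
    changes = [c for c in plan(desired, current) if c.zone == zone]
    if not dry_run:
        # Stand-in for real provider API calls; here we just update local state.
        if zone in desired:
            current[zone] = dict(desired[zone])
        else:
            current.pop(zone, None)
    return changes
```

Because apply converges state instead of replaying commands, running it a second time on the same zone is a no-op. That is also what makes a per-zone rollback cheap: point `desired` at the previous version and apply again.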

Approach

Mail · Web · DNS · Edge

Stack

SES · AWS

Posture

Self-hosted
Managed deliverability
Phased rollout

AI Image/Video Farm

04 · Live

Generation & LoRAs · Distributed Inference

A private compute environment for image and video generation, with LoRA fine-tuning baked into the pipeline. Centralized orchestration handles model routing, throughput shaping, and node-level health.

What I'm learning

VRAM is the binding constraint on multi-model serving — utilization without VRAM tracking is a misleading green dashboard.
LoRA training stays cheap if the base model is locked and only the adapter cycles — full retrains rarely earn their cost.
Auto-load/unload is the difference between "real cluster" and "one big model that won't move."
Cost-per-generation only becomes a real number once node-level telemetry is wired in.
Long-running generations should checkpoint to disk; "cluster goes down at 80%" should not erase the work.
Throughput shaping per-tenant is the difference between "shared farm" and "one user starves everyone else."
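The VRAM and auto-load/unload points can be illustrated with a least-recently-used cache whose budget is measured in VRAM rather than in item count. This is a minimal single-GPU sketch. The class name, model names, and sizes are hypothetical, and the `events` list stands in for the actual load and unload calls.

```python
from collections import OrderedDict
from typing import Dict, List


class VramModelCache:
    """Keep models resident within a VRAM budget, evicting the least recently used first."""

    def __init__(self, vram_budget_gb: float, model_sizes_gb: Dict[str, float]):
        self.budget = vram_budget_gb
        self.sizes = model_sizes_gb
        self.resident: OrderedDict = OrderedDict()  # name -> size in GB, LRU first
        self.events: List[str] = []  # stand-in for real load/unload calls

    @property
    def used(self) -> float:
        return sum(self.resident.values())

    def acquire(self, name: str) -> None:
        """Make `name` resident, unloading older models until it fits."""
        size = self.sizes[name]
        if size > self.budget:
            raise MemoryError(f"{name} ({size} GB) exceeds the {self.budget} GB budget")
        if name in self.resident:
            self.resident.move_to_end(name)  # cache hit: mark as most recently used
            return
        while self.used + size > self.budget:
            evicted, _ = self.resident.popitem(last=False)  # drop least recently used
            self.events.append(f"unload {evicted}")
        self.resident[name] = size
        self.events.append(f"load {name}")
```

Reporting `used` next to GPU utilization is what turns the "misleading green dashboard" into something actionable: a node can be busy and still have no room for the next model.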

Approach

Linux · CUDA · LoRA · Local

Topology

Multi-node
Internal-only
Auto-routing

AI Attendant and Assistant

05 · Alpha

Voice-driven Attendant · Always-on Assistant

A speech-first assistant on a self-hosted speech pipeline — text-to-speech, speech-to-text, and multi-language voice generation, wired to a turn-taking attendant and an always-on assistant interface.

What I'm learning

Model selection matters more than parameter tuning for TTS quality — the wrong base model can't be fixed downstream.
Subtitle alignment is its own problem class. STT timestamps need post-processing per language.
MOS-style scoring is the only way to compare voices without taste arguments creeping in.
Multi-language coverage cascades into engine choice, so lock the language list early.
Turn-taking is harder than transcription — the assistant that interrupts itself loses every conversation.
The latency budget is real: sub-second response is the minimum bar, and sub-300 ms to first syllable is the goal.
A voice persona is a brand asset — it should be consistent across calls, sessions, even years.
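MOS-style scoring reduces to averaging 1-to-5 listener ratings per voice. Reporting an interval alongside the mean is what keeps a 0.1-point gap from being read as a real preference. This is a minimal sketch using only the standard library. The function names are hypothetical, and the interval uses a normal approximation (1.96 standard errors), which is optimistic for small listener panels.

```python
import math
from statistics import mean, stdev
from typing import Dict, List, Tuple


def mos(ratings: List[int]) -> Tuple[float, float]:
    """Return (mean opinion score, approximate 95% confidence half-width).

    Ratings use the 1-5 absolute category scale. With few listeners,
    a t-based interval would be wider than this normal approximation.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must be in 1..5")
    half_width = 1.96 * stdev(ratings) / math.sqrt(len(ratings))
    return float(mean(ratings)), half_width


def rank_voices(scores: Dict[str, List[int]]) -> List[Tuple[str, float, float]]:
    """Sort candidate voices by MOS, highest first, keeping each interval."""
    rows = [(voice, *mos(r)) for voice, r in scores.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

If two voices' intervals overlap, treat the ranking between them as noise and collect more ratings rather than picking on taste.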

Approach

TTS · STT · Multi-lang · Turn-taking

Coverage

Voice-first
Multi-language
Sub-second target

AI Music Idols

06 · Live

Albums & Music Videos · Character-led Acts

A small label of in-house AI idols — characters with discographies, not producers with stage names. Currently QKeyV (synth · dark pop) and DvYnT (electronic · cinematic). Releases ship as albums plus visual companions cut from the same generative pipeline.

What I'm learning

Visuals tied to the audio cut from day one beat post-hoc music videos — every time.
Album shape forces a story; single-track-only acts lose continuity fast.
Distribution is its own production pass — budget it as a stage.
An idol is a character with a discography, not a producer's stage name — write the persona, then the songs.
Voice consistency across releases is what keeps an act feeling like one act, not many.
The fan loop (release → social → response → next release) is the actual product, not any single song.

Acts

QKeyV · DvYnT

Output

Albums + videos
Visual companions
DSP distribution

AI Persona, Models, UGC

07 · Live

Persona Roster · Brand-safe Media · UGC Feeds

A locked roster of AI personas used as brand-safe media assets. Each persona ships with a defined aesthetic, a social tone, a UGC-ready feed format, and album/track structure for ongoing campaigns. Output runs through manual QA before anything ships.

What I'm learning

A persona is a brand asset — versioned, signed off, retired the same way logos are.
Visual DNA is what separates a real character from a generated face — without it, every shoot starts at zero.
UGC feel comes from imperfection on purpose, not from prompt-perfect renders.
Cross-platform formats (story, reel, square) cost less when designed up front than retro-fitted.
Album/track structure forces editorial rhythm into otherwise scattered AI output.
DOCX as a deliverable beats HTML for client review — comments and markup land where reviewers already work.
Quality assurance has to live in the loop, not bolted on at the end.
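The persona-as-brand-asset and visual-DNA points can be sketched as an immutable, versioned record. Rendering a prompt is a pure function of the record. A revision produces a new version that needs fresh sign-off, and a fingerprint makes drift visible. The field names, the sign-off rule, and the example persona in the tests are hypothetical, not a fixed schema.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class PersonaDNA:
    """Hypothetical persona record; fields are illustrative only."""
    name: str
    version: int
    palette: Tuple[str, ...]
    wardrobe: str
    lighting: str
    social_tone: str
    approved_by: str = ""  # empty until sign-off

    def fingerprint(self) -> str:
        """Content hash of every field, so any change to the DNA is detectable."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]

    def render_prompt(self, shot: str) -> str:
        """Same DNA plus the same shot gives the same prompt text on every rerun."""
        if not self.approved_by:
            raise ValueError(f"{self.name} v{self.version} is not signed off")
        return (f"{self.name}, {self.wardrobe}, {self.lighting} lighting, "
                f"palette {', '.join(self.palette)}; shot: {shot}")

    def revise(self, **changes: object) -> "PersonaDNA":
        """Create the next version with sign-off cleared; the old version stays intact."""
        return replace(self, version=self.version + 1, approved_by="", **changes)
```

Freezing the record means a campaign can pin an exact persona version the way it would pin a logo file, and retiring a persona is just no longer approving new versions.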

Approach

Roster · UGC · Social · Brand-safe

Use

Campaign assets
Social feeds
UGC seeding

AI Learning Course

08 · 1 Foundation · 4 Tracks

Text · Image · Video · Voice · All built on the same transformer math

A ground-up curriculum on transformer-based AI, structured by output modality rather than module number. One Common Foundation covers representation, math, architecture, training, and inference, the layers that apply to every transformer regardless of what it generates. Four modality tracks then specialize: Text, Image, Video, Voice. The same attention mechanism powers GPT, Stable Diffusion, Sora, and ElevenLabs, and seeing them side by side is the curriculum's organizing idea. Authored twice in parallel with two AI co-authors (Gemini + Antigravity, Sonnet 4.6) and folded into a single canonical narrative. 125 audio narrations, two interactive demos, one pre-rendered 3D embedding video. No login. No paywall.


§1 Common Foundation · the five layers every modality builds on

1A Representation — tokens, embeddings, the prequel methods (3 modules).
1B Math underneath — softmax, cross-entropy, why next-token prediction works (2 modules).
1C Architecture — attention, Q/K/V, the transformer block, encoder-decoder (6 modules).
1D Training — pretraining, optimizations, RLHF, DPO/KTO/ORPO, PEFT/LoRA (6 modules).
1E Inference — the decoding loop, the highway, offline engines, the HF registry (4 modules).
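The central ideas of 1B and 1C fit in a few lines: softmax over scores, attention as a weighted average of value vectors, and cross-entropy as the next-token loss. This is a minimal pure-Python sketch for illustration only, with no batching, masking, or multiple heads. It is not taken from the course materials.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    """Turn raw scores into probabilities that sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q: Matrix, K: Matrix, V: Matrix) -> Matrix:
    """Scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V, one query row at a time."""
    d_k = len(K[0])
    out: Matrix = []
    for q in Q:
        # How well this query matches each key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Each output row is a weighted average of the value rows.
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out


def cross_entropy(probs: List[float], target: int) -> float:
    """Next-token loss: negative log-probability assigned to the correct token."""
    return -math.log(probs[target])
```

Production code does the same thing as batched matrix multiplies on a GPU. The loop form just makes the weighted-average structure visible.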

§2-§5 Four modality tracks · current depth + future slots visible

§2 Text — RAG · chat UI · latent space. Inherits all of §1. Most complete track.
§3 Image — Vision Transformers, cross-attention, VAE+Diffusion, ControlNet. 4 lessons + 5 future slots.
§4 Video — AI Film Studios pioneers (Sora, Runway, Luma, Kling, Higgsfield). 1 lesson + 4 future slots.
§5 Voice — Speech synthesis (ElevenLabs, Whisper). 1 lesson + 4 future slots.

What I learned shipping the course

Two AI co-authors covering the same material in parallel surfaces what each tool understands and what it papers over — the diff is the lesson.
The original module numbering hid the modality structure — module 04 ("vision-speech") is actually pure Vision Transformers; module 26 ("multimodal-film") covers both video AND speech and had to be split into 26v + 26s.
A common foundation that all modality tracks inherit beats duplicating the basics per track — mirrors how the systems actually work.
Visible "Coming soon" slots in thin tracks (Video, Voice) set honest expectations and show where future content lands — better than hiding the asymmetry.
Three.js scrollytelling looks great in a demo and harms reading flow on a real lesson — the revamp drops the 3D shell and keeps the canonical content.
A curriculum survives best when anchored on principles rather than specific products. Llama 3 and FLUX.2 examples will date; the residual stream and the diffusion process won't.

Foundation

21 modules
5 layers (1A-1E)

Modality tracks

Text · 3 modules
Image · 4 + 5 future
Video · 1 + 4 future
Voice · 1 + 4 future

Media

125 mp3 clips
2 interactive demos
1 pre-rendered video
3 hero diagrams

Format

Self-paced · No login · Audio · Demos

Source

github.com/ryzenx570/LLM-Understanding

Security, IDS, Hardening

09 · Live

SOC Posture · Intrusion Detection · Runbooks

Continuous security work across access policy, an ongoing threat-tracking notebook, an intrusion-detection layer, versioned hardening checklists, and a monitoring runbook. Analysis of real attacks feeds back into the runbook instead of drifting into an archive.

What I'm learning

A threat database without a monitoring runbook is a graveyard. Alerting is what makes the data load-bearing.
Hardening checklists need versioning the same way code does — each revision should explicitly subsume the last, with a diff.
Brute-force attempts at the perimeter are still the baseline of what you'll see, and rate-limit tooling is still the cheapest mitigation.
Snapshot completion does not equal recovery. Restore drills uncover gaps the snapshot job hides.
IDS signal-to-noise ratio drives whether you'll act on it — tune for false-positive cost, not detection rate.
Patch cadence is a posture choice, not a calendar item — pick monthly, weekly, or "as critical drops" and commit.
The audit log is only useful if someone reads it — schedule the review, or the log isn't security, it's storage.
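The brute-force and signal-to-noise points meet in the simplest perimeter control: count failed attempts per source in a sliding time window and block past a threshold. This is a minimal sketch with hypothetical names and thresholds, assuming timestamps arrive in non-decreasing order. Production tooling would also expire blocks and persist state, which this omits.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Set


class BruteForceWatch:
    """Flag sources with too many failed logins inside a sliding time window.

    Thresholds are hypothetical; tune them against the cost of a false
    positive (locking out a real user), not against raw detection rate.
    """

    def __init__(self, max_failures: int = 5, window_s: float = 60.0):
        self.max_failures = max_failures
        self.window_s = window_s
        self.failures: Dict[str, Deque[float]] = defaultdict(deque)
        self.blocked: Set[str] = set()

    def record_failure(self, source_ip: str, ts: float) -> bool:
        """Record one failed attempt at time `ts` (seconds); return True if the source is blocked."""
        q = self.failures[source_ip]
        q.append(ts)
        # Drop attempts that have aged out of the window.
        while q and ts - q[0] > self.window_s:
            q.popleft()
        if len(q) >= self.max_failures:
            self.blocked.add(source_ip)
        return source_ip in self.blocked
```

Raising `max_failures` or shrinking `window_s` trades missed slow attacks for fewer lockouts. That trade is the posture choice, and it belongs in the versioned checklist.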

Approach

Hardening · IDS · Rate Limits · Runbook

Stack

Cloudflare · Mail · Certs

Cadence

Daily review
Weekly audit
Monthly retro

AI systems, implemented & operated.

Nine projects, three stacks, one workshop.

How a project actually moves.

Route, don't proliferate

One-shot rate is the cost line

Dry-run is the spec

Persona = structured DNA

Memory is the binding constraint

The runbook is the alert
