Aawegg

I build evaluation harnesses and agent infrastructure. The part of AI I love most is the boring, honest half: making outputs measurable, falsifiable, and hard to fake. I was an AI evaluation team lead at Turing, working for frontier AI lab clients, where I designed grading rubrics, caught reward hacking, and rewrote prompts that moved agreement from "throwing darts" to "actually scores reasoning."

class Aaweg:
    role        = "GenAI Engineer"
    focus       = ["LLM evaluation", "multimodal RAG", "agentic systems"]
    philosophy  = "is it actually good, or just demo-good?"
    currently   = "shipping eval pipelines and breaking my own agents"
    fun_fact    = "wrote a C++ inference engine just to feel the tokens move"

✦ Featured work

priorityjudge is a multi-pass agent that grades plans and PRDs on priority-definition quality. It runs in four stages: extractor → 5 independent dimension scorers → deterministic citation verifier → synthesizer.
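To make the "deterministic citation verifier" stage concrete, here is a minimal sketch of how such a step could work. It is not the actual priorityjudge code: the `Citation` shape, the function names, and the exact-substring matching rule are all assumptions chosen for illustration. The idea is that a quote produced by an LLM scorer only survives if it can be found verbatim (after whitespace normalization) in the source document, so no model call is involved in the check.

```python
import re
from dataclasses import dataclass


@dataclass
class Citation:
    """Hypothetical shape: a claim plus the quote said to support it."""
    claim: str
    quote: str


def _normalize(text: str) -> str:
    # Collapse whitespace and lowercase so line wrapping does not cause false misses.
    return re.sub(r"\s+", " ", text).strip().lower()


def verify_citations(source: str, citations: list[Citation]) -> tuple[list[Citation], list[Citation]]:
    """Split citations into (verified, rejected) using exact substring matching."""
    haystack = _normalize(source)
    verified, rejected = [], []
    for citation in citations:
        needle = _normalize(citation.quote)
        # An empty quote can never count as evidence.
        if needle and needle in haystack:
            verified.append(citation)
        else:
            rejected.append(citation)
    return verified, rejected


def citation_precision(verified: list[Citation], rejected: list[Citation]) -> float:
    """Fraction of emitted citations that were actually found in the source."""
    total = len(verified) + len(rejected)
    return len(verified) / total if total else 0.0
```

Because the check is a plain string comparison, the same input always gives the same verdict, which is what lets a verifier like this catch hallucinated quotes without adding another source of model variance.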

9887 / 10000 on a hand-rated calibration set · Spearman ρ = 1.0 against the human ranking · 96.5% citation precision (the verifier caught the 3.5% the LLM hallucinated) · test-retest CV = 0.4%. The win comes from architecture, not the model.
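For readers who want to see how two of these metrics are defined, the sketch below computes a Spearman rank correlation and a coefficient of variation using only the standard library. It is an illustration under stated assumptions, not the project's evaluation script: the function names are hypothetical, ties get average ranks, and the CV uses the population standard deviation (the real pipeline may use the sample version).

```python
from statistics import mean, pstdev


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman correlation: the Pearson correlation of the two rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one side is constant")
    return cov / (var_x * var_y) ** 0.5


def coefficient_of_variation(scores):
    """Test-retest spread: standard deviation of repeated scores over their mean."""
    return pstdev(scores) / mean(scores)
```

A ρ of 1.0 means the agent orders the documents exactly as the human raters did, even if the absolute scores differ; a low CV means re-running the same document gives nearly the same score each time.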

| Project | What it is | Stack |
| --- | --- | --- |
| `priorityjudge` | APO-native plan scorer with verified citations | Python · Gemini · GH Actions · Docker |
| `nibblecore` | 4-bit quantization kernels for Apple Silicon, benchmarked vs `llama.cpp` | C++ · SIMD · Metal |
| `Story-Character-Extractor` | RAG pipeline pulling structured character profiles from stories | Embeddings · Vector DB · LLM |
| `Melody-Generation-using-LSTM` | LSTM trained on monophonic MIDI to generate new melodies | PyTorch · LSTM · MIDI |

✦ Stack I reach for

also: LangGraph · LlamaIndex · MLflow · Airflow · ChromaDB · pytest · SIMD/Metal · Next.js when a UI is genuinely the right answer

✦ Activity

✦ Reach me

is it actually good, or just demo-good? let's find out together.
