Aawegg/README.md

I build evaluation harnesses and agent infrastructure. The part of AI I love most is the boring, honest half: making outputs measurable, falsifiable, and hard to fake. I was an AI evaluation team lead at Turing, working for frontier AI lab clients, where I designed grading rubrics, caught reward hacking, and rewrote prompts that moved agreement from "throwing darts" to "actually scores reasoning."

class Aaweg:
    role        = "GenAI Engineer"
    focus       = ["LLM evaluation", "multimodal RAG", "agentic systems"]
    philosophy  = "is it actually good, or just demo-good?"
    currently   = "shipping eval pipelines and breaking my own agents"
    fun_fact    = "wrote a C++ inference engine just to feel the tokens move"

✦ Featured work

priorityjudge is a multi-pass agent that grades plans and PRDs on priority-definition quality. The pipeline: extractor → 5 independent dimension scorers → deterministic citation verifier → synthesizer.
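A minimal sketch of what a deterministic citation verifier could look like, assuming each scorer returns line-cited evidence as a (line number, quote) pair. The `Citation` schema, the whitespace and case normalization, and the function names are hypothetical, not priorityjudge's actual code. The point is only that the check is plain string matching, with no LLM in the loop.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    # Hypothetical schema: a scorer claims `quote` appears on 1-indexed `line`.
    line: int
    quote: str


def _normalize(text: str) -> str:
    # Collapse whitespace and case so trivial formatting differences
    # do not fail an otherwise correct citation (an assumed policy).
    return " ".join(text.split()).casefold()


def verify_citation(document: str, citation: Citation) -> bool:
    """Return True only if the quoted text really appears on the cited line."""
    lines = document.splitlines()
    if not 1 <= citation.line <= len(lines):
        return False  # cited line does not exist
    quote = _normalize(citation.quote)
    return bool(quote) and quote in _normalize(lines[citation.line - 1])


def citation_precision(document: str, citations: list[Citation]) -> float:
    """Fraction of citations that pass verification (0.0 if there are none)."""
    if not citations:
        return 0.0
    verified = sum(verify_citation(document, c) for c in citations)
    return verified / len(citations)
```

Because the verifier is deterministic, the same plan and the same citations always produce the same verdict, so any citation it rejects can be dropped or flagged before the synthesizer runs.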

9887 / 10000 on a hand-rated calibration set · Spearman ρ = 1.0 against the human ranking · 96.5% citation precision (the verifier caught the 3.5% the LLM hallucinated) · test-retest CV = 0.4%. The win comes from architecture, not the model.
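For readers who want to reproduce this kind of calibration check, here is a rough standard-library sketch of two of the reported statistics: Spearman ρ between judge scores and a human ranking, and the coefficient of variation across repeated runs. The function names are illustrative, and using the sample standard deviation for the CV is an assumption. This is not the project's own evaluation code.

```python
from statistics import mean, stdev


def average_ranks(values):
    """1-indexed ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    # Undefined (division by zero) if either input has zero variance.
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def coefficient_of_variation(scores):
    """Test-retest spread: sample standard deviation divided by the mean."""
    return stdev(scores) / mean(scores)
```

A ρ of 1.0 means the judge orders the documents exactly as the human raters do. It says nothing about whether the absolute scores match, which is why it is reported alongside the other numbers.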

| Project | What it is | Stack |
| --- | --- | --- |
| priorityjudge | APO-native plan scorer with verified citations | Python · Gemini · GH Actions · Docker |
| nibblecore | 4-bit quantization kernels for Apple Silicon, benchmarked vs llama.cpp | C++ · SIMD · Metal |
| Story-Character-Extractor | RAG pipeline pulling structured character profiles from stories | Embeddings · Vector DB · LLM |
| Melody-Generation-using-LSTM | LSTM trained on monophonic MIDI to generate new melodies | PyTorch · LSTM · MIDI |
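To give a feel for what "4-bit quantization" means in the nibblecore entry, here is a generic Python sketch of symmetric absmax quantization with two values packed per byte. It is an illustration of the general idea only. The block layout, the scale rule, and the function names are assumptions, and it reflects neither nibblecore's nor llama.cpp's actual formats or kernels.

```python
def quantize_block(values):
    """Quantize an even-length block of floats to 4 bits per value.

    Returns (scale, packed), where packed holds two values per byte.
    """
    if len(values) % 2:
        raise ValueError("block length must be even to pack two nibbles per byte")
    absmax = max(abs(v) for v in values)
    # Map the largest magnitude to +/-7; all-zero blocks get a dummy scale.
    scale = absmax / 7.0 if absmax else 1.0
    # Signed level in [-7, 7], stored with a +8 offset as an unsigned nibble.
    q = [max(-7, min(7, round(v / scale))) + 8 for v in values]
    # Low nibble holds the even-indexed value, high nibble the odd-indexed one.
    packed = bytes(q[i] | (q[i + 1] << 4) for i in range(0, len(q), 2))
    return scale, packed


def dequantize_block(scale, packed):
    """Invert quantize_block, up to rounding error of at most scale / 2."""
    out = []
    for byte in packed:
        out.append(((byte & 0x0F) - 8) * scale)
        out.append(((byte >> 4) - 8) * scale)
    return out
```

Storing one scale per block keeps the rounding error proportional to that block's largest weight, and packing two nibbles per byte halves memory traffic relative to 8-bit storage. A real kernel would perform the same unpacking with SIMD or Metal instructions rather than a Python loop.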

✦ Stack I reach for

Python · FastAPI · PyTorch · LangChain · Pydantic · C++

Docker · GCP · Hugging Face · Gemini · GitHub Actions

also: LangGraph · LlamaIndex · MLflow · Airflow · ChromaDB · pytest · SIMD/Metal · Next.js when a UI is genuinely the right answer


✦ Reach me

email · linkedin · github

is it actually good, or just demo-good? let's find out together.

Pinned

  1. codeforces-notebook-generator

    Python

  2. Melody-Generation-using-LSTM

    Implemented Long Short-Term Memory (LSTM) networks for melody generation

    Python

  3. nibblecore

    4-bit quantization kernels for Apple Silicon, benchmarked against llama.cpp

    C++

  4. priorityjudge

    Multi-pass agent that grades plans/PRDs on priority-definition quality. Line-cited evidence, deterministic citation verifier, 9887/10000 on a hand-rated calibration set.

    Python

  5. Review-Portal

    Python

  6. Story-Character-Extractor

    RAG pipeline that extracts structured character profiles, relationships, and roles from short stories using embeddings, a vector DB, and an LLM.

    Python