Animesh Chowdhury animesh01

Animesh Chowdhury

AI & Data Product Leader · Product, Evaluation & Quality (AI/ML) · Conversational & Agentic AI

AI/Data product leader, 10+ years building data & GenAI products end-to-end — data product lead for Walmart's AI shopping assistant, owning the evaluation, experimentation, and quality systems that steer the roadmap.

LinkedIn · Streamlit · Tableau Public · RPubs · Email

👋 About

I take customer-facing AI products from ambiguous problem to launch, and own the evaluation, experimentation, and quality systems that decide what ships next. My edge is hands-on technical depth — LLM evaluation, RAG, observability, experimentation infrastructure — paired with the product judgment to weigh customer experience, safety, cost, and scale in one call.

Currently data product lead for Sparky, Walmart's AI shopping assistant (used by ~50% of Walmart app users; a publicly cited driver of ~35% larger orders), where I defined the platform's first standardized quality KPI and built its evaluation standards from scratch.

🧪 Featured projects

Five runnable apps spanning the AI-product lifecycle — build → evaluate → experiment → monitor → explain. Each is live, self-contained, and built on synthetic or real public data.

Project	What it shows	Demo
🛰️ LLM Observability & Evals	Model-health monitoring across quality, safety, performance, cost & drift — SQL-backed pipeline, alerting, and PDF/PPTX export	Live ↗
💬 Chat Quality Score (CQS)	LLM-as-a-judge evaluation scoring conversations on a 4-dimension rubric, calibrated against human labels	Live ↗
🛒 Product Recommendation Quality	Tracks AI recommendation relevance week over week and surfaces the drivers behind any change	Live ↗
🧪 A/B Experimentation Framework	Hypothesis design, randomization, guardrail metrics, and ship / iterate / stop decisioning	Live ↗
🔎 LedgerIQ — Finance RAG Agent	Finance-ops RAG over two sources — real SEC EDGAR filings and FP&A planning documents — grounded, cited answers that refuse when out-of-corpus, with token-minimization controls and MCP retrieval servers	Live ↗

LedgerIQ runs on real public SEC EDGAR data (SEC source) plus synthetic FP&A documents (FP&A source); the other apps use fabricated or synthetic data — no proprietary, confidential, or employer-specific information.

Built with Streamlit · RAG & MCP · SQLite · LLM-as-a-judge · Python

🛠️ What I work with

Product: product strategy & roadmap · feature prioritization · MVP scoping · PRDs & requirements · experimentation & A/B testing · KPI ownership · stakeholder management
GenAI & AI/ML: LLM evaluation (LLM-as-a-judge) · RAG & grounding · agentic AI & tool use (MCP) · prompt evaluation · conversational & agentic AI · retrieval / recommendation relevance · human-in-the-loop governance · model observability · AI safety evaluation · token & cost–quality optimization
Data & Platform: SQL · Python · R · BigQuery · Snowflake · PostgreSQL · Kafka · telemetry & experimentation infrastructure
BI & Tools: Tableau · Power BI · Streamlit · Jira · Miro

🏆 Selected recognition

Bravo Award (×2) — for GenAI initiatives delivering ~$1M in annual savings, and for analytics spanning 30+ conversational-AI domains
Innovation Challenge Winner — top RPA solution selected from 218 ideas across 340 professionals, funded and rolled out across the US, Europe, and India

Open to AI/GenAI Product Management roles. Let's talk → chowdhuryanimesh1@gmail.com
