
Agent²

Voice-to-new-home-in-under-90-seconds

2026


Demo

Agent² project demo

If you want a home between $500k and $600k on Zillow, there are 100+ pages of listings to scroll through. We automated that. Agent² takes a spoken request over a phone call, extracts structured search criteria, scrapes and ranks Zillow listings, and submits a contact form on the top match, end to end in under 90 seconds. Form automation succeeds 94% of the time across 200+ test submissions. The judge at GenAI Genesis 2026 called it one of the most technically impressive projects he'd seen.

Home buyers spend 10+ hours a week manually browsing listings and filling out contact forms. The whole workflow is automatable from a single spoken description. Why aren't we doing that yet?

Stack

Telnyx: SIP trunk routing inbound phone calls to EC2. Provides the raw 8kHz telephony audio stream that the noise pipeline processes.
PersonaPlex: 4-bit quantized LLM on an EC2 GPU instance for real-time voice interaction and criteria extraction during the call.
SpeexDSP + RNNoise: two-stage noise suppression stack. SpeexDSP handles adaptive echo cancellation and AGC; RNNoise handles neural residual noise removal. Dropped WER from ~29% to 8.3%.
ScraperAPI: Zillow scraping with residential proxy rotation. Also provides the proxy pool that feeds the Playwright anti-detection stack.
Playwright: automated form submission with randomized mouse trajectories, Gaussian keystroke timing, and viewport-realistic scrolling to defeat behavioral heuristics.
FastAPI: orchestration backend coordinating the noise pipeline, criteria extraction, Zillow scraping, listing ranking, and Playwright automation.

What was hard

01

Dual-Stage Noise Suppression on Telephony Audio

Raw 8kHz telephony audio from Telnyx contains HVAC hum, road noise, and GSM compression artifacts that dragged transcription accuracy down to ~71%, a word error rate of ~29%.

Two-stage pipeline: SpeexDSP for adaptive echo cancellation and AGC as the first pass, then RNNoise for neural residual noise removal on what SpeexDSP leaves behind.

Result: word error rate dropped from ~29% to 8.3% across 150 test calls recorded in cars, in kitchens, and outdoors.
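For reference, word error rate is the word-level edit distance between the reference transcript and the recognizer's output, divided by the number of reference words: WER = (S + D + I) / N, where S, D, and I count substitutions, deletions, and insertions. A minimal sketch of that standard computation follows; the project's own scoring script isn't shown, and the lowercasing and whitespace tokenization here are assumptions.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref: List[str] = reference.lower().split()
    hyp: List[str] = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Rolling DP row: prev[j] is the edit distance between the reference
    # words processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word dropped)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("three bedroom house in toronto", "three bedroom mouse toronto")` counts one substitution and one deletion over five reference words, giving 0.4. Across a test set, WER is usually computed by summing edits and reference words over all calls rather than averaging per-call rates.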

02

Structured Criteria Extraction from Conversational Speech

People describe homes the way they think out loud: "maybe three or four bedrooms, not too expensive." Contradictory, underspecified. A naive extraction produces garbage search criteria.

I engineered a constrained extraction prompt that maps freeform speech to a typed schema (bedrooms, price range, location, must-haves, dealbreakers). A real-time validation pass flags ambiguity back to the caller before anything hits the search pipeline.

Result: 96.2% extraction accuracy on a 200-utterance test set. The validation pass caught 89% of the remaining edge cases.
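A minimal sketch of what the typed schema and validation pass could look like, assuming the extraction prompt returns JSON with the five fields named above. The field names, the JSON shape, and the specific ambiguity rules (missing location, missing or reversed ranges, an item that is both a must-have and a dealbreaker) are illustrative assumptions, not the project's actual rules.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Range = Tuple[Optional[int], Optional[int]]


@dataclass
class SearchCriteria:
    # Hypothetical schema mirroring the fields named in the text.
    bedrooms: Range = (None, None)
    price_range: Range = (None, None)
    location: Optional[str] = None
    must_haves: List[str] = field(default_factory=list)
    dealbreakers: List[str] = field(default_factory=list)


def parse_criteria(raw_json: str) -> SearchCriteria:
    """Coerce the model's JSON output into the typed schema (nulls become defaults)."""
    data = json.loads(raw_json)
    return SearchCriteria(
        bedrooms=tuple(data.get("bedrooms") or [None, None]),
        price_range=tuple(data.get("price_range") or [None, None]),
        location=data.get("location"),
        must_haves=list(data.get("must_haves") or []),
        dealbreakers=list(data.get("dealbreakers") or []),
    )


def find_ambiguities(c: SearchCriteria) -> List[str]:
    """Return follow-up questions for the caller; an empty list means ready to search."""
    issues: List[str] = []
    if c.location is None:
        issues.append("Which city or neighborhood should I search?")
    for name, (lo, hi) in (("bedroom count", c.bedrooms), ("price range", c.price_range)):
        if lo is None and hi is None:
            issues.append(f"Do you have a {name} in mind?")
        elif lo is not None and hi is not None and lo > hi:
            issues.append(f"The {name} looks reversed: {lo} to {hi}. Which did you mean?")
    for item in sorted(set(c.must_haves) & set(c.dealbreakers)):
        issues.append(f"'{item}' is listed as both a must-have and a dealbreaker.")
    return issues
```

Applied to "maybe three or four bedrooms, not too expensive", an extraction of `{"bedrooms": [3, 4]}` would come back with two questions (location and price range) instead of reaching the search step with an open-ended price.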

03

Anti-Bot Evasion for Automated Form Submission

Zillow's contact forms use fingerprinting, rate limiting, and behavioral heuristics. Default Playwright automation gets through about 12% of the time.

Human-like interaction patterns throughout: randomized mouse trajectories, Gaussian-distributed keystroke timing, viewport-realistic scroll behavior, and residential proxy rotation through ScraperAPI.

Result: 94% submission success rate across 200+ test runs. Zero account bans.
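A minimal sketch of two of the interaction generators listed above, kept separate from the browser so they can be tested on their own. The Gaussian parameters (mean 120 ms, standard deviation 35 ms, 40 ms floor) and the use of a quadratic Bézier curve with one randomly offset control point for mouse paths are assumptions chosen for illustration; the project's actual distributions aren't specified.

```python
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def keystroke_delays(text: str, mean_ms: float = 120.0, sd_ms: float = 35.0,
                     floor_ms: float = 40.0,
                     rng: Optional[random.Random] = None) -> List[float]:
    """One Gaussian-distributed delay (ms) per character, clipped at a minimum."""
    rng = rng or random.Random()
    return [max(floor_ms, rng.gauss(mean_ms, sd_ms)) for _ in text]


def mouse_path(start: Point, end: Point, steps: int = 25, jitter: float = 80.0,
               rng: Optional[random.Random] = None) -> List[Point]:
    """Curved path from start to end along a quadratic Bezier curve.

    The single control point is the segment midpoint shifted by a random
    offset, so each move bends differently instead of running in a straight line.
    """
    rng = rng or random.Random()
    (x0, y0), (x2, y2) = start, end
    cx = (x0 + x2) / 2 + rng.uniform(-jitter, jitter)
    cy = (y0 + y2) / 2 + rng.uniform(-jitter, jitter)
    points: List[Point] = []
    for i in range(1, steps + 1):
        t = i / steps
        # Quadratic Bezier: B(t) = (1-t)^2 * P0 + 2(1-t)t * C + t^2 * P2
        x = (1 - t) ** 2 * x0 + 2 * (1 - t) * t * cx + t ** 2 * x2
        y = (1 - t) ** 2 * y0 + 2 * (1 - t) * t * cy + t ** 2 * y2
        points.append((x, y))
    return points
```

In a Playwright sync-API script, each point could be passed to `page.mouse.move(x, y)`, and each character sent with `page.keyboard.type(char)` followed by `page.wait_for_timeout(delay)`. Sampling a delay per key avoids the perfectly uniform intervals produced by a single fixed `delay` value on `keyboard.type`.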

Architecture

How it works

Audio Processing

Telnyx SIP trunk routes inbound calls to the EC2 instance. Raw 8kHz audio passes through SpeexDSP (adaptive echo cancellation + AGC), then RNNoise (neural residual noise removal). The two-stage approach cuts word error rate from ~29% to 8.3% across real-world recording conditions.
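A minimal sketch of how the two passes described above might be chained frame by frame. The real SpeexDSP and RNNoise bindings aren't shown here, so the first stage is a toy RMS-based AGC standing in for SpeexDSP (it does no echo cancellation), and `passthrough_denoise` is a hypothetical placeholder for the RNNoise call. The 20 ms frame length is also an assumption. Note that RNNoise natively processes 48 kHz audio in 480-sample frames, so a real integration would likely need to resample the 8kHz stream.

```python
import math
from typing import Callable, Iterable, List

SAMPLE_RATE = 8_000                           # telephony audio rate from the SIP trunk
FRAME_MS = 20                                 # assumed frame length
FRAME_SIZE = SAMPLE_RATE * FRAME_MS // 1000   # 160 samples per frame

Frame = List[float]
Stage = Callable[[Frame], Frame]


def frames(samples: List[float], size: int = FRAME_SIZE) -> Iterable[Frame]:
    """Split a mono signal into fixed-size frames, zero-padding the last one."""
    for start in range(0, len(samples), size):
        chunk = samples[start:start + size]
        yield chunk + [0.0] * (size - len(chunk))


def make_agc(target_rms: float = 0.1, max_gain: float = 10.0) -> Stage:
    """Toy AGC standing in for SpeexDSP's first pass (no echo cancellation)."""
    def agc(frame: Frame) -> Frame:
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        if rms == 0.0:
            return frame  # silence: nothing to normalize
        gain = min(target_rms / rms, max_gain)
        # Clamp so boosted samples stay within normalized full scale [-1, 1].
        return [max(-1.0, min(1.0, s * gain)) for s in frame]
    return agc


def passthrough_denoise(frame: Frame) -> Frame:
    """Hypothetical placeholder for the RNNoise residual-noise pass."""
    return frame


def run_pipeline(samples: List[float], stages: List[Stage]) -> List[float]:
    """Apply each stage in order to every frame and concatenate the output."""
    out: List[float] = []
    for frame in frames(samples):
        for stage in stages:
            frame = stage(frame)
        out.extend(frame)
    return out
```

Usage would look like `cleaned = run_pipeline(raw_samples, [make_agc(), passthrough_denoise])`. Keeping each stage as a frame-in, frame-out callable makes the order explicit: gain and echo handling first, neural cleanup on whatever remains.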

Criteria Extraction

PersonaPlex (4-bit quantized, running on an EC2 GPU instance) handles real-time voice interaction during the call. The FastAPI backend applies a constrained extraction prompt to map the cleaned transcript to a typed schema. A real-time validation pass flags ambiguity back to the caller before the criteria reach the search pipeline. Extraction accuracy is 96.2% on a 200-utterance test set.

Scraping and Automation

Validated criteria feed into ScraperAPI to query Zillow. Results are ranked by a weighted scoring function across price delta, commute distance, and feature match. The top listing triggers Playwright with the full anti-detection stack: randomized mouse trajectories, Gaussian keystroke timing, viewport-realistic scrolling, and residential proxy rotation. 94% submission success across 200+ test runs.
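A minimal sketch of a weighted ranking over the three signals named above. The `Listing` fields, the weights, and the normalizations are all assumptions: price delta is measured against a target price (for example, the middle of the caller's range), commute is penalized linearly up to a 60-minute cap, and feature match is the fraction of must-haves a listing has. The project's actual scoring function isn't specified.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Listing:
    # Hypothetical fields; scraped Zillow results would need parsing into this shape.
    url: str
    price: int
    commute_min: float
    features: Set[str] = field(default_factory=set)


WEIGHTS: Dict[str, float] = {"price": 0.4, "commute": 0.3, "features": 0.3}
MAX_COMMUTE_MIN = 60.0


def score(listing: Listing, target_price: int, must_haves: Set[str]) -> float:
    """Weighted sum of three sub-scores, each normalized to [0, 1]; higher is better."""
    # Price delta: 1.0 at the target price, falling off with relative distance from it.
    price_score = max(0.0, 1.0 - abs(listing.price - target_price) / target_price)
    # Commute: linear penalty that reaches 0 at MAX_COMMUTE_MIN.
    commute_score = max(0.0, 1.0 - listing.commute_min / MAX_COMMUTE_MIN)
    # Feature match: fraction of the caller's must-haves present in the listing.
    feature_score = len(must_haves & listing.features) / len(must_haves) if must_haves else 1.0
    return (WEIGHTS["price"] * price_score
            + WEIGHTS["commute"] * commute_score
            + WEIGHTS["features"] * feature_score)


def rank(listings: List[Listing], target_price: int, must_haves: Set[str]) -> List[Listing]:
    """Sort listings best-first; the first one is the candidate handed to form submission."""
    return sorted(listings, key=lambda item: score(item, target_price, must_haves), reverse=True)
```

Normalizing every sub-score to the same 0 to 1 scale keeps the weights interpretable as relative importance, so tuning them does not also have to compensate for dollars and minutes living on different scales.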
