Projects

A collection of my projects and applications.

Featured Projects
A showcase of my recent work

Omnichannel Conversational AI Agent SDK

Python, asyncio

Developed an omnichannel conversational AI agent SDK. Input channels include phone calls, WebRTC rooms, SMS, and WhatsApp, among others. A single agent can handle all channels and access shared memory. The SDK has a modular architecture, so new channels can be added as extensions. For voice agents, it can use either a real-time API or a three-stage pipeline: speech-to-text (STT), a text-to-text LLM step, and text-to-speech (TTS). Any STT, LLM, or TTS provider can be plugged in.
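As a rough illustration of the three-stage pipeline and shared memory described above, here is a minimal sketch. It is not the SDK's actual API: the `Agent` class, its method names, and the `fake_*` providers are all hypothetical stand-ins, and memory is assumed to be a simple per-user list.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable, Dict, List

# Hypothetical provider signatures: each stage is just an async callable,
# so any STT/LLM/TTS backend can be swapped in.
SttFn = Callable[[bytes], Awaitable[str]]
LlmFn = Callable[[str, List[str]], Awaitable[str]]
TtsFn = Callable[[str], Awaitable[bytes]]


@dataclass
class Agent:
    """One agent shared by every channel; memory is keyed by user ID."""
    stt: SttFn
    llm: LlmFn
    tts: TtsFn
    memory: Dict[str, List[str]] = field(default_factory=dict)

    async def handle_text(self, user_id: str, text: str) -> str:
        # Text channels (SMS, WhatsApp) only need the LLM stage.
        history = self.memory.setdefault(user_id, [])
        reply = await self.llm(text, history)
        history.extend([text, reply])
        return reply

    async def handle_audio(self, user_id: str, audio: bytes) -> bytes:
        # Voice channels wrap the same LLM stage with STT and TTS.
        text = await self.stt(audio)
        reply = await self.handle_text(user_id, text)
        return await self.tts(reply)


# Stand-in providers so the sketch runs without any external service.
async def fake_stt(audio: bytes) -> str:
    return audio.decode()


async def fake_llm(text: str, history: List[str]) -> str:
    # Reports how many earlier turns it can see in shared memory.
    return f"echo({len(history) // 2}): {text}"


async def fake_tts(text: str) -> bytes:
    return text.encode()


async def demo() -> List[str]:
    agent = Agent(stt=fake_stt, llm=fake_llm, tts=fake_tts)
    sms_reply = await agent.handle_text("user-1", "hi")        # SMS-like turn
    call_reply = await agent.handle_audio("user-1", b"hello")  # call-like turn
    return [sms_reply, call_reply.decode()]
```

Running `asyncio.run(demo())` shows the second, voice-style turn seeing the earlier text turn in memory, which is the point of sharing one agent across channels.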

Workflow Builder

Next.js, Tailwind CSS, shadcn/ui

Developed a sophisticated workflow builder that enables users to create AI agent workflows using a Directed Acyclic Graph (DAG) interface. Users can construct complex workflows by adding various node types including data nodes for information storage and tool nodes for specific functionalities. The platform offers extensive customization options for agent attributes such as voice characteristics, interruption thresholds, and sensitivity levels. Additionally, users can configure post-conversation actions like CRM data updates or automated SMS notifications to callers. The drag-and-drop interface makes it intuitive to design, modify, and deploy AI agent workflows for different use cases.
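To make the DAG idea concrete, the sketch below shows one way a workflow's nodes could be validated and ordered for execution. It is an illustration in Python rather than the builder's own code, and the node names (`greet`, `lookup_customer`, and so on) and the `execution_order` helper are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set


def execution_order(edges: Dict[str, Set[str]]) -> List[str]:
    """Return node IDs so that every node comes after its dependencies.

    `edges` maps a node ID to the set of node IDs it depends on.
    Raises ValueError if the graph contains a cycle (i.e. is not a DAG).
    """
    try:
        return list(TopologicalSorter(edges).static_order())
    except CycleError as exc:
        # args[1] holds the list of nodes that form the detected cycle.
        raise ValueError(f"workflow contains a cycle: {exc.args[1]}") from exc


# Hypothetical workflow: two post-conversation actions hang off "answer".
workflow = {
    "greet": set(),
    "lookup_customer": {"greet"},
    "answer": {"lookup_customer"},
    "update_crm": {"answer"},
    "send_sms": {"answer"},
}
order = execution_order(workflow)
```

Rejecting cycles at save time is one reasonable design choice here, since it guarantees that any workflow a user deploys has a well-defined execution order.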

Video Live Streaming App & Platform

Next.js, Tailwind CSS, Kotlin, DJI MSDK 5, LiveKit

Developed an Android application that integrates with DJI drones to enable real-time video streaming. The app captures RTMP streams from connected drones and processes them through a GStreamer pipeline with a WHIP sink, enabling seamless transmission to WebRTC rooms via LiveKit. The platform supports multi-drone streaming, allowing multiple drone feeds to be aggregated into a single WebRTC room. This enables remote viewers to access a comprehensive multi-view perspective of the scene, making it ideal for surveillance, event coverage, and remote monitoring applications.

Background Music Mixing with Text to Speech Response

Python, pydub, audioop

Developed an audio processing pipeline that seamlessly combines background sound effects with text-to-speech responses. The system takes short sound effect files (5-10 seconds) and processes them through a chunking mechanism to create continuous background audio. These processed sound effects are then intelligently mixed with speech chunks from text-to-speech output. Users can choose from various ambient sound options including soothing jazz music, calm office environment noise, or busy typing sounds, creating a more natural and engaging auditory experience for voice interactions.
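The looping-and-mixing step can be sketched as follows. This is a simplified illustration, not the project's pipeline: it works on plain lists of 16-bit PCM sample values instead of pydub or audioop objects, and the `mix_chunk` helper and its `gain` parameter are assumptions made for the example.

```python
from itertools import cycle, islice
from typing import Iterator, List

SAMPLE_MIN, SAMPLE_MAX = -32768, 32767  # 16-bit signed PCM range


def mix_chunk(speech: List[int], background: Iterator[int], gain: float = 0.3) -> List[int]:
    """Mix one speech chunk with the next len(speech) background samples.

    `background` is an endless iterator, so its position carries over
    between calls and the loop continues without restarting per chunk.
    """
    bg = islice(background, len(speech))
    return [
        # Attenuate the background, add it to speech, and clip to 16 bits.
        max(SAMPLE_MIN, min(SAMPLE_MAX, s + int(b * gain)))
        for s, b in zip(speech, bg)
    ]


effect = [1000, -1000, 2000]                # stand-in for a short sound effect
background = cycle(effect)                  # loop it indefinitely
tts_chunks = [[10, 20], [30, 40, 50, 60]]   # stand-in for streamed TTS chunks
mixed = [mix_chunk(chunk, background, gain=0.5) for chunk in tts_chunks]
```

Keeping the background as a single shared iterator means each speech chunk picks up the effect where the previous chunk left off, which avoids an audible restart at chunk boundaries.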