WiFi Smart Doorbell with Camera and Push Notifications

The smart doorbell market is roughly Ring vs Nest vs everyone else, and all of them want a monthly subscription to keep your own video accessible. The DIY version of the same thing is genuinely competitive: an ESP32-CAM, a momentary button, a small enclosure, and a Telegram bot or your own server. No subscription, no data going to a cloud you don't control, and the parts cost is around $25.

This is different from the motion-activated security camera we built earlier. That one uses a PIR sensor to wake on motion. This one triggers on a button press, so the trigger is much narrower, and because it runs on wired power it can stay connected to WiFi without draining a battery. It is also designed to identify visitors quickly (a single still image, plus a short audio clip if you add a microphone) rather than to record general activity.

What we are building

flowchart LR
    Visitor[Visitor presses button] --> Btn[Momentary push button]
    Btn --> ESP[[ESP32-CAM]]
    ESP --> Cam[OV2640 camera<br/>capture 800x600 JPEG]
    ESP -->|optional| Mic[I2S microphone<br/>5 sec clip]
    Cam --> ESP
    Mic --> ESP
    ESP -->|HTTPS POST| Telegram[Telegram bot API]
    Telegram --> Phone[Your phone<br/>image + caption]
    ESP -->|GPIO HIGH| Speaker[Optional<br/>chime relay]

Press button, capture image, send via Telegram. Optional indoor chime via relay. The whole flow takes 3–5 seconds end-to-end.
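The "send via Telegram" step is an HTTPS POST to the Bot API's sendPhoto method, which accepts a multipart/form-data body with chat_id, caption, and photo fields. The sketch below shows one way to build the text that wraps the JPEG bytes. The function names, the boundary string, and the doorbell.jpg filename are made up for illustration, and the code is plain C++ so it can be checked on a desktop before it goes into a sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Everything that precedes the raw JPEG bytes in the request body.
// Field names follow the Telegram Bot API sendPhoto method.
std::string multipartHead(const std::string& boundary,
                          const std::string& chatId,
                          const std::string& caption) {
    assert(!boundary.empty());  // an empty boundary would produce an invalid body
    std::string head;
    head += "--" + boundary + "\r\n";
    head += "Content-Disposition: form-data; name=\"chat_id\"\r\n\r\n";
    head += chatId + "\r\n";
    head += "--" + boundary + "\r\n";
    head += "Content-Disposition: form-data; name=\"caption\"\r\n\r\n";
    head += caption + "\r\n";
    head += "--" + boundary + "\r\n";
    // The filename is arbitrary; "doorbell.jpg" is a placeholder.
    head += "Content-Disposition: form-data; name=\"photo\"; filename=\"doorbell.jpg\"\r\n";
    head += "Content-Type: image/jpeg\r\n\r\n";
    return head;
}

// Closing delimiter that follows the JPEG bytes.
std::string multipartTail(const std::string& boundary) {
    return "\r\n--" + boundary + "--\r\n";
}

// Value for the Content-Length header: head + image + tail.
std::size_t contentLength(const std::string& head,
                          std::size_t jpegBytes,
                          const std::string& tail) {
    return head.size() + jpegBytes + tail.size();
}
```

On the board, one plausible sequence is to open a TLS connection to api.telegram.org on port 443, send the POST request line for /bot<token>/sendPhoto with a Content-Type of multipart/form-data carrying the same boundary, then write the head, the camera frame buffer, and the tail in that order. Writing the frame buffer directly avoids copying the image into a second string.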

Hardware

  • ESP32-CAM (AI Thinker) — $8
  • FTDI USB-TTL adapter for flashing — $5 (one-time)
  • Momentary push button (waterproof if outdoor) — $3
  • 5V 2A power supply (run wired from inside) — $5
  • 3D-printed or off-the-shelf doorbell enclosure — $5–15
  • Optional: INMP441 I2S microphone for audio — $4
  • Optional: relay module + indoor chime/buzzer — $3

About $25 base, $30 with audio. Outdoor enclosure quality matters more than the electronics — the ESP32-CAM is fine in moderate weather inside a sealed box; direct rain or freeze-thaw cycles will kill it within a year.

Why wired power, not battery

The motion-activated security camera in our earlier project ran on 18650 cells because PIR triggers are infrequent. A doorbell button press is also infrequent, but a doorbell needs to respond fast — within 1–2 seconds — and that means staying connected to WiFi continuously. Continuous WiFi is around 80–120 mA on the ESP32-CAM, which kills any reasonable battery in a day or two.
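The battery claim is simple arithmetic and can be checked with a rough sketch. The 3000 mAh capacity below is an assumed figure for a single 18650 cell, not a number from this build, and the names are made up for illustration. The estimate ignores regulator losses and camera current spikes, so real runtime would be shorter.

```cpp
// Rough runtime estimate: capacity divided by average current draw.
// Assumes an ideal cell with no conversion losses.
constexpr double runtimeHours(double capacityMah, double drawMa) {
    return capacityMah / drawMa;
}

// Assumed capacity for one 18650 cell (illustrative only).
constexpr double kAssumedCellMah = 3000.0;

// Continuous-WiFi draw range quoted in the text: 80 to 120 mA.
constexpr double kBestCaseHours  = runtimeHours(kAssumedCellMah, 80.0);   // 37.5 h
constexpr double kWorstCaseHours = runtimeHours(kAssumedCellMah, 120.0);  // 25 h
```

Under those assumptions a single cell lasts between about 25 and 38 hours, which is why the wired option wins for an always-connected doorbell.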

Most existing doorbells already have wired power for the chime, usually 16–24 V AC from a transformer. You can convert that to 5 V DC with a small AC-DC module ($3) and reuse the existing wire run. If you have to run new wire, a USB-C cable through a hidden conduit works.

What goes wrong

  • WiFi range. Doorbells live near doors, often the worst spot for WiFi. Add an external antenna (the AI Thinker board has a u.FL connector, though on many boards you have to move a small 0 Ω resistor to switch from the PCB antenna to it) or place a WiFi extender nearby.
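  • Button bounce and repeated presses. Contact bounce lasts only milliseconds, so a 3-second lockout after each accepted press covers both the bounce and an impatient visitor (no normal person mashes the button repeatedly). Keep the ISR short: set a flag or record a timestamp there and do the comparison in the main loop, rather than waiting inside the interrupt. Lower the lockout for a hyperactive household.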
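  • Cold weather. The ESP32 chip is rated to −40°C, but the assembled module and the camera sensor are typically specified for a narrower range, so check their datasheets if your winters are severe. The lithium battery in any battery-powered version is not rated that low either. For wired builds in moderate climates, fine.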
  • The bot's response delay. Telegram is fast (sub-second) but only once WiFi is connected. From a cold start, such as after a reboot or a dropped connection, the chain is: button press → WiFi authenticate → TLS handshake → Telegram POST → notification. Total around 3 seconds in good conditions, 6–8 in poor ones.
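The button lockout from the list above can be sketched as a single comparison on millisecond timestamps. The helper name and its parameters are hypothetical. It is written as a pure function so it can be tested off the board, and it relies on unsigned subtraction so that it keeps working when a 32-bit millisecond counter such as Arduino's millis() rolls over after roughly 49 days.

```cpp
#include <cstdint>

// Returns true when a press at nowMs should be accepted, given the time of
// the last accepted press and the lockout window in milliseconds.
// Unsigned subtraction wraps modulo 2^32, so the elapsed time stays correct
// even if the counter rolled over between the two presses.
constexpr bool acceptPress(std::uint32_t nowMs,
                           std::uint32_t lastAcceptedMs,
                           std::uint32_t lockoutMs) {
    return static_cast<std::uint32_t>(nowMs - lastAcceptedMs) >= lockoutMs;
}

// Lockout window used in this article.
constexpr std::uint32_t kLockoutMs = 3000;
```

In a sketch, the interrupt handler would only set a volatile flag. The main loop would then call acceptPress with the current millis() value, update the stored timestamp when it returns true, and start the capture. One side effect of initialising the stored timestamp to zero is that a press in the first three seconds after boot is ignored, which is usually harmless because WiFi is still connecting at that point.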

Going further

  • Two-way audio. Add an INMP441 microphone and a small speaker. Stream voice via WebRTC.
  • Face recognition for known visitors. The ESP32-S3 (different chip, but available in CAM-board variants) has enough horsepower for face detection via TensorFlow Lite Micro.
  • Local recording. Add a microSD card; save every visitor image with timestamp.
  • Integration with smart home. Replace the Telegram bot with MQTT to Home Assistant.
  • Better camera. The OV2640 is mediocre in low light. Swap for an OV5640.

Frequently Asked Questions

How is this different from a Ring doorbell?

Cheaper (one-time $25 vs $100–200 + $5/mo subscription), fully owned (your video, your bot, your storage), but less polished — no slick app, no community-detected porch pirates, no facial-recognition alerts out of the box. Tradeoff is yours.

Can I use this without Telegram?

Yes. Replace the Telegram POST with an HTTP POST to your own webhook, an MQTT publish, or a signal-cli bridge. The image is just a JPEG; any service that accepts uploads will do.

How long does the SD card last for recording?

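A full-quality JPEG is ~50–100 KB on the OV2640 at 800×600. At 100 KB each, a 32 GB card holds roughly 320,000 images. With one capture per visitor and ~10 visitors a day, that is decades of doorbell history, so the card will wear out from age long before it fills up.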
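The card-capacity estimate above can be reproduced with a few lines of arithmetic. This is a back-of-the-envelope sketch: it treats 32 GB as decimal gigabytes, uses the larger 100 KB image size, and ignores filesystem overhead. The function and constant names are made up for illustration.

```cpp
#include <cstdint>

// How many images of a given size fit on the card (integer division).
constexpr std::uint64_t imagesThatFit(std::uint64_t cardBytes,
                                      std::uint64_t imageBytes) {
    return cardBytes / imageBytes;
}

// How many years those images cover at a fixed number of captures per day.
constexpr double yearsOfHistory(std::uint64_t images,
                                std::uint64_t capturesPerDay) {
    return static_cast<double>(images) /
           (static_cast<double>(capturesPerDay) * 365.0);
}

constexpr std::uint64_t kCardBytes  = 32ULL * 1000 * 1000 * 1000;  // 32 GB, decimal
constexpr std::uint64_t kImageBytes = 100ULL * 1000;               // 100 KB per JPEG
constexpr std::uint64_t kImages     = imagesThatFit(kCardBytes, kImageBytes);
constexpr double        kYears      = yearsOfHistory(kImages, 10);
```

With these inputs the card holds 320,000 images, or a little under 88 years at ten captures a day. Storage is therefore not the limiting factor; card wear and the weather are.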

Project package

Get the complete project package

The article above shows the core firmware and the principles behind it. The complete project package — assembled, tested, and ready to flash — is available by email request. We send it manually, and we read every request.

  • Complete Arduino sketch (.ino) with full error handling
  • List of required libraries with version numbers
  • Printable wiring diagram (PDF)
  • Bill of materials with current part numbers
  • Build guide and troubleshooting tips
  • Configuration template (WiFi, MQTT, etc.)

We send the package by email within 24 hours, usually faster. Free, no spam, no mailing list. Your email is used once, for this reply.

Share your thoughts

Worked with this in production and have a story to share, or disagree with a tradeoff? Email us at support@mybytenest.com — we read everything.