
Show HN: We built a camera-only robot vacuum for less than $300 (well, almost)

105 points - last Monday at 5:08 AM

  • elaus

    yesterday at 10:48 AM

    I don't really see how the vacuum can effectively clean a whole room or flat using only a CNN on the current camera image in front of the robot. That helps detect obstacles, but a bumper sensor would do that as well.

    All but the most basic robot vacuums map their work area and devise plans for cleaning it systematically. The rest just bump into obstacles, rotate a random amount, and continue forward.

    Don't get me wrong, I love this project and the idea of building it yourself. I just feel like that (huge) part is missing from the article?

      • thebruce87m

        yesterday at 11:15 AM

        https://opencv.org/structure-from-motion-in-opencv/

        Not saying it's viable here to build a world map, since things like furniture can move, but some systems (e.g. warehouse robots) do use things like lights to triangulate, on the assumption that the lights on the tall ceiling are fixed and consistent.

        • jhbadger

          yesterday at 11:15 AM

          The classic Roombas from a decade or so ago worked without any sort of mapping or camera at all -- they basically did a version of the "run and tumble" algorithm used by many bacteria -- go in one direction until you can't any more then go off in a random new one. It may not be efficient but it does work for covering territory.

            • skocznymroczny

              yesterday at 5:07 PM

              Sounds like it would only work for a single room with not too many obstacles.

              I guess mapping capabilities vary greatly between vendors. I had a first-gen Mi Robot vacuum and it was amazing. It would map the entire floor with all the rooms, then go room by room in a zigzag pattern, then repeat each room, with no issues going from one room to another and avoiding obstacles. It also made sure not to fall down the stairs. Later it broke and I bought a no-name model, and despite having a lidar tower it didn't perform as well as the Xiaomi did. It worked for a single room, but anything more and it would get lost.

                • jhbadger

                  yesterday at 6:37 PM

                  Eh, it worked fine in my multiroom apartment - again, this is how all first-generation robot vacuums worked. Mine eventually died and I got a new one with lidar, and the main advantage is that with mapping I can specify areas to avoid, like a chair whose base tends to trap robot vacuums.

              • londons_explore

                yesterday at 1:11 PM

                I think the only reasons for mapping are to be able to block off 'no go' areas (no escaping out the front door!) and to be able to find the way home to the charger.

                For the actual cleaning, random works great.

                  • _flux

                    yesterday at 1:35 PM

                    Surely mapping also helps reduce the time it takes to achieve the task?

                      • londons_explore

                        yesterday at 8:50 PM

                        A robot vacuum isn't time constrained. It literally has all day.

                          • _flux

                            today at 3:18 AM

                            They make noise, and people work from home, so that might not be the case.

                            In addition, more running time means more wear and tear on parts.

                    • Doxin

                      today at 8:21 AM

                      My previous robot vacuum did not do any mapping, but it did always manage to find its way back to the charger. It'd just follow the walls until it saw the charger's IR beacon.

                      Clever design if you ask me. Doing a lot with a little.
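                      That wall-follow-until-beacon behavior is a two-state loop; a hedged sketch, where `FakeBot` is a deliberately toy 1-D stand-in and none of the method names are a real robot's API:

```python
def go_home(bot):
    """Mapless docking: hug the wall until the charger's IR beacon is
    visible, then drive toward the beacon until docked."""
    while not bot.docked():
        if bot.sees_beacon():
            bot.approach_beacon_step()   # head for the IR source
        else:
            bot.follow_wall_step()       # wall-following eventually passes the dock

class FakeBot:
    """1-D stand-in for testing: wall-following walks positions 0..9,
    the beacon is visible from position 7 onward, the dock sits at 9."""
    def __init__(self):
        self.pos = 0
    def docked(self):
        return self.pos == 9
    def sees_beacon(self):
        return self.pos >= 7
    def follow_wall_step(self):
        self.pos += 1
    def approach_beacon_step(self):
        self.pos += 1

bot = FakeBot()
go_home(bot)
print("docked at", bot.pos)  # docked at 9
```

                      The clever bit is that wall-following guarantees the dock is eventually passed, so no map is ever needed.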

                      • ripe

                        yesterday at 1:21 PM

                        You are right. The original Roomba was discussed on HN 3 months ago:

                        https://news.ycombinator.com/item?id=46472930

                    • winrid

                      today at 3:14 AM

                      They're terrible. A $200 SLAM-equipped vacuum (open box, or something off eBay) will do in 15 minutes what those took an hour to do.

                  • indraneelpatil

                    yesterday at 3:29 PM

                    Apart from just detecting obstacles, we wanted to build a robot that's intelligent enough to take in semantic cues: "this is a doorway, so I can go through it", "this is a kitchen, so I clean it this way", and so on.

                    • stronglikedan

                      yesterday at 4:46 PM

                      There was a time when they were all what you'd consider basic, and they could still clean a whole room or flat.

                      • stavros

                        yesterday at 3:16 PM

                        It navigates by Brownian motion.

                    • blensor

                      yesterday at 5:29 PM

                      It may or may not be useful for you, but I've been working for a while on converting ORBSLAM3 into a self-contained, standalone program that doesn't need ROS to be useful.

                      The "UI" for saving/loading the map and calibrating the camera is exposed through a built-in crude webserver. Visualization is done via threejs instead of having a dependency on pangolin.

                      If your robot can expose the camera feed as anything OpenCV can ingest (e.g. MJPEG over HTTP), you can just point it there and then receive the pose stream via HTTP/SSE.

                      The whole thing is distributed as an AppImage so you just run it and connect to it

                      https://github.com/mgschwan/ORBSlammer_LocalizationService
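                      On the consuming side, an SSE event is just lines of text; a minimal parser could look like the sketch below. Note the `data:` payload shape and the field names (`x`, `y`, `theta`) are illustrative assumptions here, not the project's actual schema - check the repo for the real format.

```python
import json

def parse_sse_event(raw: str):
    """Parse one Server-Sent Events block into a pose dict.
    Assumes the server sends `data: {json}` lines; returns None
    if the event carries no data payload."""
    payload = "".join(
        line[len("data:"):].strip()
        for line in raw.splitlines()
        if line.startswith("data:")
    )
    return json.loads(payload) if payload else None

event = 'event: pose\ndata: {"x": 1.2, "y": -0.4, "theta": 0.78}\n\n'
print(parse_sse_event(event))  # {'x': 1.2, 'y': -0.4, 'theta': 0.78}
```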

                        • indraneelpatil

                          today at 4:53 AM

                          Very very cool! Thanks for sharing! I'll try it out!

                            • blensor

                              today at 5:44 AM

                              I still need to make a video on how to use it; if you get stuck, just let me know.

                      • sagebird

                        yesterday at 2:42 PM

                        Can you please design a version for kids to ride on?

                        With a seat and handle similar to "wooden bee ride on" by b. toys?

                        I want a vacuum that kids can actually drive, ride on, and do real vacuuming with, plus basic safety features: turning it over halts the vacuum, stairs/ledges are avoided, no rollers or parts that could snare a kid's hair, etc.

                        There may be benefits to fusing the child's input signals with the vacuum's supervisory route goals. It would be age-dependent; older kids would want full manual control, I think.

                        Kids like to do real jobs, and as a parent I prefer purchasing real items for my kids rather than toy versions if practical.

                          • frail_figure

                            yesterday at 3:15 PM

                            > Kids like to do real jobs, and as a parent I prefer purchasing real items for my kids rather than toy versions if practical.

                            Real vacuums have existed for a very long time now :P

                              • pinkmuffinere

                                yesterday at 5:47 PM

                                Real vacuums are _so_ difficult for kids though; they're the wrong size and way too heavy. A zamboni-vacuum-for-kids is definitely not a general-purpose thing, but it does hit a nice balance between functional and kid-friendly.

                            • ge96

                              yesterday at 3:29 PM

                              Like a Zamboni but a vacuum

                                • herewulf

                                  yesterday at 11:56 PM

                                  Okay, now I'm inspired to build a mini Zamboni for my kids.

                                    • indraneelpatil

                                      today at 4:56 AM

                                      Very cool! Glad this post inspired someone! Keep me posted if you actually end up building one!

                                      indraneelrajeevpatil@gmail.com

                              • rallypi

                                yesterday at 3:34 PM

                                Oh, I think kids will like it.

                            • ghm2199

                              yesterday at 1:33 PM

                              Here's a thought: this is a fixed 3D environment, and you lack training data, or at least an algorithm to train. Why not use RL to learn good trajectories? Build a 3D replica of your home/room in a game engine, generate images and trajectories there to pretrain/train the model, then for each real run hand-label only the promising trajectories, i.e. where the robot actually did better cleaning. That might make it a good RL exercise. You could also place some physical flags in the room so that when the camera gets close enough, the robot is rewarded - that would automate the trajectory rewards.

                              I would start with a single room to practice this.
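                              The physical-flags reward could be shaped like this sketch (positions, radius, and the +1 scheme are all illustrative assumptions, not a worked-out RL setup):

```python
import math

def flag_reward(pose, flags, collected, radius=0.5):
    """Reward shaping from physical flags: +1 the first time the robot
    comes within `radius` meters of a flag, 0 afterwards. `pose` is an
    (x, y) tuple, `flags` a list of (x, y) flag positions, `collected`
    a mutable set of already-claimed flag indices."""
    reward = 0.0
    for i, (fx, fy) in enumerate(flags):
        if i not in collected and math.hypot(pose[0] - fx, pose[1] - fy) < radius:
            collected.add(i)
            reward += 1.0
    return reward

flags = [(0.0, 0.0), (2.0, 2.0)]
seen = set()
print(flag_reward((0.1, 0.1), flags, seen))  # 1.0, first flag claimed
print(flag_reward((0.1, 0.1), flags, seen))  # 0.0, already claimed
```

                              Making each flag pay out only once matters; otherwise the optimal policy is to park next to a flag.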

                                • indraneelpatil

                                  today at 5:02 AM

                                  Wow okay there is a lot here, just so that I understand this correctly:

                                  1. Make a replica of my home/room in a game engine or a simulator
                                  2. Generate trajectories with RL, where the reward is hand-specified by me
                                  3. Automate trajectory rewards using some proximity flags

                                  Some stupid questions:
                                  1. How do I build a replica of my home? Is there an SfM algorithm I could use to do this just from camera images?
                                  2. Would this still work even if things/furniture move around the house?
                                  3. This data collection strategy will have a distribution shift compared to real data, so it might struggle with different lighting conditions and such?

                              • isoprophlex

                                yesterday at 10:01 AM

                                Cool project! That validation loss curve screams train-set memorization without generalization.

                                Too little training data, and/or data of insufficient quality. Maybe let the robot run autonomously with an (expensive) VLM operating it, to bootstrap a larger training dataset without needing to annotate it yourself.

                                Or maybe the problem itself is poorly specified, or intractable with your chosen network architecture. But if you see that a vision LLM can pilot the bot, at least you know you have a fighting chance.

                                  • indraneelpatil

                                    yesterday at 3:18 PM

                                    Thanks! It's probably both: too little training data and insufficient quality.

                                    That's a cool idea. Is there any VLM you'd suggest? I can think of Gemini, maybe? Or would any do?

                                      • isoprophlex

                                        yesterday at 4:09 PM

                                        My gut feeling says a cheap Gemini model will be fine. Try the cheapest you can find; go more expensive if at first you don't succeed.

                                        Invest in a good prompt describing the setup, your goals, and when to move. Type your output; don't go parsing move commands out of unstructured chat output. And maybe validate first on the data you already collected: does the VLM take the same actions as your existing training set?

                                        Then just let it run and collect data for as long as you can afford. Maybe 0.2 FPS (sample and take an action every 5 seconds) is already good enough.

                                        Good luck!
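                                        "Type your output" can be as simple as validating the VLM's reply against a closed action vocabulary before it ever reaches the motors. A sketch - the JSON shape and action names are assumptions about how you'd prompt the model:

```python
import json
from enum import Enum

class Action(Enum):
    FORWARD = "forward"
    TURN_CW = "turn_cw"
    TURN_CCW = "turn_ccw"
    STOP = "stop"

def parse_vlm_action(reply: str) -> Action:
    """Assumes the model was prompted to answer with JSON like
    {"action": "forward"}. Anything else raises instead of silently
    steering the robot somewhere odd."""
    data = json.loads(reply)       # raises on non-JSON chatter
    return Action(data["action"])  # raises ValueError on unknown actions

print(parse_vlm_action('{"action": "turn_ccw"}'))  # Action.TURN_CCW
```

                                        Failing loudly on an unexpected reply is the point; you then re-prompt rather than execute garbage.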

                                          • indraneelpatil

                                            today at 4:54 AM

                                            Thanks! Appreciate it!

                                • gilhyun

                                  yesterday at 2:56 PM

                                  Wow, that's a genius idea! What do you think would happen if you loaded C. elegans synapse data into that robot and gave it a signal that dust is food? GitHub: github.com/openworm

                                    • indraneelpatil

                                      yesterday at 3:22 PM

                                      Damn, that's very cool! Thanks for sharing! I guess we'd only need to detect dust somehow, which, believe it or not, is really hard; the camera isn't great quality. But I guess this could work for slightly larger debris?

                                  • londons_explore

                                    yesterday at 1:08 PM

                                    If mass-produced, no part of a robot vacuum is expensive. Blower fans are ~$1. The camera is $1. A cheap WiFi MCU with a little ML accelerator and 8 MB of RAM is $1. The gyro is $1. Drive motors + gearboxes together are $1. AC charger, $2. Plastic case, $2. Batteries are the most expensive bit (~$3), but you can afford a battery life of just 10 minutes if you can return to base frequently.

                                    The hard part is the engineering hours to make it all work well. But you get those repaid, as long as you can sell 100 million units across every nation in the world.
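                                    Tallying the parts list above (taking the ~$ figures at face value):

```python
# Per-unit BOM figures quoted in the comment above, in USD.
bom = {
    "blower fan": 1,
    "camera": 1,
    "wifi MCU + ML accelerator": 1,
    "gyro": 1,
    "drive motors + gearboxes": 1,
    "AC charger": 2,
    "plastic case": 2,
    "batteries": 3,
}
print(f"parts total: ~${sum(bom.values())}")  # parts total: ~$12
```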

                                      • indraneelpatil

                                        yesterday at 3:15 PM

                                        Yeah, agreed 100%. We might also need to factor in the cost of the charging dock, but the overall thesis is still sound.

                                        Do you know of any cheap WiFi MCU with a little ML accelerator that we can buy off the shelf? The only one we could think of was the Jetson Orin Nano, and that's not cheap.

                                          • sagebird

                                            yesterday at 4:59 PM

                                            I am not an expert, but this seems like a case where model distillation could get the behavior you need running on a cheap end-user processor (Raspberry Pi 4/5 class). I chatted with Claude Opus about your project, and it gave the following advice:

                                            For the compute problem, you don't need a Jetson. The approach you want is knowledge distillation: train a large, expensive teacher model offline on a beefy GPU (cloud instance, your laptop's GPU, whatever), then distill it down into a tiny student network like a MobileNetV3-Small or EfficientNet-Lite. Quantize that student to int8 and export it to TFLite. The resulting model is 2-3 MB and runs at 10-20 FPS on a Raspberry Pi 4/5 with just the CPU - no ML accelerator needed. For even cheaper, an ESP32-S3 with a camera module can run sub-500KB models for simpler tasks. The preprocessing is trivial: resize the camera frame to 224x224, normalize pixel values, feed the tensor to the TFLite interpreter. The CNN learns its own feature extraction internally, so you don't need any classical CV preprocessing.

                                            Looking at your observations, I think the deeper issue is what you identified: there's not enough signal in single frames. Your validation loss not converging even after augmentation and ImageNet pretraining confirms this. The fix is exactly what you listed in your future work - feed stacked temporal frames instead of single images. A simple approach is to concatenate 3-4 consecutive grayscale frames into a multi-channel input (e.g., 224x224x4). This gives the network implicit motion, velocity, and approach-rate information without needing to compute optical flow explicitly. It's the same trick DeepMind used in the original Atari DQN paper - a single frame of Pong doesn't tell you which direction the ball is moving either.

                                            On the action space: your intuition about STOP being problematic is right. It creates a degenerate attractor - once the model predicts STOP, there's no recovery mechanism. The paper you referenced that only uses STOP at goal-reached is the better design. Also consider that TURN_CW and TURN_CCW have no obvious visual signal in a single frame (which way to turn is a function of where you've been and where you're going, not just what you see right now), which is another reason temporal stacking or adding a small recurrent/memory component would help. Even a simple LSTM or state tuple fed alongside the image could encode "I've been turning left for 3 steps, maybe try something else."

                                            For the longer term, consider a hybrid architecture: use the distilled neural net for obstacle detection and free-space classification, but pair it with classical SLAM or even simple odometry-based mapping for path planning and coverage. Pure end-to-end behavior cloning for the full navigation stack is a hard problem - even the commercial robots use learned perception with algorithmic planning. And your data collection would get easier too, because you'd only need to label "what's in front of me" rather than "what should I do", which decouples perception from decision-making and makes each piece easier to train and debug independently.
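                                            The frame-stacking part of this is only a few lines of numpy; a sketch (the 224x224, k=4 shapes are just the figures mentioned above):

```python
from collections import deque

import numpy as np

class FrameStacker:
    """Stack the last k grayscale frames into one (H, W, k) array so the
    network sees implicit motion, as in the Atari DQN trick. The first
    frame is repeated to fill the stack at startup."""
    def __init__(self, k=4):
        self.k = k
        self.frames = deque(maxlen=k)

    def push(self, frame):
        if not self.frames:
            self.frames.extend([frame] * self.k)  # warm start
        else:
            self.frames.append(frame)             # oldest frame falls out
        return np.stack(self.frames, axis=-1)

stacker = FrameStacker(k=4)
obs = stacker.push(np.zeros((224, 224), dtype=np.uint8))
print(obs.shape)  # (224, 224, 4)
```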

                                              • indraneelpatil

                                                today at 5:08 AM

                                                Wow this is awesome! Thanks a ton for taking the time to think about the project and post this! Yeah I think the way to go is:

                                                1. Improve the model input by stacking frames
                                                2. Then try model distillation to a smaller model

                                    • amelius

                                      yesterday at 10:52 AM

                                      The trick is to make a robot that has a Lidar and a camera, then train a model that can replace the Lidar.

                                      (Lidar can of course also be echolocation).
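                                      The camera-replaces-lidar idea is plain supervised learning on paired logs. A numpy-only caricature, with synthetic data standing in for real (camera features, lidar ranges) pairs and a least-squares fit standing in for the CNN you'd actually train:

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend logs: 500 frames of 64 image features each, plus the lidar's
# 16 range readings recorded at the same instants (all synthetic here).
features = rng.normal(size=(500, 64))
true_map = rng.normal(size=(64, 16))
ranges = features @ true_map + 0.01 * rng.normal(size=(500, 16))

# "Training": least-squares fit from features to ranges. The supervision
# signal in the real setup is exactly this pairing of camera and lidar.
learned_map, *_ = np.linalg.lstsq(features, ranges, rcond=None)

# The learned model now predicts lidar-like ranges from camera input alone.
pred = features @ learned_map
err = np.abs(pred - ranges).mean()
print(f"mean range error: {err:.3f}")
```

                                      Once the error is low enough, the lidar stays in the lab and only the camera ships.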

                                        • infecto

                                          yesterday at 1:17 PM

                                          The harder trick is to do it cost effectively. I picked up my Roborock for $200 and it has LiDAR. Works really well.

                                            • indraneelpatil

                                              yesterday at 3:11 PM

                                              $200 is insane; sounds like economies of scale are really working for them

                                                • infecto

                                                  yesterday at 3:26 PM

                                                  I don’t follow models and there are a ton of them. Here is an example $280 version with lidar.

                                                  https://a.co/d/0cuCgBSZ

                                          • indraneelpatil

                                            yesterday at 3:13 PM

                                            Yeah, this would help a lot with collecting good trainable data; teleoperating the robot around and collecting a large amount of good data is quite hard

                                            • ThatMedicIsASpy

                                              yesterday at 11:55 AM

                                              I thought the trick was just to use an Xbox Kinect. But lidar has gotten a lot cheaper in recent years.

                                          • herewulf

                                            yesterday at 11:57 PM

                                            Anyone know of good FOSS DIY robot lawnmower projects?

                                            I want to work really hard to be too lazy to bother with the grass.

                                          • vachanmn123

                                            yesterday at 10:06 AM

                                            Maybe check out some kind of monocular depth estimation model, like Apple's Depth Pro (https://github.com/apple/ml-depth-pro), and use the depth map to predict a path?

                                            Very cool project though!

                                              • indraneelpatil

                                                yesterday at 3:07 PM

                                                Thank you! We'll check it out! Yeah, building a map might be the way to go. With E2E it's quite hard to ensure "intelligent" cleaning.

                                            • ananandreas

                                              today at 6:02 AM

                                              This is great. More people should have side projects like these.

                                              • bilsbie

                                                yesterday at 12:47 PM

                                                I don’t understand why we don’t have smarter vacuums yet. Mine just makes a beeline to get stuck under a chair.

                                                It could easily understand so much about the environment with even a small multimodal model.

                                                  • hattmall

                                                    yesterday at 1:12 PM

                                                    We do; the Deebot T20 maps the whole house, knows what types of floors there are, maps furniture, etc.

                                                      • bilsbie

                                                        yesterday at 9:46 PM

                                                        Mine tries to map, but it thinks glass doors are other spaces, and it throws away the map if you leave a chair pushed out.

                                                    • infecto

                                                      yesterday at 1:16 PM

                                                      My Roborock uses lidar and rarely if ever bumps into things.

                                                      • segmondy

                                                        yesterday at 1:32 PM

                                                        Get the Wyze robot vacuum; it's pretty smart.

                                                    • villgax

                                                      yesterday at 10:59 AM

                                                      There are things like SLAM, optical flow, etc.; read up on them instead of being so defeatist. IMO, even for a hobby project, this seems so forced.

                                                        • indraneelpatil

                                                          yesterday at 3:02 PM

                                                          Thanks! Our main goal was to build a vacuum that understands semantics inside the house, so that it can "clean the kitchen" or "clean the bedroom". That meant doing machine learning, and since we were doing machine learning anyway, we figured: why not try something E2E instead of first doing SLAM, optical flow, etc.?

                                                            • atultw

                                                              today at 4:13 AM

                                                              If you capture a video and a SLAM map of the whole space, you could use a VQA model like Cosmos Reason offline to extract key points and descriptions. Maybe even plan the route offline for an open-ended task like "clean the kitchen". Then load the route, and all you need is localization and obstacle avoidance.

                                                                • indraneelpatil

                                                                  today at 5:13 AM

                                                                  Aaah, interesting. Does stuff like this generalise to furniture moving around, different lighting conditions, and so on? Also, it sounds like if the route gets blocked, it just won't move.
