Is anyone trying out any of the cloud monitoring solutions that offer free tiers? Netdata and New Relic are a couple that come to mind. Curious what you all are experimenting with. @malcolm-r @jnew1213 @t3hbeowulf @darknight
I looked at Netdata some time ago and found it very complex; it hinted at being a full-time occupation. I made a similar comment to RaidOwl the other day on his YouTube video about Netdata.
I've used a couple of different monitoring tools in the past (PRTG, etc.), but always found the free versions too limiting.
My only active cloud-based monitor is currently UptimeRobot(.com), a service that checks websites at five-minute intervals. It monitors the handful of websites I host as well as my Plex servers.
Locally, I have a vRealize Log Insight (now Aria Operations for Logs) instance for syslog and Windows event recording, but it's not set to alert me on anything. One of my Synology NASes also does syslog.
I have a project (it's on my project list, I swear!) to implement Check_MK, an on-premises monitoring solution with a free, community-supported edition. We use it at work and it's very comprehensive, alerting on state changes (for example, green to critical or vice versa) down to the service level. It monitors everything from vCenter services to Horizon services to disk space usage across hosts to our Log Insight clusters. Admittedly, I haven't gotten very far with Check_MK at home.
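To illustrate why alerting on state changes (rather than on every check) is useful, here is a minimal Python sketch. It assumes a hypothetical `StateChangeNotifier` class and the OK/WARN/CRIT/UNKNOWN states familiar from Nagios-style tools; it is not Check_MK's code or API, just the general idea: notify once when a service moves between states, and stay quiet while it remains in the same state.

```python
from enum import IntEnum
from typing import Dict, Optional


class State(IntEnum):
    # Nagios-style service states (assumed here for illustration)
    OK = 0
    WARN = 1
    CRIT = 2
    UNKNOWN = 3


class StateChangeNotifier:
    """Hypothetical helper: emits a message only when a service's state changes."""

    def __init__(self) -> None:
        # Last known state per service name
        self._last: Dict[str, State] = {}

    def observe(self, service: str, state: State) -> Optional[str]:
        previous = self._last.get(service)
        self._last[service] = state
        # First observation only sets a baseline; repeated states are silent
        if previous is None or previous == state:
            return None
        return f"{service}: {previous.name} -> {state.name}"


if __name__ == "__main__":
    notifier = StateChangeNotifier()
    for s in [State.OK, State.OK, State.CRIT, State.CRIT, State.OK]:
        msg = notifier.observe("disk /", s)
        if msg:
            print(msg)
```

Running it prints one line when the disk goes critical and one when it recovers, instead of an alert on every check while it stays critical.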
I realize you asked about cloud-based solutions, but I have a preference for on-premises solutions given the choice. Internet connectivity for most of us is too fragile a thing to rely on being available one hundred percent of the time, and a monitoring solution should strive for that tier of availability.
@jnew1213, I definitely like CheckMK a lot and did a video on it some time ago. I need to do another one soon to revisit it. I am kind of digging the idea of cloud monitoring for lab gear, because lab problems tend to put you in a catch-22. If gear goes down, your monitoring stack needs to be separate so it doesn't go down with it, and if the Internet goes down, you usually won't get alerted because of the dependency on outbound connectivity.
I agree on keeping monitoring on-premises. That has been my preference for years now, but I am exploring some of the options out there. It piqued my curiosity that Netdata now has a home lab license, which looks interesting, especially since it appears to include unlimited nodes. I will report my findings back on that!
I have been using local Netdata logging since its early days, and it reminds me of New Relic. I liked that it allowed offline log collection and aggregation and that it also supported pushing data to their cloud.
For a while, I had some basic monitors configured in Netdata that would push alerts to PagerDuty.
Lately, though, I've scaled back so much in the HomeLab that using Netdata doesn't make sense like it used to. The hardware alerts built into TrueNAS cover the hardware side, and Uptime Kuma takes care of service monitoring for the handful of things that run all the time.
I use a combination of things, but Netdata is the only cloud product I use for monitoring. It's just really easy to get at-a-glance stats if a machine is acting weird.
For on-prem continuous monitoring I use a few things, mostly PRTG and Telegraf/InfluxDB.
I use Graylog, but don't have any alerting set up. It's just handy for centralizing sys/app logs.
I also have Security Onion set up and ingesting traffic, but I don't really do anything with it yet.