I'm a newbie homelabber and was very impressed by Brandon's YouTube video "Best Container Server Setup". It seems that Swarm + Portainer really is an ideal middle ground between bare Docker and Kubernetes, and thanks to Brandon for highlighting this solution.
I tried to reproduce Brandon's setup in my lab, but gave up after two full days of hard effort.
I couldn't connect the Portainer server to the agents on the nodes. It simply doesn't work. I get "Client.Timeout exceeded while awaiting headers" every time I press the "Connect" button in the environment creation dialog. The only way that worked was connecting to Swarm via the socket option, but that approach doesn't give you the beauty of the Cluster Visualizer and so becomes mostly useless.
I made numerous attempts to bootstrap it. I started by carefully repeating all the steps from the video, then went to the Portainer instance (which is on a node outside the cluster), copied the commands provided by the wizard to a destination node, and, after deploying the agents, tried to connect Portainer to them. As soon as that didn't work, I went further, searching for a cause: fiddling with UFW on Ubuntu 24.04, iptables, and DNS; redeploying the nodes with and without Keepalived; reinstalling a few Portainer versions; installing Portainer outside and inside the cluster; re-creating the VMs from the full Ubuntu image instead of the cloud image; and so on. I ended up trying to roll this out on Debian instead of Ubuntu, but that didn't work either.
As far as I can tell from some of the comments on the YouTube video, this is a common problem, maybe a bug in Portainer. I found similar complaints on their GitHub and elsewhere on the internet, but unfortunately no suitable solution or explanation.
Given all that, I would consider it a bug and give up on it entirely, but I saw it working in the video.
Could somebody give me any advice on what I might be doing wrong and what the right path is?
---
Related links that I've used:
- https://docs.portainer.io/start/install/server/swarm/linux
- https://github.com/portainer/portainer/issues/11362
- https://github.com/portainer/portainer/issues/10602
@ifs77 welcome to the forums! I noticed a couple of others mention in the comments they were having issues as well. Give me more details on how you set things up. Hopefully we can figure out what is going on there. 👍
Hello Brandon,
Thank you for the quick reply!
Yes, I can easily give you more details, as I took notes on every step I did. I'll attach a file to this post (just change the extension to .md, because .md files are disallowed on this forum).
Yes, I was executing the commands from Portainer's wizard. I first tried to connect using the virtual IP; then, after it threw back an error, I tried the individual IPs of all the nodes. Then I decided to eliminate Keepalived from the chain, because I suspected it was the source of the error. I rolled all the VMs back to bare Ubuntu and repeated all the steps without Keepalived, using the individual node IPs, but the result was the same.
Gotcha @ifs77. Can I ask, when you run this command from one of your Swarm manager nodes, do you see your Portainer service listed?
docker service ls
The service will show up as something like this:
portainer_agent global 3/3 portainer/agent:2.21.4 *:9001->9001/tcp
Also, you can use something simple like the telnet client from a Windows machine to check whether you have connectivity to port 9001:
telnet <swarm ip> 9001
Are your portainer instance and your swarm nodes on the same subnet?
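The telnet check above can also be scripted so that every node is probed in one go. This is only a rough sketch, assuming bash and the coreutils `timeout` command are available on the machine you test from. The `port_open` helper and the script file name are made up for illustration, and the node addresses are whatever you pass on the command line.

```shell
#!/usr/bin/env bash
# Probe a TCP port (default 9001, the agent port from the service listing)
# on every host given as an argument.
# Usage (hypothetical file name): ./probe-agents.sh <node1> <node2> <node3>

PORT="${PORT:-9001}"

# port_open HOST PORT [TIMEOUT_SECONDS]
# Returns 0 if a TCP connection can be opened, non-zero otherwise.
# Uses bash's built-in /dev/tcp redirection, so no telnet or nc is needed.
port_open() {
    timeout "${3:-3}" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

for host in "$@"; do
    if port_open "$host" "$PORT"; then
        echo "$host:$PORT reachable"
    else
        echo "$host:$PORT NOT reachable"
    fi
done
```

It is worth running this from the machine where the Portainer server runs, since that is the path the "Connect" button uses. A successful result only shows that the TCP handshake completes; it does not prove that the agent answers requests.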
Hello @brandon-lee,
Thank you for your advice, it was very helpful! It took me a while to implement, but I did it. Now I can resolve the domain names in my network from all of my machines without errors.
Unfortunately, it did not help me with Portainer. I'm still getting the same error during Swarm environment creation in both cases: when the Portainer server instance runs on one of the Swarm nodes, and when it runs outside the cluster.
To be more precise, when I deploy Portainer inside the cluster, which I think is the right way because it is the way recommended on the official site (https://docs.portainer.io/start/install/server/swarm/linux), I can't reach the web GUI, even though both the Portainer and Portainer agent services are running and show the proper port allocations.
When Portainer runs on another machine and I try to connect the cluster by following the wizard steps, it ends with the error I mentioned in my first post.
I can confirm that all the required Docker containers are running, the ports are not blocked by a firewall, and the DNS names resolve.
How did I check this?
Before installing the Portainer agents on a node, I run
nc -l -p 9001
then, from another computer running Windows, I run the PowerShell command
Test-NetConnection -ComputerName <FQDN> -Port 9001
and get
ComputerName     : <FQDN>
RemoteAddress    : 10.0.0.61
RemotePort       : 9001
InterfaceAlias   : tun2
SourceAddress    : 10.33.0.2
TcpTestSucceeded : True
Then, after installing Portainer on those nodes, I can no longer listen on that port there, because, as I suppose, the agents have taken port 9001 ("nc: Address already in use" on the machine where only the agent is running, and "Can't grab 0.0.0.0:9001 with bind" on the machine where both the Portainer server and the agent are deployed).
Wrapping up, the annoying error still exists and I can't connect Portainer to the cluster.
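One gap in the check above is that it covers only TCP 9001, and only from a single client. Per the Docker documentation, Swarm itself also needs 2377/tcp (cluster management), 7946/tcp and 7946/udp (node-to-node communication), and 4789/udp (overlay network traffic) open between the nodes. As a hedged sketch, assuming UFW is the firewall in use, the following prints the matching rules instead of applying them. The `swarm_ufw_rules` function name and the `SWARM_SUBNET` variable are invented here, and 10.0.0.0/24 is only a guess based on the 10.0.0.61 address in the output above.

```shell
#!/usr/bin/env bash
# Print (not run) UFW rules for the ports Docker Swarm uses between nodes.
# SWARM_SUBNET is an assumption; replace it with the subnet your nodes share.
SWARM_SUBNET="${SWARM_SUBNET:-10.0.0.0/24}"

swarm_ufw_rules() {
    # proto:port pairs: 2377/tcp management, 7946 tcp+udp node communication,
    # 4789/udp overlay (VXLAN) traffic
    for rule in tcp:2377 tcp:7946 udp:7946 udp:4789; do
        proto="${rule%%:*}"
        port="${rule##*:}"
        echo "ufw allow proto $proto from $SWARM_SUBNET to any port $port"
    done
}

swarm_ufw_rules
```

Printing the rules first makes it possible to compare them with the output of `ufw status` on each node before changing anything.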
Any suggestions?