Apache ZooKeeper - Mesos slaves are not connecting with Mesos masters cluster
I have a setup using 3 Mesos masters and 3 Mesos slaves. After making the required configurations, I can see that the 3 Mesos masters are part of the cluster maintained by ZooKeeper.
Now I have set up 3 Mesos slaves. When starting the mesos-slave service, I expect the Mesos slaves to be available on the Mesos masters' web UI page, but I cannot see any of them in the Slaves tab.
SELinux, the firewall, and iptables are disabled. I am able to SSH between the nodes.
[cloud-user@slave1 ~]$ sudo systemctl status mesos-slave -l
mesos-slave.service - mesos slave
   loaded: loaded (/usr/lib/systemd/system/mesos-slave.service; enabled)
   active: active (running) since sat 2016-01-16 16:11:55 utc; 3s ago
 main pid: 2483 (mesos-slave)
   cgroup: /system.slice/mesos-slave.service
           ├─2483 /usr/sbin/mesos-slave --master=zk://10.0.0.2:2181,10.0.0.6:2181,10.0.0.7:2181/mesos --log_dir=/var/log/mesos --containerizers=docker,mesos --executor_registration_timeout=5mins
           ├─2493 logger -p user.info -t mesos-slave[2483]
           └─2494 logger -p user.err -t mesos-slave[2483]

jan 16 16:11:55 slave1.novalocal mesos-slave[2494]: i0116 16:11:55.628670 2497 detector.cpp:482] new leading master (upid=master@127.0.0.1:5050) detected
jan 16 16:11:55 slave1.novalocal mesos-slave[2494]: i0116 16:11:55.628732 2497 slave.cpp:729] new master detected @ master@127.0.0.1:5050
jan 16 16:11:55 slave1.novalocal mesos-slave[2494]: i0116 16:11:55.628825 2497 slave.cpp:754] no credentials provided. attempting register without authentication
jan 16 16:11:55 slave1.novalocal mesos-slave[2494]: i0116 16:11:55.628844 2497 slave.cpp:765] detecting new master
jan 16 16:11:55 slave1.novalocal mesos-slave[2494]: i0116 16:11:55.628872 2497 status_update_manager.cpp:176] pausing sending status updates
jan 16 16:11:55 slave1.novalocal mesos-slave[2494]: e0116 16:11:55.628922 2503 process.cpp:1911] failed shutdown socket fd 11: transport endpoint not connected
jan 16 16:11:55 slave1.novalocal mesos-slave[2494]: i0116 16:11:55.629093 2502 slave.cpp:3215] master@127.0.0.1:5050 exited
jan 16 16:11:55 slave1.novalocal mesos-slave[2494]: w0116 16:11:55.629107 2502 slave.cpp:3218] master disconnected! waiting new master elected
jan 16 16:11:55 slave1.novalocal mesos-slave[2494]: e0116 16:11:55.983531 2503 process.cpp:1911] failed shutdown socket fd 11: transport endpoint not connected
jan 16 16:11:57 slave1.novalocal mesos-slave[2494]: e0116 16:11:57.465049 2503 process.cpp:1911] failed shutdown socket fd 11: transport endpoint not connected
So the problematic line is:
jan 16 16:11:55 slave1.novalocal mesos-slave[2494]: i0116 16:11:55.629093 2502 slave.cpp:3215] master@127.0.0.1:5050 exited
Specifically, note that it's detecting the master as having the IP address 127.0.0.1. The Mesos agent[1] sees that IP address, tries to connect to it, and fails (the master isn't running on the same machine as the agent).
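To make that check concrete, here is a minimal Python sketch that pulls the master address out of an agent log line like the one quoted above and reports whether it is a loopback address. The helper name `master_is_loopback` and the regular expression are illustrative assumptions, not part of Mesos.

```python
import ipaddress
import re

# Hypothetical pattern: matches the "master@IP:PORT" token seen in the agent log.
MASTER_RE = re.compile(r"master@(\d{1,3}(?:\.\d{1,3}){3}):(\d+)")


def master_is_loopback(log_line):
    """Return True if the master address in log_line is a loopback address.

    Returns None when the line contains no master@IP:PORT token.
    """
    match = MASTER_RE.search(log_line)
    if match is None:
        return None
    return ipaddress.ip_address(match.group(1)).is_loopback


line = "slave.cpp:729] new master detected @ master@127.0.0.1:5050"
print(master_is_loopback(line))
```

If this returns True for a log line on an agent that does not also host the master, the address stored in ZooKeeper is one the agent can never reach.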
This happens because the master announces what it thinks is its IP address to ZooKeeper. In your case, the master thinks its IP is 127.0.0.1 and stores that in ZK. Mesos has several configuration flags to control this behavior, namely --hostname, --no-hostname_lookup, --ip, and --ip_discovery_command, as well as the environment variable LIBPROCESS_IP. See http://mesos.apache.org/documentation/latest/configuration/ for details on them and what they do.
The best thing you can do to make sure things work out of the box is to make sure your machines have resolvable hostnames. Mesos does a DNS lookup of the box's hostname in order to figure out the IP address that others will contact it on.
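One way to check this on a master node is to approximate the hostname lookup with Python's standard library. This is only a sketch: it uses the system resolver through `socket.gethostbyname`, which is an assumption about how closely it mirrors what Mesos does, and the function names are made up for illustration.

```python
import ipaddress
import socket


def resolved_ip(hostname):
    """Resolve hostname to an IPv4 address using the system resolver."""
    return socket.gethostbyname(hostname)


def is_usable_for_cluster(ip):
    """A loopback address cannot be reached by other machines."""
    return not ipaddress.ip_address(ip).is_loopback


if __name__ == "__main__":
    name = socket.gethostname()
    try:
        ip = resolved_ip(name)
    except OSError as exc:
        # Covers socket.gaierror when the hostname does not resolve at all.
        print(f"{name} does not resolve: {exc}")
    else:
        verdict = "ok" if is_usable_for_cluster(ip) else "loopback, agents cannot reach it"
        print(f"{name} -> {ip} ({verdict})")
```

If the hostname maps to 127.0.0.1 (often because of an entry in /etc/hosts), that would be consistent with the master@127.0.0.1:5050 address seen in the agent log.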
If you can't get hostnames set up properly, I recommend setting --hostname and --ip manually, which should cause Mesos to announce exactly what you want.
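As a rough illustration of the manual approach, the Python sketch below appends --ip and --hostname to a master command line and refuses loopback addresses. The helper `with_explicit_address`, the /usr/sbin/mesos-master path, and the choice of 10.0.0.2 as a master address are all assumptions (10.0.0.2 appears in the question only as a ZooKeeper address), and the resulting command is not a complete master invocation.

```python
import ipaddress


def with_explicit_address(argv, ip, hostname):
    """Return a copy of argv with --ip and --hostname appended.

    Raises ValueError if ip is not a valid address or is a loopback address.
    """
    if ipaddress.ip_address(ip).is_loopback:
        raise ValueError(f"{ip} is a loopback address; other nodes cannot reach it")
    return [*argv, f"--ip={ip}", f"--hostname={hostname}"]


# Assumed values: the binary path and the master's address are guesses;
# other flags a real master needs are omitted here.
base = [
    "/usr/sbin/mesos-master",
    "--zk=zk://10.0.0.2:2181,10.0.0.6:2181,10.0.0.7:2181/mesos",
]
print(" ".join(with_explicit_address(base, "10.0.0.2", "10.0.0.2")))
```

Passing the IP as the hostname is just a placeholder for machines without a resolvable name; use the real name where one exists.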
[1] The Mesos slave has been renamed to agent; see: https://issues.apache.org/jira/browse/mesos-1478