[root@ip-172-31-22-13 logs]# nomad server-members
Name                                                  Addr          Port  Status  Proto  Build  DC    Region
ip-172-31-22-13.eu-central-1.compute.internal.global  172.31.22.13  4648  alive   2      0.2.3  Best  global
ip-172-31-29-38.global                                172.31.29.38  4648  alive   2      0.2.3  Best  global
ip-172-31-29-39.global                                172.31.29.39  4648  alive   2      0.2.3  Best  global
[root@ip-172-31-22-13 logs]# nomad node-status
ID                                    DC      Name              Class   Drain  Status
0030d4f7-c0ae-5f48-3c01-3939eade8a42  Boxtel  ip-172-31-18-153  <none>  false  ready
8718fa79-f468-9bd8-117b-ce87403920b4  Best    ip-172-31-30-25   <none>  false  ready
a90f5ffb-334a-b9f7-105f-ce078b181098  Best    ip-172-31-30-24   <none>  true   down
ccb36d24-4681-326c-b08e-42aa56a4bf67  Best    ip-172-31-29-208  <none>  false  ready
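The SimpleHTTPServer job is restricted to the Boxtel datacenter, and this table lists a single Boxtel client. That fits the "1/1 nodes filtered" message the scheduler gives for this job. A small awk filter over the node-status output makes the check explicit. This is a sketch, not a Nomad feature: ready_nodes_in_dc is a hypothetical helper, and it assumes the six whitespace-separated columns shown here (ID, DC, Name, Class, Drain, Status) with no spaces inside any value.

```shell
#!/bin/sh
# Hypothetical helper: print "<node-id> <name>" for every client in datacenter $1
# that is not draining and whose status is "ready".
# Reads `nomad node-status` output on stdin.
# ASSUMPTION: the 0.2.x column layout ID DC Name Class Drain Status, no spaces in values.
ready_nodes_in_dc() {
  awk -v dc="$1" 'NR > 1 && $2 == dc && $5 == "false" && $6 == "ready" { print $1, $3 }'
}
```

Usage: "nomad node-status | ready_nodes_in_dc Boxtel". With the table above, this prints only 0030d4f7-c0ae-5f48-3c01-3939eade8a42 (ip-172-31-18-153). A failure on that one node therefore leaves the scheduler nowhere else to place the job.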
[ec2-user@ip-172-31-22-13 jobs]$ nomad status
ID                Type     Priority  Status
SimpleHTTPServer  service  50        <none>
[ec2-user@ip-172-31-22-13 jobs]$ nomad status -short SimpleHTTPServer
ID = SimpleHTTPServer
Name = SimpleHTTPServer
Type = service
Priority = 50
Datacenters = Boxtel
Status = <none>
[ec2-user@ip-172-31-22-13 jobs]$ nomad status SimpleHTTPServer
ID = SimpleHTTPServer
Name = SimpleHTTPServer
Type = service
Priority = 50
Datacenters = Boxtel
Status = <none>
==> Evaluations
ID                                    Priority  TriggeredBy   Status
edfc16cd-0a57-edaf-472b-a42ea346407f  50        job-register  complete
==> Allocations
ID                                    EvalID                                NodeID                                TaskGroup  Desired  Status
a51dec6c-7416-fc09-037c-6cc617136209  edfc16cd-0a57-edaf-472b-a42ea346407f  0030d4f7-c0ae-5f48-3c01-3939eade8a42  cache      run      failed
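The allocation above ended in "failed" on node 0030d4f7-c0ae-5f48-3c01-3939eade8a42. Once a job has more allocations, awk can pull the failed ones out of the same output. This is a minimal sketch. failed_allocs is a hypothetical helper that assumes the 0.2.x layout shown here: the allocation table follows a "==> Allocations" line and Status is the last column.

```shell
#!/bin/sh
# Hypothetical helper: print "<alloc-id> <node-id>" for each allocation whose
# status is "failed", reading `nomad status <job>` output on stdin.
# ASSUMPTION: allocation rows follow "==> Allocations"; Status is the last column.
failed_allocs() {
  awk '
    /^==> / { in_allocs = /^==> Allocations/; next }   # track which section we are in
    in_allocs && $NF == "failed" { print $1, $3 }      # ID is column 1, NodeID column 3
  '
}
```

Usage: "nomad status SimpleHTTPServer | failed_allocs". Rows from the Evaluations section are ignored even when a later Nomad version adds columns there, because only rows after the Allocations header are considered.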
Nomad client log, showing the drivers the client registered (raw_exec is not among them):
2016/01/12 07:59:19 [DEBUG] client: available drivers [exec java]
[root@ip-172-31-29-208 alloc]# pwd
/home/ec2-user/nomad/data/alloc
[root@ip-172-31-29-208 alloc]# ls -l
total 8
drwx------ 3 root root 4096 Jan 13 07:05 651023c2-5bcc-9997-8350-c179eb38e73d
drwx------ 4 root root 4096 Jan 13 07:06 c6c34727-2e79-7112-674f-c02fba46209a
[root@ip-172-31-29-208 alloc]# du -cms . 2>/dev/null
731 .
731 total
[root@ip-172-31-29-208 alloc]# ls -l c6c34727-2e79-7112-674f-c02fba46209a/web
total 36
drwxrwxrwx 5 nobody nobody 4096 Jan 13 07:06 alloc
dr-xr-xr-x 2 root root 4096 Jan 13 07:06 bin
drwxr-xr-x 16 root root 2720 Jan 13 06:43 dev
drwxr-xr-x 75 root root 4096 Jan 13 07:06 etc
dr-xr-xr-x 7 root root 4096 Jan 13 07:06 lib
dr-xr-xr-x 10 root root 12288 Jan 13 07:06 lib64
drwxrwxrwx 2 nobody nobody 4096 Jan 13 07:06 local
dr-xr-xr-x 86 root root 0 Jan 13 06:43 proc
dr-xr-xr-x 5 root root 4096 Jan 13 07:06 usr
[root@ip-172-31-29-208 proc]# mount
mount: /proc/self/mountinfo: parse error: ignore entry at line 9.
mount: /proc/self/mountinfo: parse error: ignore entry at line 10.
mount: /proc/self/mountinfo: parse error: ignore entry at line 12.
mount: /proc/self/mountinfo: parse error: ignore entry at line 13.
mount: /proc/self/mountinfo: parse error: ignore entry at line 15.
mount: /proc/self/mountinfo: parse error: ignore entry at line 16.
proc on /proc type proc (rw,relatime)
sysfs on /sys type sysfs (rw,relatime)
/dev/xvda1 on / type ext4 (rw,noatime,data=ordered)
devtmpfs on /dev type devtmpfs (rw,relatime,size=500712k,nr_inodes=125178,mode=755)
devpts on /dev/pts type devpts (rw,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /dev/shm type tmpfs (rw,relatime)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,relatime)
/dev/xvda1 on /home/ec2-user/nomad/data/alloc/651023c2-5bcc-9997-8350-c179eb38e73d/web/alloc type ext4 (rw,noatime,data=ordered)
none on /home/ec2-user/nomad/data/alloc/651023c2-5bcc-9997-8350-c179eb38e73d/web/dev
none on /home/ec2-user/nomad/data/alloc/651023c2-5bcc-9997-8350-c179eb38e73d/web/proc
/dev/xvda1 on /home/ec2-user/nomad/data/alloc/c6c34727-2e79-7112-674f-c02fba46209a/web/alloc type ext4 (rw,noatime,data=ordered)
none on /home/ec2-user/nomad/data/alloc/c6c34727-2e79-7112-674f-c02fba46209a/web/dev
none on /home/ec2-user/nomad/data/alloc/c6c34727-2e79-7112-674f-c02fba46209a/web/proc
/dev/xvda1 on /home/ec2-user/nomad/data/alloc/a3cc556f-eafe-4c30-a479-0fd63d8d63fd/fulltest-task/alloc type ext4 (rw,noatime,data=ordered)
none on /home/ec2-user/nomad/data/alloc/a3cc556f-eafe-4c30-a479-0fd63d8d63fd/fulltest-task/dev
none on /home/ec2-user/nomad/data/alloc/a3cc556f-eafe-4c30-a479-0fd63d8d63fd/fulltest-task/proc
Nomad
Resources
Miscellaneous commands
nomad server-members
nomad node-status
nomad validate
nomad run
[ec2-user@ip-172-31-22-13 jobs]$ nomad run SimpleHTTPServer.nomad
==> Monitoring evaluation "edfc16cd-0a57-edaf-472b-a42ea346407f"
Evaluation triggered by job "SimpleHTTPServer"
Allocation "a51dec6c-7416-fc09-037c-6cc617136209" created: node "0030d4f7-c0ae-5f48-3c01-3939eade8a42", group "cache"
Evaluation status changed: "pending" -> "complete"
==> Evaluation "edfc16cd-0a57-edaf-472b-a42ea346407f" finished with status "complete"
nomad status
nomad stop
[ec2-user@ip-172-31-22-13 jobs]$ nomad stop SimpleHTTPServer
==> Monitoring evaluation "0a6b662b-a5e9-9495-1e77-429306da45a7"
Evaluation triggered by job "SimpleHTTPServer"
Evaluation status changed: "pending" -> "complete"
==> Evaluation "0a6b662b-a5e9-9495-1e77-429306da45a7" finished with status "complete"
[ec2-user@ip-172-31-22-13 jobs]$ nomad status SimpleHTTPServer
Error querying job: Unexpected response code: 404 (job not found)
Errors
When submitting a simple Python web server as a job (exec driver), we get this in the Nomad client log:
* Failed to join spawn-daemon to the cgroup (&{Name:02b1290b-0a2d-43b2-eee8-fcc096e86a14 Parent: ScopePrefix: Resources:0xc8200eef00}): Error found less than 3 fields post '-' in "24 21 0:6 / /home/ec2-user/nomad/data/alloc/54e3a869-e78c-f2cf-368a-9374460bea37/web/dev ro,relatime - devtmpfs rw,size=500712k,nr_inodes=125178,mode=755"
2016/01/12 10:19:27 [DEBUG] client: updated allocations at index 519 (2 allocs)
2016/01/12 10:19:27 [DEBUG] client: allocs: (added 0) (removed 0) (updated 2) (ignore 0)
Googling it gives exactly one hit, which is a 404 on GitHub; digging further in that GitHub repo brings you to the source at: https://github.com/hashicorp/nomad/blob/master/client/driver/executor/exec_linux.go
:-) :-(
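The rejected line comes from the kernel's mountinfo table. According to proc(5), every mountinfo entry has a "-" separator followed by three fields: filesystem type, mount source and super options. In the quoted entry for the task's dev directory, only two fields follow the "-" (devtmpfs and its options). The mount source appears to be empty, and that is exactly what the "less than 3 fields" check trips over. The mount command on client ip-172-31-29-208 likewise prints "parse error: ignore entry" warnings and bare "none on ..." lines without a filesystem type for the tasks' dev and proc directories. This looks like the same problem. The sketch below lists the offending entries on a client. check_mountinfo is a hypothetical helper, not part of Nomad.

```shell
#!/bin/sh
# Hypothetical helper: report mountinfo entries that have fewer than the three
# fields (fs type, mount source, super options) proc(5) defines after the "-".
# $1: mountinfo file to check; defaults to the current mount table.
check_mountinfo() {
  awk '{
    sep = 0
    # Locate the "-" separator that ends the optional fields.
    for (i = 1; i <= NF; i++) {
      if ($i == "-") { sep = i; break }
    }
    post = sep ? NF - sep : 0
    if (post < 3)
      printf "line %d: %d field(s) after \"-\": %s\n", NR, post, $0
  }' "${1:-/proc/self/mountinfo}"
}
```

Run check_mountinfo without arguments on a client that has exec task directories mounted. An empty result means every entry carries the three expected fields. The line numbers it prints can be compared with the "ignore entry at line N" warnings from mount.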
I fired up a RHEL 7 AMI (instead of the Amazon Linux AMI) to see if that helps, but it gives the same error.
Assuming it has something to do with the driver, I changed the driver from "exec" to "raw_exec", but that gives a "missing drivers" error:
[ec2-user@ip-172-31-22-13 jobs]$ nomad run SimpleHTTPServer.nomad
==> Monitoring evaluation "306cb11a-b012-c89f-aa36-69c9a0e44464"
Evaluation triggered by job "SimpleHTTPServer"
Scheduling error for group "cache" (failed to find a node for placement)
Allocation "7a0350ea-8396-96f7-308d-1aee4a500619" status "failed" (1/1 nodes filtered)
* Constraint "missing drivers" filtered 1 nodes
Evaluation status changed: "pending" -> "complete"
==> Evaluation "306cb11a-b012-c89f-aa36-69c9a0e44464" finished with status "complete"
Checking out the logs of the Nomad client confirms that raw_exec is not among the drivers the client registered ("client: available drivers [exec java]").
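raw_exec is missing from the client's driver list because Nomad ships the raw_exec driver disabled, and the client has to opt in through its options. The Nomad documentation from this period shows the option as "driver.raw_exec.enable" = "1" inside the client block. Treat that name as an assumption and check it against the docs for the version you run. Below is a sketch that writes such a fragment. write_raw_exec_config and the default file name are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: write a Nomad client config fragment that enables raw_exec.
# ASSUMPTION: the "driver.raw_exec.enable" client option as documented for Nomad 0.2/0.3.
# $1: output path (hypothetical default ./raw_exec.hcl); prints the path it wrote.
write_raw_exec_config() {
  out=${1:-./raw_exec.hcl}
  cat > "$out" <<'EOF'
client {
  options = {
    "driver.raw_exec.enable" = "1"
  }
}
EOF
  echo "$out"
}
```

Load the fragment together with the existing client configuration, for example with an extra -config argument to nomad agent, and restart the client. The "available drivers" debug line should then include raw_exec. Keep in mind that raw_exec runs tasks without the isolation exec provides, as the same user the Nomad agent runs as (root in the sessions above).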
Looping PID 1 ?!
After running a Nomad job like this:
driver = "exec"
config {
  command = "/bin/bash"
  args = [ "-c", "mkdir ff && cd ff && curl --silent --show-error --remote-name 'http://www.computerhok.nl/tmp/dropwizardtest-1.2-assembly.zip' && unzip *.zip && cd * && java -jar dropwizardtest*.jar server helloworld.yaml"]
}
This results in a host that barely responds, and just before the reboot:
chroot for exec task ?#
According to the documentation, an "exec" task runs chrooted. Although we do like chrooted environments, this looks like it requires quite a lot of disk space (730 MB for one Python command), though I am not sure that figure holds with all those mounts here:
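One way to settle whether the 730 MB is the chroot itself or is inflated by the mounts is to tell du to stay on one filesystem. In the mount listing from ip-172-31-29-208, each task's dev and proc directories are separate mounts (the bare "none on ..." entries). The task's alloc directory is a bind mount from /dev/xvda1, the same device as /. With -x, du does not descend into the first two but still counts alloc. Here is a sketch. chroot_usage is a hypothetical helper. It assumes a du that accepts -d as the depth limit: GNU du treats -d as the short form of --max-depth, and BSD du also accepts it.

```shell
#!/bin/sh
# Hypothetical helper: per-directory disk usage (KiB) of one task directory.
chroot_usage() {
  task_dir=$1   # e.g. /home/ec2-user/nomad/data/alloc/<alloc-id>/web
  # -x   : stay on the filesystem task_dir lives on; mount points of other
  #        devices (the proc and devtmpfs mounts) are not descended into.
  # -k   : report sizes in KiB.
  # -d 1 : one line per top-level directory, plus a total for task_dir.
  du -xk -d 1 "$task_dir" | sort -n   # biggest entries last
}
```

For example, "chroot_usage /home/ec2-user/nomad/data/alloc/c6c34727-2e79-7112-674f-c02fba46209a/web" breaks the total down across bin, etc, lib, lib64, usr, alloc and local. This shows which of the chroot directories account for most of the space.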