Nomad#
Resources#
Miscellaneous commands #
nomad server-members#
[root@ip-172-31-22-13 logs]# nomad server-members
Name                                                  Addr          Port  Status  Proto  Build  DC    Region
ip-172-31-22-13.eu-central-1.compute.internal.global  172.31.22.13  4648  alive   2      0.2.3  Best  global
ip-172-31-29-38.global                                172.31.29.38  4648  alive   2      0.2.3  Best  global
ip-172-31-29-39.global                                172.31.29.39  4648  alive   2      0.2.3  Best  global
nomad node-status#
[root@ip-172-31-22-13 logs]# nomad node-status
ID                                    DC      Name              Class   Drain  Status
0030d4f7-c0ae-5f48-3c01-3939eade8a42  Boxtel  ip-172-31-18-153  <none>  false  ready
8718fa79-f468-9bd8-117b-ce87403920b4  Best    ip-172-31-30-25   <none>  false  ready
a90f5ffb-334a-b9f7-105f-ce078b181098  Best    ip-172-31-30-24   <none>  true   down
ccb36d24-4681-326c-b08e-42aa56a4bf67  Best    ip-172-31-29-208  <none>  false  ready
nomad validate#
[ec2-user@ip-172-31-22-13 jobs]$ nomad validate SimpleHTTPServer.nomad
Job validation successful
nomad run#
[ec2-user@ip-172-31-22-13 jobs]$ nomad run SimpleHTTPServer.nomad
==> Monitoring evaluation "edfc16cd-0a57-edaf-472b-a42ea346407f"
Evaluation triggered by job "SimpleHTTPServer"
Allocation "a51dec6c-7416-fc09-037c-6cc617136209" created: node "0030d4f7-c0ae-5f48-3c01-3939eade8a42", group "cache"
Evaluation status changed: "pending" -> "complete"
==> Evaluation "edfc16cd-0a57-edaf-472b-a42ea346407f" finished with status "complete"
nomad status#
[ec2-user@ip-172-31-22-13 jobs]$ nomad status
ID                Type     Priority  Status
SimpleHTTPServer  service  50        <none>

[ec2-user@ip-172-31-22-13 jobs]$ nomad status -short SimpleHTTPServer
ID          = SimpleHTTPServer
Name        = SimpleHTTPServer
Type        = service
Priority    = 50
Datacenters = Boxtel
Status      = <none>

[ec2-user@ip-172-31-22-13 jobs]$ nomad status SimpleHTTPServer
ID          = SimpleHTTPServer
Name        = SimpleHTTPServer
Type        = service
Priority    = 50
Datacenters = Boxtel
Status      = <none>

==> Evaluations
ID                                    Priority  TriggeredBy   Status
edfc16cd-0a57-edaf-472b-a42ea346407f  50        job-register  complete

==> Allocations
ID                                    EvalID                                NodeID                                TaskGroup  Desired  Status
a51dec6c-7416-fc09-037c-6cc617136209  edfc16cd-0a57-edaf-472b-a42ea346407f  0030d4f7-c0ae-5f48-3c01-3939eade8a42  cache      run      failed
nomad stop#
[ec2-user@ip-172-31-22-13 jobs]$ nomad stop SimpleHTTPServer
==> Monitoring evaluation "0a6b662b-a5e9-9495-1e77-429306da45a7"
Evaluation triggered by job "SimpleHTTPServer"
Evaluation status changed: "pending" -> "complete"
==> Evaluation "0a6b662b-a5e9-9495-1e77-429306da45a7" finished with status "complete"
[ec2-user@ip-172-31-22-13 jobs]$ nomad status SimpleHTTPServer
Error querying job: Unexpected response code: 404 (job not found)
Errors#
When submitting a simple Python webserver as a job (exec driver), the allocation fails and the Nomad client log shows:
* Failed to join spawn-daemon to the cgroup (&{Name:02b1290b-0a2d-43b2-eee8-fcc096e86a14 Parent: ScopePrefix: Resources:0xc8200eef00}): Error found less than 3 fields post '-' in "24 21 0:6 / /home/ec2-user/nomad/data/alloc/54e3a869-e78c-f2cf-368a-9374460bea37/web/dev ro,relatime - devtmpfs rw,size=500712k,nr_inodes=125178,mode=755"
2016/01/12 10:19:27 [DEBUG] client: updated allocations at index 519 (2 allocs)
2016/01/12 10:19:27 [DEBUG] client: allocs: (added 0) (removed 0) (updated 2) (ignore 0)
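To make sense of the "less than 3 fields post '-'" message: proc(5) documents three fields after the " - " separator of a mountinfo line (filesystem type, mount source, super options). The small sketch below counts those fields for the line quoted in the client log above and for a well-formed variant. The function name count_post_sep_fields and the "good" line are made up for illustration, and this is not the parser Nomad itself uses. It suggests the mount source is absent from the devtmpfs entry, which would explain the error.

```shell
#!/bin/sh
# Count the whitespace-separated fields that follow the "-" separator
# in a single mountinfo line. proc(5) documents three of them:
# filesystem type, mount source, and super options.
# (count_post_sep_fields is a hypothetical helper, not part of Nomad.)
count_post_sep_fields() {
    printf '%s\n' "$1" | awk '{
        n = 0; seen = 0
        for (i = 1; i <= NF; i++) {
            if (seen) n++
            else if ($i == "-") seen = 1
        }
        print n
    }'
}

# The line quoted in the client log.
bad='24 21 0:6 / /home/ec2-user/nomad/data/alloc/54e3a869-e78c-f2cf-368a-9374460bea37/web/dev ro,relatime - devtmpfs rw,size=500712k,nr_inodes=125178,mode=755'
# Hypothetical well-formed variant, with "devtmpfs" as the mount source.
good='24 21 0:6 / /dev ro,relatime - devtmpfs devtmpfs rw,size=500712k,nr_inodes=125178,mode=755'

count_post_sep_fields "$bad"    # prints 2
count_post_sep_fields "$good"   # prints 3
```

Feeding each line of /proc/self/mountinfo on the client through the same function should show whether other mounts on the host have the same shape.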
Googling the error gives exactly one hit, which is a 404 on GitHub. Digging further in that GitHub repo leads to the source at: https://github.com/hashicorp/nomad/blob/master/client/driver/executor/exec_linux.go
:-) :-(
I fired up a RHEL 7 AMI (instead of the Amazon Linux AMI) to see if that would help, but it gives the same error.
Assuming it has something to do with the driver, I changed the driver from "exec" to "raw_exec", but then scheduling fails on a "missing drivers" constraint:
[ec2-user@ip-172-31-22-13 jobs]$ nomad run SimpleHTTPServer.nomad
==> Monitoring evaluation "306cb11a-b012-c89f-aa36-69c9a0e44464"
Evaluation triggered by job "SimpleHTTPServer"
Scheduling error for group "cache" (failed to find a node for placement)
Allocation "7a0350ea-8396-96f7-308d-1aee4a500619" status "failed" (1/1 nodes filtered)
* Constraint "missing drivers" filtered 1 nodes
Evaluation status changed: "pending" -> "complete"
==> Evaluation "306cb11a-b012-c89f-aa36-69c9a0e44464" finished with status "complete"
The Nomad client log confirms that "raw_exec" is not among the available drivers:
2016/01/12 07:59:19 [DEBUG] client: available drivers [exec java]
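As far as I can tell from the Nomad documentation, raw_exec is disabled by default and only shows up once the client option driver.raw_exec.enable is set to "1" in the client configuration, so the missing driver is probably expected here. A quick way to check a client before submitting a job is to look for the driver in that log line. The sketch below does this; has_driver is a made-up helper name and the log line is the one shown above.

```shell
#!/bin/sh
# Succeed if driver $2 appears in the bracketed list of a
# "client: available drivers [...]" log line passed as $1.
# (has_driver is a hypothetical helper, not a Nomad command.)
has_driver() {
    # Extract the text between "available drivers [" and the closing "]".
    list=$(printf '%s\n' "$1" | sed -n 's/.*available drivers \[\(.*\)].*/\1/p')
    for d in $list; do
        [ "$d" = "$2" ] && return 0
    done
    return 1
}

line='2016/01/12 07:59:19 [DEBUG] client: available drivers [exec java]'
for drv in exec raw_exec; do
    if has_driver "$line" "$drv"; then
        echo "$drv: available"
    else
        echo "$drv: missing"
    fi
done
```

For the log line above this reports exec as available and raw_exec as missing, which matches the "missing drivers" constraint in the evaluation output.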
chroot for exec task ?#
According to the documentation, an "exec" task runs in a chroot. Although we do like chrooted environments, this takes quite a lot of disk space (about 730 MB for a single Python command):
[root@ip-172-31-29-208 alloc]# pwd
/home/ec2-user/nomad/data/alloc
[root@ip-172-31-29-208 alloc]# ls -l
total 8
drwx------ 3 root root 4096 Jan 13 07:05 651023c2-5bcc-9997-8350-c179eb38e73d
drwx------ 4 root root 4096 Jan 13 07:06 c6c34727-2e79-7112-674f-c02fba46209a
[root@ip-172-31-29-208 alloc]# du -cms . 2>/dev/null
731     .
731     total
[root@ip-172-31-29-208 alloc]# ls -l c6c34727-2e79-7112-674f-c02fba46209a/web
total 36
drwxrwxrwx  5 nobody nobody  4096 Jan 13 07:06 alloc
dr-xr-xr-x  2 root   root    4096 Jan 13 07:06 bin
drwxr-xr-x 16 root   root    2720 Jan 13 06:43 dev
drwxr-xr-x 75 root   root    4096 Jan 13 07:06 etc
dr-xr-xr-x  7 root   root    4096 Jan 13 07:06 lib
dr-xr-xr-x 10 root   root   12288 Jan 13 07:06 lib64
drwxrwxrwx  2 nobody nobody  4096 Jan 13 07:06 local
dr-xr-xr-x 86 root   root       0 Jan 13 06:43 proc
dr-xr-xr-x  5 root   root    4096 Jan 13 07:06 usr
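The du above gives one total for the whole alloc directory. To see which allocation is actually eating the space, a per-directory breakdown helps. A minimal sketch, assuming the same alloc path as above (the alloc_usage name is made up, and a different path can be passed as the first argument):

```shell
#!/bin/sh
# Print the disk usage in MB of each allocation directory under $1,
# largest first. (alloc_usage is a hypothetical helper name.)
alloc_usage() {
    for d in "$1"/*/; do
        # Skip the unexpanded glob when the directory is empty or missing.
        [ -d "$d" ] || continue
        du -sm "$d" 2>/dev/null
    done | sort -rn
}

# Default to the alloc path used in this setup.
alloc_usage "${1:-/home/ec2-user/nomad/data/alloc}"
```

The allocation directories are owned by root with mode 0700, so this needs to run as root to report real sizes.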
