Good questions and answers.
Only half of the configured vRAM will be allocated from the GPU, so with 512 MB configured per VM each user takes 256 MB of GPU memory. If you have one K1 with 16 GB of memory, you can then have a maximum of 64 users (256 MB x 64 = 16 GB).
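If you want to try the same arithmetic with other numbers, here is a rough sketch. The function name `max_users` and the 512 MB per-VM setting are my own assumptions for illustration (512 MB is simply what makes half come out to 256 MB); it only does the division and says nothing about real-world performance.

```python
def max_users(gpu_memory_mb: int, configured_vram_mb: int) -> int:
    """Upper bound on users per card, based on GPU memory alone.

    Assumes half of each VM's configured vRAM is taken from the GPU
    and that GPU memory cannot be overcommitted.
    """
    gpu_share_mb = configured_vram_mb // 2  # half comes from the GPU
    return gpu_memory_mb // gpu_share_mb


# One K1 with 16 GB, 512 MB vRAM configured per VM (assumed value)
print(max_users(16 * 1024, 512))
```

This is only the memory ceiling; the GPU itself may well run out of steam before you get there.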
But with CAD users you will probably saturate the GPU much earlier than that, see this recent performance study:
Since there is no overcommit and GPU utilization is not part of DRS (for now), I usually recommend building clusters specifically for these workloads. They tend to be more demanding than regular users when it comes to vCPUs and IOPS anyway.
// Linjo