Automatic segmentation of medical images has many applications in clinical studies. Computed Tomography (CT) imaging plays a critical role in the diagnosis and surgical planning of craniomaxillofacial (CMF) surgeries because it shows bony structures clearly. However, CT imaging poses radiation risks for the subjects being scanned. Magnetic Resonance Imaging (MRI), by contrast, is considered safe and provides good visualization of soft tissues, but bony structures appear invisible in MRI. Consequently, segmenting bony structures from MRI is challenging. In this paper, we propose a cascaded generative adversarial network with a deep-supervision discriminator (Deep-supGAN) for automatic bony structure segmentation. The first block of this architecture generates a high-quality CT image from an MRI, and the second block segments bony structures from the MRI together with the generated CT image. Unlike traditional discriminators, the deep-supervision discriminator distinguishes the generated CT from the ground truth at different levels of feature maps. For segmentation, the loss is computed not only at the voxel level but also at higher, more abstract perceptual levels. Experimental results show that the proposed method generates CT images with clearer structural details and segments bony structures more accurately than state-of-the-art methods.
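To illustrate the idea of a loss that is evaluated at the voxel level and at more abstract levels, the following minimal sketch may help. It is not the Deep-supGAN loss itself: the function names (`avg_pool`, `l1`, `multi_level_loss`), the use of 1-D lists in place of 3-D volumes, the equal default weights, and the mean pooling that stands in for learned feature maps are all assumptions made for illustration only.

```python
def avg_pool(x, k=2):
    """Hypothetical stand-in for a learned feature extractor:
    non-overlapping mean pooling over windows of size k."""
    return [sum(x[i:i + k]) / k for i in range(0, len(x) - k + 1, k)]


def l1(a, b):
    """Mean absolute difference between two equal-length sequences."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def multi_level_loss(pred, target, levels=2, weights=None):
    """Sum an element-wise ("voxel-level") term and one term per coarser level.

    Level 0 compares the raw values; each further level compares pooled
    versions of both inputs, mimicking a comparison of higher-level features.
    """
    if weights is None:
        weights = [1.0] * (levels + 1)  # assumed equal weighting
    total = weights[0] * l1(pred, target)  # voxel-level term
    feat_pred, feat_target = pred, target
    for level in range(1, levels + 1):
        feat_pred = avg_pool(feat_pred)
        feat_target = avg_pool(feat_target)
        total += weights[level] * l1(feat_pred, feat_target)
    return total


if __name__ == "__main__":
    # A localized error persists through the coarser levels.
    print(multi_level_loss([1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]))
    # A high-frequency error is penalized only at the voxel level.
    print(multi_level_loss([0.0, 1.0, 0.0, 1.0], [1.0, 0.0, 1.0, 0.0]))
```

In this toy setting, an error that survives pooling is penalized at every level, while an error that averages out is penalized only at the finest level. In a real network the pooled values would be replaced by learned feature maps, and the weights would be tuned rather than fixed.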