Stroke is a challenging disease to diagnose in an emergency room (ER) setting. While an MRI scan is very useful in detecting ischemic stroke, it is usually not available in the ER because of space constraints and high cost. Clinical tests such as the Cincinnati Pre-hospital Stroke Scale (CPSS) and the Face Arm Speech Test (FAST) are helpful tools used by neurologists, but a neurologist may not be immediately available to conduct them. We emulate CPSS and FAST and propose a novel multimodal deep learning framework for computer-aided assessment of stroke presence, based on facial motion weakness and speech impairment, in patients with suspected stroke who show facial paralysis and speech disorders in an acute setting. Experiments on our video dataset, collected from actual ER patients performing specific speech tests, show that the proposed approach achieves diagnostic performance comparable to that of ER doctors, attaining a sensitivity of 93.12% while maintaining an accuracy of 79.27%. Each assessment can be completed in less than four minutes. These results demonstrate the high clinical value of the framework. In addition, when deployed on a smartphone, the framework will enable self-assessment by at-risk patients when stroke-like symptoms emerge.