Predicting mood, health, and stress can sound an early alarm against mental illness. Multi-modal data from wearable sensors provide rigorous and rich insights into one's internal states. Recently, deep learning-based features extracted from continuous high-resolution sensor data have outperformed statistical features in several ubiquitous and affective computing applications, including sleep detection and depression diagnosis. Motivated by this, we investigate multi-modal data fusion strategies featuring deep representation learning of skin conductance, skin temperature, and acceleration data to predict self-reported mood, health, and stress scores (0-100) of college students (N = 239). Our cross-validated results show that the early fusion framework achieves significantly higher (p < 0.05) prediction precision than late fusion for unseen users. Our findings therefore call attention to the benefits of fusing physiological data modalities at a low level and corroborate the predictive efficacy of the deeply learned features.

Clinical relevance - This establishes that, with automatically extracted features from multiple sensor modalities, choosing the proper fusion scheme can reduce the error of predicting new users' future wellbeing by as much as 13.2%.
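To make the distinction between early and late fusion concrete, the following is a minimal sketch and not the framework evaluated in this work. It assumes each modality (skin conductance, skin temperature, acceleration) has already been reduced to a fixed-length feature vector, substitutes a simple ridge regressor for the deep representation learning, and uses synthetic random data. All function names (`fit_ridge`, `early_fusion_predict`, `late_fusion_predict`) and dimensions are hypothetical.

```python
import numpy as np

def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge regression (bias column is regularized too, for simplicity)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    A = Xb.T @ Xb + lam * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)

def predict(w, X):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ w

def early_fusion_predict(train_feats, y_train, test_feats):
    """Early fusion: concatenate modality features, then fit a single model."""
    X_train = np.hstack(train_feats)
    X_test = np.hstack(test_feats)
    return predict(fit_ridge(X_train, y_train), X_test)

def late_fusion_predict(train_feats, y_train, test_feats):
    """Late fusion: fit one model per modality, then average their predictions."""
    preds = [
        predict(fit_ridge(X_tr, y_train), X_te)
        for X_tr, X_te in zip(train_feats, test_feats)
    ]
    return np.mean(preds, axis=0)

# Synthetic stand-ins for per-modality features (assumed shapes, random values).
rng = np.random.default_rng(0)
n_train, n_test, dim = 80, 20, 4
train_feats = [rng.normal(size=(n_train, dim)) for _ in range(3)]
test_feats = [rng.normal(size=(n_test, dim)) for _ in range(3)]
y_train = rng.uniform(0, 100, size=n_train)  # scores on the 0-100 scale

early = early_fusion_predict(train_feats, y_train, test_feats)
late = late_fusion_predict(train_feats, y_train, test_feats)
```

In this sketch the early fusion model can learn interactions across modalities because it sees all features jointly, whereas the late fusion model only combines per-modality outputs. To mirror the unseen-user setting, the train and test splits would need to be formed by user rather than by sample.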