Nonlinear 3D Face Morphable Model

As a classic statistical model of 3D facial shape and texture, the 3D Morphable Model (3DMM) is widely used in facial analysis, including model fitting and image synthesis. A conventional 3DMM is learned from a collection of well-controlled 2D face images with associated 3D face scans, and is represented by two sets of PCA basis functions. Due to the type and amount of training data, as well as the linear bases, the representation power of 3DMM can be limited. To address these problems, this paper proposes an innovative framework to learn a nonlinear 3DMM from a large set of unconstrained face images, without collecting 3D face scans. Specifically, given a face image as input, a network encoder estimates the projection, shape, and texture parameters. Two network decoders serve as the nonlinear 3DMM to map from the shape and texture parameters to the 3D shape and texture, respectively. With the projection parameter, 3D shape, and texture, a novel analytically-differentiable rendering layer is designed to reconstruct the original input face. The entire architecture is end-to-end trainable with only weak supervision. We demonstrate the superior representation power of our nonlinear 3DMM over its linear counterpart, and its contribution to face alignment and 3D face reconstruction.
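
The contrast between a linear 3DMM and a nonlinear decoder can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the single tanh hidden layer, and the weak-perspective camera (scale plus 2D translation) are assumptions chosen for illustration, and the rendering layer is reduced to projecting vertices to 2D.

```python
import math

def linear_3dmm(mean, basis, params):
    """Linear 3DMM: each output coordinate is mean + (basis row . params)."""
    return [m + sum(b * p for b, p in zip(row, params))
            for m, row in zip(mean, basis)]

def mlp_decoder(params, w1, b1, w2, b2):
    """Toy nonlinear decoder (assumed: one tanh hidden layer).

    Stands in for a network decoder that maps shape or texture
    parameters to flattened 3D coordinates or texture values.
    """
    hidden = [math.tanh(sum(w * p for w, p in zip(row, params)) + b)
              for row, b in zip(w1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(w2, b2)]

def weak_perspective(vertices, scale, tx, ty):
    """Assumed camera model: scale and translate x, y; discard depth z."""
    return [(scale * x + tx, scale * y + ty) for x, y, _z in vertices]

if __name__ == "__main__":
    # One vertex (3 coordinates) controlled by 2 shape parameters.
    mean = [0.0, 1.0, 2.0]
    basis = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    x, y, z = linear_3dmm(mean, basis, [1.0, 2.0])
    print(weak_perspective([(x, y, z)], scale=2.0, tx=1.0, ty=-1.0))
```

Because every step is a differentiable function of its inputs, a reconstruction loss on the projected output can in principle be propagated back to the parameters, which is the property the end-to-end training described above relies on.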

Disentangled Representation Learning GAN for Pose-Invariant Face Recognition

The large pose discrepancy between two face images is one of the key challenges in face recognition. The conventional approach to pose-robust face recognition either performs face frontalization on, or learns a pose-invariant representation from, a non-frontal face image. We argue that it is more desirable to perform both tasks jointly so that they can leverage each other. To this end, this paper proposes the Disentangled Representation Learning-Generative Adversarial Network (DR-GAN), with three distinct novelties. First, the encoder-decoder structure of the generator allows DR-GAN to learn the identity representation for each face image, in addition to image synthesis. Second, this representation is explicitly disentangled from other face variations such as pose, through the pose code provided to the decoder and pose estimation in the discriminator. Third, DR-GAN can take one or multiple images as the input, and generate one integrated representation along with an arbitrary number of synthetic images. Quantitative and qualitative evaluations on both constrained and unconstrained databases demonstrate the superiority of DR-GAN over the state of the art.
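
How a pose code and multiple input images might enter such a generator can be sketched as follows. This is a hypothetical illustration, not the DR-GAN implementation: the one-hot pose encoding, the noise vector, the helper names, and the confidence-weighted average used to integrate several identity features are all assumptions, since the abstract does not specify these details.

```python
def one_hot(index, size):
    """Encode a discrete pose index as a one-hot pose code (assumed encoding)."""
    if not 0 <= index < size:
        raise ValueError("pose index out of range")
    return [1.0 if i == index else 0.0 for i in range(size)]

def decoder_input(identity, pose_index, num_poses, noise):
    """Concatenate the identity feature, target pose code, and noise.

    The identity feature comes from the encoder; the pose code tells the
    decoder which view to synthesize, so pose need not be stored in the
    identity feature itself.
    """
    return list(identity) + one_hot(pose_index, num_poses) + list(noise)

def fuse_identities(features, weights):
    """Integrate several per-image identity features into one.

    Assumed rule: a weighted average with non-negative per-image weights.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    dim = len(features[0])
    return [sum(w * f[d] for f, w in zip(features, weights)) / total
            for d in range(dim)]

if __name__ == "__main__":
    fused = fuse_identities([[1.0, 0.0], [3.0, 2.0]], [1.0, 1.0])
    # Request one synthetic image per target pose from the same identity.
    for pose in range(3):
        print(decoder_input(fused, pose, num_poses=3, noise=[0.1]))
```

Looping over pose indices with a fixed fused feature mirrors the idea of producing an arbitrary number of synthetic images from one integrated representation.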

Learning to Fuse Information with Missing Modalities

One of the key GEOINT capabilities is the ability to automatically recognize a large array of objects from visual data. Depending on the resolution of the imagery, objects may range from specific locations or scenes, roads, buildings, and forests to vehicles, humans, etc. This is a technically challenging problem for both computer vision and machine learning because of the large variations in the appearance of these objects in the imagery. To address this problem, researchers have developed various fusion methods that combine information collected from multiple sensing modalities, such as RGB imagery, LiDAR point clouds, multispectral imaging, hyperspectral imaging, and GPS, to improve the reliability and accuracy of object recognition. This research direction is motivated by the ever-decreasing cost of sensing and, more importantly, by the complementary characteristics of multiple sensing modalities. Therefore, with the well-founded promise of improved object recognition performance, a great deal of data analysis research is urgently needed to take full advantage of this massive amount of multi-modality data.

All prior research on information fusion requires that the sensor data of all modalities be available for every training data instance. This requirement significantly limits the application of information fusion methods, as missing modalities abound in practical applications.

Recognizing missing modalities as a roadblock toward fulfilling this key GEOINT capability, we propose to develop powerful and computationally efficient approaches that can learn to fuse information from different sensors when a significant portion of the training data has missing modalities. The ultimate goal of our project is to develop a suite of computer vision and machine learning tools for geographical imagery analysis that can aid geospatial analysts in analyzing and classifying geographical images.
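
To make the missing-modality setting concrete, the sketch below shows a simple baseline that fuses whichever modalities are present for a given instance. It is only an illustration of the data layout, not the approach proposed in this project: the modality names, the use of `None` to mark a missing modality, and the plain average are all assumptions.

```python
def fuse_available(features_by_modality):
    """Average the feature vectors of the modalities that are present.

    A missing modality is marked with None (assumed convention) and is
    skipped, so an instance does not need every sensor to be usable.
    """
    present = [f for f in features_by_modality.values() if f is not None]
    if not present:
        raise ValueError("at least one modality must be present")
    dim = len(present[0])
    if any(len(f) != dim for f in present):
        raise ValueError("all present modalities must share one feature size")
    return [sum(f[d] for f in present) / len(present) for d in range(dim)]

if __name__ == "__main__":
    # Hypothetical instance: LiDAR was not collected for this location.
    instance = {
        "rgb": [1.0, 2.0],
        "lidar": None,
        "hyperspectral": [3.0, 4.0],
    }
    print(fuse_available(instance))
```

A fixed average treats every present modality as equally informative; a learned fusion would instead adapt to which sensors are available and how reliable each one is.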

Person Tracking and Motion Analysis for Medical Applications

New RGBD sensors enable precise tracking of human motion. This can be used for medical applications such as home care.