
Sculpture GAN


Sculpture GAN Sequence


In the past decade, there has been a considerable rise in the use of generative models for the purposes of exploring, supporting and understanding art and creativity. Unsupervised learning models are often used to generate visual works that challenge our contemporary views on art. Although generative models have proven themselves to be promising in 2D, there has been comparatively less work in the 3D generative space. This paper seeks to explore 3D shape synthesis using unsupervised learning models to research how sculpture art might arise from generative models. To what extent can deep learning systems learn representations that we might deem sculptural? The goal of this paper is to have a ProGAN predict new volumetric representations from a data distribution. The results show a promising method to compute occupancy grids using a generative adversarial network (GAN). The contributions of this research are twofold: (1) how a dataset can be composed of 3D objects so that a GAN converges during training, and (2) how a GAN can learn a complex data distribution from these representations and generate a novel 3D object. This research emphasises the importance of 3D shape synthesis using GANs and further identifies the vast range of possibilities that generative art can achieve through the lens of computational sculpturing.

Voxel Chiseling

Similar to the process of a classical sculptor, voxel chiseling is a matter of learning which bits to remove from a cube in order to synthesise a geometry. A classical sculptor chisels away bits of marble from a block, exposing the geometry that was hidden in the solid primitive. The Sculpture GAN learns geometrical representations in a similar way. Starting with a voxel grid of size 28³, the model learns at which indices it should remove a voxel in order to generate a 3D geometry of a given shape.
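The chiseling idea can be illustrated with a small NumPy sketch. It is not the model itself: the removal mask here is a hand-made, hypothetical one (everything outside a sphere), standing in for what the network would learn, and the names `RESOLUTION` and `chisel` are made up for this example.

```python
import numpy as np

RESOLUTION = 28  # edge length of the voxel grid described in the text


def chisel(remove_mask: np.ndarray) -> np.ndarray:
    """Start from a solid cube and remove every voxel flagged in remove_mask."""
    block = np.ones((RESOLUTION,) * 3, dtype=bool)  # the uncut block
    block[remove_mask] = False                      # chisel the flagged voxels away
    return block


# Hypothetical removal mask: drop every voxel further than 10 cells from the centre.
idx = np.indices((RESOLUTION,) * 3)
centre = (RESOLUTION - 1) / 2
dist = np.sqrt(((idx - centre) ** 2).sum(axis=0))
sculpture = chisel(dist > 10)

print(sculpture.shape, int(sculpture.sum()), "voxels remain")
```

In the real model the mask is not hand-written; it is implied by the generator's per-voxel output.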

The Initial Voxel Grid

Generative Adversarial Network

In 2014, Ian J. Goodfellow and colleagues proposed the Generative Adversarial Network (GAN). A GAN is a machine learning model that combines two separate networks, namely a Generator and a Discriminator. The Generator is trained to produce fake samples, and the Discriminator tries to classify whether a sample given to it comes from the data set or was created by the Generator. This framework has proved to be an extremely powerful class of neural networks for unsupervised learning.

The generative model can be thought of as analogous to a team of counterfeiters, trying to produce fake currency and use it without detection, while the discriminative model is analogous to the police, trying to detect the counterfeit currency. Competition in this game drives both teams to improve their methods until the counterfeits are indistinguishable from the genuine articles. -- Ian J. Goodfellow (2014)
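The two-player game can be written down as a pair of loss functions. The sketch below follows the original 2014 formulation, with the non-saturating generator loss suggested in that paper; it is an illustration only and not necessarily the loss used to train the Sculpture GAN. The function names and the `eps` clipping constant are assumptions made for this example.

```python
import numpy as np


def discriminator_loss(d_real, d_fake, eps=1e-7):
    """Binary cross-entropy: real samples should score 1, generated samples 0."""
    d_real = np.clip(d_real, eps, 1 - eps)  # avoid log(0)
    d_fake = np.clip(d_fake, eps, 1 - eps)
    return float(-(np.log(d_real).mean() + np.log(1 - d_fake).mean()))


def generator_loss(d_fake, eps=1e-7):
    """Non-saturating generator loss: reward fakes that the discriminator scores as real."""
    d_fake = np.clip(d_fake, eps, 1 - eps)
    return float(-np.log(d_fake).mean())


# d_real / d_fake are the discriminator's outputs in [0, 1] for a batch.
confident_police = discriminator_loss(np.array([0.9, 0.95]), np.array([0.1, 0.05]))
fooled_police = discriminator_loss(np.array([0.5, 0.4]), np.array([0.6, 0.5]))
print(confident_police < fooled_police)
```

Training alternates between a step that lowers the discriminator loss and a step that lowers the generator loss.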

Recently, deep learning on 3D data has been gaining attention. 3D model creation forms a large part of the development process in 3D graphical environments such as games and simulations. Understanding 3D environments is a vital element of modern computer vision systems, spanning a wide field of application scenarios from self-driving cars to autonomous robots. However, there is little research on 3D generative techniques, perhaps because 3D data structures are inherently more difficult to handle and computationally larger than their 2D counterparts.

Training Process

This Generative Adversarial Network (GAN) was trained for 10 epochs on a training set of 7,500 pre-processed mesh files, amounting to roughly 580 iterations in which the GAN’s Discriminator and Generator are updated. Each sample in the training set is a voxel grid of size 28³ in which each index holds a boolean value (true if a voxel should be present there, false otherwise).

The GAN predicts voxel maps that resemble the targets and, beyond that, a certainty value for each voxel index, which offers insight into the geometrical features of an embedding. These certainties score how sure the network is that a voxel should be placed at that index.

Using these values in a custom shader for each voxel results in something that could be considered a three-dimensional heat map of what the model deems to be the core features of the geometry.
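A rough sketch of how such certainties might be turned into a boolean grid and a per-voxel colour is shown below. It assumes the generator output lies in [0, 1] (for example after a sigmoid); the 0.5 threshold, the blue-to-red colour ramp, and the function names are illustrative choices, not the shader used for the renders.

```python
import numpy as np


def to_occupancy(certainty: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Keep a voxel wherever the certainty reaches the threshold."""
    return certainty >= threshold


def heat_colours(certainty: np.ndarray) -> np.ndarray:
    """Map certainty to RGBA: blue for low, red for high, opacity equal to certainty."""
    c = np.clip(certainty, 0.0, 1.0)
    return np.stack([c, np.zeros_like(c), 1.0 - c, c], axis=-1)


# Random values as a stand-in for a real generator output.
rng = np.random.default_rng(0)
certainty = rng.random((28, 28, 28))

occupancy = to_occupancy(certainty)  # boolean grid, shape (28, 28, 28)
colours = heat_colours(certainty)    # RGBA per voxel, shape (28, 28, 28, 4)
```

Tying opacity to certainty lets uncertain voxels fade out, so the confident core of the shape stands out in the render.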


Rendered Predictions

Pipeline to Generate the Dataset

A mesh is exported as a PLY file containing a list of vertices, edges, and faces. This gives a mesh representation expressed in a Cartesian coordinate system. Using Poisson disk sampling, a given number of points, each roughly equidistant from its neighbours, is sampled from the mesh surface. With Open3D, this sampled point cloud is then converted to a voxel grid of a given resolution. To enlarge the training set, data augmentation is applied: once the voxel grid of a given pose mesh is created, it is run through a pipeline that duplicates the grid and applies random noise to each copy.
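The steps above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual pipeline. The mesh is sampled with Open3D's `sample_points_poisson_disk`, but the voxelisation is done with plain NumPy binning rather than Open3D's voxel grid so that the grid indices stay explicit. The noise is modelled as random voxel flips, since the exact noise used is not specified. All function names, point counts, and the flip probability are hypothetical.

```python
import numpy as np


def sample_mesh(path: str, n_points: int = 5000) -> np.ndarray:
    """Poisson disk sample a PLY mesh into an (N, 3) point array (requires Open3D)."""
    import open3d as o3d  # imported lazily so the NumPy steps work without it

    mesh = o3d.io.read_triangle_mesh(path)
    pcd = mesh.sample_points_poisson_disk(number_of_points=n_points)
    return np.asarray(pcd.points)


def voxelise(points: np.ndarray, resolution: int = 28) -> np.ndarray:
    """Bin an (N, 3) point cloud into a boolean resolution^3 occupancy grid."""
    mins = points.min(axis=0)
    extent = float((points.max(axis=0) - mins).max())  # uniform scale keeps proportions
    extent = extent if extent > 0 else 1.0
    idx = np.floor((points - mins) / extent * (resolution - 1)).astype(int)
    grid = np.zeros((resolution,) * 3, dtype=bool)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return grid


def augment(grid: np.ndarray, copies: int = 10, flip_prob: float = 0.01, seed: int = 0) -> np.ndarray:
    """Duplicate the grid and flip each voxel independently with probability flip_prob."""
    rng = np.random.default_rng(seed)
    noise = rng.random((copies,) + grid.shape) < flip_prob
    return np.logical_xor(grid[None], noise)


# Stand-in point cloud; with a real mesh this would be sample_mesh("pose.ply").
points = np.random.default_rng(1).random((1000, 3))
grid = voxelise(points)
batch = augment(grid, copies=4)  # shape (4, 28, 28, 28)
```

Keeping the flip probability small means each augmented copy stays recognisably the same pose while still differing from the others.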


Data Pre-Processing Pipeline


Several other experiments were performed using pose meshes of famous sculptures. First, an interpretation of Le Penseur (Auguste Rodin, 1904) was made. Thereafter, an interpretation of Ambroise Paré (Pierre-Jean David d'Angers, 1840) was made.


Interpretations of 'Le Penseur' (left) and 'Ambroise Paré' (right)