Idol Face Project

ReactJS / Flask / PyTorch

Overview

I trained NVIDIA's StyleGAN2 model on an idol face dataset from the Kaggle competition KID-F (K-Pop Idol Dataset - Female). Training took approximately 30 hours on a single GPU at a resolution of 256 x 256. The final model was integrated into a web application with a React frontend and a Flask backend that generates new images on request. Its features include an adjustable truncation psi, noise input, seed preservation, and the ability to save favorite images. The output could be improved in two ways: a more powerful GPU would make higher-resolution images feasible, and a less biased dataset would help, since the current one is skewed towards certain celebrities. Expanding the dataset to cover a wider variety of celebrities should further improve the model's output.

For this project, I initially considered implementing a whole generative model from scratch, but I quickly realized that this was not practical. Modern deep learning models are often large and complex, and training one from scratch is highly likely to lead to poor results. I therefore took a transfer learning approach, starting from a pre-trained StyleGAN2 model from NVIDIA. After searching online, I found a GitHub repository containing an official PyTorch implementation of StyleGAN2. I reviewed the code thoroughly, which improved my understanding of the architecture. After working through several issues with setting up the environment and resolving conflicts between Conda and pip, I began training. Below are some of the images generated during the training process.

Initial pretrained Image

Image after 120k images

Final trained Image

As shown, the quality of the generated images gradually improves as training proceeds. It is also interesting to see how small details such as the camera angle, facial expression and skin color remain similar to the original photo. In total, the model was shown 900K images during training. Once training was complete, which took approximately 30 hours, I built a web application to showcase the model, using a React frontend and a Flask backend. Since few features were required, I kept the frontend minimal: the generated image is displayed at the center, with a few adjustable parameters below it.

Website Description

I apologize for the rough labelling in the screenshot; it was the easiest way I could think of. The numbered features of the application are as follows:

  1. Seed: This is an arbitrary value identifying the nth random input. The seed is capped at 2^32 - 1, but this does not mean the model can only produce 2^32 images. In theory, the generator can produce a practically unlimited number of images, since its input is a latent vector z of dimension 512. The seed simply lets the user identify a specific input without having to know the underlying tensor.
  2. Image: This section shows the image produced for that seed. The image persists when the page is reloaded because the data is stored in an external file. The output image size is 256x256. I wanted a higher resolution, but I expected the training time to increase steeply, so I gave up on that idea.
  3. Generate: This button sends a request to the Flask backend with the new seed and the psi value. The backend processes the request, generates the image and returns it. The model takes a few seconds to load at the start, but after that the average generation time is under a second.
  4. Noise and Keep Seed: Keep Seed is self-explanatory: it keeps the same seed for the next generation, which lets users experiment with different psi values on the same seed. Noise is a StyleGAN feature that adds slight variation to details of the image such as the hair and skin texture.
  5. Truncation Psi: The slider lets the user change the truncation psi. Truncation is a StyleGAN feature that pulls the sampled latent vector towards the average latent, so the input is restricted to a region around that average. The closer psi is to 1, the greater the variation between images; however, most of those images were quite abnormal and disturbing (less of a human figure). Conversely, the closer psi is to 0, the more generic the image became, approaching the "most average" image of a K-pop idol.
  6. Set Favorite: The star button marks the image as a favorite. The seed information is stored in a separate JSON file.
  7. Favorites: This routes to a separate page that lists all favorite images in a grid. The page is very simple, so it is not shown in this overview.
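
The seed and truncation psi behaviour described in items 1 and 5 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's actual code: it uses only the standard library, the names `latent_from_seed` and `truncate` are hypothetical, and the truncation is applied to a plain vector with a placeholder average, whereas StyleGAN applies it to the intermediate latent produced by its mapping network.

```python
import random
from typing import List

Z_DIM = 512             # latent size mentioned in the text
MAX_SEED = 2**32 - 1    # seed cap mentioned in the text


def latent_from_seed(seed: int, dim: int = Z_DIM) -> List[float]:
    """Deterministically draw a standard-normal latent vector from a seed.

    Hypothetical helper: the same seed always yields the same vector,
    which is why a seed is enough to identify a specific input.
    """
    if not 0 <= seed <= MAX_SEED:
        raise ValueError("seed must be in the range [0, 2**32 - 1]")
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def truncate(w: List[float], w_avg: List[float], psi: float) -> List[float]:
    """Truncation trick: move w towards the average latent w_avg.

    psi = 1 leaves w unchanged; psi = 0 collapses it onto w_avg,
    which corresponds to the "most average" output.
    """
    return [avg + psi * (x - avg) for x, avg in zip(w, w_avg)]


if __name__ == "__main__":
    w = latent_from_seed(42)
    w_avg = [0.0] * Z_DIM  # placeholder; a real model tracks this average during training
    for psi in (0.0, 0.5, 1.0):
        w_trunc = truncate(w, w_avg, psi)
        # The largest deviation from the average shrinks as psi approaches 0.
        spread = max(abs(x - a) for x, a in zip(w_trunc, w_avg))
        print(f"psi={psi}: max distance from average = {spread:.3f}")
```

With Keep Seed enabled, only `psi` changes between requests, so the same vector from `latent_from_seed` is reused and the differences between outputs come from the truncation alone.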

Overall, this project was entertaining. Watching the model slowly start to draw convincing images, just as I had initially planned, was like watching a baby grow up. React and Flask worked almost flawlessly together, and the whole process, excluding the wait for training, took approximately a week. The website could be improved by deploying it online, adding more features, and fixing a bug that reloads the page after every generation. The GitHub code can be found using the button below!

Technologies

React

Flask

PyTorch

JavaScript

Python