I trained NVIDIA's StyleGAN2 model on an idol face dataset obtained from the Kaggle competition KID-F (K-Pop Idol Dataset - Female). Training took approximately 30 hours on a single GPU at a resolution of 256 x 256. The final model was integrated into a web application with a React frontend and a Flask backend, which generates new images on request. Its features include adjustable truncation psi, noise input, seed preservation, and the ability to save favorite images. To improve the model's output further, I suggest training on a more powerful GPU, which would allow higher-resolution images. I also recommend a less biased dataset: the current one is skewed toward certain celebrities, and expanding it to cover a wider variety of celebrities should further improve the model's performance.
For this project, I initially considered implementing a whole generative model from scratch, but I quickly realized that this was not practical. Modern deep learning models are often large and complex, and training one from scratch is highly likely to lead to poor results. I therefore decided to use transfer learning, starting from a pre-trained StyleGAN2 model from NVIDIA. After searching online, I found a GitHub repository containing an official PyTorch implementation of StyleGAN2. I read through the code thoroughly, which improved my understanding of the architecture. Although I ran into several issues setting up the environment and resolving conflicts between Conda and Pip, I eventually got training started. Here are some of the images generated during the training process.
Initial pretrained Image
Image after 120k images
Final trained Image
As shown, the quality of the generated images gradually improves as training proceeds. It is also interesting to see how small details such as the camera angle, facial expression and skin color remain similar to the original photo. In total, the model was shown 900K images during training. After training finished, which took approximately 30 hours, I built a web application to showcase the model, using a React frontend and a Flask backend. Since not many features were required, I kept the frontend minimal: the generated image is displayed at the center, with a few adjustable parameters below it.
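To make the frontend/backend split concrete, here is a minimal sketch of what a Flask endpoint serving generation requests could look like. It is not the project's actual code: the route name /generate, the JSON fields seed and psi, and the fake_generate function (a stand-in for the real StyleGAN2 generator call) are all assumptions made for illustration, and input validation is left out.

```python
import random

from flask import Flask, jsonify, request

app = Flask(__name__)


def fake_generate(seed, psi):
    """Hypothetical stand-in for the StyleGAN2 generator call.

    Returns a few numbers derived from the seed and psi instead of an image.
    """
    rng = random.Random(seed)
    return [round(psi * rng.random(), 4) for _ in range(4)]


@app.route("/generate", methods=["POST"])  # route name is an assumption
def generate():
    body = request.get_json(silent=True) or {}
    # Reuse the client's seed if one was sent, otherwise pick a fresh one.
    seed = body.get("seed")
    if seed is None:
        seed = random.randrange(2**32)
    psi = float(body.get("psi", 1.0))
    # Echo the seed back so the frontend can store it and regenerate the image.
    return jsonify({"seed": seed, "psi": psi, "pixels": fake_generate(seed, psi)})
```

Returning the seed in the response is one simple way to support seed preservation: the React side only needs to send the same seed and psi again to get the same image back.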
I apologize for the terrible labelling; it was the easiest way I could think of. Anyway, I will go through the different features of the application:
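As one illustration, the seed and truncation psi controls can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions, not the application's code: the function names and the tiny latent size are made up for the example, the average latent is assumed to be zero, and the mapping network that StyleGAN2 applies before truncation is skipped, so the sampled vector is treated as if it were already in the intermediate latent space.

```python
import random

LATENT_DIM = 8  # illustrative only; StyleGAN2 uses 512-dimensional latents


def sample_latent(seed, dim=LATENT_DIM):
    """Draw a latent vector from a standard normal, reproducibly for a given seed."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def truncate(w, w_avg, psi):
    """Truncation trick: interpolate from the average latent toward w by a factor psi."""
    return [avg + psi * (x - avg) for x, avg in zip(w, w_avg)]


w_avg = [0.0] * LATENT_DIM  # assumption: zero average latent, for illustration
w = sample_latent(seed=42)
print(truncate(w, w_avg, psi=0.5))
```

With psi = 1 the latent is left unchanged, and with psi = 0 every sample collapses to the average latent. Values in between trade variety for more typical-looking faces. Because the sampling depends only on the seed, keeping the seed is enough to reproduce an image later.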
Overall, this project was entertaining. It was like watching a baby grow up, as the model slowly started to draw cool images the way I had initially planned. React and Flask worked together almost flawlessly, and the whole process, excluding the wait for training, took approximately a week. The website could be improved by deploying it online, adding more features, and fixing a bug that reloads the page after every generation. The GitHub code can be found using the button below!
Technologies
React
Flask
PyTorch
JavaScript
Python