Unleashing Creativity with FastCLIPstyler: Personalized Image Filters at Your Fingertips

Sertis
6 min read · Aug 2, 2024


The world of digital photos is always changing, bringing new ways to make your images truly shine. Social media platforms and photo editing apps offer plenty of filters to play with, but often these just don’t capture the unique essence you’re looking for. You might find yourself scrolling through options that don’t quite hit the mark, limited to what’s already been designed. That’s where FastCLIPstyler [1] steps in — a revolutionary tool that changes the game by letting you create personalized filters from scratch, based simply on your own descriptions. This isn’t just about picking a filter; it’s about crafting your own signature style.

Imagine wanting to apply a unique artistic touch to your photos, something that screams ‘you’, without the hassle of scrolling through endless filter options. Perhaps you’ve looked at a breathtaking sunset and wished you could infuse your photos with the same warm, golden hues. Or maybe you’ve admired the style of a classic painting and wondered how your photograph would look if it mirrored that unique aesthetic.

FastCLIPstyler makes this a reality by letting you create on-demand, personalized image filters from just a text description. Instead of simply choosing a filter, you craft one. Powered by advanced generative AI, FastCLIPstyler brings your creative vision directly into the photo editing process: describe the style you want, type it in, and watch as FastCLIPstyler translates your words into a visual style that seamlessly enhances your photos. It’s like having a personal artist at your fingertips.

How it works

FastCLIPstyler is built on a foundation of advanced AI technology that blends two groundbreaking models: CLIP [2] and a style transfer model developed by Google [3]. Let’s break down how these components work together to let you create custom photo filters from nothing more than a text description.

Understanding CLIP

CLIP, developed by OpenAI, is like a smart assistant that not only matches words to images but also understands the context behind them. Give CLIP a description like “Bright and Airy” and show it two pictures — one that’s brightly lit with soft lighting, and another that’s dark and moody — and CLIP can evaluate which image best matches your description.

It does this by assigning a closeness score to each image based on its similarity to the description. The image that most closely matches “Bright and Airy” receives the higher score, indicating a better match; the brightly lit image with soft colors would likely score higher than the darker, moodier one. This scoring system lets CLIP quantify how well each image reflects the given words, drawing on the vast array of text-image pairs it saw during training. It is particularly useful for pinpointing the most fitting image quickly and accurately from verbal cues alone.

Thus, in a practical application, CLIP helps to identify and generate visual styles that best match the textual descriptions, making it ideal for tasks like creating customized filters based on specific style descriptions.
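To make the closeness score concrete, here is a minimal sketch of CLIP-based scoring using a publicly available checkpoint on Hugging Face. This is an illustration only: the checkpoint name is one public CLIP variant (not necessarily the one FastCLIPstyler uses), and the image file names are placeholders.

```python
# Minimal sketch: score two images against a text description with CLIP.
# Assumes the public "openai/clip-vit-base-patch32" checkpoint; the CLIP
# variant used by FastCLIPstyler itself may differ.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Placeholder file names for the two candidate photos.
images = [Image.open("bright_photo.jpg"), Image.open("moody_photo.jpg")]
inputs = processor(text=["Bright and Airy"], images=images,
                   return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds each image's (scaled) cosine similarity to the
# prompt -- the "closeness score" described above.
scores = outputs.logits_per_image.squeeze(-1)
print(scores)  # the brightly lit photo should receive the higher score
```

Running this on the two example photos would print a higher score for the bright image, mirroring the matching behaviour described above.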

Style Transfer with Google’s Model

Google’s style transfer technology, referred to here as the Ghiasi style transfer model after its lead author, allows you to transform the appearance of your photographs by adopting the artistic style of another image. Essentially, it lets you take the visual elements from one image, like the colors and textures of a famous painting, and apply them to your own photo, changing its look while keeping the original content the same.

You start with two images. The first is your target image, which is the picture you want to modify. This could be anything from a portrait to a landscape photo. The second is the style reference image, which is a picture that has the artistic qualities you admire and want to transfer. This could be an artwork by a renowned artist such as Van Gogh or Monet, or any other image whose style catches your eye.

Google’s technology uses advanced machine learning models that have been trained to recognize and analyze the artistic details in the style reference. These details include brush strokes, color distribution, and textural patterns. The model then applies these stylistic elements to your target image. It’s not just a superficial overlay of colors; the model intelligently blends the styles, considering how the light, texture, and colors interact in both images to ensure that the final product is cohesive and visually appealing.
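As a concrete illustration, a version of this model is publicly available on TensorFlow Hub as Magenta’s “arbitrary image stylization” module, which implements the Ghiasi et al. approach. Here is a minimal sketch of reference-based style transfer with it (file names are placeholders, and the module URL is assumed to be current):

```python
# Minimal sketch: apply the style of one image to another using the
# publicly hosted Magenta arbitrary-image-stylization model (Ghiasi et al.).
import tensorflow as tf
import tensorflow_hub as hub

hub_model = hub.load(
    "https://tfhub.dev/google/magenta/arbitrary-image-stylization-v1-256/2")

def load_image(path):
    # Decode to RGB, scale pixel values to [0, 1], add a batch dimension.
    img = tf.image.decode_image(tf.io.read_file(path), channels=3)
    img = tf.image.convert_image_dtype(img, tf.float32)
    return img[tf.newaxis, ...]

content = load_image("my_photo.jpg")  # the target image to restyle
# The style network was trained on 256x256 style images, so resize first.
style = tf.image.resize(load_image("starry_night.jpg"), (256, 256))

# Returns the content image re-rendered with the reference image's style.
stylized = hub_model(tf.constant(content), tf.constant(style))[0]
```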

Using Google’s style transfer technology is like having a sophisticated filter for your photos, where you can mimic the look of any artwork or style you admire. The catch, however, is that you need to provide a specific reference image whose style you want to replicate. This requirement can sometimes be a hurdle because finding the perfect image with the desired style might not always be feasible.

Enter FastCLIPstyler

FastCLIPstyler acts as a bridge between CLIP and Google’s style transfer model. It translates the text representations understood by CLIP into a form that the Ghiasi style transfer model can use to apply artistic styles to content images.

Model Workflow and Training:

  1. Translation Layer: FastCLIPstyler includes a translation layer specifically designed to convert the embeddings produced by CLIP into a style representation that the Ghiasi style transfer model can consume (a minimal sketch appears after this list). This layer is crucial because CLIP and the Ghiasi model represent visual style in fundamentally different ways.
  2. Generating data for training: FastCLIPstyler is trained using a creative method where different words describing colors and artistic styles are mixed together to form unique descriptions. For example, words like “red”, “azure”, and “golden” are paired with terms like “abstract”, “impressionist”, and “cubist” to create combinations such as “red abstract” or “golden cubist”. This approach helps FastCLIPstyler learn a variety of visual styles from just simple text descriptions.
  3. Training the layer: By training on these varied prompts, the model learns to transform any picture to match the style described in words. So if someone asks it to make an image look like a “turquoise surreal” painting, it knows how to do that, even if it has never seen that particular combination before.
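To ground these steps, here is a hypothetical sketch of two of the ingredients above: combinatorial prompt generation and a small translation network that maps a CLIP embedding into the Ghiasi model’s style space. The layer sizes and the 512-to-100 dimensions are illustrative assumptions, not the paper’s exact architecture.

```python
# Hypothetical sketch of FastCLIPstyler's training ingredients; dimensions
# and architecture are assumptions for illustration only.
import random
import torch
import torch.nn as nn

COLOURS = ["red", "azure", "golden", "turquoise"]
ART_STYLES = ["abstract", "impressionist", "cubist", "surreal"]

def sample_prompt():
    # Combinatorial prompts such as "golden cubist" cover many styles
    # the model will never have seen as exact pairs.
    return f"{random.choice(COLOURS)} {random.choice(ART_STYLES)}"

class TranslationLayer(nn.Module):
    """Maps a CLIP text embedding to a Ghiasi-style vector (assumed dims)."""
    def __init__(self, clip_dim=512, style_dim=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(clip_dim, 256),
            nn.ReLU(),
            nn.Linear(256, style_dim),
        )

    def forward(self, clip_embedding):
        return self.net(clip_embedding)

# Training idea, in outline: embed sample_prompt() with CLIP's text encoder,
# translate it to a style vector, stylize a content image with the Ghiasi
# network, then score the stylized result against the prompt with CLIP and
# backpropagate so the output matches the description.
```

Because the translation layer is small and runs in a single forward pass, styling a photo at inference time requires no per-image optimization, which is the key difference from the optimization-based methods discussed next.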

Comparison with Existing Technologies

While other approaches like CLIPstyler [4] and the CLVA network [5] also aim to create personalized image filters, they often fall short in speed or artistic output. CLIPstyler, for example, is an optimization-based approach and can take much longer to process requests, which isn’t ideal for users looking for quick results. The CLVA network, while innovative, may not always meet users’ artistic expectations, as it can produce results that some find less visually appealing. FastCLIPstyler sets itself apart by offering real-time processing while ensuring that the artistic quality of the filters is both pleasing and closely aligned with user descriptions. This balance of speed and quality makes it a superior choice for those seeking efficient and aesthetically satisfying photo edits.

Try It Yourself: Interactive Demo

Experience the magic of FastCLIPstyler firsthand by trying out our interactive demo: FastCLIPstyler Demo

This demo allows you to see how your own photos can be transformed with just a few words describing your desired style. Dive into the world of custom filters and discover how easy and effective it is to personalize your photos with FastCLIPstyler.

Acknowledgements

Our research has been peer-reviewed and published at WACV 2024. AI researchers from Sertis Vision Lab, namely Ananda Padhmanabhan Suresh, Sanjana Jain, Pavit Noinongyao, Ankush Ganguly, Ukrit Watchareeruetai, and Aubin Samacoits, collaboratively developed the novel FastCLIPstyler framework for text-based image style transfer.

Read our peer-reviewed research here: FastCLIPstyler: Optimisation-free Text-based Image Style Transfer Using Style Representations.

References

[1] Suresh, A.P., Jain, S., Noinongyao, P., Ganguly, A., Watchareeruetai, U., & Samacoits, A. (2024). FastCLIPstyler: Optimisation-free Text-based Image Style Transfer Using Style Representations. 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 7301–7310.

[2] Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., & Sutskever, I. (2021). Learning Transferable Visual Models From Natural Language Supervision. International Conference on Machine Learning.

[3] Ghiasi, G., Lee, H., Kudlur, M., Dumoulin, V., & Shlens, J. (2017). Exploring the structure of a real-time, arbitrary neural artistic stylization network. ArXiv, abs/1705.06830.

[4] Kwon, G., & Ye, J. (2022). CLIPstyler: Image Style Transfer with a Single Text Condition. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 18041–18050.

[5] Fu, T., Wang, X.E., & Wang, W.Y. (2022). Language-Driven Artistic Style Transfer. European Conference on Computer Vision (ECCV).
