Foley.ai is an AI-powered Figma plugin that assists UI/UX designers by auto-generating sound effects matched to UI elements. This tool enhances creativity and efficiency by allowing sound generation through text, voice input, or reference files. By streamlining the design workflow and reducing time spent on sound design, Foley.ai enables designers to concentrate on creativity and user experience enhancement.
In collaboration with: Amy Kim
Instructors: Jenny Rodenhouse, Todd Masilko
In filmmaking, "foley" refers to the reproduction of everyday sound effects that are added to films, videos, and other media in postproduction to enhance audio quality.
Our product serves as an AI-powered Foley artist, helping designers craft sound effects that harmonize with their UI/UX products. It is a Figma plugin that uses AI and machine learning (ML) to automatically generate sound effects that match UI elements. This tool streamlines the design process, reduces time spent on sound design, and lets designers focus more on creativity and enhancing the user experience.
Foley.ai aims to transform the traditionally manual sound design workflow by automatically generating and matching sounds to UI elements, reducing the time and effort involved and freeing designers to focus on creativity and user experience optimization.
After testing various image captioning and object detection tools on Runway, we discovered that AI can identify multiple objects in an image and then generate descriptive narratives as sentences or keywords. This capability forms the first step of our product pipeline.
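To illustrate this first step, the caption a captioning model returns for a UI element can be distilled into a prompt for sound generation. A minimal sketch, assuming a hypothetical keyword-to-sound mapping (the captions and hints below are placeholders, not actual Runway output):

```python
# Sketch: turn an image caption into a text prompt for sound generation.
# The keyword-to-sound mapping is illustrative, not model output.

UI_SOUND_HINTS = {
    "button": "a short soft click",
    "bell": "a gentle notification chime",
    "toggle": "a quick mechanical switch flick",
    "keyboard": "light key tapping",
}

def caption_to_sound_prompt(caption: str) -> str:
    """Collect sound hints for every UI keyword found in the caption."""
    words = caption.lower().split()
    hints = [hint for key, hint in UI_SOUND_HINTS.items() if key in words]
    return ", ".join(hints) if hints else caption  # fall back to the raw caption

# e.g. caption_to_sound_prompt("a blue rounded button with a bell icon")
```

In the real plugin the caption would come from the detection model and the mapping would be learned or curated, but the shape of the transformation is the same: image, then description, then sound prompt.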
We experimented with AudioGen, a state-of-the-art text-to-sound model, and the results were quite interesting. These experiments showed the generated audio is robust enough to make our concept viable. We also gained insights that shape the user experience design of our product.
1. Generating sound takes a bit of time, especially on the first creation.
💡 Add a loading animation while the generative audio is being created.
2. Users can specify text, duration, and seed as input, and receive an audio file and its seed number as output.
💡 Allow users to modify the text prompt and duration.
💡 Enable users to save audio seeds to their sound library for quick access and use in future prompts.
3. In addition to text input, users can upload an audio file or record a sound using a microphone as input.
💡 Let users add reference sounds by recording through a microphone or uploading files.
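The observations above map naturally onto a request/response shape for the generator. A hedged sketch, assuming a hypothetical `generate_audio` backend (the AudioGen model itself is not called here); it shows how text, duration, seed, and an optional reference recording travel together, and how a returned seed can be saved to a sound library for reuse in future prompts:

```python
import random
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class SoundRequest:
    text: str                          # prompt describing the desired sound
    duration: float = 3.0              # seconds of audio to generate
    seed: Optional[int] = None         # reuse a saved seed for reproducibility
    reference: Optional[bytes] = None  # uploaded file or microphone recording

@dataclass
class SoundLibrary:
    """Saved seeds, keyed by a user-chosen label, for quick reuse."""
    seeds: dict = field(default_factory=dict)

    def save(self, label: str, seed: int) -> None:
        self.seeds[label] = seed

    def request_from(self, label: str, text: str, duration: float = 3.0) -> SoundRequest:
        return SoundRequest(text=text, duration=duration, seed=self.seeds[label])

def generate_audio(req: SoundRequest) -> tuple:
    """Hypothetical backend call: returns (audio bytes, seed used).
    A real implementation would invoke a text-to-sound model here,
    showing a loading animation while it runs."""
    seed = req.seed if req.seed is not None else random.randrange(2**31)
    return b"", seed  # placeholder audio bytes

# First generation: no seed yet, so the backend picks one; save it.
library = SoundLibrary()
_, used_seed = generate_audio(SoundRequest(text="a soft UI click"))
library.save("soft click", used_seed)

# Later: rebuild the same sound from the saved seed with a new duration.
req = library.request_from("soft click", text="a soft UI click", duration=1.5)
```

The names and data shapes here are assumptions for illustration; the point is that exposing seed and duration as first-class fields is what makes the "save to library and reuse" interaction possible.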
The core attributes of our project are whimsicality, creativity, and uniqueness. We designed our plugin to be an inviting playground for users, encouraging them to experiment freely and creatively.
The logo, with its dynamic gradient colors, visually represents the fluctuation and range of sound pitches, mirroring the vocal inflections heard when pronouncing our name. This decision enriches our brand identity with rhythm, movement, and vibrancy.
Our plugin's natural language tone embodies whimsicality. Each prompt is crafted to feel like casting a magical spell, transforming user inputs into the envisioned sounds.
For our design system, we've chosen a color palette and typeface that reflect our brand values. As a Figma plugin, we developed UI components based on Figma's framework, ensuring the content is both intuitive and capable of sparking joy and inspiration.