Stable Diffusion 3 is here! – by Shubham Saboo

February 23, 2024
RSS
News Trends
0

Today’s top AI Highlights:

Stable Diffusion 3 enters early preview
The Pope, Vikings, British medieval kings, everyone is brown – Google Gemini
Fireworks’ GPT-4-level function calling model
Give Feedback to GPTs
Prepare for your next interview with AI

& so much more!

Read time: 3 mins

Stability AI has announced its latest iteration of its text-to-image model, Stable Diffusion 3entering the early preview phase. The version demonstrates enhanced performance across multi-subject prompts, image quality, and notably improved spelling abilities, setting a new benchmark for creativity and precision in the generative AI space.

Key Highlights:

Stable Diffusion 3 introduces a suite of models with parameters ranging from 800 million to 8 billion, designed to offer users flexibility in terms of scalability and output quality.
At the heart of these enhancements is the diffusion transformer architecture, complemented by flow-matching techniques. This architectural choice, similar to OpenAI’s Sora, is aimed at delivering superior performance in generating detailed and coherent images from text prompts.
A standout feature of Stable Diffusion 3 is its significantly improved ability to handle text, spelling out full sentences with coherent style, a leap forward attributed to the integration of transformer architecture and additional text encoders.

As AI models are being used by global audiences, it is imperative to take care of gender and racial diversity to ensure fairness and inclusivity while building trust. But does this necessitate the complete exclusion of white individuals from historical contexts?!

Social media platforms have blasted with Google Gemini’s image generation tool going berserk. The tool, in an attempt to avoid “promoting harmful stereotypes”, has altered images of white historical figures and even contemporary individuals, portraying them as brown. For example, the Pope, Vikings, 17th-century physicists, and even Google’s founders are brown Asians.

The controversy has spurred discussions about RLHF to improve the performance and behavior of AI models. The process involves humans telling the model what constitutes appropriate or not, aiming to align the model with human values. Putting people behind Gemini in the spotlight, Gemini users are slamming Jack Krawczyk, Senior Director of Product at Google who leads Gemini, pointing to his older tweets where he is criticizing white people and racism in America.

Meanwhile, Google has acknowledged the inaccuracies saying it “misses the mark” and has decided to pause the model for some time. What is intriguing about the whole situation is that it was not earlier reported or fixed by Google, Gemini was caught ‘unaware’. The latest Gemini iteration was announced just a couple of days after OpenAI released its text-to-video AI model Sora. Even Google’s Bard had a reputation for being stubborn and erratic at times. Are all these releases hurried in just a bid to compete?

At the altar, Google and Reddit have entered into a partnership reportedly at $60 million per year, giving Google access to Reddit’s real-time data for AI training purposes. This collaboration is aimed at enhancing Google’s AI models by utilizing Reddit’s extensive content corpus through its data API, while Reddit will benefit from Google’s Vertex AI to improve its search capabilities.

Fireworks.ai has just released FireFunction V1a new function-calling model that promises to enhance developer capabilities in integrating external knowledge into LLM applications. With a performance 4x faster than GPT-4 and built upon the high-quality Mixtral 8x7B model, FireFunction V1 is designed to significantly improve upon its predecessor, offering both speed and accuracy for a wide range of applications.

Key Highlights:

FireFunction V1 outpaces GPT-4 in response times, with latencies ranging between 0.4 to 0.6 seconds, compared to GPT-4’s 2.3 to 3.0 seconds. This speed enhancement does not compromise the model’s accuracy or its ability to handle real-world use cases.
When compared with GPT-4 and Mixtral-Instruct + JSON mode, it demonstrates superior or comparable accuracy across different metrics. For instance, with fewer than 5 functions, FireFunction and GPT-4 both achieve an accuracy of 87.88%, while with more than 10 functions, FireFunction’s accuracy is at 84.43% versus GPT-4’s 89.16%. Additionally, FireFunction V1 has improved its response accuracy for multilingual inputs.
FireFunction V1 introduces an enhanced ability to generate structured output and make routing decisions, including a novel option to configure “tool_choice” to ‘any’, forcing a function call regardless of the scenario. This feature, alongside the model’s refined capabilities in handling complex JSON specifications, empowers developers to create more dynamic and responsive applications.

You can now rate GPTs and provide private feedback directly to the builder, enhancing the user experience and improving the quality of custom GPTs. The new GPTs’ “About” section has been expanded to include various details such as builder social profiles, ratings, categories, number of conversations, conversation starters, and other GPTs by the builder. OpenAI

GG9sHk_WIAAdtlc.mp4 [video-to-gif output image]

Talently.ai’s Mock Interview: Prepare for your next interview with AI which conducts live interviews and gives human-like feedback, posing tailored questions and instant results to help you excel in any role worldwide, boasting a 4.7/5 inspection level for thorough candidate assessment.
SheetSavvy AI: Streamlines spreadsheet tasks by utilizing AI to automate categorization, data extraction, text writing, and cleaning, saving time. Integrated directly into spreadsheets, it offers intuitive AI-powered assistance without requiring external tools or programming expertise.
Magic Patterns: AI copilot for user interfaces that offers design inspiration and frontend code generation from simple text prompts. By connecting to custom UI libraries via platforms like Figma, Storybook, or Github, it facilitates generating code tailored to specific themes and components.

😍 Enjoying so far, TWEET NOW to share with your friends!

On Google and Gemini – having been part of organizations where there was an implicit progressive political leaning, I can see why it would have been hard for anyone to point out the obvious.
No employee can easily file a bug as you would have to wade through layers of policy and unspoken but understood cultural rules.
No one can be the kid from the Emperor’s New Clothes as that kid would have probably been instantly thrown into prison – or in this case, dragged before HR/ silently put in a penalty box. ~ Sriram Krishnan – sriramk.eth
The worst part about #GeminiGate is that Google was caught unaware. This means that the alarm was not raised internally. They were genuinely surprised.
Which in turn tells us that either nobody in a position to do something had noticed, or the people who noticed did not feel at liberty to speak up. ~ Alexandros Marinos
If Elon Musk is coming after Google, it means he senses there’s a big opportunity in the search market. We are entering the age of a new internet, that’s truth seeking, uncensored and built for the curious mind. ~ Aravind Srinivas
If AGI is imminent why do you care about nvidia earnings anon ~ Pata van Goon

That’s all for today!

See you tomorrow with more such AI-filled content. Don’t forget to subscribe and give your feedback below 👇

⚡️ Follow me on Twitter @Saboo_Shubham for lightning-fast AI updates and never miss what’s trending!!

PS: I curate this AI newsletter every day for FREE, your support is what keeps me going. If you find value in what you read, share it with your friends by clicking the share button below!

Share Unwind AI

Source link