Aya Released! OpenAI Sora, Google Gemini 1.5 and Gemma, $7 trillion for a new AI chip project, V-JEPA, Stable Diffusion 3, and the first AI heist
AI Connections #48 - a weekly newsletter rounding up interesting blog posts, articles, videos, and podcast episodes about AI
AYA IS RELEASED
The Aya Dataset is a multilingual instruction fine-tuning dataset curated by an open-science community via the Aya Annotation Platform from Cohere For AI. It contains a total of 204k human-annotated prompt-completion pairs along with demographic data about the annotators.
Dataset: https://huggingface.co/datasets/CohereForAI/aya_dataset
Dataset paper: https://arxiv.org/abs/2402.06619
The Aya Collection is a massive multilingual collection consisting of 513 million instances of prompts and completions covering a wide range of tasks. It incorporates instruction-style templates from fluent speakers, applies them to a curated list of datasets, and adds translations of instruction-style datasets into 101 languages. The Aya Dataset, the human-curated multilingual instruction and response dataset described above, is also part of this collection. See the paper for more details.
Aya collection: https://huggingface.co/datasets/CohereForAI/aya_collection
The Aya model is a massively multilingual generative language model that follows instructions in 101 languages. Aya outperforms mT0 and BLOOMZ on a wide variety of automatic and human evaluations despite covering double the number of languages. The model is trained on xP3x, the Aya Dataset, the Aya Collection, a subset of the Data Provenance collection, and ShareGPT-Command. The checkpoints are released under an Apache 2.0 license to further the mission of multilingual technologies empowering a multilingual world.
Model: https://huggingface.co/CohereForAI/aya-101
Model paper: https://arxiv.org/abs/2402.07827
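If you want to try Aya yourself, here is a minimal sketch using the Hugging Face datasets and transformers libraries; it assumes the standard seq2seq interface (Aya 101 is built on mT5), with the dataset and model IDs taken from the links above.

```python
# Minimal sketch: load the Aya dataset and query the Aya model via
# Hugging Face. Assumes the standard seq2seq interface (Aya 101 is mT5-based).
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# One human-annotated prompt-completion pair plus annotator metadata.
aya = load_dataset("CohereForAI/aya_dataset", split="train")
print(aya[0])

tokenizer = AutoTokenizer.from_pretrained("CohereForAI/aya-101")
model = AutoModelForSeq2SeqLM.from_pretrained("CohereForAI/aya-101")

# Aya follows instructions in 101 languages; prompt it in any of them.
inputs = tokenizer("Translate to Turkish: Good morning!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```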
SPECIAL OFFER
Take advantage of our special bundle sale on Linkedist Academy courses, including LinkedIn Personal Branding and LinkedIn Sales. This is a great chance to improve your LinkedIn profile and sales skills at a discounted price.
READ
“Sora: Creating video from text” blog post by OpenAI: READ
OpenAI released Sora, a model that can generate videos up to a minute long while maintaining visual quality and adherence to the user's prompt. Sora is able to generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background. The model understands not only what the user has asked for in the prompt but also how those things exist in the physical world.
“Video generation models as world simulators” blog post by OpenAI: READ
This technical report outlines a method for turning diverse visual data into a unified representation that enables large-scale training of generative models, and evaluates the capabilities and limitations of Sora, a generalist model that generates videos and images of various durations, aspect ratios, and resolutions, up to a minute of high-definition video. Unlike previous work that often focused on specific types of visual data or fixed video sizes, Sora covers a much broader range of visual data generation.
“Gemma: Introducing new state-of-the-art open models” blog post by Google DeepMind: READ
Google introduces Gemma, a new generation of lightweight, state-of-the-art open models for developers and researchers, aimed at building AI responsibly with tools for safe application creation. Gemma models, available in two sizes and with support for major frameworks, come with a Responsible Generative AI Toolkit, ready-to-use notebooks, and easy deployment options on various platforms, marking a significant contribution to the open AI community.
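As a quick start, here is a minimal sketch of running the instruction-tuned 7B Gemma through transformers; the `google/gemma-7b-it` checkpoint ID matches the Hugging Face release, and access requires accepting the license on the model page.

```python
# Minimal sketch: text generation with the instruction-tuned Gemma 7B.
# Requires accepting Gemma's license terms on the Hugging Face model page.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b-it")
model = AutoModelForCausalLM.from_pretrained("google/gemma-7b-it")

inputs = tokenizer("Explain mixture-of-experts in one sentence.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=60)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```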
“Gemini 1.5: next-generation model” blog post by Google DeepMind: READ
Google DeepMind announces Gemini 1.5, a next-generation AI model with significantly enhanced performance, including a new Mixture-of-Experts architecture for more efficient training and serving. Gemini 1.5 Pro, the first model released for early testing, is optimized for a wide range of tasks with a breakthrough experimental feature for long-context understanding, offering a context window of up to 1 million tokens for select users.
“Stable Diffusion 3” blog post by Stability AI: READ
Stable Diffusion 3, announced in early preview, is Stability AI's most capable text-to-image model to date, with improved handling of multi-subject prompts, image quality, and text spelling. It is not widely available yet, but a waitlist for early access is open to gather insights for further improvements in performance and safety ahead of a broader release.
“My benchmark for large language models” blog post by Nicholas Carlini: READ
Nicholas Carlini has introduced a new benchmark for evaluating large language models (LLMs) on GitHub, featuring nearly 100 tests derived from his real interactions with various LLMs. The benchmark uniquely includes a simple dataflow domain-specific language for adding tests, evaluating capabilities like code conversion, understanding minified JavaScript, identifying data encoding formats, parsing, and generating queries, with most tests assessed by executing the model-generated code.
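To give a feel for the dataflow idea, here is a hypothetical, self-contained sketch; the class names and stages are illustrative stand-ins, not the benchmark's actual API. Each stage transforms the previous stage's output, and stages compose into a test pipeline with an overloaded operator.

```python
# Hypothetical sketch of a dataflow-DSL for LLM tests; the class names and
# stages are illustrative stand-ins, not the benchmark's real API.
class Stage:
    def __rshift__(self, other):
        # "a >> b" composes two stages into a pipeline
        return Chain(self, other)

    def run(self, value):
        raise NotImplementedError

class Chain(Stage):
    def __init__(self, first, second):
        self.first, self.second = first, second

    def run(self, value):
        return self.second.run(self.first.run(value))

class Upper(Stage):
    # Stand-in for an "ask the model" stage.
    def run(self, value):
        return value.upper()

class Contains(Stage):
    # Stand-in for an evaluator stage that checks the final output.
    def __init__(self, needle):
        self.needle = needle

    def run(self, value):
        return self.needle in value

test = Upper() >> Contains("HELLO")
print(test.run("hello world"))  # True
```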
“Adept Fuyu-Heavy: A new multimodal model” blog post by Adept AI: READ
Adept introduces Fuyu-Heavy, the world's third-most-capable multimodal model, notable for its excellence in multimodal reasoning, particularly in UI understanding, where it outperforms even Gemini Pro on the MMMU benchmark. Despite its focus on multimodal tasks, Fuyu-Heavy matches or exceeds the performance of similarly sized models on text-based benchmarks, showcasing the scalability and efficiency of the Fuyu architecture in handling diverse data types.
“Phind-70B – closing the code quality gap with GPT-4 Turbo while running 4x faster” blog post by Phind: READ
Phind-70B, an advanced model fine-tuned from CodeLlama-70B, narrows the code quality gap with GPT-4 Turbo, operating four times faster and delivering high-quality technical responses at up to 80 tokens per second. Outperforming GPT-4 Turbo in HumanEval with a score of 82.3% and providing a superior user experience for developers, Phind-70B demonstrates its efficacy in real-world code generation tasks, offering detailed code examples with less hesitancy than its predecessor.
“V-JEPA: The next step toward Yann LeCun's vision of advanced machine intelligence (AMI)” blog post by Meta: READ
The Video Joint Embedding Predictive Architecture (V-JEPA) model, based on Yann LeCun's vision for human-like AI, marks a significant advancement in machine intelligence by excelling in detecting and understanding complex interactions in the physical world. Released under a Creative Commons NonCommercial license, V-JEPA aims to foster a more grounded understanding of the world, enabling machines to achieve generalized reasoning and planning akin to human learning and adaptation.
“Building an early warning system for LLM-aided biological threat creation” blog post by OpenAI: READ
To improve methods for evaluating AI-enabled safety risks, particularly biological risks, OpenAI ran a study with 100 participants, including biology experts and students, assessing whether AI systems like GPT-4 could increase access to dangerous information about biological threat creation. Participants with access to GPT-4 showed mild uplifts in accuracy and completeness on threat-creation tasks compared to an internet-only control group, but the increases were not statistically significant, underscoring the need for further research into what constitutes a meaningful risk level.
“OpenAI's new embedding models and API updates” blog post by OpenAI: READ
OpenAI is launching new models, including two new embedding models, updated previews of GPT-4 Turbo and GPT-3.5 Turbo, and an updated text moderation model, alongside reducing prices for GPT-3.5 Turbo and introducing enhanced developer tools for API key management and usage insights. Additionally, by default, data transmitted to the OpenAI API will not contribute to the training or enhancement of OpenAI's models.
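Here is a minimal sketch of calling one of the new embedding models with the openai Python client (v1-style interface); `text-embedding-3-small` is the model name from OpenAI's announcement, and the new models also accept a `dimensions` parameter to shorten the returned vector.

```python
# Minimal sketch: embed a string with one of OpenAI's new embedding models.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="AI Connections #48",
)
print(len(response.data[0].embedding))  # 1536 dimensions by default
```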
“Eagle 7B: Soaring past Transformers with 1 Trillion Tokens Across 100+ Languages (RWKV-v5)” blog post by Eugene Cheah: READ
Eagle 7B, built on the RWKV-v5 architecture, is celebrated as the most environmentally friendly 7B model, boasting the lowest inference cost in its class and trained on over 1.1 trillion tokens in more than 100 languages. This "Attention-Free Transformer" foundation model matches or exceeds the performance of other 7B class models across multi-lingual benchmarks and approaches the performance of larger models in English evaluations, available for both personal and commercial use under the Apache 2.0 license through the Linux Foundation.
“OpenAI CEO Sam Altman seeks as much as $7 trillion for new AI chip project: Report” article by CNBC: READ
OpenAI CEO Sam Altman is seeking up to $7 trillion in investments to significantly expand global semiconductor production, addressing the acute shortage of AI chips that hampers OpenAI's growth. His ambitious plan, which includes discussions with various investors such as the UAE government, aims to build massive-scale AI infrastructure to strengthen economic competitiveness and ensure a resilient supply chain for the burgeoning demand in generative AI technologies.
“Deepfake scammer walks off with $25 million in first-of-its-kind AI heist” article by Ars Technica: READ
A multinational company's Hong Kong office was defrauded of HK$200 million (US$25.6 million) through a sophisticated deepfake scam, where scammers used AI to mimic the company's CFO and other employees in a video call, tricking an employee into transferring funds. This incident, the first of its magnitude in Hong Kong involving deepfake technology in a multi-person video conference scam, highlights the growing challenge of distinguishing authentic from fabricated digital content.
READ (RESEARCH PAPERS)
“Gemma: Open Models Based on Gemini Research and Technology” research paper by Google DeepMind: READ
Gemma introduces a new family of lightweight, state-of-the-art open models that excel in language understanding, reasoning, and safety, outperforming other models of similar size on the majority of evaluated text-based tasks. With the responsible release of these models, in both 2 billion and 7 billion parameter sizes, the initiative aims to enhance the safety of frontier models and fuel further innovations in large language models (LLMs).
“Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context” research paper by Google DeepMind: READ
Gemini 1.5 Pro, the latest in the Gemini family, is a compute-efficient multimodal mixture-of-experts model with unparalleled capability in recalling and reasoning across vast contexts, including text, video, and audio. It sets new benchmarks in long-context retrieval, QA tasks, and ASR, with near-perfect recall and performance surpassing previous models, and demonstrates striking abilities in language translation, even for languages with very few speakers.
“Revisiting Feature Prediction for Learning Visual Representations from Video” research paper by Meta AI: READ
The paper introduces V-JEPA, a series of vision models trained on 2 million videos without conventional supervision methods, focusing on feature prediction for unsupervised learning. These models achieve versatile visual representations, excelling in motion and appearance-based tasks across multiple benchmarks, demonstrating the efficacy of learning by predicting video features.
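The core training signal is easy to sketch. Below is a minimal PyTorch illustration of the feature-prediction idea, not Meta's implementation: a predictor regresses target-encoder features of masked video patches from the visible context, with the target encoder held fixed per step (in practice an EMA copy of the context encoder).

```python
# Minimal PyTorch sketch of the feature-prediction idea (not Meta's
# implementation): a predictor regresses target-encoder features of masked
# video patches from the visible context; targets receive no gradients.
import torch
import torch.nn.functional as F

def jepa_loss(context_encoder, target_encoder, predictor, tokens, mask):
    # tokens: (batch, num_patches, dim); mask: (batch, num_patches) bool
    with torch.no_grad():                      # stop-gradient on targets
        targets = target_encoder(tokens)       # in practice an EMA copy
    visible = tokens * (~mask).unsqueeze(-1)   # crudely zero out masked patches
    predictions = predictor(context_encoder(visible))
    return F.l1_loss(predictions[mask], targets[mask])
```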
“Suppressing Pink Elephants with Direct Principle Feedback” research paper by synthlabs.ai: READ
The paper presents a novel approach, Direct Principle Feedback (DPF), for controlling language models at inference time, allowing them to adapt to diverse contexts by directly applying feedback on critiques and revisions, illustrated through the "Pink Elephant Problem." After DPF fine-tuning, their 13B LLaMA 2 model significantly outperforms standard LLaMA and a prompted baseline, achieving parity with GPT-4 on tests related to avoiding specific entities while focusing on preferred topics.
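As an illustration of what a DPF training pair might look like (a hypothetical sketch, not the paper's actual data format): a critique of a rule-violating response produces a revision, and the (violating, revised) pair is used directly for preference fine-tuning.

```python
# Hypothetical sketch of a Pink-Elephant-style DPF pair: the revised
# response replaces the forbidden entity, and the (rejected, chosen) pair
# feeds a preference fine-tuning step.
pair = {
    "prompt": "Recommend a scripting language, but do not mention Python.",
    "rejected": "Python is a great choice for scripting.",  # violates the rule
    "chosen": "Lua is a great choice for scripting.",       # revised to comply
}
```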
“Efficient Multimodal Learning from Data-centric Perspective” research paper by Beijing Academy of Artificial Intelligence: READ
The paper introduces Bunny, a new family of lightweight Multimodal Large Language Models (MLLMs) that outperforms larger MLLMs by utilizing more informative training data for efficient multimodal learning. Despite their smaller size, Bunny models, particularly Bunny-3B, surpass the performance of state-of-the-art MLLMs like LLaVA-v1.5-13B on various benchmarks, addressing the issue of computational cost that limits wider deployment.
“YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information” research paper by Chien-Yao Wang: READ
This paper introduces Programmable Gradient Information (PGI) to address the information loss that layer-by-layer feature extraction causes in deep networks, and proposes Generalized Efficient Layer Aggregation Network (GELAN), a new lightweight architecture designed around gradient path planning. Tested on the MS COCO object detection dataset, PGI and GELAN demonstrate superior parameter utilization and performance, even outperforming state-of-the-art methods that rely on depth-wise convolution, and highlight PGI's potential to improve training from scratch across model sizes.
“FiT: Flexible Vision Transformer for Diffusion Model” research paper by Zeyu Lu: READ
The Flexible Vision Transformer (FiT) is introduced to address the limitations of existing diffusion models by generating images of unrestricted resolutions and aspect ratios, conceptualizing images as sequences of dynamically-sized tokens for adaptable training. This innovative approach, enhanced by a tailored network structure and training-free extrapolation techniques, demonstrates FiT's superior performance in generating high-quality images across a wide range of resolutions, far exceeding traditional models' capabilities.
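The "images as dynamically-sized token sequences" idea reduces to a variable-length patchify step; here is a minimal sketch (not the FiT codebase), where the number of tokens simply scales with the input resolution.

```python
# Minimal sketch (not the FiT codebase): an image of arbitrary resolution
# becomes a variable-length token sequence, one token per fixed-size patch.
import torch

def patchify(image, patch=16):
    # image: (channels, height, width); height and width need not match
    c, h, w = image.shape
    grid = image.unfold(1, patch, patch).unfold(2, patch, patch)
    return grid.reshape(c, -1, patch * patch).permute(1, 0, 2).reshape(-1, c * patch * patch)

tokens = patchify(torch.randn(3, 256, 384))
print(tokens.shape)  # (384, 768): a 16x24 patch grid, each patch flattened
```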
WATCH
LEARN
“Byte Pair Encoding (BPE) algorithm” GitHub repository by Andrej Karpathy: READ (a minimal BPE sketch follows below)
“Mistral Cookbook” GitHub repository by Mistral: READ
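Since both repositories above are hands-on, here is a minimal BPE sketch in the spirit of Karpathy's tutorial (illustrative, not minbpe's actual API): repeatedly replace the most frequent adjacent pair of token ids with a newly allocated id.

```python
# Minimal BPE sketch (illustrative, not minbpe's actual API): repeatedly
# merge the most frequent adjacent pair of token ids into a new id.
from collections import Counter

def get_pairs(ids):
    return Counter(zip(ids, ids[1:]))

def merge(ids, pair, new_id):
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

ids = list("aaabdaaabac".encode("utf-8"))
for new_id in range(256, 259):  # perform three merges
    pair = get_pairs(ids).most_common(1)[0][0]
    ids = merge(ids, pair, new_id)
print(ids)  # the compressed token-id sequence after three merges
```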
COOL THINGS
ChatGPT system prompt
Groq - a very fast LLM inference platform built on custom Language Processing Units (LPUs)
TOOLS
MetaVoice - text-to-speech technology, offering a platform to experiment with various voice styles and settings.
Sintra AI - automating tasks and processes, aiming to fuel business growth through the use of AI prompts and automation bots.
Adventure AI - educational platform designed to teach kids real AI skills through a social game.