OmniAI supports numerous AI models:
OpenAI o1
OpenAI o1 series models are new large language models trained with reinforcement learning to perform complex reasoning. o1 models think before they answer, and can produce a long internal chain of thought before responding to the user. o1 models excel in scientific reasoning, ranking in the 89th percentile on competitive programming questions (Codeforces), placing among the top 500 students in the US in a qualifier for the USA Math Olympiad (AIME), and exceeding human PhD-level accuracy on a benchmark of physics, biology, and chemistry problems (GPQA).
Max input tokens: 128,000
Max output tokens: 32,768
Training data: up to Oct 2023
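These limits mean a request's prompt tokens plus its requested completion tokens must each fit within the model's windows. A minimal sketch of that check (the `MODEL_LIMITS` table and `fits_budget` helper are illustrative, not part of any SDK; real token counts require the model's tokenizer):

```python
# Illustrative token-budget check against published model limits.
# Validates already-counted token totals; it does not tokenize text.
MODEL_LIMITS = {
    "o1": {"max_input_tokens": 128_000, "max_output_tokens": 32_768},
    "gpt-4o": {"max_input_tokens": 128_000, "max_output_tokens": 16_384},
}

def fits_budget(model: str, prompt_tokens: int, max_completion_tokens: int) -> bool:
    """Return True if the request stays within the model's token limits."""
    limits = MODEL_LIMITS[model]
    return (prompt_tokens <= limits["max_input_tokens"]
            and max_completion_tokens <= limits["max_output_tokens"])

print(fits_budget("o1", 100_000, 32_768))   # True
print(fits_budget("o1", 100_000, 40_000))   # False: exceeds the output cap
```

Note that for o1-series models, internal reasoning tokens also count toward the output budget, so leaving headroom below the cap is prudent.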
GPT-4o
GPT-4o integrates text and images in a single model, enabling it to handle multiple data types simultaneously. This multimodal approach enhances accuracy and responsiveness in human-computer interactions. GPT-4o matches GPT-4 Turbo in English text and coding tasks while offering superior performance in non-English languages and vision tasks, setting new benchmarks for AI capabilities.
Max input tokens: 128,000
Max output tokens: 16,384
Training data: up to Oct 2023
Gemini 1.5 Flash
A multimodal model designed for high-volume, cost-effective applications, delivering the speed and efficiency to build fast, lower-cost applications without compromising quality.
| Specification | Value |
| --- | --- |
| Max input tokens | 1,048,576 |
| Max output tokens | 8,192 |
| Max raw image size | 20 MB |
| Max base64 encoded image size | 7 MB |
| Max images per prompt | 3,000 |
| Max video length | 1 hour |
| Max videos per prompt | 10 |
| Max audio length | approximately 8.4 hours |
| Max audio per prompt | 1 |
| Max PDF size | 30 MB |
| Training data | Up to May 2024 |
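The separate raw and base64 limits matter because base64 inflates payloads by roughly 4/3, so a file comfortably under the raw limit can still exceed the encoded limit. A minimal sketch of a pre-flight check (the helper name and constants are illustrative, not part of any SDK):

```python
import base64

# Base64 expands data by 4/3 (plus padding), so a raw file well under the
# 20 MB raw limit can still exceed the 7 MB base64-encoded limit.
MAX_RAW_BYTES = 20 * 1024 * 1024   # 20 MB raw image limit
MAX_B64_BYTES = 7 * 1024 * 1024    # 7 MB base64-encoded limit

def within_image_limits(raw: bytes) -> bool:
    """Check a raw image payload against both size limits above."""
    encoded_len = len(base64.b64encode(raw))
    return len(raw) <= MAX_RAW_BYTES and encoded_len <= MAX_B64_BYTES

print(within_image_limits(b"\x00" * (5 * 1024 * 1024)))  # True: 5 MB raw -> ~6.7 MB encoded
print(within_image_limits(b"\x00" * (6 * 1024 * 1024)))  # False: ~8 MB encoded exceeds 7 MB
```

In practice this means images larger than roughly 5 MB raw should be resized or compressed before being base64-encoded into a request.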
Gemini 1.5 Pro
A multimodal model that supports adding image, audio, video, and PDF files in text or chat prompts for a text or code response. This model supports long-context understanding up to the maximum input token limit.
| Specification | Value |
| --- | --- |
| Max input tokens | 2,097,152 |
| Max output tokens | 8,192 |
| Max images per prompt | 3,000 |
| Max video length (frames only) | approximately 1 hour |
| Max video length (frames and audio) | approximately 45 minutes |
| Max videos per prompt | 10 |
| Max audio length | approximately 8.4 hours |
| Max audio per prompt | 1 |
| Max PDF size | 30 MB |
| Training data | Up to May 2024 |
Models by Anthropic
Anthropic offers a family of models: Claude 3 Opus, its most powerful offering; Claude 3 Sonnet, its best combination of skills and speed; and Claude 3 Haiku, its fastest compact model. This range lets customers choose the exact combination of intelligence and speed that suits their business needs. All Claude 3 models can process images and return text outputs, and feature a 200K context window.
Claude 3 Sonnet
Claude 3 Sonnet by Anthropic strikes the ideal balance between intelligence and speed. It is engineered to be the dependable, high-endurance workhorse for scaled AI needs. Claude 3 Sonnet can process images and return text outputs, and features a 200K context window.
Supported use cases
- Data processing: RAG or search & retrieval over vast amounts of knowledge
- Sales: product recommendations, forecasting, targeted marketing
- Time-saving tasks: code generation, quality control, parse text from images
Claude 3 Opus
Claude 3 Opus is Anthropic's most powerful AI model, with state-of-the-art performance on highly complex tasks. It can navigate open-ended prompts and sight-unseen scenarios with remarkable fluency and human-like understanding. Claude 3 Opus shows us the frontier of what’s possible with generative AI. Claude 3 Opus can process images and return text outputs, and features a 200K context window.
Supported use cases
- Task automation: plan and execute complex actions across APIs and databases, interactive coding
- R&D: research review, brainstorming and hypothesis generation, drug discovery
- Strategy: advanced analysis of charts & graphs, financials and market trends, forecasting
Claude 3 Haiku
Claude 3 Haiku is Anthropic's fastest, most compact model for near-instant responsiveness. It answers simple queries and requests with speed. Users will be able to build seamless AI experiences that mimic human interactions. Claude 3 Haiku can process images and return text outputs, and features a 200K context window.
Supported use cases
- Customer interactions: quick and accurate support in live interactions, translations
- Content moderation: catch risky behavior or customer requests
- Cost-saving tasks: optimized logistics, inventory management, extract knowledge from unstructured data
Cohere Command R
Cohere is a leading AI platform that builds world-class large language models (LLMs) and LLM-powered solutions that allow computers to search, understand meaning, and converse in text. Command R is a generative language model optimized for long-context tasks and large-scale production workloads.
Supported use cases
Chat, text generation, text summarization, RAG on large amounts of data, Q&A, function calling
Note: image processing is not supported with Command R.
Meta Llama 3.2
Llama 3.2 is a multilingual large language model (LLM): a pretrained and instruction-tuned generative model, available in a 90B size, that takes both text and image inputs and outputs text.
The Llama 3.2 90B Instruct model is a multimodal, fine-tuned model that leverages 90 billion parameters to deliver strong capabilities in image understanding, visual reasoning, and multimodal interaction. It enables advanced applications such as image captioning, image-text retrieval, visual grounding, visual question answering, and document visual question answering. With its ability to reason and draw conclusions from combined visual and textual inputs, it is well suited to applications requiring sophisticated visual intelligence, such as image analysis, document processing, multimodal chatbots, and autonomous systems.
Supported use cases
- Image understanding
- Visual reasoning
- Image captioning
- Image-text retrieval
- Visual grounding
- Visual question answering
- Document visual question answering
Mistral Large 24.02
Mistral AI's most advanced large language model, capable of handling any language task, including complex multilingual reasoning, text understanding, transformation, and code generation.