omniAI supports numerous AI models:


OpenAI o1



OpenAI o1 series models are new large language models trained with reinforcement learning to perform complex reasoning. o1 models think before they answer, and can produce a long internal chain of thought before responding to the user. o1 models excel in scientific reasoning, ranking in the 89th percentile on competitive programming questions (Codeforces), placing among the top 500 students in the US in a qualifier for the USA Math Olympiad (AIME), and exceeding human PhD-level accuracy on a benchmark of physics, biology, and chemistry problems (GPQA).


Specification
Max input tokens: 128,000
Max output tokens: 32,768
Training data: up to Oct 2023


GPT-4o


GPT-4o integrates text and images in a single model, enabling it to handle multiple data types simultaneously. This multimodal approach enhances accuracy and responsiveness in human-computer interactions. GPT-4o matches GPT-4 Turbo in English text and coding tasks while offering superior performance in non-English languages and vision tasks, setting new benchmarks for AI capabilities.


Specification
Max input tokens: 128,000
Max output tokens: 16,384
Training data: up to Oct 2023
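The token limits listed for the OpenAI models can be enforced client-side before a request is sent. A minimal sketch, assuming the limits quoted on this page; the 4-characters-per-token estimate is a rough heuristic of our own, not an official tokenizer:

```python
# Illustrative client-side budget check; limits copied from the specs above.
MODEL_LIMITS = {
    "o1": {"input": 128_000, "output": 32_768},
    "gpt-4o": {"input": 128_000, "output": 16_384},
}

def estimate_tokens(text: str) -> int:
    """Very rough heuristic: ~4 characters per token (an assumption,
    not a real tokenizer)."""
    return max(1, len(text) // 4)

def fits_input_budget(model: str, prompt: str) -> bool:
    """Return True if the prompt's estimated token count fits the
    model's documented input limit."""
    return estimate_tokens(prompt) <= MODEL_LIMITS[model]["input"]
```

In practice you would replace `estimate_tokens` with the provider's real tokenizer; the point is only that input and output budgets differ per model and can be checked before a call.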



Gemini 1.5 Flash

A multimodal model designed for high-volume, cost-effective applications, delivering the speed and efficiency to build fast, lower-cost applications without compromising on quality.

Specification
Max input tokens: 1,048,576
Max output tokens: 8,192
Max raw image size: 20 MB
Max base64 encoded image size: 7 MB
Max images per prompt: 3,000
Max video length: 1 hour
Max videos per prompt: 10
Max audio length: approximately 8.4 hours
Max audio per prompt: 1
Max PDF size: 30 MB
Training data: Up to May 2024
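Note that the spec lists both a raw image limit (20 MB) and a base64-encoded limit (7 MB). Base64 inflates data by roughly 4/3 (every 3 raw bytes become 4 encoded bytes), so the encoded limit is the tighter bound. A small sketch, assuming the limits above are measured in binary megabytes (1024 × 1024 bytes), which is our assumption:

```python
# Limits from the Gemini 1.5 Flash spec above; "MB" is assumed to mean
# 1024 * 1024 bytes.
MAX_RAW_IMAGE_BYTES = 20 * 1024 * 1024
MAX_B64_IMAGE_BYTES = 7 * 1024 * 1024

def base64_size(raw_bytes: int) -> int:
    """Size after base64 encoding: every 3 raw bytes become 4 output bytes."""
    return (raw_bytes + 2) // 3 * 4

def image_fits(raw: bytes) -> bool:
    """An image must satisfy both the raw and the encoded limit."""
    return (len(raw) <= MAX_RAW_IMAGE_BYTES
            and base64_size(len(raw)) <= MAX_B64_IMAGE_BYTES)
```

Because of the 4/3 inflation, the 7 MB encoded limit corresponds to only about 5.25 MB of raw image data, so it is the limit that typically matters.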

Gemini 1.5 Pro

A multimodal model that supports adding image, audio, video, and PDF files in text or chat prompts for a text or code response. This model supports long-context understanding up to the maximum input token limit.

Specification
Max input tokens: 2,097,152
Max output tokens: 8,192
Max images per prompt: 3,000
Max video length (frames only): approximately one hour
Max video length (frame and audio): approximately 45 minutes
Max videos per prompt: 10
Max audio length: approximately 8.4 hours
Max audio per prompt: 1
Max PDF size: 30 MB
Training data: Up to May 2024


Models by Anthropic

Anthropic is introducing a new family of models: Claude 3 Opus, its most powerful offering; Claude 3 Sonnet, its best combination of skills and speed; and Claude 3 Haiku, its fastest compact model. This allows customers to choose the exact combination of intelligence and speed that suits their business needs. All Claude 3 models can process images and return text outputs, and feature a 200K context window.


Claude 3 Sonnet

Claude 3 Sonnet by Anthropic strikes the ideal balance between intelligence and speed. It is engineered to be the dependable, high-endurance workhorse for scaled AI needs. Claude 3 Sonnet can process images and return text outputs, and features a 200K context window.


Supported use cases

  • Data processing: RAG or search & retrieval over vast amounts of knowledge
  • Sales: product recommendations, forecasting, targeted marketing
  • Time-saving tasks: code generation, quality control, parse text from images


Claude 3 Opus

Claude 3 Opus is Anthropic's most powerful AI model, with state-of-the-art performance on highly complex tasks. It can navigate open-ended prompts and sight-unseen scenarios with remarkable fluency and human-like understanding. Claude 3 Opus shows us the frontier of what’s possible with generative AI. Claude 3 Opus can process images and return text outputs, and features a 200K context window.


Supported use cases

  • Task automation: plan and execute complex actions across APIs and databases, interactive coding
  • R&D: research review, brainstorming and hypothesis generation, drug discovery
  • Strategy: advanced analysis of charts & graphs, financials and market trends, forecasting


Claude 3 Haiku 


Claude 3 Haiku is Anthropic's fastest, most compact model for near-instant responsiveness. It answers simple queries and requests with speed. Users will be able to build seamless AI experiences that mimic human interactions. Claude 3 Haiku can process images and return text outputs, and features a 200K context window.


Supported use cases

  • Customer interactions: quick and accurate support in live interactions, translations
  • Content moderation: catch risky behavior or customer requests
  • Cost-saving tasks: optimized logistics, inventory management, extract knowledge from unstructured data


Cohere Command R

Cohere is a leading AI platform that builds world-class large language models (LLMs) and LLM-powered solutions that allow computers to search, understand meaning, and converse in text. Command R is a generative language model optimized for long-context tasks and large-scale production workloads.


Supported use cases

Chat, text generation, text summarization, RAG on large amounts of data, Q&A, function calling 

Note: image processing is not supported with Command R.


Meta Llama 3.2


Llama 3.2 is a multilingual large language model (LLM): a pretrained and instruction-tuned generative model, available at 90B size, that takes both text and image inputs and outputs text.


The Llama 3.2 90B Instruct model is a multimodal, fine-tuned model that leverages 90 billion parameters to deliver strong capabilities in image understanding, visual reasoning, and multimodal interaction. It enables advanced applications such as image captioning, image-text retrieval, visual grounding, visual question answering, and document visual question answering. With its ability to reason and draw conclusions from combined visual and textual inputs, it is well suited to applications requiring sophisticated visual intelligence, such as image analysis, document processing, multimodal chatbots, and autonomous systems.


Supported use cases


  • Image understanding 
  • Visual reasoning 
  • Image Captioning 
  • Image-Text Retrieval 
  • Visual grounding 
  • Visual question answering and visual reasoning 
  • Document visual question answering

 

Mistral Large 24.02

Mistral AI's most advanced large language model, capable of handling any language task, including complex multilingual reasoning, text understanding, transformation, and code generation.