Contents
- Preface
- The use of Chat-type large language models
- Local Large Language Model UI
- Introduction to common large language model API providers
- FBI warning: Boring situation alert
- Prerequisite knowledge
- OpenAI – ChatGPT
- Azure OpenAI
- Google Cloud AI – Gemini
- Anthropic Claude API
- Additional Information: Other Official API Providers
- Third-party API provider: OhMyGPT
- Summarize
Preface
How to embark on your AI journey? I believe this is the first question that confuses many people (especially those who only know the concept of AI but haven't actually used the technology). I was the same a few weeks ago; I had only heard about the power of AI, but hadn't delved too deeply into its practical applications.
Prior to this, my experience with AI was limited to OpenAI's ChatGPT. Like many others, I enjoyed the free services offered by OpenAI's official website: either through the web version or the official app client on my Mac, iPhone, or iPad. Because I didn't rely heavily on ChatGPT, the daily free allowance was generally sufficient for me (if I exceeded the limit, I would stop using it for a few hours before using it again). However, this "freebie" approach wasn't sustainable. Since AI technology is becoming increasingly important across various fields, I needed to gain a more orthodox understanding and learn about AI applications, rather than relying solely on a single platform or tool.
At the beginning, I had absolutely no idea how to start learning AI, especially when faced with different large language models, a wide variety of API providers, and questions about how to integrate AI technology into real-world work and life scenarios. I was completely clueless (I didn't even know what many of the terms meant, and I often couldn't understand the conversations I heard from friends in the group).
Finally, I could only start with the most basic concepts, and through practical deployment of large language model UI applications (Lobechat local version, server-side database version, Chatgpt-nextweb, etc.), and comparison of the advantages of various API providers (OpenAI, Azure OpenAI, Claude, etc.), I finally sorted out these introductory AI knowledge and wrote this article.
This article provides a simple and practical introduction to AI, covering the fundamentals of large language models, familiarizing you with commonly used API providers, and even how to build your own AI Chat application environment. Whether for personal interest or professional use of AI, this article offers a clear introductory guide.
The use of Chat-type large language models
How to use chatGPT
As I mentioned earlier, I had been using the free services provided by chatGPT's official website, which includes an official web version (https://chatgpt.com):

There's also an app version for Mac:

Actually, both the official web version accessed through the official website and the local Mac app version are the UI interfaces I use for chatGPT. However, the free version's UI backend is bound to the default GPT-4-turbo model version provided by OpenAI (this model is an optimized, lower-cost version that allows some free users to access the GPT-4o model within a certain quota). Once these quotas are used up, it will switch back to using the "GPT-3.5-turbo" model.
Note 1: The Mac version of chatGPT can be downloaded directly.chatGPT Mac version official download addressThere are also chatGPT apps on iOS and iPad, but you need to switch to a different region ID to download them, and you need a VPN or other similar technology to use them properly.
Note 2: Using chatGPT is now very simple. You can log in with an existing Google account, Microsoft account, or Apple account. Of course, you can also register an OpenAI account directly.
Use of other large language models for chat
Besides OpenAI's chatGPT, there are other similar large language models to choose from. However, they may not offer an official web version or ready-made app as an access UI, and are mostly only accessible via API. Therefore, to use the services of other large language model providers besides OpenAI, a universal large language model UI that supports various provider APIs is needed as a tool to access the APIs.
Generally speaking, normal access to a large language model requires two functionalities:
- Large Language Model UI
Essentially, it's a meticulously designed user interface (UI) that serves as a bridge connecting ordinary users with powerful but complex AI technology. Imagine a user-friendly control panel in front of you, behind which lies a vast AI system powered by various API providers. The beauty of the UI lies in its ability to conceal all the technical complexity, offering a simple and intuitive way to operate. You don't need to know programming or understand how the underlying APIs work; simply type in text as if you were having a casual chat. The UI cleverly translates your needs into instructions that the API can understand, and then translates the API's results back into a form you can easily comprehend. It acts like a translator, relaying information back and forth between you and the AI system. Furthermore, this interface helps you manage conversation history, save important information, and even allows you to adjust settings to personalize your AI assistant. In short, the UI of the Large Language Model (LLM) is a key tool for making complex AI technology accessible, enabling everyone to easily leverage the power of AI without needing to understand its technical details.
- API Providers
Behind the large language model UI, API providers play a crucial role, acting as the "brain" and "engine" of the system. Imagine these API providers offering a powerful suite of tools, each dedicated to a specific task. Some excel at transforming your ideas into vivid images, as if an invisible painter were always at your service; others can "understand" images, describing their content like a meticulous observer interpreting visual information. Still others can convert your speech to text, or vice versa, into fluent speech, as if a 24/7 secretary is recording and reading aloud for you. These APIs are like a collection of superpowers; they can understand natural language, answer complex questions, and even help you write code or create articles. API providers continuously update and optimize these tools, making them increasingly intelligent and efficient. Through these diverse APIs, the UI can provide users with virtually omnipotent services, easily handling everything from everyday conversations to complex creative tasks and professional analytical missions. In short, API providers are the technical teams that work silently behind the scenes, providing a continuous stream of intelligent power to the UI, allowing users to access a variety of amazing AI capabilities through a simple interface.
Local Large Language Model UI
Large language model UI suitable for personal use
There are many large language model UI options on the market, but today I will only talk about the three options that I think are suitable for general personal use:Lobechat,ChatGPT Next Web,Chatbot-UI.
Note: RegardingChatbot-UII looked into it and found that the deployment is quite complicated (unlike Lobechat and ChatGPT Next Web, which can be done with just a docker run command), and it doesn't have any obvious advantages, so I won't recommend it. However, I'll leave the following description here for anyone who's interested to research it on their own.
Lobechat
Lobechat UI Introduction
Lobechat is a feature-rich, open-source user interface for a native large language model, designed for ease of use and flexibility. Deployment is relatively simple, making it suitable for users with basic technical backgrounds. It supports Docker containerization, simplifying the installation process and allowing even non-professional developers to quickly build their own AI assistants. In terms of extensibility, Lobechat offers a plugin system, allowing users to add new features as needed. It supports multiple API providers, including OpenAI, Anthropic Claude, and Azure OpenAI, and is also compatible with open-source models such as llama.cpp and ChatGLM. Lobechat's interface is clean and intuitive, supports multiple languages, and provides rich dialogue management features, such as dialogue export and history search. For users who value data privacy and want complete control over the AI interaction process, Lobechat is an ideal choice. Its access interface is shown below:

The official GitHub link is as follows:https://github.com/lobehub/lobe-chat.
Lobechat deployment method
Lobechat supports two deployment modes, which are divided into client database mode and server database mode, depending on the location of user data storage.
- Lobechat client database mode
Deploying Lobechat in this mode means all data (such as user session logs and model configurations) is stored in the user's local browser cache or client-side database. This mode does not rely on a backend server, making it suitable for individual users or small projects. Deployment is simple and requires no additional server resources. User data is entirely under local control, offering good privacy. However, because it's stored on the client side, data is susceptible to browser cache clearing or device replacement, posing a certain risk of data loss. Furthermore, the client-side mode is unsuitable for scenarios requiring data synchronization across multiple devices. For example, most Lobechat accesses from the same PC client, so there's no need to synchronize data between multiple clients (no need to share Lobechat session data between PC and mobile clients).
The Lobechat client database mode deployment commands can be found as follows:
docker run --name lobe-chat -d --restart=always \ -p 3210:3210 \ -e ACCESS_CODE=xxx \ lobehub/lobe-chat
- Lobechat server-side database mode
When Lobechat is deployed in this mode, user session data, configurations, etc., are stored in a database on a remote server, typically through a managed database service. This mode is suitable for multiple users or teams, supporting data synchronization and centralized management across multiple devices. It enables persistent storage, ensuring long-term data security and integrity, but deployment is relatively complex.
The deployment of the Lobechat server-side database mode is complex because, in addition to the deployment of Lobechat-database itself, it also involves components such as PostgreSQL database, Minio COS (object storage), and Logto (authentication). Among these, Minio COS and Logto can be replaced by other third-party services.
Since the deployment is quite complex, I won't go into detail here. Interested readers can refer to the article:Docker series based on the open source large language model UI framework: Lobechat detailed deployment tutorial.
ChatGPT Next Web
Introduction to ChatGPT Next Web UI
ChatGPT Next Web is a highly customizable large language model user interface, renowned for its powerful features and flexible deployment options. Deployment is of moderate difficulty, requiring some technical knowledge, but detailed documentation and community support are provided. It supports multiple deployment methods, including one-click deployment via Vercel, Docker containerization, and traditional server deployment. In terms of scalability, ChatGPT Next Web employs a modular design, allowing developers to easily add new features or modify existing ones. It primarily supports OpenAI's API, but its flexible architecture also allows integration with other API vendors. Notably, it offers API proxy functionality to address access restrictions in certain regions. ChatGPT Next Web's key features include multi-user support, a customizable prompt dictionary, and the ability to export dialogues as Markdown or images. Its modern and responsive interface, supporting dark mode and multilingual support, makes it suitable for individual users and small teams requiring a highly customized AI assistant. Its access interface is shown below:

The official GitHub link is as follows:https://github.com/ChatGPTNextWeb/ChatGPT-Next-Web.
ChatGPT Next Web Deployment
ChatGPT Next Web currently only supports client-side database mode, which is relatively simple to deploy. The deployment commands are as follows:
docker run -d -p 3000:3000 \ -e OPENAI_API_KEY=sk-xxxx \ -e CODE=your-password \ -e PROXY_URL=http://localhost:7890 \ yidadaa/chatgpt-next-web
Note 1:-e PROXY_URLYou can specify a proxy server (if needed).
Note 2: Although ChatGPT Next Web does not support server-side database deployment, it provides other ways to indirectly achieve data synchronization: WebDAV and UpStash.
WebDAV:

UpStash:

I haven't done detailed testing on its effectiveness, since I usually use Lobechat.
Chatbot UI:
Chatbot UI is a native large language model interface focused on providing an exceptional user experience. Its deployment is relatively easy (though not as easy as the client-side database versions of Lobechat and ChatGPT Next Web), offering detailed installation guides and multiple deployment options, including local and cloud deployment. In terms of scalability, Chatbot UI employs a modular architecture, allowing developers to add new features through a plugin system. It supports multiple API vendors, including OpenAI, Anthropic, and Cohere, and also supports integration of open-source models via Hugging Face. A key feature of Chatbot UI is its powerful conversation management capabilities, including conversation categorization, a tagging system, and advanced search functionality. It also provides detailed usage analysis tools to help users optimize their AI interactions. Regarding interface design, Chatbot UI offers various themes and layout options, allowing users to customize it to their preferences. Furthermore, it supports voice input and text-to-speech functionality, significantly improving accessibility. Chatbot UI is particularly suitable for professional users and researchers who need in-depth analysis and management of AI conversations. Its interface is shown below:

The official GitHub link is as follows:https://github.com/mckaywrigley/chatbot-ui/.
Note 1: As mentioned above, the deployment section is skipped. I do not recommend that beginners use Chatbot UI.
Note 2: Besides the three local large language model UIs mentioned above, there are actually quite a few others, such as:GPT4All Web UI,Oobabooga Text Generation Web UI,StableLM Web UI,Vicuna Web UI,Langchain UI,Hugging Face Inference API UIHowever, these UIs are either not easy to install, require a high level of technical expertise, or are too complex for the average user. Therefore, after comparing them, I will not include them in the recommended list. However, each of these UIs has its own characteristics and is suitable for people with different needs. If you feel that Lobechat or ChatGPT Next Web is not suitable for you, you can also check out other UIs.
Comparison of API vendors supported by default in two UIs
In fact, you can't tell how good or bad the local large model UI is just from its appearance (it's essentially just a chat dialog box). A very crucial criterion is the number of API vendors that are "supported by default" and the timely support for the model versions updated by the API vendors (API vendors are constantly adding new model versions, so the UI also needs to be updated in a timely manner to support them): the more supported vendors, the stronger the scalability.
Lobechat
Lobechat offers the best support for the number of API providers and new model versions among the three UIs, supporting up to 30 API providers:



Crucially, it also supports numerous domestic language model vendors, which is very user-friendly, offering a wide range of choices. Furthermore, Lobechat's language model settings support custom API addresses and model lists by default.

Note: The ability to "customize API address and model version" is excellent. Please bookmark this; I will mention it later.
ChatGPT Next Web
Compared to Lobechat, chatGPT-next-web supports fewer API providers by default, currently only supporting 10, but it does support commonly used ones such as OpenAI, Azure OpenAI, Google, and Anthropic.




As can be seen from the images above, the default language model and model version of ChatGPT Next Web are fixed, unlike Lobechat which allows for customization, resulting in less flexibility. However, the API address can be customized through the "Custom Endpoint" option to support more API vendors and model versions.


“Custom Endpoint”The "Custom Endpoint" option allows users to configure custom API endpoints (or model versions), meaning you can connect to model providers or locally running models beyond the 10 API providers supported by default. With "Custom Endpoint" enabled, you can use the "OpenAI Endpoint" option to access and connect to the API addresses of any third-party large language model provider that conforms to the API call specification, provided their interfaces are compatible with OpenAI's API. You can connect ChatGPT Next Web to these models by entering a custom URL. This option somewhat compensates for the limited number of API providers supported by default; however, this support is certainly less compatible than Lobechat's direct support.
Note: This option is the "Custom API Address and Model Version" feature provided by ChatGPT Next Web.
Introduction to common large language model API providers
FBI warning: Boring situation alert
This part is just a record and is rather dry. If you're not interested, you can skip to the summary at the end. I've made it more detailed for future reference.
Prerequisite knowledge
Transformer architecture
The Transformer architecture is a deep learning model architecture specifically designed for Natural Language Processing (NLP) tasks, proposed by Vaswani et al. in 2017. It revolutionized research and applications in the field of NLP, becoming the foundation for many modern language models, including the GPT series, BERT, and T5. It includes the following core concepts.
- Self-Attention Mechanism
The core of Transformer is the self-attention mechanism, which allows the model to focus on all the words in a sentence when processing it, rather than relying on fixed sequence processing (such as the sequential structure of RNNs). Each word can understand other words in the sentence through the self-attention mechanism, thereby capturing the relationships between words.
- Encoder-Decoder Structure
The original design of Transformer consisted of two parts:
• EncoderProcess the input sequence to generate a context-dependent representation.
• DecoderBased on the encoder's output and previous generation results, the output sequence is generated step by step.
However, in many language models (such as GPT), we only use a portion of the encoder or decoder. For example, GPT only uses the decoder structure of the Transformer.
- parallel computing
The Transformer uses a completely attention-based mechanism, abandoning the step-by-step computation mode of Recurrent Neural Networks (RNNs) and Long Short-Term Memory Networks (LSTMs). Its parallel computing capabilities enable the model to efficiently process long text inputs, significantly improving training speed.
- Multi-head attention
The Transformer uses a multi-head attention mechanism, enabling the model to understand data from different "angles". Through multiple independent attention heads, the model can capture the relationships between words at different semantic levels, further enhancing the model's expressive power.
- Positional Encoding
Because the Transformer does not rely on the order of the input like RNNs or LSTMs, it introduces positional encoding to represent the relative positions of words in a sentence. This allows the model to preserve the sentence's sequential information.
Applications and advantages of the Transformer architecture
1. High scalabilityCompared to previous RNN and LSTM models, Transformer can better handle long sequence data, especially performing well in understanding long sentences or documents.
2. High-efficiency parallelizationDue to its self-attention mechanism and sequence-independent processing, the Transformer can process the entire input sequence in parallel, which greatly speeds up the training process, especially on large-scale data.
3. Wide application:
• GPT seriesOpenAI has developed the GPT series of models (such as GPT-3 and GPT-4) based on the Transformer architecture, which are applied to tasks such as text generation and dialogue systems.
• BERTGoogle's BERT is also based on Transformer and is good at handling tasks with bidirectional contextual relationships, such as reading comprehension and text classification.
• T5The Transformer-based Text-to-Text Transfer Transformer (T5) transforms all NLP tasks into text generation problems, demonstrating the high flexibility of the Transformer architecture.
The Transformer architecture, with its self-attention mechanism, parallel computing capabilities, and wide range of applications, has become a core architecture in the field of NLP. It has driven the development of large language models and achieved great success in multiple tasks, making modern AI systems more intelligent and efficient in processing language.
GPT
GPT (Generative Pre-trained Transformer) is a large-scale language model based on the Transformer architecture, developed by OpenAI. It can generate human-like text and performs exceptionally well in many natural language processing tasks. The core of GPT lies in its pre-training and fine-tuning stages, learning from large-scale text data and optimizing with small amounts of task-specific data. Since its initial release, the GPT series has evolved into several versions, including GPT-2, GPT-3, and GPT-4.
The core principles of GPT are as follows:
- Based on Transformer architecture
GPT is based on the decoder portion of the Transformer, meaning it can understand context and generate new text through a self-attention mechanism. The Transformer architecture allows GPT to capture complex dependencies between words, making it excellent at handling long texts.
- Pre-training and fine-tuning
• Pre-trainingGPT is first pre-trained on massive amounts of unlabeled text data. The model learns syntactic and semantic rules of language by predicting the next word.
• Fine-tuningFor specific tasks (such as text generation, translation, or question answering), GPT can be fine-tuned using a small amount of labeled data to make its performance more suitable for the specific task requirements.
- Autoregressive generation
GPT is an autoregressive model that generates text incrementally, predicting the next word based on the parts already generated until the complete text is output. This makes GPT well-suited for generating fluent, coherent natural language text.
The main functions of GPT:
- Natural Language Generation
GPT excels at generating highly consistent and coherent text that fits the context. It can be used for tasks such as automatic writing, content creation, screenwriting, and generating product descriptions.
- Conversation and Chatbots
GPT can understand user input and generate appropriate responses, making it a core technology in many conversational systems and customer service robots.
- Question and Answer and Information Extraction
GPT can answer questions or extract relevant information based on given text or knowledge bases, and is widely used in question answering systems, search engines, and other scenarios.
- Zero-shot learning and few-shot learning
GPT-3 and GPT-4 demonstrated the ability to learn with zero and few samples, and were able to complete tasks with prompts even without being specifically fine-tuned for a particular task.
- Multi-domain applications:
GPT can perform tasks across multiple domains, including but not limited to programming code generation, medical text understanding, legal document analysis, and translation. GPT can generate creative text, and can even write articles, poems, and stories, demonstrating human-like creativity.
Evolution of models and versions
“"Model" is a common concept in the field of large language models, involving different organizations and frameworks. Besides the well-known OpenAI GPT-3 and GPT-4 models, other institutions are also developing and releasing their own models while continuously iterating and upgrading them.
- Google's BERT has evolved from the original BERT to subsequent improved versions (such as ALBERT and RoBERTa); Gemini has progressed from version 1.0 to the current 1.5, with version 2.0 about to be released.
- RoBERTa from Facebook AI Research (FAIR) has undergone larger-scale training and optimization than the original BERT.
- Hugging Face It maintains the Transformers library, supporting various models and versions, including BERT, GPT, T5, XLNet, etc.
- Microsoft The newly released DeBERTa showcases the version updates and performance improvements.
In the following article, I have listed the most common large language model API vendors on the market. Since there are too many, I have only listed the API vendors that I think are most commonly used (excluding domestic API vendors) and their main model versions.
OpenAI – ChatGPT
Introduction to OpenAI ChatGPT
What is OpenAI ChatGPT?
ChatGPT is a large-scale language model product from OpenAI, based on the GPT series, used to generate natural language dialogues. With its powerful language understanding and generation capabilities, ChatGPT can engage in multi-turn conversations with users, answer questions, generate creative text, assist in coding, and is widely used in chatbots, customer service systems, content generation, and other scenarios.
Key features of ChatGPT:
1. Based on the GPT modelChatGPT is based on the GPT (Generative Pre-trained Transformer) series of models, especially GPT-3.5 and GPT-4. GPT is a language model that uses the Transformer architecture for large-scale pre-training and fine-tuning, and excels at learning patterns from context and generating coherent text.
2. Conversational skillsChatGPT boasts powerful conversation generation capabilities, understanding and responding to a variety of user requests, from simple chat to complex question answers. It can engage in multi-turn conversations, remembering context and maintaining coherence.
3. Generate content in multiple languagesChatGPT supports multiple languages, allowing users to communicate with it in English, Chinese, French, Spanish, and other languages. It can not only generate text but also create poems, stories, or write code based on prompts.
4. Highly efficient learning abilityChatGPT uses few-shot learning and zero-shot learning techniques. Users only need to provide a few examples, or even none at all, and ChatGPT can understand the task and generate accurate text content.
5. AdaptabilityChatGPT offers assistance across multiple areas, from technical support and copywriting to academic discussions, demonstrating exceptional versatility. This makes it applicable to various industries, including education, entertainment, and business.
Advantages of ChatGPT:
• MultitaskingChatGPT can easily handle a variety of tasks, such as answering questions, generating conversations, writing code, and translating languages, making it highly versatile.
• Fast response timeWith optimized versions such as GPT-4 Turbo, ChatGPT can provide high-quality text generation in a shorter time.
• Widely usedWhether for business applications or personal use, ChatGPT can adapt to various scenarios, and users can quickly integrate APIs or deploy using ready-made applications.
Main model versions and their functions
The OpenAI API provides several different versions of the language model, each with different performance, uses, and costs:
GPT-4 series models
• GPT-4 Turbo:
This is an optimized version of GPT-4, offering faster response times and higher performance while reducing usage costs. GPT-4 Turbo is suitable for scenarios requiring efficient generation of large amounts of text content.
• GPT-4 (gpt-4o):
This is OpenAI's latest and most powerful model. It boasts exceptional language understanding and generation capabilities, handling complex text tasks: whether generating long articles or processing contextual dialogues, GPT-4 performs remarkably well, making it suitable for tasks such as high-precision content generation, complex dialogue systems, and technical document writing. It can understand long-term contexts and generate responses highly relevant to previous dialogues, making it suitable for use cases such as news media, academic research, technical writing, and customer support.
• GPT-4-mini (gpt-4o-mini):
This is a streamlined version of GPT-4. While its performance is slightly lower than the standard GPT-4, it still boasts powerful text processing capabilities. Compared to GPT-4, it has lower computational requirements and lower usage costs, making it suitable for scenarios that require a balance between performance and cost. It is suitable for moderately complex tasks, such as chatbots and automated customer service, offering advantages in both speed and cost. Ideal for customer service, content moderation, and routine text generation in small and medium-sized enterprises.
GPT-3.5 series models
• GPT-3.5 (davinci):
GPT-3.5 is the predecessor to GPT-4. While slightly less powerful than GPT-4, it remains capable of handling most text generation tasks. It strikes a good balance between the quality and speed of generated content. Suitable for various tasks such as content generation, dialogue systems, text translation, and code generation, its ability to handle complex tasks is second only to GPT-4. Ideal for companies requiring automated generation or customer interaction, such as news platforms, online education, and technical support.
• GPT-3.5-mini (curie):
This is a lightweight version capable of handling most everyday text tasks. Compared to davinci, it's slightly less efficient at handling complex content, but better suited for scenarios requiring efficient processing of a large number of simple tasks. It's ideal for basic chatbots, text summarization, and simple content creation. It can quickly generate short documents or simple answers. Applications include e-commerce, content recommendation systems, and automated email replies.
OpenAI O1 series models
OpenAI O1 It is a relatively optimized version of the GPT model family, designed to provide more efficient computation and stronger generation capabilities in different scenarios. While retaining the core capabilities of large language models, the O1 model provides high efficiency and accuracy through optimized architecture and computational performance, making it suitable for various tasks such as dialogue generation, text analysis, and content creation.
Key features of the O1 model:
1. Optimize performanceO1 is optimized for higher performance compared to the traditional GPT model. It can handle complex language generation tasks, such as coding, long text generation, and complex question-answering tasks, while maintaining high efficiency.
2. Wide range of applicationsThe O1 model can be applied to multiple domains, including business services, technical support, content generation, and natural language processing tasks. It can adapt to more complex contextual requirements and provide accurate and coherent output in dialogue and generation tasks.
3. High-quality generationCompared to lighter models, O1 offers higher generation quality, especially in tasks that require understanding context and complex reasoning, where it outputs text that is more consistent with human language logic.
Advantages of the O1 model:
• Balanced computational cost and performanceThe O1 model strikes a good balance between computational resource usage and generation quality, providing sufficient inference capabilities without consuming excessive computational resources like larger models such as GPT-4.
• Suitable for multitaskingThe O1 can handle tasks ranging from simple text generation to more complex dialogues and coding. Whether generating creative copy, processing technical documents, or dealing with real-time dialogue needs, the O1 performs exceptionally well.
• Optimized response speedO1, through architectural optimization, significantly reduces response latency while providing high-quality output, making it particularly suitable for application scenarios that require rapid generation and feedback.
The difference between O1 and O1-mini:
• Model sizeO1 has more parameters than O1-mini, and therefore performs better in handling complex tasks and providing accurate generation.
• Generation capabilityO1 typically produces higher quality text than O1-mini, especially in context understanding and complex task processing. O1-mini is better suited for lightweight tasks, while O1 is better suited for tasks requiring higher accuracy and more complex reasoning.
• Calculation costAlthough O1 consumes more computing resources than O1-mini, it performs better in high-performance tasks and is suitable for users who have high requirements for the quality of the output.
Overall, OpenAI O1 is an efficient and powerful version of the model, suitable for applications that require balanced, accurate, and fast generation across a variety of natural language tasks.
Additional information: knowledge cutoff (training deadline)
On OpenAI's official pricing page, you can see a "knowledge cutoff" mentioned in the descriptions of different models; currently, it's for October 2023.

This means that the model's current knowledge scope only covers information prior to this date. The specific reasons for this are as follows:
- Limitations of model training data:
Models like GPT don't receive information in real time; instead, they are generated based on large amounts of pre-collected and processed training data. OpenAI trains the model with new datasets periodically (each training session can take several months). However, the model isn't actively updated after deployment until the next version is trained. Therefore, at a given point in time, the model's knowledge only includes the data from its last training iteration. This is because updating a model isn't simply a matter of adding new data; it requires extensive debugging, optimization, and validation to ensure the accuracy, reasonableness, and controllability of the generated content. Frequent introduction of new data could lead to unstable model quality. Therefore, model updates are typically performed periodically, rather than in real-time.
- Maintain transparency:
To help users understand the scope of knowledge covered by the model, OpenAI specifies the model's knowledge range in its documentation and user manuals. Knowledge Deadline(Knowledge cutoff). This allows users to clearly understand that the model will not include [knowledge cutoff] when answering questions. After October 2023The latest information, such as new technologies, news events, and policy changes, is crucial. Another reason is that current GPT models are trained offline, meaning they cannot acquire or learn new information from the internet in real time. These models need to acquire knowledge from static pre-trained datasets, and therefore cannot access and process the latest news, research, or dynamic information like a search engine.
- Avoid misleading:
This prompt also aims to avoid misleading users into thinking that the model can provide real-time information or respond to the latest situations. For example, if a user asks about the latest events in 2024, the model may not be able to answer accurately, and OpenAI uses this prompt to remind the user of this.
- **Model version stability:**
When providing APIs and commercial services, OpenAI typically uses rigorously validated, stable versions. While knowledge may become outdated after a model's release, ensuring the model's stable performance across multiple tasks and guaranteeing security and reliability takes precedence over the ability to update data in real time.
Therefore, although knowledge deadlines may seem “outdated,” this is a common practice in the current field of large-scale language models.
Companies like OpenAI may shorten this lag time in the future through technological advancements (such as enhanced real-time update mechanisms), but currently this "lag" is still a relatively common phenomenon.
Note: There is another model naming convention that is related to time, such as "gpt-4o-2024-08-06". This only indicates the model release or update date and is different from "knowledge cutoff". Please pay attention to this.
Price calculation unit: Token
In the pricing of the OpenAI API,Token Tokens are the basic unit for calculating usage costs. Understanding tokens is crucial for accurately estimating API call costs.
What is a token?
• Basic concepts of tokens:
A token is the smallest unit a language model uses to process text. It can be a word, a segment of words, or even a punctuation mark. For example, the English word "chat" is a token, while a long word like "incredible" might be split into multiple tokens. Punctuation marks and spaces can also be counted as tokens.
• The difference between tokens and characters:
Tokens are not simply mapped one-to-one to characters or bytes. The model breaks down and encodes the input text, generating multiple tokens. Therefore, the longer the input text, the more tokens are used.
Specific examplesFor example, the sentence "OpenAI is awesome!" consists of four tokens: "OpenAI", "is", "awesome", and "!". A single sentence in Chinese might be broken down into even more tokens. For complex text, the number of tokens can increase significantly.
The role of tokens in OpenAI pricing
In the OpenAI API,Each API call Billing will be based on the number of tokens used. When calling the API to generate text or perform other operations, the system will calculate the total number of tokens used for input and output, and charge according to the number of tokens used.
• Enter Token: The text you send to the model.
• Output TokenThe response generated by the model.
Total number of tokens for each API call = Number of input tokens + Number of output tokens.
Token billing exampleSuppose you input text containing 100 tokens and request a response of 50 tokens, the system will charge you for 150 tokens. The price per token varies depending on the specific model used (e.g., GPT-4 or GPT-3.5).
Current OpenAI Token price
The current official website price for GPT-4o is as follows:

The price of the GPT-4o mini is as follows:

可以看出,GPT-4o mini比GPT-4o却是便宜了很多。
OpenAI还有其他模型版本及功能,我就不一一介绍了。更多OpenAI的模型版本及价格参见官网:https://openai.com/api/pricing/.
Azure OpenAI
Azure OpenAI和上节内容讲过的OpenAI之间是什么关系呢?
Azure OpenAI 和 OpenAI 的关系可以描述为”微软 Azure 平台通过云服务提供 OpenAI 的模型”,从而让开发者能够在 Azure 环境中访问和使用 OpenAI 的先进人工智能模型。具体来说:
• OpenAI 是一家独立的 AI 研究公司,开发了像 GPT-3、GPT-4、Codex 等大语言模型,广泛用于自然语言处理、文本生成、代码生成等任务。
• Azure OpenAI 则是 微软 Azure 云平台 上的一个服务,提供了 OpenAI 的模型接口,使得用户可以在 Azure 环境中轻松集成和部署 OpenAI 的模型。
这两者的关系可以概括为:
- 技术合作:微软与 OpenAI 建立了深度的战略合作关系,微软提供云基础设施,OpenAI 专注于开发先进的 AI 模型。
- API 提供:通过 Azure OpenAI 服务,微软的云客户可以直接通过 API 访问 OpenAI 的 GPT 模型、Codex 以及其他前沿 AI 技术。用户可以利用 Azure 提供的集成优势(如安全、数据管理等)来构建和部署 AI 应用。
- 差异化平台:尽管 OpenAI 有自己的平台提供 API,Azure OpenAI 服务则是在 Azure 生态下的一部分,帮助用户将 OpenAI 的技术与微软的其他云产品(如 Azure 机器学习、存储、数据库等)无缝结合。
其实,最关键的点在于:OpenAI本身是限制大陆IP访问的,所以正常情况下要使用OpenAI必须使用科学或者魔法;并且如果是直接访问OpenAI官方API的方式,往往需要使用第三方代理来发起访问,稳定性难说之外,关键就是折腾。而Azure OpenAI提供的API,却是不需要使用科学或者魔法就可以直接访问的,这点非常之难得(目前国外API供应商能在国内不用科学或者魔法而直接使用的貌似就只有Azure OpenAI了)。
如果平时用户是使用本地大语言模型UI来访问OpenAI,那可选择的方法倒是很多,关键在于有些固化了OpenAI官方API地址的使用场景(比如wordpress的AI插件,具体设置步骤可以参看文章:Home Data Center Series WordPress Chatbot Plugin "AI Engine" Functional Exploration and Built-in Tools Research),在无法正常使用第三方API供应商的API地址的时候,有一个不需要科学或者魔法就可以直接使用的OpenAI API地址是非常重要的。
同时,Azure OpenAI相比其他API供应商的另一个优点,在于其API收费方式不是OpenAI官方的月租方式,而是按需使用付费,比如这个月就使用了1000个Token,那只需要付这1000个Token的使用费即可,这对于平时有其他的API供应商而只是需要一个备份的方案的朋友来说尤其适合。
注:Azure为新注册用户提供一个”价值 200 美元的免费试用额度”,用于在前 30 天内试用他们的各种云服务,这个优惠适用于首次注册 Azure 的用户,让他们可以体验和测试 Azure 的各种产品和服务,而无需立即支付费用(也就是说,OpenAI第一个月可以随便用了~)。
我在后面文章内容中提到的OpenAI同时包含”OpenAI”和”Azure OpenAI”,没有区分开来,因为本来就是不同渠道售卖的相同产品而已,不过,两者各有优势。
官方OpenAI:
- 缺点:plus套餐是20美金月租,这对于用得少的人(比如我)来说很不友好(用不起~)。
- 优点:每天有免费的GPT-4o额度可以白嫖(而且感觉训练知识库比较新,也不知是不是我的错觉),用完额度之后会切换到GPT-3.5-turbo,对于轻量用户来说,省着用还是够用的。
Azure OpenAI:
- 缺点:没有每天的免费额度,对白嫖党很不友好。
- 优点:没有月租费,用多少给多少,按需收费,可以作为备用手段。
Google Cloud AI – Gemini
Google Gemini Introduction
Google Gemini 是 Google 最新发布的 AI 语言模型系列,代表着 Google 在生成式 AI 和自然语言处理领域的顶尖水平。Gemini 是 Google 的旗舰大语言模型,它被设计用于处理和生成自然语言,支持从文本分析到复杂对话等各种任务。通过 Gemini,Google 将大幅提升其在生成式 AI 领域的能力。
Google Gemini特性:
- 多模态支持:
Gemini 模型不仅支持文本生成和处理,还能够理解并生成图像、视频、代码等多种形式的内容。这种多模态能力使得 Gemini 在诸如自动生成图像注释、多媒体内容理解等应用中表现优异。
- 强大的语言理解与生成:
与 OpenAI 的 GPT-4 类似,Gemini 具备极高的自然语言理解与生成能力。它能够处理复杂的上下文,并生成符合语境的高质量文本,适用于对话机器人、文档自动生成、代码自动补全等任务。
- 广泛应用领域:
Gemini 能够被应用于多个行业和领域,包括医疗、法律、金融、客户服务等。无论是需要自动生成专业报告,还是进行大规模的数据分析,Gemini 都能够快速适应并提供智能解决方案。
- 高效的知识整合:
Google 利用了其庞大的数据资源和知识图谱,确保 Gemini 拥有强大的知识基础。这让它在生成答案、撰写文章等任务时,更加准确并能够结合最新的事实与知识。
市场占有率与应用广泛性:
• Google 强大的生态系统:
Gemini 模型可以与 Google 的其他产品和服务(如 Google Cloud、Google Workspace)无缝集成,便于企业快速部署 AI 解决方案。
• 多模态能力:
相较于竞争对手,Gemini 的最大优势之一是其多模态处理能力。这让它不仅能处理文本,还能理解并生成多种形式的内容,适用于更多的场景和应用。
• 可扩展性与安全性:
依托 Google Cloud 的强大基础设施,Gemini 在处理大规模数据时表现稳定,并提供企业级的安全保护,适合从小型到大型企业的需求。
Main model versions and their functions
Google 发布了多种版本的 Gemini 模型,每个版本都针对不同的使用场景和需求。以下是目前的主要版本及其对应的功能:
1. Gemini 1
Gemini 1 是最早期的模型版本,支持强大的语言生成能力,适用于基础的对话机器人、文本总结和内容生成等任务。支持自然语言处理与生成,适用于简单的对话系统、文章生成等应用。
2. Gemini 1.5
Gemini 1.5 是对初代模型的升级版本,提升了生成质量和推理能力,并优化了多模态处理,能够处理包括图像、代码在内的复杂任务。不仅支持文本处理,还能理解和生成图像、视频,适用于多模态应用场景,比如自动内容创作、代码生成等。
3. Gemini 2
即将推出的版本,Google 表示该版本会进一步加强多模态 AI 的能力,能够更精准地结合文字、图像、声音等数据,为企业和用户提供更加智能的解决方案。预计将大幅提高模型推理效率,并在更多复杂的行业场景中提供支持,如医疗诊断、法律分析等。
4. Gemini Pro 和 Gemini 1.5 Pro
• Gemini Pro:这是为企业级用户量身定制的高端版本,提供更高的处理速度和更强的生成质量。适用于需要高准确性和高并发处理的大型企业。
• Gemini 1.5 Pro:基于 Gemini 1.5 的增强版本,专注于企业应用场景下的高效处理与安全保障,特别适用于需要高性能推理和跨国数据分析的场景。
Gemini 模型的优势与对比
- Gemini 与 GPT-4 :
Gemini 的多模态支持和 Google 知识图谱的集成让它在复杂任务处理上占据优势。而 GPT-4 专注于语言生成,在文本生成质量和语言模型的精细化处理方面表现卓越。
- Gemini 与其他模型的差异:
Gemini 强调跨领域能力,它不仅能生成文本,还可以理解和生成图像、视频等多模态内容,而大多数语言模型则主要专注于文本生成。
注:除了 Gemini 这个大语言模型之外,Google 还有另一款非常强大的自然语言处理工具:Google Cloud Natural Language API,这两者各有所长,前者侧重于生成式 AI,后者则侧重于对文本的分析和理解。不过,Google Cloud Natural Language API主要是针对企业可开发者,和AI的个人用户一般没什么关系(除非是插件之类的调用,见下一节内容)。
Additional information: Google Cloud Natural Language API
Google Cloud Natural Language API 是一个基于云端的服务,目的在于帮助开发者和企业理解和处理文本信息。它可以自动分析文章、文档或对话,提取出关键的主题、情感和实体(如人名、地名、组织等)。简而言之,Google 的 Natural Language API 让计算机具备理解人类语言的能力,帮助企业从海量文本中挖掘出有用的信息。
Google Cloud Natural Language API 的产品特性:
- 情感分析:它能够自动判断一篇文章或一段文字的情感倾向,是积极的、消极的,还是中立的。这对于客户反馈、社交媒体监控非常有帮助。
- 实体识别:API 可以从文本中提取重要信息,比如提到的人、地点、组织、产品等。这对于内容分类、信息检索和文本结构化处理非常有用。
- 句法分析:它能够解析句子的结构,分析其中的词汇和语法关系。对于文本理解、自动翻译、语言学习等任务有很大的应用价值。
- 内容分类:API 能够将文本自动分类到不同的主题类别(如体育、科技、娱乐等),这在自动化内容管理、新闻分类等场景下非常实用。
- Multi-language support:Google Cloud Natural Language API 不仅支持英文,还支持多种语言的分析,包括中文、西班牙语、法语等,适合全球化的企业和用户。有一些翻译软件就可以通过调用该API完成多语言翻译,比如wordpress上的翻译插件”GTranslate”和”TranslatePress”,对这2款插件的使用感兴趣的朋友可以参看我的另一篇文章:Home Data Center Series WordPress Sites Implement Multilingual Automatic Translation and Multilingual SEO Best Practices (GTranslate and TranslatePress).
市场占有率与应用广泛性:
Google Cloud Natural Language API 在全球市场中处于领先地位,特别是在处理非结构化文本数据方面。它在数据分析、客服系统、金融服务、内容推荐等领域被广泛应用。依托于 Google 强大的云计算基础设施,企业能够轻松扩展 API 的使用,无论是分析数百万条用户评论,还是处理复杂的社交媒体数据。
最大的竞争优势:
• 深度集成 Google 生态系统:Google Cloud Natural Language API 与 Google 的其他云服务(如 BigQuery、Cloud Storage)无缝集成,方便数据存储、分析和可视化。
• 准确的语言理解:由于 Google 长期积累的自然语言处理技术,它的 API 在情感分析、实体识别和内容分类方面有着较高的准确性。
• Scalability:Google 的云端基础设施确保了 API 可以处理从小规模文本到大规模数据集的各种需求,适合从小型企业到大型企业的使用。
Anthropic Claude API
Anthropic Claude API Introduction
Anthropic Claude API 是由 Anthropic 公司推出的人工智能语言模型平台,旨在为开发者提供强大且安全的自然语言处理能力。通过 Claude API,用户可以在各种应用中使用这些模型进行文本生成、对话、问答、内容撰写、数据分析等任务。Claude 的命名源自信息论的创始人 Claude Shannon,它的设计理念特别注重安全性、可控性和可靠性,以确保输出的内容对用户和社会没有潜在的风险。
Claude API 强调safetyand道德规范,它被设计为尽可能减少生成不适当、误导性或有害内容的风险。这使得它非常适合在医疗、金融、教育等对输出质量要求高的领域使用。
Claude API 的另一大优势在于它的易用性and扩展性,开发者可以非常容易地将其集成到现有系统中,通过简单的 API 调用即可使用强大的自然语言处理功能。此外,Claude 模型还具有高度的stabilityand可调节性,能够根据需求调整生成的内容风格和复杂度。
Main model versions and their functions
Anthropic 提供了多个不同的 Claude 版本,以适应不同的计算需求和任务类型。每个版本在性能、响应时间和功能集上有所不同,确保可以覆盖从轻量级任务到复杂文本生成的多样化应用。
1. Claude 3.5 Sonnet
Claude 3.5 Sonnet 是 Claude 3.5 版本的一个高性能变种,专门优化了在复杂文本处理、长篇文章生成中的表现。它能够生成更长、更连贯的文本,适合需要处理大量数据或长篇内容的任务,如报告生成、小说写作等。
Key Features:
• 提供高质量、连贯的长文本生成。
• 能够在复杂对话和多轮问答中保持上下文一致。
• 优化了处理时间较长的生成任务。
2. Claude 3.5 Lite
Claude 3.5 Lite 是一个轻量级版本,专门设计用于处理相对简单的文本生成任务,响应速度更快,适合对实时性能要求较高的应用,如智能客服、在线问答等场景。
Key Features:
• 提供快速响应的文本生成。
• 消耗较少的计算资源,适用于轻量级场景。
• 适合实时对话、用户互动等对速度有高要求的任务。
3. Claude 3.5 Chat
Claude 3.5 Chat 专门针对对话系统进行了优化,能够在对话生成和多轮交互中保持高效和一致的表现。它的应用场景包括智能客服、虚拟助理等需要与用户进行连续对话的场合。
Key Features:
• 高效处理多轮对话。
• 保持对话上下文的连贯性。
• 优化了对复杂问题的回答和对话生成。
4. Claude 3.0
Claude 3.0 是该系列的基础模型版本,适用于各种通用自然语言处理任务。它能够处理从文本生成、情感分析、到简单的问答系统等广泛的任务。
Key Features:
• 通用型自然语言生成和理解。
• 支持多轮对话和上下文保持。
• 在多数情况下可以提供高效且准确的文本生成。
模型版本之间的区别:
• Claude 3.5 Sonnet 是针对高质量、长篇文本生成的模型,适用于需要长时间保持上下文一致性的场景,如复杂的报告撰写和长篇内容生成。
• Claude 3.5 Lite 则是轻量级的版本,响应速度快,适合需要快速生成结果的应用,如客服系统或对话系统。
• Claude 3.5 Chat 专门针对多轮对话进行优化,确保在复杂对话中保持流畅、自然的互动。
一般我们直接用最新的Claude 3.5版本即可。
Additional Information: Other Official API Providers
除了 OpenAI、Azure OpenAI、Google Gemini和 Anthropic Claude之外,市面上还有其他几家常用的大语言模型 API 供应商。以下简要介绍一下这几家供应商及其特点和提供的模型版本的介绍,以便大家有个印象。
1. Cohere
Cohere 是一家提供大规模自然语言处理 API 的公司,专注于自然语言理解和生成,适用于文本分析、文档分类、情感分析等多种任务。Cohere 的模型针对开发者和企业,提供了灵活且高效的语言模型服务。
特点与优势:
• 文本生成和理解:Cohere 的 API 提供了强大的文本生成和理解功能,适合用于生成复杂文本、总结、翻译等任务。
• Custom Model:Cohere 支持用户基于其模型进行Fine-tuning,以适应特定的行业需求。
• 开放架构:允许用户自定义模型输出,支持不同的语言和风格,适合多语言应用。
• 安全与合规:Cohere 强调模型的安全性和对用户数据隐私的保护。
主要模型:
• Command:Cohere 的旗舰模型,专注于指令式文本生成任务,适合文本撰写、回答问题和生成对话等。
• Rerank:用于提高搜索结果的准确性,适合信息检索和排序任务。
• Embed:专注于文本嵌入,适合文本分类、相似性分析等任务。
2. Hugging Face (Transformers API)
Hugging Face 是开源人工智能社区的领军者,提供了数百种自然语言处理模型。其 Transformers API 允许开发者调用各种预训练的大型语言模型,包括 GPT 系列、BERT、RoBERTa 等模型,满足各种自然语言处理任务需求。
特点与优势:
• 模型种类丰富:Hugging Face 提供了海量的开源模型库,开发者可以选择各种开源模型,并且可以微调现有模型以满足特定需求。
• Community Support:Hugging Face 社区非常活跃,开发者可以通过社区资源快速学习并应用模型。
• 多任务支持:支持文本生成、翻译、情感分析、信息提取、对话系统等多种任务。
• 低成本和开源:多数模型可以免费使用,并且提供免费的微调和托管服务。
主要模型:
• GPT-2/3:Hugging Face 提供了 GPT 系列的开源版本,支持文本生成、对话等任务。
• BERT:适合文本分类、情感分析、问答系统等任务。
• RoBERTa:BERT 的改进版,擅长文本理解和上下文分析。
3. Mistral
Mistral 是一家新兴的大语言模型提供商,专注于高性能的开源语言模型。其发布的模型在准确性和计算效率上都有极高的表现,适合需要高质量生成和文本处理的任务。Mistral 专注于提供灵活的开源模型,适合开发者和企业定制化使用。
特点与优势:
• 开源模式:Mistral 提供开源的大型语言模型,允许用户在本地或云端进行定制化应用。
• 高性能模型:其模型能够在不牺牲生成质量的情况下显著提升处理速度,适合大规模并发任务。
• 专注文本生成:Mistral 的模型特别擅长生成自然流畅的文本,适合内容创作、代码生成、问答系统等场景。
主要模型:
• Mistral-7B:Mistral 旗下的旗舰模型,参数量为 7 亿,适合复杂的文本生成任务。相比其他同类模型,Mistral-7B 在性能和准确度上具有更高的性价比。
4. Meta (LLaMA API)
Meta(Facebook)推出了自己的大型语言模型 LLaMA,该模型主要用于研究和商业应用。LLaMA 提供了不同版本,适合各种自然语言处理任务,包括生成文本、理解上下文、问答等。
特点与优势:
• 轻量化:与其他大型语言模型相比,LLaMA 在保持高性能的同时,显著降低了计算资源的消耗。
• 开源模型:LLaMA 的模型开源,开发者可以基于 LLaMA 进行二次开发和微调。
• 研究驱动:LLaMA 主要面向学术界和研究机构,适合高级研究型任务。
主要模型:
• LLaMA 2:LLaMA 2 是 Meta 推出的新一代语言模型,具备增强的上下文理解和文本生成能力,适合各类自然语言处理任务。
• LLaMA 13B/65B:参数规模分别为 130 亿和 650 亿,专门用于处理复杂的自然语言任务。
Third-party API provider: OhMyGPT
OhMyGPT API Introduction
OhMyGPT API是一家专门提供多个大语言模型 API 对接服务的第三方平台(官方网址:https://www.ohmygpt.com/)。它的主要功能是帮助用户更方便、灵活地接入不同的大型语言模型 API,而无需单独配置和管理多个账户或平台。因此,用户可以通过 OhMyGPT 一站式调用 OpenAI、Anthropic、Google 等多家供应商的语言模型,从而减少管理和技术上的负担。
产品特性:
• 统一接口管理:OhMyGPT 为用户提供一个平台,整合多个语言模型的 API,通过一个统一的接口调用不同的模型,简化了开发流程。
• 价格透明:虽然 OhMyGPT 本身不提供模型,但它整合了市场上的 API 服务,并且通过打包定价或灵活计费的方式,帮助用户节省成本。
• 无缝切换供应商:通过 OhMyGPT,用户可以在不同的 API 供应商之间快速切换,便于比较不同模型的表现并选择最适合的方案。
• 开发者友好:OhMyGPT 提供简洁的文档和丰富的示例,适合技术背景各异的开发者轻松上手。
功能与市场竞争优势:
• 简化接入:开发者无需分别注册多个大语言模型 API 的账号,OhMyGPT 统一提供了 API 对接服务,简化了接入过程。
• 成本优化:通过打包不同 API 供应商的服务,OhMyGPT 为用户提供了更加灵活的付费方式,适合预算有限的开发者和中小型企业。
• 支持多供应商:无论是 OpenAI 还是 Anthropic Claude,OhMyGPT 允许用户快速对接,并根据需求灵活选择最适合的模型。
OhMyGPT supports major API vendors and their functions.
OhMyGPT 并没有自己的模型,它的作用是为用户提供一个平台,整合各大语言模型 API 供应商的服务。以下是 OhMyGPT 支持的主要语言模型 API 供应商及其功能:
- OpenAI API:支持 GPT-4、GPT-3.5 系列模型,擅长文本生成、对话系统和文本分析任务。
- Anthropic Claude API:支持 Claude 系列模型,擅长安全性高、稳健性好的对话和文本生成任务。
- Google Gemini API:提供强大的多模态处理能力,适合图像、文本等不同类型数据的综合处理。
- Azure OpenAI API:由微软提供,除了 OpenAI 模型的服务,还整合了 Azure 的云计算优势,适合企业级应用的场景。
所有支持的API供应商及模型如下:



Advantages of OhMyGPT
通过OhMyGPT来使用其他的API供应商的服务有以下几点优势:
• 跨平台使用:通过 OhMyGPT,用户只需一次付费,就可以在多个 API 供应商之间无缝切换,适应不同的使用场景(需要配合支持快速切换的本地大语言模型UI使用,例如Lobechat(ChatGPT Next Web应该也可以,只是我没试过):

• 一站式体验:简化了多平台 API 的管理,降低了使用难度,特别适合需要频繁切换不同模型的开发者,也就是说,可以通过一个API地址就能访问多个API供应商(和上面一条一样,需要APP或者UI支持)。同时,可以非常清楚的知道各个API的消费明细:

• 多个API地址可供选择:官方提供了多条API线路,适合国内、国外不同的网络环境使用

• 性价比高:OhMyGPT采用预付费积分制,提供每日免费积分,所以更适合小规模、不频繁的使用的朋友(比如我~),而最关键的是,基本使用费仅需20元人民币(我都勉强用得起):

注:还记得前面mark过的本地大语言模型UI支持的”自定义API地址和模型版本”功能吗?只要可以自定义,理论上就可以配合OhMyGPT来实现对多个API供应商及模型版本的访问。不过嘛,理想和现实是有差距的,理论是理论,实际上却不是都可以成功(涉及API格式的兼容性问题),所以只能说大家可以先试试。
Summarize
前文提到多个API供应商的不同的语言模型和版本,并且这些语言模型各自擅长的领域以及定价不尽相同,可能会在选择时搞得大家头昏脑胀。在日常生活中选择不同的 API 供应商和具体的模型版本时,取决于你对模型的”功能需求、速度要求、预算以及应用场景”的考量。
以下对常见的GPT-4o-mini,GPT-4o,OpenAI o1,Google Gemini 1.5、以及Anthropic Claude 3.5 Sonnet 的一些关键进行比较,以供大家参考。
GPT-4o-mini
这是 OpenAI 的一个优化版轻量模型,比标准版 GPT-4o 更快速和经济,但保留了 GPT-4 系列的强大推理能力。
Applicable scenarios:
• 适合日常对话、轻量级任务或需要快速响应的应用,如简单的客服系统、实时聊天等。
advantage:
• 更低的成本,适合预算有限的用户。
• 响应速度快,适合实时应用场景。
shortcoming:
• 相比 GPT-4o 完整版,在更复杂的任务上表现较弱。
• 语言理解和推理能力相对较低。
GPT-4o
标准的 GPT-4 优化版本,具备强大的语言理解和推理能力,处理复杂问题表现出色。
Applicable scenarios:
• 适合需要高精度、复杂语言处理的应用,如内容创作、技术问答、复杂客户支持等。
advantage:
• 强大的文本生成和推理能力,适合广泛的复杂任务。
• 在多任务处理上表现优越。
shortcoming:
• 相对较高的使用成本。
• 响应速度可能不如轻量版本。
OpenAI o1
OpenAI 系列中的另一个重要模型,强调多领域表现,擅长处理各种复杂的 NLP 任务。
Applicable scenarios:
• 适用于需要多领域知识和更高精度的应用,如法律、医学、教育等垂直领域的内容创作和技术分析。
advantage:
• 广泛的领域覆盖,擅长处理高难度问题。
• 在生成上下文复杂的文本时表现出色。
shortcoming:
• 价格相对较高。
• 对于简单任务可能显得过于强大和浪费资源。
Google Gemini 1.5
Google 的大语言模型系列,具有强大的Multimodal处理能力(文字、图片等),Gemini 系列特别强调与现实世界知识的结合。
Applicable scenarios:
• 适合需要跨模态处理的应用场景,如生成图文结合的内容、视觉与语言混合任务、知识推理等。
advantage:
• 多模态支持,适合需要图片和文本处理的任务。
• 深度集成了 Google 知识图谱,信息更加精确和可靠。
shortcoming:
• 成本较高,特别是对于中小型应用。
• 对于纯文本处理任务,可能没有明显的优势。
Anthropic Claude 3.5 Sonnet
Claude 系列模型以安全性、稳定性见长,专注于可控性和对话生成的优化,强调对敏感话题的安全处理和伦理问题。
Applicable scenarios:
• 适合需要对话系统、客户支持等应用,特别是对模型安全性和敏感内容管理要求较高的场合,如医疗、心理咨询等领域。
advantage:
• 对话生成能力强,专注于安全性,适合需要高伦理标准的场景。
• 在长对话中上下文理解能力突出。
shortcoming:
• 成本可能较高,特别是在广泛应用的情况下。
• 对比其他模型,可能在数据广度上略有局限。
How to choose?
1、预算有限,追求快速响应:
• 选择 GPT-4o-mini。它的性能足以应付大多数日常任务,且成本较低,响应速度较快。
2、需要高精度、复杂问题解决方案:
• 选择 GPT-4o 或 OpenAI o1(贵)。这两个模型能够处理复杂任务,如内容创作、深入的技术问答或多领域分析。
3、跨模态处理(文字+图片):
• 选择 Google Gemini 1.5。如果你的应用需要处理文字和图像的组合内容,这个模型更合适。
4、重视对话生成的安全性和稳定性:
• 选择 Claude 3.5 Sonnet。如果你的应用对伦理安全要求高,例如在医疗、心理健康等领域,这个模型的优势会更明显。
最后,不管选择以上哪个模型,都可以试试通过OhMyGPT来使用。
注1:据说从写代码的角度来说,OpenAI o1或者OpenAI o1-preview最好,但是太贵,而Claude 3.5 Sonnet相比OpenAI o1差别不大,但是价格却便宜很多,如有有朋友有用AI写代码的需求,可以验证下是否准确。
注2:除了以上这些我认为常用的API供应商及模型,还有很多在某些特定垂直领域有优势的API供应商及模型,只不过我现在孤陋寡闻,没听说过,自然没法一一整理出来,大家如果有非常规的需求,不用局限于本文中这些常规的API供应商,可以看看在特定领域有优势的其他API供应商,根据具体的需求权衡模型的能力、成本和应用场景的匹配度,最终来确认最合适的API供应商。
注3:据说一些国内的API供应商用起来也不错,有兴趣的朋友可以试试。