1 Introduction
In my previous articles about RAG and knowledge bases, one word keeps appearing: vector. The vector is arguably one of the core concepts of AI, and anyone who has worked with AI will probably be familiar with the term.
Even without AI, we're probably all familiar with vectors: in school, teachers might have asked us to draw arrows, write coordinates, or calculate lengths and directions in geometry problems. That's right, the vectors used in AI are the same mathematical concept we remember. A vector is essentially a set of numbers, just with more dimensions and a wider range of applications.
So, why are vectors so important in AI? The reason is simple—computers don't directly understand human words; they only see numbers and symbols. For example, the word "apple" has a specific meaning and image to us, but to a computer, it's just a string of characters. In order for computers to understand, process, and even find relationships between different objects, they need a bridge: the vector.
In the world of AI, every piece of text, every word, even an image or a piece of music, can be represented by a vector. Vectors transform abstract concepts into "points" in the digital world, allowing computers to perform calculations and comparisons within this space, thereby understanding their relationships. This is precisely why vectors are the underlying core of scenarios like RAG, knowledge base retrieval, and recommendation systems: without vectors, AI would struggle to convert our speech, images, and sounds into actionable information.
So, in the following content, I'll slowly unpack the concept of vectors, exploring what they are, what they do, and why they're so indispensable. No formulas or advanced theory are needed; simply understanding them will allow you to grasp a core underlying logic of AI.
2 What exactly is a vector?
In fact, you can think of vectors as a "universal language." When humans say "apple" or "banana," computers don't understand these words; they only see numbers. So we assign numerical coordinates to every word, sentence, and even every picture or piece of music, just like marking points on a map.
Let’s take an intuitive example:
- In two-dimensional space, a point can be represented by (x, y), such as (3, 4). This is like marking a location on a flat map.
- If it is a three-dimensional space, there is an additional z coordinate, such as (3, 4, 5), just like marking a point in a three-dimensional room.

AI's vectors simply extend the dimensionality even higher—potentially 128, 512, or even thousands of dimensions—with each dimension representing a number along a characteristic direction. It sounds complex, but the logic is exactly the same as for two or three dimensions: each number represents a coordinate axis, and you simply find a point along more dimensions.
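As a minimal sketch of "a vector is just a list of coordinates", the snippet below stores the two example points as Python lists and computes their lengths. The helper name `length` is made up for this illustration.

```python
import math

def length(vector):
    """Euclidean length: square root of the sum of squared coordinates."""
    return math.sqrt(sum(x * x for x in vector))

point_2d = [3, 4]      # a location on a flat map
point_3d = [3, 4, 5]   # a location in a room

print(len(point_2d), length(point_2d))   # 2 dimensions, length 5.0
print(len(point_3d), length(point_3d))   # 3 dimensions, length about 7.07
```

A vector with 512 dimensions is handled in exactly the same way; the list is simply longer.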
How to understand these higher dimensions? Think of it this way: two dimensions are like points on a map, three dimensions are like points in a room, and higher dimensions are like a massive warehouse, where each dimension represents a feature. You can't see these dimensions, but each one is meaningful, like color, shape, style, emotion, and so on. Vectors are how you label these features with numbers, allowing computers to locate each object in high-dimensional space.
So how does AI determine whether the meanings behind different vectors are similar? The principle is quite intuitive: AI calculates the "distance" or "angle" between two vectors. Vectors that are closer together have more similar semantics. For example, the vectors for "apple" and "banana" are close together, while the vectors for "apple" and "television" are farther apart.
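To make "distance" and "angle" concrete, here is a small sketch using only the standard library. The three vectors are hand-written for illustration (a real embedding model would produce hundreds of dimensions), and the axis meanings in the comment are an assumption.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional vectors; axes loosely read as
# (fruitiness, sweetness, electronics).
apple = [0.9, 0.8, 0.1]
banana = [0.8, 0.9, 0.0]
television = [0.0, 0.1, 0.9]

print(cosine_similarity(apple, banana))       # close to 1: similar meaning
print(cosine_similarity(apple, television))   # much lower: unrelated
print(euclidean_distance(apple, banana))      # small distance
print(euclidean_distance(apple, television))  # large distance
```

Both measures agree here: a smaller distance and a larger cosine value both indicate more similar semantics.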
What is the use of this judgment? There are many scenarios:
- Semantic Search: When you search for a word, AI can find the content closest to it in the vector space, rather than just matching the exact same text.
- Recommendation System: If you like a song, AI can find the song closest to it in the vector space and recommend it to you.
- Clustering and classification: AI can group similar content together based on vector distance, making it easier to understand and process large-scale information.
To give an everyday analogy: when you go to the supermarket to buy fruit, you will see apples, oranges, and bananas next to each other. In this "fruit space," the distance between them is very small. Refrigerators, rice cookers, and other items are in the appliance section, far away from the fruit section. Vectors allow computers to draw this kind of "mall map" in the semantic world, helping them distinguish between different categories of items.
Here's another interesting example: Imagine you have a playlist of pop, rock, and classical music, and you want AI to recommend songs you might like. Without converting the songs into vectors, the AI would have no way to tell how your favorite song relates to the songs around it. By representing the songs as vectors, the AI can find neighboring songs directly by distance: the song closest to your favorite is the one it thinks you might like next.
The power of vectors doesn't stop there. You can even do "arithmetic" with them. For example, "king - man + woman" lands close to "queen." It sounds like magic, but it's just vector operations in high-dimensional space: you make a "directional move" in the digital world and arrive at a new semantic point.
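The supermarket picture can be sketched as a tiny nearest-section classifier. Everything here is hand-written for illustration: the two-dimensional vectors, the section centers, and the helper name `nearest_section` are assumptions, not output from a real model.

```python
import math

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical section centers; axes read as (food-like, appliance-like).
sections = {"fruit": [0.9, 0.1], "appliances": [0.1, 0.9]}

items = {
    "apple": [0.95, 0.05],
    "orange": [0.85, 0.1],
    "banana": [0.9, 0.2],
    "refrigerator": [0.05, 0.95],
    "rice cooker": [0.15, 0.85],
}

def nearest_section(vector):
    """Pick the section whose center is closest to the item's vector."""
    return min(sections, key=lambda name: euclidean_distance(vector, sections[name]))

# Group every item under its nearest section, like shelving goods in a store.
shelves = {}
for item, vector in items.items():
    shelves.setdefault(nearest_section(vector), []).append(item)

print(shelves)
```

Grouping by distance alone puts the three fruits on one shelf and the two appliances on the other, without any rule that mentions "fruit" explicitly.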
In short, vectors are the underlying language of AI in the semantic world. They enable AI to "understand" text, images, sounds, and even your preferences. Understanding vectors gives you a fundamental understanding of the AI world.
3 How do vectors make AI smarter?
3.1 Semantic Search: AI’s Search for Meaning
Looking something up on Baidu, Google, or Taobao is search in its most familiar form. However, traditional search has a significant limitation: it only matches words literally. For example, if you type "cat" into the search box, it will find all web pages, products, and posts containing the word "cat." However, if a web page only uses expressions such as "meow-planet folk" (喵星人, Chinese internet slang for cats) or "cute pet" (可爱宠), traditional search may not find it.
Herein lies the biggest problem with keyword search: it can only read the literal text, not the meaning behind it.
So how does AI solve this problem? The answer is vectors. AI first converts each word and sentence into a vector and places it into a "semantic space."

In this way, when you search for "cute animals", the AI doesn't just look for the word "animal". Instead, it searches for the points in the semantic space closest to the vector of "cute animals". As a result, "cats", "puppies" and "chickens" may all be found.

This is what is called Semantic Search, or "searching for meaning with meaning." Imagine this: traditional search = poring over the keyword list on the bookshelf; vector search = asking the bookstore clerk, "I'm looking for some lighthearted and interesting novels," and the clerk immediately directs you to the bookshelves of Haruki Murakami and Keigo Higashino. Isn't that smarter?
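The difference between keyword matching and semantic search can be sketched in a few lines. The vectors below are hand-written stand-ins for what an embedding model would produce, and the axis meanings are an assumption made for this example.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical vectors; axes read as (animal-ness, cuteness, technology).
documents = {
    "cats": [0.9, 0.9, 0.0],
    "puppies": [0.9, 0.8, 0.0],
    "chickens": [0.8, 0.5, 0.0],
    "rice cooker": [0.0, 0.1, 0.9],
}
query_vector = [0.9, 0.9, 0.05]   # stand-in vector for "cute animals"

# Keyword search: look for the literal word "animal" in each entry.
keyword_hits = [name for name in documents if "animal" in name]

# Semantic search: rank every entry by similarity to the query vector.
ranking = sorted(
    documents,
    key=lambda name: cosine_similarity(query_vector, documents[name]),
    reverse=True,
)

print(keyword_hits)   # []
print(ranking)        # ['cats', 'puppies', 'chickens', 'rice cooker']
```

The keyword search finds nothing because no entry contains the word "animal", while the vector ranking still places the animals first.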
This capability is the foundation of current AI techniques like RAG (Retrieval-Augmented Generation). A chat model such as ChatGPT, once connected to a knowledge base, can find answers in it by relying on semantic search rather than on keywords alone.
3.2 Recommendation System: Finding Your Most Likely Neighbors
In addition to search, vectors have another application that everyone uses every day but may not have noticed: the recommendation system. Have you ever had experiences like these?
- You listened to a song on NetEase Cloud Music, and the next few songs recommended to you were exactly what you wanted.
- You watched a funny video on TikTok, then scrolled down and found a bunch of similar videos.
- You bought a cup on Taobao and were immediately recommended a thermos, a tea can, a desktop storage box...
It may seem like AI is "mind-reading", but in fact vectors are doing the work behind the scenes. The logic is simple: every song, every video, and every product can be converted into a vector, and so can the songs you have listened to, the videos you have watched, and the products you have bought. In the high-dimensional semantic space, these vectors naturally "group themselves", much like an automatic classification by interest and category: the vectors of athletes sit close together, the vectors of animals cluster together, and movies, music, and products form clusters of their own.
In this way, when AI wants to recommend content to you, it does not need to "know what you like". As long as it finds the nearest neighbors of your existing vectors, it can recommend relevant content. You can think of a vector as a "digital label" for everything in the semantic space: the closer the distance, the higher the relevance.

The recommendation system's job is to find the nearest neighbors of the vectors you like.

Let's use another analogy: in your circle of friends, the people you're most likely to connect with are often those with similar interests. AI works the same way in vector space: simply find your "neighbors." For example, if you listen to Jay Chou's "Qilixiang," AI will discover that its neighbors in the "pop music space" are JJ Lin, Mayday, and Stefanie Sun. Therefore, their songs will appear in the recommendation column.
This is where vectors make AI smart: it doesn’t require hard coding like “liking Jay Chou = liking JJ Lin”, but instead relies on calculating semantic “distance” to dynamically find what you are most likely to like.
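A nearest-neighbor recommender can be sketched as follows. The catalog, its vectors, and the idea of averaging liked items into a single "taste" vector are assumptions chosen to keep the example short; production systems use far richer signals.

```python
import math

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical catalog; axes read as (pop, rock, classical).
catalog = {
    "pop_song_a": [0.9, 0.1, 0.0],
    "pop_song_b": [0.8, 0.2, 0.1],
    "rock_song": [0.2, 0.9, 0.0],
    "classical_piece": [0.0, 0.1, 0.9],
}

def recommend(liked, catalog, k=1):
    """Return the k unseen items closest to the average of the liked items."""
    dims = len(next(iter(catalog.values())))
    taste = [sum(catalog[name][i] for name in liked) / len(liked) for i in range(dims)]
    candidates = [name for name in catalog if name not in liked]
    candidates.sort(key=lambda name: euclidean_distance(taste, catalog[name]))
    return candidates[:k]

print(recommend(["pop_song_a"], catalog))   # ['pop_song_b']
```

No rule says "people who like pop_song_a also like pop_song_b"; the recommendation falls out of the distances alone.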
3.3 Multimodal Understanding: Text, Pictures, and Sound Can All Fit into the Same “Space”
Search and recommendations are straightforward enough, but vectors are far more powerful than that. They also enable AI to span different types of data, achieving what we often call "multimodal understanding." What does this mean?
In the human world, we communicate through multiple senses: we can speak (text), hear sounds (audio), and see images (vision). But to computers, these are completely different forms of raw data: text is characters, sound is waveforms, and images are matrices of pixels. They are inherently incompatible.
So how can AI "pack these different types of information into the same brain"? The answer is again vectors.
AI converts images, text, and audio into vectors and puts them into the same semantic space.

This way, they can be directly compared and connected. Here are a few intuitive examples:
- Search by image: When you upload a cat photo, the AI doesn't need to know whether it's a JPG or PNG. Instead, it converts it into a vector and then searches for its "nearest neighbors" in the semantic space. As a result, it can help you find "photos of cats" or even "stuffed animals that look like cats."
- Automatic image matching: If you write the sentence "taking a walk on the beach under the sunset", AI can find pictures with similar semantics to this sentence in the vector space, and then automatically add suitable illustrations to the article.
- Speech Recognition + Understanding: A piece of speech is first converted into a vector and then aligned with the text vector, so that the content of the speech can correspond to the meaning of the text.
In other words, vectors are the "translators" between different modalities. They allow text, images, and sounds, which originally "did not speak the same language," to finally communicate in the same space.
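The sketch below illustrates the shared-space idea with hand-written vectors. In a real system, separate image and text encoders are trained so that matching content lands close together; here that alignment is simply assumed, and the file names and the helper `nearest` are hypothetical.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# (modality, label, vector) entries living in ONE shared space.
# Axes loosely read as (cat-related, beach-related).
items = [
    ("image", "cat_photo.jpg", [0.9, 0.1]),
    ("image", "beach_sunset.png", [0.1, 0.9]),
    ("text", "a cat sleeping on the sofa", [0.85, 0.15]),
    ("text", "taking a walk on the beach under the sunset", [0.15, 0.95]),
]

def nearest(query_vector, modality):
    """Find the closest item of the requested modality."""
    candidates = [item for item in items if item[0] == modality]
    best = max(candidates, key=lambda item: cosine_similarity(query_vector, item[2]))
    return best[1]

# Text to image: illustrate a sentence with the closest picture.
print(nearest([0.15, 0.95], "image"))   # beach_sunset.png
# Image to text: describe a photo with the closest sentence.
print(nearest([0.9, 0.1], "text"))      # a cat sleeping on the sofa
```

Because both modalities share the same coordinates, the same similarity function works in either direction.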
With this, AI can handle complex applications such as image and text generation, voice assistants, and video recommendation.
3.4 AI’s “Analogy and Reasoning” Tricks
Vectors can not only be used to measure distance, but also perform some amazing calculations. Remember the classic example mentioned earlier: King – Man + Woman = Queen?
This isn't metaphysics; it's a mathematical result of vector space. Why is this so? Because during training, the vectors learned by the AI aren't just "isolated points"; they carry semantic direction. For example, "man → woman" represents a change in gender; "king → queen" represents the same change.
So when you do subtraction and addition in vector space, you are essentially "moving in the same semantic direction." That is how you naturally arrive at the point "Queen."

The magic of this kind of operation is that it gives AI a certain ability to reason by analogy. A few more light-hearted examples: "Paris - France + Japan ≈ Tokyo", "iPhone - Apple + Samsung ≈ Galaxy", "Programmer - day + night ≈ people who stay up late to code".
Of course, this last one is a joke, but it illustrates a fact: vectors can not only represent semantics, but also make the relationships between meanings "computable".
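The "king - man + woman" move can be reproduced with a toy vocabulary. The two-dimensional vectors are hand-written so that the arithmetic works out cleanly; learned embeddings only satisfy such analogies approximately. The helper name `analogy` is made up for this sketch.

```python
import math

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hand-written 2-dimensional vectors; axes read as (royalty, maleness).
words = {
    "king": [0.95, 0.9],
    "queen": [0.95, 0.1],
    "man": [0.1, 0.9],
    "woman": [0.1, 0.1],
    "prince": [0.8, 0.9],
}

def analogy(a, b, c):
    """Return the word nearest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = [x - y + z for x, y, z in zip(words[a], words[b], words[c])]
    candidates = [w for w in words if w not in (a, b, c)]
    return min(candidates, key=lambda w: euclidean_distance(target, words[w]))

print(analogy("king", "man", "woman"))   # queen
```

Subtracting "man" and adding "woman" shifts the point along the gender axis while leaving the royalty axis untouched, which is the "same semantic direction" described above.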
This is why vectors are called the "underlying language" of AI: they not only help AI store and retrieve knowledge, but also allow AI to play with logic and reasoning to a certain extent.
3.5 Summary: Vectors — AI’s Mental Coordinate System
Across the previous four sections, we can see that a vector is not just a collection of numbers; it is more like AI's mental coordinate system.
- In semantic search, vectors enable AI to “search for meaning with meaning,” no longer focusing on the literal meaning but understanding the meaning;
- In the recommendation system, it helps AI find the "neighbors" you are most likely to like, making content push more accurate;
- In multimodal understanding, different types of data can enter the same space, enabling cross-modal comparison and matching;
- In vector operations, semantic relationships can be calculated like arithmetic, giving AI the ability to make analogies and reason.
Simply put, vectors transform messy, complex information that is hard to process directly into points that can be computed, compared, and reasoned about. They turn AI from a "digital machine that cannot understand the world" into an intelligent system that can understand, recommend, and make analogies in semantic space.
Understanding vectors gives you a basic understanding of the underlying logic behind AI capabilities. This not only helps you understand the workings of tools like ChatGPT, RAG, and knowledge bases, but also lays a solid foundation for exploring various AI applications.
4 Application of Vectors in Retrieval-Augmented Generation
4.1 Brief Review of RAG Basic Theory
I have already introduced the core concepts and workflow of RAG (see the article "Home Data Center Series: Understanding RAG from Scratch (Part 1): Principles and Complete Process Analysis"), and walked through a hands-on knowledge base setup with Chatbox and Ollama, building a self-hosted embedding model and completing retrieval of the knowledge base content with a GPT model (see the article "Home Data Center Series: Using Ollama's Self-Built Embedding Model + Chatbox Knowledge Base Practice"). This chapter will not repeat the operational details. Instead, it builds on that foundation to look at the core role vectors play in retrieval-augmented generation, and at how to optimize and extend them so the knowledge base truly comes to life.
In general, the core logic of RAG (Retrieval-Augmented Generation) is very straightforward: prepare in advance, find when needed, then generate.
- Prepare in advance: The knowledge base content is vectorized using an embedding model and stored in a vector database. The benefit of vectorization is that AI can determine the similarity between pieces of content by simply comparing coordinates in digital space, without having to understand the original text.
- Find when needed: When a question is asked, AI does not go through the knowledge base entry by entry, but directly finds the coordinate point in the vector space that is closest to the question, that is, the most relevant knowledge fragment.
- Then generate: Feed the retrieved content into the generative model, allowing it to output natural, coherent, and contextual answers. This step is like integrating scattered information into a complete article or suggestion.
Let me give you a practical analogy: Imagine you have built your own knowledge index in the library, with each article and each paragraph marked with coordinates (vectors):
- When you ask a question, AI does not need to flip through the entire book, but directly finds the "coordinate point" closest to your question in the vector space, which is the most relevant content.
- The generative model then synthesizes the answer based on this content, just like a librarian organizes the most relevant book excerpts and references into a naturally readable recommendation.
It can be understood this way: the vector is like a "digital location" for each book and each paragraph, allowing AI to quickly find "the point you want" in the ocean of knowledge, while also understanding the relationship between these points, thereby generating logical and organized answers.
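The prepare, find, and generate steps can be sketched end to end in plain Python. Two things here are deliberate simplifications: `embed` is a word-count stand-in for a real embedding model (it only matches shared words, not true meaning), and the final call to a generative model is left out, so the sketch stops at the prompt that would be sent. All function names are hypothetical.

```python
import math
import re
from collections import Counter

def embed(text):
    """Stand-in embedding: word counts. A real setup would call an embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine_similarity(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# 1. Prepare in advance: vectorize each chunk and keep it in an in-memory index.
chunks = [
    "Semantic search finds content by meaning instead of exact keywords.",
    "A recommendation system suggests the nearest neighbors of songs you like.",
    "The knowledge base content is stored in a vector database.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# 2. Find when needed: rank the chunks by similarity to the question.
def retrieve(question, k=1):
    q = embed(question)
    ranked = sorted(index, key=lambda entry: cosine_similarity(q, entry[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

# 3. Generate: hand the retrieved text to a generative model as context.
def build_prompt(question, k=1):
    context = "\n".join(retrieve(question, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

question = "How does a recommendation system pick songs?"
print(build_prompt(question))
```

Swapping the stand-in `embed` for a real embedding model and the list for a vector database changes the quality and scale of retrieval, but not the shape of the pipeline.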
4.2 The actual role of vectors in the Chatbox/Ollama knowledge base
In the previous section, we discussed the core logic of RAG, and at its core lies vectors. Simply put, vectors act like a compass for AI, helping it find the most relevant information in a vast ocean of knowledge. Without vectors, AI is like a librarian without a map: finding information is slow and error-prone.
In the Chatbox and Ollama knowledge base, vectors have several obvious uses:
- Help AI quickly find the most relevant content
  - You cut the article into small segments, and each segment generates a vector, just like each paragraph has its own coordinates in the knowledge space.
  - When you ask a question, AI will directly find the nearest coordinate point, which is the most relevant content, without having to read the entire text from beginning to end.
- Determine the accuracy of the answer
  - The vector's dimensionality and generation method affect matching performance. Higher dimensionality allows AI to capture more subtle semantics, but computation is slower; lower dimensionality results in faster processing but may miss some meaning.
  - Just like when you are looking for a book, if you only remember the general categories (such as "mathematics"), you may not be able to find the specific chapters; the more detailed you remember (such as "linear algebra vector part"), the more accurate the content you find.
- Help understand the context
  - In knowledge-based question-answering, users often ask questions one after another. Vectors can help AI "remember previous context," linking together the context and making answers more coherent and natural.
  - It’s similar to how a librarian knows which books you have borrowed before, so they can recommend materials that better suit your needs instead of having to start from scratch every time.
- Easy to expand and optimize
  - Vector search makes the knowledge base easy to expand: when new documents are added, vectors are generated and put into the database, and AI can immediately use the new information.
  - You can even convert text, images, and tables into vectors and put them into the same "knowledge map" to achieve intelligent retrieval across content types.
To summarize: the role of vectors in a knowledge base is to provide AI with a "digital map," allowing it to quickly locate relevant content and generate natural, contextually appropriate responses. Understanding the practical role of vectors will not only help you operate a knowledge base but also understand why it's so intelligent.
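Cutting an article into segments and extending the knowledge base later can be sketched like this. The fixed eight-word window and the word-count `embed` function are assumptions made for brevity; real tools typically split by sentences or tokens and call an actual embedding model.

```python
from collections import Counter

def embed(text):
    """Stand-in for an embedding model: plain word counts (an assumption)."""
    return Counter(text.lower().split())

def split_into_chunks(text, max_words=8):
    """Cut a text into consecutive segments of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def add_document(index, text, max_words=8):
    """Vectorize each chunk and append it; existing entries are left untouched."""
    for chunk in split_into_chunks(text, max_words):
        index.append((chunk, embed(chunk)))

index = []
add_document(index, "Vectors give every paragraph its own coordinates in the "
                    "knowledge space so that related paragraphs end up close together")
print(len(index))   # 3 chunks from the first document

# Adding a new document later only appends new vectors.
add_document(index, "New documents can be added at any time")
print(len(index))   # 4
```

Because each chunk is stored independently, adding a document never requires recomputing the vectors that are already in the index.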
5 Applications of Vectors in More AI Scenarios
In previous chapters, we discussed the role of vectors in text knowledge bases and RAG, enabling AI to understand text, quickly retrieve relevant content, and generate natural responses. In reality, the power of vectors extends far beyond text; they permeate nearly every application scenario of modern AI.
Image search: Imagine you're organizing a photo album. Each photo has unique characteristics: people, colors, scenes, lighting... AI converts these features into vectors. For example, if you want to find photos with the sun in the background, AI can find them directly in the vector space, without you having to search manually.
Music recommendations: Each song's melody, rhythm, and style can be encoded as a vector. For example, if you've been listening to lively guitar pieces recently, AI can use the vector to find other songs with similar rhythms and styles and recommend them to you, just like when you're choosing apples at a fruit shop and the shopkeeper helps you pick out pears or oranges with similar flavors.
Video editing and recommendations: AI can convert the images, actions, and soundtracks in a video into vectors. For example, if you want to create a cheerful, sunny short video, the AI will automatically find clips with similar styles and tones in the library and combine them for you, saving you the time of screening them manually.
Cross-modal applications: Text, images, and audio can all be mapped to the same vector space. For example, if you write, "I want to listen to a soft piano piece to go with this photo of a seaside sunset," AI can directly find the most suitable music in the vector space, without the need for manual text and audio matching.
To put it in practical terms, vectors are like attaching a "digital label" to each type of information. Whether it is text, pictures or music, AI can quickly find similar things in multi-dimensional space to recommend, generate or match them.
In short, once you understand the applications of vectors beyond text, you'll find that they're not only the underlying tool for knowledge retrieval, but also the universal language for AI to make intelligent judgments and recommendations in a variety of scenarios. Their core logic remains the same: digital representation → multidimensional space calculation → find similarities/matches → output results. Only the materials and scenarios they are applied to are richer and more vivid.
Perhaps you have a question when reading this: isn't a vector just a string of numbers? How can it represent human semantics such as "sun", "rock", and "funny video"?
The answer is actually quite simple—it relies on model training. Whether it's pixels in an image, frequencies in music, or user behavior, AI learns to extract "patterns" from vast amounts of data and context. These patterns are ultimately compressed into vectors, which become a kind of "digital semantic label."
Here are a few analogies:
- In an image, a certain combination of pixels and colors will be interpreted by the model as "sun" or "beach";
- In music, rhythmic and melodic characteristics may correspond to "rock" or "slow";
- In a video, the image and sound together determine whether it is closer to an "action scene" or a "comedy segment";
- In recommendations, users' browsing and selection habits will be abstracted into "people with a sweet tooth" or "viewers who prefer science fiction films."
So, vectors do not "know" these things out of thin air, but compress complex information into a unified digital space through learning. In other words, vectors are the "semantic translator" between different modalities, enabling AI to build a common semantic map across text, images, audio, video, and even user preferences.
6 Summary and Postscript
Having read this far, you may feel that vectors are not abstract symbols in math books, but a universal language in the AI world.
It translates human text, pictures, sounds, and even preferences into digital coordinates, allowing computers to find locations, measure distances, and understand relationships in a "semantic map."
In Chapter 2, we compared vectors to the sections of a supermarket, with apples and bananas placed together and electrical appliances somewhere else. In Chapter 3, we saw that they let AI judge semantic similarity by "distance" and thus answer questions intelligently. In Chapter 4, on RAG, they became the underlying support for the knowledge base, helping the model quickly find key content in vast amounts of information. Chapter 5 showed their wide application in images, music, video, and cross-modal scenarios.
So, you just need to remember one sentence: understanding vectors is almost the same as understanding the underlying operating logic of AI.
While writing this article, I was actually thinking back to the feeling of being "baffled" when I first encountered AI. A lot of the material was too academic, filled with formulas and high-dimensional space, leaving me feeling lost. But if you think of vectors as "maps," "labels," and "universal language," you'll find that they're actually quite intuitive and even quite interesting.
Next time you see AI recommending your favorite songs, organizing your photo albums, or answering questions from its knowledge base in seconds, you might be able to say this to yourself: "Oh, vectors are what's running behind this."