Home Data Center series: free entry-level local AI image generation on the Mac (Part 1): Diffusion Bee settings explained in detail, with a hands-on demonstration

Preface

Originally I planned to write a tutorial on deploying Stable Diffusion on the Mac, but then I came across Diffusion Bee, an app that is built on Stable Diffusion yet very friendly to users with no prior experience, so I decided to write this article instead and hand it in as this week's homework.

Diffusion Bee Installation

Software Introduction

DiffusionBee is a desktop application for running Stable Diffusion generative AI models, designed specifically for macOS. It is widely popular for its user-friendly interface and zero-configuration setup, and is aimed mainly at creators and AI enthusiasts interested in image generation. Its main features are:

  1. No configuration required, ready to use

• DiffusionBee hides the complex AI model configuration process in the background. Users do not need to install the Python environment or complex dependencies. They only need to download and start the program to start generating images.

• Suitable for beginners with little technical background.

  2. Supported model types

Stable Diffusion 1.x and 2.x: the base versions widely used by the community.

SD XL: the extended large model of Stable Diffusion, providing higher-quality image generation.

Inpainting: Supports image repair and editing, allowing users to regenerate parts of the image by drawing mask areas.

ControlNet: Provides precise control over image generation and generates target content based on auxiliary information such as sketches and poses.

LoRA: Load low-rank adaptation weights to enhance the generation ability of specific styles or themes.

  3. Simple user interface

• Provides an intuitive graphical interface where users only need to enter prompts and click the generate button to get the results.

• Supports direct import of prompt templates for quick and easy creation.

  4. Efficient local generation

• Leverages Apple's Metal API and the Neural Engine in M-series chips; it performs excellently on macOS and can fully exploit the hardware for fast generation.

• All generation tasks are run locally, without the need for network connection, thus ensuring user privacy.

  5. Supported features

Text to Image: Generate high-quality images based on the description entered by the user.

Prompt optimization: Built-in prompt recommendation and tuning features help users refine their input and improve the quality of the results.

Multi-resolution support: Images of different resolutions can be generated to meet various purposes.

Batch Processing: Supports generating multiple pictures at one time.

  6. Security

• DiffusionBee’s fully local operation mode avoids the risk of uploading user data to the cloud, ensuring privacy protection.

Target Group

  1. Beginners

• No technical background is required to quickly experience the charm of generative AI images.

• Simple interface, easy to use.

  2. Creators

• Suitable for illustrators, designers and other people who need to quickly generate inspiration pictures or creative sketches.

• Provides convenient image repair and editing functions.

  3. macOS users

• Especially suitable for users of M-series chips, with no need for an additional GPU or high-performance hardware.

System requirements

Operating system: macOS 12.5 or later.

Hardware requirements: Apple Silicon (M-series) chips are recommended. Intel chips are also supported, but generation is much slower (reportedly around 5 minutes per image; I have no Intel machine to test, while my M4 Pro takes about 20 seconds per image). There is also a Windows 64-bit version, but without an Nvidia graphics card the generation speed is likely to be painfully slow.

Install

Diffusion Bee can be downloaded from the official website: https://diffusionbee.com/download.

image.png

I won't waste space on the installation; it is a perfectly ordinary app install. After installing, open the app and the home page shows all the available functions:

image.png

Diffusion Bee Practice

Text to image

Function Introduction

Generating images from text is one of the core functions of Diffusion Bee, which allows users to transform creative ideas into vivid images with simple text descriptions. This technology provides great convenience for creators, designers and ordinary users, making visual art creation more efficient, intuitive and personalized. Its common application scenarios are as follows:

1. Artistic Creation

Automatically generate complex artistic style images to help artists get inspired or quickly realize concepts.

2. Concept design

It is used to quickly create concept sketches and scenes in film, television, games, architecture and other fields.

3. Content Generation

Generate images for blogs, social media, or marketing materials.

4. Education and research

Provide visual aids for teaching, papers or research projects.

5. Personalized needs

Users can generate one-of-a-kind artwork with unique descriptions.

Getting Started

When you use it for the first time, you need to download the corresponding model. When I chose the text-to-image function, the default model was downloaded automatically (you may need a VPN or proxy to reach the download server, otherwise it can appear to be unavailable):

image.png

To generate a picture, my original Chinese prompt was as follows:

A beautiful Japanese girl in a bikini quietly watched the sunset on the beach at dusk. She has long black hair, a slim figure, plump breasts and peach buttocks.

Need to be translated into English:

A beautiful Japanese girl in a bikini quietly watches the sunset on the beach at dusk. She has long, shiny black hair, a slim figure, but her chest is full and her backside is a peach-shaped butt.

Complete in less than 30 seconds:

image.png

Note: Is this considered a peach butt? I don’t really understand.

You can adjust the output through "Styles"; the default is "none". For example, I chose "enhance":

image.png

Then regenerate the image:
image.png

It's OK, but the face is a little unsatisfactory. I don't like pointed chins, and it doesn't look like a Japanese girl. I think I should mention this in the prompts next time. But why did the bikini style change?

After that, I made some more attempts and found that there were generally no problems when the prompt words were simple. However, once the prompt words involved too many elements, the generated images could not meet all the requirements. For example, I changed the description of "Withered vines, old trees, crows, small bridges, flowing water, people's homes, ancient roads, west wind, thin horses. The setting sun, heartbroken people at the end of the world." to "A rural scene in autumn, filled with a melancholy atmosphere: withered vines entangled in an ancient tree, and the dark silhouette of a lonely crow perched on the branches. A small arch bridge spanned a gently flowing stream, and next to it was a warm ancient Chinese thatched house. A desolate old road stretched to the horizon, where a thin and tired horse stood with its back against the cold west wind. The sky was dyed with warm tones by the setting sun, casting long shadows, evoking a feeling of loneliness and longing, and a lonely traveler looking into the distance at the edge of the world." Then the various generated pictures were not satisfactory. They were either missing this or that, or simply missing a lot, especially the horse, which never appeared from beginning to end:

image.png

image.png

image.png

image.png

I don't know whether I was doing something wrong, whether this many elements exceeds the capability of the default model, or whether there is a problem with my prompt. I'll look into it later when I'm in the mood.


From the software interface, we can see that the default model of Diffusion Bee is "Default_SDB_0.1", and its corresponding core version of Stable Diffusion is "Stable Diffusion v1.5". The versions currently supported by Diffusion Bee are as follows:

image.png

Why is the default core version v1.5, not v2 or XL? Because SD 1.x is the most widely used version, its model generation effect is relatively mature, and its resource requirements are relatively low, making it suitable as an entry-level default option. The default model does not directly use SD 2.x or SD XL, probably because:

SD 2.x introduced new components (such as OpenCLIP) that are not fully compatible with 1.x.

SD XL has higher hardware requirements (it needs a more powerful GPU) and is not suitable as a default model.

However, other versions can be imported as needed, so this is not a real limitation.


Advanced use (enable advanced options)

General interface

The part above covers the most basic way to use Diffusion Bee: no professional tuning parameters at all, suitable for users with simple needs and no background knowledge. For users with some technical foundation who want to customize parameters themselves, the basic mode is clearly not enough. In that case you can enable the advanced interface by turning on "Advanced Options", and the interface gains many more settings:

image.png

Negative Prompt: an important option for controlling image generation, used to suppress specific features or elements you do not want to appear. When generating, the model produces an image matching the prompt, but the result sometimes contains unwanted features or elements; the negative prompt explicitly tells the model what to weaken or avoid. For example, if you want a sharp image but the model tends to produce blur, enter "blurry" in the negative prompt to reduce the probability of blur; if you do not want certain objects to appear, such as a hat, enter "hat" in the negative prompt.

The resolution (added after turning on advanced options) and the number of pictures are easy to understand at a glance, so I won’t explain them.

Seed: an important parameter for controlling randomness and making results reproducible. Depending on the value it can be random or fixed: if the seed is set to -1 (or "random"), the model assigns a new seed for every generation and the results vary; if a specific value is used (such as 12345), the same image is generated every time as long as all other settings are identical.

Sampling Steps: this parameter appears after turning on the advanced options. It is a key control over the generation process: it determines the number of denoising iterations from random noise to the final image, which directly affects quality, detail and generation time. More steps give the model more iterations to refine the image, progressively remove noise and match the prompt.

• Low step counts (10-20): suitable for quick previews or simple targets (backgrounds, flat-color images). Pros: fast, good for debugging. Cons: insufficient detail, possible artifacts.

• Medium step counts (20-50): balance generation time and image quality, suitable for most scenes; 30-40 steps is usually enough for high-quality images.

• High step counts (50+): for scenes with very high quality requirements (high-resolution or complex images). Beyond a certain point (for example more than 100 steps) the quality gain tends to saturate.

On limited hardware (such as an ordinary GPU), it is recommended to start from a lower step count (20-30) and find the best balance between quality and time.
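Diffusion Bee exposes all of this through its GUI, but the same parameters map directly onto the underlying Stable Diffusion interfaces. As a hedged illustration (not Diffusion Bee's internal code), here is a minimal sketch using the Hugging Face diffusers library; the model ID, device and prompt are assumptions for demonstration:

```python
# Minimal sketch (not Diffusion Bee's code): the same parameters via diffusers.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",   # an SD 1.5 checkpoint, like the default model here (assumed ID)
    torch_dtype=torch.float16,
).to("mps")                             # Metal backend on Apple Silicon; use "cuda" or "cpu" elsewhere

# A fixed seed makes the result reproducible; -1/"random" in the GUI means a new seed each time.
generator = torch.Generator("cpu").manual_seed(12345)

image = pipe(
    prompt="a girl in a bikini watching the sunset on a beach at dusk",
    negative_prompt="blurry, lowres, bad anatomy",   # features to suppress
    num_inference_steps=30,                          # sampling steps: 20-50 covers most cases
    guidance_scale=7.5,                              # prompt adherence (see Guidance Scale below)
    generator=generator,
).images[0]

image.save("txt2img.png")
```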

After turning on the advanced options switch, in addition to the additional options in the regular interface, the following settings are also added:

image.png

Below I will introduce the functions of these settings one by one.

Diffusion and Seed

image.png

Diffusion: selects and adjusts the sampling algorithm (sampler) used to generate images. The sampler determines how the model removes noise step by step and how each denoising step is computed, which affects generation speed, image quality and stability. The available options are karras, ddim, lmsd, pndm, k_euler and k_euler_ancestral. Among them, ddim is fast and stable and can still produce high-quality images at low step counts; it is very popular and strikes a good balance between speed and quality. pndm optimizes the diffusion process by introducing numerical methods, combining the traditional diffusion model with numerical optimization strategies so that high-quality images can still be generated in fewer steps. I plan to use mainly these two.
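For reference, the same kind of sampler choice can be illustrated with diffusers schedulers. The name mapping below is my assumption for illustration only; Diffusion Bee's internal naming may differ:

```python
# Minimal sketch: swapping samplers (schedulers) in diffusers.
from diffusers import (
    StableDiffusionPipeline,
    DDIMScheduler,
    PNDMScheduler,
    EulerDiscreteScheduler,            # roughly "k_euler"
    EulerAncestralDiscreteScheduler,   # roughly "k_euler_ancestral"
    LMSDiscreteScheduler,              # roughly "lmsd"
    DPMSolverMultistepScheduler,
)

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# ddim: fast and stable, still decent at low step counts
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)

# pndm: numerical-method-based sampler, good quality at fewer steps
# pipe.scheduler = PNDMScheduler.from_config(pipe.scheduler.config)

# a "karras"-style noise schedule on a DPM-Solver sampler
# pipe.scheduler = DPMSolverMultistepScheduler.from_config(
#     pipe.scheduler.config, use_karras_sigmas=True
# )
```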

Guidance Scale: a very important parameter that controls how strictly the model follows the prompt during generation. It governs how the model balances prompt guidance against randomness, which changes the detail, style and how closely the image matches the description. Suggestion for beginners: use 7-12 as a general setting; this is the recommended range for most models and ensures prompt adherence without over-guiding. For specific scenarios: raise it to 15-20 for detailed prompts so the model follows the description closely, and lower it to 5-7 for abstract or vague prompts to give the model more freedom.

Small Modification Seed: slightly changes the current seed value (for example, increasing or decreasing it by a small amount) to produce an image that is similar to the original but slightly different. A small change to the seed (for example from 12345 to 12346) fine-tunes the noise distribution.

Compatibility Mode: A key tool for resolving incompatibility issues between different models or parameters, ensuring that various models (old versions, new architectures, different formats) can be used normally in the target generation tool. When compatibility mode is enabled, there may be a slight impact on performance or results, but it is a powerful tool to ensure cross-model generation.

ControlNet

image.png

ControlNet: It adds precise conditional control capabilities to text-generated images, allowing users to provide auxiliary input information as control and guidance for generating images. Common additional input types are as follows:
image.png

By combining prompt words and auxiliary input provided by the user, ControlNet can generate more accurate and expected images. It is very suitable for application scenarios that require detailed control and complex scene generation, and can greatly improve the creative freedom and result quality of image generation.

ControlNet Model: its function is to tell Diffusion Bee what kind of information the auxiliary input image (such as an edge map, depth map or pose map) provides,

image.png
and how the model should use this information to influence the final image generation. For example, if an edge map is uploaded, the ControlNet model understands that the lines define the contour structure of the image; if a depth map is uploaded, it knows that the grayscale values represent the depth and perspective relationships of the scene; if a pose map is uploaded, it recognizes the key points as human body poses or skeletons and generates an image that matches the pose. In this way, the ControlNet model can accurately integrate the auxiliary information provided by the user into the diffusion model's generation process to achieve more predictable results.

Automatically generate control: Combined with the settings of ControlNet Model, the required auxiliary input information is automatically extracted from the auxiliary image uploaded by the user for use by ControlNet. This can help users save the steps of manually preparing these inputs, making ControlNet easier to use.

  • Select Enable (Yes): the system automatically processes the uploaded auxiliary image and extracts control information matching the ControlNet model (edge map, depth map, pose map, etc., depending on the ControlNet Model setting). This suits ordinary photos or rough sketches and lowers the accuracy requirements for the input image.
  • Select Disable (No): the system uses the uploaded control image directly, without additional processing. The input must already match the ControlNet model exactly (for example a clean edge map or depth map, depending on the ControlNet Model setting) in order to control the result accurately.

ControlNet importance: controls how much the ControlNet auxiliary input influences the final result, i.e. to what extent the model follows the constraints of the auxiliary input. By adjusting this parameter you can balance the influence of the auxiliary input against the prompt to meet different generation needs (a code sketch follows the breakdown below):

image.png

High importance (close to 1.0):

Effect: emphasizes the influence of the auxiliary input (such as an edge map or depth map) so that the result fits the structure or characteristics of the input as closely as possible.

Applicable scenarios: specific elements of the generated image need to be strictly controlled, for example retaining the precise contours of an input edge map, generating a scene with accurate spatial depth from a depth map, or exactly matching an input human pose map.

Low importance (close to 0.0):

Effect: weakens the influence of the auxiliary input, letting the model rely more on the prompt and generate freer, more creative images.

Applicable scenarios: the auxiliary input serves only as a reference, and the final image relies more on the prompt. For example, an input edge map provides only the general structure while the specific content and details are determined by the prompt, or the auxiliary input sets the basic direction but the model is free to invent backgrounds or decorations.
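To make the ControlNet workflow concrete, here is a minimal, hedged sketch with the diffusers library: deriving a Canny edge map plays the role of "automatically generate control", and controlnet_conditioning_scale plays the role of "ControlNet importance". The model IDs and file names are illustrative assumptions, not what Diffusion Bee ships:

```python
# Minimal sketch (illustrative model IDs/files): a Canny-edge ControlNet workflow.
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# 1. Derive an edge map from an ordinary photo (the "automatically generate control" step)
source = np.array(Image.open("reference.jpg").convert("RGB"))
edges = cv2.Canny(source, 100, 200)
edge_map = Image.fromarray(np.stack([edges] * 3, axis=-1))   # 3-channel image for the pipeline

# 2. Load a ControlNet model that matches this input type (Canny edges)
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("mps")

# 3. Generate: a higher conditioning scale means stricter adherence to the edge map
image = pipe(
    prompt="a cozy wooden cabin in a snowy forest",
    image=edge_map,
    controlnet_conditioning_scale=0.9,   # ~1.0 = strict control, ~0.3 = loose reference
    num_inference_steps=30,
).images[0]
image.save("controlnet_result.png")
```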

LoRA

image.png

LoRA: short for Low-Rank Adaptation (from the paper "LoRA: Low-Rank Adaptation of Large Language Models"). It is an optimization technique mainly used to fine-tune large neural networks (such as the image generation networks of diffusion models) in an efficient way.

LoRA is an important technology in the diffusion model ecosystem. Through this tool, users can customize the model efficiently while maintaining friendly support for computing resources, which makes tools like Diffusion Bee more flexible and easy to use: LoRA is often used to fine-tune the model to generate pictures of a specific style, or to add specific themes (such as specific characters, artistic styles, etc.) when generating. For example, after fine-tuning, the model can better understand and generate images similar to "cyberpunk style" or "oil painting style". After LoRA fine-tuning, there is no need to re-save the entire model, only save the additional fine-tuning weights (A and B). This results in significantly lower storage space requirements (usually less than 100MB).

In Diffusion Bee's advanced options, users can load specific LoRA models or weight files to adjust the style of the generated images: Typically, users need to provide a pre-trained LoRA file (such as .safetensors or .ckpt format) and configure its influence (such as weights or ratios) through the interface.
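As a point of reference for what "loading a LoRA file and configuring its influence" means under the hood, here is a minimal sketch with the diffusers library (the file name is hypothetical, and this is not Diffusion Bee's own code):

```python
# Minimal sketch (hypothetical file name): loading a LoRA and setting its strength.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("mps")

# A .safetensors LoRA downloaded e.g. from Civitai; it must match the base model (SD 1.5 here)
pipe.load_lora_weights("./loras/oil_painting_style.safetensors")

image = pipe(
    "portrait of a cat, oil painting style",
    num_inference_steps=30,
    cross_attention_kwargs={"scale": 0.8},   # LoRA influence: 0 = off, 1 = full effect
).images[0]
image.save("lora_result.png")
```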

After turning on the advanced options in the Diffusion Bee interface, LoRA1, LoRA2, and LoRA3 appear in the LoRA section. These three items are independent slots for loading and combining multiple LoRA models. Their main functions and differences are as follows:

1. The function of the LoRA slots

Support multi-model overlay: Diffusion Bee supports applying multiple LoRA models in one generation task, which is very useful for achieving more complex or customized image effects.

Independent configuration: Each LoRA slot can load an independent LoRA model and set weights for each model separately.

Combination Effect: The superposition of multiple LoRAs will jointly affect the final generated image according to the set weights.

2. Differences between LoRA 1, LoRA 2 and LoRA 3

There is no essential functional difference between them; they simply provide more combination possibilities:

• Each slot can be loaded with a different LoRA model.

• The order of loading may affect the generated results (in some tools, the models loaded later may partially override the effects of the earlier ones).

• It is possible to set different strengths for LoRA models in different slots (if option is available).

3. How to use multiple LoRA

  1. Load LoRAs for multiple subjects:

• For example, if you load a stylized model (LoRA 1) and a specific character model (LoRA 2), the image will reflect the characteristics of both models.

  2. Adjust weights:

• If the interface allows adjusting weights, you can set different strengths for different LoRAs to determine how much they affect the final image.

• The general range is 0 to 1 (or 0 to some maximum value), with higher values having greater effects.

4. Notes

  1. The base model needs to match:

• Make sure all loaded LoRA models are adapted to the same base model (such as SD 1.5 in this article), otherwise it may cause abnormal generation effects.

  2. Multiple LoRA compatibility:

• The training objectives of different LoRAs may conflict, for example, if one is a stylized model and the other is a specific image model, the effect may be distorted or become uncontrollable after superposition.

  3. Performance cost:

• Loading multiple LoRAs simultaneously may increase the demand on video memory or computing resources.

5. Example Scenario

LoRA 1: Style Model

Load an "Oil Painting Style" LoRA to give the image a specific artistic style.

LoRA 2: Character Model

Load the LoRA of a character or a specific item (e.g. "anime character").

LoRA 3: Background Enhancement

Load a LoRA that is used to enhance specific background details (such as a "forest background").

With this combination, you can generate a forest scene with a specific character in the style of an oil painting.
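For comparison, the same three-slot combination can be expressed with the diffusers multi-adapter API (assuming a diffusers version with the PEFT integration; the adapter names and file names are hypothetical):

```python
# Minimal sketch (hypothetical adapter/file names): stacking LoRA "slots" with weights.
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

pipe.load_lora_weights("./loras/oil_painting_style.safetensors", adapter_name="style")       # LoRA 1
pipe.load_lora_weights("./loras/anime_character.safetensors", adapter_name="character")      # LoRA 2
pipe.load_lora_weights("./loras/forest_background.safetensors", adapter_name="background")   # LoRA 3

# Per-slot strengths, analogous to setting a weight for each LoRA slot
pipe.set_adapters(["style", "character", "background"], adapter_weights=[0.8, 0.7, 0.5])

image = pipe(
    "an anime character standing in a forest, oil painting style",
    num_inference_steps=30,
).images[0]
image.save("multi_lora.png")
```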

So, where can I download the LoRA model? Generally speaking, the LoRA model can be downloaded from online model sharing community platforms, such as Civitai, Hugging Face, etc. In this article, I downloaded it from Civitai.


About Civitai

Civitai.com is an AI model sharing and community platform, aimed mainly at users and developers of generative AI models such as Stable Diffusion. It provides one-stop model downloads, previews and usage guides, and is one of the more active resource-sharing sites in the generative AI field today.

Core functions and features:

  1. Model sharing: Users can upload and download model files for image generation (such as Stable Diffusion model weights), including base models, fine-tuned models, LoRA (low-rank adaptation) weights, and more.
  2. Model categories: The site offers multiple model categories, such as realistic style, anime style, illustrative style, and special-effect/post-processing models.
  3. Sample images and previews: Each model page usually includes many example images showing what the model produces; the examples also list the prompts used during generation, which makes them easy to learn from.
  4. Community exchange: Users can review, comment on and rate models to help others judge their quality; developers and users can discuss optimization and usage tips directly.
  5. Tool support: Installation and usage guides help users deploy generative models quickly; some resources also recommend related tools or scripts.
  6. Open-source spirit: Most model files are shared free of charge by community users, continuing the open-source tradition of the generative AI field.

Find a character model you like in Civitai's "Models" section. Note that you should choose models matching the base model you are using, otherwise the import is likely to be incompatible. The filters make this easy:

image.png

Choose a character model that you like from the appropriate models:
image.png

Then click the download button in the upper right corner of the current page:

image.png

Import the model you just downloaded into Diffusion Bee:
image.png

image.png

Then select from the drop-down menu of the LoRA slot (since only one character model is selected, you can select any LoRA slot, I chose LoRA 1 here):
image.png

Misc

image.png

Misc: Abbreviation for Miscellaneous, which provides more refined control, allowing users to find a balance between the details of the generated image (V-Prediction) and text parsing (Clip Skip 2) according to their needs. These options are usually suitable for users with some experience or scenarios with special requirements for the generated results.

V-Prediction: enables the v-prediction (velocity prediction) parameterization of the diffusion model, which improves detail and stability, especially for complex or high-resolution images; it helps reduce blur and randomness in the generation process and produces clearer images. Turn it on to pursue high-quality, detail-rich output; turn it off if you run into compatibility problems with older models.

Clip Skip 2: adjusts the CLIP text encoder so that the last two Transformer layers are skipped. The generated images may be more artistic and creative, but may deviate slightly from the literal meaning of the prompt; it also loosens the strict dependence on the prompt and lets the model generate more freely. Turn it on to explore style variety or a looser interpretation of the prompt; turn it off when the prompt's meaning needs to be reflected accurately.
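For reference, recent versions of the diffusers library expose a similar switch as a clip_skip argument. Numbering conventions differ between tools, so the sketch below only illustrates the idea and is not an exact equivalent of Diffusion Bee's toggle:

```python
# Minimal sketch: a clip_skip-style setting in diffusers (argument available in recent versions).
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

image = pipe(
    "a watercolor landscape at sunrise",
    num_inference_steps=30,
    clip_skip=2,   # skip final CLIP layers for a looser, more "artistic" reading of the prompt
).images[0]
image.save("clip_skip_example.png")
```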

Finally, still using the original prompt, I enabled the advanced options and set the parameters as follows (options not mentioned keep their default values):

image.png

image.png

image.png

The final result is as follows:

image.png

There are other options that I haven't studied carefully. I'll just have to wait until the need arises in the future. It takes constant experimentation to figure them out.

Image to image

Function Introduction

Image to image is a powerful tool that provides diverse changes and creative outputs through AI while maintaining the key features of the input image, providing users with more possibilities. This makes it suitable for various needs of creative design, digital art, and content generation. Common application scenarios are as follows:

Artistic Creation: Turn sketches into high-quality artwork, or give existing drawings a different style.

Stylized photos: Convert ordinary photos into works of specific styles such as oil paintings and watercolors.

Scene or detail modification: Modify certain areas in the image, or change the content based on the prompt word (for example, change a daytime scene to night time).

Concept Design: Quickly generate new images based on preliminary drafts for concept designs or projects.

Functional operation

The options for Image to image are basically similar to those of Text to image: there is a basic interface and an advanced interface after enabling advanced options, and the parameters are largely the same. The basic interface looks like this:

image.png

Input Strength: a key parameter of the image-to-image function that adjusts how much of the input image is blended into the result. By tuning it you can find the right balance between "generating a new image from the prompt" and "only slightly adjusting the input image" (a code sketch follows the list below).


The effect of different Input Strength values

  1. Low value (e.g. 10 – 40):

• The features of the input image are weak and only serve as a reference for the initial noise.

• The output image will be closer to the description of the prompt word, while retaining less structure or details of the original input image.

• Suitable for using input images as inspiration rather than direct modification.

  2. Medium value (e.g. 50 – 70):

• The weights of input image and prompt word are balanced.

• The output image contains the main structure of the input image and also reflects the content of the prompt word.

• Commonly used when you want to add creative or stylistic adjustments to an input image.

  3. High value (e.g. 80 – 100):

• Features of input image are highly preserved.

• The output image will be very close to the original input image, with only slight stylization or detail modifications.

• Suitable for scenarios where you want to perform only small-scale enhancements on the input image.
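To relate Input Strength to the underlying Stable Diffusion img2img interface, here is a minimal sketch with the diffusers library. Note that diffusers' strength argument has the opposite sense to Input Strength as described above: higher strength means more change to the input image. File names and the model ID are assumptions:

```python
# Minimal sketch (illustrative file names): img2img in diffusers; "strength" is the
# counterpart of Input Strength, but with the opposite sense (higher = more change).
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("mps")

init_image = Image.open("cat.jpg").convert("RGB").resize((512, 512))

image = pipe(
    prompt="a cat napping on the floor, cartoon style",
    image=init_image,
    strength=0.5,            # ~0.2 stays close to the photo, ~0.8 mostly follows the prompt
    guidance_scale=7.5,
    num_inference_steps=30,
).images[0]
image.save("img2img_cartoon.png")
```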


The advanced user interface mainly has two more option switches, one is Specify image dimensions (that is, specify the resolution of the output image):

image.png

The other is Inpainting Options:
image.png

The Inpainting options contain only one switch: Smoothen Mask. When inpainting, the mask specifies the area to be modified; Smoothen Mask makes the mask edges smoother, avoiding hard or abrupt boundaries. The smoothing makes the transition between the inpainted area and the surrounding, unmodified areas more natural, which reduces visible editing traces and improves the visual consistency of the result.

When making fine-grained local modifications, such as repairing small details or patching a more complex background: turn Smoothen Mask on for scenes that need soft, seamless transitions (repairing portraits, natural scenery or complex textures); turn it off when you need to keep sharp edges or clear boundaries in the modified region.

This feature helps optimize the detail and consistency of the generated results, and is especially useful when high-quality restorations are required.
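As a rough illustration of what mask smoothing amounts to (not Diffusion Bee's actual implementation), feathering the mask edge with a Gaussian blur before inpainting produces the same kind of soft transition:

```python
# Minimal sketch: feathering a mask edge with a Gaussian blur (conceptual illustration only).
from PIL import Image, ImageFilter

mask = Image.open("mask.png").convert("L")                      # white = area to repaint
smooth_mask = mask.filter(ImageFilter.GaussianBlur(radius=8))   # soften the hard edge
smooth_mask.save("mask_smooth.png")
```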

Let’s practice this and use the following cat photo as input:

image.png

The prompt is:

A cat is napping on the Floor, Cartoon style

The parameters are set as follows:

image.png

Specify relevant parameters including input strength:
image.png

image.png

The other options have their default values, and the final result is as follows:
image.png

Or a simpler treatment, in the style of Van Gogh:

image.png

You can also use the mask function to make local modifications:
image.png

Note: Input Strength matters a lot. Try different input strengths to get pictures in different styles.

Illusion generator

The Illusion generator produces images with surreal or optical-illusion effects. It uses the power of diffusion models to create unique, artistic surreal scenes by combining the user's prompt with built-in image generation algorithms.

The interface of the Illusion generator is similar to the Text to image and Image to image functions above. The only distinctive parameter is the one named after the function itself, with a default value of 1 and a maximum of 3:

image.png

This option adjusts the strength of the generated content and how much it blends with the original image: at low values the generated content only lightly covers or merges into the original image, keeping more of its detail and style; at high values the generated content changes the original image more noticeably, or even covers it completely, showing the new content specified by the prompt. It is similar to the Input Strength of the Image to Image function, but is focused on adding illusion-like effects to an existing image and enhancing artistic creativity.

This function is very simple, so I won’t describe it in detail here. Just look at the actual effect:

image.png

Inpainting

Function Introduction

In Diffusion Bee, the Inpainting function edits, repairs or regenerates a specified part of an image without affecting the unselected areas. It is very well suited to modifying local content, such as repairing defects, replacing certain elements or filling gaps (a code sketch of an equivalent pipeline follows the list below).

What Inpainting does

  1. Local repair: Used to repair damaged areas in an image or remove unnecessary parts. For example, remove text or objects in an image and fill them with appropriate content.
  2. Region editing: Generate new content for the selected area based on the user's prompt, while trying to keep a natural transition with the surrounding areas. For example, replace one object with another.
  3. Filling the gap: Supplement unfinished or partially blank images to generate content that is consistent with the existing picture style.
  4. Creative transformation: Based on the existing image, regenerate the content of the specified area according to the prompt words to achieve local stylization or creative adjustment.
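For readers curious about what happens under the hood, here is a minimal, hedged sketch of an equivalent inpainting pipeline using the diffusers library (Diffusion Bee does not expose code; the model ID and file names below are assumptions for illustration):

```python
# Minimal sketch (illustrative model ID and file names): inpainting in diffusers.
import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("mps")

init_image = Image.open("portrait.png").convert("RGB").resize((512, 512))
mask_image = Image.open("hair_mask.png").convert("RGB").resize((512, 512))  # white = region to regenerate

image = pipe(
    prompt="long blonde hair",
    image=init_image,
    mask_image=mask_image,
    num_inference_steps=30,
).images[0]
image.save("inpainted.png")
```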

Practical application scenarios

• Repair blemishes or damage in old photos.

• Modify partial content in an existing design, such as changing colors or adding new elements.

• Add new creative details to artwork.

Functional operation

Let's use a photo of U and give her blonde hair:

image.png

image.png

The final effect (the prompt for blond hair is too simple, I should have written long blond hair~):
image.png

The face feels a little deformed. This may be because the mask was not used carefully enough to select the hair area. No wonder Akira Toriyama didn't want to draw Super Saiyan 3 because the hair was too troublesome. Even selecting a long hair area made it look fuzzy~.

The Mask function in Inpainting vs. Image to Image

In Diffusion Bee, the Mask functions of Inpainting and Image to Image have certain similarities, but their design goals and practical uses differ:

1. Differences in functional positioning

Inpainting

  • Target: Repair, replace, or regenerate the selected area
  • Application Scenario: Local editing operations such as local repair, content replacement, and filling in blanks

Image to Image Mask

  • Target: Regenerate the entire image, but allow some content to be preserved
  • Application Scenario: Adjust the style or content of an image over a large area while preserving key areas

2. Differences in Mask Functions

Inpainting

  • Coverage: Only the area manually painted by the user is affected; the unselected parts remain unchanged
  • Edit strength: The AI completely regenerates the selected area
  • How to use: After manually drawing the mask, enter a prompt to regenerate just that area
  • Typical result: The generated content blends seamlessly with the unselected area (mainly local processing)

Image to Image Mask

  • Coverage: Unlike a traditional hard mask, the Mask marks the retained area for a lower degree of change
  • Edit strength: The AI adjusts the unmasked part according to the overall prompt and only slightly affects the masked part
  • How to use: Provide the initial image and use the Mask to control how much certain areas change
  • Typical result: The masked part mostly serves as a reference or to maintain consistency while the rest is generated (mainly whole-image processing)

3. Comparison of actual use

Typical usage scenarios of Inpainting:

• Repair scratches, blemishes, or missing parts of your photos.

• Replace local content of the image (such as changing faces, removing distractions in the background).

• Use input prompt words to regenerate the selected area (such as replacing the sky, adding decorations, etc.).

Typical usage scenarios of Image to Image Mask:

• Adjust the style of the background or other parts while preserving specific areas, such as faces or key objects.

• Increase the degree of control over the overall image during editing, for example by avoiding overwriting details that are already satisfactory.

• Protect important areas from drastic changes during style transfer or image enhancement.

Summary

Inpainting focuses on local editing, affecting only the manually painted areas; it is ideal for small-scale content replacement or repair.

The Mask in Image to Image is an auxiliary function for whole-image generation; its purpose is to preserve the selected area while adjusting the style or content of the rest.

The two tools complement each other and you can choose the most suitable tool according to your needs.

Upscaler

Function Introduction

The Upscaler function is very simple: it improves image quality through resolution enhancement and detail enhancement, making images suitable for higher-resolution uses such as printing, large-screen display or high-quality archiving (a code sketch follows the list below):

  1. Resolution Improvement: Enlarge a low-resolution image to a high-resolution one while minimizing pixelation or distortion; often used to generate images to make them clearer and suitable for higher-resolution devices.
  2. Detail Enhancement: Use AI technology to supplement the texture, edges and details in the image to make the enlarged image more natural; reduce the blur or loss of details that may occur during image enlargement.
  3. Lossless upscaling: Deep learning models (such as Real-ESRGAN) ensure that images retain as much of the original detail as possible after enlargement.
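For readers who want a code-level equivalent, here is a hedged sketch of an upscaling step with the diffusers library. Diffusion Bee reportedly uses a Real-ESRGAN-style model; the pipeline below is only an illustrative alternative with similar intent, and the model ID is an assumption:

```python
# Minimal sketch (illustrative alternative): 4x upscaling with a diffusers pipeline.
import torch
from diffusers import StableDiffusionUpscalePipeline
from PIL import Image

pipe = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler", torch_dtype=torch.float16
).to("mps")

low_res = Image.open("small.png").convert("RGB")

upscaled = pipe(
    prompt="a sharp, highly detailed photo",   # a short description guides the detail synthesis
    image=low_res,
).images[0]
upscaled.save("upscaled_4x.png")
```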

Functional operation

The Upscaler feature doesn’t have many options. Simply select the photo and click the “Upscaler” button:

image.png

I’m afraid that you can’t see the actual effect clearly in the picture above, so let’s take a look at the original picture effect below.

Original image (a screenshot saved directly from the Internet):

image.png

The effect after using upscaler:
image.png

Note: Because the pictures uploaded to my image host get re-processed (chevereto limits resolution to 1024 x 768), what you see here still differs considerably from the real result. Even so, you can tell that the Upscaler output is richer in detail and higher in resolution (which makes sense: after Upscaler processing the file size jumps from 1.2 MB to 26.4 MB~).

AI canvas

Function Introduction

The main purpose of AI Canvas is to provide an interactive canvas on which users can create works together with AI painting technology. Its specific functions include the following:

1. Local editing and drawing

Users can manually draw or mask areas (selection boxes) on the canvas to specify the specific location where the AI should generate an image or modify the area of an existing image. This feature is suitable for local repairs, detailed modifications, or enhancements based on existing paintings.

2. Inpainting function

AI Canvas usually supports "Inpainting", which is to repair or regenerate the covered area through AI. Users can cover the unsatisfactory parts and let AI repaint according to the context to complete a natural transition.

3. Sketch or suggest design

Users can sketch on the canvas with rough lines or shapes, and then let AI generate more complete and detailed works based on these sketches. This method is suitable for users who have specific composition or design needs.

4. Flexible adjustment of the generation area

AI Canvas provides greater control, allowing users to select part or all of the canvas area for AI image generation or modification, rather than targeting the entire image each time.

5. Enhance users’ creative expression

With AI Canvas, users can interact with AI directly in the visual interface, providing a more intuitive way to adjust the image generation process and explore more creative possibilities.

Common application scenarios:

• Local image restoration (e.g. removing unwanted objects, completing the image)

• Creative drawing (AI drawing based on user sketches)

• Adjust the details and overall style of generated content

This feature is often used for generative scenes that require detailed modifications or high interactivity, and is very suitable for designers and artists.

Functional operation


Note: I left the AI Canvas function for last because it can be regarded as a combination of many of the previous functions (text to image, image to image, inpainting, etc.), so after the earlier explanations I can save a lot of words here.


For the first use, you need to download the ControlNet Inpaint model:

image.png

At the top of the canvas area on the right, a series of tools is provided, as shown below (to use the repair function offered by the brush, the base model on the left must be set to the previously downloaded "SD1.5_Inpainting"):
image.png

In the Function section at the bottom left of the image above, there are 3 options:
image.png

These three options determine what kind of work the AI performs inside the selection box on the canvas. "Text To Image" and "Image To Image" are easy to understand, as covered earlier in this article. The key one is the third, default option, "Generative Fill": it generates and fills in new content on an existing image or canvas based on a text description. If you use Generative Fill on a blank canvas, the result is actually very similar to plain text-to-image generation:
image.png

The effect of selecting "Generative Fill" function:
image.png

The effect of selecting "Text To Image" function:
image.png

Using "Generative Fill" in this way is a special case (empty canvas). Normally, if you want to regenerate an image (starting from scratch), you should choose "Text to Image". If you want to base it on existing content (for example, to improve the image, remove unnecessary parts, or add new elements), choose "Generative Fill".

Take Kakarot in the picture below as an example. To change his hair to pink, you need the "Generative Fill" function together with the "SD1.5_Inpainting" model:

image.png

The final effect is as follows:
image.png

It can be seen that it is actually the Inpainting function, so I said before that AI Canvas is a combination of multiple functions. In addition, there is also a Paintbrush function. I have studied it for a long time but have not figured out how to use it. If anyone knows how to use it, please tell me in the comment section.

Note: AI canvas is very powerful. I am only demonstrating the most basic usage here. You can slowly study the specific application techniques if you need them.

Summary

In fact, the core appeal of Diffusion Bee is not only its Stable Diffusion-based generation features (text to image, image to image), but also its ease of use: newcomers need almost no learning time and can use it right out of the box. In terms of raw power, Diffusion Bee is certainly no match for more professional tools (Midjourney, Stable Diffusion with a professional UI, and so on), but in terms of getting started it is definitely the fastest: you can begin using it right after a double-click install (at most it takes a little time to download the default base model).

As a user of an Apple M-series chip, you already have a powerful GPU and Neural Engine, which are ideal hardware for Diffusion Bee: whether you need them or not, leaving them idle feels like throwing away a fortune!

Unfortunately, there are very few detailed tutorials about Diffusion Bee online (especially about the parameters in the advanced options and how to set them), so it took me a lot of time to work out what these parameters do and to verify their effects. Fortunately, I have covered most of them in this article, and I hope it will be of some help to those who need Diffusion Bee (I originally just wanted to dash off a quick post to keep up my schedule, but the more I wrote, the more there was to cover, and in the end it turned out even more tiring than usual).

Note 1: In this article, I have explained in detail the meaning and effects of many basic settings. I will not repeat them in subsequent related articles (such as the next article about Draw Things).

Note 2: This article covers only the most basic generation use cases. Beyond that, I am not particularly skilled in these specialized areas (some skills and experience can only be built up through repeated practice), so don't take my superficial demonstrations as gospel (some of my statements may not be entirely accurate); treat them as a reference only.

The content of this blog is original; please credit the source when reprinting! For more articles, see the Sitemap. The blog's RSS feed is https://blog.tangwudi.com/feed, and subscriptions are welcome; if needed, you can also join the Telegram group to discuss problems together.