Barclay's Updated Guide for Dummies
Assuming you have a recent Nvidia GPU (1070+) and Windows, just 8 steps to get started:
+ Download and install Python: https://www.python.org/ftp/python/3.10.10/python-3.10.10-amd64.exe (ensure you check 'Add Python to PATH')
+ Download and install Git: https://github.com/git-for-windows/git/releases/download/v2.39.2.windows.1/Git-2.39.2-64-bit.exe
+ Download and extract AUTOMATIC1111's WebUI to its own folder: https://github.com/AUTOMATIC1111/stable-diffusion-webui/archive/refs/heads/master.zip
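Alternatively, since you just installed Git anyway, you can clone it instead of extracting the zip (this makes future updates a simple 'git pull'):
  git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git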
+ Download at least one checkpoint to the webui's models/Stable-diffusion folder (start with Bigger Girls V2): https://civitai.com/api/download/models/6327?type=Pruned%20Model&format=PickleTensor
+ Download at least one VAE to the models/Stable-diffusion folder. You need it for color correction if you are merging checkpoints or using checkpoints without a baked-in VAE (any will do really, you just need one, or you will get purple splotch disease or extremely faded/sepia tones):
Anything v4 VAE (standard anime color scheme): https://huggingface.co/andite/anything-v4.0/blob/main/anything-v4.0.vae.pt (RECOMMENDED)
Xpero End1ess VAE (vibrant colors): https://civitai.com/api/download/models/7307?type=VAE&format=Other
Stable Diffusion VAE - Photorealism colors (this VAE must be downloaded to the separate stable-diffusion-webui/models/VAE folder instead): https://huggingface.co/stabilityai/sd-vae-ft-ema-original/resolve/main/vae-ft-ema-560000-ema-pruned.ckpt
Save the VAE in the models/Stable-diffusion folder AND select it for use with all models after you run the webui. (Settings Tab - Stable Diffusion - SD VAE)
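If the folder layout is confusing, here is roughly what your extracted webui folder should look like once everything is downloaded (the checkpoint filename is just an example):
  stable-diffusion-webui/
    models/
      Stable-diffusion/    <- checkpoints and the anime/vibrant VAEs go here
        biggerGirlsV2.ckpt
        anything-v4.0.vae.pt
      VAE/                 <- the photorealism VAE goes here instead
        vae-ft-ema-560000-ema-pruned.ckpt
    webui-user.bat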
If you are running a Pascal, Turing, or Ampere (1000, 2000, 3000 series) card, add --xformers to COMMANDLINE_ARGS in webui-user.bat for slightly better performance/speed.
You can also add --listen if you would like the WebUI to be accessible from other computers on your network, your phone or tablet for example. A sample webui-user.bat is sketched below.
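For reference, an edited webui-user.bat might look like this (only include the flags that apply to you):
  @echo off

  set PYTHON=
  set GIT=
  set VENV_DIR=
  set COMMANDLINE_ARGS=--xformers --listen

  call webui.bat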
Run webui-user.bat.
Wait patiently while it installs dependencies and does a first time run.
It may seem "stuck" but it isn't. It may take up to 10-15 minutes.
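You'll know it's finished when the console prints something like 'Running on local URL: http://127.0.0.1:7860' (exact wording varies by version).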
Once it finishes loading, head to 127.0.0.1:7860 in your browser to access the web ui.
Don't forget to set up your VAE as instructed earlier. (Settings Tab - Stable Diffusion - SD VAE)
You could also check 'Ignore selected VAE for stable diffusion checkpoints that have their own .vae.pt next to them'. This makes the selected VAE apply only when the checkpoint you are generating from does not have one baked in or sitting right next to it. I wouldn't recommend using this option: if you use many models and generate between them, the color scheme may not stay consistent. Picking one VAE from the dropdown and using it for all generations, regardless of whether the checkpoint has a baked-in VAE or a separate VAE beside it, is usually best in my opinion. Make sure to hit "Apply and Restart WebUI" for the change to take effect.
HOW TO PROMPT:
Start with simple positive prompts and build from there. A typical positive prompt might look something like this:
masterpiece, best quality, highres, 1girl, (chubby cute teenage anime cowgirl redhead standing in front of a desk), (beautiful green eyes), (cow ears), (cow horns), (medium breasts), (blank expression), jeans, (white t-shirt), (freckled face), deep skin, office lighting
PROMPT STRUCTURING:
masterpiece, best quality, highres, 1girl - Putting these at the front of the prompt weights them most heavily. This tells the model to make the generation resemble art tagged as masterpiece and best quality, originally uploaded in high resolution, and specifically tagged 1girl, meaning it was tagged on a Danbooru-style board as having only one female subject in frame. (Add the Danbooru autocomplete extension for help with learning those tags.)
(chubby cute teenage anime cowgirl redhead standing in front of a desk) - putting this in parentheses tells the model to focus on this specific grouping of tokens more than those outside parentheses. Emphasis.
This is also where you typically put the main subject of the generation in the form of ADJECTIVE DESCRIPTOR FLAVOR SUBJECT LOCATION ACTIVITY
(beautiful green eyes), (cow ears), (cow horns), (medium breasts), (blank expression) - these are also in brackets, but behind our main subject. This helps the model apply and emphasize these features AFTER the main subject is 'visualized' in frame by the AI in the first 10 steps or so. Applying these before the main subject could result in TOO much emphasis, i.e. cow ears everywhere, eyes on things that shouldn't have eyes, eyes and ears not aligned to the characters because they were 'drawn' first, etc.
You can further weight individual emphasis keywords.
You just add a colon followed by a decimal number inside the parentheses: (keyword:1.2). The number is a multiplier on that token's attention, with 1.0 as the default; values above 1 strengthen it and values below 1 weaken it. The numbers do NOT need to add up to 1.
I.e. (massive belly:0.7) (huge breasts:0.3) keeps both below default strength, but gives the belly more than twice the weight it gives the boobs.
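For reference, each layer of parentheses multiplies a token's attention by about 1.1 (square brackets divide by 1.1), so these are roughly equivalent:
  (cow ears) = (cow ears:1.1)
  ((cow ears)) = (cow ears:1.21)
  [cow ears] = (cow ears:0.91)
  (cow ears:1.5) = no shorthand, 1.5x attention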
jeans, (white t-shirt), (freckled face), deep skin, office lighting - we prefer jeans, but we do not mind if they are otherwise, same with office lighting. If the model decides hey maybe shorts and candlelight, hey, let the boy try. These terms are near the end of the prompt so they may or may not be respected, depending on CFG scale.
NEGATIVE PROMPTING:
(lazy eye), (heterochromia), lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, bad feet, extra limbs, (multiple navels), (two navels), (creases), (folds), (double belly), thin, slim, athletic, muscular, fit, fat face, blemished stomach, rash, skin irritation
These are all things we DON'T want to see, and we can use emphasis here as well. You don't have to use a negative prompt, but it's often quite helpful for achieving what you're going for. In this example, I wanted to make sure the subject would not come out muscular or athletic.
Hit generate and watch the magic happen.
MISC BEGINNER TIPS:
+ Experiment and find your favorite sampler. I tend to favor the three DPM++ options. Samplers vary in speed, quality, number of steps required for good results, variety, etc. It will take some experimentation to find your favorites, and you may need different ones depending on context (generating from scratch vs. img2img, for example). Note that the original Stable Diffusion release used DDIM, so it's worth playing with that one at least a little to get an idea of how the model generated images by default, before we had an array of other samplers to choose from.
+ CFG scale refers to how closely the model should try to follow the text prompt. A lower scale of 6-8 will produce more variety but may not follow the text prompt as closely as a higher scale of 9-11.
Higher than 11 (13+) can 'overcook' an image, and lower than 6 (1-3) can produce messy, blurry, unfocused generations.
+ When running a batch size of more than one, try ticking the 'Extra' checkbox and dragging the variation strength to 0.5-0.7 for interesting inspirations. Especially effective with simple prompts that only describe a subject, not what they are doing or where they are.
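+ Bonus for the scripters: if you ever want to automate generations instead of clicking around, the webui exposes an HTTP API once you add --api to COMMANDLINE_ARGS. A minimal Python sketch (field names match recent webui versions but may differ in older ones, and the sampler name is just an example):

  import base64
  import requests

  # Everything here maps 1:1 to the txt2img tab in the browser UI.
  payload = {
      "prompt": "masterpiece, best quality, highres, 1girl",
      "negative_prompt": "lowres, bad anatomy, bad hands, worst quality",
      "steps": 28,
      "cfg_scale": 7,                     # prompt adherence, see the tip above
      "sampler_name": "DPM++ 2M Karras",  # any sampler name the UI lists
      "width": 512,
      "height": 512,
      "seed": -1,                         # -1 = random
      "subseed": -1,
      "subseed_strength": 0.5,            # the 'Extra' variation strength tip
  }

  r = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload)
  r.raise_for_status()

  # Images come back base64-encoded; decode and save each one.
  for i, img in enumerate(r.json()["images"]):
      with open(f"gen_{i}.png", "wb") as f:
          f.write(base64.b64decode(img))

Since the payload mirrors the UI, all the CFG scale and sampler advice above applies unchanged.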