Should I struggle through constant crashes to get my 7900 GRE with 16 GB of VRAM working, possibly through the headache of ONNX? Can anyone report their own success or offer advice? AMD on Linux is generally lovely; SD with AMD on Linux, not so much. It was much better with my RTX 2080 on Linux, but gaming was horrible with the NVIDIA drivers. I feel I could do more with the 16 GB AMD card if stability wasn’t so bad. I currently have both cards running, to the horror of my PSU. A1111 does NOT want to see the NVIDIA card, only the AMD. Something about the version of PyTorch? More work to be done there.
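
One plausible reading of the PyTorch remark is that the active venv holds a ROCm build, which would only ever see the AMD card. A rough check is sketched below. It assumes the installed wheel carries a local version suffix such as +cu118, +rocm5.7 or +cpu; the function name torch_backend_of is made up for illustration.

```shell
#!/bin/bash
# Sketch: guess which backend a PyTorch build targets from its version string.
# Assumption: the wheel's version ends in a suffix like +cu118, +rocm5.7 or +cpu.
torch_backend_of() {
  case "$1" in
    *+rocm*) echo "rocm" ;;
    *+cu*)   echo "cuda" ;;
    *+cpu*)  echo "cpu" ;;
    *)       echo "unknown" ;;
  esac
}

# Run this with the venv that A1111 uses activated.
# If torch cannot be imported, ver stays empty and the backend reads "unknown".
ver=$(python3 -c 'import torch; print(torch.__version__)' 2>/dev/null)
echo "torch ${ver:-not found}: $(torch_backend_of "$ver")"
```

If this reports rocm, the NVIDIA card would need its own environment with a CUDA build of torch.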

  • Having a much better time back on Cinnamon default instead of Wayland. Oops!

** It heard me. Crashed again on an x/y plot but due to being away from Wayland I was able to see the terminal dump: amdgpu thermal overload! shutdown initiated! That’ll do it! Finally something easy to fix. Wonder why thermal throttling isn’t kicking in to control runaway? Will stress it once more and clock the temps this time.
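
For clocking the temps, a minimal sketch that reads the sensors straight from sysfs is shown here. It assumes the amdgpu driver exposes its hwmon sensors under /sys/class/drm/card*/device/hwmon/, and WARN_C is a hypothetical threshold, not a value from the driver.

```shell
#!/bin/bash
# Sketch: print amdgpu temperatures once and flag anything above a threshold.
# Assumption: sensors live under /sys/class/drm/card*/device/hwmon/hwmon*/.
# WARN_C is a made-up default; pick a limit that suits your card.
WARN_C="${WARN_C:-100}"

# hwmon reports millidegrees Celsius; convert to whole degrees.
millideg_to_c() {
  echo $(( $1 / 1000 ))
}

# Print "HOT" at or above WARN_C, otherwise "ok".
temp_status() {
  if [ "$1" -ge "$WARN_C" ]; then echo "HOT"; else echo "ok"; fi
}

for f in /sys/class/drm/card*/device/hwmon/hwmon*/temp*_input; do
  [ -r "$f" ] || continue                       # glob did not match, or unreadable
  name=$(cat "$(dirname "$f")/name" 2>/dev/null)
  [ "$name" = "amdgpu" ] || continue            # skip sensors from other drivers
  raw=$(cat "$f" 2>/dev/null) || continue       # skip sensors that fail to read
  c=$(millideg_to_c "$raw")
  echo "$f: ${c}C $(temp_status "$c")"
done
```

Wrapping it in `watch -n 2` during an x/y plot run gives a running view of how close the card gets to the shutdown point.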

Temps were exceeding 115C, phew! No idea why the default amdgpu driver has no fan control, but the fans are ripping like they should now. Monitoring temps has restored system stability. Using separate dedicated venv folders for AMD and NVIDIA, plus careful driver choice and installation, were the keys to multi-GPU success.
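
The dedicated-venv setup could look roughly like this. It is a sketch that prints the lines one might put in each vendor's copy of webui-user.sh. The venv names come from further down the thread. The device index 0 and the function name user_config_for are assumptions, so check which index each card actually gets on your system.

```shell
#!/bin/bash
# Sketch: emit settings for a per-vendor copy of webui-user.sh.
# Create the venvs first, e.g.: python3 -m venv amd-venv
# Assumption: each card is device 0 within its own runtime.
user_config_for() {
  case "$1" in
    amd)
      echo 'venv_dir="amd-venv"'
      echo 'export HIP_VISIBLE_DEVICES=0'    # limits which GPUs ROCm exposes
      ;;
    nvidia)
      echo 'venv_dir="nvidia-venv"'
      echo 'export CUDA_VISIBLE_DEVICES=0'   # limits which GPUs CUDA exposes
      ;;
    *)
      echo "unknown vendor: $1" >&2
      return 1
      ;;
  esac
}

user_config_for amd
user_config_for nvidia
```

Keeping one venv per vendor means the ROCm and CUDA builds of torch never overwrite each other.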

  • abcdqfr@lemmy.world (OP)
    5 months ago

    I might take the docker route for the ease of troubleshooting if nothing else. So very sick of hard system freezes/crashes while kludging through the troubleshooting process. Any words of wisdom?

    • electricprism@lemmy.ml
      5 months ago

      Assume I’m an amateur and bad at this ;P

      In any case you might try a docker-compose.yml

      version: "3.8"
      # Compose file build variables set in .env
      services:
        supervisor:
          platform: linux/amd64
          build:
            context: ./build
            args:
              PYTHON_VERSION: ${PYTHON_VERSION:-3.10}
              PYTORCH_VERSION: ${PYTORCH_VERSION:-2.2.2}
              WEBUI_TAG: ${WEBUI_TAG:-}
              IMAGE_BASE: ${IMAGE_BASE:-ghcr.io/ai-dock/python:${PYTHON_VERSION:-3.10}-cuda-11.8.0-base-22.04}
            tags:
              - "ghcr.io/ai-dock/stable-diffusion-webui:${IMAGE_TAG:-cuda-11.8.0-base-22.04}"
              
          image: ghcr.io/ai-dock/stable-diffusion-webui:${IMAGE_TAG:-cuda-11.8.0-base-22.04}
          
          devices:
            - "/dev/dri:/dev/dri"
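            # For AMD GPU (ROCm): uncomment the line below. The image defaults above
            # are CUDA builds, so IMAGE_TAG would also need to point at a ROCm build,
            # if the project publishes one.
            #- "/dev/kfd:/dev/kfd"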
          
          volumes:
            # Workspace
            - ./workspace:${WORKSPACE:-/workspace/}:rshared
            # You can share /workspace/storage with other non-WEBUI containers. See README
            #- /path/to/common_storage:${WORKSPACE:-/workspace/}storage/:rshared
            # Will echo to root-owned authorized_keys file;
            # Avoids changing local file owner
            - ./config/authorized_keys:/root/.ssh/authorized_keys_mount
            - ./config/provisioning/default.sh:/opt/ai-dock/bin/provisioning.sh
          
          ports:
              # SSH available on host machine port 2222 to avoid conflict. Change to suit
              - ${SSH_PORT_HOST:-2222}:${SSH_PORT_LOCAL:-22}
              # Caddy port for service portal
              - ${SERVICEPORTAL_PORT_HOST:-1111}:${SERVICEPORTAL_PORT_HOST:-1111}
              # WEBUI web interface
              - ${WEBUI_PORT_HOST:-7860}:${WEBUI_PORT_HOST:-7860}
              # Jupyter server
              - ${JUPYTER_PORT_HOST:-8888}:${JUPYTER_PORT_HOST:-8888}
              # Syncthing
              - ${SYNCTHING_UI_PORT_HOST:-8384}:${SYNCTHING_UI_PORT_HOST:-8384}
              - ${SYNCTHING_TRANSPORT_PORT_HOST:-22999}:${SYNCTHING_TRANSPORT_PORT_HOST:-22999}
         
          environment:
              # Don't enclose values in quotes
              - DIRECT_ADDRESS=${DIRECT_ADDRESS:-127.0.0.1}
              - DIRECT_ADDRESS_GET_WAN=${DIRECT_ADDRESS_GET_WAN:-false}
              - WORKSPACE=${WORKSPACE:-/workspace}
              - WORKSPACE_SYNC=${WORKSPACE_SYNC:-false}
              - CF_TUNNEL_TOKEN=${CF_TUNNEL_TOKEN:-}
              - CF_QUICK_TUNNELS=${CF_QUICK_TUNNELS:-true}
              - WEB_ENABLE_AUTH=${WEB_ENABLE_AUTH:-true}
              - WEB_USER=${WEB_USER:-user}
              - WEB_PASSWORD=${WEB_PASSWORD:-password}
              - SSH_PORT_HOST=${SSH_PORT_HOST:-2222}
              - SSH_PORT_LOCAL=${SSH_PORT_LOCAL:-22}
              - SERVICEPORTAL_PORT_HOST=${SERVICEPORTAL_PORT_HOST:-1111}
              - SERVICEPORTAL_METRICS_PORT=${SERVICEPORTAL_METRICS_PORT:-21111}
              - SERVICEPORTAL_URL=${SERVICEPORTAL_URL:-}
              - WEBUI_BRANCH=${WEBUI_BRANCH:-}
              - WEBUI_FLAGS=${WEBUI_FLAGS:-}
              - WEBUI_PORT_HOST=${WEBUI_PORT_HOST:-7860}
              - WEBUI_PORT_LOCAL=${WEBUI_PORT_LOCAL:-17860}
              - WEBUI_METRICS_PORT=${WEBUI_METRICS_PORT:-27860}
              - WEBUI_URL=${WEBUI_URL:-}
              - JUPYTER_PORT_HOST=${JUPYTER_PORT_HOST:-8888}
              - JUPYTER_METRICS_PORT=${JUPYTER_METRICS_PORT:-28888}
              - JUPYTER_URL=${JUPYTER_URL:-}
              - SERVERLESS=${SERVERLESS:-false}
              - SYNCTHING_UI_PORT_HOST=${SYNCTHING_UI_PORT_HOST:-8384}
              - SYNCTHING_TRANSPORT_PORT_HOST=${SYNCTHING_TRANSPORT_PORT_HOST:-22999}
              - SYNCTHING_URL=${SYNCTHING_URL:-}
              #- PROVISIONING_SCRIPT=${PROVISIONING_SCRIPT:-}
      

      install.sh

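      #!/bin/bash
      # Arch Linux: install Docker and Compose, then start the daemon
      sudo pacman -S --needed docker docker-compose
      sudo systemctl enable --now docker.service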
      

      update.sh

      #!/bin/bash
      # https://stackoverflow.com/questions/49316462/how-to-update-existing-images-with-docker-compose
      
      sudo docker-compose pull
      sudo docker-compose up --force-recreate --build -d
      sudo docker image prune -f
      

      start.sh

      #!/bin/bash
      sudo docker-compose down --remove-orphans && sudo docker-compose up
      
      • abcdqfr@lemmy.world (OP)
        5 months ago

        What a treat! I just got done setting up a second venv within the sd folder: one called amd-venv, the other nvidia-venv. I copied the webui.sh and webui-user.sh scripts and made separate flavors of those as well, each pointing to its respective venv. Now if I just had my NVIDIA drivers working, I could probably set my power supply on fire running them in parallel.

        • electricprism@lemmy.ml
          5 months ago

          Excellent. I did my test config last month for a friend. I was having trouble on bare metal, even though that is what I typically prefer, so it was nice to have an image I could easily turn on and off as needed.