Troubleshooting
This guide covers common issues and their solutions when working with DevOps Images.
Image Pull Issues
pull access denied or manifest unknown
Symptoms: docker pull fails with a "pull access denied" or "manifest unknown" error.
Solutions:
- Verify the image path is correct:
# Correct paths
ghcr.io/jinalshah/devops/images/all-devops:latest
registry.gitlab.com/jinal-shah/devops/images/all-devops:latest
js01/all-devops:latest
- Check if you need authentication:
# For GHCR
docker login ghcr.io
# For GitLab
docker login registry.gitlab.com
# For Docker Hub
docker login
- Verify the tag exists:
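# List available tags via the GitHub Packages API (needs a token with the read:packages scope;
# assumes jinalshah is a user account, use /orgs/... for an organization)
gh api /users/jinalshah/packages/container/devops%2Fimages%2Fall-devops/versions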
- Try a different registry:
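The same image is published under the three paths listed above, so a failed pull from one registry can fall back to the next. A minimal sketch, assuming all three registries carry the requested tag; the function name `pull_from_any_registry` is illustrative and not part of the project:

```shell
# Try each known registry path in order and stop at the first successful pull.
pull_from_any_registry() {
  tag="${1:-latest}"
  for image in \
    "ghcr.io/jinalshah/devops/images/all-devops" \
    "registry.gitlab.com/jinal-shah/devops/images/all-devops" \
    "js01/all-devops"; do
    if docker pull "${image}:${tag}"; then
      echo "Pulled ${image}:${tag}"
      return 0
    fi
  done
  echo "No registry served tag ${tag}" >&2
  return 1
}
```

Call it as `pull_from_any_registry latest`. The last line of output names the registry that answered, which is the path to use in later `docker run` commands.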
Registry rate limits (Docker Hub)
Symptoms:
Solutions:
- Use GHCR or GitLab registry instead:
- Authenticate to increase Docker Hub limits:
- In CI/CD, use authenticated pulls:
# GitHub Actions example
- name: Login to Docker Hub
  uses: docker/login-action@v3
  with:
    username: ${{ secrets.DOCKERHUB_USERNAME }}
    password: ${{ secrets.DOCKERHUB_TOKEN }}
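Outside GitHub Actions, the same authenticated pull can be scripted. A minimal sketch, assuming the Docker Hub username and access token are exported as `DOCKERHUB_USERNAME` and `DOCKERHUB_TOKEN` (the names mirror the secrets above; the function name is illustrative). Reading the token from stdin keeps it out of the process list and shell history:

```shell
# Log in to Docker Hub non-interactively using credentials from the environment.
dockerhub_login() {
  if [ -z "${DOCKERHUB_USERNAME:-}" ] || [ -z "${DOCKERHUB_TOKEN:-}" ]; then
    echo "DOCKERHUB_USERNAME and DOCKERHUB_TOKEN must be set" >&2
    return 1
  fi
  # --password-stdin reads the token from the pipe instead of a command-line flag.
  printf '%s' "$DOCKERHUB_TOKEN" |
    docker login --username "$DOCKERHUB_USERNAME" --password-stdin
}
```

Run `dockerhub_login` once per CI job before the first `docker pull js01/all-devops:latest`.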
Network timeouts during pull
Solutions:
- Retry with exponential backoff in CI:
for i in {1..3}; do
  docker pull ghcr.io/jinalshah/devops/images/all-devops:latest && break
  echo "Attempt $i failed, waiting..."
  sleep $((10 * 2 ** (i - 1)))  # 10s, 20s, 40s
done
- Use a closer registry mirror or CDN
- Check your network connection and firewall settings
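The retry idea can be wrapped in a reusable helper so that any flaky command gets the same treatment. A minimal sketch in portable shell; the function name `retry_with_backoff` and its argument order (maximum attempts, initial delay in seconds, then the command) are assumptions made for this example:

```shell
# Run a command up to max_attempts times, doubling the delay after each failure.
retry_with_backoff() {
  max_attempts="$1"
  delay="$2"
  shift 2
  attempt=1
  while :; do
    "$@" && return 0
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "Giving up after $attempt attempts" >&2
      return 1
    fi
    echo "Attempt $attempt failed, retrying in ${delay}s..." >&2
    sleep "$delay"
    attempt=$((attempt + 1))
    delay=$((delay * 2))
  done
}
```

For example, `retry_with_backoff 4 10 docker pull ghcr.io/jinalshah/devops/images/all-devops:latest` waits 10, 20, and 40 seconds between attempts and does not sleep after the final failure.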
Container Runtime Issues
Files created as root on host
Symptoms: Files created by the container are owned by root:root on the host, making them difficult to modify.
Solution:
Run container with your host user ID:
docker run --rm --user "$(id -u):$(id -g)" \
-v "$PWD":/srv \
ghcr.io/jinalshah/devops/images/all-devops:latest \
terraform fmt -recursive /srv
Alternative for frequent use:
# Create a wrapper script (assumes ~/bin is on your PATH)
mkdir -p ~/bin
cat > ~/bin/devops <<'EOF'
#!/bin/bash
docker run --rm --user "$(id -u):$(id -g)" \
-v "$PWD":/workspace \
-w /workspace \
ghcr.io/jinalshah/devops/images/all-devops:latest "$@"
EOF
chmod +x ~/bin/devops
# Now use it
devops terraform plan
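Permission denied on mounted volumes
Symptoms: Commands fail with a "permission denied" error when reading, writing, or executing files on a mounted volume.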
Solutions:
- Ensure files have execute permissions:
- Mount as read-only if you only need to read:
docker run --rm -v "$PWD":/srv:ro ghcr.io/jinalshah/devops/images/all-devops:latest cat /srv/file.txt
- Check SELinux context (on RHEL/CentOS/Fedora):
# Add the :z suffix (label shared between containers) or :Z (private to this container) to the volume mount
docker run --rm -v "$PWD":/srv:z ghcr.io/jinalshah/devops/images/all-devops:latest ls /srv
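For the execute-permission case, bind mounts keep the permission bits the files have on the host, so the fix is applied on the host before the container starts. A minimal sketch; the helper name `ensure_executable` is illustrative:

```shell
# Add the execute bit to each given file if it is missing; fail on absent files.
ensure_executable() {
  for file in "$@"; do
    if [ ! -e "$file" ]; then
      echo "missing: $file" >&2
      return 1
    fi
    if [ ! -x "$file" ]; then
      chmod +x "$file"
      echo "made executable: $file"
    fi
  done
}
```

Run something like `ensure_executable ./deploy.sh` (a hypothetical script name) in the project directory, then mount the directory as usual.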
Container exits immediately
Symptoms: The container starts and exits right away when run with docker run -d.
Solution:
The images default to an interactive shell, which exits at once when no stdin or terminal is attached. Use -it for interactive work, or pass a long-running command:
# For interactive use
docker run -it ghcr.io/jinalshah/devops/images/all-devops:latest
# For background daemon (less common)
docker run -d ghcr.io/jinalshah/devops/images/all-devops:latest sleep infinity
Out of disk space
Symptoms:
Solutions:
- Clean up Docker resources:
# Remove unused images
docker image prune -a
# Remove all unused resources (WARNING: --volumes also deletes unused volumes, including their data)
docker system prune -a --volumes
- Check Docker disk usage:
- Increase Docker Desktop disk allocation (macOS/Windows)
Cloud Provider Authentication Issues
AWS credentials not found
Symptoms:
Solutions:
- Mount AWS credentials directory:
docker run --rm -v ~/.aws:/root/.aws \
ghcr.io/jinalshah/devops/images/aws-devops:latest \
aws sts get-caller-identity
- Pass credentials as environment variables:
docker run --rm \
-e AWS_ACCESS_KEY_ID \
-e AWS_SECRET_ACCESS_KEY \
-e AWS_SESSION_TOKEN \
ghcr.io/jinalshah/devops/images/aws-devops:latest \
aws sts get-caller-identity
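- Use IAM roles (on EC2/ECS): When the container runs on AWS infrastructure with an instance profile or task role attached, the AWS CLI picks up those credentials automatically, so no mounts or environment variables are needed.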
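The first two options can be combined into one wrapper that picks whichever credential source is present on the host. A minimal sketch, assuming the environment variables are exported when used; the function name `run_aws_devops` is illustrative:

```shell
# Run a command in the aws-devops image using env credentials if set, else ~/.aws.
run_aws_devops() {
  image="ghcr.io/jinalshah/devops/images/aws-devops:latest"
  if [ -n "${AWS_ACCESS_KEY_ID:-}" ] && [ -n "${AWS_SECRET_ACCESS_KEY:-}" ]; then
    # -e NAME without a value copies the variable from the host environment.
    docker run --rm \
      -e AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY -e AWS_SESSION_TOKEN \
      "$image" "$@"
  elif [ -d "$HOME/.aws" ]; then
    docker run --rm -v "$HOME/.aws:/root/.aws" "$image" "$@"
  else
    echo "No AWS credentials found in the environment or in $HOME/.aws" >&2
    return 1
  fi
}
```

`run_aws_devops aws sts get-caller-identity` then behaves the same on a laptop with `~/.aws` and in a CI job that only exports keys.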
GCP authentication failed
Symptoms:
Solutions:
- Mount gcloud config directory:
docker run --rm -v ~/.config/gcloud:/root/.config/gcloud \
ghcr.io/jinalshah/devops/images/gcp-devops:latest \
gcloud auth list
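- Use a service account key file (gcloud needs the key activated; GOOGLE_APPLICATION_CREDENTIALS is only read by client libraries and tools such as Terraform):
docker run --rm \
-v /path/to/service-account.json:/key.json \
-e GOOGLE_APPLICATION_CREDENTIALS=/key.json \
ghcr.io/jinalshah/devops/images/gcp-devops:latest \
sh -c 'gcloud auth activate-service-account --key-file=/key.json && gcloud auth list'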
- Authenticate inside container:
docker run -it ghcr.io/jinalshah/devops/images/gcp-devops:latest
# Inside container:
gcloud auth login
gcloud auth application-default login
SSH key not found for Git operations
Symptoms:
Solutions:
- Mount SSH directory:
docker run --rm -v ~/.ssh:/root/.ssh \
ghcr.io/jinalshah/devops/images/all-devops:latest \
git clone git@github.com:user/repo.git
- Set correct SSH key permissions:
- Use HTTPS instead of SSH:
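The last two options can be sketched as small helpers. This is an illustration under assumptions: private keys are taken to follow the default `id_*` naming, and the function names `fix_ssh_permissions` and `ssh_url_to_https` are made up for this example:

```shell
# Tighten permissions: 700 on the directory, 600 on private keys, 644 on public keys.
fix_ssh_permissions() {
  ssh_dir="${1:-$HOME/.ssh}"
  chmod 700 "$ssh_dir" || return 1
  for key in "$ssh_dir"/id_*; do
    [ -f "$key" ] || continue
    case "$key" in
      *.pub) chmod 644 "$key" ;;
      *)     chmod 600 "$key" ;;
    esac
  done
}

# Rewrite an scp-style Git URL (git@host:path) as an HTTPS URL.
ssh_url_to_https() {
  printf '%s\n' "$1" | sed -E 's#^git@([^:]+):#https://\1/#'
}
```

For example, `ssh_url_to_https git@github.com:user/repo.git` prints `https://github.com/user/repo.git`, which can be cloned without any SSH key mounted.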
Build Issues
Architecture mismatch errors
Symptoms:
Solutions:
- Use Buildx for cross-platform builds:
# Create buildx builder
docker buildx create --name multiarch --use
# Build for specific platform
docker buildx build --platform linux/arm64 \
--target all-devops \
-t all-devops:arm64 \
--load .
- Verify your platform:
- Pull platform-specific tag:
# For ARM64
docker pull ghcr.io/jinalshah/devops/images/all-devops:1.0.abc1234-arm64
# For AMD64
docker pull ghcr.io/jinalshah/devops/images/all-devops:1.0.abc1234-amd64
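To verify the platform and request a matching image in one step, the output of `uname -m` can be mapped to a Docker platform string. A minimal sketch, assuming only the amd64 and arm64 variants mentioned above are published; the function name `docker_platform_for` is illustrative:

```shell
# Map a machine name as printed by `uname -m` to a Docker --platform value.
docker_platform_for() {
  case "$1" in
    x86_64|amd64)  echo "linux/amd64" ;;
    aarch64|arm64) echo "linux/arm64" ;;
    *)
      echo "unsupported architecture: $1" >&2
      return 1
      ;;
  esac
}
```

Usage: `docker pull --platform "$(docker_platform_for "$(uname -m)")" ghcr.io/jinalshah/devops/images/all-devops:latest`.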
Build fails on downloading tools
Symptoms:
Solutions:
- Check network connectivity:
- Retry the build (transient failures):
- Build with no cache to force re-download:
- Check if behind corporate proxy:
docker build \
--build-arg HTTP_PROXY=http://proxy:8080 \
--build-arg HTTPS_PROXY=http://proxy:8080 \
--target all-devops -t all-devops:local .
Custom build-arg version fails
Symptoms: Build fails when overriding tool versions with --build-arg.
Solutions:
- Verify the version exists: Check the official release pages for the tool you're trying to install.
- Revert to defaults:
- Test one build arg at a time:
# Test with single override
docker build --build-arg TERRAGRUNT_VERSION=0.68.14 \
--target all-devops -t all-devops:test .
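When several versions are overridden at once, testing them one at a time can be automated. A minimal sketch that builds once per override and reports which ones fail; the function name `try_build_args` is illustrative, and any argument name other than TERRAGRUNT_VERSION would be an assumption about the Dockerfile:

```shell
# Build once per NAME=VALUE override and list the overrides whose build failed.
try_build_args() {
  failed=""
  for override in "$@"; do
    echo "Testing --build-arg $override"
    log="${TMPDIR:-/tmp}/build-${override%%=*}.log"   # one log file per argument name
    if ! docker build --build-arg "$override" \
         --target all-devops -t all-devops:test . >"$log" 2>&1; then
      failed="$failed $override"
    fi
  done
  if [ -n "$failed" ]; then
    echo "Failing overrides:$failed"
    return 1
  fi
  echo "All overrides built successfully"
}
```

Usage: `try_build_args TERRAGRUNT_VERSION=0.68.14`, followed by further `NAME=VALUE` pairs as needed. The log for a failing override usually shows the download URL that returned an error.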
Build runs out of memory
Symptoms:
Solutions:
- Increase Docker memory limit (Docker Desktop)
- Build stages separately:
# Build base first
docker build --target base -t devops-base:local .
# Then build specific image
docker build --target all-devops -t all-devops:local .
- Use build cache:
Tool-Specific Issues
Terraform state locking errors
Symptoms:
Solutions:
- Ensure proper AWS credentials are mounted, then release the stale lock (only when no other Terraform run is still holding it):
docker run --rm -v ~/.aws:/root/.aws -v "$PWD":/srv -w /srv \
ghcr.io/jinalshah/devops/images/all-devops:latest \
terraform force-unlock LOCK_ID
- Check that the DynamoDB lock table exists (for the S3 backend)
- Verify network access to the backend
Kubectl context not found
Symptoms:
Solutions:
- Mount kubeconfig:
docker run --rm -v ~/.kube:/root/.kube \
ghcr.io/jinalshah/devops/images/all-devops:latest \
kubectl get pods
- Set KUBECONFIG environment variable:
docker run --rm \
-v ~/.kube/custom-config:/kubeconfig \
-e KUBECONFIG=/kubeconfig \
ghcr.io/jinalshah/devops/images/all-devops:latest \
kubectl get pods
- Get credentials from cloud provider:
# For AWS EKS
aws eks update-kubeconfig --name my-cluster
# For GKE
gcloud container clusters get-credentials my-cluster --zone us-central1-a
AI CLI tools not authenticated
Symptoms:
Solutions:
- Authenticate on host first:
- Mount config directories:
docker run -it \
-v ~/.claude:/root/.claude \
-v ~/.codex:/root/.codex \
-v ~/.copilot:/root/.copilot \
-v ~/.gemini:/root/.gemini \
ghcr.io/jinalshah/devops/images/all-devops:latest
- Authenticate inside container:
docker run -it ghcr.io/jinalshah/devops/images/all-devops:latest
# Inside container:
claude auth login
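When a host path passed to `-v` does not exist, Docker creates it as a root-owned directory, so mounting all four config directories on a machine that uses only one tool leaves stray directories behind. A minimal sketch that mounts only the directories that exist; the function name `run_with_ai_configs` is illustrative:

```shell
# Start the all-devops image, mounting only the AI CLI config dirs present on the host.
run_with_ai_configs() {
  image="ghcr.io/jinalshah/devops/images/all-devops:latest"
  set -- "$image" "$@"   # image first, then any command passed by the caller
  for name in .gemini .copilot .codex .claude; do
    if [ -d "$HOME/$name" ]; then
      # Prepend the mount so it ends up before the image name.
      set -- -v "$HOME/$name:/root/$name" "$@"
    fi
  done
  docker run -it "$@"
}
```

Running `run_with_ai_configs` with no arguments opens the default shell; extra arguments are passed through as the container command.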
Ansible inventory or playbook not found
Symptoms:
Solutions:
- Mount project directory:
docker run --rm -v "$PWD":/workspace -w /workspace \
ghcr.io/jinalshah/devops/images/all-devops:latest \
ansible-playbook playbook.yml
- Use absolute paths:
docker run --rm -v "$PWD":/srv \
ghcr.io/jinalshah/devops/images/all-devops:latest \
ansible-playbook /srv/playbook.yml
Documentation Issues
mkdocs command not found
Solution:
Port 8000 already in use
Solution:
# Use different port
mkdocs serve -a 0.0.0.0:8080
# Or kill process using port 8000
lsof -ti:8000 | xargs kill -9
Documentation not updating
Solutions:
- Stop and restart mkdocs:
- Clear browser cache or use incognito mode
- Force rebuild:
Platform-Specific Issues
Apple Silicon (M1/M2/M3) issues
Issue: Wrong architecture pulled
Solution:
# Verify you got ARM64 image
docker run --rm ghcr.io/jinalshah/devops/images/all-devops:latest uname -m
# Should output: aarch64
# Force ARM64 if needed
docker pull --platform linux/arm64 ghcr.io/jinalshah/devops/images/all-devops:latest
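Issue: Rosetta compatibility mode warnings
Solution:
These warnings appear when an amd64 image runs under emulation on Apple Silicon. Pull the ARM64 image as shown above so the container runs natively; Docker Desktop only uses Rosetta for x86_64/amd64 emulation.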
Windows WSL2 issues
Issue: Volume mount performance is slow
Solutions:
- Keep files in WSL2 filesystem:
- Avoid mounting from /mnt/c/ if possible
Issue: Line ending problems
Solution:
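Scripts saved with Windows (CRLF) line endings typically fail inside the Linux container because the carriage return becomes part of the interpreter path or of a command name. A minimal sketch for detecting and stripping carriage returns on the host; the function names are illustrative, and how the repository handles line endings in Git is not covered here:

```shell
# Succeed if the file contains at least one carriage return.
has_crlf() {
  cr="$(printf '\r')"
  grep -q "$cr" "$1"
}

# Remove every carriage return from the file in place, keeping its permissions.
strip_crlf() {
  tmp_file="$(mktemp)" || return 1
  if tr -d '\r' < "$1" > "$tmp_file"; then
    cat "$tmp_file" > "$1"
    status=$?
  else
    status=1
  fi
  rm -f "$tmp_file"
  return "$status"
}
```

Usage: `has_crlf script.sh && strip_crlf script.sh`, where `script.sh` stands for whichever file fails inside the container.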
Linux permission issues with Docker socket
Symptoms:
Solutions:
- Add user to docker group:
- Or use sudo (not recommended for regular use):
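Before changing anything, it helps to confirm whether the current user is already in the docker group. A minimal sketch; the helper name `user_in_group` is illustrative:

```shell
# Succeed if the user (default: current user) belongs to the given group.
user_in_group() {
  group="$1"
  user="${2:-$(id -un)}"
  # id -nG prints the user's group names separated by spaces.
  id -nG "$user" | tr ' ' '\n' | grep -qxF "$group"
}
```

If `user_in_group docker` fails, the standard fix is `sudo usermod -aG docker "$USER"`. The new membership only applies to new login sessions, so log out and back in, or run `newgrp docker` in the current shell.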
Getting Additional Help
If you're still experiencing issues:
Gather Information
Collect the following information:
# Docker version
docker --version
# Host architecture
uname -m
# Host OS
cat /etc/os-release # Linux
sw_vers # macOS
# Exact error message
docker run ... 2>&1 | tee error.log
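The commands above can be collected into a single report file to attach to an issue. A minimal sketch; the function name `collect_diagnostics` and the default file name `diagnostics.txt` are assumptions made for this example:

```shell
# Write Docker version, host architecture, and host OS details to one report file.
collect_diagnostics() {
  report="${1:-diagnostics.txt}"
  {
    echo "== Docker version =="
    docker --version 2>&1 || echo "docker not available"
    echo "== Host architecture =="
    uname -m
    echo "== Host OS =="
    if [ -r /etc/os-release ]; then
      cat /etc/os-release          # Linux
    elif command -v sw_vers >/dev/null 2>&1; then
      sw_vers                      # macOS
    else
      uname -s
    fi
  } > "$report"
  echo "Wrote $report"
}
```

Run `collect_diagnostics` in the directory where the failing command was run, and attach the result together with `error.log`.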
Open an Issue
Open an issue on GitHub with:
- Title: Clear, concise description of the problem
- Environment: Docker version, host OS, architecture
- Steps to reproduce: Exact commands you ran
- Expected behaviour: What should happen
- Actual behaviour: What actually happened
- Error output: Full error messages and logs
- Image and tag: Which image and version you're using
Community Resources
- GitHub Discussions
- Tool-specific documentation
- Individual tool official documentation