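Building a Kubernetes Cluster on an ARM Server (3) - Monitoring Stack (Prometheus + Grafana)

Introduction

In the previous post we finished building a 3-node Kubernetes cluster. This post walks through deploying a Prometheus + Grafana monitoring stack to monitor the state of that cluster.

Goals

✅ Prometheus: metric collection and storage
✅ Grafana: metric visualization and dashboards
✅ PersistentVolume: durable data storage
✅ NodePort: access from outside the cluster

Architecture Overview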

┌─────────────────────────────────────────────────────┐
│          Kubernetes Cluster (3 nodes)               │
├─────────────────────────────────────────────────────┤
│ Master: k8s-master (<MASTER_NODE_IP>)               │
│ Workers: k8s-worker1, k8s-worker2                   │
├─────────────────────────────────────────────────────┤
│ Monitoring Namespace                                │
├─────────────────────────────────────────────────────┤
│ ┌──────────────┐         ┌──────────────┐           │
│ │ Prometheus   │         │  Grafana     │           │
│ │ Port 9090    │─────────│  Port 3000   │           │
│ │ Storage: 10Gi│         │ Storage: 10Gi│           │
│ └──────────────┘         └──────────────┘           │
├─────────────────────────────────────────────────────┤
│ Services                                            │
├─────────────────────────────────────────────────────┤
│ Prometheus: NodePort 32664 (9090)                   │
│ Grafana:    NodePort 31211 (3000)                   │
└─────────────────────────────────────────────────────┘

Prerequisites

1. Check the cluster status

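# SSH into the master node
ssh ubuntu@<MASTER_NODE_IP>

# Check the cluster status
kubectl get nodes -o wide

# Expected output (abridged; -o wide also prints IP, OS image, kernel, and runtime columns)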
NAME          STATUS   ROLES           AGE   VERSION
k8s-master    Ready    control-plane   27h   v1.29.15
k8s-worker1   Ready    <none>          27h   v1.29.15
k8s-worker2   Ready    <none>          27h   v1.29.15
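
The status check above can also be scripted so that later steps only run against a healthy cluster. The following is a minimal sketch, assuming the default column layout of kubectl get nodes --no-headers (STATUS in the second column); the function name all_nodes_ready is hypothetical.

```shell
# Reads `kubectl get nodes --no-headers` output on stdin.
# Exits 0 only when at least one node is listed and every
# STATUS column is exactly "Ready".
all_nodes_ready() {
  awk '$2 != "Ready" { bad = 1 } END { exit (bad || NR == 0) }'
}

# Usage on the master node:
#   kubectl get nodes --no-headers | all_nodes_ready && echo "all nodes Ready"
```

A cordoned node reports "Ready,SchedulingDisabled" and is treated as not ready by this check, which is the conservative choice before deploying new workloads.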

2. Install required tools

# kubectl (already installed)
kubectl version --client

# Install Helm (package manager)
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
helm version

Create the Namespace

All monitoring resources are placed in a dedicated namespace.

# Create the namespace
kubectl create namespace monitoring

# Verify the namespace
kubectl get namespace

PersistentVolume Setup

1. Prepare NFS storage

Master node (NFS server)

# Create the shared directories on the master node (root is required under /mnt)
sudo mkdir -p /mnt/prometheus /mnt/grafana
# World-writable so the non-root Prometheus and Grafana containers can write
sudo chmod 777 /mnt/prometheus /mnt/grafana

# Install the NFS server
sudo apt-get install nfs-kernel-server -y

# Configure NFS exports (allow the cluster network IP range)
sudo bash -c 'cat >> /etc/exports << EOF
/mnt/prometheus 192.168.122.0/24(rw,sync,no_subtree_check,no_root_squash)
/mnt/grafana 192.168.122.0/24(rw,sync,no_subtree_check,no_root_squash)
EOF'

# Restart the NFS service
sudo systemctl restart nfs-server

# Verify the exports
sudo exportfs -v

# Expected output
# /mnt/prometheus    192.168.122.0/24
# /mnt/grafana       192.168.122.0/24
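
Because the heredoc above appends with >>, running it a second time would add duplicate lines to /etc/exports. One way to make the step safe to repeat is to append a line only when it is missing. This is a sketch, not part of the original setup; add_export is a hypothetical helper name, and it must run as root when pointed at the real /etc/exports.

```shell
# Append an export line to a file only if that exact line
# is not already present (safe to run repeatedly).
add_export() {
  exports_file=$1
  line=$2
  touch "$exports_file"
  # -x: match the whole line, -F: treat the line as a fixed string
  grep -qxF -- "$line" "$exports_file" || printf '%s\n' "$line" >> "$exports_file"
}

# Usage (as root on the master node):
#   add_export /etc/exports '/mnt/prometheus 192.168.122.0/24(rw,sync,no_subtree_check,no_root_squash)'
#   add_export /etc/exports '/mnt/grafana 192.168.122.0/24(rw,sync,no_subtree_check,no_root_squash)'
#   exportfs -ra
```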

Worker nodes (NFS clients)

Important: the nfs-common package must be installed on every worker node, or NFS mounts will fail.

# Run on both worker1 and worker2
sudo apt-get update
sudo apt-get install -y nfs-common

# Verify the installation
which mount.nfs

NFS mount test

Mount the share manually on a worker node to confirm the setup works:

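# Create a temporary directory on the worker node
sudo mkdir -p /mnt/test

# Test the NFS mount
sudo mount -t nfs 192.168.122.10:/mnt/prometheus /mnt/test

# Verify the mount
mount | grep prometheus

# Create a test file (checks write permission), then clean it up
touch /mnt/test/test-file
rm /mnt/test/test-file

# Unmount
sudo umount /mnt/test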

2. Create the PersistentVolumes

Run on the master node:

cat << 'EOF' | kubectl apply -f -
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  nfs:
    server: <MASTER_NODE_IP>  # IP of the NFS server (the master node)
    path: /mnt/prometheus
  persistentVolumeReclaimPolicy: Retain
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: grafana-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  nfs:
    server: <MASTER_NODE_IP>
    path: /mnt/grafana
  persistentVolumeReclaimPolicy: Retain
EOF

# Verify the PVs
kubectl get pv

3. Create the PersistentVolumeClaims

cat << 'EOF' | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-pvc
  namespace: monitoring
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: ""  # static binding; keeps a default StorageClass from being applied
  volumeName: prometheus-pv
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-pvc
  namespace: monitoring
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: ""  # static binding; keeps a default StorageClass from being applied
  volumeName: grafana-pv
EOF

# Verify the PVCs
kubectl get pvc -n monitoring
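
Both claims should show STATUS Bound before any Pod that mounts them is deployed. The sketch below filters the kubectl output down to the claims that are not bound yet; it assumes the default column layout of kubectl get pvc --no-headers (NAME first, STATUS second), and unbound_claims is a hypothetical name.

```shell
# Reads `kubectl get pvc --no-headers` output on stdin and prints
# the name of every claim whose STATUS column is not "Bound".
unbound_claims() {
  awk '$2 != "Bound" { print $1 }'
}

# Usage on the master node:
#   kubectl get pvc -n monitoring --no-headers | unbound_claims
# No output means every claim is bound.
```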

Deploy Prometheus

1. Prometheus configuration (ConfigMap)

cat << 'EOF' | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitoring
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    scrape_configs:
      - job_name: 'kubernetes-cluster'
        kubernetes_sd_configs:
          - role: node
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
EOF

2. Create the Prometheus Deployment

cat << 'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus
      containers:
      - name: prometheus
        image: prom/prometheus:latest
        args:
          - "--config.file=/etc/prometheus/prometheus.yml"
          - "--storage.tsdb.path=/prometheus"
        ports:
        - containerPort: 9090
        resources:
          requests:
            memory: "512Mi"
            cpu: "500m"
          limits:
            memory: "2Gi"
            cpu: "1000m"
        volumeMounts:
        - name: config
          mountPath: /etc/prometheus
        - name: storage
          mountPath: /prometheus
      volumes:
      - name: config
        configMap:
          name: prometheus-config
      - name: storage
        persistentVolumeClaim:
          claimName: prometheus-pvc
EOF

3. Create the Prometheus Service

cat << 'EOF' | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: monitoring
spec:
  type: NodePort
  selector:
    app: prometheus
  ports:
  - port: 9090
    targetPort: 9090
    nodePort: 32664
EOF

# Verify the Service
kubectl get svc -n monitoring

# Expected output
# NAME         TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)              AGE
# prometheus   NodePort   10.107.230.194  <none>        9090:32664/TCP       10m
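
The nodePort values used in this post (32664 and 31211) have to fall inside the range the API server accepts, which is 30000-32767 unless the cluster was started with a different --service-node-port-range. A small sketch for checking a port before writing it into a manifest, assuming the default range; is_default_nodeport is a hypothetical helper name.

```shell
# Succeeds when the argument is a number inside the default
# Kubernetes NodePort range (30000-32767).
is_default_nodeport() {
  port=$1
  case $port in
    ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
  esac
  [ "$port" -ge 30000 ] && [ "$port" -le 32767 ]
}

# Usage:
#   is_default_nodeport 32664 && echo "ok to use as nodePort"
```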

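4. Create the ServiceAccount

The Deployment in step 2 references serviceAccountName: prometheus, so its Pod is not created until this ServiceAccount exists. Applying this manifest before the Deployment avoids that wait.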

cat << 'EOF' | kubectl apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy
  - nodes/metrics  # needed to scrape the kubelet /metrics endpoint (role: node job)
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: monitoring
EOF

Deploy Grafana

1. Create the Grafana Deployment

cat << 'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
      - name: grafana
        image: grafana/grafana:latest
        ports:
        - containerPort: 3000
        env:
        - name: GF_SECURITY_ADMIN_PASSWORD
          value: "admin"
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"
        volumeMounts:
        - name: storage
          mountPath: /var/lib/grafana
      volumes:
      - name: storage
        persistentVolumeClaim:
          claimName: grafana-pvc
EOF

2. Create the Grafana Service

cat << 'EOF' | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: monitoring
spec:
  type: NodePort
  selector:
    app: grafana
  ports:
  - port: 3000
    targetPort: 3000
    nodePort: 31211
EOF

# Verify the Service
kubectl get svc -n monitoring

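Deploy Node Exporter

To collect host metrics (CPU, memory, disk, and so on), Node Exporter is deployed as a DaemonSet. The manifest below has no tolerations, so on a kubeadm cluster with the default control-plane taint it is scheduled only on the worker nodes.

1. Deploy the Node Exporter DaemonSet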

cat << 'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
      annotations:
        prometheus.io/scrape: 'true'
        prometheus.io/port: '9100'
    spec:
      hostNetwork: true
      hostPID: true
      containers:
      - name: node-exporter
        image: prom/node-exporter:latest
        ports:
        - containerPort: 9100
        args:
          - --path.procfs=/host/proc
          - --path.sysfs=/host/sys
          - --path.rootfs=/rootfs
          - --collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)
        volumeMounts:
        - name: proc
          mountPath: /host/proc
          readOnly: true
        - name: sys
          mountPath: /host/sys
          readOnly: true
        - name: rootfs
          mountPath: /rootfs
          readOnly: true
      volumes:
      - name: proc
        hostPath:
          path: /proc
      - name: sys
        hostPath:
          path: /sys
      - name: rootfs
        hostPath:
          path: /
EOF
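
If host metrics from the master node are wanted as well, one option is to add a toleration for the control-plane taint to the DaemonSet. The sketch below is not part of the original setup: it assumes the kubeadm default taint key node-role.kubernetes.io/control-plane, and control_plane_toleration_patch is a hypothetical helper that only prints a JSON patch.

```shell
# Prints a JSON patch that adds a toleration for the control-plane
# taint to a Pod template (assumes the kubeadm default taint key).
control_plane_toleration_patch() {
  cat <<'JSON'
[{"op":"add","path":"/spec/template/spec/tolerations","value":[{"key":"node-role.kubernetes.io/control-plane","operator":"Exists","effect":"NoSchedule"}]}]
JSON
}

# Usage on the master node:
#   kubectl -n monitoring patch daemonset node-exporter --type=json \
#     -p "$(control_plane_toleration_patch)"
```

With the patch applied, a third node-exporter target for the master node would appear in Prometheus in addition to the two worker targets.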

2. Update the Prometheus configuration

Update the Prometheus ConfigMap so that it scrapes the Node Exporter metrics:

cat << 'EOF' | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitoring
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    scrape_configs:
      - job_name: 'prometheus'
        static_configs:
          - targets: ['localhost:9090']
      
      - job_name: 'node-exporter'
        kubernetes_sd_configs:
          - role: pod
            namespaces:
              names:
                - monitoring
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_label_app]
            action: keep
            regex: node-exporter
          - source_labels: [__meta_kubernetes_pod_ip]
            action: replace
            target_label: __address__
            replacement: $1:9100
EOF
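
The second relabel rule in this config rewrites each discovered Pod IP into the scrape address. Because no regex is given, Prometheus uses its default (.*), so $1 is the whole Pod IP and the result is <pod-ip>:9100. The same substitution can be mimicked with sed to see what the rule produces; this is only an illustration of the rewrite, not how Prometheus is implemented, and relabel_address is a hypothetical name.

```shell
# Mimics the relabel rule: regex "(.*)" (the Prometheus default)
# with replacement "$1:9100", applied to a Pod IP read from stdin.
relabel_address() {
  sed -E 's/^(.*)$/\1:9100/'
}

# Usage:
#   printf '192.168.122.11\n' | relabel_address
```

Since the DaemonSet runs with hostNetwork: true, the Pod IP equals the node IP, which is why the targets show up under the node addresses.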

3. Restart the Prometheus Pod

The configuration has changed, so delete the Prometheus Pod; the Deployment recreates it with the new ConfigMap:

kubectl delete pod -n monitoring -l app=prometheus

4. Verify metric collection

After 1-2 minutes, check that the Node Exporter metrics are being scraped:

# Check Prometheus target health
curl -s http://192.168.122.10:32664/api/v1/targets | jq '.data.activeTargets[] | {job: .labels.job, instance: .labels.instance, health: .health}'

# Expected output
# {
#   "job": "prometheus",
#   "instance": "localhost:9090",
#   "health": "up"
# }
# {
#   "job": "node-exporter",
#   "instance": "192.168.122.11:9100",
#   "health": "up"
# }
# {
#   "job": "node-exporter",
#   "instance": "192.168.122.12:9100",
#   "health": "up"
# }

# Query a Node Exporter metric (prints the number of node_cpu_seconds_total series)
curl -s 'http://192.168.122.10:32664/api/v1/query?query=node_cpu_seconds_total' | jq '.data.result | length'

Verify the Deployment

Pod status

# List all pods in the monitoring namespace
kubectl get pods -n monitoring

# Expected output (the node-exporter pods, one per worker node, are listed as well)
# NAME                          READY   STATUS    RESTARTS   AGE
# grafana-5fbb5cd89b-kw7z8      1/1     Running   0          10m
# prometheus-f9645fc7b-jnq7c    1/1     Running   0          10m

Pod details

# Prometheus pod details
kubectl describe pod -n monitoring -l app=prometheus

# Check the logs
kubectl logs -n monitoring -l app=prometheus --tail=50

# Check Grafana the same way
kubectl logs -n monitoring -l app=grafana --tail=50

Access test

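# Check the ports on the master node
# (may print nothing: kube-proxy can serve NodePorts through iptables/IPVS rules without a listening socket)
netstat -tlnp | grep -E "32664|31211"

# Or test with curl (URLs quoted so the shell does not interpret "?")
curl -s 'http://<MASTER_NODE_IP>:32664/api/v1/query?query=up' | head -20
curl -s 'http://<MASTER_NODE_IP>:31211/api/health' | jq .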

External Access

Port forwarding on the host

On the host (ARM server) machine:

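# Forward the host's 127.0.0.1:9090 to the Prometheus NodePort (32664) on the VM
ssh -L 9090:<MASTER_NODE_IP>:32664 localhost -N &

# Forward the host's 127.0.0.1:3000 to the Grafana NodePort (31211) on the VM
ssh -L 3000:<MASTER_NODE_IP>:31211 localhost -N &

# Open in a browser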
# Prometheus: http://localhost:9090
# Grafana: http://localhost:3000 (username: admin, password: admin)

Remote access (SSH tunnel)

On your main machine:

# Tunnel through the ARM server
ssh -p 5022 <ARM_SERVER_USER>@<ARM_SERVER_HOST> -L 9090:<MASTER_NODE_IP>:32664 -L 3000:<MASTER_NODE_IP>:31211 -N

# In a browser
# Prometheus: http://localhost:9090
# Grafana: http://localhost:3000
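
Each -L option has the form local_port:target_host:target_port, so every tunnel here pairs a local port with a NodePort on the master node. When more services are exposed later, the option list can be generated instead of typed by hand. A minimal sketch; tunnel_args is a hypothetical helper and the IP in the usage line is the example address used elsewhere in this post.

```shell
# Prints one "-L local:node_ip:nodeport" option per LOCAL:NODEPORT pair.
# First argument: node IP; remaining arguments: LOCAL:NODEPORT pairs.
tunnel_args() {
  node_ip=$1
  shift
  args=""
  for pair in "$@"; do
    local_port=${pair%%:*}   # text before the colon
    node_port=${pair##*:}    # text after the colon
    args="$args -L $local_port:$node_ip:$node_port"
  done
  printf '%s\n' "${args# }"
}

# Usage (left unquoted on purpose so the shell splits the options):
#   ssh -p 5022 <ARM_SERVER_USER>@<ARM_SERVER_HOST> $(tunnel_args 192.168.122.10 9090:32664 3000:31211) -N
```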

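Grafana Initial Setup

1. Default login

URL: http://<MASTER_NODE_IP>:31211 (or localhost:3000)
Username: admin
Password: admin

2. Add the Prometheus data source

In the Grafana UI:

Click Configuration > Data Sources (Connections > Data sources in newer Grafana versions)
Click Add data source
Select Prometheus
Settings:
- Name: Prometheus
- URL: http://prometheus:9090 (the Service name resolves inside the monitoring namespace)
- Access: Server (if the option is shown)
Click Save & Test

3. Import a basic dashboard

Click Dashboards > Import
Enter Grafana ID: 1860 (Node Exporter)
Select Prometheus as the data source
Click Import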

Troubleshooting

Pod stuck in ContainerCreating

# The image may still be downloading
kubectl describe pod -n monitoring <pod-name>

# Wait (can take a few minutes)
watch kubectl get pods -n monitoring

NodePort not reachable

# 1. Check the Service ports
kubectl get svc -n monitoring

# 2. Check the node IPs
kubectl get nodes -o wide

# 3. Check the firewall (host)
sudo firewall-cmd --list-ports

# 4. Open the ports if needed
sudo firewall-cmd --permanent --add-port=32664/tcp
sudo firewall-cmd --permanent --add-port=31211/tcp
sudo firewall-cmd --reload

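PVC in Pending state

If the Pod is stuck (Pending, or ContainerCreating in kubectl get pods) and its events show an NFS mount error, it is usually one of the following.

Error 1: “bad option; for several filesystems”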

mount: /var/lib/kubelet/pods/.../kubernetes.io~nfs: bad option; 
for several filesystems (e.g. nfs, cifs) you might need a /sbin/mount.<type> helper program.

Cause: the nfs-common package is not installed on the worker node

Fix:

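# Install nfs-common on worker1 and worker2 (each command runs remotely over SSH)
ssh ubuntu@k8s-worker1 'sudo apt-get update && sudo apt-get install -y nfs-common'
ssh ubuntu@k8s-worker2 'sudo apt-get update && sudo apt-get install -y nfs-common'

# Restart the Pods (the mount is retried automatically)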
kubectl delete pod -n monitoring -l app=prometheus
kubectl delete pod -n monitoring -l app=grafana

Error 2: “mount.nfs: access denied by server”

mount.nfs: access denied by server while mounting 192.168.122.10:/mnt/prometheus

Cause: /etc/exports on the master node does not allow the worker node's IP

Fix:

# On the master node
# 1. Check the current configuration
sudo cat /etc/exports

# 2. Edit /etc/exports (use an IP range or individual IPs)
sudo nano /etc/exports

# Correct format (IP range); these are file contents, not commands:
#   /mnt/prometheus 192.168.122.0/24(rw,sync,no_subtree_check,no_root_squash)
#   /mnt/grafana 192.168.122.0/24(rw,sync,no_subtree_check,no_root_squash)

# Or list individual IPs:
#   /mnt/prometheus 192.168.122.11(rw,sync,no_subtree_check,no_root_squash)
#   /mnt/prometheus 192.168.122.12(rw,sync,no_subtree_check,no_root_squash)
#   /mnt/grafana 192.168.122.11(rw,sync,no_subtree_check,no_root_squash)
#   /mnt/grafana 192.168.122.12(rw,sync,no_subtree_check,no_root_squash)

# 3. Apply the NFS configuration
sudo exportfs -ra

# 4. Check the result
sudo exportfs -v

# 5. Restart the Pods
kubectl delete pod -n monitoring -l app=prometheus
kubectl delete pod -n monitoring -l app=grafana
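
The two errors above can be told apart by the text of the mount failure in the Pod events. The sketch below maps each message to the cause and fix described in this section; diagnose_nfs_error is a hypothetical helper, and any message it does not recognize falls through to the general debugging commands.

```shell
# Maps an NFS mount error message (first argument) to the likely
# cause and fix described in this troubleshooting section.
diagnose_nfs_error() {
  case $1 in
    *"bad option; for several filesystems"*)
      echo "nfs-common is missing on the worker node: sudo apt-get install -y nfs-common" ;;
    *"access denied by server"*)
      echo "worker IP not allowed in /etc/exports on the master: fix the entry, then sudo exportfs -ra" ;;
    *)
      echo "unrecognized error: check nfs-server status, exportfs -v, and showmount -e" ;;
  esac
}

# Usage on the master node:
#   diagnose_nfs_error "$(kubectl describe pod -n monitoring -l app=prometheus)"
```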

General NFS debugging

# On the master node
# 1. Check the NFS server status
sudo systemctl status nfs-server

# 2. List the exports
sudo exportfs -v

# 3. Check the registered RPC services (nfs, mountd)
sudo rpcinfo -p

# On a worker node
# 1. Confirm the NFS client tools are installed
which mount.nfs

# 2. Test the NFS mount manually
sudo mount -t nfs 192.168.122.10:/mnt/prometheus /mnt/test
mount | grep prometheus
sudo umount /mnt/test

# 3. List the master node's exports
showmount -e 192.168.122.10

Check PV status

# Check PV status
kubectl get pv

# Check PVC status
kubectl get pvc -n monitoring

# Show details
kubectl describe pvc -n monitoring prometheus-pvc
kubectl describe pvc -n monitoring grafana-pvc

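⚠️ NFS Setup Caveats

Common mistakes

nfs-common not installed on the worker nodes
- ❌ Installing the NFS server on the master node only
- ✅ nfs-common must also be installed on every worker node
Wrong /etc/exports configuration
- ❌ Allowed IP range too narrow (e.g. only 192.168.122.11)
- ✅ Use an IP range or list every worker node (e.g. 192.168.122.0/24)
Missing no_root_squash
- ❌ (rw,sync,no_subtree_check) - permission errors are possible when a container writes as root
- ✅ (rw,sync,no_subtree_check,no_root_squash) - works in this setup
Not running exportfs after changing the NFS configuration
- ❌ Testing immediately after editing /etc/exports
- ✅ Always run sudo exportfs -ra before testing
No mount test before deploying the Pods
- ❌ Deploying without a manual mount test on a worker node
- ✅ Test first with sudo mount -t nfs ...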

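Recommended checklist

Confirm the following before deploying:

Master node: NFS server installed (nfs-kernel-server)
Worker nodes: NFS client installed (nfs-common)
Master node: /mnt/prometheus and /mnt/grafana directories created
Master node: /etc/exports configured and exportfs -ra run
Worker nodes: manual NFS mount test succeeded
Master node: exports verified with exportfs -v
Master node: PVs created (kubectl get pv)
Master node: PVCs created (kubectl get pvc -n monitoring)
All nodes: firewall configured (NFS port 2049 open)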

Monitored Metrics

Metrics available in Prometheus

Prometheus itself:

up{job="prometheus"}: Prometheus availability

Node Exporter metrics (host metrics):

node_cpu_seconds_total: CPU time spent, in seconds, per mode
node_memory_MemAvailable_bytes: available memory
node_memory_MemTotal_bytes: total memory
node_filesystem_avail_bytes: available filesystem space
node_filesystem_size_bytes: total filesystem size
node_load1: 1-minute load average
node_load5: 5-minute load average
node_load15: 15-minute load average
node_network_receive_bytes_total: network bytes received
node_network_transmit_bytes_total: network bytes transmitted
node_procs_running: number of running processes

Host information:

node_uname_info: OS/kernel information
node_boot_time_seconds: boot time (Unix timestamp)
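
Most of these metrics are raw counters or gauges, so dashboards usually derive percentages from them. Memory usage, for example, is commonly computed as (1 - MemAvailable / MemTotal) * 100. The sketch below does the same arithmetic on two sample values to show the formula; mem_used_percent is a hypothetical helper and the numbers in the usage line are made up for illustration.

```shell
# Memory usage in percent from total and available bytes.
# Equivalent PromQL expression:
#   (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100
mem_used_percent() {
  awk -v total="$1" -v avail="$2" 'BEGIN { printf "%.1f\n", (1 - avail / total) * 100 }'
}

# Usage (illustrative values: 8 GB total, 2 GB available):
#   mem_used_percent 8000000000 2000000000
```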

Using Grafana Dashboards

1. Open Grafana

URL: http://192.168.122.10:31211 (or localhost:3000)
Username: admin
Password: admin

2. Import the Node Exporter dashboard

Click Dashboards > Import
Enter Grafana ID: 1860 (Node Exporter Full)
Select Prometheus as the data source
Click Import

You can now view CPU, memory, disk, and network metrics for each node as graphs.

Next Steps

The basic monitoring stack is now complete. The next post (Part 4) will cover:

Cluster health monitoring (API server and etcd metrics)
Prometheus alerting rules
Log collection (optional)

Deployment complete! 🎉

You can now collect and visualize your Kubernetes cluster's metrics in real time.
