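To make room for the upcoming sourceforge mirror, mirrors grew the XFS filesystem on its disk array to 24T; 14T of that is now free.
Removing unused content
First we deleted several mirrors that had never finished syncing yet occupied a lot of space, which freed about 3T of disk space.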
- android-src
- android-releases
- sourceforge
- google-v8
- chromiumos
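Growing XFS
XFS is advertised as growable online, but in practice that only goes smoothly when XFS sits on LVM, where the underlying volume can be enlarged without a remount. For fear that LVM would hurt performance, the XFS on mirrors was created directly on a GPT partition of the disk array, so growing it takes three steps:
- Modify the partition table
- Make the kernel reload the partition information, which requires unmounting the filesystem
- Grow XFS online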
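Unmounting the disk array filesystems
In the past, unmounting the disk array filesystems was done entirely by hand. Each round took at least ten-odd minutes, during which none of the mirrors on the array could be reached, and the network services as a whole (nginx etc.) were down for a minute or two. This time we used a script: the basic service (nginx) was down for no more than 5 seconds each time, and the disk array for no more than 2 minutes. (The first outage lasted over a minute because the script forgot to run mount -a to remount; we ran mount -a by hand as soon as the alert SMS arrived.) The procedure is:
- Stop LXC (because the LXC root filesystems live on the disk array)
- Kill all service processes
- Unmount the mounted disk array filesystems (if this step fails, investigate with lsof)
- Do whatever needs doing
- Remount the filesystems
- Start the service processes
- Start LXC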
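The steps above can be sketched as a small script with a dry-run switch, so the sequence can be reviewed before anything is actually stopped. This is only a sketch under assumptions: the variable names, the `run` helper, the container list and the mount point `/srv/array` are ours for illustration and not part of the original setup, and a real run has to happen as root.

```shell
#!/bin/sh
# Sketch of the maintenance window. DRY_RUN=1 (the default here) only prints
# the commands; DRY_RUN=0 executes them.
# MOUNT_POINTS and CONTAINERS are hypothetical defaults for illustration.
DRY_RUN="${DRY_RUN:-1}"
MOUNT_POINTS="${MOUNT_POINTS:-/srv/array}"
CONTAINERS="${CONTAINERS:-lxr pypi sync}"

run() {
    # Print the command in dry-run mode, execute it otherwise.
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

maintenance() {
    # "$@" is the command to run while the array is unmounted, e.g. partprobe.
    for c in $CONTAINERS; do run lxc-stop -n "$c"; done
    run service nginx stop
    run pkill vsftpd
    run pkill rsync
    for dir in $MOUNT_POINTS; do run umount "$dir"; done
    run "$@"
    run mount -a            # the step that was forgotten the first time
    run service nginx start
    run service vsftpd start
    run service rsync start
    for c in $CONTAINERS; do run lxc-start -n "$c" -d; done
}
```

Calling `maintenance partprobe` with the default `DRY_RUN=1` just prints the planned commands in order, which makes an omission such as the missing `mount -a` easy to spot before the real run.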
The first version of the script looked like this:
sudo service rsync stop; sudo service vsftpd stop; sudo service nginx stop; mount | grep /dev/sdh1 | awk '{print $3}' | while read dir; do sudo umount $dir; sudo lsof $dir; done; sudo service nginx start; sudo service vsftpd start; sudo service rsync start
But lsof showed that plenty of rsync and vsftpd processes were still holding file descriptors, so for vsftpd and rsync we switched to the far more aggressive pkill:
sudo service nginx stop; sudo pkill vsftpd; sudo pkill rsync
Some rsync processes still ignored SIGTERM, so we switched to SIGKILL (the uncatchable signal that kill -9 sends).
sudo service nginx stop; sudo pkill vsftpd; sudo pkill -SIGKILL rsync
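A gentler variant of the same idea is to try SIGTERM first and escalate to SIGKILL only for processes that are still around after a grace period. The helper below is a sketch of that escalation for a single PID; the function name and the 5-second default are our own choices, not something the original script used.

```shell
#!/bin/sh
# terminate_pid PID [GRACE_SECONDS]
# Send SIGTERM, wait up to GRACE_SECONDS (default 5), then send SIGKILL if
# the process still exists. Always returns 0 once the signals are sent.
terminate_pid() {
    pid="$1"
    grace="${2:-5}"
    kill -TERM "$pid" 2>/dev/null || return 0      # already gone
    while [ "$grace" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0     # exited after SIGTERM
        sleep 1
        grace=$((grace - 1))
    done
    kill -KILL "$pid" 2>/dev/null                  # last resort, uncatchable
    return 0
}
```

For a whole service it could be driven by pgrep, for example `for p in $(pgrep rsync); do terminate_pid "$p" 5; done`, run as root.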
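Now umount succeeded, but rsync failed to start. A process killed with SIGKILL gets no chance to clean up, so the rsyncd pid file was left behind; force-deleting it fixes the problem.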
sudo service nginx stop; sudo pkill vsftpd; sudo pkill -SIGKILL rsync; mount | grep /dev/sdh | awk '{print $3}' | while read dir; do sudo umount $dir; done; sudo service nginx start; sudo service vsftpd start; sudo rm -f /var/run/rsyncd.pid; sudo service rsync start
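One fragile spot in the one-liner is `mount | grep /dev/sdh`, which matches any line that merely contains that string (a device such as /dev/sdh10, for instance). A sketch of a stricter alternative, assuming the mount table is read in /proc/mounts format; the function name is ours:

```shell
#!/bin/sh
# mount_points_of DEVICE
# Reads mount-table lines in /proc/mounts format ("device mountpoint fstype ...")
# on stdin and prints the mount points whose device field equals DEVICE exactly.
# The most recently mounted entry is printed first, so mounts stacked on top
# of others are unmounted before the ones beneath them.
# Caveat: /proc/mounts escapes spaces in paths as \040; this sketch does not
# decode them.
mount_points_of() {
    awk -v dev="$1" '
        $1 == dev { mp[n++] = $2 }
        END { while (n > 0) print mp[--n] }
    '
}
```

It would slot into the script as `mount_points_of /dev/sdh1 < /proc/mounts | while read -r dir; do sudo umount "$dir"; done`.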
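A treacherous udev rule
The partition table change itself went through, but after partprobe, sdh showed the same size as sdh1 and the partition table was gone. Restarting iscsi did not help.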
$ sudo service nginx stop; sudo pkill vsftpd; sudo pkill -SIGKILL rsync; mount | grep /dev/sdh | awk '{print $3}' | while read dir; do sudo umount $dir; done; sudo /etc/init.d/open-iscsi restart; sudo service nginx start; sudo service vsftpd start; sudo rm -f /var/run/rsyncd.pid; sudo service rsync start
Stopping nginx: nginx.
Unmounting iscsi-backed filesystems: Unmounting all devices marked _netdev.
Disconnecting iSCSI targets:Logging out of session [sid: 1, target: iqn.2002-10.com.infortrend:raid.sn8223150.001, portal: 192.168.10.1,3260]
Logout of [sid: 1, target: iqn.2002-10.com.infortrend:raid.sn8223150.001, portal: 192.168.10.1,3260] successful.
.
Stopping iSCSI initiator service:.
Starting iSCSI initiator service: iscsid.
Setting up iSCSI targets:
Logging in to [iface: default, target: iqn.2002-10.com.infortrend:raid.sn8223150.001, portal: 192.168.10.1,3260] (multiple)
Login to [iface: default, target: iqn.2002-10.com.infortrend:raid.sn8223150.001, portal: 192.168.10.1,3260] successful.
.
Mounting network filesystems:.
Starting nginx: nginx.
Starting FTP server: vsftpd.
Starting rsync daemon: rsync.
$ sudo fdisk -l /dev/sdh
Disk /dev/sdh: 24189.3 GB, 24189254763008 bytes
255 heads, 63 sectors/track, 2940842 cylinders, total 47244638209 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
Disk /dev/sdh doesn't contain a valid partition table
$ sudo fdisk -l /dev/sdh1
Disk /dev/sdh1: 24189.3 GB, 24189254763008 bytes
255 heads, 63 sectors/track, 2940842 cylinders, total 47244638209 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
Disk /dev/sdh1 doesn't contain a valid partition table
The clue was in syslog:
udevd[13846]: kernel-provided name 'sdh1' and NAME= 'sdh' disagree, please use SYMLINK+= or change the kernel to provide the proper name
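It turned out the udev rule below was the culprit. The kernel probed both sdh and sdh1, and sdh1 took over the name sdh, because both devices match the rule (run /lib/udev/scsi_id on each yourself if you don't believe it)...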
$ cat /etc/udev/rules.d/80-persistent-iscsi.rules
KERNEL=="sd*", SUBSYSTEM=="block", PROGRAM="/lib/udev/scsi_id --whitelisted --replace-whitespace /dev/$name", RESULT=="3600d0231000da93e75966be33fd9a2b4", NAME="sdh"
The fix is simple: append %n to the name, so that a partition keeps its partition number.
$ cat /etc/udev/rules.d/80-persistent-iscsi.rules
KERNEL=="sd*", SUBSYSTEM=="block", PROGRAM="/lib/udev/scsi_id --whitelisted --replace-whitespace /dev/$name", RESULT=="3600d0231000da93e75966be33fd9a2b4", NAME="sdh%n"
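To see why the extra %n matters: udev replaces %n with the kernel number of the device, that is, the trailing digits of its kernel name, which is empty for a whole disk and 1 for its first partition. The toy function below only mimics that single substitution to show the name collision; it is an illustration we wrote, not how udev is implemented.

```shell
#!/bin/sh
# expand_name TEMPLATE KERNEL_NAME
# Mimics udev's %n substitution in a NAME= template: %n becomes the trailing
# digits of the kernel device name. Illustration only; real udev supports
# many more substitutions.
expand_name() {
    template="$1"
    kernel="$2"
    number="${kernel##*[!0-9]}"     # strip everything up to the last non-digit
    printf '%s\n' "$template" | sed "s/%n/$number/g"
}

# Old rule: disk and partition are both named "sdh", so they collide.
expand_name 'sdh' sdh
expand_name 'sdh' sdh1
# Fixed rule: the partition keeps its number.
expand_name 'sdh%n' sdh
expand_name 'sdh%n' sdh1
```

Note that the syslog message itself recommends SYMLINK+= instead of NAME=; as far as we know, newer udev releases only honor NAME= for network interfaces, so a symlink-based rule is likely the more durable choice.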
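Walkthrough of the expansion
Below we demonstrate the complete process of growing the disk array XFS from 22T to 24T (some output has been removed for clarity).
1. Inspect the partition table. A GPT partition table has to be handled with gdisk, not fdisk.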
boj@mirrors:~$ sudo gdisk /dev/sdh
GPT fdisk (gdisk) version 0.8.5
Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: present
Found valid GPT with protective MBR; using GPT.
Command (? for help): i
Using 1
Partition GUID code: EBD0A0A2-B9E5-4433-87C0-68B6B72699C7 (Microsoft basic data)
Partition unique GUID: 4592A37B-886C-40C9-A2AC-D9145B1D33D1
First sector: 2048 (at 1024.0 KiB)
Last sector: 47244640256 (at 22.0 TiB)
Partition size: 47244638209 sectors (22.0 TiB)
Attribute flags: 0000000000000000
Partition name: 'array'
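2. Delete the existing partition, create a new one, and restore the partition name. Note that the starting sector and the partition type GUID code must be the same as before, otherwise the filesystem will not be recognized! (gdisk assigns a new unique partition GUID, as the output shows.)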
Command (? for help): d 1
Using 1
Command (? for help): n
Partition number (1-128, default 1):
First sector (34-54684213214, default = 2048) or {+-}size{KMGTP}:
Last sector (2048-54684213214, default = 54684213214) or {+-}size{KMGTP}: 24T
Current type is 'Linux filesystem'
Hex code or GUID (L to show codes, Enter = 8300): EBD0A0A2-B9E5-4433-87C0-68B6B72699C7
Changed type of partition to 'Microsoft basic data'
Command (? for help): c 1
Using 1
Enter name: array
Command (? for help): i
Using 1
Partition GUID code: EBD0A0A2-B9E5-4433-87C0-68B6B72699C7 (Microsoft basic data)
Partition unique GUID: A5298F7D-A675-41DD-9AA3-BE119CA73DB3
First sector: 2048 (at 1024.0 KiB)
Last sector: 51539607552 (at 24.0 TiB)
Partition size: 51539605505 sectors (24.0 TiB)
Attribute flags: 0000000000000000
Partition name: 'array'
Command (? for help): w
Final checks complete. About to write GPT data. THIS WILL OVERWRITE EXISTING PARTITIONS!!
Do you want to proceed? (Y/N): Y
OK; writing new GUID partition table (GPT) to /dev/sdh.
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot.
The operation has completed successfully.
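The sector numbers in this output can be checked by hand. Judging from the result, gdisk took the bare 24T (no leading +) as an absolute position on the disk, so the last sector is the one 24 TiB from the start, and the partition size counts both end sectors. A small sketch of that arithmetic, assuming the 512-byte logical sectors reported for this array (the function names are ours):

```shell
#!/bin/sh
SECTOR_BYTES=512

# tib_to_sector N: sector number located N TiB from the start of the disk.
tib_to_sector() {
    echo $(( $1 * 1024 * 1024 * 1024 * 1024 / SECTOR_BYTES ))
}

# partition_sectors FIRST LAST: partition size in sectors, ends inclusive.
partition_sectors() {
    echo $(( $2 - $1 + 1 ))
}

tib_to_sector 22                       # 47244640256, the old last sector
tib_to_sector 24                       # 51539607552, the new last sector
partition_sectors 2048 51539607552     # 51539605505, as gdisk reports
```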
3. Unmount the disk array filesystems so that partprobe can make the kernel reload the partition table (see above for details).
boj@mirrors:~$ sudo lxc-list
RUNNING
  lxr
  pypi
  sync (auto)
boj@mirrors:~$ sudo lxc-stop -n pypi
boj@mirrors:~$ sudo lxc-stop -n sync
boj@mirrors:~$ sudo lxc-stop -n lxr
boj@mirrors:~$ sudo service nginx stop; sudo pkill vsftpd; sudo pkill -SIGKILL rsync; mount | grep /dev/sdh | awk '{print $3}' | while read dir; do sudo umount $dir; done; sudo partprobe; sudo mount -a; sudo service nginx start; sudo service vsftpd start; sudo rm -f /var/run/rsyncd.pid; sudo service rsync start
Stopping nginx: nginx.
Starting nginx: nginx.
Starting FTP server: vsftpd.
Starting rsync daemon: rsync.
boj@mirrors:~$ sudo lxc-start -n pypi -d
boj@mirrors:~$ sudo lxc-start -n sync -d
boj@mirrors:~$ sudo lxc-start -n lxr -d
4. Check the partition information, then grow the XFS filesystem with xfs_growfs.
boj@mirrors:~$ sudo gdisk -l /dev/sdh
GPT fdisk (gdisk) version 0.8.5
Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: present
Found valid GPT with protective MBR; using GPT.
Disk /dev/sdh: 54684213248 sectors, 25.5 TiB
Logical sector size: 512 bytes
Disk identifier (GUID): 5E620981-EBFB-4EEA-BCF9-0052707F6859
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 54684213214
Partitions will be aligned on 2048-sector boundaries
Total free space is 3144607676 sectors (1.5 TiB)
Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048     51539607552   24.0 TiB    0700  array
boj@mirrors:~$ sudo xfs_growfs /dev/sdh1
meta-data=/dev/sdh1              isize=256    agcount=22, agsize=268435455 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=5905579776, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=521728, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
data blocks changed from 5905579776 to 6442450688
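A quick cross-check of the two outputs: the new data block count reported by xfs_growfs, converted to 512-byte sectors, should fill the partition that gdisk reports almost exactly. The helper below is a sketch with names of our own choosing; it only does the arithmetic and touches no device.

```shell
#!/bin/sh
# unused_sectors PART_SECTORS FS_BLOCKS BLOCK_BYTES
# Number of 512-byte sectors of the partition that the filesystem's data
# area does not cover. A negative result would mean the filesystem claims
# more space than the partition has.
unused_sectors() {
    echo $(( $1 - $2 * $3 / 512 ))
}

# After growing: partition of 51539605505 sectors, 6442450688 blocks of 4096 bytes.
unused_sectors 51539605505 6442450688 4096
# Before growing: partition of 47244638209 sectors, 5905579776 blocks of 4096 bytes.
unused_sectors 47244638209 5905579776 4096
```

Both calls print 1: the partitions have an odd number of sectors, so a single sector at the end cannot hold a whole 4096-byte block.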
5. Check that the expansion succeeded.
boj@mirrors:~$ df -lh /dev/sdh1
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdh1        24T   11T   14T  43% /srv/ftp/ubuntu-old-releases
boj@mirrors:~$ sudo lxc-list
RUNNING
  lxr
  pypi
  sync (auto)
FROZEN
STOPPED
  mirror-lab
  mirror-lab_
  php-mirror
  root
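A final reminder to mirrors maintainers: dangerous operations like this must be run inside screen or byobu (a wrapper around screen that offers tabs), so that a sudden network drop does not get the script killed by SIGHUP and leave the system in an unpredictable state.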
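One way to enforce that reminder is to have the maintenance script refuse to start outside a terminal multiplexer. The sketch below relies on the STY variable that GNU screen sets and the TMUX variable that tmux sets (byobu runs on top of one of the two); the function names are ours.

```shell
#!/bin/sh
# in_multiplexer: succeed if this shell runs inside GNU screen (STY is set)
# or tmux (TMUX is set).
in_multiplexer() {
    [ -n "${STY:-}" ] || [ -n "${TMUX:-}" ]
}

# require_multiplexer: print a message and fail when run outside of one.
require_multiplexer() {
    if ! in_multiplexer; then
        echo "refusing to run outside screen/tmux/byobu" >&2
        return 1
    fi
}
```

Putting `require_multiplexer || exit 1` at the top of the script would stop it before any service is touched.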