
Showing posts with label Linux. Show all posts

Wednesday, 3 August 2022

Recovering DigitalOcean droplet landing on grub-rescue prompt

 One night we rebooted one of our DigitalOcean Ubuntu 18.04 droplets (VMs). After starting, the VM threw an error and dropped straight to the GRUB rescue prompt.

The error displayed was:

error: file /boot/grub/i386-pc/normal.mod not found

grub rescue> 

We used the ls command, which showed all the disk devices and partitions connected to our VM.

grub rescue> ls

(hd0) (hd0,gpt15) (hd0,gpt14) (hd0,gpt1) (hd1) (hd2)


grub rescue> ls (hd0,gpt1)/


From the output of the above command, we could see that the /boot directory was missing.

As the /boot folder was missing (probably deleted by mistake), we tried restoring a droplet backup. Unfortunately, the /boot folder was missing in the available backups as well, and the droplet still would not start.

So we decided to go with DigitalOcean's recovery option. We stopped the VM, took a snapshot, and proceeded to boot from the Recovery ISO.

1) After the VM is shut down, go to the Recovery link in the DigitalOcean console and select the Boot from Recovery ISO option.
Turn the VM on, go to the recovery console, and click on Launch Recovery Console.

2) Once you are in the recovery console, choose option 1, Mount Your Disk Image. This will mount our droplet's root volume at /mnt.

3) Then choose option 6 to go to the Interactive Shell.

4) In the interactive shell, execute the below commands:

  a. mount -o bind /dev /mnt/dev
  b. mount -o bind /dev/pts /mnt/dev/pts
  c. mount -o bind /proc /mnt/proc
  d. mount -o bind /sys /mnt/sys
  e. mount -o bind /run /mnt/run
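The five bind mounts can also be done in a short loop; a minimal sketch, assuming the droplet's root volume is mounted at /mnt as in the recovery console:

```shell
# Bind-mount the pseudo-filesystems the chroot will need.
# dev must come before dev/pts so the mount points exist in order.
for fs in dev dev/pts proc sys run; do
    mount -o bind "/$fs" "/mnt/$fs"
done
```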

5) Change root for your mounted disk and go to droplet’s root directory.

chroot /mnt

6) Create the GRUB config file using the command:

/usr/sbin/grub-mkconfig -o /boot/grub/grub.cfg

7) Our droplet’s disk is /dev/vda, and we will install GRUB on this disk:

/usr/sbin/grub-install /dev/vda

8) At this point, we can exit from the chrooted environment.
exit

9) Shut down the VM and turn it on from the VM's hard drive. In our case, however, the VM didn't boot and dropped to the grub> console.

10) To resolve this, I rebooted the VM into the recovery environment and performed steps 1-7 above again.
After that I performed an upgrade of the installed packages:
a) apt update
b) apt upgrade
But the apt upgrade command failed with the below error:

Could not find /boot/grub/menu.lst file.
Would you like /boot/grub/menu.lst generated for you? (y/N)
/usr/sbin/update-grub-legacy-ec2: line 1101: read: read error: 0: Bad file descriptor

11) To resolve the error, I created the /boot/grub/menu.lst file manually.
touch /boot/grub/menu.lst

12) After that I ran the apt upgrade command again.
This time apt asked me the below question about the /boot/grub/menu.lst file.
From the available options, select the first one, "install the package maintainer's version".
13) This time the apt upgrade command was successful.
After that, exit from the chrooted environment with the exit command.

14) Shut down the recovery environment:
shutdown -h now 

15) Start the VM after selecting the Boot from Hard Drive option from DigitalOcean's Recovery link.

This time our recovery was successful and the VM started without any issue. 


Thursday, 20 January 2022

Bootstrapping your own Kubernetes clusters for testing and development

 In this document, I am going to show you the simplest and quickest way to get your own Kubernetes cluster ready for testing, learning and development purposes. It is not recommended for production scenarios.

I will use one master node and two worker nodes for this demonstration. I am using VirtualBox VMs, with all the nodes running Ubuntu 20.04; the scripts I am going to use are for Ubuntu only.


Prerequisites:

  • Minimum RAM per node should be 2 GB

  • 2 CPU cores per node

  • Swap off on all the nodes

    • Run swapoff command on each node:
      $ sudo swapoff -a 

    • Disable any swap entry in /etc/fstab file
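Both swap steps can be sketched together as follows (the sed pattern is an assumption about the usual fstab layout; double-check the file afterwards):

```shell
# Turn swap off immediately
sudo swapoff -a
# Comment out every swap entry in /etc/fstab so it stays off after
# reboot; a backup is kept as /etc/fstab.bak
sudo sed -i.bak '/\sswap\s/ s/^/#/' /etc/fstab
```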


Recommendation:

  • The nodes should be in the same local subnet, and they should be able to communicate with each other without any firewall blocking them.

  • If you are using VMs in some cloud provider, ensure that the VMs are in the same VCN and subnet. You can configure the security list/cloud firewall so that the VMs can interact with each other for all the ports needed in a Kubernetes cluster.


Initial Setup:

Suppose my VMs are named this way:


Node      IP
master    192.168.0.51
worker1   192.168.0.52
worker2   192.168.0.53


You can add entries to each VM's hosts file so that the nodes can reach each other by hostname. Edit the /etc/hosts file on each VM and add the following lines:

192.168.0.51 master

192.168.0.52 worker1

192.168.0.53 worker2
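On each VM the three lines can be appended in one go (the IPs are the sample addresses from the table above; replace them with your own):

```shell
# Append the cluster name-resolution entries to /etc/hosts
sudo tee -a /etc/hosts <<'EOF' >/dev/null
192.168.0.51 master
192.168.0.52 worker1
192.168.0.53 worker2
EOF
```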


Now we are ready to start the installation.



Master Node:

On the master node, run the scripts step by step in the order shown below:

Step 1:

 
Install the container runtime containerd using the script:
https://github.com/pranabsharma/scripts/blob/master/kubernetes/installation/install_containerd.sh 

Download the script and run it
ubuntu@master:~$ ./install_containerd.sh


Step 2:

Install the kubectl, kubeadm and kubelet using the script:

https://github.com/pranabsharma/scripts/blob/master/kubernetes/installation/install_kubeTools.sh

Download the script and run it

ubuntu@master:~$ ./install_kubeTools.sh



Step 3: 

Download the below script and run it ONLY on your master node:

https://github.com/pranabsharma/scripts/blob/master/kubernetes/installation/run_on_master.sh 


Download the script and run it

ubuntu@master:~$ ./run_on_master.sh


This script does the following tasks:

  • Run kubeadm to initialize a Kubernetes control-plane on the master node.

  • Deploy the Weave Net CNI plugin to manage the Kubernetes pod networking.

  • Copy the kubeconfig file to the user's home directory location so that kubectl commands can be run without specifying the kubeconfig file.
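Those tasks roughly correspond to commands like the ones below. This is only a hedged sketch of what such a script does, not the script's exact contents; the advertise address matches the sample master IP, and the Weave Net manifest file name is a placeholder, so check the script itself for the real values:

```shell
# 1. Initialize the control plane on the master node
sudo kubeadm init --apiserver-advertise-address=192.168.0.51

# 2. Copy the kubeconfig so kubectl works for the regular user
mkdir -p "$HOME/.kube"
sudo cp /etc/kubernetes/admin.conf "$HOME/.kube/config"
sudo chown "$(id -u):$(id -g)" "$HOME/.kube/config"

# 3. Deploy the Weave Net CNI plugin (manifest name is a placeholder;
#    use the manifest matching your Kubernetes version)
kubectl apply -f weave-daemonset.yaml
```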



Our master node and control plane are ready. At this point the status of our cluster will be as follows:


ubuntu@master:~$ kubectl get node
NAME     STATUS   ROLES                  AGE   VERSION
master   Ready    control-plane,master   50m   v1.23.2



ubuntu@master:~$ kubectl get pod -n kube-system
NAME                             READY   STATUS    RESTARTS      AGE
coredns-64897985d-fvnhj          1/1     Running   0             51m
coredns-64897985d-wq6z5          1/1     Running   0             51m
etcd-master                      1/1     Running   0             51m
kube-apiserver-master            1/1     Running   0             51m
kube-controller-manager-master   1/1     Running   0             51m
kube-proxy-hnk2z                 1/1     Running   0             51m
kube-scheduler-master            1/1     Running   0             51m
weave-net-gjvqq                  2/2     Running   1 (50m ago)   51m





Worker Node


Installation steps on worker nodes are the same as on the master; the only difference is that we skip Step 3 of the master node (Step 3 sets up the control plane). Run the scripts as shown in Step 1 and Step 2:

Step 1:

 
Install the container runtime containerd using the script:
https://github.com/pranabsharma/scripts/blob/master/kubernetes/installation/install_containerd.sh 

Download the script and run it
ubuntu@worker1:~$ ./install_containerd.sh


Step 2:

Install the kubectl, kubeadm and kubelet using the script:

https://github.com/pranabsharma/scripts/blob/master/kubernetes/installation/install_kubeTools.sh

Download the script and run it

ubuntu@worker1:~$ ./install_kubeTools.sh



Adding Worker Nodes to the cluster


At this point our required software and services for the Kubernetes cluster are ready. The final step is to add the worker nodes to the cluster. 


Step1: 


We are going to create a new token for the worker node to join the cluster.


Run the below command on master node:


ubuntu@master:~$ kubeadm token create --print-join-command


This command will output the command to join the cluster. The output will be something like this:

kubeadm join 192.168.0.51:6443 --token pk9v0f.o8valhztkblohsmu --discovery-token-ca-cert-hash sha256:9e046d3f15e49c7363ec7a762767b169a296d6af7150aad56d21d54399a2df6f


Copy the output, we will need it in the next step.


Step 2:


Run the copied output command on the worker nodes


ubuntu@worker1:~$ kubeadm join 192.168.0.51:6443 --token pk9v0f.o8valhztkblohsmu --discovery-token-ca-cert-hash sha256:9e046d3f15e49c7363ec7a762767b169a296d6af7150aad56d21d54399a2df6f



Immediately after running the above command on worker node, if we check the nodes in the cluster we may get the below output:


ubuntu@master:~$ kubectl get node
NAME      STATUS     ROLES                  AGE   VERSION
master    Ready      control-plane,master   54m   v1.23.2
worker1   NotReady   <none>                 39s   v1.23.2


After some time, the worker node will come into the ready state.


In the same way we can add the worker2 node also.


That’s it, our Kubernetes cluster is ready to rock! Super easy, isn’t it?

Tuesday, 20 November 2018

Nginx HTTP/2 openssl NPN issue

We wanted to enable HTTP/2 on one of our websites. The website was running on an Ubuntu 14.04 server with Nginx version 1.14.0 (Nginx has supported HTTP/2 since version 1.9.5). We did all the necessary Nginx configuration and were ready to go. But when we checked the website from our most commonly used browsers, Google Chrome and Firefox, the contents were loaded over HTTP/1.1, not HTTP/2 as we expected.



When we checked the website's access log, we could see only HTTP/1.1 requests, which was really strange since we had done all the necessary Nginx configuration for HTTP/2. We then verified HTTP/2 support for the website using the online tool https://tools.keycdn.com/http2-test, and it reported that our website supports HTTP/2.

After some web searching, we came across a nice blog post, https://www.nginx.com/blog/supporting-http2-google-chrome-users/, which explains what was going wrong; please go through it.

The main reason our website was not served over HTTP/2 in major browsers is that the vendors have stopped supporting the Next Protocol Negotiation (NPN) method for upgrading a connection to HTTP/2. Most newer browser versions support the new standard, Application-Layer Protocol Negotiation (ALPN), instead. So the operating system the web server runs on must provide a version of OpenSSL that supports ALPN, which means OpenSSL 1.0.2 or later. We were using Ubuntu 14.04, which ships OpenSSL 1.0.1f, and this version does not support ALPN. Ubuntu 16.04 LTS ships OpenSSL 1.0.2g, which does. So we moved the website to another server running Ubuntu 16.04 LTS, configured HTTP/2 on Nginx there, and the website started loading over HTTP/2 in web browsers.
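You can check which protocol a server will negotiate directly from the command line with openssl s_client (this needs OpenSSL 1.0.2 or later on the client side as well; replace the hostname with your own site):

```shell
# Offer h2 via ALPN; the "ALPN protocol" line in the output shows
# what the server selected (no such line means no ALPN support)
echo | openssl s_client -alpn h2,http/1.1 \
    -connect example.com:443 -servername example.com 2>/dev/null \
    | grep -i "alpn"
```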



Wednesday, 10 January 2018

TUSD server on Production

TUS (https://tus.io/protocols/resumable-upload.html) is an HTTP-based protocol for resumable file uploads. TUSD (https://github.com/tus/tusd) is the official implementation of the TUS protocol.
For one of our projects, we decided to use TUSD for uploading a large number of files from many locations over unreliable internet connections.
For running TUSD in production, the first thing that came to our mind was how to make it secure. TUSD does not accept HTTPS connections, nor does it have any built-in layer for verification/authentication checks (authentication can be done using TUSD's hooks system).
So we decided to move the security layer out of TUSD and let it focus on its primary task of resumable file uploading.
Accordingly, we introduced HAProxy in front of the TUSD server.

  • The TUSD server will run as a normal user on the default port 1080; HAProxy will listen on the default HTTP/HTTPS ports and proxy the requests to the TUSD server.
  • The SSL certificate will be deployed on HAProxy, and HAProxy will do the SSL offloading. TUSD will receive plain HTTP traffic from HAProxy.
  • We will enable basic HTTP authentication in HAProxy, and HAProxy will authenticate incoming connections before forwarding them to the TUSD server.
  • HAProxy will proxy only a specific URL's traffic to the backend TUSD server. It will not forward all traffic, so a connection attempt to the default HTTP/HTTPS port on the public IP will not reach TUSD.

For this document I used Ubuntu 16.04, TUSD version 0.9.0 and HAProxy version 1.7.
As we will run TUSD behind HAProxy, we have to add the -behind-proxy flag while starting TUSD, to inform it that it is running behind a proxy and should pay attention to the special headers sent by the proxy.


Let’s see the configs for this setup


TUSD service:

[Unit]
Description= TUSD File Upload Server
After=network.target

[Service]
User=tusd
Group=tusd
WorkingDirectory=/app/tusd
ExecStart=/bin/bash -ce "exec /app/tusd/tusd -dir /data/tusupload  -hooks-dir /app/tusd/hooks -behind-proxy  >> /logs/tusd/tusd.log 2>&1"
# file size
LimitFSIZE=infinity
# cpu time
LimitCPU=infinity
# virtual memory size
LimitAS=infinity
# open files
LimitNOFILE=infinity
# processes/threads
LimitNPROC=infinity
# total threads (user+kernel)
TasksMax=infinity
TasksAccounting=false


[Install]
WantedBy=multi-user.target






HAProxy config file:

userlist UL1
                user httpuser insecure-password abcdefghijklmnop

global
                log /dev/log       local0
                log /dev/log       local1 notice
                chroot /var/lib/haproxy
                stats socket /run/haproxy/admin.sock mode 660 level admin
                stats timeout 30s
                user haproxy
                group haproxy
                daemon

                # Default SSL material locations
                ca-base /etc/ssl/certs
                crt-base /etc/ssl/private

                # Default ciphers to use on SSL-enabled listening sockets.
                # For more information, see ciphers(1SSL). This list is from:
                #  https://hynek.me/articles/hardening-your-web-servers-ssl-ciphers/
                # An alternative list with additional directives can be obtained from
                #  https://mozilla.github.io/server-side-tls/ssl-config-generator/?server=haproxy
                ssl-default-bind-ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:RSA+AESGCM:RSA+AES:!aNULL:!MD5:!DSS
                ssl-default-bind-options no-sslv3

defaults
                log          global
                mode    http
                option   httplog
                option   dontlognull
                timeout connect 5000
                timeout client  50000
                timeout server  50000
                errorfile 400 /etc/haproxy/errors/400.http
                errorfile 403 /etc/haproxy/errors/403.http
                errorfile 408 /etc/haproxy/errors/408.http
                errorfile 500 /etc/haproxy/errors/500.http
                errorfile 502 /etc/haproxy/errors/502.http
                errorfile 503 /etc/haproxy/errors/503.http
                errorfile 504 /etc/haproxy/errors/504.http



frontend localhost
    bind *:80
    bind *:443 ssl crt /etc/ssl/my_certs/my_cert.pem
    redirect scheme https if !{ ssl_fc }
    mode http

    acl tusdsvr hdr(host) -i tusdserver.example.com   
    use_backend tus-backend if tusdsvr

#########################Backend Settings##########################
backend tus-backend
#HTTP basic Authentication check
      acl AuthOkay_UsersAuth http_auth(UL1)
      http-request auth realm UserAuth if !AuthOkay_UsersAuth

#Setting X-Forwarded-Proto header to https to let TUSD know that it is behind HTTPS proxy and should return https URLs.   
     http-request set-header X-Forwarded-Proto "https"
     http-request add-header X-Forwarded-For %[src]

      mode http

      server server1 localhost:1080 check fall 3 rise 2
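Once both services are up, the whole chain can be checked with curl. The hostname, credentials, and the default /files/ TUSD base path below are taken from this sample setup; adjust them for a real deployment:

```shell
# Without credentials HAProxy should answer 401 (auth required)
curl -sk -o /dev/null -w '%{http_code}\n' https://tusdserver.example.com/files/

# With credentials, an OPTIONS request should reach TUSD and
# return its Tus-Resumable/Tus-Version headers
curl -sk -D - -o /dev/null -u httpuser:abcdefghijklmnop \
    -X OPTIONS https://tusdserver.example.com/files/
```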

Saturday, 6 January 2018

Strange ext4 error: cannot create regular file … No space left on device !!!

Today I was copying a folder containing a large number of small files (around 125,000) into a backup partition (/dev/mapper/vg_ema_data-lv_ema_data) on my Ubuntu 16.04 VM with an ext4 file system. The folder's size was 250 MB, and the backup partition had around 45 GB of free space. One of the folder's sub-directories had around 45,000 files with long file names like 3_8_25438833_11_3_4081_2017_12_08_13_07_55_2017_12_09_11_15_01_2018_01_02_13_38_42_2018_01_06_11_29_12_2018_01_06_11_37_07_2018_01_06_11_48_09_2018_01_06_13_16_01_2018_01_06_13_21_13_2018_01_06_13_52_55_2018_01_06_14_47_22.json

When I started copying, after some time the copy operation started giving me "cannot create regular file" and "No space left on device" errors on the backup partition (while copying the files with the long names). As that partition had 45 GB free, running out of space was out of the question, so I thought maybe I had consumed all the inodes of the backup partition. I checked the inodes of that partition and found that a large number of them were free.
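Free inodes on a partition can be checked with df -i; if IUse% is well below 100%, inode exhaustion is not the problem:

```shell
# Show inode usage for the backup partition
df -i /dev/mapper/vg_ema_data-lv_ema_data
```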

After searching a lot, I found an excellent blog post, https://blog.merovius.de/2013/10/20/ext4-mysterious-no-space-left-on.html, which writes about ext4's dir_index feature. Please read that post for the details of the issue; I am not going to repeat them here.

So I decided to check and disable dir_index for my backup partition (/dev/mapper/vg_ema_data-lv_ema_data).
To check whether dir_index is enabled or not, I used tune2fs command as mentioned in the blog.

# tune2fs -l /dev/mapper/vg_ema_data-lv_ema_data | grep -o dir_index

If the above command outputs dir_index, dir_index is enabled for that partition; if it outputs nothing, dir_index is not enabled.
Once I found that dir_index was enabled on my /dev/mapper/vg_ema_data-lv_ema_data partition, I decided to disable it.

So I used the following command to disable dir_index:

# tune2fs -O "^dir_index" /dev/mapper/vg_ema_data-lv_ema_data

After disabling dir_index I tried to copy that directory again, and this time the copy operation completed successfully.

Wednesday, 20 September 2017

Ubuntu upstart service for my golang web application

I have a web application written in Go that needs to be deployed as a service on an Ubuntu server; say the name of the app is hello.
Copy the app to some directory on the server (e.g. the /app directory, so the application binary is /app/hello).
Create an upstart script (e.g. hello.conf) and place it in /etc/init.
We run the binary using the following line:

exec start-stop-daemon --start \
--chuid $DAEMONUSER:$DAEMONGROUP \
--pidfile /var/run/hello.pid \
--make-pidfile \
--exec $DAEMON $DAEMON_OPTS


To send the stdout and stderr of the application to a log file (e.g. /logs/app/hello/hello.log), we can edit the start-stop-daemon command line:

exec start-stop-daemon --start \
--chuid $DAEMONUSER:$DAEMONGROUP \
--pidfile /var/run/hello.pid \
--make-pidfile \
--startas /bin/bash -- -c "exec $DAEMON $DAEMON_OPTS >> /logs/app/hello/hello.log 2>&1"

Now say we want to collect a summary of garbage collection and a Go scheduler trace in the log file. We have to set Go’s runtime environment variable GODEBUG.
GODEBUG=gctrace=1 enables the garbage collector (GC) trace. The garbage collector emits a single line to standard error at each collection, summarizing the amount of memory collected and the length of the garbage collection pause.
To investigate the operation of the runtime scheduler directly, and to get insight into the dynamic behaviour of the goroutine scheduler, we can enable the scheduler trace by setting:
GODEBUG=schedtrace=1000
The value 1000 is in milliseconds, so the above setting makes the scheduler emit a single line to standard error every second.
We can combine the garbage collection and scheduler traces as GODEBUG=gctrace=1,schedtrace=30000


So again editing the start-stop-daemon command line:

exec start-stop-daemon --start \
--chuid $DAEMONUSER:$DAEMONGROUP \
--pidfile /var/run/hello.pid \
--make-pidfile \
--startas /bin/bash -- -c "exec /usr/bin/env GODEBUG=gctrace=1,schedtrace=30000 $DAEMON $DAEMON_OPTS >> /logs/app/hello/hello.log 2>&1"


A garbage collection log line looks like:
gc 56 @27.196s 0%: 0.010+3.7+0.014 ms clock, 0.010+0.80/2.6/0+0.014 ms cpu, 4->4->0 MB, 5 MB goal, 1 P
gc 57 @27.260s 0%: 0.007+2.1+0.010 ms clock, 0.007+0.35/1.0/0+0.010 ms cpu, 4->4->0 MB, 5 MB goal, 1 P

56: the GC number, incremented at each GC
@27.196s: time in seconds since program start
0%: percentage of time spent in GC since program start
0.010+3.7+0.014 ms clock: wall-clock times for the phases of the GC
0.010+0.80/2.6/0+0.014 ms cpu: CPU times for the phases of the GC
4->4->0 MB: heap size at GC start (4MB), at GC end (4MB), and live heap (0MB)
5 MB goal: goal heap size
1 P: number of processors used, here 1 processor used

A scheduler trace line looks like:
SCHED 25137ms: gomaxprocs=1 idleprocs=0 threads=4 spinningthreads=0 idlethreads=1 runqueue=2 [98]

25137ms : Time since program start
gomaxprocs=1: the current value of GOMAXPROCS, which limits the number of operating system threads that can execute user-level Go code simultaneously. Since Go 1.5, GOMAXPROCS defaults to the number of CPUs available.
idleprocs=0: Number of processors that are not busy. So here 0 processors are idle.
threads=4: Number of threads that the runtime is managing.
spinningthreads=0: Number of spinning threads.
idlethreads=1: Number of threads that are not busy; here 1 thread is idle (and 3 are running).
runqueue=2: Runqueue is the length of global queue with runnable goroutines.
[98]: Number of goroutines in the local run queue. For a machine with multiple processors we can see multiple values for each processor e.g. [2 2 2 3].
The init script is available here https://github.com/pranabsharma/scripts/blob/master/initScripts/Go/hello.conf

Tuesday, 16 May 2017

Encrypting the shell scripts

Sometimes we need to encrypt a shell script for security reasons, for example when the script contains sensitive information like passwords.
For this task I am going to use the shc tool (http://www.datsi.fi.upm.es/~frosal/sources/shc.html) to convert my plain-text shell script into a binary file. Download the shc source code from http://www.datsi.fi.upm.es/~frosal/sources/ and extract the gzip-compressed tar archive. Here I am going to use version 3.8.9.
Note: I used Ubuntu 14.04 for this example.
If make is not installed, then install make
# apt-get install make
Go inside the shc-3.8.9 source folder.
# cd shc-3.8.9
# make


Now install shc
#make install
If the installation fails with a directory-not-found error, create the /usr/local/man/man1 directory and run the command again.

#mkdir /usr/local/man/man1
# make install

Remove the shc source folder after it is installed
# cd ..
# rm -rf shc-3.8.9/

Our shc tool is installed; we are now going to convert our shell script into a binary.
Go to the folder where the shell script is stored. My script's name is mysql_backup.
Create the binary version of the shell script using the following command:
# shc -f mysql_backup
The shc command creates two additional files:
# ls -l mysql_backup*
-rwxrw-r-- 1 pranab pranab 149 Mar 27 01:09 mysql_backup
-rwx-wx--x 1 pranab pranab 11752 Mar 27 01:12 mysql_backup.x
-rw-rw-r-- 1 pranab pranab 10174 Mar 27 01:12 mysql_backup.x.c
 
mysql_backup is the original unencrypted shell script.
mysql_backup.x is the encrypted shell script in binary format.
mysql_backup.x.c is the C source code generated from the mysql_backup file; this C source is compiled to create the encrypted mysql_backup.x file above.
We will remove the original shell script (mysql_backup) and the C file (mysql_backup.x.c), and rename the binary file (mysql_backup.x) to the original script name (mysql_backup).
# rm -f  mysql_backup.x.c
# rm -f mysql_backup
# mv mysql_backup.x mysql_backup
 
Now we have our binary shell script; its contents cannot be easily seen, as it is a binary file.