Ubuntu-Server
Server Monitoring

Comprehensive Guide to Monitoring Networking, Security, Web Servers, and NGINX/Apache


Setting up Webmin and DNS


Step 1: Install Webmin

Webmin provides a user-friendly graphical interface for server management, reducing the need to use command-line tools for day-to-day administration tasks.

1.1 Add Webmin Repository

First, add the Webmin repository to your system to easily install and update it using apt.

  1. Update the package list:

    sudo apt update
  2. Install essential dependencies:

    sudo apt install software-properties-common apt-transport-https wget curl gnupg
  3. Add the Webmin PGP key:

    curl -fsSL https://download.webmin.com/jcameron-key.asc | sudo gpg --dearmor -o /usr/share/keyrings/webmin.gpg
  4. Add the Webmin repository to /etc/apt/sources.list:

    sudo cp /etc/apt/sources.list /etc/apt/sources.list.bak
    sudo vim /etc/apt/sources.list
    # Add the following line at the bottom:
    deb [signed-by=/usr/share/keyrings/webmin.gpg] http://download.webmin.com/download/repository sarge contrib

    Explanation: This adds the Webmin repository, allowing you to install and receive updates through apt.

1.2 Install Webmin

  1. Update the package list again:

    sudo apt update
  2. Install Webmin:

    sudo apt install webmin -y

    Note: Webmin runs on port 10000 by default and supports SSL for secure connections.

1.3 Access Webmin

  1. Open your web browser and navigate to:

    https://your-server-ip:10000

    Replace your-server-ip with your actual server’s public or private IP address.

  2. You may receive a browser warning due to the self-signed SSL certificate. Either:

    • Proceed by adding a security exception in your browser, or
    • Replace the self-signed certificate with a trusted SSL certificate (using Let's Encrypt or another CA).
  3. Log in using your system’s root account or a user account with sudo privileges.

Best Practices:

  • Secure access using SSL: Install a trusted SSL certificate (e.g., via Let’s Encrypt) to avoid browser warnings.

  • Restrict Webmin access: Use your firewall (e.g., ufw) to limit access to Webmin only from trusted IPs:

    sudo ufw allow from <trusted-ip-address> to any port 10000

    This restricts access to Webmin on port 10000 from only your specified IP address.
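
When several administrators need access, it can help to generate the rules from a list and review them before running anything. The following is a minimal sketch, assuming placeholder addresses from the documentation ranges and a hypothetical helper named webmin_allow_rules. It only prints the commands and changes nothing on the system:

```shell
# Print (do not run) one ufw rule per trusted address for the Webmin port.
# The addresses passed below are documentation placeholders, not real hosts.
webmin_allow_rules() {
  for ip in "$@"; do
    printf 'sudo ufw allow from %s to any port 10000 proto tcp\n' "$ip"
  done
}

webmin_allow_rules 203.0.113.10 198.51.100.7
```

With ufw's default policy of denying incoming traffic, only the listed addresses can reach port 10000 once these rules are applied. sudo ufw status numbered shows the resulting rule set.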


Step 2: Add a Valid Certificate with Let’s Encrypt

Webmin uses a self-signed, untrusted certificate by default. You can replace it with a valid certificate from Let’s Encrypt.

  1. Open your browser and navigate to:

    https://your_domain:10000

    Replace your_domain with the domain pointing to your server’s IP address.

  2. On the first login, you may see an “Invalid SSL” warning. Allow the exception and proceed to your domain so you can replace the self-signed certificate.

  3. Log in with your non-root user.

  4. Once logged in, you will see the Webmin dashboard. Set the server’s hostname by clicking the System hostname field. Enter your Fully-Qualified Domain Name (FQDN) and save the changes.

  5. Click on Webmin Configuration from the Webmin dropdown menu, then go to SSL Encryption and select the Let’s Encrypt tab.

  6. Fill in the required information:

    • Hostnames for certificate: Enter your FQDN.
    • Website root directory for validation file: Choose "Other Directory" and enter /var/www/your_domain (the Apache web server’s root directory from the prerequisites).
    • Months between automatic renewal: Enter 1 to enable automatic renewal.
  7. Click Request Certificate. After successful generation, click Return to Webmin Configuration.

  8. Restart Webmin for the changes to take effect. Reload the page, and your browser should now indicate a valid certificate.


Step 3: Using Webmin

Now that Webmin is set up with a valid SSL certificate, you can start managing your server.

3.1 Managing Users and Groups

  1. From the left-hand sidebar, navigate to System > Users and Groups.

  2. To add a new user called deploy, click Create a new user. Fill in the following fields:

    • Username: deploy
    • User ID: Select Automatic
    • Real Name: Deployment user
    • Home Directory: Select Automatic
    • Shell: Choose /bin/bash
    • Password: Set a password of your choice
    • Primary Group: Select New group with same name as user
    • Secondary Group: Select sudo.
  3. After filling in the fields, click Create to add the user.

3.2 Updating Packages

  1. Click the Dashboard button at the top of the sidebar to check for package updates.

  2. If updates are available, click the link in the Package updates field.

  3. Select the packages you wish to update and click Update selected packages. If prompted, you can also reboot the server via Webmin.


With this setup, you can manage your server using Webmin’s GUI, including secure access with Let's Encrypt, user management, and easy package updates.

Step 2: Install and Configure Nginx or Apache

You can choose between Nginx or Apache for your web server setup, depending on your preference or project requirements.

2.1 Install Nginx

Nginx is known for its high performance and low resource consumption. To install and configure Nginx:

  1. Install Nginx:

    sudo apt install nginx
  2. Start and enable Nginx to start automatically on boot:

    sudo systemctl start nginx
    sudo systemctl enable nginx
  3. Confirm Nginx is running:

    sudo systemctl status nginx

2.2 Install Apache

Apache is a more feature-rich, flexible web server and is widely used in many setups.

  1. Install Apache:

    sudo apt install apache2

    Note: If you install both Nginx and Apache, change the ports on one of the services, because both cannot listen on ports 80 and 443 at the same time. Apache's listening ports are set in ports.conf:

    sudo vim /etc/apache2/ports.conf
  2. Start and enable Apache to start automatically on boot:

    sudo systemctl start apache2
    sudo systemctl enable apache2
  3. Confirm Apache is running:

    sudo systemctl status apache2
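
If both servers are installed, one way to move Apache off the default ports is to rewrite its Listen directives. The following is a minimal sketch, assuming the stock Ubuntu layout of ports.conf and two arbitrary example ports (8080 and 8443). The helper name move_apache_ports is hypothetical, and the demo works on a sample file in a temporary location so nothing under /etc/apache2 is touched:

```shell
# Rewrite "Listen 80" and "Listen 443" in a ports.conf-style file and print the result.
# 8080 and 8443 are example ports chosen for this sketch.
move_apache_ports() {
  sed -e 's/^\([[:space:]]*\)Listen 80$/\1Listen 8080/' \
      -e 's/^\([[:space:]]*\)Listen 443$/\1Listen 8443/' "$1"
}

# Sample input resembling the default file (made up for illustration).
sample=$(mktemp)
printf 'Listen 80\n<IfModule ssl_module>\n\tListen 443\n</IfModule>\n' > "$sample"
move_apache_ports "$sample"
rm -f "$sample"
```

On a real system the output would replace /etc/apache2/ports.conf, and the VirtualHost lines in the enabled sites would need the same ports. Running sudo apache2ctl configtest before restarting Apache confirms the syntax.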

3. Install and Configure BIND and Troubleshoot named-resolvconf.service

Before diving into the installation and configuration of BIND, it's important to address potential issues with systemd services related to BIND and resolvconf. If you encounter an inactive named-resolvconf.service due to a missing or non-executable /sbin/resolvconf, follow these steps to resolve the issue:

Resolving named-resolvconf.service Issues:

  1. Check if resolvconf is installed:

    dpkg -s resolvconf
  2. If it's not installed, install it:

    sudo apt-get install resolvconf
  3. If installed, check the file permissions for /sbin/resolvconf:

    ls -l /sbin/resolvconf
  4. If the file is missing, reinstall the package:

    sudo apt-get install --reinstall resolvconf

Now, let's move on to the steps for installing and configuring BIND.

3.1 Install BIND

Install the necessary packages:

sudo apt update
sudo apt install bind9 bind9utils bind9-doc

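Start and enable BIND:

sudo systemctl start bind9
sudo systemctl enable bind9

Note: On Ubuntu 20.04 and later the unit is named.service and bind9 is only an alias for it. If the enable command refuses to operate on the alias, run sudo systemctl enable named instead.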

3.2 Understanding BIND Configuration Files

File Path                        Description
/etc/bind/named.conf             The main configuration file that includes references to other files.
/etc/bind/named.conf.local       Contains custom zone configurations for your domains.
/etc/bind/named.conf.options     Controls global options like forwarders, DNS recursion, and query behavior.

3.3 Initial Setup: Edit /etc/bind/named.conf

  1. Open the main configuration file:

    sudo vim /etc/bind/named.conf
  2. Verify that it includes references to key configuration files:

    include "/etc/bind/named.conf.options";
    include "/etc/bind/named.conf.local";
    include "/etc/bind/named.conf.default-zones";

3.4 Configure /etc/bind/named.conf.options

  1. Open the file for editing:

    sudo vim /etc/bind/named.conf.options
  2. Inside the options { ... } block, add the forwarders section so that DNS queries the server cannot answer itself are forwarded to public resolvers:

    forwarders {
        8.8.8.8; # Google’s Public DNS
        8.8.4.4; # Google’s Public DNS
    };

3.5 Configure a Zone in /etc/bind/named.conf.local

  1. Open the zone configuration file:

    sudo vim /etc/bind/named.conf.local
  2. Add a zone entry for your domain:

    zone "example.com" {
        type master;
        file "/etc/bind/zones/www.example.com";
    };

3.6 Create the Zone File

  1. Create a directory for zone files if it doesn't already exist:

    sudo mkdir /etc/bind/zones
  2. Create the zone file for your domain:

    sudo vim /etc/bind/zones/www.example.com
  3. Add DNS records:

    $TTL 3600
    @ IN SOA ns1.digitalocean.com. admin.example.com. (
        2024092001 ; Serial number (YYYYMMDDNN)
        7200       ; Refresh interval
        1800       ; Retry interval
        1209600    ; Expiry
        3600 )     ; Minimum TTL
    @ IN NS ns1.digitalocean.com.
    @ IN NS ns2.digitalocean.com.
    @ IN NS ns3.digitalocean.com.
    @ IN A 192.0.2.1
    www IN A 192.0.2.1
    @ IN MX 10 mail.example.com.
    mail IN A 192.0.2.1
    ftp IN CNAME @
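
Secondary servers only pick up a change when the serial number increases, so the serial has to be bumped on every edit. The following is a minimal sketch of the YYYYMMDDNN convention used in the SOA record, assuming a hypothetical helper named next_serial that takes the current serial and, optionally, the date to use:

```shell
# Compute the next zone serial in YYYYMMDDNN form.
# $1 = current serial, $2 = date as YYYYMMDD (defaults to today).
next_serial() {
  current=$1
  today=${2:-$(date +%Y%m%d)}
  if [ "$current" -ge "${today}00" ]; then
    # Already edited today (or the serial is ahead of the date): just increment.
    echo $((current + 1))
  else
    # First edit of the day: start at revision 01.
    echo "${today}01"
  fi
}

next_serial 2024092001 20240920   # same day: prints 2024092002
next_serial 2024092001 20240921   # next day: prints 2024092101
```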

3.7 Check Configuration and Restart BIND

  1. Verify that the configuration is correct:

    sudo named-checkconf
  2. Check the syntax of the zone file:

    sudo named-checkzone example.com /etc/bind/zones/www.example.com
  3. Restart BIND to apply the changes:

    sudo systemctl restart bind9

For more details, refer to the BIND documentation.


4. Troubleshooting DNS Issues

4.1. Common DNS Issues and Symptoms

Start by identifying common symptoms and the root causes of DNS failures:

1.1 Name Resolution Fails Globally

  • Symptom: No DNS queries resolve on your network.
  • Common Causes:
    • BIND service not running.
    • Firewall blocking port 53 (UDP/TCP).
    • Configuration file errors in named.conf or zone files.
    • Network connectivity issues between DNS clients and servers.

1.2 Name Resolution Fails Locally but Works Remotely

  • Symptom: DNS queries fail only on the server running BIND, but external clients can resolve domain names.
  • Common Causes:
    • Incorrect listen-on configuration in named.conf.options.
    • BIND service may not be configured to listen on the loopback address (127.0.0.1).
    • Local firewall or iptables rules blocking DNS traffic.

1.3 Slow DNS Response Times

  • Symptom: DNS queries take longer than expected to respond.
  • Common Causes:
    • High latency between your DNS server and the forwarders (e.g., Google's 8.8.8.8).
    • Recursive DNS queries taking too long.
    • DNS caching issues or TTL misconfigurations.

1.4 DNS Propagation Delays

  • Symptom: DNS records have been updated, but changes aren't reflected on the internet.
  • Common Causes:
    • DNS caching due to TTL settings.
    • Propagation delays across authoritative servers.
    • ISP caching DNS entries beyond their TTL.

1.5 Zone File Loading Errors

  • Symptom: BIND fails to load a zone file, resulting in errors like "not loaded due to errors."
  • Common Causes:
    • Syntax errors in the zone file.
    • Incorrect SOA or NS records.
    • Misconfigured A, CNAME, or MX records.

4.2. Preliminary Steps for DNS Troubleshooting

Step 1: Verify DNS Service Status

The first step in DNS troubleshooting is confirming whether the BIND service is running properly.

  1. Check BIND status:

    sudo systemctl status bind9

    If the service is inactive or failed, restart it:

    sudo systemctl restart bind9
  2. Review BIND logs: Check for error messages in the system log file:

    sudo tail -f /var/log/syslog

Step 2: Verify Network Connectivity

If DNS is failing for external clients, but not locally, ensure network connectivity and firewall settings are correct.

  1. Ping your DNS server from a client machine:

    ping <DNS-server-IP>
  2. Check port 53 (DNS traffic) is open:

    sudo ufw status
    sudo ufw allow 53/tcp
    sudo ufw allow 53/udp
  3. Check DNS forwarders if external resolution fails:

    • Use dig to check Google’s DNS servers:
      dig @8.8.8.8 example.com

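    If this query fails, the server cannot reach the forwarder itself, so check outbound connectivity and firewall rules. If it succeeds but lookups through BIND still fail, check the forwarders block in your named.conf.options file.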


Step 3: Validate Configuration Files

DNS issues can often arise from syntax errors in configuration files. Use built-in BIND tools to validate configuration files.

  1. Check the main BIND configuration (named.conf):

    sudo named-checkconf

    If any syntax errors appear, correct them before restarting BIND.

  2. Check the zone file:

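    sudo named-checkzone example.com /etc/bind/zones/www.example.com
    • Look for common zone file syntax errors:
      • Missing trailing dots on fully qualified names (ns1.example.com. rather than ns1.example.com).
      • Comments that start with # instead of a semicolon.
      • Incorrect SOA format.
      • Incorrect or missing records.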

    If the tool outputs any error, correct the offending lines in the zone file.


4.3. Detailed Zone File Troubleshooting

Step 4: Zone File Syntax Validation

One of the most common DNS issues is zone file syntax errors, leading to failures when BIND tries to load the file.

  1. Common Zone File Format Issues:

    • TTL (Time to Live): Ensure the TTL value is correctly defined at the top of the zone file:
      $TTL 3600
    • SOA (Start of Authority) Record: Ensure the SOA record is properly formatted:
      @   IN  SOA     ns1.digitalocean.com. admin.example.com. (
                      2024092001 ; Serial
                      7200       ; Refresh
                      1800       ; Retry
                      1209600    ; Expire
                      3600 )     ; Minimum TTL
  2. A Record Issues:

    • Ensure each A record points to a valid IP address:
      @   IN  A   192.0.2.1
      www IN  A   192.0.2.1
  3. MX Record Issues:

    • Ensure the mail server domain is valid and has a corresponding A record:
      @   IN  MX  10 mail.example.com.
  4. CNAME Issues:

    • Ensure no CNAME record conflicts with other records (like A records):
      ftp IN  CNAME @
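
A name that carries a CNAME must not carry any other record type. The check below is a rough sketch, assuming one record per line in the "name IN TYPE value" layout used in this guide. The helper name cname_conflicts is hypothetical, the sample zone is made up, and named-checkzone remains the authoritative check:

```shell
# List owner names that have both a CNAME and some other record type.
# Only lines of the form "name IN TYPE ..." are considered.
cname_conflicts() {
  awk '
    $2 == "IN" {
      if ($3 == "CNAME") cname[$1] = 1
      else other[$1] = 1
    }
    END { for (n in cname) if (n in other) print n }
  ' "$1"
}

# Sample records for illustration: www has both an A record and a CNAME.
zone=$(mktemp)
cat > "$zone" <<'EOF'
@   IN  A      192.0.2.1
www IN  A      192.0.2.1
ftp IN  CNAME  @
www IN  CNAME  @
EOF
cname_conflicts "$zone"   # prints: www
rm -f "$zone"
```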

Step 5: Check Zone File Loading Errors

If BIND fails to load a zone, check /var/log/syslog for specific error messages.

  1. Example Error:

    • Error: dns_rdata_fromtext: /etc/bind/zones/db.example.com:7: near '#': extra input text
    • Cause: Zone files use a semicolon (;) to start a comment. A # is not a comment character there, so BIND treats it and everything after it as unexpected input.
  2. Fix the Issue:

    • Go to the offending line number (in this case, line 7), and ensure there’s no trailing text or misplaced comment.

4.4. DNS Query Testing and Debugging

Step 6: Use dig for Query Testing

dig is a powerful tool for testing DNS queries and debugging DNS problems.

  1. Test DNS resolution locally:

    dig @localhost example.com
    • Expected Output: This command should return the A record for example.com. If it doesn’t, check your zone file and restart BIND.
  2. Test DNS resolution from a client machine:

    dig @<DNS-server-IP> example.com
    • Expected Output: The same A record as the local query. If this query fails while the local one works, ensure port 53 is open and check firewall rules.
  3. Check for propagation delays:

    dig +trace example.com
    • Expected Output: This command should show the path DNS queries take from the root servers down to your authoritative server. If the trace stops before reaching your server, check the delegation (NS records) at the level where it stopped.
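
The status field in the dig header is often the quickest clue when a query misbehaves. The following is a minimal sketch, assuming a hypothetical helper named dig_hint that reads dig output on standard input. The hints are general starting points rather than a complete diagnosis, and the sample header line is made up for illustration:

```shell
# Read dig output on stdin and print the response status with a short hint.
dig_hint() {
  status=$(sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1)
  case "$status" in
    NOERROR)  echo "NOERROR: the server answered; check the ANSWER section for the record" ;;
    NXDOMAIN) echo "NXDOMAIN: the name does not exist in the zone" ;;
    SERVFAIL) echo "SERVFAIL: the zone may have failed to load or an upstream lookup failed" ;;
    REFUSED)  echo "REFUSED: the server declined the query; review allow-query and allow-recursion" ;;
    "")       echo "no status found: the server may be unreachable" ;;
    *)        echo "$status: see the BIND logs" ;;
  esac
}

# Sample header line; on a live system: dig @localhost example.com | dig_hint
printf ';; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 1\n' | dig_hint
```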

4.5. Debugging DNS Forwarding Issues

Step 7: Check Forwarders and Recursion Settings

  1. Check named.conf.options for forwarders:

    forwarders {
        8.8.8.8;    # Google’s Public DNS
        8.8.4.4;
    };
  2. Check recursion settings:

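    • If DNS queries are not resolving recursively, ensure the allow-recursion option covers your clients. Limit it to networks you trust, because allowing any turns the server into an open resolver:
      allow-recursion { localhost; localnets; };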
  3. Test upstream resolution:

    • Use dig to query an external DNS server:
      dig @8.8.8.8 example.com

4.6 Troubleshooting BIND Errors: Hostname Resolution and Zone File Issues

When encountering errors related to BIND (Berkeley Internet Name Domain), it is essential to address both hostname resolution issues and zone file configurations. Below are common problems and their resolutions.

1. Hostname Resolution Failure

If you receive an error like:

sudo: unable to resolve host your.hostname.com: Temporary failure in name resolution

This indicates that your system cannot resolve its hostname. To fix this:

  • Edit the /etc/hosts File: Open the /etc/hosts file and ensure your hostname is correctly listed. You should have entries similar to:

    127.0.0.1   localhost
    127.0.1.1   your.hostname.com

    Replace your.hostname.com with your actual hostname.
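
A quick way to confirm the entry is present is to search the hosts file for the machine's own name. The following is a rough sketch, assuming a hypothetical helper named host_in_hosts. It relies on grep's whole-word matching, so it is a sanity check rather than a strict parser:

```shell
# Succeed if the given hostname appears as a whole word on a non-comment line.
# $1 = hostname, $2 = hosts file (defaults to /etc/hosts).
host_in_hosts() {
  grep -v '^[[:space:]]*#' "${2:-/etc/hosts}" | grep -qwF -- "$1"
}

# uname -n prints the node name of the current machine.
if host_in_hosts "$(uname -n)"; then
  echo "hostname is listed in /etc/hosts"
else
  echo "hostname is missing from /etc/hosts"
fi
```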

2. Zone File Not Found

If you see an error like:

zone example.com/IN: loading from master file /etc/bind/zones/example.com failed: file not found

This indicates that BIND cannot find the specified zone file. To resolve this issue:

  • Check the Configuration: Open your BIND configuration file (typically located at /etc/bind/named.conf or similar) and verify the zone definition for example.com. Ensure it points to a valid zone file.

  • Locate the Zone File: Confirm that the zone file (e.g., /etc/bind/zones/example.com) exists. If it does not, you may need to create it or correct the path in the configuration.

  • Check File Permissions: Ensure that the BIND user (bind on Ubuntu) has read access to the zone file. You can adjust ownership and permissions using:

    sudo chown bind:bind /etc/bind/zones/example.com
    sudo chmod 644 /etc/bind/zones/example.com

3. Restarting BIND

After making the necessary changes, restart the BIND service:

sudo systemctl restart bind9

4. Checking Logs for Additional Errors

After restarting, check the BIND logs (usually found in /var/log/syslog or a designated BIND log file) for any additional error messages that may help diagnose further issues.


4.7 Troubleshooting "Temporary Failure in Name Resolution" with BIND DNS

Issue: Temporary failure in name resolution

A "Temporary failure in name resolution" error typically indicates that your DNS server is unable to resolve domain names due to improper configuration or network issues. Follow these steps to resolve the problem:

  1. Check Network Connectivity

    First, verify that the server is connected to the internet by running:

    ping 8.8.8.8

    If this fails, the issue is likely network-related, not DNS.

  2. Verify DNS Server Configuration

    Check your DNS configuration in /etc/resolv.conf:

    cat /etc/resolv.conf

    Ensure it lists valid nameservers like:

    nameserver 8.8.8.8
    nameserver 8.8.4.4

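    If incorrect, edit the file (on systems where systemd-resolved manages /etc/resolv.conf, manual edits are overwritten; see step 7):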

    sudo vim /etc/resolv.conf

    Add valid public DNS servers (e.g., Google’s):

    nameserver 8.8.8.8
    nameserver 8.8.4.4
  3. Check BIND Service Status

    If you're using BIND, ensure that it’s running correctly:

    sudo systemctl status bind9

    If inactive or failed, restart BIND:

    sudo systemctl restart bind9

    Verify configuration files:

    sudo named-checkconf
    sudo named-checkzone example.com /etc/bind/zones/www.example.com
  4. Ensure DNS Ports Are Open

    Verify that port 53 for DNS is open on your firewall:

    • For UFW:

      sudo ufw allow 53/tcp
      sudo ufw allow 53/udp
    • For iptables:

      sudo iptables -A INPUT -p tcp --dport 53 -j ACCEPT
      sudo iptables -A INPUT -p udp --dport 53 -j ACCEPT
  5. Restart Networking Services

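    After making changes, restart networking services to apply updates. On systems that use ifupdown:

    sudo systemctl restart networking

    Netplan-based Ubuntu Server releases typically have no networking unit; run sudo netplan apply there instead. If using systemd-resolved, also run: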

    sudo systemctl restart systemd-resolved
  6. Check /etc/hosts File

    Ensure that /etc/hosts doesn't contain any incorrect entries:

    sudo vim /etc/hosts

    A valid entry example:

    127.0.0.1   localhost
    127.0.1.1   yourhostname
  7. Disable systemd-resolved (Optional)

    If systemd-resolved interferes with your DNS server:

    1. Stop the service:

      sudo systemctl stop systemd-resolved
      sudo systemctl disable systemd-resolved
    2. Remove the /etc/resolv.conf symlink:

      sudo rm /etc/resolv.conf
    3. Create a new /etc/resolv.conf:

      sudo vim /etc/resolv.conf

      Add the following:

      nameserver 8.8.8.8
      nameserver 8.8.4.4
    4. Restart networking:

      sudo systemctl restart networking
  8. Test DNS Resolution

    Test DNS resolution using dig or nslookup:

    dig example.com

    Or:

    nslookup example.com

    This should return a valid DNS response.

  9. Check for Temporary DNS Server Issues

    If you're still facing issues, the public DNS server (e.g., 8.8.8.8) might be down. Try switching to Cloudflare’s DNS:

    nameserver 1.1.1.1
    nameserver 1.0.0.1

4.8. Troubleshooting Scenario Examples

Scenario 1: DNS Service Fails to Start

  • Symptom: The BIND service fails to start.
  • Troubleshooting:
    1. Check the syntax of all BIND configuration files using named-checkconf.
    2. Review /var/log/syslog for specific error messages related to BIND.
    3. Verify file permissions and ownership for all zone files.

Scenario 2: DNS Records Not Resolving for External Clients

  • Symptom: DNS records work locally but fail for remote clients.
  • Troubleshooting:
    1. Ensure BIND is listening on external interfaces by checking listen-on in named.conf.options.
    2. Check firewall rules to ensure port 53 is open for both TCP and UDP traffic.

Scenario 3: Zone Transfer Failing

  • Symptom: Secondary DNS servers fail to receive zone updates.
  • Troubleshooting:
    1. Verify the allow-transfer option is configured correctly in named.conf.local.
    2. Use dig to test the AXFR zone transfer:
      dig @ns1.example.com example.com axfr

4.9. Final Troubleshooting Checklist

  • BIND service is running (systemctl status bind9).
  • No configuration file syntax errors (named-checkconf).
  • Valid zone file syntax (named-checkzone).
  • DNS queries work locally and remotely (dig).
  • Firewall rules allow DNS traffic (port 53, TCP/UDP).
  • Zone transfers configured correctly (allow-transfer).
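
The checklist can be run in one pass with a small wrapper. The following is a minimal sketch, assuming a hypothetical helper named run_check and the example zone and file path used earlier in this guide. Run it as root (or prefix the commands with sudo) so the BIND tools can read their files:

```shell
# Run a command quietly and report PASS or FAIL with a label.
run_check() {
  label=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS  $label"
  else
    echo "FAIL  $label"
  fi
}

run_check "BIND service is running"        systemctl is-active --quiet bind9
run_check "named.conf syntax"              named-checkconf
run_check "zone file syntax"               named-checkzone example.com /etc/bind/zones/www.example.com
run_check "server replies to local query"  dig +time=2 +tries=1 @localhost example.com
```

The last check only confirms that the server replied at all. Inspect the dig output itself to see whether the answer is the one you expect.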

These steps focus on identifying, diagnosing, and resolving common DNS issues using a network engineer’s troubleshooting approach.

5. Maintenance with Webmin

Webmin provides a graphical interface to manage BIND and DNS configurations, simplifying zone file management, record creation, and troubleshooting.

5.1 Manage DNS Zones with Webmin

  1. Go to Servers > BIND DNS Server.
  2. Click on Create a new master zone to define a new zone for your domain.
  3. Fill in the details:
    • Domain name: example.com
    • Master server: ns1.digitalocean.com
    • Email: admin@example.com
  4. Click Create.

5.2 Modify DNS Records in Webmin

  1. Select your domain under Existing Zones.
  2. Add or modify A, CNAME, or MX records using the GUI.
  3. Click Apply Changes after editing records.

5.3 Restart BIND from Webmin

To restart BIND through Webmin, use either of the following:

  1. In the BIND DNS Server module, click the refresh button in the top-right corner.
  2. Alternatively, go to System > Bootup and Shutdown, scroll down to bind9, and click Restart.

5.4 Troubleshooting

If BIND (the DNS server) is not showing in Webmin, here are some troubleshooting steps you can take:

  1. Check BIND Installation:

    • Ensure that BIND is installed on your server. You can check this by running:
      sudo systemctl status bind9
    • If it's not installed, you can install it using:
      sudo apt-get install bind9
  2. Webmin Module:

    • Ensure that the BIND DNS Server module is enabled in Webmin. Go to Webmin > Webmin Configuration > Webmin Modules, and check if the BIND DNS module is listed and enabled. If it is not, click Refresh Modules.
    • When you return to Servers, the BIND DNS Server module should be listed along with any other servers that were added.
  3. Check for Errors:

    • Check the Webmin logs for any errors. You can find the logs in /var/webmin/miniserv.error.
  4. Browser Cache:

    • Clear your browser cache or try accessing Webmin from a different browser to rule out caching issues.
  5. Restart Webmin:

    • Sometimes, simply restarting Webmin can resolve issues:
      sudo systemctl restart webmin
  6. Firewall and Permissions:

    • Ensure that any firewall rules are not blocking access to BIND, and check that the user you're logged in as has the necessary permissions to view BIND settings.
  7. Reinstall Webmin Module:

    • If the BIND module is not functioning properly, you might consider reinstalling it.


Step 4: Monitoring Nginx/Apache with Webmin

4.1 Monitoring Web Server Logs

Nginx and Apache generate logs that provide critical information for troubleshooting and performance analysis.

To monitor Nginx logs:

sudo tail -f /var/log/nginx/access.log

For Apache logs:

sudo tail -f /var/log/apache2/access.log

In Webmin, you can use the System Logs module to view these logs through the web interface.
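
Beyond watching the log scroll by, a per-status count gives a quick picture of how many requests are failing. The following is a minimal sketch, assuming the combined log format that both servers use by default, where the status code is the ninth whitespace-separated field. The helper name status_counts is hypothetical and the sample lines are made up for illustration:

```shell
# Count responses per HTTP status code in a combined-format access log.
status_counts() {
  awk '{ count[$9]++ } END { for (s in count) print s, count[s] }' "$1" | sort
}

# Made-up sample entries standing in for a real access log.
log=$(mktemp)
cat > "$log" <<'EOF'
192.0.2.10 - - [20/Sep/2024:10:00:00 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/8.0"
192.0.2.11 - - [20/Sep/2024:10:00:01 +0000] "GET /missing HTTP/1.1" 404 153 "-" "curl/8.0"
192.0.2.10 - - [20/Sep/2024:10:00:02 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/8.0"
EOF
status_counts "$log"
rm -f "$log"
```

For the sample above this prints "200 2" and "404 1". On a server, point it at /var/log/nginx/access.log or /var/log/apache2/access.log instead.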

4.2 Configuring Alerts and Notifications

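Webmin allows you to set up alerts for system events like high CPU usage, low disk space, or web server errors. These are configured as monitors in the System and Server Status module, where each monitor can send an email when its status changes.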


Setting Up and Troubleshooting Email on Linux Using SMTP and Postfix (Gmail as an SMTP Provider Example)


1. Understanding SMTP and Postfix

Before diving into the setup, it's essential to understand the components involved:

  • SMTP (Simple Mail Transfer Protocol): A protocol used for sending emails between servers.
  • Postfix: A Mail Transfer Agent (MTA) that routes and delivers email.

Using Postfix as an SMTP client means configuring it to send outgoing mail through an external provider like Gmail. Postfix will handle the routing, while Gmail’s SMTP server will handle the actual delivery of messages.

1.1 Why Use an External SMTP Provider?

Most ISPs and hosting services limit outgoing emails to prevent spam. By configuring Postfix to use an external SMTP provider like Gmail, you can bypass these restrictions, benefit from Gmail’s infrastructure, and ensure better email delivery.

1.2 Security Considerations

When configuring an email system, security should be prioritized. Email can be intercepted if not encrypted, and passwords could be exposed. Always use TLS/SSL encryption for sending emails and secure your SMTP authentication credentials using secure file permissions.


2. Setting Up Postfix as an SMTP Server on Ubuntu

To configure Postfix, you need to first install it and configure it to work with an external SMTP server (e.g., Gmail).

2.1 Update the Package List and Install Postfix

First, ensure your system’s package list is up-to-date and install Postfix with the mailutils package (which includes utilities like the mail command).

sudo apt update
sudo apt install postfix mailutils

2.2 Postfix Installation Prompts

During installation, you will be prompted to choose a mail server configuration. Choose "Internet Site" when asked, and provide your fully qualified domain name (FQDN) in the following format:

example.yourdomain.com

This domain is important for email identification and will be used in the myhostname parameter of the Postfix configuration.

2.3 Verifying Postfix Installation

Once installed, check if Postfix is running properly:

sudo systemctl status postfix

If everything is correct, you should see the service as "active (running)." If it's not running, start it with:

sudo systemctl start postfix

2.4 Configuring Basic Postfix Settings

To begin configuring Postfix, open the main configuration file:

sudo vim /etc/postfix/main.cf

Add or verify the following parameters. Postfix only treats # as a comment at the start of a line, so keep comments on their own lines:

# The hostname of your server
myhostname = mail.yourdomain.com
# Your domain name, also used as the origin of outgoing mail
mydomain = yourdomain.com
myorigin = $mydomain
# Listen on all network interfaces
inet_interfaces = all
mydestination = $myhostname, localhost.$mydomain, localhost
# External SMTP provider (e.g., Gmail)
relayhost = [smtp.yourprovider.com]:587
# Trusted networks
mynetworks = 127.0.0.0/8

Replace yourdomain.com with your actual domain and smtp.yourprovider.com with your email provider's SMTP server (e.g., Gmail’s smtp.gmail.com).
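
Postfix reads # as a comment only when it is the first non-blank character on a line, so a note typed after a value becomes part of that value. The following is a small sketch for spotting such lines, assuming a hypothetical helper named inline_comments. The sample file is made up for illustration:

```shell
# List setting lines in a main.cf-style file that carry a trailing "# ..." note.
# Lines whose first non-blank character is "#" are real comments and are skipped.
inline_comments() {
  grep -nE '^[[:space:]]*[^#[:space:]].*[[:space:]]#' "$1"
}

# Made-up sample file: only the second line has a trailing note.
sample=$(mktemp)
printf '%s\n' '# a real comment' \
  'myhostname = mail.example.com  # trailing note' \
  'mydomain = example.com' > "$sample"
inline_comments "$sample"   # prints: 2:myhostname = mail.example.com  # trailing note
rm -f "$sample"
```

On a server, run it against /etc/postfix/main.cf. sudo postconf -n then shows the values Postfix actually loaded.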


3. Enabling Secure Authentication and TLS for Postfix

To secure your connection when sending emails, you need to configure SMTP authentication and enable TLS encryption.

3.1 Set Up SMTP Authentication

Create a file to store your email provider’s SMTP authentication details:

sudo vim /etc/postfix/sasl_passwd

Add the following line, replacing with your SMTP server, email, and password:

[smtp.yourprovider.com]:587 your_email@yourprovider.com:your_password

For Gmail, it would look like this:

[smtp.gmail.com]:587 your_email@gmail.com:your_app_password

Note: If you're using Gmail with two-factor authentication, you will need to generate an app password from your Google Account's security settings. This is a more secure way of allowing Postfix to send emails without exposing your main account password.

3.2 Secure the Credentials File

Ensure the file is readable only by root to protect your credentials:

sudo chmod 600 /etc/postfix/sasl_passwd

3.3 Create a Hash Map for Postfix

Run the postmap command to create a hashed version of the authentication file:

sudo postmap /etc/postfix/sasl_passwd
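
Creating the file first and tightening its permissions afterwards leaves a short window in which it may be readable by others. One alternative is to create it under a restrictive umask. The following is a minimal sketch, assuming a hypothetical helper named write_private. The demo uses a temporary directory as a stand-in for /etc/postfix, and the credentials are placeholders:

```shell
# Write one line to a new file that only its owner can read or write.
# $1 = target path, $2 = line to store. The umask affects newly created files only.
write_private() {
  ( umask 077; printf '%s\n' "$2" > "$1" )
}

# Demo in a temporary directory; nothing under /etc/postfix is touched.
dir=$(mktemp -d)
write_private "$dir/sasl_passwd" '[smtp.gmail.com]:587 your_email@gmail.com:your_app_password'
ls -l "$dir/sasl_passwd" | cut -c1-10   # prints: -rw-------
rm -rf "$dir"
```

The subshell keeps the umask change from leaking into the rest of the session.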

3.4 Configure TLS Encryption

To ensure all emails are sent securely, configure TLS encryption in Postfix:

sudo vim /etc/postfix/main.cf

Add the following parameters to enable encryption:

smtp_sasl_auth_enable = yes
smtp_sasl_password_maps = hash:/etc/postfix/sasl_passwd
smtp_sasl_security_options = noanonymous
smtp_tls_security_level = encrypt
smtp_tls_CAfile = /etc/ssl/certs/ca-certificates.crt

This configuration ensures that Postfix will use STARTTLS to encrypt communications with the SMTP server.

3.5 Reload Postfix to Apply Changes

After editing the configuration, reload Postfix:

sudo systemctl reload postfix

4. Using Gmail as an SMTP Provider with Postfix

Gmail is a popular choice for relaying mail via an external SMTP server. Here's how to configure Postfix with Gmail.

4.1 Enabling App Passwords or Less Secure Apps

If you have two-factor authentication (2FA) enabled on your Gmail account, you'll need to generate an App Password for Postfix:

  1. Go to your Google Account.
  2. Navigate to Security > App Passwords.
  3. Generate an app password for "Mail" and "Other device."

Alternatively, if you're not using 2FA, you may be able to enable Less Secure Apps in your Google account. Google has been retiring that option, so it may not be available for your account; an app password is the more dependable route.

4.2 Configure Gmail as the Relayhost

Edit the Postfix configuration file:

sudo vim /etc/postfix/main.cf

Add or modify the following lines:

relayhost = [smtp.gmail.com]:587
smtp_sasl_auth_enable = yes
smtp_sasl_password_maps = hash:/etc/postfix/sasl_passwd
smtp_sasl_security_options = noanonymous
smtp_tls_security_level = encrypt
smtp_tls_CAfile = /etc/ssl/certs/ca-certificates.crt

This configuration ensures that emails are relayed through Gmail's SMTP server using secure encryption.


5. Testing and Troubleshooting Postfix

Once Postfix is configured, you should test and verify that everything is working as expected.

5.1 Send a Test Email

Send a test email from the command line to verify Postfix is working correctly:

echo "Test email body" | mail -s "Test Subject" recipient@example.com

Replace recipient@example.com with a valid email address. Check if the email is delivered and inspect /var/log/mail.log for any issues.

5.2 Check Postfix Service Status

Ensure Postfix is running:

sudo systemctl status postfix

Restart the service if needed:

sudo systemctl restart postfix

5.3 Examine Postfix Logs

Logs are crucial for troubleshooting. To monitor real-time logs, run:

sudo tail -f /var/log/mail.log

Look for errors related to authentication, connection, or email delivery.
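
Each delivery attempt in the log ends with a status= field (for example sent, deferred, or bounced), so counting those values gives a quick summary. The following is a minimal sketch, assuming a hypothetical helper named delivery_summary. The sample lines are simplified stand-ins; real log entries carry more fields:

```shell
# Summarise delivery outcomes by counting the status= field in Postfix log lines.
delivery_summary() {
  grep -o 'status=[a-z]*' "$1" | sort | uniq -c | awk '{ print $2, $1 }'
}

# Simplified stand-in lines, not real log output.
log=$(mktemp)
printf '%s\n' 'to=<a@example.com>, status=sent' \
  'to=<b@example.com>, status=deferred' \
  'to=<c@example.com>, status=sent' > "$log"
delivery_summary "$log"
rm -f "$log"
```

For the sample above this prints "status=deferred 1" and "status=sent 2". On a server, run it as root against /var/log/mail.log.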

5.4 Check DNS and MX Records

Ensure your domain has correct DNS and MX records configured. You can use the following command to check MX records:

dig yourdomain.com MX

For more advanced diagnostics, you can use external tools like MXToolbox.

5.5 Validate Postfix Configuration

To validate your Postfix configuration and detect any syntax errors:

sudo postfix check

6. Advanced Troubleshooting

If your setup isn't working as expected, here are some advanced troubleshooting tips:

6.1 Common Authentication Issues

  • Invalid Credentials: Double-check your /etc/postfix/sasl_passwd file for any typos in the email or password.
  • Incorrect Permissions: Ensure the sasl_passwd file has the correct permissions (chmod 600).
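
Both checks above come down to the exact format and mode of the credentials file. The sketch below drafts an entry in a scratch directory so the real /etc/postfix/sasl_passwd is not touched; the account name and password are placeholders, and the lookup key assumes the relayhost value [smtp.gmail.com]:587 used in this guide.

```shell
#!/bin/sh
# Sketch: draft a sasl_passwd entry with restrictive permissions.
# The account and password below are placeholders, not real values.
set -eu

OUT_DIR="${OUT_DIR:-$(mktemp -d)}"
SASL_FILE="$OUT_DIR/sasl_passwd"

# The lookup key must match the relayhost value in main.cf exactly.
umask 077
printf '%s %s:%s\n' '[smtp.gmail.com]:587' 'your_email@gmail.com' 'app-password-here' > "$SASL_FILE"
chmod 600 "$SASL_FILE"

ls -l "$SASL_FILE"
# After copying the entry to /etc/postfix/sasl_passwd (run manually):
#   sudo postmap /etc/postfix/sasl_passwd
#   sudo systemctl reload postfix
```

Postfix reads the hashed map that postmap generates, so rerun postmap every time the file changes.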

6.2 Connection Issues

  • Firewall Blocking Ports: Ensure your firewall allows outbound traffic on port 587 (or 465 for SSL). UFW permits outgoing traffic by default; if you have restricted it, allow the port explicitly:

    sudo ufw allow out 587/tcp
    sudo ufw reload
  • SMTP Provider Blocking Requests: Some email providers may block high volumes of emails from external servers. Check their rate limits and policies.

6.3 DNS and MX Configuration Errors

  • Incorrect MX Records: If your domain's MX records aren't properly set up, email routing could fail. Use dig or tools like MXToolbox to verify your DNS configuration.
  • SPF and DKIM Misconfiguration: Ensure you have proper SPF (Sender Policy Framework) and DKIM (DomainKeys Identified Mail) records to avoid emails being marked as spam.

7. Best Practices for Postfix Setup

  • TLS Encryption: Always enforce TLS for outgoing emails to ensure secure transmission.
  • Secure Authentication: Use secure passwords or app-specific passwords for SMTP authentication. Avoid using your main account password for Postfix.
  • Monitoring Logs: Regularly monitor the /var/log/mail.log for early detection of issues like authentication failures or email bounces.
  • Backup Configuration: Backup your /etc/postfix/main.cf and /etc/postfix/sasl_passwd configuration files regularly.

8. Advanced Configuration Options

8.1 Rate Limiting

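If you're sending a high volume of emails and want to avoid being blocked by the relay, throttle outbound delivery in /etc/postfix/main.cf. The values below are starting points to tune against your provider's limits:

# Limit parallel deliveries to the same destination
smtp_destination_concurrency_limit = 2
# Wait between deliveries to the same destination
smtp_destination_rate_delay = 1s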

8.2 Postfix Aliases for System Notifications

To receive system alerts via email, configure Postfix aliases. Edit the aliases file:

sudo vim /etc/aliases

Add an alias to forward root’s email to your email address:

root: your_email@yourdomain.com

Apply changes:

sudo newaliases

Comprehensive Prometheus Setup Guide for Ubuntu

Prometheus is a robust open-source monitoring system designed to collect, store, and alert on time-series data. It is widely used for monitoring network infrastructure, security services, web servers (like Nginx and Apache), databases, and more.


1. Prerequisites

Before proceeding with the installation, ensure that:

  • You have sudo privileges on your server.
  • The firewall allows access to port 9090, which Prometheus uses.

2. Update System Packages

Keeping your system updated ensures that the latest security patches and software versions are in place.

sudo apt update
sudo apt upgrade -y

3. Create a Dedicated Prometheus User

For security, Prometheus should not run as the root user. A dedicated user ensures minimal permissions:

sudo useradd --no-create-home --shell /bin/false prometheus

4. Create Necessary Directories for Prometheus

Prometheus stores its configuration and data in separate directories. Create these directories and set permissions accordingly:

sudo mkdir /etc/prometheus
sudo mkdir /var/lib/prometheus

5. Download and Extract Prometheus

Download the release tarball and extract it:

wget https://github.com/prometheus/prometheus/releases/download/v2.54.1/prometheus-2.54.1.linux-amd64.tar.gz
tar -xvf prometheus-2.54.1.linux-amd64.tar.gz

6. Move Prometheus Binaries and Configuration

After extraction, copy the Prometheus binaries and configuration files from the extracted directory to their respective directories:

cd prometheus-2.54.1.linux-amd64
sudo cp prometheus /usr/local/bin/
sudo cp promtool /usr/local/bin/
sudo cp -r consoles /etc/prometheus/
sudo cp -r console_libraries /etc/prometheus/
sudo cp prometheus.yml /etc/prometheus/

7. Set Permissions

Ensure the Prometheus user owns the necessary files:

sudo chown -R prometheus:prometheus /etc/prometheus /var/lib/prometheus
sudo chown prometheus:prometheus /usr/local/bin/prometheus /usr/local/bin/promtool

8. Configure Prometheus as a Systemd Service

Create a systemd service to manage Prometheus. This enables easy startup, management, and ensures that Prometheus runs automatically after system reboots.

sudo vim /etc/systemd/system/prometheus.service

Add the following content:

[Unit]
Description=Prometheus Monitoring
Wants=network-online.target
After=network-online.target
 
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
  --config.file /etc/prometheus/prometheus.yml \
  --storage.tsdb.path /var/lib/prometheus/ \
  --web.console.templates=/etc/prometheus/consoles \
  --web.console.libraries=/etc/prometheus/console_libraries
 
[Install]
WantedBy=multi-user.target

9. Reload Systemd and Start Prometheus

To apply the new service configuration and start Prometheus:

sudo systemctl daemon-reload
sudo systemctl start prometheus
sudo systemctl enable prometheus

10. Verify Prometheus Service Status

Check if Prometheus is running without issues:

sudo systemctl status prometheus

If Prometheus is running, the service status should show "active (running)."


11. Access the Prometheus Web Interface

Prometheus serves a web interface on port 9090. You can access it using your server’s IP address:

http://<your-server-ip>:9090

12. Troubleshooting Prometheus

Prometheus offers comprehensive logging, and system errors can usually be pinpointed by inspecting logs or configuration files.

12.1 Check Logs for Errors

View logs in real-time to capture any issues:

sudo journalctl -u prometheus.service -f

12.2 Verify Configuration File Syntax

The promtool utility can check Prometheus’s configuration file syntax:

promtool check config /etc/prometheus/prometheus.yml

12.3 Common Issues and Solutions

  • Prometheus Fails to Start:

    • Ensure the prometheus.yml file has valid syntax and contains the correct scrape configurations.
    • Double-check the file permissions (prometheus:prometheus ownership).
    • Check if port 9090 is in use by another service:
      sudo ss -tuln | grep 9090
  • Metrics Not Scraped:

    • Confirm that Prometheus can reach its targets by visiting the Status > Targets page in the web interface.
    • Verify network connectivity and firewall rules.
  • Prometheus Crashes Frequently:

    • Inspect system logs with dmesg for potential memory or CPU-related issues.
    • Monitor resource usage with tools like htop or vmstat.

13. Managing Prometheus with Systemd

Prometheus is now managed by systemd, which makes it easy to control its state:

Start Prometheus

sudo systemctl start prometheus

Stop Prometheus

sudo systemctl stop prometheus

Restart Prometheus

sudo systemctl restart prometheus

Enable Prometheus to Start on Boot

sudo systemctl enable prometheus

Disable Prometheus on Boot

sudo systemctl disable prometheus

Check Prometheus Status

sudo systemctl status prometheus

Tail Prometheus Logs

sudo journalctl -u prometheus.service -f

14. Best Practices for Prometheus Setup

  • Limit resource usage: Adjust Prometheus’s retention and scrape interval settings to avoid resource overload, especially on smaller servers.
  • Enable alerting: Use Prometheus alongside Alertmanager to notify you of issues in your infrastructure.
  • Monitor Prometheus itself: Use a self-monitoring setup where Prometheus scrapes its own metrics, providing insights into its performance and uptime.
  • Security considerations: Use a reverse proxy like NGINX to add TLS encryption and basic authentication for the Prometheus web interface.
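
The retention, scrape-interval, and self-monitoring points above can be sketched as a small shell script. It writes a candidate prometheus.yml to a scratch directory rather than to /etc/prometheus, so nothing live is touched. The 30s interval and the 15d retention value are illustrative assumptions, not recommendations from this guide.

```shell
#!/bin/sh
# Sketch: generate a minimal prometheus.yml that scrapes Prometheus itself.
# Assumptions (not from the guide): 30s interval, scratch output directory.
set -eu

OUT_DIR="${OUT_DIR:-$(mktemp -d)}"
CONFIG="$OUT_DIR/prometheus.yml"

cat > "$CONFIG" <<'EOF'
global:
  scrape_interval: 30s      # longer intervals reduce CPU, memory, and disk use
  evaluation_interval: 30s

scrape_configs:
  - job_name: 'prometheus'  # self-monitoring: Prometheus scrapes its own /metrics
    static_configs:
      - targets: ['localhost:9090']
EOF

# Validate with promtool when it is installed; skip otherwise.
if command -v promtool >/dev/null 2>&1; then
  promtool check config "$CONFIG"
fi

echo "Wrote $CONFIG"
# Retention is a command-line flag, not a prometheus.yml setting. To apply it,
# append to ExecStart in prometheus.service, for example:
#   --storage.tsdb.retention.time=15d
```

Every exporter job added later in this guide belongs under the same single scrape_configs: key, as additional list entries.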


Monitoring Networking, Security, NGINX/Apache Using Prometheus

Prometheus can monitor multiple services by integrating exporters, which are programs that expose metrics in a Prometheus-readable format. Here's how to monitor some key components.

Networking (Node Exporter)

Node Exporter collects system and hardware-level metrics (CPU, memory, disk, network). It is essential for monitoring networking performance.

  1. Install Node Exporter:

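    wget https://github.com/prometheus/node_exporter/releases/download/v1.8.2/node_exporter-1.8.2.linux-amd64.tar.gz
    tar -xvf node_exporter-1.8.2.linux-amd64.tar.gz
    sudo cp node_exporter-1.8.2.linux-amd64/node_exporter /usr/local/bin/
    # Create the unprivileged account that the service below runs as
    sudo useradd --no-create-home --shell /bin/false node_exporter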
  2. Create a Systemd Service for Node Exporter:

    sudo vim /etc/systemd/system/node_exporter.service

    Add the following configuration:

    [Unit]
    Description=Node Exporter
    After=network.target
     
    [Service]
    User=node_exporter
    ExecStart=/usr/local/bin/node_exporter
     
    [Install]
    WantedBy=multi-user.target
  3. Start and Enable Node Exporter:

    sudo systemctl daemon-reload
    sudo systemctl start node_exporter
    sudo systemctl enable node_exporter
  4. Add Node Exporter to Prometheus Scrape Targets: Edit /etc/prometheus/prometheus.yml:

    scrape_configs:
      - job_name: 'node'
        static_configs:
          - targets: ['localhost:9100']

Monitoring Security (Auditd Exporter)

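To monitor security events, you can use an auditd exporter for Prometheus. The release URL, version, and port below are examples; confirm them against the exporter project's own documentation before running the commands.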

  1. Install Auditd Exporter:

    wget https://github.com/ClusterLabs/auditd-prometheus/releases/download/v0.1.0/auditd-exporter-0.1.0.linux-amd64.tar.gz
    tar xvfz auditd-exporter-0.1.0.linux-amd64.tar.gz
    sudo cp auditd-exporter /usr/local/bin/
  2. Add Auditd Exporter to Prometheus Scrape Targets: Edit /etc/prometheus/prometheus.yml:

    scrape_configs:
      - job_name: 'auditd'
        static_configs:
          - targets: ['localhost:1234']

Monitoring NGINX/Apache (Exporter for NGINX/Apache)

  1. Install the NGINX Exporter:

    wget https://github.com/nginxinc/nginx-prometheus-exporter/releases/download/v1.3.0/nginx-prometheus-exporter_1.3.0_linux_amd64.tar.gz
    tar -xvf nginx-prometheus-exporter_1.3.0_linux_amd64.tar.gz
    sudo mv nginx-prometheus-exporter /usr/local/bin/
  2. Configure NGINX with Status Module: Add the following to your NGINX configuration:

    server {
        location /status {
            stub_status on;
            allow 127.0.0.1;
            deny all;
        }
    }
  3. Add NGINX Exporter to Prometheus Scrape Targets: Edit /etc/prometheus/prometheus.yml:

    scrape_configs:
      - job_name: 'nginx'
        static_configs:
          - targets: ['localhost:9113']
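
The steps above install the exporter and expose stub_status, but the exporter itself still has to be started and pointed at that endpoint. A minimal sketch of a systemd unit follows, written to a scratch directory for review. The nginx_exporter user name and the http://127.0.0.1/status URL are assumptions; they should match your own service account and the location block in your NGINX configuration.

```shell
#!/bin/sh
# Sketch: draft a systemd unit for nginx-prometheus-exporter.
# Assumptions: user "nginx_exporter", stub_status served at /status on port 80.
set -eu

OUT_DIR="${OUT_DIR:-$(mktemp -d)}"
UNIT="$OUT_DIR/nginx_exporter.service"
SCRAPE_URI="${SCRAPE_URI:-http://127.0.0.1/status}"

cat > "$UNIT" <<EOF
[Unit]
Description=NGINX Prometheus Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=nginx_exporter
ExecStart=/usr/local/bin/nginx-prometheus-exporter --nginx.scrape-uri=$SCRAPE_URI
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

echo "Draft unit written to $UNIT"
# After review (run manually):
#   sudo useradd -rs /bin/false nginx_exporter
#   sudo cp "$UNIT" /etc/systemd/system/
#   sudo systemctl daemon-reload
#   sudo systemctl enable --now nginx_exporter
```

Once the service is running, curl http://localhost:9113/metrics should return NGINX metrics, which is the target Prometheus scrapes.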

Best Practices for Prometheus

  • Secure Prometheus: Use firewall rules to restrict access to the Prometheus web interface to trusted IPs.
  • Retention Policy: Set appropriate data retention periods in Prometheus based on disk availability to prevent data overloading.
  • Alerting: Configure Alertmanager for Prometheus to send alerts based on certain thresholds (CPU usage, HTTP request errors, etc.).

Extensive Troubleshooting Section

  • High Memory/CPU Usage by Prometheus:
    • Use the prometheus_tsdb_wal_truncate_duration_seconds metric to monitor write-ahead log (WAL) operations.
    • Lower the --storage.tsdb.retention.time flag to shorten the data retention period and reduce resource usage.

  • Unable to Reach Prometheus Web Interface:

    • Check if port 9090 is open in the firewall using sudo ufw status.
    • Verify that Prometheus is running and bound to port 9090 using sudo ss -tuln | grep 9090.
  • Prometheus Cannot Scrape Targets:

    • Use curl to check if Prometheus can reach the exporter endpoints: curl http://localhost:9100/metrics.
    • Ensure that exporters are running correctly using sudo systemctl status <exporter>.service.

Monitoring Networking, Security, Web Servers (Nginx/Apache) Using Node Exporter


1. Install Node Exporter for System Metrics

Node Exporter is used to collect Linux system metrics such as CPU, memory, disk, and networking stats. It serves as an exporter for Prometheus and is essential for monitoring a system’s health and performance.

Step 1: Download Node Exporter

Begin by downloading the Node Exporter binary.

cd /tmp
curl -LO https://github.com/prometheus/node_exporter/releases/download/v1.6.0/node_exporter-1.6.0.linux-amd64.tar.gz

Step 2: Extract the Downloaded Archive

Once downloaded, extract the archive.

tar -xvf node_exporter-1.6.0.linux-amd64.tar.gz

Step 3: Move the Node Exporter Binary

Move the node_exporter binary to /usr/local/bin/ so it can be executed globally.

sudo mv node_exporter-1.6.0.linux-amd64/node_exporter /usr/local/bin/

Step 4: Create a Systemd Service for Node Exporter

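Create a systemd service file to manage Node Exporter as a service. The unit below runs as the nodeusr account, so create that user first if it does not already exist:

sudo useradd -rs /bin/false nodeusr

Then open the service file: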

sudo vim /etc/systemd/system/node_exporter.service

Add the following content to the service file:

[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target
 
[Service]
User=nodeusr
ExecStart=/usr/local/bin/node_exporter
Restart=on-failure
 
[Install]
WantedBy=multi-user.target

Step 5: Reload systemd and Start Node Exporter

Reload the systemd manager and enable the Node Exporter service to start automatically on boot.

sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl start node_exporter

Step 6: Verify Node Exporter Installation

To verify that Node Exporter is running and accessible on port 9100, use the following command:

curl http://localhost:9100/metrics

This will output various system metrics being monitored by Node Exporter.


2. Useful Commands for Node Exporter Management

Here are some useful commands to manage and troubleshoot Node Exporter:

Start Node Exporter Service

sudo systemctl start node_exporter

Stop Node Exporter Service

sudo systemctl stop node_exporter

Restart Node Exporter Service

sudo systemctl restart node_exporter

Check Node Exporter Status

sudo systemctl status node_exporter

View Node Exporter Logs

journalctl -u node_exporter

Ensure Node Exporter is Running on Boot

sudo systemctl enable node_exporter

3. Best Practices for Monitoring

  • Security: Make sure to firewall or restrict access to the Node Exporter port (9100). Use iptables, UFW, or cloud firewall rules to restrict access from your monitoring server only.

    sudo ufw allow from <Monitoring_Server_IP> to any port 9100
  • User Management: Run Node Exporter under a non-root user for security. For example, create a nodeusr as shown in the systemd service file.

    sudo useradd -rs /bin/false nodeusr
  • Resource Monitoring: Ensure that your Prometheus scrape configuration is set up to collect data from Node Exporter at the appropriate interval to avoid overloading the system.

  • Web Servers: For Nginx and Apache, monitor web server metrics by integrating additional exporters like nginx-vts-exporter or apache-exporter. This allows you to collect specific stats related to request rates, traffic, and errors.


4. Node Exporter and Prometheus Configuration

You’ll need to configure Prometheus to scrape metrics from Node Exporter. Here's an example Prometheus configuration for scraping Node Exporter:

scrape_configs:
  - job_name: 'node_exporter'
    static_configs:
      - targets: ['localhost:9100']

Make sure to modify the targets field if your Node Exporter is running on a different server.
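
When several servers run Node Exporter, the targets list grows by one host:port entry per server. The helper below is a hypothetical convenience (node_scrape_job is not part of Prometheus) that prints a scrape job for any number of hosts, assuming each exporter listens on the default port 9100. The address 10.0.0.5 is only an example.

```shell
#!/bin/sh
# Sketch: print a node_exporter scrape job for the hosts given as arguments.
# node_scrape_job is a hypothetical helper name; 9100 is the exporter default port.
set -eu

node_scrape_job() {
  targets=""
  for host in "$@"; do
    # Build a list of quoted 'host:9100' entries separated by ", "
    targets="${targets:+$targets, }'${host}:9100'"
  done
  printf '%s\n' \
    "  - job_name: 'node_exporter'" \
    "    static_configs:" \
    "      - targets: [$targets]"
}

# Example: paste the output under scrape_configs: in /etc/prometheus/prometheus.yml
node_scrape_job localhost 10.0.0.5
```

After editing prometheus.yml, promtool check config /etc/prometheus/prometheus.yml confirms that the result is still valid.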


5. Troubleshooting Node Exporter Issues

Issue: Node Exporter Service Fails to Start

If Node Exporter fails to start, check the logs for errors:

journalctl -u node_exporter

Common issues include incorrect permissions or the binary not being found. Verify the file path in /usr/local/bin/node_exporter.

Issue: Cannot Access Metrics on Port 9100

Check if the port is open using:

sudo ss -tuln | grep 9100

Ensure that firewalls (UFW, iptables) are not blocking the connection.

Issue: Prometheus Cannot Scrape Node Exporter

If Prometheus cannot scrape metrics from Node Exporter, verify that the Node Exporter instance is running by curling the metrics URL:

curl http://<node_exporter_host>:9100/metrics

Double-check Prometheus's scrape configuration in /etc/prometheus/prometheus.yml to ensure the correct host and port are being used.

Issue: Metrics Not Appearing in Prometheus

If Node Exporter metrics are not appearing in Prometheus, check the Prometheus logs for scrape errors.

sudo journalctl -u prometheus

Ensure that Node Exporter is listed under the targets tab in the Prometheus web UI (http://<prometheus_host>:9090/targets).


6. Conclusion

By following this guide, you will have a working Node Exporter setup monitoring your system metrics. This tool is especially useful for monitoring servers that are running Nginx, Apache, or other services that require performance tracking. It’s also a critical tool in a full-fledged security and networking monitoring solution. Make sure to secure the Node Exporter endpoint and continuously review metrics for potential performance issues.

Alertmanager Setup


Install Alertmanager

  1. Download Alertmanager

    Visit the Alertmanager releases page on GitHub to find the latest version. Use wget or curl to download the appropriate tarball. For example, to download version 0.27.0:

    wget https://github.com/prometheus/alertmanager/releases/download/v0.27.0/alertmanager-0.27.0.linux-amd64.tar.gz
  2. Extract the Tarball

    Extract the downloaded file:

    tar -xvf alertmanager-0.27.0.linux-amd64.tar.gz
  3. Move Binaries to /usr/local/bin

    Move the Alertmanager binaries to a directory in your PATH:

    sudo mv alertmanager-0.27.0.linux-amd64/alertmanager /usr/local/bin/
    sudo mv alertmanager-0.27.0.linux-amd64/amtool /usr/local/bin/
  4. Create a Configuration Directory

    Create a directory for Alertmanager’s configuration files:

    sudo mkdir /etc/alertmanager
  5. Create a Basic Configuration File

    Create and edit the alertmanager.yml file:

    sudo vim /etc/alertmanager/alertmanager.yml

    Add a basic configuration:

    global:
      resolve_timeout: 5m
     
    route:
      receiver: 'default-receiver'
     
    receivers:
      - name: 'default-receiver'
        webhook_configs:
          - url: 'http://127.0.0.1:5001/'  # placeholder: point this at your own webhook receiver

    Save and exit the editor.
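
The webhook receiver above is only a starting point. Since this guide already sets up a Gmail relay, an email receiver is a natural next step, and the sketch below drafts one in a scratch directory. The addresses and the password are hypothetical placeholders, and the smarthost mirrors the relayhost used in the Postfix section.

```shell
#!/bin/sh
# Sketch: draft an alertmanager.yml with an email receiver.
# All addresses and the password below are placeholders, not real values.
set -eu

OUT_DIR="${OUT_DIR:-$(mktemp -d)}"
AM_CONFIG="$OUT_DIR/alertmanager.yml"

cat > "$AM_CONFIG" <<'EOF'
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.gmail.com:587'
  smtp_from: 'alerts@example.com'
  smtp_auth_username: 'alerts@example.com'
  smtp_auth_password: 'app-password-here'

route:
  receiver: 'email-receiver'

receivers:
  - name: 'email-receiver'
    email_configs:
      - to: 'oncall@example.com'
EOF

# amtool ships with Alertmanager; use it to validate when available.
if command -v amtool >/dev/null 2>&1; then
  amtool check-config "$AM_CONFIG"
fi

echo "Draft written to $AM_CONFIG"
```

Because the file holds a credential, restrict it to the alertmanager user once it is copied to /etc/alertmanager.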


Create a Systemd Service for Alertmanager

  1. Create a Service File

    Create a Systemd service file for Alertmanager:

    sudo vim /etc/systemd/system/alertmanager.service

    Add the following content:

    [Unit]
    Description=Alertmanager
    Documentation=https://prometheus.io/docs/alerting/latest/alertmanager/
    After=network-online.target
     
    [Service]
    User=alertmanager
    Group=alertmanager
    ExecStart=/usr/local/bin/alertmanager \
      --config.file=/etc/alertmanager/alertmanager.yml \
      --storage.path=/var/lib/alertmanager
    Restart=on-failure
     
    [Install]
    WantedBy=multi-user.target

    Save and exit the editor.

  2. Create a System User for Alertmanager

    Create a user for running Alertmanager:

    sudo useradd -r -s /sbin/nologin alertmanager
  3. Set Permissions

    Change ownership of the configuration directory:

    sudo chown -R alertmanager:alertmanager /etc/alertmanager

    Create a directory for data storage and set permissions:

    sudo mkdir /var/lib/alertmanager
    sudo chown alertmanager:alertmanager /var/lib/alertmanager
  4. Enable and Start the Service

    Enable and start the Alertmanager service:

    sudo systemctl daemon-reload
    sudo systemctl enable alertmanager
    sudo systemctl start alertmanager
  5. Verify the Service

    Check the status of the Alertmanager service:

    sudo systemctl status alertmanager

    Check the logs if needed:

    sudo journalctl -u alertmanager

Troubleshooting Alertmanager

  1. Service Not Starting

    If Alertmanager fails to start, check the service status and logs for errors:

    sudo systemctl status alertmanager
    sudo journalctl -u alertmanager

    Common issues include incorrect file paths, permissions issues, or configuration errors. Verify that all paths in the service file and configuration file are correct.

  2. Configuration Issues

    Verify that your alertmanager.yml configuration file is properly formatted. The amtool binary installed earlier can validate it with amtool check-config /etc/alertmanager/alertmanager.yml; a YAML linter will also catch plain syntax errors.

  3. Port Conflicts

    Ensure that no other service is using the ports required by Alertmanager. By default, Alertmanager listens on port 9093. Check for port conflicts using:

    sudo ss -tuln | grep 9093
  4. Permission Errors

    Make sure the Alertmanager binary and configuration files have the correct ownership and permissions. For example, ensure the alertmanager user has access to the configuration files and storage directory.

  5. Network Issues

    Verify network connectivity if Alertmanager is not receiving alerts or if it cannot communicate with other services. Use curl or telnet to test connectivity to relevant ports:

    curl http://localhost:9093
    telnet localhost 9093

By following these steps, you should be able to set up Alertmanager successfully and troubleshoot common issues that may arise.

Grafana Setup: Installation and Configuration Guide

Overview

Grafana is an open-source platform for monitoring and observability, allowing you to visualize and analyze data from various sources. This guide will walk you through installing Grafana, configuring it, and troubleshooting common issues.

Installing Grafana

Prerequisites

  • A server running Ubuntu 20.04 or later.
  • Root or sudo privileges on the server.

Steps

  1. Update Your System

    sudo apt update
    sudo apt upgrade -y
  2. Install Grafana

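    • Install prerequisites
      sudo apt install -y apt-transport-https software-properties-common wget
    • Add the GPG key
      sudo mkdir -p /etc/apt/keyrings
      wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
    • Add Grafana APT repository
      echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list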
    • Install Grafana
      sudo apt update
      sudo apt install -y grafana
  3. Start and Enable Grafana Service

    • Start the Grafana service
      sudo systemctl start grafana-server
    • Enable Grafana to start on boot
      sudo systemctl enable grafana-server
  4. Verify Installation

    • Check Grafana status
      sudo systemctl status grafana-server
    • Access Grafana UI: Open a web browser and go to http://<your-server-ip>:3000. The default login is admin/admin.

Configuring Grafana

Adding a Data Source

  1. Log into Grafana

    • Navigate to http://<your-server-ip>:3000 and log in with the default credentials.
  2. Add a Data Source

    • Go to Configuration (gear icon) -> Data Sources -> Add data source.
    • Select the type of data source you want to add (e.g., Prometheus).
    • Enter the necessary configuration details for the data source:
      • URL: The URL of the data source (e.g., http://localhost:9090 for Prometheus).
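      • Access: Choose "Server" (the default) so that the Grafana backend proxies requests to the data source. "Browser" makes the user's browser contact the data source directly and is deprecated in recent Grafana releases.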
    • Click Save & Test to ensure the data source is connected successfully.

Creating a Dashboard

  1. Create a New Dashboard

    • Click the + icon in the sidebar and select Dashboard.
    • Click Add new panel to start adding panels to your dashboard.
  2. Configure Panels

    • Select Data Source: Choose the data source you added earlier.
    • Set Up Queries: Enter the queries to fetch data.
    • Choose Visualization: Select the type of visualization (e.g., graph, table).
    • Customize: Adjust settings, labels, and thresholds as needed.
  3. Save the Dashboard

    • Click Save (disk icon) at the top of the page.
    • Enter a name and choose a folder to save your dashboard.

Troubleshooting

Grafana Service Issues

  1. Check Service Status

    sudo systemctl status grafana-server
  2. Restart Grafana Service

    sudo systemctl restart grafana-server
  3. Review Logs

    • View logs for troubleshooting:
      sudo journalctl -u grafana-server

Access Issues

  1. Verify Firewall Settings

    • Ensure port 3000 is open.
      sudo ufw allow 3000/tcp
      sudo ufw reload
  2. Check Network Configuration

    • Ensure Grafana is accessible from the network.
    • Verify the server’s IP address and hostname.

Data Source Problems

  1. Test Data Source Connection

    • Go to Configuration -> Data Sources.
    • Click Save & Test for the data source.
  2. Check Data Source URL and Credentials

    • Ensure the URL and credentials are correctly configured.
  3. Examine Data Source Logs

    • Check logs for the data source service (e.g., Prometheus) for connection errors.

Dashboard Display Issues

  1. Check Panel Queries

    • Ensure the queries are correct and return data.
    • Test queries in the data source directly to verify.
  2. Inspect Panel Configuration

    • Ensure panel settings match your data and visualization requirements.

Best Practices

  1. Secure Your Grafana Installation

    • Change default admin password immediately after installation.
    • Set up HTTPS for secure access to Grafana.
  2. Regular Backups

    • Backup your Grafana database and configuration regularly.
  3. Update Regularly

    • Keep Grafana and plugins updated to the latest versions for security and new features.
  4. Monitor Grafana Logs

    • Regularly check Grafana logs to catch and resolve potential issues early.

By following this guide, you should have a functional Grafana setup tailored to your monitoring and visualization needs. If you encounter specific issues or need further customization, consult the Grafana documentation or seek support from the Grafana community.


Connecting Grafana to Prometheus:

Overview

Grafana can be used to visualize metrics collected by Prometheus, a powerful monitoring and alerting toolkit. This guide will provide a detailed walkthrough on how to connect Grafana to Prometheus, create dashboards, and import pre-built dashboards.

Add Prometheus as a Data Source

Prerequisites

  • Grafana and Prometheus should be installed and running.

Steps

  1. Access Grafana

    • Open your web browser and navigate to http://<your-grafana-server-ip>:3000.
    • Log in with your Grafana credentials (default is admin/admin).
  2. Add Prometheus Data Source

    • Click on the Configuration (gear icon) in the left sidebar.
    • Select Data Sources.
    • Click Add data source.
  3. Configure Prometheus Data Source

    • Select Prometheus: In the data source options, choose Prometheus.
    • Set URL: Enter the URL where Prometheus is running, typically http://localhost:9090 if it's on the same server.
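    • Access Method: Choose Server (the default) so that the Grafana backend proxies requests to Prometheus; this works whether Prometheus is local or remote, as long as the Grafana server can reach it. Browser makes the user's browser query Prometheus directly and is deprecated in recent Grafana releases.
    • Optional Settings:
      • HTTP Method: The default usually works fine.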
      • Scrape Interval: Configure if needed based on your Prometheus scrape interval.
  4. Save and Test Connection

    • Click Save & Test to verify that Grafana can connect to Prometheus.
    • Ensure you see a message indicating the connection is successful.
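
The same data source can also be declared in a file instead of through the UI, using Grafana's provisioning mechanism. The sketch below drafts such a file in a scratch directory. The file name prometheus.yaml is an arbitrary choice, and the URL assumes Prometheus runs on the same host as Grafana.

```shell
#!/bin/sh
# Sketch: draft a Grafana data source provisioning file for Prometheus.
set -eu

OUT_DIR="${OUT_DIR:-$(mktemp -d)}"
DS_FILE="$OUT_DIR/prometheus.yaml"   # arbitrary file name

cat > "$DS_FILE" <<'EOF'
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy               # the "Server" access mode in the UI
    url: http://localhost:9090  # assumes Prometheus is on the same host
    isDefault: true
EOF

echo "Draft written to $DS_FILE"
# To apply (run manually):
#   sudo cp "$DS_FILE" /etc/grafana/provisioning/datasources/
#   sudo systemctl restart grafana-server
```

A provisioned data source is recreated on every Grafana start, which makes the setup reproducible across servers.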

Create Dashboards

Steps

  1. Create a New Dashboard

    • Click on the + icon in the left sidebar and select Dashboard.
    • Click Add new panel to start configuring a panel.
  2. Configure Panels

    • Select Prometheus Data Source: In the panel editor, select Prometheus as the data source.
    • Build Queries:
      • Use Prometheus query language (PromQL) to fetch metrics. For example, to visualize CPU usage as a percentage per instance:
        100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
    • Choose Visualization Type: Select the type of visualization such as graph, gauge, or table.
    • Customize Panel:
      • Panel Title: Enter a meaningful title for the panel.
      • Axes: Configure X and Y axes as needed.
      • Thresholds: Set thresholds to highlight specific data ranges.
  3. Save the Dashboard

    • Click Save (disk icon) at the top.
    • Enter a name for the dashboard and select a folder for organization.

Import Pre-built Dashboards

Steps

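  1. Find Pre-built Dashboards

    • Browse the dashboard library on grafana.com for community dashboards that match your exporters (for example, Node Exporter). Note the dashboard ID or download its JSON file.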
  2. Import Dashboard into Grafana

    • Click on the + icon in the left sidebar and select Import.
    • Upload JSON File:
      • You can either upload a JSON file directly or paste the JSON content into the text area.
    • Enter Dashboard ID:
      • If you have a dashboard ID from Grafana’s repository, you can enter it here and click Load.
    • Select Data Source:
      • Ensure the Prometheus data source is selected or configured if it is not automatically detected.
    • Click Import to add the dashboard to Grafana.

Troubleshooting

Connection Issues

  1. Verify Prometheus URL

    • Ensure that the URL you configured in Grafana matches the Prometheus server URL and port.
  2. Check Prometheus Status

    • Ensure Prometheus is running and accessible.
      curl http://localhost:9090/metrics
  3. Firewall and Network Configuration

    • Make sure there are no firewall rules blocking access to Prometheus or Grafana.

Dashboard Issues

  1. Verify PromQL Queries

    • Ensure your PromQL queries are correct. Test queries directly in Prometheus’ web UI.
  2. Check Panel Settings

    • Make sure the panel configurations match the data you are querying. Adjust visualization settings if the data does not appear correctly.
  3. Inspect Dashboard JSON

    • If importing a dashboard fails, check the JSON file for errors or incompatibilities with your Grafana version.

Data Source Configuration

  1. Review Prometheus Data Source Settings

    • Ensure all required fields are correctly filled out.
    • Check for any error messages in Grafana’s data source settings page.
  2. Check Grafana Logs

    • Review Grafana logs for any errors related to data source connections.
      sudo journalctl -u grafana-server
  3. Update Grafana

    • Ensure you are using the latest version of Grafana to avoid issues related to outdated features or bugs.

Best Practices

  1. Secure Access

    • Configure user roles and permissions in Grafana to control access to sensitive data.
  2. Optimize Queries

    • Write efficient PromQL queries to reduce load on Prometheus and improve dashboard performance.
  3. Regular Backups

    • Backup your Grafana dashboards and configurations regularly to prevent data loss.
  4. Keep Updated

    • Regularly update both Grafana and Prometheus to the latest versions for new features and security patches.
  5. Document Dashboards

    • Provide clear documentation for custom dashboards, including the purpose and key metrics.

By following this guide, you should be able to effectively connect Grafana to Prometheus, create and manage dashboards, and troubleshoot common issues.

Prometheus and Alertmanager Integration:

Integrating Alertmanager with Prometheus

Step 1: Configure Prometheus to Use Alertmanager

  1. Edit Prometheus Configuration File
    Update the prometheus.yml configuration file to add Alertmanager details.

    alerting:
      alertmanagers:
        - static_configs:
            - targets:
              - 'localhost:9093'  # Adjust if Alertmanager is on a different host or port
     
  2. Reload Prometheus Configuration
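    Restart Prometheus to apply the new configuration. The HTTP reload endpoint (curl -X POST http://localhost:9090/-/reload) only works when Prometheus is started with --web.enable-lifecycle, which the service file in this guide does not set.

    sudo systemctl restart prometheus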
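
Alertmanager only receives alerts that Prometheus rules produce, so at least one rule file is needed. A minimal sketch follows: it drafts a rule that fires when any scrape target has been down for five minutes. The file name alert_rules.yml, the group name, and the five-minute threshold are assumptions.

```shell
#!/bin/sh
# Sketch: draft a Prometheus alerting rule file.
# Assumptions: file name alert_rules.yml, group "availability", 5m threshold.
set -eu

OUT_DIR="${OUT_DIR:-$(mktemp -d)}"
RULES="$OUT_DIR/alert_rules.yml"

cat > "$RULES" <<'EOF'
groups:
  - name: availability
    rules:
      - alert: InstanceDown
        expr: up == 0          # "up" is 0 when a scrape target is unreachable
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
EOF

# Validate with promtool when it is installed.
if command -v promtool >/dev/null 2>&1; then
  promtool check rules "$RULES"
fi

echo "Draft written to $RULES"
# To use it, copy the file to /etc/prometheus/ and reference it in prometheus.yml:
#   rule_files:
#     - /etc/prometheus/alert_rules.yml
```

Stopping an exporter for longer than the threshold is a simple way to confirm that the alert reaches Alertmanager.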

Troubleshooting

Common Issues and Solutions

  1. Alertmanager Not Starting
    Check Logs:

    sudo journalctl -u alertmanager

    Solution:

    • Ensure the configuration file has valid syntax.
    • Verify that the alertmanager user has appropriate permissions.
  2. Alerts Not Sent to Email
    Check Email Configuration:

    grep email_configs /etc/alertmanager/alertmanager.yml

    Solution:

    • Confirm that SMTP settings are correct.
    • Check network connectivity to the SMTP server.
  3. Prometheus Not Receiving Alerts
    Check Prometheus Logs:

    sudo journalctl -u prometheus

    Solution:

    • Verify that Prometheus is correctly configured to communicate with Alertmanager.
    • Check for any network issues between Prometheus and Alertmanager.
  4. Alertmanager Not Receiving Alerts
    Verify Configuration:

    curl http://localhost:9093/api/v2/status

    Solution:

    • Ensure that the alerting rules in Prometheus are firing correctly.
    • Confirm that Alertmanager is correctly listed in Prometheus’s configuration.

Best Practices

  1. Secure Communication
    Use HTTPS for secure communication between Prometheus and Alertmanager.

  2. Backup Configuration
    Regularly back up your Alertmanager configuration and data.

  3. Monitoring
    Set up monitoring for Alertmanager to ensure it is running as expected.

  4. Alert Testing
    Regularly test your alerting setup to ensure alerts are delivered correctly.
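
One way to test delivery end to end is to push a short-lived alert directly into Alertmanager and watch for the notification. The sketch below assumes Alertmanager listens on localhost:9093 and uses its v2 alerts endpoint; the alert name, severity label, and summary text are illustrative placeholders, not values from this guide.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def build_test_alert(name="TestAlert", severity="warning", minutes=5):
    """Return a one-element alert list in the shape the v2 alerts endpoint accepts."""
    now = datetime.now(timezone.utc)
    return [{
        "labels": {"alertname": name, "severity": severity},
        "annotations": {"summary": "Manual test alert"},
        "startsAt": now.isoformat(),
        # The alert resolves on its own after the given number of minutes
        "endsAt": (now + timedelta(minutes=minutes)).isoformat(),
    }]


def send_alert(alerts, base_url="http://localhost:9093"):
    """POST the alerts to Alertmanager and return the HTTP status code."""
    request = urllib.request.Request(
        base_url + "/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Calling send_alert(build_test_alert()) on the Alertmanager host should make the alert appear in the Alertmanager UI and follow the configured routing to a receiver.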


This guide should help you integrate Alertmanager with Prometheus effectively. For more details, consult the Prometheus documentation and the Alertmanager documentation.


Monitoring Ubuntu Services with Prometheus and Grafana


Monitoring Specific Ubuntu Services


Nginx Monitoring

1. Install the Nginx Exporter

  • Command:
    sudo apt-get install -y prometheus-nginx-exporter
  • Explanation: Installs the Nginx exporter, which exposes Nginx metrics in a format compatible with Prometheus.

2. Configure the Nginx Exporter

  • Command:
    sudo vim /etc/default/prometheus-nginx-exporter
  • Add the following configuration:
    WEB_SERVER_HOST=localhost
    WEB_SERVER_PORT=80
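  • Explanation: Tells the exporter where to reach Nginx. The exporter reads Nginx's stub_status page, so stub_status must be enabled in the Nginx configuration. The variable names this file accepts depend on the packaged exporter version, so check the comments in the file.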

3. Restart and Enable the Exporter Service

  • Commands:
    sudo systemctl restart prometheus-nginx-exporter
    sudo systemctl enable prometheus-nginx-exporter
  • Explanation: Applies configuration changes and ensures the exporter starts on boot.

4. Add the Exporter to Prometheus

  • Configuration:
    scrape_configs:
      - job_name: 'nginx'
        static_configs:
          - targets: ['localhost:9113']
  • Explanation: Prometheus will scrape Nginx metrics from the exporter.

5. Troubleshooting

  • Check Exporter Status:
    sudo systemctl status prometheus-nginx-exporter
  • Verify Metrics:
    curl http://localhost:9113/metrics
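
The output of the metrics check is plain text, so it can be inspected programmatically. The following sketch parses that text format into a dictionary; it assumes samples carry no trailing timestamps, and the sample text is an illustrative stand-in for real exporter output.

```python
def parse_metrics(text):
    """Parse Prometheus text exposition output into {series: value}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the HELP/TYPE comment lines
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated token (no timestamps assumed)
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


# Illustrative sample; real output contains many more series
SAMPLE = """\
# HELP nginx_up Status of the last metric scrape
# TYPE nginx_up gauge
nginx_up 1
# HELP nginx_connections_active Active client connections
# TYPE nginx_connections_active gauge
nginx_connections_active 3
"""

metrics = parse_metrics(SAMPLE)
print(metrics)
```

A value of 0 for an "up"-style gauge would suggest that the exporter is running but cannot reach the service it monitors.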

MySQL Monitoring

1. Install the MySQL Exporter

  • Command:
    sudo apt-get install -y prometheus-mysqld-exporter
  • Explanation: Installs the MySQL exporter, which exposes MySQL metrics for Prometheus. On Ubuntu, the package and its systemd unit are named prometheus-mysqld-exporter.

2. Configure MySQL for Exporting Metrics

  • Command:
    sudo vim /etc/mysql/my.cnf
  • Add under [mysqld]:
    [mysqld]
    performance_schema=ON
  • Explanation: Enables the performance schema for detailed metrics collection. The exporter also needs a MySQL account to log in with, typically one granted PROCESS, REPLICATION CLIENT, and SELECT.

3. Restart MySQL

  • Command:
    sudo systemctl restart mysql
  • Explanation: Applies the configuration changes to MySQL.

4. Start and Enable the MySQL Exporter

  • Commands:
    sudo systemctl start prometheus-mysqld-exporter
    sudo systemctl enable prometheus-mysqld-exporter
  • Explanation: Starts the exporter and ensures it will start on system boot.

5. Add the Exporter to Prometheus

  • Configuration:
    scrape_configs:
      - job_name: 'mysql'
        static_configs:
          - targets: ['localhost:9104']
  • Explanation: Prometheus scrapes MySQL metrics from the exporter.

6. Troubleshooting

  • Check Exporter Status:
    sudo systemctl status prometheus-mysqld-exporter
  • Verify Metrics:
    curl http://localhost:9104/metrics

Docker Monitoring

1. Install the Docker Exporter

  • Command:
    sudo docker run -d --name=docker-exporter -p 9323:9323 prom/docker-exporter
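  • Explanation: Runs the exporter in a detached container and publishes its metrics on host port 9323. Confirm that the image name is available in your registry before relying on it; the Docker daemon can also serve Prometheus metrics on this port natively through the metrics-addr setting in /etc/docker/daemon.json.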

2. Add the Exporter to Prometheus

  • Configuration:
    scrape_configs:
      - job_name: 'docker'
        static_configs:
          - targets: ['localhost:9323']
  • Explanation: Configures Prometheus to scrape Docker metrics.

3. Troubleshooting

  • Check Container Logs:
    sudo docker logs docker-exporter
  • Verify Metrics:
    curl http://localhost:9323/metrics

Advanced Monitoring Setup


Custom Prometheus Exporters

1. Develop a Custom Exporter

  • Example in Python:
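    from prometheus_client import start_http_server, Gauge
    import random
    import time
     
    # Gauge whose value is refreshed on every loop iteration
    g = Gauge('custom_metric', 'A custom metric')
     
    def collect_metrics():
        # Replace the random value with a real measurement
        g.set(random.random())
     
    if __name__ == '__main__':
        # Serve metrics at http://localhost:8000/metrics
        start_http_server(8000)
        while True:
            collect_metrics()
            time.sleep(5)  # pause between updates instead of busy-looping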
  • Explanation: Creates a simple custom exporter that serves a random metric.

2. Add the Exporter to Prometheus

  • Configuration:
    scrape_configs:
      - job_name: 'custom'
        static_configs:
          - targets: ['localhost:8000']
  • Explanation: Prometheus scrapes metrics from the custom exporter.

3. Troubleshooting

  • Check Exporter Logs:
    tail -f /var/log/custom_exporter.log
  • Verify Metrics:
    curl http://localhost:8000/metrics
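
When verifying the endpoint, it helps to know roughly what a healthy response looks like. The sketch below hand-renders the text format for a single unlabeled gauge using only the standard library; it is an approximation for illustration, and the real client library also emits its own process and runtime metrics alongside the custom one.

```python
def render_gauge(name, help_text, value):
    """Return exposition-format text for one unlabeled gauge."""
    return (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} gauge\n"
        f"{name} {value}\n"
    )


# 0.42 is an arbitrary example value
print(render_gauge("custom_metric", "A custom metric", 0.42), end="")
```

If the curl output contains no line starting with custom_metric, the exporter is probably not updating or not registering the gauge.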

Monitoring via SNMP

1. Install SNMP Exporter

  • Command:
    sudo apt-get install -y prometheus-snmp-exporter
  • Explanation: Installs the SNMP exporter for Prometheus. On Ubuntu, the package is named prometheus-snmp-exporter.

2. Configure SNMP Exporter

  • Command:
    sudo vim /etc/snmp-exporter/snmp.yml
  • Example Configuration:
    modules:
      if_mib:
        walk:
          - 1.3.6.1.2.1.2
        metrics:
          - name: if_mib_if_speed
            oid: 1.3.6.1.2.1.2.2.1.5
            type: gauge
  • Explanation: Configures the SNMP exporter to monitor specific metrics.

3. Add SNMP Exporter to Prometheus

  • Configuration:
    scrape_configs:
      - job_name: 'snmp'
        static_configs:
          - targets: ['localhost:9116']
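  • Explanation: This job scrapes the exporter's own /metrics endpoint. To collect metrics from SNMP devices, the job also needs metrics_path: /snmp, a module parameter, and relabeling that passes each device address to the exporter as the target parameter.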

4. Troubleshooting

  • Check Exporter Logs:
    sudo tail -f /var/log/snmp_exporter.log
  • Verify Metrics:
    curl http://localhost:9116/metrics

Security Best Practices


Securing Prometheus and Grafana

1. Secure Prometheus with HTTPS

  • Generate SSL Certificates:
    sudo openssl req -new -newkey rsa:2048 -days 365 -nodes -x509 -keyout /etc/prometheus/prometheus.key -out /etc/prometheus/prometheus.crt
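  • Create a Web Configuration File (for example, /etc/prometheus/web-config.yml):
    tls_server_config:
      cert_file: /etc/prometheus/prometheus.crt
      key_file: /etc/prometheus/prometheus.key
  • Enable It: Start Prometheus with --web.config.file=/etc/prometheus/web-config.yml. TLS settings belong in this separate file, not in prometheus.yml, and the key file must be readable by the prometheus user.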

2. Secure Grafana with Authentication

  • Configure Authentication:
    sudo vim /etc/grafana/grafana.ini
  • Update Settings:
    [auth]
    disable_login_form = false

Managing User Access in Grafana

1. Configure User Roles

  • Command:
    sudo vim /etc/grafana/grafana.ini
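  • Set the Initial Admin Password (read only on Grafana's first start; use grafana-cli admin reset-admin-password to change it later):
    [security]
    admin_password = newadminpassword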

2. Set Up Teams and Permissions

  • In Grafana UI:
    • Navigate to: Configuration > Teams
    • Create Teams and Assign Roles

3. Troubleshooting

  • Check User Roles and Permissions:
    curl -X GET http://localhost:3000/api/teams/search -H "Authorization: Bearer YOUR_API_KEY"

Regex for Command-Line Monitoring and Log Parsing

Introduction

Regular Expressions (Regex) are powerful tools used for searching, matching, and manipulating text. They are particularly useful in command-line environments for monitoring and parsing logs. This guide provides a comprehensive tutorial on regex basics, advanced examples, practical applications, and a primer on using awk for regex-based processing.

Basics of Regex

What is Regex?

Regular Expressions (Regex) are sequences of characters that define a search pattern. They are used for text processing, such as finding specific strings or patterns within text.

Basic Syntax and Special Characters

  • .: Matches any single character except a newline.
  • *: Matches zero or more occurrences of the preceding character or group.
  • +: Matches one or more occurrences of the preceding character or group.
  • ?: Matches zero or one occurrence of the preceding character or group.
  • []: Matches any one of the enclosed characters.
  • ^: Matches the start of a string.
  • $: Matches the end of a string.
  • |: Acts as an OR operator between patterns.
  • \: Escapes special characters.
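
The special characters above behave the same way in most regex engines. As a quick illustration, the following Python sketch applies several of them to a handful of made-up log lines; the lines and the helper name are hypothetical and exist only for this example.

```python
import re

# Hypothetical log lines used only for illustration
LINES = [
    "error: disk full",
    "warning: disk at 91.5%",
    "fatal error",
    "ok",
]


def matching(pattern):
    """Return the sample lines in which the pattern is found."""
    return [line for line in LINES if re.search(pattern, line)]


starts_with_error = matching(r"^error")        # ^ anchors to the start
ends_with_error = matching(r"error$")          # $ anchors to the end
error_or_warning = matching(r"error|warning")  # | matches either pattern
has_percentage = matching(r"[0-9]+%")          # one or more digits, then %
has_literal_dot = matching(r"\.")              # \ makes the dot literal
two_chars_from_o = matching(r"^o.$")           # . matches any one character

print(starts_with_error, ends_with_error, has_percentage)
```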

Examples

Matching Simple Text

To match the word "error" in a log file:

grep 'error' logfile.log

Using Wildcards

To find lines that contain any digit:

grep '[0-9]' logfile.log

Advanced Regex Examples

Character Classes

  • \d: Matches any digit (equivalent to [0-9]). With grep, \d requires the -P (PCRE) option.
  • \D: Matches any non-digit.
  • \w: Matches any word character (equivalent to [a-zA-Z0-9_]).
  • \W: Matches any non-word character.
  • \s: Matches any whitespace character.
  • \S: Matches any non-whitespace character.

Matching IP Addresses

To match an IPv4 address:

grep -P '\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b' logfile.log
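
An address pattern like this is easy to get subtly wrong, so it is worth checking against known good and bad inputs before running it on real logs. The Python sketch below builds the pattern from a single octet expression with word boundaries around the whole address; the function name and sample strings are illustrative.

```python
import re

# One octet: 250-255, 200-249, or 0-199 (leading zeros tolerated)
OCTET = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"

# Four octets separated by literal dots, bounded on both sides
IPV4 = re.compile(r"\b" + OCTET + r"(?:\." + OCTET + r"){3}\b")


def find_ips(text):
    """Return every IPv4-looking address found in the text."""
    return IPV4.findall(text)


print(find_ips("client 192.168.1.10 connected via 10.0.0.1"))
```

Out-of-range octets such as 256 or 999 are rejected, but a longer dotted sequence like 1.2.3.4.5 would still yield 1.2.3.4, so stricter validation may be needed for untrusted input.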

Extracting Dates

To extract dates in the format YYYY-MM-DD:

grep -oE '[0-9]{4}-[0-9]{2}-[0-9]{2}' logfile.log
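
A digit-count pattern also accepts impossible dates such as 2024-13-01. One possible refinement, sketched here in Python, is to use the regex only to find candidates and then let a date parser reject the invalid ones; the function name and sample text are illustrative.

```python
import re
from datetime import datetime

DATE = re.compile(r"\b[0-9]{4}-[0-9]{2}-[0-9]{2}\b")


def extract_valid_dates(text):
    """Return YYYY-MM-DD substrings that are real calendar dates."""
    valid = []
    for candidate in DATE.findall(text):
        try:
            datetime.strptime(candidate, "%Y-%m-%d")
        except ValueError:
            continue  # right shape, but not a real date
        valid.append(candidate)
    return valid


print(extract_valid_dates("2024-02-29 backup ok, 2024-13-01 bogus"))
```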

Grouping and Capturing

Use parentheses to create groups and capture text:

  • (pattern): Captures the matched text within the parentheses.

Extracting Error Codes

To capture error codes in the format [ERROR: CODE]:

grep -oP '\[ERROR: \K\d{3}' logfile.log
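
The same extraction can be written with an explicit capture group, which is the mechanism described at the start of this section. This Python sketch assumes three-digit codes inside a closing bracket; the function name and sample lines are illustrative.

```python
import re

# Group 1 captures the three-digit code between "[ERROR: " and "]"
ERROR_CODE = re.compile(r"\[ERROR: (\d{3})\]")


def error_codes(lines):
    """Return the captured code from every line that contains one."""
    codes = []
    for line in lines:
        match = ERROR_CODE.search(line)
        if match:
            codes.append(match.group(1))
    return codes


print(error_codes(["[ERROR: 404] not found", "[INFO] ok"]))
```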

Primer for awk

What is awk?

awk is a versatile programming language designed for pattern scanning and processing. It is commonly used for extracting and manipulating text data, especially in conjunction with regular expressions.

Basic Syntax

awk 'pattern { action }' file

Key Features

  • Pattern Matching: awk can match patterns using regex.
  • Field Processing: awk processes text files line by line, treating whitespace-separated text as fields.

Examples

Print Specific Fields

To print the first and third fields from a file:

awk '{ print $1, $3 }' file.txt

Filter Lines with Regex

To print lines where the second field contains the word "error":

awk '$2 ~ /error/' file.txt

Using awk with Regex

To extract lines matching a specific pattern and print the second field:

awk '/pattern/ { print $2 }' file.txt

Advanced awk Usage

To sum values in the third column where the first column matches "status":

awk '$1 == "status" { sum += $3 } END { print sum }' file.txt
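
For readers more familiar with a general-purpose language, the field model behind this command can be mirrored in a few lines of Python. This is a rough equivalent for illustration, not a replacement for awk: unlike awk, it raises an error on a non-numeric third field instead of treating it as zero.

```python
def sum_third_field(lines, key="status"):
    """Sum field 3 of every line whose field 1 equals key."""
    total = 0.0
    for line in lines:
        fields = line.split()  # split on runs of whitespace, like awk
        if len(fields) >= 3 and fields[0] == key:
            total += float(fields[2])
    return total


# Hypothetical input lines
print(sum_third_field(["status web 3", "other db 10", "status db 4.5"]))
```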

Best Practices

Use Anchors

Anchors like ^ and $ ensure that the pattern matches the beginning or end of a line, reducing false positives.

Optimize Patterns

Avoid overly complex patterns that can slow down performance. Simplify where possible.

Test Patterns

Always test your regex patterns with sample data to ensure they work as expected.

Troubleshooting

Common Issues

Pattern Not Matching

  1. Check for Escaping Issues: Ensure special characters are correctly escaped.
  2. Verify Syntax: Confirm the regex syntax is valid for the tool being used (e.g., grep, awk).

Performance Issues

  1. Simplify Patterns: Break down complex patterns into simpler ones.
  2. Use Efficient Tools: Choose tools optimized for regex performance (e.g., grep vs. awk).

Useful Tools and Resources

  • Regex101: An online regex tester and debugger.
  • grep Manual: man grep for detailed usage and options.
  • awk Manual: man awk for detailed usage and options.
  • Regex Cheat Sheet: Quick reference for regex syntax.

Integrating Prometheus with Ansible

Integrating Prometheus with Ansible allows for automated deployment and configuration of Prometheus monitoring systems.

Prerequisites


Step-by-Step Guide: Ansible Setup for Prometheus, Node Exporter, and Grafana

1. Install Ansible

If Ansible isn't installed yet, update your package list and install it with the following commands:

sudo apt update
sudo apt install ansible

2. Set Up the Ansible Directory Structure

Organize your Ansible configuration by creating a dedicated directory:

mkdir -p ~/ansible/prometheus
cd ~/ansible/prometheus

3. Create the Ansible Inventory File

An inventory file defines the hosts where Prometheus will be deployed.

Define the Inventory File

Create an inventory.ini file with the following content:

[prometheus]
your_prometheus_host ansible_host=your_prometheus_ip
 
[all:vars]
ansible_user=your_ssh_user

Replace the placeholders with the actual values for your Prometheus host, IP, and SSH user.

4. Create Ansible Playbooks

Prometheus Installation Playbook

Create a file named install-prometheus.yml:

---
- name: Install Prometheus
  hosts: prometheus
  become: yes
  tasks:
    - name: Install required packages
      apt:
        name:
          - wget
          - tar
        state: present
 
    - name: Download Prometheus
      get_url:
        url: https://github.com/prometheus/prometheus/releases/download/v2.42.0/prometheus-2.42.0.linux-amd64.tar.gz
        dest: /tmp/prometheus.tar.gz
 
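    # The archive unpacks into a versioned subdirectory, so extract it to /tmp
    # and copy only the binaries into /usr/local/bin.
    - name: Extract Prometheus
      unarchive:
        src: /tmp/prometheus.tar.gz
        dest: /tmp/
        remote_src: yes
 
    - name: Install Prometheus binaries
      copy:
        src: "/tmp/prometheus-2.42.0.linux-amd64/{{ item }}"
        dest: "/usr/local/bin/{{ item }}"
        remote_src: yes
        mode: '0755'
      with_items:
        - prometheus
        - promtool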
 
    - name: Create Prometheus user
      user:
        name: prometheus
        shell: /bin/false
 
    - name: Create directories for Prometheus
      file:
        path: "{{ item }}"
        state: directory
        owner: prometheus
        group: prometheus
      with_items:
        - /etc/prometheus
        - /var/lib/prometheus
 
    - name: Copy Prometheus configuration file
      copy:
        src: prometheus.yml
        dest: /etc/prometheus/prometheus.yml
        owner: prometheus
        group: prometheus
 
    - name: Create Prometheus systemd service
      copy:
        content: |
          [Unit]
          Description=Prometheus Monitoring System
          Documentation=https://prometheus.io/docs/introduction/overview/
          Wants=network-online.target
          After=network-online.target
 
          [Service]
          User=prometheus
          Group=prometheus
          ExecStart=/usr/local/bin/prometheus --config.file /etc/prometheus/prometheus.yml --storage.tsdb.path /var/lib/prometheus/
 
          [Install]
          WantedBy=multi-user.target
        dest: /etc/systemd/system/prometheus.service
 
    - name: Start and enable Prometheus service
      systemd:
        name: prometheus
        state: started
        enabled: yes
        daemon_reload: yes  # pick up the newly written unit file

Prometheus Configuration File

Create a prometheus.yml file:

global:
  scrape_interval: 15s
  evaluation_interval: 15s
 
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

Node Exporter Installation Playbook

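Node Exporter collects hardware and OS metrics and serves them on port 9100. Prometheus collects them only after a scrape job for that port is added to prometheus.yml.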

Create a file named install-node-exporter.yml:

---
- name: Install Node Exporter
  hosts: prometheus
  become: yes
  tasks:
    - name: Download Node Exporter
      get_url:
        url: https://github.com/prometheus/node_exporter/releases/download/v1.5.0/node_exporter-1.5.0.linux-amd64.tar.gz
        dest: /tmp/node_exporter.tar.gz
 
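    # The archive unpacks into a versioned subdirectory, so extract it to /tmp
    # and copy only the binary into /usr/local/bin.
    - name: Extract Node Exporter
      unarchive:
        src: /tmp/node_exporter.tar.gz
        dest: /tmp/
        remote_src: yes
 
    - name: Install Node Exporter binary
      copy:
        src: /tmp/node_exporter-1.5.0.linux-amd64/node_exporter
        dest: /usr/local/bin/node_exporter
        remote_src: yes
        mode: '0755'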
 
    - name: Create Node Exporter user
      user:
        name: node_exporter
        shell: /bin/false
 
    - name: Create Node Exporter systemd service
      copy:
        content: |
          [Unit]
          Description=Prometheus Node Exporter
          Documentation=https://prometheus.io/docs/guides/node-exporter/
          Wants=network-online.target
          After=network-online.target
 
          [Service]
          User=node_exporter
          ExecStart=/usr/local/bin/node_exporter
 
          [Install]
          WantedBy=multi-user.target
        dest: /etc/systemd/system/node_exporter.service
 
    - name: Start and enable Node Exporter service
      systemd:
        name: node_exporter
        state: started
        enabled: yes
        daemon_reload: yes  # pick up the newly written unit file

Grafana Installation Playbook

Grafana visualizes data from Prometheus.

Create a file named install-grafana.yml:

---
- name: Install Grafana
  hosts: prometheus
  become: yes
  tasks:
    # Import the signing key first so the repository's package index can be verified
    - name: Add Grafana GPG key
      apt_key:
        url: https://packages.grafana.com/gpg.key
        state: present
 
    - name: Add Grafana APT repository
      apt_repository:
        repo: "deb https://packages.grafana.com/oss/deb stable main"
        state: present
 
    - name: Install Grafana
      apt:
        name: grafana
        state: present
 
    - name: Start and enable Grafana service
      systemd:
        name: grafana-server
        state: started
        enabled: yes

5. Execute the Ansible Playbooks

Run the playbooks in this order:

ansible-playbook -i inventory.ini install-prometheus.yml
ansible-playbook -i inventory.ini install-node-exporter.yml
ansible-playbook -i inventory.ini install-grafana.yml
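
If the three runs are scripted, it is usually safer to stop at the first failure than to continue with a half-configured host. The sketch below is one way to do that from Python; it assumes ansible-playbook is on the PATH and that the script runs from the directory containing the inventory and playbooks, and the function names are illustrative.

```python
import subprocess

PLAYBOOKS = [
    "install-prometheus.yml",
    "install-node-exporter.yml",
    "install-grafana.yml",
]


def build_commands(inventory="inventory.ini", playbooks=PLAYBOOKS):
    """Return one ansible-playbook argument list per playbook, in order."""
    return [["ansible-playbook", "-i", inventory, playbook] for playbook in playbooks]


def run_all(commands):
    """Run each command in order; CalledProcessError stops the sequence."""
    for command in commands:
        subprocess.run(command, check=True)
```

Calling run_all(build_commands()) executes the playbooks in the order listed and raises on the first non-zero exit code.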

Best Practices

  • Use Variables: Avoid hardcoding values by defining variables.
  • Version Control: Use Git or similar tools for managing playbook versions.
  • Idempotency: Ensure playbooks are idempotent to prevent unwanted changes.
  • Test Before Production: Always test your playbooks in a staging environment first.
  • Documentation: Document any changes or customizations for future reference.

Troubleshooting

Service Fails to Start

  • Check Logs: Run the following to view logs:
    sudo journalctl -u prometheus
    sudo journalctl -u node_exporter
    sudo journalctl -u grafana-server
  • Verify Configurations: Double-check that configuration files are correctly set up.

Permissions Issues

  • Check File Permissions: Verify ownership and permissions:
    ls -l /etc/prometheus/prometheus.yml
    ls -l /usr/local/bin/node_exporter
  • Fix Permissions:
    sudo chown prometheus:prometheus /etc/prometheus/prometheus.yml
    sudo chown node_exporter:node_exporter /usr/local/bin/node_exporter

Download Errors

  • Check URLs: Verify that the URLs used in the playbooks are correct and accessible.

Service Not Enabled

  • Check Systemd Status:
    sudo systemctl status prometheus
    sudo systemctl status node_exporter
    sudo systemctl status grafana-server
  • Enable Services:
    sudo systemctl enable prometheus
    sudo systemctl start prometheus
    sudo systemctl enable node_exporter
    sudo systemctl start node_exporter
    sudo systemctl enable grafana-server
    sudo systemctl start grafana-server