By Peter Nørgaard
Server problems happen to everyone, even when using Linux servers that are known for being reliable and able to handle heavy workloads. If these issues aren't fixed quickly, they can cause downtime or create security risks.
This guide shows you how to fix the most common Linux server issues. Each solution includes clear steps and real examples. You'll learn to spot problems early and keep your servers working well.
At a Glance

Problem | Key Symptoms |
---|---|
1. Network Connectivity Woes | SSH failure, DNS errors, packet drops |
2. Disk Space Troubles and Filesystem Clutter | Insufficient space, disk errors |
3. High CPU or Memory Usage | Sluggish response, CPU spikes |
4. Service Failures and Configuration Pitfalls | Services not starting, misconfiguration errors |
5. Permissions and Ownership Blunders | “Permission denied,” inability to modify files |
6. Security Breaches and Intrusions | Unknown processes, suspicious logs |
7. Boot and Kernel Errors | Kernel panic, failure to boot |
Target Keyword: Troubleshooting Linux Server Network Connectivity
“In any network, the first sign of trouble often reveals itself when you can’t connect to the server.” – Casey L., Network Engineer
Nothing is more frustrating than not being able to SSH into your Linux server or losing your remote connection. Fortunately, diagnosing errors such as an SSH “Connection refused” can start with verifying your network.
Inability to ping or connect via SSH
“Connection refused” or “Host unreachable” errors
Slow data transfers or frequent packet drops
Misconfigured IP settings (typos in IP, gateway, or DNS)
Firewall blocking ports (e.g., port 22 for SSH)
DNS resolution issues
Physical cable or NIC hardware failure
Check Interface Settings
Use ip addr (or the older ifconfig) to confirm the IP address and netmask, and ip route to confirm the default gateway.
Ping External Addresses
Run ping 8.8.8.8 or a known server to test basic IP connectivity.
Verify DNS
Check /etc/resolv.conf for correct DNS entries, or run nslookup or dig.
Look for Firewall Issues
Confirm iptables or ufw rules are not blocking SSH or other critical ports.
Physical Inspection
Ensure cables are plugged in; check NIC lights or switch port logs for errors.
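The DNS check above can be scripted. The following is a minimal sketch, assuming a standard resolv.conf layout; the function name list_nameservers is hypothetical and not part of any tool.

```shell
# List the nameserver addresses configured in a resolv.conf-style file.
# The path argument is optional and defaults to /etc/resolv.conf.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Usage: list_nameservers
# Usage: list_nameservers /path/to/other/resolv.conf
```

An empty result means no resolver is configured in that file, which would explain failed lookups even when ping 8.8.8.8 works. On hosts where a local stub resolver manages DNS, the file may list only a loopback address.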
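Correct IP/gateway settings, then restart networking. The service name depends on the distribution: for example systemctl restart networking on Debian/Ubuntu, systemctl restart network on older RHEL/CentOS, or systemctl restart NetworkManager where NetworkManager manages the interface.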
Update DNS entries as needed.
Open necessary ports in firewall using ufw allow ssh.
Case Study: A small enterprise found that after a reboot, their new gateway IP was typed incorrectly. After verifying using ip route, the admin noticed the wrong IP. A quick fix in /etc/network/interfaces (Debian/Ubuntu) or /etc/sysconfig/network-scripts/ifcfg-ethX (RHEL/CentOS) solved the problem instantly.
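A gateway check like the one in this case study could be sketched as follows. The function name default_gateway is hypothetical, and the sketch assumes the usual "default via ADDRESS dev IFACE" line that ip route prints.

```shell
# Print the first default gateway found in `ip route` output on stdin.
default_gateway() {
    awk '!seen && $1 == "default" && $2 == "via" { print $3; seen = 1 }'
}

# Usage: ip route | default_gateway
```

Comparing the printed address with the gateway you expect makes a typo visible at a glance. No output means no default route is set.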
Target Keyword: Troubleshooting Linux Server Disk Space Full in var log directory
“When a disk fills, services start to fail, logs can’t be written, and everything can spiral out of control. It’s a race against time.” – Lisa M., Systems Administrator
When you run out of disk space, your Linux server throws “No space left on device” errors or fails to launch essential services. This can be due to logs piling up in /var/log or large backup files building up in unexpected folders.
“No space left on device” or “Disk is full” terminal messages
Services failing to start, e.g., MySQL or Apache
Logs in /var/log show repeated “disk full” warnings
Check Usage with df
Run df -h to see overall disk usage by partition.
Locate Large Files
Use du -sh /var/log/* or ncdu to identify massive logs or backups.
Purge Old Logs
Rotate or clean logs in /var/log if they are excessively large.
Check for Large Orphaned Files
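A service may still hold a deleted log file open, so its space is not freed until the process closes the file or restarts. Use lsof | grep deleted (or lsof +L1) to find such files.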
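The df check above can be turned into a simple threshold filter. This is a minimal sketch, assuming the POSIX output format of df -P; the function name full_filesystems and the default threshold of 90 are illustrative choices.

```shell
# Read `df -P` output on stdin and print "mountpoint usage%" for every
# filesystem at or above the given percentage (default 90).
full_filesystems() {
    awk -v limit="${1:-90}" '
        NR > 1 {
            use = $5
            sub(/%/, "", use)
            if (use + 0 >= limit + 0) print $6, $5
        }'
}

# Usage: df -P | full_filesystems 85
```

The sketch splits on whitespace, so a mount point that contains spaces would be truncated.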
Delete or archive old logs. Configure log rotation in /etc/logrotate.conf.
Use compression for large archives.
Expand partitions if possible or add new disk volumes.
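To rotate logs immediately instead of waiting for the schedule, run logrotate --force /etc/logrotate.conf. Running logrotate --debug /etc/logrotate.conf first previews what would happen without changing any files.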
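Before deleting or archiving anything, it helps to list which files are actually large. The sketch below is one way to do that with find. The function name large_files is hypothetical, and the size is given in KiB because the k suffix is widely supported by find implementations.

```shell
# Print regular files under a directory that are larger than a size in KiB
# (default 102400 KiB, i.e. 100 MiB).
# -xdev keeps find on the same filesystem as the starting directory.
large_files() {
    find "$1" -xdev -type f -size +"${2:-102400}"k -print
}

# Usage: large_files /var/log
# Usage: large_files /var/backups 1048576
```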
Case Study: A hosting provider discovered that daily backups of /home were being stored in /var/backups, quickly filling /var. Implementing a new rotation policy and storing backups on a separate volume prevented repeated “disk full” errors.
Target Keyword: How to fix Linux server high CPU usage caused by zombie processes
“Often, resource spikes happen due to a single runaway process or memory leak. Identifying it quickly is crucial.” – James F., Lead DevOps Engineer
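When a server slows to a crawl or stops responding, a key suspect is CPU or memory saturation from a runaway process or a poorly optimized app. “Zombie” processes use almost no resources themselves, but a growing number of them points to a misbehaving parent process.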
Server lag, unresponsive commands, high load averages
Java or Java-based apps hogging CPU
The OOM (out-of-memory) killer terminates essential processes
Run top or htop
Quickly spot the processes with the highest CPU or memory usage.
Check Memory
Use free -m or vmstat; watch for low available memory or swap usage.
Look for Zombie Processes
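Use ps -eo pid,ppid,stat,comm and look for a STAT value beginning with Z (a plain ps aux | grep Z also matches any line that merely contains a capital Z). Zombies are exited children that their parent process has not yet reaped, so a high count points to a bug in the parent.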
Examine Logs
Repeated error logs might show a memory leak or crashed processes.
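The memory check above can be reduced to a single number for use in scripts or alerts. This is a minimal sketch, assuming a kernel that reports the MemAvailable field in /proc/meminfo; the function name mem_available_mib is hypothetical.

```shell
# Print available memory in MiB from a meminfo-style file
# (defaults to /proc/meminfo, where values are reported in kB).
mem_available_mib() {
    awk '$1 == "MemAvailable:" { printf "%d\n", $2 / 1024 }' "${1:-/proc/meminfo}"
}

# Usage: mem_available_mib
```

MemAvailable is generally a better signal than the "free" column alone, because it accounts for cache that the kernel can reclaim.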
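A zombie has already exited, so kill -9 has no effect on it. Restart or fix the parent process so that it reaps its children; if the parent is stopped, the zombies are handed to init, which cleans them up.
Tune the configuration of apps like Apache and MySQL (for example Apache's MaxRequestWorkers or MySQL's max_connections) to cap CPU and memory usage.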
Add or upgrade server hardware if truly needed.
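Because a zombie has already exited, the useful information is which parent has failed to reap it. The sketch below lists each zombie with its parent PID. The function name list_zombies is hypothetical, and the input is assumed to come from ps -eo pid,ppid,stat,comm.

```shell
# Read `ps -eo pid,ppid,stat,comm` output on stdin and print, for each
# zombie (STAT starting with Z), its PID, parent PID, and command name.
list_zombies() {
    awk 'NR > 1 && $3 ~ /^Z/ { print $1, $2, $4 }'
}

# Usage: ps -eo pid,ppid,stat,comm | list_zombies
```

The second column of the output is the process to investigate or restart. Signalling the zombie's own PID does nothing.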
Case Study: A marketing company’s Java-based web app (Tomcat server) began hogging 90% CPU daily. Using jmap and analyzing heap dumps revealed a memory leak in a custom library. A quick patch re-coded the library to release memory after each process cycle, dropping usage from 90% to a stable 30%.
Target Keyword: Linux service fails to start
“A single misconfiguration can cascade into a major downtime event if not quickly identified. Familiarity with system logs shortens downtime drastically.” – Raj B., Infrastructure Head
Critical services failing on boot or giving “Failed to start” messages are a top headache.
“systemctl status servicename” shows “Failed” with an error code
“Connection refused” or “Service not running” errors in logs
Applications not reachable despite correct configurations
Typos in config files, e.g. /etc/nginx/nginx.conf
Missing dependencies or incorrect file paths
Systemd unit files pointing to wrong locations
Check Status
systemctl status servicename or journalctl -u servicename
Examine Config Syntax
For Nginx, nginx -t; for Apache, apachectl configtest.
Look for Dependency Issues
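If using systemd, ensure the unit's “Requires=” or “Wants=” lines name the services it depends on and its “After=” lines order it behind them. “After=” alone only controls start order and does not pull a dependency in.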
Fix syntax errors, typos in config
Install or reinstall missing dependencies
Adjust systemd unit file and re-run systemctl daemon-reload; systemctl restart servicename
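A common safeguard is to restart a service only when its configuration test passes, so a typo never takes down a running service. The following is a minimal sketch of that pattern. The function name restart_if_valid is hypothetical, and both arguments are plain command strings that you supply.

```shell
# Run a validation command and restart only if it succeeds.
# $1: check command, e.g. "nginx -t"
# $2: restart command, e.g. "systemctl restart nginx"
restart_if_valid() {
    if sh -c "$1"; then
        sh -c "$2"
    else
        echo "config check failed; service left untouched" >&2
        return 1
    fi
}

# Usage: restart_if_valid "nginx -t" "systemctl restart nginx"
# Usage: restart_if_valid "apachectl configtest" "systemctl restart apache2"
```

The Apache service name in the second usage line is an assumption. It is apache2 on Debian/Ubuntu and httpd on RHEL/CentOS.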
Case Study: A WordPress site refused to run after a reboot because Nginx had “listen 80” missing a semicolon in /etc/nginx/sites-available/default. The error logs pinned it down quickly. A fix plus systemctl daemon-reload && systemctl start nginx resolved the downtime within minutes.
Target Keyword: File permission issues
“A single ‘chmod 777 * -R’ might solve short-term issues but opens a massive security hole. Correct ownership is the real fix.” – Linda V., Security Consultant
Mistakes in file permissions can lock out valid users or open your server to danger.
“Permission denied” while reading, writing, or executing
SSH key logins rejected with “Permission denied (publickey)” because the home directory or ~/.ssh permissions are too open
System logs showing repeated “access denied”
List Current Permissions
ls -l filename to see user, group, and mode bits (rwx).
Check Ownership
Confirm the owner and group with ls -l or stat, and change them with chown user:group filename as needed.
Set Proper Permissions
chmod 640 file.conf for config files, for example, gives the owner read/write access, the group read-only access, and everyone else none.
Provide minimal privileges needed to run. Avoid large scale chmod 777.
Use usermod -aG group user to fix group memberships for services.
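To audit for the fallout of an earlier chmod 777, you can list files that any user can write to. This is a minimal sketch using find. The function name world_writable_files is hypothetical.

```shell
# Print regular files under a directory that any user can write to.
# -xdev keeps find on the same filesystem as the starting directory.
world_writable_files() {
    find "$1" -xdev -type f -perm -0002 -print
}

# Usage: world_writable_files /var/www
```

Each file it prints is a candidate for a tighter mode such as 640 or 644, depending on who needs to read it.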
Case Study: A production environment's web images were uploaded by the 'deployer' user to /var/www/images with the group set incorrectly. The website gave constant 403 permission denied errors. Correcting group ownership via chown -R root:www-data /var/www/images and ensuring chmod g+rx fixed the issue instantly.
Target Keyword: How to identify and kill processes causing memory leaks on linux servers
“Security is not a product, but a process, each day you must keep an eye on suspicious processes and logs.” – Bruce T., Security Analyst
No environment is immune to intrusions. Quick detection of suspicious processes, resource-hungry crypto-miners, or unknown user accounts is critical.
Unknown processes with high CPU usage
Strange user accounts, repeated login attempts
High outbound network traffic or cryptomining logs
Inspect Running Processes
ps aux --sort=-%cpu or top -o %CPU
Check SSH logs
Look in /var/log/auth.log or /var/log/secure.
Remove Malicious Tools
Use kill -9 PID to stop the malicious process, then uninstall the suspicious software.
Patch and Update
Keep packages updated, follow best security practices.
rkhunter or chkrootkit to scan for rootkits
fail2ban or iptables to block repeated login attempts
ClamAV to scan for known malware
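The SSH log check above can be summarized per source address. The sketch below assumes the usual sshd "Failed password for USER from ADDRESS port N" message format. The function name failed_logins_by_ip is hypothetical.

```shell
# Count "Failed password" entries per source address in an sshd auth log
# and print "count address" lines, busiest source first.
failed_logins_by_ip() {
    awk '/Failed password/ {
             ip = ""
             # Keep the word after the last "from" on the line.
             for (i = 1; i < NF; i++)
                 if ($i == "from") ip = $(i + 1)
             if (ip != "") print ip
         }' "$1" | sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}

# Usage: failed_logins_by_ip /var/log/auth.log
# Usage: failed_logins_by_ip /var/log/secure
```

Addresses with high counts are natural candidates for a fail2ban jail or a firewall rule.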
Case Study: A Linux server started slowing, with top showing unknown kworker processes at 100% CPU usage. A closer inspection via logs revealed a cryptomining Trojan. Removing it with ClamAV plus adjusting firewall rules closed the compromise path. The memory usage quickly returned to normal.
Target Keyword: How to repair corrupted filesystem on linux server without data loss
“When kernels panic, the entire system halts. Maintaining backups and rescue strategies ensures minimal downtime.” – David U., Senior SysAdmin
Boot and kernel problems can be the most severe, preventing the OS from loading or causing “Kernel panic” messages.
Kernel panic messages on boot
Grub rescue prompts
“No boot device found,” halting system mid-boot
Check BIOS/UEFI
Confirm correct boot ordering and that disks are recognized.
Inspect GRUB
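Inspect the GRUB config (grub.cfg) or use rescue mode to fix the boot loader. Because grub.cfg is generated, make lasting changes in /etc/default/grub and regenerate it with update-grub (Debian/Ubuntu) or grub2-mkconfig (RHEL/CentOS).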
Boot Recovery
Use a live or rescue environment. Run fsck for partition checks.
Review Kernel Logs
dmesg or logs in /var/log/kern.log for root cause.
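Select an older kernel from GRUB if the new kernel panics repeatedly
Reinstall or repair the boot loader via grub-install
Perform a filesystem check if necessary: fsck -y /dev/sdaX. Run it only on an unmounted filesystem, ideally from a rescue environment, and note that -y accepts every proposed repair automatically.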
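Running fsck on a mounted filesystem can cause further damage, so it is worth checking the mount state first. This is a minimal sketch that reads a mounts table. The function name is_mounted is hypothetical, and the sketch matches only the exact device path given.

```shell
# Succeed if the device appears as a mount source in a mounts table
# (defaults to /proc/mounts). fsck should only run when this fails.
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Usage: is_mounted /dev/sdb1 || echo "/dev/sdb1 is not mounted"
```

A filesystem mounted through another name for the same device, such as a device-mapper path, would not be detected by this exact-match check, so treat a "not mounted" result as a hint rather than proof.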
Case Study: An overnight kernel update left a server stuck in panic state. The admin rebooted into a rescue environment, reinstalled GRUB, and downgraded the kernel to the known good version. After verifying /boot/grub/grub.cfg, the server booted normally.
Fixing common Linux server issues requires a step-by-step approach and quick response. Being ready to handle problems like full disk space in /var/log, fixing damaged filesystems without losing data, or dealing with SSH connection errors will help you reduce outages, protect your data, and keep your server running smoothly.
Key Takeaways:
Always keep track of your network, CPU, and memory usage to catch anomalies before they impact performance.
Have a plan for log management so you can quickly trace root causes in /var/log and correlate logs across your environment.
Prioritize security. Watch out for unknown processes, keep your system patched, and carefully manage user privileges.
Update the kernel and software regularly to close known vulnerabilities and pick up performance fixes.
Remember: backups, backups, backups! They’re your best friend against corrupted filesystems and major meltdown events.
In the end, a stable, secure, and well-monitored Linux server stands at the core of robust IT operations. This troubleshooting guide is just the beginning—keeping an eye on logs, employing best practices for configuration, and leaning on stable automation for repetitive checks all form the bedrock of professional system administration.
Keep your Linux server running smoothly by following these steps. This will help prevent problems and make things easier for both admins and users.
Copyright © 2025 Tech Decoded, All rights reserved.