This one really frustrated me, but I knew I had to find a solution, and it wasn’t going to be “Run Windows in a VM on your laptop.” That just sounded stupid. But nevertheless, as our technology advances and support for old Java applets wanes, getting esoteric things like an iLO2 video console working on Linux in 2020 seems next to impossible.
This wasn’t so much of an issue with my old laptop, a ThinkPad T520 running an old version of Ubuntu, namely Trusty (14.04). Before you judge me, let me just say, I use laptops like old Toyotas; I don’t replace them until they are damn near falling apart. Really, that tried and true T520 had an “A” key that would sometimes pop off…
And while I could have just gone to my closet and pulled out this old laptop, just to access the console of one of our old HP servers, just this one time, I really wanted to have a modern solution. So, I went down this rabbit hole, burned a few hours, but ultimately was successful and I am proud to share what I discovered.
First things first, I can’t even log in to an iLO2 remote management service at all in today’s Chrome; it complains about SSL and quits with ERR_SSL_BAD_RECORD_MAC_ALERT, so using Chrome is out of the picture.
It works fine in Firefox, though; I can log in and use most of the tools, except for the Java-based remote console. As of September 2018, Firefox dropped support for the NPAPI technology needed to run Java applets.
So if you’re running a modern Firefox, you’re shit-out-of-luck. There is good news, though: you can install and run an older version of Firefox from the ESR (Extended Support Release) line, which still supports Java applets, and the iLO2 console will work. You’ll need ESR version 52 or earlier, plus an old version of Java 7, and below I’ll show you how to get all of that working on Arch Linux (my preferred desktop OS on my ThinkPad X1 Carbon).
This part is not so bad, but you’ll have to edit your PKGBUILD a bit, because the latest ESR version at the time of this writing is too new. You need 52 or below. I use trizen for my AUR packages. If you’re using something else, you’ll have to adapt my method below to your preferred tool.
Install Firefox ESR (binary)
$ trizen -S firefox-esr-bin
Edit PKGBUILD
You will want to edit the PKGBUILD file according to the diff below.
--- PKGBUILD 2019-12-23 04:57:07.070017439 -0800
+++ PKGBUILD-new 2019-12-23 04:58:52.850018080 -0800
@@ -7,7 +7,7 @@
pkgname=firefox-esr-bin
_pkgname=${pkgname/-bin/}
-pkgver=68.3.0
+pkgver=52.6.0
pkgrel=1
pkgdesc='Standalone web browser from mozilla.org - Extended Support Release'
url='http://www.mozilla.org/en-US/firefox/organizations/'
@@ -18,22 +18,23 @@
license=('MPL' 'GPL' 'LGPL')
install=$_pkgname.install
-sha512sums=('aadfdd64f10d5f9b97dda227793a6db3b73913f986c2f826ddcc3568f9a9e63ad3fe73d04dcb2cfe27ab854ef048faef3546621b3de731f5a5478c7c551df33a'
+sha512sums=('b521611ace3731aea3e1cc7abb74f01a4885f5325da359a25a6a295316541c4e1e4cb7cf1be104cbb199acc15d57eeb8a37e2e8adf4e53f7ddf284f9d81a047f'
'c585f6e8ac7abfc96ad4571940b6f0dcc3f7331a18a518b4fe5d19b45e4c2d96c394524ea5c115c8fdd256c9229ea2fabeb1fc04ca7102f1626fd20728aef47d'
'ab2fa2e08c7a65ac0bfe169a4b579e54b038bddabf838cd3df5ab341bd77be7c101092d0123598944d2174ab3a8fbc70dfbd692b2944016efdb7a69216a74428')
[[ "$CARCH" == "i686" ]] && sha512sums[0]='cb72fa6f7a6106fefa124dfdc2f8dc6df26a27defeb93d5683f744eb47343cdfb5e39727b16678f479c57a05d09d9358a811950d42635f57bba2cf0e94ed8412'
ln -s /opt/$_pkgname/firefox $pkgdir/usr/bin/$_pkgname
install -m644 $srcdir/{$_pkgname.desktop,$_pkgname-safe.desktop} $pkgdir/usr/share/applications/
- install -m644 $srcdir/firefox/browser/chrome/icons/default/default128.png $pkgdir/usr/share/pixmaps/$_pkgname.png
+ install -m644 $srcdir/firefox/browser/chrome/icons/default/default48.png $pkgdir/usr/share/pixmaps/$_pkgname.png
}
With the above changes, we accomplish the following:
- Downgrade pkgver from 68.3.0 to 52.6.0
- Insert the correct SHA512 checksum for the downgraded binary package
- Point the icon install at default48.png, which is what this older release ships
For reference, I found the SHA512SUM for the binary file by painstakingly looking for the correct version, architecture and language in the SHA512SUMS file for the release we’re trying to install.
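If you’d rather not eyeball the whole file, a quick grep should pluck it out; I’m assuming the usual Mozilla archive layout and the linux-x86_64/en-US tarball here:

```shell
# Fetch the checksum list for 52.6.0esr and grab the line for our exact file
curl -s https://archive.mozilla.org/pub/firefox/releases/52.6.0esr/SHA512SUMS \
    | grep 'linux-x86_64/en-US/firefox-52.6.0esr.tar.bz2'
```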
Build the package
If everything goes smoothly with your changes above, the Firefox ESR binary package will be built; when you’re prompted to install it, just proceed to do so.
If things don’t go smoothly and you need to try again, see the next section.
OPTIONAL: How to retry the build if things go wrong
If something goes wrong, you can save time by going into the build directory, tweaking anything you need, and building again manually as follows:
$ cd /tmp/trizen-$USER/firefox-esr-bin
# Make tweaks, etc...
$ makepkg
Once makepkg completes successfully, you’ll have a package file that Arch Linux can install using regular ol’ pacman:
$ sudo pacman -U firefox-esr-bin-52.6.0-1-x86_64.pkg.tar.xz
To install an old version of the Oracle Java 7 Runtime Environment, which is EOL (end of life / support), perform the steps below. Note, though, that the first part will fail, and that is normal. It’s because Oracle doesn’t let you just download the Java runtimes; you have to agree to their license, or terms, or something legal that nobody cares about but the lawyers.
But not only that, you have to log in to Oracle’s site. This is annoying but not necessarily an obstacle, because you can sign up for a free account right on the spot.
Install JRE7 package
$ trizen -S jre7
This will fail, as noted above, but you’ll get the directory structure in place that you need for the remaining steps in this section.
Download JRE7 runtime from Oracle’s site
The trizen command above will tell you which URL to go to (I would have pasted it here, but I honestly forgot it). Once there, you’ll want to get the file: jre-7u80-linux-x64.tar.gz
When you try to download this file, it’ll make you go through a dance of accepting something legal, signing up for a free Oracle account, etc…
It sucks, but just accept the pain.
Once you finally have the file, proceed to the next step.
Install JRE7 package (again)
Now you can perform the following:
$ cp ~/Downloads/jre-7u80-linux-x64.tar.gz /tmp/trizen-$USER/jre7/
$ trizen -S jre7
This should complete successfully now and you’ll have a working version of the old Java 7 runtime.
Adding your iLO2 hosts as security exceptions will keep you from getting blasted with a million popups about security problems.
Start the Java Control Panel
$ /usr/lib/jvm/java-7-jre/jre/bin/ControlPanel
Confirm exceptions
Navigate to: Security -> Edit Site List
Add your iLO2 server URLs to the list. For example, https://192.168.1.2
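If you’d rather skip the GUI, the exception list is just a plain text file with one URL per line, in Oracle’s standard deployment location; adding the same entry by hand looks like this:

```shell
# The directory may not exist yet if the Control Panel was never run
mkdir -p ~/.java/deployment/security
echo 'https://192.168.1.2' >> ~/.java/deployment/security/exception.sites
```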
Manual symlink (optional)
Note: I didn’t actually have to do this step; the symlink was already present. But for completeness, I’m including this part because the reference I used as a guide for this whole madness also included it, so…
You only need to perform this step if the symlink isn’t already present.
$ cd /usr/lib/mozilla/plugins
$ sudo rm libnpjp*
$ sudo ln -s /usr/lib/jvm/java-7-jre/jre/lib/amd64/libnpjp2.so
Execute the correct binary
This part may seem obvious, but I would like to point out one caveat I found. First of all, to run Firefox ESR, just execute the following:
$ firefox-esr
But, if you have the regular version of Firefox already installed, this might actually end up running that version instead, by default, and I don’t really know why. This happened to me a couple of times and I didn’t even notice at first. Eventually, I ran the following instead:
$ /opt/firefox-esr/firefox
and that opened the right version. Don’t ask me why. The file /usr/bin/firefox-esr simply symlinks to this one in /opt, so it should be the same, but whatever…
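If you’re curious which binary a given command will actually launch, you can trace it yourself:

```shell
# Where does the command live, and what does it ultimately point to?
command -v firefox-esr
readlink -f "$(command -v firefox-esr)"
```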
Verify the Java plugin is installed
Navigate to about:addons
You should see “Java(TM) Plug-in 10.80.2” in the list.
Open your iLO page as you normally would and start the remote console. You may need to confirm further security exceptions.
If you get a ClassNotFoundException, don’t panic; just click once on the applet where the remote console should be and it’ll download what it needs. I only had to do this once, then never again.
And that’s it!
You should see your server’s video console in your browser and you can interact with it as you normally would, like in the old days, or on an old laptop/desktop.
If you’re still with me, I thank you for your patience. This process is rather long and I wish accessing out-of-band consoles on physical hardware wasn’t always such a pain in the ass. Why can’t someone make a decent remote console?!
One of the biggest reasons we developed our ARP Thunder™ Cloud Dedicated Server product several years ago over at ARP Networks was to provide a solution to this problem: get the resources of a dedicated server, but be able to manage it with the ease of a virtual machine, especially with regard to out-of-band (OOB) management.
With ARP Thunder™, you can get a video-based OOB management console by simply clicking “View Console” in our Portal, which works in any modern web browser without any plugins required. You can also get a serial-based OOB management console over SSH.
How cool is that?! :)
I owe much of my success in getting the iLO2 console working in Linux to the following post: Use HP iLO2 Remote Console with Linux in 2018
You need to look towards a long-term collaboration
Here are some of the most important considerations for developing a successful outsourcing relationship:
As digital transformation becomes a key business driver in today’s technology-focused world, companies are placing great importance on innovation. But innovation is more than just a goal. It’s a complex set of evolving needs based on current market conditions and existing business systems and processes. A widespread lack of alignment between technological solutions and business needs is one of the main reasons most digital transformations achieve disappointing results.
When it comes to web app development, it’s crucial that you have a clear picture of what you want to achieve. There must be a clear business need for it in the first place. Before you start looking for an outsourcing partner, you should have a finalized concept. This should include a general description of your web product and a list of objectives associated with it, such as business goals and technology requirements. Software development and project management teams often summarize these factors in the form of a user story, which outlines the who, what, and why of the project. Here’s an example: ‘As a customer, I want a shopping cart that allows me to save items I can come back to later.’ This will allow you to develop a list of questions to ask in the screening process and establish a realistic budget.
In many circles, outsourcing is still considered a way of doing things on the cheap, with little regard for quality and support. While that can certainly be the case if you choose your partners based on price alone, outsourcing can help you overcome talent shortages by allowing you to tap into the best knowledge and experience in the sector. However, finding the right outsourcing partner is perhaps the hardest step of all. That’s why, before you start scouring the freelance websites, developer blogs, and B2B directories, you should make a shortlist of your requirements.
Finding the right outsourcing partner is perhaps the hardest step of all
One of the most important decisions is whether to outsource locally, nearshore, or offshore. Offshore is the most popular outsourcing model where price is the biggest concern, but it can also result in communication delays due to working across multiple time zones, as well as reduced project oversight. Outsourcing locally, while undoubtedly the more expensive option, provides the benefit of complete oversight and easier communications. A local company will also have a better view of your target market and business needs than one on the other side of the world. Nearshoring often presents itself as a compromise between the two.
When you’re outsourcing app development, it’s critical that you choose the right service model. All too often, businesses make the mistake of outsourcing a project only to achieve the bare minimum, ending up with an app fraught with bugs and other issues. This is a common problem with outsourcing offshore or hiring freelancers on the cheap through bidding platforms. In many such cases, there’s a complete lack of post-project support, which also means you have to find a place to host your app and manage all the maintenance and upgrades yourself.
In many cases, there’s a complete lack of post-project support, which means you have to host your app and manage the maintenance yourself.
Many service models are project-based, which means there’s a predefined goal that the team needs to reach within a specified timeframe. This is ideal for companies which have their own IT departments and are able to organize app hosting and maintenance themselves. But if you don’t have your own team, you’ll be better off choosing a dedicated outsourced partner which provides full-cycle app development and post-project support. This gives businesses more control over the process, making it ideal for those with very specific needs, such as integrating existing business systems or adding complex functions which need extensive testing.
You’ll also need to choose a suitable payment model. While the costs vary widely depending on whether you’re outsourcing locally or offshore and which service model you choose, you’ll need to maintain complete visibility into ongoing costs. Fixed-price contracts are ideal for those with limited budgets and where the scope of work is clear, but they can also be a major capital expense. The pay-as-you-go model, which is popular among agile software developers, offers much greater flexibility, making it ideal for more complex projects which need ongoing support or are hard to scope in advance.
The most important component of any outsourcing relationship is also one many businesses get wrong – communication. By now, most of us are familiar with the barrage of complaints so many employees in IT have when trying to deal with a disparate team of developers around the world, none of whom are able to work effectively as a team. Communication issues abound in such cases due to factors like different time zones, lack of local expertise, and a lack of familiarity with security and privacy regulations like the CCPA. While communications tools like Slack and WhatsApp can go a long way towards simplifying the process, relying on them entirely can result in serious inefficiencies.
The ability to provide quick feedback is essential
It’s important that you choose the right collaboration tools and establish communication needs from the outset. The ability to provide feedback is essential when it comes to developing web projects. If there’s something you don’t like about the direction of the project, it’s important the outsourced partner knows about it immediately. Especially in the early stages of the project, it’s strongly advisable that you hold daily meetings to ensure it gets started on the right track. Every project should start with a planning meeting that defines the scope of the work before meeting up at regular intervals thereafter to synchronize progress. Once each milestone has been reached, there should be a meeting to showcase the result and provide feedback.
We are a small team that believes in living our best lives. Everyone here is happy, growing, and always learning. As a result, we provide the highest quality:
to our 500+ clients.
Want to discuss your project with us?
When you’re ready to start your project, we’re happy to help. Contact us any time, either by phone (1-855-444-3145) or by filling out the form below.
A simple glossary that I didn’t want to crowd the article above with, so I saved it for the end.
The following is a list of terms used in this article, with a definition for each. I felt this would be useful because the exact definitions of certain terms in the app development and web development space have become blurry (see, it just happened right there).
The slight problem with re-using OSD IDs is that the old device class is still in the CRUSH map and won’t automatically get updated because you inserted different media. Most of time this isn’t a problem, like if you replace a spinning HDD with another spinning disk. The device class is the same and doesn’t need to change.
But, if you replace it with an SSD, for example, then you need to update the device class manually. The following is an example using a fictitious OSD ID of 101:
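On a reasonably current Ceph release (Luminous or later, where device classes exist), the commands would be along these lines, using that fictitious ID:

```shell
# Drop the stale device class, then set the new one for the replaced OSD
sudo ceph osd crush rm-device-class osd.101
sudo ceph osd crush set-device-class ssd osd.101
```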
It’s as simple as that!
We ran ceph-volume zap to wipe the new device when we were hit with the following:
$ sudo ceph-volume lvm zap /dev/sdg
Running command: /sbin/cryptsetup status /dev/mapper/
--> Zapping: /dev/sdg
Running command: /sbin/wipefs --all /dev/sdg
stderr: wipefs: error: /dev/sdg: probing initialization failed: Device or resource busy
--> RuntimeError: command returned non-zero exit status: 1
$
I haven’t seen this error too often. It wasn’t a completely new drive, but a used one. After some investigation, it turned out the MD subsystem (Linux Software RAID) had gotten hold of it because of an existing RAID signature on the disk, and wouldn’t let it go.
To find out which RAID device was holding onto our drive, I did:
$ cat /proc/mdstat
Personalities : [raid1] [raid10] [linear] [multipath] [raid0] [raid6] [raid5] [raid4]
md127 : inactive sdg[0](S)
1101784 blocks super external:ddf
...[snip]...
unused devices: <none>
$
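You can also confirm the signature directly on the disk; mdadm’s examine mode will print the on-disk metadata (DDF, in our case) if it’s there:

```shell
# Inspect the on-disk RAID metadata for this device
sudo mdadm --examine /dev/sdg
```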
I knew it was md127 because it included inactive sdg, which is the name of our new device.
Now the fix was easy, we just had to stop and remove that superfluous RAID volume, like so:
$ sudo mdadm --stop /dev/md127
mdadm: stopped /dev/md127
$ sudo mdadm --remove /dev/md127
$
After that, the ceph-volume lvm zap
command completed without error.
Say you just issued the below command (like I was doing):
systemctl start ceph-osd@205
and were met with the following rebuttal:
The name org.freedesktop.PolicyKit1 was not provided by any .service files
You’re probably really scratching your head at this point…
The problem actually has nothing to do with anything the above error states. I simply forgot to use sudo:
sudo systemctl start ceph-osd@205
and now everything is right with the world.
See what I mean? Simply frustrating that one would be led so far off the correct path right from the start.
Recently, we had to move the drives of one server to another, seemingly identical one, because the original had failed. Once we had done so, Linux would boot, but not find a root filesystem. We were like, “wtf?! … It’s an identical server!”
Well, it turned out, not exactly. It was almost the same type, but this newer one had a built-in Intel RAID controller and the SAS cables were already routed to it. Our drives were detected fine, and GRUB would boot, but once Linux took over, it couldn’t see the disks at all.
It turned out our initramfs image did not have the drivers for this particular controller (and why would it? The drives came from a server where this controller was not present).
To fix this, we basically had to rebuild the initramfs image with the correct driver. The following is a description of the process.
First of all, we booted the new server with an Arch Linux ISO. In our experience, Arch Linux has the best tools out-of-the-box and it is easy to get a shell right away by just booting the regular ISO.
Once booted, we executed:
mount /dev/md1 /mnt # Our root was on md1
arch-chroot /mnt
mount -a # Mount everything else
# Change modules=dep to modules=all
vim /etc/initramfs-tools/initramfs.conf
update-initramfs -u -k all # Or use -k <version> if your boot partition is small
sync
exit
reboot
After this, the new server booted perfectly!
Initializing..\
With the last character spinning… and spinning… forever.
I tried everything I could think of to get the thing going.
At the end of the day, the fix was simply to disconnect the I2C cable going from the backplane to the motherboard. So simple, yet… so… ggaahh!!
Here’s a quick way to throw some data through them and see. You need two cards and two Linux boxes. In the examples below, I’m using Ubuntu Linux, and the interface name of the IB card is simply ib0. I’ll assume iperf is already installed (apt-get install iperf).
On both boxes, we’ll use IPoIB (IP over Infiniband) to assign a couple of temporary IPs and iperf to run a performance test. It’s important to put the cards into connected mode and set a large MTU:
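Something like this on each box; the 10.0.0.x addresses are just my throwaway picks (use .1 on the first box and .2 on the second):

```shell
# Switch IPoIB into connected mode, raise the MTU, and assign a temp IP
echo connected | sudo tee /sys/class/net/ib0/mode
sudo ip link set ib0 mtu 65520
sudo ip addr add 10.0.0.1/24 dev ib0
sudo ip link set ib0 up
```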
On the first box, put iperf into server mode:
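That’s simply:

```shell
# Listen for incoming iperf test connections
iperf -s
```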
On the second box, throw data at your first one (-P 2 means to use 2 threads):
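With 10.0.0.1 standing in for whatever temporary IP you assigned to the first box:

```shell
# Run two parallel streams against the server box
iperf -c 10.0.0.1 -P 2
```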
So, there you have it. The cards in the above example are pushing a healthy 25 Gbps. It’ll be even faster if using pure Infiniband applications (rather than IPoIB, since more processing is done in the Infiniband hardware, rather than CPUs having to shuffle the TCP/IP stack, among other factors).
This was the case with our cluster. We would always see stuff like the following when we’d watch ceph -w:
2016-10-26 14:30:46.646110 osd.72 [INF] 5.3d7 scrub starts
2016-10-26 14:30:46.650324 osd.72 [INF] 5.3d7 scrub ok
2016-10-26 14:30:47.646236 osd.72 [INF] 5.3d7 scrub starts
2016-10-26 14:30:47.649672 osd.72 [INF] 5.3d7 scrub ok
2016-10-26 14:30:50.646450 osd.72 [INF] 5.3d7 scrub starts
2016-10-26 14:30:50.649940 osd.72 [INF] 5.3d7 scrub ok
2016-10-26 14:30:51.646326 osd.72 [INF] 5.3d7 scrub starts
2016-10-26 14:30:51.649113 osd.72 [INF] 5.3d7 scrub ok
In between all the normal pgmap, pgs, and other statistics data. And it would seem to go on constantly, no matter what time of day we checked.
Turns out the output above is a bit misleading, at least to me. It looks as if it actually scrubbed placement group 5.3d7. But, in fact, it hadn’t. That’s why we’d see it over and over again (and this would happen to many, many PGs, not just that one).
So what is going on?
Turns out our osd scrub load threshold was too low. It was the default of 0.5. We run hosts with a pretty large number of OSDs, so it’s quite normal for the load average to be above that even when things are operating perfectly fine (and it doesn’t seem to go below it very much). Therefore, we added the following to ceph.conf:
[osd]
osd scrub load threshold = 2.0
And restarted our OSDs (alternatively, one can also injectargs).
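For the injectargs route, without restarting anything, it would look something like:

```shell
# Push the new scrub threshold to all running OSDs at runtime
sudo ceph tell osd.* injectargs '--osd_scrub_load_threshold 2.0'
```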
This fixed the problem of the constant flood of scrub messages and scrubbing worked normally again.
Our cluster was reporting a couple of stale PGs for a while; I found out how to dump them out exactly:
$ sudo ceph pg dump_stuck stale
ok
pg_stat state up up_primary acting acting_primary
2.51 stale+active+clean [5] 5 [5] 5
2.62 stale+active+clean [4] 4 [4] 4
$
According to Ceph docs, “For stuck stale placement groups, it is normally a matter of getting the right ceph-osd daemons running again.”
So, I tried:
$ sudo /etc/init.d/ceph start osd.4
But this didn’t seem to do much, so I tried it manually (picking up on what I saw in ps auxwww | grep osd output) and found:
$ sudo /usr/bin/ceph-osd -f --cluster ceph --id 4 --setuser ceph --setgroup ceph
2016-09-02 07:55:32.228492 7f165e2678c0 -1 ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-4: (2) No such file or directory
Indeed, /var/lib/ceph/osd/ceph-4 is empty.
Oh, maybe I need to mount the drive to that dir?
Can’t find the drive.
Letter/number sequences would suggest /dev/sde, but that doesn’t exist under /dev.
Now I simply want to remove this OSD altogether:
$ sudo ceph osd crush remove osd.4
$ sudo ceph auth del osd.4
$ sudo ceph osd rm 4
WTH, this made no difference. OK, it’s gone from ceph osd tree, but ceph pg dump_stuck stale still reports a problem with a placement group on “[4]”. Can’t I just get it to go somewhere else?
Not sure yet…
OK, so after talking with Ben, it looks like the drives for OSDs 4 and 5 were removed and replaced with SSDs. Given that we’re pretty confident we’re never going to get the stale PGs back, I decided to declare those OSDs “lost”:
$ ceph osd lost 4
Damnit:
$ ceph osd lost 4 --yes-i-really-mean-it
osd.4 is not down or doesn't exist
$
Ugh. :(
Removing the OSD from before makes it so I can’t declare “lost” now, or something like that.
OK, this is nuts, but:
$ ceph osd getcrushmap -o foo.map
got crush map from osdmap epoch 2779
$ crushtool -d foo.map -o foo.map.txt
I edited foo.map.txt to put osd.4 back in, then:
$ crushtool -c foo.map.txt -o foo.map
$ ceph osd setcrushmap -i foo.map
And it’s rebalancing now… However, it never completes (I gave it like 10 minutes).
Meanwhile…
Let’s take out OSD 5:
$ ceph osd lost 5 --yes-i-really-mean-it
marked osd lost in epoch 2339
$
So that worked.
Removing OSD 4 again:
$ sudo ceph osd crush remove osd.4
Now the rebalance works.
However, the stale PGs are still there, so Ben and I decided that the “bench” pool isn’t needed. Out it goes:
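For the record, the removal command looks like this (Ceph makes you type the pool name twice plus a scary flag):

```shell
# Permanently delete the "bench" pool and everything in it
sudo ceph osd pool delete bench bench --yes-i-really-really-mean-it
```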
…and voilà!
This finally got rid of the stale PGs.
It turns out our one pool named “bench” had a replication factor of only 1, and was created in the very beginning of our Ceph cluster days, when we were still learning how to do things. When those two PGs went missing, there was really no way to get them back, because they didn’t exist anywhere else. Only removing the affected pool got rid of them.
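A sanity check worth running after a story like this is to look at each pool’s replica count and raise any that are set to 1 (using rbd as a stand-in pool name here):

```shell
# Show a pool's replica count, and raise it if it comes back as 1
sudo ceph osd pool get rbd size
sudo ceph osd pool set rbd size 3
```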