Debian UEFI Wiki

This page is mainly intended to describe UEFI for Debian purposes: what’s supported in Debian and how to use it, plus some troubleshooting tips.

See the Wikipedia page for vastly more information about it in general, or there’s lots of other information in the Links section below.


What is UEFI?
History and naming
Architectures supported
PC platform: BIOS, UEFI, CSM etc.
ARM64 platform: UEFI, U-Boot, Fastboot, etc.
ARM32 platform: UEFI, U-Boot, Fastboot, etc.
Disk partitioning: MS-DOS and GPT
Booting a UEFI machine normally
Booting from removable media
debian-installer support
efibootmgr and efivar
efibootmgr example 1 – display boot entries
efibootmgr example 2 – verbose display of boot entries
efibootmgr example 3 – add a new boot entry
Quirks, workarounds and special UEFI features in Debian and Debian-Installer
Dual-booting systems currently installed using BIOS fallback boot
Force grub-efi installation to the removable media path
32-bit x86 PC (i386) support for UEFI
Support for mixed-mode systems: 64-bit system with 32-bit UEFI
Missing features
UEFI support in live images
UEFI Secure Boot
RAID for the EFI System Partition
How to tell if you’ve booted via UEFI
Diagnosing problems with boot order

What is UEFI?

(U)EFI stands for (Unified) Extensible Firmware Interface. It’s a standard specification for the firmware interface on a computer, and it has been implemented by multiple vendors on various platforms.

History and naming

UEFI started life as Intel’s EFI specification. It was first seen in the wild on Itanium (ia64) machines and that’s where Debian’s first support started too.

Later, Intel passed control over the EFI specification to the UEFI Forum and they continued developing newer versions of the specification. The U for Unified was added to the name at this point. In most references here and elsewhere on the net, EFI and UEFI are interchangeable terms to describe the same thing.

There are multiple further bits of terminology here, and things are often confused. So let’s explain!

UEFI is actually a set of interface specifications, nothing more.

The reference implementation of the UEFI specifications is called edk2 or EDK II (EFI Development Kit, version 2).

Tianocore is the name of the upstream development group working on the Open Source EDK II project.

OVMF (Open Virtual Machine Firmware) is a build of edk2 designed to be used as firmware for a virtual machine.

Many commercial UEFI firmware implementations are built on top of edk2, with changes commonly being made to add platform initialisation and a pretty GUI on the front end.

Architectures supported

UEFI has been supported to some extent on 5 of Debian’s architectures:

64-bit Itanium (ia64 in Debian)

64-bit x86-64 (amd64)

32-bit x86 (i386)

32-bit ARM (armhf)

64-bit Aarch64 (arm64)

There are some caveats, though…

Since the Debian Jessie release (8.0), ia64 is no longer a release architecture in Debian.
Support for 32-bit ARM systems (armhf) is only available in Debian Buster (10.0) onwards, and is still under development at the time of writing.

It's a fair bet that RISC-V will end up with UEFI support in the future too.

PC platform: BIOS, UEFI, CSM etc.

On the PC architectures (amd64 and i386), UEFI-based firmware is a relatively new replacement for the ancient BIOS (Basic Input/Output System) that has existed ever since the PC was first developed in the 1980s. The old BIOS systems have strict limitations due to their ancient design, running in 16-bit mode with access to only 1MB of memory, and limited access to other resources like disks. UEFI firmware is normally fully native and so should be able to access all the system memory and all the devices.

For the sake of backwards compatibility, many current PCs using UEFI also include a Compatibility Support Module (CSM), extra support code that will continue to boot in the old BIOS style. Over time, this support will most likely be phased out. Some systems were already being sold UEFI-only (i.e. with no CSM) in 2014.

x86 virtual machines can be run using qemu with either BIOS or UEFI firmware. qemu will default to BIOS using SeaBIOS, but it can also run OVMF. Debian includes builds of OVMF for amd64 in the ovmf package.

ARM64 platform: UEFI, U-Boot, Fastboot, etc.

Some Aarch64 machines (arm64) use U-Boot or other options like Fastboot for their firmware, but most general-purpose arm64 machines (e.g. those intended for use as servers) should be expected to use UEFI, typically via a build of edk2.

Debian includes edk2-based VM firmware for arm64 in the qemu-efi package. For some reason this is often described as AAVMF to distinguish it from OVMF for x86. It’s basically the same software.

ARM32 platform: UEFI, U-Boot, Fastboot, etc.

Most Arm machines (armhf) use U-Boot or other options like Fastboot for their firmware, but some machines can run edk2 as well directly.

Again, edk2 is also a good option for firmware for 32-bit Arm VMs. Debian includes this firmware in the qemu-efi-arm package.

Recent versions of U-Boot have also included some limited UEFI functionality. This is designed to be “just enough UEFI” to support common operations, without including a lot of the more complicated possibilities underneath.

Disk partitioning: MS-DOS and GPT

Historically, the most common method of partitioning disks on PC platforms has been the MS-DOS standard using a Master Boot Record (MBR) and a tiny limited partition table with space to describe only 4 “primary” partitions. This is what BIOS systems still use to date. There are several important limitations that come with this scheme, but the most obvious one is the size limit of 2TB per disk. Back when this partitioning format was invented, a 100MB disk was large. Today, multi-terabyte disks are the norm.
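The 2TB figure falls straight out of the on-disk format: MBR partition entries store start sector and sector count as 32-bit values, and with traditional 512-byte sectors the limit works out as follows (a quick sanity check you can run in any shell):

```shell
# 2^32 sectors (32-bit field) x 512-byte sectors = largest size MBR can address
echo $(( 4294967296 * 512 ))   # 2199023255552 bytes = 2 TiB
```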

UEFI also includes support for a newer partitioning format: the GUID Partition Table (GPT). It’s much more flexible than the MS-DOS option, including:

many more partitions (up to 128 per disk)
much larger disks (up to 8ZB: 8,000,000,000 TB)
much better definitions of what each partition might be used for

Booting a UEFI machine normally

Regular UEFI boot has several lists of possible boot entries, stored in UEFI config variables (normally in NVRAM), and boot order config variables stored alongside them. It allows for many different boot options, and a properly-defined fallback order. In many cases, you can even list and choose which OS / boot loader to use from the system boot menu (similar to the boot device menu implemented in many BIOSes). Unfortunately, a lot of PC UEFI implementations have got this wrong and so don’t work properly.

The correct way for this to work when booting off local disk is for a boot variable to point to a vendor-specific bootloader program under \EFI\&lt;vendor&gt;\ on the EFI System Partition (ESP), a specially tagged partition which is normally formatted using FAT32.

Debian installs grub-efi for its EFI bootloader, as:

\EFI\debian\grubx64.efi (amd64)
\EFI\debian\grubia32.efi (i386)
\EFI\debian\grubaa64.efi (arm64)
\EFI\debian\grubarm.efi (armhf)
Each version of GRUB here contains all the code and configuration that GRUB needs to work from that point.

By using separate vendor directories like this, UEFI allows for clean interoperability between vendors. If only the firmware developers were competent… 😦 Some implementations ignore the boot order altogether, some filter it and will only run things that claim to be “Windows”, etc. See below for tips on how to work around some of the known bugs in broken UEFI implementations.

Booting from removable media

If there are no boot variables pointing to a bootloader program in the ESP, or if the user has told the system appropriately, it will also look for bootloaders in certain specific fallback paths. On each device, it will look for FAT32 filesystems. Within each of those, it will look for a specifically-named bootloader file, again with a different name per architecture:

\EFI\boot\bootx64.efi (amd64)
\EFI\boot\bootia32.efi (i386)
\EFI\boot\bootaa64.efi (arm64)
\EFI\boot\bootarm.efi (armhf)
The different names are deliberate – it allows for one disk or CD to contain boot files for multiple architectures with no clashes.
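This no-clash property is easy to see concretely. A minimal sketch, using empty placeholder files in a scratch directory to stand in for the real bootloader binaries:

```shell
# All four standard fallback names coexist in one EFI/boot directory
cd "$(mktemp -d)"
mkdir -p EFI/boot
touch EFI/boot/bootx64.efi    # amd64
touch EFI/boot/bootia32.efi   # i386
touch EFI/boot/bootaa64.efi   # arm64
touch EFI/boot/bootarm.efi    # armhf
ls EFI/boot
```

On real media these are full grub-efi images rather than empty files; firmware of each architecture simply looks for the one name it knows.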

On Debian installation media, each of these files is again a copy of grub-efi with sufficient built-in code and configuration to find the rest of the system from there.

debian-installer support

debian-installer’s support for UEFI is mostly contained in two modules.

First comes the partman-efi module, and this will be loaded automatically if d-i recognises it has been booted in UEFI mode. partman-efi will cope with both MS-DOS and GPT partitioned disks, but will offer to use GPT by preference on disks that are not already partitioned. It knows how to set up an ESP with appropriate partition type and filesystem if necessary, and will ensure it’s correctly mounted on the installed system later. If the system already has an ESP, partman-efi will attempt to use that rather than create a new one. This is for interoperability with existing operating systems in dual-boot systems.

Once the normal installation process has been completed, the second major component with UEFI support comes into play: grub-installer. It will install the grub-efi bootloader to the right location in the ESP and will use efibootmgr to register that bootloader with the firmware. On correctly-working systems, this should work without needing any user interaction. This module will automatically find the ESP and install its files in the right place, leaving no room for confusion about where boot files are saved (as can happen with MBR/MS-DOS systems).

The initial support to make UEFI amd64 systems directly installable in Debian was added in Wheezy (7.0). Support was later added for i386 and arm64 systems in Jessie (8.0), along with a number of quirks and bug workarounds. See below for more details about those. Support for armhf was added in Buster (10.0).

efibootmgr and efivar

The Linux kernel gives access to the UEFI configuration variables via a set of files under /sys, using two different interfaces.

The older interface was efivars, showing files under /sys/firmware/efi/vars, and this is what was used by default in both Wheezy and Jessie.

The new interface is efivarfs, which will expose things in a slightly different format under /sys/firmware/efi/efivars. This is the new preferred way of using UEFI configuration variables, and Debian switched to it by default from Stretch onwards.
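Each file under /sys/firmware/efi/efivars is named VariableName-VendorGUID. A small sketch of splitting such a name with shell parameter expansion — the GUID shown is the well-known EFI global-variable GUID, and this simple split works because the standard boot variable names contain no dashes:

```shell
# efivarfs file names have the form <VariableName>-<VendorGUID>
var="BootOrder-8be4df61-93ca-11d2-aa0d-00e098032b8c"
name=${var%%-*}   # everything before the first dash: the variable name
guid=${var#*-}    # everything after it: the vendor GUID
echo "$name $guid"
```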

The exact details of these interfaces are hidden from view somewhat by efibootmgr and efivar, userland software packages written to work with them. Initially, all of the code was written directly in efibootmgr but more recently the lower-level code has been split out into the library efivar to make it easier to share this code with other utilities like fwupd. Read the man pages for these for full details, but here are a couple of examples from a system with many devices:

efibootmgr example 1 – display boot entries

# efibootmgr
BootCurrent: 0019
Timeout: 0 seconds
BootOrder: 0019,0006,0007,0008,0009,000A,000B,000C,000D,000E,000F,0010,0011,0012,0013
Boot0000 Setup
Boot0001 Boot Menu
Boot0002 Diagnostic Splash Screen
Boot0003 Startup Interrupt Menu
Boot0004 ME Configuration Menu
Boot0005 Rescue and Recovery
Boot0006* USB CD
Boot0007* USB FDD
Boot0008* ATAPI CD0
Boot0009* ATA HDD2
Boot000A* ATA HDD0
Boot000B* ATA HDD1
Boot000C* USB HDD
Boot000D* PCI LAN
Boot000E* ATAPI CD1
Boot000F* ATAPI CD2
Boot0010 Other CD
Boot0011* ATA HDD3
Boot0012* ATA HDD4
Boot0013 Other HDD
Boot0015* IDER BOOT Floppy
Boot0016* ATA HDD
Boot0017* ATAPI CD:
Boot0018* PCI LAN
Boot0019* debian

efibootmgr example 2 – verbose display of boot entries

The same as example 1, but with more detail (including the GUIDs used to identify devices).

# efibootmgr -v
BootCurrent: 0019
Timeout: 0 seconds
BootOrder: 0019,0006,0007,0008,0009,000A,000B,000C,000D,000E,000F,0010,0011,0012,0013
Boot0000 Setup FvFile(721c8b66-426c-4e86-8e99-3457c46ab0b9)
Boot0001 Boot Menu FvFile(126a762d-5758-4fca-8531-201a7f57f850)
Boot0002 Diagnostic Splash Screen FvFile(a7d8d9a6-6ab0-4aeb-ad9d-163e59a7a380)
Boot0003 Startup Interrupt Menu FvFile(f46ee6f4-4785-43a3-923d-7f786c3c8479)
Boot0004 ME Configuration Menu FvFile(82988420-7467-4490-9059-feb448dd1963)
Boot0005 Rescue and Recovery FvFile(665d3f60-ad3e-4cad-8e26-db46eee9f1b5)
Boot0006* USB CD VenMsg(bc7838d2-0f82-4d60-8316-c068ee79d25b,86701296aa5a7848b66cd49dd3ba6a55)
Boot0007* USB FDD VenMsg(bc7838d2-0f82-4d60-8316-c068ee79d25b,6ff015a28830b543a8b8641009461e49)
Boot0008* ATAPI CD0 VenMsg(bc7838d2-0f82-4d60-8316-c068ee79d25b,aea2090adfde214e8b3a5e471856a35401)
Boot0009* ATA HDD2 VenMsg(bc7838d2-0f82-4d60-8316-c068ee79d25b,91af625956449f41a7b91f4f892ab0f602)
Boot000A* ATA HDD0 VenMsg(bc7838d2-0f82-4d60-8316-c068ee79d25b,91af625956449f41a7b91f4f892ab0f600)
Boot000B* ATA HDD1 VenMsg(bc7838d2-0f82-4d60-8316-c068ee79d25b,91af625956449f41a7b91f4f892ab0f601)
Boot000C* USB HDD VenMsg(bc7838d2-0f82-4d60-8316-c068ee79d25b,33e821aaaf33bc4789bd419f88c50803)
Boot000D* PCI LAN VenMsg(bc7838d2-0f82-4d60-8316-c068ee79d25b,78a84aaf2b2afc4ea79cf5cc8f3d3803)
Boot000E* ATAPI CD1 VenMsg(bc7838d2-0f82-4d60-8316-c068ee79d25b,aea2090adfde214e8b3a5e471856a35403)
Boot000F* ATAPI CD2 VenMsg(bc7838d2-0f82-4d60-8316-c068ee79d25b,aea2090adfde214e8b3a5e471856a35404)
Boot0010 Other CD VenMsg(bc7838d2-0f82-4d60-8316-c068ee79d25b,aea2090adfde214e8b3a5e471856a35406)
Boot0011* ATA HDD3 VenMsg(bc7838d2-0f82-4d60-8316-c068ee79d25b,91af625956449f41a7b91f4f892ab0f603)
Boot0012* ATA HDD4 VenMsg(bc7838d2-0f82-4d60-8316-c068ee79d25b,91af625956449f41a7b91f4f892ab0f604)
Boot0013 Other HDD VenMsg(bc7838d2-0f82-4d60-8316-c068ee79d25b,91af625956449f41a7b91f4f892ab0f606)
Boot0014* IDER BOOT CDROM ACPI(a0341d0,0)PCI(16,2)ATAPI(0,1,0)
Boot0015* IDER BOOT Floppy ACPI(a0341d0,0)PCI(16,2)ATAPI(0,0,0)
Boot0016* ATA HDD VenMsg(bc7838d2-0f82-4d60-8316-c068ee79d25b,91af625956449f41a7b91f4f892ab0f6)
Boot0017* ATAPI CD: VenMsg(bc7838d2-0f82-4d60-8316-c068ee79d25b,aea2090adfde214e8b3a5e471856a354)
Boot0018* PCI LAN VenMsg(bc7838d2-0f82-4d60-8316-c068ee79d25b,78a84aaf2b2afc4ea79cf5cc8f3d3803)
Boot0019* debian HD(1,800,f3800,042e27b6-2c33-4d0e-8ee4-d579c3e39a1e)File(\EFI\debian\grubx64.efi)

efibootmgr example 3 – add a new boot entry

Create a new boot entry, pointing to a bootloader program on disk /dev/sdb, partition 1; write a new signature to the MBR if needed; call it “debian”; the bootloader program is in \EFI\debian\grubx64.efi

# efibootmgr -c -d /dev/sdb -p 1 -w -L debian -l '\EFI\debian\grubx64.efi'
BootCurrent: 0019
Timeout: 0 seconds
BootOrder: 0019,0006,0007,0008,0009,000A,000B,000C,000D,000E,000F,0010,0011,0012,0013
Boot0000 Setup
Boot0001 Boot Menu
Boot0002 Diagnostic Splash Screen
Boot0003 Startup Interrupt Menu
Boot0004 ME Configuration Menu
Boot0005 Rescue and Recovery
Boot0006* USB CD
Boot0007* USB FDD
Boot0008* ATAPI CD0
Boot0009* ATA HDD2
Boot000A* ATA HDD0
Boot000B* ATA HDD1
Boot000C* USB HDD
Boot000D* PCI LAN
Boot000E* ATAPI CD1
Boot000F* ATAPI CD2
Boot0010 Other CD
Boot0011* ATA HDD3
Boot0012* ATA HDD4
Boot0013 Other HDD
Boot0015* IDER BOOT Floppy
Boot0016* ATA HDD
Boot0017* ATAPI CD:
Boot0018* PCI LAN
Boot0019* debian

Quirks, workarounds and special UEFI features in Debian and Debian-Installer

Initial support for UEFI installation was added for amd64 in Wheezy (7.0). This worked for many users, but others reported issues. Most of these were not directly bugs in Debian's UEFI support, but we have nonetheless added workarounds to help those affected.

Dual-booting systems currently installed using BIOS fallback boot

Quite a number of early UEFI systems were shipped with a non-UEFI installation of Windows 7 pre-installed, and the firmware set up to attempt UEFI boot first and BIOS boot second. This worked fine for users, but the moment a new operating system was installed alongside that copy of Windows, it would be difficult/impossible to dual-boot it.

debian-installer will now warn the user if it is booted in UEFI mode but can only find existing non-UEFI OS installations. It gives them the option to switch the installer to non-UEFI mode from that point forwards, so as not to break a potential dual-boot setup.


Force grub-efi installation to the removable media path

Many UEFI firmware implementations are unfortunately buggy, as mentioned earlier. Despite the specification for boot entries and boot order being quite clear about how things should work, there are lots of systems in the wild which get it wrong. Some systems simply ignore valid requests to add new boot entries. Others will accept those requests, but will refuse to use them unless they describe themselves as “Windows” or similar. There are lots of other similar bugs out there, suggesting that many system vendors have done very little testing beyond “does it work with Windows?”

As described above, on a UEFI system bootloaders should be installed only in the correct vendor-specific directory in the EFI System Partition (ESP). But, because of the buggy firmware implementations out there, operating system distributors cannot necessarily expect that this will work correctly on all systems. Microsoft have worked around this (and arguably also made the problem worse) – the Windows installer also installs to the removable media path in the ESP (e.g. \EFI\boot\bootx64.efi for amd64/X64 systems), and all firmware implementations have to support this path to be able to run an OS installer at all. This means that Windows will always work on all these broken implementations, but it also means that system vendors can get away with shipping broken implementations. It removes the idea of having a fallback boot path and sensible control of boot order.

All OS installers installing things to this removable media path will conflict with any other such installers, which is bad and wrong. That’s why in Debian we don’t do this by default.

However, to help support those unfortunate people who own buggy systems like this, there is an option to force grub-efi installation to the removable media path too. There is a d-i Rescue Mode option to force this – if you’ve just installed Debian on your UEFI system but it won’t boot Debian afterwards, this may fix the problem for you. It can also be selected during the normal installation run using Expert mode, or preseed users can add the following option in their configuration (for amd64, tweak the package name to suit on other architectures):

grub-efi-amd64 grub2/force_efi_extra_removable boolean true


32-bit x86 PC (i386) support for UEFI

In Wheezy (Debian 7.0), i386 UEFI support was intentionally omitted for a variety of reasons. However, since then lots more UEFI-only x86 machines were produced so we enabled it. Since Debian Jessie (8.0), all standard i386 Debian installation media should work for UEFI installation as well as in BIOS mode, just like on amd64.

Support for mixed-mode systems: 64-bit system with 32-bit UEFI

Some systems have been released containing 64-bit Intel Atom CPUs (such as the Bay Trail), but unfortunately use 32-bit UEFI firmware with no BIOS compatibility mode. Using the 32-bit UEFI x86 support, an i386 installation should be possible on these machines but it won’t make the most of the 64-bit hardware.

Debian Jessie (8.0) was the first Linux distribution to include full support for mixed-mode UEFI installation on these machines. The multi-arch installation media (available in netinst and DVD form) include the UEFI boot loaders necessary for both i386 and amd64 boot. By selecting “64-bit install” from the initial boot menu, debian-installer will install a 64-bit (amd64) version of Debian. The system will automatically detect that the underlying UEFI firmware is 32-bit and will install the appropriate version of grub-efi to work with it.

Missing features

Although Debian releases since Wheezy (7.0) have included general UEFI support, there are still some features that have not yet been implemented.

UEFI support in live images

Since the first release of Stretch (9.0), UEFI is now supported on both installation and live images.

In previous releases UEFI support existed only in Debian’s installation images. The accompanying live images did not have support for UEFI boot.

UEFI Secure Boot

Debian supports UEFI Secure Boot for Buster (10.0) onwards for amd64, i386 and arm64. See SecureBoot for more details on how this works. It is supported for all the installation media and live media that we create for these three platforms.

RAID for the EFI System Partition

This is arguably a mis-design in the UEFI specification – the ESP is a single point of failure on one disk. For systems with hardware RAID, that will provide some backup in case of disk failure. But for software RAID systems there is currently no support for putting the ESP on two separate disks in RAID. There might be a way to do something useful with fallback options, but this will need some investigation…


How to tell if you’ve booted via UEFI

The Debian installer splash screen will say it’s the UEFI installer, and will look slightly different to the equivalent screen in BIOS mode. BIOS boot is done via isolinux/syslinux, but UEFI boot is done using grub.

Later on, the thing to look for is the directory /sys/firmware/efi. If that exists, the system is running in UEFI mode.
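That check is trivially scriptable. A minimal sketch; the function name is ours, and the directory is taken as a parameter so the logic can be exercised against any path:

```shell
# Print UEFI if the given firmware directory exists, BIOS otherwise.
# On a real system, pass /sys/firmware/efi.
boot_mode() {
    if [ -d "$1" ]; then
        echo UEFI
    else
        echo BIOS
    fi
}

boot_mode /sys/firmware/efi
```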

Diagnosing problems with boot order

efibootmgr is your friend. Run it without parameters to simply list the boot options and boot order on your system, or add -v for more detail of where each boot entry points.

After that, check to see if you have Secure Boot enabled – we didn’t support Secure Boot until version 10.0 (Buster).

If that still doesn’t help, you may have a buggy firmware implementation. Try installing to the removable media path – see above for instructions.


There are lots of other UEFI resources on the internet. Particularly useful ones include:

the UEFI Forum
the OSDev wiki page about UEFI
the wiki page for Linux distro vendors to share information about systems with broken UEFI implementations


The best place to talk about UEFI support in Debian is the mailing list, or our IRC channel (#debian-efi).

UEFI (last modified 2019-12-27 00:15:46)


GKH Threadripper 3970X Setup Notes

Level1Techs Forums

What is this machine?

CPU: AMD Ryzen Threadripper 3970X 32-Core

Motherboard: MSI Creator TRX40

AMD GPU, because open source, obviously: Sapphire Radeon Pulse RX 5600 XT

Case: be quiet! Pure Base 500DX

Extra Pure Wings 140mm fan

CPU Cooler: Noctua NH-U9 TR4-SP3

PSU: be quiet! Straight Power 11 Platinum 1000W

Preferred Memory Kit: 256GB/3600 memory kit
*Note: 3600 on 8 sticks is hard to attain! It is very much an overclock on this platform.*

Backup/"Plan B" Memory Kit: 128GB memory kit

Storage, because we gotta go fast: Liqid Element LQD3000

Rationale Fluff TODO
Arch Linux Setup Notes

This roughly follows the arch wiki installation guide.

These steps are "extra" steps you can do early in the install process to set up Linux MD for raid, LVM and (optionally) LUKS v2 encryption.

It would roughly slot in to replace the "Partition the Disks" section with this: "A Grand Adventure In Raid And Partitioning The Disks".
Arch Linux Pre-Installation Steps

First, we want to partition the NVMe devices. See the arch wiki for more info, but in general terms we're going to need: 1) an EFI partition (because we're booting modern UEFI, not legacy); 2) a boot partition (because we'll be using Linux MD raid, or LVM, or LVM+LUKS, or LUKS+LVM, etc.); and 3) the data storage partition.


gdisk /dev/nvme0n1


I am choosing to change the partition types to Linux Raid here in this setup step to help mdadm auto-detect and auto-assemble the md array.

I have also chosen to have my EFI partition on one NVMe device and my boot partition on another device. They are both 500MB, an arbitrary (and somewhat larger than necessary) value.

I have also created a 2GB swap partition. This setup assumes we do NOT need suspend/resume functionality.

If you DO want suspend-to-disk/resume functionality, I recommend the even-more-complicated setup of a LUKS v1 encrypted /boot partition plus a Luks v2 / root file system. But this setup isn’t covered in this how-to. Perhaps in a future how to!?

In the video, I created the partition layout manually on the first two disks because it is slightly different. If you'd rather, you could create 4 partitions – EFI, boot, root fs, swap – so that each disk is identical.

Here’s a handy shortcut for copying partition layouts from disk to disk:

#copy partition layout from nvme 0 to nvme 2
sgdisk /dev/nvme0n1 -R /dev/nvme2n1

#we copied the disk GUID too, and having two disks with the same GUID is not desirable. Let's randomize it again!
sgdisk -G /dev/nvme2n1

You can verify by running gdisk and using p to print the partition info of the NVMe devices, and confirm they all have different GUIDs (as they should).
Setting up Swap

It's pretty easy to set up swap at the pre-install phase; we can "upgrade" to encrypted swap (a must if you're encrypting your root file system!) later.

For the partition layout given above, where partition 3 is the Linux Swap type on all 4 NVMe devices, you would do:

mkswap /dev/nvme0n1p3
mkswap /dev/nvme1n1p3
mkswap /dev/nvme2n1p3
mkswap /dev/nvme3n1p3

Then you can turn swap on:

swapon /dev/nvme[0-3]n1p3

(The bracket weirdness is just shell expansion, because this command will take more than one argument. You can type out /dev/nvmeblahblah 4 times, once per NVMe device, if you aren't ready for that yet.)
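You can watch the same expansion happen with ordinary files; the shell matches [0-3] against names that actually exist, which is exactly what it does with the real device nodes:

```shell
# Create four dummy names in a scratch directory and let the glob expand
cd "$(mktemp -d)"
touch nvme0n1p3 nvme1n1p3 nvme2n1p3 nvme3n1p3
echo nvme[0-3]n1p3   # prints: nvme0n1p3 nvme1n1p3 nvme2n1p3 nvme3n1p3
```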


Running swapon --show (or free -h) will confirm swap is operating now.


The arch wiki has some more info about encrypted swap, and the suspend-to-disk option: 2

I mostly wanted to put this here because the Arch Wiki, in at least one place, talks about creating a 4-way raid 1 (???) partition on the md device and using that as swap… but this is extra unnecessary overhead since the swap subsystem on Linux will stripe across devices anyway. If you lose an nvme disk, though, it could crash the system. For desktops this doesn’t matter. For servers I would either make a small swap file on the file system (yes, lots of overhead, but on a server I’d mostly rather out-of-memory than get really deep into swap) OR otherwise not use a nand flash device for swap in a server or serverish context.
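For the small-swap-file-on-a-server option mentioned above, a minimal sketch (the path and size are arbitrary, and these commands need root on a real system):

```shell
# Create a 2 GiB file, lock down permissions, then format and enable it as swap
fallocate -l 2G /swapfile || truncate -s 2G /swapfile   # truncate as a fallback where fallocate is unsupported
chmod 600 /swapfile    # swap must not be readable by other users
mkswap /swapfile       # write the swap signature
swapon /swapfile       # enable it (add to /etc/fstab to make it permanent)
```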
Option 1 of 2 – Setting up Linux MD Raid (what we did for GKH)

This option is to use mdadm (the Linux multiple-device admin tool) to set up soft raid in Linux. This is an extremely well-tested and battle-hardened software raid solution that works great with NVMe devices like this.

$ mdadm --create /dev/md0 --chunk=128K --level=0 --raid-devices=4 /dev/nvme0n1p2 /dev/nvme1n1p2 /dev/nvme2n1p2 /dev/nvme3n1p2

mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.

It will tell you what md device it created. In our case it was /dev/md0. Sometimes it might be /dev/md127 or some number other than 0. Just be aware :)

Then you can cat /proc/mdstat to see if it created the array properly:

$ cat /proc/mdstat
Personalities : [raid0]
md0 : active raid0 nvme3n1p2[3] nvme2n1p2[2] nvme1n1p2[1] nvme0n1p2[0]
1560264704 blocks super 1.2 128k chunks

unused devices: &lt;none&gt;

If it did, we can do the LVM thing on this large array device.

# mark a "physical" volume so it can be used for LVM. If you really want to, you could use gdisk to partition the md0 device so you have md0p1 as partition type LVM... but strictly speaking that isn't necessary.
pvcreate /dev/md0

# then create a group of "physical" volumes. Only one thing in our group doing it this way, the md array:
vgcreate volgrp1 /dev/md0

# finally create the logical volume using the available space. About 900 gibibytes in my case. Not sure? Use lvdisplay or pvdisplay to get info about your physical and logical volumes.
lvcreate -L 900GiB -n notazpool volgrp1

Option 2 of 2 – Using LVM for the RAID bits

We have 4 NVMe devices with one big partition each.

If you intend to use LVM without Linux MD proper, then first it is important to make sure you selected the right partition type.
Instead of Linux Raid type, you want Linux LVM type (8e00 instead of fd00) for your partition type.

This is one place where the Arch Wiki isn’t as good of a reference as RedHat’s documentation:

Red Hat Enterprise Linux 6 documentation: "5.4.16. RAID Logical Volumes"

^ This is a good supplemental read to understand what LVM offers today. If you research too much you might see 5+ year old threads that talk about LVM shortcomings vs mdadm, but since 2019/2020 this is mostly not really a thing anymore.

An example:

# mark the physical volumes as usable
pvcreate /dev/nvme0n1p2 /dev/nvme1n1p2 /dev/nvme2n1p2 /dev/nvme3n1p2

# create a volume group in which to put logical volumes
vgcreate volgrp1 /dev/nvme0n1p2 /dev/nvme1n1p2 /dev/nvme2n1p2 /dev/nvme3n1p2

# create one or more raided logical volumes
lvcreate --type raid10 -i 2 -m 1 -L 900GiB --maxrecoveryrate 128 -n notazpool volgrp1

*Note: notazpool is just my cheeky name for this logical volume, haha. And volgrp1 is another arbitrary name.*
Continuing on – What about encryption??

You can skip this step if you don't want to encrypt your root partition. Skip down to "No crypto for me please!"

The Arch wiki has some good info on this.

You can elect where in the stack to insert encryption: before LVM (no matter which way you're using LVM – with or without Linux MD), after LVM, between Linux MD and LVM, etc. There are also "side effects", given that the /boot partition (where the initial ram disk should be) can only be LUKS v1 encrypted and generally cannot be a Linux MD array or another "complex" disk option.

For this guide, I am not going to cover encrypting the /boot partition because your encrypted /boot LUKSv1 partition will likely be setup to contain your LUKSv2 decryption key, if you go that route. Otherwise it’ll ask you for two decryption keys on boot – one for /boot and one for / – which is undesirable. Just be aware that this is an option that exists.

To continue on, I'm assuming you've assembled your raw disks into one big pool, either at /dev/volgrp1/notazpool OR /dev/md0, depending on whether you went with LVM or Linux MD to handle your needs.

This should work whether you went with a LVM on Linux MD or just plain LVM, above:

# Setup Luks and format, then open the crypto device.
# you'll be prompted to create, then re-enter, the passphrase.
cryptsetup luksFormat /dev/volgrp1/notazpool
cryptsetup open /dev/volgrp1/notazpool root

# then we can make the file system
mkfs.ext4 /dev/mapper/root

# and mount it for the arch installer to use
mount /dev/mapper/root /mnt

It is important to configure the system so that it will boot again with this same config, which is a bit tricky and the Arch documentation seems to be incomplete/wrong.

I had a lot of trouble with this step, actually, because I initially opted to use systemd for the initial ramdisk. In /etc/mkinitcpio.conf I had a setup such as:

HOOKS=(base systemd autodetect keyboard sd-vconsole modconf block sd-lvm2 sd-encrypt filesystems keyboard fsck)

However, when booting, this would result in systemd immediately trying to mount /dev/mapper/root without ever prompting for the LUKS password! I even tried enabling GRUB's LUKS support (which really is only for the v1 boot partition case, not what we're doing here).

Of course that would fail, and you’d get the ominous “Control+D to continue” root maintenance shell. From the maintenance shell I could do

cryptsetup open /dev/volgrp1/notazpool root

then just type exit and Arch would boot normally.

What I ended up with in my mkinitcpio.conf that does work:

HOOKS=(base udev autodetect keyboard modconf block lvm2 encrypt sd-encrypt filesystems keyboard fsck)

And my /etc/default/grub configuration:

GRUB_CMDLINE_LINUX_DEFAULT="loglevel=3 cryptdevice=/dev/volgrp1/notazpool:root root=/dev/mapper/root"

Useful commands that may help you get there:

# Commands to give you UUIDs of devices, if you want to try the kernel param
lsblk -f

# Regenerate your initial ramdisk
mkinitcpio -p linux

# Regenerate your GRUB config after modifying /etc/default/grub
grub-mkconfig -o /boot/grub/grub.cfg

The cryptdevice= parameter above uses the encrypt hook’s syntax; without the encrypt hook, the cryptsetup binary is not packaged into the initial ramdisk at all. It does NOT hurt to have both hooks, and having both keeps cryptsetup handy in that maintenance shell, if you ever need it. Otherwise you have some systemd manipulation to do to try to get it to start the crypt service. Systemd may work better with /etc/crypttab for the encrypted-/boot scenario, where you really just unlock /boot with the passphrase, which then contains keys for unlocking one or more other partitions.

NOTE: It would be a good idea to set up encrypted swap in crypttab! There is even a facility where, at boot, it uses a random password, so swap is effectively scrambled on every reboot, AND it has somewhat lower overhead than being part of the LVM volumes, which is super nice. Also note that this setup gives away a bit of info, in that an observer can see how much space is used within LVM. You’ll want to read the Arch wiki pages that talk about discard and TRIM. I think you probably want TRIM/discard for performance, but it lets an adversary see how much data you’ve actually written to your SSD. For truly high security I would probably do something completely different with this setup. A tangent for another day…
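As a sketch of what that random-key swap entry can look like (the LVM volume name here is hypothetical; see the Arch wiki’s dm-crypt/Swap encryption page for the exact recipe before copying it):

```
# /etc/crypttab – swap keyed from /dev/urandom at every boot, so its
# contents are unrecoverable after a reboot. Device name is illustrative.
swap  /dev/volgrp1/swaplv  /dev/urandom  swap,cipher=aes-xts-plain64,size=512
```

Your fstab then points at the mapped device (/dev/mapper/swap) rather than the raw volume.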

I was kinda surprised the Arch documentation was lacking here, and that sd-encrypt was, apparently, hot garbage. I checked another (older) machine I was doing LUKS experiments on, and I do think there has been some kind of regression with Arch + systemd on the initial ramdisk when using LUKS inside of LVM, because on that other machine with older packages it worked fine.
No crypto for me please!

Mostly the Arch Wiki still applies.

We just need to format the logical volume you created earlier (if it is not to be encrypted) and mount it at /mnt for the Arch installer:

mkfs.ext4 /dev/volgrp1/notazpool
mount /dev/volgrp1/notazpool /mnt

It is safe to proceed with the rest of the Arch install from here. HOWEVER, before you reboot, be sure to verify that your mkinitcpio.conf contains the modules and hooks you’ll need for the LVM, LUKS (optional) and RAID pieces to work:

See the “Configure the system” section of the Arch Linux on LVM link above, and the LUKS link, to add the modules.

For example mine is similar to:

MODULES=(dm-raid raid0 raid1 raid10 raid456)
HOOKS=(base systemd … block sd-lvm2 filesystems)

Arch Post-Installation Config
SSD Trim Support in Arch

Need to make sure TRIM is set up on the SSDs the way you want, or eventually they’ll slow down (/home is not trimmed by default, I guess?)

Running sudo fstrim / from time to time is maybe a good idea. There is a service you can enable that will do it periodically.

Using crypto? Make sure your SSD properly supports TRIM (see the Arch docs); then you can append


as well as updating lvm.conf to allow discards:

issue_discards = 1

(lvm.conf has a section comment about this – read it!)

Packages needed for dev, testing and monitoring:

sudo pacman -Sy chromium base-devel bc time net-tools inetutils hdparm htop lm-sensors

sudo sensors-detect

I also installed cpufreq and freon, since I was using the Gnome desktop environment for this build.
RAID Performance Notes

Fast. We’re going to go fast.

CPU frequency governor set to ondemand.

Kernel Compile Benchmark: Thorsten Leemhuis’s kcbench (Linux kernel compile benchmark, on GitLab)

[w@bigthreads kcbench]$ bash kcbench
Warning: Could not find a default version to compile for your system.
Will use Linux kernel 4.19. Please report this and the output of
'gcc --version' to tia!
Downloading source of Linux 4.19: this might take a while… Done.
Processor: AMD Ryzen Threadripper 3970X 32-Core Processor [64 CPUs]
Cpufreq; Memory: acpi-cpufreq, ondemand; 64256 MByte RAM
Linux running: 5.6.4-arch1-1 [x86_64]
Linux compiled: 4.19.0 [/home/w/.cache/kcbench/linux-4.19]
Compiler: gcc (Arch Linux 9.3.0-1) 9.3.0
Make: defconfig vmlinux
Filling caches: This might take a while… Done
Run 1 (-j 64): e:23.47 sec, 42607 points (P:4504% U:935.29sec S:122.17sec)
Run 2 (-j 64): e:23.77 sec, 42069 points (P:4450% U:936.75sec S:121.56sec)
Run 3 (-j 72): e:23.62 sec, 42337 points (P:4518% U:947.75sec S:119.80sec)
Run 4 (-j 72): e:23.72 sec, 42158 points (P:4499% U:946.40sec S:121.26sec)
Run 5 (-j 32): e:27.60 sec, 36231 points (P:2607% U:637.05sec S:82.53sec)
Run 6 (-j 32): e:28.60 sec, 34965 points (P:2547% U:645.40sec S:83.47sec)
Run 7 (-j 38): e:27.04 sec, 36982 points (P:2995% U:717.16sec S:93.01sec)
Run 8 (-j 38): e:26.62 sec, 37565 points (P:3044% U:717.44sec S:93.10sec)

[w@bigthreads kcbench]$ cat /proc/mdstat
Personalities : [raid0]
md0 : active raid0 nvme3n1p2[3] nvme2n1p2[2] nvme0n1p2[0] nvme1n1p2[1]
1560264704 blocks super 1.2 128k chunks

unused devices: &lt;none&gt;

[w@bigthreads kcbench]$ sudo hdparm -t /dev/md0

HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device
Timing buffered disk reads: 13876 MB in 3.00 seconds = 4625.23 MB/sec

What about ZFS?

This might seem like a bit of a patchwork: Linux MD is doing some things, LVM is doing some things, LUKS is doing some crypto things, etc.

Each of those components has been hardened and battle-tested over the ages. It is modular.

ZFS, on the other hand, features similar components but also provides even more features (albeit with somewhat more overhead). The ZFS answers for snapshots, encryption and volume management are mostly very clean compared with the above patchwork, but perhaps somewhat less flexible in one or two use-cases, such as expanding the volume after the fact (versus what LVM is capable of).

As for ZFS: we’ll get to you. I want to do this same sort of writeup, but as a “ZFS on root” version.

Continuous Security, DevOps, and DevSecOps
Why is This Website Port Scanning me?
Posted on 19 May 2020
Recently, I was tipped off that certain sites perform localhost port scans against visitors, presumably as part of user fingerprinting and tracking, or bot detection. This didn’t sit well with me, so I went about investigating the practice, and it seems many sites are port scanning visitors for dubious reasons.

A Brief Port Scanning Primer
Port scanning is an adversarial technique frequently used by penetration testers and hackers to scan internet-facing machines and determine what applications or services are listening on the network, usually so that specific attacks can be carried out. It’s common for security software to detect active port scans and flag them as potential abuse.

Most home routers don’t have any open ports, so scanning an internet user’s IP address is unlikely to return any meaningful data. However, many users run software on their computers that listens on ports for various reasons – online gaming, media sharing, and remote connections are just a few things consumers might install on a home PC.

A port scan can give a website information about what software you are running. Many ports have a well-defined set of services that use them, so a list of open ports gives a pretty good view of running applications. For instance, Steam (a gaming store and platform) is known to run on port 27036, so a scanner seeing that port open could have reasonable confidence that the user also had Steam open while visiting the website.

Watching Ebay Port Scan My Computer
In the past I have worked on security products that specifically worried about port scanning from employee web browsers. Attack frameworks like BeEF include port scanning features, which can be used to compromise user machines or other network devices. So, I wanted to be able to alert on any port scanning on machines as a potential compromise, and a site scanning localhost might trip those alerts.

On the other hand, the practice has been reported on a few times in the past, as banks sometimes port scan visitors, and I have heard ThreatMetrix offers this as a customer malware detection check.

I was given the example of ebay as a site that includes port scanning, but when I initially navigated there I didn’t see any suspicious behavior. I thought they might use some heuristics to determine who to scan, so tried a few different browsers and spoofed settings, without any luck.

I thought it might be because I run Linux, so I created a new Windows VM and sure enough, I saw the port scan occurring in the browser tools from the ebay home page:

Ebay port scan
Looking at the list of ports they are scanning, they are looking for VNC services being run on the host, which is the same thing that was reported for bank sites. I marked out the ports and what they are known for (with a few blanks for ones I am unfamiliar with):

5900: VNC
5901: VNC port 2
5902: VNC port 3
5903: VNC port 4
3389: Windows remote desktop / RDP
5931: Ammy Admin remote desktop
5950: WinVNC
6039: X window system
6040: X window system
63333: TrippLite power alert UPS
7070: RealAudio
VNC is sometimes run as part of botnets or viruses as a way to remotely log into a user’s computer. There are several malware families that leverage VNC for these purposes. However, it is also a valid tool used by administrators for remote access to machines, or by some end-user support software, so the presence of VNC is a poor indicator of malware.

Furthermore, when I installed and ran a VNC server, I didn’t detect any difference in site behavior – so why is it looking for it?

How Port Scanning with WebSockets Works
WebSockets are intended to allow a site to create bi-directional communication like traditional network sockets. This allows sites to periodically send information to a client browser without user interaction or front end polling, which is a win for usability.

When a web socket is configured, it specifies a destination host and port, which do not have to be the same domain that the script is served from. To do a port scan, the script only has to specify a private IP address (like localhost) and the port it wishes to scan.

WebSockets only speak HTTP for the initial handshake, though, so unless the host and port being scanned belong to a WebSocket server, the connection won’t succeed. To get around this, we can use connection timing to determine whether the port is open or not: open ports take longer to fail in the browser, because the browser attempts the full connection and handshake before giving up.

You also might get different error messages. If you have python installed, try running the following to create a local web server running on port 8080:

python3 -m http.server 8080
Now, open your browser developer console (usually Options -> Web Developer -> Console) and type some JavaScript in directly. Here is what I see when I do it in Chrome:

> var s = new WebSocket("ws://")
var s = new WebSocket("ws://")
VM1168:1 WebSocket connection to 'ws://' failed: Error in connection establishment: net::ERR_CONNECTION_REFUSED
Between error message introspection and timing attacks, a site can have a pretty good idea of whether a given port is open.

Port Scanning is Malicious
Whether the port scan is used as part of an infection or part of e-commerce or bank "security checks", it is clearly malicious behavior and may fall on the wrong side of the law.

If you observe this behavior, I encourage you to complain to the institution performing the scans, and to install extensions that attempt to block this kind of behavior in your browser, generally by preventing these types of scripts from loading in the first place.

Charlie Belmer's ImageCharlie Belmer
I live for privacy, security, and online freedom because these are the building blocks of a better society, one I want to help create.

Tagged with: privacy
This comment system is self hosted using the Mozilla Coral talk platform, and not connected with any third parties who might collect data. At any time, you can completely delete all comments and data stored, including your email address. I will never send you emails aside from password reset emails you request.
My Favorite InfoSec Learning Resources
Twitter RSS Subscribe
All content copyright Null Sweep © 2020 • All rights reserved.

A NoSQL Injection Primer (with Mongo)
Posted on 06 August 2019
Last year, I interviewed a number of coding bootcamp graduates who were taught the MEAN stack exclusively. When looking at their final projects, all of them had NoSQL injection vulnerabilities present, indicating that perhaps the schools were not teaching secure coding.

When I asked them about this (gently, because it was definitely not their fault, and I made clear it was not part of my assessment but for my personal information), every candidate had at least heard of SQL injection, but none were aware that Mongo (and other data stores) have a separate class of injection attacks known collectively as NoSQL injection.

This might be because NoSQL injection hasn’t had as much press as classical SQL injection, though it should. Although traditional SQL databases still dominate the overall usage statistics, one widely cited ranking lists Mongo as the 5th most popular datastore, with several other NoSQL engines in the top ten.

SQL vs NoSQL Market Share in the top 10
Because Mongo currently has the largest footprint, I focus here on Mongo injections. There is no NoSQL query language standard, so injections for each vendor differ depending on the query language used and on things like client permissions and data structure.

NoSQL Injections with PHP
Most NoSQL injection examples across the web leverage PHP, and I’ll start there as well. PHP has a language feature that lets the end user turn GET querystring inputs into arrays by adding array brackets to the URL parameters:

// normal URL
// PHP treats input as an array now
This is important because Mongo uses array syntax to declare verbs. For instance, an insecure PHP app might query like this:

$param = $_GET['parameter'];
$query = [ "data" => $param ];
// Example of PHP regex search syntax, which we want to inject:
// $query = [ 'data' => [ '$regex' => '\d+' ]];

$query = new MongoDB\Driver\Query($query, []);
In PHP we can replace parameter with [$regex] and the value with a regular expression to search, and PHP will create a query that looks like ["data" => ['$regex' => 'searchValue']], which allows an injection. From here, we can do things like make the query always true, or ask questions about the data to extract it.

Of course, regex isn’t the only verb that can be injected – any supported query operator can be leveraged this way, though regex is often a very useful one to inject for data extraction.

A Vulnerable NodeJS App with Mongo
Other languages don’t have this same feature, so attempting this same injection pattern won’t work, even if the app is coded insecurely. We still look for this kind of injection, though; it is typically passed via JSON instead of GET parameters.

To illustrate other methods of injection, I wrote a simple NodeJS application that is vulnerable to two kinds of NoSQL injection. If you want to run this app locally to follow along, you will need docker and docker-compose. Clone and build the images and navigate to http://localhost:4000:

git clone
cd vulnerable-node-app
docker-compose up
One thing many users are surprised about is that Mongo supports JavaScript evaluation when a JS expression is placed into a $where clause or passed into a mapReduce or group function. So anywhere unfiltered input is passed to one of these clauses, we may find JavaScript injection.

JavaScript Injection Example
The app has a JavaScript injection in the querystring of the user lookup page. The (vulnerable) lookup code looks like this:

let username = req.query.username;
let query = { $where: `this.username == '${username}'` };
User.find(query, function (err, users) {
  if (err) {
    // Handle errors
  } else {
    res.render('userlookup', { title: 'User Lookup', users: users });
  }
});
As you can see, the username search string is pulled directly from the request without any filtering. The query is a where clause which passes in the username string directly. So, if we put valid JavaScript into the querystring, and match quotes correctly, we can have Mongo execute our JavaScript!

In this case, our goal is to find all valid users, so we’d like to pass in something that will always evaluate to true. If we pass in a string like ' || 'a'=='a the query will become $where: `this.username == '' || 'a'=='a'`, which of course always evaluates to true and thus returns all results. With JS injection there are many other things we might be able to achieve; boolean checks are just one.
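To see exactly what string Mongo ends up evaluating, here is a small stand-alone sketch of the template interpolation (no database needed; `buildQuery` is an illustrative stand-in for the vulnerable lookup code above):

```javascript
// The vulnerable server builds its $where clause by template interpolation:
const buildQuery = (username) => ({ $where: `this.username == '${username}'` });

// A benign lookup:
console.log(buildQuery("alice").$where);
// this.username == 'alice'

// The injected payload closes the quote and appends an always-true clause:
const payload = "' || 'a'=='a";
console.log(buildQuery(payload).$where);
// this.username == '' || 'a'=='a'
```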

JavaScript Injection
Verb Injection with Node & Mongo
I mentioned previously that changing the querystring to include brackets doesn’t work with Node and Express. This app is using Express, Mongo, and Node, three of the four components of MEAN stack applications, with the last being Angular. Many apps built with Angular communicate with the Node service using JSON objects instead of GET requests.

Mongo queries are mostly just JSON themselves, so we can modify JSON objects in transit to attempt injections. The login page passes a username and password as strings within a JSON object. An expected object looks like this:

And the receiving code in express looks like this:

let query = {
  username: req.body.username,
  password: req.body.password
};

User.find(query, function (err, user) {
  if (err) {
    // handle error
  } else {
    if (user.length >= 1) {
      res.json({ role: user[0].role, username: user[0].username, msg: "Correct!" });
    }
  }
});
Again, the body JSON content is parsed and passed directly to Mongo. We can replace the username or password field with a valid Mongo query JSON object to inject something. This time, instead of the regex verb, we’ll use the $ne (not-equals) verb to make the password check always pass. We do this by catching the request with a proxy like ZAP or Burp, and modifying the body to the following JSON:

{"username":"myaccount","password":{"$ne": 1}}
Now we can log in as the myaccount user without knowing the password, and could use regex requests to extract the actual password with a few iterations.

Preventing NoSQL Injections In Your Code
As with most injection attacks, NoSQL injections can be prevented by using proper filtering techniques. There are a few things I recommend to harden your Mongo instance and application code. These will limit or prevent injections, regardless of language or framework.

Don’t use Mongo’s $where, mapReduce, or group with user-supplied data. These all evaluate JavaScript. Where clauses can almost always be rewritten as normal queries, perhaps using $expr instead. See the Mongo documentation.
Use a typed model. Typed models will automatically stop some injections by converting user input to the expected type, such as string or int. Note that in my sample project I did use a typed model and it did not prevent the attacks shown, so it is of limited use (but still better than not using one).
Set javascriptEnabled to false in your mongod.conf, if you can. This disables JavaScript execution in your instance and removes that class of attacks.
Always strongly validate user-supplied data – this will help prevent a lot more than just NoSQL attacks! Use libraries like mongo-sanitize (Node) or similar on all user-supplied information, including cookie values and data from the browser. If you can’t find a library, then enforce types and escape problematic characters like single and double quotes within input. This is difficult to get right, so only do this if your language and framework don’t offer good sanitization libraries.
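A minimal hand-rolled sketch of the idea behind sanitizers like mongo-sanitize – recursively dropping any key that starts with `$`, so query operators can’t be smuggled in (this is an illustration of the technique, not the library’s actual code):

```javascript
// Recursively remove Mongo operator keys ($-prefixed) from user input.
function stripOperators(value) {
  if (value && typeof value === "object") {
    for (const key of Object.keys(value)) {
      if (key.startsWith("$")) {
        delete value[key];          // drop e.g. {$ne: 1}, {$regex: ...}
      } else {
        stripOperators(value[key]); // recurse into nested objects/arrays
      }
    }
  }
  return value;
}

// The $ne login bypass from above is neutralized:
console.log(stripOperators({ username: "myaccount", password: { $ne: 1 } }));
// { username: 'myaccount', password: {} }
```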
This is an introductory post about NoSQL Injections, and serious attackers are likely to use far more advanced attacks than shown here.

Still, I hope that MEAN stack developers and NoSQL users will pay attention to this class of attacks and take steps to limit the impact on their applications.

The tools in this space are also somewhat limited in my opinion, which is why I started work on an open source injection tool, which I will be discussing in a future post when it is a little further along.

Have other things to share about NoSQL injections? I would love to hear in the comments!

Tagged with: Technical Guides, Pentesting

Meet DiffGrad: New Deep Learning Optimizer that solves Adam’s ‘overshoot’ issue
Less Wright
Dec 27, 2019 · 5 min read
Example of short term gradient changes on the way to the global optimum (center). Image from paper.

DiffGrad, a new optimizer introduced in the paper “diffGrad: An optimizer for CNN’s” by Dubey et al., builds on the proven Adam optimizer by developing an adaptive ‘friction clamp’ that monitors the local change in gradients in order to automatically lock in optimal parameter values that Adam can skip over.
Comparison of results, 300 epochs (from the paper). Note the esp large improvement for CIFAR 100 vs Adam and SGD with Momentum (red column).

When local gradient changes begin to shrink during training, this often indicates that a global minimum may be near. DiffGrad applies an adaptive clamping effect to lock parameters into global minima, versus momentum-only optimizers like Adam, which can get close but often fly right by due to their inability to rapidly decelerate. The result is out-performance versus Adam and SGD with momentum, as shown in the test results above.

Training fast, but with some regret: Adam and other ‘adaptive’ optimizers rely on computing an exponential moving average of the gradients, which allows them to take much larger steps (or greater velocity) during training where the gradients are relatively consistent, versus the fixed, plodding steps of SGD.

On the positive side, Adam can thus move a lot faster and sooner towards convergence relative to SGD. That’s why Adam is the usual default for most deep learning work: it can get you pretty quickly to a reasonable solution.

However, the downside of this acceleration is the risk of going right over the ideal global minimum, or true optimal solution, due to the inherent inability of an exponential moving average to rapidly slow down when needed. The current Adam step is often based on only 10% of the current gradient and 90% of the previous gradients.

This is also why, in some cases, SGD, while slow, can end up with a better final result: it plods along, but when it reaches a global minimum it doesn’t jump back out (it just takes a long time to get there).

Locking in to optimal minima with ‘friction clamping’: By contrast, diffGrad monitors the immediate change of the current gradient versus the previous step, and applies an adaptive ‘friction clamp’ that can rapidly decelerate when the gradient change is small, and thus implies an optimal solution may be close by.
diffGrad’s friction clamp, version 0 — values between +5 and -5 are rapidly de-celerated. Larger values remain untouched and perform at the same speed as regular Adam.

By rapidly decreasing the learning rate adaptively, diffGrad can thus help parameters lock into global minima and reduce the real issue of ramping right over them due to the inability of Adam and similar optimizers to decelerate. (Note this is one reason learning rates are traditionally decayed over epochs: to help parameters ‘settle in’ over time.)
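As a rough sketch of the update (my reading of the paper’s version 0; function and variable names are mine, not the official implementation), the friction coefficient is a sigmoid of the absolute gradient change, multiplied into an otherwise standard Adam step:

```python
import math

# One diffGrad (version 0) step for a single scalar parameter – a sketch
# under the assumptions stated above, not the official code.
def diffgrad_step(theta, g, g_prev, m, v, t,
                  lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * g        # Adam first moment (EMA of gradients)
    v = b2 * v + (1 - b2) * g * g    # Adam second moment (EMA of squared grads)
    m_hat = m / (1 - b1 ** t)        # bias correction
    v_hat = v / (1 - b2 ** t)
    # Friction clamp: sigmoid of |change in gradient|. Small changes
    # (i.e. near a minimum) push xi toward 0.5, damping the step;
    # large changes push xi toward 1, behaving like plain Adam.
    xi = 1.0 / (1.0 + math.exp(-abs(g_prev - g)))
    theta -= lr * xi * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v
```

When the gradient stops changing (g ≈ g_prev), xi ≈ 0.5 and the effective step is halved, which is the deceleration the article describes.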

Adam vs diffGrad on synthetic landscapes: Running diffGrad on three different synthetic functions can show how diffGrad is better able to lock parameters into more optimal results.
Synthetic function test — diffGrad is able to lock in to the global minimum with ideal loss. Adam skips over into a higher local minima due to being unable to decelerate in time. (image from papers, annotations added)
Additional test — diffGrad locks into the global minimum, Adam skips past and ends up in local minimum.

In the above examples you can see multiple loss landscapes on which both Adam and diffGrad are run. In both cases, Adam is unable to decelerate in sufficient time and ends up moving past the optimal solution, settling into a less optimal minimum.

These examples thus show diffGrad’s advantage of monitoring the immediate gradient landscape. By being able to apply a friction clamp and rapidly decelerate, overshooting an optimal solution can be avoided.

Parameters thus become locked in to better weights, yielding higher net accuracy for the NN.

Escaping local minima and saddle points: The friction clamping is smooth as shown in the function map above, with the idea that it will allow enough deceleration to stick if it’s an optimal minima, while still preserving enough velocity to escape if it’s only a local minima or a saddle point (equal gradients on each side that fail to provide direction).

Several other papers have addressed proposed solutions to the known overshoot issue, but diffGrad solves it elegantly and robustly.

DiffGrad Variants: Note that the paper also delves into several other variants in terms of how to apply the clamping or friction coefficient. The code provided in the paper’s official github only offers the version 0, which is the version used above.

In testing on FastAI datasets, I found that version 1 (which allows the friction clamp to tighten down further) outperformed version 0 and thus I have added a version flag in my unofficial implementation so you can test both on your datasets.
version flag added in my unofficial implementation. I find version 1 performs better on the datasets I tested.

Tips for use: I tested a flat learning rate, FastAI’s triangular schedule (fit_one_cycle), and the flat + anneal schedule we used to beat the previous FastAI leaderboard records; so far flat + anneal works the best. (You can use fit_fc() if you are on the latest version of FastAI, or I have added our flattenAnneal function in the diffGrad playground notebook in my repository… links below.)

Using diffGrad v1 (the default is v0), I was able to quickly get within 1% of the FastAI record for 20 epochs. Considering the amount of tuning with learning rates done for Ranger vs no tuning done for diffGrad, I’m impressed:
20 epoch run — re-used the Ranger learning rate with diffGrad and got within 1% of the Global leaderboard results, with no other tuning.

Source code links:

1 — Official repository: (PyTorch)

2 — TF Unofficial version:

3 — Unofficial PyTorch diffGrad with v1 option and FastAI usage notebook:

Example usage:

Note that I’ll likely make a short video showing how to use diffGrad as well for those who would like to see a more hand’s on tutorial. (Here it is: )

Summary: DiffGrad provides an innovative solution to a known weakness of adaptive optimizers like Adam, namely their potential risk of accelerating right past optimal minima.

By monitoring the immediate gradient landscape, diffGrad adaptively decelerates the optimizer to help lock in global minima and thus allow for faster, better training for your deep learning networks!

Related: A previous paper also flagged Adam’s inability to stick to optimal points here:

Machine Learning
Artificial Intelligence
Deep Learning
Neural Networks

Written by Less Wright
FastAI, PyTorch, Deep Learning. Stock Index investing and long term compounding.

Gianni Garko = Giovanni Garcovich

Gianni Garko (born 15 July 1935 in Zara, Dalmatia; real name Giovanni Garcovich, sometimes credited as John Garko) is an Italian actor. He appeared frequently in Spaghetti Westerns.

1 Life
2 Filmography (selection)
3 External links
4 References


Garko was born in Dalmatia and moved to Trieste as a young man, where he gained his first acting experience in the theatre group ‘Universe City’. While attending the Accademia d’Arte Drammatica in Rome, he made contacts in the film industry, which from 1958 onward brought him a large number of roles, from 1966 to 1973 primarily in Spaghetti Westerns. Garko’s breakthrough came with the role of ‘Sartana’ in the film of the same name, a role he later reprised several times with varying attributes, adding a new variant to the usual avenger image of most Spaghetti Western heroes: the self-ironizing, infallible sharpshooter. To escape this typecasting, Garko also made some films under the pseudonym Gary Hudson.[1] After the genre’s success faded, Garko was also seen in many other genres, increasingly in television films and most recently (2002) in a TV soap.

On stage, Garko appeared in 1958 alongside Lilla Brignone under Luchino Visconti in Veglia la mia casa, angelo. His television career also began that year, and he stepped it up towards the end of the 1970s and in the 1980s.[2]

From 1973 to 1986, Garko was married to the film actress Susanna Martinková.
Filmography (selection)


1958: Kanonenserenade (Pezzo, Capopezzo e Capitano – Serenata a un Cannone)
1959: Und zu leicht befunden (Morte di un amico)
1959: Tschau, tschau, Bambina (Ciao, ciao bambina)
1960: Kapo (Kapo)
1961: Maciste und die Königin der Nacht (Maciste l’uomo più forte del mondo)
1961: Eines Abends am Strand (Un soir sur la plage)
1961: Raubzüge der Mongolen (I mongoli)
1962: Äneas, Held von Troja (La leggenda di Enea)
1962: Lockende Unschuld (La voglia matta)
1962: Pontius Pilatus – Statthalter des Grauens (Ponzio Pilato)
1965: Genosse Don Camillo (Il Compagno Don Camillo)
1966: Sartana (Mille dollari sul nero)
1967: Django – der Bastard (Per 100.000 dollari t’ammazzo)
1967: 10.000 blutige Dollar (10.000 dollari per un massacro)
1968: Giorni di sangue
1968: Lucrezia Borgia – die Tochter des Papstes (Lucrezia Borgia, l’amante del diavolo)
1968: Sartana – Bete um Deinen Tod (…Se incontri Sartana prega per la tua morte)
1968: Schweinehunde beten nicht (I vigliacchi non pregano)
1968: Todeskommando Panthersprung (5 per l’inferno)
1969: Sartana – Töten war sein täglich Brot (Sono Sartana, il vostro becchino)
1969: Von allen Hunden des Krieges gehetzt (La porta del cannone)
1970: Ein Bulle sieht rot (Un Condé)
1970: Sartana kommt (Una nuvola di polvere… un grido di morte… arriva Sartana)
1970: Sartana – noch warm und schon Sand drauf (Buon funerale, amigos… paga Sartana)
1970: … und Santana tötet sie alle (Un par de asesinos)
1970: Waterloo (Ватерлоо)
1971: 1000 Dollar Kopfgeld (Il venditore di morte)
1971: Ein Halleluja für Spirito Santo (Uomo avvisato mezzo ammazzato… Parola di Spirito Santo)
1971: Ein Hallelujah für Camposanto (Gli fumavano le colt… lo chiamavano Camposanto!)
1971: Matalo (…y seguian robandose el millon de dolares)
1972: Fünf Himmelhunde auf dem Weg nach Tobruk (Gli eroi)
1973: Der Teufel führt Regie (Il boss)
1973: Vier Teufelskerle (Campa carogna… la taglia cresce)
1977: Drei Schwedinnen in Oberbayern
1977: Die sieben schwarzen Noten (Sette note in nero)
1977: Freude am Fliegen
1978: Star Odyssey (Sette uomini d’oro nello spazio)
1978: Summer Night Fever
1979: Graf Dracula (beisst jetzt) in Oberbayern
1979: Der große Kampf des Syndikats (I contrabbandieri di Santa Lucia)
1979: Unheimliche Begegnung in der Tiefe (Encuentro en el abismo)
1983: Herkules (Ercole)
1983: Amok – Aufruhr in Afrika
1984: Monster Shark (Shark: Rosso nell’oceano)
1986: Black Tunnel (Black tunnel)
1992: Body Puzzle – Mit blutigen Grüßen (Body puzzle)
2001: Bel Ami – Liebling der Frauen (L’uomo che piaceva alle donne – Bel Ami)
2001: Due e mezzo compreso il viaggio (short film)


1975: Mondbasis Alpha 1 (Space: 1999)
1975: Marco Visconti
1981: Lapo erzählt (Un eroe del nostro tempo)
1989: Entscheidung für die Liebe (Quattro storie di donne) (TV miniseries)
1993: Alles Glück dieser Erde
1994: Il coraggio di Anna
1995: Alta società
1996: Mein Baby soll leben (A rischio d’amore)
1997: Ich will dich nicht verlieren (Mio padre è innocente)
2001: Bel Ami – Liebling der Frauen (L’uomo che piaceva alle donne – Bel Ami)
2003: Sospetti 2
2005: Sospetti 3
2016: Maggie & Bianca Fashion Friends


External links

Gianni Garko at the Internet Movie Database (English)
Gianni Garko in the Spaghetti-Western Database

References

Interview, in: Christian Keßler: Willkommen in der Hölle. 2002, ISBN 3-00-009290-0
Roberto Poppi: article "Gianni Garko", in: Roberto Chiti, Enrico Lancia, Andrea Orbicciani, Roberto Poppi: Dizionario del cinema italiano. Gli attori. Rome: Gremese, 1998, pp. 221–222

George Hilton = Jorge Hill Acosta y Lara


George Hilton (born 16 July 1934 in Montevideo, Uruguay; real name Jorge Hill Acosta y Lara; died 28 July 2019[1] in Rome[2]) was an actor who frequently appeared in Spaghetti Westerns.

1 Life
2 Filmography (selection)
3 External links
4 References


Life

Hilton was born in Uruguay and began his career in radio. Contrary to other suppositions, he had never been to Europe before his film career.

In 1955 he moved to Argentina, where, under the pseudonym Jorge Hilton, he subsequently appeared in local photo novels (fotonovelas) and films, working among others with Fernando Ayala, Francis Lauric and Vlasta Lah. At the end of 1963 he came to Italy and landed the lead role in the pirate film L'uomo mascherato contro i pirati (1964). In 1965 he appeared as Agent 007 in Giorgio Simonelli's comedy 2 Trottel gegen Goldfinger (Due mafiosi contro Goldginger) with Franco Franchi and Ciccio Ingrassia.

His entry into the Spaghetti Western genre came in 1966 with director Lucio Fulci's Django – Sein Gesangbuch war der Colt (Tempo di massacro), at the side of Franco Nero. Many further appearances followed, for example in 1967 as Kitosch in Der Mann, der aus dem Norden kam and in Die Zeit der Geier, then Das Gold von Sam Cooper (1968). Other well-known roles are those of the comic-like gunslingers 'Hallelujah' and 'Tresette', which were developed by Giuliano Carnimeo.

From the end of the 1960s he could also be found in other genres; dramas (Il dolce corpo di Deborah, 1968) and war films (Königstiger vor El Alamein and Heiß über Afrikas Erde, 1969) followed. When that wave came to an end, he appeared frequently in gialli (often as Edwige Fenech's partner). In the late 1970s he was seen in crime films such as Gewalt über der Stadt (1977) and comedies such as Taxi Girl (1977), later also in Deodato's science-fiction adventure Atlantis Inferno (1983), but in the 1980s his appearances became increasingly rare, for instance in the horror film Dinner with the Vampire (1988) and the television series College (1989).

Hilton lived in Rome and most recently worked mainly for Italian television.
Filmography (selection)

1956: Los tallos amargos
1965: Allein gegen die Freibeuter (L’uomo mascherato contro i pirati)
1965: Zwei Trottel gegen Goldfinger (Due mafiosi contro Goldginger)
1966: Django – Sein Gesangbuch war der Colt (Tempo di massacro)
1966: I due figli di Ringo
1966: Der Mann, der aus dem Norden kam (Frontera al sur)
1967: Ein Halleluja für Django (La più grande rapina del West)
1967: Leg ihn um, Django (Vado… l’ammazzo e torno)
1967: Poker mit Pistolen (Un poker di pistole)
1967: Der schöne Körper der Deborah (Il dolce corpo di Deborah)
1967: Ein Stoßgebet für drei Kanonen (Professionisti per un massacro)
1967: Die Zeit der Geier (Il tempo degli avvoltoi)
1968: Django – Ein Sarg voll Blut (Il momento di uccidere)
1968: Django – Melodie in Blei (Uno di più all’inferno)
1968: Django, wo steht Dein Sarg? (T’ammazzo!… Raccomandati a Dio)
1968: Das Gold von Sam Cooper (Ognuno per se)
1968: Königstiger vor El Alamein (La battaglia di El Alamein)
1969: Heiß über Afrikas Erde (La battaglia del deserto)
1969: Die Leoparden kommen (Il dito nella piaga)
1969: Um sie war der Hauch des Todes (Quei disperati che puzzano di sudore e di morte)
1970: Django und Sabata – wie blutige Geier (C’è Sartana… vendi la pistola e comprati la bara)
1970: Der Killer von Wien (Lo strano vizio della signora Wardh)
1971: Die Diamantenlady (Il diavolo a sette facce)
1971: Man nennt mich Halleluja (Testa t’ammazzo, croce… sei morto! Mi chiamano Alleluja)
1971: Der Schwanz des Skorpions (La coda dello scorpione)
1972: Beichtet Freunde, Halleluja kommt (Il West ti va stretto, amico… è arrivato Alleluja)
1972: Die Farben der Nacht (Tutti i colori nel buio)
1972: Das Geheimnis der blutigen Lilie (Perché quelle strane gocce di sangue sul corpo di Jennifer?)
1973: Kennst Du das Land, wo blaue Bohnen blüh’n? (Lo chiamavano Tresette… giocava sempre col morto)
1973: Sieben Stunden der Gewalt (Sette ore di violenza per una soluzione imprevista)
1973: Wenn Engel ihre Fäuste schwingen (Fuori uno sotto un altro… arriva il Passatore)
1974: Dicke Luft in Sacramento (Di Tresette ce n’è uno, tutti gli altri son nessuno)
1974: Zwei tolle Hechte – Wir sind die Größten (Prima ti suono e poi ti spara)
1975: Ah sì? E io lo dico a Zzzzorro!
1977: El Macho
1977: Gewalt über der Stadt (Torino violenta)
1978: Heroin (Milano… difendersi o morire)
1983: Atlantis Inferno (I predatori di Atlantide)
1984: Spiegelei und Coca Cola (College)
1988: Dinner with the Vampire (A cena col vampiro)
1990: Der Erfolg ihres Lebens (Mademoiselle Ardel)
1993: Noch mehr Sonnenöl und süße Früchtchen (Abbronzatissimi 2 – un anno dopo)
2007: Natale in crociera
2009: Un coccodrillo per amico


External links

George Hilton at the Internet Movie Database (English)
Filmography (English)
Biography at mymovies (Italian)

References

Morto George Hilton, l'attore aveva 85 anni e fu protagonista di tanti spaghetti western e film del cinema di genere italiano, retrieved 29 July 2019
Addio a George Hilton, icona dello spaghetti western, retrieved 30 July 2019

Authority control (person): GND: 13944839X | LCCN: no2003030503 | VIAF: 78463352


The mathematics of optimization for deep learning

A brief guide about how to minimize a function with millions of variables

Tivadar Danka in Towards Data Science
Feb 18 · 16 min read

In general, the overall performance of a neural network depends on several factors. The one usually taking the spotlight is the network architecture; however, this is only one among many important components. An often overlooked contributor to a performant algorithm is the optimizer, which is used to fit the model.

Just to illustrate the complexity of optimizing, a ResNet18 architecture has 11,689,512 parameters. Finding an optimal parameter configuration means locating a point in the 11,689,512-dimensional space. If we were to brute force this, we might decide to divide this space up into a grid, say by selecting 10 points along each dimension. Then we would have to check 10¹¹⁶⁸⁹⁵¹² possible configurations, calculate the loss function for each of them and find the one with minimal loss. To put this number in perspective, the observable universe has about 10⁸³ atoms and is estimated to be 4.32 × 10¹⁷ seconds (~13.7 billion years) old. If, starting from the Big Bang, we checked as many parameter configurations each second as there are atoms, we would have been able to check about 4.32 × 10¹⁰⁰ points by now.

To say that this is not even close is an understatement. The grid still has roughly 10¹¹⁶⁸⁹⁴¹² times more points than we could have checked, even with every atom in the universe checking a configuration every second since the Big Bang.

So, optimizers are pretty important. They manage this incomprehensible complexity, allowing you to train neural networks in days instead of billions of years. In the following, we are going to take a deep dive into the mathematics of optimizers and see how they are able to handle this seemingly impossible task.
The basics of optimization

Let's start simple and suppose that we have a function of one variable which we would like to maximize. (In a machine learning context, we generally aim to minimize the loss function, but minimizing is the same as maximizing the negative of the function.) Define

f(x) = 25 sin(x) − x²,

which looks like the following if we plot its graph.

An obvious method to optimize would be to divide up the line into a grid, check the value of every point and select the one where the function is maximized. As we have seen in the introduction, this is not scalable, so we are going to look for another solution. Let’s imagine that this is a mountain landscape and we are climbers, trying to reach the peak. Suppose that we are at the location marked with a red dot.

If we want to find the peak, which direction should we go? Of course, we should go where the slope is increasing. This concept is formalized by the derivative of a function. Mathematically, the derivative is defined by

f′(x) = lim_{y → x} (f(y) − f(x)) / (y − x)

Although this quantity seems mysterious at first glance, it has a very simple geometric meaning. Let's look at the function more closely around the point where we take the derivative.

For any x and y, the line passing through (x, f(x)) and (y, f(y)) is defined by the equation

l(t) = [(f(y) − f(x)) / (y − x)] · (t − x) + f(x)

In general, if we have any line defined by at + b for some a and b, the quantity a is called the slope of the line. It can be positive or negative: lines with positive slope go upward, while those with negative slope go downward. Higher absolute values mean steeper lines. If we let y get closer and closer to x, as in the definition of the derivative, we see that the line becomes the tangent of the function's graph at x.
Tangent and approximating lines for f(x) at x = -2.0

The tangent is given by the function

t(y) = f′(x)(y − x) + f(x)

and its direction can be described with the vector (1, f′(x)).

If we imagine ourselves again in the position of a mountain climber starting from x₀ = −2.0, we should go in the direction where the tangent is rising. If the slope of the tangent is large, we would also like to take a large step, while if the slope is close to zero, we should take a smaller step to make sure we don't go over the peak. To formalize this mathematically, we should go to the next point defined by

x₁ = x₀ + λ f′(x₀),

where λ is a parameter setting how large the step in the right direction should be. This is called the learning rate. In general, subsequent steps are defined by

xₙ₊₁ = xₙ + λ f′(xₙ)

A positive derivative means the tangent is increasing, so we want to step forward, while a negative derivative means the tangent is decreasing, so we want to turn back. We can visualize this process.

As we can see, this simple algorithm successfully found a peak. However, this is not the global maximum of the function, which can be seen by looking at the image. To get a little ahead of ourselves, this is a potential issue for a wide family of optimizing algorithms, but there are solutions for it.
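The climbing procedure above is easy to put into code. Below is a minimal sketch of gradient ascent on the toy function f(x) = 25 sin(x) − x², using the update rule xₙ₊₁ = xₙ + λ f′(xₙ); the learning rate and step count here are illustrative choices, not the article's exact settings.

```python
import math

def f(x):
    return 25 * math.sin(x) - x ** 2

def df(x):
    # derivative of f: 25*cos(x) - 2x
    return 25 * math.cos(x) - 2 * x

def gradient_ascent(x, lr=0.01, steps=1000):
    # repeatedly step in the direction of the rising tangent
    for _ in range(steps):
        x = x + lr * df(x)
    return x

peak = gradient_ascent(-2.0)  # settles at a local (not global) maximum
```

Starting from x₀ = −2.0, the iteration settles on a nearby local peak where f′ vanishes, not the global maximum, exactly the failure mode mentioned above.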

In this simple case, we have only maximized a function of a single variable. This is useful to illustrate the concept, however, in real-life scenarios, millions of variables can be present. For neural networks, this is definitely the case. In the next part, we are going to see how this simple algorithm can be generalized for optimizing multidimensional functions!
Optimizing in multiple dimensions

For a function of a single variable, we could think about the derivative as the slope of the tangent line. However, for multiple variables, this is not the case. Let’s try to build intuition first by looking at a concrete example! Define the function

which will be our toy example in this section.
Plot of f(x, y)

For functions of two variables, the graph is a surface. We immediately see that the concept of a tangent line is not well defined, since there are many lines tangent to the surface at a given point. In fact, we have a whole plane of them. This is called the tangent plane.
Tangent plane for f(x, y) at (0, 0)

However, this tangent plane contains two very special directions. Suppose that we are looking at the tangent plane at (0, 0). For every multivariable function, fixing all but one variable yields a function of a single variable. In our case, we would have the two functions

x ↦ f(x, 0) and y ↦ f(0, y)
These functions can be visualized by slicing the surface with a vertical plane perpendicular to one of the axes. Where the plane and the surface meet is the graph of f(x, 0) or f(0, y), depending on which plane you use.
Slicing the surface with a vertical plane to visualize f(0, y)

For these functions, we can define the derivatives as we did in the previous section. These are called partial derivatives, and they play an essential role in generalizing our peak-finding algorithm from before. To formalize it mathematically, they are defined by

∂f/∂x (x, y) = lim_{h → 0} (f(x + h, y) − f(x, y)) / h
∂f/∂y (x, y) = lim_{h → 0} (f(x, y + h) − f(x, y)) / h
Each partial derivative represents a direction in our tangent plane.
Visualizing the direction of partial derivatives on the tangent plane.

The values of the partial derivatives are the slopes of these special tangent lines. The direction of steepest ascent is given by the gradient, which is defined by

∇f(x, y) = (∂f/∂x (x, y), ∂f/∂y (x, y))
Note that the gradient is a direction in the parameter space. The gradients can be visualized in the two dimensional plane easily, which looks like the following in our case.
Gradients for f(x, y)

To summarize, the peak-finding algorithm is now

xₙ₊₁ = xₙ + λ∇f(xₙ),

which is called gradient ascent. If we want to find the minimum of a function, we take a step in the direction of the negative gradient, which is the direction of steepest descent:

xₙ₊₁ = xₙ − λ∇f(xₙ)
This version is called gradient descent and you have probably seen this one more frequently, since in machine learning, we actually want to minimize the loss.
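The multidimensional update can be sketched just like the one-dimensional case. Since the article's two-variable example is shown only as a plot, the snippet below uses a simple stand-in surface of my own and estimates the partial derivatives numerically with central differences:

```python
def grad(f, p, h=1e-6):
    # numerical gradient of f(x, y) via central differences
    x, y = p
    dfdx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    dfdy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return dfdx, dfdy

def gradient_descent(f, p, lr=0.1, steps=200):
    # p_{n+1} = p_n - lr * grad f(p_n)
    for _ in range(steps):
        gx, gy = grad(f, p)
        p = (p[0] - lr * gx, p[1] - lr * gy)
    return p

bowl = lambda x, y: x ** 2 + y ** 2      # stand-in surface, not the article's f
minimum = gradient_descent(bowl, (3.0, -2.0))   # converges to (0, 0)
```

In practice, neural network libraries compute the gradient exactly via backpropagation rather than by finite differences, but the update rule is the same.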
Why does the gradient point to the steepest ascent?

In this setting, it is not obvious why the gradient gives us the direction of the steepest ascent. To give a precise explanation, we need to do some mathematics. Besides slicing the surface with vertical planes perpendicular to the x or y axis, we can slice it with a vertical plane given by any direction (a, b). With the partial derivatives, we had

∂f/∂x (x, y) and ∂f/∂y (x, y)

We can think about these as derivatives of f(x, y) along the directions (1, 0) and (0, 1). Although these directions are of special significance, we can do this for any direction. Say we have the unit direction

e = (e₁, e₂), |e| = 1;

then the directional derivative with respect to this direction is defined by

∂f/∂e (x, y) = lim_{h → 0} (f(x + h e₁, y + h e₂) − f(x, y)) / h = e₁ ∂f/∂x (x, y) + e₂ ∂f/∂y (x, y)
Note that the last identity is nothing other than the dot product (also called the scalar or inner product) of the direction vector and the gradient, the same dot product you have probably encountered in your high school geometry classes. So,

∂f/∂e (x, y) = e · ∇f(x, y)
The question is the following: which direction maximizes the directional derivative? This would be the direction of the steepest ascent, so if we want to optimize, we want to know this particular direction. To see that it is nothing other than the gradient itself, as we have mentioned, recall that the dot product can be written as

e · ∇f = |e| |∇f| cos α,

where |·| denotes the length of a vector and α is the angle between the two vectors. (This is true in any number of dimensions, not just two.) It is easy to see that this expression is maximized when cos α = 1, that is, when α is zero. This means that the two vectors are parallel, so the direction of e must be the same as that of the gradient.
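We can also check this fact numerically: fix an example gradient, sweep unit directions around the circle, and see which one maximizes the dot product e · ∇f. (The gradient value below is an arbitrary illustration, not taken from the article's surface.)

```python
import math

g = (3.0, 4.0)  # an example gradient; |g| = 5, direction g/|g| = (0.6, 0.8)

def directional_derivative(grad, e):
    # dot product e · grad
    return grad[0] * e[0] + grad[1] * e[1]

# sweep 360 unit directions, one per degree
directions = [(math.cos(math.radians(d)), math.sin(math.radians(d)))
              for d in range(360)]
best = max(directions, key=lambda e: directional_derivative(g, e))
# best is (approximately) the gradient's own direction, (0.6, 0.8)
```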
Training neural networks

Now we are ready to move from theory to practice and see how we can train neural networks. Suppose that our task is to classify images, given as n-dimensional feature vectors, into c classes. To formalize our situation mathematically, our neural network is represented by the function f, mapping the n-dimensional feature space to the c-dimensional space:

f : ℝⁿ → ℝᶜ

The neural network itself is a parametrized function. For notational convenience, we can denote its parameters with a single m-dimensional vector

w ∈ ℝᵐ

To explicitly express dependence on the parameters, it is customary to write

f(x; w)
Training a neural network is equivalent to finding the minimum of the loss function

J : ℝᵐ → ℝ,

mapping the space of neural network parameters to real numbers. The loss function takes the form

J(w) = (1/N) Σᵢ₌₁ᴺ L(f(xᵢ; w), yᵢ),

where xᵢ is the i-th data point with observation yᵢ, and L is the termwise loss function. For instance, if L is the cross-entropy loss, then

L(f(xᵢ; w), yᵢ) = −Σⱼ (yᵢ)ⱼ log f(xᵢ; w)ⱼ

This might seem innocent enough, but it can be really difficult to compute. In real life, the number of data points N can be in the millions, not to mention the number of parameters m. So we have a sum with millions of terms, for which we need to calculate millions of derivatives to minimize. How can we solve this problem in practice?
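As a concrete illustration of the loss above, here is the cross-entropy term computed for a tiny hypothetical batch of softmax outputs and one-hot labels (all values are made up for the example):

```python
import math

def cross_entropy(pred, label):
    # termwise loss L for one data point: -sum_j y_j * log(p_j)
    return -sum(y * math.log(p) for p, y in zip(pred, label))

def total_loss(preds, labels):
    # J(w) = average of the termwise losses over the N data points
    return sum(cross_entropy(p, y) for p, y in zip(preds, labels)) / len(preds)

preds = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]   # softmax outputs f(x_i; w)
labels = [[1, 0, 0], [0, 1, 0]]              # one-hot observations y_i
loss = total_loss(preds, labels)             # = (-log 0.7 - log 0.8) / 2
```

With one-hot labels, only the probability assigned to the true class contributes to each term, which is why the loss simplifies as in the final comment.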
Stochastic gradient descent

To use gradient descent, we have to calculate

∇J(w) = (1/N) Σᵢ₌₁ᴺ ∇_w L(f(xᵢ; w), yᵢ),

which is computationally very intensive if N is large, and N is hopefully very large (because we want lots of data). Can we simplify this? One way is to omit some members of the sum. Although this may sound like an ad-hoc solution, it has solid theoretical foundations. To see this, notice that J can actually be written as an expected value:

J(w) = E_{x ∼ p̂}[L(f(x; w), y)],

where p̂ is the (empirical) probability distribution given by our training data. We can treat the sequence

L(f(x₁; w), y₁), L(f(x₂; w), y₂), …

as independent, identically distributed random variables. According to the Law of Large Numbers,

lim_{N → ∞} (1/N) Σᵢ₌₁ᴺ L(f(xᵢ; w), yᵢ) = E_{x ∼ p}[L(f(x; w), y)]

holds, where p is the true underlying distribution (which is unknown). To elaborate, this means that as we increase our training data, the loss function converges to the true loss. As a consequence, if we subsample our data and only calculate the gradient

∇_w L(f(xᵢ; w), yᵢ)

for some i instead of all of them, we still obtain a reasonable estimate if we average over enough samples. This is called stochastic gradient descent, or SGD for short.
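A minimal SGD sketch, using a toy one-parameter loss J(w) = mean over the data of (w − x)², whose termwise gradient 2(w − x) is cheap to evaluate on a random subsample; the dataset, batch size, and learning rate are all illustrative:

```python
import random

random.seed(0)
data = [random.gauss(3.0, 1.0) for _ in range(1000)]  # toy dataset

def sgd(w, data, lr=0.05, steps=2000, batch_size=8):
    for _ in range(steps):
        batch = random.sample(data, batch_size)           # random subsample
        g = sum(2 * (w - x) for x in batch) / batch_size  # minibatch gradient estimate
        w = w - lr * g
    return w

w = sgd(0.0, data)  # lands near the true minimizer, the sample mean (about 3)
```

Each step uses only 8 of the 1000 terms, yet the iterate still converges to a neighborhood of the full-data minimum, which is the whole point of the subsampling argument above.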

In my opinion, there were three fundamental developments that enabled researchers and data scientists to effectively train deep neural networks: utilizing GPUs as general-purpose computing tools, backpropagation, and finally stochastic gradient descent. It is safe to say that without SGD, the wide adoption of deep learning would not have been possible.

As with almost every new approach, SGD also introduces a whole new can of worms. The obvious question is: how large should our subsample be? Too small a size might result in a noisy gradient estimate, while too large a size has diminishing returns. Selecting the subsample also needs to happen with care. For example, if all the subsamples belong to one class, the estimate will probably be off by a mile. However, these issues can be solved in practice by experimentation and proper randomization of the data.
Improving gradient descent

Gradient descent (and its SGD variant as well) suffers from several issues that can make it ineffective under some circumstances. For instance, as we have seen, the learning rate controls the step size we take in the direction of the gradient. Generally, we can make two mistakes regarding this parameter. First, we can make the step too large, so the loss fails to converge and might even diverge. Second, if the step is too small, we might never arrive at a local minimum, because we go too slowly. To demonstrate this issue, let's take a look at a simple example and study the function f(x) = x + sin x.

Suppose that we start the gradient descent from x0 = 2.5, with learning rates α = 1, α = 0.1 and α = 0.01.

It might not be obvious what is happening here, so let's plot the sequence of x values for each learning rate.

For α = 1, the sequence practically oscillates between two points, failing to converge to the local minimum, while for α = 0.01 the convergence seems very slow. In our concrete case, α = 0.1 seems just right. How do you determine this in a general setting? The main idea here is that the learning rate does not necessarily have to be constant. Heuristically, if the magnitude of the gradient itself is large, we should reduce the learning rate to avoid jumping too far. On the other hand, if the magnitude is small, it probably means that we are getting close to a local optimum, so to avoid overshooting, the learning rate definitely shouldn't be increased. Algorithms that change the learning rate dynamically are called adaptive.

One of the most popular examples of such an adaptive algorithm is AdaGrad. It cumulatively stores the gradient magnitude and scales the learning rate with respect to it. AdaGrad defines an accumulation variable r₀ = 0 and updates it with the rule

rₙ₊₁ = rₙ + ∇J(wₙ) ⊙ ∇J(wₙ),

where ⊙ denotes the componentwise product of two vectors. This is then used to scale the learning rate:

wₙ₊₁ = wₙ − [λ / (δ + √rₙ₊₁)] ⊙ ∇J(wₙ),

where δ is a small number for numerical stability and the square root is taken componentwise. Early on, when the gradient is large, the accumulation variable grows rather fast, decreasing the learning rate. When the parameter is near a local minimum, gradients get smaller, so the decrease in the learning rate practically stops.
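A scalar sketch of the AdaGrad rule above (the real update is componentwise over the whole parameter vector), applied to a hypothetical quadratic loss; the learning rate and step count are illustrative choices:

```python
import math

def adagrad(grad, w, lr=0.5, delta=1e-8, steps=500):
    r = 0.0                                        # accumulation variable, r_0 = 0
    for _ in range(steps):
        g = grad(w)
        r = r + g * g                              # accumulate squared gradients
        w = w - lr / (delta + math.sqrt(r)) * g    # update with scaled learning rate
    return w

# hypothetical quadratic loss f(w) = (w - 3)^2 with gradient 2*(w - 3)
w = adagrad(lambda w: 2 * (w - 3), w=0.0)
```

The first large gradients inflate r and shrink the effective step size, after which the iterate contracts steadily toward the minimizer at 3.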

Of course, AdaGrad is one possible solution to this problem. More and more advanced optimization algorithms are available every year, solving a wide range of issues related to gradient descent. However, even with the most advanced methods, experimenting with the learning rate and tuning it is very beneficial.

Another issue with gradient descent is making sure that we find a global optimum, or a local optimum close to it in value. As you can see in the previous example, gradient descent often gets stuck in a bad local optimum. To get a good picture of the solutions to this and the other issues, I recommend reading through Chapter 8 of the Deep Learning textbook by Ian Goodfellow, Yoshua Bengio and Aaron Courville.
What does the loss function of a deep neural network look like?

In our examples during the previous sections, we have only visualized very simple toy examples like f(x) = 25 sin(x) − x². There is a reason for this: plotting a function is not straightforward for more than two variables. Given our inherent limitations, we are only able to see and think in at most three dimensions. However, to get a grip on what the loss function of a neural network can look like, we can employ several tricks. One excellent paper about this is Visualizing the Loss Landscape of Neural Nets by Hao Li et al., who were able to visualize the loss function by essentially choosing two random directions d₁ and d₂ and plotting the two-variable function

g(a, b) = J(w* + a d₁ + b d₂)

around a trained parameter configuration w*. (To avoid distortions caused by scale invariance, they also introduced normalizing factors for the random directions.) Their investigations revealed how skip connections in ResNet architectures shape the loss landscape, making it easier to optimize.
Source: Visualizing the Loss Landscape of Neural Nets by Hao Li et al.

Beyond the significant improvement brought by skip connections, my point here was to demonstrate that highly multidimensional optimization is hard. By looking at the first part of the figure, we see that there are many local minima, sharp peaks, plateaus, and so on. Good architecture design can make the optimizer's job easier, but thoughtful optimization practices let us tackle more complicated loss landscapes. These go hand in hand.

In the previous sections, we learned the intuition behind gradients and defined them in a mathematically precise way. We saw that for any differentiable function, no matter the number of variables, the gradient always points in the direction of steepest ascent, which is the foundation of the gradient descent algorithm. Although it is conceptually very simple, it runs into significant computational difficulties when applied to functions with millions of variables. This problem is alleviated by stochastic gradient descent, but there are many more issues: getting stuck in a local optimum, selecting the learning rate, and so on. Because of these, optimization is hard and requires attention from both researchers and practitioners. In fact, there is a very active community out there constantly making it better, with amazing results! After understanding the mathematical foundations of optimization for deep learning, you are now on the right path to improve the state of the art! Some great papers to get you started:

Visualizing the Loss Landscape of Neural Nets by Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer and Tom Goldstein
Adam: A Method for Stochastic Optimization by Diederik P. Kingma and Jimmy Ba
Calibrating the Adaptive Learning Rate to Improve Convergence of ADAM by Qianqian Tong, Guannan Liang and Jinbo Bi
On the Variance of the Adaptive Learning Rate and Beyond by Liyuan Liu, Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao and Jiawei Han
