Difference between revisions of "NFS/Troubleshooting"

From ArchWiki
[[Category:Network sharing]]
[[ar:NFS]]
[[de:Network File System]]
[[es:NFS]]
[[fr:NFS]]
[[it:NFSv4]]
[[ja:NFS/トラブルシューティング]]
[[zh-CN:NFS]]
{{Related articles start}}
{{Related|NFS}}
{{Related articles end}}
Dedicated article for common problems and solutions.

== Server-side issues ==

=== exportfs: /etc/exports:2: syntax error: bad option list ===

Delete all spaces from the option list in {{ic|/etc/exports}}.
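This error is typically triggered by whitespace inside the parenthesised option list. For example (the path and network below are illustrative):

{{hc|/etc/exports|2=
# Wrong - space inside the option list:
/srv/nfs4 192.168.1.0/24(rw, no_subtree_check)
# Correct:
/srv/nfs4 192.168.1.0/24(rw,no_subtree_check)
}}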
  
=== Group/GID permissions issues ===

If NFS shares mount fine, and are fully accessible to the owner, but not to group members, check the number of groups that user belongs to. NFS has a limit of 16 on the number of groups a user can belong to. If you have users with more than this, you need to enable the {{ic|--manage-gids}} start-up flag for {{ic|rpc.mountd}} on the NFS server:

{{hc|/etc/conf.d/nfs-server.conf|2=
# Options for rpc.mountd.
# If you have a port-based firewall, you might want to set up
# a fixed port here using the --port option.
# See rpc.mountd(8) for more details.

MOUNTD_OPTS="--manage-gids"
}}
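To check how many groups the affected user is in, count the GIDs reported by ''id'' (run this as that user):

 id -G | wc -w

If the result is greater than 16, the limit described above applies.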
  
=== "Permission denied" when trying to write files as root ===

* If you need to mount shares as root, and have full r/w access from the client, add the {{ic|no_root_squash}} option to the export in {{ic|/etc/exports}}:
 /var/cache/pacman/pkg 192.168.1.0/24(rw,no_subtree_check,no_root_squash)
* You must also add {{ic|no_root_squash}} to the first line in {{ic|/etc/exports}}:
 / 192.168.1.0/24(rw,fsid=root,no_root_squash,no_subtree_check)
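After editing {{ic|/etc/exports}}, re-export the shares on the server for the change to take effect:

 # exportfs -r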
=== "RPC: Program not registered" when showmount -e command issued ===

Make sure that {{ic|nfs-server.service}} and {{ic|rpcbind.service}} are running on the server, see [[systemd]]. If they are not, start and enable them.
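With systemd, this amounts to (see [[systemd]] for details):

 # systemctl start nfs-server.service rpcbind.service
 # systemctl enable nfs-server.service rpcbind.service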
== Client-side issues ==

=== mount.nfs4: No such device ===
 
 
Check that you have loaded the {{ic|nfs}} module:

 lsmod | grep nfs

and if the previous command returns nothing, or only nfsd-related entries, load the module:

 # modprobe nfs
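To load the module automatically at boot, you can declare it in a modules-load.d fragment (the file name is arbitrary, see [[Kernel modules]]):

{{hc|/etc/modules-load.d/nfs.conf|nfs}}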
=== mount.nfs4: access denied by server while mounting ===

NFS shares have to reside in {{ic|/srv}} - check your {{ic|/etc/exports}} file and if necessary create the proper folder structure as described in [[NFS#File system]].

Check that the permissions on your client's folder are correct. Try using 755.

Alternatively, reload the {{ic|/etc/exports}} file with {{ic|exportfs -rav}}.

=== Unable to connect from OS X clients ===
 
When trying to connect from an OS X client, you will see that everything looks fine in the logs, but OS X refuses to mount your NFS share. You have to add the {{ic|insecure}} option to your share and re-run {{ic|exportfs -r}}.
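For example, an export line with the {{ic|insecure}} option added (path and network are illustrative):

 /srv/nfs4 192.168.1.0/24(rw,insecure,no_subtree_check)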
  
=== Unreliable connection from OS X clients ===

OS X's NFS client is optimized for OS X servers and may present some issues with Linux servers. If you are experiencing slow performance, frequent disconnects, and problems with international characters, edit the default mount options by adding the line {{ic|<nowiki>nfs.client.mount.options = intr,locallocks,nfc</nowiki>}} to {{ic|/etc/nfs.conf}} on your Mac client. More information about the mount options can be found [https://developer.apple.com/library/mac/#documentation/Darwin/Reference/ManPages/man8/mount_nfs.8.html#//apple_ref/doc/man/8/mount_nfs here].
  
== Lock problems ==
+
=== Intermittent client freezes when copying large files ===
If you got error such as this:
+
mount.nfs: rpc.statd is not running but is required for remote locking.
+
mount.nfs: Either use '-o nolock' to keep locks local, or start statd.
+
mount.nfs: an incorrect mount option was specified
+
  
To fix this, you need to change the "NEED_STATD" value in
+
If you copy large files from your client machine to the NFS server, the transfer speed is ''very'' fast, but after some seconds the speed drops and your client machine intermittently locks up completely for some time until the transfer is finished.
{{ic|/etc/conf.d/nfs-common.conf}} to {{ic|YES}}.
+
  
Remember to start all the required services (see [[NFS]] or [[NFSv3]]), not just
+
Try adding <tt>sync</tt> as a mount option on the client (e.g. in <tt>/etc/fstab</tt>) to fix this problem.
the '''nfs''' service.
+
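For example, a client-side {{ic|/etc/fstab}} entry with {{ic|sync}} added (server name and paths are illustrative):

{{hc|/etc/fstab|2=
server:/srv/nfs4 /mnt nfs rw,sync 0 0
}}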
  
=== mount.nfs: Operation not permitted ===

After updating to ''nfs-utils'' 1.2.1-2 or higher, mounting NFS shares may stop working. This is because ''nfs-utils'' now uses NFSv4 by default instead of NFSv3. The problem can be solved by using either the mount option {{ic|1=vers=3}} or {{ic|1=nfsvers=3}} on the command line:

 # mount.nfs ''remote target'' ''directory'' -o ...,vers=3,...
 # mount.nfs ''remote target'' ''directory'' -o ...,nfsvers=3,...

or in {{ic|/etc/fstab}}:

 ''remote target'' ''directory'' nfs ...,vers=3,... 0 0
 ''remote target'' ''directory'' nfs ...,nfsvers=3,... 0 0
=== mount.nfs: Protocol not supported ===

Check that you are not including the export root in the mount path. Use:

 # mount SERVER:/ /mnt

instead of, e.g.:

 # mount SERVER:/srv/nfs4/ /mnt

Sometimes this is the same problem as "Operation not permitted": ''nfs-utils'' uses NFSv4 by default instead of NFSv3. See the previous section.
=== Problems with Vagrant and synced_folders ===

If Vagrant scripts are unable to mount folders over NFS, installing the ''net-tools'' package may solve the issue.

== Performance issues ==
  
 
This [http://nfs.sourceforge.net/nfs-howto/ar01s05.html NFS Howto page] has some useful information regarding performance. Here are some further tips:

=== Diagnose the problem ===

* '''Htop''' should be your first port of call. The most obvious symptom will be a maxed-out CPU.
* Press F2, and under "Display options", enable "Detailed CPU time". Press F1 for an explanation of the colours used in the CPU bars. In particular, is the CPU spending most of its time responding to IRQs, or in Wait-IO (wio)?
  
=== Server threads ===

'''Symptoms:''' Nothing seems to be very heavily loaded, but some operations on the client take a long time to complete for no apparent reason.

If your workload involves lots of small reads and writes (or if there are a lot of clients), there may not be enough threads running on the server to handle the quantity of queries. To check if this is the case, run the following command on one or more of the clients:

{{hc|# nfsstat -rc|
Client rpc stats:
calls      retrans    authrefrsh
113482     0          113484
}}

If the {{ic|retrans}} column contains a number larger than 0, the server is failing to respond to some NFS requests, and the number of threads should be increased.
  
To increase the number of threads on the server, edit the file {{ic|/etc/conf.d/nfs-server.conf}} and set the value in the {{ic|NFSD_OPTS}} variable. For example, to set the number of threads to 32:

{{hc|/etc/conf.d/nfs-server.conf|2=
NFSD_OPTS="32"
}}

The default number of threads is 8. Try doubling this number until {{ic|retrans}} remains consistently at zero. Do not be afraid of increasing the number quite substantially; 256 threads may be quite reasonable, depending on the workload. You will need to restart the NFS server daemon each time you modify the configuration file. Bear in mind that the client statistics will only be reset to zero when the client is rebooted.

Use '''htop''' (disable the hiding of kernel threads) to keep an eye on how much work each nfsd thread is doing. If you reach a point where the {{ic|retrans}} values are non-zero, but you can see {{ic|nfsd}} threads on the server doing no work, something different is now causing your bottleneck, and you will need to re-diagnose this new problem.
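The current thread count can also be inspected, and changed until the next restart, through procfs (see also the note about the read-ahead cache under General Notes below):

 # cat /proc/fs/nfsd/threads
 # echo 32 > /proc/fs/nfsd/threads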
  
=== Close-to-open/flush-on-close ===

'''Symptoms:''' Your clients are writing many small files. The server CPU is not maxed out, but there is very high wait-IO, and the server disk seems to be churning more than you might expect.

See [http://docstore.mik.ua/orelly/networking_2ndEd/nfs/ch07_04.htm this excellent article] or the '''nfs''' manual page for more details on the close-to-open policy. There are several approaches to solving this problem:
  
==== The nocto mount option ====

{{Note|The Linux kernel does not seem to honour this option properly. Files are still flushed when they are closed.}}

Does your situation match these conditions?

If you are happy with the above conditions, you can use the {{ic|nocto}} mount option, which will disable the close-to-open behaviour. See the '''nfs''' manual page for details.
  
==== The async export option ====

Does your situation match these conditions?

** When the server is restarted, the clients will believe their recent files exist, even though they were actually lost.

In this situation, you can use {{ic|async}} instead of {{ic|sync}} in the server's {{ic|/etc/exports}} file for those specific exports. See the '''exports''' manual page for details. In this case, it does not make sense to use the {{ic|nocto}} mount option on the client.
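For example, an export line using {{ic|async}} (path and network are illustrative):

 /srv/scratch 192.168.1.0/24(rw,async,no_subtree_check)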
=== Buffer cache size and MTU ===

'''Symptoms:''' High kernel or IRQ CPU usage, a very high packet count through the network card.

This is a trickier optimisation. Make sure this is definitely the problem before spending too much time on it. The default values are usually fine for most situations.

See [http://docstore.mik.ua/orelly/networking_2ndEd/nfs/ch07_03.htm this excellent article] for information about I/O buffering in NFS. Essentially, data is accumulated into buffers before being sent. The size of the buffer will affect the way data is transmitted over the network. The Maximum Transmission Unit (MTU) of the network equipment will also affect throughput, as the buffers need to be split into MTU-sized chunks before they are sent over the network. If your buffer size is too big, the kernel or hardware may spend too much time splitting it into MTU-sized chunks. If the buffer size is too small, there will be overhead involved in sending a very large number of small packets. You can use the '''rsize''' and '''wsize''' mount options on the client to alter the buffer cache size. To achieve the best throughput, you need to experiment and discover the best values for your setup.

It is possible to change the MTU of many network cards. If your clients are on a separate subnet (e.g. for a Beowulf cluster), it may be safe to configure all of the network cards to use a high MTU. This should be done in very-high-bandwidth environments.

See also the '''nfs''' manual page for more about '''rsize''' and '''wsize'''.
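For example, to experiment with 32 KiB buffers on the client (the values are illustrative; the server negotiates the maximum it will accept):

 # mount -o rsize=32768,wsize=32768 SERVER:/ /mnt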
== Debugging ==

=== Using rpcdebug ===

Using {{ic|rpcdebug}} is the easiest way to manipulate the kernel debug interfaces, in place of echoing bitmasks to {{ic|/proc}}.
{| class="wikitable"
|-
! Option !! Description
|-
| -c || Clear the given debug flags
|-
| -s || Set the given debug flags
|-
| -m ''module'' || Specify which module's flags to set or clear
|-
| -v || Increase the verbosity of rpcdebug's output
|-
| -h || Print a help message and exit. When combined with the -v option, also prints the available debug flags.
|}
 
For the '''-m''' option, the available modules are:

{| class="wikitable"
|-
! Module !! Description
|-
| nfsd || The NFS server
|-
| nfs || The NFS client
|-
| nlm || The Network Lock Manager, in either an NFS client or server
|-
| rpc || The Remote Procedure Call module, in either an NFS client or server
|}
 
Examples:

{{bc|
rpcdebug -m rpc -s all    # sets all debug flags for RPC
rpcdebug -m rpc -c all    # clears all debug flags for RPC

rpcdebug -m nfsd -s all   # sets all debug flags for the NFS server
rpcdebug -m nfsd -c all   # clears all debug flags for the NFS server
}}

Once the flags are set, you can tail the journal for the debug output, usually with {{ic|journalctl -fl}} or similar.
 
=== Kernel interfaces ===

A bitmask of the debug flags can be echoed into the corresponding interface to enable output to syslog; 0 is the default:

{{bc|
/proc/sys/sunrpc/nfsd_debug
/proc/sys/sunrpc/nfs_debug
/proc/sys/sunrpc/nlm_debug
/proc/sys/sunrpc/rpc_debug
}}
 
Sysctl controls are registered for these interfaces, so they can be used instead of echo:

{{bc|1=
sysctl -w sunrpc.rpc_debug=1023
sysctl -w sunrpc.rpc_debug=0

sysctl -w sunrpc.nfsd_debug=1023
sysctl -w sunrpc.nfsd_debug=0
}}
 
At runtime, the server holds information that can be examined:

{{bc|
grep . /proc/net/rpc/*/content
cat /proc/fs/nfs/exports
cat /proc/net/rpc/nfsd
ls -l /proc/fs/nfsd
}}
 
A rundown of {{ic|/proc/net/rpc/nfsd}} (the userspace tool {{ic|nfsstat}} pretty-prints this info):

{{bc|1=
* rc (reply cache): <hits> <misses> <nocache>
- hits: the client is retransmitting
- misses: an operation that requires caching
- nocache: an operation that does not require caching

* fh (filehandle): <stale> <total-lookups> <anonlookups> <dir-not-in-cache> <nodir-not-in-cache>
- stale: file handle errors
- total-lookups, anonlookups, dir-not-in-cache, nodir-not-in-cache
  . always seem to be zeros

* io (input/output): <bytes-read> <bytes-written>
- bytes-read: bytes read directly from disk
- bytes-written: bytes written to disk

* th (threads): <threads> <fullcnt> <10%-20%> <20%-30%> ... <90%-100%> <100%>
- threads: number of nfsd threads
- fullcnt: number of times that the last 10% of threads are busy
- 10%-20%, 20%-30% ... 90%-100%: 10 numbers representing 10-20%, 20-30% up to 100%
  . counts the number of times a given interval is busy

* ra (read-ahead): <cache-size> <10%> <20%> ... <100%> <not-found>
- cache-size: always double the number of threads
- 10%, 20% ... 100%: how deep into the cache it found what it was looking for
- not-found: not found in the read-ahead cache

* net: <netcnt> <netudpcnt> <nettcpcnt> <nettcpconn>
- netcnt: counts every read
- netudpcnt: counts every UDP packet it receives
- nettcpcnt: counts every time it receives data from a TCP connection
- nettcpconn: counts every TCP connection it receives

* rpc: <rpccnt> <rpcbadfmt+rpcbadauth+rpcbadclnt> <rpcbadfmt> <rpcbadauth> <rpcbadclnt>
- rpccnt: counts all rpc operations
- rpcbadfmt: counts the following errors encountered while processing an RPC:
  . err_bad_dir, err_bad_rpc, err_bad_prog, err_bad_vers, err_bad_proc, err_bad
- rpcbadauth: bad authentication
  . does not count if you try to mount from a machine that is not in your exports file
- rpcbadclnt: unused

* procN (N = vers): <vs_nproc> <null> <getattr> <setattr> <lookup> <access> <readlink> <read> <write> <create> <mkdir> <symlink> <mknod> <remove> <rmdir> <rename> <link> <readdir> <readdirplus> <fsstat> <fsinfo> <pathconf> <commit>
- vs_nproc: number of procedures for the NFS version
  . v2: nfsproc.c, 18
  . v3: nfs3proc.c, 22
  . v4: nfs4proc.c, 2
- statistics: generated from NFS operations at runtime

* proc4ops: <ops> <x..y>
- ops: the definition of LAST_NFS4_OP, OP_RELEASE_LOCKOWNER = 39, plus 1 (so 40); defined in nfs4.h
- x..y: the array of nfs_opcount up to LAST_NFS4_OP (nfsdstats.nfs4_opcount[i])
}}
 
=== NFSD debug flags ===

{{hc|/usr/include/linux/nfsd/debug.h|2=
/*
 * knfsd debug flags
 */
#define NFSDDBG_SOCK            0x0001
#define NFSDDBG_FH              0x0002
#define NFSDDBG_EXPORT          0x0004
#define NFSDDBG_SVC             0x0008
#define NFSDDBG_PROC            0x0010
#define NFSDDBG_FILEOP          0x0020
#define NFSDDBG_AUTH            0x0040
#define NFSDDBG_REPCACHE        0x0080
#define NFSDDBG_XDR             0x0100
#define NFSDDBG_LOCKD           0x0200
#define NFSDDBG_ALL             0x7FFF
#define NFSDDBG_NOCHANGE        0xFFFF
}}
 
=== NFS debug flags ===

{{hc|/usr/include/linux/nfs_fs.h|2=
/*
 * NFS debug flags
 */
#define NFSDBG_VFS              0x0001
#define NFSDBG_DIRCACHE         0x0002
#define NFSDBG_LOOKUPCACHE      0x0004
#define NFSDBG_PAGECACHE        0x0008
#define NFSDBG_PROC             0x0010
#define NFSDBG_XDR              0x0020
#define NFSDBG_FILE             0x0040
#define NFSDBG_ROOT             0x0080
#define NFSDBG_CALLBACK         0x0100
#define NFSDBG_CLIENT           0x0200
#define NFSDBG_MOUNT            0x0400
#define NFSDBG_FSCACHE          0x0800
#define NFSDBG_PNFS             0x1000
#define NFSDBG_PNFS_LD          0x2000
#define NFSDBG_STATE            0x4000
#define NFSDBG_ALL              0xFFFF
}}
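A mask for these interfaces is simply the bitwise OR of the flag values above. For example, to compute a {{ic|sunrpc.nfs_debug}} mask enabling only {{ic|NFSDBG_PROC}} and {{ic|NFSDBG_XDR}} using shell arithmetic:

 printf 'sunrpc.nfs_debug=%#x\n' "$(( 0x0010 | 0x0020 ))"

which prints {{ic|1=sunrpc.nfs_debug=0x30}}; apply it with {{ic|1=sysctl -w sunrpc.nfs_debug=0x30}}.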
 
=== NLM debug flags ===

{{hc|/usr/include/linux/lockd/debug.h|2=
/*
 * Debug flags
 */
#define NLMDBG_SVC              0x0001
#define NLMDBG_CLIENT           0x0002
#define NLMDBG_CLNTLOCK         0x0004
#define NLMDBG_SVCLOCK          0x0008
#define NLMDBG_MONITOR          0x0010
#define NLMDBG_CLNTSUBS         0x0020
#define NLMDBG_SVCSUBS          0x0040
#define NLMDBG_HOSTCACHE        0x0080
#define NLMDBG_XDR              0x0100
#define NLMDBG_ALL              0x7fff
}}
 
=== RPC debug flags ===

{{hc|/usr/include/linux/sunrpc/debug.h|2=
/*
 * RPC debug facilities
 */
#define RPCDBG_XPRT             0x0001
#define RPCDBG_CALL             0x0002
#define RPCDBG_DEBUG            0x0004
#define RPCDBG_NFS              0x0008
#define RPCDBG_AUTH             0x0010
#define RPCDBG_BIND             0x0020
#define RPCDBG_SCHED            0x0040
#define RPCDBG_TRANS            0x0080
#define RPCDBG_SVCXPRT          0x0100
#define RPCDBG_SVCDSP           0x0200
#define RPCDBG_MISC             0x0400
#define RPCDBG_CACHE            0x0800
#define RPCDBG_ALL              0x7fff
}}
 
=== General Notes ===

* While the number of threads can be increased at runtime via an echo to {{ic|/proc/fs/nfsd/threads}}, the cache size (double the threads, see the '''ra''' line of {{ic|/proc/net/rpc/nfsd}}) is not dynamic. The NFS daemon must be restarted with the new thread size during initialization in order for the thread cache to properly adjust.
 
=== References ===

* https://github.com/torvalds/linux/tree/master/include/linux
* http://linux.die.net/man/8/rpcdebug
* http://utcc.utoronto.ca/~cks/space/blog/linux/NFSClientDebuggingBits
* http://www.novell.com/support/kb/doc.php?id=7011571
* http://stromberg.dnsalias.org/~strombrg/NFS-troubleshooting-2.html
* http://www.opensubscriber.com/message/nfs@lists.sourceforge.net/7833588.html
 
== Other issues ==

=== Permissions issues ===

If you find that you cannot set the permissions on files properly, make sure the user/group you are chowning to exists on both the client and the server.

If all your files are owned by {{ic|nobody}}, and you are using NFSv4, then on both the client and the server you should:

* For systemd, ensure that the {{ic|nfs-idmapd}} service has been started.
* For initscripts, ensure that {{ic|NEED_IDMAPD}} is set to {{ic|YES}} in {{ic|/etc/conf.d/nfs-common.conf}}.
 
On some systems, detecting the domain from FQDN minus hostname does not seem to work reliably. If files are still showing as {{ic|nobody}} after the above changes, edit {{ic|/etc/idmapd.conf}} and ensure that {{ic|Domain}} is set to the FQDN minus the hostname. For example:

{{hc|/etc/idmapd.conf|2=
[General]

Verbosity = 7
Pipefs-Directory = /var/lib/nfs/rpc_pipefs
Domain = yourdomain.local

[Mapping]

Nobody-User = nobody
Nobody-Group = nobody

[Translation]

Method = nsswitch
}}

If {{ic|nfs-idmapd.service}} refuses to start because it cannot open the Pipefs directory (defined in {{ic|/etc/idmapd.conf}} and appended with {{ic|/nfs}}), create the directory with ''mkdir'' and restart the daemon.
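For example, assuming the default {{ic|Pipefs-Directory}} shown above:

 # mkdir -p /var/lib/nfs/rpc_pipefs/nfs
 # systemctl restart nfs-idmapd.service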

Latest revision as of 17:51, 18 April 2016

Related articles

Dedicated article for common problems and solutions.

Server-side issues

exportfs: /etc/exports:2: syntax error: bad option list

Delete all space from the option list in /etc/exports

Group/GID permissions issues

If NFS shares mount fine, and are fully accessible to the owner, but not to group members; check the number of groups that user belongs to. NFS has a limit of 16 on the number of groups a user can belong to. If you have users with more than this, you need to enable the --manage-gids start-up flag for rpc.mountd on the NFS server.

/etc/conf.d/nfs-server.conf
# Options for rpc.mountd.
# If you have a port-based firewall, you might want to set up
# a fixed port here using the --port option.
# See rpc.mountd(8) for more details.

MOUNTD_OPTS="--manage-gids"

"Permission denied" when trying to write files as root

  • If you need to mount shares as root, and have full r/w access from the client, add the no_root_squash option to the export in /etc/exports:
/var/cache/pacman/pkg 192.168.1.0/24(rw,no_subtree_check,no_root_squash)
  • You must also add no_root_squash to the first line in /etc/exports:
/ 192.168.1.0/24(rw,fsid=root,no_root_squash,no_subtree_check)

"RPC: Program not registered" when showmount -e command issued

Make sure that nfs-server.service and rpcbind.service are running on the server site, see systemd. If they are not, start and enable them.

Client-side issues

mount.nfs4: No such device

Check that you have loaded the nfs module

lsmod | grep nfs

and if previous returns empty or only nfsd-stuff, do

# modprobe nfs

mount.nfs4: access denied by server while mounting

NFS shares have to reside in /srv - check your /etc/exports file and if necessary create the proper folder structure as described in the NFS#File system page.

Check that the permissions on your client's folder are correct. Try using 755.

or try "exportfs -rav" reload /etc/exports file.

Unable to connect from OS X clients

When trying to connect from a OS X client, you will see that everything is ok at logs, but MacOS X refuses to mount your NFS share. You have to add insecure option to your share and re-run exportfs -r.

Unreliable connection from OS X clients

OS X's NFS client is optimized for OS X Servers and might present some issues with Linux servers. If you are experiencing slow performance, frequent disconnects and problems with international characters edit the default mount options by adding the line nfs.client.mount.options = intr,locallocks,nfc to /etc/nfs.conf on your Mac client. More information about the mount options can be found here.

Intermittent client freezes when copying large files

If you copy large files from your client machine to the NFS server, the transfer speed is very fast, but after some seconds the speed drops and your client machine intermittently locks up completely for some time until the transfer is finished.

Try adding sync as a mount option on the client (e.g. in /etc/fstab) to fix this problem.

mount.nfs: Operation not permitted

After updating to nfs-utils 1.2.1-2 or higher, mounting NFS shares stopped working. Henceforth, nfs-utils uses NFSv4 per default instead of NFSv3. The problem can be solved by using either mount option 'vers=3' or 'nfsvers=3' on the command line:

# mount.nfs remote target directory -o ...,vers=3,...
# mount.nfs remote target directory -o ...,nfsvers=3,...

or in /etc/fstab:

remote target directory nfs ...,vers=3,... 0 0
remote target directory nfs ...,nfsvers=3,... 0 0

mount.nfs: Protocol not supported

Check you are not mounting including the export root. Use:

# mount SERVER:/ /mnt

instead of, i.e.:

# mount SERVER:/srv/nfs4/ /mnt

Sometimes it could be the same problem with "Operation not permitted" : nfs-utils uses NFSv4 per default instead of NFSv3. Go and see the previons section

Problems with Vagrant and synced_folders

If Vagrant scripts are unable to mount folders over NFS, installing the net-tools package may solve the issue.

Performance issues

This NFS Howto page has some useful information regarding performance. Here are some further tips:

Diagnose the problem

  • Htop should be your first port of call. The most obvious symptom will be a maxed-out CPU.
  • Press F2, and under "Display options", enable "Detailed CPU time". Press F1 for an explanation of the colours used in the CPU bars. In particular, is the CPU spending most of its time responding to IRQs, or in Wait-IO (wio)?

Server threads

Symptoms: Nothing seems to be very heavily loaded, but some operations on the client take a long time to complete for no apparent reason.

If your workload involves lots of small reads and writes (or if there are a lot of clients), there may not be enough threads running on the server to handle the quantity of queries. To check if this is the case, run the following command on one or more of the clients:

# nfsstat -rc
Client rpc stats:
calls      retrans    authrefrsh
113482     0          113484

If the retrans column contains a number larger than 0, the server is failing to respond to some NFS requests, and the number of threads should be increased.

To increase the number of threads on the server, edit the file /etc/conf.d/nfs-server.conf and set the value in the NFSD_OPTS variable. For example, to set the number of threads to 32:

/etc/conf.d/nfs-server.conf
NFSD_OPTS="32"

The default number of threads is 8. Try doubling this number until retrans remains consistently at zero. Don't be afraid of increasing the number quite substantially. 256 threads may be quite reasonable, depending on the workload. You will need to restart the NFS server daemon each time you modify the configuration file. Bear in mind that the client statistics will only be reset to zero when the client is rebooted.
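On a systemd-based setup the restart step might look like this (the unit name nfs-server.service is an assumption; use your init system's equivalent):

```shell
# restart the NFS server so the new NFSD_OPTS thread count takes effect
systemctl restart nfs-server.service
```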

Use htop (disable the hiding of kernel threads) to keep an eye on how much work each nfsd thread is doing. If you reach a point where the retrans values are non-zero, but you can see nfsd threads on the server doing no work, something different is now causing your bottleneck, and you'll need to re-diagnose this new problem.

Close-to-open/flush-on-close

Symptoms: Your clients are writing many small files. The server CPU is not maxed out, but there is very high wait-IO, and the server disk seems to be churning more than you might expect.

In order to ensure data consistency across clients, the NFS protocol requires that the client's cache is flushed (all data is pushed to the server) whenever a file is closed after writing. Because the server is not allowed to buffer disk writes (if it crashes, the client won't realise the data wasn't written properly), the data is written to disk immediately before the client's request is completed. When you're writing lots of small files from the client, this means that the server spends most of its time waiting for small files to be written to its disk, which can cause a significant reduction in throughput.

See this excellent article or the nfs manpage for more details on the close-to-open policy. There are several approaches to solving this problem:

The nocto mount option

Note: The Linux kernel does not seem to honour this option properly. Files are still flushed when they are closed.

Does your situation match these conditions?

  • The export you have mounted on the client is only going to be used by the one client.
  • It doesn't matter too much if a file written on one client doesn't immediately appear on other clients.
  • It doesn't matter if a file that a client has written, and believes to have been saved, is lost because the client crashed before the data reached the server.

If you're happy with the above conditions, you can use the nocto mount option, which will disable the close-to-open behaviour. See the nfs manpage for details.
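A hedged /etc/fstab example with the option applied (server name and paths are placeholders):

```
server:/export  /mnt/nfs  nfs  defaults,nocto  0 0
```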

The async export option

Does your situation match these conditions?

  • It's important that when a file is closed after writing on one client, it is:
    • Immediately visible on all the other clients.
    • Safely stored on the server, even if the client crashes immediately after closing the file.
  • It's not important to you that if the server crashes:
    • You may lose the files that were most recently written by clients.
    • When the server is restarted, the clients will believe their recent files exist, even though they were actually lost.

In this situation, you can use async instead of sync in the server's /etc/exports file for those specific exports. See the exports manual page for details. In this case, it does not make sense to use the nocto mount option on the client.
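A sketch of such an export line in /etc/exports (the path and network are placeholders):

```
/srv/nfs 192.168.1.0/24(rw,async)
```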

Buffer cache size and MTU

Symptoms: High kernel or IRQ CPU usage, a very high packet count through the network card.

This is a trickier optimisation. Make sure this is definitely the problem before spending too much time on this. The default values are usually fine for most situations.

See this excellent article for information about I/O buffering in NFS. Essentially, data is accumulated into buffers before being sent. The size of the buffer will affect the way data is transmitted over the network. The Maximum Transmission Unit (MTU) of the network equipment will also affect throughput, as the buffers need to be split into MTU-sized chunks before they're sent over the network. If your buffer size is too big, the kernel or hardware may spend too much time splitting it into MTU-sized chunks. If the buffer size is too small, there will be overhead involved in sending a very large number of small packets. You can use the rsize and wsize mount options on the client to alter the buffer cache size. To achieve the best throughput, you need to experiment and discover the best values for your setup.
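For example, a trial mount with 32 KiB buffers (the values are illustrative starting points, not recommendations; powers of two are commonly tried):

```shell
# mount with explicit read/write buffer sizes for benchmarking
mount -t nfs -o rsize=32768,wsize=32768 server:/export /mnt/nfs
```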

It is possible to change the MTU of many network cards. If your clients are on a separate subnet (e.g. for a Beowulf cluster), it may be safe to configure all of the network cards to use a high MTU. This should only be done in very-high-bandwidth environments.
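A sketch of raising the MTU with iproute2 (the interface name eth0 and the jumbo-frame value 9000 are assumptions; every host and switch on the subnet must support the same MTU):

```shell
ip link set dev eth0 mtu 9000   # raise the MTU (as root)
ip link show eth0               # verify the new value
```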

See also the nfs manual page for more about rsize and wsize.


Debugging

Using rpcdebug

Using rpcdebug is the easiest way to manipulate the kernel interfaces in place of echoing bitmasks to /proc.

Option      Description
-c          Clear the given debug flags
-s          Set the given debug flags
-m module   Specify which module's flags to set or clear.
-v          Increase the verbosity of rpcdebug's output
-h          Print a help message and exit. When combined with the -v option, also prints the available debug flags.

For the -m option, the available modules are:

Module   Description
nfsd     The NFS server
nfs      The NFS client
nlm      The Network Lock Manager, in either an NFS client or server
rpc      The Remote Procedure Call module, in either an NFS client or server

Examples:

rpcdebug -m rpc -s all    # sets all debug flags for RPC
rpcdebug -m rpc -c all    # clears all debug flags for RPC

rpcdebug -m nfsd -s all   # sets all debug flags for NFS Server
rpcdebug -m nfsd -c all   # clears all debug flags for NFS Server

Once the flags are set, you can tail the journal for the debug output, e.g. with journalctl -fl or similar.

Kernel Interfaces

A bitmask of the debug flags can be echoed into the following interfaces to enable output to syslog; 0 (the default) disables debugging:

/proc/sys/sunrpc/nfsd_debug
/proc/sys/sunrpc/nfs_debug
/proc/sys/sunrpc/nlm_debug
/proc/sys/sunrpc/rpc_debug
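For example, to enable all NFS client debug flags (NFSDBG_ALL, 0xFFFF = 65535) and later disable them again (as root):

```shell
echo 65535 > /proc/sys/sunrpc/nfs_debug   # enable all NFS client debug flags
echo 0     > /proc/sys/sunrpc/nfs_debug   # back to the default (off)
```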

Sysctl controls are registered for these interfaces, so they can be used instead of echo:

sysctl -w sunrpc.rpc_debug=1023
sysctl -w sunrpc.rpc_debug=0

sysctl -w sunrpc.nfsd_debug=1023
sysctl -w sunrpc.nfsd_debug=0

At runtime the server holds information that can be examined:

grep . /proc/net/rpc/*/content
cat /proc/fs/nfs/exports
cat /proc/net/rpc/nfsd
ls -l /proc/fs/nfsd

A rundown of /proc/net/rpc/nfsd (the userspace tool nfsstat pretty-prints this info):

* rc (reply cache): <hits> <misses> <nocache>
- hits: a retransmitted request was answered from the cache
- misses: an operation that requires caching
- nocache: an operation that does not require caching

* fh (filehandle): <stale> <total-lookups> <anonlookups> <dir-not-in-cache> <nodir-not-in-cache>
- stale: file handle errors
- total-lookups, anonlookups, dir-not-in-cache, nodir-not-in-cache:
  . these always seem to be zero

* io (input/output): <bytes-read> <bytes-written>
- bytes-read: bytes read directly from disk
- bytes-written: bytes written to disk

* th (threads): <threads> <fullcnt> <10%-20%> <20%-30%> ... <90%-100%> <100%>
- threads: number of nfsd threads
- fullcnt: number of times that the last 10% of threads were busy
- 10%-20%, 20%-30% ... 90%-100%: ten numbers representing the intervals 10-20%, 20-30%, up to 100%
  . each counts the number of times the busy-thread fraction fell into that interval

* ra (read-ahead): <cache-size> <10%> <20%> ... <100%> <not-found>
- cache-size: always double the number of threads
- 10%, 20% ... 100%: how deep into the read-ahead cache the entry was found
- not-found: not found in the read-ahead cache

* net: <netcnt> <netudpcnt> <nettcpcnt> <nettcpconn>
- netcnt: counts every read
- netudpcnt: counts every UDP packet it receives
- nettcpcnt: counts every time it receives data from a TCP connection
- nettcpconn: counts every TCP connection it receives

* rpc: <rpccnt> <rpcbadfmt+rpcbadauth+rpcbadclnt> <rpcbadfmt> <rpcbadauth> <rpcbadclnt>
- rpccnt: counts all rpc operations
- rpcbadfmt: counts the RPCs whose processing produced one of the following errors:
  . err_bad_dir, err_bad_rpc, err_bad_prog, err_bad_vers, err_bad_proc, err_bad
- rpcbadauth: bad authentication
  . not incremented when you try to mount from a machine that is not in your exports file
- rpcbadclnt: unused

* procN (N = vers): <vs_nproc> <null> <getattr> <setattr> <lookup> <access> <readlink> <read> <write> <create> <mkdir> <symlink> <mknod> <remove> <rmdir> <rename> <link> <readdir> <readdirplus> <fsstat> <fsinfo> <pathconf> <commit>
- vs_nproc: number of procedures for that NFS version
  . v2: nfsproc.c, 18
  . v3: nfs3proc.c, 22
  . v4: nfs4proc.c, 2
- statistics: generated from NFS operations at runtime

* proc4ops: <ops> <x..y>
- ops: the definition of LAST_NFS4_OP, OP_RELEASE_LOCKOWNER = 39, plus 1 (so 40); defined in nfs4.h
- x..y: the array of nfs_opcount up to LAST_NFS4_OP (nfsdstats.nfs4_opcount[i])

NFSD debug flags

/usr/include/linux/nfsd/debug.h
/*
 * knfsd debug flags
 */
#define NFSDDBG_SOCK            0x0001
#define NFSDDBG_FH              0x0002
#define NFSDDBG_EXPORT          0x0004
#define NFSDDBG_SVC             0x0008
#define NFSDDBG_PROC            0x0010
#define NFSDDBG_FILEOP          0x0020
#define NFSDDBG_AUTH            0x0040
#define NFSDDBG_REPCACHE        0x0080
#define NFSDDBG_XDR             0x0100
#define NFSDDBG_LOCKD           0x0200
#define NFSDDBG_ALL             0x7FFF
#define NFSDDBG_NOCHANGE        0xFFFF
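The sysctl interfaces above take the decimal value of the OR-ed flags, so individual flags can be combined with shell arithmetic. For example, combining NFSDDBG_SOCK, NFSDDBG_PROC and NFSDDBG_AUTH:

```shell
# compute the decimal bitmask for NFSDDBG_SOCK | NFSDDBG_PROC | NFSDDBG_AUTH
mask=$(( 0x0001 | 0x0010 | 0x0040 ))
echo "$mask"                          # prints 81
# as root on the server one could then run:
# sysctl -w sunrpc.nfsd_debug=$mask
```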

NFS debug flags

/usr/include/linux/nfs_fs.h
/*
 * NFS debug flags
 */
#define NFSDBG_VFS              0x0001
#define NFSDBG_DIRCACHE         0x0002
#define NFSDBG_LOOKUPCACHE      0x0004
#define NFSDBG_PAGECACHE        0x0008
#define NFSDBG_PROC             0x0010
#define NFSDBG_XDR              0x0020
#define NFSDBG_FILE             0x0040
#define NFSDBG_ROOT             0x0080
#define NFSDBG_CALLBACK         0x0100
#define NFSDBG_CLIENT           0x0200
#define NFSDBG_MOUNT            0x0400
#define NFSDBG_FSCACHE          0x0800
#define NFSDBG_PNFS             0x1000
#define NFSDBG_PNFS_LD          0x2000
#define NFSDBG_STATE            0x4000
#define NFSDBG_ALL              0xFFFF

NLM debug flags

/usr/include/linux/lockd/debug.h
/*
 * Debug flags
 */
#define NLMDBG_SVC		0x0001
#define NLMDBG_CLIENT		0x0002
#define NLMDBG_CLNTLOCK		0x0004
#define NLMDBG_SVCLOCK		0x0008
#define NLMDBG_MONITOR		0x0010
#define NLMDBG_CLNTSUBS		0x0020
#define NLMDBG_SVCSUBS		0x0040
#define NLMDBG_HOSTCACHE	0x0080
#define NLMDBG_XDR		0x0100
#define NLMDBG_ALL		0x7fff

RPC debug flags

/usr/include/linux/sunrpc/debug.h
/*
 * RPC debug facilities
 */
#define RPCDBG_XPRT             0x0001
#define RPCDBG_CALL             0x0002
#define RPCDBG_DEBUG            0x0004
#define RPCDBG_NFS              0x0008
#define RPCDBG_AUTH             0x0010
#define RPCDBG_BIND             0x0020
#define RPCDBG_SCHED            0x0040
#define RPCDBG_TRANS            0x0080
#define RPCDBG_SVCXPRT          0x0100
#define RPCDBG_SVCDSP           0x0200
#define RPCDBG_MISC             0x0400
#define RPCDBG_CACHE            0x0800
#define RPCDBG_ALL              0x7fff

General Notes

  • While the number of threads can be increased at runtime via an echo to /proc/fs/nfsd/threads, the cache size (double the threads, see the ra line of /proc/net/rpc/nfsd) is not dynamic. The NFS daemon must be restarted with the new thread size during initialization in order for the thread cache to properly adjust.
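A sketch of the runtime change (as root on the server):

```shell
echo 32 > /proc/fs/nfsd/threads   # raise the thread count immediately
cat /proc/fs/nfsd/threads         # read back to confirm
```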

Other issues

Permissions issues

If you find that you cannot set the permissions on files properly, make sure the user/group you are chowning to exists on both the client and the server.

If all your files are owned by nobody, and you are using NFSv4, on both the client and server, you should:

  • For systemd, ensure that the nfs-idmapd service has been started.
  • For initscripts, ensure that NEED_IDMAPD is set to YES in /etc/conf.d/nfs-common.conf.

On some systems, detecting the domain from the FQDN minus the hostname does not seem to work reliably. If files still show as owned by nobody after the above changes, edit /etc/idmapd.conf and ensure that Domain is set to the FQDN minus the hostname. For example:

/etc/idmapd.conf
[General]

Verbosity = 7
Pipefs-Directory = /var/lib/nfs/rpc_pipefs
Domain = yourdomain.local

[Mapping]

Nobody-User = nobody
Nobody-Group = nobody

[Translation]

Method = nsswitch

If nfs-idmapd.service refuses to start because it cannot open the Pipefs directory (defined in /etc/idmapd.conf, with '/nfs' appended), create that directory with mkdir and restart the daemon.
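With the Pipefs-Directory from the sample configuration above (/var/lib/nfs/rpc_pipefs), that amounts to (as root):

```shell
mkdir /var/lib/nfs/rpc_pipefs/nfs       # create the missing directory
systemctl restart nfs-idmapd.service    # then restart the daemon
```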