---
date: 2015-04-05T00:00:00-05:00
title: "Memory mappings, core dumps, GDB and Linux"
tags: [free-software, linux, gdb, fedora-planet, en_us, english]
---

After spending the last few weeks struggling with this, I decided to
write a blog post. First, what is the “this” I am talking about? The
answer is: the Linux kernel's concept of memory mapping. I found it
utterly confusing, beyond my expectations, so I believe that a blog post
is the right way to (a) preserve and (b) share this knowledge. So, let's
do it!

First things first
------------------

First, I cannot begin this post without a few acknowledgements and
“thank you's”. The first goes to Oleg Nesterov (sorry, I could not find
his website), a Linux kernel guru who really helped me a lot through the
whole task. Another “thank you” goes to [Jan
Kratochvil](http://www.jankratochvil.net/), who also provided valuable
feedback by commenting on my GDB patch. Now, back to the point.

The task
--------

The task was requested
[here](https://sourceware.org/bugzilla/show_bug.cgi?id=16092): GDB
needed to respect the `/proc/<PID>/coredump_filter` file when generating
a coredump (i.e., when you use the `gcore` command).

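The `coredump_filter` file mentioned above holds a hexadecimal bitmask, and its bits select which kinds of mappings get dumped. As an illustration, here is a minimal Python sketch that decodes it. It assumes the bit assignments documented in the `core(5)` manual page and handles only bits 0 to 6; newer kernels define more bits. The short names such as `anon_private` are labels invented for this example, and `read_coredump_filter` assumes a Linux `/proc` filesystem. With the kernel's default value, `0x33`, the decoded set is anonymous private memory, anonymous shared memory, ELF headers and private huge pages.

```python
from typing import Set

# Bit positions of /proc/<PID>/coredump_filter as documented in core(5).
# The names are illustrative labels for this sketch, not kernel identifiers.
COREDUMP_FILTER_BITS = {
    0: "anon_private",     # anonymous private mappings
    1: "anon_shared",      # anonymous shared mappings
    2: "file_private",     # file-backed private mappings
    3: "file_shared",      # file-backed shared mappings
    4: "elf_headers",      # ELF headers
    5: "hugetlb_private",  # private huge pages
    6: "hugetlb_shared",   # shared huge pages
}


def parse_coredump_filter(text: str) -> Set[str]:
    """Decode the hexadecimal contents of a coredump_filter file."""
    value = int(text.strip(), 16)
    return {name for bit, name in COREDUMP_FILTER_BITS.items() if value & (1 << bit)}


def read_coredump_filter(pid: int) -> Set[str]:
    """Read and decode /proc/<pid>/coredump_filter (Linux only)."""
    with open(f"/proc/{pid}/coredump_filter") as f:
        return parse_coredump_filter(f.read())
```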
Currently, GDB has its own coredump mechanism which, despite its
limitations and bugs, has been around for quite some time. However, and
maybe you don't know this, the Linux kernel has its own algorithm for
generating the corefile of a process. And unfortunately, GDB and Linux
were not really following the same standards here...

So, in the end, the task was about synchronizing GDB and Linux. To do
that, I first had to decipher the contents of the `/proc/<PID>/smaps`
file.

The `/proc/<PID>/smaps` file
----------------------------

This special file, generated by the Linux kernel when you read it,
contains detailed information about each memory mapping of a certain
process. Some of the fields in this file are documented in the `proc(5)`
manpage, but others are missing there (asking for a patch!). Here is an
explanation of everything I needed:

- The first line of each memory mapping has the following format:
  `address perms offset dev inode pathname`. The fields here are:

  a) *address* is the address range, in the process's address space,
  that the mapping occupies. This part was already handled by GDB,
  so I did not have to worry about it.

  b) *perms* is the set of permissions (**r**ead, **w**rite,
  e**x**ecute, **s**hared, **p**rivate [copy-on-write, COW]) applied to
  the memory mapping. GDB was already dealing with the `rwx`
  permissions, but I needed to include the `p` flag as well. I also
  made GDB ignore the mappings that did not have the `r` flag active,
  because it does not make sense to dump something that you cannot
  read.

  c) *offset* is the offset into the file, if the mapping is
  file-backed (see below). GDB already handled this correctly.

  d) *dev* is the device (major:minor) related to the file, if there
  is one. GDB already handled this correctly, though I was using this
  field for more things (keep reading).

  e) *inode* is the inode on the device above. A value of zero means
  that no inode is associated with the memory mapping. Nothing to do
  here.

  f) *pathname* is the file associated with this mapping, if there is
  one. This is one of the most important fields that I had to use, and
  one of the most complicated to understand completely. GDB now uses
  it to heuristically identify whether the mapping is anonymous or
  not.

- GDB is now also interested in the `Anonymous:` and `AnonHugePages:`
  fields of the `smaps` file. These fields report how much anonymous
  memory the mapping holds; if GDB finds that either value is greater
  than zero, it considers the mapping anonymous.

- The last, but perhaps most important, field is `VmFlags:`. It
  contains a series of two-letter flags that provide very useful
  information about the mapping. The flags relevant here are:

  a) `sh`: the mapping is shared (`VM_SHARED`)

  b) `dd`: this mapping should not be dumped in a corefile
  (`VM_DONTDUMP`)

  c) `ht`: this is a HugeTLB mapping (`VM_HUGETLB`)

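To make the fields above concrete, here is a minimal Python sketch of an `smaps` parser. It is not GDB's code (GDB's implementation is in C), and the `SmapsEntry` class and `parse_smaps` function are hypothetical names for this example. The sketch makes three assumptions. First, header lines follow the `address perms offset dev inode pathname` layout described above. Second, every `Key: <number> ...` line can be stored as an integer, which makes `Anonymous` and `AnonHugePages` available in kB. Third, `VmFlags` is stored as a set, with `None` meaning the kernel printed no such line.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

# "address perms offset dev inode pathname"; the pathname is optional and
# may contain spaces (e.g. "/usr/lib/libfoo.so (deleted)").
_HEADER_RE = re.compile(
    r"^(?P<start>[0-9a-f]+)-(?P<end>[0-9a-f]+)\s+"
    r"(?P<perms>[-r][-w][-x][sp])\s+"
    r"(?P<offset>[0-9a-f]+)\s+"
    r"(?P<dev>[0-9a-f]+:[0-9a-f]+)\s+"
    r"(?P<inode>\d+)"
    r"(?:\s+(?P<pathname>.*))?$"
)


@dataclass
class SmapsEntry:
    start: int
    end: int
    perms: str
    offset: int
    dev: str
    inode: int
    pathname: str
    values: Dict[str, int] = field(default_factory=dict)  # mostly sizes in kB
    vmflags: Optional[Set[str]] = None                    # None: no VmFlags line


def parse_smaps(text: str) -> List[SmapsEntry]:
    entries: List[SmapsEntry] = []
    for line in text.splitlines():
        m = _HEADER_RE.match(line)
        if m:
            # The first line of a new mapping.
            entries.append(SmapsEntry(
                start=int(m["start"], 16), end=int(m["end"], 16),
                perms=m["perms"], offset=int(m["offset"], 16),
                dev=m["dev"], inode=int(m["inode"]),
                pathname=m["pathname"] or "",
            ))
        elif not entries:
            continue  # ignore anything before the first header line
        elif line.startswith("VmFlags:"):
            entries[-1].vmflags = set(line.split()[1:])
        else:
            # "Key:   <number> kB" lines; non-numeric values are skipped.
            key, sep, rest = line.partition(":")
            words = rest.split()
            if sep and words and words[0].isdigit():
                entries[-1].values[key] = int(words[0])
    return entries
```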
With that in hand, the next task was to determine whether a memory
mapping is anonymous or file-backed, private or shared.

Types of memory mappings
------------------------

There are four types of memory mappings:

1. Anonymous private mapping
2. Anonymous shared mapping
3. File-backed private mapping
4. File-backed shared mapping

It should be possible to uniquely identify each mapping based on the
information provided by the `smaps` file; however, you will see that
this is not always the case. Below, I will explain how to determine each
of the four characteristics that define a mapping.

### `Anonymous`

A mapping is anonymous if one of these conditions applies:

1. The `pathname` associated with it is either `/dev/zero (deleted)`,
   `/SYSV%08x (deleted)`, or `<filename> (deleted)` (see below).
2. There is content in the `Anonymous:` or in the `AnonHugePages:`
   fields of the mapping in the `smaps` file.

A special explanation is needed for the `<filename> (deleted)` case. It
does not always identify an anonymous mapping; in fact, it is possible
to have the `(deleted)` part for file-backed mappings as well (say, when
you are running a program that uses shared libraries, and those shared
libraries have been removed because of an update). However, we are
trying to mimic the behavior of the Linux kernel here, which considers a
file-backed mapping anonymous when the file has no hard links left (and
therefore is truly deleted).

Although it may be possible for userspace to do a more extensive check
(by `stat`ing the file, for example), the Linux kernel could certainly
expose more information about this.

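The two conditions above can be written as a small predicate. This is a Python sketch under stated assumptions, not GDB's code. `is_anonymous` is a hypothetical name, and it takes the `Anonymous:` and `AnonHugePages:` values (in kB) as arguments. Two choices go beyond the list above and are marked as assumptions in the comments. A mapping with no pathname at all is treated as anonymous. `/dev/zero` and `/SYSV%08x` are accepted with or without the `(deleted)` suffix.

```python
import re

# Condition 1, specific cases; the optional suffix is an assumption.
_DEV_ZERO_RE = re.compile(r"^/dev/zero( \(deleted\))?$")
_SYSV_SHM_RE = re.compile(r"^/SYSV[0-9a-fA-F]{8}( \(deleted\))?$")
_DELETED_SUFFIX = " (deleted)"


def is_anonymous(pathname: str, anonymous_kb: int, anon_huge_pages_kb: int) -> bool:
    """Heuristically decide whether an smaps mapping is anonymous."""
    # Condition 2: the mapping holds anonymous pages.
    if anonymous_kb > 0 or anon_huge_pages_kb > 0:
        return True
    # Assumption: no pathname at all means there is no backing file.
    if not pathname:
        return True
    # Condition 1: /dev/zero, SysV shared memory, or "<filename> (deleted)".
    # The last case is only a heuristic: the file may still have other links.
    return bool(
        _DEV_ZERO_RE.match(pathname)
        or _SYSV_SHM_RE.match(pathname)
        or pathname.endswith(_DELETED_SUFFIX)
    )
```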
### `File-backed`

A mapping is file-backed if:

1. The `pathname` associated with it contains a `<filename>`, without
   the `(deleted)` part.

As explained above, a mapping whose `pathname` contains the `(deleted)`
string could still be file-backed, but we choose to treat it as
anonymous.

It is also worth mentioning that a mapping can be simultaneously
anonymous and file-backed: this happens when the mapping contains a
valid `pathname` (without the `(deleted)` part), but **also** contains
`Anonymous:` or `AnonHugePages:` contents.

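A matching Python sketch for the file-backed rule and for the mixed case described above. The function names are hypothetical, and the `Anonymous:`/`AnonHugePages:` values are passed in kB. Pseudo-paths such as `[heap]` or `[stack]` are not special-cased, so this sketch would count them as file names.

```python
_DELETED_SUFFIX = " (deleted)"


def is_file_backed(pathname: str) -> bool:
    """True if the mapping names a file without the "(deleted)" suffix."""
    return bool(pathname) and not pathname.endswith(_DELETED_SUFFIX)


def is_anonymous_and_file_backed(pathname: str, anonymous_kb: int,
                                 anon_huge_pages_kb: int) -> bool:
    """The mixed case: a valid file name, but anonymous pages as well.

    A typical example is a private file mapping whose pages were written
    to and therefore copied (copy-on-write).
    """
    has_anon_pages = anonymous_kb > 0 or anon_huge_pages_kb > 0
    return is_file_backed(pathname) and has_anon_pages
```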
### `Private`

A mapping is considered to be private (i.e., not shared) if:

1. In the absence of the `VmFlags` field (in the `smaps` file), its
   permission field has the flag `p`.
2. If the `VmFlags` field is present, then the mapping is private if
   we do not find the `sh` flag there.

### `Shared`

A mapping is shared (i.e., not private) if:

1. In the absence of `VmFlags` in the `smaps` file, the permission
   field of the mapping does not have the `p` flag. Not having this
   flag actually means `VM_MAYSHARE` and not necessarily `VM_SHARED`
   (which is what we want), but it is the best approximation we have.
2. If the `VmFlags` field is present, then the mapping is shared if
   we find the `sh` flag there.

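The private and shared rules translate into a pair of predicates. This Python sketch assumes the `perms` string and the set of `VmFlags` have already been extracted from `smaps`, with `None` standing for a kernel that prints no `VmFlags:` line. The function names are hypothetical. The `VmFlags` check comes first because `sh` reflects `VM_SHARED` directly, while the `perms` letter only reflects `VM_MAYSHARE`. For instance, a read-only `MAP_SHARED` mapping of a file opened without write access shows `s` in `perms` but has no `sh` flag, because the kernel clears `VM_SHARED` in that case.

```python
from typing import AbstractSet, Optional


def is_shared(perms: str, vmflags: Optional[AbstractSet[str]]) -> bool:
    """Decide whether a mapping is shared, preferring VmFlags when present."""
    if vmflags is not None:
        # Rule 2: "sh" is VM_SHARED, the value we actually want.
        return "sh" in vmflags
    # Rule 1: the fourth perms character is 's' or 'p'. Anything but 'p'
    # means VM_MAYSHARE, which only approximates VM_SHARED.
    return perms[3] != "p"


def is_private(perms: str, vmflags: Optional[AbstractSet[str]]) -> bool:
    """Private is simply "not shared" under the same rules."""
    return not is_shared(perms, vmflags)
```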
The patch
---------

With all that in mind, I hacked GDB to improve the coredump mechanism
for GNU/Linux operating systems. The main function that decides which
memory mappings will or will not be dumped on GNU/Linux is
[linux_find_memory_regions_full](http://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/linux-tdep.c;h=4af1d01900256164a478a0159b0fcabe86d5549f;hb=HEAD#l1108);
the Linux kernel obviously uses its own function,
[vma_dump_size](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/fs/binfmt_elf.c#n1229),
to do the same thing.

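Putting the pieces together, the per-mapping decision looks roughly like the Python sketch below. It is a simplification written around the rules in this post, not a transcription of `linux_find_memory_regions_full` or `vma_dump_size`. It takes already-classified booleans, uses the `coredump_filter` bit numbers from the `core(5)` manual page, and ignores the ELF-headers bit (bit 4). `should_dump` is a hypothetical name. In the mixed anonymous-and-file-backed case, the sketch dumps the mapping if either relevant bit is set, which is an assumption.

```python
from typing import AbstractSet

# coredump_filter bit numbers as documented in core(5).
ANON_PRIVATE, ANON_SHARED = 0, 1
FILE_PRIVATE, FILE_SHARED = 2, 3
HUGETLB_PRIVATE, HUGETLB_SHARED = 5, 6


def _bit_set(filter_value: int, bit: int) -> bool:
    return bool(filter_value & (1 << bit))


def should_dump(filter_value: int, *, shared: bool, anonymous: bool,
                file_backed: bool, vmflags: AbstractSet[str] = frozenset()) -> bool:
    """Decide whether one mapping goes into the core file."""
    if "dd" in vmflags:
        # VM_DONTDUMP: never dumped, whatever the filter says.
        return False
    if "ht" in vmflags:
        # HugeTLB mappings are governed by their own pair of bits.
        return _bit_set(filter_value, HUGETLB_SHARED if shared else HUGETLB_PRIVATE)
    anon_bit, file_bit = ((ANON_SHARED, FILE_SHARED) if shared
                          else (ANON_PRIVATE, FILE_PRIVATE))
    if anonymous and file_backed:
        # Mixed case (assumption): dump if either kind of content is requested.
        return _bit_set(filter_value, anon_bit) or _bit_set(filter_value, file_bit)
    if anonymous:
        return _bit_set(filter_value, anon_bit)
    return _bit_set(filter_value, file_bit)
```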
Linux has one advantage: it is a kernel, and therefore has much more
knowledge about processes' internals than a userspace program. For
example, inside Linux it is trivial to check if a file marked as
"`(deleted)`" in the output of the `smaps` file has no hard links
associated with it (and therefore is truly deleted); the same operation
in userspace, however, would require root access to inspect the contents
of the `/proc/<PID>/map_files/` directory.

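From userspace, the hard-link check described above could look like the following Python sketch. It assumes that `map_files` entries are named `<start>-<end>` in lowercase hexadecimal without zero padding. It also assumes that `stat`ing an entry follows it to the mapped file even when that file has been unlinked. As noted above, this normally requires root. The `mapping_file_is_deleted` name is hypothetical. The `proc_root` parameter is likewise hypothetical and exists only so the function can be pointed at a test directory.

```python
import os
from typing import Optional


def mapping_file_is_deleted(pid: int, start: int, end: int,
                            proc_root: str = "/proc") -> Optional[bool]:
    """Return True if the file behind a mapping has no hard links left.

    Returns None when the answer cannot be obtained (missing entry,
    anonymous mapping, or insufficient privileges).
    """
    path = os.path.join(proc_root, str(pid), "map_files", f"{start:x}-{end:x}")
    try:
        st = os.stat(path)  # follows the link to the mapped file itself
    except OSError:
        return None
    return st.st_nlink == 0
```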
The case described above, if you remember, affects the ability to tell
whether a mapping is anonymous or not. I am talking to the Linux kernel
guys to see if it is possible to export this information directly via
the `smaps` file, instead of relying on the current heuristic.

While doing this work, we found some strange behaviors in the Linux
kernel. Oleg is working on them, along with other Linux hackers. On our
side, there is still room for improvement in this code. The first thing
I can think of is improving the heuristics for finding anonymous
mappings. Another relatively easy improvement would be to let the user
specify a value for `coredump_filter` on the command line, without
editing the `/proc` file. And, of course, this code should always be
kept in sync with its counterpart in the Linux kernel.

Upstream discussions and commit
-------------------------------

If you are interested, you can see the discussions that happened
upstream by going [to this
link](http://sourceware.org/ml/gdb-patches/2015-03/msg00816.html). This
is the fourth (and final) submission of the patch; you should be able to
find the other submissions [in the
archive](http://sourceware.org/ml/gdb-patches/2015-03/authors.html).

The final commit can be found [in the official
repository](http://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=df8411da087dc05481926f4c4a82deabc5bc3859).