An Introduction to Shmem/IPC in Gecko

We use shared memory (shmem) pretty extensively in the graphics stack in Gecko. Unfortunately, there isn’t much documentation on how the shmem mechanisms in Gecko work or how they are managed, a gap I will attempt to address in this post.

Firstly, it’s important to understand how IPC in Gecko works. Gecko uses a language called IPDL to define IPC protocols. This is effectively a description language which formally defines the format of the messages passed between IPC actors. The IPDL code is compiled into C++ by our IPDL compiler, and the generated classes are what Gecko’s C++ code uses to do IPC. IPDL class names start with a P to indicate that they are IPDL protocol definitions.

IPDL has a built-in shmem type, simply called mozilla::ipc::Shmem. This holds a weak reference to a SharedMemory object, and code in Gecko operates on this. SharedMemory is the underlying platform-specific implementation of shared memory: it makes the platform-specific API calls to allocate and deallocate shared memory regions, and to obtain handles to them for use in the different processes. Of particular interest is that on OS X we use the Mach virtual memory system, which uses a Mach port as the handle for the allocated memory regions.
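
For illustration, a protocol that ships a Shmem might look roughly like this in IPDL (PFoo and its Bar message are made up for this post, not real protocols in the tree):

protocol PFoo
{
child:
    // A hypothetical asynchronous message carrying a shared memory region.
    async Bar(Shmem data);
};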

mozilla::ipc::Shmem objects are fully managed by IPDL, and there are two different types: normal Shmem objects, and unsafe Shmem objects. Normal Shmem objects are mostly intended to be used by IPC actors to send large data chunks between themselves as this is more efficient than saturating the IPC channel. They have strict ownership policies which are enforced by IPDL; when the Shmem object is sent across IPC, the sender relinquishes ownership and IPDL restricts the sender’s access rights so that it can neither read nor write to the memory, whilst the receiver gains these rights. These Shmem objects are created/destroyed in C++ by calling PFoo::AllocShmem() and PFoo::DeallocShmem(), where PFoo is the Foo IPDL interface being used. One major caveat of these “safe” shmem regions is that they are not thread safe, so be careful when using them on multiple threads in the same process!
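
As a rough sketch of what this looks like in C++ (FooParent, SendBar() and ShipBuffer() are hypothetical, continuing the PFoo example above, and the exact AllocShmem() signature has varied between Gecko revisions; some take an extra shared-memory-type argument):

#include <string.h>                // for memcpy
#include "mozilla/ipc/Shmem.h"

using mozilla::ipc::Shmem;

bool FooParent::ShipBuffer(const uint8_t* aData, size_t aLen)
{
  Shmem shmem;
  // Ask IPDL to allocate a shared memory region tracked by this actor.
  if (!AllocShmem(aLen, &shmem)) {
    return false;
  }
  // We own the region until it is sent, so writing to it is legal here.
  memcpy(shmem.get<uint8_t>(), aData, aLen);
  // Sending the Shmem transfers ownership and access rights to the
  // receiver; we must not touch the memory after this point.
  return SendBar(shmem);
}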

Unsafe Shmem objects are basically a free-for-all in terms of access rights. Both sender and receiver can always read/write to the allocated memory, so care must be taken to avoid race conditions between the processes accessing the shmem regions. In graphics, we use these unsafe shmem regions extensively, but use locking vigorously to ensure correct access patterns. Unsafe Shmem objects are created by calling PFoo::AllocUnsafeShmem(), but are still destroyed in the same manner as normal Shmem objects by simply calling PFoo::DeallocShmem().
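
The unsafe variant looks much the same in code; the difference is purely in the access rights afterwards. A sketch, again with hypothetical names, and with the synchronisation scheme (whatever the two processes have agreed on) omitted:

bool FooParent::ShipScratchBuffer(size_t aLen)
{
  Shmem shmem;
  if (!AllocUnsafeShmem(aLen, &shmem)) {
    return false;
  }
  if (!SendBar(shmem)) {
    return false;
  }
  // Unlike a normal Shmem, the sender keeps read/write access after
  // sending, so any further access must be coordinated with the receiver.
  shmem.get<uint8_t>()[0] = 0;
  return true;
}

// Later, once both processes are finished with the region, it is
// destroyed exactly like a normal Shmem:
//   DeallocShmem(shmem);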

With the work currently ongoing to move our compositor to a separate GPU process, there are some limitations with our current shmem situation. Notably, a SharedMemory object is effectively owned by an IPDL channel, and when the channel goes away, the SharedMemory object backing the Shmem object is deallocated. This poses a problem as we use shmem regions to back our textures, and when/if the GPU process dies, it’d be great to keep the existing textures and simply recreate the process and IPC channel, then continue on like normal. David Anderson is currently exploring a solution to this problem, which will likely be to hold a strong reference to the SharedMemory region in the Shmem object, thus ensuring that the SharedMemory object doesn’t get destroyed underneath us so long as we’re using it in Gecko.

Pushing to git from Mozilla Toronto

Today, Ehsan and I set up a highly experimental git server in Mozilla Toronto, and it seems to be working relatively well (for now). If you want access, give me a ping and I’ll sort it out for you (should be accessible to anyone on a Mozilla corporate network, I think). We’re still ironing out the kinks though, so please only use it if you’re fairly well versed with git.

The server (currently accessible via “teenux.local”) hosts repositories mirroring mozilla-central and mozilla-inbound, plus a push-only repository for try:

git@teenux.local:mozilla-central.git
git@teenux.local:mozilla-inbound.git
git@teenux.local:try.git

To use it, you need to add a new section to your .ssh/config like this:

Host teenux.local
User git
ForwardAgent yes
IdentityFile path/to/your/hg/private_key

You may also need to register the private key you use for hg.mozilla.org with ssh-agent by doing:

ssh-add path/to/your/hg/private_key

That should be it. Now you can go ahead and clone git://github.com/mozilla/mozilla-central.git and set up remotes for the repositories hosted on teenux to push to, as shown below. You can theoretically clone from teenux as well, but given that the server isn’t anywhere near as reliable as github, I recommend you stick to github for fetching changes and only use teenux for pushing.
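
For example, to add the try repository from the list above as a remote of your GitHub clone:

git remote add try git@teenux.local:try.git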

There is a minor caveat: you must push to the master branch for this to work. So a typical command would be:

git push -f try my_local_branch:master

Enjoy!

Using git to push to Mozilla’s hg repositories

Last night I spent a huge amount of time working with Nicolas Pierron on setting up a two-way git-hg bridge to allow for those of us using git to push straight to try/central/inbound without manually importing patches into a local mercurial clone.

The basic design of the bridge is fairly simple: you have a local hg clone of mozilla-central, which has remote paths set up for try, central and inbound. It is set up as an hg-git hybrid and so the entire history is also visible via git at hg-repo/.hg/git. This is a fairly standard set up as far as hg-git goes.

Then on top of that, there’s a special git repository for each remote path in hg (try, central and inbound) inside .hg/repos. These are all set up with special git hooks such that when you push to the master branch of one of these repositories, they will automatically invoke hg-git, import the commits into hg and then invoke hg to push to the true remote repository on hg.mozilla.org.

Simple, right? Well, the good news is that for the most part, people shouldn’t need to actually set up this system. There is infrastructure in place to make it just look like a multi-user git repository that people can authenticate against and push to. So ultimately we can set this up on, say, git.mozilla.org and to push to try we just push to remote ssh://git.mozilla.org/try.git, or something. Authentication is handled by the system just by using ssh’s ForwardAgent option, so in theory it should be as secure as hg.mozilla.org (but don’t quote me on that!).

Now onto setting it up; first you have to clone mozilla-central from hg:

hg clone ssh://hg.mozilla.org/mozilla-central

Then edit the .hg/hgrc to contain the following:

[paths]
mozilla-central = ssh://hg.mozilla.org/mozilla-central
mozilla-inbound = ssh://hg.mozilla.org/integration/mozilla-inbound
try-pushonly = ssh://hg.mozilla.org/try
[extensions]
hgext.bookmarks =
hggit =

The -pushonly suffix on the try path tells the bridge not to bother pulling from try when synchronising the repositories. The other two will be kept in sync.

The next step is to go ahead and use Ehsan’s git-mapfile to short-cut the repository creation process. By default, the bridge will use hg-git to create the embedded git repository, and doing this requires that hg-git process every single commit in the entire repository, which takes days. The git-mapfile is the map that hg-git uses to determine which hg commit IDs correspond to which git commit IDs, and using Ehsan’s git-mapfile along with a clone of the canonical mozilla-central git repository at git://github.com/mozilla/mozilla-central.git will allow us to create these local repositories in a matter of minutes instead of days.

git clone https://github.com/ehsan/mozilla-history-tools
cp mozilla-history-tools/updates/git-mapfile /path/to/your/hg/clone/.hg/git-mapfile
cd /path/to/your/hg/clone/.hg
git clone --bare git://github.com/mozilla/mozilla-central.git git

This lays the groundwork, but there is still a little more to do. Unfortunately, this git repository contains a huge amount of commit history from the CVS era that isn’t present in the hg repositories, so if you try and push using the bridge, hg-git will see these commits that aren’t in the hg repository and try to import all these CVS commits into hg. To work around this, we can hack the git-mapfile. The basic idea here is to grab a list of all the git commit SHA1s that correspond to CVS commits, then map those in the git-mapfile to dummy hg commits (such as “0000000000000000000000000000000000000000”). Unfortunately, hg-git requires that all the mappings are unique, so we need to generate a unique dummy commit ID for each and every CVS commit in git.
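
Generating the dummy entries is mechanical. As a rough sketch, assuming you have the CVS-era git SHA1s in a file, one per line, and that your mapfile uses hg-git’s “<git sha> <hg sha>” line format (do check yours before relying on this):

awk '{ printf "%s %040d\n", $1, NR }' cvs-git-shas.txt > cvs-dummy-mapfile

This numbers the dummy hg IDs sequentially, padded to 40 digits, so each mapping stays unique.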

If you’re using Ehsan’s repository, go ahead and just grab my git-mapfile covering just the CVS commits from http://people.mozilla.org/~gwright/git-cvs-mapfile-non-unique and prepend it to your .hg/git-mapfile.

Now comes the fun part; setting up the bridge itself.

git clone git://github.com/nbp/spidermonkey-dev-tools.git
mkdir /path/to/your/hg/clone/.hg/bridge/
cp spidermonkey-dev-tools/git-hg-bridge/*.sh /path/to/your/hg/clone/.hg/bridge/

Then, you need to add the following to your .hg/git/config’s origin remote to ensure that the branches are set up correctly for inbound and central:

fetch = +refs/heads/master:refs/heads/mozilla-central/master
fetch = +refs/heads/inbound:refs/heads/mozilla-inbound/master

This is because pull.sh expects to find the mozilla-central history at a branch called mozilla-central/master, and the inbound history at mozilla-inbound/master.

Now to create the special push-only repositories. First, pull.sh needs to be modified to allow for the short circuiting: temporarily remove the “return 0” call on line 112 after “No update needed”, then:

cd /path/to/your/hg/clone/.hg/bridge/
./pull.sh ../..
(Add the return 0 call back to pull.sh now)

This will create three repositories in /path/to/your/hg/clone/.hg/repos that correspond to try, mozilla-central and mozilla-inbound. Now set these repositories as remotes in your main git working tree, for example:

git remote add try /path/to/your/hg/clone/.hg/repos/try

You can then push to try just by pushing to the master branch of that remote! The first push will take a while as the push-only repository has no commits in it (this should not be an issue for mozilla-central and mozilla-inbound pushes), but after that pushes should be nice and fast. Here’s an example:

[george@aluminium mozilla-central]$ git push -f try gwright/skia_rebase:master
Counting objects: 3028016, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (549026/549026), done.
Writing objects: 100% (3028016/3028016), 729.80 MiB | 59.21 MiB/s, done.
Total 3028016 (delta 2452303), reused 3015470 (delta 2440473)
remote: fatal: Not a valid commit name 0000000000000000000000000000000000000000
remote: Will Force update!
remote: Get git repository lock for pushing to the bridge.
remote: To ./.hg/git
remote:  * [new branch]      refs/push/master -> try-pushonly/push
remote: Get mercurial repository lock for pushing from the bridge.
remote: Convert changes to mercurial.
remote: abort: bookmark 'try-pushonly/push' does not exist
remote: importing git objects into hg
remote: pushing to ssh://hg.mozilla.org/try
remote: searching for changes
remote: remote: adding changesets
remote: remote: adding manifests
remote: remote: adding file changes
remote: remote: added 4 changesets with 7 changes to 877 files (+1 heads)
remote: remote: Looks like you used try syntax, going ahead with the push.
remote: remote: If you don't get what you expected, check http://trychooser.pub.build.mozilla.org/ for help with building your trychooser request.
remote: remote: Thanks for helping save resources, you're the best!
remote: remote: You can view the progress of your build at the following URL:
remote: remote:   https://tbpl.mozilla.org/?tree=Try&rev=b9783e130dd6
remote: remote: Trying to insert into pushlog.
remote: remote: Please do not interrupt...
remote: remote: Inserted into the pushlog db successfully.
To /home/george/dev/hg/mozilla-central/.hg/repos/try
 * [new branch]      gwright/skia_rebase -> master

And that’s it! Mad props to Nicolas Pierron for the huge amount of work he put in on building this solution. If you’re in the Mozilla Mountain View office, you can ping him to get access to his git push server, but hopefully Ehsan and I will work on getting an accessible server out there for everyone.

Setting up a chroot for Android development

Let’s say you want to build Android. This is not an unreasonable thing to want to do; around here, the most common reason for doing this is to get easy access to debug symbols in system libraries.

However, Android only really supports building on a very specific distribution of Linux, and that is normally the LTS release of Ubuntu (currently 10.04 “Lucid”).

Luckily, it’s relatively easy to set up a chroot in Linux to build in, such that you don’t need to maintain a completely separate installation of Linux if you want to run, say, Ubuntu 11.10 instead.

First you need to install schroot and debootstrap:

sudo apt-get install schroot debootstrap

schroot is a tool to allow you to easily run a command or a login shell within a chroot that you have previously set up. debootstrap is a Debian tool to bootstrap a Debian (and by extension, Ubuntu) release inside a directory which can then be used as a chroot.

Once you have those installed, fire up your favourite text editor and append something similar to the following to your /etc/schroot/schroot.conf file:

[lucid]
description=Ubuntu Lucid
type=directory
location=/var/chroot/lucid
priority=3
users=george
groups=users
root-groups=root

Now, you need to actually create the chroot. To do this, you need to use debootstrap. In this case, I’m going to create a Lucid chroot:

sudo debootstrap --arch amd64 lucid /var/chroot/lucid http://mirrors.rit.edu/ubuntu

The first argument here specifies the CPU architecture you want to install in the chroot (typically either i386 or amd64), the second is the distribution codename used in the repositories, the third is the directory in which to install the chroot, and the last is the mirror from which to download the packages for the installation.

This will take a little while but once it’s done you can simply run the following:

schroot -c lucid

If all goes well, you’ll be greeted with a friendly and informative prompt thus:

(lucid)george@sodium:~$ 

You can then, inside this shell, follow the instructions for building and flashing Android yourself without any trouble.
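
For AOSP, that typically boils down to something like the following from the top of your source tree (the lunch target depends on your device; full-eng here is just an example):

source build/envsetup.sh
lunch full-eng
make -j8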

As far as I can tell, there’s nothing here that should be specific to Debian (or a Debian derivative), so if you’re running a different distribution, you should be fine so long as you can get hold of debootstrap and schroot.

(Update 2012/03/29 – correct the Lucid version number)

Debugging OpenGL on Android without losing your sanity

Recently I’ve had to work more and more on OpenGL code as part of my job, which is hardly surprising given that my new job is to work on graphics. One thing that’s annoyed me since I started, however, is the relative difficulty of debugging OpenGL code compared to normal C/C++ code that just runs on the CPU.

The main reason for this, I’ve found, is that keeping track of which OpenGL commands have been issued, with what parameters, and what state has been set on your GL context is actually a rather hard task. In fact, it’s such a common problem that some bright hackers came up with the holy grail of OpenGL debugging tools – a tool called apitrace.

Put simply, apitrace is just a command tracer that logs all the OpenGL calls you make from your application. Why is that so wonderful? Well, for a start it decouples your development and testing environments. It allows you to record a series of OpenGL commands on a target device or piece of software that’s exhibiting a bug, which you can then replay and debug at your leisure in your native development environment.

Anyone who has had the pleasure of trying to debug OpenGL ES 2.0 code running on an embedded device running something like Android will understand the value here. You can just trace your buggy application, put the trace file on your desktop or laptop, analyse the GL commands you issued and modify them, then fix the bug. Problem solved! No messing around with a huge number of printf() statements or GL debugging states.

Well that’s all well and good, but how do you use this thing? Turns out on Android, that’s not so easy. First off you’ll need to grab my Android branch of apitrace (I’m working on getting these patches upstreamed, so don’t worry), and build it for your device:

export ANDROID_NDK=/path/to/your/android/ndk
cmake \
 -DCMAKE_TOOLCHAIN_FILE=android/android.toolchain.cmake \
 -DANDROID_API_LEVEL=9 -Bbuild -H.
make -C build

When this is done, you’ll find a file called egltrace.so in build/wrappers which you can then put somewhere on your device, such as /data/local.
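
For example, using adb:

adb push build/wrappers/egltrace.so /data/local/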

However, on Android I have yet to find a way to preload a library, using the LD_PRELOAD environment variable or otherwise, so you’ll have to put the following lines of code before you make any gl calls in your application:

setenv("TRACE_FILE", "/path/to/trace/file", false);
dlopen("/data/local/egltrace.so", RTLD_LAZY);

This will ensure that the symbols can be found, but you also need to look up the address of each gl function you’re hoping to use before you can start to get anywhere. In the case of glGetError(), this can be:

typedef GLenum (*glGetErrorFn)(void);
glGetErrorFn fGetError =
    (glGetErrorFn)dlsym(RTLD_DEFAULT, "glGetError");

Unfortunately this will need to be done for all the symbols you’re planning to use, but on the up side you get total control over when your dynamic libraries are loaded and used, which means you can optimise your startup time accordingly.
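
To cut down that boilerplate, you can wrap the lookup in a small helper; a sketch (this is my own illustration, not part of apitrace):

#include <dlfcn.h>
#include <GLES2/gl2.h>

// Resolve a GL entry point by name and cast it to the expected
// function pointer type.
template <typename FnPtr>
static FnPtr LookupGLSymbol(const char* aName)
{
  return reinterpret_cast<FnPtr>(dlsym(RTLD_DEFAULT, aName));
}

typedef GLenum (*glGetErrorFn)(void);
typedef void (*glClearFn)(GLbitfield mask);

static glGetErrorFn fGetError = nullptr;
static glClearFn fClear = nullptr;

// Call this after dlopen()ing egltrace.so so that the lookups can
// resolve against the tracing library, not just the system GL.
static void InitGLFunctions()
{
  fGetError = LookupGLSymbol<glGetErrorFn>("glGetError");
  fClear = LookupGLSymbol<glClearFn>("glClear");
}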

Once that’s all set up you can go ahead and run your application and grab the tracing output. This is where the fun part starts. apitrace has a GUI written in Qt by Zack Rusin that can be used to do all sorts of crazy stuff, such as:

  • View all GL commands issued, frame by frame
  • Modify parameters passed into GL commands on the fly
  • View current texture data*
  • View bound shader programs
  • Inspect GL context state at any point
  • Replay traces
  • View current uniforms

(Screenshot: qapitrace in action)

You get the idea. Whilst not all of the features seem to be working at the moment with EGL/GLESv2 traces, I hope to devote some spare cycles to fixing those. The most important one to me right now is that qapitrace is unable to view current texture data from traces we obtain from Firefox/Android. It seems unlikely that it’s an issue with our tracing support as replaying the traces using eglretrace works fine, but without investigating further I can’t say whether this is a limitation of qapitrace with EGL/GLESv2 or an issue with our tracing in Firefox. I do get the impression that upstream are targeting desktop GL rather than embedded GL, but that just gives me an opportunity to learn a bit more GL and help out!

Getting EGL playbacks to work on Linux can be a bit trying, however. First off you will need to get the Mesa EGL and GLESv2 development libraries, as well as the usual requirements for building apitrace – Qt version 4.7 and so on – and you can build as per the installation instructions. Before running qapitrace or eglretrace, though, you will need to set the following environment variable, otherwise (on my system at least) DRI fails to authenticate with the kernel:

export EGL_SOFTWARE=true

Of course everything that’s been said here also works great for debugging desktop GL applications, but there’s significantly less pain involved as you shouldn’t need to resort to dlopen/dlsym magic.