When sync IPC under the top-level PCompositorManager protocol does not
reply within a certain time threshold we purposefully kill the GPU
process. While this allows the user to recover from a stuck GPU
process, we have little visibility about the underlying cause.
This patch makes it so that we generate a paired minidump for the GPU
and parent processes prior to killing the GPU process in
GPUProcessHost::KillHard(). The implementation roughly follows the
equivalent for content processes in ContentParent::KillHard().
As the GPU process can be purposefully killed during normal operation,
and because generating minidumps can be expensive, we are careful to
only do so when the new argument aGenerateMinidump is true. We
additionally remove the aReason argument as it is unused (and
currently innacurate in some places).
As these minidumps may not automatically submitted we limit the
minidumps generation to twice per session in order to avoid
accumulating a large number of unsubmitted minidumps on disk.
Differential Revision: https://phabricator.services.mozilla.com/D202166
When sync IPC under the top-level PCompositorManager protocol does not
reply within a certain time threshold we purposefully kill the GPU
process. While this allows the user to recover from a stuck GPU
process, we have little visibility about the underlying cause.
This patch makes it so that we generate a paired minidump for the GPU
and parent processes prior to killing the GPU process in
GPUProcessHost::KillHard(). The implementation roughly follows the
equivalent for content processes in ContentParent::KillHard().
As the GPU process can be purposefully killed during normal operation,
and because generating minidumps can be expensive, we are careful to
only do so when the new argument aGenerateMinidump is true. We
additionally remove the aReason argument as it is unused (and
currently innacurate in some places).
As these minidumps may not automatically submitted we limit the
minidumps generation to twice per session in order to avoid
accumulating a large number of unsubmitted minidumps on disk.
Differential Revision: https://phabricator.services.mozilla.com/D202166
When sync IPC under the top-level PCompositorManager protocol does not
reply within a certain time threshold we purposefully kill the GPU
process. While this allows the user to recover from a stuck GPU
process, we have little visibility about the underlying cause.
This patch makes it so that we generate a paired minidump for the GPU
and parent processes prior to killing the GPU process in
GPUProcessHost::KillHard(). The implementation roughly follows the
equivalent for content processes in ContentParent::KillHard().
As the GPU process can be purposefully killed during normal operation,
and because generating minidumps can be expensive, we are careful to
only do so when the new argument aGenerateMinidump is true. We
additionally remove the aReason argument as it is unused (and
currently innacurate in some places).
As these minidumps may not automatically submitted we limit the
minidumps generation to twice per session in order to avoid
accumulating a large number of unsubmitted minidumps on disk.
Differential Revision: https://phabricator.services.mozilla.com/D202166
By using IDXGIOutput6, we could check if system enables HDR on Windows.
DeviceManagerDx::SystemHDREnabled() is expected to be called in GPU process.
Differential Revision: https://phabricator.services.mozilla.com/D202186
By using IDXGIOutput6, we could check if system enables HDR on Windows.
DeviceManagerDx::SystemHDREnabled() is expected to be called in GPU process.
Differential Revision: https://phabricator.services.mozilla.com/D202186
This patch makes it so that if we encounter an issue launching the GPU
process, we attempt to fallback first over disabling the GPU process.
This is because the different acceleration modes we might start with
change how we initialize. It may be possible to have a pure Software
WebRender GPU process with a given users configuration.
Differential Revision: https://phabricator.services.mozilla.com/D197827
This patch reworks CompositorManagerParent to be able to:
1) Get the owning IPDL actor for a given namespace
2) Rework shutdown to be more consistent
3) Block until either the relevant shared surface is added or the actor
is destroyed.
Differential Revision: https://phabricator.services.mozilla.com/D195964
We already avoid launching the GPU process during shutdown, but shutdown
could happen after we initiated the launch and before it completes. Top
level IPDL protocols assert that we don't create them during shutdown,
so we need to be more diligent about checking for that.
Differential Revision: https://phabricator.services.mozilla.com/D175054
This patch moves the media engine from the RDD process to the new
utility process, and create video bridge between the new utility process
and the GPU process in order to share the texture.
Differential Revision: https://phabricator.services.mozilla.com/D155901
WebRenderErrors errors after initialization will often be memory/driver
related. We can try to recover from this more gracefully by tearing down
the compositors instead of just crashing when we have finished falling
back. If one has a GPU process, it will be killed, and otherwise, it
will simulate a device reset in the parent process. This graceful
recovery can only be used if the process is declared "stable", according
to existing criteria for the GPU process (minimum uptime and frames
composited).
Differential Revision: https://phabricator.services.mozilla.com/D146477
The GPU process is destroyed in `ShutdownPhase::XPCOMShutdown` thus we shall not try to create it if we are in or beyond that phase.
Actually we might want to consider to block the creation even earlier if it does not exist, maybe from `ShutdownPhase::AppShutdown`, but this patch wants to just repair the crashes.
Differential Revision: https://phabricator.services.mozilla.com/D143349
If the android system kills the GPU process to free memory while the
app is in the background, then we want to avoid immediately restarting
the GPU process.
To achieve this, we make GPUProcessManager keep track of whether it is
in the foreground or background. If HandleProcessLost() gets called
while in the background then we destroy the existing compositor
sessions as before, but return early instead of immediately
relaunching the process. If the process has not been launched when the
app later gets foregrounded then we do so then.
The final part of HandleProcessLost(), which reinitializes the content
bridges and emits the "compositor-reinitialized" signal, has been
moved to a new function ReinitializeRendering(). If the GPU process
has been disabled, this gets called as-before at the end of
HandleProcessLost(). When the GPU process is enabled, however, we now
call it from OnProcessLaunchComplete(), so that it gets called
regardless of whether the process is launched immediately or after a
delay.
While we're here, rename the functions RebuildRemoteSessions() and
RebuildInProcessSessions() to DestroyRemoteCompositorSessions() and
DestroyInProcessCompositorSessions(), to better reflect what they
actually do: the "rebuilding" part occurs later on. Also update the
mega-comment documenting the restart sequence, as it was somewhat
outdated.
In case a caller of EnsureGPUReady() gets called before the foreground
signal arrives (eg in nsBaseWidget::CreateCompositorSession() due to a
refresh tick paint), make EnsureGPUReady() launch the GPU process
itself if the GPU process is enabled but not yet launched. As a
consequence, to avoid launching the GPU process unnecessarily, change
a couple callers of EnsureGPUReady() to simply check whether the
process is enabled instead.
Additionally, guard against a null pointer deref if the compositor has
been destroyed when the widget receives a memory pressure event. This
is now more likely to occur as there may be a gap between the
compositor being destroyed and recreated.
Differential Revision: https://phabricator.services.mozilla.com/D139042
This simplifies the logic around descriptors significantly, which is
especially useful considering how few places use the type. There is a
small change required on Windows to create the NamedPipe directly and
transfer around each end's handle, rather than connecting between
processes after the fact.
A named pipe has to be used, rather than an anonymous pipe, as
bidirectional communication is required.
Differential Revision: https://phabricator.services.mozilla.com/D130381
This simplifies the logic around descriptors significantly, which is
especially useful considering how few places use the type. There is a
small change required on Windows to create the NamedPipe directly and
transfer around each end's handle, rather than connecting between
processes after the fact.
A named pipe has to be used, rather than an anonymous pipe, as
bidirectional communication is required.
Differential Revision: https://phabricator.services.mozilla.com/D130381
Add a function to GPUProcessManager to force the GPU process to crash,
and expose it through gfxInfo. Expose this to geckoview tests via the
test-support webextension.
Add a junit test GpuCrashTest, which triggers a GPU process crash and
ensures the crash reporter was notified.
Additionally, ensure the TestCrashHandler service is stopped in
between tests, as otherwise only the first crash test to run will be
notified of the crash.
Differential Revision: https://phabricator.services.mozilla.com/D132812
GPU process crash reports are handled by calling GenerateCrashReport()
in GPUChild::ActorDestroy() if the reason is AbnormalShutdown. This
ensures we only create crash report if the process actually crashed,
and not when it was deliberately stopped.
However, sometimes actors other than GPUChild are the first to be
destroyed immediately after a crash, for example CompositorBridgeChild
or UiCompositorControllerChild. If such an actor receives an
ActorDestroy message with AbnormalShutdown as the reason, they will
call GPUProcessManager::NotifyRemoteActorDestroyed(), which leads to
GPUProcessHost::Shutdown(), which will close the PGPU channel. This
creates a race condition after a GPU process crash, where sometimes
the channel gets closed gracefully and ActorDestroy will receive a
NormalShutdown reason rather than AbnormalShutdown.
This patch adds a flag to GPUProcessHost::Shutdown() indicating
whether it is being called in response to an unexpected shutdown being
detected by another actor. If set, it sets a flag on the
GPUChild. When GPUChild::ActorDestroy() eventually gets called, it
knows to act in response to a crash if either the reason is
AbnormalShutdown or this flag has been set.
Differential Revision: https://phabricator.services.mozilla.com/D132811
This patch ensures that, following a GPU process crash, we
re-initialize the compositor and resume painting on Android.
nsWindow::GetWindowRenderer() is made to always reinitialize the
window renderer if there is none, like on other platforms. We
therefore no longer need to track whether webrender is being disabled,
as this is no longer a special case.
Previously we started the compositor as initially paused in
nsBaseWidget::CreateCompositorSession only if the widget did not yet
have a surface. Now we must unconditionally (re)start it as initially
paused, as even though the widget in the parent process may have a
surface, we will not have been able to send it to the GPU process
yet. We will send the surface to the compositor once control flow
returns to nsWindow::CreateLayerManager, where we will also now resume
the compositor if required.
Finally, we must ensure that we manually trigger a paint, both in the
parent and content processes. On other platforms this occurs
automatically following a GPU process loss through various refresh
driver events. On Android, however, nothing causes the refresh driver
to paint by itself, and we cannot receive input without first
initializing our APZ controllers, which does not happen until the
compositor receives a display list. We therefore must manually
schedule a paint. We do so from nsWindow::NotifyCompositorSessionLost
for the parent process, and BrowserChild::ReinitRendering for content
processes.
Differential Revision: https://phabricator.services.mozilla.com/D131232
Originally, we would restart the GPU process a fixed number of attempts
based on the layers.gpu-process.max_restarts pref. With this patch, we
now use this pref to control how many "unstable" restarts we allow. A
restart is "stable" if and only if the process uptime exceeds the pref
layers.gpu-process.stable.min-uptime-ts and if the process renders a
total number of frames exceeding the pref
layers.gpu-process.stable.frame-threshold. This allows users to keep the
GPU process for a lot longer if they are encountering infrequent
crashes. Should the user experience the GPU process crashing quickly
and/or without rendering many frames, we will disable it as before after
a few attempts and move into the parent process.
Differential Revision: https://phabricator.services.mozilla.com/D114531
We can disable WebRender because the GPU process crashed, or we
encountered a graceful runtime error in WebRender. This patch adds two
new prefs to control how that fallback works.
gfx.webrender.fallback.software-d3d11 controls if WebRender falls back
to Software WebRender + D3D11 compositing. If true, and the user is
allowed to get Software WebRender, we will fallback to Software
WebRender with the D3D11 compositor first.
gfx.webrender.fallback.software controls if WebRender falls back to
Software WebRender. If true, and the user is allowed to get Software
WebRender, we will fallback to Software WebRender without the D3D11
compositor.
gfx.webrender.fallback.basic controls if WebRender or Software
WebRender falls back to Basic. If true, it falls back to Basic.
Otherwise it continues to use Software WebRender without the D3D11
compositor. Note that this means OpenGL on Android.
This patch also means that gfx.webrender.all=true and MOZ_WEBRENDER=1
no longer disables Software WebRender. It will still prefer (Hardware)
WebRender but we want to allow fallback to Software WebRender for
configurations that forced WebRender on.
Differential Revision: https://phabricator.services.mozilla.com/D103491
We can disable WebRender because the GPU process crashed, or we
encountered a graceful runtime error in WebRender. This patch adds two
new prefs to control how that fallback works.
gfx.webrender.fallback.software-d3d11 controls if WebRender falls back
to Software WebRender + D3D11 compositing. If true, and the user is
allowed to get Software WebRender, we will fallback to Software
WebRender with the D3D11 compositor first.
gfx.webrender.fallback.software controls if WebRender falls back to
Software WebRender. If true, and the user is allowed to get Software
WebRender, we will fallback to Software WebRender without the D3D11
compositor.
gfx.webrender.fallback.basic controls if WebRender or Software
WebRender falls back to Basic. If true, it falls back to Basic.
Otherwise it continues to use Software WebRender without the D3D11
compositor. Note that this means OpenGL on Android.
This patch also means that gfx.webrender.all=true and MOZ_WEBRENDER=1
no longer disables Software WebRender. It will still prefer (Hardware)
WebRender but we want to allow fallback to Software WebRender for
configurations that forced WebRender on.
Differential Revision: https://phabricator.services.mozilla.com/D103491
We can disable WebRender because the GPU process crashed, or we
encountered a graceful runtime error in WebRender. This patch adds two
new prefs to control how that fallback works.
gfx.webrender.fallback.software-d3d11 controls if WebRender falls back
to Software WebRender + D3D11 compositing. If true, and the user is
allowed to get Software WebRender, we will fallback to Software
WebRender with the D3D11 compositor first.
gfx.webrender.fallback.software controls if WebRender falls back to
Software WebRender. If true, and the user is allowed to get Software
WebRender, we will fallback to Software WebRender without the D3D11
compositor.
gfx.webrender.fallback.basic controls if WebRender or Software
WebRender falls back to Basic. If true, it falls back to Basic.
Otherwise it continues to use Software WebRender without the D3D11
compositor. Note that this means OpenGL on Android.
This patch also means that gfx.webrender.all=true and MOZ_WEBRENDER=1
no longer disables Software WebRender. It will still prefer (Hardware)
WebRender but we want to allow fallback to Software WebRender for
configurations that forced WebRender on.
Differential Revision: https://phabricator.services.mozilla.com/D103491
Aside from on Windows, we do not appear to handle device resets properly
without the GPU process. This patch adds in the necessary plumbing to
handle the device reset properly. It also ensures that whenever we check
for a device reset reason, we handle all of the reasons (e.g. not just
the NV video memory purge reset reason) to ensure they are not lost, and
handles them all consistently in the same manner.
It also tracks the number of device resets for thresholding purposes
with an in process compositor. While it will only disable WebRender on
Linux at this time, it will put a note in the critical log if the
threshold was exceeded on all platforms. This may prove useful in
evaluating whether or not we should do the same everywhere.
Differential Revision: https://phabricator.services.mozilla.com/D98705
We don't know why we see initialization failures in the telemetry which
makes it hard to investigate why users aren't getting WebRender and
instead fallback to basic. Let's expose the detailed error message
WebRender already generates and puts in the critical log.
Differential Revision: https://phabricator.services.mozilla.com/D89185
Change GPUProcessManager::OnPreferenceChange() and
RDDProcessManager::OnPreferenceChange() to only queue preference changes when
the child process is in the act of launching. And don't start observing
preference changes until we launch the child process rather that at startup.
Differential Revision: https://phabricator.services.mozilla.com/D62112
--HG--
extra : moz-landing-system : lando
The inclusions were removed with the following very crude script and the
resulting breakage was fixed up by hand. The manual fixups did either
revert the changes done by the script, replace a generic header with a more
specific one or replace a header with a forward declaration.
find . -name "*.idl" | grep -v web-platform | grep -v third_party | while read path; do
interfaces=$(grep "^\(class\|interface\).*:.*" "$path" | cut -d' ' -f2)
if [ -n "$interfaces" ]; then
if [[ "$interfaces" == *$'\n'* ]]; then
regexp="\("
for i in $interfaces; do regexp="$regexp$i\|"; done
regexp="${regexp%%\\\|}\)"
else
regexp="$interfaces"
fi
interface=$(basename "$path")
rg -l "#include.*${interface%%.idl}.h" . | while read path2; do
hits=$(grep -v "#include.*${interface%%.idl}.h" "$path2" | grep -c "$regexp" )
if [ $hits -eq 0 ]; then
echo "Removing ${interface} from ${path2}"
grep -v "#include.*${interface%%.idl}.h" "$path2" > "$path2".tmp
mv -f "$path2".tmp "$path2"
fi
done
fi
done
Differential Revision: https://phabricator.services.mozilla.com/D55443
--HG--
extra : moz-landing-system : lando
On android, android's nsWindow creates LayerManaer only in nsWindow::Create(). When WebRender error happened, gecko just stopped rendering by disabling Webrender.
The nsWindow needs to re-create LayerManager during disabling Webrender. Further, during disabling WebRender, All GeckoSurfaceTextures should not be attached to GLContext. It is for preventing a conflict with AttachToGLContext() call in SurfaceTextureHost::EnsureAttached().
Differential Revision: https://phabricator.services.mozilla.com/D26687
--HG--
extra : moz-landing-system : lando