In bug 1762025 we found ourselves in a situation where we are unable
to ever create an EGLSurface, meaning we were continually attempting
to initialize a compositor, encountering a NEW_SURFACE webrender
error, causing us to tear down the compositor and create a new one,
repeating the cycle indefinitely.
This leaves the user with an unusable browser. We'd be better off
simply crashing. This patch adds a flag to FallbackFromAcceleration(),
which forces a crash if we have already exhausted all of the fallback
options. When calling FallbackFromAcceleration due to a NEW_SURFACE
error we set the flag to true. It may also be worthwhile setting this
flag for more errors in the future.
Differential Revision: https://phabricator.services.mozilla.com/D143486
The GPU process is destroyed in `ShutdownPhase::XPCOMShutdown` thus we shall not try to create it if we are in or beyond that phase.
Actually we might want to consider to block the creation even earlier if it does not exist, maybe from `ShutdownPhase::AppShutdown`, but this patch wants to just repair the crashes.
Differential Revision: https://phabricator.services.mozilla.com/D143349
SimulateDeviceReset() works differently from actual device reset handling. It seems better to make SimulateDeviceReset() more similar to actual device reset handling.
Differential Revision: https://phabricator.services.mozilla.com/D140161
We noticed a cold_view_nav_start regression on Fenix from enabling the
GPU process, and profiles showed time spent synchronously waiting for
the GPU process to launch. This occured because the compositor was
being created in nsWindow::Create, and as it requires the GPU process
to be running it had to block until launch completed. The process is
launched when the gfxPlatform is first initialized, but that was only
occuring immediately prior to creating the compositor, which did not
give it enough time to complete asynchronously.
This patch makes it so that we initialize the gfxPlatform slightly
earlier, and importantly delay creating the compositor until it is
actually required. This gives the process enough time to launch
asynchronously meaning we do not have to block.
We started deliberately creating the compositor early on Android
because of bug 1453501, to avoid a race condition where the compositor
didn't exist when RemoteLayerTreeOwner::Initialize was called, causing
us to use a basic layer manager. However, since bug 1741156 landed we
now create the compositor on-demand, meaning this is no longer a
possibility.
Delaying compositor creation can, however, uncover another race
condition. If the UICompositorControllerChild is opened on the UI
thread before the main thread is able to set its pointer to the
widget, then the java GeckoSession will never be notified that the
compositor has been opened, and composition will never be
resumed. This patch fixes this issue by setting the
UiCompositorControllerChild's widget pointer in its constructor rather
than immediately afterwards.
Differential Revision: https://phabricator.services.mozilla.com/D139842
We noticed a cold_view_nav_start regression on Fenix from enabling the
GPU process, and profiles showed time spent synchronously waiting for
the GPU process to launch. This occured because the compositor was
being created in nsWindow::Create, and as it requires the GPU process
to be running it had to block until launch completed. The process is
launched when the gfxPlatform is first initialized, but that was only
occuring immediately prior to creating the compositor, which did not
give it enough time to complete asynchronously.
This patch makes it so that we initialize the gfxPlatform slightly
earlier, and importantly delay creating the compositor until it is
actually required. This gives the process enough time to launch
asynchronously meaning we do not have to block.
We started deliberately creating the compositor early on Android
because of bug 1453501, to avoid a race condition where the compositor
didn't exist when RemoteLayerTreeOwner::Initialize was called, causing
us to use a basic layer manager. However, since bug 1741156 landed we
now create the compositor on-demand, meaning this is no longer a
possibility.
Delaying compositor creation can, however, uncover another race
condition. If the UICompositorControllerChild is opened on the UI
thread before the main thread is able to set its pointer to the
widget, then the java GeckoSession will never be notified that the
compositor has been opened, and composition will never be
resumed. This patch fixes this issue by setting the
UiCompositorControllerChild's widget pointer in its constructor rather
than immediately afterwards.
Differential Revision: https://phabricator.services.mozilla.com/D139842
If the android system kills the GPU process to free memory while the
app is in the background, then we want to avoid immediately restarting
the GPU process.
To achieve this, we make GPUProcessManager keep track of whether it is
in the foreground or background. If HandleProcessLost() gets called
while in the background then we destroy the existing compositor
sessions as before, but return early instead of immediately
relaunching the process. If the process has not been launched when the
app later gets foregrounded then we do so then.
The final part of HandleProcessLost(), which reinitializes the content
bridges and emits the "compositor-reinitialized" signal, has been
moved to a new function ReinitializeRendering(). If the GPU process
has been disabled, this gets called as-before at the end of
HandleProcessLost(). When the GPU process is enabled, however, we now
call it from OnProcessLaunchComplete(), so that it gets called
regardless of whether the process is launched immediately or after a
delay.
While we're here, rename the functions RebuildRemoteSessions() and
RebuildInProcessSessions() to DestroyRemoteCompositorSessions() and
DestroyInProcessCompositorSessions(), to better reflect what they
actually do: the "rebuilding" part occurs later on. Also update the
mega-comment documenting the restart sequence, as it was somewhat
outdated.
In case a caller of EnsureGPUReady() gets called before the foreground
signal arrives (eg in nsBaseWidget::CreateCompositorSession() due to a
refresh tick paint), make EnsureGPUReady() launch the GPU process
itself if the GPU process is enabled but not yet launched. As a
consequence, to avoid launching the GPU process unnecessarily, change
a couple callers of EnsureGPUReady() to simply check whether the
process is enabled instead.
Additionally, guard against a null pointer deref if the compositor has
been destroyed when the widget receives a memory pressure event. This
is now more likely to occur as there may be a gap between the
compositor being destroyed and recreated.
Differential Revision: https://phabricator.services.mozilla.com/D139042
Following a GPU process restart ZoomConstraints do not currently get
set for the newly recreated APZCTreeManagers, meaning it is no longer
possible to asynchronously zoom pages.
To solve this, we make ZoomConstraintsClient observe a new
"compositor-reinitialized" topic. We send this notification in
GPUProcessManager::HandleProcessLost() to notify
ZoomConstraintsClients for parent process documents, and in
ContentChild::RecvReinitRendering() for documents in their respective
content processes. This must be performed after the compositor has
been reinitialized so that the APZCTreeManagerChild is able to send
the constraints to the APZCTreeManagerParent in the compositor
process.
Differential Revision: https://phabricator.services.mozilla.com/D135207
Add a function to GPUProcessManager to force the GPU process to crash,
and expose it through gfxInfo. Expose this to geckoview tests via the
test-support webextension.
Add a junit test GpuCrashTest, which triggers a GPU process crash and
ensures the crash reporter was notified.
Additionally, ensure the TestCrashHandler service is stopped in
between tests, as otherwise only the first crash test to run will be
notified of the crash.
Differential Revision: https://phabricator.services.mozilla.com/D132812
GPU process crash reports are handled by calling GenerateCrashReport()
in GPUChild::ActorDestroy() if the reason is AbnormalShutdown. This
ensures we only create crash report if the process actually crashed,
and not when it was deliberately stopped.
However, sometimes actors other than GPUChild are the first to be
destroyed immediately after a crash, for example CompositorBridgeChild
or UiCompositorControllerChild. If such an actor receives an
ActorDestroy message with AbnormalShutdown as the reason, they will
call GPUProcessManager::NotifyRemoteActorDestroyed(), which leads to
GPUProcessHost::Shutdown(), which will close the PGPU channel. This
creates a race condition after a GPU process crash, where sometimes
the channel gets closed gracefully and ActorDestroy will receive a
NormalShutdown reason rather than AbnormalShutdown.
This patch adds a flag to GPUProcessHost::Shutdown() indicating
whether it is being called in response to an unexpected shutdown being
detected by another actor. If set, it sets a flag on the
GPUChild. When GPUChild::ActorDestroy() eventually gets called, it
knows to act in response to a crash if either the reason is
AbnormalShutdown or this flag has been set.
Differential Revision: https://phabricator.services.mozilla.com/D132811
This patch ensures that, following a GPU process crash, we
re-initialize the compositor and resume painting on Android.
nsWindow::GetWindowRenderer() is made to always reinitialize the
window renderer if there is none, like on other platforms. We
therefore no longer need to track whether webrender is being disabled,
as this is no longer a special case.
Previously we started the compositor as initially paused in
nsBaseWidget::CreateCompositorSession only if the widget did not yet
have a surface. Now we must unconditionally (re)start it as initially
paused, as even though the widget in the parent process may have a
surface, we will not have been able to send it to the GPU process
yet. We will send the surface to the compositor once control flow
returns to nsWindow::CreateLayerManager, where we will also now resume
the compositor if required.
Finally, we must ensure that we manually trigger a paint, both in the
parent and content processes. On other platforms this occurs
automatically following a GPU process loss through various refresh
driver events. On Android, however, nothing causes the refresh driver
to paint by itself, and we cannot receive input without first
initializing our APZ controllers, which does not happen until the
compositor receives a display list. We therefore must manually
schedule a paint. We do so from nsWindow::NotifyCompositorSessionLost
for the parent process, and BrowserChild::ReinitRendering for content
processes.
Differential Revision: https://phabricator.services.mozilla.com/D131232
Declare a GPU process and corresponding Service in the
AndroidManifest. This is of a new class GeckoServiceGpuProcess which
inherits from GeckoServiceChildProcess, and provides a binder
interface ICompositorSurfaceManager which allows the parent process to
set the compositor Surface for a given widget ID, and the compositor
in the GPU process to look up the Surface for a widget ID. The
ICompositorSurfaceManager interface is exposed to the parent process
through a new method getCompositorSurfaceManager() in the
IChildProcess interface.
Add a new connection type for GPU processes to GeckoProcessManager,
along with a function to look up the GPU process connection and fetch
the ICompositorSurfaceManager binder. When the GPU process is launched
we store the returned binder in the GPUProcessHost, and when each
widget's compositor is created we store a reference to the binder in
the UiCompositorControllerChild.
Each nsWindow is given a unique ID, and whenever the Surface changes
due to an Android resume event, it sends the new surface for that ID
to the GPU process (if enabled) by calling
ICompositorSurfaceManager.onSurfaceChanged().
Stop inheriting AndroidCompositorWidget from InProcessCompositorWidget
and instead inherit from CompositorWidget directly. This class holds a
reference to the Surface that will be rendered in to. The
CompositorBridgeParent notifies the CompositorWidget whenever it has
been resumed, allowing it to fetch the new Surface. For the
cross-process CompositorWidgetParent implementation it fetches that
Surface from the CompositorSurfaceManagerService, whereas the
InProcessAndroidCompositorWidget can read it directly from the real
widget.
AndroidCompositorWidget::GetClientSize() can now calculate its size
from the Surface, rather than racily reading the value from the
nsWindow. This means RenderCompositorEGL and RenderCompositorOGLSWGL
can now use GetClientSize() again rather than querying their own size
from the Surface.
With this patch, setting layers.gpu-process.enabled to true will cause
us to launch a GPU process and render from it. We do not yet
gracefully recover from a GPU process crash, nor can we render
anything using SurfaceTextures (eg video or webgl). Those will come in
future patches.
Differential Revision: https://phabricator.services.mozilla.com/D131231
On Android the APZ controller thread is the android UI thread, rather
than the Gecko main thread as on other platforms. There some places
where the main thread requires to call IAPZCTreeManager functions that
must run on the controller thread. Currently we use the function
DispatchToControllerThread() prior to calling various IAPZCTreeManager
APIs to achieve this.
This works just fine for now, as there is no GPU process on
Android. However, once we do add a GPU process we will encounter
issues:
Firstly, there will now be a cross-process APZInputBridge rather than
using an in-process APZCTreeManager. The PAPZInputBridge protocol is
managed by PGPU, and therefore must run on the main thread in the
parent process. The input we require to send over the bridge, however,
originates from the UI thread.
To solve this we can convert PAPZInputBridge to a top-level protocol,
and bind it to the UI thread on Android. We can then send input
directly from the UI thread without issues.
Secondly, the PAPZCTreeManager protocol must also run from the main
thread in the parent process, as it is managed by
PCompositorBridge. Unlike PAPZInputBridge we cannot convert
PAPZCTreeManager in to a top level protocol, as it relies on the
ordering guarantees with PCompositorBridge.
We must therefore ensure that we only dispatch IAPZCTreeManager calls
to the controller thread when using an in-process
APZCTreeManager. Out-of-process calls, on the other hand, must be
dispatched to the main thread where we can send IPDL commands from. To
do this, we move the dispatch logic away from the callsites of
IAPZCTreeManager APIs, and in to the APZCTreeManager and
APZCTreeManagerChild implementations themselves.
Differential Revision: https://phabricator.services.mozilla.com/D131120
Current code is not explicit about device recreation before session re-creation. It is actually done by nsWindow::OnPaint() before OnInProcessDeviceReset() call. But it is not explicit.
gfxWindowsPlatform::HandleDeviceReset() does d3d device re-creation if it is necessary.
Differential Revision: https://phabricator.services.mozilla.com/D130957
Originally, we would restart the GPU process a fixed number of attempts
based on the layers.gpu-process.max_restarts pref. With this patch, we
now use this pref to control how many "unstable" restarts we allow. A
restart is "stable" if and only if the process uptime exceeds the pref
layers.gpu-process.stable.min-uptime-ts and if the process renders a
total number of frames exceeding the pref
layers.gpu-process.stable.frame-threshold. This allows users to keep the
GPU process for a lot longer if they are encountering infrequent
crashes. Should the user experience the GPU process crashing quickly
and/or without rendering many frames, we will disable it as before after
a few attempts and move into the parent process.
Differential Revision: https://phabricator.services.mozilla.com/D114531
If a user is able to get D3D11, and Software WebRender hasn't been
forced on (either by the Fission experiment or our pref), then we prefer
D3D11 in late beta and release. This will allow users who start with
D3D11 in the GPU process, to fallback to Software WebRender in the GPU
process.
Differential Revision: https://phabricator.services.mozilla.com/D114286
We can disable WebRender because the GPU process crashed, or we
encountered a graceful runtime error in WebRender. This patch adds two
new prefs to control how that fallback works.
gfx.webrender.fallback.software-d3d11 controls if WebRender falls back
to Software WebRender + D3D11 compositing. If true, and the user is
allowed to get Software WebRender, we will fallback to Software
WebRender with the D3D11 compositor first.
gfx.webrender.fallback.software controls if WebRender falls back to
Software WebRender. If true, and the user is allowed to get Software
WebRender, we will fallback to Software WebRender without the D3D11
compositor.
gfx.webrender.fallback.basic controls if WebRender or Software
WebRender falls back to Basic. If true, it falls back to Basic.
Otherwise it continues to use Software WebRender without the D3D11
compositor. Note that this means OpenGL on Android.
This patch also means that gfx.webrender.all=true and MOZ_WEBRENDER=1
no longer disables Software WebRender. It will still prefer (Hardware)
WebRender but we want to allow fallback to Software WebRender for
configurations that forced WebRender on.
Differential Revision: https://phabricator.services.mozilla.com/D103491
We can disable WebRender because the GPU process crashed, or we
encountered a graceful runtime error in WebRender. This patch adds two
new prefs to control how that fallback works.
gfx.webrender.fallback.software-d3d11 controls if WebRender falls back
to Software WebRender + D3D11 compositing. If true, and the user is
allowed to get Software WebRender, we will fallback to Software
WebRender with the D3D11 compositor first.
gfx.webrender.fallback.software controls if WebRender falls back to
Software WebRender. If true, and the user is allowed to get Software
WebRender, we will fallback to Software WebRender without the D3D11
compositor.
gfx.webrender.fallback.basic controls if WebRender or Software
WebRender falls back to Basic. If true, it falls back to Basic.
Otherwise it continues to use Software WebRender without the D3D11
compositor. Note that this means OpenGL on Android.
This patch also means that gfx.webrender.all=true and MOZ_WEBRENDER=1
no longer disables Software WebRender. It will still prefer (Hardware)
WebRender but we want to allow fallback to Software WebRender for
configurations that forced WebRender on.
Differential Revision: https://phabricator.services.mozilla.com/D103491
We can disable WebRender because the GPU process crashed, or we
encountered a graceful runtime error in WebRender. This patch adds two
new prefs to control how that fallback works.
gfx.webrender.fallback.software-d3d11 controls if WebRender falls back
to Software WebRender + D3D11 compositing. If true, and the user is
allowed to get Software WebRender, we will fallback to Software
WebRender with the D3D11 compositor first.
gfx.webrender.fallback.software controls if WebRender falls back to
Software WebRender. If true, and the user is allowed to get Software
WebRender, we will fallback to Software WebRender without the D3D11
compositor.
gfx.webrender.fallback.basic controls if WebRender or Software
WebRender falls back to Basic. If true, it falls back to Basic.
Otherwise it continues to use Software WebRender without the D3D11
compositor. Note that this means OpenGL on Android.
This patch also means that gfx.webrender.all=true and MOZ_WEBRENDER=1
no longer disables Software WebRender. It will still prefer (Hardware)
WebRender but we want to allow fallback to Software WebRender for
configurations that forced WebRender on.
Differential Revision: https://phabricator.services.mozilla.com/D103491
Lack of support of (hardware) WebRender should not block Software
WebRender support. This can happen when ANGLE isn't supported, for
example.
Differential Revision: https://phabricator.services.mozilla.com/D100445
Lack of support of (hardware) WebRender should not block Software
WebRender support. This can happen when ANGLE isn't supported, for
example.
Differential Revision: https://phabricator.services.mozilla.com/D100445
Aside from on Windows, we do not appear to handle device resets properly
without the GPU process. This patch adds in the necessary plumbing to
handle the device reset properly. It also ensures that whenever we check
for a device reset reason, we handle all of the reasons (e.g. not just
the NV video memory purge reset reason) to ensure they are not lost, and
handles them all consistently in the same manner.
It also tracks the number of device resets for thresholding purposes
with an in process compositor. While it will only disable WebRender on
Linux at this time, it will put a note in the critical log if the
threshold was exceeded on all platforms. This may prove useful in
evaluating whether or not we should do the same everywhere.
Differential Revision: https://phabricator.services.mozilla.com/D98705
We use MOZ_WEBRENDER to force WebRender on in our testing
infrastructure. We may silently fallback to basic during our tests if we
encounter an error and the test may pass as a result. It would be best
if we asserted we don't lose WebRender while testing.
Differential Revision: https://phabricator.services.mozilla.com/D97004
If we don't disable hardware acceleration when we lose WebRender, we
will likely fallback to OpenGL compositing. This is not a supported
configuration. We already do this step if we decide against using
WebRender during configuration, but not if it was lost at runtime.
Differential Revision: https://phabricator.services.mozilla.com/D97355
Use ID3D11VideoProcessor for video frame rendering.
WebRenderError::VIDEO_OVERLAY does not cause disabling WebRender. It just change gfxVars::UseWebRenderDCompVideoOverlayWin() to false.
Differential Revision: https://phabricator.services.mozilla.com/D88763
We don't know why we see initialization failures in the telemetry which
makes it hard to investigate why users aren't getting WebRender and
instead fallback to basic. Let's expose the detailed error message
WebRender already generates and puts in the critical log.
Differential Revision: https://phabricator.services.mozilla.com/D89185