This patch does the following:
1. Remove testing files from disk because they are no longer required.
2. Load completions from previous version of HashStore until an update
is applied.
3. Older version of HashStore(.sbstore) & PrefixSet(.vlpset) will be
removed during an update
Differential Revision: https://phabricator.services.mozilla.com/D36002
--HG--
extra : moz-landing-system : lando
Create test entries via update introduces performance overhead.
We can store them directly in LookupCache and do not save test entries
to disk.
Differential Revision: https://phabricator.services.mozilla.com/D34576
--HG--
extra : moz-landing-system : lando
1. VariableLengthPrefixSet supports getting/setting prefixes with
AddPrefixArray and AddCompletesArray
2. VariableLengthPrefixSet supports passing prefix as an integer in
Match API. This is because how V2 and V4 see prefixes as an integer
works differently.
Differential Revision: https://phabricator.services.mozilla.com/D34547
--HG--
extra : moz-landing-system : lando
The goal of the series of patches is to improve Safe Browsing performance by
skipping uncessary file IO.
The first two patches is to remove the dependency between LookupCache and HashStore, so HashStore is only
responsible for udpates.
Before this patch, LookupCacheV2 treats prefixes and completions
differently. It uses two data structures to maintain
prefixes:
1. mPrefixSet to store prefixes from .pset
2. mUpdateCompletions to store completions from .sbstore
After this patch
1. LookupCacheV2 & LookupCacheV4 both use variable-length
prefix set. mUpdateCompletions and mPrefixSet are removed and
mVLPrefixSet is used to store all prefixes data.
2. Move common function to base class.
Note that in this patch, conversion between 4/32 bytes prefixes and
mVLPrefixSet is not yet included, it will be handled in next patch.
This patch tries not to deal with any logic changes, only focus on refining
LookupCacheV2 & LookupCacheV4 class structure to use variable-length
prefixset for both classes.
Differential Revision: https://phabricator.services.mozilla.com/D34546
--HG--
extra : moz-landing-system : lando
After calling Lookup API per table, Safe Browsing outputs too many debug
message for a single URL lookup. Refine the current output.
Differential Revision: https://phabricator.services.mozilla.com/D27066
--HG--
extra : moz-landing-system : lando
Here is the flow how prefixes are handled during an V4 update:
1. Prefixes are received from Safe Browsing update, stored in ProtocolBuffer
2. Copy the prefixes from ProtocolBuffer to TableUpdate structure
3. Prefixes in TableUpdate are merged with local prefixes (stored in LookupCacheV4)
4. Merged prefixes are processes by PrefixSet to generate the in-memory prefix
set data structure (MakePrefixSet).
In this patch, we free the prefixes stored in TableUpdate right after step3.
This reduces the peak memory used during an update (peak happens in step 4).
Differential Revision: https://phabricator.services.mozilla.com/D26860
--HG--
extra : moz-landing-system : lando
After this patch, we may have the following files in SafeBrowsing
directory:
- (v2) .sbstore : Store V2 chunkdata, for update, MD5 integrity check
while load
- (v2) .pset : Store V2 prefixset, for lookup, load upon startup, no
integrity check
- (v4) .metadata : Store V4 state, for update, no integrity check
- (v4) .vlpset : Store V4 prefixset, for lookup, load upon startup,
CRC32 integrity check
- (v4) .pset : V4 prefix set before this patch, should be removed
The magic string is also added to ".vlpset" header so we can add
a telemetry to see if sanity check is good enough for prefix set
integrity check (The telemetry is not yet added). If yes, we can remove
the CRC32 in the future for even better performance.
Differential Revision: https://phabricator.services.mozilla.com/D21463
--HG--
extra : moz-landing-system : lando
SafeBrowsing prefix files LOAD/SAVE operations are handled in xxxPrefixSet.cpp.
It would be more clear if xxxPrefixSet.cpp only processes prefix data,
while LookupCacheV2/LookupCacheV4 which use prefix set process file.
This patch doesn't change any behavior, testcases need to update because
the LookupCache & xxxPrefixSet APIs are changed.
Differential Revision: https://phabricator.services.mozilla.com/D21462
--HG--
extra : moz-landing-system : lando
SHA256 is an expensive operation, we should avoid using them if
possible. SafeBrowsing prefix files are loaded during startup and
verify integrity with SHA256 which may affect the performance
especially on the low-end device.
This patch simply removes the SHA256 integrity check. CRC32 version
integrity check will be introduced in the other patch.
This patch also changes the behavior of recording
"Telemetry::URLCLASSIFIER_VLPS_LOAD_CORRUPT" a little bit.
It used to records only once per session(during startup, the first
time we load prefix set), now it records per update.
Differential Revision: https://phabricator.services.mozilla.com/D21461
--HG--
extra : moz-landing-system : lando
SafeBrowsing V4 protocol use SHA-256 as the checksum to check integrity
of update data and also the integrity of prefix files.
SafeBrowsing V2 HashStore use MD5 as the checksum to check integrity of
.sbstore
Since we are going to use CRC32 as the integrity check of V4 prefix files,
I think rename V4 "checksum" to SHA256 can improve readability.
Differential Revision: https://phabricator.services.mozilla.com/D21460
--HG--
extra : moz-landing-system : lando
After this patch, we may have the following files in SafeBrowsing
directory:
- (v2) .sbstore : Store V2 chunkdata, for update, MD5 integrity check
while load
- (v2) .pset : Store V2 prefixset, for lookup, load upon startup, no
integrity check
- (v4) .metadata : Store V4 state, for update, no integrity check
- (v4) .vlpset : Store V4 prefixset, for lookup, load upon startup,
CRC32 integrity check
- (v4) .pset : V4 prefix set before this patch, should be removed
The magic string is also added to ".vlpset" header so we can add
a telemetry to see if sanity check is good enough for prefix set
integrity check (The telemetry is not yet added). If yes, we can remove
the CRC32 in the future for even better performance.
Differential Revision: https://phabricator.services.mozilla.com/D21463
--HG--
extra : moz-landing-system : lando
SafeBrowsing prefix files LOAD/SAVE operations are handled in xxxPrefixSet.cpp.
It would be more clear if xxxPrefixSet.cpp only processes prefix data,
while LookupCacheV2/LookupCacheV4 which use prefix set process file.
This patch doesn't change any behavior, testcases need to update because
the LookupCache & xxxPrefixSet APIs are changed.
Differential Revision: https://phabricator.services.mozilla.com/D21462
--HG--
extra : moz-landing-system : lando
SHA256 is an expensive operation, we should avoid using them if
possible. SafeBrowsing prefix files are loaded during startup and
verify integrity with SHA256 which may affect the performance
especially on the low-end device.
This patch simply removes the SHA256 integrity check. CRC32 version
integrity check will be introduced in the other patch.
This patch also changes the behavior of recording
"Telemetry::URLCLASSIFIER_VLPS_LOAD_CORRUPT" a little bit.
It used to records only once per session(during startup, the first
time we load prefix set), now it records per update.
Differential Revision: https://phabricator.services.mozilla.com/D21461
--HG--
extra : moz-landing-system : lando
SafeBrowsing V4 protocol use SHA-256 as the checksum to check integrity
of update data and also the integrity of prefix files.
SafeBrowsing V2 HashStore use MD5 as the checksum to check integrity of
.sbstore
Since we are going to use CRC32 as the integrity check of V4 prefix files,
I think rename V4 "checksum" to SHA256 can improve readability.
Differential Revision: https://phabricator.services.mozilla.com/D21460
--HG--
extra : moz-landing-system : lando
I tried to make TableUpdateArray point to const TableUpdate objects
everywhere but there were two problems:
- HashStore::ApplyUpdate() triggers a few Merge() calls which include
sorting the underlying TableUpdate object first.
- LookupCacheV4::ApplyUpdate() calls TableUpdateV4::NewChecksum() when the
checksum is missing and that sets mChecksum.
MozReview-Commit-ID: LIhJcoxo7e7
--HG--
extra : rebase_source : f6ca4bf42d1ddb897a974a0b19c7185b0b6b93af
Manually keeping tabs on the lifetime of these objects is a pain
and is the likely source of some of our crashes. I suspect we might
also be leaking memory.
This change creates an explicit copy of the main array into the
update thread to avoid using a non-thread-safe shared data
structure. This is a shallow copy. Only the pointers to the
TableUpdates are copied, which means one pointer per list (e.g. 5
in total for google4 in a new profile).
MozReview-Commit-ID: 221d6GkKt0M
--HG--
extra : rebase_source : e1b81f11bb9b41e465571a95845079f455b5868e
This should help narrow down which of the code paths is responsible
for the intermittent failures we are seeing.
MozReview-Commit-ID: JHVZzixpOg6
--HG--
extra : rebase_source : c2eefb0bf0f281adf393d2a531db3e5d3ce023d6
Given we're no longer using dependent strings in
LookupCacheV4::PrefixString(), we will end up make a copy of the
prefixes at some point. Let's do it early and remove a bunch of
complicated code.
Make the string copies fallible so that we return an error and
fail the update instead of crashing.
MozReview-Commit-ID: 5cZHSDIJSlD
--HG--
extra : rebase_source : 0ad130c2be9caf528bb764296836e91fa8a30916
Also document the meaning of mPrimed in LookupCache.h.
MozReview-Commit-ID: 63GAHwU3Rx3
--HG--
extra : rebase_source : 11307604f957efbd6879db83f916a9cb369f93ff
Dependent strings are recommended only when dealing with a character
buffer (i.e. char*). Using it here makes it more likely that we'll
hang on to a string buffer that will be deallocated.
nsCString will by default share the underlying string buffers when
it can (i.e. when copying entire strings on the heap) so it should
be able to avoid unnecessary copies.
MozReview-Commit-ID: 3rTUYmouzcT
--HG--
extra : rebase_source : 8563bc242bf41cf8e12aa0b4ccba32c6a38d952a
RegenActiveTables() relies on mPrimed being set correctly and so
the V4 lookup cache should behave the same way as the V2 one.
The V2 lookup cache on the other hand was unnecessarily setting
mPrimed to true twice.
MozReview-Commit-ID: LwNdI9DTqZ7
--HG--
extra : rebase_source : ff7d7a051f28867be5c35c1079407bb2bfd3a490
Disk corruption can lead to the stored length of a value to be
unreasonably large and trigger an OOM.
Since values are all currently <= 32 bytes, we can safely enforce
a 256-byte upper bound.
MozReview-Commit-ID: XygReOpEK3
--HG--
extra : rebase_source : cd2bc880d43612b04d43fb4132bba32bdc3d0f3e
After adopting the new thread model for safebrowsing, we will create a new
lookup cache for update so we can still check lookup cache at the same time.
Prefix set, completions will be generated when we open the new lookup cache
but it won't include cache, so we will loss cache after that.
This patch will copy cache data from old lookup cache to new lookup
cache while update.
MozReview-Commit-ID: L0WpiHOGIGm
--HG--
extra : rebase_source : ebaf249535561b87a983a3f24dacb8fbab7c096e
In this patch, we will make Safebrowsing V2 caching use the same algorithm as V4.
So we remove "mMissCache" for negative caching and TableFresness check for
positive caching.
But Safebrowsing V2 doesn't contain negative/positive cache duration information in
gethash response. So we hard-code a fixed value, 15 minutes, as cache duration.
In this way, we can sync the mechanism we handle caching for V2 and V4.
An extra effort for V2 here is that we need to manually record prefixes misses
because we won't get any response for those prefixes(implemented in
nsUrlClassifierLookupCallback::CacheMisses).
--HG--
extra : rebase_source : 0fb7938464ad24a01a51493ecb1b5c0197c16123
In Bug 1323953, we always send 4-bytes prefix for completion and the prefix is also
used as the key to store cache result from gethash request.
Since it is always 4-bytes, we could convert it to integer for simplicity.
MozReview-Commit-ID: Lkvrg0wvX5Z
--HG--
extra : rebase_source : 7002b1f55bc0f88ed90340082574cc931f8db87a
LookupCacheV4::Has implements safebrowsing v4 caching logic.
1. Check if fullhash match any prefix in local database:
- If not, the URL is safe.
2. Check if prefix is in the cache(prefix is always the first 4-byte of
the fullhash, Bug 1323953):
- If not, send fullhash request
3. Check if fullhash is in the positive cache:
- If fullhash is found and it is not expired, the URL is not safe.
- If fullhash is found and it is expired, send fullhash request.
4. If fullhash is not found, check negative cache expired time:
- If negative cache time is not expired, the URL is safe.
- If negative cache time is expired, send fullhash request.
MozReview-Commit-ID: GRX7CP8ig49
--HG--
extra : rebase_source : 6d3ae8929b11731584810c44d46a1526f7d83e12
LookupCacheV4::Has implements safebrowsing v4 caching logic.
1. Check if fullhash match any prefix in local database:
- If not, the URL is safe.
2. Check if prefix is in the cache(prefix is always the first 4-byte of
the fullhash, Bug 1323953):
- If not, send fullhash request
3. Check if fullhash is in the positive cache:
- If fullhash is found and it is not expired, the URL is not safe.
- If fullhash is found and it is expired, send fullhash request.
4. If fullhash is not found, check negative cache expired time:
- If negative cache time is not expired, the URL is safe.
- If negative cache time is expired, send fullhash request.
MozReview-Commit-ID: GRX7CP8ig49
--HG--
extra : rebase_source : bcb5fa7aa2b7b116d862e3382447611424603c2d