forked from mirrors/gecko-dev
		
	 c9738d3a71
			
		
	
	
		c9738d3a71
		
	
	
	
	
		
			
			Rather than checking every minute for work to do, use .exchange() to check if sShouldStartFrecencyRecalculation switches to true and if so dispatch a runnable to notify PlacesFrecencyRecalculator. The latter sets back sShouldStartFrecencyRecalculation to false once the recalculation is complete. Differential Revision: https://phabricator.services.mozilla.com/D171093
		
			
				
	
	
		
			229 lines
		
	
	
	
		
			10 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
	
	
			
		
		
	
	
			229 lines
		
	
	
	
		
			10 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
	
	
| =======
 | ||
| Ranking
 | ||
| =======
 | ||
| 
 | ||
| Before results appear in the UrlbarView, they are fetched from providers.
 | ||
| 
 | ||
| Each `UrlbarProvider <https://firefox-source-docs.mozilla.org/browser/urlbar/overview.html#urlbarprovider>`_
 | ||
| implements its own internal ranking and returns sorted results.
 | ||
| 
 | ||
| Externally all the results are ranked by the `UrlbarMuxer <https://searchfox.org/mozilla-central/source/browser/components/urlbar/UrlbarMuxerUnifiedComplete.jsm>`_
 | ||
| according to an hardcoded list of groups and sub-grups.
 | ||
| 
 | ||
| .. NOTE:: Preferences can influence the groups order, for example by putting
 | ||
|   Firefox Suggest before Search Suggestions.
 | ||
| 
 | ||
| The Places provider, responsible to return history and bookmark results, uses
 | ||
| an internal ranking algorithm called Frecency.
 | ||
| 
 | ||
| Frecency implementation
 | ||
| =======================
 | ||
| 
 | ||
| Frecency is a term derived from `frequency` and `recency`, its scope is to provide a
 | ||
| ranking algorithm that gives importance both to how often a page is accessed and
 | ||
| when it was last visited.
 | ||
| Additionally, it accounts for the type of each visit through a bonus system.
 | ||
| 
 | ||
| To account for `recency`, a bucketing system is implemented.
 | ||
| If a page has been visited later than the bucket cutoff, it gets the weight
 | ||
| associated with that bucket:
 | ||
| 
 | ||
| - Up to 4 days old - weight 100 - ``places.frecency.firstBucketCutoff/Weight``
 | ||
| - Up to 14 days old - weight 70 - ``places.frecency.secondBucketCutoff/Weight``
 | ||
| - Up to 31 days old - weight 50 - ``places.frecency.thirdBucketCutoff/Weight``
 | ||
| - Up to 90 days old - weight 30 - ``places.frecency.fourthBucketCutoff/Weight``
 | ||
| - Anything else - weight 10 - ``places.frecency.defaultBucketWeight``
 | ||
| 
 | ||
| To account for `frequency`, the total number of visits to a page is used to
 | ||
| calculate the final score.
 | ||
| 
 | ||
| The type of each visit is taken into account using specific bonuses:
 | ||
| 
 | ||
| Default bonus
 | ||
|   Any unknown type gets a default bonus. This is expected to be unused.
 | ||
|   Pref ``places.frecency.defaultVisitBonus`` current value: 0.
 | ||
| Embed
 | ||
|   Used for embedded/framed visits not due to user actions. These visits today
 | ||
|   are stored in memory and never participate to frecency calculation.
 | ||
|   Thus this is currently unused.
 | ||
|   Pref ``places.frecency.embedVisitBonus`` current value: 0.
 | ||
| Framed Link
 | ||
|   Used for cross-frame visits due to user action.
 | ||
|   Pref ``places.frecency.framedLinkVisitBonus`` current value: 0.
 | ||
| Download
 | ||
|   Used for download visits. It’s important to support link coloring for these
 | ||
|   visits, but they are not necessarily useful address bar results (the Downloads
 | ||
|   view can do a better job with  these), so their frecency can be low.
 | ||
|   Pref ``places.frecency.downloadVisitBonus`` current value: 0.
 | ||
| Reload
 | ||
|   Used for reload visits (refresh same page). Low because it should not be
 | ||
|   possible to influence frecency by multiple reloads.
 | ||
|   Pref ``places.frecency.reloadVisitBonus`` current value: 0.
 | ||
| Redirect Source
 | ||
|   Used when the page redirects to another one.
 | ||
|   It’s a low value because we give more importance to the final destination,
 | ||
|   that is what the user actually visits, especially for permanent redirects.
 | ||
|   Pref ``places.frecency.redirectSourceVisitBonus`` current value: 25.
 | ||
| Temporary Redirect
 | ||
|   Used for visits resulting from a temporary redirect (HTTP 307).
 | ||
|   Pref ``places.frecency.tempRedirectVisitBonus`` current value: 40.
 | ||
| Permanent Redirect
 | ||
|   Used for visits resulting from a permanent redirect (HTTP 301). This is the
 | ||
|   new supposed destination for a url, thus the bonus is higher than temporary.
 | ||
|   In this case it may be advisable to just pick the bonus for the source visit.
 | ||
|   Pref ``places.frecency.permRedirectVisitBonus`` current value: 50.
 | ||
| Bookmark
 | ||
|   Used for visits generated from bookmark views.
 | ||
|   Pref ``places.frecency.bookmarkVisitBonus`` current value: 75.
 | ||
| Link
 | ||
|   Used for normal visits, for example when clicking on a link.
 | ||
|   Pref ``places.frecency.linkVisitBonus`` current value: 100.
 | ||
| Typed
 | ||
|   Intended to be used for pages typed by the user, in reality it is used when
 | ||
|   the user picks a url from the UI (history views or the Address Bar).
 | ||
|   Pref ``places.frecency.typedVisitBonus`` current value: 2000.
 | ||
| 
 | ||
| The above bonuses are applied to visits, in addition to that there are also a
 | ||
| few bonuses applied in case a page is not visited at all, both of these bonuses
 | ||
| can be applied at the same time:
 | ||
| 
 | ||
| Unvisited bookmarked page
 | ||
|   Used for pages that are bookmarked but unvisited.
 | ||
|   Pref ``places.frecency.unvisitedBookmarkBonus`` current value: 140.
 | ||
| Unvisited typed page
 | ||
|   Used for pages that were typed and now are bookmarked (otherwise they would
 | ||
|   be orphans).
 | ||
|   Pref ``places.frecency.unvisitedTypedBonus`` current value: 200.
 | ||
| 
 | ||
| Two special frecency values are also defined:
 | ||
| 
 | ||
| - ``-1`` represents a just inserted entry in the database, whose score has not
 | ||
|   been calculated yet.
 | ||
| - ``0`` represents an entry for which a new value should not be calculated,
 | ||
|   because it has a poor user value (e.g. place: queries) among search results.
 | ||
| 
 | ||
| Finally, because calculating a score from all of the visits every time a new
 | ||
| visit is added would be expensive, only a sample of the last 10
 | ||
| (pref ``places.frecency.numVisits``) visits is used.
 | ||
| 
 | ||
| How frecency for a page is calculated
 | ||
| -------------------------------------
 | ||
| 
 | ||
| .. mermaid::
 | ||
|     :align: center
 | ||
|     :caption: Frecency calculation flow
 | ||
| 
 | ||
|     flowchart TD
 | ||
|         start[URL]
 | ||
|         a0{Has visits?}
 | ||
|         a1[Get last 10 visit]
 | ||
|         a2[bonus = unvisited_bonus + bookmarked + typed]
 | ||
|         a3{bonus > 0?}
 | ||
|         end0[Frecency = 0]
 | ||
|         end1["frecency = age_bucket_weight * (bonus / 100)"]
 | ||
|         a4[Sum points of all sampled visits]
 | ||
|         a5{points > 0?}
 | ||
|         end2[frecency = -1]
 | ||
|         end3["Frecency = visit_count * (points / sample_size)"]
 | ||
|         subgraph sub [Per each visit]
 | ||
|             sub0[bonus = visit_type_bonus]
 | ||
|             sub1{bookmarked?}
 | ||
|             sub2[add bookmark bonus]
 | ||
|             sub3["score = age_bucket_weight * (bonus / 100)"]
 | ||
|             sub0 --> sub1
 | ||
|             sub1 -- yes --> sub2
 | ||
|             sub1 -- no --> sub3
 | ||
|             sub2 --> sub3;
 | ||
|         end
 | ||
|         start --> a0
 | ||
|         a0 -- no --> a2
 | ||
|         a2 --> a3
 | ||
|         a3 -- no --> end0
 | ||
|         a3 -- yes --> end1
 | ||
|         a0 -- yes --> a1
 | ||
|         a1 --> sub
 | ||
|         sub --> a4
 | ||
|         a4 --> a5
 | ||
|         a5 -- no --> end2
 | ||
|         a5 -- yes --> end3
 | ||
| 
 | ||
| 1. If the page is visited, get a sample of ``NUM_VISITS`` most recent visits.
 | ||
| 2. For each visit get a transition bonus, depending on the visit type.
 | ||
| 3. If the page is bookmarked, add to the bonus an additional bookmark bonus.
 | ||
| 4. If the bonus is positive, get a bucket weight depending on the visit date.
 | ||
| 5. Calculate points for the visit as ``age_bucket_weight * (bonus / 100)``.
 | ||
| 6. Sum points for all the sampled visits.
 | ||
| 7. If the points sum is zero, return a ``-1`` frecency, it will still appear in the UI.
 | ||
|    Otherwise, frecency is ``visitCount * points / NUM_VISITS``.
 | ||
| 8. If the page is unvisited and not bookmarked, or it’s a bookmarked place-query,
 | ||
|    return a ``0`` frecency, to hide it from the UI.
 | ||
| 9. If it’s bookmarked, add the bookmark bonus.
 | ||
| 10. If it’s also a typed page, add the typed bonus.
 | ||
| 11. Frecency is ``age_bucket_weight * (bonus / 100)``
 | ||
| 
 | ||
| When frecency for a page is calculated
 | ||
| --------------------------------------
 | ||
| 
 | ||
| Operations that may influence the frecency score are:
 | ||
| 
 | ||
| * Adding visits
 | ||
| * Removing visits
 | ||
| * Adding bookmarks
 | ||
| * Removing bookmarks
 | ||
| * Changing the url of a bookmark
 | ||
| 
 | ||
| Frecency is recalculated:
 | ||
| 
 | ||
| * Immediately, when a new visit is added. The user expectation here is that the
 | ||
|   page appears in search results after being visited. This is also valid for
 | ||
|   any History API that allows to add visits.
 | ||
| * In background on idle times, in any other case. In most cases having a
 | ||
|   temporarily stale value is not a problem, the main concern would be privacy
 | ||
|   when removing history of a page, but removing whole history will either
 | ||
|   completely remove the page or, if it's bookmarked, it will still be relevant.
 | ||
|   In this case, when a change influencing frecency happens, the ``recalc_frecency``
 | ||
|   database field for the page is set to ``1``.
 | ||
| 
 | ||
| Recalculation is done by the `PlacesFrecencyRecalculator <https://searchfox.org/mozilla-central/source/toolkit/components/places/PlacesFrecencyRecalculator.sys.mjs>`_ module.
 | ||
| The Recalculator is notified when ``PlacesUtils.history.shouldStartFrecencyRecalculation``
 | ||
| value changes from false to true, that means there's values to recalculate.
 | ||
| A DeferredTask is armed, that will look for a user idle opportunity
 | ||
| in the next 5 minutes, otherwise it will run when that time elapses.
 | ||
| Once all the outdated values have been recalculated
 | ||
| ``PlacesUtils.history.shouldStartFrecencyRecalculation`` is set back to false
 | ||
| until the next operation invalidating a frecency.
 | ||
| The recalculation task is also armed on the ``idle-daily`` notification.
 | ||
| 
 | ||
| When the task is executed, it recalculates frecency of a chunk of pages. If
 | ||
| there are more pages left to recalculate, the task is re-armed. After frecency
 | ||
| of a page is recalculated, its ``recalc_frecency`` field is set back to ``0``.
 | ||
| 
 | ||
| Frecency is also decayed daily during the ``idle-daily`` notification, by
 | ||
| multiplying all the scores by a decay rate  of ``0.975`` (half-life of 28 days).
 | ||
| This guarantees entries not receiving new visits or bookmarks lose relevancy.
 | ||
| 
 | ||
| 
 | ||
| Adaptive Input History
 | ||
| ======================
 | ||
| 
 | ||
| Input History (also known as Adaptive History) is a feature that allows to
 | ||
| find urls that the user previously picked. To do so, it associates search strings
 | ||
| with picked urls.
 | ||
| 
 | ||
| Adaptive results are usually presented before frecency derived results, making
 | ||
| them appear as having an infinite frecency.
 | ||
| 
 | ||
| When the user types a given string, and picks a result from the address bar, that
 | ||
| relation is stored and increases a use_count field for the given string.
 | ||
| The use_count field asymptotically approaches a max of ``10`` (the update is
 | ||
| done as ``use_count * .9 + 1``).
 | ||
| 
 | ||
| On querying, all the search strings that start with the input string are matched,
 | ||
| a rank is calculated per each page as ``ROUND(MAX(use_count) * (1 + (input = :search_string)), 1)``,
 | ||
| so that results perfectly matching the search string appear at the top.
 | ||
| Results with the same rank are additionally sorted by descending frecency.
 | ||
| 
 | ||
| On daily idles, when frecency is decayed, also input history gets decayed, in
 | ||
| particular the use_count field is multiplied by a decay rate  of ``0.975``.
 | ||
| After decaying, any entry that has a ``use_count < 0.975^90 (= 0.1)`` is removed,
 | ||
| thus entries are removed if unused for 90 days.
 |