Bug 1806793 - Update documentation on how to manage en-US dictionary, r=sylvestre
Depends on D165303

Differential Revision: https://phabricator.services.mozilla.com/D165304
parent df55534a51
commit d974a31757
4 changed files with 150 additions and 78 deletions
@ -1,28 +1,158 @@
======================================
Managing the built-in en-US dictionary
======================================

The en-US build of Firefox includes a built-in Hunspell dictionary based on the
`SCOWL`_ dataset. This document describes the process to add new words to the
dictionary, or update it to the current upstream version.

For more information about Hunspell or the affix file format, you can check
`the Ubuntu man page for hunspell
<https://manpages.ubuntu.com/manpages/bionic/man5/hunspell.5.html>`_.

Requesting to add new words to the en-US dictionary
===================================================

If you’d like to add new words to the dictionary, you can `file a bug`_. Try to
provide information on the terms you want to add, in particular references to
external sources that confirm the usage of the term.

Adding new words to the en-US dictionary
========================================

Occasionally bugs are filed pointing out situations where perfectly
legitimate words are missing from the English spell check dictionary in
Firefox. This article describes the process for adding a word to the
dictionary.
This section describes the process for adding a word to the dictionary:

The process is pretty straightforward:

#. Get a clone of mozilla-central (see :ref:`Firefox Contributors' Quick Reference`), if
   you don't already have one, and make sure you can build it
#. Get a clone of mozilla-central (see :ref:`Firefox Contributors' Quick
   Reference`), if you don’t already have one, and make sure you can build it
   successfully.
#. Get into the dictionary sources directory using this command:
   ``cd extensions/spellcheck/locales/en-US/hunspell/dictionary-sources``
#. There's a special script used for editing dictionaries. The script
#. There’s a special script used for editing dictionaries. The script
   only works if you have the environment variable ``EDITOR`` set to the
   executable of an editor program; if you don't have it set, you can use
   ``EDITOR=vim sh edit-dictionary`` to edit using vim (or you can
   substitute some other editor), or you can just type
   executable of an editor program; if you don’t have it set, you can use
   ``EDITOR=vim sh edit-dictionary`` to edit using ``vim`` (or you can
   substitute it with another editor), or you can just type
   ``sh edit-dictionary`` if you have an ``EDITOR`` already specified.
#. Add and remove words in the dictionary file, then quit the editor.
#. Use ``sh merge-dictionaries`` to process the dictionary changes you've
   made.
#. Move the revised dictionary file into position: ``mv en-US.dic ..``
#. Build Firefox and test your updated dictionary. Once you're
#. Build Firefox and test your updated dictionary. Once you’re
   satisfied, use the process described in :ref:`write_a_patch` to create a
   patch.

Note that the update script will modify 2 files, and both need to be committed:

* ``en-US.dic``: the dictionary that actually ships in the build; it uses
  ISO-8859-1 encoding.
* ``utf8/en-US.dic``: a version of the same dictionary with UTF-8 encoding. This
  is used to work around issues with Phabricator, and it allows the actual
  changes to be displayed in the diff.
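
The steps in this section can be strung together as a short shell session. This is only a sketch built from the commands quoted in the list: the ``run`` helper and the ``DRY_RUN`` switch are hypothetical additions for illustration and are not part of the scripts in the tree. It assumes you start from the root of your mozilla-central clone.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Sketch of the word-editing workflow described in the list above.
edit_dictionary_workflow() {
  run cd extensions/spellcheck/locales/en-US/hunspell/dictionary-sources || return 1
  # Fall back to vim only when EDITOR is not already set.
  run env EDITOR="${EDITOR:-vim}" sh edit-dictionary || return 1
  run sh merge-dictionaries || return 1
  run mv en-US.dic ..
}
```

With ``DRY_RUN`` set to 1, calling ``edit_dictionary_workflow`` only prints the four commands, which is a cheap way to review them before touching the tree.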

Upgrading dictionary to a new upstream version of SCOWL
=======================================================

The English dictionary available in mozilla-central is based on the
`SCOWL`_ dictionary. Some scripts distributed with the SCOWL package are
used to generate the files for the en-US dictionary.

The working directory for this process is
``extensions/spellcheck/locales/en-US/hunspell/dictionary-sources``.

#. Download the latest version of the dictionary from the `SCOWL`_ homepage or
   `SourceForge <SourceForce_>`_ as a tarball (tar.gz) and unpack it in the
   working directory. Rename the resulting folder from ``scowl-YYYY.MM.DD`` to
   ``scowl``.
#. Run the script ``sh make-new-dict`` to generate a new dictionary and make
   sure it runs without any errors. For more details on this script, see the
   `make-new-dict`_ section.
#. Do a sanity check on the resulting dictionary file ``en_US-mozilla.dic``. For
   example, make sure that the size is about the same as the original dictionary
   (or slightly larger).
#. If everything looks correct, use ``sh install-new-dict`` to copy the
   generated file to the right position and use the process described in
   :ref:`write_a_patch` to create a patch.
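
The sanity check in the third step can be as simple as comparing line counts. The following is a minimal sketch, assuming a hypothetical helper named ``dict_growth``; it is not one of the scripts shipped in the tree.

```shell
# Hypothetical helper: print how many more lines the new dictionary ($2) has
# than the old one ($1). A negative number means the new file is smaller.
dict_growth() {
  echo $(( $(wc -l < "$2") - $(wc -l < "$1") ))
}
```

From the working directory, something like ``dict_growth ../en-US.dic en_US-mozilla.dic`` would then print the difference, assuming ``../en-US.dic`` is the dictionary currently in the tree. A small positive number matches the "slightly larger" expectation; a large negative one deserves a closer look.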

Info about the file structure
=============================

mozilla-exclusions.txt
----------------------

``mozilla-exclusions.txt`` is used to explicitly exclude some words from
suggestions. The ``make-new-dict`` script will add them to the dictionary file
with the ``/!`` flag.

Terms should be added to this file with exactly the same format used in the .dic
file, including affix rules if available.

mozilla-specific.txt
--------------------

This file contains Mozilla-specific words that should not be submitted
upstream. For example, ``Firefox`` should go in this file (see `bug 237921`_).

Note that the file ``5-mozilla-specific.txt`` is generated by expanding
``mozilla-specific.txt`` and should not be edited directly.

utf8 folder
-----------

``dictionary-sources/utf8`` is used to store a UTF-8 encoded copy of the
dictionary files. This is used to work around limitations in Phabricator, which
treats ISO-8859-1 files as binary and won’t display a diff when updating them.
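
The two copies differ only in encoding, which can be reproduced with ``iconv``. This is an illustrative sketch, not the code used by the scripts in the tree; the helper names ``to_latin1`` and ``to_utf8`` are hypothetical.

```shell
# Hypothetical helpers: convert a word list between the two encodings and
# write the result to standard output.
to_latin1() { iconv -f UTF-8 -t ISO-8859-1 "$1"; }
to_utf8() { iconv -f ISO-8859-1 -t UTF-8 "$1"; }
```

A word such as "café" takes one byte fewer in ISO-8859-1 than in UTF-8, and converting back with ``to_utf8`` restores the original bytes.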

Info about the included scripts
===============================

make-new-dict
-------------

The dictionary upgrade script ``make-new-dict`` works by expanding (i.e.
“unmunching”) the affix compression dictionaries to create wordlists and
using those to generate a new dictionary.

The upgrade script expects the current upstream version to be kept in the
directory ``orig``.

The script will create a few files in ``dictionary-sources/support_files`` in the
following order:

* ``0-special.txt`` contains numbers and ordinals expanded from SCOWL
  ``en.dic.supp``.
* ``1-base.txt`` contains words expanded from ``en_US-custom.dic`` in the
  **previous** version of SCOWL (from the ``orig`` folder).
* ``2-mozilla.txt`` contains words expanded from the current Mozilla dictionary.
* ``3-upstream.txt`` contains words expanded from ``en_US-custom.dic`` in the
  **new** version of SCOWL (from the ``scowl/speller`` folder).
* ``2-mozilla-removed.txt`` contains words that are only available in the SCOWL
  dictionary, i.e. removed by Mozilla.
* ``2-mozilla-added.txt`` contains words that are only available in the current
  Mozilla dictionary, i.e. added by Mozilla.
* ``4-patched.txt`` contains words from the new SCOWL dictionary
  (``3-upstream.txt``), with the words in ``2-mozilla-removed.txt`` removed and
  the words in ``2-mozilla-added.txt`` added.
* ``5-mozilla-specific.txt`` is expanded from ``mozilla-specific.txt`` using the
  current affix rules from the Mozilla dictionary.
* ``5-mozilla-removed.txt`` and ``5-mozilla-added.txt`` contain words that are
  respectively removed and added by Mozilla compared to the **new** SCOWL
  version. These files could be used to submit upstream changes, but words
  included in ``5-mozilla-specific.txt`` should be removed from this list.

The new dictionary is available as ``en_US-mozilla.dic`` and should be copied
over using the ``install-new-dict`` script.
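
The relationship between the numbered word lists can be illustrated with ``sort`` and ``comm``. This is a toy reimplementation of the list arithmetic described above, under the assumption that each input is a plain list with one word per line. It is not the real ``make-new-dict``, which also handles unmunching and affix rules, and the function name ``patch_wordlist`` is hypothetical.

```shell
# Toy sketch of the list arithmetic; NOT the real make-new-dict.
# Arguments: previous SCOWL list, current Mozilla list, new SCOWL list,
# output directory. Each input file holds one word per line.
patch_wordlist() {
  out=$4
  # comm needs sorted input; LC_ALL=C keeps sort and comm consistent.
  LC_ALL=C sort -u "$1" > "$out/1-base.txt"
  LC_ALL=C sort -u "$2" > "$out/2-mozilla.txt"
  LC_ALL=C sort -u "$3" > "$out/3-upstream.txt"
  # Words only in the previous SCOWL list: removed by Mozilla.
  LC_ALL=C comm -23 "$out/1-base.txt" "$out/2-mozilla.txt" > "$out/2-mozilla-removed.txt"
  # Words only in the Mozilla list: added by Mozilla.
  LC_ALL=C comm -13 "$out/1-base.txt" "$out/2-mozilla.txt" > "$out/2-mozilla-added.txt"
  # New upstream list, minus the removals, plus the additions.
  LC_ALL=C comm -23 "$out/3-upstream.txt" "$out/2-mozilla-removed.txt" |
    LC_ALL=C sort -u - "$out/2-mozilla-added.txt" > "$out/4-patched.txt"
}
```

The point of the three-way comparison is that Mozilla's local additions and removals are computed against the previous upstream list and then replayed on top of the new one, so upstream changes and local changes are both preserved.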

install-new-dict
----------------

The script:

* Creates a copy of ``orig`` as ``support_files/orig-bk`` and copies the new
  upstream version to ``orig``.
* Copies the existing Mozilla dictionary to ``support_files/mozilla-bk``.
* Converts the dictionary (.dic) generated by ``make-new-dict`` from UTF-8 to
  ISO-8859-1 and moves it to the parent folder.
* Sets the affix file (.aff) to use ``ISO8859-1`` as ``SET`` instead of the
  original ``UTF-8``, and removes ``ICONV`` patterns (input conversion tables).
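
The affix file change in the last bullet amounts to a small text rewrite. A minimal sketch with ``sed``, assuming the affix file has a ``SET UTF-8`` line and that every input conversion entry starts with ``ICONV``; the helper name ``patch_aff`` is hypothetical and this is not the code from ``install-new-dict``.

```shell
# Hypothetical helper: read an affix file on standard input, switch the SET
# line to ISO8859-1 and drop every ICONV line, then print the result.
patch_aff() {
  sed -e 's/^SET UTF-8$/SET ISO8859-1/' -e '/^ICONV/d'
}
```

Note that this only edits the declarations; re-encoding the file contents themselves would be a separate ``iconv`` step.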


.. _SCOWL: http://wordlist.aspell.net
.. _file a bug: https://bugzilla.mozilla.org/enter_bug.cgi?product=Core&component=Spelling%20checker
.. _SourceForce: https://sourceforge.net/projects/wordlist/files/SCOWL/
.. _bug 237921: https://bugzilla.mozilla.org/show_bug.cgi?id=237921
@ -1,6 +1,2 @@
README_mozilla

To edit the dictionary use "dictionary-sources/edit-dictionary".

For additional info see dictionary-sources/README.

See Firefox Source Docs for information about these scripts, and how to add new words.
https://firefox-source-docs.mozilla.org/extensions/spellcheck/index.html
@ -1,56 +0,0 @@
ADDING OR REMOVING ENTRIES IN THE DICTIONARY:

To edit the dictionary use "edit-dictionary" and then copy the
resulting "en-US.dic" file into place.

UPGRADING TO A NEW UPSTREAM VERSION:

In order to upgrade to the latest dictionary some scripts found in
SCOWL (the source of the en_US Hunspell dictionary) are used. The
en_US dictionary is also generated from the SCOWL source.

1) Unpack the tarball (tar.gz) version of the latest version of SCOWL
in the current directory and rename the directory from
"scowl-YYYY.MM.DD" to "scowl". You can find the latest version at
http://wordlist.aspell.net/ or
http://sourceforge.net/projects/wordlist/files/SCOWL/

2) Run the script "./make-new-dict" to generate a new dictionary and
make sure it runs without any errors.

3) Do a quick sanity check on the resulting dictionary
"en_US-mozilla.dic". For example make sure the size is about the same
(it should likely be slightly larger) as the original dictionary.

4) Once everything is okay copy the new dictionary in place using
"./install-new-dict" and commit the changes.

NOTES ON UPGRADE PROCESS:

The dictionary upgrade scripts work by expanding (i.e. unmunching) the
affix compression dictionaries to create simple wordlists and use
those to generate a new dictionary.

The upgrade script expects the original upstream version to be kept in
the directory "orig".

The install script renames "orig" to "orig-bk" and copies the new
upstream version to "orig". The install script also copies the
original Mozilla dictionary to the "mozilla-bk".

SUBMITTING MOZILLA SPECIFIC CHANGES UPSTREAM:

The upgrade script creates two files that can be reviewed and possibly
submitted upstream. The file "5-mozilla-removed" lists words that were
removed in the Mozilla dictionary and the file "5-mozilla-added"
contains the list of words that were added. When submitting new words
upstream Mozilla specific words that are found in "5-mozilla-specific"
(expanded from mozilla-specific.txt) should likely be removed from the list.

ABOUT mozilla-specific.txt:

This file contains Mozilla-specific words that should not be submitted
upstream. For example, "Firefox" goes here. (See bug 237921).

Note that the file 5-mozilla-specific is generated by expanding
mozilla-specific.txt and should not be edited directly.
@ -0,0 +1,2 @@
See Firefox Source Docs for information about these scripts, and how to add new words.
https://firefox-source-docs.mozilla.org/extensions/spellcheck/index.html