diff --git a/.hgignore b/.hgignore index 33a123a1b3f4..aa8ee10d9fd8 100644 --- a/.hgignore +++ b/.hgignore @@ -267,3 +267,8 @@ toolkit/components/certviewer/content/package-lock.json ^tools/esmify/jscodeshift.cmd ^tools/esmify/jscodeshift.ps1 ^tools/esmify/package-lock.json + +# Ignore support files for en-US dictionary updates +^extensions/spellcheck/locales/en-US/hunspell/dictionary-sources/scowl +^extensions/spellcheck/locales/en-US/hunspell/dictionary-sources/support_files/ +^extensions/spellcheck/locales/en-US/hunspell/dictionary-sources/*en_US-mozilla* diff --git a/extensions/spellcheck/docs/index.rst b/extensions/spellcheck/docs/index.rst index aeea5960d880..2f83dc156e83 100644 --- a/extensions/spellcheck/docs/index.rst +++ b/extensions/spellcheck/docs/index.rst @@ -30,9 +30,9 @@ This section describes the process for adding a word to the dictionary: #. There’s a special script used for editing dictionaries. The script only works if you have the environment variable ``EDITOR`` set to the executable of an editor program; if you don’t have it set, you can use - ``EDITOR=vim sh edit-dictionary`` to edit using ``vim`` (or you can + ``EDITOR=vim sh edit-dictionary.sh`` to edit using ``vim`` (or you can substitute it with another editor), or you can just type - ``sh edit-dictionary`` if you have an ``EDITOR`` already specified. + ``sh edit-dictionary.sh`` if you have an ``EDITOR`` already specified. #. Add and remove words in the dictionary file, then quit the editor. #. Build Firefox and test your updated dictionary. Once you’re satisfied, use the process described in :ref:`write_a_patch` to create a @@ -59,13 +59,13 @@ The working directory for this process is #. Download the latest version of the dictionary from `SCOWL`_ homepage or `SourceForce`_ as a tarball (tag.gz) and unpack it in the working directory. Rename the resulting folder from ``scowl-YYYY.MM.DD`` to ``scowl``. -#. Run the script ``sh make-new-dict`` to generate a new dictionary and make +#. 
Run the script ``sh make-new-dict.sh`` to generate a new dictionary and make sure it runs without any errors. For more details on this script, see the - `make-new-dict`_ section. + `make-new-dict.sh`_ section. #. Do a sanity check on the resulting dictionary file ``en_US-mozilla.dic``. For example, make sure that the size is about the same as the original dictionary (or slightly larger). -#. If everything looks correct, use ``sh install-new-dict`` to copy the +#. If everything looks correct, use ``sh install-new-dict.sh`` to copy the generated file in the right position and use the process described in :ref:`write_a_patch` to create a patch. @@ -76,7 +76,7 @@ mozilla-exclusions.txt ---------------------- ``mozilla-exclusions.txt`` is used to explicitly exclude some words from -suggestions. The ``make-new-dict`` script will add them to the dictionary file +suggestions. The ``make-new-dict.sh`` script will add them to the dictionary file with the ``/!`` flag. Terms should be added to this file with exactly the same format used in the .dic @@ -101,10 +101,10 @@ treats ISO-8859-1 files as binary and won’t display a diff when updating them. Info about the included scripts =============================== -make-new-dict -------------- +make-new-dict.sh +---------------- -The dictionary upgrade scripts ``make-new-dict`` works by expanding (i.e. +The dictionary upgrade script ``make-new-dict.sh`` works by expanding (i.e. “unmunching”) the affix compression dictionaries to create wordlists and use those to generate a new dictionary. @@ -136,17 +136,17 @@ following order: included in ``5-mozilla-specific.txt`` should be removed from this list. The new dictionary is available as ``en_US-mozilla.dic`` and should be copied -over using the ``install-new-dict`` script. +over using the ``install-new-dict.sh`` script. 
-install-new-dict ----------------- +install-new-dict.sh +------------------- The script: * Creates a copy of ``orig`` as ``support_files/orig-bk`` and copies the new upstream version to ``orig``. * Copies the existing Mozilla dictionary in ``support_files/mozilla-bk``. -* Converts the dictionary (.dic) generated by ``make-new-dict`` from UTF-8 to +* Converts the dictionary (.dic) generated by ``make-new-dict.sh`` from UTF-8 to ISO-8859-1 and moves it to the parent folder. * Sets the affix file (.aff) to use ``ISO8859-1`` as ``SET`` instead of the original ``UTF-8``, removes ``ICONV`` patterns (input conversion tables). diff --git a/extensions/spellcheck/locales/en-US/hunspell/dictionary-sources/edit-dictionary b/extensions/spellcheck/locales/en-US/hunspell/dictionary-sources/edit-dictionary deleted file mode 100755 index ad77e47fd12a..000000000000 --- a/extensions/spellcheck/locales/en-US/hunspell/dictionary-sources/edit-dictionary +++ /dev/null @@ -1,31 +0,0 @@ -#!/bin/sh -# This Source Code Form is subject to the terms of the Mozilla Public -# License, v. 2.0. If a copy of the MPL was not distributed with this -# file, You can obtain one at http://mozilla.org/MPL/2.0/. - -# -# edit-dictionary - -set -e - -if [ -z "$EDITOR" ]; then - echo 'Need to set the $EDITOR environment variable to your favorite editor!' - exit 1 -fi - -# Strip the first line that contains the count -tail -n +2 ../en-US.dic > en-US.stripped - -# Open the patched hunspell editor and let the user edit it -echo "Now the dictionary is going to be opened for you to edit. When you're done, just quit the editor" -echo -n "Press Enter to begin." 
-read foo -$EDITOR en-US.stripped - -# Add back the line count -wc -l < en-US.stripped | tr -d '[:blank:]' > en-US.dic -LC_ALL=C sort en-US.stripped >> en-US.dic - -# Clean up -rm -f en-US.stripped - diff --git a/extensions/spellcheck/locales/en-US/hunspell/dictionary-sources/edit-dictionary.sh b/extensions/spellcheck/locales/en-US/hunspell/dictionary-sources/edit-dictionary.sh new file mode 100755 index 000000000000..7687f8e72789 --- /dev/null +++ b/extensions/spellcheck/locales/en-US/hunspell/dictionary-sources/edit-dictionary.sh @@ -0,0 +1,37 @@ +#! /usr/bin/env sh + +# This Source Code Form is subject to the terms of the Mozilla Public +# License, v. 2.0. If a copy of the MPL was not distributed with this +# file, You can obtain one at http://mozilla.org/MPL/2.0/. + +set -e + +if [ -z "$EDITOR" ]; then + echo 'Need to set the $EDITOR environment variable to your favorite editor.' + exit 1 +fi + +# Copy the current en-US dictionary and strip the first line that contains +# the count. +tail -n +2 ../en-US.dic > en-US.stripped + +# Convert the file to UTF-8 +iconv -f iso-8859-1 -t utf-8 en-US.stripped > en-US.utf8 +rm en-US.stripped + +# Open the hunspell dictionary and let the user edit it +echo "Now the dictionary is going to be opened for you to edit. Quit the editor to finish editing." +echo "Press Enter to begin." 
+read foo +$EDITOR en-US.utf8 + +# Add back the line count and sort the lines +wc -l < en-US.utf8 | tr -d '[:blank:]' > en-US.dic +LC_ALL=C sort en-US.utf8 >> en-US.dic +rm -f en-US.utf8 + +# Convert back to ISO-8859-1 +iconv -f utf-8 -t iso-8859-1 en-US.dic > ../en-US.dic + +# Keep a copy of the UTF-8 file in /utf8 +mv en-US.dic utf8/en-US-utf8.dic diff --git a/extensions/spellcheck/locales/en-US/hunspell/dictionary-sources/install-new-dict b/extensions/spellcheck/locales/en-US/hunspell/dictionary-sources/install-new-dict deleted file mode 100755 index 24492cad22c2..000000000000 --- a/extensions/spellcheck/locales/en-US/hunspell/dictionary-sources/install-new-dict +++ /dev/null @@ -1,39 +0,0 @@ -#!/bin/sh - -# -# This script copies the new dictionary created by make-new-dict in -# place. -# - -set -e - -WKDIR="`pwd`" -export SCOWL="$WKDIR/scowl/" -SPELLER="$SCOWL/speller" - -set -x - -if [ -e orig-bk ]; then echo "$0: directory 'orig-bk' exists." 1>&2 ; exit 0; fi -mv orig orig-bk -mkdir orig -cp $SPELLER/en_US-custom.dic $SPELLER/en_US-custom.aff $SPELLER/README_en_US-custom.txt orig - -mkdir mozilla-bk -mv ../en-US.dic ../en-US.aff ../README_en_US.txt mozilla-bk - -# Convert the affix file to ISO8859-1 -sed -i=bak -e '/^ICONV/d' -e 's/^SET UTF-8$/SET ISO8859-1/' en_US-mozilla.aff - -# Convert the dictionary to ISO8859-1 -mv en_US-mozilla.dic en_US-mozilla-utf8.dic -iconv -f utf-8 -t iso-8859-1 < en_US-mozilla-utf8.dic > en_US-mozilla.dic - -cp en_US-mozilla.aff ../en-US.aff -cp en_US-mozilla.dic ../en-US.dic -cp README_en_US-mozilla.txt ../README_en_US.txt - -set +x - -echo "New dictionary copied into place. Please commit the changes." 
- - diff --git a/extensions/spellcheck/locales/en-US/hunspell/dictionary-sources/install-new-dict.sh b/extensions/spellcheck/locales/en-US/hunspell/dictionary-sources/install-new-dict.sh new file mode 100755 index 000000000000..26ce06dec269 --- /dev/null +++ b/extensions/spellcheck/locales/en-US/hunspell/dictionary-sources/install-new-dict.sh @@ -0,0 +1,41 @@ +#! /usr/bin/env sh + +# This Source Code Form is subject to the terms of the Mozilla Public +# License, v. 2.0. If a copy of the MPL was not distributed with this +# file, You can obtain one at http://mozilla.org/MPL/2.0/. + +# This script copies the new dictionary created by make-new-dict in +# place. + +set -e + +WKDIR="`pwd`" +export SCOWL="$WKDIR/scowl/" +SUPPORT_DIR="$WKDIR/support_files/" +SPELLER="$SCOWL/speller" + +if [ -e "$SUPPORT_DIR/orig-bk" ]; then + echo "$0: directory '$SUPPORT_DIR/orig-bk' exists." 1>&2 + exit 0 +fi + +mv orig "$SUPPORT_DIR/orig-bk" +mkdir orig +cp $SPELLER/en_US-custom.dic $SPELLER/en_US-custom.aff $SPELLER/README_en_US-custom.txt orig + +mkdir "$SUPPORT_DIR/mozilla-bk" +mv ../en-US.dic ../en-US.aff ../README_en_US.txt "$SUPPORT_DIR/mozilla-bk" + +# Convert the affix file to ISO-8859-1 +cp en_US-mozilla.aff utf8/en-US-utf8.aff +sed -i "" -e '/^ICONV/d' -e 's/^SET UTF-8$/SET ISO8859-1/' en_US-mozilla.aff + +# Convert the dictionary to ISO-8859-1 +mv en_US-mozilla.dic utf8/en-US-utf8.dic +iconv -f utf-8 -t iso-8859-1 < utf8/en-US-utf8.dic > en_US-mozilla.dic + +cp en_US-mozilla.aff ../en-US.aff +cp en_US-mozilla.dic ../en-US.dic +mv README_en_US-mozilla.txt ../README_en_US.txt + +echo "New dictionary copied into place. Please commit the changes." 
diff --git a/extensions/spellcheck/locales/en-US/hunspell/dictionary-sources/make-new-dict b/extensions/spellcheck/locales/en-US/hunspell/dictionary-sources/make-new-dict deleted file mode 100755 index 012b9154e257..000000000000 --- a/extensions/spellcheck/locales/en-US/hunspell/dictionary-sources/make-new-dict +++ /dev/null @@ -1,69 +0,0 @@ -#!/bin/sh - -# -# This script creates a new dictionary by expanding the original, -# Mozilla's, and the upstream dictionary to remove affix flags and -# then doing the wordlist equivalent of diff3 to create a new -# dictionary. -# -# The files 2-mozilla-add and 2-mozilla-rem contain words added and -# removed, receptively in the Mozilla dictionary. The final -# dictionary will be in hunspell-en_US-mozilla.zip. -# - -set -e - -export LANG=C -export LC_ALL=C -export LC_CTYPE=C -export LC_COLLATE=C - -WKDIR="`pwd`" - -export SCOWL="$WKDIR/scowl/" - -ORIG="$WKDIR/orig/" -SPELLER="$SCOWL/speller" - -expand() { - grep -v '^[0-9]\+$' | $SPELLER/munch-list expand $1 | sort -u -} - -cd $SPELLER -MK_LIST="../mk-list -v1 --accents=both en_US 60" -cat < params.txt -With Input Command: $MK_LIST -EOF -# note: output of make-hunspell-dict is utf-8 -$MK_LIST | ./make-hunspell-dict -one en_US-custom params.txt > ./make-hunspell-dict.log -cd $WKDIR - -# Note: Input and output of "expand" is always iso-8859-1. -# All expanded word list files are thus in iso-8859-1. 
- -expand $SPELLER/en.aff < $SPELLER/en.dic.supp > 0-special # input: ASCII - -# input in utf-8, expand expects iso-8859-1 so use iconv -iconv -f utf-8 -t iso-8859-1 $ORIG/en_US-custom.dic | expand $SPELLER/en_US-custom.aff > 1-base.txt - -expand ../en-US.aff < ../en-US.dic > 2-mozilla.txt # input: iso-8850-1 - -# input in utf-8, expand expects iso-8859-1 so use iconv -iconv -f utf-8 -t iso-8859-1 $SPELLER/en_US-custom.dic | expand $SPELLER/en_US-custom.aff > 3-upstream.txt - -comm -23 1-base.txt 2-mozilla.txt > 2-mozilla-rem -comm -13 1-base.txt 2-mozilla.txt > 2-mozilla-add -comm -23 3-upstream.txt 2-mozilla-rem | cat - 2-mozilla-add | sort -u > 4-patched.txt - -# note: output of make-hunspell-dict is utf-8 -cat 4-patched.txt | comm -23 - 0-special | $SPELLER/make-hunspell-dict -one en_US-mozilla /dev/null - -# sanity check should yield identical results -#comm -23 1-base.txt 3-upstream.txt > 3-upstream-rem -#comm -13 1-base.txt 3-upstream.txt > 3-upstream-add -#comm -23 2-mozilla.txt 3-upstream-rem | cat - 3-upstream-add | sort -u > 4-patched-v2.txt - -expand ../en-US.aff < mozilla-specific.txt > 5-mozilla-specific - -comm -12 3-upstream.txt 2-mozilla-rem > 5-mozilla-removed -comm -13 3-upstream.txt 2-mozilla-add > 5-mozilla-added diff --git a/extensions/spellcheck/locales/en-US/hunspell/dictionary-sources/make-new-dict.sh b/extensions/spellcheck/locales/en-US/hunspell/dictionary-sources/make-new-dict.sh new file mode 100755 index 000000000000..1a190e8c5b10 --- /dev/null +++ b/extensions/spellcheck/locales/en-US/hunspell/dictionary-sources/make-new-dict.sh @@ -0,0 +1,102 @@ +#! /usr/bin/env sh + +# This Source Code Form is subject to the terms of the Mozilla Public +# License, v. 2.0. If a copy of the MPL was not distributed with this +# file, You can obtain one at http://mozilla.org/MPL/2.0/. 
+ +# This script creates a new dictionary by expanding the original, +# Mozilla's, and the upstream dictionary to remove affix flags and +# then doing the wordlist equivalent of diff3 to create a new +# dictionary. +# +# The files 2-mozilla-add and 2-mozilla-rem contain words added and +# removed, respectively in the Mozilla dictionary. The final +# dictionary will be in hunspell-en_US-mozilla.zip. + +set -e + +export LANG=C +export LC_ALL=C +export LC_CTYPE=C +export LC_COLLATE=C + +WKDIR="`pwd`" + +export SCOWL="$WKDIR/scowl/" + +ORIG="$WKDIR/orig/" +SUPPORT_DIR="$WKDIR/support_files/" +SPELLER="$SCOWL/speller" + +expand() { + grep -v '^[0-9]\+$' | $SPELLER/munch-list expand $1 | sort -u +} + +mkdir -p $SUPPORT_DIR +cd $SPELLER +MK_LIST="../mk-list -v1 --accents=both en_US 60" +cat < params.txt +With Input Command: $MK_LIST +EOF +# Note: the output of make-hunspell-dict is UTF-8 +$MK_LIST | ./make-hunspell-dict -one en_US-custom params.txt > ./make-hunspell-dict.log +cd $WKDIR + +# Note: Input and output of "expand" is always ISO-8859-1. +# All expanded word list files are thus in ISO-8859-1. +expand $SPELLER/en.aff < $SPELLER/en.dic.supp > $SUPPORT_DIR/0-special.txt + +# Input is UTF-8, expand expects ISO-8859-1 so use iconv +iconv -f utf-8 -t iso-8859-1 $ORIG/en_US-custom.dic | expand $ORIG/en_US-custom.aff > $SUPPORT_DIR/1-base.txt + +# The existing Mozilla dictionary is already in ISO-8859-1 +expand ../en-US.aff < ../en-US.dic > $SUPPORT_DIR/2-mozilla.txt + +# Input is UTF-8, expand expects ISO-8859-1 so use iconv +iconv -f utf-8 -t iso-8859-1 $SPELLER/en_US-custom.dic | expand $SPELLER/en_US-custom.aff > $SUPPORT_DIR/3-upstream.txt + +# Suppress common lines and lines only in the 2nd file, leaving words that are +# only available in the 1st file (SCOWL), i.e. were removed by Mozilla. 
+comm -23 $SUPPORT_DIR/1-base.txt $SUPPORT_DIR/2-mozilla.txt > $SUPPORT_DIR/2-mozilla-removed.txt + +# Suppress common lines and lines only in the 1st file, leaving words that are +# only available in the 2nd file (current Mozilla dictionary), i.e. were added +# by Mozilla. +comm -13 $SUPPORT_DIR/1-base.txt $SUPPORT_DIR/2-mozilla.txt > $SUPPORT_DIR/2-mozilla-added.txt + +# Suppress common lines and lines only in the 2nd file, leaving words that are +# only available in the 1st file (words from the new upstream SCOWL dictionary). +# The result is upstream, minus the words removed, plus the words added. +comm -23 $SUPPORT_DIR/3-upstream.txt $SUPPORT_DIR/2-mozilla-removed.txt | cat - $SUPPORT_DIR/2-mozilla-added.txt | sort -u > $SUPPORT_DIR/4-patched.txt + +# Note: the output of make-hunspell-dict is UTF-8 +cat $SUPPORT_DIR/4-patched.txt | comm -23 - $SUPPORT_DIR/0-special.txt | $SPELLER/make-hunspell-dict -one en_US-mozilla /dev/null + +# Exclude specific words from suggestions +while IFS= read -r line +do + # If the string already contains an affix, just add !, otherwise add /! 
+ if [[ "$line" == *"/"* ]]; then + sed -i "" "s|^$line$|$line!|" en_US-mozilla.dic + else + sed -i "" "s|^$line$|$line/!|" en_US-mozilla.dic + fi +done < "mozilla-exclusions.txt" + +# Sanity check should yield identical results +#comm -23 $SUPPORT_DIR/1-base.txt $SUPPORT_DIR/3-upstream.txt > $SUPPORT_DIR/3-upstream-removed.txt +#comm -13 $SUPPORT_DIR/1-base.txt $SUPPORT_DIR/3-upstream.txt > $SUPPORT_DIR/3-upstream-added.txt +#comm -23 $SUPPORT_DIR/2-mozilla.txt $SUPPORT_DIR/3-upstream-removed.txt | cat - $SUPPORT_DIR/3-upstream-added.txt | sort -u > $SUPPORT_DIR/4-patched-v2.txt + +expand ../en-US.aff < mozilla-specific.txt > 5-mozilla-specific.txt + +# Update Mozilla removed and added wordlists based on the new upstream +# dictionary, save them as UTF-8 and not ISO-8859-1 +comm -12 $SUPPORT_DIR/3-upstream.txt $SUPPORT_DIR/2-mozilla-removed.txt > $SUPPORT_DIR/5-mozilla-removed.txt +iconv -f iso-8859-1 -t utf-8 $SUPPORT_DIR/5-mozilla-removed.txt > 5-mozilla-removed.txt +comm -13 $SUPPORT_DIR/3-upstream.txt $SUPPORT_DIR/2-mozilla-added.txt > $SUPPORT_DIR/5-mozilla-added.txt +iconv -f iso-8859-1 -t utf-8 $SUPPORT_DIR/5-mozilla-added.txt > 5-mozilla-added.txt + +# Clean up some files +rm hunspell-en_US-mozilla.zip +rm nosug diff --git a/extensions/spellcheck/locales/en-US/hunspell/dictionary-sources/mozilla-exclusions.txt b/extensions/spellcheck/locales/en-US/hunspell/dictionary-sources/mozilla-exclusions.txt new file mode 100644 index 000000000000..eca039ff53d5 --- /dev/null +++ b/extensions/spellcheck/locales/en-US/hunspell/dictionary-sources/mozilla-exclusions.txt @@ -0,0 +1,2 @@ +nigga/SM +niggaz diff --git a/tools/lint/shellcheck.yml b/tools/lint/shellcheck.yml index d0b390d4e87e..0100e3d5cc25 100644 --- a/tools/lint/shellcheck.yml +++ b/tools/lint/shellcheck.yml @@ -2,6 +2,7 @@ shellcheck: description: Shell script linter include: + - extensions/spellcheck/locales/en-US/hunspell/dictionary-sources/ - taskcluster/docker/ exclude: [] # 1090: 
https://github.com/koalaman/shellcheck/wiki/SC1090