Command Line Interface

The command line interface (CLI) is the primary way of using mltype. After installation, one can use the mlt entry point, which will be available on the PATH.

$ mlt
Usage: mlt [OPTIONS] COMMAND [ARGS]...

  Tool for improving typing speed and accuracy

Options:
  --help  Show this message and exit.

Commands:
  file    Type text from a file.
  ls      List all language models
  random  Sample characters randomly from a provided vocabulary
  raw     Provide text manually
  replay  Compete against a past performance
  sample  Sample text from a language
  train   Train a language

Note that mltype uses the folder ~/.mltype (in the home directory) for storing all relevant data. The usual structure is shown below.

- .mltype/
   - config.ini
   - checkpoints/
       - a/  # training checkpoints of model a
       - b/  # training checkpoints of model b
   - languages/
       - a  # some model
       - b  # some other model
       ...
   - logs/
      ..

file

Type random (or fixed) lines from a text file. This command has two main modes:

  1. Random lines - Select random consecutive lines. One needs to specify --n-lines and optionally --random-state (for reproducibility).

  2. Fixed lines - One needs to specify --start-line and --end-line.

Arguments

  • PATH - Path to the text file to read from

Options

  • -e, --end-line INTEGER - The end line of the excerpt to use. Needs to be used together with start-line.

  • -f, --force-perfect - All characters need to be typed correctly

  • -i, --instant-death - End game after the first mistake

  • -l, --n-lines INTEGER - Number of consecutive lines to be selected at random. Cannot be used together with start-line and end-line.

  • -o, --output-file PATH - Path to where to save the result file

  • -r, --random-state INTEGER - Random state for reproducible results

  • -s, --start-line INTEGER - The start line of the excerpt to use. Needs to be used together with end-line.

  • -t, --target-wpm INTEGER - The desired speed to be shown as a guide

  • -w, --include-whitespace - Include whitespace characters.

Examples

Let us first create a text file

echo $'zeroth\nfirst\nsecond\nthird\nfourth\nfifth\nsixth' > text.txt
cat text.txt
zeroth
first
second
third
fourth
fifth
sixth

To select contiguous lines randomly, one can specify -l, --n-lines, representing the number of lines to use.

mlt file -l 2 text.txt

This opens the typing interface with 2 random contiguous lines

second third

The other option is to use the deterministic mode and select the starting and ending lines manually

mlt file -s 0 -e 3 text.txt
zeroth first second

As with most commands, one can specify a target speed and an output file. Note that we follow the Python convention: line counting starts from zero and the intervals contain the starting line but not the ending one.
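
For instance, a run with a 60 WPM guide whose result is saved for a later replay might look like this (the output file name file_replay is purely illustrative):

mlt file -l 2 -t 60 -o file_replay text.txt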

Note that one can keep the whitespace characters (including newlines) in the text by adding the -w, --include-whitespace option

mlt file -l 2 -w text.txt
second
third

ls

List available language models. One can use them with sample.

Please check the official mltype GitHub repository to download pretrained models.

Note

mlt ls simply lists all the files present in ~/.mltype/languages.

Examples

mlt ls
python
some_amazing_model
wikipedia

random

Generate a random sequence of characters based on the provided counts. The absolute counts are converted to relative counts (a probability distribution) that we sample from.

Note

mlt random samples characters independently, unlike mlt sample, which conditions on previous characters.

Arguments

  • CHARACTERS - Characters to include in the vocabulary. The higher the number of occurrences of a given character, the higher the probability of this character being sampled.

Options

  • -f, --force-perfect - All characters need to be typed correctly

  • -i, --instant-death - End game after the first mistake

  • -n, --n-chars INTEGER - Number of characters to sample

  • -o, --output-file PATH - Path to where to save the result file

  • -t, --target-wpm INTEGER - The desired speed to be shown as a guide

Examples

Let’s say we want to practise typing digits. However, we would like to spend more time on 5’s and 6’s since they are harder.

mlt random "123455556666789    "

This would give us something like the following.

546261561 3566  53 5496 556659554 435 1386559569  5 85641553465118589

We see that the most frequent characters are 5’s, 6’s and spaces.
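
To control the length of the exercise, one can additionally pass -n, --n-chars. A sketch:

mlt random -n 100 "123455556666789    "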

raw

Provide text manually.

Arguments

  • TEXT - Text to be transferred to the typing interface

Options

  • -f, --force-perfect - All characters need to be typed correctly

  • -i, --instant-death - End game after the first mistake

  • -o, --output-file PATH - Path to where to save the result file

  • -r, --raw-string - If active, then newlines and tabs are not seen as special characters

  • -t, --target-wpm INTEGER - The desired speed to be shown as a guide

Examples

Let’s say we have some text in the clipboard that we just paste and type. Additionally, we want to see the 80 word per minute (WPM) marker. Lastly, no errors are acceptable—instant death mode.

mlt raw -i -t 80 "Hello world I will write you quickly"
Hello world I will write you quickly
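
To keep a record of a run for the replay command described next, we could add the -o, --output-file option (the file name replay_file is just an example):

mlt raw -o replay_file "Some text we already typed before."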

replay

Play against a past performance. To save a past performance, one can use the -o, --output-file option of the file, random, raw, and sample commands.

Arguments

  • REPLAY_FILE - Past performance to play against

Options

  • -f, --force-perfect - All characters need to be typed correctly

  • -i, --instant-death - End game after the first mistake

  • -t, --target-wpm INTEGER - The desired speed to be shown as a guide

  • -w, --overwrite - Overwrite the replay file in place if faster

Examples

We ran mlt sample ... -o replay_file and we are not particularly happy with the performance. We would like to replay the same text and try to improve our speed. In case we do, we would like the replay_file to be updated automatically (using the -w, --overwrite option).

mlt replay -w replay_file
Some text we already typed before.
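
The usual gameplay options work for replays too. For instance, to race the past performance with an 80 WPM guide while requiring every character to be typed correctly, one might run:

mlt replay -f -t 80 replay_file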

sample

Generate text using a character-level language model.

Note

As opposed to mlt random, the mlt sample command takes all previous characters into consideration and can therefore generate more realistic text.

To see all the available models, use ls. Please check the official mltype GitHub repository to download pretrained models.

Arguments

  • MODEL_NAME - Name of the language model

Options

  • -f, --force-perfect - All characters need to be typed correctly

  • -i, --instant-death - End game after the first mistake

  • -k, --top-k INTEGER - Consider only the top k most probable characters

  • -n, --n-chars INTEGER - Number of characters to generate

  • -o, --output-file PATH - Path to where to save the result file

  • -r, --random-state INTEGER - Random state for reproducible results

  • -s, --starting-text TEXT - Initial text used as a starting condition

  • -t, --target-wpm INTEGER - The desired speed to be shown as a guide

  • -v, --verbose - Show a progress bar when generating text

Examples

We want to practise typing Python without having to worry about finding real source code. Assuming we have a decent language model for Python (see train) called amazing_python_model, we can do the following

mlt sample amazing_python_model
spatial_median(X, method="lar", call='Log', Cov']) glm.fit(X, y) assert_all
close(ref_no_encoded_c

Maybe we would like to give the model some initial text and let it complete it for us.

mlt sample -s "@pytest.mark.parametrize" amazing_python_model
@pytest.mark.parametrize('solver', ['sparse_cg', 'sag', 'saga'])
@pytest.mark.parametrize('copy_X', ['not a number', -0.10]]
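
If we wanted a longer, reproducible exercise, we could additionally combine -n, -k and -r (the values below are arbitrary):

mlt sample -n 300 -k 5 -r 42 amazing_python_model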

train

Train a character-level language model. The trained model can then be used with sample.

In the background, we use an LSTM together with a feedforward network to achieve this task. The user can set most of the important hyperparameters via the CLI options. Note that one can train without a GPU; however, to get access to bigger networks and faster training (on the order of minutes to hours), a GPU is recommended.

Arguments

  • PATH_1, PATH_2, … - Paths to files or folders containing text to be trained on

  • MODEL_NAME - Name of the trained model

Options

  • -b, --batch-size INTEGER - Number of samples in a batch

  • -c, --checkpoint-path PATH - Load a checkpoint and continue training it

  • -d, --dense-size INTEGER - Size of the dense layer

  • -e, --extensions TEXT - Comma-separated list of allowed extensions

  • -f, --fill-strategy TEXT - Either zeros or skip. Determines how to deal with out-of-vocabulary characters

  • -g, --gpus INTEGER - Number of GPUs. If not specified, then none. If -1, then all.

  • -h, --hidden-size INTEGER - Size of the hidden state

  • -i, --illegal-chars TEXT - Characters to exclude from the vocabulary.

  • -l, --n-layers INTEGER - Number of layers in the recurrent network

  • -m, --use-mlflow - Use MLflow for logging

  • -n, --max-epochs INTEGER - Maximum number of epochs

  • -o, --output-path PATH - Custom path where to save the trained models and logging details. If not provided, it defaults to ~/.mltype.

  • -s, --early-stopping - Enable early stopping based on validation loss

  • -t, --train-test-split FLOAT - Train/test split; a value in the interval (0, 1)

  • -v, --vocab-size INTEGER - Number of the most frequent characters to include in the vocabulary

  • -w, --window-size INTEGER - Number of previous characters to consider for prediction

Examples

Let’s assume we have the full text of a book saved in the book.txt file. Our goal would be to train a model that learns the language used in this book and is able to sample new pieces of text that resemble the original.

See below a list of hyperparameters that work reasonably well and allow the training to be done in a few hours (on a GPU)

  • --batch-size 128

  • --dense-size 1024

  • --early-stopping

  • --gpus 1

  • --hidden-size 512

  • --max-epochs 10

  • --n-layers 3

  • --vocab-size 70

  • --window-size 100

So overall the command looks like

mlt train book.txt cool_model -n 10 -s -g 1 -b 128 -l 3 -h 512 -d 1024 -w 100 -v 70

During the training, one can see progress bars and the training and validation losses (using pytorch-lightning in the background). Once the training is done, the best model (based on the validation loss) will be stored in ~/.mltype/languages/cool_model.
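
If the training gets interrupted, one can continue from a saved checkpoint via the -c, --checkpoint-path option. Checkpoints live under ~/.mltype/checkpoints/cool_model/; the file name below is hypothetical, so look inside that folder for the actual one:

mlt train book.txt cool_model -c ~/.mltype/checkpoints/cool_model/some_checkpoint.ckpt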

There are several important customizations that one should be aware of.

Using MLflow

If one wants to get more training progress information, there is the --use-mlflow flag (which requires mlflow to be installed). To launch the UI, run the following commands

cd ~/.mltype/logs
mlflow ui

Multiple files

mlt train supports training from multiple files and folders. This is really useful if we want to recursively create a training set from all files in a given folder (e.g. a GitHub repository). Additionally, one can use the --extensions option to control which files are considered when traversing a folder.

mlt train main.py folder_with_a_lot_of_files model --extensions ".py"

The above command will create a training set out of all files inside the folder_with_a_lot_of_files folder that have the “.py” suffix, plus main.py itself.
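
Since --extensions accepts a comma-separated list, several suffixes can be allowed at once, for example both Python and text files:

mlt train main.py folder_with_a_lot_of_files model --extensions ".py,.txt"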

Excluding undesirable characters

If the input files contain some characters that we do not want the model to have in its vocabulary, we can simply use the --illegal-chars option. Internally, when an out-of-vocabulary character is encountered, there are two strategies to handle it (controlled via --fill-strategy)

  • zeros - a vector of zeros is used

  • skip - only consider samples that do not have out-of-vocabulary characters anywhere in their window

mlt train book.txt cool_model --illegal-chars '~{}`[]'
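
If we would rather drop any sample whose window contains an out-of-vocabulary character than zero-fill it, the two options can be combined like this (a sketch):

mlt train book.txt cool_model --illegal-chars '~{}`[]' --fill-strategy skip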

Configuration file

mltype supports a configuration file that can be used for the following tasks.

  1. Setting reasonable defaults for any of the CLI commands

  2. Defining custom parameters that cannot be set via the CLI

The configuration file is optional and one does not have to create it. By default, it should be located at ~/.mltype/config.ini. One can also pass it dynamically via the --config option, which is available for all commands.
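
For example, assuming a custom configuration saved at /tmp/my_config.ini (an arbitrary path) and the python model from the ls example above, one could run:

mlt sample --config /tmp/my_config.ini python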

See below an example configuration file.

[general]
models_dir = /home/my_models
color_default_background = terminal
color_wrong_foreground = yellow

[sample]
# one needs to use underscores instead of hyphens
n_chars = 500
target_wpm = 70

[raw]
instant_death = True

General section

The general section can be used for defining special parameters that cannot be set via the options of the CLI. Below is a complete list of valid parameters.

  • models_dir: Alternative location of the language models. The default directory is ~/.mltype/languages. It influences the behavior of ls and sample.

  • color_default_background: Background color of a default character. Note that this is either a character that has not been typed yet or one that was backspaced (error correction).

  • color_default_foreground: Foreground (font) color of a default character

  • color_correct_background: Background color of a correct character

  • color_correct_foreground: Foreground color of a correct character

  • color_wrong_background: Background color of wrong character

  • color_wrong_foreground: Foreground color of a wrong character

  • color_replay_background: Background color of a replay character

  • color_replay_foreground: Foreground color of a replay character

  • color_target_background: Background color of a target character

  • color_target_foreground: Foreground color of a target character

Note

Available colors

  • terminal - the color is inherited from the terminal

  • black

  • red

  • green

  • yellow

  • blue

  • magenta

  • cyan

  • white

Other sections

All the other section names are identical to the command names, that is

  • file

  • ls

  • random

  • raw

  • replay

  • sample

  • train

Note that if the same option is specified both in the configuration file and as a CLI option, the CLI value takes precedence.

Note

Formatting rules

  • The section names and parameter names are case insensitive

  • One needs to use underscores instead of hyphens
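
As an illustration of these rules, the two snippets below are equivalent, since section names are case insensitive (the 300-character value is arbitrary):

[sample]
n_chars = 300

[SAMPLE]
n_chars = 300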