Installation of TensorFlow and PyTorch on Windows GPU Machines

TensorFlow

"GPU support on native-Windows is only available for 2.10 or earlier versions, starting in TF 2.11, CUDA build is not supported for Windows. For using TensorFlow GPU on Windows, you will need to build/install TensorFlow in WSL2 or use tensorflow-cpu with TensorFlow-DirectML-Plugin"

(quoted from the TensorFlow documentation)

According to the website, TensorFlow v2.10 is the last version with native Windows GPU support. This version requires the following installations:

  • Python 3.10
  • CUDA v11.2
  • cuDNN v8.1

The CUDA Toolkit installer can be downloaded from this link. Select the correct version and OS and follow the instructions.

For cuDNN, the library files can be found at this link (registration is required to download). The archive contains three subfolders, namely bin, include and lib. We can simply copy these folders into the corresponding subfolders of the CUDA installation. Alternatively, we can keep them in separate folders and add the path of the bin folder to the PATH environment variable.

When that installation is done, we can install Python 3.10. Download Python from the official Python website; the one in the Microsoft Store may not work, as it reportedly runs in a sandbox and does not have access to GPU resources.

After the Python installation, we can install TensorFlow v2.10 using the following command:
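    # Constrain pip to the 2.10 line, the last with native-Windows GPU support
    pip install "tensorflow<2.11"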

After installation, we can test whether TensorFlow has access to the GPUs:
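    import tensorflow as tf

    # An empty list means TensorFlow cannot see any GPU
    print(tf.config.list_physical_devices('GPU'))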

PyTorch

The installation instructions for PyTorch can be found here. Choose the corresponding OS and CUDA version. At the time of writing, the command for CUDA v11.8 on Windows is:
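    pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118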

If the above is to be specified in requirements.txt, the following lines can be added:
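    # Use the CUDA 11.8 wheel index in addition to PyPI (exact pins may vary)
    --extra-index-url https://download.pytorch.org/whl/cu118
    torch
    torchvision
    torchaudio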

After installation, it can be tested using the following commands:
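    import torch

    # True if a CUDA-capable GPU is visible to PyTorch
    print(torch.cuda.is_available())
    # Name of the first CUDA device (raises an error if none is available)
    print(torch.cuda.get_device_name(0))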

Notes on NLP

Tokenization

Tokenization is the process of breaking down raw text data into smaller, meaningful units called tokens. These tokens are used as the basic building blocks for natural language processing (NLP) tasks such as language modeling, text classification, and machine translation. Tokenization can be performed using various techniques, such as whitespace tokenization, regular expression tokenization, and subword tokenization. The choice of tokenization technique depends on the specific use case and the language being processed. The resulting tokens are then used to create a vocabulary that maps each token to a unique integer index, which is used by machine learning algorithms to process text data.
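As a toy sketch of the token-to-index mapping described above (whitespace tokenization only; the sample text is illustrative):

    # Build a tiny vocabulary from whitespace tokens
    text = "the cat sat on the mat"
    tokens = text.split()
    vocab = {tok: idx for idx, tok in enumerate(sorted(set(tokens)))}
    ids = [vocab[tok] for tok in tokens]
    print(vocab)  # {'cat': 0, 'mat': 1, 'on': 2, 'sat': 3, 'the': 4}
    print(ids)    # [4, 0, 3, 2, 4, 1]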

Tokenization Methods

  1. Word-based tokenization: This method splits text into words based on spaces and punctuation. It’s the most basic form of tokenization and is used in many NLP tasks. However, it can be problematic for languages with no clear word boundaries, such as Chinese or Japanese.
  2. Subword-based tokenization: This method splits text into subwords based on their frequency of occurrence in the training data. Subwords are parts of words that frequently occur together, such as prefixes, suffixes, and common word fragments. This method can handle words that are not in the dictionary, and it’s commonly used in transformer-based models like BERT and GPT-2.
  3. Character-based tokenization: This method splits text into individual characters. It’s useful for languages where there are no clear word boundaries, but it can produce longer sequences than word-based or subword-based tokenization.
  4. Byte-pair encoding (BPE) tokenization: This is a subword method that learns its subwords from the training corpus rather than using a pre-defined list. It starts by treating each character as a separate token and then iteratively merges the most frequent pairs of tokens until a target vocabulary size is reached.
  5. SentencePiece tokenization: SentencePiece is a tokenizer that implements subword algorithms (BPE and unigram) directly on raw text, treating whitespace as an ordinary symbol. Because it does not rely on pre-tokenized words, it can handle languages with complex writing systems like Chinese and Japanese.
  6. Unigram tokenization: Unlike BPE, which builds a vocabulary by merging, this method starts from a large candidate vocabulary and iteratively removes the tokens that contribute least to the likelihood of the training corpus under a unigram language model, until the target vocabulary size is reached.

Building Dictionary Using Tokenizer Function of HuggingFace
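A minimal sketch using the HuggingFace transformers library; the checkpoint bert-base-uncased and the sample sentence are just examples:

    from transformers import AutoTokenizer

    # Load a pretrained tokenizer (downloads the vocabulary on first use)
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

    tokens = tokenizer.tokenize("Tokenization breaks raw text into tokens.")
    ids = tokenizer.convert_tokens_to_ids(tokens)

    print(tokens)                      # subword tokens, e.g. ['token', '##ization', ...]
    print(ids)                         # each token mapped to its vocabulary index
    print(len(tokenizer.get_vocab()))  # size of the dictionary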

v4l2, gstreamer, v4l2loopback, ffmpeg

v4l2
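A common starting point is the v4l2-ctl utility from the v4l-utils package; the device path below is an example:

    # List video devices and their /dev/video* nodes
    v4l2-ctl --list-devices

    # Show the formats and resolutions supported by a device
    v4l2-ctl -d /dev/video0 --list-formats-ext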

gstreamer

Installation
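On Ubuntu/Debian, a typical set of packages (the selection may vary by need) is:

    sudo apt-get install gstreamer1.0-tools gstreamer1.0-plugins-base \
        gstreamer1.0-plugins-good gstreamer1.0-plugins-bad gstreamer1.0-plugins-ugly

    # Quick sanity check: render a test pattern
    gst-launch-1.0 videotestsrc ! autovideosink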

v4l2loopback

Reference: https://github.com/umlaeute/v4l2loopback

Clone the physical camera to a virtual camera for multiple access:
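One typical recipe; the device numbers and label are examples:

    # Load the module, creating one virtual device at /dev/video10
    sudo modprobe v4l2loopback devices=1 video_nr=10 card_label="VirtualCam" exclusive_caps=1

    # Mirror the physical camera /dev/video0 into the virtual device
    ffmpeg -f v4l2 -i /dev/video0 -f v4l2 /dev/video10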

ffmpeg
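Some common ffmpeg one-liners in this context; device and file names are examples:

    # Record 10 seconds from a V4L2 camera to a file
    ffmpeg -f v4l2 -i /dev/video0 -t 10 output.mp4

    # Inspect the streams of a media file
    ffprobe output.mp4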

Git Commands

General
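A few everyday commands as a refresher; branch and remote names are examples:

    git status                     # working-tree state
    git add -A                     # stage all changes
    git commit -m "message"        # commit staged changes
    git pull --rebase origin main  # update from the remote
    git push origin main           # publish local commits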

Recreate the master/main
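One way to recreate a branch with fresh history is via an orphan branch. This is destructive and force-pushes; the branch name main is an example:

    # Create a branch with no history and commit the current tree
    git checkout --orphan fresh-main
    git add -A
    git commit -m "Initial commit"

    # Replace the old main with the new branch
    git branch -D main
    git branch -m main
    git push -f origin main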

Docker Notes

Enter the container:
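    # Open an interactive shell in a running container (the name is an example)
    docker exec -it my_container /bin/bash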

Docker Installation on Ubuntu
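Two common options; the convenience script is the quickest (inspect it before running):

    # Option 1: Docker's convenience script
    curl -fsSL https://get.docker.com -o get-docker.sh
    sudo sh get-docker.sh

    # Option 2: the Ubuntu-packaged version
    sudo apt-get update
    sudo apt-get install docker.io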

Docker Offline Deployment
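The usual approach is to save images on a connected machine and load them on the offline one; the image name is an example:

    # On a machine with internet access
    docker pull ubuntu:22.04
    docker save -o ubuntu_22.04.tar ubuntu:22.04

    # Copy the tar file over, then on the offline machine
    docker load -i ubuntu_22.04.tar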

Multiple IP Addresses on Windows 10

Open a command prompt with administrative rights and add the extra address with netsh:
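    # Interface name and addresses are examples; adjust to your setup
    netsh interface ipv4 add address "Ethernet" 192.168.1.50 255.255.255.0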

Setting Proxy in Ubuntu

apt
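For apt, the proxy can be set in a configuration file; the proxy address below is a placeholder:

    // /etc/apt/apt.conf.d/proxy.conf
    Acquire::http::Proxy "http://proxy.example.com:8080/";
    Acquire::https::Proxy "http://proxy.example.com:8080/";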

snap
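For snap, the proxy is set through system settings; again the address is a placeholder:

    sudo snap set system proxy.http="http://proxy.example.com:8080"
    sudo snap set system proxy.https="http://proxy.example.com:8080"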

Change Default Username of Raspberry Pi

The Raspbian OS comes with a default username ‘pi’. For security reasons, it is suggested to change the username. This article shows how to change it.

Since pi is the only account after the installation and we cannot rename an account while it is logged in, we have to log in as root. By default, login for the root account is disabled. We enable it with the following steps:

Set a password for root:
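    sudo passwd root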

Open the SSH configuration file:
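    # any editor works; nano is just an example
    sudo nano /etc/ssh/sshd_config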

Find the line for PermitRootLogin and set it to yes:
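    PermitRootLogin yes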

Restart the SSH service to apply changes:
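    sudo systemctl restart ssh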

Log out user pi and log in as root:
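    # e.g. over SSH; the hostname is the Raspberry Pi default and may differ
    ssh root@raspberrypi.local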

Change the default username:
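    # newname is a placeholder for the username you want
    usermod -l newname pi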

Rename the home directory:
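    # Move the home directory and update the account to match
    usermod -d /home/newname -m newname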

Change the group name:
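    groupmod -n newname pi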

Reboot and done.

To disable root login:
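    # Set PermitRootLogin back (e.g. to no) in /etc/ssh/sshd_config,
    # restart the SSH service, and lock the root password again
    sudo systemctl restart ssh
    sudo passwd -l root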
