Quickies: Part 2 — LLM/Transformer

Muthu Arumugam
3 min read · Aug 4, 2023


Generated by Bing using DALL-E

To read the AI/LLM intro (Part 1), click here: https://medium.com/@muthuka/quickies-part-1-ai-llm-5dbaf989e620

LLMs are not new; researchers have been working on language models for a while. The most recent notable prediction model before Transformers was the RNN, the Recurrent Neural Network. We didn't have a lot of computing resources, and the model didn't understand much about context. It was reasonably successful at predicting words in a sentence, but our language is so complex and redundant that the success rate wasn't good enough.

"Attention Is All You Need," a paper published in 2017, changed these limitations. It introduced a new way for computers to understand context and to process inputs in parallel on multi-core GPUs and similar hardware. The previous generation looked at the meaning of a sentence with the sequence in mind, but the new Transformer architecture relates each word to every other word in the sentence. For example, take "John was shopping with a cart in a neighboring grocery store that evening." For humans, it's obvious that the cart belongs to the store and not to John. For computers running an RNN model, this is neither easy nor obvious.
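As a rough, hedged sketch of that core idea (not the paper's exact implementation), scaled dot-product self-attention scores every token against every other token; the dimensions below are arbitrary toy values.

```python
# Toy sketch of scaled dot-product self-attention:
# every token is scored against every other token in the sentence.
import torch
import torch.nn.functional as F

seq_len, d_model = 12, 64            # 12 tokens, 64-dimensional vectors (arbitrary sizes)
x = torch.randn(seq_len, d_model)    # stand-in for the token embeddings of one sentence

# Queries, keys, and values are projections of the same input (hence "self"-attention).
W_q, W_k, W_v = (torch.randn(d_model, d_model) for _ in range(3))
q, k, v = x @ W_q, x @ W_k, x @ W_v

scores = q @ k.T / (d_model ** 0.5)  # each word scored against every other word
weights = F.softmax(scores, dim=-1)  # attention weights sum to 1 per token
output = weights @ v                 # context-aware representation for each token
```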

Let's break down how they work. Machine Learning models are gigantic statistical calculators that deal only with numbers. We will look at a little bit of their inner workings here.

  1. Tokenizer: Break the sentence down into tokens represented as numbers (John, was, shop, ing, with, a, cart, in, neighbor, ing, …) using a tokenizer method (see the sketch after this list).
  2. Embedding Layer: Pass the tokens into a trainable vector space where the model can map and learn the meaning and context of these tokens (Word2vec used this concept).
  3. Encoder/Decoder: Pass these vectors through a multi-headed self-attention layer where each word is analyzed in different contexts, typically with 12–100 attention heads; each head learns a different aspect of the word's meaning.
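To make steps 1 and 2 concrete, here is a minimal sketch using the Hugging Face transformers library; the bert-base-uncased checkpoint is only an illustrative choice, and the exact sub-word splits will differ from the example above.

```python
# Sketch of the steps above: tokenize a sentence and look up its embeddings.
# "bert-base-uncased" is only an illustrative checkpoint.
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

sentence = "John was shopping with a cart in a neighboring grocery store that evening"

# Step 1 - Tokenizer: words and sub-words become integer token IDs.
inputs = tokenizer(sentence, return_tensors="pt")
print(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]))

# Step 2 - Embedding layer: each token ID maps to a trainable vector.
embeddings = model.get_input_embeddings()(inputs["input_ids"])
print(embeddings.shape)  # (1, number_of_tokens, hidden_size)

# Step 3 - the model config tells us how many attention heads it uses.
print(model.config.num_attention_heads)  # 12 for BERT-base
```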

The encoder encodes inputs ("prompts") with contextual understanding and produces one vector per input token. The decoder accepts input tokens and generates new tokens.
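Here is a hedged sketch of that split using an encoder-decoder model; google/flan-t5-small and the prompt text are only illustrative choices.

```python
# Sketch: the encoder emits one vector per input token; the decoder generates new tokens.
# "google/flan-t5-small" is only an illustrative checkpoint.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")

prompt = "Summarize: John was shopping with a cart in a neighboring grocery store."
inputs = tokenizer(prompt, return_tensors="pt")

# Encoder: one contextual vector per input token.
with torch.no_grad():
    encoder_output = model.get_encoder()(**inputs)
print(encoder_output.last_hidden_state.shape)  # (1, num_input_tokens, hidden_size)

# Decoder: generates new tokens from the encoded prompt.
generated_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
```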

Different types of models exist.

  1. Encoder-only models — Sequence-to-sequence models where the input and output are of the same length (e.g., BERT).
  2. Encoder-Decoder models — Sequence-to-sequence tasks where the input and output can be of different lengths (e.g., BART and FLAN-T5).
  3. Decoder-only models — The most broadly capable family, evolving to handle summarization and many other tasks (e.g., BLOOM, Jurassic, LLaMA); see the sketch after this list.
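As a rough illustration of how these three families show up in code, here is a sketch using the Hugging Face Auto classes; the specific checkpoints are only example choices.

```python
# Sketch: loading one example of each Transformer family.
# Checkpoint names are illustrative; any model of each family loads the same way.
from transformers import AutoModel, AutoModelForSeq2SeqLM, AutoModelForCausalLM

encoder_only = AutoModel.from_pretrained("bert-base-uncased")                     # e.g., BERT
encoder_decoder = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")   # e.g., FLAN-T5
decoder_only = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")      # e.g., BLOOM
```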

We can interact with these models through "prompts" written in plain English instead of a computer language. Crafting these prompts well is called Prompt Engineering.
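A minimal sketch of prompting a decoder-only model, assuming the Hugging Face pipeline API; the model choice and the prompt text are only examples.

```python
# Sketch: sending a plain-English prompt to a decoder-only model.
# "bigscience/bloom-560m" is a small illustrative checkpoint.
from transformers import pipeline

generator = pipeline("text-generation", model="bigscience/bloom-560m")
result = generator("Explain in one sentence what a Transformer model does:", max_new_tokens=30)
print(result[0]["generated_text"])
```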

This is what happened when ChatGPT let us interact with a GPT-4 model through a chatbot interface. We are beginning to see a glimpse of what these models can do. Exciting times.

To learn about Transformer/Prompt Engineering, click here.

Disclaimer: This was not generated by an AI bot. A lot of this was learned through the DeepLearning.ai course on Coursera, which is a great course. The title image was generated by DALL-E through the Bing chatbot.




Written by Muthu Arumugam

A serial entrepreneur looking into NFT and its utility
