Classes

Class 12 - 4/21

Creating user interfaces with React

What is a single page app?

“A single-page application (SPA) is a web application or web site that interacts with the user by dynamically rewriting the current page rather than loading entire new pages from a server. This approach avoids interruption of the user experience between successive pages, making the application behave more like a desktop application.”

React is a JavaScript library from Facebook designed for building interactive UIs.

https://reactjs.org/

Other libraries for single page apps

Current frameworks

The latest trend in web development is to use something called a framework. A web application framework is designed to facilitate the development of dynamic websites - sites where the content can change without a full page reload.

Examples of sites using web frameworks:

Advantages:

Disadvantages:

Component-based workflow: your user interface is a collection of components. This is really great for staying organized when building UIs. React uses a virtual DOM to run very fast and render only the parts of the site that have changed.

Getting set up:

The back end for all of this is Node.js. Node.js is JavaScript that runs server-side using Chrome’s V8 engine. To start, we need to install Node:

$ brew install node

This should give us access to npm and npx - the Node.js package manager and package runner. Node Package Manager (npm) is where all our different frameworks and JavaScript libraries live - you can think of it as pip for JavaScript. You might also see another package manager called Yarn - this was created by Facebook to deal with issues with npm that have since been solved. You can use npm in place of Yarn - more here. I’ve been having issues with npm on create-react-app, so I’d advise using Yarn instead. The commands are mostly the same.

We will be using a starter command called create-react-app. It’s a really great way to get everything set up with minimal effort. You can read more here

Simply run:

$ npx create-react-app my-app

If you get a cryptic error like I did, you can run the following command:

$ npm i babel-preset-react-app@7.0.0

React uses something called JSX - a combination of HTML and JavaScript. You can use React without JSX, but it isn’t recommended.

You can find the code from the class here

Resources:

Class 11 - 4/14

Using WebGL for interactive web spaces

Making web apps

What makes up a website?

HTML

CSS

JavaScript

Why use WebGL?

For this class, we will look at using PIXI.js, a performant game engine which can also be used for interactive browser-based 2D graphics. You can read more about it here

First, we bulk resize our images using the jupyter notebook for resizing images.

We also need to export the positions from our audio or image notebook. Both have been updated to export the JSON from those files.
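If you want a rough idea of what those two steps do, here is a minimal Python sketch, assuming Pillow is installed - the folder names, target size, and placeholder positions are hypothetical, not the values from the class notebooks:

import json
from pathlib import Path
from PIL import Image

SRC = Path("images")     # hypothetical source folder
DST = Path("resized")    # hypothetical output folder
DST.mkdir(exist_ok=True)

# Bulk resize: shrink every image so it fits inside 256 x 256
for path in SRC.glob("*.jpg"):
    img = Image.open(path)
    img.thumbnail((256, 256))  # preserves aspect ratio
    img.save(DST / path.name)

# Export positions (e.g. from the audio or image notebook) as JSON for PIXI.js to load
positions = [[0.1, 0.2], [0.8, 0.5]]  # placeholder values
with open("positions.json", "w") as f:
    json.dump(positions, f)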

You can find the code here

Additional resources:

Class 10 - 4/7

Semantic organization of text

Analyzing text:

Bag of words model:

Bag of words (BOW) is a representation of text that describes the occurrence of words within a document.

It measures:

It’s called bag of words because we discard the structure of the text and focus only on whether or not words appear in the document and how frequently - not where they occur or in what order.

There are a few different ways to approach bag of words:

Count Occurrence

Let’s look at an example:

We can look at each line as a “document”. First - what is the unique vocabulary of this document? (ignoring case and punctuation)

This is a unique vocabulary of 8 words out of a corpus of 20

Next we create document vectors - the goal is to turn each document into a vector that we can use as input or output for a model. We have a vocabulary of 8, so our vector length is 8. Thus, our first document becomes:

[1, 1, 1, 1, 1, 0, 0, 0]

For a large corpus, we get something called a sparse vector - the vocabulary is huge, so the vectors are mostly zeros, and sparse vectors are harder to compute with. We can shrink the vocabulary by ignoring stop words, punctuation, and case, fixing misspellings, and stemming (playing -> play). The problem with this approach is that high-frequency words can dominate the model and cause bias. Think about it - less frequent words might be more important.
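As a minimal sketch of the count-occurrence approach, here is scikit-learn’s CountVectorizer on two stand-in documents (not the exact lines from class):

from sklearn.feature_extraction.text import CountVectorizer

# Each string is one "document"
docs = [
    "It was the best of times",
    "it was the worst of times",
]

vectorizer = CountVectorizer()           # lowercases and strips punctuation by default
counts = vectorizer.fit_transform(docs)  # sparse matrix: one row per document

print(vectorizer.get_feature_names_out())  # the unique vocabulary
print(counts.toarray())                    # the count vector for each document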

TF-IDF

TF-IDF takes another approach - high-frequency words may not provide much information gain. In other words, rare words contribute more weight to the model.

Term Frequency is a scoring of how frequent the word is in the current document; Inverse Document Frequency is a scoring of how rare the word is across documents.

Together they form a weighting in which not all words are equally important or interesting.

Term Frequency (TF): a scoring of the frequency of the word in the current document. Since every document is different in length, a term may appear many more times in long documents than in short ones, so the term frequency is often divided by the document length to normalize it.

       number of times term t appears in the document
tf =  ------------------------------------------------
           total number of terms in the document


Inverse Document Frequency (IDF): a scoring of how rare the word is across documents. The rarer the term, the higher the IDF score.

                   total number of documents
idf = log_e( ---------------------------------------- )
              number of documents with term in it


TF-IDF is the product of these two factors:

tfidf = tf * idf
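Here is a minimal sketch of these formulas in plain Python, using a hypothetical three-document corpus:

import math

docs = [
    "it was the best of times".split(),
    "it was the worst of times".split(),
    "it was the age of wisdom".split(),
]

def tf(term, doc):
    # number of times the term appears / total number of terms in the document
    return doc.count(term) / len(doc)

def idf(term, corpus):
    # log_e of (total documents / documents containing the term)
    containing = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / containing)

def tfidf(term, doc, corpus):
    return tf(term, doc) * idf(term, corpus)

print(tfidf("best", docs[0], docs))  # rare word -> higher score
print(tfidf("the", docs[0], docs))   # appears in every document -> idf is 0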

Doc2vec

Distributed Representations of Sentences and Documents - introduced in 2014, https://arxiv.org/abs/1405.4053. We will use it to perform feature extraction on a corpus. Unlike bag of words models, the idea with Doc2Vec is to maintain the ordering and semantics of the words. This should give us better features than TF-IDF.

The idea for Doc2Vec started with Word2Vec. Word2Vec is a three-layer neural network with one input, one hidden, and one output layer. The input layer corresponds to signals for the context (surrounding words) and the output layer corresponds to signals for the predicted target word. As the training procedure repeats this process over a large number of sentences (or phrases), the weights “stabilize”. These weights are then used as the vectorized representations of words.
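As a sketch of how feature extraction with Doc2Vec might look, here is a minimal example using gensim - the toy corpus, vector size, and epoch count are placeholders, not the settings from class:

from gensim.models.doc2vec import Doc2Vec, TaggedDocument

corpus = [
    "it was the best of times",
    "it was the worst of times",
    "it was the age of wisdom",
]

# Each document gets a tag so the model can learn a vector for it
tagged = [TaggedDocument(words=doc.split(), tags=[i]) for i, doc in enumerate(corpus)]

model = Doc2Vec(tagged, vector_size=50, window=2, min_count=1, epochs=40)

# Infer a fixed-length feature vector for a new (or existing) document
features = model.infer_vector("the best of times".split())
print(features.shape)  # (50,)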

Additional Resources:

Class 7 - 3/10

Feature extraction and exploration methodologies

What is feature extraction?

KMeans Clustering

UMAP

Additional resources:

Class 6 - 3/3

Multilayer Perceptrons and Gradient Descent

what are we covering today?

Plain vanilla neural networks, aka the multilayer perceptron

We are going to use TensorFlow 2.0 - just released!

MNIST: 28 x 28 pixel images - 784 numbers total per image

each number holds the grayscale value of that pixel (0 - 1)

general structure of the network:

input -> hidden layer -> output

so how does this work?

when the network sees some specific features, certain parts of it activate in response

just like our perceptron, we have weights for each connection

we take the weighted sum, and then calculate our activation. We want our activations to be between 0 and 1, so we use the sigmoid function: 1 / (1 + e^-x). The activation is then a measure of how positive the weighted sum is.
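Here is a minimal sketch of this input -> hidden -> output structure in TensorFlow 2 / Keras - the hidden-layer size and number of epochs are arbitrary choices, not the ones from the class notebook:

import tensorflow as tf

# MNIST: 28 x 28 grayscale images, scaled to the 0 - 1 range
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),     # 784 inputs
    tf.keras.layers.Dense(128, activation="sigmoid"),  # hidden layer
    tf.keras.layers.Dense(10, activation="softmax"),   # one output per digit
])

# "sgd" is stochastic gradient descent, covered below
model.compile(optimizer="sgd",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)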

Stochastic Gradient Descent:

Some terminology:

Additional resources:

Class 5 - 2/17

Datasets and Scraping

What are datasets? How are they used?

Image Datasets:

Recommendation Systems:

Music:

Other interesting datasets:

Scraping datasets

Can’t find a way to download what you want? First try seeing if anyone else has downloaded it yet. Is there a download functionality? Try contacting someone and seeing if they’ll share the data.

Is scraping legal?

You still need to keep copyright in mind - you could violate copyright by redisplaying content.

Class 4 - 2/10

Perceptrons

Earlier we covered machine learning methods such as linear regression and KNN.

Supervised Machine Learning is all about ‘learning’ a function given a training set of examples.

Machine learning methods should derive a function that can generalize well to inputs not in the training set, since then we can actually apply it to inputs for which we do not have an output.

In this class we are going to cover the simplest model of an artificial neural network out there.

What is an artificial neural network (ANN)? What is a perceptron?

These all started as attempts to mathematically model the neuron. The question generally was: what kind of computational systems can be made that are inspired by the biological neuron?

The perceptron is a form of supervised learning that can differentiate between linearly separable datasets.
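Here is a minimal numpy sketch of the perceptron learning rule on a tiny linearly separable dataset - the data, learning rate, and number of epochs are just illustrative:

import numpy as np

# Toy linearly separable data (an AND-like function)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])

weights = np.zeros(2)
bias = 0.0
lr = 0.1  # learning rate

for epoch in range(20):
    for xi, target in zip(X, y):
        # step activation on the weighted sum
        prediction = 1 if np.dot(weights, xi) + bias > 0 else 0
        # perceptron update rule: nudge the weights by the error
        error = target - prediction
        weights += lr * error * xi
        bias += lr * error

print(weights, bias)  # defines a line separating the two classes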

Further material:

Watch:

Read:

Class 3 - 2/3

Vectors and classification

What is a vector?

A vector’s magnitude is its length - its size. A vector is a set of instructions on how to get from the tail to the tip.

Why vectors?

Vectors can be used for physics: they can describe movement, static forces, and so many other things. Vectors can also be used to describe position in a space. When we talk about space we often think of 2D or 3D, but vectors can exist in many more dimensions. Sometimes it just makes sense to pair a bunch of numbers together - there are operations we want to perform on all of them at once, so we consider them together as a vector.
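A quick numpy sketch of a few of these operations, using arbitrary example vectors:

import numpy as np

a = np.array([3.0, 4.0])
b = np.array([1.0, 2.0])

print(a + b)              # vector addition
print(2 * a)              # scalar multiplication
print(np.linalg.norm(a))  # magnitude (length): 5.0
print(np.dot(a, b))       # dot product: 11.0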

Vector math we reviewed:

Machine learning:

What is machine learning?

Supervised learning:

Unsupervised learning:

Reinforcement learning

Classification and regression

Both involve making a prediction based on input data

Classification

Regression - continuous prediction - for example, the percentage of an emotion present, or age

KNN - K Nearest Neighbor
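Here is a rough sketch of KNN classification with scikit-learn - the toy data and the choice of k are placeholders, not the values from the class notebook:

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy 2D points belonging to two classes
X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y = np.array([0, 0, 0, 1, 1, 1])

knn = KNeighborsClassifier(n_neighbors=3)  # k = 3 nearest neighbors
knn.fit(X, y)

# A new point is labeled by a majority vote of its 3 nearest neighbors
print(knn.predict([[4, 4]]))  # -> [1]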

The notebook from the class should be listed on the “notebooks” page.

More reading:

Class 2 - 1/27

Using pip (Python’s package manager) we installed Jupyter notebook and other dependencies:

$ pip3 install jupyterlab
$ pip3 install matplotlib

We then run the command below to open a jupyter notebook. Make sure you are in the correct directory when you run this command.

$ jupyter notebook

You can review the in class exercises we did here:

Some recommended resources:

Class 1 - 1/20

Introductory class. We began by having everyone answer the following questions:

This class is about using machine learning to explore archives.

The data problem - we have more information now than ever before, and it is growing exponentially.

Cultural institutions are digitizing and open sourcing massive collections. Beyond this, what is a dataset? Perhaps shopping inventories are datasets? Movies are a dataset. Google photos is a dataset.

There are growing datasets that we want to be able to search on a semantic basis.

At the same time, people expect better and better digital experiences. Companies like Google, Uber, etc are spending tons on design to create the best experiences they can. They are setting the bar for online experiences.

Design is about sorting/organizing/making information digestible. We are designing the experience of exploring a dataset using machine learning as a design tool.

A main concept for the class is using machine learning and AI to augment or facilitate, not to create.

We looked at the following references (the first six do not use ML):

And more listed here

What is scripting?

Python is a scripting language that we will be using in this class. It is a highly versatile, high-level, general-purpose programming language that is quickly becoming one of the most popular languages.

We covered the terminal in advance of Python (next class)

Quick guide here

Important commands to be familiar with:

Using the terminal, we installed Homebrew

First, make sure you have the Xcode command-line tools installed. Note - this is not the Xcode editor. These are separate tools that run from your command line. Run the command below (without the $ at the beginning of the line):

$ xcode-select --install

Next, we install Homebrew by running the following command:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Run the following command once you’re done to ensure Homebrew is installed and working properly:

$ brew doctor

Next, we install Python

$ brew install python

Check your install with the following command

$ python --version

It should report a version of Python 3 or higher.