Fast TFIDF Vectorization / model

It makes sense to kick off quickly with a simpler, faster model.
To do so, we need to:

- [ ] Improve the tokenizer
- [ ] Store different vector types in the table (which entails different model types and a more relational way to think of predictions?)
- [ ] Calculate stop words and store them in the db (make sure they update when a user adds data)
- [ ] Account for vocab / vector size
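
A rough sketch of how the tokenizer, stop-word, and vocab-size items could fit together, using only the standard library. Everything here is an assumption for illustration, not the project's actual code: the function names (`tokenize`, `compute_stop_words`, `build_vocab`, `vectorize`), the regex tokenizer, the `max_df` cutoff for stop words, the `max_features` cap, and the smoothed IDF formula. Database storage is left out.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer: lowercase, keep runs of letters and digits.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


def compute_stop_words(docs, max_df=0.8):
    """Treat terms found in more than `max_df` of the documents as stop words.

    Recompute this whenever documents are added, since the ratios shift.
    """
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    n = len(docs)
    return {term for term, count in df.items() if count / n > max_df}


def build_vocab(docs, stop_words, max_features=1000):
    """Return (term -> column index, per-column IDF), capped at `max_features`."""
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)) - stop_words)
    # Keep the terms with the highest document frequency; ties break alphabetically.
    top = sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))[:max_features]
    terms = sorted(term for term, _ in top)
    index = {term: i for i, term in enumerate(terms)}
    n = len(docs)
    # Smoothed IDF so the weight is always positive.
    idf = [math.log((1 + n) / (1 + df[term])) + 1 for term in terms]
    return index, idf


def vectorize(text, index, idf):
    """Dense, L2-normalized TF-IDF vector with a fixed length of len(index)."""
    counts = Counter(tok for tok in tokenize(text) if tok in index)
    vec = [0.0] * len(index)
    for term, count in counts.items():
        col = index[term]
        vec[col] = count * idf[col]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec
```

Usage would look something like `stop = compute_stop_words(docs)`, then `index, idf = build_vocab(docs, stop, max_features=5000)`, then `vectorize(new_text, index, idf)`. Because the vector length depends on the vocab, any stored vector should probably record which vocab version (and model type) produced it.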