Hands-on Tutorials

How to train object matching model with no labeled data and use it in production

Currently, most machine-learning business cases are solved as classification problems. Classification algorithms are so well studied in practice that even when the original problem is not directly a classification task, it is usually decomposed into, or approximately converted to, one.

However, despite its simplicity, the classification task has requirements that can complicate production integration and scaling. For example, it requires a fixed number of classes, and each class needs a sufficient number of training samples.

In this article, I will describe how we overcame these limitations by switching to metric learning. Using the example of matching job positions and…
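To give a feel for what "switching to metric learning" means, here is a minimal sketch of one common metric-learning objective, the triplet margin loss. The excerpt does not say which loss the article actually uses, so treat the function, the margin value, and the toy 2-D embeddings as illustrative assumptions. Instead of predicting one of a fixed set of classes, the model is trained so that matching objects end up closer together than non-matching ones.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Triplet margin loss on Euclidean distances, averaged over the batch:
    max(0, d(anchor, positive) - d(anchor, negative) + margin)."""
    d_pos = np.linalg.norm(anchor - positive, axis=1)
    d_neg = np.linalg.norm(anchor - negative, axis=1)
    return float(np.mean(np.maximum(0.0, d_pos - d_neg + margin)))

# Hypothetical toy embeddings: the anchor and its "match" are close,
# the unrelated object is far away.
anchor = np.array([[0.0, 0.0]])
positive = np.array([[0.1, 0.0]])
negative = np.array([[1.0, 0.0]])

print(triplet_loss(anchor, positive, negative))  # already well separated: 0.0
print(triplet_loss(anchor, negative, positive))  # roles swapped: loss of about 1.4
```

Because the model only learns a distance, adding a new "class" later requires no retraining of a classifier head: a new object can be matched by finding its nearest neighbours in the embedding space.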

A step-by-step guide on how to build a neural search service.

How to build a neural search service with BERT + Qdrant + FastAPI

Information retrieval is one of the core technologies that made the modern Internet possible. Today, search is at the heart of a wide variety of applications, from web page search to product recommendations. For many years, this technology changed little, until neural networks came into play.

In this tutorial, we are going to find answers to these questions:

  • What is the difference between regular and neural search?
  • Which neural networks can be used for search?
  • In what tasks is neural network search useful?
  • How to build and…
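The core idea behind neural search can be sketched in a few lines: documents and queries are turned into vectors by an encoder, and results are ranked by vector similarity rather than by shared keywords. In the sketch below, the 3-D vectors are hypothetical stand-ins for encoder output (for example, from BERT), and a brute-force NumPy search stands in for a vector search engine such as Qdrant, which is designed to do this efficiently at scale.

```python
import numpy as np

def cosine_rank(query_vec, doc_vecs, top_k=2):
    """Return (indices, scores) of the top_k documents by cosine similarity."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q                       # cosine similarity of each document
    order = np.argsort(-scores)[:top_k]  # highest similarity first
    return order.tolist(), scores[order].tolist()

# Hypothetical embeddings standing in for a real encoder's output.
docs = ["cheap flights to Berlin", "Berlin airfare deals", "python list comprehension"]
doc_vecs = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.0, 0.1, 0.9],
])
query_vec = np.array([0.85, 0.15, 0.05])  # e.g. "low-cost tickets Berlin"

idx, scores = cosine_rank(query_vec, doc_vecs)
for i, s in zip(idx, scores):
    print(f"{s:.3f}  {docs[i]}")
```

Note that "Berlin airfare deals" is ranked highly even though it shares almost no words with a query like "low-cost tickets Berlin"; with a good encoder, this semantic matching is the key difference from regular keyword search.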

Here is a list of tools I find worth trying if you are setting up a new ML project. The list is not intended to be an exhaustive overview, and it does not include any ML frameworks or libraries.

It focuses on auxiliary tools that make development easier and experiments reproducible. Some of these tools I have used in real projects; others I have only tried on a toy example but found worth using in the future.

Starting a new project

cookiecutter — a scaffold generator for all sorts of projects. It allows generating boilerplate code for an empty project. Useful…

Let’s deal with it

Pretrained fastText embeddings are great. They were trained on many languages, carry subword information, and support out-of-vocabulary (OOV) words.
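Subword information is what lets fastText handle OOV words: each word is wrapped in `<` and `>` boundary markers and split into character n-grams (3 to 6 characters by default), and a word's vector is built by averaging the vectors of its n-grams. The sketch below is a simplified illustration under stated assumptions. Real fastText hashes n-grams with its own FNV-1a-based function into about 2 million buckets, while here `zlib.crc32`, a tiny random table, and the helper names `char_ngrams` and `oov_vector` are hypothetical stand-ins.

```python
import zlib
import numpy as np

def char_ngrams(word, minn=3, maxn=6):
    """fastText-style character n-grams: wrap the word in '<' and '>'
    and take every substring of length minn..maxn."""
    w = f"<{word}>"
    return [w[i:i + n] for n in range(minn, maxn + 1) for i in range(len(w) - n + 1)]

# Hypothetical hashed n-gram table (real models have ~2M buckets and dim 300).
N_BUCKETS, DIM = 1000, 8
rng = np.random.default_rng(0)
ngram_table = rng.normal(size=(N_BUCKETS, DIM)).astype(np.float32)

def oov_vector(word):
    """Vector for an unseen word: the average of its hashed n-gram vectors."""
    rows = [zlib.crc32(g.encode("utf-8")) % N_BUCKETS for g in char_ngrams(word)]
    return ngram_table[rows].mean(axis=0)

print(char_ngrams("where", maxn=3))  # ['<wh', 'whe', 'her', 'ere', 're>']
print(oov_vector("wherever").shape)  # (8,)
```

The same bucket table is also why the binary models are so large: it has to be stored alongside the regular word vectors.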

But their main disadvantage is size. Even the compressed version of the binary model takes 5.4 GB, which makes it impossible to use the pretrained models on a laptop or a small VM instance.

Once loaded into RAM, the model takes even more memory, about 16 GB, so you can't use it directly in Google Colab, which only gives you 12 GB of RAM.
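A quick back-of-the-envelope calculation shows how numbers like these arise: a dense embedding matrix needs rows × dimension × bytes per value, and float32 values take 4 bytes each. The row counts below are hypothetical and chosen only to show the scale; the excerpt does not give the model's exact matrix shapes.

```python
def matrix_size_gb(rows, dim, bytes_per_value=4):
    """Size of a dense matrix in gigabytes (1 GB = 10**9 bytes); float32 by default."""
    return rows * dim * bytes_per_value / 10**9

# Hypothetical shapes, for scale only.
print(matrix_size_gb(1_000_000, 300))  # 1.2: one million 300-dim float32 vectors
print(matrix_size_gb(4_000_000, 300))  # 4.8: four million such vectors
```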

There are two main reasons why binary models occupy so much memory.

The first one is that the binary model carries…

Vasnetsov Andrey

I am a lead ML engineer working on search relevance and NLP-related projects. Interested in applications of one-shot and transfer learning in NLP.
