📝 Guest Post: Switching from Spreadsheets to an Experiment Tracker and How It Improved My Model Development Process
In this guest post, neptune.ai shares the story of one of its users, Nikita Kozodoi. He talks about his model development process before and after using Neptune. Give it a read! You can find the full article on Neptune’s MLOps Blog.
Many ML projects have a similar workflow. You start with a simple pipeline with a benchmark model.
Next, you begin incorporating improvements: adding features, augmenting the data, tuning the model… On each iteration, you evaluate your solution and keep changes that improve the target metric.
This workflow involves running a lot of experiments. As time goes by, it becomes difficult to keep track of the progress and positive changes.
Instead of working on new ideas, you spend time thinking:
“Have I already tried this thing?”,
“What was that hyperparameter value that worked so well last week?”
You end up running the same stuff multiple times. If you are not tracking your experiments yet, I highly recommend you start!
In my previous Kaggle projects, I used to rely on spreadsheets for tracking. It worked very well in the beginning, but soon I realized that setting up and managing spreadsheets with experiment meta-data requires loads of additional work. I got tired of manually filling in model parameters and performance values after each experiment and really wanted to switch to an automated solution.
“[Neptune] allowed me to save a lot of time and focus on modeling decisions which helped me to earn three medals in Kaggle competitions.”
This was when I discovered neptune.ai. This tool allowed me to save a lot of time and focus on modeling decisions, which helped me to earn three medals in Kaggle competitions.
In this post, I will share my story of switching from spreadsheets to Neptune for experiment tracking. I will describe a few disadvantages of spreadsheets, explain how Neptune helps to address them and give a couple of tips on using Neptune for Kaggle.
What is wrong with spreadsheets for experiment tracking?
Spreadsheets are great for many purposes. To track experiments, you can simply set up a spreadsheet with different columns containing the relevant parameters and performance of your pipeline. It is also easy to share this spreadsheet with teammates.
Sounds great, right?
Unfortunately, there are a few problems with this.
Manual work
After doing it for a while, you will notice that maintaining a spreadsheet starts eating up too much time. You need to manually fill in a row of meta-data for each new experiment and add a column for each new parameter. This gets out of control once your pipeline becomes more sophisticated.
It is also very easy to make a typo, which can lead to bad decisions.
When working on one deep learning competition, I incorrectly entered a learning rate in one of my experiments. Looking at the spreadsheet, I concluded that a high learning rate decreases accuracy and went on to work on other things. It was only a few days later that I realized that there was a typo and that poor performance actually came from a low learning rate. This cost me two days of work invested in the wrong direction based on a false conclusion.
No live tracking
With spreadsheets, you need to wait until an experiment is completed in order to record the performance.
Apart from the frustration of logging results manually every time, this also prevents you from comparing intermediate results across experiments, which is helpful for seeing whether a new run looks promising.
Of course, you can log model performance after every epoch, but doing it manually for each experiment requires even more time and effort. I never had enough diligence to do it regularly and ended up using some of my computing resources suboptimally.
Attachment limitations
Another issue with spreadsheets is that they only support textual meta-data that can be entered in a cell.
What if you want to attach other meta-data like:
model weights,
source code,
plots with model predictions,
input data version?
You need to manually store this stuff in your project folders outside of the spreadsheet.
In practice, it gets complicated to organize and sync experiment outputs between local machines, Google Colab, Kaggle Notebooks, and other environments your teammates might use. Having such meta-data attached to a tracking spreadsheet would be useful, but it is very difficult to do.
Switching from spreadsheets to Neptune
Some months ago, our team worked on a Cassava Leaf Disease competition and used Google spreadsheets for experiment tracking. One month into the challenge, our spreadsheet was already cluttered:
Some runs were missing performance values because one of us forgot to log them and no longer had the results.
PDFs with loss curves were scattered over Google Drive and Kaggle Notebooks.
Some parameters might have been entered incorrectly, but it was too time-consuming to restore and double-check older script versions.
It was difficult to make good data-driven decisions based on our spreadsheet.
Even though there were only four weeks left, we decided to switch to Neptune.
If you’re interested in how exactly neptune.ai works, check out the short video in the full article on Neptune’s MLOps Blog.
What is so good about Neptune?
Less manual work
One of the key advantages of Neptune over spreadsheets is that it saves you a lot of manual work. With Neptune, you use the API within the pipeline to automatically upload and store meta-data while the code is running.
You don’t have to manually put it in the results table, and you also save yourself from making a typo. Since the meta-data is sent to Neptune directly from the code, you will get all numbers right, no matter how many digits they have.
“… the time saved from logging each experiment accumulates very quickly and leads to tangible gains… This gives you an opportunity to … better focus on the modeling decisions.”
It may sound like a small thing, but the time saved from logging each experiment accumulates very quickly and leads to tangible gains by the end of the project. This gives you an opportunity to not think too much about the actual tracking process and to better focus on the modeling decisions. In a way, it is like hiring an assistant to take care of some boring (but very useful) logging tasks so that you can focus more on the creative work.
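To make this concrete, here is a minimal sketch of what such logging can look like with the Neptune Python client (exact call names vary between client versions, and the project name, parameters, and metric value below are just placeholders):

```python
import neptune

# Connect to a Neptune project (placeholder name; the API token is usually
# read from the NEPTUNE_API_TOKEN environment variable).
run = neptune.init_run(project="my-workspace/cassava")

# Log hyperparameters straight from the dictionary the pipeline already uses,
# so no value is ever retyped by hand.
params = {"lr": 3e-4, "batch_size": 32, "backbone": "resnet50"}
run["parameters"] = params

# ... training happens here ...

# Log the final metric exactly as computed in code (placeholder value).
run["metrics/val_accuracy"] = 0.894

run.stop()
```

Because the values come straight from the objects the pipeline already holds, the results table fills itself in while the code runs.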
Live tracking
What I like a lot about Neptune is that it allows you to do live tracking. If you work with models like neural networks or gradient boosting that require a lot of iterations before convergence, you know it is quite useful to look at the loss dynamics early to detect issues and compare models.
Tracking intermediate results in a spreadsheet is too frustrating. The Neptune API can log performance after every epoch or even every batch, so you can start comparing the learning curves while your experiment is still running.
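As a rough sketch of per-epoch logging with the Neptune Python client (the training helpers and loop variables are hypothetical stand-ins for your own code, and older client versions use .log() instead of .append()):

```python
import neptune

run = neptune.init_run(project="my-workspace/cassava")  # placeholder project name

for epoch in range(num_epochs):
    train_loss = train_one_epoch(model, train_loader)  # hypothetical helper
    val_loss, val_acc = evaluate(model, val_loader)     # hypothetical helper

    # Each append() adds one point to a metric series that Neptune plots live,
    # so learning curves can be compared across runs while they are still training.
    run["train/loss"].append(train_loss)
    run["val/loss"].append(val_loss)
    run["val/accuracy"].append(val_acc)
```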
“…many ML experiments have negative results… Using the Neptune dashboard to compare the intermediate plots with the first few performance values may be enough to realize that you need to stop the experiment and change something.”
This proves to be very helpful. As you might expect, many ML experiments have negative results (sorry, but this great idea you worked on for a few days actually decreases the accuracy).
This is completely fine because this is how ML works.
What is not fine is that you may need to wait a long time until getting that negative signal from your pipeline. Using the Neptune dashboard to compare the intermediate plots with the first few performance values may be enough to realize that you need to stop the experiment and change something.
Attaching outputs
Another advantage of Neptune is the ability to attach pretty much anything to every experiment run. This really helps to keep important outputs such as model weights and predictions in one place and easily access them from your experiments table.
This is particularly helpful if you and your colleagues work in different environments and have to manually upload the outputs to sync the files.
I also like the ability to attach the source code to each run to make sure you have the notebook version that produced the corresponding result. This is very useful if you want to revert changes that did not improve the performance and go back to the previous best version.
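Here is a minimal sketch of how attaching files can look with the Neptune Python client (the project name and all file paths are placeholders):

```python
import neptune

# Snapshot the source files when the run is created, so the exact code
# behind each result can be recovered later.
run = neptune.init_run(
    project="my-workspace/cassava",            # placeholder project name
    source_files=["train.py", "model.py"],     # placeholder file names
)

# Attach arbitrary output files to the run.
run["model/weights"].upload("checkpoints/best_model.pt")
run["predictions/oof"].upload("oof_predictions.csv")
run["plots/loss_curve"].upload("loss_curve.png")

run.stop()
```

Everything uploaded this way sits next to the run's parameters and metrics, so there is no separate folder structure to keep in sync across environments.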
Final thoughts
In this post, I shared my story of switching from spreadsheets to Neptune for tracking ML experiments and emphasized some advantages of Neptune. I would like to stress once again that investing time in infrastructure tools – be it experiment tracking, code versioning, or anything else – is always a good decision and will likely pay off with increased productivity.
Tracking experiment meta-data with spreadsheets is much better than not doing any tracking at all. It will help you to better see your progress, understand what modifications improve your solution, and make better modeling decisions. But doing it with spreadsheets will also cost you additional time and effort. Tools like Neptune take experiment tracking to the next level, allowing you to automate the meta-data logging and focus on the modeling decisions.
I hope you find my story useful. Good luck with your future ML projects!
You can find the full article on Neptune’s MLOps Blog.