The Transformer Baseline for Task 1 on Spoiler Type Detection blocks when I install it locally using Conda

Christian · October 28, 2022, 9:36pm

I started to look into the Transformer Baseline for task 1, and have problems with SimpleTransformers in a local installation.

Everything works when I run the baseline in the provided Docker container, but when I install SimpleTransformers on my machine with Conda, it hangs for inputs with more than ten instances (for both training and prediction). I found no helpful issues in the GitHub repository of SimpleTransformers.

I installed SimpleTransformers using a Conda environment, as recommended here:

This is my script that blocks:

github.com/Christian-Falkenberg/stuff/blob/main/testtraining.py

from simpletransformers.classification import ClassificationModel, ClassificationArgs
import pandas as pd
import logging


logging.basicConfig(level=logging.INFO)
transformers_logger = logging.getLogger("transformers")
transformers_logger.setLevel(logging.WARNING)

# Preparing train data
train_data = [
    ["Aragorn was the heir of Isildur", 1],
    ["Frodo was the heir of Isildur", 0],
    ["Aragorn was the heir of Isildur", 1],
    ["Frodo was the heir of Isildur", 0],
    ["Aragorn was the heir of Isildur", 1],
    ["Frodo was the heir of Isildur", 0],
    ["Aragorn was the heir of Isildur", 1],
    ["Frodo was the heir of Isildur", 0],
    ["Aragorn was the heir of Isildur", 1],

This file has been truncated.

This behavior occurs in both environments, i.e., with CPU and with GPU. Memory does not seem to be the problem either. The progress bar always stalls at 0/38, no matter which Transformer I use.

maik_froebe · October 29, 2022, 5:32am

Dear Christian,

Thanks for contributing to the forum, and it is cool that you use the Transformer baseline as a starting point!

I answer in English because I hope that this way, more people can benefit from the forum (I translated your initial question to English for the same reason).

I understand that training and prediction work on your machine if you use the Docker container we provide for the Transformer baseline. But if you install SimpleTransformers locally in a Conda environment, it hangs when more than ten instances are passed to the library.

The good thing is that our Docker container also uses Conda. So you can compare both Conda environments to identify the difference. I think there might be some version mismatch that causes this behavior.

For instance, here I run "which pip3" in the baseline container:

docker run --rm -ti --entrypoint bash webis/pan-clickbait-spoiling-baselines:task1-transformer-0.0.2 -c 'which pip3'

It outputs /opt/conda/bin/pip3 (as the Docker container uses Conda).

With this, you can now look at all versions of installed libraries and dependencies in the Conda environment to figure out what might cause the problem with a command like:

docker run --rm -ti --entrypoint bash webis/pan-clickbait-spoiling-baselines:task1-transformer-0.0.2 -c 'pip3 freeze'
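If you save the output of that command and the output of pip3 freeze from your local Conda environment into two text files, a small script can list the differences. This is only a minimal sketch: the function names parse_freeze and diff_versions and the two file names passed on the command line are my own assumptions, not part of the baseline.

```python
import sys


def parse_freeze(text):
    """Parse `pip freeze` output into a dict {package_name: version}.

    Only lines of the form name==version are kept. Comments, editable
    installs and "name @ file://..." entries are skipped.
    """
    versions = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        versions[name.strip().lower()] = version.strip()
    return versions


def diff_versions(container, local):
    """Return {name: (container_version, local_version)} for every mismatch.

    A version of None means the package is missing on that side.
    """
    diffs = {}
    for name in sorted(set(container) | set(local)):
        in_container, in_local = container.get(name), local.get(name)
        if in_container != in_local:
            diffs[name] = (in_container, in_local)
    return diffs


# Hypothetical usage: python compare_freeze.py container.txt local.txt
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as f:
        container_versions = parse_freeze(f.read())
    with open(sys.argv[2]) as f:
        local_versions = parse_freeze(f.read())
    for name, (a, b) in diff_versions(container_versions, local_versions).items():
        print(f"{name}: container={a} local={b}")
```

Packages that differ between the two environments (or exist in only one of them) are then the first candidates to look at.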

If your problem persists, I suggest that you prepare a minimal notebook in Google Colab so that we can look at the problem in the exact same environment. Otherwise, I would have trouble reproducing it, because I likely use a different operating system and a different Conda version.

I hope that this helps you.

Best regards, (and happy spoiling :))

Maik

Christian · October 29, 2022, 7:19pm

The problem was apparently how simpletransformers performs multiprocessing.
Within the ClassificationArgs passed to the ClassificationModel I had to turn off all multiprocessing:

use_multiprocessing=False
use_multiprocessing_for_evaluation=False

I don’t know why multiprocessing doesn’t work for me but everything works fine without it.
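For anyone who runs into the same hang, here is a minimal sketch of how the two settings can be passed. The helper names build_classification_args and build_model, as well as the model type "roberta", the model name "roberta-base" and use_cuda=False, are placeholders I chose for illustration and not necessarily what the baseline uses.

```python
def build_classification_args():
    """Return the two settings from this thread as a plain dict."""
    return {
        # Do not use multiprocessing when converting training data to features.
        "use_multiprocessing": False,
        # Do not use multiprocessing during evaluation and prediction either.
        "use_multiprocessing_for_evaluation": False,
    }


def build_model(model_type="roberta", model_name="roberta-base", use_cuda=False):
    """Create a ClassificationModel with multiprocessing switched off.

    model_type, model_name and use_cuda are placeholder defaults (assumptions).
    """
    # Imported here so the settings above can be inspected
    # even on a machine where simpletransformers is not installed.
    from simpletransformers.classification import (
        ClassificationArgs,
        ClassificationModel,
    )

    model_args = ClassificationArgs(**build_classification_args())
    return ClassificationModel(
        model_type, model_name, args=model_args, use_cuda=use_cuda
    )
```

Calling build_model() downloads the pretrained weights on first use; the returned model can then be trained and used for prediction as in the script above.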

maik_froebe · October 31, 2022, 12:37pm

Nice that you found a solution!