
Comments (5)

stephen-hoover avatar stephen-hoover commented on August 27, 2024

@waltaskew , there's another potential cause. If the ModelPipeline creates its own APIClient object, it uses the resources="all" parameter. This lets it subscribe to PubNub notifications, which are still in internal beta. Every time you call model.train, you subscribe to another notifications channel. I would expect that this opens at least one file handle.

The code samples you provide aren't runnable as they are -- in your actual code, are you storing the output of model.train anywhere? If the CivisFuture goes out of scope, ideally it would be garbage collected and the PubNub connection closed. Maybe that's not happening.
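A minimal sketch of the cleanup behavior hoped for above, using only the standard library. `FakeFuture` and `_close_all` are hypothetical names for illustration; this is not how `CivisFuture` is implemented. It only shows that a connection owned by an object stays open while the object is referenced, and can be closed by a finalizer once the last reference is dropped.

```python
import gc
import socket
import weakref


def _close_all(*socks):
    """Close every socket passed in."""
    for s in socks:
        s.close()


class FakeFuture:
    """Hypothetical stand-in for a future that owns one network connection."""

    def __init__(self):
        # A local socket pair stands in for a real notification connection.
        self.sock, self._peer = socket.socketpair()
        # Close both ends when this object is garbage collected.
        # The callback must not reference `self`, or the object never dies.
        self._finalizer = weakref.finalize(
            self, _close_all, self.sock, self._peer)


def demo():
    kept = [FakeFuture()]      # stored reference: the sockets stay open
    sock = kept[0].sock        # reference to the socket only, not the future
    open_while_referenced = sock.fileno() != -1

    kept.clear()               # drop the last reference to the future
    gc.collect()               # not strictly needed on CPython
    closed_after_release = sock.fileno() == -1
    return open_while_referenced, closed_after_release
```

If the futures are appended to a list, as they would be when the output of model.train is stored, a finalizer like this one never runs until the list itself is released.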

from civis-python.

waltaskew avatar waltaskew commented on August 27, 2024

> @waltaskew , there's another potential cause. If the ModelPipeline creates its own APIClient object, it uses the resources="all" parameter. This lets it subscribe to PubNub notifications, which are still in internal beta. Every time you call model.train, you subscribe to another notifications channel. I would expect that this opens at least one file handle.

I was seeing three separate network connections for each model.train, which I imagine included the connections for creating and running the custom script.

> The code samples you provide aren't runnable as they are -- in your actual code, are you storing the output of model.train anywhere? If the CivisFuture goes out of scope, ideally it would be garbage collected and the PubNub connection closed. Maybe that's not happening.

I'm holding onto the output of model.train, but I don't need to request the future's result to see file handle leaks.

Here's a full example:

import os
import subprocess
import string

import civis.ml
import numpy
import pandas
import sklearn.linear_model


data = pandas.DataFrame(
    numpy.random.rand(400, 12),
    columns=list(string.ascii_letters[:12]),
)
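# Upload the DataFrame to Civis as a file. This is a private helper
# (leading underscore), so it may change between civis versions.
file_id = civis.ml._model._stash_local_dataframe(data)

# -i selects network files, -p selects this process, -a ANDs the two.
lsof_cmd = ('lsof', '-i', '-a', '-p', '%d' % os.getpid())


def print_open_connections():
    """Print the network connections currently held by this process."""
    try:
        out = subprocess.check_output(lsof_cmd)
        print(out.decode('ascii'))
    except subprocess.CalledProcessError as err:
        # lsof exits with status 1 when nothing matches the selection.
        if err.returncode == 1:
            print('no open connections')
        else:
            raise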


trainings = []
for i in range(5):
    model = civis.ml.ModelPipeline(
        model=sklearn.linear_model.LogisticRegression(),
        dependent_variable=data.columns[-1],
    )
    print('about to train model %d' % i)
    print_open_connections()
    print('')

    trainings.append(model.train(file_id=file_id))

    print('started training model %d' % i)
    print_open_connections()
    print('')

The output on my Linux box looks like this:

about to train model 0
no open connections

started training model 0
COMMAND  PID      USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
python3 7624 waltaskew    4u  IPv4 265214      0t0  UDP 10.0.2.15:49861->192.168.0.1:domain
python3 7624 waltaskew    5u  IPv4 266158      0t0  TCP 10.0.2.15:53856->ec2-34-231-234-145.compute-1.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew    6u  IPv4 265215      0t0  UDP 10.0.2.15:47403->192.168.0.1:domain


about to train model 1
COMMAND  PID      USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
python3 7624 waltaskew    4u  IPv4 266253      0t0  TCP 10.0.2.15:60050->ec2-54-241-191-234.us-west-1.compute.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew    5u  IPv4 266158      0t0  TCP 10.0.2.15:53856->ec2-34-231-234-145.compute-1.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew    6u  IPv4 266256      0t0  TCP 10.0.2.15:33790->ec2-54-241-191-243.us-west-1.compute.amazonaws.com:https (ESTABLISHED)


started training model 1
COMMAND  PID      USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
python3 7624 waltaskew    4u  IPv4 266253      0t0  TCP 10.0.2.15:60050->ec2-54-241-191-234.us-west-1.compute.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew    5u  IPv4 266158      0t0  TCP 10.0.2.15:53856->ec2-34-231-234-145.compute-1.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew    6u  IPv4 266256      0t0  TCP 10.0.2.15:33790->ec2-54-241-191-243.us-west-1.compute.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew    7u  IPv4 266362      0t0  UDP 10.0.2.15:43976->192.168.0.1:domain
python3 7624 waltaskew    8u  IPv4 266210      0t0  TCP 10.0.2.15:48754->ec2-52-70-212-109.compute-1.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew    9u  IPv4 266363      0t0  UDP 10.0.2.15:40100->192.168.0.1:domain


about to train model 2
COMMAND  PID      USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
python3 7624 waltaskew    4u  IPv4 266253      0t0  TCP 10.0.2.15:60050->ec2-54-241-191-234.us-west-1.compute.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew    5u  IPv4 266158      0t0  TCP 10.0.2.15:53856->ec2-34-231-234-145.compute-1.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew    6u  IPv4 266256      0t0  TCP 10.0.2.15:33790->ec2-54-241-191-243.us-west-1.compute.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew    7u  IPv4 266380      0t0  TCP 10.0.2.15:48226->ec2-52-9-63-129.us-west-1.compute.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew    8u  IPv4 266210      0t0  TCP 10.0.2.15:48754->ec2-52-70-212-109.compute-1.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew    9u  IPv4 266383      0t0  TCP 10.0.2.15:35204->ec2-54-241-191-232.us-west-1.compute.amazonaws.com:https (ESTABLISHED)


started training model 2
COMMAND  PID      USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
python3 7624 waltaskew    4u  IPv4 266253      0t0  TCP 10.0.2.15:60050->ec2-54-241-191-234.us-west-1.compute.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew    5u  IPv4 266158      0t0  TCP 10.0.2.15:53856->ec2-34-231-234-145.compute-1.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew    6u  IPv4 266256      0t0  TCP 10.0.2.15:33790->ec2-54-241-191-243.us-west-1.compute.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew    7u  IPv4 266380      0t0  TCP 10.0.2.15:48226->ec2-52-9-63-129.us-west-1.compute.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew    8u  IPv4 266210      0t0  TCP 10.0.2.15:48754->ec2-52-70-212-109.compute-1.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew    9u  IPv4 266383      0t0  TCP 10.0.2.15:35204->ec2-54-241-191-232.us-west-1.compute.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew   10u  IPv4 266419      0t0  UDP 10.0.2.15:46760->192.168.0.1:domain
python3 7624 waltaskew   11u  IPv4 267421      0t0  TCP 10.0.2.15:49036->ec2-34-206-6-184.compute-1.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew   12u  IPv4 266420      0t0  UDP 10.0.2.15:50333->192.168.0.1:domain


about to train model 3
COMMAND  PID      USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
python3 7624 waltaskew    4u  IPv4 266253      0t0  TCP 10.0.2.15:60050->ec2-54-241-191-234.us-west-1.compute.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew    5u  IPv4 266158      0t0  TCP 10.0.2.15:53856->ec2-34-231-234-145.compute-1.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew    6u  IPv4 266256      0t0  TCP 10.0.2.15:33790->ec2-54-241-191-243.us-west-1.compute.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew    7u  IPv4 266380      0t0  TCP 10.0.2.15:48226->ec2-52-9-63-129.us-west-1.compute.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew    8u  IPv4 266210      0t0  TCP 10.0.2.15:48754->ec2-52-70-212-109.compute-1.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew    9u  IPv4 266383      0t0  TCP 10.0.2.15:35204->ec2-54-241-191-232.us-west-1.compute.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew   10u  IPv4 266443      0t0  TCP 10.0.2.15:52028->ec2-52-9-63-131.us-west-1.compute.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew   11u  IPv4 267421      0t0  TCP 10.0.2.15:49036->ec2-34-206-6-184.compute-1.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew   12u  IPv4 266440      0t0  TCP 10.0.2.15:59806->ec2-54-219-189-245.us-west-1.compute.amazonaws.com:https (ESTABLISHED)


/home/waltaskew/src/gfk_multi_output/venv/lib/python3.6/site-packages/sklearn/base.py:311: UserWarning: Trying to unpickle estimator LogisticRegression from version 0.18.1 when using version 0.19.1. This might lead to breaking code or invalid results. Use at your own risk.
  UserWarning)
started training model 3
COMMAND  PID      USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
python3 7624 waltaskew    4u  IPv4 266253      0t0  TCP 10.0.2.15:60050->ec2-54-241-191-234.us-west-1.compute.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew    5u  IPv4 266158      0t0  TCP 10.0.2.15:53856->ec2-34-231-234-145.compute-1.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew    6u  IPv4 266256      0t0  TCP 10.0.2.15:33790->ec2-54-241-191-243.us-west-1.compute.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew    7u  IPv4 266380      0t0  TCP 10.0.2.15:48226->ec2-52-9-63-129.us-west-1.compute.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew    8u  IPv4 266210      0t0  TCP 10.0.2.15:48754->ec2-52-70-212-109.compute-1.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew    9u  IPv4 266383      0t0  TCP 10.0.2.15:35204->ec2-54-241-191-232.us-west-1.compute.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew   10u  IPv4 266443      0t0  TCP 10.0.2.15:52028->ec2-52-9-63-131.us-west-1.compute.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew   11u  IPv4 267421      0t0  TCP 10.0.2.15:49036->ec2-34-206-6-184.compute-1.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew   12u  IPv4 266440      0t0  TCP 10.0.2.15:59806->ec2-54-219-189-245.us-west-1.compute.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew   13u  IPv4 266468      0t0  UDP 10.0.2.15:50985->192.168.0.1:domain
python3 7624 waltaskew   14u  IPv4 266469      0t0  UDP 10.0.2.15:49757->192.168.0.1:domain
python3 7624 waltaskew   16u  IPv4 267486      0t0  TCP 10.0.2.15:53884->ec2-34-231-234-145.compute-1.amazonaws.com:https (ESTABLISHED)


about to train model 4
COMMAND  PID      USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
python3 7624 waltaskew    4u  IPv4 266253      0t0  TCP 10.0.2.15:60050->ec2-54-241-191-234.us-west-1.compute.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew    5u  IPv4 266158      0t0  TCP 10.0.2.15:53856->ec2-34-231-234-145.compute-1.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew    6u  IPv4 266256      0t0  TCP 10.0.2.15:33790->ec2-54-241-191-243.us-west-1.compute.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew    7u  IPv4 266380      0t0  TCP 10.0.2.15:48226->ec2-52-9-63-129.us-west-1.compute.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew    8u  IPv4 266210      0t0  TCP 10.0.2.15:48754->ec2-52-70-212-109.compute-1.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew    9u  IPv4 266383      0t0  TCP 10.0.2.15:35204->ec2-54-241-191-232.us-west-1.compute.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew   10u  IPv4 266443      0t0  TCP 10.0.2.15:52028->ec2-52-9-63-131.us-west-1.compute.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew   11u  IPv4 267421      0t0  TCP 10.0.2.15:49036->ec2-34-206-6-184.compute-1.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew   12u  IPv4 266440      0t0  TCP 10.0.2.15:59806->ec2-54-219-189-245.us-west-1.compute.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew   13u  IPv4 266492      0t0  TCP 10.0.2.15:59822->ec2-54-219-189-245.us-west-1.compute.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew   14u  IPv4 267515      0t0  TCP 10.0.2.15:52044->ec2-52-9-63-131.us-west-1.compute.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew   16u  IPv4 267486      0t0  TCP 10.0.2.15:53884->ec2-34-231-234-145.compute-1.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew   17u  IPv4 266494      0t0  TCP 10.0.2.15:54716->s3-1-w.amazonaws.com:https (ESTABLISHED)


/home/waltaskew/src/gfk_multi_output/venv/lib/python3.6/site-packages/sklearn/base.py:311: UserWarning: Trying to unpickle estimator LogisticRegression from version 0.18.1 when using version 0.19.1. This might lead to breaking code or invalid results. Use at your own risk.
  UserWarning)
started training model 4
COMMAND  PID      USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
python3 7624 waltaskew    4u  IPv4 266253      0t0  TCP 10.0.2.15:60050->ec2-54-241-191-234.us-west-1.compute.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew    5u  IPv4 266158      0t0  TCP 10.0.2.15:53856->ec2-34-231-234-145.compute-1.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew    6u  IPv4 266256      0t0  TCP 10.0.2.15:33790->ec2-54-241-191-243.us-west-1.compute.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew    7u  IPv4 266380      0t0  TCP 10.0.2.15:48226->ec2-52-9-63-129.us-west-1.compute.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew    8u  IPv4 266210      0t0  TCP 10.0.2.15:48754->ec2-52-70-212-109.compute-1.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew    9u  IPv4 266383      0t0  TCP 10.0.2.15:35204->ec2-54-241-191-232.us-west-1.compute.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew   10u  IPv4 266443      0t0  TCP 10.0.2.15:52028->ec2-52-9-63-131.us-west-1.compute.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew   11u  IPv4 267421      0t0  TCP 10.0.2.15:49036->ec2-34-206-6-184.compute-1.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew   12u  IPv4 266440      0t0  TCP 10.0.2.15:59806->ec2-54-219-189-245.us-west-1.compute.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew   13u  IPv4 266492      0t0  TCP 10.0.2.15:59822->ec2-54-219-189-245.us-west-1.compute.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew   14u  IPv4 267515      0t0  TCP 10.0.2.15:52044->ec2-52-9-63-131.us-west-1.compute.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew   15u  IPv4 266520      0t0  UDP 10.0.2.15:50313->192.168.0.1:domain
python3 7624 waltaskew   16u  IPv4 267486      0t0  TCP 10.0.2.15:53884->ec2-34-231-234-145.compute-1.amazonaws.com:https (ESTABLISHED)
python3 7624 waltaskew   17u  IPv4 266521      0t0  UDP 10.0.2.15:59484->192.168.0.1:domain
python3 7624 waltaskew   19u  IPv4 267559      0t0  TCP 10.0.2.15:49064->ec2-34-206-6-184.compute-1.amazonaws.com:https (ESTABLISHED)


waltaskew avatar waltaskew commented on August 27, 2024

Eventually the connections fall into the CLOSE_WAIT state, which indicates that the server has hung up. Even after a long time, though, the client never appears to close them, so the file handles held by the CLOSE_WAIT connections are never reaped.
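A small standard-library sketch of the same situation, assuming a localhost connection in place of the real Civis or PubNub endpoints. The function name `demo_close_wait` is made up for illustration. Once the server side closes, the client reads an empty byte string, but its file descriptor stays allocated until the client calls close() itself.

```python
import socket


def demo_close_wait():
    # A listening socket on an ephemeral localhost port plays the server.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)

    client = socket.create_connection(server.getsockname())
    conn, _ = server.accept()

    # The "server" hangs up first, which moves the client side
    # of the connection into CLOSE_WAIT.
    conn.close()

    # An empty read is how the client learns that the peer has closed.
    data = client.recv(1024)
    still_holds_fd = client.fileno() != -1   # descriptor is not freed yet

    # Only an explicit close() on the client releases the descriptor.
    client.close()
    released = client.fileno() == -1

    server.close()
    return data, still_holds_fd, released
```

Running lsof between the recv and the client's close() should show the client socket in CLOSE_WAIT, matching the state described above.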


stephen-hoover avatar stephen-hoover commented on August 27, 2024

I wouldn't expect that you'd need to request the result of the model training to see an open connection. We subscribe to the PubNub channel immediately, so that we don't miss a completion message.

But it's also true that every new APIClient opens its own requests.Session and never explicitly closes it. You're keeping the reference to those client objects around when you store the output of model.train. I'm not sure why we need 3 open files. I'm also surprised that all of them are pointing to EC2. Shouldn't there be one per model which connects to PubNub?

Is there a reason to keep the Session to the Civis API permanently open? We could instead re-open it every time we want to make a request.
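To make the "re-open it every time" idea concrete, here is a rough sketch with stand-in classes. `FakeSession` and `ShortLivedClient` are hypothetical and are not part of civis or requests; the fake session only counts how many sessions are open. With a real `requests.Session`, the same shape works because `Session` has a `close()` method and can be used in a `with` statement.

```python
from contextlib import closing


class FakeSession:
    """Hypothetical stand-in for an HTTP session that holds a connection."""

    open_count = 0  # number of sessions currently open

    def __init__(self):
        FakeSession.open_count += 1

    def get(self, url):
        return "response for " + url

    def close(self):
        FakeSession.open_count -= 1


class ShortLivedClient:
    """Hypothetical client that opens a session per request, then closes it."""

    def __init__(self, session_factory=FakeSession):
        self._session_factory = session_factory

    def get(self, url):
        # closing() calls session.close() even if the request raises.
        with closing(self._session_factory()) as session:
            return session.get(url)


# Five clients, as in the five-model loop, leave no session open afterwards.
clients = [ShortLivedClient() for _ in range(5)]
results = [c.get("/models") for c in clients]
```

The trade-off is that a session closed after each request cannot reuse a kept-alive connection, so every call pays for a new TCP and TLS handshake.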


stephen-hoover avatar stephen-hoover commented on August 27, 2024

I've verified that I see an increase of 3 open file handles with each new model in the above code. If I pass a pre-constructed client object with resources="base", there's no increase in open file handles. If I pass a pre-constructed client object with resources="all", I see an increase of 2 open file handles for each new model. So it looks like our PubNub connection uses 2 file handles.
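One way to take this kind of per-step measurement without parsing lsof output is to count the entries in the process's file descriptor directory. This is a sketch under the assumption that `/proc/self/fd` (Linux) or `/dev/fd` (macOS) exists; `count_open_fds` and `fd_delta` are names invented here, and this is not necessarily how the numbers above were obtained.

```python
import os
import socket


def count_open_fds():
    """Return the number of file descriptors open in this process.

    Assumes a system that exposes /proc/self/fd or /dev/fd.
    """
    for path in ("/proc/self/fd", "/dev/fd"):
        if os.path.isdir(path):
            return len(os.listdir(path))
    raise OSError("no per-process fd directory found")


def fd_delta(action):
    """Run `action` and return (its result, change in open fd count)."""
    before = count_open_fds()
    result = action()
    return result, count_open_fds() - before


# Example: a socket pair adds two descriptors until both ends are closed.
pair, delta = fd_delta(socket.socketpair)
for s in pair:
    s.close()
```

In the script from earlier in the thread, `action` could be something like `lambda: model.train(file_id=file_id)`, which would report the per-model increase directly.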

