apache / incubator-datalab

Apache DataLab (incubating)

Home Page: https://datalab.apache.org/

License: Apache License 2.0

Languages: Python 38.04% Shell 1.07% Ruby 0.78% Java 37.73% HTML 5.14% TypeScript 10.57% CSS 0.26% Dockerfile 0.09% Groovy 0.21% HCL 2.78% Smarty 0.69% SCSS 2.64%

Topics: datalab

incubator-datalab's People

Contributors

a1expol, adamsdisturber, andrianakovalyshyn, bodnarmykola, denysyankiv, dependabot[bot], dg1202, dyoma33, dzenbuddiii, epambohdanhliva, frikitrok, gennadiyshpak, ioleksandr, kinashyurii, leonidfrolov, marianhladun, moskovych, ochaparin, ofuks, oleksandrrepnikov, omartushevskyi, owlleg6, petro-kotsiuba, ppapou, useinfaradzhev, vadymkuznetsov, viravit, viravitan, yuratyhun, yuriyholinko



incubator-datalab's Issues

[Data Engine]: Add Spot instances support for Standalone Spark cluster

As a user I want to use spot instances when creating a Standalone Spark cluster, so that I can reduce costs on AWS/GCP/MS Azure.

Acceptance criteria:

  1. There should be a check box for spot instances on the Standalone Spark cluster creation popup
  2. The spot instance check box should be selected by default
  3. If the user clears the spot instance check box and hits 'Create', the Standalone Spark cluster is created without a spot price
  4. If the user selects the spot instance check box and hits 'Create', the Standalone Spark cluster is created with a spot price
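The checked/unchecked toggle above maps naturally onto the `InstanceMarketOptions` argument of boto3's `ec2.run_instances`. A minimal sketch, assuming AWS and boto3; the helper name and `max_price` parameter are illustrative, not part of DataLab:

```python
# Hypothetical helper: translate the popup's spot check box into the
# InstanceMarketOptions block that boto3's ec2.run_instances accepts.
def build_market_options(use_spot, max_price=None):
    """Return extra run_instances kwargs for a Spark cluster node."""
    if not use_spot:
        return {}  # on-demand: omit InstanceMarketOptions entirely
    spot = {
        "SpotInstanceType": "one-time",
        "InstanceInterruptionBehavior": "terminate",
    }
    if max_price is not None:
        # If MaxPrice is omitted, AWS caps the bid at the on-demand price.
        spot["MaxPrice"] = max_price
    return {"InstanceMarketOptions": {"MarketType": "spot", "SpotOptions": spot}}
```

Usage would be `ec2.run_instances(ImageId=..., InstanceType=..., **build_market_options(True, "0.05"))`.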

Implement job status tracker on DataLab Web UI

As a user I want to know the job status (completed/failed) when an instance is stopped/terminated by the scheduler.

Acceptance criteria:

  1. Failed status is shown on the DataLab UI if the job failed
  2. Completed status is shown on the DataLab UI if the job succeeded

Implement demo mode for marketplace

As a potential user I want to try DLab, so that I can study it and decide whether it suits me without spending money and time on a deployment purely for evaluation purposes.

[Azure][GCP]: Add TensorFlow-RStudio template

As a user I want to have a TensorFlow-RStudio template on Azure/GCP.

Acceptance criteria:

  1. A TensorFlow-RStudio template should appear in the drop-down list in the Notebook creation popup
  2. The RStudio playbook should run successfully on the TensorFlow-RStudio template (via local and remote kernels)
  3. Playbooks that use GPU should run successfully on the TensorFlow-RStudio template (via local and remote kernels)

Support access to browser bucket via administration page

As an admin I want to set bucket browser permissions for particular users/groups, so that I can control who is able to read/upload/download objects via the bucket browser.

Add 'Bucket Browser Actions' to roles on the administration page, so that the administrator can differentiate bucket access and other bucket browser permissions among users.

'Bucket Browser Actions' should have the following points:

  • Allow downloading objects via the bucket browser
  • Allow uploading objects via the bucket browser
  • Allow viewing objects via the bucket browser
  • Allow deleting objects via the bucket browser

Bucket browser

As a user I want to use bucket browser, so that I can manage the objects in my bucket via DLab UI.

Acceptance criteria:

  1. User has access to the endpoint_shared bucket and the project bucket (only if assigned to that project), or to a custom bucket

  2. Another user does not have access to the project bucket (if not assigned to that project)

  3. User can upload files to and download files from the bucket

  4. User can create/delete a folder

  5. User can delete a file

  6. User can copy a folder/file path

  7. User can see the bucket structure (tree)

  8. User can open the bucket manager from the Notebook name popup or via the 'Bucket browser' button on the 'List of resources' page.

Billing details not populating on UI

Hi,

I have been running a Datalab instance but am not seeing any data under the billing section in the front end. AWS is definitely incurring costs as a result of Datalab ($40 for the month of June).

Is there some specific condition that needs to be fulfilled before billing is populated?

This is the command I used to create the Datalab:

/usr/bin/python3 ~/incubator-datalab/infrastructure-provisioning/scripts/deploy_datalab.py \
    --conf_service_base_name datalab-base-name \
    --conf_tag_resource_id datalab-resource-id \
    --conf_os_family debian \
    --key_path /home/ubuntu/.ssh/ \
    --conf_key_name datalab \
    --action create \
    --keycloak_realm_name master \
    --keycloak_user XXXXXXXX \
    --keycloak_user_password XXXXXXXX \
    --keycloak_auth_server_url http://XX.XX.XXX.XX:8080 \
    'aws' \
    --aws_region eu-west-1 \
    --aws_zone eu-west-1a \
    --aws_ssn_instance_size t2.medium \
    --aws_billing_bucket datalabbilling \
    --aws_account_id XXXXXXXXXXX \
    --aws_access_key XXXXXXXXXXXXXXX \
    --aws_secret_access_key XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

An excerpt from the logs in /var/opt/datalab/log/ssn/billing.log:

2023-07-03 11:25:30.107 INFO 94517 --- [cluster-ClusterId{value='64a2b029b4d0cc7135f64a6e', description='null'}-localhost:27017] org.mongodb.driver.cluster : Discovered cluster type of STANDALONE
2023-07-03 11:25:30.919 INFO 94517 --- [main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat started on port(s): 8088 (https) with context path '/api/billing'
2023-07-03 11:25:30.922 INFO 94517 --- [main] com.epam.datalab.BillingAwsApplication : Started BillingAwsApplication in 21.602 seconds (JVM running for 23.418)
2023-07-03 11:25:30.926 DEBUG 94517 --- [main] com.epam.datalab.BillingServiceImpl : Billing report configuration file: /opt/datalab/conf/billing.yml
INFO [2023-07-03 11:30:00,751] org.apache.catalina.core.ContainerBase.[Tomcat].[localhost].[/api/billing]: Initializing Spring DispatcherServlet 'dispatcherServlet'
INFO [2023-07-03 11:30:00,751] org.springframework.web.servlet.DispatcherServlet: Initializing Servlet 'dispatcherServlet'
INFO [2023-07-03 11:30:00,760] org.springframework.web.servlet.DispatcherServlet: Completed initialization in 9 ms
DEBUG [2023-07-03 11:30:01,109] com.epam.datalab.module.aws.AdapterS3File: Adapter S3 will be opened for READ
DEBUG [2023-07-03 11:30:02,317] com.epam.datalab.module.aws.AdapterS3File: New report files in bucket folder datalabbilling not found
DEBUG [2023-07-03 11:30:02,318] com.epam.datalab.module.aws.AdapterS3File: Adapter S3 has been opened
DEBUG [2023-07-03 11:30:02,318] com.epam.datalab.core.parser.ParserByLine: Source data has multy entry true
DEBUG [2023-07-03 11:45:00,243] com.epam.datalab.module.aws.AdapterS3File: Adapter S3 will be opened for READ
DEBUG [2023-07-03 11:45:00,332] com.epam.datalab.module.aws.AdapterS3File: New report files in bucket folder datalabbilling not found
DEBUG [2023-07-03 11:45:00,332] com.epam.datalab.module.aws.AdapterS3File: Adapter S3 has been opened
DEBUG [2023-07-03 11:45:00,332] com.epam.datalab.core.parser.ParserByLine: Source data has multy entry true

Excerpt from the logs in /var/opt/datalab/log/ssn/selfservice.log:

INFO [2023-07-03 11:25:30,534] org.eclipse.jetty.server.handler.ContextHandler: Started i.d.j.MutableServletContextHandler@5a654e05{/,null,AVAILABLE}
INFO [2023-07-03 11:25:30,559] org.eclipse.jetty.server.AbstractConnector: Started application@359e27d2{SSL,[ssl, http/1.1]}{0.0.0.0:8443}
INFO [2023-07-03 11:25:30,565] org.eclipse.jetty.server.AbstractConnector: Started admin@277bc3a5{SSL,[ssl, http/1.1]}{0.0.0.0:8444}
INFO [2023-07-03 11:25:30,565] org.eclipse.jetty.server.Server: Started @23057ms
INFO [2023-07-03 11:25:30,616] com.epam.datalab.backendapi.dropwizard.listeners.MongoStartupListener: Populating DataLab default roles into database
INFO [2023-07-03 11:25:30,648] com.epam.datalab.backendapi.dropwizard.listeners.MongoStartupListener: Check for connected endpoints:
connected endpoints: 1
connected clouds: [AWS]
INFO [2023-07-03 11:30:00,075] com.epam.datalab.backendapi.schedulers.CheckInfrastructureStatusScheduler: Trying to update infrastructure statuses
INFO [2023-07-03 11:30:00,194] com.epam.datalab.backendapi.service.impl.InfrastructureInfoServiceImpl: EnvResources is empty: EnvResourceList{host=[], cluster=[]} , didn't send request to provisioning service
INFO [2023-07-03 11:30:00,208] com.epam.datalab.backendapi.schedulers.billing.BillingScheduler: Trying to update billing
INFO [2023-07-03 11:30:02,580] com.epam.datalab.backendapi.service.impl.BillingServiceImpl: Updating billing information for endpoint local. Billing data []
INFO [2023-07-03 11:45:00,064] com.epam.datalab.backendapi.schedulers.CheckInfrastructureStatusScheduler: Trying to update infrastructure statuses
INFO [2023-07-03 11:45:00,094] com.epam.datalab.backendapi.service.impl.InfrastructureInfoServiceImpl: EnvResources is empty: EnvResourceList{host=[], cluster=[]} , didn't send request to provisioning service
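The billing.log line "New report files in bucket folder datalabbilling not found" suggests the adapter never sees report files to parse. A diagnostic sketch, assuming the adapter looks for CSV report files in the bucket passed via `--aws_billing_bucket` (the helper name and the `*.csv` filter are assumptions, not DataLab internals):

```python
# Hypothetical check: list the billing bucket and see whether any CSV
# report files exist that the adapter has not yet processed.
def unprocessed_reports(keys, last_processed=""):
    """Return *.csv keys lexicographically newer than the last one seen."""
    return sorted(k for k in keys
                  if k.endswith(".csv") and k > last_processed)

# Usage against a live bucket (requires boto3 and credentials):
#   s3 = boto3.client("s3")
#   page = s3.list_objects_v2(Bucket="datalabbilling")
#   keys = [o["Key"] for o in page.get("Contents", [])]
#   print(unprocessed_reports(keys))  # empty => nothing for the adapter to ingest
```

If the list comes back empty, the usual cause is that AWS billing report delivery to that bucket was never configured, which would explain the empty UI despite real charges.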

Show more informative error message for user in case of failing notebook/cluster/library

As a user I want to see a more informative error message about failed instances on the DLab UI, so that I do not need to go to the SSN and look through logs.

Acceptance criteria:

  1. After a notebook creation fails, the user can see the error message by clicking the notebook name on the 'Resources list' page
  2. After a computational resource creation fails, the user can see the error message by clicking the computational resource name on the 'Resources list' page

For example, if the user has exceeded the Amazon shape limit, the message could read: 'Shapes limit is exceeded'.
The content of the error message should depend on the error type.

Add possibility to recreate edge node in case of edge failure or termination

As a user I want to keep using my instances if the edge node fails, or to recreate a previously terminated edge node in the same project using the same endpoint, so that I do not have to create a project under a new name and can keep the previous project name.

Currently, if we terminate the edge node (or it has failed), we cannot create a new edge node in the same project and endpoint.
Note that the edge node can fail during stopping/starting/creating/terminating.

Statuses for recreate:

  • edge node is terminated from the Cloud Web Console - recreation should be possible

  • edge node is terminated from the DataLab Web UI - recreation should be possible

  • edge node failed during stopping/starting - return the cloud status - recreation should NOT be possible

  • edge node failed during creating - recreation should be possible

  • edge node failed during terminating - recreation should be possible

If at least one instance still exists - perform a SMART recreate.

If no instances exist - create all resources.
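The status rules above can be sketched as a small decision helper; the status names below are illustrative, not DataLab's actual state strings:

```python
# Sketch of the edge-node recreation rules (hypothetical status names).
RECREATABLE = {
    "terminated_console",   # terminated from the Cloud Web Console
    "terminated_ui",        # terminated from the DataLab Web UI
    "failed_creating",      # failed during creation
    "failed_terminating",   # failed during termination
}

def can_recreate(edge_status):
    """Failures during stopping/starting keep the cloud status and must
    not be recreated; the terminal states above allow recreation."""
    return edge_status in RECREATABLE

def recreate_plan(existing_instances):
    # SMART recreate reuses surviving instances; otherwise rebuild everything.
    return "smart_recreate" if existing_instances > 0 else "create_all"
```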

[Azure]: Instances in billing combined as one point

As a user I want a clear view of which instances exist and what statuses they have on the 'Billing report' page.

For example, in the billing report the edge cost consists of three line items:

  • Bandwidth
  • Virtual Network
  • Virtual Machine

All three items also carry statuses, which confuses users: only one edge (instance) is running, but the billing report makes it look as if three instances are running.

Add ability to share AMI(s) created by user across DataLab project

As a user I want to choose whether a custom AMI of a notebook is public or not, so that I can share my own custom AMI with other users if I wish.

Acceptance criteria:

  1. Add the option to choose during AMI creation:
  • share within one project
  • share across projects
  2. The user's personal data on the notebook should not be shared

On-prem deploy

Hi,

Is there a way to deploy Datalab on-prem (preferably using Kubernetes), rather than on AWS/GCP/Azure?

Thanks,

Flexible disk size for instances

As a user I want to choose the volume type, size, and IOPS during instance creation, or change these parameters on an already existing Notebook, so that I can add more space as needed on AWS/GCP/Azure.

Acceptance criteria:

  1. 'Custom configuration' should be on the notebook creation popup.
  2. 'Custom configuration' consists of:
  • Additional disk/storage?
  • Spark configurations
  3. On top of that, the user can change the disk on an already existing instance; 'Custom configuration' should also be on the notebook name popup.

Should the additional disk be only for notebooks, or for computational resources as well?

Support localization 

As a user I want dates and currency to use the formats of my locale.

Use proper local formats for dates and currency throughout DLab.

Implement queue for load process

As a user I want the SSN to keep working reliably when several users simultaneously upload a lot of objects via the bucket browser.

If a few users simultaneously upload a lot of objects via the bucket browser, the SSN becomes overloaded.
So implement a queue for the upload process.

Adjust permission to Notebook links from DevOps side

As a user I want other users to be unable to open my Notebook even if they have its link, so that I can be confident my Notebook data is secure.

If a user (a Project_admin of another project, or a non-admin) has another user's notebook link, he can open that Notebook with his own credentials and view the other user's files on it.

So we should restrict access to this link on the DevOps side (at the Keycloak level).

Add support of Nexus repository

As a user I want to create instances from a local repository, so that I can avoid intermittent issues during creation.

1. It should be the single source from which changes are applied.

2. Route and distribute all traffic through this sandbox.

Support library installation of particular version from DLab UI

As a user I want to install a particular version of a library, so that I can easily upgrade or downgrade it on demand via the DLab UI.

How it works now:

  1. If a library is already installed on the instance (it was installed earlier, during notebook creation) and the user installs the same library via the DLab UI, DLab shows installing -> installed, but the library is not actually reinstalled: DLab detects that it is already present and just changes the status. This matters when the user wants a newer version of the library. So make it possible to install a library as <library_name==version> via the DLab UI. For example, this bug was found with the library 'request'.
  2. Also, if Python 3 is updated via the terminal, it appears upgraded, but a few minutes later it is downgraded again. (Observed on Jupyter.)

How should it work:

  • If the user types a wrong library version, show output listing the available versions for that library
  • If the user types a valid library version, that version should be installed
  • On top of that, the dependencies added during installation should be reported to the user. In what way should they be reported?
  • The most recently installed library should appear at the top of the libraries grid.
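The behaviour above can be sketched with pip-style requirement specifiers: passing the exact `name==version` string (instead of a bare name) forces pip to install the requested version even when another one is already present, and pip itself prints the list of available versions when the pin is wrong. The function names and the regex are illustrative assumptions:

```python
# Hypothetical sketch of <library_name==version> handling for the UI.
import re
import subprocess
import sys

SPEC_RE = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)==([A-Za-z0-9.!+*-]+)$")

def parse_spec(spec):
    """Split 'requests==2.31.0' into ('requests', '2.31.0'); None if malformed."""
    m = SPEC_RE.match(spec)
    return m.groups() if m else None

def install(spec):
    if parse_spec(spec) is None:
        raise ValueError(f"expected <library_name==version>, got {spec!r}")
    # On a bad pin, pip's error output lists the versions that do exist,
    # which covers the 'show available versions' criterion above.
    subprocess.check_call([sys.executable, "-m", "pip", "install", spec])
```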

Add option to turn billing on/off on the administration page

As a customer I want the possibility to turn billing on/off, so that I can manage this option on demand.

Add the possibility to turn billing on/off even for one's own resources.

On the administration page, add the role option 'View full billing report for currently logged in user'.
So billing consists of:

  • View full billing report for all users
  • View full billing report for currently logged in user

If the user does not select any option, billing is disabled -> the 'Billing report' page is not available.

If the user selects 'View full billing report for all users', 'View full billing report for currently logged in user' is automatically selected as well.

If the user selects only 'View full billing report for currently logged in user', just that item is selected and billing is available only for the user's own resources.
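The selection rules above reduce to a small permission-resolution function; the shortened role names are illustrative:

```python
# Sketch of the billing-role logic described above (hypothetical names).
ALL = "view_all"   # 'View full billing report for all users'
OWN = "view_own"   # 'View full billing report for currently logged in user'

def resolve_billing(selected):
    """Selecting the all-users report implies the own-resources report;
    selecting nothing disables billing entirely."""
    effective = set(selected)
    if ALL in effective:
        effective.add(OWN)
    return effective

def billing_page_available(selected):
    return bool(resolve_billing(selected))
```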

[GCP][Notebooks]: Add custom images

As a user I want to create a Notebook from a custom image, so that I can reduce library installation time on a new Notebook.

Acceptance criteria:

  1. User is able to create image from running notebook
  2. User is able to create notebook from custom image

Add possibility to delete any/all libraries in failing status

As a user I want the possibility to delete failed libraries, so that I can free the grid space they occupy.

Acceptance criteria:

  1. There is a 'Delete' button next to each failed library
  2. If the user clicks the 'Delete' button, the failed library is removed from the instance

Connection via SSH fails when running deploy_datalab.py

I am running the deploy_datalab.py script with the below command:

/usr/bin/python3 /home/vboxuser/incubator-datalab/infrastructure-provisioning/scripts/deploy_datalab.py \
    --conf_service_base_name datalab_poc \
    --conf_os_family debian \
    --key_path /home/vboxuser/key \
    --conf_key_name datalabs_key \
    --conf_tag_resource_id datalab \
    --keycloak_auth_server_url XXXXXXXXXXXXXXX \
    --keycloak_realm_name master \
    --keycloak_user XXXXX \
    --keycloak_user_password XXXXX \
    --action create \
    'aws' \
    --aws_access_key XXXXXXXXXXXXXXX \
    --aws_secret_access_key "XXXXXXXXXXXXXXX " \
    --aws_account_id XXXXXXXXXXXXXXX \
    --aws_region XX-XXXX-X \
    --aws_zone XXXX-XXX

When the script reaches the point where it attempts an SSH connection to the newly created EC2 instance, it makes its 15 attempts, each of which seems to succeed, yet overall the step fails. An excerpt from the logs is attached.

datalabs_log.txt

Thank you.
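A symptom like "each attempt succeeds but the step fails overall" often means the per-attempt check and the overall success check differ, e.g. the TCP/SSH handshake completes but the remote command never produces the expected output. A minimal sketch of that distinction (the attempt count matches the report; the function name and callback are illustrative, not DataLab's actual helper):

```python
# Hypothetical retry loop like the deployer's SSH wait. Each attempt must
# verify a real command over the connection, not just that it opened.
import time

def wait_for_ssh(run_remote, attempts=15, delay=10):
    """run_remote() should execute a trivial command (e.g. 'echo ok') and
    return True only if it produced the expected output."""
    for _ in range(attempts):
        if run_remote():
            return True
        time.sleep(delay)
    return False
```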

mvn package -DskipTests fails on AArch64, Fedora 33 (Java 11)

...
[INFO] Reactor Summary for dlab 1.0:
[INFO]
[INFO] dlab ............................................... FAILURE [ 9.322 s]
[INFO] common ............................................. SKIPPED
[INFO] dlab-utils ......................................... SKIPPED
[INFO] dlab-model ......................................... SKIPPED
[INFO] dlab-webapp-common ................................. SKIPPED
[INFO] provisioning-service ............................... SKIPPED
[INFO] dlab-mongo-migration ............................... SKIPPED
[INFO] self-service ....................................... SKIPPED
[INFO] billing-azure ...................................... SKIPPED
[INFO] billing-gcp ........................................ SKIPPED
[INFO] billing-aws ........................................ SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 10.231 s
[INFO] Finished at: 2020-11-17T13:24:59+01:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.rat:apache-rat-plugin:0.7:check (default) on project dlab: Too many unapproved licenses: 1 -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
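The RAT failure means one file lacks a recognized license header. Running `mvn apache-rat:check` and inspecting `target/rat.txt` identifies the offender. A parsing sketch, assuming the plain-text report lists offenders under a 'Files with unapproved licenses' header (verify against your actual rat.txt, as the layout may differ by plugin version):

```python
# Hypothetical parser for the RAT plain-text report.
def unapproved_files(report_text):
    """Collect file paths listed under the unapproved-licenses section."""
    files, collecting = [], False
    for line in report_text.splitlines():
        if "Files with unapproved licenses" in line:
            collecting = True
            continue
        if collecting:
            if not line.strip():
                if files:          # section ends at the blank line after the list
                    break
                continue           # skip the blank line right after the header
            files.append(line.strip())
    return files
```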

Support audit

As a user I want to see the history of changes, so that I can easily find out what/when/who made changes.

Audit at the DLab level

Possibility to find out:

  • Who deleted or added a user?
  • Who created a notebook/compute?
  • Who stopped/started/terminated a notebook/compute/project?
  • Who edited a group/project/notebook/compute?
  • When were the changes made?
  • What changes were made?

In general, everything done via the DLab UI should appear in the history of changes.

The history of changes should follow this format:
user → time → action

Add HDInsight on Azure

As a user I want to use HDInsight on Azure via DLab, so that I can simplify running big data frameworks.

Upgrade DataLab

As a user I want the possibility to upgrade the current DataLab version, so that I do not need a full redeploy.

Acceptance criteria:

  1. Administrator can upgrade DataLab
  2. All previous data (billing/instances) are not removed/changed after upgrading

Convey Notebook links of other users to administrator

As an admin I want to see users' Notebook links, so that I can easily open a user's Notebook on demand.

Notebook links are displayed only for one's own resources; the administrator cannot see other users' Notebook links.

So expose notebook links to the administrator on the 'Environment management' page:

  1. The administrator can view links of other users
  2. A user is not able to view another user's links.
