stevensmiley1989 / cleveland_dataset

The purpose of this Repository is to investigate different potential supervised Machine Learning (ML) algorithms for creating binary classification models that could serve as diagnostics for heart disease.

Cleveland_Dataset

Repository by Steven Smiley

This repository hosts the files I used to analyze and evaluate the Cleveland dataset in Python.

Note: Revision 1 moves MinMaxScaler() to after the train/test split in order to prevent data leakage. The results do not change significantly between Revision 0 and Revision 1. The MLP reaches the same accuracy in Revision 1 as in Revision 0. The SVM accuracy is lower in Revision 1 than in Revision 0, where the SVM was tied with the MLP for accuracy. The MLP also has a higher AUC in Revision 1 than in Revision 0. The MLP is therefore the best of the models trained without data leakage, although the overall accuracy and AUC values are not significantly different. Both revisions give excellent diagnostics.
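The ordering described above (split first, then fit the scaler on the training rows only) can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the synthetic data from make_classification stands in for the Cleveland features, and the split ratio, random seeds, and model hyperparameters are all assumptions chosen for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Synthetic stand-in for the real features (assumption: 303 rows, 13 features).
X, y = make_classification(n_samples=303, n_features=13, random_state=0)

# Split first, so the test rows never influence the scaler.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Fit the scaler on the training rows only, then apply it to both sets.
scaler = MinMaxScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Hypothetical hyperparameters; the notebook's settings may differ.
models = {
    "MLP": MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
    "SVM": SVC(probability=True, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train_s, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test_s))
    auc = roc_auc_score(y_test, model.predict_proba(X_test_s)[:, 1])
    results[name] = (accuracy, auc)
    print(f"{name}: accuracy={accuracy:.3f}, AUC={auc:.3f}")
```

Calling MinMaxScaler().fit_transform(X) before the split would instead let the test set's minimum and maximum values shape the training features, which is the leakage Revision 1 removes.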

Table of Contents to Repository

1 Jupyter Notebook

Jupyter Notebook(s) written in Python.

Notebook | Description
Cleveland.ipynb | My Jupyter notebook.

A single input file, processed.cleveland.data, contains all of the data for the Cleveland dataset.

processed.cleveland.data
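As a rough sketch of how this file could be loaded for binary classification, the example below assumes the standard UCI layout: 14 comma-separated columns with no header row, missing values written as "?", and a final "num" column where 0 means no disease and 1 to 4 indicate disease. The load_cleveland helper, the choice to drop incomplete rows, and the sample rows are illustrative assumptions, not the notebook's code.

```python
import io

import pandas as pd

# Attribute names from the UCI heart-disease documentation.
COLUMNS = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num",
]


def load_cleveland(source):
    """Read the data and add a binary target (hypothetical helper)."""
    df = pd.read_csv(source, header=None, names=COLUMNS, na_values="?")
    # Assumption: rows with missing values are dropped; imputing is another option.
    df = df.dropna().assign(target=lambda d: (d["num"] > 0).astype(int))
    return df.drop(columns="num")


# Illustrative rows in the file's format, not guaranteed to match real records.
sample = io.StringIO(
    "63.0,1.0,1.0,145.0,233.0,1.0,2.0,150.0,0.0,2.3,3.0,0.0,6.0,0\n"
    "67.0,1.0,4.0,160.0,286.0,0.0,2.0,108.0,1.0,1.5,2.0,3.0,3.0,2\n"
    "52.0,1.0,3.0,138.0,223.0,0.0,0.0,169.0,0.0,0.0,1.0,?,3.0,0\n"
)
df = load_cleveland(sample)
print(df[["age", "ca", "thal", "target"]])
```

With the real file, the call would be load_cleveland("processed.cleveland.data"), assuming the file sits in the working directory.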

The outputs from the Jupyter notebook are placed in two folders: Models and Figures.

4 Credits/References

  1. Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

  2. Hungarian Institute of Cardiology. Budapest: Andras Janosi, M.D.

  3. University Hospital, Zurich, Switzerland: William Steinbrunn, M.D.

  4. University Hospital, Basel, Switzerland: Matthias Pfisterer, M.D.

  5. V.A. Medical Center, Long Beach and Cleveland Clinic Foundation: Robert Detrano, M.D., Ph.D.

  6. Olson, Randal S. et al. “Data-driven advice for applying machine learning to bioinformatics problems.” Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing 23 (2017): 192-203.

  7. SciPy. Pauli Virtanen, Ralf Gommers, Travis E. Oliphant, Matt Haberland, Tyler Reddy, David Cournapeau, Evgeni Burovski, Pearu Peterson, Warren Weckesser, Jonathan Bright, Stéfan J. van der Walt, Matthew Brett, Joshua Wilson, K. Jarrod Millman, Nikolay Mayorov, Andrew R. J. Nelson, Eric Jones, Robert Kern, Eric Larson, CJ Carey, İlhan Polat, Yu Feng, Eric W. Moore, Jake VanderPlas, Denis Laxalde, Josef Perktold, Robert Cimrman, Ian Henriksen, E.A. Quintero, Charles R Harris, Anne M. Archibald, Antônio H. Ribeiro, Fabian Pedregosa, Paul van Mulbregt, and SciPy 1.0 Contributors. (2019) SciPy 1.0–Fundamental Algorithms for Scientific Computing in Python. preprint arXiv:1907.10121

  8. Python. a) Travis E. Oliphant. Python for Scientific Computing, Computing in Science & Engineering, 9, 10–20 (2007) b) K. Jarrod Millman and Michael Aivazis. Python for Scientists and Engineers, Computing in Science & Engineering, 13, 9–12 (2011)

  9. NumPy. a) Travis E. Oliphant. A guide to NumPy, USA: Trelgol Publishing, (2006). b) Stéfan van der Walt, S. Chris Colbert and Gaël Varoquaux. The NumPy Array: A Structure for Efficient Numerical Computation, Computing in Science & Engineering, 13, 22–30 (2011)

  10. IPython. a) Fernando Pérez and Brian E. Granger. IPython: A System for Interactive Scientific Computing, Computing in Science & Engineering, 9, 21–29 (2007)

  11. Matplotlib. J. D. Hunter, “Matplotlib: A 2D Graphics Environment”, Computing in Science & Engineering, vol. 9, no. 3, pp. 90–95, 2007.

  12. Pandas. Wes McKinney. Data Structures for Statistical Computing in Python, Proceedings of the 9th Python in Science Conference, 51–56 (2010)

  13. Scikit-Learn. Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, Édouard Duchesnay. Scikit-learn: Machine Learning in Python, Journal of Machine Learning Research, 12, 2825–2830 (2011)

  14. Scikit-Image. Stéfan van der Walt, Johannes L. Schönberger, Juan Nunez-Iglesias, François Boulogne, Joshua D. Warner, Neil Yager, Emmanuelle Gouillart, Tony Yu and the scikit-image contributors. scikit-image: Image processing in Python, PeerJ 2:e453 (2014)

5 Contact-Info

Feel free to contact me to discuss any issues, questions, or comments.

6 License

This repository contains a variety of content; some developed by Steven Smiley, and some from third-parties. The third-party content is distributed under the license provided by those parties.

The content developed by Steven Smiley is distributed under the following license:

*I am providing code and resources in this repository to you under an open source license. Because this is my personal repository, the license you receive to my code and resources is from me and not my employer.

Copyright 2020 Steven Smiley

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
