Coder Social home page Coder Social logo

ahead's Introduction

AHEAD

Interview preparation for applying for the start-up company AHEAD.

Quick Start

Here are a few quick ways to make use of the current repository and the related ML models.

  1. A tiny Swin Transformer model for image classification
    In [1]: import FlowCal
       ...: import numpy as np
       ...: from ahead.util import get_fsc_ssc_chunks
       ...: from PIL import Image
       ...: from transformers import pipeline
    
    In [2]: pipe = pipeline(
       ...:     "image-classification",
       ...:     "phunc20/swin-tiny-patch4-window7-224-finetuned-wuhan")
    
    In [3]: file_flow_id = "flowrepo_covid_EU_013_flow_001"
       ...: fcs_path = next((data_dir/f'raw_fcs/{file_flow_id}').glob("*.fcs"))
    
    In [4]: chunk_generator = get_fsc_ssc_chunks(
       ...:     fcs_path,
       ...:     chunk_size=10_000,
       ...:     typ="A",
       ...:     gate_fraction=0.75,
       ...: )
    
    In [5]: chunk = next(chunk_generator)
       ...: chunk.shape
    Out[5]: (10000, 2)
    
    In [6]: FlowCal.plot.density2d(
       ...:     chunk,
       ...:     mode="scatter",
       ...:     savefig="chunk.png",
       ...: )
    
    In [7]: image = Image.open("chunk.png")
       ...: pipe(image)
    
    Out[7]: [{'score': 0.999931812286377, 'label': 'sick'},
       ...:  {'score': 6.822467548772693e-05, 'label': 'healthy'}]

Python Packages

  • Python version 3.8 is recommended by the FlowCal doc page
  • And the packages inside requirements.txt
    • For Python newcomers, it suffices to
      $ pip install -r requirements.txt
    • In particular, FlowCal alone requires many dependecies, so it is highly possible that any package you need is included

Alternatively, instead of requirements.txt, one could also install the current repo (so that one could use some of the repo's code)

  • Either by
    $ pip install git+https://github.com/phunc20/ahead.git
  • Or download the code and install from local
    $ pip install -e <downloaded_repos_local_path>
    $ # e.g. say you've downloaded (and unzipped) the repo at
    $ # ~/downloads/ahead
    $ # then it suffices to
    $ # pip install -e ~/downloads/ahead

Take-Home Exam

Difficulties

  1. (# events) in each sample differs quite drastically
    • It could go as small as 680 events
    • Could also go as big as several hundreds of thousand events, e.g. 360_000
  2. Seemingly imbalanced binary classes (Sick/Healthy)

Experiments

First come a few observations:

  1. For each FCS file, We are given 35 channels, among which we are allowed to only use 31, i.e. those whose "use" column is marked 1 in EU_marker_channel_mapping.xlsx

    • In this particular challenge, the marker information seems useless because including it won't bring any benefit.
    • As someone with almost no domain knowledge, I honestly do not know which of the 31 channels to discard/keep. I came up with 3 ways to proceed:
      1. Use all 31 channels
      2. Use 31 channels with dimensionality reduction
      3. Suggested by FlowCal's tutorial and by the problem description, maybe we could use FSC-SSC (gated) density plot to do some kind of image classification. In particular,
        • I shall not attempt to do calibration to MEF because we are not provided with calibration bead data
        • I do not see fluorescence channel either, so I will only focus on 2D FSC-SSC density plots
        • If we choose to do this as an image classification task, then one immediate benefit is that even patients whose sample are of less number of events could be inferred using the same model (although the correctness of the inference is not guaranteed)
      4. As an alternative to the previous, i.e the 3rd way, aside from
        • FSC-A vs SSC-A
        • FSC-W vs SSC-W
        • FSC-H vs SSC-H we could have tried all sorts of 2D density plots invovling all the combinations of the 31 channels, i.e. roughly 31 choose 2 combinations
  2. As mentioned in the above subsection, the number of events varies quite a lot. Consequently, I decide to group

    • 500
    • 1,000
    • 5,000
    • 10,000

    events together to make up an instance.
    By doing so,

    • Not only do we increase the number of samples/instances
    • But it also enables us to do prediction when future data come in small number of events

    This should be similar to sliding windows. To make things simple, I choose not to have the windows overlap one another.

Rmk.

  • At the end, I do not find enough time to do all the listed experiments

Validation Set

Possessing cell samples from 40 patients, let's isolate 4 patients (2 sick, 2 healthy) to form our validation set. Besides, let's choose those whose (# events) are intermediate.

  • 2 sick: flowrepo_covid_EU_034_flow_001 and flowrepo_covid_EU_048_flow_001 with 98_608 and 123_154 events, resp.
  • 2 healthy: flowrepo_covid_EU_013_flow_001 and flowrepo_covid_EU_004_flow_001 with 170_075 and 183_001 events, resp.

Code

Most of my developped code are in

  • docs/notebooks/
  • and ahead/

Because I developped the code on a very old laptop (Thinkpad X61s with Intel(R) Core(TM)2 Duo CPU, to be more precise), most of the code I wrote I ran on Colab. Sorry for not devoting more time on converting them into Python script.

But from another point of view, since most of the code carries experiment smells, i.e. nothing really of production-ready, so I think it isn't so bad that they were in Jupyter-notebook form.

Bonus Question

By reading the question description and by reading FlowCal's tutorials side-by-side, it's hard to not have suspected that the gating techniques introduced in https://flowcal.readthedocs.io/en/latest/python_tutorial/gate.html may help with the bonus question. In particular, the FlowCal.gate.density2d helps reduce manual works to only one input arg: gate_fraction. I think it could be considered automatic.

ahead's People

Contributors

phunc20 avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.